
Enrichment - The Chi-Square Test
Astronomers commonly combine the χ² (chi-square) test with epoch folding to investigate possible periodic behavior in the light curve of a source. You can understand why this procedure is required by examining the light curve below. Ask yourself, "How could we possibly even take a guess at the length of the period?" Three hundred thousand points! It won't fit in your calculator, and who wants to do such a calculation by hand? It is time to bring in a mathematical procedure and a computer.
Part I - The Method
Our procedure will be to perform an epoch fold and then run a χ² test. So we need to understand what χ² is and how it can be used... Welcome to the χ² Test!
Whenever you try to fit data with an equation, you ultimately need to know whether the equation you have chosen is a "good fit" to your data. To find out, you need some measure of the "goodness of fit" that lets you determine quantitatively whether the fit is acceptable. In general, this measure is built on the idea that a "good fit" of an equation to data minimizes the weighted sum of the squared deviations between the fitted values and the measured values. χ² is a convenient measure of the goodness of fit of an equation (any equation) to a set of measured data. To be specific, it is the weighted sum of the squared differences between the measured and calculated values.
The variance of the fit is defined by the statistic χ², which can be written

$$\chi^2 = \sum_{i} \frac{\left[ y_{i} - y(x_{i}) \right]^{2}}{\sigma_{i}^{2}}$$

where σ_{i}^{2} is the variance, or the square of the calculated error, of each point, y_{i} is the measured value of y at a given point, and y(x_{i}) is the fitted value of y at that given point.
If the fitted values y(x_{i}) are good approximations of the measured values y_{i}, then the value of χ² is low and a good fit can be claimed. If, however, the value of χ² is high, the fit is not good. (We will discuss what is meant by "low" and "high" a little later.)
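To make the definition concrete, here is a minimal Python sketch of the weighted sum of squared differences described above. The function name `chi_square` and the three example measurements are made up for illustration; they are not part of the lesson's data.

```python
def chi_square(measured, fitted, sigmas):
    """Weighted sum of squared differences between measured and fitted values.

    measured: the measured values y_i
    fitted:   the fitted values y(x_i) at the same points
    sigmas:   the calculated error (sigma_i) of each point
    """
    if not (len(measured) == len(fitted) == len(sigmas)):
        raise ValueError("all three sequences must have the same length")
    return sum(((y - f) / s) ** 2 for y, f, s in zip(measured, fitted, sigmas))


# Hypothetical example: three measurements compared with fitted values.
measured = [1.0, 2.1, 2.9]
fitted = [1.0, 2.0, 3.0]
sigmas = [0.1, 0.1, 0.1]

example_chi2 = chi_square(measured, fitted, sigmas)
print(example_chi2)
```

Each point that sits one error bar away from the fit adds about 1 to the total, so two such points and one perfect match give a value close to 2.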
Note, however, that astronomers searching for periods use χ² in a very different way than statisticians. Statisticians look for a small χ² value, whereas astronomers look for a large χ². Why is this? Before we run the test looking for periodic behavior, we cannot know what the period is, what the shape of the pulse is, or anything else about the modulation. What we do know is that if the period we are testing is incorrect, the result of our epoch folding is something close to a flat line. So, we use a flat line (representing the average value across all the measured data) as our fitting equation. Then, if χ² is small, we know that the data are well represented by a flat line and no modulation exists at that period. Only if χ² is large is there some possibility that a periodic modulation at the tested period exists in the data.
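The flat-line idea can be sketched in a few lines of Python. This is only an illustration: the function name `flat_line_chi_square`, the two four-bin folded profiles, and the error of 0.2 per bin are all invented for the example.

```python
def flat_line_chi_square(profile, errors, mean_rate):
    """chi-square of a folded profile against a flat line at mean_rate."""
    return sum(((r - mean_rate) / e) ** 2 for r, e in zip(profile, errors))


# Two hypothetical folded profiles, each with four phase bins.
flat_profile = [10.1, 9.9, 10.0, 10.0]    # what a wrong trial period might give
pulsed_profile = [8.0, 10.0, 12.0, 10.0]  # what a correct trial period might give
errors = [0.2, 0.2, 0.2, 0.2]             # assumed error on each bin

# The flat-line "model" is simply the average value of the data.
flat_chi2 = flat_line_chi_square(
    flat_profile, errors, sum(flat_profile) / len(flat_profile))
pulsed_chi2 = flat_line_chi_square(
    pulsed_profile, errors, sum(pulsed_profile) / len(pulsed_profile))

print(flat_chi2, pulsed_chi2)
```

The nearly flat profile yields a χ² well below the number of bins, while the modulated profile yields a value in the hundreds, which is the kind of contrast the period search relies on.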
Now that we understand what χ² is, let us think again about our procedure. To determine whether periodic behavior occurs within a set of data, and the period at which it occurs, we run the entire data set through epoch folding at a given period and then calculate χ² for the resulting fold. This procedure is repeated for all the different periods we want to test.
A graph can then be generated showing the calculated χ² for each tested period. Astronomers are then interested in looking at the data folded on the periods with large χ² values.
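The whole procedure (fold at a trial period, compute χ² against a flat line, repeat for every trial period) might look something like the sketch below. It runs on a synthetic light curve, not on real data: the 41-day sinusoid, the noise level, the half-day sampling, the choice of 10 phase bins, and the assumption that every point has the same error `sigma` are all illustrative choices, and the function names are hypothetical.

```python
import math
import random


def fold_profile(times, rates, period, n_bins=10):
    """Epoch-fold the data: sum the rates falling in each phase bin."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for t, r in zip(times, rates):
        phase = (t % period) / period            # phase between 0 and 1
        b = min(int(phase * n_bins), n_bins - 1)
        sums[b] += r
        counts[b] += 1
    return sums, counts


def folded_chi_square(times, rates, sigma, period, n_bins=10):
    """chi-square of the folded profile against a flat line at the mean rate.

    Assumes every data point has the same error sigma, so the error on the
    average of n points in a bin is sigma / sqrt(n).
    """
    mean_rate = sum(rates) / len(rates)
    sums, counts = fold_profile(times, rates, period, n_bins)
    chi2 = 0.0
    for s, n in zip(sums, counts):
        if n == 0:
            continue                              # skip empty phase bins
        bin_mean = s / n
        bin_variance = sigma ** 2 / n
        chi2 += (bin_mean - mean_rate) ** 2 / bin_variance
    return chi2


# Synthetic light curve (an assumption for illustration only):
# 1000 days sampled every half day, 41-day sinusoid plus Gaussian noise.
random.seed(0)
true_period = 41.0
sigma = 1.0
times = [0.5 * i for i in range(2000)]
rates = [10.0 + 2.0 * math.sin(2.0 * math.pi * t / true_period)
         + random.gauss(0.0, sigma) for t in times]

# Test trial periods from 20 to 60 days in half-day steps.
trial_periods = [20.0 + 0.5 * k for k in range(81)]
scan = [(p, folded_chi_square(times, rates, sigma, p)) for p in trial_periods]

best_period, best_chi2 = max(scan, key=lambda pair: pair[1])
print(best_period, best_chi2)
```

Plotting the second column of `scan` against the first gives the χ² versus period graph described above; the trial period with the largest χ² is the one worth folding on and inspecting by eye.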
Examine the graph below. What period do you think an astronomer would want to pursue?
There is a clear peak in χ² at 41 days (the other, smaller peaks at fractional values of 41 days are called aliases, which we won't go into here). Now fold the data on that period of 41 days and see what the average light curve looks like. Is some anomalous data point leading the test awry? No. The result is clear, smooth periodic behavior at a period of 41 days. We have, in fact, found the orbital period of this binary system. It takes 41 days for the neutron star to revolve once around its supergiant companion.
Lastly, note that if we have in fact found the correct period, the source should exhibit it at all times. Thus, if we did not use our entire data set, but looked at small subsets of it, we should still find the modulation at 41 days in each subset. We applied this procedure to the GX 301-2 data, and it clearly shows the same sort of behavior in each of the 10 one-year subsets of data we examined.
Calculating χ²
It is not difficult to write a computer program that calculates the reduced χ² for a fit to any given set of data. AP Physics or Pre-Calculus students taking computer programming classes may consider doing so.
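As a starting point, the reduced χ² is usually taken to be χ² divided by the number of degrees of freedom, that is, the number of data points minus the number of parameters fitted from the data. The sketch below assumes that convention; the function name `reduced_chi_square` and the example numbers are hypothetical.

```python
def reduced_chi_square(chi2, n_points, n_fitted_params):
    """chi-square per degree of freedom.

    Degrees of freedom = number of data points minus the number of
    parameters that were determined from the data (assumed convention).
    """
    dof = n_points - n_fitted_params
    if dof <= 0:
        raise ValueError("need more data points than fitted parameters")
    return chi2 / dof


# Hypothetical example: a folded profile with 4 phase bins compared with a
# flat line whose single parameter (the average) was taken from the data.
low_example = reduced_chi_square(0.5, n_points=4, n_fitted_params=1)
high_example = reduced_chi_square(200.0, n_points=4, n_fitted_params=1)
print(low_example, high_example)
```

A reduced χ² near 1 suggests the data are consistent with the fitting equation within their errors, while a value much larger than 1 suggests they are not. For the flat-line test used here, that second case is the interesting one.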
Part II - NOW YOU be the scientist...
Once you have your computer program ready to calculate χ², can you determine if there is any periodic behavior in the data set below? If so, what is the period? If not, what are the constraints your analysis allows you to put on the lack of periodic behavior, e.g. what range of periods could you test?
To get a copy of the DATA SET you need, you must visit our Web site at
http://imagine.gsfc.nasa.gov/docs/teachers/lessons/time/enrichment_data.html or email us.
