Imagine the Universe!

How Good is the Model?

The goal of modeling data is not just to find a model that works, but to find the best possible model. To do this, scientists need some measure of how well a model fits the data. On the What is a Model? page, you developed a simple model to describe how the time a student studies relates to his or her test scores in your physics class. The initial model was the equation:

y = (18.3) x + 12.5

where y is the score (in percentage points) and x is the time spent studying (in hours).

But how good is that model? Scientists and mathematicians use "goodness-of-fit" tests to describe how well the model matches the data. There are several such tests. Each one defines a parameter that characterizes how far the data lie from the model, and the best fit is the model that minimizes that parameter.

We'll use a simple test called least-squares fit to illustrate the principle of goodness-of-fit. Below we briefly describe the test that the XSPEC software will use when you fit the low-mass X-ray binary data. The basic principles of getting the best fit between the model and data are similar.

Least-Squares Fit

The distances between the data points and the model are called the "residuals" of the model. In the least-squares fit method, we want to minimize the sum of the squares of the residuals. By squaring each residual, we ensure that every term in the sum is positive (or zero), whether the data point lies above or below the model. The residuals are illustrated in the plot below as red lines:

Graph of data set and initial straight-line model with the residuals shown

Plot of the data set for test scores versus time studied with the initial linear model shown as a black line. The red lines represent the residuals. The residuals are the distance between the data points and the modeled line.
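To make the idea of residuals concrete, here is a minimal sketch in Python. The data points are invented for illustration (the actual data set behind the plot is not listed on this page), and the function names are hypothetical. Only the coefficients 18.3 and 12.5 come from the initial model above.

```python
# Hypothetical (hours studied, score in %) pairs, invented for illustration.
# They are NOT the data set plotted on this page.
data = [(1.0, 35.0), (2.0, 50.0), (3.0, 72.0), (4.0, 80.0), (5.0, 98.0)]


def model(x, slope=18.3, intercept=12.5):
    """Initial straight-line model from this page: y = 18.3 x + 12.5."""
    return slope * x + intercept


def residuals(points, predict):
    """Measured value minus modeled value for each data point."""
    return [y - predict(x) for x, y in points]


def sum_of_squares(points, predict):
    """Sum of the squared residuals: the quantity least squares minimizes."""
    return sum(r * r for r in residuals(points, predict))


res = residuals(data, model)
sse = sum_of_squares(data, model)
print("residuals:", [round(r, 1) for r in res])
print("sum of squared residuals:", round(sse, 2))
```

Some residuals come out positive (point above the line) and some negative (point below the line), which is why they are squared before being added up.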

We won't go into the mathematical detail for determining the best fit by the least-squares fit method, but check out this webpage if you're interested: Least Squares Fitting from Wolfram MathWorld.

Using their equations, we find that the best-fit straight line for the data above is given by:

y = (15.1) x + 23.5

The data with this best-fit line and residuals are shown below.

Graph of data set and best-fit straight-line model with the residuals shown

The data is the same as above, but the line is now the best-fit model as found using the least-squares fit method.
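For readers who want to see the calculation, the standard closed-form least-squares solution for a straight line can be sketched in a few lines of Python. The data below are invented for illustration, so the fitted numbers will not match the 15.1 and 23.5 quoted above, and the function names are hypothetical. The sketch also checks that the fitted line has a smaller sum of squared residuals than the initial model.

```python
# Hypothetical (hours studied, score in %) pairs, invented for illustration.
# They are NOT the data set plotted on this page.
data = [(1.0, 35.0), (2.0, 50.0), (3.0, 72.0), (4.0, 80.0), (5.0, 98.0)]


def fit_line(points):
    """Return (slope, intercept) of the least-squares straight line."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sums of squared and cross deviations from the means.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_of_squares(points, slope, intercept):
    """Sum of squared residuals for the line y = slope * x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points)


slope, intercept = fit_line(data)
sse_fit = sum_of_squares(data, slope, intercept)
sse_initial = sum_of_squares(data, 18.3, 12.5)  # initial model from this page

print("best-fit slope:", round(slope, 2), "intercept:", round(intercept, 2))
print("sum of squares, best fit:", round(sse_fit, 2))
print("sum of squares, initial model:", round(sse_initial, 2))
```

No other straight line gives a smaller sum of squared residuals for these points, which is what "best fit" means in the least-squares sense.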

You can interpret the model to mean that if you don't study at all, you will probably get a grade of about 23.5% on the next exam (the value of y when x = 0). To expect 90% on the exam, you would need to study about 4.4 hours, since (90 - 23.5) / 15.1 is approximately 4.4.
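Reading predictions off the best-fit line y = (15.1) x + 23.5 is simple arithmetic. As a small sketch (the function names are hypothetical), the line can be evaluated for a given study time, or inverted to find the study time that corresponds to a target score:

```python
SLOPE = 15.1      # percentage points gained per hour studied (best-fit line)
INTERCEPT = 23.5  # predicted score, in percent, for zero hours of study


def predicted_score(hours):
    """Score predicted by the best-fit line for a given study time."""
    return SLOPE * hours + INTERCEPT


def hours_needed(target_score):
    """Invert the line: study time at which the model predicts target_score."""
    return (target_score - INTERCEPT) / SLOPE


print("score with no studying:", predicted_score(0))
print("hours needed for 90%:", round(hours_needed(90), 2))
```

Keep in mind that these are predictions of the model, not guarantees for any individual student.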

Of course, the model would be more reliable with more data points, and more data might show that a straight line is not the best model. For example, if most students understand the material after studying for 4.5 hours, additional study time may not increase their grades much. This type of function would not be linear.

However, no matter the final shape of the best-fit model, the above example illustrates the basics of how the process works.

Chi-Squared Fit

The software that you will be using to model the low-mass X-ray binary uses a test called a Chi-squared test to determine the goodness-of-fit of the data and model. This test is more involved than the least-squares fit, but works on the same principle of minimizing a parameter characterizing how far the data lie from the model.

The Chi-squared test determines the goodness-of-fit from the weighted sum of the squared differences between the measured and calculated values. The test is characterized by the statistic Χ², which can be written

Χ² = Σ_i (1/σ_i²) [y_i − y(x_i)]²

where the sum runs over all data points i, and:

  • σ_i² is the variance, or the square of the calculated error, of each point
  • y_i is the measured value of y at a given point
  • y(x_i) is the fitted value of y at that given point

If the fitted values y(x_i) are good approximations of the measured values y_i, then the value of Χ² is low and a good fit can be claimed. If, however, the value of Χ² is high, the fit is not good.
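The formula above translates almost directly into code. The following Python sketch is only an illustration, not the way XSPEC itself is implemented. The data points and their errors are invented, and the function name is hypothetical. It compares Χ² for the two straight-line models from the test-score example.

```python
# Hypothetical measurements, invented for illustration: hours studied,
# score in %, and an assumed one-sigma error on each score.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [35.0, 50.0, 72.0, 80.0, 98.0]
sigmas = [4.0, 2.0, 5.0, 4.0, 2.0]


def chi_squared(xs, ys, sigmas, predict):
    """Weighted sum of squared differences between data and model."""
    total = 0.0
    for x, y, sigma in zip(xs, ys, sigmas):
        # Each term is the squared residual divided by the variance sigma^2,
        # so points with small errors count more than points with large ones.
        total += ((y - predict(x)) / sigma) ** 2
    return total


def initial_model(x):
    return 18.3 * x + 12.5   # initial model from this page


def best_fit_model(x):
    return 15.1 * x + 23.5   # least-squares line quoted on this page


chi2_initial = chi_squared(xs, ys, sigmas, initial_model)
chi2_best = chi_squared(xs, ys, sigmas, best_fit_model)
print("chi-squared, initial model:", round(chi2_initial, 3))
print("chi-squared, best-fit line:", round(chi2_best, 3))
```

If every σ_i is set to 1, Χ² reduces to the plain sum of squared residuals used in the least-squares fit, which is why the two methods work on the same principle.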

When you try different models for the low-mass X-ray binary data, you will, therefore, try to minimize Χ² to find the best fit possible.


Imagine the Universe! is a service of the High Energy Astrophysics Science Archive Research Center (HEASARC), Dr. Alan Smale (Director), within the Astrophysics Science Division (ASD) at NASA's Goddard Space Flight Center.

The Imagine Team
Project Leader: Dr. Barbara Mattson
Curator: Meredith Gibb
Responsible NASA Official: Phil Newman
All material on this site has been created and updated between 1997-2014.
This page last updated: Wednesday, 31-Jul-2013 09:43:26 EDT