5.7 Linear Regression


twlinreg()
    illustrates the concept of linear regression

This quantlet illustrates the concept of linear regression by setting up a scatter plot and a line, and letting the user change the slope or intercept of the line to try to minimize the residual sum of squares. To activate it, type the following:

  twlinreg()
After this, the user should see the following windows:

[Figure: specification windows for the scatter plot]

These are specifications for the scatter-plot diagram, the same as with the twpearson quantlet. After the specifications are entered and the OK button is clicked, the following window should appear (this is for the default values of 30 data points, 0 correlation):


[Figure: Display and Choose windows]

The upper frame of the Display window contains the scatter-plot diagram, together with a line for which we would like to minimize the residual sum of squares. The middle frame contains the equation of this line (denoted $ \widehat{y}$), as well as the residual sum of squares ( $ \textrm{RSS}$). The lower frame contains a graph of the residuals: each vertical line in the bottom graph represents the distance between one of the points and the line in the upper graph. The $ \textrm{RSS}$ is the sum of the squares of these distances. The object is to find the line that minimizes this sum, which can be done by changing the slope and/or intercept of the given line.
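The quantity shown in the middle frame can be sketched in Python as follows. The data here are hypothetical stand-ins for the quantlet's simulated scatter plot, and `rss` is an illustrative helper, not part of the quantlet itself:

```python
import numpy as np

def rss(x, y, intercept, slope):
    """Residual sum of squares for the candidate line yhat = intercept + slope*x.

    Each residual is the vertical distance between a point and the line,
    exactly the lines drawn in the lower frame of the Display window.
    """
    residuals = y - (intercept + slope * x)
    return np.sum(residuals ** 2)

# Hypothetical data standing in for the 30 simulated points.
rng = np.random.default_rng(0)
x = rng.normal(size=30)
y = rng.normal(size=30)

# Changing the slope or intercept, as in the Choose window, changes the RSS.
print(rss(x, y, 0.0, 1.0))
print(rss(x, y, 0.0, 0.5))
```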

The first four selections in the Choose window are for changing the given line -- the user simply makes the appropriate choice(s), then clicks on OK to see the result. Additionally, the user can request that the fitted regression lines be shown on the scatter plot.

A user who is learning linear regression for the first time can change the slope and/or intercept of the given line and see whether this decreases the $ \textrm{RSS}$. The lower frame of the display gives a visual demonstration of the distances between the points and the line. Showing the regression of $ y$ on $ x$ verifies whether the user has reached the minimum $ \textrm{RSS}$, and shows how far off he/she is. Showing the regression of $ x$ on $ y$, and the total regression, shows how different the result can be if the distance is measured in two alternative ways.
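The contrast between the two regression directions can be illustrated with a short Python sketch. The data are hypothetical, and the function names are ours, not the quantlet's:

```python
import numpy as np

def slope_y_on_x(x, y):
    """Slope of the regression of y on x (minimizes vertical distances)."""
    return np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

def slope_x_on_y(x, y):
    """Slope, expressed in the (x, y) plane, of the regression of x on y
    (which minimizes horizontal distances instead)."""
    b_xy = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((y - y.mean()) ** 2)
    return 1.0 / b_xy

# Hypothetical moderately correlated data: the two lines differ visibly.
rng = np.random.default_rng(1)
x = rng.normal(size=30)
y = 0.5 * x + rng.normal(size=30)
print(slope_y_on_x(x, y), slope_x_on_y(x, y))
```

Only when the points lie exactly on a line do the two slopes coincide; the weaker the correlation, the further apart they are.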

The following formulas are used for the computations of $ \widehat{y}$ and $ \textrm{RSS}$; these are the ordinary least-squares estimates, whose optimality is established by the Gauss-Markov theorem:

\begin{align*}
\widehat{y}_i &= \widehat{\alpha} + \widehat{\beta} x_i\,,\\
\widehat{\beta} &= \frac{\sum\limits_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum\limits_{i=1}^n (x_i - \bar{x})^2}\,,\\
\widehat{\alpha} &= \bar{y} - \widehat{\beta} \bar{x}\,, \quad
\bar{y} = \frac{1}{n} \sum_{i=1}^n y_i\,, \quad
\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i\,,\\
\textrm{RSS} &= \sum_{i=1}^n \left(y_i - \widehat{y}_i\right)^2\,.
\end{align*}
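As a sketch, the formulas above can be evaluated directly in Python. The data are hypothetical, and `least_squares` is an illustrative helper written for this example:

```python
import numpy as np

def least_squares(x, y):
    """Least-squares estimates alpha, beta and the RSS, following the
    formulas above."""
    xbar, ybar = x.mean(), y.mean()
    beta = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
    alpha = ybar - beta * xbar
    y_hat = alpha + beta * x
    rss = np.sum((y - y_hat) ** 2)  # sum of squared vertical distances
    return alpha, beta, rss

# Hypothetical data generated around the line y = 1 + 2x.
rng = np.random.default_rng(2)
x = rng.normal(size=30)
y = 1.0 + 2.0 * x + rng.normal(size=30)
alpha, beta, rss = least_squares(x, y)
print(alpha, beta, rss)
```

No other choice of slope and intercept in the Choose window can achieve a smaller RSS than the fitted line returned here.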