4.1 Simple Linear Regression


{b, bse, bstan, bpval} = linreg (x, y {,opt {,om}})
estimates the coefficients b for a linear regression of y on x and calculates the ANOVA table
gl = grlinreg (x {,col})
creates a plot object of the estimated regression line
b = gls (x, y {,om})
calculates the generalized least squares estimator from the data x and y

In this section we consider the linear model

$\displaystyle Y=\beta_0+\beta_1 X + \varepsilon\,.$
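For reference, and independently of the XploRe syntax, recall the standard least squares estimators of the slope and intercept in this model:

$\displaystyle \widehat\beta_1=\frac{\sum_{i=1}^n (x_i-\bar x)(y_i-\bar y)}{\sum_{i=1}^n (x_i-\bar x)^2}\,,\qquad \widehat\beta_0=\bar y-\widehat\beta_1\,\bar x\,.$

The coefficients reported by linreg below are consistent with these formulas.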

As an example, we use the Westwood data stored in westwood.dat. All XploRe code for this example can be found in XLGregr1.xpl. First we read the data and take a look at them.
  z=read("westwood.dat")      ; reads the data
  z                           ; shows the data
  x=z[,2]                     ; puts the x-data into x
  y=z[,3]                     ; puts the y-data into y
gives as output
  Contents of z
  [ 1,]        1       30       73 
  [ 2,]        2       20       50 
  [ 3,]        3       60      128 
  [ 4,]        4       80      170 
  [ 5,]        5       40       87 
  [ 6,]        6       50      108 
  [ 7,]        7       60      135 
  [ 8,]        8       30       69 
  [ 9,]        9       70      148 
  [10,]       10       60      132
We use the quantlet linreg for simple linear regression. Since this quantlet returns four values, we assign them to four variables, {beta, bse, bstan, bpval}. Their meaning will be explained below.
  {beta,bse,bstan,bpval}=linreg(x,y)
           ; computes the linear regression and returns the
           ;    variables beta, bse, bstan and bpval
  beta     ; shows the value of beta
gives the output
  A  N  O  V  A            SS    df     MSS     F-test   P-value
  ______________________________________________________________
  Regression          13600.000   1 13600.000  1813.333   0.0000
  Residuals              60.000   8     7.500 
  Total Variation     13660.000   9  1517.778 
                                                     
  Multiple R      = 0.99780                          
  R^2             = 0.99561                          
  Adjusted R^2    = 0.99506                          
  Standard Error  = 2.73861                          
                                                     
                                                     
  PARAMETERS        Beta      SE     StandB     t-test   P-value
  ______________________________________________________________
  b[ 0,]=        10.0000    2.5029   0.0000      3.995   0.0040
  b[ 1,]=         2.0000    0.0470   0.9978     42.583   0.0000
and
  Contents of beta
  [1,]       10 
  [2,]        2
As a result, we have the ANOVA (ANalysis Of VAriance) table and the parameters. The estimates $ (\widehat\beta_0,\widehat\beta_1)$ of the parameters $ (\beta_0,\beta_1)$ are stored in beta[1] and beta[2], and in this example we obtain

$\displaystyle \widehat Y(x)=10+2x\,.$

It is not necessary to display the values of beta, bse, bstan and bpval separately, because they already appear as Beta, SE, StandB and P-value in the parameter table created by linreg. Before turning to the graphics, we briefly review the values returned in the ANOVA and parameter tables. The meaning of the $ t$- and $ p$-values is more apparent in the multiple regression case; they are therefore explained in the next section.
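
To make the table entries concrete, the following sketch recomputes them for the Westwood data from the usual least squares formulas. It is written in Python with NumPy, i.e. outside XploRe, purely for illustration; the variable names are ours and do not correspond to XploRe objects.

  import numpy as np

  # Westwood data as listed above
  x = np.array([30, 20, 60, 80, 40, 50, 60, 30, 70, 60], dtype=float)
  y = np.array([73, 50, 128, 170, 87, 108, 135, 69, 148, 132], dtype=float)
  n = len(x)

  # least squares estimates of intercept and slope
  sxx = np.sum((x - x.mean()) ** 2)
  sxy = np.sum((x - x.mean()) * (y - y.mean()))
  b1 = sxy / sxx                                # slope:     2.0
  b0 = y.mean() - b1 * x.mean()                 # intercept: 10.0

  # ANOVA decomposition of the total variation
  yhat = b0 + b1 * x
  ssr = np.sum((yhat - y.mean()) ** 2)          # regression SS:   13600
  sse = np.sum((y - yhat) ** 2)                 # residual SS:        60
  sst = ssr + sse                               # total variation: 13660

  mse = sse / (n - 2)                           # residual MSS:      7.5
  f_stat = ssr / mse                            # F-test:        1813.33
  r2 = ssr / sst                                # R^2:           0.99561
  adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)     # adjusted R^2:  0.99506
  std_error = np.sqrt(mse)                      # standard error: 2.739

  # standard errors and t-statistics of the coefficients
  se_b1 = np.sqrt(mse / sxx)                            # 0.0470
  se_b0 = np.sqrt(mse * (1 / n + x.mean() ** 2 / sxx))  # 2.5029
  t_b0, t_b1 = b0 / se_b0, b1 / se_b1                   # 3.995, 42.583

The numbers agree with the ANOVA and parameter tables printed by linreg above; Multiple R is simply the square root of R^2 in this simple regression.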

Let us now describe how to visualize these results. In the left window we show the regression result computed by linreg. In the right window we use the quantlet grlinreg to obtain the graphical object directly from the data set.

  yq=(beta[1]+beta[2]*x[1:10]) ; creates a vector with the 
                               ;    estimated values of y
  data=sort(x~y)               ; creates object with the data set
  setmaskp(data,1,11,4)        ; creates a graphical object for 
                               ;    the data points
  rdata=sort(x~yq)             ; creates an object with yq
  rdata=setmask(rdata,"reset","line","red","thin")
                               ; sets the options for the 
                               ;    regression function by linreg
  regrdata=grlinreg(data,4)    ; creates the same graphical 
                               ;    object directly from the data 
  regrdata=setmask(regrdata,"reset","line","red","thin")
                               ; sets options for the regression 
                               ;    function by grlinreg
  linregplot=createdisplay(1,2); creates display with 2 windows
  show(linregplot,1,1,data,rdata)
                               ; shows rdata in the 1st window
  show(linregplot,1,2,data,regrdata)
                               ; shows regrdata in the 2nd window
  setgopt(linregplot,1,1,"title","linreg")
                               ; sets the title of the 1st window
  setgopt(linregplot,1,2,"title","grlinreg")
                               ; sets the title of the 2nd window

Figure 4.1: Linear regression of the Westwood data: plot using the regression function computed by linreg (left) and plot using grlinreg (right). XLGregr1.xpl
\includegraphics[scale=0.425]{linreg1}

This produces the results visible in Figure 4.1. If we are only interested in a graphical exploration of the regression line, it suffices to create the plot object with grlinreg.
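
For readers working outside XploRe, an analogous two-panel plot can be sketched with Python and matplotlib. This is only an illustrative approximation of Figure 4.1: both panels show the same fitted line $ \widehat Y(x)=10+2x$, since grlinreg estimates the same regression internally, and the styling does not match the XploRe display exactly.

  import numpy as np
  import matplotlib.pyplot as plt

  # Westwood data and the fitted values from the linreg coefficients above
  x = np.array([30, 20, 60, 80, 40, 50, 60, 30, 70, 60], dtype=float)
  y = np.array([73, 50, 128, 170, 87, 108, 135, 69, 148, 132], dtype=float)
  yq = 10 + 2 * x

  fig, axes = plt.subplots(1, 2, figsize=(8, 4), sharey=True)
  order = np.argsort(x)
  for ax, title in zip(axes, ("linreg", "grlinreg")):
      ax.scatter(x, y)                           # raw data points
      ax.plot(x[order], yq[order], color="red")  # fitted regression line
      ax.set_title(title)
  plt.show()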

A second tool for our simple linear regression problem is the generalized least squares (GLS) method provided by the quantlet gls. Here we consider only the model

$\displaystyle Y=bX+\varepsilon\,.$
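As a reminder (stated here for orientation; consult the gls documentation for the exact role of the optional argument om), the generalized least squares estimator for a linear model with design matrix $ X$ and weight matrix $ \Omega$ has the standard form

$\displaystyle \widehat b=(X^\top\Omega^{-1}X)^{-1}X^\top\Omega^{-1}y\,.$

With the identity matrix as weight matrix, this reduces to ordinary least squares without an intercept, i.e. $ \widehat b=\sum_i x_i y_i/\sum_i x_i^2$ in the scalar model above.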

We take the Westwood data again and assume that they have already been stored in x and y; the unit matrix serves as weight matrix. This example is stored in XLGregr2.xpl.

  b=gls(x,y)        ; computes the GLS fit and stores the
                    ;     coefficients in the variable b
  b                 ; shows b
shows
  Contents of b
  [1,]   2.1761
As a result, we get the parameter $ b$. In our case we find that

$\displaystyle \widehat Y(x)=2.1761\,x\;.$
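As a quick cross-check of this value, evaluating the no-intercept least squares formula from above on the Westwood data gives

$\displaystyle \widehat b=\frac{\sum_{i=1}^{10} x_i y_i}{\sum_{i=1}^{10} x_i^2}=\frac{61800}{28400}\approx 2.1761\,,$

which agrees with the output of gls.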

Figure: Linear regression of the Westwood data: regression line using gls. XLGregr2.xpl
\includegraphics[scale=0.425]{linreg2}

Note that we obtain different results depending on the choice of method. This is not surprising: the model fitted by gls here contains no intercept term $ \beta_0$, so the estimated slope must compensate for the omitted constant. Now we also want to visualize this result.

  yq=b*x[1:10]                 ; creates a vector with the 
                               ;    estimated values
  data=sort(x~y)               ; creates object with the data set
  setmaskp(data,1,11,8)        ; creates graphical object 
                               ;    for the data
  rdata=sort(x~yq)             ; creates object with yq
  rdata=setmask(rdata,"reset","line","red","medium")
                               ; creates graphical object for yq
  glsplot=createdisplay(1,1)   ; creates display
  show(glsplot,1,1,data,rdata) ; shows the graphical objects
  setgopt(glsplot,1,1,"title","gls")
                               ; sets the window title