17.1 Quantlets


proc (y) = pname (x) - endp
defines a procedure with the input x and output y
func ("file")
loads the quantlet file.xpl from the standard quantlet path

Quantlets are constructed via the XploRe editor. One opens the editor via the New item from the Programs menu. A blank window appears with the name noname.xpl.

Let us construct an example that lets us study the effect of outliers on linear least squares regression. First we generate a regression line at n points with the commands

  n = 10                       ; number of observations
  randomize(17654321)          ; sets random seed
  beta = # (1, 2)              ; defines intercept and slope
  x = matrix(n)~sort(uniform(n))   
                               ; creates design matrix     
  m = x*beta                   ; defines regression line
XLGquant01.xpl

The vector m now contains the values of the regression line $m(x) = 1 + 2x$ at randomly selected points x. The vector $\beta$ of regression parameters is given by beta = #(1, 2). The x-values are drawn from the uniform distribution on the interval $[0,1]$ by the command uniform(n). Note that we sorted the x-values, which is convenient for plotting purposes.
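Written out, the design matrix and the regression line from the listing above have the following form. The symbols $X$, $x_i$ and $m_i$ are notation introduced here for illustration; they stand for the XploRe objects x, the entries of its second column, and the entries of m:

$$ X = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}, \qquad \beta = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \qquad m = X\beta, \quad\text{i.e.}\quad m_i = 1 \cdot 1 + 2 \cdot x_i = 1 + 2x_i, \quad i = 1, \dots, n. $$

The column of ones produced by matrix(n) therefore carries the intercept, and the sorted uniform values carry the slope.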

It is now time to save this code into a file of our own. Let us call this file myquant.xpl. We save the file by the command <Ctrl A> or by selecting the option Save as under Programs. We enter the text myquant (the ending ``.xpl'' is added automatically). The editor window now changes its name. Typically, it is called C:\XploRe\myquant.xpl. We execute the quantlet by clicking the Execute item or by entering <Alt E>. Entering m on the action line in the input window yields the following 10 values:

  Contents of m
  [ 1,]   1.0858 
  [ 2,]   1.693 
  [ 3,]   1.8313 
  [ 4,]   2.1987 
  [ 5,]   2.2617 
  [ 6,]   2.3049 
  [ 7,]   2.3392 
  [ 8,]   2.6054 
  [ 9,]   2.6278 
  [10,]   2.8314

The vector m contains the values of the regression line $1+2x$. Let's now add some noise to the regression line and prepare a plot of the data. The following lines create 10 noisy observations y that are scattered around the line $m(x) = 1 + 2x$, together with a display d. For the plot we extract the second column of the design matrix x, since the first column of x is a column of ones that models the constant intercept term of the regression line:

  eps = 0.05*normal(n)       ; create obs error
  y = m + eps                ; noisy line
  d = createdisplay(1,1)      
  dat = x[,2]~y
XLGquant02.xpl

If we now enter on the action line the command
  show(d, 1, 1, dat)
we obtain the following plot:

\includegraphics[scale=0.425]{obrazek1}

Let's now add the true regression line and the least squares estimated regression line to this plot. We use the command setmaskl to define the line mask and the command setmaskp to define the point mask. Using the command

  tdat = x[,2]~m
we define the matrix tdat containing the true regression line. The command
  setmaskl(tdat, (1:rows(tdat))', 1, 1, 1)
connects all points (2nd parameter) of tdat and defines a blue (3rd parameter: colorcode = 1) solid (4th parameter: type code = 1) line with a certain thickness (5th parameter: thickness code = 1). The command
  setmaskp(tdat, 0, 0, 0)
sets the data points to their minimum size 0, i.e. makes them invisible.
  tdat = x[,2]~m
  setmaskl(tdat, (1:rows(tdat))', 1, 1, 1) 
                                  ; thin blue line
  setmaskp(tdat, 0, 0, 0)         ; reduces point size to min
  beta1 = inv(x'*x)*x'*y          ; computes LS estimate
  yhat = x*beta1
  hdat = x[,2]~yhat
  setmaskl(hdat, (1:rows(hdat))', 4, 1, 3) 
                                  ; thick red line
  setmaskp(hdat, 0, 0, 0)     
  show(d, 1, 1, dat, tdat, hdat)
XLGquant03.xpl
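The line beta1 = inv(x'*x)*x'*y in the listing above is the matrix form of the least squares estimate. As a short reminder, with $X$ denoting the design matrix x and $\hat\beta$, $\hat y$ denoting beta1 and yhat (notation introduced here for illustration), the estimate solves the normal equations:

$$ X^\top X \hat\beta = X^\top y \quad\Longrightarrow\quad \hat\beta = (X^\top X)^{-1} X^\top y, \qquad \hat y = X\hat\beta. $$

This requires $X^\top X$ to be invertible, which holds here as long as the generated x-values are not all identical.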

The result is given in the following picture:

\includegraphics[scale=0.425]{obrazek2}

The true regression line is displayed as a thin blue line and the estimated regression line is shown as a thick red line. In order to create a quantlet we encapsulate these commands into a proc - endp bracket. This way we can call the quantlet from the action line once it is loaded into XploRe. We add the line

  proc() = myquant()
as the first line, indent all following commands by the Tab key or by the Format source command in the Tools menu and add as a last line the word
  endp

Altogether we should have the following in the editor window. We save this quantlet by <Ctrl S> or via the Programs menu:

  proc() = myquant()
    n = 10                         ; number of observations
    randomize(17654321)            ; sets random seed
    beta =#(1, 2)                  ; defines intercept and slope
    x = matrix(n)~sort(uniform(n)) ; creates design matrix     
    m = x*beta                     ; defines regression line
    eps = 0.05*normal(n)           ; creates obs error
    y = m + eps                    ; noisy line
    d = createdisplay(1,1)     
    dat = x[,2]~y                              
    tdat = x[,2]~m
    setmaskl(tdat, (1:rows(tdat))', 1, 1, 1) 
                                   ; thin blue line
    setmaskp(tdat, 0, 0, 0)        ; reduces point size to min
    beta1 = inv(x'*x)*x'*y
    yhat = x*beta1
    hdat = x[,2]~yhat
    setmaskl(hdat, (1:rows(hdat))', 4, 1, 3) 
                                   ; thick red line
    setmaskp(hdat, 0, 0, 0)    
    show(d, 1, 1, dat, tdat, hdat)
  endp
XLGquant04.xpl

If we execute this program code via the Execute item, nothing will happen since the code contains only the definition of a quantlet. The quantlet performs the desired action only if it is called. By entering the command

  myquant()
on the action line in the input window, we obtain the same picture as before. The quantlet myquant is now loaded in XploRe, and we can repeat this plot as many times as we want.

Let's now modify the quantlet so that the user of this quantlet may add another observation to the existing 10 observations. This additional observation will be the outlier whose influence on least squares regression we wish to study. We do this by allowing myquant to process an input parameter obs1 containing the coordinates $ (x, y)$ of an additional observation. We change the first line of myquant to proc() = myquant(obs1) and add the lines

  // new x-observation is
    x = x|(1~obs1[1])
after the creation of the original design matrix. The first line is a comment, and the second line adds the $ x$-coordinate of the new observation obs1 to the design matrix. Note that we also added a 1 to the first column of the design matrix in order to correctly reflect the intercept term in the least squares regression also for this 11th observation.

The second modification is given by the following lines:

  // new y-observation
    y = m[1:n] + eps           ; noisy line
    y = y|obs1[2]
The first line is again a comment, and the second line adds the normal errors eps to the first $n$ values of m. The third line appends the $y$-value of the new observation obs1 to the response values.
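After both modifications the regression is fitted to $n + 1$ observations. Writing obs1 as $(x_0, y_0)$ (a notation introduced here for illustration), the design matrix and the response vector take the form

$$ X = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \\ 1 & x_0 \end{pmatrix}, \qquad y = \begin{pmatrix} m_1 + \varepsilon_1 \\ \vdots \\ m_n + \varepsilon_n \\ y_0 \end{pmatrix}. $$

Only the first $n$ responses receive a noise term; the last row is taken from obs1 unchanged, which is why the code uses m[1:n] rather than m when adding eps.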

We also display the outlying observation in a different way than the other observations.

  outl = obs1[1]~obs1[2]
  setmaskp(outl,0,12,8)
  show(d, 1, 1, dat[1:n], outl, tdat, hdat)
The second parameter in the setmaskp command defines black color, the third parameter defines a star and the fourth parameter sets the size of the star to 8. The show command displays the original data as black circles and the outlier as a star. We set a title of the graph with the command
  setgopt(d,1,1,"title","Least squares regression with outlier")
The command setgopt can also be used to change other attributes of the graph, e.g. limits, tickmarks, labels, etc.

If we now save and execute the quantlet, it is loaded in XploRe and ready to be called from the action line:

  proc() = myquant(obs1)
    n = 10                   ; number of observations
    randomize(17654321)      ; sets random seed
    beta =#(1, 2)            ; defines intercept and slope
    x = matrix(n)~sort(uniform(n))  
                             ; creates design matrix   
  // new x-observation is
    x = x|(1~obs1[1])
    m = x*beta               ; defines regression line
    eps = 0.05*normal(n)     ; creates obs error
  // new y-observation 
    y = m[1:n] + eps         ; noisy line
    y = y|obs1[2]
    d = createdisplay(1,1)     
    dat = x[,2]~y                              
    outl = obs1[1]~obs1[2]
    setmaskp(outl,0,12,8)    ; outlier is black star
    tdat = x[,2]~m
    setmaskl(tdat, (1:rows(tdat))', 1, 1, 1) 
                             ; thin blue line
    setmaskp(tdat, 0, 0, 0)  ; reduces point size to min
    beta1 = inv(x'*x)*x'*y
    yhat = x*beta1
    hdat = x[,2]~yhat
    setmaskp(hdat, 0, 0, 0)    
    setmaskl(hdat, (1:rows(hdat))', 4, 1, 3)
                             ; thick red line
    show(d, 1, 1, dat[1:n], outl, tdat, hdat)
    title="Least squares regression with outlier"
    setgopt(d,1,1,"title",title)
                             ; sets title 
  endp
XLGquant05.xpl

By entering

  myquant(#(0.9,4.5))
from the action line, we obtain the following graphic which shows the effects of this outlier on the least squares regression:

\includegraphics[scale=0.425]{obrazek3}

One clearly sees the nonrobustness of the least squares estimator. The additional observation $ (0.9, 4.5)$ influences the estimated regression line. The thick red line is different from the true regression line indicated as the thin blue line.
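For readers who want to check the arithmetic outside XploRe, the following is a minimal sketch in Python rather than XploRe code. It rests on two assumptions: the x-values are recovered from the printed contents of m via $x = (m - 1)/2$, and the noise term eps is left out because its values are not printed, so the 10 original points lie exactly on the true line. The function names fit_line and fit_with_outlier are hypothetical helpers, not part of XploRe.

```python
# Hedged sketch: effect of one added observation on a least squares line.
# Assumption: x is recovered from the printed vector m via x = (m - 1) / 2,
# and the noise eps is omitted, so y lies exactly on 1 + 2x.

def fit_line(xs, ys):
    """Return (intercept, slope) of the least squares line through (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Printed contents of m from the text above.
M = [1.0858, 1.693, 1.8313, 2.1987, 2.2617,
     2.3049, 2.3392, 2.6054, 2.6278, 2.8314]
X = [(m - 1.0) / 2.0 for m in M]  # invert m = 1 + 2x

def fit_with_outlier(obs):
    """Fit the 10 noise-free points plus one extra observation (x, y)."""
    return fit_line(X + [obs[0]], M + [obs[1]])

if __name__ == "__main__":
    print("no outlier :", fit_line(X, M))
    print("(0.9, 4.5) :", fit_with_outlier((0.9, 4.5)))
    print("(2.3, 45)  :", fit_with_outlier((2.3, 45.0)))
```

Without the extra observation the fit returns the true intercept 1 and slope 2 up to rounding. Adding a single point above the line at a large x-value raises the fitted slope, which is the behaviour visible in the picture. The exact values differ from those of myquant because the noise is omitted here.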

The situation becomes even more extreme when we move the $x$-value of the new observation into the leverage zone outside the interval $[0,1]$. Suppose that we call the quantlet with the new observation $(2.3, 45)$. The $x$-value of this new observation is clearly outside the range $[0,1]$ of the 10 uniformly generated design values. The $y$-value 45 of the new observation is enormous relative to the range of the other 10 values.

  myquant(#(2.3,45))
The effect is that the thick red line lies even further from the blue line. This becomes clear from the following graphic:

\includegraphics[scale=0.425]{obrazek4}
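The term leverage can be made precise with a standard formula for regression on an intercept and one regressor. This is a general fact rather than something computed by myquant, and the symbols are introduced here for illustration: $N$ is the total number of observations (here $N = 11$), $\bar{x}$ is the mean of all $N$ $x$-values, and $h_{ii}$ is the $i$-th diagonal element of the matrix $X(X^\top X)^{-1}X^\top$:

$$ h_{ii} = \frac{1}{N} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^{N} (x_j - \bar{x})^2}. $$

The leverage grows with the squared distance of $x_i$ from $\bar{x}$, so an observation at $x = 2.3$, far outside $[0,1]$, pulls the fitted line towards its own $y$-value much more strongly than an observation at $x = 0.9$.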

We may now leave XploRe and reload the quantlet when we restart XploRe. Suppose we have done this. How do we make our quantlet available again? Assuming that XploRe is installed in the C:\XploRe directory of our computer and that myquant.xpl was saved there, we use the command

  func("C:\XploRe\myquant.xpl")

Let us now load the quantlet automatically by defining a new quantlet myquant2.xpl. This quantlet contains the func command and a call to the quantlet myquant, e.g.

  myquant(#(0.9, 4.5))

If we encapsulate the code into a proc - endp bracket, we have the following quantlet

 
  proc()=myquant2()
    myquant(#(0.9, 4.5))
  endp        
  func("C:\XploRe\myquant.xpl")
  myquant2()
Executing it will reproduce the same picture as before, but note that this time the call to myquant is made from another quantlet. Let us modify this procedure further so that the user may add outlying observations interactively. Suppose we want to see the effect of adding a new observation three times. We realize this by a while - endo construction, which we run exactly three times:
  proc()=myquant2()
    ValueNames = "x=" | "y="
    defaults = 0.9 | 4.5
    i = 1
    while (i<=3)
      v = readvalue(ValueNames, defaults)
      myquant(v)
      i = i+1
    endo
  endp
  func("C:\XploRe\myquant.xpl")
  myquant2()
XLGquant06.xpl

The new quantlet myquant2.xpl first loads the existing quantlet myquant.xpl by the command func("C:\XploRe\myquant.xpl"). The names of the values to be read are defined by ValueNames = "x=" | "y=", the default values are set to $ (0.9, 4.5)$ by defaults = 0.9 | 4.5. Then the loop construction with initial value i = 1 and end value 3 guarantees that the commands

  v = readvalue(ValueNames, defaults)
  myquant(v)
are executed exactly three times. If we enter for example $ (10,45)$ as the outlier value then we obtain the following graphic:

\includegraphics[scale=0.425]{obrazek5}

The while - endo construction is explained in more detail in Subsection 17.2.4. For more information on readvalue see Section 17.3.