7.2 Computing GLM Estimates

Currently six types of distributions are supported by the glm library: Binomial, Normal (Gaussian), Poisson, Gamma (includes Exponential), Inverse Gaussian and Negative Binomial (includes Geometric).

The functions in the glm library which are mainly responsible for GLM estimation are doglm (interactive, menu controlled) and glmest (noninteractive). We will explain doglm in Subsection 7.2.2 and glmest in Subsection 7.2.3.


7.2.1 Data Preparation

The estimation functions in the glm library expect at least two input parameters: a matrix x containing the observations of the explanatory variables and a vector y containing the observed responses.

The vector y should have n rows, each corresponding to one observation. The matrix x should have n rows and p columns, i.e. the rows correspond to the individual observations and the columns to the variables.

An $n\times 1$ vector of ones can be concatenated to x to allow for a constant in the model:

  x = matrix(rows(x))~x
This is not necessary for the interactive estimation by doglm (see Subsection 7.2.2).

Neither the matrix x nor the vector y should contain missing values (NaN) or infinite values (Inf, -Inf). Such values should be identified by isNumber and removed by paf (or replaced by suitable values) before the GLM estimation.
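
The two preparation steps above (dropping unusable observations and adding a constant column) can be sketched outside XploRe as well. The following is a minimal Python/NumPy illustration, not the glm library itself; the function name prepare_glm_data and its add_constant argument are hypothetical names chosen for this sketch.

```python
import numpy as np

def prepare_glm_data(x, y, add_constant=True):
    """Drop observations containing NaN/Inf and optionally prepend a column of ones.

    Hypothetical helper for illustration; it mirrors the isNumber/paf step
    described in the text but is not part of XploRe.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1)
    if x.ndim == 1:
        x = x[:, None]                      # a single variable becomes an n x 1 matrix
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must have the same number of rows")
    # A row is usable only if all entries of x and the response are finite.
    ok = np.isfinite(x).all(axis=1) & np.isfinite(y)
    x, y = x[ok], y[ok]
    if add_constant:
        x = np.column_stack([np.ones(x.shape[0]), x])
    return x, y

x_raw = np.array([[0.5, 1.0], [np.nan, 2.0], [1.5, np.inf], [2.0, 0.1]])
y_raw = np.array([1.0, 2.0, 3.0, np.nan])
x_clean, y_clean = prepare_glm_data(x_raw, y_raw)
```

Here only the first observation survives, because each of the other rows contains a NaN or Inf either in x or in y.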


7.2.2 Interactive Estimation


doglm (x, y {, opt})
starts the interactive GLM estimation tool

To have a "real" example, let us generate some pseudo-random data. Type at the command line or in an editor window:

  randomize(0)
  n=100
  b=1|2
  p=rows(b)
  x=2.*uniform(n,p)
  y=x*b+normal(n)./2
XLGglm01.xpl

Do not forget to call the glm library:
  library("glm")

The interactive estimation with doglm is simply invoked by

  doglm(x,y)
A selection box appears which starts the interactive fitting procedure.

[Screenshot: main selection box of doglm]

The first three items (Descriptive statistics, Select variables and Transform variables) offer some possibilities to select and manipulate the matrix x. We start the fit directly (select Do GLM fit). The next selection box presents the available distributions for y.


[Screenshot: selection box with the available distributions]

Simply select the appropriate distribution. Depending on the chosen distribution, a second selection box asks for the link function. The following is shown if Normal was selected as the distribution.


[Screenshot: selection box with the link functions for the Normal distribution]

Here one can simply press OK to accept the canonical link function. Codes such as noid and nopow in the above selection box are the short codes for the models, see Subsection 7.2.3.

Again, depending on the choice, a third selection box may appear which offers to change control parameters for the selected model. Suppose we have pressed the button for Power link. Then we will be asked if we want to change the power parameter:


[Screenshot: input box for the power parameter]

Note that the power corresponds to the inverse link function, i.e. to the inverse of $G$. Thus, when power 0 is chosen, $G$ is in fact the exponential function. Consequently, power 0.5 makes $G$ the quadratic function. If we select power 1, we get the identity link.
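
The three cases can be written compactly. The notation below is introduced here for illustration only: $\lambda$ denotes the power, $\eta$ the index x*b, and it is assumed that the power acts on the inverse of $G$, i.e. $G^{-1}(\mu)=\mu^{\lambda}$ for $\lambda\neq 0$ and $G^{-1}(\mu)=\log\mu$ for $\lambda=0$, which is consistent with the cases stated above:

$$
G(\eta) = \begin{cases} \exp(\eta), & \lambda = 0, \\ \eta^{1/\lambda}, & \lambda \neq 0, \end{cases}
\qquad\text{so that}\qquad
\lambda = 0.5:\ G(\eta)=\eta^{2}, \qquad \lambda = 1:\ G(\eta)=\eta .
$$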

We choose a power of 1 at this point. (We could have made this choice one selection box earlier by selecting the identity link directly.) The underlying XploRe estimation function recognizes that no iteration needs to be performed for ordinary least squares regression and shows the estimation result almost immediately. A graphical output display appears.
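
The remark that ordinary least squares needs no iteration can be illustrated with a short sketch. The following Python/NumPy code is not the XploRe implementation; it generates data in the same way as the example above (with a different random number generator, so the numbers will not match the XploRe output) and solves the least squares problem in one step. The variable names are chosen for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed, unrelated to randomize(0)
n = 100
b_true = np.array([1.0, 2.0])
x = 2.0 * rng.uniform(size=(n, b_true.size))
y = x @ b_true + rng.normal(size=n) / 2.0

# Put the constant in the first column, as doglm does automatically.
X = np.column_stack([np.ones(n), x])

# Normal model with identity link: the estimate is the least squares solution,
# obtained directly without any iterative updating.
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# The same estimate from the normal equations X'X b = X'y.
b_normal_eq = np.linalg.solve(X.T @ X, X.T @ y)

print(b_hat)
```

The first component of b_hat estimates the constant (true value 0 in this simulation), the other two estimate the slopes 1 and 2.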


[Screenshot: output display of doglm]

The output display shows the estimation results in the left panel. The chosen model (here noid, i.e. normal with identity link) is recalled in the headline together with the sample size. The upper table gives the estimated coefficient vector b together with the estimated standard errors and $t$-values. doglm automatically includes a constant in the model, hence the first component of b relates to this constant.

The lower table gives some statistics for this fit, such as the degrees of freedom (df), the deviance, Pearson's $\chi^2$ statistic, the coefficient of determination $R^2$, an Akaike criterion, and the number of distinct rows in x. The right panel shows a plot of the index x*b vs. y (red +) and a plot of x*b vs. the predicted regression function (green line). The latter is just a straight line here since the identity link function was chosen.
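
For the normal model with identity link, some of these statistics have a simple form. The sketch below (Python/NumPy, not XploRe code) assumes the standard textbook definitions with unit dispersion, under which both the deviance and Pearson's $\chi^2$ equal the residual sum of squares; whether the glm library scales or labels them in exactly this way is not stated here. The function name gaussian_fit_stats is hypothetical.

```python
import numpy as np

def gaussian_fit_stats(X, y, b):
    """Fit statistics for a normal model with identity link (textbook definitions)."""
    mu = X @ b                                   # fitted values, identity link
    resid = y - mu
    n, p = X.shape
    rss = float(resid @ resid)                   # residual sum of squares
    tss = float(((y - y.mean()) ** 2).sum())     # total sum of squares
    return {
        "df": n - p,                             # residual degrees of freedom
        "deviance": rss,                         # Gaussian deviance (unit dispersion)
        "pearson": rss,                          # variance function is 1, so same value
        "r2": 1.0 - rss / tss,                   # coefficient of determination
    }

X = np.column_stack([np.ones(4), np.arange(4.0)])
y = np.array([1.0, 3.0, 5.0, 8.0])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
stats = gaussian_fit_stats(X, y, b)
```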

Besides the output display doglmOutput, two additional output objects are produced (after stopping doglm):

doglm
A list, which contains the list elements
doglm.b
the parameter estimates,
doglm.bv
the estimated covariance,
doglm.stat
statistics (itself a list containing the different statistics as components, see Subsection 7.5.1).
doglmtxt
A string vector, which contains the contents of the left panel of the output display.
The estimation in doglm can be controlled by a number of options and parameters. All of them can be set beforehand by defining a list of options opt with glmopt. For example, with
  opt=glmopt("code","noid")
  doglm(x,y,opt)
we specify the model for doglm in advance and are no longer asked for it interactively. The function glmopt and the way to set up optional parameters are described in Section 7.4.

Alternatively, we can set all optional parameters interactively from the main menu of doglm:


[Screenshot: main menu of doglm with the default settings]

The default settings are grouped into three fields. By selecting General settings, one can change general properties (intercept, search for replications, name of the output, which allows several estimated models to be stored, etc.). Model settings allows us to specify the model for the estimation and possible additional model parameters (e.g. the power for the power link). Finally, Iteration settings covers everything for tuning the iterative estimation process. Note: Some settings are not applicable to some models (e.g. a power can be specified although a model without power link is chosen). Those settings are simply ignored.
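
To give an idea of what an iterative estimation process involves, the following sketch fits a Poisson model with logarithm link by iteratively reweighted least squares (IRLS), a common way to compute GLM estimates. It is a Python/NumPy illustration under that assumption; the text does not state which algorithm or which stopping rule the glm library uses, and the names irls_poisson_log, max_iter and tol are hypothetical.

```python
import numpy as np

def irls_poisson_log(X, y, max_iter=25, tol=1e-8):
    """Poisson regression with log link via IRLS (illustrative sketch)."""
    eta = np.log(y + 0.1)                  # start from the data, avoiding log(0)
    b = np.zeros(X.shape[1])
    for it in range(1, max_iter + 1):
        mu = np.exp(eta)                   # inverse link
        z = eta + (y - mu) / mu            # working response
        w = mu                             # working weights for the log link
        XtW = X.T * w                      # scales column j of X' by w[j]
        b_new = np.linalg.solve(XtW @ X, XtW @ z)   # weighted least squares step
        step = np.max(np.abs(b_new - b))
        b, eta = b_new, X @ b_new
        if step < tol:                     # stop when the coefficients settle
            return b, it
    return b, max_iter

rng = np.random.default_rng(1)             # arbitrary seed
n = 500
X = np.column_stack([np.ones(n), rng.uniform(size=n)])
b_true = np.array([0.5, 1.0])
y = rng.poisson(np.exp(X @ b_true)).astype(float)
b_hat, n_iter = irls_poisson_log(X, y)
```

The arguments max_iter and tol play the role of typical iteration settings: a maximal number of iterations and a convergence tolerance.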


7.2.3 Noninteractive Estimation


g = glmest (code, x, y {, opt})
estimates a GLM noninteractively

The function glmest provides a noninteractive way to estimate a GLM. This is useful, for instance, when GLM estimates serve as pilot estimates for other procedures. The standard call is quite simple, for example

  g=glmest("noid",x,y)
estimates the normal model with identity link. For glmest, the short code of the model (e.g. "noid") must always be given; it is not an optional parameter as for doglm. The following models are supported:
Normal
"noid" identity link (canonical)
"nopow" power (inverse) link
Binomial
"bilo" Logistic link (Logit, canonical)
"bipro" Gaussian link (Probit)
"bicll" complementary log-log link
Poisson
"polog" Poisson with logarithm (inverse) link (canonical)
"popow" Poisson with power (inverse) link
Gamma
"gacl" reciprocal (inverse) link (canonical)
"gapow" power (inverse) link
Inverse Gaussian
"igcl" squared reciprocal (inverse) link (canonical)
"igpow" power (inverse) link
Negative Binomial
"nbcl" canonical link
"nbpow" power (inverse) link

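As an illustration of what the three binomial codes stand for, the sketch below writes down the function $G$ for the logit, probit and complementary log-log models in their standard textbook form. It is plain Python, not XploRe code; the dictionary merely reuses the short codes from the list above as keys, and the function names are chosen for this sketch.

```python
import math

def inv_logit(eta):
    """Logistic model: G(eta) = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

def inv_probit(eta):
    """Probit model: G is the standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def inv_cloglog(eta):
    """Complementary log-log model: G(eta) = 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))

# Keys are the short codes from the list above; the mapping itself is illustrative.
BINOMIAL_G = {"bilo": inv_logit, "bipro": inv_probit, "bicll": inv_cloglog}

# Each G maps an index value to a probability between 0 and 1.
probs_at_zero = {code: G(0.0) for code, G in BINOMIAL_G.items()}
```

The logit and probit functions are symmetric around 0 and give probability 0.5 there, whereas the complementary log-log function is asymmetric.
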
The list of all available models can be consulted by typing glmmodels.all at the command line when the glm library is loaded. Let us go back to the example from the previous subsection and replace doglm by the noninteractive function glmest. The XploRe code for this subsection can be found in the quantlet XLGglm02.xpl:

  g=glmest("noid",matrix(rows(x))~x,y)
Here, a vector of ones is prepended to x to include a constant in the model. The result of the estimation is assigned to the variable g. Thus g is a list containing the output:
g.b
the estimated parameter vector
g.bv
the estimated covariance of g.b
g.stat
contains the statistics (see Subsection 7.5.1).
In our running example, displaying g.b gives the following in the XploRe output window:
  Contents of b
  [1,] -0.16816 
  [2,]   1.0705 
  [3,]   2.1401
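
Standard errors and $t$-values, as shown in the upper table of the output display, follow from a coefficient vector and its covariance matrix in the usual way: the standard errors are the square roots of the diagonal of the covariance, and each $t$-value is the coefficient divided by its standard error. The sketch below shows this in Python/NumPy with made-up toy numbers (not XploRe output); the function name coef_table is hypothetical.

```python
import numpy as np

def coef_table(b, bv):
    """Return an array with columns: coefficient, standard error, t-value."""
    b = np.asarray(b, dtype=float)
    bv = np.asarray(bv, dtype=float)
    se = np.sqrt(np.diag(bv))          # standard errors from the covariance diagonal
    return np.column_stack([b, se, b / se])

# Toy values for illustration only.
b_toy = [1.0, 2.0]
bv_toy = [[0.04, 0.0],
          [0.0, 0.25]]
table = coef_table(b_toy, bv_toy)
```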

A graphical output can be created from the result of a noninteractive GLM estimation as well. This is done by

  glmout("noid",matrix(rows(x))~x,y,g.b,g.bv,g.stat)
in the current example. For more features of glmout, see Subsection 7.5.2.

Optional parameters must be passed to glmest as a list of optional parameters. A detailed description of what is possible can be found in Section 7.4. For an illustration, recall our first estimation attempt with doglm. There we had chosen the power parameter for the power link interactively. In noninteractive estimation, this can be done by

  opt=glmopt("pow",1)
  g=glmest("nopow",matrix(rows(x))~x,y,opt)
The first statement creates a list opt containing the parameter opt.pow with value 1. The next line passes this option list opt to the estimation routine glmest.