Currently six types of distributions are supported by the glm library: Binomial, Normal (Gaussian), Poisson, Gamma (includes Exponential), Inverse Gaussian and Negative Binomial (includes Geometric).
The functions in the glm library that are mainly responsible for GLM estimation are doglm (interactive, menu controlled) and glmest (noninteractive). We explain doglm in Subsection 7.2.2 and glmest in Subsection 7.2.3.
The estimation functions in the glm library expect at least two input parameters: a matrix x containing the observations of the explanatory variables and a vector y containing the observed responses.
The vector y should have n rows, each corresponding to one observation. The matrix x should have n rows and p columns, i.e. the rows correspond to the individual observations and the columns to the variables.
An n×1 vector of ones can be concatenated to x to allow for a constant in the model:
    x = matrix(rows(x))~x
This is not necessary for the interactive estimation by doglm, which includes a constant automatically.
Neither the matrix x nor the vector y should contain missing values (NaN) or infinite values (Inf, -Inf). Such values should be identified with isNumber and removed with paf (or replaced by suitable values) before the GLM estimation.
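The cleaning step drops every observation whose row in x or whose response in y is not a finite number. The sketch below is a rough pure-Python analogue of marking valid rows with isNumber and keeping them with paf. The function name drop_nonfinite_rows is hypothetical and is not part of the glm library.

```python
import math

def drop_nonfinite_rows(x, y):
    """Keep only observations whose x-row and y-value are all finite.

    Hypothetical helper: a rough analogue of flagging valid rows with
    isNumber and selecting them with paf in XploRe.
    """
    keep = [
        all(math.isfinite(v) for v in row) and math.isfinite(yi)
        for row, yi in zip(x, y)
    ]
    x_clean = [row for row, k in zip(x, keep) if k]
    y_clean = [yi for yi, k in zip(y, keep) if k]
    return x_clean, y_clean

x = [[1.0, 2.0], [float("nan"), 0.5], [3.0, float("inf")], [0.2, 0.4]]
y = [1.5, 2.0, 0.7, float("-inf")]
xc, yc = drop_nonfinite_rows(x, y)
print(xc, yc)  # [[1.0, 2.0]] [1.5]
```

Rows 2 and 3 are dropped because of NaN and Inf in x. Row 4 is dropped because its response is -Inf.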
To have a "real" example, let us generate some pseudo-random data. Type at the command line or in an editor window:
    randomize(0)
    n=100
    b=1|2
    p=rows(b)
    x=2.*uniform(n,p)
    y=x*b+normal(n)./2
    library("glm")
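The statements above draw x uniformly on [0, 2] and add normal noise with standard deviation 0.5 to x*b. The following Python lines are a sketch of the same recipe. They use Python's own generator, so the numbers will not match XploRe's, and the final line prepends the constant column as matrix(rows(x))~x does.

```python
import random

random.seed(0)                      # plays the role of randomize(0); the streams differ
n = 100
b = [1.0, 2.0]                      # true coefficients, like b=1|2
p = len(b)                          # like p=rows(b)
# x: n x p matrix of U(0, 2) draws, like 2.*uniform(n,p)
x = [[2.0 * random.random() for _ in range(p)] for _ in range(n)]
# y = x*b + N(0,1)/2, i.e. noise with standard deviation 0.5
y = [sum(xij * bj for xij, bj in zip(row, b)) + random.gauss(0.0, 1.0) / 2
     for row in x]
# prepend a column of ones, like matrix(rows(x))~x
x1 = [[1.0] + row for row in x]
```

The data contain no true intercept. An estimated constant should therefore be close to 0.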
The interactive estimation with doglm is invoked simply by
    doglm(x,y)
A selection box appears, which starts the interactive fitting procedure.
The first three items (Descriptive statistics, Select variables and Transform variables) offer some possibilities to select and manipulate the matrix x. We go directly to the fit (select Do GLM fit). The next selection box presents the available distributions for y.
Simply select the appropriate distribution. Depending on the chosen distribution, a second selection box asks for the link function. The following refers to the box shown when Normal has been selected as the distribution.
Here one can simply press OK to accept the canonical link function. Codes such as noid and nopow in this selection box are the short codes for the models; see Subsection 7.2.3.
Again, depending on the choice, a third selection box may appear that allows changing control parameters for the selected model. Suppose we have pressed the button for Power link. We are then asked whether we want to change the power parameter.
Note that the power parameter refers to the link function, that is, to the inverse of G: the link is g(μ) = μ^p (with g(μ) = log μ for p = 0), so that G(η) = η^(1/p). Thus when power 0 is chosen, G is in fact the exponential function. Consequently, power 0.5 makes G the quadratic function. If we select power 1, we get the identity link.
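The three cases can be checked numerically. The sketch below assumes the parameterization g(μ) = μ^p, with the log link as the p = 0 case, which is consistent with the cases listed above. The function names power_link and inverse_power_link are hypothetical.

```python
import math

def power_link(mu, power):
    """Hypothetical helper: link g(mu) = mu**power, with the log link for power 0."""
    return math.log(mu) if power == 0 else mu ** power

def inverse_power_link(eta, power):
    """Hypothetical helper: inverse link G(eta) = eta**(1/power), exp for power 0."""
    return math.exp(eta) if power == 0 else eta ** (1.0 / power)

eta = 1.5
print(inverse_power_link(eta, 0))    # exp(1.5): G is the exponential function
print(inverse_power_link(eta, 0.5))  # 2.25 = 1.5**2: G is the quadratic function
print(inverse_power_link(eta, 1))    # 1.5: identity link
```

Applying power_link to the output of inverse_power_link with the same power returns the original index, because the two functions are inverses of each other.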
We choose a power of 1 here. (We could have saved one selection box by choosing the identity link directly.) The underlying XploRe estimation function recognizes that this is ordinary least squares regression, for which no iteration is needed, and shows the estimation result almost immediately. A graphical output display appears.
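For the normal model with identity link, the maximum likelihood estimate is the ordinary least squares solution. It solves the normal equations (X'X)b = X'y in a single step, which is why no iteration is needed. The following stdlib-only sketch fits data simulated with the recipe of the example above (Python's generator, so the numbers differ from XploRe's). The helpers solve and ols are hypothetical.

```python
import random

def solve(a, rhs):
    """Solve the small linear system a z = rhs (Gaussian elimination, partial pivoting)."""
    k = len(rhs)
    m = [list(row) + [r] for row, r in zip(a, rhs)]   # augmented matrix
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * k
    for r in range(k - 1, -1, -1):
        z[r] = (m[r][k] - sum(m[r][c] * z[c] for c in range(r + 1, k))) / m[r][r]
    return z

def ols(x, y):
    """Closed-form least squares: solve the normal equations (X'X) b = X'y once."""
    p = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(x, y)) for i in range(p)]
    return solve(xtx, xty)

random.seed(0)
n, b = 100, [1.0, 2.0]
x = [[2.0 * random.random() for _ in b] for _ in range(n)]
y = [sum(xi * bi for xi, bi in zip(row, b)) + random.gauss(0.0, 1.0) / 2 for row in x]
x1 = [[1.0] + row for row in x]          # constant column first, as doglm does
bhat = ols(x1, y)
print(bhat)   # roughly [0, 1, 2]; the exact values depend on the random stream
```

Because x1 contains a constant column, the residuals of this fit sum to zero up to rounding.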
The left panel of the output display shows the estimation results. The chosen model (here noid, i.e. normal with identity link) is recalled in the headline together with the sample size. The upper table gives the estimated coefficient vector b together with the estimated standard errors and t-values. doglm automatically includes a constant in the model, hence the first component of b corresponds to this constant.
The lower table gives some statistics for this fit, such as the degrees of freedom (df), the deviance, Pearson's statistic, the coefficient of determination R², an Akaike criterion, and the number of distinct rows in x.
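For the normal model with identity link, several of these statistics have simple closed forms. The residual degrees of freedom are n − p. The deviance is the residual sum of squares. Pearson's statistic equals the same sum, because the variance function is V(μ) = 1. R² is 1 − RSS/TSS. The sketch below assumes these textbook definitions and omits the Akaike criterion, whose exact convention in the glm library is not given here. The function name and the toy data are hypothetical.

```python
def normal_identity_fit_stats(x, y, mu, p):
    """Hypothetical helper: summary statistics for a normal GLM with identity link.

    x  : list of design rows (including the constant column)
    y  : observed responses
    mu : fitted means
    p  : number of estimated coefficients
    """
    n = len(y)
    rss = sum((yi - mi) ** 2 for yi, mi in zip(y, mu))
    ybar = sum(y) / n
    tss = sum((yi - ybar) ** 2 for yi in y)
    return {
        "df": n - p,
        "deviance": rss,          # normal deviance = residual sum of squares
        "pearson": rss,           # V(mu) = 1, so Pearson's statistic is also the RSS
        "r2": 1.0 - rss / tss,
        "distinct_rows": len({tuple(row) for row in x}),
    }

x = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 2.0, 4.0]
mu = [1.3 + 0.8 * row[1] for row in x]   # OLS fit of this toy data: intercept 1.3, slope 0.8
stats = normal_identity_fit_stats(x, y, mu, p=2)
print(stats)  # df 2, deviance and Pearson about 1.8, R^2 about 0.64, 4 distinct rows
```

For other distributions, the deviance and Pearson's statistic no longer coincide, because the variance function depends on μ.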
The right panel shows a plot of the index x*b vs. y (red +) and a plot of x*b vs. the predicted regression function (green line). The latter is just a straight line here, since the identity link function was chosen.
Besides the output display doglmOutput, two additional output objects are produced (after stopping doglm).
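With
    opt=glmopt("code","noid")
    doglm(x,y,opt)
we would specify the model for the estimation (here noid, the normal distribution with identity link) directly through an optional parameter.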
Alternatively, we can set all optional parameters interactively from the main menu of doglm.
The default settings are grouped into three fields. By selecting General settings, one can change general properties (intercept, search for replications, name of the output for storing several estimated models, etc.). Model settings allow us to specify the model for the estimation and possible additional model parameters (e.g. the power for the power link). Finally, Iteration settings cover everything needed for tuning the iterative estimation process. Note: some settings do not apply to some models (e.g. a power can be specified even though a model without power link is chosen). Such settings are simply ignored.
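The iterative estimation process referred to by the Iteration settings can be illustrated with iteratively reweighted least squares (IRLS, Fisher scoring), the standard way to fit GLMs. Whether the glm library uses exactly this scheme, these starting values, and this convergence rule is an assumption. The sketch below fits a Poisson model with log link, and all names in it are hypothetical. For the normal model with identity link, the weights are constant and the working response equals y, so a single weighted least squares step is already the final answer. This is why that case needs no iteration.

```python
import math

def solve(a, rhs):
    """Solve the small linear system a z = rhs (Gaussian elimination, partial pivoting)."""
    k = len(rhs)
    m = [list(row) + [r] for row, r in zip(a, rhs)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * k
    for r in range(k - 1, -1, -1):
        z[r] = (m[r][k] - sum(m[r][c] * z[c] for c in range(r + 1, k))) / m[r][r]
    return z

def irls_poisson_log(x, y, max_iter=50, tol=1e-10):
    """Hypothetical IRLS loop for a Poisson GLM with log link.

    Returns the coefficient estimate and the number of iterations used.
    """
    p = len(x[0])
    mu = [yi + 0.5 for yi in y]              # starting values avoid log(0)
    eta = [math.log(m) for m in mu]
    b = [0.0] * p
    for it in range(1, max_iter + 1):
        w = mu                               # Poisson/log link: weight = mu
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]   # working response
        xtwx = [[sum(wi * row[i] * row[j] for wi, row in zip(w, x))
                 for j in range(p)] for i in range(p)]
        xtwz = [sum(wi * row[i] * zi for wi, row, zi in zip(w, x, z))
                for i in range(p)]
        b_new = solve(xtwx, xtwz)            # one weighted least squares step
        eta = [sum(r * c for r, c in zip(row, b_new)) for row in x]
        mu = [math.exp(e) for e in eta]
        if max(abs(u - v) for u, v in zip(b_new, b)) < tol:
            return b_new, it
        b = b_new
    return b, max_iter

x = [[1.0, float(t)] for t in range(5)]      # constant plus one covariate
y = [1.0, 2.0, 2.0, 5.0, 8.0]
b, iterations = irls_poisson_log(x, y)
print(b, iterations)
```

Here max_iter and tol play the role of the kind of tuning parameters that such iteration settings control. At convergence, the score equations X'(y − μ) = 0 hold up to the tolerance.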
The function glmest provides a noninteractive way to estimate a GLM. This is useful, for example, when GLM estimates serve as pilot estimates for other procedures.
The standard call is quite simple. For example,
    g=glmest("noid",x,y)
estimates the normal model with identity link.
A list of all available models is displayed by typing glmmodels.all at the command line once the glm library is loaded.
Let us go back to the example from the previous section and replace doglm by the noninteractive function glmest. The XploRe code for this subsection can be found in the quantlet XLGglm02.xpl:
    g=glmest("noid",matrix(rows(x))~x,y)
Here, a column of ones is added as the first column of x to include a constant in the model. The result of the estimation is assigned to the variable g, which is a list containing the output:
    Contents of b
    [1,] -0.16816
    [2,]  1.0705
    [3,]  2.1401
A graphical output can be created from the result of a noninteractive GLM estimation as well. In the current example, this is done by
    glmout("noid",matrix(rows(x))~x,y,g.b,g.bv,g.stat)
Optional parameters must be passed to glmest as a list of optional parameters. A detailed description of the possibilities can be found in Section 7.4. For an illustration, recall our first estimation attempt with doglm.
There we chose the power parameter for the power link interactively. In noninteractive estimation, this can be done by
    opt=glmopt("pow",1)
    g=glmest("nopow",matrix(rows(x))~x,y,opt)
The first statement creates a list opt containing the parameter opt.pow with value 1. The second line passes this option list opt to the estimation routine glmest.