To assess the estimated model, it is useful to check the significance of single parameter values or of linear combinations of parameters. To compare two nested models, a likelihood ratio test can be performed. Last but not least, an optimal submodel can be selected by model selection via Akaike's AIC or Schwarz's BIC.
The functions doglm and glmest provide a number of statistical characteristics of the estimated model in the output component stat.
Alternatively, the function glmstat can be used to create the above-mentioned statistics by hand. Suppose we have input x, y and have estimated the vector of coefficients b with covariance bv by the model "nopow". Then the list of statistics is found from

stat=glmstat("nopow",x,y,b,bv)

Of course, a list of options opt can be added. If options from opt have been used for the estimation, these should be included in the call of glmstat as well.
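For instance, assuming opt is simply appended as the last argument (the convention followed by the other glm quantlets shown in this section), the call would then read

stat=glmstat("nopow",x,y,b,bv,opt)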
The following characteristics are contained in the output stat, which is itself a list. Its components can be inspected with

names(stat)

The function names reports all components of the list stat.
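Single components are then accessed by the dot notation. For example, the log-likelihood of the estimated model (used again for the likelihood ratio test below) is obtained by

loglik=stat.loglik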
|
Recall the example XLGglm03.xpl, which estimated the lizard data. The last line of this quantlet creates the output display shown in Figure 7.1:
glmout("bilo",x,y,g.b,g.bv,g.stat,opt)
Note that the option list opt should also be given to glmout to adjust the resulting estimated curve by the weights. In the binomial case (as in our lizard example), the right panel shows the predicted probabilities. For all other distributions, the estimated regression function (the fitted values plotted vs. the index $x^\top\widehat{\beta}$) will be shown.
Let us continue with the lizard example from Subsection 7.5.2. As a result of the estimation, we have an output list g containing the components g.b (the estimated parameter vector), g.bv (the estimated covariance matrix of g.b), and g.stat (the list of statistics).
The significance of the coefficients can be measured by a $t$-test. To obtain $t$-values and $p$-values, simply calculate:

tvalue=g.b/sqrt(xdiag(g.bv))
pvalue=2.*cdfn(-abs(tvalue))
Content of object pvalue
[1,]   1.2155e-06
[2,]   1.0972e-05
[3,]   0.0003059
[4,]   0.0085538
[5,]   0.3639
[6,]   0.013712

This means that all coefficients except the 5th are significant (at the 5% level).
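In formulas, the quantities computed above are

$$ t_j=\frac{\widehat{\beta}_j}{\sqrt{\bigl(\widehat{\mathrm{Cov}}(\widehat{\beta})\bigr)_{jj}}}\,, \qquad p_j=2\,\Phi(-|t_j|), $$

where $\Phi$ denotes the standard normal cdf (cdfn) and the denominator contains the diagonal elements of g.bv (extracted by xdiag).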
For linear hypotheses on the parameters, a Wald test can be used. Suppose we want to test whether $2\beta_1=\beta_2$. This can be written as $A\beta=a$ with $A=(2,\,-1,\,0,\,\ldots,\,0)$ and $a=0$. Hence, define

A=2 ~ (-1) ~ 0.*matrix(rows(g.b)-2)'
a=0
W=(A*g.b-a)'*inv(A*g.bv*A')*(A*g.b-a)
pvalue=1-cdfc(W,1)
Content of object pvalue
[1,]   0.033318

i.e. the relation $2\beta_1=\beta_2$ is rejected at the 5% level.
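In general, the Wald statistic for the hypothesis $A\beta=a$ is

$$ W=(A\widehat{\beta}-a)^\top \bigl(A\,\widehat{\mathrm{Cov}}(\widehat{\beta})\,A^\top\bigr)^{-1} (A\widehat{\beta}-a), $$

which under the null hypothesis is asymptotically $\chi^2$ distributed with $\mathrm{rank}(A)$ degrees of freedom; here $\mathrm{rank}(A)=1$, hence the call cdfc(W,1).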
Suppose now we have estimated two (nested) models and obtained two estimation results, c (the smaller model) and g (the larger model). To compare both models, one needs to calculate the likelihood ratio test statistic.
In some cases, the distribution of this test statistic can be derived exactly. Otherwise, the (negative) doubled logarithm of the likelihood ratio can be computed, which has an asymptotic $\chi^2$ distribution. In this case, the test statistic lr and the $p$-value pvalue can be obtained from glmlrtest. Recall the lizard example, where we estimated the full model g and the constrained model c. Now we determine whether the difference between both models is significant. Computing
lc=c.stat.loglik
lg=g.stat.loglik
pc=rows(c.b)
pg=rows(g.b)
{lr,pvalue}=glmlrtest(lc,pc,lg,pg)
Contents of pvalue
[1,]   0.37944

i.e. there is no statistically significant difference between the two models.
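Up to possible adjustments inside glmlrtest (for instance for an estimated dispersion parameter), this corresponds to the asymptotic computation

lr=2*(lg-lc)
pvalue=1-cdfc(lr,pg-pc)

i.e. the doubled log-likelihood difference is compared with a $\chi^2$ distribution with $p_g-p_c$ degrees of freedom. This is a sketch of the asymptotic test, not necessarily the exact implementation of the quantlet.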
In the following, we generate a $200\times 4$ matrix x and a response y which depends only on the first two columns of x:

randomize(0)
n=200
b=1|2|0|0
p=rows(b)
x=normal(n,p)
y=x*b+normal(n)
Then a constant column is prepended to x, and the model selection is run with the first two columns fixed in all candidate models:

x=matrix(n)~x
opt=glmopt("shm",1,"fix",1|2)
g=glmselect("noid",x,y,opt)
Hence, g.best contains the five best models found in our example. The contents of g.best read columnwise:
Content of object g.best
[1,] 1 1 1 1 1
[2,] 2 2 2 2 2
[3,] 3 3 3 3 0
[4,] 0 0 4 4 4
[5,] 0 5 0 5 0

Components which are not in a submodel are indicated by the value 0. Hence, the model selection procedure indeed found that the last two columns of x do not explain y.
The functions glmforward and glmbackward have the same functionality as glmselect, except that they perform a forward and a backward search, respectively.
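Assuming the same calling convention as glmselect (which the shared functionality suggests; this is a sketch, not a verified signature), the corresponding searches would be started by

g=glmforward("noid",x,y,opt)
g=glmbackward("noid",x,y,opt)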