
7.5 Complements and Extensions

For further reading on GLM we refer to the textbooks of [11], [27] and [19] (the latter with a special focus on STATA). [36, Chap. 7] and [15] present the topic of generalized linear models in a very compact form. [7], [2], [9], and [5] are standard references for analyzing categorical responses. We recommend the monographs of [13] and [25] for a detailed introduction to GLM with a focus on multivariate, longitudinal and spatial data. In the following sections we will briefly review some specific variants and enhancements of the GLM.

7.5.1 Weighted Regression

Prior weights can be incorporated into the generalized linear model by considering the exponential density in the form

$\displaystyle f(y_i,\theta_i,\psi) = \exp\left[\frac{w_i \{y_i\theta_i-b(\theta_i)\}}{a(\psi)} + c(y_i,\psi,w_i)\right].$

This requires optimizing the sample log-likelihood

$\displaystyle \ell(\boldsymbol{Y}\!,\boldsymbol{\mu},\psi) = \sum_{i=1}^n \left[\frac{w_i \{Y_i\theta_i-b(\theta_i)\}}{a(\psi)} + c(Y_i,\psi,w_i)\right]$

or its equivalent, the deviance.

The weights $w_i$ can be $0$ or $1$ in the simplest case, where one wants to exclude specific observations from the estimation. The typical application of weights is the case of repeated independent realizations.
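
The effect of prior weights can be checked numerically. The following is a minimal sketch, assuming a Poisson response with log link fitted by iteratively reweighted least squares; the function name `fit_poisson_irls` and the toy data are hypothetical and not taken from the text. A weight of zero should reproduce the fit without that observation, and an integer weight should act like a repeated realization.

```python
import numpy as np

def fit_poisson_irls(X, y, w=None, n_iter=100, tol=1e-10):
    """Weighted Poisson regression with log link, fitted by IRLS (illustrative)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    mu = y + 0.5                  # common starting value, avoids log(0)
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = eta + (y - mu) / mu   # working response for the log link
        W = w * mu                # prior weight times IRLS weight
        beta_new = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        eta = X @ beta
        mu = np.exp(eta)
        if converged:
            break
    return beta

# Hypothetical toy data: intercept plus one covariate
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([1.0, 2.0, 4.0, 3.0, 8.0])

# A weight of 0 excludes the third observation from the estimation
beta_w0 = fit_poisson_irls(X, y, w=[1, 1, 0, 1, 1])
beta_drop = fit_poisson_irls(np.delete(X, 2, axis=0), np.delete(y, 2))

# A weight of 2 acts like observing the first data point twice
beta_w2 = fit_poisson_irls(X, y, w=[2, 1, 1, 1, 1])
beta_dup = fit_poisson_irls(np.vstack([X[:1], X]), np.concatenate([y[:1], y]))
```

Since the term $c(\cdot)$ does not involve $\theta_i$, it plays no role in this optimization over $\boldsymbol{\beta}$.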

7.5.2 Overdispersion

Overdispersion may occur in one-parameter exponential families where the variance is supposed to be a function of the mean. This concerns in particular the binomial and Poisson families, where we have $EY=\mu$ and ${\text{Var}}(Y)=\mu(1-\mu/k)$ or ${\text{Var}}(Y)=\mu$, respectively. Overdispersion means that the variance actually observed in the data is larger than the variance imposed by the model. The source of this may be a lack of independence in the data or a misspecification of the model. One possible approach is to use alternative models that allow for a nuisance parameter in the variance; as an example, think of the negative binomial instead of the Poisson distribution. For detailed discussions on overdispersion see [7] and [1].
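
A common informal check, not described in the text above, compares the Pearson statistic with its degrees of freedom: values clearly above one hint at overdispersion. The sketch below assumes an intercept-only Poisson model, so that the fitted mean is the sample mean; the function name `pearson_dispersion` and the counts are hypothetical.

```python
import numpy as np

def pearson_dispersion(y, mu, n_params, variance=lambda m: m):
    """Pearson chi-square statistic divided by the residual degrees of freedom.

    `variance` is the variance function V(mu); the default V(mu) = mu
    corresponds to the Poisson family.
    """
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    pearson = np.sum((y - mu) ** 2 / variance(mu))
    return pearson / (y.size - n_params)

# Hypothetical counts, intercept-only Poisson model: mu_hat is the sample mean
y = np.array([0, 1, 0, 7, 2, 12, 0, 3, 1, 14])
mu_hat = np.full(y.shape, y.mean())
phi_hat = pearson_dispersion(y, mu_hat, n_params=1)
```

For these counts the sample mean is 4 and the estimate equals 61/9, far above one, so a plain Poisson model would understate the variability.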

7.5.3 Quasi- or Pseudo-Likelihood

Let us remark that if the distribution of $Y$ itself is unknown but its first two moments can be specified, the quasi-likelihood function may replace the log-likelihood function. This means we still assume that

$\displaystyle \begin{aligned} E(Y) &= \mu, \\ {\text{Var}}(Y) &= a(\psi)\,V(\mu). \end{aligned}$

The quasi-likelihood function is defined through

$\displaystyle \ell(y,\theta,\psi) = \frac{1}{a(\psi)} \int\limits^y_{\mu(\theta)} \frac{(s-y)}{V(s)}\,\mathrm{d}s\,,$

(7.18)

see [29]. If $Y$ comes from an exponential family then the derivatives of the log-likelihood and quasi-likelihood function coincide. Thus, (7.18) in fact establishes a generalization of the likelihood approach.
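
As a small numerical illustration of this coincidence, the sketch below evaluates the integral in (7.18) by a midpoint rule for the Poisson case $V(\mu)=\mu$, $a(\psi)=1$ and compares it with the Poisson log-likelihood kernel $y\log\mu-\mu$. The function names and the chosen values are hypothetical. Both functions should differ only by a term that does not depend on $\mu$, so differences across two values of $\mu$ agree.

```python
import math

def quasi_loglik(y, mu, V, a=1.0, n=2000):
    """(1/a) * integral from mu to y of (s - y) / V(s) ds, by the midpoint rule."""
    h = (y - mu) / n              # signed step, also works for y < mu
    total = 0.0
    for k in range(n):
        s = mu + (k + 0.5) * h
        total += (s - y) / V(s)
    return total * h / a

def poisson_kernel(y, mu):
    """Poisson log-likelihood without the term -log(y!), which is free of mu."""
    return y * math.log(mu) - mu

y_obs = 3.0
mu1, mu2 = 2.0, 4.0

# Differences in mu coincide, since both functions have the same derivative in mu
dq = quasi_loglik(y_obs, mu1, V=lambda s: s) - quasi_loglik(y_obs, mu2, V=lambda s: s)
dl = poisson_kernel(y_obs, mu1) - poisson_kernel(y_obs, mu2)

# Closed form of the integral for V(s) = s
closed = (y_obs - mu1) - y_obs * math.log(y_obs / mu1)
```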

7.5.4 Multinomial Responses

A multinomial model (or nominal logistic regression) is applied if the response for each observation is one out of more than two alternatives (categories). For identification, one of the categories has to be chosen as reference category; without loss of generality we use here the first category. Denote by $\pi_j$ the probability $P(Y=j\vert\boldsymbol{X})$; then we can consider the logits with respect to the first category, i.e.

$\displaystyle \textrm{logit}(\pi_j)=\log\left(\frac{\pi_j}{\pi_1}\right) =\boldsymbol{X}_j^\top\boldsymbol{\beta}_j.$

The terms $\boldsymbol{X}_j$ and $\boldsymbol{\beta}_j$ indicate that the explanatory variables and their corresponding coefficients may depend on category $j$. Equivalently we can define the model by

$\displaystyle \begin{aligned} P(Y=1\vert\boldsymbol{X}) &= \frac{1}{1+\sum_{k=2}^J \exp(\boldsymbol{X}_k^\top\boldsymbol{\beta}_k)}, \\ P(Y=j\vert\boldsymbol{X}) &= \frac{\exp(\boldsymbol{X}_j^\top\boldsymbol{\beta}_j)}{1+\sum_{k=2}^J \exp(\boldsymbol{X}_k^\top\boldsymbol{\beta}_k)}\,, \quad j=2,\ldots,J. \end{aligned}$

It is easy to recognize that the logit model is a special case of the multinomial model for exactly two alternatives.
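
The two equivalent formulations can be verified with a few lines of code. This is a minimal sketch in which the function name `multinomial_probs` and the numerical index values are hypothetical; the input holds the linear predictors $\boldsymbol{X}_j^\top\boldsymbol{\beta}_j$ for the categories $2,\ldots,J$, with category 1 as reference.

```python
import numpy as np

def multinomial_probs(etas):
    """Category probabilities P(Y=1), ..., P(Y=J) from the linear predictors
    of categories 2..J (category 1 is the reference category)."""
    expo = np.exp(np.asarray(etas, dtype=float))
    denom = 1.0 + expo.sum()
    return np.concatenate(([1.0], expo)) / denom

# Three categories with hypothetical index values for categories 2 and 3
etas = np.array([0.2, -1.0])
probs = multinomial_probs(etas)

# Logits with respect to the first category recover the linear predictors
logits = np.log(probs[1:] / probs[0])

# Two categories: the model reduces to the binary logit model
eta_binary = 0.7
probs_binary = multinomial_probs([eta_binary])
logistic = 1.0 / (1.0 + np.exp(-eta_binary))
```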

If the categories are ordered in some natural way then this additional information can be taken into account. A latent variable approach leads to the cumulative logit model or the ordered probit model. We refer here to [11, Sect. 8.4] and [18, Chap. 21] for ordinal logistic regression and ordered probit analysis, respectively.

7.5.5 Contingency Tables

The simplest form of a contingency table

Category	$1$	$2$	$\ldots$	$J$	$\sum$
Frequency	$Y_1$	$Y_2$	$\ldots$	$Y_J$	$n$

with one factor and a predetermined sample size $n$ of observations is appropriately described by a multinomial distribution and can hence be fitted by the multinomial logit model introduced in Sect. 7.5.4. We could for instance be interested in comparing the trivial model $EY_1=\ldots=EY_J=\mu$ to the model $EY_2=\mu_2,\ldots, EY_J=\mu_J$ (again we use the first category as reference). As before, further explanatory variables can be included in the model.

Two-way or higher-dimensional contingency tables involve a large variety of possible models. Let us explain this with the help of the following two-way setup:

Category	$1$	$2$	$\ldots$	$J$	$\sum$
1	$Y_{11}$	$Y_{12}$	$\ldots$	$Y_{1J}$	$n_{1\bullet}$
2	$Y_{21}$	$Y_{22}$	$\ldots$	$Y_{2J}$	$n_{2\bullet}$
$\vdots$	$\vdots$	$\vdots$	$\ddots$	$\vdots$	$\vdots$
$K$	$Y_{K1}$	$Y_{K2}$	$\ldots$	$Y_{KJ}$	$n_{K\bullet}$
$\sum$	$n_{\bullet 1}$	$n_{\bullet 2}$	$\ldots$	$n_{\bullet J}$	$n$

Here we assume two factors, one with realizations $1,\ldots,J$, the other with realizations $1,\ldots,K$. If the $Y_{jk}$ are independent Poisson variables with parameters $\mu_{jk}$, then their sum $n$ is a Poisson variable with parameter $E(n)=\mu=\sum \mu_{jk}$. The Poisson assumption implies that the number of observations $n$ is a random variable. Conditional on $n$, the joint distribution of the $Y_{jk}$ is the multinomial distribution. Without additional explanatory variables, one is typically interested in estimating models of the type

$\displaystyle \log(EY_{jk}) = \beta_0 + \beta_j + \beta_k$

in order to compare this with the saturated model $\log(EY_{jk}) = \beta_0 + \beta_j + \beta_k + \beta_{jk}.$ If the former model holds then the two factors are independent. Another hypothetical model could be of the form $\log(EY_{jk}) = \beta_0 + \beta_j$ to check whether the second factor matters at all. As in the multinomial case, further explanatory variables can be included. Models of this type are consequently termed log-linear models. For more details see for example [11, Chap. 9] and [27, Chap. 6].
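
For the independence model the fitted cell means are known in closed form: they are the products of the row and column totals divided by the total count. The sketch below uses this standard fact to compute the deviance of the independence model against the saturated model. The function name `independence_fit` and the example counts are hypothetical.

```python
import numpy as np

def independence_fit(table):
    """Fitted means, deviance and degrees of freedom of the log-linear
    independence model for a two-way table of counts."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    row_totals = table.sum(axis=1, keepdims=True)
    col_totals = table.sum(axis=0, keepdims=True)
    fitted = row_totals @ col_totals / n      # outer product of the margins
    # Poisson deviance; empty cells contribute zero (0 * log 0 := 0)
    pos = table > 0
    deviance = 2.0 * np.sum(table[pos] * np.log(table[pos] / fitted[pos]))
    df = (table.shape[0] - 1) * (table.shape[1] - 1)
    return fitted, deviance, df

# Hypothetical 2x2 table with a clear association between the factors
counts = np.array([[30.0, 10.0],
                   [10.0, 30.0]])
fitted, deviance, df = independence_fit(counts)

# A table whose cells are exactly the products of its margins
indep_counts = np.array([[10.0, 20.0],
                         [20.0, 40.0]])
fitted0, deviance0, df0 = independence_fit(indep_counts)
```

A large deviance relative to its degrees of freedom speaks against independence of the two factors; for the second table the fitted values reproduce the counts and the deviance is zero.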

7.5.6 Survival Analysis

Survival data are characterized by non-negative observations which typically have a skewed distribution. An additional complication arises from the fact that the observation period may end before the individual fails, so that censored data may occur. The exponential distribution with density $f(y,\theta)=\theta e^{-\theta y}$ is a very simple example of a survival distribution. In this special case the survivor function (the probability to survive beyond $y$) is given by $S(y)=e^{-\theta y}$ and the hazard function (the probability of death within $y$ and $y+\mathrm{d}y$ after survival up to $y$) equals $h(y,\theta)=\theta$. Given additional explanatory variables this function is typically modeled by

$\displaystyle h(y,\theta) = \exp(\boldsymbol{X}^\top\boldsymbol{\beta}).$

Extensions of this model are given by using the Weibull distribution, leading to non-constant hazards, and by Cox's proportional hazards model [8], which uses a semiparametric approach. More material on survival analysis can be found in Chap. III.12.
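
The exponential model can be written down in a few lines. The sketch below is illustrative only: the event indicator `event` (1 for an observed failure, 0 for a censored observation) and all numerical values are assumptions that do not appear in the text. Observed failures contribute the log density and censored observations the log survivor function to the log-likelihood.

```python
import numpy as np

def survivor(y, theta):
    """S(y) = exp(-theta * y)"""
    return np.exp(-theta * y)

def density(y, theta):
    """f(y, theta) = theta * exp(-theta * y)"""
    return theta * np.exp(-theta * y)

def hazard(y, theta):
    """h(y, theta) = f(y, theta) / S(y), constant in y for this model"""
    return density(y, theta) / survivor(y, theta)

def exp_loglik(beta, X, y, event):
    """Censored log-likelihood with hazard theta_i = exp(X_i^T beta)."""
    theta = np.exp(X @ beta)
    return np.sum(event * np.log(theta) - theta * y)

# Hypothetical data: the second observation is censored
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([2.0, 0.5, 1.5])
event = np.array([1, 0, 1])
beta = np.array([-0.5, 0.3])

loglik = exp_loglik(beta, X, y, event)

# The same value, assembled term by term from density and survivor function
theta = np.exp(X @ beta)
direct = np.sum(np.where(event == 1,
                         np.log(density(y, theta)),
                         np.log(survivor(y, theta))))
```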

7.5.7 Clustered Data

Clustered data in relation to regression models mean that data from known groups ("clusters") are observed. Often these are the result of repeated measurements on the same individuals at different time points. For example, imagine the analysis of the effect of a medical treatment on patients or the repeated surveying of households in socio-economic panel studies. Here, all observations on the same individual form a cluster. We speak of longitudinal or panel data in that case. The latter term is typically used in the econometric literature.

When using clustered data we have to take into account that observations from the same cluster are correlated. Using a model designed for independent data may lead to biased results or at least significantly reduce the efficiency of the estimates.

A simple individual model equation could be written as follows:

$\displaystyle E(Y_{ij}\vert\boldsymbol{X}_{ij})=G^{-1}(\boldsymbol{X}_{ij}^\top\boldsymbol{\beta}_{j}).$

Here $ij$ is used to denote the $i$th individual observation in the $j$th cluster. Of course more complex specifications, for example with hierarchical clusters, can be formulated as well.

There is a vast amount of literature which deals with many different possible model specifications. A comprehensive resource for linear and nonlinear mixed effect models (LME, NLME) for continuous responses is [30]. The term "mixed" here refers to the fact that these models include additional random and/or fixed effect components to allow for correlation within and heterogeneity between the clusters.

For generalized linear mixed models (GLMM), i.e. clustered observations with responses from a GLM-type distribution, several approaches are possible. For repeated observations, [24] and [38] propose to use generalized estimating equations (GEE) which result in a quasi-likelihood estimator. They show that the correlation matrix of $\boldsymbol{Y}_j$, the response observations from one cluster, can be replaced by a "working correlation" as long as the moments of $\boldsymbol{Y}_j$ are correctly specified. Useful working correlations depend on a small number of parameters. For longitudinal data, for example, an autoregressive working correlation can be used. For more details on GEE see also the monograph by [10]. In the econometric literature longitudinal or panel data are analyzed with a focus on continuous and binary responses. Standard references for econometric panel data analyses are [22] and [4]. Models for clustered data with complex hierarchical structure are often denoted as multilevel models. We refer to the monograph of [16] for an overview.
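
To make the notion of a working correlation concrete, the following sketch builds a first-order autoregressive matrix with entries $\rho^{\vert s-t\vert}$, which depends on the single parameter $\rho$, and combines it with a variance function into a working covariance for one cluster. The AR(1) form, the function names and the numerical values are assumptions chosen for illustration; the text only states that an autoregressive structure can be used.

```python
import numpy as np

def ar1_corr(n_obs, rho):
    """AR(1) working correlation: entry (s, t) equals rho ** |s - t|."""
    idx = np.arange(n_obs)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def working_cov(mu, rho, variance=lambda m: m, phi=1.0):
    """Working covariance phi * A^(1/2) R A^(1/2) for one cluster, where
    A = diag(V(mu)) and R is the AR(1) working correlation."""
    mu = np.asarray(mu, dtype=float)
    sd = np.sqrt(variance(mu))
    return phi * sd[:, None] * ar1_corr(mu.size, rho) * sd[None, :]

# Four repeated measurements on one individual, hypothetical rho
R = ar1_corr(4, 0.5)

# Poisson-type variance function V(mu) = mu, hypothetical cluster means
cov = working_cov([1.0, 4.0, 9.0], rho=0.5)
```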

7.5.8 Semiparametric Generalized Linear Models

Nonparametric components can be incorporated into the GLM at different places. For example, it is possible to estimate a single index model

$\displaystyle E(Y\vert\boldsymbol{X}) = g(\boldsymbol{X}^\top \boldsymbol{\beta})$

which differs from the GLM by its unknown smooth link function $g(\bullet)$. The parameter vector $\boldsymbol{\beta}$ in this model can then only be identified up to scale. The estimation of such models has been studied e.g. by [23], [37] and [14].
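
The scale ambiguity is easy to see numerically: multiplying $\boldsymbol{\beta}$ by a constant $c>0$ and replacing $g(u)$ by $g(u/c)$ leaves the regression function unchanged. In the sketch below the logistic choice of $g$, the constant and the data are hypothetical and serve only to illustrate this point.

```python
import numpy as np

def g(u):
    """A hypothetical smooth link function (logistic), for illustration only."""
    return 1.0 / (1.0 + np.exp(-u))

def g_rescaled(u, c):
    """The link function that compensates a rescaling of beta by c."""
    return g(u / c)

X = np.array([[0.5, -1.0], [2.0, 0.3], [-1.2, 0.8]])
beta = np.array([1.0, -2.0])
c = 3.0

m_original = g(X @ beta)                      # g(X^T beta)
m_rescaled = g_rescaled(X @ (c * beta), c)    # g_c(X^T (c beta))
```

Both pairs $(g,\boldsymbol{\beta})$ and $(g_c, c\boldsymbol{\beta})$ generate the same conditional mean, so the data cannot distinguish between them unless the scale of $\boldsymbol{\beta}$ is fixed by a normalization.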

Local regression in combination with likelihood-based estimation is introduced in [26]. This concerns models of the form

$\displaystyle E(Y\vert\boldsymbol{X}) = G^{-1}\left\{m(\boldsymbol{X})\right\},$

where $m(\bullet)$ is an unknown smooth (possibly multidimensional) function. Further examples of semiparametric GLM are generalized additive and generalized partial linear models (GAM, GPLM). These models are able to handle (additional) nonparametric components in the function $\eta$. For example, the GAM is specified in its simplest form by

$\displaystyle E(Y\vert\boldsymbol{X}) = G^{-1}\left\{\beta_0+\sum_{j=1}^p m_j(X_j)\right\}.$

Here the $m_j(\bullet)$ denote univariate (or low dimensional) unknown smooth functions which have to be estimated. For their identification it should be assumed that $E\,m_j(X_j)=0$. The generalized partial linear model combines a linear and a nonparametric function in the function $\eta$ and is specified as

$\displaystyle E(Y\vert\boldsymbol{X}) = G^{-1}\left\{\boldsymbol{X}_1^\top\boldsymbol{\beta}+m(\boldsymbol{X}_2) \right\}.$

**Figure 7.6:** Credit default on $\textrm {AGE}$ and $\textrm {AMOUNT}$ using a nonparametric function, *left*: surface and *right*: contours of the fitted function on $\textrm {AGE}$ and $\textrm {AMOUNT}$

Example 13 (Semiparametric credit model)
We have fitted a generalized partial linear model as a variant of the final model from Example 12. The continuous variables $\mathrm{AGE}$ and $\mathrm{AMOUNT}$ were used as arguments for the nonparametric component. All other variables of the final model have been included in the linear part of the index function $\eta$. Figure 7.6 shows the estimated nonparametric function of $\mathrm{AGE}$ and $\mathrm{AMOUNT}$. Although the stepwise model selection in Example 12 indicated that there is no interaction between $\mathrm{AGE}$ and $\mathrm{AMOUNT}$, we see now that this interaction could in fact be of some more sophisticated form. The estimation was performed using a generalization of the [34] estimator to generalized models. The local kernel weights are calculated from a Quartic (Biweight) kernel function using bandwidths approximately equal to $33.3\,{\%}$ of the ranges of $\mathrm{AGE}$ and $\mathrm{AMOUNT}$, respectively. Details on the kernel-based estimation used here can be found in [33] and [28].
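
The local kernel weights mentioned in the example can be sketched as follows. The Quartic kernel $K(u)=\frac{15}{16}(1-u^2)^2$ for $\vert u\vert\le 1$ is standard; the use of a product kernel over the two variables, the function names and the data values are assumptions made for illustration and are not taken from the example.

```python
import numpy as np

def quartic(u):
    """Quartic (Biweight) kernel: 15/16 * (1 - u^2)^2 on [-1, 1], zero outside."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u ** 2) ** 2, 0.0)

def local_weights(x0, X, frac=1.0 / 3.0):
    """Product-kernel weights of the rows of X around the point x0, with one
    bandwidth per column equal to `frac` times the range of that column."""
    X = np.asarray(X, dtype=float)
    h = frac * (X.max(axis=0) - X.min(axis=0))
    u = (X - np.asarray(x0, dtype=float)) / h
    return np.prod(quartic(u), axis=1)

# Hypothetical (AGE, AMOUNT) observations
data = np.array([[20.0, 1000.0],
                 [30.0, 2000.0],
                 [50.0, 5000.0],
                 [70.0, 10000.0]])
weights = local_weights([30.0, 2000.0], data)

# Numerical check that the kernel integrates to one
grid = np.linspace(-1.0, 1.0, 20001)
kernel_mass = quartic(grid).sum() * (grid[1] - grid[0])
```

Observations farther than one bandwidth from the evaluation point in either variable receive weight zero, so only nearby observations enter the local fit.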

Some more material on semiparametric regression can be found in Chaps. III.5 and III.10 of this handbook. For a detailed introduction to semiparametric extensions of GLM we refer to the textbooks by [21], [20], [31], and [17].
