
7.2 Model Characteristics

The generalized linear model is determined by two components: the distribution of $ Y$ and the link function.

In order to define the GLM methodology as a specific class of nonlinear models (for a general approach to nonlinear regression see Chap. III.8), we assume that the distribution of $ Y$ is a member of the exponential family. The exponential family covers a large number of distributions, for example discrete distributions such as the Bernoulli, binomial and Poisson, which can handle binary and count data, or continuous distributions such as the normal, Gamma or inverse Gaussian distribution.


7.2.1 Exponential Family

We say that a distribution is a member of the exponential family if its probability mass function (if $ Y$ is discrete) or its density function (if $ Y$ is continuous) has the following form:

$\displaystyle f(y,\theta,\psi) = \exp\left\{\frac{y\theta-b(\theta)}{a(\psi)} + c(y,\psi)\right\}.$ (7.1)

The functions $ a(\bullet)$, $ b(\bullet)$ and $ c(\bullet)$ vary for different $ Y$ distributions. Our parameter of interest is $ \theta$, which is also called the canonical parameter ([27]). The additional parameter $ \psi$, which is only relevant for some of the distributions, is considered a nuisance parameter.
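To see how (7.1) works in practice, the following minimal sketch (in Python, assuming NumPy and SciPy are available) writes the Poisson distribution in exponential-family form, with $ \theta=\log\mu$, $ b(\theta)=e^\theta$, $ a(\psi)=1$ and $ c(y,\psi)=-\log(y!)$, and checks it against the pmf computed directly:

```python
# A minimal numerical check of the exponential-family form (7.1),
# using the Poisson distribution:
#   theta = log(mu), b(theta) = exp(theta), a(psi) = 1, c(y, psi) = -log(y!)
import numpy as np
from scipy.special import gammaln   # gammaln(y + 1) = log(y!)
from scipy.stats import poisson

mu = 3.5
theta = np.log(mu)

for y in range(10):
    # density written in the form (7.1)
    f_expfam = np.exp(y * theta - np.exp(theta) - gammaln(y + 1))
    # density computed directly
    f_direct = poisson.pmf(y, mu)
    assert np.isclose(f_expfam, f_direct)
```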


Table 7.1: GLM distributions. For each distribution the table lists the range of $ y$, the probability function $ f(y)$, the mean $ \mu(\theta)$, the variance function $ V(\mu)$ and the dispersion term $ a(\psi)$.

Bernoulli $ B(\mu)$: $ y\in\{0,1\}$; $ f(y)=\mu^y(1-\mu)^{1-y}$; $ \mu(\theta)=\frac{e^\theta}{1+e^\theta}$; $ V(\mu)=\mu(1-\mu)$; $ a(\psi)=1$.

Binomial $ B(k,\mu)$: $ y\in\{0,\ldots,k\}$; $ f(y)=\binom{k}{y}\left(\frac{\mu}{k}\right)^y\left(1-\frac{\mu}{k}\right)^{k-y}$; $ \mu(\theta)=\frac{ke^\theta}{1+e^\theta}$; $ V(\mu)=\mu\left(1-\frac{\mu}{k}\right)$; $ a(\psi)=1$.

Poisson $ P(\mu)$: $ y\in\{0,1,2,\ldots\}$; $ f(y)=\frac{\mu^y}{y!}\,e^{-\mu}$; $ \mu(\theta)=\exp(\theta)$; $ V(\mu)=\mu$; $ a(\psi)=1$.

Geometric $ \mathit{Geo}(\mu)$: $ y\in\{0,1,2,\ldots\}$; $ f(y)=\left(\frac{\mu}{1+\mu}\right)^y\frac{1}{1+\mu}$; $ \mu(\theta)=\frac{e^\theta}{1-e^\theta}$; $ V(\mu)=\mu+\mu^2$; $ a(\psi)=1$.

Negative binomial $ NB(\mu,k)$: $ y\in\{0,1,2,\ldots\}$; $ f(y)=\binom{k+y-1}{y}\left(\frac{\mu}{k+\mu}\right)^y\left(\frac{k}{k+\mu}\right)^k$; $ \mu(\theta)=\frac{ke^\theta}{1-e^\theta}$; $ V(\mu)=\mu+\frac{\mu^2}{k}$; $ a(\psi)=1$.

Exponential $ Exp(\mu)$: $ y\in(0,\infty)$; $ f(y)=\frac{1}{\mu}\exp\left(-\frac{y}{\mu}\right)$; $ \mu(\theta)=-1/\theta$; $ V(\mu)=\mu^2$; $ a(\psi)=1$.

Gamma $ G(\mu,\psi)$: $ y\in(0,\infty)$; $ f(y)=\frac{1}{\Gamma(\psi)}\left(\frac{\psi}{\mu}\right)^{\psi} y^{\psi-1}\exp\left(-\frac{\psi y}{\mu}\right)$; $ \mu(\theta)=-1/\theta$; $ V(\mu)=\mu^2$; $ a(\psi)=\frac{1}{\psi}$.

Normal $ N(\mu,\psi^2)$: $ y\in(-\infty,\infty)$; $ f(y)=\frac{\exp\left\{-(y-\mu)^2/(2\psi^2)\right\}}{\sqrt{2\pi}\,\psi}$; $ \mu(\theta)=\theta$; $ V(\mu)=1$; $ a(\psi)=\psi^2$.

Inverse Gaussian $ IG(\mu,\psi^2)$: $ y\in(0,\infty)$; $ f(y)=\frac{\exp\left\{-(y-\mu)^2/(2\mu^2 y\psi^2)\right\}}{\sqrt{2\pi y^3}\,\psi}$; $ \mu(\theta)=\frac{1}{\sqrt{-2\theta}}$; $ V(\mu)=\mu^3$; $ a(\psi)=\psi^2$.

Example 2 (Normal distribution)
Suppose $ Y$ is normally distributed with $ Y\sim N(\mu,\sigma^2)$. The probability density function $ f(y) = \exp\left\{-(y-\mu)^2/(2\sigma^2)\right\}/(\sqrt{2\pi}\sigma)$ can be written in the form (7.1) by setting $ \theta=\mu$ and $ \psi=\sigma$, with $ a(\psi) = \psi^2$, $ b(\theta)= \theta^2/2$, and $ c(y,\psi)= -y^2/(2\psi^2) - \log(\sqrt{2\pi}\psi)$.
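The decomposition in Example 2 is easy to verify numerically; the following is a small sketch under the same assumptions as above (NumPy/SciPy), not part of the original derivation:

```python
# Checking Example 2 numerically: the normal density equals
# exp{(y*theta - b(theta))/a(psi) + c(y, psi)}.
import numpy as np
from scipy.stats import norm

mu, sigma = 1.2, 0.8
theta, psi = mu, sigma

y = np.linspace(-2, 4, 50)
a = psi**2
b = theta**2 / 2
c = -y**2 / (2 * psi**2) - np.log(np.sqrt(2 * np.pi) * psi)

f_expfam = np.exp((y * theta - b) / a + c)
assert np.allclose(f_expfam, norm.pdf(y, loc=mu, scale=sigma))
```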

Example 3 (Bernoulli distribution)  
If $ Y$ is Bernoulli distributed its probability mass function is

$\displaystyle P(Y=y) = \mu^y (1-\mu)^{1-y} = \left\{\begin{array}{ll} \mu &\qquad \textrm{if}\quad y=1,\\ 1-\mu &\qquad \textrm{if}\quad y=0. \end{array}\right.$

This can be transformed into $ P(Y=y) =\exp\left(y\theta\right)/(1+e^{\theta})$ using the logit transformation $ \theta = \log\left\{ \mu/(1-\mu)\right\}$, which is equivalent to $ \mu = e^\theta/(1+e^{\theta})$. Thus we obtain an exponential family with $ a(\psi) = 1$, $ b(\theta) = -\log(1-\mu) = \log(1+e^\theta)$, and $ c(y,\psi) = 0$.
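The identity from Example 3 can likewise be checked in a few lines; a minimal sketch:

```python
# Example 3 in code: the Bernoulli pmf written as exp(y*theta)/(1 + e^theta),
# with theta the logit of mu.
import numpy as np

mu = 0.3
theta = np.log(mu / (1 - mu))          # logit transformation

for y in (0, 1):
    p_expfam = np.exp(y * theta) / (1 + np.exp(theta))
    p_direct = mu**y * (1 - mu)**(1 - y)
    assert np.isclose(p_expfam, p_direct)
```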

Table 7.1 lists some probability distributions that are typically used for a GLM. For the binomial and negative binomial distributions the additional parameter $ k$ is assumed to be known. Note also that the Bernoulli, geometric and exponential distributions are special cases of the binomial, negative binomial and Gamma distributions, respectively.

7.2.2 Link Function

After the distribution of $ Y$ has been specified, the link function $ G$ is the second component to be chosen for the GLM. Recall the model notation $ \eta = \boldsymbol{X}^\top \boldsymbol{\beta} = G(\mu)$. In the case that the canonical parameter $ \theta$ equals the linear predictor $ \eta$, i.e. if

$\displaystyle \eta =\theta, $

the link function is called the canonical link function. For models with a canonical link the estimation algorithm simplifies as we will see in Sect. 7.3.3. Table 7.2 shows in its second column the canonical link functions of the exponential family distributions presented in Table 7.1.


Table 7.2: Characteristics of GLMs. For each distribution the table lists the canonical link $ \theta(\mu)$ and the deviance $ D(\boldsymbol{y},\boldsymbol{\mu})$.

Bernoulli $ B(\mu)$: $ \theta(\mu)=\log\left(\frac{\mu}{1-\mu}\right)$; $ D=2\sum\left[y_i\log\left(\frac{y_i}{\mu_i}\right)+(1-y_i)\log\left(\frac{1-y_i}{1-\mu_i}\right)\right]$.

Binomial $ B(k,\mu)$: $ \theta(\mu)=\log\left(\frac{\mu}{k-\mu}\right)$; $ D=2\sum\left[y_i\log\left(\frac{y_i}{\mu_i}\right)+(k-y_i)\log\left(\frac{k-y_i}{k-\mu_i}\right)\right]$.

Poisson $ P(\mu)$: $ \theta(\mu)=\log(\mu)$; $ D=2\sum\left[y_i\log\left(\frac{y_i}{\mu_i}\right)-(y_i-\mu_i)\right]$.

Geometric $ \mathit{Geo}(\mu)$: $ \theta(\mu)=\log\left(\frac{\mu}{1+\mu}\right)$; $ D=2\sum\left[y_i\log\left(\frac{y_i+y_i\mu_i}{\mu_i+y_i\mu_i}\right)-\log\left(\frac{1+y_i}{1+\mu_i}\right)\right]$.

Negative binomial $ NB(\mu,k)$: $ \theta(\mu)=\log\left(\frac{\mu}{k+\mu}\right)$; $ D=2\sum\left[y_i\log\left(\frac{y_ik+y_i\mu_i}{\mu_ik+y_i\mu_i}\right)-k\log\left(\frac{k+y_i}{k+\mu_i}\right)\right]$.

Exponential $ Exp(\mu)$: $ \theta(\mu)=\frac{1}{\mu}$; $ D=2\sum\left[\frac{y_i-\mu_i}{\mu_i}-\log\left(\frac{y_i}{\mu_i}\right)\right]$.

Gamma $ G(\mu,\psi)$: $ \theta(\mu)=\frac{1}{\mu}$; $ D=2\sum\left[\frac{y_i-\mu_i}{\mu_i}-\log\left(\frac{y_i}{\mu_i}\right)\right]$.

Normal $ N(\mu,\psi^2)$: $ \theta(\mu)=\mu$; $ D=\sum\left[(y_i-\mu_i)^2\right]$.

Inverse Gaussian $ IG(\mu,\psi^2)$: $ \theta(\mu)=\frac{1}{\mu^2}$; $ D=\sum\left[\frac{(y_i-\mu_i)^2}{y_i\mu_i^2}\right]$.
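The deviance expressions in Table 7.2 can be evaluated directly. As an illustration, here is a minimal sketch of the Poisson deviance (the helper name poisson_deviance is our own), using the usual convention that $ y_i\log(y_i/\mu_i)=0$ whenever $ y_i=0$:

```python
# Evaluating the Poisson deviance from Table 7.2,
#   D(y, mu) = 2 * sum[ y_i*log(y_i/mu_i) - (y_i - mu_i) ],
# with the convention y*log(y/mu) = 0 when y = 0.
import numpy as np

def poisson_deviance(y, mu):
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    ratio = np.where(y > 0, y / mu, 1.0)   # log(1) = 0 handles the y = 0 case
    return 2.0 * np.sum(y * np.log(ratio) - (y - mu))

y = np.array([0, 1, 3, 2, 5])
mu = np.array([0.5, 1.2, 2.8, 2.0, 4.5])
print(poisson_deviance(y, mu))
```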

Example 4 (Canonical link for Bernoulli $ Y$)  
For Bernoulli $ Y$ we have $ \mu=e^\theta/(1+e^\theta)$, hence the canonical link is given by the logit transformation $ \eta=\log\{\mu/(1-\mu)\}$.
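As an illustration of working with a canonical link (estimation itself is the topic of Sect. 7.3), the following sketch assumes the statsmodels package and simulated data; the Binomial family's default link there is the canonical logit of Example 4:

```python
# A sketch (assuming statsmodels; data are simulated) of fitting a
# Bernoulli GLM with the canonical logit link, the default link of
# the Binomial family.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
X = sm.add_constant(rng.normal(size=(n, 2)))   # design matrix with intercept
beta = np.array([-0.5, 1.0, 0.8])

eta = X @ beta                                 # linear predictor
mu = np.exp(eta) / (1 + np.exp(eta))           # canonical (logit) inverse link
y = rng.binomial(1, mu)

res = sm.GLM(y, X, family=sm.families.Binomial()).fit()
print(res.params)                              # estimates of beta
print(res.deviance)                            # deviance D(y, mu), cf. Table 7.2
```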

What link functions could we choose apart from the canonical one? For most of the models a number of specific link functions exist. For Bernoulli $ Y$, for example, any smooth cumulative distribution function (cdf) $ F$ can be used, in the sense that $ \mu = F(\eta)$. Typical links are based on the logistic and standard normal (Gaussian) cdfs, which lead to the logit and probit models, respectively. A further alternative for Bernoulli $ Y$ is the complementary log-log link $ \eta=\log\{-\log(1-\mu)\}$.
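These three links for Bernoulli responses can be compared numerically; in each case the inverse link maps the linear predictor $ \eta$ to a mean $ \mu=F(\eta)\in(0,1)$. A minimal sketch:

```python
# Comparing common links for Bernoulli responses: each maps the
# linear predictor eta to a mean mu = F(eta) in (0, 1).
import numpy as np
from scipy.stats import norm

eta = np.linspace(-3, 3, 7)

mu_logit = np.exp(eta) / (1 + np.exp(eta))    # logistic cdf (logit link)
mu_probit = norm.cdf(eta)                     # standard normal cdf (probit link)
mu_cloglog = 1 - np.exp(-np.exp(eta))         # inverse of log{-log(1 - mu)}

print(np.column_stack([eta, mu_logit, mu_probit, mu_cloglog]))
```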

A flexible class of link functions for positive $ Y$ observations is the class of power functions. These links are given by the Box-Cox transformation ([6]), i.e. by $ \eta=(\mu^\lambda-1)/\lambda$ or $ \eta=\mu^\lambda$, where in both cases we set $ \eta=\log(\mu)$ for $ \lambda=0$.
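A small sketch of this power link (the helper name box_cox_link is our own), with the log link recovered as the $ \lambda=0$ case:

```python
# The Box-Cox power link eta = (mu^lambda - 1)/lambda,
# with eta = log(mu) as the lambda = 0 case.
import numpy as np

def box_cox_link(mu, lam):
    mu = np.asarray(mu, dtype=float)
    if lam == 0:
        return np.log(mu)
    return (mu**lam - 1) / lam

mu = np.array([0.5, 1.0, 2.0, 4.0])
for lam in (0, 0.5, 1.0):
    print(lam, box_cox_link(mu, lam))
```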

