7.1 Estimating GLMs

It is known that the least squares estimator $ \widehat{\beta}$ in the classical linear model coincides with the maximum-likelihood estimator under the assumption of normally distributed errors. By imposing an appropriate distributional assumption on $ Y$, estimation in the GLM likewise remains within the maximum-likelihood framework.


7.1.1 Models

For maximum-likelihood estimation, one assumes that the distribution of $ Y$ belongs to an exponential family. Exponential families cover a broad range of distributions, both discrete, such as the Binomial and Poisson distributions, and continuous, such as the Gaussian (normal) and Gamma distributions.

A distribution is said to belong to an exponential family if its probability function (if $ Y$ discrete) or its density function (if $ Y$ continuous) has the structure

$\displaystyle f(y,\theta,\phi) = \exp\left\{\frac{y\theta-b(\theta)}{a(\phi)} + c(y,\phi)\right\},$ (7.1)

with known functions $ a(\bullet)$, $ b(\bullet)$ and $ c(\bullet)$ that are specific to each distribution in this model class.
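As an illustration, the Poisson distribution with mean $ \mu$ fits the form (7.1) with $ \theta=\log\mu$, $ b(\theta)=e^\theta$, $ a(\phi)=1$ and $ c(y,\phi)=-\log(y!)$. The following sketch (plain Python; the function names are illustrative, not from any particular library) checks this identity numerically:

```python
import math

# Sketch: write the Poisson pmf in the exponential-family form (7.1).
# For Poisson(mu): theta = log(mu), b(theta) = exp(theta),
# a(phi) = 1, c(y, phi) = -log(y!).

def poisson_pmf_direct(y, mu):
    # Standard form: mu^y * exp(-mu) / y!
    return mu**y * math.exp(-mu) / math.factorial(y)

def poisson_pmf_expfam(y, mu):
    # Exponential-family form: exp{(y*theta - b(theta))/a(phi) + c(y, phi)}
    theta = math.log(mu)
    b = math.exp(theta)                   # b(theta)
    c = -math.log(math.factorial(y))      # c(y, phi)
    return math.exp((y * theta - b) / 1.0 + c)

for y in range(6):
    assert abs(poisson_pmf_direct(y, 2.5) - poisson_pmf_expfam(y, 2.5)) < 1e-12
```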

Generally speaking, we are interested in estimating the canonical parameter $ \theta=\theta(x^T\beta)$; $ \phi$ is a nuisance parameter (analogous to the variance $ \sigma^2$ in linear regression). Apart from the distribution of $ Y$, the link function is the other essential ingredient of a generalized linear model. Recall the notation

$\displaystyle \eta = x^T\beta \quad\textrm{ and }\quad \mu = G(\eta).$

For each distribution, one special link function exists, namely the one satisfying

$\displaystyle x^T\beta = \eta =\theta. $

If this holds, the link function is called the canonical link function. For models with a canonical link, some theoretical and practical problems are easier to solve. Table 7.1 summarizes the characteristics of some exponential families together with their canonical parameters $ \theta$ and canonical link functions. Note that the Negative Binomial distribution only fits into the framework described above if we assume that the parameter $ k$ is known.


Table 7.1: Distributions implemented in GLM. For each family: range of $ y$, cumulant function $ b(\theta)$, mean $ \mu(\theta)$, canonical link $ \theta(\mu)$, variance function $ V(\mu)$ and dispersion term $ a(\phi)$.

Normal $ N(\mu,\sigma^2)$: range of $ y$: $ (-\infty,\infty)$; $ b(\theta)=\theta^2/2$; $ \mu(\theta)=\theta$; canonical link: identity; $ V(\mu)=1$; $ a(\phi)=\sigma^2$.

Poisson $ P(\mu)$: range of $ y$: integers in $ [0,\infty)$; $ b(\theta)=\exp(\theta)$; $ \mu(\theta)=\exp(\theta)$; canonical link: $ \log$; $ V(\mu)=\mu$; $ a(\phi)=1$.

Binomial $ B(m,\pi)$: range of $ y$: integers in $ [0,m]$; $ b(\theta)=m\log(1+e^\theta)$; $ \mu(\theta)=\dfrac{m\,e^\theta}{1+e^\theta}$; canonical link: logit; $ V(\mu)=m\pi(1-\pi)$; $ a(\phi)=1$.

Gamma $ G(\mu,\nu)$: range of $ y$: $ (0,\infty)$; $ b(\theta)=-\log(-\theta)$; $ \mu(\theta)=-1/\theta$; canonical link: reciprocal; $ V(\mu)=\mu^2$; $ a(\phi)=1/\nu$.

Inverse Gaussian $ IG(\mu,\sigma^2)$: range of $ y$: $ (0,\infty)$; $ b(\theta)=-(-2\theta)^{1/2}$; $ \mu(\theta)=\dfrac{1}{\sqrt{-2\theta}}$; canonical link: squared reciprocal; $ V(\mu)=\mu^3$; $ a(\phi)=\sigma^2$.

Negative Binomial $ N{\!}B(\mu,k)$: range of $ y$: integers in $ [0,\infty)$; $ b(\theta)=-\dfrac{\log(1-e^\theta)}{k}$; $ \mu(\theta)=\dfrac{e^\theta}{k(1-e^\theta)}$; canonical link: $ \theta(\mu)=\log\left(\dfrac{k\mu}{1+k\mu}\right)$; $ V(\mu)=\mu+k\mu^2$; $ a(\phi)=1$.
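The columns of Table 7.1 are linked by the standard exponential-family relations $ \mu(\theta)=b'(\theta)$ and $ V(\mu)=b''(\theta)$. The following sketch (an illustrative check, not part of any library) verifies both relations for the Gamma row using central finite differences:

```python
import math

# Sketch: check the Gamma row of Table 7.1 numerically.
# For the Gamma family, b(theta) = -log(-theta), so the mean
# mu(theta) = -1/theta should equal b'(theta), and the variance
# function V(mu) = mu^2 should equal b''(theta).

def b(theta):
    # Cumulant function of the Gamma family (theta < 0)
    return -math.log(-theta)

h = 1e-5
for theta in (-0.5, -1.0, -2.0):
    b1 = (b(theta + h) - b(theta - h)) / (2 * h)               # ~ b'(theta)
    b2 = (b(theta + h) - 2 * b(theta) + b(theta - h)) / h**2   # ~ b''(theta)
    mu = -1.0 / theta
    assert abs(b1 - mu) < 1e-8       # mu(theta) = b'(theta)
    assert abs(b2 - mu**2) < 1e-4    # V(mu) = b''(theta)
```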



7.1.2 Maximum-Likelihood Estimation

All models in the glm library are estimated by maximum-likelihood. The default numerical algorithm is the Newton-Raphson iteration (except for ordinary linear regression, where no iteration is necessary). Optionally, Fisher scoring can be chosen, which replaces the Hessian matrix by its expectation. In the case of a canonical link function, the Newton-Raphson and Fisher scoring algorithms coincide.
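The Fisher scoring iteration can be written as an iteratively reweighted least squares (IRLS) update. The sketch below fits a Poisson regression with the canonical log link (so Newton-Raphson and Fisher scoring coincide) on synthetic data; all names and data are illustrative, and it is a minimal sketch rather than the glm library's implementation:

```python
import numpy as np

# Sketch: Fisher scoring / IRLS for a Poisson GLM with canonical log link.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])  # design matrix
beta_true = np.array([0.5, 1.0])
y = rng.poisson(np.exp(X @ beta_true))                     # synthetic response

beta = np.zeros(X.shape[1])
for _ in range(25):
    eta = X @ beta              # linear predictor eta = X beta
    mu = np.exp(eta)            # inverse of the log link
    W = mu                      # IRLS weight 1/{V(mu) g'(mu)^2} = mu for log link
    z = eta + (y - mu) / mu     # working response eta + (y - mu) g'(mu)
    # Weighted least squares step: solve (X' W X) beta = X' W z
    XtW = X.T * W
    beta_new = np.linalg.solve(XtW @ X, XtW @ z)
    if np.max(np.abs(beta_new - beta)) < 1e-10:
        beta = beta_new
        break
    beta = beta_new

# At convergence the score function X'(y - mu) is numerically zero
assert np.max(np.abs(X.T @ (y - np.exp(X @ beta)))) < 1e-4
```

In each step the algorithm solves a weighted least squares problem; this is why GLM estimation is often described as iteratively reweighted least squares.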