
7.2 Model Characteristics

The generalized linear model is determined by two components: the distribution of $ Y$ and the link function.

In order to define the GLM methodology as a specific class of nonlinear models (for a general approach to nonlinear regression see Chap. III.8), we assume that the distribution of $ Y$ is a member of the exponential family. The exponential family covers a large number of distributions, for example discrete distributions such as the Bernoulli, binomial and Poisson, which can handle binary and count data, or continuous distributions such as the normal, Gamma or inverse Gaussian distribution.


7.2.1 Exponential Family

We say that a distribution is a member of the exponential family if its probability mass function (if $ Y$ is discrete) or its density function (if $ Y$ is continuous) has the following form:

$\displaystyle f(y,\theta,\psi) = \exp\left\{\frac{y\theta-b(\theta)}{a(\psi)} + c(y,\psi)\right\}.$ (7.1)

The functions $ a(\bullet)$, $ b(\bullet)$ and $ c(\bullet)$ vary for different $ Y$ distributions. Our parameter of interest is $ \theta$, which is also called the canonical parameter ([27]). The additional parameter $ \psi$, which is only relevant for some of the distributions, is considered a nuisance parameter.
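To see how (7.1) works in practice, the following minimal sketch (in Python, assuming NumPy and SciPy are available) writes the Poisson distribution in exponential-family form, with $ \theta=\log\mu$, $ b(\theta)=e^\theta$, $ a(\psi)=1$ and $ c(y,\psi)=-\log(y!)$, and checks it against the pmf computed directly:

```python
# A minimal numerical check of the exponential-family form (7.1),
# using the Poisson distribution:
#   theta = log(mu), b(theta) = exp(theta), a(psi) = 1, c(y, psi) = -log(y!)
import numpy as np
from scipy.special import gammaln   # gammaln(y + 1) = log(y!)
from scipy.stats import poisson

mu = 3.5
theta = np.log(mu)

for y in range(10):
    # density written in the form (7.1)
    f_expfam = np.exp(y * theta - np.exp(theta) - gammaln(y + 1))
    # density computed directly
    f_direct = poisson.pmf(y, mu)
    assert np.isclose(f_expfam, f_direct)
```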


Table 7.1: GLM distributions. For each distribution the table lists the range of $ y$, the probability function $ f(y)$, the mean $ \mu(\theta)$, the variance function $ V(\mu)$ and the dispersion term $ a(\psi)$.

Bernoulli $ B(\mu)$: $ y\in\{0,1\}$; $ f(y)=\mu^y(1-\mu)^{1-y}$; $ \mu(\theta)=\frac{e^\theta}{1+e^\theta}$; $ V(\mu)=\mu(1-\mu)$; $ a(\psi)=1$.

Binomial $ B(k,\mu)$: $ y\in\{0,\ldots,k\}$; $ f(y)=\binom{k}{y}\left(\frac{\mu}{k}\right)^y\left(1-\frac{\mu}{k}\right)^{k-y}$; $ \mu(\theta)=\frac{ke^\theta}{1+e^\theta}$; $ V(\mu)=\mu\left(1-\frac{\mu}{k}\right)$; $ a(\psi)=1$.

Poisson $ P(\mu)$: $ y\in\{0,1,2,\ldots\}$; $ f(y)=\frac{\mu^y}{y!}\,e^{-\mu}$; $ \mu(\theta)=\exp(\theta)$; $ V(\mu)=\mu$; $ a(\psi)=1$.

Geometric $ \mathit{Geo}(\mu)$: $ y\in\{0,1,2,\ldots\}$; $ f(y)=\left(\frac{\mu}{1+\mu}\right)^y\frac{1}{1+\mu}$; $ \mu(\theta)=\frac{e^\theta}{1-e^\theta}$; $ V(\mu)=\mu+\mu^2$; $ a(\psi)=1$.

Negative binomial $ NB(\mu,k)$: $ y\in\{0,1,2,\ldots\}$; $ f(y)=\binom{k+y-1}{y}\left(\frac{\mu}{k+\mu}\right)^y\left(\frac{k}{k+\mu}\right)^k$; $ \mu(\theta)=\frac{ke^\theta}{1-e^\theta}$; $ V(\mu)=\mu+\frac{\mu^2}{k}$; $ a(\psi)=1$.

Exponential $ Exp(\mu)$: $ y\in(0,\infty)$; $ f(y)=\frac{1}{\mu}\exp\left(-\frac{y}{\mu}\right)$; $ \mu(\theta)=-1/\theta$; $ V(\mu)=\mu^2$; $ a(\psi)=1$.

Gamma $ G(\mu,\psi)$: $ y\in(0,\infty)$; $ f(y)=\frac{1}{\Gamma(\psi)}\left(\frac{\psi}{\mu}\right)^{\psi} y^{\psi-1}\exp\left(-\frac{\psi y}{\mu}\right)$; $ \mu(\theta)=-1/\theta$; $ V(\mu)=\mu^2$; $ a(\psi)=\frac{1}{\psi}$.

Normal $ N(\mu,\psi^2)$: $ y\in(-\infty,\infty)$; $ f(y)=\frac{\exp\left\{-(y-\mu)^2/(2\psi^2)\right\}}{\sqrt{2\pi}\,\psi}$; $ \mu(\theta)=\theta$; $ V(\mu)=1$; $ a(\psi)=\psi^2$.

Inverse Gaussian $ IG(\mu,\psi^2)$: $ y\in(0,\infty)$; $ f(y)=\frac{\exp\left\{-(y-\mu)^2/(2\mu^2 y\psi^2)\right\}}{\sqrt{2\pi y^3}\,\psi}$; $ \mu(\theta)=\frac{1}{\sqrt{-2\theta}}$; $ V(\mu)=\mu^3$; $ a(\psi)=\psi^2$.

Example 2 (Normal distribution)
Suppose $ Y$ is normally distributed with $ Y\sim N(\mu,\sigma^2)$. The probability density function $ f(y) = \exp\left\{-(y-\mu)^2/(2\sigma^2)\right\}/(\sqrt{2\pi}\sigma)$ can be written in the form (7.1) by setting $ \theta=\mu$ and $ \psi=\sigma$, with $ a(\psi) = \psi^2$, $ b(\theta)= \theta^2/2$, and $ c(y,\psi)= -y^2/(2\psi^2) - \log(\sqrt{2\pi}\psi)$.
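The decomposition in Example 2 is easy to verify numerically; the following is a small sketch under the same assumptions as above (NumPy/SciPy), not part of the original derivation:

```python
# Checking Example 2 numerically: the normal density equals
# exp{(y*theta - b(theta))/a(psi) + c(y, psi)}.
import numpy as np
from scipy.stats import norm

mu, sigma = 1.2, 0.8
theta, psi = mu, sigma

y = np.linspace(-2, 4, 50)
a = psi**2
b = theta**2 / 2
c = -y**2 / (2 * psi**2) - np.log(np.sqrt(2 * np.pi) * psi)

f_expfam = np.exp((y * theta - b) / a + c)
assert np.allclose(f_expfam, norm.pdf(y, loc=mu, scale=sigma))
```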

Example 3 (Bernoulli distribution)  
If $ Y$ is Bernoulli distributed its probability mass function is

$\displaystyle P(Y=y) = \mu^y (1-\mu)^{1-y} = \left\{\begin{array}{ll} \mu &\qquad \textrm{if}\quad y=1,\\ 1-\mu &\qquad \textrm{if}\quad y=0. \end{array}\right.$

This can be transformed into $ P(Y=y) =\exp\left(y\theta\right)/(1+e^{\theta})$ using the logit transformation $ \theta = \log\left\{ \mu/(1-\mu)\right\}$, which is equivalent to $ \mu = e^\theta/(1+e^{\theta})$. Thus we obtain an exponential family with $ a(\psi) = 1$, $ b(\theta) = -\log(1-\mu) = \log(1+e^\theta)$, and $ c(y,\psi) = 0$.
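The identity from Example 3 can likewise be checked in a few lines; a minimal sketch:

```python
# Example 3 in code: the Bernoulli pmf written as exp(y*theta)/(1 + e^theta),
# with theta the logit of mu.
import numpy as np

mu = 0.3
theta = np.log(mu / (1 - mu))          # logit transformation

for y in (0, 1):
    p_expfam = np.exp(y * theta) / (1 + np.exp(theta))
    p_direct = mu**y * (1 - mu)**(1 - y)
    assert np.isclose(p_expfam, p_direct)
```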

Table 7.1 lists some probability distributions that are typically used for a GLM. For the binomial and negative binomial distributions the additional parameter $ k$ is assumed to be known. Note also that the Bernoulli, geometric and exponential distributions are special cases of the binomial, negative binomial and Gamma distributions, respectively.

7.2.2 Link Function

After the distribution of $ Y$ has been specified, the link function $ G$ is the second component to be chosen for the GLM. Recall the model notation $ \eta = \boldsymbol{X}^\top \boldsymbol{\beta} = G(\mu)$. In the case that the canonical parameter $ \theta$ equals the linear predictor $ \eta$, i.e. if

$\displaystyle \eta =\theta, $

the link function is called the canonical link function. For models with a canonical link the estimation algorithm simplifies as we will see in Sect. 7.3.3. Table 7.2 shows in its second column the canonical link functions of the exponential family distributions presented in Table 7.1.


Table 7.2: Characteristics of GLMs. For each distribution the table lists the canonical link $ \theta(\mu)$ and the deviance $ D(\boldsymbol{y},\boldsymbol{\mu})$.

Bernoulli $ B(\mu)$: $ \theta(\mu)=\log\left(\frac{\mu}{1-\mu}\right)$; $ D=2\sum\left[y_i\log\left(\frac{y_i}{\mu_i}\right)+(1-y_i)\log\left(\frac{1-y_i}{1-\mu_i}\right)\right]$.

Binomial $ B(k,\mu)$: $ \theta(\mu)=\log\left(\frac{\mu}{k-\mu}\right)$; $ D=2\sum\left[y_i\log\left(\frac{y_i}{\mu_i}\right)+(k-y_i)\log\left(\frac{k-y_i}{k-\mu_i}\right)\right]$.

Poisson $ P(\mu)$: $ \theta(\mu)=\log(\mu)$; $ D=2\sum\left[y_i\log\left(\frac{y_i}{\mu_i}\right)-(y_i-\mu_i)\right]$.

Geometric $ \mathit{Geo}(\mu)$: $ \theta(\mu)=\log\left(\frac{\mu}{1+\mu}\right)$; $ D=2\sum\left[y_i\log\left(\frac{y_i+y_i\mu_i}{\mu_i+y_i\mu_i}\right)-\log\left(\frac{1+y_i}{1+\mu_i}\right)\right]$.

Negative binomial $ NB(\mu,k)$: $ \theta(\mu)=\log\left(\frac{\mu}{k+\mu}\right)$; $ D=2\sum\left[y_i\log\left(\frac{y_ik+y_i\mu_i}{\mu_ik+y_i\mu_i}\right)-k\log\left(\frac{k+y_i}{k+\mu_i}\right)\right]$.

Exponential $ Exp(\mu)$: $ \theta(\mu)=\frac{1}{\mu}$; $ D=2\sum\left[\frac{y_i-\mu_i}{\mu_i}-\log\left(\frac{y_i}{\mu_i}\right)\right]$.

Gamma $ G(\mu,\psi)$: $ \theta(\mu)=\frac{1}{\mu}$; $ D=2\sum\left[\frac{y_i-\mu_i}{\mu_i}-\log\left(\frac{y_i}{\mu_i}\right)\right]$.

Normal $ N(\mu,\psi^2)$: $ \theta(\mu)=\mu$; $ D=\sum\left[(y_i-\mu_i)^2\right]$.

Inverse Gaussian $ IG(\mu,\psi^2)$: $ \theta(\mu)=\frac{1}{\mu^2}$; $ D=\sum\left[\frac{(y_i-\mu_i)^2}{y_i\mu_i^2}\right]$.
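The deviance expressions in Table 7.2 can be evaluated directly. As an illustration, here is a minimal sketch of the Poisson deviance (the helper name poisson_deviance is our own), using the usual convention that $ y_i\log(y_i/\mu_i)=0$ whenever $ y_i=0$:

```python
# Evaluating the Poisson deviance from Table 7.2,
#   D(y, mu) = 2 * sum[ y_i*log(y_i/mu_i) - (y_i - mu_i) ],
# with the convention y*log(y/mu) = 0 when y = 0.
import numpy as np

def poisson_deviance(y, mu):
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    ratio = np.where(y > 0, y / mu, 1.0)   # log(1) = 0 handles the y = 0 case
    return 2.0 * np.sum(y * np.log(ratio) - (y - mu))

y = np.array([0, 1, 3, 2, 5])
mu = np.array([0.5, 1.2, 2.8, 2.0, 4.5])
print(poisson_deviance(y, mu))
```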

Example 4 (Canonical link for Bernoulli $ Y$)  
For Bernoulli $ Y$ we have $ \mu=e^\theta/(1+e^\theta)$, hence the canonical link is given by the logit transformation $ \eta=\log\{\mu/(1-\mu)\}$.
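As an illustration of working with a canonical link (estimation itself is the topic of Sect. 7.3), the following sketch assumes the statsmodels package and simulated data; the Binomial family's default link there is the canonical logit of Example 4:

```python
# A sketch (assuming statsmodels; data are simulated) of fitting a
# Bernoulli GLM with the canonical logit link, the default link of
# the Binomial family.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
X = sm.add_constant(rng.normal(size=(n, 2)))   # design matrix with intercept
beta = np.array([-0.5, 1.0, 0.8])

eta = X @ beta                                 # linear predictor
mu = np.exp(eta) / (1 + np.exp(eta))           # canonical (logit) inverse link
y = rng.binomial(1, mu)

res = sm.GLM(y, X, family=sm.families.Binomial()).fit()
print(res.params)                              # estimates of beta
print(res.deviance)                            # deviance D(y, mu), cf. Table 7.2
```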

What link functions could we choose apart from the canonical one? For most of the models a number of specific link functions exist. For Bernoulli $ Y$, for example, any smooth cumulative distribution function (cdf) $ F$ can be used, in the sense that $ \mu = F(\eta)$. Typical links are based on the logistic and standard normal (Gaussian) cdfs, which lead to the logit and probit models, respectively. A further alternative for Bernoulli $ Y$ is the complementary log-log link $ \eta=\log\{-\log(1-\mu)\}$.
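These three links for Bernoulli responses can be compared numerically; in each case the inverse link maps the linear predictor $ \eta$ to a mean $ \mu=F(\eta)\in(0,1)$. A minimal sketch:

```python
# Comparing common links for Bernoulli responses: each maps the
# linear predictor eta to a mean mu = F(eta) in (0, 1).
import numpy as np
from scipy.stats import norm

eta = np.linspace(-3, 3, 7)

mu_logit = np.exp(eta) / (1 + np.exp(eta))    # logistic cdf (logit link)
mu_probit = norm.cdf(eta)                     # standard normal cdf (probit link)
mu_cloglog = 1 - np.exp(-np.exp(eta))         # inverse of log{-log(1 - mu)}

print(np.column_stack([eta, mu_logit, mu_probit, mu_cloglog]))
```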

A flexible class of link functions for positive $ Y$ observations is the class of power functions. These links are given by the Box-Cox transformation ([6]), i.e. by $ \eta=(\mu^\lambda-1)/\lambda$ or $ \eta=\mu^\lambda$, where in both cases we set $ \eta=\log(\mu)$ for $ \lambda=0$.
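A small sketch of this power link (the helper name box_cox_link is our own), with the log link recovered as the $ \lambda=0$ case:

```python
# The Box-Cox power link eta = (mu^lambda - 1)/lambda,
# with eta = log(mu) as the lambda = 0 case.
import numpy as np

def box_cox_link(mu, lam):
    mu = np.asarray(mu, dtype=float)
    if lam == 0:
        return np.log(mu)
    return (mu**lam - 1) / lam

mu = np.array([0.5, 1.0, 2.0, 4.0])
for lam in (0, 0.5, 1.0):
    print(lam, box_cox_link(mu, lam))
```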

