7.4 Practical Aspects

To illustrate the GLM in practice we recall Example 1 on credit worthiness. The credit data set that we use [13] contains observations on consumer credits and a variety of explanatory variables. We have selected a subset of eight explanatory variables for the following examples.

The model for credit worthiness is based on the idea that default can be predicted from the individual and loan characteristics. We consider criteria such as age, information on previous loans, savings, employment and house ownership to characterize the credit applicants. Amount and duration of the loan are prominent features of the granted loans. Some descriptive statistics can be found in Table 7.3. We remark that we have categorized the durations (in months) into intervals, since most of the realizations are multiples of a fixed number of months.

**Table 7.3:** Credit data
Variable	Yes (in %)	No (in %)
$Y$ (observed default)	30.0	70.0
PREVIOUS (no problem)	38.1	61.9
EMPLOYED ( $\ge 1$ year)	93.8	6.2
DURATION	21.6	78.4
DURATION	18.7	81.3
DURATION	22.4	77.6
DURATION $\ge 24$	23.0	77.0
SAVINGS	18.3	81.7
PURPOSE (buy a car)	28.4	71.6
HOUSE (owner)	15.4	84.6
Variable	Min.	Max.	Mean	Std.Dev.
AMOUNT (in DM)	250	18424	3271.248	2822.752
AGE (in years)	19	75	35.542	11.353

We are primarily interested in estimating the probability of credit default as a function of the explanatory variables $\boldsymbol {X}$ . Recall that for binary $Y$ it holds that $P(Y=1\vert\boldsymbol{X})=E(Y\vert\boldsymbol{X}).$ Our first approach is a GLM with logit link such that $P(Y=1\vert\boldsymbol{X})=\exp(\boldsymbol{X}^\top \boldsymbol{\beta})/\{1+\exp(\boldsymbol{X}^\top \boldsymbol{\beta})\}$ .

Example 7 (Credit default on $\mathrm{AGE}$ )
We initially estimate the default probability solely related to age, i.e. the model

$\displaystyle P(Y=1\vert\mathrm{AGE})=\frac{\exp(\beta_0 + \beta_1 \mathrm{AGE})}{1 +\exp(\beta_0 + \beta_1 \mathrm{AGE})}\,$

or equivalently $\mathrm{logit}\left\{P(Y=1\vert\mathrm{AGE})\right\} = \beta_0 + \beta_1 \mathrm{AGE}$ . The resulting estimates of the constant $\beta_0$ and the slope parameter $\beta_1$ are displayed in Table 7.4 together with summary statistics on the model fit.

From the table we see that the estimated coefficient of $\mathrm{AGE}$ has a negative sign. Since the link function and its inverse are strictly monotone increasing, we can conclude that the estimated probability of default decreases with increasing $\mathrm{AGE}$ . Figure 7.1 shows on the left frequency barplots of $\mathrm{AGE}$ separately for $Y=1$ and $Y=0$ . From the observed frequencies we can clearly recognize the decreasing propensity to default. The right graph in Fig. 7.1 displays the estimated probabilities $P(Y=1\vert\mathrm{AGE})$ from the fitted logit model, which are indeed decreasing.

The $t$-values ( $\sqrt{n}\,\widehat{\beta_j}/\sqrt{\widehat{\Sigma_{jj}}}$ ) show that the coefficient of $\mathrm{AGE}$ is significantly different from $0$ while the estimated constant is not. The test that is used here is an approximate $t$-test, so that the $z_{1-\alpha/2}$ -quantile of the standard normal distribution can be used as critical value. This implies that at the usual $5\%$ level we compare the absolute value of the $t$-value with $z_{0.975}\approx 1.96$ .
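The following minimal sketch illustrates how such a fit and the associated Wald-type ratios can be computed. It is not the software used for the tables in this section: the data are synthetic, and the function name `fit_logit`, the sample size and the "true" coefficients are assumptions made only for illustration. The ratio $\widehat{\beta}_j/\mathrm{se}(\widehat{\beta}_j)$ computed here is what the expression above reduces to when $\widehat{\Sigma}/n$ is the estimated covariance matrix of $\widehat{\boldsymbol{\beta}}$.

```python
import numpy as np

def fit_logit(X, y, max_iter=25, tol=1e-8):
    """Fit a logit GLM by Fisher scoring (IRLS); X must contain a constant column."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))      # inverse logit link
        w = mu * (1.0 - mu)                  # weights for the canonical link
        z = eta + (y - mu) / w               # adjusted dependent variable
        beta_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    # Estimated covariance of beta: inverse Fisher information at the estimate.
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    w = mu * (1.0 - mu)
    cov = np.linalg.inv(X.T @ (w[:, None] * X))
    se = np.sqrt(np.diag(cov))
    return beta, se, beta / se

# Synthetic data (assumption): default probability decreasing in age.
rng = np.random.default_rng(0)
n = 1000
age = rng.uniform(19, 75, size=n)
p_true = 1.0 / (1.0 + np.exp(-(0.2 - 0.03 * age)))
y = (rng.uniform(size=n) < p_true).astype(float)

X = np.column_stack([np.ones(n), age])
beta, se, t = fit_logit(X, y)

# Compare |t| with the 97.5% standard normal quantile, about 1.96.
significant = np.abs(t) > 1.96
print(beta, se, t, significant)
```

For the canonical logit link, Fisher scoring and Newton-Raphson coincide, so the loop above can be read as either algorithm.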

A more general approach to test for the significance of $\mathrm{AGE}$ is to compare the fitted model with a model that involves only a constant default probability. Typically software packages report the deviance of this model as null deviance or similar. If we apply the LR test statistic (7.16) to compare the null deviance with the deviance of the fitted model, which uses one degree of freedom less, we find that the constant model is clearly rejected at a significance level of $0.33\,{\%}$ .
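As a rough sketch of this comparison, the difference between the two deviances can be referred to a $\chi^2$ distribution whose degrees of freedom equal the number of additional parameters. The code below covers only the case of one additional parameter, where the $\chi^2_1$ tail probability has the closed form $\mathrm{erfc}(\sqrt{x/2})$. The function name and the two deviance values are hypothetical and are not taken from the credit data.

```python
import math

def lr_test_one_df(deviance_restricted, deviance_full):
    """LR statistic and chi-square(1) p-value for two nested GLM fits
    that differ by exactly one parameter."""
    lr = max(deviance_restricted - deviance_full, 0.0)
    # P(chi2_1 > x) = 2 * (1 - Phi(sqrt(x))) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value

# Hypothetical deviances, chosen only for illustration.
lr, p_value = lr_test_one_df(1200.0, 1191.4)
print(f"LR statistic: {lr:.2f}, p-value: {p_value:.4f}")
```

The restricted model is rejected at level $\alpha$ whenever the p-value is smaller than $\alpha$.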

**Figure 7.1:** Credit default on $\textrm {AGE}$ , *left*: frequency barplots of $\textrm {AGE}$ for the two outcomes of $Y$ (*light* and *dark*), *right*: estimated probabilities

Models using different link functions cannot be directly compared as the link functions might be differently scaled. In our binary response model, for example, a logit or a probit link function may be reasonable. However, the variance parameter of the standard logistic distribution is $\pi^2/3$ whereas that of the standard normal is $1$ . We therefore need to rescale one of the link functions in order to compare the resulting model fits. Figure 7.2 shows the standard logistic cdf (the inverse logit link) against the cdf of $N(0,\pi^2/3)$ . The functions in the left graph of Fig. 7.2 are hardly distinguishable. If we zoom in (right graph) we see that the logistic cdf tends to zero more slowly at the left boundary. This holds similarly for the right boundary and explains the ability of logit models to handle extreme observations (slightly) better.
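The closeness of the two curves and the heavier logistic tails can be checked numerically. The sketch below uses only the standard library; the grid of evaluation points and the tail point $-5$ are arbitrary choices made for illustration and do not reproduce the ranges of Fig. 7.2.

```python
import math

def logistic_cdf(u):
    """Standard logistic cdf, i.e. the inverse logit link."""
    return 1.0 / (1.0 + math.exp(-u))

def normal_cdf(u, sd=1.0):
    """Cdf of N(0, sd^2)."""
    return 0.5 * (1.0 + math.erf(u / (sd * math.sqrt(2.0))))

# Standard deviation of the standard logistic distribution.
SD_LOGISTIC = math.pi / math.sqrt(3.0)

# Largest vertical distance between the two cdfs on a grid (assumed range).
grid = [i / 100.0 for i in range(-800, 801)]
max_gap = max(abs(logistic_cdf(u) - normal_cdf(u, SD_LOGISTIC)) for u in grid)

# In the left tail the logistic cdf is larger, i.e. it decays more slowly.
tail_ratio = logistic_cdf(-5.0) / normal_cdf(-5.0, SD_LOGISTIC)

print(f"max gap: {max_gap:.4f}, tail ratio at -5: {tail_ratio:.2f}")
```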

**Figure 7.2:** Logit (*solid*) versus appropriately rescaled probit link (*dashed*), *left*: on the full plotted range, *right*: zoomed in near the left boundary

Example 8 (Probit versus logit)
If we want to compare the estimated coefficients from a probit model with those of the logit model, we need to rescale the probit coefficients by $\pi/\sqrt{3}$ . Table 7.5 shows the results of a probit fit for credit default on $\mathrm{AGE}$ . The resulting rescaled coefficient for $\mathrm{AGE}$ is of similar size as that for the logit model (see Table 7.4), while the constant is not significantly different from $0$ in both fits. The deviance and the AIC of the probit fit are slightly larger.

A Newton-Raphson iteration (instead of the Fisher scoring reported in Table 7.5) gives somewhat different coefficients but returns nearly the same value of the deviance.

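The next two examples analyze whether the fit could be improved by using a nonlinear function of $\mathrm{AGE}$ instead of $\eta=\beta_0+\beta_1 \mathrm{AGE}$ . Two principally different approaches are possible: including polynomial terms in $\mathrm{AGE}$ (Example 9) and categorizing $\mathrm{AGE}$ into intervals (Example 10).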

Example 9 (Credit default on polynomial $\mathrm{AGE}$ )
We fit two logit models using second and third order terms in $\mathrm{AGE}$ . The estimated coefficients are presented in Table 7.6. A comparison of the quadratic fit and the linear fit from Example 7 using the LR test statistic (7.16) shows that the linear fit is rejected at a significance level of $3\%$ . A subsequent comparison of the quadratic against the cubic fit shows no significant improvement by the latter model. Thus, the quadratic term for $\mathrm{AGE}$ improves the fit whereas the cubic term does not yield any further statistically significant improvement. This result is confirmed when we compare the AIC values of both models, which are practically identical. Figure 7.3 shows the estimated default probabilities for the quadratic (left) and cubic (right) $\mathrm{AGE}$ fits. We find that the curves are of similar shape.

**Figure 7.3:** Credit default on polynomial $\textrm {AGE}$ , *left*: estimated probabilities from quadratic function, *right*: estimated probabilities from cubic function

To incorporate a possible nonlinear impact of a variable in the index function, we can alternatively categorize this variable. Another term for this is the construction of dummy variables. The most classical form of categorization uses a design matrix that sets a value of $1$ in the column corresponding to the category if the category is true and $0$ otherwise. To obtain a full rank design matrix we omit one column for the reference category. In our example we leave out the first category, which means that all estimated coefficients have to be compared to the zero coefficient of the reference category. Alternative categorization setups are given by omitting the constant, the sum coding (restrict the coefficients to sum up to $0$ ), and the Helmert coding.
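The classical dummy coding with an omitted reference category can be sketched as follows. The helper name `treatment_dummies` and the five example ages are hypothetical. The interval edges follow the categories listed in the table of coefficients for categorized AGE, with a first interval of the same length assumed as reference.

```python
import numpy as np

def treatment_dummies(x, edges):
    """Design matrix with a constant and 0/1 dummies for the intervals
    (edges[i], edges[i+1]]; the first interval is the omitted reference."""
    cat = np.digitize(x, edges, right=True) - 1   # interval index 0, 1, ...
    n_cat = len(edges) - 1
    if cat.min() < 0 or cat.max() >= n_cat:
        raise ValueError("value outside the range covered by the intervals")
    full = np.eye(n_cat)[cat]                     # one column per category
    # Drop the reference column to keep the design matrix of full rank.
    return np.column_stack([np.ones(len(x)), full[:, 1:]])

# Assumed interval edges: (18,23], (23,28], ..., (63,68], (68,75].
edges = np.array([18, 23, 28, 33, 38, 43, 48, 53, 58, 63, 68, 75])
age = np.array([19, 23, 24, 40, 75])              # hypothetical applicants
X = treatment_dummies(age, edges)
print(X.shape)
print(X)
```

Applicants in the reference interval have zeros in all dummy columns, so their linear predictor consists of the constant only.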

Example 10 (Credit default on categorized $\mathrm{AGE}$ )
We have chosen the intervals $(18,23]$ , $(23,28]$ , $\ldots$ , $(68,75]$ as categories. Except for the last interval all of them are of the same length. The first interval is chosen as the reference, so that we estimate coefficients only for the remaining intervals.

Frequency barplots for the intervals and estimated default probabilities are displayed in Fig. 7.4. The resulting coefficients for this model are listed in Table 7.7. We see here that all coefficient estimates are negative. This means, keeping in mind that the group of youngest credit applicants is the reference, that all applicants from other age groups have an (estimated) lower default probability. However, we do not have a true decrease in the default probabilities with $\mathrm{AGE}$ since the coefficients do not form a decreasing sequence. Within the observed age range we find two local minima and maxima for the estimated default probabilities.

It is interesting to note that the deviance of the categorized $\mathrm{AGE}$ fit is the smallest that we have obtained so far. This is explained by the fact that we have fitted the most flexible model here. Unfortunately, this flexibility comes at the price of a larger number of parameters. The AIC criterion, as a compromise between goodness-of-fit and number of parameters, indicates that all previously fitted models are preferable. Nevertheless, categorization is a valuable tool to explore whether there are nonlinear effects. A related technique is local regression smoothing, which is briefly reviewed in Sect. 7.5.8.
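The trade-off can be made concrete with a small calculation. The sketch assumes the common form $\mathrm{AIC} = D + 2p$, with deviance $D$ and $p$ estimated parameters, which for ungrouped binary responses coincides with $-2\log L + 2p$. The deviance values and parameter counts are hypothetical and are not those of the fitted credit models.

```python
def aic(deviance, n_params):
    """AIC as deviance plus twice the number of estimated parameters
    (assumed definition, see the lead-in)."""
    return deviance + 2 * n_params

# Hypothetical fits: a parsimonious model and a more flexible one.
aic_linear = aic(deviance=1213.0, n_params=2)
aic_categorized = aic(deviance=1203.0, n_params=11)

# The flexible model has the smaller deviance but the larger AIC.
print(aic_linear, aic_categorized)
```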

The estimation of default probabilities and the prediction of credit default should incorporate more than only one explanatory variable. Before fitting the full model with all available information, we discuss the modeling of interaction effects.

**Figure 7.4:** Credit default on categorized $\textrm {AGE}$ , *left*: frequency barplots of categorized $\textrm {AGE}$ for the two outcomes of $Y$ (*light* and *dark*), *right*: estimated probabilities

Example 11 (Credit default on $\mathrm{AGE}$ and $\mathrm{AMOUNT}$ )
The variable $\mathrm{AMOUNT}$ is the second continuous explanatory variable in the credit data set. (Recall that duration is quantitative as well but quasi-discrete.) We will therefore use $\mathrm{AGE}$ and $\mathrm{AMOUNT}$ to illustrate the effects of the simultaneous use of two explanatory variables. A very simple model is of course $\textrm{logit}\left\{P(Y=1\vert\textrm{AGE,AMOUNT})\right\} =\beta_0 + \beta_1 \textrm{AGE} + \beta_2 \textrm{AMOUNT}$ . This model, however, separates the impact of $\mathrm{AGE}$ and $\mathrm{AMOUNT}$ into additive components. The effect of having both characteristics simultaneously is modeled by adding the multiplicative interaction term $\mathrm{AGE}*\mathrm{AMOUNT}$ . On the other hand we have seen that at least $\mathrm{AGE}$ should be complemented by a quadratic term. For that reason we compare the linear interaction model $\textrm{logit}\left\{P(Y=1\vert\textrm{AGE,AMOUNT})\right\} =\beta_0 + \beta_1 \textrm{AGE} + \beta_2 \textrm{AMOUNT} + \beta_3 \textrm{AGE}*\textrm{AMOUNT}$ with a specification using quadratic terms and a third model specification using both, quadratic and interaction terms.
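The three specifications differ only in the columns of their design matrices, which the following sketch constructs explicitly. The function name and the three example applicants are hypothetical. In practice one might also center or rescale the variables before forming products, which is not done here.

```python
import numpy as np

def design_matrices(age, amount):
    """Design matrices for the interaction, quadratic and combined models."""
    one = np.ones_like(age, dtype=float)
    # constant, AGE, AMOUNT, AGE*AMOUNT
    interaction = np.column_stack([one, age, amount, age * amount])
    # constant, AGE, AGE**2, AMOUNT, AMOUNT**2
    quadratic = np.column_stack([one, age, age**2, amount, amount**2])
    # quadratic terms plus the multiplicative interaction
    both = np.column_stack([quadratic, age * amount])
    return interaction, quadratic, both

age = np.array([25.0, 40.0, 60.0])              # hypothetical applicants
amount = np.array([1000.0, 5000.0, 12000.0])    # hypothetical loan amounts
X_int, X_quad, X_both = design_matrices(age, amount)
print(X_int.shape, X_quad.shape, X_both.shape)
```

The interaction model and the quadratic model are not nested in each other, but both are nested in the combined model, which is why LR tests against the largest model are possible.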

Table 7.8 shows the results for all three fitted models. The model with quadratic and interaction terms has the smallest AIC of the three fits. Pairwise LR tests show, however, that the largest of the three models is not significantly better than the model without the interaction term. The obtained surface on $\mathrm{AGE}$ and $\mathrm{AMOUNT}$ from the quadratic+interaction fit is displayed in Fig. 7.5.

**Table 7.8:** Credit default on $\textrm {AGE}$ and $\textrm {AMOUNT}$ (logit model)
Variable	Coefficient	$t$-value	Coefficient	$t$-value	Coefficient	$t$-value
constant
$\mathrm{AGE}$
$\mathrm{AGE}**2$			$9.86\times10^{-4}$		$9.32\times10^{-4}$
$\mathrm{AMOUNT}$	$-2.80\times10^{-5}$		$-7.29\times10^{-6}$		$-1.18\times10^{-4}$
$\mathrm{AMOUNT}**2$			$1.05\times10^{-8}$		$9.51\times10^{-9}$
$\mathrm{AGE}*\mathrm{AMOUNT}$	$3.99\times10^{-6}$				$3.37\times10^{-6}$
Deviance
df
AIC
Iterations

Let us remark that interaction terms can also be defined for categorical variables. In this case interaction is modeled by including dummy variables for all possible combinations of categories. This may considerably increase the number of parameters to estimate.

**Figure 7.5:** Credit default on $\textrm {AGE}$ and $\textrm {AMOUNT}$ using quadratic and interaction terms, *left*: surface and *right*: contours of the fitted $\eta$ function

Example 12 (Credit default on the full set of explanatory variables)
In a final analysis we now present the results for the full set of variables from Table 7.3. We first estimated a logit model using all variables ( $\mathrm{AGE}$ and $\mathrm{AMOUNT}$ also with quadratic and interaction terms). Most of the estimated coefficients in the second column of Table 7.9 have the expected sign. For example, the default probability decreases if previous loans were paid back without problems, if the credit applicant is employed and has some savings, and if the loan is used to buy a car (rather than to invest the loan into goods which cannot serve as a security). Somewhat surprising is the fact that house owners seem to have higher default probabilities. This might be explained by the fact that house owners usually have additional obligations. The DURATION variable is categorized as described above. Again we have used the first category (the loans of shortest duration) as reference. Since the series of DURATION coefficients is monotone increasing, we can conclude that longer duration increases the default probability. This is also plausible.

After fitting the full model we have run an automatic stepwise model selection based on AIC. This reveals that the insignificant terms $\mathrm{AGE}*\mathrm{AMOUNT}$ and EMPLOYED should be omitted. The fitted coefficients for this final model are displayed in the fourth column of Table 7.9.
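A stepwise search of this kind can be sketched as a greedy backward elimination. The code below is an illustration under stated assumptions, not the procedure of any particular software package: the data are synthetic, the function names are hypothetical, the AIC is taken as deviance plus twice the number of parameters, and only deletions (no re-additions) of single columns are considered.

```python
import numpy as np

def logit_deviance(X, y, n_iter=50):
    """Deviance of a logit fit, computed with a fixed number of Newton steps."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = mu * (1.0 - mu)
        beta = beta + np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - mu))
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return -2.0 * np.sum(y * np.log(mu) + (1.0 - y) * np.log(1.0 - mu))

def backward_aic(X, y, names):
    """Drop one column at a time as long as this lowers the AIC.
    Column 0 is the constant and is never dropped."""
    keep = list(range(X.shape[1]))
    best = logit_deviance(X[:, keep], y) + 2 * len(keep)
    improved = True
    while improved and len(keep) > 1:
        improved = False
        trials = []
        for j in keep[1:]:
            cols = [k for k in keep if k != j]
            trials.append((logit_deviance(X[:, cols], y) + 2 * len(cols), cols))
        aic, cols = min(trials, key=lambda t: t[0])
        if aic < best:
            best, keep, improved = aic, cols, True
    return [names[k] for k in keep], best

# Synthetic data (assumption): only x1 has a true effect.
rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=(n, 3))
p = 1.0 / (1.0 + np.exp(-(-1.0 + 1.0 * x[:, 0])))
y = (rng.uniform(size=n) < p).astype(float)
X = np.column_stack([np.ones(n), x])
names = ["const", "x1", "x2", "x3"]

full_aic = logit_deviance(X, y) + 2 * X.shape[1]
selected, best_aic = backward_aic(X, y, names)
print(selected, best_aic, full_aic)
```

Because a column is removed only when the AIC decreases, the selected model never has a larger AIC than the full model, although a greedy search need not find the overall minimum.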

**Table 7.9:** Credit default on full set of variables (logit model)
Variable	Coefficient	$t$-value	Coefficient	$t$-value
constant
$\mathrm{AGE}$
$\mathrm{AGE}**2$	$8.33\times10^{-4}$		$9.35\times10^{-4}$
$\mathrm{AMOUNT}$	$-2.51\times10^{-4}$		$-1.67\times10^{-4}$
$\mathrm{AMOUNT}**2$	$1.73\times10^{-8}$		$1.77\times10^{-8}$
$\mathrm{AGE}*\mathrm{AMOUNT}$	$2.36\times10^{-6}$
PREVIOUS
EMPLOYED
DURATION
DURATION
DURATION
DURATION $\ge 24$
SAVINGS
PURPOSE
HOUSE
Deviance
df
$\mathrm{AIC}$
Iterations

**Table 7.7:** Credit default on categorized $\textrm {AGE}$ (logit model)
Variable	Coefficients	$t$-values
constant
$\mathrm{AGE}$ (23,28]
$\mathrm{AGE}$ (28,33]
$\mathrm{AGE}$ (33,38]
$\mathrm{AGE}$ (38,43]
$\mathrm{AGE}$ (43,48]
$\mathrm{AGE}$ (48,53]
$\mathrm{AGE}$ (53,58]
$\mathrm{AGE}$ (58,63]
$\mathrm{AGE}$ (63,68]
$\mathrm{AGE}$ (68,75]
Deviance
df
AIC
Iterations