21.2 Semi-parametric Model for Credit Rating

The logistic regression model for estimating the conditional probability suffers from the same restrictions as the linear regression model for estimating general functions. To avoid dependence on a special parametric form of the model and to gain flexibility in the function estimation, it is natural to estimate $ \pi (x)$ nonparametrically, for example with the LP-method given by (13.4) and (13.7). In doing so, however, it is not guaranteed that the function estimator will lie between 0 and 1. To enforce this, as in the previous section, we map the value space of the estimated function into the interval [0,1] using a given function $ G$:

$\displaystyle \pi (x) = G(m(x)) $

where $ m(x)$ is an arbitrary real-valued function that can be estimated nonparametrically. For estimating default probabilities, however, local smoothing methods are less suitable, for two reasons. First, $ x$ is often high dimensional in applications; for example, after adding the necessary dummy variables it has dimension 61 in the example considered by Müller and Rönz (2000). As a result, the local neighborhoods of $ x$ over which the estimation takes place either contain too few observations, even when they are chosen comparatively large, or are too large to produce a reliable estimate of $ m(x)$. This problem can be solved by restricting ourselves to additive models

$\displaystyle \pi (x) = G (\sum^ d_{i=1} m_i (x_i ) ), $

where $ m_1 (u), \ldots, m_d (u) $ are arbitrary functions of the one-dimensional variable $ u$. The second reason is more critical: in credit rating many of the coordinates of $ x$ only take the values 0 or 1, since they either represent dichotomous characteristics from the outset or have been added as dummy variables for unordered qualitative characteristics. By their underlying philosophy, however, local smoothing methods are suited mainly to estimating functions with continuous arguments.
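The additive form $ \pi(x) = G(\sum_i m_i(x_i))$ can be illustrated with a minimal Python sketch. The logistic function is used for $ G$; the two component functions and the name `additive_probability` are hypothetical and serve only to show that any real-valued sum is mapped into the interval (0,1).

```python
import math

def logistic(t):
    """Numerically stable logistic function G(t) = 1 / (1 + exp(-t))."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def additive_probability(x, components, link=logistic):
    """Return G(sum_i m_i(x_i)) for a single observation x."""
    if len(x) != len(components):
        raise ValueError("need exactly one component function per coordinate")
    index = sum(m_i(x_i) for m_i, x_i in zip(components, x))
    return link(index)

# Hypothetical component functions, chosen for illustration only.
components = [
    lambda u: 0.8 * u,           # effect of a 0/1 dummy variable
    lambda u: -(u - 0.5) ** 2,   # smooth nonlinear effect of a continuous variable
]

p = additive_probability([1.0, 0.5], components)
```

Whatever values the component functions return, the result stays strictly between 0 and 1 as long as the sum is of moderate size.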

A combination of nonparametric and parametric approaches makes it possible to use the flexibility of nonparametric methods in credit rating, see Müller and Rönz (2000). To this end the influential variables are not combined into a single random vector $ X_j$ but are split into two random vectors $ X_j \in \mathbb{R}^ p, Z_j \in \mathbb{R}^ q$. The coordinates of $ Z_j$ represent a few selected, exclusively quantitative characteristics and possibly ordered (hierarchical) qualitative characteristics whose value ranges are subdivided sufficiently finely. All remaining characteristics, in particular the dichotomous ones and the dummy variables of unordered qualitative characteristics, are combined in $ X_j$. To estimate the default probability we consider a generalized partial linear model (GPLM):

$\displaystyle \P(Y_j = 1 \vert X_j = x, Z_j = z) = \pi (x, z) = G(\beta ^\top x + m (z)). $

$ G$ is again a known function with values between 0 and 1, for example the logistic function $ \psi$. $ \beta_1, \ldots, \beta_p$ are unknown parameters, and $ m$ is an arbitrary unknown function that may contain an additive constant, which makes an additional parameter $ \beta_0$ superfluous. Müller (2000) has shown in an extensive case study that the additional flexibility from the nonparametric part $ m(z)$ of the model results in a better estimate of the default probability than a purely parametric logistic regression.

There are various algorithms for estimating $ \beta$ and $ m(z)$, for example the profile likelihood method of Severini and Wong (1992) and Severini and Staniswalis (1994) or the backfitting method of Hastie and Tibshirani (1990). Essentially, they use the fact that for a known function $ m(z)$ the parameter vector $ \beta$ can be estimated by maximizing the log-likelihood function, analogously to logistic regression,

$\displaystyle \log L(\beta) = \sum^ n_{j=1} [Y_j \log G(\beta ^\top X_j + m (Z_j)) + (1-Y_j) \log \{ 1-G ( \beta ^\top X_j + m(Z_j)) \} ] $

and that for known $ \beta$ the function $ m(z)$ can be estimated by local smoothing, analogously to the LP-method (13.4), (13.7). These two optimization problems are combined in an iterative numerical algorithm.
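The alternation between the two optimization problems can be sketched in Python under strong simplifying assumptions. This is neither the profile likelihood method nor the implementation used by the cited authors. The sketch assumes a scalar $ x$ and a scalar $ z$ ($ p = q = 1$), the logistic link, a Gaussian kernel with a fixed bandwidth, local constant smoothing instead of the LP-method, and one Newton step per sub-problem in each iteration. The function name `fit_gplm`, the bandwidth and the simulated data are hypothetical.

```python
import math
import random

def logistic(t):
    """Numerically stable logistic function G(t)."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def fit_gplm(x, z, y, bandwidth=0.2, n_iter=20):
    """Alternate Newton steps for beta (m fixed) and for m (beta fixed).

    Returns the estimate of beta and the estimates of m at the observed z_j.
    """
    n = len(y)
    # Gaussian kernel weights w[j][i] = K((z_i - z_j) / h), computed once.
    w = [[math.exp(-0.5 * ((zi - zj) / bandwidth) ** 2) for zi in z] for zj in z]
    beta = 0.0
    m = [0.0] * n
    for _ in range(n_iter):
        # Step 1: m fixed, one Newton step on the log-likelihood log L(beta).
        p = [logistic(beta * x[j] + m[j]) for j in range(n)]
        score = sum(x[j] * (y[j] - p[j]) for j in range(n))
        info = sum(x[j] ** 2 * p[j] * (1.0 - p[j]) for j in range(n))
        beta += score / info
        # Step 2: beta fixed, one Newton step on the kernel-weighted
        # (local constant) log-likelihood at each observed z_j.
        new_m = []
        for j in range(n):
            num = den = 0.0
            for i in range(n):
                g = logistic(beta * x[i] + m[j])
                num += w[j][i] * (y[i] - g)
                den += w[j][i] * g * (1.0 - g)
            new_m.append(m[j] + num / den)
        m = new_m
    return beta, m

# Simulated data for illustration only: a 0/1 covariate x and a continuous z.
random.seed(1)
n = 200
x = [float(random.random() < 0.5) for _ in range(n)]
z = [random.random() for _ in range(n)]
true_beta = 1.0
y = [float(random.random() < logistic(true_beta * xi + math.sin(2 * math.pi * zi)))
     for xi, zi in zip(x, z)]

beta_hat, m_hat = fit_gplm(x, z, y)
```

No intercept is included in the linear part, since the function $ m$ absorbs any additive constant. In practice the bandwidth would have to be chosen with care and the iteration stopped by a convergence criterion rather than after a fixed number of steps.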

Example 21.1  
As an example we consider the rating of consumer credits already referred to above, which Müller (2000) carried out with a GPLM. The data are part of an extensive random sample described in detail by Fahrmeir and Tutz (1994). We use a total of $ n = 564$ observations, in 24.3 % of which there were problems with repaying the credit ($ Y_j = 1$). Of the five influential variables considered, two are dichotomous; they indicate whether the customer is unemployed or not ($ X_{j1}$) and whether the customer had credit problems in the past or not ($ X_{j2}$). The remaining three variables are quantitative: the duration of the credit ($ X_{j3}$, with values between 4 and 72 months), the level of the credit (between 338 and 15653 DM) and the age of the customer (between 19 and 75 years). We take the logarithm of the last two variables and transform them linearly so that they take values in the interval $ [0,1]$. These transformed variables are called $ Z_{j1}$ and $ Z_{j2}$. As can be seen in Figure 20.1, the data points are spread comparatively homogeneously over a part of the plane, which makes local smoothing easier. We fit a GPLM
$\displaystyle \P(Y_j = 1 \vert X_{j1} = x_1, X_{j2} = x_2, X_{j3} = x_3, Z_{j1} = z_1, Z_{j2} = z_2) = \psi ( \sum_{k=1}^{3} \beta_k x_k + m (z_1, z_2))$

to the data and obtain the estimates (the corresponding standard deviations are given in parentheses)

$\displaystyle \beta_1 = 0.965 \, (0.249) , \; \beta_2 = 0.746 \, (0.237) , \; \beta_3 = -0.0498 \, (0.0115) .$
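With the logistic link the odds $ \pi / (1-\pi)$ equal $ \exp\{\beta^\top x + m(z)\}$, so increasing $ x_k$ by one unit while everything else is held fixed multiplies the odds of default by $ \exp(\beta_k)$. The following Python sketch applies this to the reported estimates; the dictionary labels and the helper name `odds_ratio` are chosen here for illustration.

```python
import math

# Estimates reported in the example above.
estimates = {
    "unemployed (x1)": 0.965,
    "credit problems in the past (x2)": 0.746,
    "duration in months (x3)": -0.0498,
}

def odds_ratio(beta):
    """Factor by which the odds of default change per unit increase of the variable."""
    return math.exp(beta)

for name, beta in estimates.items():
    print(f"{name}: odds multiplied by {odds_ratio(beta):.3f}")
```

A factor above 1 corresponds to a positive coefficient and thus to a higher probability of default, a factor below 1 to a lower one.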

Fig.: Scatter plot of the transformed variables: level of credit and age of the customers. SFEgplm.xpl
\includegraphics[width=1.2\defpicwidth]{gplmobs.ps}
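The preprocessing of credit level and age described in the example (logarithm, then a linear map into $ [0,1]$) might look as follows in Python. The text does not state which linear map was used; the sketch assumes that the smallest log value is sent to 0 and the largest to 1. Only the endpoints of the illustrative input lists are taken from the text; the intermediate values and the name `log_rescale` are made up.

```python
import math

def log_rescale(values):
    """Take logarithms, then map linearly so that the minimum becomes 0 and the maximum 1."""
    logs = [math.log(v) for v in values]
    lo, hi = min(logs), max(logs)
    if hi == lo:
        raise ValueError("all values are equal; cannot rescale")
    return [(t - lo) / (hi - lo) for t in logs]

# Illustrative inputs: only the smallest and largest values come from the example.
amounts = [338, 1200, 4000, 15653]   # level of credit in DM
ages = [19, 30, 45, 75]              # age in years

z1 = log_rescale(amounts)
z2 = log_rescale(ages)
```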

The probability of default increases when the customer is unemployed or when the customer had repayment problems in the past. It decreases with the duration of the credit. The dependence on the transformed credit level and age is estimated nonparametrically. Figure 20.2 shows that the estimated function $ \hat{m} (z_1, z_2)$ is clearly nonlinear, with a maximum near average values of credit level and age. The decrease in the probability of default for high credit levels can be explained by the fact that the random sample contains only credits that were actually granted, and that the credit officer processing the application was generally reluctant to grant large credits when the customer appeared unreliable. This effect, which is caused by credit ratings made in the past, occurs regularly in credit rating. It arises even when no systematic, model-based method was used previously: the credit screening excludes extreme risks from the very beginning, so that these cases no longer appear in the data. It must therefore be taken into account when interpreting and applying a model.

Fig.: The estimated function with respect to level of credit and age of the customers. SFEgplm.xpl
\includegraphics[width=1.2\defpicwidth]{gplmfunc.ps}