21.1 Logistic Regression

In order to judge the credit standing of a customer, a whole range of data is generally available. For a consumer credit these are, for example, in Müller (2000): the amount of the credit, the age of the customer, the duration of the credit, as well as information on whether the customer is unemployed and whether there have been problems in the past with repaying a loan. For the insolvency prognosis of a small company, relevant information would be, for example, in Anders (1997): the age of the business, the sales development in the recent past, the educational degree of the entrepreneur, the type of business and information on liability.

Some of the influential variables are quantitative, such as the credit volume and the sales development. Others are qualitative in nature and must be transformed into numbers before the default probability can be estimated. For dichotomous characteristics (unemployed vs. employed, limited vs. unlimited liability) indicator variables taking the values 0 and 1 are used. For categorical characteristics with $ d > 2$ possible values, $ d-1$ dummy variables are introduced, which likewise take only the values 0 and 1. As an example of such a numerical coding, consider the type of business with the three categories trade, processed business and other, for which two dummy variables, $ Z_1, Z_2,$ are used, where $ Z_1 = 1\ (Z_2 = 1)$ if and only if the type of business is trade (processed business). When $ Z_1 = Z_2 = 0,$ the firm under consideration belongs to one of the other types of business, for example, services. The case $ Z_1 = Z_2 = 1$ cannot occur.
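To make the dummy coding concrete, here is a minimal Python sketch; the category names follow the text's example, and the function name is illustrative:

```python
# Dummy coding of the type of business with d = 3 categories
# (category names follow the text; the function name is illustrative).
def dummy_code(business_type):
    """Return (Z1, Z2): Z1 = 1 iff trade, Z2 = 1 iff processed business.

    The reference category "other" is coded (0, 0); (1, 1) cannot occur.
    """
    z1 = 1 if business_type == "trade" else 0
    z2 = 1 if business_type == "processed business" else 0
    return z1, z2

print(dummy_code("trade"))               # (1, 0)
print(dummy_code("processed business"))  # (0, 1)
print(dummy_code("other"))               # (0, 0)
```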

If the values of a qualitative characteristic are hierarchically ordered, it is possible to represent them by an integer-valued random variable. The loan officer's personal impression of the economic situation of a company (very good, good, satisfactory, poor, very poor) can, for example, be translated into the numerical scale 1, 2, 3, 4, 5. Here one must be aware that every monotone transformation of the scale, i.e., one that preserves the ordering, produces a different numerical code that could be used with equal justification. Instead of 1, 2, 3, 4, 5 one could, for instance, also use 0, 1, 3, 6, 10. When using parametric approaches such as logistic regression, one should keep in mind that the numerical scale for a hierarchical characteristic is set arbitrarily; a monotone transformation of the scale may eventually yield better estimates of the default probabilities. Sufficiently flexible nonparametric and semiparametric approaches, in contrast, automatically choose a suitable scale.
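The two numerical scales from the text can be compared directly; this short sketch checks that the monotone transformation preserves the ordering (the variable names are illustrative):

```python
# The two equally justified codes for the ordered impressions from the text.
order = ["very good", "good", "satisfactory", "poor", "very poor"]
scale_a = {"very good": 1, "good": 2, "satisfactory": 3, "poor": 4, "very poor": 5}
scale_b = {"very good": 0, "good": 1, "satisfactory": 3, "poor": 6, "very poor": 10}

# A monotone transformation preserves the ordering: both codes are
# strictly increasing along the ordered categories.
for scale in (scale_a, scale_b):
    values = [scale[c] for c in order]
    assert values == sorted(values)
```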

In order to estimate the default probability of a credit given the information available at the time the decision is made, we assume that a random sample $ (X_1, Y_1), \ldots, (X_n, Y_n)$ of independent, identically distributed observations is given. $ X_j \in \mathbb{R}^d$ stands for the information available at the time the credit is issued to the $ j$-th customer, where the qualitative characteristics have already been transformed into numerical values as described above. $ Y_j \in \{ 0,1 \}$ is the default indicator of the credit: it takes the value 0 when the loan is paid back without problems and 1 when the credit has partially or completely defaulted. The default probability to be estimated is the conditional probability that $ Y_j = 1,$ given $ X_j = x:$

$\displaystyle \pi (x) = \P(Y_j = 1 \vert X_j = x),\ x \in {\cal X}, $

where $ {\cal X} \subset \mathbb{R}^ d $ represents the value space of $ X_j$.

Since $ \pi (x)$, being a probability, only takes values between 0 and 1, linear regression models are not suitable for estimating this function. The class of generalized linear models (GLM), however, can be used to estimate such probabilities. Here it is assumed that

$\displaystyle \pi (x) = G\Big(\beta_0 + \sum_{i=1}^d x_i \beta_i\Big) = G(\beta_0 + \beta^\top x).
$

$ G : \mathbb{R} \rightarrow [0,1]$ is a known function taking values only between 0 and 1, while the real-valued parameters $ \beta _0, \ldots, \beta_d $ are unknown and must be estimated. For the special case where $ G$ is chosen as the logistic function $ \psi$:

$\displaystyle G(t) = \psi (t) = \frac{1}{1+e^{-t}}, $
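The logistic function is straightforward to evaluate; a minimal sketch:

```python
import math

def psi(t):
    """Logistic (sigmoid) function: maps the real line onto (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))

print(psi(0.0))   # 0.5: at t = 0 both outcomes are equally likely
print(psi(4.0))   # close to 1
print(psi(-4.0))  # close to 0; note psi(-t) = 1 - psi(t)
```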

we obtain the logistic regression model: given $ X_1, \ldots, X_n,$ the credit indicators $ Y_1, \ldots, Y_n$ are independent Bernoulli random variables with parameters $ \psi (\beta _0 + \beta ^\top X_1), \ldots, \psi (\beta _0 + \beta ^\top X_n).$ The conditional likelihood function is thus

$\displaystyle L(\beta _0, \ldots, \beta _d) = \prod_{j=1}^n \left[ Y_j \psi (\beta_0 + \beta ^\top X_j) + (1 - Y_j) \{ 1 - \psi (\beta _0 + \beta ^\top X_j) \} \right] . $

Since $ Y_j$ only takes the values 0 and 1, the corresponding conditional log-likelihood function is

$\displaystyle \log L(\beta_0, \ldots, \beta _d) = \sum_{j=1}^n \left[ Y_j \log \psi (\beta _0 + \beta ^\top X_j) + (1-Y_j) \log \{ 1 - \psi (\beta _0 + \beta ^\top X_j ) \} \right] . $
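The conditional log-likelihood can be computed directly from the data; a minimal Python sketch, with illustrative function and variable names and made-up data in the example call:

```python
import math

def psi(t):
    """Logistic function psi(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))

def log_likelihood(beta0, beta, X, Y):
    """Conditional log-likelihood of the logistic regression model.

    X: list of feature vectors x_j, Y: list of 0/1 default indicators Y_j.
    """
    ll = 0.0
    for x, y in zip(X, Y):
        p = psi(beta0 + sum(b * xi for b, xi in zip(beta, x)))
        ll += y * math.log(p) + (1 - y) * math.log(1 - p)
    return ll

# Illustrative call with made-up data: two customers, one covariate each.
print(log_likelihood(0.0, [1.0], [[0.0], [1.0]], [1, 0]))
```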

By maximizing $ L$ or $ \log L$, one obtains the maximum likelihood estimators $ \hat{\beta}_0, \ldots, \hat{\beta}_d$ of $ \beta _0, \ldots, \beta_d $ and thus the maximum likelihood estimator of the default probability in the logistic regression model:

$\displaystyle \hat{\pi} (x) = \psi (\hat{\beta}_0 + \hat{\beta}^\top x).$
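Maximizing the log-likelihood has no closed-form solution; statistical software typically uses iterative methods such as Newton-Raphson. Purely as an illustration, the following sketch fits the model to hypothetical synthetic data by simple gradient ascent on the log-likelihood; the data, the "true" parameter values and the step size are all assumptions of this example:

```python
import math
import random

def psi(t):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-t))

# Hypothetical synthetic data with one covariate; the "true" parameters
# beta0 = -1, beta1 = 2 are assumptions of this example.
random.seed(0)
X = [random.gauss(0.0, 1.0) for _ in range(500)]
Y = [1 if random.random() < psi(-1.0 + 2.0 * x) else 0 for x in X]

# Gradient ascent on log L; the gradient components are
# sum_j (Y_j - psi(b0 + b1 X_j)) and sum_j (Y_j - psi(b0 + b1 X_j)) X_j.
b0, b1 = 0.0, 0.0
lr = 0.5  # step size (an arbitrary choice for this illustration)
for _ in range(1000):
    g0 = sum(y - psi(b0 + b1 * x) for x, y in zip(X, Y)) / len(X)
    g1 = sum((y - psi(b0 + b1 * x)) * x for x, y in zip(X, Y)) / len(X)
    b0 += lr * g0
    b1 += lr * g1

def pi_hat(x):
    """Estimated default probability: psi(beta0_hat + beta1_hat * x)."""
    return psi(b0 + b1 * x)

print(round(b0, 2), round(b1, 2))  # estimates should lie near (-1, 2)
```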