To judge the creditworthiness of a customer, a range of data is generally available. For consumer credit, Müller (2000) uses, for example: the amount of credit, the age of the customer, the duration of the credit, and information on whether the customer is unemployed and whether there were problems with repaying a loan in the past. For the insolvency prognosis of a small company, relevant information would be, for example, as in Anders (1997): the age of the business, the recent sales development, the educational degree of the entrepreneur, the type of business and information on liability.
Some explanatory variables are quantitative, such as credit volume and
sales development. Others are qualitative in nature and must be
transformed into numbers before the default probability can be estimated.
For dichotomous characteristics (unemployed or employed, limited or
unlimited liability), indicator variables with values 0 and 1 are used.
For categorical characteristics with $d > 2$ possible values,
$d - 1$ dummy variables are introduced, each of which also takes
the value 0 or 1. For example, to code the type of business numerically
with the three categories trade, processed business and other,
two dummy variables $Z_1, Z_2$ are used, where $Z_1 = 1$ ($Z_2 = 1$)
if and only if the type of business is trade
(processed business). When $Z_1 = Z_2 = 0$, the firm considered
belongs to one of the other types of business, for example,
services. The case $Z_1 = Z_2 = 1$ cannot occur.
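The dummy coding described above can be sketched in a few lines of Python. This is only an illustration: the function name `encode_business_type` and the variable names `z1` and `z2` are assumptions introduced here, and the category labels are taken from the text.

```python
def encode_business_type(kind):
    """Code a three-category business type with two 0/1 dummy variables.

    z1 is 1 only for "trade", z2 is 1 only for "processed business".
    Every other type (for example "services") is coded as (0, 0).
    """
    z1 = 1 if kind == "trade" else 0
    z2 = 1 if kind == "processed business" else 0
    return z1, z2


# Hypothetical sample of firms, used only to show the coding.
firms = ["trade", "processed business", "services", "trade"]
codes = [encode_business_type(kind) for kind in firms]

for kind, (z1, z2) in zip(firms, codes):
    print(f"{kind:20s} z1={z1} z2={z2}")
```

Because a firm has exactly one type of business, the two dummies are never 1 at the same time, so the pair (1, 1) does not appear in `codes`.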
If the values of a qualitative characteristic are hierarchically ordered, they can be represented by an integer-valued random variable. For example, a bank officer's personal impression of the economic situation of a company (very good, good, satisfactory, poor, very poor) can be transformed into the numerical scale 1, 2, 3, 4, 5. Here one must be aware that every monotone transformation, i.e., one that preserves the order, produces a different numerical code that can be used with equal justification. Instead of 1, 2, 3, 4, 5 one could also use 0, 1, 3, 6, 10, for instance. When using parametric approaches such as logistic regression, one should keep in mind that the numerical scale chosen for ordered characteristics is arbitrary. A monotone transformation of the scale may yield better estimates of the default probabilities. Sufficiently flexible nonparametric and semiparametric approaches, in contrast, automatically choose a suitable scale.
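The arbitrariness of the numerical scale can be made concrete with a small Python sketch. It compares the two codings mentioned in the text; the helper names `same_order` and `gaps` are assumptions chosen for this illustration.

```python
ratings = ["very good", "good", "satisfactory", "poor", "very poor"]

# Two numerical codes for the same ordered characteristic.
scale_a = dict(zip(ratings, [1, 2, 3, 4, 5]))
scale_b = dict(zip(ratings, [0, 1, 3, 6, 10]))


def same_order(code_1, code_2, labels):
    """True if both codes rank the labels identically."""
    return sorted(labels, key=code_1.get) == sorted(labels, key=code_2.get)


def gaps(code, labels):
    """Distances between the codes of neighbouring categories."""
    values = [code[label] for label in labels]
    return [b - a for a, b in zip(values, values[1:])]


print(same_order(scale_a, scale_b, ratings))
print(gaps(scale_a, ratings))
print(gaps(scale_b, ratings))
```

Both codes preserve the ordering, but they space the categories differently. A model that is linear in the coded variable therefore treats the step from poor to very poor as equal to the step from very good to good under the first scale, and as four times as large under the second.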
In order to estimate the default probability of a credit, given
the information available at the time the decision is made, we
assume that the random sample $(X_1, Y_1), \ldots, (X_n, Y_n)$ is
independent and identically distributed. The vector $X_j$
stands for the information available at the time the credit is
issued to the $j$-th customer, where qualitative characteristics
are already transformed into numerical values as described above.
$Y_j \in \{0, 1\}$ is the indicator variable of the credit: it
has a value of 0 when the loan is paid back without any
problems and 1 when the credit partially or completely defaults.
The default probability that is to be estimated is the conditional
probability that $Y_j = 1$, given $X_j = x$:

$$\pi(x) = P(Y_j = 1 \mid X_j = x).$$

Since $\pi(x)$, being a probability, only takes values between 0 and 1,
linear regression models cannot be used to estimate this function.
The class of generalized linear models (GLM) can, however, be used
to estimate the probabilities. Here it is assumed that