21.3 Credit Ratings with Neural Networks

As with the nonparametric fitting of financial time series models to data, neural networks offer an alternative to local smoothing methods, such as the local polynomial (LP) method, for estimating default probabilities. The logistic regression function $ \pi (x) = \psi (\beta_0 + \beta^\top x) $ is nothing more than the function defined by a neural network with a single neuron in one hidden layer, when the logistic function $ \psi$ is chosen as the transfer function. By combining several neurons in one or more hidden layers, default probabilities can be estimated as flexibly as with nonparametric regression analysis. To obtain estimates between 0 and 1, the network function $ \nu _H (x; \delta)$ given by (18.1) has to be composed with a function $ G$ that takes values in the interval [0,1]. We restrict ourselves to one hidden layer with $ H$ neurons and choose $ G= \psi,$ so that the default probability given by the neural network has the form

$\displaystyle \pi_H (x; \vartheta) = \psi \Big( v _0 + \sum^ H_{h=1} v _h \, \psi \big( w_{0h} + \sum^ d_{i=1} w_{ih} x_i \big) \Big), $

where $ \vartheta$ again denotes the parameter vector formed from $ v_h,\ 0 \le h \le H,$ and $ w_{ih},\ 0 \le i \le d,\ 1 \le h \le H$. To estimate the network weights from the data we do not use the least squares method, which is appropriate for regression models with normally distributed residuals, but instead maximize the log-likelihood function

$\displaystyle \log L(\vartheta) = \sum ^ n_{j=1} [Y_j \log \pi _H (X_j; \vartheta) + ( 1-Y_j)
\log \{ 1-\pi_H (X_j; \vartheta) \} ] $

following the procedure used in logistic regression. Substituting the resulting estimator $ \hat{\vartheta}_n$ into $ \pi_H$, we obtain an estimator of the default probability

$\displaystyle \hat{\pi} (x) = \pi _H (x; \hat{\vartheta}_n). $
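The estimation steps above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the procedure used in the source: the names `pi_H`, `log_lik` and `fit` are hypothetical, the weights are stored as a vector `v` holding $v_0,\dots,v_H$ and a matrix `W` whose row 0 holds the biases $w_{0h}$, and the log-likelihood is maximized by plain full-batch gradient ascent with hand-derived gradients (any numerical optimizer could be used instead). The data are simulated, not the insolvency data discussed in the text.

```python
import numpy as np


def psi(t):
    """Logistic transfer function psi(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + np.exp(-t))


def pi_H(X, v, W):
    """Default probabilities of a network with one hidden layer of H neurons.

    X: (n, d) inputs; v: (H+1,) output weights, v[0] = v_0;
    W: (d+1, H) hidden weights, row 0 holds the biases w_{0h}.
    """
    hidden = psi(W[0] + X @ W[1:])      # (n, H): psi(w_0h + sum_i w_ih x_i)
    return psi(v[0] + hidden @ v[1:])   # (n,)


def log_lik(X, Y, v, W, eps=1e-12):
    """Bernoulli log-likelihood log L(theta); p is clipped to avoid log(0)."""
    p = np.clip(pi_H(X, v, W), eps, 1.0 - eps)
    return np.sum(Y * np.log(p) + (1.0 - Y) * np.log(1.0 - p))


def fit(X, Y, H=2, lr=0.1, steps=2000, seed=0):
    """Maximize log L by full-batch gradient ascent from random starting values."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=0.5, size=(d + 1, H))
    v = rng.normal(scale=0.5, size=H + 1)
    for _ in range(steps):
        hidden = psi(W[0] + X @ W[1:])
        p = psi(v[0] + hidden @ v[1:])
        r = Y - p                                   # d log L / d(output pre-activation)
        grad_v = np.concatenate(([r.sum()], hidden.T @ r))
        delta = np.outer(r, v[1:]) * hidden * (1.0 - hidden)   # chain rule to hidden layer
        grad_W = np.vstack((delta.sum(axis=0), X.T @ delta))
        v += lr * grad_v / n                        # step on the average log-likelihood
        W += lr * grad_W / n
    return v, W


# Usage on simulated data (not the insolvency data from the text).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
Y = (rng.uniform(size=200) < psi(X[:, 0] - X[:, 1] ** 2)).astype(float)
v_hat, W_hat = fit(X, Y, H=2)
pi_hat = pi_H(X, v_hat, W_hat)   # estimated default probabilities pi_H(x; theta_hat)
```

Because the log-likelihood of a neural network is generally not concave in the weights, gradient ascent only reaches a local maximum that depends on the random starting values; in practice one would restart from several initializations and keep the fit with the largest log-likelihood.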

To obtain a particularly simple model with fewer parameters, Anders (1997) slightly modified this approach for forecasting the insolvency of small and medium-sized firms and assumed a default probability of the form

$\displaystyle \pi _H^ l (x; \vartheta) = \psi \Big( \beta ^\top x + v_0 + \sum^ H_{h=1} v_h \, \psi \big( w_{0h} + \sum^ d_{i=1} w_{ih} x_i \big) \Big), $

which closely resembles the generalized partial linear model, except that here some or all of the explanatory variables, i.e., the coordinates of $ x,$ can appear in the linear as well as in the nonparametric part. The linear term $ \beta ^\top x$ can be interpreted as the output of an additional neuron whose transfer function is not the logistic function $ \psi (t),$ but the identity $ f(t) \stackrel{\mathrm{def}}{=}t$. The network obtained by applying a model selection technique to estimate the insolvency probability is surprisingly simple: in addition to the linear term, it contains only a single neuron $ (H=1).$ Of the 6 input variables, only 4 contribute to the linear part (age of the business, sales development, indicator for limited liability, dummy variable for processing business), which means that the other two coefficients $ \beta _i$ are 0, and only 3 (dummy variables for processing business and for trade, indicator variable for the entrepreneur's educational degree) contribute to the sigmoid part, which means that the remaining weights $ w_{i1}$ are 0. On a validation data set that was not used to estimate the parameters, this simple model correctly identified 83.3 % of the insolvent firms and 63.3 % of the solvent firms.
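Anders' partially linear variant and its evaluation on held-out firms can be sketched in the same way. Again this is a hypothetical illustration: the column order of the six inputs, the helper names `pi_lin`, `constrained_params` and `identification_rates`, and the cutoff of 0.5 for flagging a firm as insolvent are assumptions (the text does not state a cutoff), and no estimated coefficients from the study are reproduced. The zero pattern of the $\beta_i$ and $w_{i1}$ described above is imposed through two masks, and the linear term enters exactly like an extra neuron with identity transfer function.

```python
import numpy as np


def psi(t):
    """Logistic transfer function."""
    return 1.0 / (1.0 + np.exp(-t))


# Hypothetical column order for the six inputs named in the text.
NAMES = ["age", "sales_development", "limited_liability",
         "processing_business", "trade", "education"]
# 1 = the variable enters that part, 0 = its coefficient is fixed at zero.
LIN_MASK = np.array([1, 1, 1, 1, 0, 0], dtype=float)  # pattern of beta_i
SIG_MASK = np.array([0, 0, 0, 1, 1, 1], dtype=float)  # pattern of w_i1


def pi_lin(X, beta, v, W):
    """psi(beta'x + v_0 + sum_h v_h psi(w_0h + sum_i w_ih x_i)).

    X: (n, d); beta: (d,); v: (H+1,); W: (d+1, H), row 0 = biases w_0h.
    X @ beta acts like an extra neuron with identity transfer f(t) = t.
    """
    hidden = psi(W[0] + X @ W[1:])
    return psi(X @ beta + v[0] + hidden @ v[1:])


def constrained_params(beta_free, v, w0, w_free):
    """Assemble (beta, v, W) for H = 1 with the masked coefficients set to zero."""
    beta = LIN_MASK * np.asarray(beta_free, dtype=float)
    W = np.concatenate(([w0], SIG_MASK * np.asarray(w_free, dtype=float)))
    return beta, np.asarray(v, dtype=float), W.reshape(-1, 1)


def identification_rates(p, Y, cutoff=0.5):
    """Shares of insolvent (Y = 1) and solvent (Y = 0) firms classified correctly,
    where a firm is flagged as insolvent when p >= cutoff."""
    flagged = p >= cutoff
    insolvent = Y == 1
    return flagged[insolvent].mean(), (~flagged[~insolvent]).mean()
```

The processing-business dummy is switched on in both masks, reflecting that a coordinate of $x$ may enter the linear and the sigmoid part at the same time.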