10.5 Computational Results

The most significant predictors suggested by the discriminant analysis belong to the profitability and leverage ratios. To demonstrate the ability of an SVM to extract information from the data, we will choose two ratios from these groups: NI/TA from the profitability ratios and TL/TA from the leverage ratios. Beyond their common Lagrangian formulation, SVMs can differ in two aspects: (i) their capacity, which is controlled by the coefficient $ C$ in (10.12), and (ii) the complexity of the classifier functions, controlled in our case by the anisotropic radial basis in the Gaussian kernel transformation.
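This setup can be sketched in a few lines of Python with scikit-learn (the chapter itself uses no particular software; the data below are synthetic stand-ins for the two chosen ratios). Writing the Gaussian kernel with anisotropic radial basis $ r\Sigma^{1/2}$ as $ K(x_i,x_j)=\exp\{-(x_i-x_j)^{\top}\Sigma^{-1}(x_i-x_j)/(2r^2)\}$ and assuming a diagonal $ \Sigma$, rescaling each predictor by its standard deviation makes an isotropic RBF kernel of width $ r$ equivalent:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-ins for the two chosen ratios: NI/TA and TL/TA.
X = np.column_stack([rng.normal(0.02, 0.1, 200),   # NI/TA
                     rng.normal(0.5, 0.2, 200)])   # TL/TA
y = np.where(X[:, 0] - 0.3 * (X[:, 1] - 0.5) > 0, 1, -1)  # toy labels

r = 2.0  # radial-basis multiplier; the text uses 100, 2 and 0.5
# Anisotropic basis r*Sigma^{1/2} with diagonal Sigma: rescale each ratio
# by its standard deviation, then use an isotropic RBF of width r.
Xs = X / X.std(axis=0)
clf = SVC(C=1.0, kernel="rbf", gamma=1.0 / (2 * r**2))
clf.fit(Xs, y)
scores = clf.decision_function(Xs)  # the score f shown as gray shading
```

Varying `r` between 100, 2 and 0.5 reproduces the move from near-linear to overly complex classifiers discussed below; varying `C` changes the capacity.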

Triangles and squares in Figures 10.4-10.7 represent successful and failing companies from the training set, respectively. The intensity of the gray background corresponds to different score values $ f$: the darker the area, the higher the score and the greater the probability of default. Most successful companies lying in the bright area have positive profitability and a reasonable leverage TL/TA of around $ 0.4$, which makes economic sense.

Figure 10.4 presents the classification results for an SVM using locally near linear classifier functions (the anisotropic radial basis is $ 100\Sigma ^{1/2}$) with the capacity fixed at $ C=1$. The discriminating rule in this case can be approximated by a linear combination of predictors and is similar to that suggested by discriminant analysis, although the coefficients of the predictors may be different.

If the complexity of classifying functions increases (the radial basis goes down to $ 2\Sigma ^{1/2}$) as illustrated in Figure 10.5, we get a more detailed picture. Now the areas of successful and failing companies become localized. If the radial basis is decreased further down to $ 0.5\Sigma ^{1/2}$ (Figure 10.6), the SVM will try to track each observation. The complexity in this case is too high for the given data set.

Figure 10.7 demonstrates the effect of a high capacity ($ C=300$) on the classification results. As the capacity grows, the SVM localizes only one cluster of successful companies. The area outside this cluster is associated with approximately equally high score values.

Figure 10.4: Ratings of companies in two dimensions; the case of a low complexity of classifier functions, the radial basis is $ 100\Sigma ^{1/2}$, the capacity is fixed at $ C=1$.
\includegraphics[width=1.00\defpicwidth]{_r100c1.ps}

Figure 10.5: Ratings of companies in two dimensions; the case of an average complexity of classifier functions, the radial basis is $ 2\Sigma ^{1/2}$, the capacity is fixed at $ C=1$.
\includegraphics[width=1.0\defpicwidth]{_r2c1.ps}

Figure 10.6: Ratings of companies in two dimensions; the case of an excessively high complexity of classifier functions, the radial basis is $ 0.5\Sigma ^{1/2}$, the capacity is fixed at $ C=1$.
\includegraphics[width=1.0\defpicwidth]{_r05c1.ps}

Figure 10.7: Ratings of companies in two dimensions; the case of a high capacity ($ C=300$). The radial basis is fixed at $ 2\Sigma ^{1/2}$.
\includegraphics[width=1.0\defpicwidth]{_r2c300.ps}

Figure 10.8: Power (Lorenz) curve (Lorenz, 1905) - the cumulative default rate as a function of the percentile of companies sorted according to their score - for the training set of companies. An SVM is applied with the radial basis $ 2\Sigma ^{1/2}$ and capacity $ C=1$.
\includegraphics[width=1.0\defpicwidth]{_pcr2c1.ps}
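A power curve of the kind shown in Figure 10.8 takes only a few lines to compute. A minimal sketch, assuming the common convention that companies are sorted from riskiest to safest and the curve accumulates the share of defaults captured (the function name and toy numbers are illustrative, not from the chapter):

```python
import numpy as np

def power_curve(scores, defaulted):
    """Cumulative default rate vs. percentile of companies sorted by score.

    scores    : SVM score f for each company (higher = riskier)
    defaulted : 1 for failing companies, 0 for successful ones
    """
    order = np.argsort(scores)[::-1]              # riskiest first
    d = np.asarray(defaulted, dtype=float)[order]
    pct = np.arange(1, len(d) + 1) / len(d)       # percentile of companies
    cum = np.cumsum(d) / d.sum()                  # share of defaults captured
    return pct, cum

# Toy illustration: a perfectly ranking score captures all defaults first.
pct, cum = power_curve([0.9, 0.5, -0.2, -0.7], [1, 1, 0, 0])
# cum is [0.5, 1.0, 1.0, 1.0]: both defaults fall in the riskiest half.
```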

Thus, besides estimating the scores for companies the SVM also managed to learn that there always exists a cluster of successful companies, while the cluster for bankrupt companies vanishes when the capacity is high, i.e. a company must possess certain characteristics in order to be successful and failing companies can be located elsewhere. This result was obtained without using any additional knowledge besides that contained in the training set.

The calibration of the model, i.e. the estimation of the mapping $ f \rightarrow {\rm PD}$, can be illustrated by the following example (the SVM with the radial basis $ 2\Sigma ^{1/2}$ and capacity $ C=1$ will be applied). We can set three rating grades: safe, neutral and risky, which correspond to the score values $ f<-0.0115$, $ -0.0115<f<0.0115$ and $ f>0.0115$, respectively, and calculate the total number of companies and the number of failing companies in each of the three groups. If the training set were representative of the whole population of companies, the ratio of failing to all companies in a group would give the estimated probability of default. Figure 10.8 shows the power (Lorenz) curve (Lorenz, 1905) - the cumulative default rate as a function of the percentile of companies sorted according to their score - for the training set of companies. For the three rating grades above we derive $ {\rm PD_{safe}}=0.24$, $ {\rm PD_{neutral}}=0.50$ and $ {\rm PD_{risky}}=0.76$.
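The three-grade calibration just described amounts to a simple tabulation: count failing and total companies within each score band. A sketch (the function name and the toy scores are illustrative, not the chapter's data):

```python
import numpy as np

def pd_by_grade(scores, defaulted, cutoffs=(-0.0115, 0.0115)):
    """Estimate PD per rating grade from SVM scores.

    scores    : array of score values f (higher = riskier)
    defaulted : boolean array, True for failing companies
    cutoffs   : score thresholds separating safe / neutral / risky
    """
    scores = np.asarray(scores)
    defaulted = np.asarray(defaulted, dtype=bool)
    lo, hi = cutoffs
    grades = {"safe": scores < lo,
              "neutral": (scores >= lo) & (scores <= hi),
              "risky": scores > hi}
    # PD = failing / all companies in the grade (the training set is
    # assumed representative of the population).
    return {g: defaulted[m].mean() if m.any() else float("nan")
            for g, m in grades.items()}

# Toy illustration with eight companies:
f = np.array([-0.03, -0.02, -0.012, 0.0, 0.005, 0.02, 0.03, 0.04])
d = np.array([0, 0, 1, 0, 1, 1, 1, 1], dtype=bool)
pd_by_grade(f, d)  # safe: 1/3, neutral: 0.5, risky: 1.0
```

Finer grades such as AAA or BB correspond to simply passing more cutoff values.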

If a sufficient number of observations is available, the model can also be calibrated for finer rating grades such as AAA or BB by adjusting the score values separating the groups of companies so that the estimated default probability within each group equals that of the corresponding rating grade. Note that we are calibrating the model on the grid determined by $ {\rm grad}(f)=0$ or $ {\rm grad}(\widehat{\rm PD})=0$, and not on the orthogonal grid as in Moody's RiskCalc model. In other words, we do not make the restrictive assumption that the predictors exert independent influences, as the latter model does. This can be important since, for example, the same decrease in profitability will have different consequences for firms with high and low leverage.

For multidimensional classification the results cannot be easily visualized. In this case we will use the cross-validation technique to compute the percentage of correct classifications and compare it with that of discriminant analysis (DA). Note that both of the most widely used methods - discriminant analysis and logit regression - select only one predictor (NI/TA) significant at the 5% level when forward selection is used. Cross-validation has the following stages. One company is taken out of the sample and the SVM is trained on the remaining companies. Then the class of the out-of-sample company is evaluated by the SVM. This procedure is repeated for all the companies and the percentage of correct classifications is calculated.
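This leave-one-out procedure can be sketched as follows (scikit-learn and synthetic data are used purely for illustration; for simplicity the rescaling that implements the anisotropic basis uses the full sample):

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import SVC

def loo_accuracy(X, y, C=1.0, r=2.0):
    """Leave-one-out cross-validation as described in the text: drop one
    company, train on the rest, classify the held-out company."""
    Xs = X / X.std(axis=0)  # anisotropic radial basis via rescaling
    hits = 0
    for train, test in LeaveOneOut().split(Xs):
        clf = SVC(C=C, kernel="rbf", gamma=1.0 / (2 * r**2))
        clf.fit(Xs[train], y[train])
        hits += clf.predict(Xs[test])[0] == y[test][0]
    return hits / len(y)

# Toy illustration with two well-separated groups of companies:
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (15, 2)) + [3, 3],
               rng.normal(0, 1, (15, 2))])
y = np.array([1] * 15 + [-1] * 15)
acc = loo_accuracy(X, y)
```

The percentage of correct classifications is then `100 * acc`, which can be compared across classifiers such as the SVM and DA.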

The best percentage of correctly cross-validated companies (all available ratios were used as predictors) is higher for the SVM than for the discriminant analysis (62% vs. 60%). However, the difference is not significant at the 5% level. This indicates that a linear function might be an optimal classifier for the number of observations in our data set. As for the direction vector of the separating hyperplane, it can be estimated differently by the SVM and DA without greatly affecting the accuracy, since the correlation of the underlying predictors is high.

Cluster centre locations estimated by cluster analysis are presented in Table 10.4. The results indicate that the two clusters are likely to correspond to successful and failing companies. Note the substantial differences between the clusters in the interest coverage ratio EBIT/INT, as well as in NI/TA, EBIT/TA and TL/TA.


Table 10.4: Cluster centre locations. There are 19 members in class {-1} - successful companies, and 65 members in class {1} - failing companies.
\begin{tabular}{lrr}
\hline
Ratio & Cluster $\{-1\}$ & Cluster $\{1\}$ \\
\hline
EBIT/TA & $0.263$ & $0.015$ \\
NI/TA & $0.078$ & $-0.027$ \\
EBIT/S & $0.313$ & $-0.040$ \\
EBIT/INT & $13.223$ & $1.012$ \\
TD/TA & $0.200$ & $0.379$ \\
TL/TA & $0.549$ & $0.752$ \\
SIZE & $15.104$ & $15.059$ \\
QA/CL & $1.108$ & $1.361$ \\
CASH/TA & $0.047$ & $0.030$ \\
WC/TA & $0.126$ & $0.083$ \\
CA/CL & $1.879$ & $1.813$ \\
STD/TD & $0.144$ & $0.061$ \\
S/TA & $1.178$ & $0.959$ \\
INV/COGS & $0.173$ & $0.155$ \\
\hline
\end{tabular}
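A two-cluster analysis of the kind summarized in Table 10.4 can be sketched as follows. The chapter does not specify the clustering algorithm, so k-means via scikit-learn is assumed here, and the ratio matrix is a synthetic stand-in with only three of the ratios:

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the ratio matrix (rows = companies, columns =
# EBIT/TA, NI/TA, TL/TA); group sizes mimic the 19/65 split of Table 10.4.
rng = np.random.default_rng(2)
good = rng.normal([0.26, 0.08, 0.55], 0.03, (19, 3))   # successful-like
bad = rng.normal([0.02, -0.03, 0.75], 0.03, (65, 3))   # failing-like
X = np.vstack([good, bad])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
centres = km.cluster_centers_    # analogue of the centre locations
sizes = np.bincount(km.labels_)  # analogue of the 19 / 65 membership counts
```

With well-separated groups, the recovered centres approximate the group means, mirroring how Table 10.4's centres distinguish the two classes.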