6.2 Data Preparation


6.2.1 General

All estimation quantlets in the gplm quantlib take the following input parameters:

x
An $n \times p$ matrix containing the observations of the explanatory variables for the linear part,
t
An $n \times q$ matrix containing the observations of the explanatory variables for the nonparametric part,
y
An $n \times 1$ vector containing the observed responses.
Do not concatenate a column of ones to the matrix x: a constant is automatically contained in the nonparametric estimate of $m(\bullet)$. Neither the matrices x and t nor the vector y may contain missing values (NaN) or infinite values (Inf, -Inf).


6.2.2 Credit Scoring Example

In the following, we use credit scoring data to illustrate GPLM estimation. For details on the file kredit.dat see Fahrmeir and Tutz (1994) or Fahrmeir and Hamerle (1984). We use the subsample of loans for cars and furniture, which comprises $n=564$ of the 1000 observations.


Table 6.1: Descriptive statistics for credit data.

  Variable                        Yes (%)   No (%)
  $Y$    credit worthy             75.7      24.3
  $X_1$  previous credits o.k.     36.2      63.8
  $X_2$  employed                  77.0      23.0

  Variable                        Min     Max      Mean      S.D.
  $X_3$  duration (months)          4      72     20.90     11.41
  $T_1$  amount (DM)              338   15653   3200.00   2467.30
  $T_2$  age (years)               19      75     34.46     10.96


Descriptive statistics for this subsample and a selection of covariates can be found in Table 6.1. The covariate previous credits o.k. indicates that previous loans were repaid without problems. The covariate employed indicates that the person taking the loan has been employed by the same employer for at least one year.

The following XploRe code creates the matrices x, t and y:

  library("stats")
  file=read("kredit")  
  file=paf(file,(file[,5]>=1)&&(file[,5]<=3)) 
                                     ; purpose=car/furniture
  y=file[,1]
  x=(file[,4]>2)                     ; previous loans o.k.
  x=x~(file[,8]>2)                   ; employed (>=1 year)
  x=x~(file[,3])                     ; duration of loan
  t=(file[,6])                       ; amount of loan
  t=t~(file[,14])                    ; age of client
  xvars="previous"|"employed"|"duration"
  tvars="amount"|"age"
  summarize(y~x~t,"y"|xvars|tvars)
XAGgplm01.xpl

and produces the summary statistics:
  [ 2,]           Minimum   Maximum      Mean   Median   Std.Error
  [ 3,]          --------------------------------------------------
  [ 4,] y               0         1   0.75709        1     0.42922
  [ 5,] previous        0         1    0.3617        0     0.48092
  [ 6,] employed        0         1    0.7695        1     0.42152
  [ 7,] duration        4        72    20.902       18      11.407
  [ 8,] amount        338     15653      3200     2406      2467.3
  [ 9,] age            19        75    34.463       32      10.964
Note that for the statistical analysis in the following sections, we take logarithms of amount and age and then transform these values linearly to the interval $[0,1]$.
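
The transformation described in this note can be sketched as follows. This is a minimal illustration, not necessarily the code used for the later analysis. It assumes that log works elementwise, that min and max return column-wise values, and that a row vector is subtracted from (or divided into) every row of a matrix. The variable name trange and the three toy rows are introduced here only for illustration; the toy values are the minimum, median and maximum of amount and age from the summary output above.

```
  t=(338|2406|15653)~(19|32|75)      ; toy rows: amount ~ age (illustrative)
  t=log(t)                           ; logarithms of amount and age
  trange=max(t)-min(t)               ; column-wise range of the logs
  t=(t-min(t))./trange               ; rescale each column to [0,1]
  t                                  ; first row is 0, last row is 1
```

With the data from kredit.dat, the first line would be dropped and the remaining lines applied to the matrix t created above. Because the logarithm is monotone, the smallest and largest values in each column are mapped to 0 and 1, respectively.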