12.2 Multiple Index Models

The single index model of the previous sections has been extended to multiple index models in various ways. For instance, popular parametric models for data with a multicategorical response variable (representing the choice of individuals among more than two alternatives) are of the multiple index form. In the Multinomial Logit model, the probability that an individual will choose alternative $ j$ depends on the characteristics $ x$ of the individual through the indices $ x^T\beta_{1},\ldots,x^T\beta_{J}:$

$\displaystyle P(Y=j\vert X=x)=\frac{\exp(x^T\beta_{j})}{1+\sum\limits_{k=1}^J \exp(x^T\beta_{k})}\,.$ (12.13)

Quantlets to estimate this and related models can be found in the glm library.
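As an illustration of (12.13), the choice probabilities can be computed directly from the indices. The following Python sketch is not part of the glm library; the function name and the convention of treating alternative 0 as the base category (with coefficients fixed at zero) are our own:

```python
import numpy as np

def mnl_probabilities(x, betas):
    """Choice probabilities of a multinomial logit model as in (12.13).

    x     : (p,) array of individual characteristics
    betas : (J, p) array; row j holds the coefficient vector beta_{j+1}.
            Alternative 0 serves as base category with coefficients
            fixed at zero for identification.
    Returns a (J+1,) array of probabilities for alternatives 0,...,J.
    """
    v = betas @ x                      # indices x^T beta_1, ..., x^T beta_J
    denom = 1.0 + np.sum(np.exp(v))    # the "1" is exp(0) of the base category
    p_base = 1.0 / denom
    return np.concatenate(([p_base], np.exp(v) / denom))
```

By construction the probabilities are positive and sum to one over all $ J+1$ alternatives.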


12.2.1 Sliced Inverse Regression


{edr, eigen} = sir (x, y, h)
fits a multiple index model by the method of Sliced Inverse Regression

A semiparametric multiple index model is the sliced inverse regression (SIR) model considered in Li (1991). Given a response variable $ Y$ and a (random) vector  $ x\in\mathbb{R}^p$ of explanatory variables, SIR is based on the model:

$\displaystyle Y \ = \ m(x^T\beta_1,\dots,x^T\beta_k,\varepsilon),$ (12.14)

where $ \beta_{1},\dots,\beta_{k}$ are unknown projection vectors, $ k$ is unknown and assumed to be less than $ p$, $ m:\mathbb{R}^{k+1}\to \mathbb{R}$ is an unknown function, and $ \varepsilon$ is a random noise term with $ E\left(\varepsilon\vert x\right)=0$.

Model (12.14) describes the situation where the response variable $ Y$ depends on the $ p$-dimensional variable $ x$ only through the indices $ x^T\beta_1,\dots,x^T\beta_k$. The smaller $ k$ is relative to $ p$, the more parsimoniously SIR represents the $ p$-dimensional regression of $ Y$ on $ x$ by a $ k$-dimensional subspace. The unknown vectors $ \beta_1,\ldots,\beta_k$, which span this subspace, are called effective dimension reduction directions (EDR directions); their span is referred to as the effective dimension reduction space (EDR space).

SIR tries to find this $ k$-dimensional subspace of $ \mathbb{R}^p$ by considering the inverse regression (IR) curve, i.e. $ E(x\vert Y).$ Under some weak assumptions on the joint distribution of the elements of $ x$, it can be shown that the centered inverse regression $ E(x\vert Y)-E(x)$ lies in the subspace spanned by the $ \beta_j$s. The $ \beta_j$s are found by an eigenvalue/eigenvector decomposition of the estimated covariance matrix of the vector $ E(z\vert y),$ where $ z$ is a standardized version of $ x.$ In XploRe, this is achieved by using the sir quantlet:
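The estimation steps just described can be sketched in a few lines. The following Python code is an illustrative reimplementation, not the XploRe sir quantlet; in particular, it simply partitions the data, sorted by $ y$, into a given number of equal-sized slices instead of using the slicing conventions controlled by the parameter h:

```python
import numpy as np

def sir_sketch(x, y, n_slices):
    """Minimal sliced inverse regression, following the steps in the text
    (an illustrative sketch, not the XploRe 'sir' quantlet).

    x : (n, p) matrix of explanatory variables
    y : (n,) response
    Returns (edr, eigenvalues), both sorted by decreasing eigenvalue.
    """
    n, p = x.shape
    # 1. standardize x:  z = Sigma^{-1/2} (x - mean)
    mean = x.mean(axis=0)
    cov = np.cov(x, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    sqrt_inv = evecs @ np.diag(evals ** -0.5) @ evecs.T
    z = (x - mean) @ sqrt_inv
    # 2. slice the data on sorted y and average z within each slice
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)
    means = np.array([z[idx].mean(axis=0) for idx in slices])
    weights = np.array([len(idx) / n for idx in slices])
    # 3. weighted covariance of the slice means, estimating Cov{E(z|Y)}
    M = (means * weights[:, None]).T @ means
    # 4. eigen-decomposition; leading eigenvectors estimate the EDR space
    lam, vecs = np.linalg.eigh(M)
    idx = np.argsort(lam)[::-1]
    edr = sqrt_inv @ vecs[:, idx]      # back-transform to the x scale
    return edr, lam[idx]
```

For a single index model such as $ Y=(x^T\beta)^3+\varepsilon$, the first column of edr should be close (up to scale) to $ \beta$.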

  {edr, eigen} = sir(x, y, h)

It takes as inputs the data (x and y) and the parameter $ h.$ The parameter $ h$ controls the ``slicing'' part of sliced inverse regression: the algorithm works on nonoverlapping intervals (slices) of the data. There are different ways to divide the data into slices, and the value of $ h$ has to be set accordingly. Three cases are distinguished:

SIR shares with other semiparametric procedures the problem that there is no clear-cut way of choosing the parameter $ h$ (which is not a bandwidth here). It is a good idea to try several values of $ h$ and to check how the results are affected.

The outputs of sir are edr, the matrix containing the estimates of the effective dimension reduction directions (i.e. the $ \widehat{\beta}_j$s), and eigen, the eigenvalues of the estimated covariance matrix of the vector $ E(z\vert Y),$ $ \widehat{\mathop{\hbox{Cov}}}\{E(z\vert Y)\}.$
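The eigenvalues in eigen can guide the choice of the number $ k$ of directions to retain. A simple rule of thumb, shown below as our own illustration and not part of the sir quantlet, keeps the smallest number of leading eigenvalues that account for a given share of their total:

```python
import numpy as np

def choose_k(eigenvalues, threshold=0.9):
    """Heuristic choice of the number k of EDR directions: keep the
    smallest k whose leading eigenvalues account for at least the given
    share of the total (an illustrative rule of thumb only)."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    share = np.cumsum(lam) / lam.sum()
    # first index where the cumulative share reaches the threshold
    return int(np.searchsorted(share, threshold) + 1)
```

A sharp drop after the $ k$-th eigenvalue is the usual visual indication that $ k$ directions suffice.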

sir2 provides an alternative to sir . Usage of sir2 is very similar to sir but the details of the algorithm as well as the underlying theory are different. For further details on SIR II, see Li (1991).


12.2.2 Testing Parametric Multiple Index Models

Parametric multiple index models like the multinomial logit model (12.13) are based on rather strong functional form assumptions. If these assumptions are not valid, then maximum-likelihood estimators of the parameters of these models are inconsistent.

XploRe provides a procedure to test the functional form assumptions of parametric multiple index models:


{t, p} = hhmult (vhat, y, yhat, h)
tests a parametric polychotomous response model against a semiparametric alternative

The function hhmult is a generalization of the HH-test from Subsection 12.1.7. See Werwatz (1997) for details. hhmult can only be applied to models with a multicategorical dependent variable. The null hypothesis corresponds to a parametric model such as the multinomial logit model (12.13). You first have to estimate this model (in the case of the multinomial logit model you can use glmmultlo ) and save the estimated indices in the matrix vhat and the predicted probabilities in the matrix yhat. Along with vhat and yhat, you provide the observations on the dependent variable and a vector of bandwidths h.

hhmult expects that you convert the observations on the multicategorical dependent variable into a set of dummy variables, one dummy variable per category of $ Y.$ The bandwidths are used for the nonparametric regressions of the dummy dependent variables on the estimated indices vhat. These nonparametric regressions provide an estimate of the parametric link functions, which take the following form in the multinomial logit model:
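Constructing the dummy matrix is straightforward. A minimal Python helper (our own sketch; hhmult itself is an XploRe quantlet and does not ship with such a function) could look like this:

```python
import numpy as np

def to_dummies(y, categories=None):
    """Convert a multicategorical response into one 0/1 dummy column per
    category -- the input format the text describes for hhmult.
    (Illustrative helper, not part of XploRe.)"""
    y = np.asarray(y)
    if categories is None:
        categories = np.unique(y)   # categories in sorted order
    return (y[:, None] == np.asarray(categories)[None, :]).astype(float)
```

Each row of the result contains exactly one 1, in the column of the observed category.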

$\displaystyle F_j(v_1,\ldots,v_J)=\frac{\exp(v_{j})} {1+\sum\limits_{k=1}^J \exp(v_{k})}\,.$ (12.15)

If the parametric model is correct, then the nonparametric estimate should differ from the link function implied by the parametric model only by random sampling error. If, however, the parametric model is misspecified, then there should be significant differences between the hypothesized and estimated link functions.
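The idea of comparing the hypothesized and estimated link functions can be made concrete with a small sketch. The code below regresses the dummy variables nonparametrically on the indices via a Nadaraya-Watson estimator with a product Gaussian kernel and sums the squared deviations from the parametric probabilities. It conveys the principle only: the actual HH statistic computed by hhmult uses a different standardization, which is what yields the asymptotic $ \chi^2$ distribution:

```python
import numpy as np

def link_discrepancy(vhat, dummies, yhat, h):
    """Illustrative distance between the parametric link (predicted
    probabilities yhat) and a nonparametric estimate obtained by
    Nadaraya-Watson regression of the dummies on the indices vhat.
    This is a sketch of the idea, NOT the HH statistic of hhmult.

    vhat    : (n, J) matrix of estimated indices
    dummies : (n, J+1) dummy matrix of the observed categories
    yhat    : (n, J+1) parametric predicted probabilities
    h       : scalar bandwidth for the product Gaussian kernel
    """
    # pairwise product Gaussian kernel weights between index vectors
    diffs = (vhat[:, None, :] - vhat[None, :, :]) / h
    w = np.exp(-0.5 * np.sum(diffs ** 2, axis=2))
    # Nadaraya-Watson estimate of E(dummy | vhat) at each observation
    mhat = (w @ dummies) / w.sum(axis=1, keepdims=True)
    return np.sum((mhat - yhat) ** 2)
```

When the parametric link is correct, the discrepancy is driven by sampling error alone; under misspecification it stays bounded away from zero.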

All these differences are summarized in one test statistic which, asymptotically, follows a $ \chi^2$ distribution. hhmult returns the value of the test statistic as well as the associated $ p$-value.