6.3 Testing the SIM

Given the computational expense of estimating single index models, it is desirable to know whether their distributional flexibility justifies the extra effort. This implies that the performance of a specified parametric model must be compared with that of an estimated single index model as given in (5.9). To this end, Horowitz & Härdle (1994) designed a test of the following hypotheses:

$\displaystyle H_0 :\; E(Y\vert{\boldsymbol{X}}={\boldsymbol{x}}) = G({\boldsymbol{x}}^\top{\boldsymbol{\beta}})$
$\displaystyle H_1 :\; E(Y\vert{\boldsymbol{X}}={\boldsymbol{x}}) = g({\boldsymbol{x}}^\top{\boldsymbol{\beta}})$ (6.26)

Here $ G$ (the link under $ H_0$) is a known function and $ g$ (the link under $ H_1$) an unspecified function. For example, the null hypothesis could be a logit model and the alternative a semiparametric model of SIM type.

The main idea of the test relies on the fact that, if the model under the null is true, then a nonparametric estimate of $ E(Y\vert{\boldsymbol{X}}^\top\widehat{\boldsymbol{\beta}}= v)$ gives a correct estimate of $ G(v)$. Thus, the specification of the parametric model can be tested by comparing the nonparametric estimate of $ E(Y\vert{\boldsymbol{X}}^\top\widehat{\boldsymbol{\beta}}= v)$ with the parametric fit using the known link $ G$.
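To illustrate, the following minimal sketch computes a leave-one-out Nadaraya-Watson estimate of $ E(Y\vert{\boldsymbol{X}}^\top\widehat{\boldsymbol{\beta}}= v)$ at the observed index values, anticipating the leave-one-out form used in the test statistic (6.27) below. The Gaussian kernel and the function name nw_leave_one_out are illustrative choices, not prescriptions of Horowitz & Härdle (1994).

import numpy as np

def nw_leave_one_out(index, y, h):
    """Leave-one-out Nadaraya-Watson estimate of E(Y | index) at each
    observed index value, using a Gaussian kernel with bandwidth h."""
    index = np.asarray(index, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    ghat = np.empty(n)
    for i in range(n):
        u = (index[i] - np.delete(index, i)) / h
        w = np.exp(-0.5 * u ** 2)                  # Gaussian kernel weights
        ghat[i] = np.dot(w, np.delete(y, i)) / w.sum()
    return ghat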

The test statistic is defined as

$\displaystyle T= \sqrt{h} \sum\limits_{i=1}^n w({\boldsymbol{X}}_i^\top\widehat{\boldsymbol{\beta}}) \left\{Y_i - G({\boldsymbol{X}}_i^\top\widehat{\boldsymbol{\beta}})\right\} \left\{\widehat{g}_{-i}({\boldsymbol{X}}_i^\top\widehat{\boldsymbol{\beta}}) - G({\boldsymbol{X}}_i^\top\widehat{\boldsymbol{\beta}})\right\}$ (6.27)

where $ \widehat{g}_{-i}(\bullet)$ is a leave-one-out Nadaraya-Watson estimate for the regression of $ Y$ on the estimated index values, $ h$ is the bandwidth used in this kernel regression, and $ w(\bullet )$ is a weight function that downweights extreme observations. In practice, the weight function is chosen such that only the central $ 90\%$ or $ 95\%$ of the range of the index values $ {\boldsymbol{X}}_i^\top\widehat{\boldsymbol{\beta}}$ is taken into account. Horowitz & Härdle (1994) propose to take $ \widehat{\boldsymbol{\beta}}$, the estimate under $ H_0$. That is, the same index values $ {\boldsymbol{X}}_i^\top\widehat{\boldsymbol{\beta}}$ are used to compute both the parametric and the semiparametric regression values.
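Building on the sketch above, the statistic (6.27) could be computed along the following lines. The trimming rule (keeping the central $ 90\%$ of the index values) and the logit null link in the usage comment are illustrative choices only.

def sim_test_statistic(X, y, beta_hat, G, h, trim=0.05):
    """Test statistic (6.27): sqrt(h) * sum_i w_i {Y_i - G(v_i)} {ghat_{-i}(v_i) - G(v_i)},
    where v_i = X_i' beta_hat and beta_hat is the estimate under H_0."""
    v = X @ beta_hat                               # index values under the null
    lo, hi = np.quantile(v, [trim, 1.0 - trim])
    w = ((v >= lo) & (v <= hi)).astype(float)      # weight: central 90% of index values
    ghat = nw_leave_one_out(v, y, h)               # semiparametric fit of the link
    return np.sqrt(h) * np.sum(w * (y - G(v)) * (ghat - G(v)))

# Usage with a logit null model (beta_hat estimated under H_0):
#   G = lambda v: 1.0 / (1.0 + np.exp(-v))
#   T = sim_test_statistic(X, y, beta_hat, G, h=0.3)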

Let us take a closer look at the intuition behind this test statistic. The first difference term in the sum, $ Y_i - G({\boldsymbol{X}}_i^\top\widehat{\boldsymbol{\beta}})$, measures the deviation of the observed response from the parametric regression; under the null it estimates $ Y_i-E(Y\vert{\boldsymbol{X}}_i)$. If $ H_0$ holds, then this term ought to be very small on average. If, however, the parametric model under the null fails to replicate the observed values $ Y_i$ well, then $ T$ will increase. Obviously, we reject the hypothesis that the data were generated by the parametric model if $ T$ becomes implausibly large.

The second difference term measures the distance between the regression values obtained under the null and under the semiparametric alternative. Suppose the parametric model captures the characteristics of the data well, so that $ Y_i - G( {\boldsymbol{X}}_i^\top\widehat{\boldsymbol{\beta}})$ is small. Then, even if the semiparametric link deviates considerably from the parametric fit on average, these deviations are downweighted by the first difference term. Put differently, even small residuals of the parametric fit can be blown up by large differences between the parametric and semiparametric fits, $ \widehat{g}_{-i}({\boldsymbol{X}}_i^\top\widehat{\boldsymbol{\beta}}) - G({\boldsymbol{X}}_i^\top\widehat{\boldsymbol{\beta}})$. Thus, if $ H_0$ is true, the residuals must be small enough to compensate for possibly strong differences in the alternative fits. Again, a small statistic will lead to maintaining the null hypothesis.

It can be shown that, under $ H_0$ and some suitable regularity conditions, $ T$ is asymptotically distributed as $ N(0,\sigma_T^2)$, where $ \sigma_T^2$ denotes the asymptotic variance of the statistic.
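Since $ \sigma_T^2$ involves unknown population quantities, one simple way to calibrate the test in practice is to approximate the null distribution of $ T$ by simulation. The sketch below uses a parametric bootstrap under the fitted null for a binary response; this is an illustrative shortcut, not the variance estimator of Horowitz & Härdle (1994), and for simplicity $ \widehat{\boldsymbol{\beta}}$ is held fixed rather than re-estimated on each bootstrap sample.

def bootstrap_pvalue(X, y, beta_hat, G, h, B=199, seed=0):
    """Approximate p-value for T by simulating binary responses from the
    fitted null model G(X' beta_hat) and recomputing the statistic."""
    rng = np.random.default_rng(seed)
    t_obs = sim_test_statistic(X, y, beta_hat, G, h)
    p = G(X @ beta_hat)                            # fitted success probabilities under H_0
    t_star = np.empty(B)
    for b in range(B):
        y_star = rng.binomial(1, p)                # draw responses under the null fit
        t_star[b] = sim_test_statistic(X, y_star, beta_hat, G, h)
    return np.mean(t_star >= t_obs)                # large T speaks against H_0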

For related presentations of the topic we refer to Horowitz (1993), Horowitz (1998b) and Pagan & Ullah (1999, Chapter 7).

There is a large amount of literature that investigates the efficiency bound for estimators in semiparametric models. Let us mention Begun et al. (1983) and Cosslett (1987) as being two of the first references. Newey (1990) and Newey (1994) are more recent articles. The latter treats the variance in a very general and abstract way. A comprehensive resource for efficient estimation in semiparametric models is Bickel et al. (1993).

The idea of using parametric objective functions and substituting unknown components by nonparametric estimates was first proposed by Cosslett (1983). The maximum score estimator of Manski (1985) and the maximum rank correlation estimator of Han (1987) are of the same type. Their resulting estimates are still very close to the parametric estimates. For that reason the SLS method of Ichimura (1993) may outperform them when the parametric model is misspecified.

The pseudo maximum likelihood version of the SLS was found independently by Weisberg & Welsh (1994). They present it as a straightforward generalization of the GLM algorithm and discuss numerical details. A different idea for adding nonparametric components to the maximum likelihood function is given by Gallant & Nychka (1987). They use a Hermite series to expand the densities in the objective function.

The different methods presented in this chapter have been compared in a simulation study by Bonneu et al. (1993). They also include a study of Bonneu & Delecroix (1992) for a slightly modified pseudo likelihood estimator.

An alternative ADE method (without weight function) was proposed by Härdle & Stoker (1989). This estimator shares the asymptotic properties of the weighted ADE, but requires for practical computation a trimming factor to guarantee that the estimated density is bounded away from zero.

EXERCISE 6.1   Discuss why and how WSLS could also be motivated by considering the log-likelihood.

EXERCISE 6.2   Recall the PMLE for the binary response model. After equation (6.11) we introduced the restriction $ E(Y\vert{\boldsymbol{X}}) = E\{Y\vert v_{\boldsymbol{\beta}}({\boldsymbol{X}})\}$. Discuss the issue of heteroscedasticity under this restriction.

EXERCISE 6.3   The main interest in this chapter has been the estimation of the parametric part. What kind of conditions are typically imposed on the nonparametric components in estimation, i.e. on the substitutes used for the unknown terms? What are these conditions good for?

EXERCISE 6.4   Show that the ADE approach without using a weight function (or equivalently $ w(\bullet)\equiv 1$) leads to the estimation of $ E \left\{ \nabla f({\boldsymbol{T}})\, m({\boldsymbol{T}})/ f({\boldsymbol{T}}) \right\}$.


Summary
$ \ast$
A single index model (SIM) is of the form

$\displaystyle E(Y\vert{\boldsymbol{X}})=m({\boldsymbol{X}})=g\left\{ v_{\boldsymbol{\beta}}({\boldsymbol{X}}) \right\},$

where $ v_{\boldsymbol{\beta}}(\bullet)$ is an index function known up to the parameter vector $ {\boldsymbol{\beta}}$ and $ g(\bullet)$ is an unknown smooth link function. In most applications the index function is of linear form, i.e., $ v_{\boldsymbol{\beta}}({\boldsymbol{x}})={\boldsymbol{x}}^\top {\boldsymbol{\beta}}$.
$ \ast$
Due to the nonparametric form of the link function, neither an intercept nor a scale parameter can be identified. For example, if the index is linear, we have to estimate

$\displaystyle E(Y\vert{\boldsymbol{X}})=g\left\{ {\boldsymbol{X}}^\top{\boldsymbol{\beta}}\right\}.$

There is no intercept parameter and $ {\boldsymbol{\beta}}$ can only be estimated up to an unknown scale factor. To identify the slope parameter of interest, $ {\boldsymbol{\beta}}$ is usually assumed to have one component identical to 1 or to be a vector of (Euclidean) length 1.
$ \ast$
The estimation of a SIM usually proceeds in two steps: First, the parameter $ {\boldsymbol{\beta}}$ is estimated. Then, using the index values $ \eta_i={\boldsymbol{X}}_i^\top\widehat{\boldsymbol{\beta}}$, the nonparametric link function $ g$ is estimated by a univariate nonparametric regression method (see the sketch following this summary).
$ \ast$
For the estimation of $ {\boldsymbol{\beta}}$, two approaches are available: iterative methods such as semiparametric least squares (SLS) or pseudo-maximum likelihood estimation (PMLE), and direct methods such as (weighted) average derivative estimation (WADE/ADE). Iterative methods can easily handle mixed discrete-continuous regressors but may need sophisticated routines to deal with possible local optima of the optimization criterion. Direct methods such as WADE/ADE avoid the technical difficulties of an optimization but require continuous explanatory variables. An extension of the direct approach is only possible for a small number of additional discrete variables.
$ \ast$
There are specification tests available to test whether we have a GLM or a true SIM.
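To make the summary concrete, here is a minimal sketch of the two estimation steps for a linear index, with the first coefficient fixed at 1 for identification. It reuses nw_leave_one_out from Section 6.3; the data-generating link, the bandwidth, the optimizer, and the unweighted SLS-type criterion (no trimming) are all illustrative choices.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
beta_true = np.array([1.0, -0.5])                  # first component normalized to 1
y = np.sin(X @ beta_true) + 0.1 * rng.normal(size=n)
h = 0.3                                            # illustrative bandwidth

def sls_criterion(b_free):
    """Leave-one-out prediction error of Y given the index, with the
    first index coefficient fixed at 1 for identification."""
    beta = np.concatenate(([1.0], np.atleast_1d(b_free)))
    return np.mean((y - nw_leave_one_out(X @ beta, y, h)) ** 2)

# Step 1: estimate the free part of beta by minimizing the criterion.
res = minimize(sls_criterion, x0=np.zeros(1), method="Nelder-Mead")
beta_hat = np.concatenate(([1.0], res.x))

# Step 2: estimate the link g by a univariate kernel regression on the index.
index = X @ beta_hat
g_hat = nw_leave_one_out(index, y, h)

Fixing the first coefficient at 1 (rather than normalizing $ {\boldsymbol{\beta}}$ to unit length) is one of the two identification conventions mentioned above; either choice would serve equally well in this sketch.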