3.4 Number of E.D.R. Directions

Several methods have been proposed for determining the number of e.d.r. directions; see, for example, Li (1992), Schott (1994), and Cook (1998). Those approaches are based on the assumption that the distribution of the explanatory variable $ X$ is symmetric. We now extend the cross-validation method (Cheng and Tong (1992); Yao and Tong (1994)) to solve this problem, having selected the explanatory variables using the cross-validation method just cited. A similar extension can be effected using the approach of Auestad and Tjøstheim (1990), which is asymptotically equivalent to the cross-validation method.

Suppose that $ \beta_1, \cdots, \beta_D $ are the e.d.r. directions, i.e. $ y = {\sl g}(\beta_1^{\top }X, \cdots, \beta_D^{\top }X) + \varepsilon $ with $ \textrm{E}(\varepsilon\vert X) = 0 $ a.s. If $ D < p $, we can nominally extend the number of directions to $ p$, say $ \{ \beta_1, \cdots, \beta_{D}, \cdots, \beta_p\}$, such that they are perpendicular to one another.
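To make the nominal extension concrete, here is a minimal Python sketch (our own illustration, not part of the original procedure) that completes the estimated directions to a full orthonormal basis of $ \mathbb{R}^p $ via a QR decomposition; it assumes the columns of `B` are linearly independent.

```python
import numpy as np

def extend_to_basis(B):
    """Nominally extend a p x D matrix B of direction estimates to a
    full p x p orthonormal basis; the extra p - D columns are nominal.
    Assumes B has full column rank D."""
    p = B.shape[0]
    # QR of [B | I_p]: the first D columns of Q span the column space
    # of B, and the remaining columns are perpendicular to them.
    Q, _ = np.linalg.qr(np.hstack([B, np.eye(p)]))
    return Q
```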

Now the problem becomes one of selecting the explanatory variables among $ \{ \beta_1^{\top } X, \cdots, \beta_p^{\top }X \} $. However, because $ \beta_1, \cdots, \beta_p $ are unknown, we have to replace the $ \beta_k $'s by their estimators, the $ \hat \beta_k $'s. Since we have proved that the $ \hat \beta_k $'s converge faster than the nonparametric function estimators, this replacement is justified.

Let $ \hat a_{d0,j}, \hat a_{d1,j}, \cdots, \hat a_{dd,j} $ be the minimizers of

$\displaystyle \sum_{i=1,\ i \neq j}^n\{ y_i - a_{d0,j} - a_{d1,j}\hat \beta_1^{\top }(X_i - X_j) - \cdots - a_{dd,j}\hat \beta_d^{\top }(X_i - X_j) \}^2 K^{(i,j)}_{d,h},$     (3.35)

where $ K^{(i,j)}_{d,h} = K_{h}(\hat \beta_1^{\top }(X_i - X_j), \cdots, \hat \beta_d^{\top }(X_i - X_j)) $. Let
$\displaystyle CV(d) = n^{-1}\sum_{j=1}^n \{ y_j - \hat a_{d0,j}\}^2,\quad d = 1,\cdots, p.$

We estimate the number of e.d.r. directions by
$\displaystyle \hat d = \textrm{arg}\min_{1\le d\le p} CV(d).$
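To make the criterion concrete, the following minimal Python sketch computes $ CV(d) $ by a leave-one-out local linear fit and returns $ \hat d $. It is an illustration under stated assumptions rather than a definitive implementation: `B_hat` is taken to be the $ p\times p $ matrix whose columns are the ordered direction estimates, a product Gaussian kernel with a common bandwidth `h` stands in for $ K_h $, and the function name is ours.

```python
import numpy as np

def cv_number_of_directions(X, y, B_hat, h):
    """Leave-one-out cross-validation estimate of the number of
    e.d.r. directions: hat d = argmin_d CV(d)."""
    n, p = X.shape
    cv = np.empty(p)
    for d in range(1, p + 1):
        Z = X @ B_hat[:, :d]                 # hat beta_k^T X_i, k = 1..d
        sq_resid = np.empty(n)
        for j in range(n):
            U = Z - Z[j]                     # hat beta_k^T (X_i - X_j)
            w = np.exp(-0.5 * np.sum((U / h) ** 2, axis=1))  # product Gaussian K_h
            w[j] = 0.0                       # delete observation j (i != j)
            # local linear fit: minimize the weighted sum of squares (3.35)
            D_mat = np.hstack([np.ones((n, 1)), U])
            sw = np.sqrt(w)
            coef, *_ = np.linalg.lstsq(D_mat * sw[:, None], y * sw, rcond=None)
            sq_resid[j] = (y[j] - coef[0]) ** 2   # (y_j - hat a_{d0,j})^2
        cv[d - 1] = sq_resid.mean()               # CV(d)
    return int(np.argmin(cv)) + 1, cv
```

In practice the bandwidth `h` would itself be chosen by cross-validation; a single value is used here only to keep the sketch short.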

THEOREM 3.4   Suppose that assumptions (C1)-(C6) (in Appendix 3.9) hold. Then, under model (3.4), we have
$\displaystyle \lim_{n\to \infty} P(\hat d = D ) = 1.$      

In theory, we ought to select the explanatory variables among all possible combinations of $ \{ \beta_1^{\top } X, \cdots, \beta_p^{\top }X \} $. In practice, however, because $ \{ \hat \beta_1, \cdots, \hat \beta_d \} $ have been ordered according to their contributions (see the algorithm in the next section), we need only calculate $ CV(d),\ d = 1, \cdots, p, $ and compare their values.

After determining the number of directions, which is usually less than $ p$, we can then search for the e.d.r. directions in a lower-dimensional space, thereby reducing the effect of high dimensionality and improving the accuracy of the estimation. Denote the corresponding estimate of $ B_0 $ by $ \hat B: p\times \hat d $. Let

$\displaystyle \tilde w_{ij} = K_h(\hat B^{\top }(X_i - X_j))\Big/\sum_{\ell=1}^n K_h(\hat B^{\top }(X_\ell - X_j)).$     (3.36)

Re-estimate $ B_0 $ by the minimization in (3.18) with the weights $ \tilde w_{ij} $ replacing $ w_{ij} $. By an abuse of notation, we denote the new estimator of $ B_0 $ by $ \hat B $ as well. Replace $ \hat B $ in (3.36) by this latest $ \hat B $ and estimate $ B_0 $ again. Repeat this procedure until $ \hat B $ converges, and let $ \tilde B $ denote the final estimator. We call it the refined MAVE (rMAVE) estimator.
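The refinement loop admits a short Python sketch. The weight computation follows (3.36), with a multidimensional Gaussian kernel as a stand-in for $ K_h $; `mave_step`, which would carry out the minimization in (3.18), is a hypothetical placeholder supplied by the user. Convergence is checked on the projection matrix $ \hat B\hat B^{\top } $, since the directions are identified only up to rotation.

```python
import numpy as np

def refined_weights(X, B_hat, h):
    """Weights (3.36), computed on the reduced space spanned by B_hat."""
    Z = X @ B_hat                              # hat B^T X_i
    diff = Z[:, None, :] - Z[None, :, :]       # hat B^T (X_i - X_j)
    K = np.exp(-0.5 * np.sum((diff / h) ** 2, axis=2))  # Gaussian stand-in for K_h
    return K / K.sum(axis=0, keepdims=True)    # normalize over i for each j

def rmave(X, y, B_init, h, mave_step, tol=1e-6, max_iter=100):
    """Refined MAVE: alternate between updating the weights (3.36) and
    re-estimating B_0 until the estimate converges.
    `mave_step(X, y, w)` is a placeholder for the minimization in (3.18)."""
    B = B_init
    for _ in range(max_iter):
        w = refined_weights(X, B, h)
        B_new = mave_step(X, y, w)
        # compare spans rather than raw matrices: B_0 is identified
        # only up to an orthogonal rotation of its columns
        if np.linalg.norm(B_new @ B_new.T - B @ B.T) < tol:
            return B_new
        B = B_new
    return B                                   # tilde B, the rMAVE estimator
```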

THEOREM 3.5   Suppose that (C1)-(C6) (in Appendix 3.9) hold. Assume that model (3.4) is true and that $ {\sl g}(\cdot) $ has derivatives of all orders. Let $ r = 1 $. If $ nh^D/\log n \to \infty $ and $ h \to 0 $, then
$\displaystyle \Vert(I-\tilde B\tilde B^{\top })B_0\Vert = O_P(h^3 + h\delta_n + h^{-1}\delta_n^2+n^{-1/2}).$

To illustrate, we apply this procedure to models (3.20) and (3.21) to assess the improvement achieved by the rMAVE method. The mean absolute estimation error is plotted against the bandwidth in Figure 3.6. The improvement is substantial.

Figure 3.6: (a) and (b) show the simulation results for model (3.20) and model (3.21) respectively. The curves are the mean estimation errors of the rMAVE method. The dashed lines mark the smallest mean estimation errors attained by MAVE over all possible choices of bandwidth. The asterisks indicate the errors obtained when the bandwidth is chosen by the cross-validation method.
\includegraphics[width=1.2\defpicwidth]{d_g2.ps}

As a special case, we can estimate the direction in a single-index model by the rMAVE method (with $ r = 1 $). Root-$ n$ consistency can be achieved even if we use a bandwidth $ h \sim n^{-1/5} $. A similar result was obtained by Härdle, Hall, and Ichimura (1993) by minimizing the sum of squared residuals simultaneously with respect to the direction and the bandwidth.