As we have discussed above, the e.d.r. directions can be obtained
from the relevant outer product of gradients. Further, the
proposed OPG method can achieve root-$n$ consistency. Unlike the
SIR method, the OPG method does not need strong assumptions on the
design and can be used for more complicated models.
However, its estimators still suffer from poor performance when a
high-dimensional kernel is used in (3.12). We now
discuss how to improve the OPG method.
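As a purely illustrative sketch of the OPG idea, the following code fits a local linear regression at each sample point, collects the estimated gradients, and extracts the leading eigenvectors of their average outer product. The function name, the Gaussian kernel, and the toy single-index model are our own choices, not part of the method's formal specification.

```python
import numpy as np

def opg_directions(X, y, h, d):
    """OPG sketch: a local linear fit at each X_j gives a gradient
    estimate b_j; the e.d.r. directions are the leading eigenvectors
    of the average outer product (1/n) * sum_j b_j b_j^T."""
    n, p = X.shape
    S = np.zeros((p, p))
    for j in range(n):
        diff = X - X[j]                                     # (n, p)
        w = np.exp(-0.5 * np.sum((diff / h) ** 2, axis=1))  # Gaussian kernel
        Z = np.hstack([np.ones((n, 1)), diff])              # local linear design
        G = Z.T * w                                         # weighted design
        beta = np.linalg.solve(G @ Z, G @ y)                # (1 + p,) coefficients
        S += np.outer(beta[1:], beta[1:]) / n               # gradient outer product
    eigvals, eigvecs = np.linalg.eigh(S)
    return eigvecs[:, ::-1][:, :d]                          # top-d eigenvectors

# Toy single-index data: y depends on X only through beta0^T X.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
beta0 = np.array([1.0, 1.0, 0.0, 0.0]) / np.sqrt(2)
y = (X @ beta0) ** 2 + 0.1 * rng.normal(size=200)
B = opg_directions(X, y, h=0.8, d=1)
```

The recovered column of `B` should be close, up to sign, to the true direction `beta0`.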
Note that all the existing methods adopt two separate steps to
estimate the directions. First estimate the regression function
and then estimate the directions based on the estimated regression
function. See for example Hall (1984), Härdle and Stoker (1989),
Carroll et al. (1997) and the OPG method above. It
is therefore not surprising that the performance of the direction
estimator suffers from the bias problem in nonparametric
estimation. Härdle, Hall, and Ichimura (1993) noticed this point
and estimated the bandwidth and the directions simultaneously in
a single-index model by minimizing the sum of squares of the
residuals. They further showed that the bandwidth that is optimal
for the estimation of the regression function in the sense of MISE
enables the estimator of the direction to achieve root-$n$ consistency.
Inspired by this, we propose to estimate the directions by
minimizing the mean of the conditional variance simultaneously
with respect to the regression function and the directions. As we
shall see, results similar to those of Härdle, Hall, and Ichimura (1993)
can be obtained, and an improvement over the OPG method achieved.
In this subsection, we investigate the relation between the
conditional variance and the e.d.r. directions. The idea was
proposed by Xia et al. (2002).
Consider model (3.4). For any matrix $B$ with $B^\top B = I$, the
conditional variance of $y$ given $B^\top X$ is
$$\sigma_B^2(B^\top X) = E\big\{[y - E(y \mid B^\top X)]^2 \mid B^\top X\big\}. \tag{3.13}$$
A local linear approximation of $E(y \mid B^\top X)$ at $X_j$ gives the sample version
$$\hat\sigma_B^2(B^\top X_j) = \sum_{i=1}^n \big[y_i - a_j - b_j^\top B^\top (X_i - X_j)\big]^2 w_{ij}, \tag{3.14}$$
where the weights satisfy $w_{ij} \ge 0$ and $\sum_{i=1}^n w_{ij} = 1$. Moreover,
$$E\big\{\sigma_B^2(B^\top X)\big\} = E\big\{y - E(y \mid B^\top X)\big\}^2 \ \ge\ E\big\{y - E(y \mid X)\big\}^2, \tag{3.15}$$
with equality when the columns of $B$ span the space of the e.d.r. directions.
Based on (3.5), (3.13), and (3.15),
we can estimate the e.d.r. directions by solving the following
minimization problem:
$$\min_{B:\,B^\top B = I}\ \sum_{j=1}^n \hat\sigma_B^2(B^\top X_j) = \min_{B:\,B^\top B = I}\ \sum_{j=1}^n \sum_{i=1}^n \big[y_i - a_j - b_j^\top B^\top (X_i - X_j)\big]^2 w_{ij}, \tag{3.16}$$
where the minimization is taken over $B$ and over the local
parameters $a_j$ and $b_j$, $j = 1, \dots, n$. The MAVE method, i.e. the
minimization in (3.16), can be seen as a combination of
nonparametric function estimation and direction estimation, which
minimizes (3.16) simultaneously with respect to the
directions and the nonparametric regression function. As we shall
see, we benefit from this simultaneous minimization.
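One way to carry out such a simultaneous minimization is to alternate between the local linear coefficients and the directions, since each subproblem is a weighted least-squares problem. The sketch below is our own illustration: the Gaussian kernel, the initialization, the iteration count, and the toy model are all assumptions, not prescriptions of the method.

```python
import numpy as np

def mave(X, y, d, h, n_iter=10):
    """Alternate between (i) local linear fits of y on B^T(X_i - X_j)
    and (ii) a weighted least-squares update of the directions B."""
    n, p = X.shape
    B = np.eye(p)[:, :d]                 # simple initial directions
    for _ in range(n_iter):
        a, bmat, W = np.empty(n), np.empty((n, d)), np.empty((n, n))
        for j in range(n):
            dj = (X - X[j]) @ B          # reduced predictors, (n, d)
            w = np.exp(-0.5 * np.sum((dj / h) ** 2, axis=1))
            W[j] = w / w.sum()           # normalized kernel weights
            Z = np.hstack([np.ones((n, 1)), dj])
            G = Z.T * W[j]
            coef = np.linalg.solve(G @ Z, G @ y)
            a[j], bmat[j] = coef[0], coef[1:]
        # Update vec(B): the residual y_i - a_j is linear in B through
        # b_j^T B^T (X_i - X_j) = kron(b_j, X_i - X_j)^T vec(B).
        M = np.zeros((p * d, p * d))
        r = np.zeros(p * d)
        for j in range(n):
            C = np.kron(bmat[j][None, :], X - X[j])   # (n, p*d)
            Cw = C * W[j][:, None]
            M += Cw.T @ C
            r += Cw.T @ (y - a[j])
        B = np.linalg.qr(np.linalg.solve(M, r).reshape(p, d, order='F'))[0]
    return B

# Toy single-index model with a monotone link.
rng = np.random.default_rng(3)
X = rng.normal(size=(150, 4))
beta0 = np.array([1.0, 1.0, 0.0, 0.0]) / np.sqrt(2)
y = (X @ beta0) ** 3 + 0.1 * rng.normal(size=150)
B = mave(X, y, d=1, h=0.6)
```

Each pass solves two weighted least-squares problems, so the objective is non-increasing across the two alternating steps.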
Note that the weights $w_{ij}$ may depend on $B$. Therefore,
implementing the minimization in (3.16) is non-trivial. The weight
$w_{ij}$ in (3.14) should be chosen such that (3.14) is a good
approximation of the conditional variance $\sigma_B^2(B^\top X_j)$;
roughly, $w_{ij}$ should be large only when the difference between
$B^\top X_i$ and $B^\top X_j$ is small. Next, we give two choices of
$w_{ij}$.
(1) Multi-dimensional kernel. To simplify (3.16), a
natural choice is $w_{ij} = K_h(X_i - X_j)\big/\sum_{l=1}^n K_h(X_l - X_j)$,
where $K_h$ is a multi-dimensional kernel with bandwidth $h$. If our
primary interest is dimension reduction, this
multi-dimensional kernel does not slow down the convergence rate in
the estimation of the e.d.r. directions. This was first
observed by Härdle and Stoker (1989); see also Theorem 3.1.
For such weights, the right-hand side of (3.16) does not
tend to $E\{\sigma_B^2(B^\top X)\}$. However, we have
$$\frac{1}{n}\sum_{j=1}^n\sum_{i=1}^n \big[y_i - a_j - b_j^\top B^\top (X_i - X_j)\big]^2 w_{ij} \approx \frac{1}{n}\sum_{j=1}^n \sigma^2(X_j) + \frac{c_K h^2}{n}\sum_{j=1}^n \big\|(I - BB^\top)\nabla g(X_j)\big\|^2,$$
for a kernel-dependent constant $c_K > 0$, where $\nabla g$ denotes the gradient of the regression function.
Note that $I - BB^\top$ is a projection matrix, so the bias term on
the right-hand side above is asymptotically non-negative, and it
vanishes when the columns of $B$ span the space of the e.d.r.
directions. Therefore, by the law of large numbers and Lemma 3.4,
the minimization problem (3.16) depends mainly on this bias term.
The root-$n$ consistency of the MAVE estimator of the e.d.r.
directions, using local polynomial smoothing of sufficiently large
order, can also be proved.
Besides the difference between the MAVE method and the other
methods stated at the beginning of this section, we need to
address another difference between the multi-dimensional kernel
MAVE method and the OPG method or the other existing estimation
methods. The MAVE method uses the common e.d.r. directions as
prior information. Therefore, it can be expected that the MAVE
method outperforms the OPG method as well as the other existing
methods. In order not to be distracted by the complexity of the
expressions arising in high-order local polynomial methods, we now
focus on the case of local linear fitting.
Note that the convergence rate of the MAVE direction estimator
obtained with the bandwidth that is optimal for the regression
function estimation in the sense of MISE is faster than the rate
for the other methods. The latter rate is also the convergence
rate for the local linear estimator of the function itself. As far
as we know, if we use the MISE-optimal bandwidth for the
nonparametric function estimation, then for all the non-MAVE
methods the convergence rate for the estimators of the directions
is the same as that for the estimators of the nonparametric
functions. As a typical illustration, consider the ADE method and
the single-index model $y = g(\beta^\top X) + \varepsilon$. The
direction $\beta$ can be estimated, up to a constant of
proportionality, by averaging nonparametric estimates of the
gradient of the regression function over the sample points.
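The averaging just described can be sketched as follows; the kernel, bandwidth, and the toy monotone single-index model are our own illustrative assumptions.

```python
import numpy as np

def ade(X, y, h):
    """Average derivative sketch: estimate the gradient of the
    regression function by a local linear fit at each sample point,
    then average the gradients and normalize to unit length."""
    n, p = X.shape
    total = np.zeros(p)
    for j in range(n):
        diff = X - X[j]
        w = np.exp(-0.5 * np.sum((diff / h) ** 2, axis=1))  # Gaussian kernel
        Z = np.hstack([np.ones((n, 1)), diff])
        G = Z.T * w
        total += np.linalg.solve(G @ Z, G @ y)[1:]          # local gradient
    b = total / n
    return b / np.linalg.norm(b)

# Monotone link, so the average derivative is a nonzero multiple of beta0.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
beta0 = np.array([0.6, 0.8, 0.0, 0.0])
y = np.tanh(X @ beta0) + 0.1 * rng.normal(size=200)
bhat = ade(X, y, h=0.8)
```

Note that a monotone (or at least asymmetric) link is needed here: for a link symmetric about 0 the average derivative is close to zero and the direction is not identified by this estimator.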
(2) Inverse regression weight. If $y$ and the index
have an approximate one-to-one correspondence, then we can use
$y_i - y_j$ instead of $X_i - X_j$
to produce the weights. As an example, suppose the model is a
single-index model and the link function is invertible. Then we may choose
$$w_{ij} = K_h(y_i - y_j)\Big/\sum_{l=1}^n K_h(y_l - y_j). \tag{3.24}$$
This may be considered an alternative derivation of the SIR
method. The extension of (3.24) to more than one direction
can be stated as follows. Suppose that the first $k$ directions
have been calculated and are denoted by $\hat\beta_1, \dots, \hat\beta_k$
respectively. To obtain the $(k+1)$th direction, we
need to perform the minimization
$$\min_{\beta:\ \|\beta\| = 1,\ \beta^\top \hat\beta_l = 0,\ l = 1, \dots, k}\ \sum_{j=1}^n \sum_{i=1}^n \big[y_i - a_j - b_j\,\beta^\top (X_i - X_j)\big]^2 w_{ij}, \tag{3.25}$$
with $w_{ij}$ as in (3.24).
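For comparison, the classical slicing implementation of SIR can be sketched as below; the slice count, the toy model, and the Gaussian design are our own illustrative choices.

```python
import numpy as np

def sir(X, y, n_slices, d):
    """SIR sketch: slice the data by y, form the covariance of the
    slice means of the standardized predictors, and take its leading
    eigenvectors transformed back to the original scale."""
    n, p = X.shape
    evals, evecs = np.linalg.eigh(np.cov(X.T))
    root_inv = evecs @ np.diag(evals ** -0.5) @ evecs.T   # Sigma^{-1/2}
    Z = (X - X.mean(0)) @ root_inv                        # standardized X
    M = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Z[idx].mean(0)                                # slice mean
        M += (len(idx) / n) * np.outer(m, m)
    _, v = np.linalg.eigh(M)
    B = root_inv @ v[:, ::-1][:, :d]                      # top-d directions
    return B / np.linalg.norm(B, axis=0)

# Invertible link, so y and the index are in one-to-one correspondence.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4))
beta0 = np.array([1.0, 0.0, 1.0, 0.0]) / np.sqrt(2)
y = np.exp(X @ beta0) + 0.2 * rng.normal(size=300)
B = sir(X, y, n_slices=10, d=1)
```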
The result is similar to that of Zhu and Fang (1996). However, in
our simulations the method based on the minimization in
(3.25) always outperforms the SIR method. To
illustrate, we adopt the examples used in Li (1991).
As noticed previously, the assumption of symmetry on the design
can be a handicap as far as applications of the SIR method are
concerned. Interestingly, simulations show that the SIR method
sometimes works for independent data even when the
assumption of symmetry is violated. However, for time series
data, we find that the SIR method often fails. As a typical
illustration, consider the nonparametric time series model
(3.28).
For the given sample size, we draw 200 samples from model
(3.28). Using the SIR method with different numbers of slices,
the mean of the estimation errors is plotted in Figure 3.5(b). The
estimation is quite poor. However, the iMAVE estimation gives
much better results, and the MAVE method is better still.
Now, we make a comparison between the MAVE method and the iMAVE
method (or the SIR method). Besides the fact that the MAVE method
is applicable to an asymmetric design, the MAVE method
has better performance than the iMAVE (or SIR) method
for the above models and for all the other simulations we have
done. We even tried the same models with higher dimensionality.
All our simulation results show that the iMAVE method
performs better than the SIR method, and the MAVE method performs
better than both of them. Intuitively, we might expect the iMAVE
method and the SIR method to benefit from the use of
one-dimensional kernels, unlike the MAVE method, which uses a
multi-dimensional kernel. However, if the regression function
is symmetric about 0, then the SIR method and the iMAVE
method usually fail to find the directions. Furthermore, any
fluctuation in the regression function may reduce the efficiency
of the estimation. (To overcome the effect of symmetry in the
regression function, Li (1992) used a third-moment method to
estimate the Hessian matrix. This method has, however, a larger
variance in practice.) Another reason, in addition to Theorem 3.2,
why the iMAVE method and the SIR method perform poorly may be
seen by comparing the objective function of the MAVE method with
that of the iMAVE method.
Consider model (3.8). Note that its directions are the solution of
a minimization problem of the same form as (3.16).
As in the nonparametric case, the choice of the weights $w_{ij}$ is
important. However, the model is now more complicated. Even if the
link functions are monotonic, we cannot guarantee a one-to-one
correspondence between $y$ and the indices. Therefore a possible
choice is the multi-dimensional kernel, i.e.
$w_{ij} = K_h(X_i - X_j)\big/\sum_{l=1}^n K_h(X_l - X_j)$. To improve
the accuracy, we can also use higher-order local polynomial
smoothing, which reduces the bias in the local expansion of the
regression function.
Now, returning to the general model (3.2), suppose the regression
function is differentiable. By (3.33), for $x$ close to $x_0$ the
regression function admits a local linear (first-order Taylor)
approximation, which again leads to a minimization of the form
(3.16).