3.5 The Algorithm

In this section we first give some guidance on the selection of the bandwidth, the estimation of the e.d.r. directions and the determination of their number. We then give an algorithm for calculating the e.d.r. directions.

We first standardize the original data. Write $ X_i = (x_{i1}, \cdots, x_{ip})^{\top }$, $ i = 1, 2, \cdots, n$. Let $ \bar X = (\bar x_1, \cdots, \bar x_p)^{\top } = n^{-1}\sum_{i=1}^n X_i $ and replace each component by its standardized value, $ x_{ik} := ( x_{ik} - \bar x_k) / \{ n^{-1} \sum_{l=1}^n ( x_{lk} - \bar x_k)^2 \}^{1/2} $, $ k = 1, 2, \cdots, p $, $ i = 1, 2, \cdots, n$. We use the cross-validation method to select the bandwidth $ h$. For each $ j$, let $ \hat a_{h,j} $ and $ \hat b_{h,j}$ be the minimizers, computed with the $ j$th observation left out, of

$\displaystyle \min_{a_{h, j},\, b_{h, j}} n^{-1}\sum_{i \ne j} \{ y_i - a_{h,j} - b_{h,j}^{\top }(X_i - X_j)\}^2 K_{h, i}(X_j).$

The bandwidth is then chosen as
$\displaystyle h_{cv} = \textrm{arg}\min_{h} \sum_{j=1}^n \{ y_j - \hat a_{h,j} \}^2.$      
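
To make the procedure concrete, the following is a minimal Python sketch of the standardization and the cross-validated bandwidth search, assuming a Gaussian kernel; the function names, the candidate bandwidth grid and the kernel choice are illustrative assumptions, not part of the original text.

import numpy as np

def standardize(X):
    """Center each column and scale to unit variance (divisor n)."""
    Xc = X - X.mean(axis=0)
    return Xc / np.sqrt((Xc ** 2).mean(axis=0))

def cv_bandwidth(X, y, bandwidths):
    """Leave-one-out CV: pick h minimizing sum_j (y_j - a_hat_{h,j})^2,
    where a_hat_{h,j} is the local linear intercept fitted at X_j."""
    n, p = X.shape
    best_h, best_cv = None, np.inf
    for h in bandwidths:
        cv = 0.0
        for j in range(n):
            D = X - X[j]                                        # rows X_i - X_j
            wgt = np.exp(-0.5 * (D ** 2).sum(axis=1) / h ** 2)  # Gaussian kernel (an assumption)
            wgt[j] = 0.0                                        # leave the j-th point out
            Z = np.hstack([np.ones((n, 1)), D])                 # design for (a_{h,j}, b_{h,j})
            A = Z.T @ (wgt[:, None] * Z)                        # weighted normal equations
            coef = np.linalg.lstsq(A, Z.T @ (wgt * y), rcond=None)[0]
            cv += (y[j] - coef[0]) ** 2                         # coef[0] = a_hat_{h,j}
        if cv < best_cv:
            best_h, best_cv = h, cv
    return best_h

In practice one would first apply standardize to the data matrix and then pass a grid of candidate bandwidths to cv_bandwidth.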

With the bandwidth $ h_{cv} $, we now proceed to the calculation of the e.d.r. directions via the minimization (3.16). By (3.17) and Theorem 3.2, we can estimate the e.d.r. directions using the backfitting method. To save space, we give the details for model (3.4) to illustrate the general idea. For any $ d $, take $ {{\cal B}}= (\beta_1, \cdots, \beta_d) $ with $ \beta_1= \beta_2=\cdots=\beta_d = 0 $ as the initial value, and write $ {{\cal B}}_{l,k} = (\beta_1, \cdots, \beta_{k-1}) $ and $ {{\cal B}}_{r,k} = (\beta_{k+1},\cdots, \beta_d)$, $ k = 1, 2, \cdots, d$. Minimize
    $\displaystyle S_{n,k} = \sum_{j=1}^n \sum_{i=1}^n \Big[ y_i - a_j - (X_i - X_j)^{\top } ({{\cal B}}_{l,k},\, b,\, {{\cal B}}_{r,k}) \left( \begin{array}{c} c_j\\ d_j\\ e_j \end{array}\right)\Big]^2 w_{ij}$
    $\displaystyle \textrm{subject to}:\quad {{\cal B}}^{\top }_{l,k} b = 0 \textrm{ and } {{\cal B}}^{\top }_{r,k} b = 0,$

where $ c_j $ is a $ (k-1)\times1$ vector, $ d_j $ a scalar and $ e_j $ a $ (d-k)\times 1$ vector. This is a typical constrained quadratic programming problem. See, for example, Rao (1973, p. 232). Let
    $\displaystyle C_j = \sum_{i=1}^n w_{ij} (X_i - X_j),\quad D_j = \sum_{i=1}^n w_{ij} (X_i - X_j)(X_i - X_j)^{\top },$  
    $\displaystyle E_j = \sum_{i=1}^n w_{ij} y_i, \quad F_j = \sum_{i=1}^n w_{ij} (X_i - X_j)y_i.$  
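
These are just kernel-weighted local moments of the data around $ X_j$. A minimal Python sketch of their computation, assuming the kernel weights $ w_{ij}$ are available as an $ n\times n$ array w with w[i, j] corresponding to $ w_{ij}$ (the function name and array layout are illustrative assumptions):

import numpy as np

def local_moments(X, y, w, j):
    """Compute C_j, D_j, E_j, F_j from the kernel weights w[i, j]."""
    Dx = X - X[j]                       # rows X_i - X_j
    wj = w[:, j]
    C_j = Dx.T @ wj                     # sum_i w_ij (X_i - X_j)
    D_j = Dx.T @ (wj[:, None] * Dx)     # sum_i w_ij (X_i - X_j)(X_i - X_j)^T
    E_j = wj @ y                        # sum_i w_ij y_i
    F_j = Dx.T @ (wj * y)               # sum_i w_ij (X_i - X_j) y_i
    return C_j, D_j, E_j, F_j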

With $ b$ given, the $ (a_j, c_j, d_j, e_j)$ that minimizes $ S_{n,k} $ is given by
$\displaystyle \left(\begin{array}{c} a_j\\ c_j \\ d_j\\ e_j\end{array}\right)
= \left( \begin{array}{cc}
\sum_{i=1}^n w_{ij} & C_j^{\top } ({{\cal B}}_{l,k}, b, {{\cal B}}_{r,k}) \\
({{\cal B}}_{l,k}, b, {{\cal B}}_{r,k})^{\top } C_j & ({{\cal B}}_{l,k}, b, {{\cal B}}_{r,k})^{\top } D_j ({{\cal B}}_{l,k}, b, {{\cal B}}_{r,k})
\end{array} \right)^{-1}
\times \left( \begin{array}{c} E_j\\ ({{\cal B}}_{l,k}, b, {{\cal B}}_{r,k})^{\top } F_j\end{array} \right),$     (3.37)

$ j = 1,\cdots, n. $ If $ a_j $, $ c_j $, $ d_j $ and $ e_j $ are given, then the $ b$ that minimizes $ S_{n,k} $ subject to the constraints is given by
$\displaystyle \left(\begin{array}{c}b\\ \lambda\end{array}\right)
= \left(\begin{array}{cc}
\sum_{j=1}^n d_j^2 D_j & \tilde {{\cal B}}_k \\
\tilde {{\cal B}}_k^{\top } & 0
\end{array}\right)^{+}
\left(\begin{array}{c}
\sum_{j=1}^n d_j \{ F_j - a_j C_j - D_j \tilde {{\cal B}}_k \left(\begin{array}{c}c_j\\ e_j\end{array}\right)\}\\
0
\end{array}\right),$     (3.38)

where $ \tilde {{\cal B}}_k = ({{\cal B}}_{l,k}, {{\cal B}}_{r,k}) $, $ \lambda $ is the vector of Lagrange multipliers for the two constraints, and $ A^+ $ denotes the Moore-Penrose inverse of a matrix $ A$. Therefore, we can minimize $ S_{n,k} $ iteratively as follows.
0. Initialize $ {{\cal B}}= 0 $ and $ k = 1$.
1. Initialize $ b$ such that $ {{\cal B}}_{l,k}^{\top } b= 0$, $ {{\cal B}}_{r,k}^{\top } b= 0$ and $ \Vert b\Vert = 1$, and repeat the following two steps:
   a. calculate $ (a_j, c_j, d_j, e_j)$ as in (3.37);
   b. calculate $ b$ as in (3.38) and set $ b := b/\Vert b\Vert $.
2. Replace $ {{\cal B}}$ by $ ({{\cal B}}_{l,k}, b, {{\cal B}}_{r,k}) $; set $ k := k+1 $ if $ k+1\le d$ and $ k := 1 $ otherwise; return to step 1.

Repeat steps 1 and 2 until convergence is obtained.
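
For illustration, here is a minimal Python sketch of steps 0-2 above, reusing local_moments from the previous sketch; the number of sweeps, the inner iteration count, the random initialization of $ b$ and the convergence tolerance are all illustrative choices rather than prescriptions of the text.

import numpy as np

def refine_directions(X, y, w, d, n_sweeps=50, tol=1e-6):
    """Cyclically update each column of B by alternating (3.37) and (3.38)."""
    n, p = X.shape
    rng = np.random.default_rng(0)
    B = np.zeros((p, d))                          # step 0: B = 0
    moments = [local_moments(X, y, w, j) for j in range(n)]
    for _ in range(n_sweeps):                     # repeat steps 1 and 2
        B_old = B.copy()
        for k in range(d):                        # cycle over the d columns
            B_l, B_r = B[:, :k], B[:, k + 1:]
            B_tilde = np.hstack([B_l, B_r])       # (B_l, B_r), p x (d-1)
            b = rng.standard_normal(p)            # step 1: b orthogonal to B_tilde
            if B_tilde.size:
                b -= B_tilde @ (np.linalg.pinv(B_tilde) @ b)
            b /= np.linalg.norm(b)
            for _ in range(10):                   # alternate steps a and b
                M = np.hstack([B_l, b[:, None], B_r])   # (B_l, b, B_r)
                G = np.zeros((p, p))              # accumulates sum_j d_j^2 D_j
                r = np.zeros(p)                   # accumulates the RHS of (3.38)
                for j, (C_j, D_j, E_j, F_j) in enumerate(moments):
                    # step a, (3.37): weighted LS for (a_j, c_j, d_j, e_j) given b
                    top = np.concatenate([[w[:, j].sum()], C_j @ M])
                    bot = np.hstack([(M.T @ C_j)[:, None], M.T @ D_j @ M])
                    A = np.vstack([top, bot])
                    theta = np.linalg.pinv(A) @ np.concatenate([[E_j], M.T @ F_j])
                    a_j, d_jc = theta[0], theta[1 + k]
                    ce = np.concatenate([theta[1:1 + k], theta[2 + k:]])  # (c_j, e_j)
                    G += d_jc ** 2 * D_j
                    r += d_jc * (F_j - a_j * C_j - D_j @ (B_tilde @ ce))
                # step b, (3.38): constrained LS for b via the Moore-Penrose inverse
                m = B_tilde.shape[1]
                K = np.block([[G, B_tilde], [B_tilde.T, np.zeros((m, m))]])
                sol = np.linalg.pinv(K) @ np.concatenate([r, np.zeros(m)])
                b = sol[:p] / np.linalg.norm(sol[:p])
            B[:, k] = b                           # step 2: replace column k
        if np.linalg.norm(B - B_old) < tol:       # until convergence is obtained
            break
    return B

Each inner pass solves (3.37) for all $ j$ with $ b$ fixed and then one constrained least-squares problem (3.38) for $ b$, so no derivatives of the unknown functions are ever needed.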

The above algorithm is quite efficient. Note that, unlike the Newton-Raphson method, the search for the minimum does not require the derivatives of the unknown functions, which are hard to estimate; see, for example, Weisberg and Welsh (1994). Indeed, the subsequent numerical examples suggest that the above algorithm has a wider domain of convergence than algorithms based on the Newton-Raphson method.