18.3 Sliced Inverse Regression

Sliced inverse regression (SIR) is a dimension reduction method proposed by Duan and Li (1991). The idea is to find a smooth regression function that operates on only a few projections of the explanatory variables. Given a response variable $Y$ and a (random) vector $X \in \mathbb{R}^p$ of explanatory variables, SIR is based on the model:

\begin{displaymath}
Y \ = \ m(\beta_1^{\top}X,\dots,\beta_k^{\top}X,\varepsilon),
\end{displaymath} (18.10)

where $\beta_{1},\dots,\beta_{k}$ are unknown projection vectors, $k$ is unknown and assumed to be less than $p$, $m:\mathbb{R}^{k+1}\to \mathbb{R}$ is an unknown function, and $\varepsilon$ is the noise random variable with $\textrm{E}\left(\varepsilon\!\mid\!X\right)=0$.

Model (18.10) describes the situation where the response variable $Y$ depends on the $p$-dimensional variable $X$ only through a $k$-dimensional subspace. The unknown $\beta_{i}$'s, which span this space, are called effective dimension reduction directions (EDR-directions). Their span is called the effective dimension reduction space (EDR-space). The aim is to estimate basis vectors of this space; neither their length nor their direction can be identified, only the space in which they lie is identifiable.

SIR tries to find this $k$-dimensional subspace of $\mathbb{R}^p$ which, under the model (18.10), carries the essential information of the regression between $X$ and $Y$. SIR also focuses on small $k$, so that nonparametric methods can be applied for the estimation of $m$. For high dimension $p$, a direct application of nonparametric smoothing to $X$ is generally not possible due to the sparseness of the observations. This fact is well known as the curse of dimensionality, see Huber (1985).

The name of SIR comes from computing the inverse regression (IR) curve. That means instead of looking for $\textrm{E}\left(Y\!\mid\!X=x\right)$, we investigate $\textrm{E}\left(X\!\mid\!Y=y\right)$, a curve in $\mathbb{R}^p$ consisting of $p$ one-dimensional regressions. What is the connection between the IR and the SIR model (18.10)? The answer is given in the following theorem from Li (1991).

THEOREM 18.1   Given the model (18.10) and the assumption
\begin{displaymath}
\forall b \in \mathbb{R}^p :\ \textrm{E}\left(b^{\top}X \!\mid\!\beta_1^{\top} X = \beta_1^{\top} x,\dots,\beta_k^{\top} X = \beta_k^{\top} x\right) \ = \ c_0 + \sum_{i=1}^k c_i\beta_i^{\top} x ,
\end{displaymath} (18.11)

the centered IR curve $\textrm{E}(X\!\mid\!Y=y) - \textrm{E}(X)$ lies in the linear subspace spanned by the vectors $\Sigma\beta_i$, $i=1,\dots,k$, where $\Sigma=\Cov(X)$.

Assumption (18.11) is equivalent to the fact that $X$ has an elliptically symmetric distribution, see Cook and Weisberg (1991). Hall and Li (1993) have shown that assumption (18.11) only needs to hold for the EDR-directions.
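As an illustration, the following display sketches why (18.11) holds in the special case of a normal $X$ (an assumption made here only for illustration; the matrix $B=(\beta_1,\dots,\beta_k)$ is introduced for this check and $B^{\top}\Sigma B$ is assumed to be invertible). For $X\sim N(\mu,\Sigma)$,

\begin{displaymath}
\textrm{E}\left(b^{\top}X \!\mid\! B^{\top}X = B^{\top}x\right) \ = \ b^{\top}\mu + b^{\top}\Sigma B \left(B^{\top}\Sigma B\right)^{-1} B^{\top}(x-\mu),
\end{displaymath}

which is indeed of the required form $c_0 + \sum_{i=1}^k c_i\beta_i^{\top}x$.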

It is easy to see that for the standardized variable $Z=\Sigma^{-1/2}\{X-\textrm{E}(X)\}$ the IR curve $m_1(y)= \textrm{E}(Z\!\mid\!Y=y)$ lies in $\textrm{span}(\eta_{1},\dots,\eta_{k})$, where $\eta_{i}=\Sigma^{1/2}\beta_{i}$. This means that the conditional expectation $m_1(y)$ is moving in $\textrm{span}(\eta_{1},\dots,\eta_{k})$ depending on $y$. With $b$ orthogonal to $\textrm{span}(\eta_{1},\dots,\eta_{k})$, it follows that

\begin{displaymath}b^{\top}m_1(y) \ = \ 0, \end{displaymath}

and further that

\begin{displaymath}m_1(y)m_1(y)^{\top} b \ = \
\Cov\{m_1(y)\}b \ = \ 0. \end{displaymath}

As a consequence, $\Cov\{\textrm{E}(Z\!\mid\!y)\}$ is degenerate in each direction orthogonal to all EDR-directions $\eta_{i}$ of $Z$. This suggests the following algorithm.

First, estimate $\Cov\{m_1(y)\}$ and then compute its eigenvalue/eigenvector decomposition. In practice, the estimated covariance matrix will typically have full rank because of random variability, estimation errors and numerical imprecision. Therefore, we investigate the eigenvalues of the estimate and ignore eigenvectors with small eigenvalues. The remaining eigenvectors $\hat\eta_{i}$ are estimates for the EDR-directions $\eta_{i}$ of $Z$. We can rescale them to estimates $\hat\beta_{i}$ for the EDR-directions of $X$ by multiplying by $\hat\Sigma^{-1/2}$, but then they are not necessarily orthogonal. SIR is strongly related to PCA: if each observation is put into its own slice, so that $\widehat{\Cov}\{m_1(y)\}$ equals $\widehat{\Cov}(Z)$, SIR coincides with PCA of the standardized data. Obviously, in this case any information about $y$ is ignored.


The SIR Algorithm

The algorithm to estimate the EDR-directions via SIR is as follows (a code sketch of these steps is given after Remark 18.1):
  1. Standardize $x$:

    \begin{displaymath}
z_i \ = \ \hat\Sigma^{-1/2} (x_i - \bar x) .
\end{displaymath}

  2. Divide the range of $y_{i}$ into $S$ nonoverlapping intervals (slices) $H_s$, $s=1,\dots, S$. $n_s$ denotes the number of observations within slice $H_s$, and ${\boldsymbol{I}}_{H_s}$ the indicator function for this slice:

    \begin{displaymath}
n_s \ = \ \sum_{i=1}^n {\boldsymbol{I}}_{H_s}(y_i).
\end{displaymath}

  3. Compute the mean of the $z_{i}$ within each slice. These slice means form a crude estimate $\widehat{m}_1$ of the inverse regression curve $m_1$:

    \begin{displaymath}
\bar{z}_s \ = \ \frac{1}{n_s} \sum_{i=1}^n z_i \
{\boldsymbol{I}}_{H_s}(y_i) .
\end{displaymath}

  4. Calculate the estimate for $\Cov\{m_1(y)\}$:

    \begin{displaymath}
\widehat V \ = \ n^{-1} \sum_{s=1}^S n_s \bar z_s \bar{z}_s^{\top} .
\end{displaymath}

  5. Identify the eigenvalues $\hat \lambda_i$ and eigenvectors $\hat
\eta_i$ of $\widehat V$.
  6. Transform the standardized EDR-directions $\hat
\eta_i$ back to the original scale. Now the estimates for the EDR-directions are given by

    \begin{displaymath}
\hat \beta_i \ = \ \hat \Sigma^{-1/2} \hat \eta_i.
\end{displaymath}

REMARK 18.1   The number of eigenvalues unequal to zero depends on the number of slices. The rank of $\widehat V$ cannot be greater than $S-1$ (since the $z_{i}$ sum up to zero, so do the weighted slice means $n_s\bar z_s$). This is a problem for categorical response variables, especially for a binary response, where only one direction can be found.
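The steps above translate almost directly into code. The following is a minimal sketch in Python/NumPy, not the original XploRe quantlet: the function name sir, the use of equally sized slices built from the order of $y$, and the particular eigendecomposition routine are choices made here for illustration.

\begin{verbatim}
import numpy as np

def sir(x, y, n_slices=10):
    """Minimal sketch of the SIR algorithm (steps 1-6)."""
    n, p = x.shape
    # Step 1: standardize x with the inverse square root of the sample covariance
    evals, evecs = np.linalg.eigh(np.cov(x, rowvar=False))
    sigma_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    z = (x - x.mean(axis=0)) @ sigma_inv_sqrt
    # Step 2: slice the observations by the order of y (equal-count slices)
    slices = np.array_split(np.argsort(y), n_slices)
    # Steps 3-4: slice means of z and weighted covariance matrix of these means
    v_hat = np.zeros((p, p))
    for idx in slices:
        z_bar = z[idx].mean(axis=0)          # crude estimate of m_1 in the slice
        v_hat += len(idx) * np.outer(z_bar, z_bar) / n
    # Step 5: eigenvalues/eigenvectors of V-hat, sorted in decreasing order
    lam, eta = np.linalg.eigh(v_hat)
    lam, eta = lam[::-1], eta[:, ::-1]
    # Step 6: rescale the standardized directions back to the original scale
    beta = sigma_inv_sqrt @ eta
    return lam, beta
\end{verbatim}

The columns of the returned matrix are the estimated EDR-directions $\hat\beta_{i}$, ordered by the size of the corresponding eigenvalues.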


SIR II

In the previous section we learned that it is interesting to consider the IR curve, that is, $\textrm{E}(X\!\mid\!y)$. In some situations, however, SIR does not find the EDR-directions. We overcome this difficulty by considering the conditional covariance $\Cov(X\!\mid\!y)$ instead of the IR curve. An example where the EDR-directions are not found via the IR curve is given below.

EXAMPLE 18.2   Suppose that $(X_{1},X_{2})^{\top} \sim N(0, {\data{I}}_{2}) $ and $Y=X_{1}^2$. Then $E(X_{2}\!\mid\!y) = 0$ because of independence and $E(X_{1} \!\mid\!y) = 0$ because of symmetry. Hence, the EDR-direction $\beta=(1,0)^{\top}$ is not found when the IR curve $\textrm{E}(X \!\mid\!y)=0$ is considered.

The conditional variance

\begin{displaymath}\Var(X_{1} \!\mid\!Y=y) = \textrm{E}(X_{1}^2\!\mid\!Y=y) = y, \end{displaymath}

offers an alternative way to find $\beta$. It is a function of $y$ while $\Var(X_{2}\!\mid\!y)$ is a constant.
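A small simulation makes this visible (a sketch only; the sample size, the seed and the five equal-count slices are arbitrary choices): the slice means of $X_{1}$ stay close to zero, while the slice variances increase with $y$.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.standard_normal(1000)
y = x1 ** 2                     # the model of Example 18.2

# slice by y and compare slice means and slice variances of x1
for idx in np.array_split(np.argsort(y), 5):
    print(f"mean(x1 | slice) = {x1[idx].mean():+.2f},  "
          f"var(x1 | slice) = {x1[idx].var():.2f}")
\end{verbatim}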

The idea of SIR II is to consider the conditional covariances. The principle of SIR II is the same as before: investigation of the inverse regression, here via the conditional covariance instead of the conditional expectation. Unfortunately, the theory of SIR II is more complicated. The assumption of elliptical symmetry of $X$ has to be strengthened, i.e., to normality of $X$.

Given this assumption, one can show that the directions $b$ in which $\Cov(Z\!\mid\!Y=y) -\textrm{E}\{\Cov(Z\!\mid\!Y=y)\}$ deviates most strongly from zero (over all $y$) are the most interesting for the EDR-space. An appropriate measure for the overall mean deviation is, according to Li (1992),

    $\displaystyle \textrm{E}\left(\left\vert\left\vert\left[\Cov(Z\!\mid\!Y=y) - \textrm{E}\{\Cov(Z\!\mid\!Y=y)\}\right] b\right\vert\right\vert^2\right)$ (18.12)
    $\displaystyle \qquad = \ b^{\top}\,\textrm{E}\left(\left[\Cov(Z\!\mid\!y) - \textrm{E}\{\Cov(Z\!\mid\!y)\}\right]^2\right) b .$ (18.13)

Equipped with this distance, we again conduct an eigensystem decomposition, this time of the matrix $\textrm{E}\left(\left[\Cov(Z\!\mid\!y) - \textrm{E}\{\Cov(Z\!\mid\!y)\}\right]^2\right)$. Then we take the rescaled eigenvectors with the largest eigenvalues as estimates for the unknown EDR-directions.


The SIR II Algorithm

The algorithm of SIR II is very similar to that of SIR; it differs in only two steps. Instead of merely computing the mean, the covariance of each slice has to be computed. The estimate for the expectation (18.12) is calculated from all slice covariances. Finally, decomposition and rescaling are conducted as before; a code sketch follows the steps.

  1. Do steps 1 to 3 of the SIR algorithm.
  2. Compute the slice covariance matrix $\widehat V_{s}$:

    \begin{displaymath}
\widehat V_{s} \ = \ \frac{1}{n_{s}-1} \left\{ \sum_{i=1}^n {\boldsymbol{I}}_{H_{s}}(y_{i})\, z_{i}z_{i}^{\top}
- n_{s}\bar z_{s}\bar z_{s}^{\top} \right\} . \end{displaymath}

  3. Calculate the mean over all slice covariances:

    \begin{displaymath}\bar V = \frac{1}{n} \sum_{s=1}^S n_{s} \widehat V_{s}. \end{displaymath}

  4. Compute an estimate for (18.12):

    \begin{displaymath}
\widehat V \ = \ \frac{1}{n} \sum_{s=1}^S n_{s}\left(\widehat V_{s} - \bar V\right)^2
\ = \ \frac{1}{n} \sum_{s=1}^S n_{s}\widehat V_{s}^2 - \bar V^2 . \end{displaymath}

  5. Identify the eigenvectors and eigenvalues of $\widehat V$ and scale back the eigenvectors. This gives estimates for the SIR II EDR-directions:

    \begin{displaymath}\hat\beta_{i} = \hat\Sigma^{-1/2}\hat\eta_{i}. \end{displaymath}
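As for SIR, the steps can be sketched in a few lines of Python/NumPy (again a sketch, not the original quantlet; the function name sir2 and the equal-count slicing are choices made here). It reuses the standardization and slicing of the SIR sketch and replaces the mean-based matrix by the estimate of (18.12).

\begin{verbatim}
import numpy as np

def sir2(x, y, n_slices=10):
    """Minimal sketch of the SIR II algorithm."""
    n, p = x.shape
    # steps 1-3 of SIR: standardize and slice
    evals, evecs = np.linalg.eigh(np.cov(x, rowvar=False))
    sigma_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    z = (x - x.mean(axis=0)) @ sigma_inv_sqrt
    slices = np.array_split(np.argsort(y), n_slices)
    # slice covariance matrices V_s and their weighted mean V-bar
    v_s = [np.cov(z[idx], rowvar=False) for idx in slices]
    w = np.array([len(idx) for idx in slices]) / n
    v_bar = sum(wi * vi for wi, vi in zip(w, v_s))
    # estimate of (18.12): weighted mean of the squared deviations (V_s - V-bar)^2
    v_hat = sum(wi * (vi - v_bar) @ (vi - v_bar) for wi, vi in zip(w, v_s))
    # eigendecomposition and rescaling back to the original scale
    lam, eta = np.linalg.eigh(v_hat)
    lam, eta = lam[::-1], eta[:, ::-1]
    return lam, sigma_inv_sqrt @ eta
\end{verbatim}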

EXAMPLE 18.3   The result of SIR is visualized in four plots in Figure 18.6: the two left plots show the response variable versus the first and the second estimated direction, respectively. The upper right plot is a three-dimensional plot of the first two directions and the response. The last picture shows $\hat\Psi_{k}$, the ratio of the sum of the first $k$ eigenvalues to the sum of all eigenvalues, similar to principal component analysis.

The data are generated according to the following model:

\begin{displaymath}y_{i} = \beta_{1}^{\top}x_{i} + (\beta_{1}^{\top}x_{i})^3 +
4 \left(\beta_{2}^{\top}x_{i}\right)^2 + \varepsilon_{i}, \end{displaymath}

where the $x_{i}$'s follow a three-dimensional normal distribution with zero mean and covariance equal to the identity matrix, $\beta_{2}=(1,-1,-1)^{\top}$, and $\beta_{1}=(1,1,1)^{\top}$. $\varepsilon_{i}$ is standard normally distributed and $n=300$. Corresponding to model (18.10), $m(u,v,\varepsilon) = u+u^3+4v^2+\varepsilon$. The situation is depicted in Figure 18.4 and Figure 18.5.

Figure 18.4: Plot of the true response versus the true indices. The monotonic and the convex shapes can be clearly seen. MVAsirdata.xpl
\includegraphics[width=1\defpicwidth]{MVAsirdata1.ps}

Figure 18.5: Plot of the true response versus the true indices. The monotonic and the convex shapes can be clearly seen. MVAsirdata.xpl
\includegraphics[width=1\defpicwidth]{MVAsirdata2.ps}

Both algorithms were conducted using the slicing method with $20$ elements in each slice. The goal was to find $\beta_{1}$ and $\beta _{2}$ with SIR. The data are designed such that SIR can detect $\beta_{1}$ because of the monotonic shape of $\{\beta_{1}^{\top}x_{i} + (\beta_{1}^{\top}x_{i})^3\}$, while SIR II will search for $\beta _{2}$, since in this direction the conditional variance varies with $y$.
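Such a run can be reproduced with the two sketches above (a sketch under the stated model; the seed and the helper cosine are arbitrary choices, and with $n=300$ and 20 observations per slice the number of slices is 15):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
n = 300
beta1 = np.array([1.0, 1.0, 1.0])
beta2 = np.array([1.0, -1.0, -1.0])
x = rng.standard_normal((n, 3))
u, v = x @ beta1, x @ beta2
y = u + u ** 3 + 4 * v ** 2 + rng.standard_normal(n)

lam1, b1 = sir(x, y, n_slices=15)    # SIR: should pick up beta1
lam2, b2 = sir2(x, y, n_slices=15)   # SIR II: should pick up beta2

def cosine(a, b):
    """Absolute normalized inner product between two directions."""
    return abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

print("SIR:    cos(b1_hat, beta1) =", round(cosine(b1[:, 0], beta1), 3))
print("SIR II: cos(b1_hat, beta2) =", round(cosine(b2[:, 0], beta2), 3))
\end{verbatim}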


Table 18.3: SIR: EDR-directions for simulated data.

\begin{tabular}{rrr}
$\hat\beta_{1}$ & $\hat\beta_{2}$ & $\hat\beta_{3}$ \\
\hline
$0.578$ & $-0.723$ & $-0.266$ \\
$0.586$ & $0.201$ & $0.809$ \\
$0.568$ & $0.661$ & $-0.524$ \\
\end{tabular}


Figure 18.6: SIR: The left plots show the response versus the estimated EDR-directions. The upper right plot is a three-dimensional plot of the first two directions and the response. The lower right plot shows the eigenvalues $\hat \lambda_i$ ($\ast$) and the cumulative sum ($\circ$). MVAsirdata.xpl
\includegraphics[width=1\defpicwidth]{MVAsirdata3.ps}

If we normalize the eigenvalues for the EDR-directions in Table 18.3 such that they sum up to one, the resulting vector is $(0.852, 0.086, 0.062)$. As can be seen in the upper left plot of Figure 18.6, there is a functional relationship found between the first index $\hat\beta_{1}^{\top}x$ and the response. Actually, $\beta_{1}$ and $\hat\beta_{1}$ are nearly parallel, that is, the normalized inner product $\hat\beta_{1}^{\top}\beta_{1}/ \{\vert\vert\hat\beta_{1}\vert\vert\vert\vert\beta_{1}\vert\vert\} = 0.9894$ is very close to one.

The second direction, along $\beta _{2}$, is only vaguely recovered: SIR does not provide it clearly, because it is ``blind'' with respect to changes of the conditional variance, as the small second eigenvalue indicates.

For SIR II, the normalized eigenvalues are $(0.706, 0.185, 0.108)$, that is, about 71% of the variance is explained by the first EDR-direction (Table 18.4). Here, the normalized inner product of $\beta _{2}$ and $\hat\beta_{1}$ is $0.9992$; the estimator $\hat\beta_1$ in fact estimates the $\beta_2$ of the simulated model. In this case, SIR II has found the direction where the second moment varies with respect to $\beta_{2}^{\top}x$.

Figure 18.7: SIR II mainly sees the direction $\beta _{2}$. The left plots show the response versus the estimated EDR-directions. The upper right plot is a three-dimensional plot of the first two directions and the response. The lower right plot shows the eigenvalues $\hat \lambda_i$ ($\ast$) and the cumulative sum ($\circ$). MVAsir2data.xpl
\includegraphics[width=1\defpicwidth]{MVAsir2data.ps}


Table 18.4: SIR II: EDR-directions for simulated data.

\begin{tabular}{rrr}
$\hat\beta_{1}$ & $\hat\beta_{2}$ & $\hat\beta_{3}$ \\
\hline
$0.821$ & $0.180$ & $0.446$ \\
$-0.442$ & $-0.826$ & $0.370$ \\
$-0.361$ & $-0.534$ & $0.815$ \\
\end{tabular}


In summary, SIR has found the direction which shows a strong relation regarding the conditional expectation between $\beta_{1}^{\top}x$ and $y$, and SIR II has found the direction where the conditional variance is varying, namely, $\beta_{2}^{\top}x$.

The behavior of the two SIR algorithms is as expected. In addition, we have seen that it is worthwhile to apply both versions of SIR. It is possible to combine SIR and SIR II (Schott, 1994; Li, 1991; Cook and Weisberg, 1991) directly, or to investigate higher conditional moments. For the latter it seems to be difficult to obtain theoretical results. For further details on SIR see Kötter (1996).

Summary
$\ast$
SIR serves as a dimension reduction tool for regression problems.
$\ast$
Inverse regression avoids the curse of dimensionality.
$\ast$
The dimension reduction can be conducted without estimation of the regression function $y=m(x)$.
$\ast$
SIR searches for the effective dimension reduction (EDR) directions by computing the inverse regression (IR) curve.
$\ast$
SIR II bases the estimation of the EDR-directions on the conditional variance of the inverse regression.
$\ast$
SIR might miss EDR directions that are found by SIR II.