# 18.3 Sliced Inverse Regression

Sliced inverse regression (SIR) is a dimension reduction method proposed by Duan and Li (1991). The idea is to find a smooth regression function that operates on a small set of projections. Given a response variable $y$ and a (random) vector $x \in \mathbb{R}^p$ of explanatory variables, SIR is based on the model:

$$ y = m(\beta_1^{\top} x, \ldots, \beta_k^{\top} x) + \varepsilon \qquad (18.10) $$

where $\beta_1, \ldots, \beta_k$ are unknown projection vectors, $k$ is unknown and assumed to be less than $p$, $m : \mathbb{R}^k \to \mathbb{R}$ is an unknown function, and $\varepsilon$ is the noise random variable with $E(\varepsilon \mid x) = 0$.

Model (18.10) describes the situation where the response variable $y$ depends on the $p$-dimensional variable $x$ only through a $k$-dimensional subspace. The unknown $\beta_i$'s, which span this space, are called effective dimension reduction directions (EDR-directions). Their span is denoted the effective dimension reduction space (EDR-space). The aim is to estimate the base vectors of this space, for which neither the length nor the direction can be identified. Only the space in which they lie is identifiable.

SIR tries to find this $k$-dimensional subspace of $\mathbb{R}^p$ which under the model (18.10) carries the essential information of the regression between $x$ and $y$. SIR also focuses on small $k$, so that nonparametric methods can be applied to estimate $m$. A direct application of nonparametric smoothing to $x$ is in high dimensions generally not possible due to the sparseness of the observations. This fact is well known as the curse of dimensionality; see Huber (1985).

The name of SIR comes from computing the inverse regression (IR) curve. That means instead of looking for $E(y \mid x)$, we investigate $E(x \mid y)$, a curve in $\mathbb{R}^p$ consisting of $p$ one-dimensional regressions. What is the connection between the IR curve and the SIR model (18.10)? The answer is given in the following theorem from Li (1991).

THEOREM 18.1   Given the model (18.10) and the assumption

$$ \forall b \in \mathbb{R}^p: \quad E(b^{\top} x \mid \beta_1^{\top} x, \ldots, \beta_k^{\top} x) = c_0 + \sum_{i=1}^{k} c_i \beta_i^{\top} x, \qquad (18.11) $$

the centered IR curve $E(x \mid y) - E(x)$ lies in the linear subspace spanned by the vectors $\Sigma \beta_i$, $i = 1, \ldots, k$, where $\Sigma = \operatorname{Cov}(x)$.

Assumption (18.11) is equivalent to the fact that $x$ has an elliptically symmetric distribution; see Cook and Weisberg (1991). Hall and Li (1993) have shown that assumption (18.11) only needs to hold for the EDR-directions.

It is easy to see that for the standardized variable $z = \Sigma^{-1/2}\{x - E(x)\}$ the IR curve $m_1(y) = E(z \mid y)$ lies in $\operatorname{span}(\eta_1, \ldots, \eta_k)$, where $\eta_i = \Sigma^{1/2} \beta_i$. This means that the conditional expectation $m_1(y)$ is moving in $\operatorname{span}(\eta_1, \ldots, \eta_k)$ depending on $y$. With $b$ orthogonal to $\operatorname{span}(\eta_1, \ldots, \eta_k)$, it follows that

$$ b^{\top} m_1(y) = 0, $$

and further that

$$ \operatorname{Cov}\{m_1(y)\}\, b = E\{m_1(y)\, m_1(y)^{\top}\}\, b = 0. $$

As a consequence, $\operatorname{Cov}\{E(z \mid y)\}$ is degenerate in each direction orthogonal to all EDR-directions $\eta_i$ of $z$. This suggests the following algorithm.

First, estimate $\operatorname{Cov}\{m_1(y)\}$ and then calculate the orthogonal directions of this matrix (for example, with an eigenvalue/eigenvector decomposition). In general, the estimated covariance matrix will have full rank because of random variability, estimation errors and numerical imprecision. Therefore, we investigate the eigenvalues of the estimate and ignore eigenvectors with small eigenvalues. These eigenvectors are estimates for the EDR-directions $\eta_i$ of $z$. We can easily rescale them to estimates for the EDR-directions $\beta_i$ of $x$ by multiplying by $\Sigma^{-1/2}$, but then they are not necessarily orthogonal. SIR is strongly related to PCA: if all of the data fall into a single interval, so that the estimate of $\operatorname{Cov}\{m_1(y)\}$ equals the estimated covariance of $z$, SIR coincides with PCA. Obviously, in this case any information about $y$ is ignored.

## The SIR Algorithm

The algorithm to estimate the EDR-directions via SIR is as follows:
1. Standardize $x$:
   $$ z_i = \hat{\Sigma}^{-1/2} (x_i - \bar{x}). $$
2. Divide the range of $y$ into $S$ nonoverlapping intervals (slices) $H_s$, $s = 1, \ldots, S$. $n_s$ denotes the number of observations within slice $H_s$, and $I_{H_s}$ the indicator function for this slice:
   $$ n_s = \sum_{i=1}^{n} I_{H_s}(y_i). $$
3. Compute the mean of the $z_i$ over all slices. This is a crude estimate $\hat{m}_1$ for the inverse regression curve $m_1$:
   $$ \bar{z}_s = \frac{1}{n_s} \sum_{i=1}^{n} z_i\, I_{H_s}(y_i). $$
4. Calculate the estimate for $\operatorname{Cov}\{m_1(y)\}$:
   $$ \hat{V} = \frac{1}{n} \sum_{s=1}^{S} n_s\, \bar{z}_s \bar{z}_s^{\top}. $$
5. Identify the eigenvalues $\hat{\lambda}_i$ and eigenvectors $\hat{\eta}_i$ of $\hat{V}$.
6. Transform the standardized EDR-directions back to the original scale. Now the estimates for the EDR-directions are given by
   $$ \hat{\beta}_i = \hat{\Sigma}^{-1/2} \hat{\eta}_i. $$

REMARK 18.1   The number of eigenvalues unequal to zero depends on the number of slices: the rank of $\hat{V}$ cannot be greater than $S - 1$, since the weighted slice means sum up to zero, $\sum_{s=1}^{S} n_s \bar{z}_s = 0$. This is a problem for categorical response variables, especially for a binary response, where only one direction can be found.
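The algorithm above can be sketched in a few lines of NumPy. This is an illustrative sketch, not code from the text; the function name `sir`, the equal-count slicing along the sorted response, and the parameter defaults are our own choices.

```python
import numpy as np

def sir(x, y, n_slices=10, k=2):
    """Sliced inverse regression: estimate k EDR-directions from (x, y)."""
    n, p = x.shape
    # Step 1: standardize, z = Sigma^{-1/2} (x - mean)
    w, u = np.linalg.eigh(np.cov(x, rowvar=False))
    sqrt_inv = u @ np.diag(w ** -0.5) @ u.T
    z = (x - x.mean(axis=0)) @ sqrt_inv
    # Step 2: equal-count slices along the sorted response
    order = np.argsort(y)
    # Steps 3-4: slice means and V = (1/n) sum_s n_s zbar_s zbar_s^T
    v = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        zbar = z[idx].mean(axis=0)
        v += len(idx) / n * np.outer(zbar, zbar)
    # Step 5: eigen-decomposition (eigh sorts ascending, so reverse)
    lam, eta = np.linalg.eigh(v)
    lam, eta = lam[::-1], eta[:, ::-1]
    # Step 6: rescale to the original x-scale, beta_i = Sigma^{-1/2} eta_i
    beta = sqrt_inv @ eta[:, :k]
    return beta, lam
```

Consistent with Remark 18.1, at most `n_slices - 1` of the returned eigenvalues are substantially different from zero.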

## SIR II

In the previous section we learned that it is interesting to consider the IR curve, that is, $E(x \mid y)$. In some situations, however, SIR does not find the EDR-direction. We overcome this difficulty by considering the conditional covariance $\operatorname{Cov}(x \mid y)$ instead of the IR curve. An example where the EDR-directions are not found via the IR curve is given below.

EXAMPLE 18.2   Suppose that $(x_1, x_2)^{\top} \sim N(0, I_2)$ and $y = x_1^2$. Then $E(x_2 \mid y) = 0$ because of independence, and $E(x_1 \mid y) = 0$ because of symmetry. Hence, the EDR-direction $\beta = (1, 0)^{\top}$ is not found when the IR curve $E(x \mid y)$ is considered.

The conditional variance

$$ \operatorname{Var}(x_1 \mid y) = E(x_1^2 \mid y) = y $$

offers an alternative way to find the EDR-direction: it is a function of $y$, while $\operatorname{Var}(x_2 \mid y)$ is a constant.
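This degeneracy is easy to check numerically. The following small simulation (our own sketch; the sample size and the choice of 5 slices are arbitrary) slices $y = x_1^2$ and compares slice-wise means and variances of $x_1$.

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.standard_normal(100_000)
x2 = rng.standard_normal(100_000)
y = x1 ** 2

# Slice y into 5 equal-count slices, then look at x1 within each slice.
order = np.argsort(y)
means, variances = [], []
for idx in np.array_split(order, 5):
    means.append(x1[idx].mean())
    variances.append(x1[idx].var())

# E(x1 | y) vanishes in every slice (symmetry), so SIR sees nothing ...
print(np.round(means, 2))
# ... while Var(x1 | y) clearly grows across slices, which SIR II exploits.
print(np.round(variances, 2))
```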

The idea of SIR II is to consider the conditional covariances. The principle of SIR II is the same as before: investigation of the inverse regression, here the conditional covariance $\operatorname{Cov}(z \mid y)$ instead of the conditional expectation. Unfortunately, the theory of SIR II is more complicated: the assumption of an elliptically symmetric distribution of $x$ has to be strengthened, i.e., we assume normality of $x$.

Given this assumption, one can show that the most interesting vectors $b$ for the EDR-space are those for which the distance of $\operatorname{Cov}(z \mid y)$ from its mean $E\{\operatorname{Cov}(z \mid y)\}$ is largest. An appropriate measure for the overall mean distance is, according to Li (1992),

$$ E\left( \left\| \left[ \operatorname{Cov}(z \mid y) - E\{\operatorname{Cov}(z \mid y)\} \right] b \right\|^2 \right) \qquad (18.12) $$
$$ = b^{\top}\, E\left( \left[ \operatorname{Cov}(z \mid y) - E\{\operatorname{Cov}(z \mid y)\} \right]^2 \right) b. \qquad (18.13) $$

Equipped with this distance, we conduct again an eigensystem decomposition, this time of the matrix $E\left( \left[ \operatorname{Cov}(z \mid y) - E\{\operatorname{Cov}(z \mid y)\} \right]^2 \right)$. Then we take the rescaled eigenvectors with the largest eigenvalues as estimates for the unknown EDR-directions.

## The SIR II Algorithm

The algorithm of SIR II is very similar to the one for SIR; it differs in only two steps. Instead of merely computing the slice means, the covariance of each slice has to be computed as well. The estimate for the expectation in (18.12) is calculated after computing all slice covariances. Finally, decomposition and rescaling are conducted, as before.

1. Do steps 1 to 3 of the SIR algorithm.
2. Compute the slice covariance matrix $\hat{V}_s$:
   $$ \hat{V}_s = \frac{1}{n_s - 1} \sum_{i=1}^{n} I_{H_s}(y_i)\, (z_i - \bar{z}_s)(z_i - \bar{z}_s)^{\top}. $$
3. Calculate the mean over all slice covariances:
   $$ \bar{V} = \frac{1}{n} \sum_{s=1}^{S} n_s \hat{V}_s. $$
4. Compute an estimate for (18.12):
   $$ \hat{M} = \frac{1}{n} \sum_{s=1}^{S} n_s (\hat{V}_s - \bar{V})^2. $$
5. Identify the eigenvectors and eigenvalues of $\hat{M}$ and scale back the eigenvectors. This gives estimates for the SIR II EDR-directions:
   $$ \hat{\beta}_i^{\mathrm{II}} = \hat{\Sigma}^{-1/2} \hat{\eta}_i. $$
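As with SIR, the procedure can be sketched in NumPy. Again this is an illustrative sketch with our own naming and equal-count slices, not the book's code.

```python
import numpy as np

def sir2(x, y, n_slices=10, k=1):
    """SIR II: eigen-decompose the averaged squared deviation of the
    slice covariances from their mean, cf. (18.12)."""
    n, p = x.shape
    # Steps 1-3 of SIR: standardize and slice
    w, u = np.linalg.eigh(np.cov(x, rowvar=False))
    sqrt_inv = u @ np.diag(w ** -0.5) @ u.T
    z = (x - x.mean(axis=0)) @ sqrt_inv
    order = np.argsort(y)
    slices = [z[idx] for idx in np.array_split(order, n_slices)]

    # Slice covariances V_s and their weighted mean V_bar
    covs = [np.cov(zs, rowvar=False) for zs in slices]
    weights = [len(zs) / n for zs in slices]
    v_bar = sum(ws * vs for ws, vs in zip(weights, covs))

    # Estimate of E[{Cov(z|y) - E Cov(z|y)}^2]
    m = sum(ws * (vs - v_bar) @ (vs - v_bar) for ws, vs in zip(weights, covs))

    # Eigen-decomposition and rescaling to the original x-scale
    lam, eta = np.linalg.eigh(m)
    lam, eta = lam[::-1], eta[:, ::-1]
    beta = sqrt_inv @ eta[:, :k]
    return beta, lam
```

Applied to data generated as in Example 18.2 (with an added independent coordinate), this recovers the direction along which the conditional variance varies.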

EXAMPLE 18.3   The result of SIR is visualized in four plots in Figure 18.6: the left two show the response variable versus the first and the second projection, respectively. The upper right plot is a three-dimensional plot of the first two projections and the response. The last picture shows the ratio of the sum of the first eigenvalues to the sum of all eigenvalues, similar to principal component analysis.

The data are generated according to a model of the form (18.10) with two EDR-directions $\beta_1$ and $\beta_2$, where the $x$'s follow a three-dimensional normal distribution with zero mean and covariance equal to the identity matrix, and the noise $\varepsilon$ is standard normally distributed. Corresponding to model (18.10), $k = 2$. The situation is depicted in Figure 18.4 and Figure 18.5.

Both algorithms were conducted using the slicing method with an equal number of elements in each slice. The goal was to find $\beta_1$ and $\beta_2$ with SIR. The data are designed such that SIR can detect $\beta_1$ because of the monotonic shape of $m$ in this direction, while SIR II will search for $\beta_2$, as in this direction the conditional variance is varying.

Table 18.3: SIR: EDR-directions for simulated data.

| $\hat{\beta}_1$ | $\hat{\beta}_2$ | $\hat{\beta}_3$ |
|------:|------:|------:|
| 0.578 | -0.723 | -0.266 |
| 0.586 | 0.201 | 0.809 |
| 0.568 | 0.661 | -0.524 |

If we normalize the eigenvalues for the EDR-directions in Table 18.3 such that they sum up to one, the first eigenvalue clearly dominates. As can be seen in the upper left plot of Figure 18.6, there is a functional relationship between the first index and the response. Actually, $\hat{\beta}_1$ and $\beta_1$ are nearly parallel, that is, their normalized inner product is very close to one.

The second direction along $\beta_2$ is probably found due to the good approximation, but SIR does not provide it clearly, because it is "blind" with respect to the change of variance, as the small second eigenvalue indicates.

For SIR II, about 69% of the variance is explained by the first EDR-direction (Table 18.4): the first normalized eigenvalue clearly dominates. Here, the normalized inner product of $\hat{\beta}_1^{\mathrm{II}}$ and $\beta_2$ is very close to one; the estimator in fact estimates the direction $\beta_2$ of the simulated model. In this case, SIR II found the direction where the second moment varies with respect to $\beta_2^{\top} x$.

Table 18.4: SIR II: EDR-directions for simulated data.

| $\hat{\beta}_1^{\mathrm{II}}$ | $\hat{\beta}_2^{\mathrm{II}}$ | $\hat{\beta}_3^{\mathrm{II}}$ |
|------:|------:|------:|
| 0.821 | 0.180 | 0.446 |
| -0.442 | -0.826 | 0.370 |
| -0.361 | -0.534 | 0.815 |

In summary, SIR has found the direction $\beta_1$ which shows a strong relation regarding the conditional expectation between $\beta_1^{\top} x$ and $y$, and SIR II has found the direction $\beta_2$ where the conditional variance is varying.

The behavior of the two SIR algorithms is as expected. In addition, we have seen that it is worthwhile to apply both versions of SIR. It is possible to combine SIR and SIR II directly (Schott, 1994; Li, 1991; Cook and Weisberg, 1991), or to investigate higher conditional moments. For the latter it seems to be difficult to obtain theoretical results. For further details on SIR see Kötter (1996).
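One naive way to combine the two procedures is to eigen-decompose a convex combination of the SIR matrix $\hat{V}$ and the SIR II matrix $\hat{M}$. This is only a sketch under our own assumptions: the weight `alpha` is an ad-hoc choice, not the combination studied in the cited papers.

```python
import numpy as np

def sir_combined(x, y, n_slices=10, k=2, alpha=0.5):
    """Eigen-decompose alpha * V_hat + (1 - alpha) * M_hat.

    V_hat is the SIR matrix (outer products of slice means), M_hat the
    SIR II matrix (squared deviations of slice covariances). Ad-hoc sketch.
    """
    n, p = x.shape
    w, u = np.linalg.eigh(np.cov(x, rowvar=False))
    sqrt_inv = u @ np.diag(w ** -0.5) @ u.T
    z = (x - x.mean(axis=0)) @ sqrt_inv

    order = np.argsort(y)
    v1 = np.zeros((p, p))          # SIR matrix
    covs, weights = [], []
    for idx in np.array_split(order, n_slices):
        zbar = z[idx].mean(axis=0)
        v1 += len(idx) / n * np.outer(zbar, zbar)
        covs.append(np.cov(z[idx], rowvar=False))
        weights.append(len(idx) / n)
    v_bar = sum(ws * vs for ws, vs in zip(weights, covs))
    v2 = sum(ws * (vs - v_bar) @ (vs - v_bar) for ws, vs in zip(weights, covs))

    lam, eta = np.linalg.eigh(alpha * v1 + (1 - alpha) * v2)
    beta = sqrt_inv @ eta[:, ::-1][:, :k]
    return beta, lam[::-1]
```

On data with one "mean" direction and one "variance" direction, the top two eigenvectors of the combined matrix span both, whereas each method alone would emphasize only one of them.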

## Summary

- SIR serves as a dimension reduction tool for regression problems.
- Inverse regression avoids the curse of dimensionality.
- The dimension reduction can be conducted without estimating the regression function $m$.
- SIR searches for the effective dimension reduction (EDR) space by computing the inverse regression curve $E(x \mid y)$.
- SIR II bases the EDR-directions on computing the conditional variance $\operatorname{Cov}(x \mid y)$.
- SIR might miss EDR-directions that are found by SIR II.