6.4 Linear Reduction of Explanatory Variables

Thus far, we have described dimension reduction methods for multidimensional data in which no distinction is made among the variables. However, we must sometimes analyze multidimensional data in which one variable is a response variable and the others are explanatory variables. Regression analysis is usually used for such data. Dimension reduction methods for the explanatory variables are introduced below.

6.4.1 Sliced Inverse Regression

Regression analysis is one of the fundamental methods used for data analysis. A response variable $ y$ is estimated by a function of the explanatory variables  $ {\boldsymbol{x}}$, a $ p$-dimensional vector. The immediate goal of ordinary regression analysis is to find this function of  $ {\boldsymbol{x}}$. When there are many explanatory variables in the data set, it is difficult to calculate the regression coefficients stably. One approach to reducing the number of explanatory variables is variable selection, on which there are many studies. Another approach is to project the explanatory variables onto a lower dimensional space from which the response variable can still be estimated well.

Sliced Inverse Regression (SIR), proposed by [11], is a method that reduces the explanatory variables by linear projection. SIR finds linear combinations of the explanatory variables that serve as a reduction for nonlinear regression. The original SIR algorithm, however, cannot derive suitable results for some artificial data with simple structures. Li also developed another algorithm, SIR2, which uses the expected conditional covariance $ E[{\text{cov}}({\boldsymbol{x}}\vert y)]$. However, SIR2 is also incapable of finding simple structures in other types of data.

We expect that projection pursuit can be used to find such linear combinations of the explanatory variables. A new SIR method based on projection pursuit (SIRpp) is described here, together with a numerical example of the proposed method.

6.4.2 Sliced Inverse Regression Model

SIR is based on the following model (the SIR model):

$\displaystyle y=f\left({\boldsymbol{\beta}}_1^{\top}{\boldsymbol{x}},{\boldsymbol{\beta}}_2^{\top}{\boldsymbol{x}},\ldots,{\boldsymbol{\beta}}_K^{\top}{\boldsymbol{x}}\right)+\varepsilon{},$ (6.7)

where $ {\boldsymbol{x}}$ is the vector of $ p$ explanatory variables, $ {\boldsymbol{\beta}}_k$ are unknown vectors, $ \varepsilon$ is independent of  $ {\boldsymbol{x}}$, and $ f$ is an arbitrary unknown function on  $ {\boldsymbol{R}}^K$.

The purpose of SIR is to estimate the vectors $ {\boldsymbol{\beta}}_k$ for which this model holds. If we obtain the $ {\boldsymbol{\beta}}_k$, we can reduce the dimension of $ {\boldsymbol{x}}$ from $ p$ to $ K$. Hereafter, we shall refer to any linear combination of the $ {\boldsymbol{\beta}}_k$ as an effective dimension reduction (e.d.r.) direction.

[11] proposed an algorithm for finding e.d.r. directions and named it SIR. However, we refer to this algorithm as SIR1 to distinguish it from the SIR model.

The main idea of SIR1 is to use $ E[{\boldsymbol{x}}\vert y]$. $ E[{\boldsymbol{x}}\vert y]$ is contained in the space spanned by the e.d.r. directions, but there is no guarantee that $ E[{\boldsymbol{x}}\vert y]$ spans that space. For example, as noted by Li, if $ (X_1,X_2) \sim N(0,I_2)$ and $ Y=X_1^2$, then $ E[X_1\vert y]=E[X_2\vert y]=0$, so the e.d.r. direction $ (1,0)^{\top}$ cannot be recovered from the inverse regression curve.
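This failure can be checked with a short simulation. The following sketch, with a sample size and slice count chosen only for illustration, approximates $ E[{\boldsymbol{x}}\vert y]$ by within-slice means and shows that both components stay close to zero, so slicing $ y$ carries no information about the direction $ (1,0)^{\top}$.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n, H = 100_000, 10

# (X1, X2) ~ N(0, I_2) and Y = X1^2: the e.d.r. direction is (1, 0).
x = rng.standard_normal((n, 2))
y = x[:, 0] ** 2

# Approximate E[x | y] by the within-slice means of x over H slices of y.
for idx in np.array_split(np.argsort(y), H):
    print(np.round(x[idx].mean(axis=0), 3))  # both components stay near zero
\end{verbatim}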

6.4.3 SIR Model and Non-Normality

Hereafter, it is assumed that the distribution of $ {\boldsymbol{x}}$ is the standard normal distribution: $ {\boldsymbol{x}} \sim N(0,I_p)$. If not, standardize $ {\boldsymbol{x}}$ by an affine transformation. In addition, $ {\boldsymbol{\beta}}_i^{\top}{\boldsymbol{\beta}}_j=\delta_{ij}$ $ (i,j=1,2,\ldots,K)$ is presumed without loss of generality. We can choose $ {\boldsymbol{\beta}}_i$ $ (i=K+1,\ldots,p)$ such that $ \{{\boldsymbol{\beta}}_i\}$ $ (i=1,\ldots,p)$ is an orthonormal basis of  $ {\boldsymbol{R}}^p$.

Since the distribution of $ {\boldsymbol{x}}$ is $ N(0,I_p)$, the distribution of $ ({\boldsymbol{\beta}}_1^{\top} {\boldsymbol{x}}, \ldots, {\boldsymbol{\beta}}_p^{\top} {\boldsymbol{x}})$ is also $ N(0,I_p)$. The density function of $ ({\boldsymbol{\beta}}_1^{\top} {\boldsymbol{x}},\ldots,{\boldsymbol{\beta}}_p^{\top} {\boldsymbol{x}}, y)$ is

$\displaystyle h\left({\boldsymbol{\beta}}_1^{\top} {\boldsymbol{x}},\ldots,{\boldsymbol{\beta}}_p^{\top} {\boldsymbol{x}}, y\right) =\phi\left({\boldsymbol{\beta}}_1^{\top} {\boldsymbol{x}}\right)\cdots\phi\left({\boldsymbol{\beta}}_p^{\top} {\boldsymbol{x}}\right) \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{\left(y-f\left({\boldsymbol{\beta}}_1^{\top} {\boldsymbol{x}},\ldots,{\boldsymbol{\beta}}_K^{\top} {\boldsymbol{x}}\right)\right)^2}{2\sigma^2}\right){},$

where $ \phi(t)=(1/\sqrt{2\pi})\exp{(-t^2/2)}$ is the standard normal density and we assume $ \varepsilon \sim N(0,\sigma^2)$.

The conditional density function given $ y$ is

$\displaystyle h\left({\boldsymbol{\beta}}_1^{\top} {\boldsymbol{x}},\ldots,{\boldsymbol{\beta}}_p^{\top} {\boldsymbol{x}}\mid y\right) =\phi\left({\boldsymbol{\beta}}_{K+1}^{\top} {\boldsymbol{x}}\right)\cdots\phi\left({\boldsymbol{\beta}}_p^{\top} {\boldsymbol{x}}\right) g\left({\boldsymbol{\beta}}_1^{\top} {\boldsymbol{x}},\ldots,{\boldsymbol{\beta}}_K^{\top} {\boldsymbol{x}}\right){},$

where $ g(\cdot)$ is a function of $ {\boldsymbol{\beta}}_1^{\top} {\boldsymbol{x}},\ldots,{\boldsymbol{\beta}}_K^{\top} {\boldsymbol{x}}$, which is not, in general, a normal density function.

Thus, $ h({\boldsymbol{\beta}}_1^{\top} {\boldsymbol{x}},\ldots,{\boldsymbol{\beta}}_p^{\top} {\boldsymbol{x}}\mid y)$ is separated into the normal distribution part $ \phi({\boldsymbol{\beta}}_{K+1}^{\top} {\boldsymbol{x}})\cdots \phi({\boldsymbol{\beta}}_p^{\top} {\boldsymbol{x}})$ and the non-normal distribution part $ g(\cdot)$.
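For completeness, here is a short derivation of this factorization (a sketch; $ h_Y$ denotes the marginal density of $ y$, a symbol not used above). Dividing the joint density by $ h_Y(y)$ and collecting every factor that involves only the first $ K$ projections (and $ y$) into $ g$ gives

\begin{align*}
h\bigl({\boldsymbol{\beta}}_1^{\top}{\boldsymbol{x}},\ldots,{\boldsymbol{\beta}}_p^{\top}{\boldsymbol{x}} \mid y\bigr)
  &= \frac{h\bigl({\boldsymbol{\beta}}_1^{\top}{\boldsymbol{x}},\ldots,{\boldsymbol{\beta}}_p^{\top}{\boldsymbol{x}}, y\bigr)}{h_Y(y)}\\
  &= \phi\bigl({\boldsymbol{\beta}}_{K+1}^{\top}{\boldsymbol{x}}\bigr)\cdots\phi\bigl({\boldsymbol{\beta}}_p^{\top}{\boldsymbol{x}}\bigr)\,
     \underbrace{\frac{\phi\bigl({\boldsymbol{\beta}}_1^{\top}{\boldsymbol{x}}\bigr)\cdots\phi\bigl({\boldsymbol{\beta}}_K^{\top}{\boldsymbol{x}}\bigr)}
     {\sqrt{2\pi}\,\sigma\, h_Y(y)}
     \exp\left(-\frac{\bigl(y-f({\boldsymbol{\beta}}_1^{\top}{\boldsymbol{x}},\ldots,{\boldsymbol{\beta}}_K^{\top}{\boldsymbol{x}})\bigr)^2}{2\sigma^2}\right)}_{g({\boldsymbol{\beta}}_1^{\top}{\boldsymbol{x}},\ldots,{\boldsymbol{\beta}}_K^{\top}{\boldsymbol{x}})}.
\end{align*}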

Projection pursuit is an excellent method for finding such non-normal parts, so we adopt it for SIR.

6.4.4 SIRpp Algorithm

Here we show the algorithm for the SIR model with projection pursuit (SIRpp). For the data $ (y_i,{\boldsymbol{x}}_i)~(i=1,2,\ldots,n)$, the algorithm is as follows (a code sketch is given after the list):

  1. Standardize $ {\boldsymbol{x}}$: $ \tilde{{\boldsymbol{x}}}_i=\hat{\Sigma}_{{\boldsymbol{x}}{\boldsymbol{x}}}^{-1/2}({\boldsymbol{x}}_i-\bar{{\boldsymbol{x}}})$ $ (i=1,2,\ldots,n)$, where $ \hat{\Sigma}_{{\boldsymbol{x}}{\boldsymbol{x}}}$ is the sample covariance matrix and $ \bar{{\boldsymbol{x}}}$ is the sample mean of  $ {\boldsymbol{x}}$.

  2. Divide the range of $ y$ into $ H$ slices, $ I_1,\ldots,I_H$.

  3. Conduct a projection pursuit in $ K$-dimensional space within each slice. The following $ H$ projections are obtained: $ ({\boldsymbol{\alpha}}_1^{(h)},\ldots,{\boldsymbol{\alpha}}_K^{(h)})$, $ (h=1,\ldots,H)$.

  4. Let the $ K$ largest eigenvectors of $ \hat{V}$ be $ \hat{{\boldsymbol{\eta}}}_k$ $ (k=1,\ldots,K)$, where $ \hat{V}=\sum_{h=1}^H w(h)\sum_{k=1}^K {\boldsymbol{\alpha}}_k^{(h)}{{\boldsymbol{\alpha}}_k^{(h)}}^{\top}$ and $ w(h)$ is the weight of slice $ h$. Output $ \hat{{\boldsymbol{\beta}}}_k=\hat{\Sigma}_{{\boldsymbol{x}}{\boldsymbol{x}}}^{-1/2}\hat{{\boldsymbol{\eta}}}_k$ $ (k=1,2,\ldots,K)$ as the estimates of the e.d.r. directions.
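A minimal sketch of this algorithm in Python is given below. It is not the original implementation: the projection pursuit step is stubbed in with a crude squared-excess-kurtosis index optimized by Nelder-Mead with simple deflation, and the slice weights $ w(h)$ are taken proportional to slice size; both are assumptions made only for illustration.

\begin{verbatim}
import numpy as np
from scipy.optimize import minimize

def pp_direction(z, start):
    """One crude projection-pursuit direction: maximize a simple
    non-normality index (squared excess kurtosis) of the projection z @ a."""
    def neg_index(a):
        a = a / np.linalg.norm(a)
        t = z @ a
        return -(np.mean(t ** 4) - 3.0) ** 2      # illustrative PP index
    res = minimize(neg_index, start, method="Nelder-Mead")
    return res.x / np.linalg.norm(res.x)

def sirpp(x, y, K=2, H=5, seed=0):
    n, p = x.shape
    rng = np.random.default_rng(seed)
    # Step 1: standardize x with the sample mean and covariance.
    xbar = x.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(x, rowvar=False))
    sig_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    z = (x - xbar) @ sig_inv_sqrt
    # Step 2: slice the range of y into H slices of roughly equal size.
    slices = np.array_split(np.argsort(y), H)
    # Step 3: projection pursuit in K-dimensional space within each slice.
    V = np.zeros((p, p))
    for idx in slices:
        zh, basis = z[idx], []
        w = len(idx) / n                          # assumed slice weight w(h)
        for _ in range(K):
            a = pp_direction(zh, rng.standard_normal(p))
            for b in basis:                       # deflate against earlier directions
                a -= (a @ b) * b
            a /= np.linalg.norm(a)
            basis.append(a)
            V += w * np.outer(a, a)
    # Step 4: K largest eigenvectors of V, mapped back to the original scale.
    evals, evecs = np.linalg.eigh(V)
    eta = evecs[:, np.argsort(evals)[::-1][:K]]
    return sig_inv_sqrt @ eta                     # columns estimate e.d.r. directions
\end{verbatim}

Applied to data of the form used in the next subsection, the columns of the returned matrix play the role of the estimated e.d.r. directions $ \hat{{\boldsymbol{\beta}}}_k$.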

6.4.5 Numerical Examples

Two multicomponent models are used:

$\displaystyle y=x_1(x_1+x_2+1)+\sigma\cdot\varepsilon{},$ (6.8)
$\displaystyle y=\sin(x_1)+\cos(x_2)+\sigma\cdot\varepsilon$ (6.9)

to generate $ n=400$ observations, where $ \sigma=0.5$. We first generate $ x_1, x_2$, and $ \varepsilon$ from $ N(0,1)$ and calculate the response variable $ y$ using (6.8) or (6.9). Eight further variables $ x_3,\ldots,x_{10}$, also generated from $ N(0,1)$, are added to the explanatory variables. The ideal e.d.r. directions are contained in the space spanned by the two vectors $ (1,0,\ldots,0)^{\top}$ and $ (0,1,\ldots,0)^{\top}$.
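A sketch of this data-generating process for model (6.8) might look as follows (variable names are illustrative; model (6.9) is obtained by swapping in the commented line):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
n, p, sigma = 400, 10, 0.5

x = rng.standard_normal((n, p))          # x1, ..., x10 i.i.d. N(0, 1)
eps = rng.standard_normal(n)
y = x[:, 0] * (x[:, 0] + x[:, 1] + 1) + sigma * eps    # model (6.8)
# y = np.sin(x[:, 0]) + np.cos(x[:, 1]) + sigma * eps  # model (6.9)

beta_hat = sirpp(x, y, K=2, H=10)        # sketch from the previous subsection
\end{verbatim}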

The squared multiple correlation coefficient between the projected variable $ {\boldsymbol{b}}^{\top}{\boldsymbol{x}}$ and the space $ B$ spanned by ideal e.d.r. directions:

$\displaystyle R^2({\boldsymbol{b}})=\max_{{\boldsymbol{\beta}}\in B} \frac{\left({\boldsymbol{b}}^{\top}\Sigma_{{\boldsymbol{x}}{\boldsymbol{x}}}{\boldsymbol{\beta}}\right)^2}{\left({\boldsymbol{b}}^{\top}\Sigma_{{\boldsymbol{x}}{\boldsymbol{x}}}{\boldsymbol{b}}\right)\left({\boldsymbol{\beta}}^{\top}\Sigma_{{\boldsymbol{x}}{\boldsymbol{x}}}{\boldsymbol{\beta}}\right)}$ (6.10)

is adopted as the criterion for evaluating the effectiveness of estimated e.d.r. directions.
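Criterion (6.10) can be computed in closed form: writing $ {\boldsymbol{\beta}}=B{\boldsymbol{c}}$ for a basis matrix $ B$ of the ideal space and maximizing over $ {\boldsymbol{c}}$ turns the maximum into a generalized Rayleigh quotient. A small helper along these lines (the function name and the use of the sample covariance are my own choices) could be:

\begin{verbatim}
import numpy as np

def r2(b, B, sigma):
    """Squared multiple correlation (6.10) between b'x and the space
    spanned by the columns of B, under the covariance matrix sigma."""
    u = B.T @ sigma @ b                      # B' Sigma b
    M = B.T @ sigma @ B                      # B' Sigma B
    return float(u @ np.linalg.solve(M, u) / (b @ sigma @ b))

# Ideal e.d.r. space for both examples: span{(1,0,...,0), (0,1,...,0)}.
B_ideal = np.eye(10)[:, :2]
# r2(beta_hat[:, 0], B_ideal, np.cov(x, rowvar=False)) should be close to 1
# when the first estimated direction lies in the ideal space.
\end{verbatim}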

Tables 6.1 and 6.2 show the mean and the standard deviation (in parentheses) of $ R^2(\hat{{\boldsymbol{\beta}}}_1)$ and $ R^2(\hat{{\boldsymbol{\beta}}}_2)$ for SIR1, SIR2, and SIRpp with $ H=5, 10$, and $ 20$, over $ 100$ replications. SIR2 cannot find the second e.d.r. direction in the first example. The result of the second example is very interesting. SIR1 finds the asymmetric e.d.r. direction but does not find the symmetric one; conversely, SIR2 finds only the symmetric e.d.r. direction. SIRpp detects both e.d.r. directions.


Table 6.1: Results for SIR1, SIR2, and SIRpp (Example 1)
  SIR$ 1$   SIR$ 2$   SIRpp
H $ R^2\left(\hat{\beta}_{1}\right)$ $ R^2\left(\hat{\beta}_{2}\right)$   $ R^2\left(\hat{\beta}_{1}\right)$ $ R^2\left(\hat{\beta}_{2}\right)$   $ R^2\left(\hat{\beta}_{1}\right)$ $ R^2\left(\hat{\beta}_{2}\right)$
 $ 5$ 0.92 0.77   0.96 0.20   0.97 0.78
  (0.04) (0.11)   (0.03) (0.21)   (0.02) (0.15)
$ 10$ 0.93 0.81   0.92 0.10   0.95 0.79
  (0.03) (0.09)   (0.09) (0.12)   (0.04) (0.13)
$ 20$ 0.92 0.76   0.83 0.11   0.95 0.75
  (0.04) (0.18)   (0.19) (0.13)   (0.07) (0.18)

The SIRpp algorithm performs well in finding the e.d.r. directions; however, it requires more computing power. The projection pursuit step is the part that makes the algorithm time consuming.

Figure 6.6: Function of example 1, the asymmetric function $ y=x_1(x_1+x_2+1)+\sigma \cdot \varepsilon $.


Table 6.2: Results of SIR1, SIR2, and SIRpp (Example 2)
  SIR$ 1$   SIR$ 2$   SIRpp
H $ R^2\left(\hat{\beta}_{1}\right)$ $ R^2\left(\hat{\beta}_{2}\right)$   $ R^2\left(\hat{\beta}_{1}\right)$ $ R^2\left(\hat{\beta}_{2}\right)$   $ R^2\left(\hat{\beta}_{1}\right)$ $ R^2\left(\hat{\beta}_{2}\right)$
 $ 5$ 0.97 0.12   0.92 0.01   0.92 0.88
  (0.02) (0.14)   (0.04) (0.10)   (0.05) (0.11)
$ 10$ 0.97 0.12   0.90 0.05   0.88 0.84
  (0.02) (0.15)   (0.06) (0.07)   (0.08) (0.13)
$ 20$ 0.97 0.12   0.85 0.05   0.84 0.73
  (0.02) (0.14)   (0.09) (0.06)   (0.10) (0.22)

Figure 6.7: Function of example 2, $ y=\sin(x_1)+\cos(x_2)+\sigma\cdot\varepsilon$, which is asymmetric with respect to the $ x_1$ axis and symmetric with respect to the $ x_2$ axis.

