8.3 Fitting the $n$-dimensional Point Cloud

Subspaces of Dimension 1

Suppose that $\data{X}$ is represented by a cloud of $p$ points (variables) in $\mathbb{R}^n$ (considering each column). How can this cloud be projected into a lower dimensional space? We start as before with one dimension. In other words, we have to find a straight line $G_1$, which is defined by the unit vector $v_1\in \mathbb{R}^n$, and which gives the best fit of the initial cloud of points.

Algebraically, this is the same problem as above (replace $\data{X}$ by $\data{X}^{\top}$ and follow Section 8.2): the representation of the $j$-th variable $x_{\column{j}}\in\mathbb{R}^n$ is obtained by the projection of the corresponding point onto the straight line $G_{1}$ or the direction $v_{1}$. Hence we have to find $v_1$ such that $\sum_{j=1}^p\Vert p_{ x_{\column{j}} }\Vert^2$ is maximized, or equivalently, we have to find the unit vector $v_1$ which maximizes $(\data{X}^{\top}v_1)^{\top}(\data{X}^{\top}v_1)=v_1^{\top}(\data{X}\data{X}^{\top})v_1$. The solution is given by Theorem 2.5.

THEOREM 8.3

$v_1$ is the eigenvector of $\data{X}\data{X}^{\top}$ corresponding to the largest eigenvalue $\mu_1$ of $\data{X}\data{X}^{\top}.$
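The theorem can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper name `first_factor_direction` and the small matrix `X_demo` are hypothetical and serve only as an illustration. It computes the leading eigenvector of $\data{X}\data{X}^{\top}$ and compares the objective $v^{\top}(\data{X}\data{X}^{\top})v$ at that vector with its value at randomly drawn unit vectors.

```python
import numpy as np


def first_factor_direction(X):
    """Return (mu_1, v_1): largest eigenvalue and unit eigenvector of X X^T.

    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=float)
    # X X^T is a symmetric n x n matrix, so eigh applies.
    # eigh returns the eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(X @ X.T)
    return eigvals[-1], eigvecs[:, -1]


# Made-up data matrix: n = 4 individuals (rows), p = 3 variables (columns).
X_demo = np.array([[2.0, 0.0, 1.0],
                   [1.0, 1.0, 0.0],
                   [0.0, 3.0, 1.0],
                   [1.0, 1.0, 2.0]])

mu1, v1 = first_factor_direction(X_demo)

# Evaluate the objective sum_j (x_[j]^T u)^2 for 1000 random unit vectors u.
rng = np.random.default_rng(0)
U = rng.normal(size=(4, 1000))
U /= np.linalg.norm(U, axis=0)
objective = np.sum((X_demo.T @ U) ** 2, axis=0)
best_random = objective.max()

print("mu_1 =", mu1)
print("best value over random unit vectors =", best_random)
```

The value attained at $v_1$ equals $\mu_1$, and no random unit vector exceeds it. The sign of $v_1$ returned by the eigensolver is arbitrary, since $-v_1$ is a unit eigenvector for the same eigenvalue.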

Representation of the Cloud on $G_1$

The coordinates of the variables on $G_1$ are given by $w_{1} = \data{X}^{\top}v_1$, the first factorial axis. The variables are now represented by a linear combination of the original individuals $x_{1},\ldots ,x_{n}$, whose coefficients are given by the vector $v_{1}$, i.e., for $j=1,\ldots , p$

$\begin{displaymath} w_{1j}=v_{11}x_{1j}+\ldots+v_{1n}x_{nj}. \end{displaymath}$

(8.5)
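As an illustration of (8.5), the sketch below (assuming NumPy, with a made-up matrix `X_demo`) computes $w_1 = \data{X}^{\top}v_1$ once as a matrix product and once through the explicit sum over the individuals, and confirms that both agree.

```python
import numpy as np

# Made-up data matrix: n = 4 individuals (rows), p = 3 variables (columns).
X_demo = np.array([[2.0, 0.0, 1.0],
                   [1.0, 1.0, 0.0],
                   [0.0, 3.0, 1.0],
                   [1.0, 1.0, 2.0]])
n, p = X_demo.shape

# Leading unit eigenvector v_1 of X X^T (eigh sorts eigenvalues ascending).
eigvals, eigvecs = np.linalg.eigh(X_demo @ X_demo.T)
mu1, v1 = eigvals[-1], eigvecs[:, -1]

# Coordinates of the p variables on G_1 as a matrix product.
w1 = X_demo.T @ v1

# The same coordinates written out as in (8.5):
# w_1j = v_11 x_1j + ... + v_1n x_nj for j = 1, ..., p.
w1_explicit = np.array(
    [sum(v1[i] * X_demo[i, j] for i in range(n)) for j in range(p)]
)

print(w1)
print(w1_explicit)
```

Since $w_1^{\top}w_1 = v_1^{\top}\data{X}\data{X}^{\top}v_1$, the squared length of $w_1$ equals $\mu_1$.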

Subspaces of Dimension $q\ (q\le n)$

The representation of the variables in a subspace of dimension $q$ is done in the same manner as for the individuals above. The best subspace is generated by the orthonormal eigenvectors $v_1,v_2,\ldots ,v_q$ of $\data{X}\data{X}^{\top}$ associated with the eigenvalues $\mu _1\ge \mu _2\ge \ldots \ge \mu _q$. The coordinates of the variables on the $k$-th factorial axis are given by the factorial variables $w_{k} = \data{X}^{\top}v_k,\ k=1,\ldots ,q$. Each factorial variable $w_{k} = (w_{k1}, w_{k2}, \ldots, w_{kp})^{\top}$ is a linear combination of the original individuals $x_1, x_2, \ldots, x_n$ whose coefficients are given by the elements of the $k$-th vector $v_k$: $w_{kj} = \sum_{m=1}^n v_{km} x_{mj}$. The representation in a subspace of dimension $q=2$ is depicted in Figure 8.5.

**Figure 8.5:** Representation of the variables $x_{\column{1}},\ldots, x_{\column{p}}$ as a two-dimensional point cloud.
$\includegraphics[width=0.85\defpicwidth]{fig35w.ps}$
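The $q$-dimensional representation described above can be sketched as follows, assuming NumPy; the function name `factorial_variables` and the matrix `X_demo` are hypothetical. The function returns the $q$ largest eigenvalues of $\data{X}\data{X}^{\top}$, the matching orthonormal eigenvectors, and the factorial variables $w_k = \data{X}^{\top}v_k$ stacked as columns.

```python
import numpy as np


def factorial_variables(X, q):
    """Return (mu, V, W) for the q leading factor directions of X X^T.

    mu : the q largest eigenvalues, in decreasing order
    V  : n x q matrix whose columns are the eigenvectors v_1, ..., v_q
    W  : p x q matrix whose k-th column is w_k = X^T v_k
    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(X @ X.T)
    # eigh sorts ascending; pick the indices of the q largest eigenvalues.
    order = np.argsort(eigvals)[::-1][:q]
    mu = eigvals[order]
    V = eigvecs[:, order]
    W = X.T @ V
    return mu, V, W


# Made-up data matrix: n = 4 individuals (rows), p = 3 variables (columns).
X_demo = np.array([[2.0, 0.0, 1.0],
                   [1.0, 1.0, 0.0],
                   [0.0, 3.0, 1.0],
                   [1.0, 1.0, 2.0]])

mu, V, W = factorial_variables(X_demo, q=2)

# Each row of W holds the plane coordinates (w_1j, w_2j) of one variable.
for j, (a, b) in enumerate(W, start=1):
    print(f"variable {j}: ({a:.4f}, {b:.4f})")
```

Each row of `W` gives the position of one variable in the plane spanned by $v_1$ and $v_2$, which is the kind of picture shown in the figure. Because the $v_k$ are orthonormal eigenvectors, the columns of `W` are mutually orthogonal with squared lengths $\mu_k$.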

Summary

$\ast$: The $n$-dimensional point cloud of variables can be graphically represented by projecting each element into spaces of smaller dimensions.
$\ast$: The first factor direction is $v_{1}$ and defines a line $G_{1}$ through the origin. The vector $v_{1}$ equals the eigenvector of $\data{X}\data{X}^{\top}$ corresponding to the largest eigenvalue of $\data{X}\data{X}^{\top}$. The coordinates for representing the point cloud on a straight line are $w_{1} = \data{X}^{\top}v_{1}$.
$\ast$: The second factor direction is $v_{2}$, where $v_{2}$ denotes the eigenvector of $\data{X}\data{X}^{\top}$ corresponding to its second largest eigenvalue. The coordinates for representing the point cloud on a plane are given by $w_{1} = \data{X}^{\top}v_{1}$ and $w_{2} = \data{X}^{\top}v_{2}$.
$\ast$: The factor directions $1, \ldots, q$ are $v_{1}, \ldots, v_{q}$, which denote the eigenvectors of $\data{X}\data{X}^{\top}$ corresponding to the $q$ largest eigenvalues. The coordinates for representing the point cloud of variables on a $q$-dimensional subspace are given by $w_{1} = \data{X}^{\top}v_{1}, \ldots, w_{q} = \data{X}^{\top}v_{q}$.