8.3 Fitting the $n$-dimensional Point Cloud

Subspaces of Dimension 1

Suppose that $\data{X}$ is represented by a cloud of $p$ points (variables) in $\mathbb{R}^n$, with each column of $\data{X}$ treated as one point. How can this cloud be projected into a lower-dimensional space? We start as before with one dimension. In other words, we have to find a straight line $G_1$, which is defined by the unit vector $v_1\in \mathbb{R}^n$, and which gives the best fit of the initial cloud of $p$ points.

Algebraically, this is the same problem as above (replace $\data{X}$ by $\data{X}^{\top}$ and follow Section 8.2): the representation of the $j$-th variable $x_{\column{j}}\in\mathbb{R}^n$ is obtained by projecting the corresponding point onto the straight line $G_{1}$, i.e., onto the direction $v_{1}$. Hence we have to find $v_1$ such that $\sum_{j=1}^p\Vert p_{ x_{\column{j}} }\Vert^2$ is maximized, or equivalently, we have to find the unit vector $v_1$ which maximizes $(\data{X}^{\top}v_1)^{\top}(\data{X}^{\top}v_1)=v_1^{\top}(\data{X}\data{X}^{\top})v_1$. The solution is given by Theorem 2.5.
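
To see why the two formulations agree, note that the projection of $x_{\column{j}}$ onto the unit vector $v_1$ has length $\vert x_{\column{j}}^{\top}v_1\vert$, and that the $j$-th element of the vector $\data{X}^{\top}v_1\in\mathbb{R}^p$ is $x_{\column{j}}^{\top}v_1$. A short sketch of the calculation, using only these definitions:

\begin{displaymath}
\sum_{j=1}^p\Vert p_{ x_{\column{j}} }\Vert^2
=\sum_{j=1}^p (x_{\column{j}}^{\top}v_1)^2
=(\data{X}^{\top}v_1)^{\top}(\data{X}^{\top}v_1)
=v_1^{\top}(\data{X}\data{X}^{\top})v_1 .
\end{displaymath}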

THEOREM 8.3   $v_1$ is the eigenvector of $\data{X}\data{X}^{\top}$ corresponding to the largest eigenvalue $\mu_1$ of $\data{X}\data{X}^{\top}$.

Representation of the Cloud on $G_1$

The coordinates of the $p$ variables on $G_1$ are given by $w_{1} = \data{X}^{\top}v_1$, the first factorial variable. The $p$ variables are now represented by a linear combination of the original individuals $x_{1},\ldots ,x_{n}$, whose coefficients are given by the vector $v_{1}$, i.e., for $j=1,\ldots , p$

\begin{displaymath}
w_{1j}=v_{11}x_{1j}+\ldots+v_{1n}x_{nj}.
\end{displaymath} (8.5)
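
As an illustration, consider a small made-up matrix (chosen here for easy arithmetic, not taken from the text) with $n=2$ individuals and $p=3$ variables:

\begin{displaymath}
\data{X}=\left(\begin{array}{ccc} 1 & 1 & 0\\ 0 & 1 & 1 \end{array}\right),
\qquad
\data{X}\data{X}^{\top}=\left(\begin{array}{cc} 2 & 1\\ 1 & 2 \end{array}\right).
\end{displaymath}

The eigenvalues of $\data{X}\data{X}^{\top}$ are $\mu_1=3$ and $\mu_2=1$, and the unit eigenvector belonging to $\mu_1$ is $v_1=\frac{1}{\sqrt{2}}(1,1)^{\top}$ (determined only up to sign). Applying (8.5) to each of the three columns gives

\begin{displaymath}
w_1=\data{X}^{\top}v_1=\frac{1}{\sqrt{2}}\,(1,\ 2,\ 1)^{\top},
\qquad
\Vert w_1\Vert^2=\frac{1+4+1}{2}=3=\mu_1 ,
\end{displaymath}

so in this example the maximized sum of squared projections equals the largest eigenvalue.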

Subspaces of Dimension $q\ (q\le n)$

The representation of the $p$ variables in a subspace of dimension $q$ is obtained in the same manner as for the $n$ individuals above. The best subspace is generated by the orthonormal eigenvectors $v_1,v_2,\ldots ,v_q$ of $\data{X}\data{X}^{\top}$ associated with the eigenvalues $\mu _1\ge \mu _2\ge \ldots \ge \mu _q$. The coordinates of the $p$ variables on the $k$-th factorial axis are given by the factorial variables $w_{k} = \data{X}^{\top}v_k,\ k=1,\ldots ,q$. Each factorial variable $w_{k} = (w_{k1}, w_{k2}, \ldots, w_{kp})^{\top}$ is a linear combination of the original individuals $x_1, x_2, \ldots, x_n$ whose coefficients are given by the elements of the $k$-th vector $v_k$, i.e., $w_{kj} = \sum_{m=1}^n v_{km} x_{mj}$ for $j=1,\ldots ,p$. The representation in a subspace of dimension $q=2$ is depicted in Figure 8.5.
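
The construction above can be sketched numerically. The following is a minimal illustration in Python, not the text's own procedure: it approximates the leading eigenpairs of $\data{X}\data{X}^{\top}$ by power iteration with deflation (a numerical technique swapped in here so that only the standard library is needed) and then forms $w_k = \data{X}^{\top}v_k$. The function names, the start vector, the iteration count and the toy matrix are all assumptions made for this example.

```python
import math


def gram_rows(X):
    """Return the n x n matrix X X^T for X given as a list of n rows."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in X] for ri in X]


def power_iteration(A, iters=1000):
    """Approximate the leading eigenpair (mu, v) of a symmetric PSD matrix A."""
    n = len(A)
    # Arbitrary start vector; assumed not orthogonal to the leading eigenvector.
    v = [1.0 / (i + 1) for i in range(n)]
    norm = math.sqrt(sum(c * c for c in v))
    v = [c / norm for c in v]
    for _ in range(iters):
        Av = [sum(a * c for a, c in zip(row, v)) for row in A]
        norm = math.sqrt(sum(c * c for c in Av))
        if norm == 0.0:  # A v = 0: nothing left to iterate on
            break
        v = [c / norm for c in Av]
    Av = [sum(a * c for a, c in zip(row, v)) for row in A]
    mu = sum(c * d for c, d in zip(v, Av))  # Rayleigh quotient v^T A v
    return mu, v


def factor_directions(X, q):
    """Return [(mu_1, v_1), ..., (mu_q, v_q)] for the matrix X X^T."""
    A = gram_rows(X)
    n = len(A)
    pairs = []
    for _ in range(q):
        mu, v = power_iteration(A)
        pairs.append((mu, v))
        # Deflation: remove the direction just found, A <- A - mu v v^T.
        A = [[A[i][j] - mu * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def variable_coordinates(X, v):
    """Return w = X^T v, i.e. w_j = sum_m v_m x_mj for j = 1, ..., p."""
    n, p = len(X), len(X[0])
    return [sum(v[m] * X[m][j] for m in range(n)) for j in range(p)]


# Toy data (hypothetical): n = 2 individuals (rows), p = 3 variables (columns).
X = [[1.0, 1.0, 0.0],
     [0.0, 1.0, 1.0]]

pairs = factor_directions(X, q=2)
W = [variable_coordinates(X, v) for _, v in pairs]  # W[k-1] holds w_k
```

Each `W[k-1]` contains the $p$ coordinates of the variables on the $k$-th factorial axis, and its squared norm equals the corresponding eigenvalue. The sign of each $v_k$, and hence of $w_k$, is arbitrary. For real data a library eigen-solver for symmetric matrices would normally replace the power iteration used in this sketch.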

Figure 8.5: Representation of the variables $x_{\column{1}},\ldots,x_{\column{p}}$ as a two-dimensional point cloud.
\includegraphics[width=0.85\defpicwidth]{fig35w.ps}

Summary
$\ast$
The $n$-dimensional point cloud of variables can be graphically represented by projecting each element into spaces of smaller dimensions.
$\ast$
The first factor direction is $v_{1}$ and defines a line $G_{1}$ through the origin. The vector $v_{1}$ equals the eigenvector of $\data{X}\data{X}^{\top}$ corresponding to the largest eigenvalue of $\data{X}\data{X}^{\top}$. The coordinates for representing the point cloud on a straight line are $w_{1} = \data{X}^{\top}v_{1}$.
$\ast$
The second factor direction is $v_{2}$, where $v_{2}$ denotes the eigenvector of $\data{X}\data{X}^{\top}$ corresponding to its second largest eigenvalue. The coordinates for representing the point cloud on a plane are given by $w_{1} = \data{X}^{\top}v_{1}$ and $w_{2} = \data{X}^{\top}v_{2}$.
$\ast$
The factor directions $1, \ldots, q$ are $v_{1}, \ldots, v_{q}$, which denote the eigenvectors of $\data{X}\data{X}^{\top}$ corresponding to the $q$ largest eigenvalues. The coordinates for representing the point cloud of variables on a $q$-dimensional subspace are given by $ w_{1} = \data{X}^{\top}v_{1}, \ldots, w_{q} = \data{X}^{\top}v_{q}$.