In this section the data matrix $\mathcal{X}$ is represented by a cloud of $n$ points in $\mathbb{R}^p$ (considering each row).
The question is how to project this point cloud onto a space of lower dimension.
To begin, consider the simplest problem, namely finding a subspace of dimension 1.
The problem boils down to finding a straight line $F_1$ through the origin.
The direction of this line can be defined by a unit vector $u_1 \in \mathbb{R}^p$.
Hence, we are searching for the vector $u_1$ which gives the ``best'' fit of the initial cloud of $n$ points.
The situation is depicted in Figure 8.3.
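The representation of the $i$-th individual $x_i \in \mathbb{R}^p$ on the line $F_1$ is obtained by the projection of the corresponding point onto the unit vector $u_1$, i.e., the projection point $p_{x_i}$. We know from (2.42) that the coordinate of $x_i$ on $F_1$ is given by
$$p_{x_i} = x_i^{\top} \frac{u_1}{\|u_1\|} = x_i^{\top} u_1. \qquad (8.1)$$
The best line $F_1$ is defined in the least-squares sense: find $u_1 \in \mathbb{R}^p$ which minimizes
$$\sum_{i=1}^{n} \| x_i - p_{x_i} \|^2. \qquad (8.2)$$
By Pythagoras's theorem, $\|x_i - p_{x_i}\|^2 = \|x_i\|^2 - \|p_{x_i}\|^2$, so minimizing (8.2) is equivalent to maximizing $\sum_{i=1}^{n} \|p_{x_i}\|^2$ under the constraint $\|u_1\| = 1$. Since (8.1) gives $(p_{x_1}, \ldots, p_{x_n})^{\top} = \mathcal{X} u_1$, the problem can be written as
$$\max_{u_1^{\top} u_1 = 1} \; u_1^{\top} (\mathcal{X}^{\top} \mathcal{X}) u_1. \qquad (8.3)$$
The solution is given by Theorem 2.5 (using $\mathcal{A} = \mathcal{X}^{\top}\mathcal{X}$ and $\mathcal{B} = \mathcal{I}$ in the theorem): the vector $u_1$ which minimizes (8.2) is the eigenvector of $\mathcal{X}^{\top}\mathcal{X}$ associated with its largest eigenvalue $\lambda_1$ (Theorem 8.1).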
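
The one-dimensional fit can be checked numerically. The following is a minimal sketch, assuming the data matrix is stored as an $n \times p$ NumPy array; the function name `first_factorial_axis` and the toy data are illustrative assumptions, not part of the text. It takes the eigenvector of $\mathcal{X}^{\top}\mathcal{X}$ with the largest eigenvalue as the direction of the line and verifies the Pythagorean split of the total sum of squares.

```python
import numpy as np

def first_factorial_axis(X):
    """Return (lambda_1, u_1): largest eigenvalue of X^T X and a unit eigenvector."""
    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(X.T @ X)
    return eigvals[-1], eigvecs[:, -1]

rng = np.random.default_rng(0)
# Toy data (an assumption for illustration): 50 points in R^3 with unequal spread
X = rng.normal(size=(50, 3)) @ np.diag([3.0, 1.0, 0.5])

lam1, u1 = first_factorial_axis(X)
coords = X @ u1                       # coordinates of the n points on the line
residual = X - np.outer(coords, u1)   # each point minus its projection point

# Pythagoras: total sum of squares = projected part + residual part
assert np.isclose(np.sum(X**2), np.sum(coords**2) + np.sum(residual**2))
# The projected sum of squares equals the largest eigenvalue
assert np.isclose(np.sum(coords**2), lam1)
```

Because the total sum of squares is fixed, maximizing the projected part and minimizing the residual part select the same direction.
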
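Note that if the data have been centered, i.e., $\bar{x} = 0$, then $\mathcal{X} = \mathcal{X}_c$, where $\mathcal{X}_c$ is the centered data matrix, and $\frac{1}{n}\mathcal{X}^{\top}\mathcal{X}$ is the covariance matrix.
Thus Theorem 8.1 says that we are searching for a maximum of the quadratic form (8.3) w.r.t. the covariance matrix $\mathcal{S}_{\mathcal{X}} = n^{-1}\mathcal{X}^{\top}\mathcal{X}$.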
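
The remark on centered data can be illustrated with a short sketch, assuming NumPy and a toy data array `X` (both assumptions for illustration). After centering, the cross-product matrix divided by $n$ coincides with the covariance matrix computed with divisor $n$, and the scaling by $1/n$ rescales the eigenvalues without changing the leading eigenvector.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(loc=5.0, size=(40, 4))   # toy data with nonzero column means
n = X.shape[0]

Xc = X - X.mean(axis=0)                 # centered data matrix
S = Xc.T @ Xc / n                       # covariance matrix with divisor n

# np.cov with bias=True also divides by n; rowvar=False treats columns as variables
assert np.allclose(S, np.cov(X, rowvar=False, bias=True))

# Eigenvalues scale by n, the leading eigenvector is unchanged up to sign
w_S, V_S = np.linalg.eigh(S)
w_G, V_G = np.linalg.eigh(Xc.T @ Xc)
assert np.allclose(w_G, n * w_S)
assert np.isclose(abs(V_S[:, -1] @ V_G[:, -1]), 1.0)
```
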
The coordinates of the $n$ individuals on $F_1$ are given by $\mathcal{X}u_1$.
The vector $\mathcal{X}u_1$ is called the first factorial variable or the first factor and $u_1$ the first factorial axis.
The $n$ individuals, $x_i$, are now represented by a new factorial variable $z_1 = \mathcal{X}u_1$. This factorial variable is a linear combination of the original variables $(x_{[1]}, \ldots, x_{[p]})$ whose coefficients are given by the vector $u_1$, i.e.,
$$z_1 = u_{11} x_{[1]} + \cdots + u_{p1} x_{[p]}. \qquad (8.4)$$
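If we approximate the $n$ individuals by a plane (dimension 2), it can be shown via Theorem 2.5 that this space contains $u_1$.
The plane is determined by the best linear fit ($u_1$) and a unit vector $u_2$ orthogonal to $u_1$ which maximizes the quadratic form $u_2^{\top}(\mathcal{X}^{\top}\mathcal{X})u_2$ under the constraints
$$\|u_2\| = 1 \quad \text{and} \quad u_1^{\top} u_2 = 0.$$
The maximizer $u_2$, the second factorial axis, is the eigenvector of $\mathcal{X}^{\top}\mathcal{X}$ corresponding to the second largest eigenvalue $\lambda_2$.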
The unit vector $u_2$ characterizes a second line, $F_2$, on which the points are projected.
The coordinates of the $n$ individuals on $F_2$ are given by $z_2 = \mathcal{X}u_2$. The variable $z_2$ is called the second factorial variable or the second factor.
The representation of the $n$ individuals in two-dimensional space ($z_1 = \mathcal{X}u_1$ vs. $z_2 = \mathcal{X}u_2$) is shown in Figure 8.4.
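
The two-dimensional representation can be sketched as follows, assuming NumPy and a toy data array `X` (illustrative assumptions). The two leading eigenvectors of the cross-product matrix serve as the axes, and the two factorial variables form the columns of an $n \times 2$ array of plane coordinates that one would plot against each other.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 5))             # toy data, not from the text

eigvals, eigvecs = np.linalg.eigh(X.T @ X)
order = np.argsort(eigvals)[::-1]        # indices sorted by decreasing eigenvalue
u1, u2 = eigvecs[:, order[0]], eigvecs[:, order[1]]

z1, z2 = X @ u1, X @ u2                  # first and second factorial variables

assert np.isclose(u1 @ u2, 0.0)          # the two axes are orthogonal
# z1^T z2 = u1^T (X^T X) u2 = lambda_2 * u1^T u2 = 0
assert np.isclose(z1 @ z2, 0.0)

plane_coords = np.column_stack([z1, z2]) # n x 2 array of coordinates in the plane
```

The sign of each eigenvector is arbitrary, so a plot of these coordinates is determined only up to reflection of each axis.
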
In the case of $q$ dimensions the task is again to minimize (8.2) but with projection points in a $q$-dimensional subspace.
Following the same argument as above, it can be shown via Theorem 2.5 that this best subspace is generated by $u_1, u_2, \ldots, u_q$, the orthonormal eigenvectors of $\mathcal{X}^{\top}\mathcal{X}$ associated with the corresponding eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_q$.
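The coordinates of the $n$ individuals on the $k$-th factorial axis, $u_k$, are given by the $k$-th factorial variable $z_k = \mathcal{X}u_k$ for $k = 1, \ldots, q$.
Each factorial variable $z_k = (z_{1k}, z_{2k}, \ldots, z_{nk})^{\top}$ is a linear combination of the original variables $x_{[1]}, \ldots, x_{[p]}$ whose coefficients are given by the elements of the $k$-th vector $u_k$: $z_{ik} = \sum_{m=1}^{p} x_{im} u_{mk}$.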
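
The general case can be sketched in the same way. The function name `factorial_variables` and the toy data below are assumptions for illustration. The sketch collects the leading eigenvectors as columns of a matrix, computes all factorial variables with one matrix product, and checks that the residual sum of squares of the projection equals the sum of the discarded eigenvalues.

```python
import numpy as np

def factorial_variables(X, q):
    """Return (eigenvalues, U, Z) for the best q-dimensional fit of the rows of X."""
    eigvals, eigvecs = np.linalg.eigh(X.T @ X)
    order = np.argsort(eigvals)[::-1][:q]    # indices of the q largest eigenvalues
    U = eigvecs[:, order]                    # p x q, columns are the factorial axes
    Z = X @ U                                # n x q, columns are the factorial variables
    return eigvals[order], U, Z

rng = np.random.default_rng(3)
X = rng.normal(size=(25, 4))                 # toy data, not from the text
lam, U, Z = factorial_variables(X, q=2)

# Elementwise form: the (i, k) entry of Z is sum over m of X[i, m] * U[m, k]
i, k = 7, 1
assert np.isclose(Z[i, k], sum(X[i, m] * U[m, k] for m in range(X.shape[1])))

# Residual sum of squares after projection = sum of the discarded eigenvalues
residual = X - Z @ U.T
all_eigvals = np.linalg.eigvalsh(X.T @ X)
assert np.isclose(np.sum(residual**2), all_eigvals.sum() - lam.sum())
```
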