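In practice the population quantities in the PC transformation have to be replaced by their estimators: $\mu$ becomes $\overline x$, $\Sigma$ is replaced by $\data{S}$, etc. If $g_1$ denotes the first eigenvector of $\data{S}$, the first principal component is given by $y_1 = (\data{X}-1_{n}\overline x^{\top})g_1$. More generally, if $\data{S} = \data{G}\data{L}\data{G}^{\top}$ is the spectral decomposition of $\data{S}$, then the PCs are obtained by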
$$
\data{Y} = (\data{X}-1_{n}\overline x^{\top})\data{G}. \tag{9.10}
$$
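The transformation (9.10) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the helper name `empirical_pcs` and the synthetic data `X_demo` are hypothetical and not part of the text, and the covariance is computed with divisor $n$.

```python
import numpy as np

def empirical_pcs(X):
    """Return (Y, eigenvalues, G) for an (n, p) data matrix X.

    Hypothetical helper: Y = (X - 1_n xbar^T) G, where the columns of G
    are the eigenvectors of the empirical covariance matrix S.
    """
    X = np.asarray(X, dtype=float)
    x_bar = X.mean(axis=0)                      # estimator of the mean vector
    S = np.cov(X, rowvar=False, bias=True)      # empirical covariance, divisor n
    eigvals, G = np.linalg.eigh(S)              # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # reorder to descending
    eigvals, G = eigvals[order], G[:, order]
    Y = (X - x_bar) @ G                         # equation (9.10)
    return Y, eigvals, G

# Synthetic data, used only to exercise the helper (not the bank data).
rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.5]])
X_demo = rng.normal(size=(200, 3)) @ mixing
Y_demo, ell_demo, G_demo = empirical_pcs(X_demo)
```

The first column of `Y_demo` is the first PC, and the first column of `G_demo` holds the corresponding weights. The signs of the eigenvectors are not unique, so a different routine may return columns of `G_demo` multiplied by $-1$.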
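Note that with the centering matrix $\data{H} = \data{I}_n - n^{-1}1_{n}1_{n}^{\top}$ and $\data{H}1_{n}\overline x^{\top} = 0$ we can write
$$
\data{S}_{\data{Y}} = n^{-1}\data{Y}^{\top}\data{H}\data{Y}
= n^{-1}\data{G}^{\top}(\data{X}-1_{n}\overline x^{\top})^{\top}\data{H}(\data{X}-1_{n}\overline x^{\top})\data{G}
= n^{-1}\data{G}^{\top}\data{X}^{\top}\data{H}\data{X}\data{G}
= \data{G}^{\top}\data{S}\data{G} = \data{L},
$$
where $\data{L} = \mathop{\rm diag}(\ell_1,\ldots,\ell_p)$ is the matrix of eigenvalues of $\data{S}$. Hence the variance of $y_i$ equals the eigenvalue $\ell_i$!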
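The statement that the empirical covariance of the PCs is the diagonal matrix of eigenvalues can be checked numerically. The sketch below uses synthetic data (an assumption, not the bank data) and builds the centering matrix explicitly, which is only sensible for small $n$.

```python
import numpy as np

# Synthetic data with different variances per column (illustrative only).
rng = np.random.default_rng(1)
n, p = 150, 4
X = rng.normal(size=(n, p)) * np.array([3.0, 2.0, 1.0, 0.5]) + 10.0

ones = np.ones((n, 1))
H = np.eye(n) - ones @ ones.T / n          # centering matrix
x_bar = X.mean(axis=0, keepdims=True)      # 1 x p row vector of means

S = X.T @ H @ X / n                        # empirical covariance, divisor n
ell, G = np.linalg.eigh(S)
ell, G = ell[::-1], G[:, ::-1]             # descending eigenvalues

Y = (X - ones @ x_bar) @ G                 # the PCs
S_Y = Y.T @ H @ Y / n                      # covariance of the PCs

centering_kills_mean = np.allclose(H @ ones @ x_bar, 0.0)
pc_cov_is_diagonal = np.allclose(S_Y, np.diag(ell))
```

Both checks are expected to hold up to floating-point tolerance: the PCs are uncorrelated and the variance of the $i$-th PC is the $i$-th eigenvalue.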
The PC technique is sensitive
to scale changes. If we multiply one variable by a scalar we
obtain different eigenvalues and eigenvectors. This is due to the fact that
the eigenvalue decomposition is performed on the covariance matrix and not on
the correlation matrix (see Section 9.5).
The following warning is therefore important:
The PC transformation should be applied to data
that have approximately the same scale in each variable.
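A small numerical sketch can make the warning concrete. It assumes synthetic data with two correlated variables and one independent variable; the helper `leading_eigvec` is a hypothetical name introduced only for this illustration.

```python
import numpy as np

def leading_eigvec(M):
    """Eigenvector of the symmetric matrix M with the largest eigenvalue."""
    vals, vecs = np.linalg.eigh(M)   # ascending order, so take the last column
    return vecs[:, -1]

# Synthetic data: columns 0 and 1 share a common factor, column 2 is noise.
rng = np.random.default_rng(2)
n = 500
z = rng.normal(size=n)
X = np.column_stack([z + rng.normal(size=n),
                     z + rng.normal(size=n),
                     rng.normal(size=n)])

X_scaled = X.copy()
X_scaled[:, 2] *= 100.0              # change the unit of the third variable

g1_cov = leading_eigvec(np.cov(X, rowvar=False, bias=True))
g1_cov_scaled = leading_eigvec(np.cov(X_scaled, rowvar=False, bias=True))
g1_cor = leading_eigvec(np.corrcoef(X, rowvar=False))
g1_cor_scaled = leading_eigvec(np.corrcoef(X_scaled, rowvar=False))
```

With the covariance matrix, the first eigenvector moves from the two correlated variables to the rescaled one. With the correlation matrix, it is unchanged up to sign.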
EXAMPLE 9.2
Let us apply this technique to the bank data set.
In this example we do not standardize the data.
Figure 9.3 shows some PC plots of the bank data set. The
genuine and counterfeit bank notes are marked by ``o'' and ``+'' respectively.
Recall that the mean vector of $\data{X}$ is
The vector of eigenvalues of $\data{S}$ is
The eigenvectors $g_j$ are given by the columns of the matrix
The first column of $\data{G}$ is the first eigenvector and gives the weights used in the linear combination of the original data in the first PC.
EXAMPLE 9.3
To see how sensitive the PCs are to a change in the scale of the variables, assume that $X_1, X_2, X_3$ and $X_6$ are measured in cm and that $X_4$ and $X_5$ remain in mm in the bank data set.
This leads to:
The covariance matrix can be obtained from $\data{S}$ in (3.4) by dividing rows 1, 2, 3, 6 and columns 1, 2, 3, 6 by 10.
We obtain:
which clearly differs from Example 9.2.
Only the first two eigenvectors are given:
Comparing these results to the first two columns of $\data{G}$ from Example 9.2 reveals a completely different story. Here the first component is dominated by $X_4$ (lower margin) and the second by $X_5$ (upper margin), while all of the other variables have much less weight. The results are shown in Figure 9.4. Section 9.5 will show how to select a reasonable standardization of the variables when the scales are too different.
Figure 9.4: Principal components of the rescaled bank data. MVApcabankr.xpl
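The rule used in Example 9.3, dividing rows and columns 1, 2, 3, 6 of the covariance matrix by 10, amounts to computing $\data{D}\data{S}\data{D}$ with a diagonal matrix $\data{D}$ of unit conversion factors. The following sketch checks this on synthetic six-variable data. The numbers are placeholders and not the bank data, and the names `X_mm`, `X_mixed` and `d` are assumptions made for the illustration.

```python
import numpy as np

# Placeholder data standing in for six measurements in mm (NOT the bank data).
rng = np.random.default_rng(3)
X_mm = rng.normal(loc=100.0,
                  scale=[2.0, 1.0, 1.0, 1.5, 0.8, 1.2],
                  size=(100, 6))

# Variables 1, 2, 3, 6 are converted to cm; variables 4 and 5 stay in mm.
d = np.array([0.1, 0.1, 0.1, 1.0, 1.0, 0.1])
D = np.diag(d)
X_mixed = X_mm * d

S_mm = np.cov(X_mm, rowvar=False, bias=True)
S_mixed = np.cov(X_mixed, rowvar=False, bias=True)

# Dividing rows AND columns by 10 is the same as D @ S @ D.
same_matrix = np.allclose(S_mixed, D @ S_mm @ D)
```

Because both the row and the column are rescaled, the variances of the converted variables shrink by a factor of 100, which is why the variables left in mm dominate the leading eigenvectors.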
Summary
- The scale of the variables should be roughly the same for PC transformations.
- For the practical implementation of principal components analysis (PCA) we replace $\mu$ by the mean $\overline x$ and $\Sigma$ by the empirical covariance $\data{S}$. Then we compute the eigenvalues $\ell_1,\ldots,\ell_p$ and the eigenvectors $g_1,\ldots,g_p$ of $\data{S}$. The graphical representation of the PCs is obtained by plotting the first PC vs. the second (and possibly vs. the third).
- The components of the eigenvectors $g_i$ are the weights of the original variables in the PCs.