9.5 Normalized Principal Components Analysis

In certain situations the original variables can be heterogeneous with respect to their variances. This is particularly true when the variables are measured on heterogeneous scales (such as years, kilograms, dollars, ...). In this case the description of the information contained in the data should be robust with respect to the choice of scale. This can be achieved through a standardization of the variables, namely

\begin{displaymath}
\data{X}_{S} = \data{H}\data{X}\data{D}^{-1/2}
\end{displaymath} (9.19)

where $\data{D}= \mathop{\hbox{diag}}(s_{X_{1}X_{1}},\ldots,s_{X_{p}X_{p}})$. Note that $\overline{x}_{S} = 0$ and $\data{S}_{\data{X}_{S}} = \data{R}$, the correlation matrix of $\data{X}$. The PC transformations of the matrix $\data{X}_{S}$ are referred to as the Normalized Principal Components (NPCs). The spectral decomposition of $\data{R}$ is
\begin{displaymath}
\data{R} =\data{G}_{\data{R}}\data{L}_{\data{R}}\data{G}_{\data{R}}^{\top},
\end{displaymath} (9.20)

where $\data{L}_{\data{R}}= \mathop{\hbox{diag}}(\ell^\data{R}_{1},\ldots,\ell^\data{R}_{p})$ and $\ell_{1}^\data{R} \ge \ldots \ge \ell_{p}^\data{R}$ are the eigenvalues of $\data{R}$ with corresponding eigenvectors $g_{1}^\data{R},\ldots,g_{p}^\data{R}$ (note that here $ \sum_{j=1}^p \ell_{j}^\data{R}= \mathop{\hbox{tr}}(\data{R}) = p $).
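The standardization (9.19) and the spectral decomposition (9.20) can be sketched numerically as follows. This is a minimal illustration, not part of the original text: it assumes NumPy, uses the $1/n$ convention for variances, and runs on simulated data; the function names `standardize` and `spectral_decomposition` are hypothetical.

```python
import numpy as np

def standardize(X):
    """Return X_S = H X D^{-1/2}: centered columns scaled to unit variance (1/n convention)."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)      # H X: subtract the column means
    std = X.std(axis=0, ddof=0)        # square roots of the diagonal of D
    return centered / std

def spectral_decomposition(R):
    """Eigenvalues in decreasing order, with the matching eigenvectors as columns."""
    eigvals, eigvecs = np.linalg.eigh(R)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

rng = np.random.default_rng(0)
# Simulated data (an assumption): three variables on very different scales
X = rng.normal(size=(200, 3)) * np.array([1.0, 50.0, 1000.0])
X[:, 1] += 30.0 * X[:, 0]              # induce some correlation

X_S = standardize(X)
n, p = X_S.shape
R = X_S.T @ X_S / n                    # covariance of X_S = correlation of X
ell, G = spectral_decomposition(R)

print(np.round(ell, 4), "sum of eigenvalues:", round(float(ell.sum()), 4))
```

Under these assumptions the eigenvalues sum to $p$, and `R` agrees with the correlation matrix of the unstandardized data.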

The NPCs, $Z_{j}$, provide a representation of each individual and are given by

\begin{displaymath}
\data{Z} = \data{X}_{S}\data{G}_{\data{R}}
= (z_{1},\ldots,z_{p}).
\end{displaymath} (9.21)

After transforming the variables, once again, we have that
\begin{displaymath}
\overline{z} = 0,
\end{displaymath} (9.22)

\begin{displaymath}
\data{S}_{\data{Z}} = \data{G}_{\data{R}}^{\top}\data{S}_{\data{X}_{S}}\data{G}_{\data{R}}
= \data{G}_{\data{R}}^{\top}\data{R}\data{G}_{\data{R}} =
\data{L}_{\data{R}}.
\end{displaymath} (9.23)
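Properties (9.22) and (9.23) can be checked numerically. The sketch below is an illustration under stated assumptions (NumPy, the $1/n$ variance convention, simulated data); the helper `normalized_pcs` is a hypothetical name, not something defined in the text.

```python
import numpy as np

def normalized_pcs(X):
    """Compute Z = X_S G_R together with the eigenvalues and eigenvectors of R."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    X_S = (X - X.mean(axis=0)) / X.std(axis=0, ddof=0)   # standardization (9.19)
    R = X_S.T @ X_S / n                                  # correlation matrix of X
    eigvals, eigvecs = np.linalg.eigh(R)                 # ascending order
    order = np.argsort(eigvals)[::-1]                    # reorder to decreasing
    ell, G = eigvals[order], eigvecs[:, order]
    Z = X_S @ G                                          # NPCs (9.21)
    return Z, ell, G

rng = np.random.default_rng(1)
# Simulated correlated data (an assumption made for the illustration)
X = rng.normal(size=(150, 4)) @ rng.normal(size=(4, 4))

Z, ell, G = normalized_pcs(X)
z_bar = Z.mean(axis=0)                 # should be (numerically) zero
S_Z = Z.T @ Z / Z.shape[0]             # covariance of Z, valid since z_bar = 0

print(np.allclose(z_bar, 0.0, atol=1e-10), np.allclose(S_Z, np.diag(ell)))
```

The off-diagonal entries of `S_Z` vanish up to rounding, so the NPCs are uncorrelated and their variances are the eigenvalues of $\data{R}$.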

The NPCs provide a perspective similar to that of the PCs. In terms of the relative position of individuals, however, the NPCs give each variable the same weight, whereas with the PCs the variable with the largest variance receives the largest weight.
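As a small worked illustration (added here, not taken from the original text), consider $p=2$ and assume a sample correlation $r>0$ between the two variables. Whatever the original variances are, the spectral decomposition of the correlation matrix is then

```latex
\begin{displaymath}
\data{R} = \left(\begin{array}{cc} 1 & r \\ r & 1 \end{array}\right),
\qquad
\ell_{1}^\data{R} = 1+r, \quad \ell_{2}^\data{R} = 1-r,
\qquad
g_{1}^\data{R} = \frac{1}{\sqrt{2}}\left(\begin{array}{r} 1 \\ 1 \end{array}\right),
\quad
g_{2}^\data{R} = \frac{1}{\sqrt{2}}\left(\begin{array}{r} 1 \\ -1 \end{array}\right).
\end{displaymath}
```

Both standardized variables enter each NPC with the same absolute weight $1/\sqrt{2}$, and $\ell_{1}^\data{R}+\ell_{2}^\data{R} = 2 = p$.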

Computing the covariance and correlation between $X_i$ and $Z_{j}$ is straightforward:

\begin{displaymath}
\data{S}_{X_{S},Z} = \frac{1}{n}\data{X}_{S}^{\top}\data{Z}
= \data{G}_{\data{R}}\data{L}_{\data{R}},
\end{displaymath} (9.24)

\begin{displaymath}
\data{R}_{{X}_{S},{Z}} = \data{G}_{\data{R}}\data{L}_{\data{R}}
\data{L}_{\data{R}}^{-1/2} = \data{G}_{\data{R}}
\data{L}_{\data{R}}^{1/2}.
\end{displaymath} (9.25)

The correlations between the original variables $X_i$ and the NPCs $Z_j$ are:
\begin{displaymath}
r_{X_i Z_j}=r_{X_{S_i} Z_j}=\sqrt{\ell_j^\data{R}}\, g_{R,ij}
\end{displaymath} (9.26)


\begin{displaymath}
\sum_{j=1}^pr^2_{X_i Z_j}=1
\end{displaymath} (9.27)

(compare this to (9.15) and (9.16)). The resulting NPCs, the $Z_j$, can be interpreted in terms of the original variables, and the role of each NPC in explaining the variation in variable $X_i$ can be evaluated.
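The correlation formulas (9.25) to (9.27) can be verified on simulated data. The following is a minimal sketch under stated assumptions (NumPy, the $1/n$ variance convention, randomly generated data); all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
# Simulated correlated data (an assumption made for the illustration)
X = rng.normal(size=(300, 3)) @ rng.normal(size=(3, 3))
n, p = X.shape

X_S = (X - X.mean(axis=0)) / X.std(axis=0, ddof=0)   # standardization (9.19)
R = X_S.T @ X_S / n                                  # correlation matrix of X
eigvals, eigvecs = np.linalg.eigh(R)                 # ascending order
order = np.argsort(eigvals)[::-1]                    # reorder to decreasing
ell, G = eigvals[order], eigvecs[:, order]
Z = X_S @ G                                          # NPCs (9.21)

# Formula (9.25)/(9.26): entry (i, j) equals sqrt(ell_j) * g_ij
corr_formula = G * np.sqrt(ell)

# Empirical correlations between the original X_i and the NPCs Z_j
corr_empirical = np.corrcoef(X, Z, rowvar=False)[:p, p:]

# Property (9.27): squared correlations sum to one across the NPCs
row_sums = (corr_formula ** 2).sum(axis=1)

print(np.allclose(corr_formula, corr_empirical), np.round(row_sums, 6))
```

The empirical correlations are computed from the unstandardized `X`, which illustrates that $r_{X_i Z_j}$ and $r_{X_{S_i} Z_j}$ coincide: correlations are unaffected by the rescaling in (9.19).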