9. Principal Components Analysis

Chapter 8 presented the basic geometric tools needed to produce a lower dimensional description of the rows and columns of a multivariate data matrix. Principal components analysis has the same objective, with the exception that the rows of the data matrix ${\data{X}}$ will now be considered as observations from a $p$-variate random variable $X$. The principal idea of reducing the dimension of $X$ is achieved through linear combinations. Low dimensional linear combinations are often easier to interpret and serve as an intermediate step in a more complex data analysis. More precisely, one looks for linear combinations which create the largest spread among the values of $X$. In other words, one is searching for linear combinations with the largest variances.
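As a brief preview (the eigenvector notation $\gamma_1$, $\lambda_1$ anticipates the formal development in Section 9.1), the first such linear combination is given by the weight vector solving

$$\max_{\{\delta :\, \|\delta\| = 1\}} \mathrm{Var}(\delta^{\top} X) = \lambda_1,$$

with the maximum attained at $\delta = \gamma_1$, the eigenvector of $\Sigma = \mathrm{Var}(X)$ belonging to its largest eigenvalue $\lambda_1$; the further components repeat this maximization subject to orthogonality with the preceding weight vectors.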

Section 9.1 introduces the basic ideas and technical elements behind principal components. No particular assumption will be made on $X$ except that the mean vector and the covariance matrix exist. When reference is made to a data matrix ${\data{X}}$ in Section 9.2, the empirical mean and covariance matrix will be used. Section 9.3 shows how to interpret the principal components by studying their correlations with the original components of $X$. In practice, analyses are often carried out by looking at two-dimensional scatterplots. Section 9.4 develops inference techniques for principal components. These are particularly helpful in establishing the appropriate dimension reduction and thus in determining the quality of the resulting lower dimensional representations. Since principal component analysis is performed on covariance matrices, it is not scale invariant. The measurement units of the components of $X$ are often quite different, so it is reasonable to standardize them; the normalized version of principal components is defined in Section 9.5, and a numerical sketch follows this overview. In Section 9.6 it is shown that the empirical principal components are the factors of appropriate transformations of the data matrix. The classical way of defining principal components through linear combinations with the largest variance is described here in geometric terms, i.e., in terms of the optimal fit within subspaces generated by the columns and/or the rows of ${\data{X}}$, as was discussed in Chapter 8. Section 9.9 concludes with additional examples.
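The following sketch (not from the text; it relies on NumPy, simulated data, and an arbitrary rescaling factor, all chosen here purely for illustration) contrasts covariance-based principal components with the normalized, correlation-based version of Section 9.5:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: two correlated variables, initially on comparable scales.
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=200)
X[:, 1] *= 1000.0  # change the units of the second variable (e.g., km -> m)

# PCA on the covariance matrix: the rescaled variable dominates the first PC.
_, vecs_cov = np.linalg.eigh(np.cov(X, rowvar=False))
print("covariance-based first PC direction:  ", vecs_cov[:, -1])

# Normalized PCA (Section 9.5): eigenvectors of the correlation matrix,
# i.e., PCA after standardizing each column to unit variance; this direction
# is unaffected by the change of units.
_, vecs_corr = np.linalg.eigh(np.corrcoef(X, rowvar=False))
print("correlation-based first PC direction: ", vecs_corr[:, -1])
\end{verbatim}

Since the correlation matrix of $X$ is unchanged when individual components are rescaled, the second direction is the same with or without the change of units, whereas the covariance-based direction collapses onto the high-variance component.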