10.3 Factor Scores and Strategies

The strategies presented so far for factor analysis have concentrated on the estimation of loadings and communalities and on their interpretation. This was a logical step since the factors $F$ were considered to be normalized random sources of information and were explicitly addressed as nonspecific (common factors). The estimated values of the factors, called the factor scores, may also be useful in the interpretation as well as in the diagnostic analysis. More precisely, the factor scores are estimates of the unobserved random vectors $F_l$, $l=1,\dots,k$, for each individual $x_i$, $i=1,\dots,n$. Johnson and Wichern (1998) describe three methods which in practice yield very similar results. Here, we present the regression method, which has the advantage of being the simplest technique and easy to implement.

The idea is to consider the joint distribution of $(X-\mu)$ and $F$, and then to proceed with the regression analysis presented in Chapter 5. Under the factor model (10.4), the joint covariance matrix of $(X-\mu)$ and $F$ is:

\begin{displaymath}
\mathop{\mathit{Var}}\begin{pmatrix}X-\mu \\ F \end{pmatrix}=
\begin{pmatrix}
{\data{Q}}{\data{Q}}^\top+\Psi & {\data{Q}}\\
{\data{Q}}^\top & {\data{I}}_k
\end{pmatrix}.
\end{displaymath} (10.18)

Note that the upper left block of this matrix equals $\Sigma$ and that the matrix has size $(p+k)\times (p+k)$. The off-diagonal block follows from the model (10.4): since $X-\mu$ equals ${\data{Q}}F$ plus a specific-factor term that is uncorrelated with $F$, we have $\mathop{\hbox{Cov}}(X-\mu,F)={\data{Q}}\mathop{\mathit{Var}}(F)={\data{Q}}$.

Assuming joint normality, the conditional distribution of $F\vert X$ is multinormal, see Theorem 5.1, with

\begin{displaymath}
E(F\vert X=x)={\data{Q}}^\top\Sigma^{-1}(x-\mu)
\end{displaymath} (10.19)

and using (5.7) the covariance matrix can be calculated:
\begin{displaymath}
\mathop{\mathit{Var}}(F\vert X=x)={\data{I}}_k-{\data{Q}}^\top\Sigma^{-1}{\data{Q}}.
\end{displaymath} (10.20)

In practice, we replace the unknown ${\data{Q}}$, $\Sigma$ and $\mu$ by corresponding estimators, leading to the estimated individual factor scores:
\begin{displaymath}
\widehat f_i=\widehat{\data{Q}}^\top {\data{S}}^{-1}(x_i-\overline x).
\end{displaymath} (10.21)

We prefer to use the original sample covariance matrix ${\data {S}}$ as an estimator of $\Sigma$, instead of the factor analysis approximation $\widehat{\data{Q}}\widehat{\data{Q}}^\top+\widehat\Psi$, in order to be more robust against incorrect determination of the number of factors.
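A minimal numerical sketch of (10.21) in Python/NumPy reads as follows; the function name and argument layout are ours, and $\widehat{\data{Q}}$ is assumed to have been estimated beforehand by one of the methods discussed in the previous sections.

\begin{verbatim}
import numpy as np

def factor_scores_regression(X, Q_hat):
    """Regression factor scores (10.21): f_i = Q_hat' S^{-1} (x_i - x_bar)."""
    X = np.asarray(X, dtype=float)    # (n, p) data matrix
    x_bar = X.mean(axis=0)            # sample mean
    S = np.cov(X, rowvar=False)       # sample covariance matrix S
    centered = X - x_bar              # row i is (x_i - x_bar)'
    # row i of the result is (x_i - x_bar)' S^{-1} Q_hat = f_i'
    return centered @ np.linalg.solve(S, Q_hat)
\end{verbatim}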

The same rule can be followed when using ${\data{R}}$ instead of ${\data{S}}$. Then (10.18) remains valid when the standardized variables ${Z}={\data{D}}_\Sigma^{-1/2} ({X}-\mu)$ are considered, where ${\data{D}}_\Sigma = \mathop{\hbox{diag}}(\sigma_{11},\dots,\sigma_{pp})$. In this case the factor scores are given by

\begin{displaymath}
\widehat f_i=\widehat{\data{Q}}^\top {\data{R}}^{-1} z_i,
\end{displaymath} (10.22)

where $z_i={\data{D}}_S^{-1/2}(x_i-\overline x)$, $\widehat{\data{Q}}$ is the loading matrix obtained from the correlation matrix ${\data{R}}$, and ${\data{D}}_S=\mathop{\hbox{diag}}(s_{11},\dots,s_{pp})$.
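The standardized version (10.22) can be sketched analogously; again the function is purely illustrative, and $\widehat{\data{Q}}$ is assumed to come from a factor analysis of ${\data{R}}$.

\begin{verbatim}
def factor_scores_standardized(X, Q_hat):
    """Factor scores (10.22) from standardized data and the correlation matrix R."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # z_i = D_S^{-1/2}(x_i - x_bar)
    R = np.corrcoef(X, rowvar=False)                  # empirical correlation matrix R
    return Z @ np.linalg.solve(R, Q_hat)              # row i is z_i' R^{-1} Q_hat = f_i'
\end{verbatim}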

If the factors are rotated by the orthogonal matrix ${\data{G}}$, the factor scores have to be rotated accordingly, that is

\begin{displaymath}
\widehat f_i^*={\data{G}}^\top \widehat f_i.
\end{displaymath} (10.23)
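In matrix terms, applying (10.23) to all individuals at once is a single product: if $\widehat{\data{F}}$ contains the scores $\widehat f_i^\top$ as rows and ${\data{G}}$ is, say, a varimax rotation matrix obtained for the loadings, an illustrative sketch is

\begin{verbatim}
def rotate_scores(F_hat, G):
    """Rotate factor scores (10.23): f_i^* = G' f_i for an orthogonal matrix G."""
    return F_hat @ G          # row i is f_i' G = (G' f_i)'
\end{verbatim}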

A practical example is presented in Section 10.4 using the Boston Housing data.

Practical Suggestions

No one method outperforms another in the practical implementation of factor analysis. However, by applying a tâtonnement (trial-and-error) process, the factor analysis view of the data can be stabilized. This motivates the following procedure.

  1. Fix a reasonable number of factors, say $k=2$ or $3$, based on the correlation structure of the data and/or screeplot of eigenvalues.
  2. Perform several of the presented methods, including rotation. Compare the loadings, communalities, and factor scores from the respective results.
  3. If the results show significant deviations, check for outliers (based on factor scores), and consider changing the number of factors $k$.
For larger data sets, cross-validation methods are recommended. Such methods involve splitting the sample into a training set and a validation data set. On the training sample one estimates the factor model with the desired methodology and uses the obtained parameters to predict the factor scores for the validation data set. The predicted factor scores should be comparable to the factor scores obtained using only the validation data set. This stability criterion may also involve the loadings and communalities.
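Such a stability check can be sketched as follows, here using scikit-learn's FactorAnalysis as one possible fitting method; the split fraction, seed and function name are arbitrary choices, and since factors are identified only up to sign and order, the score columns may first have to be matched or rotated to each other.

\begin{verbatim}
import numpy as np
from sklearn.decomposition import FactorAnalysis

def stability_check(X, k, train_frac=0.7, seed=0):
    """Compare validation-set scores predicted from the training fit with
    scores from a model fitted on the validation set alone."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = int(train_frac * len(X))
    X_train, X_val = X[idx[:n_train]], X[idx[n_train:]]

    fa_train = FactorAnalysis(n_components=k).fit(X_train)
    fa_val = FactorAnalysis(n_components=k).fit(X_val)

    pred = fa_train.transform(X_val)  # validation scores, training parameters
    own = fa_val.transform(X_val)     # scores from the validation data alone

    # column-wise correlations; values near +/-1 suggest a stable factor
    return [np.corrcoef(pred[:, j], own[:, j])[0, 1] for j in range(k)]
\end{verbatim}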

Factor Analysis versus PCA

Factor analysis and principal component analysis use the same set of mathematical tools (spectral decomposition, projections, $\dots$). One could conclude, at first sight, that they share the same view and strategy and therefore yield very similar results. This is not true. There are substantial differences between these two data analysis techniques that we would like to describe here.

The biggest difference between PCA and factor analysis comes from the model philosophy. Factor analysis imposes a strict structure with a fixed number of common (latent) factors, whereas PCA determines $p$ factors in decreasing order of importance. The most important factor in PCA is the one that maximizes the projected variance. The most important factor in factor analysis is the one that (after rotation) allows the clearest interpretation. Often this is different from the direction of the first principal component.

From an implementation point of view, PCA is based on a well-defined, unique algorithm (spectral decomposition), whereas fitting a factor analysis model involves a variety of numerical procedures. The non-uniqueness of the factor analysis procedure opens the door to subjective interpretation and therefore yields a spectrum of results. This data analysis philosophy makes factor analysis more difficult to apply, especially if the model specification involves cross-validation and a data-driven selection of the number of factors.
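The difference is easy to observe numerically. The following sketch simulates data from a two-factor model and compares the leading principal component directions with the (unrotated) loadings estimated by scikit-learn's FactorAnalysis; all choices (sample size, loadings, fitting routine) are illustrative only.

\begin{verbatim}
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(1)
n, p, k = 500, 6, 2
Q = rng.normal(size=(p, k))                  # "true" loadings
psi = rng.uniform(0.5, 2.0, size=p)          # specific variances
X = rng.normal(size=(n, k)) @ Q.T + rng.normal(size=(n, p)) * np.sqrt(psi)

pc_dirs  = PCA(n_components=k).fit(X).components_.T             # PC directions
fa_loads = FactorAnalysis(n_components=k).fit(X).components_.T  # estimated loadings
print(np.round(pc_dirs, 2))
print(np.round(fa_loads, 2))
\end{verbatim}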