The aim of factor analysis is to explain the outcome of $p$ variables in the data matrix $\mathcal{X}$ using fewer variables, the so-called factors. Ideally all the information in $\mathcal{X}$ can be reproduced by a smaller number of factors. These factors are interpreted as latent (unobserved) common characteristics of the observed $x \in \mathbb{R}^p$.
The case just described occurs when every observed $x = (x_1, \ldots, x_p)^\top$ can be written as
\[
x_j = \sum_{\ell=1}^{k} q_{j\ell} f_\ell + \mu_j, \qquad j = 1, \ldots, p.
\tag{10.1}
\]
Here $f_\ell$, for $\ell = 1, \ldots, k$, denotes the factors and $q_{j\ell}$ the loading of the $j$-th variable on the $\ell$-th factor.
The spectral decomposition of $\Sigma$ is given by $\Sigma = \Gamma \Lambda \Gamma^\top$. Suppose that only the first $k$ eigenvalues are positive, i.e., $\lambda_{k+1} = \cdots = \lambda_p = 0$. Then the (singular) covariance matrix can be written as
\[
\Sigma = \sum_{\ell=1}^{k} \lambda_\ell \gamma_\ell \gamma_\ell^\top.
\]
Writing the principal components as $Y = \Gamma^\top (X - \mu)$ and retaining only the first $k$ components $Y_1$ (the remaining components have zero variance), we obtain
\[
X = \Gamma_1 \Lambda_1^{1/2} \Lambda_1^{-1/2} Y_1 + \mu = QF + \mu,
\tag{10.2}
\]
where $Q = \Gamma_1 \Lambda_1^{1/2}$ and $F = \Lambda_1^{-1/2} Y_1$.
Note that the covariance matrix of model (10.2) can be written as
\[
\Sigma = \mathrm{E}(X - \mu)(X - \mu)^\top = Q\,\mathrm{E}(FF^\top)\,Q^\top = QQ^\top = \sum_{\ell=1}^{k} \lambda_\ell \gamma_\ell \gamma_\ell^\top.
\tag{10.3}
\]
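As a quick numerical sanity check (a sketch in NumPy; the variable names are ours, not from the text), one can recover $Q = \Gamma_1 \Lambda_1^{1/2}$ from the spectral decomposition of a rank-$k$ covariance matrix and verify that $QQ^\top$ reproduces $\Sigma$:

```python
# Sketch: recover loadings Q = Gamma_1 Lambda_1^{1/2} from a rank-k
# covariance matrix and verify Sigma = Q Q^T numerically.
import numpy as np

rng = np.random.default_rng(0)
p, k = 5, 2

# Build a singular covariance matrix of rank k: Sigma = B B^T with B (p x k).
B = rng.standard_normal((p, k))
Sigma = B @ B.T

# Spectral decomposition Sigma = Gamma Lambda Gamma^T (eigh returns
# eigenvalues in ascending order, so we reverse to descending).
lam, Gamma = np.linalg.eigh(Sigma)
lam, Gamma = lam[::-1], Gamma[:, ::-1]

# Only the first k eigenvalues are (numerically) positive.
Q = Gamma[:, :k] * np.sqrt(lam[:k])   # Q = Gamma_1 Lambda_1^{1/2}

# The factorization reproduces Sigma exactly (up to rounding).
assert np.allclose(Q @ Q.T, Sigma)
```

The assertion holds because the discarded eigenvalues $\lambda_{k+1}, \ldots, \lambda_p$ are zero, so $\Gamma_1 \Lambda_1 \Gamma_1^\top$ already accounts for all of $\Sigma$.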
It is common practice in factor analysis to split the influences of the factors into common and specific ones. There are, for example, highly informative factors that are common to all of the components of $x$ and factors that are specific to certain components.
The factor analysis model used in practice is a generalization of (10.2):
\[
X = QF + U + \mu,
\tag{10.4}
\]
where $Q$ is a $(p \times k)$ matrix of the (non-random) loadings of the common factors $F$ $(k \times 1)$ and $U$ is a $(p \times 1)$ vector of the (random) specific factors. It is assumed that
\[
\mathrm{E}F = 0, \quad \mathrm{Var}(F) = \mathcal{I}_k, \quad \mathrm{E}U = 0, \quad \mathrm{Cov}(U_i, U_j) = 0 \ (i \neq j), \quad \mathrm{Cov}(F, U) = 0.
\tag{10.5}
\]
Define
\[
\mathrm{Var}(U) = \Psi = \mathrm{diag}(\psi_{11}, \ldots, \psi_{pp}).
\]
The generalized factor model (10.4) together with the assumptions given in (10.5) constitute the orthogonal factor model.
Note that (10.4) implies for the components of $X = (X_1, \ldots, X_p)^\top$ that
\[
\sigma_{X_jX_j} = \sum_{\ell=1}^{k} q_{j\ell}^2 + \psi_{jj}, \qquad j = 1, \ldots, p.
\tag{10.6}
\]
The quantity $h_j^2 = \sum_{\ell=1}^{k} q_{j\ell}^2$ is called the communality and $\psi_{jj}$ the specific variance.
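The variance split above is immediate to check numerically (a NumPy sketch; names are ours): the diagonal of $QQ^\top + \Psi$ equals communality plus specific variance, component by component.

```python
# Sketch: each component's variance splits into communality plus specific
# variance, sigma_jj = sum_l q_jl^2 + psi_jj.
import numpy as np

rng = np.random.default_rng(4)
p, k = 4, 2
Q = rng.standard_normal((p, k))
psi = rng.uniform(0.5, 1.5, size=p)   # specific variances
Sigma = Q @ Q.T + np.diag(psi)

communality = (Q ** 2).sum(axis=1)    # h_j^2 = sum_l q_jl^2
assert np.allclose(np.diag(Sigma), communality + psi)
```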
Assume that a factor model with $k$ factors was found to be reasonable, i.e., most of the (co)variations of the measures in $\mathcal{X}$ were explained by the $k$ fixed latent factors. The next natural step is to try to understand what these factors represent. To interpret $F_\ell$, it makes sense to compute its correlations with the original variables $X_j$ first. This is done for $\ell = 1, \ldots, k$ and for $j = 1, \ldots, p$ to obtain the matrix $P_{XF}$. The sequence of calculations used here is in fact the same as that used to interpret the PCs in principal component analysis.
The following covariance between $X$ and $F$ is obtained via (10.5):
\[
\mathrm{Cov}(X, F) = \mathrm{E}\{(QF + U)F^\top\} = Q\,\mathrm{E}(FF^\top) + \mathrm{E}(UF^\top) = Q,
\]
so that $P_{XF} = D^{-1/2} Q$, where $D = \mathrm{diag}(\sigma_{X_1X_1}, \ldots, \sigma_{X_pX_p})$.
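The identity $\mathrm{Cov}(X, F) = Q$ can be illustrated by simulation (a sketch under the assumptions (10.5); all names and the sample size are ours): generate factors and specific factors satisfying the orthogonal factor model and compare the empirical cross-covariance with the loading matrix.

```python
# Sketch: simulate the orthogonal factor model X = Q F + U + mu under the
# assumptions (10.5) and check that the empirical Cov(X, F) approaches Q.
import numpy as np

rng = np.random.default_rng(1)
p, k, n = 4, 2, 200_000

Q = rng.standard_normal((p, k))
mu = rng.standard_normal(p)
psi = rng.uniform(0.5, 1.5, size=p)              # specific variances (diagonal Psi)

F = rng.standard_normal((n, k))                  # EF = 0, Var(F) = I_k
U = rng.standard_normal((n, p)) * np.sqrt(psi)   # uncorrelated specific factors
X = F @ Q.T + U + mu                             # model (10.4), one row per observation

# Empirical cross-covariance between X and F; it converges to Q.
cov_XF = (X - X.mean(0)).T @ (F - F.mean(0)) / n
assert np.allclose(cov_XF, Q, atol=0.05)
```

With $n = 200{,}000$ draws the Monte Carlo error of each covariance entry is well below the tolerance used in the assertion.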
Returning to the psychology example where $X$ contains the observed scores on $p$ different intelligence tests (the WAIS data set in Table B.12 provides an example), we would expect a model with one factor to produce a factor that is positively correlated with all of the components in $X$. For this example the factor represents the overall level of intelligence of an individual. A model with two factors could produce a refinement in explaining the variations of the scores. For example, the first factor could be the same as before (overall level of intelligence), whereas the second factor could be positively correlated with some of the tests that are related to the individual's ability to think abstractly and negatively correlated with other tests that are related to the individual's practical ability. The second factor would then concern a particular dimension of intelligence, stressing the distinction between the ``theoretical'' and ``practical'' abilities of the individual. If the model is true, most of the information coming from the $p$ scores can be summarized by these two latent factors. Other practical examples are given below.
What happens if we change the scale of $X$ to $Y = CX$ with $C = \mathrm{diag}(c_1, \ldots, c_p)$? If the $k$-factor model (10.4) is true for $X$ with $Q = Q_X$, $\Psi = \Psi_X$, then, since
\[
\mathrm{Var}(Y) = C \Sigma C^\top = C Q_X Q_X^\top C^\top + C \Psi_X C^\top,
\]
the same $k$-factor model is also true for $Y$ with $Q_Y = C Q_X$ and $\Psi_Y = C \Psi_X C^\top$.
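The scale invariance is a one-line algebraic identity; a NumPy sketch (our notation) makes it concrete:

```python
# Sketch: scale invariance of the factor decomposition. If
# Sigma = Q Q^T + Psi, then for Y = C X we have
# Var(Y) = C Sigma C^T = (C Q)(C Q)^T + C Psi C^T.
import numpy as np

rng = np.random.default_rng(2)
p, k = 4, 2
Q = rng.standard_normal((p, k))
Psi = np.diag(rng.uniform(0.5, 1.5, size=p))
Sigma = Q @ Q.T + Psi

C = np.diag(rng.uniform(1.0, 3.0, size=p))   # change of scale Y = C X
Sigma_Y = C @ Sigma @ C.T

Q_Y = C @ Q                                  # rescaled loadings
Psi_Y = C @ Psi @ C.T                        # rescaled (still diagonal) specific variances
assert np.allclose(Sigma_Y, Q_Y @ Q_Y.T + Psi_Y)
```

Because of this invariance, fitting the model to the covariance matrix or to a rescaled version of the data leads to equivalently transformed loadings.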
The factor loadings are not unique! Suppose that $\mathcal{G}$ is an orthogonal $(k \times k)$ matrix. Then $X$ in (10.4) can also be written as
\[
X = (Q\mathcal{G})(\mathcal{G}^\top F) + U + \mu.
\]
This implies that, if a $k$-factor model with factors $F$ and loadings $Q$ is true for $X$, then the $k$-factor model with factors $\mathcal{G}^\top F$ and loadings $Q\mathcal{G}$ is also true. In practice, we will take advantage of this non-uniqueness. Indeed, referring back to Section 2.6 we can conclude that premultiplying a vector by an orthogonal matrix corresponds to a rotation of the system of axes, the direction of the first new axis being given by the first row of the orthogonal matrix. It will be shown that choosing an appropriate rotation results in a matrix of loadings that is easier to interpret. We have seen that the loadings provide the correlations between the factors and the original variables; therefore, it makes sense to search for rotations that give factors that are maximally correlated with various groups of variables.
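The rotation invariance is easy to verify numerically (a sketch; the rotation angle and dimensions are arbitrary choices of ours):

```python
# Sketch: non-uniqueness of the loadings. For any orthogonal G, the
# loadings Q G (with factors G^T F) give the same covariance Q Q^T + Psi.
import numpy as np

rng = np.random.default_rng(3)
p, k = 5, 2
Q = rng.standard_normal((p, k))
Psi = np.diag(rng.uniform(0.5, 1.5, size=p))

theta = 0.7                                      # an arbitrary rotation angle
G = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # orthogonal (k x k)

assert np.allclose(G @ G.T, np.eye(k))           # G is indeed orthogonal
# Rotated loadings reproduce the same covariance matrix.
assert np.allclose((Q @ G) @ (Q @ G).T + Psi, Q @ Q.T + Psi)
```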
From a numerical point of view, the non-uniqueness is a drawback. We have to find loadings $Q$ and specific variances $\Psi$ satisfying the decomposition $\Sigma = QQ^\top + \Psi$, but no straightforward numerical algorithm can solve this problem due to the multiplicity of the solutions. An acceptable technique is to impose some chosen constraints in order to get--in the best case--a unique solution to the decomposition. Then, as suggested above, once we have a solution we can take advantage of rotations in order to obtain a solution that is easier to interpret.
An obvious question is: what kind of constraints should we impose in order to eliminate the non-uniqueness problem? Usually, we impose the additional constraint that
\[
Q^\top \Psi^{-1} Q \quad \text{is diagonal.}
\]
How many parameters does the model (10.7) have without constraints? The original covariance matrix $\Sigma$ has $\frac{1}{2}p(p+1)$ free parameters, whereas the factor model requires $pk$ parameters for $Q$ and $p$ for $\Psi$, minus $\frac{1}{2}k(k-1)$ for the diagonality constraint. The difference in the number of parameters is therefore
\[
d = \tfrac{1}{2}(p-k)^2 - \tfrac{1}{2}(p+k).
\]
If $d < 0$, then the model is undetermined: there are infinitely many solutions to (10.7). This means that the number of parameters of the factorial model is larger than the number of parameters of the original model, or that the number of factors $k$ is ``too large'' relative to $p$. In some cases $d = 0$: there is a unique solution to the problem (except for rotation). In practice we usually have $d > 0$: there are more equations than parameters, so an exact solution does not exist. In this case approximate solutions are used. An approximation of $\Sigma$, for example, is $QQ^\top + \Psi$. The last case is the most interesting since the factorial model has fewer parameters than the original one. Estimation methods are introduced in the next section.
Evaluating the degrees of freedom, $d$, is particularly important, because it already gives an idea of the upper bound on the number of factors we can hope to identify in a factor model. For instance, if $p = 4$, we could not identify a factor model with 2 factors (this results in $d = -1$, which has infinitely many solutions). With $p = 4$, only a one-factor model gives an approximate solution ($d = 2$). When $p = 6$, models with 1 and 2 factors provide approximate solutions and a model with 3 factors results in a unique solution (up to rotations), since $d = 0$. A model with 4 or more factors would not be allowed, but of course the aim of factor analysis is to find suitable models with a small number of factors, i.e., smaller than $p$. The next two examples give more insight into the notion of degrees of freedom.
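The bookkeeping above can be packed into a tiny helper (a sketch; the function name is ours) using the standard count $d = \frac{1}{2}(p-k)^2 - \frac{1}{2}(p+k)$, which reproduces the cases just discussed:

```python
# Sketch: degrees of freedom d = (1/2)(p - k)^2 - (1/2)(p + k) of the
# k-factor model, reproducing the p = 4 and p = 6 cases discussed above.
def factor_df(p: int, k: int) -> float:
    """Excess of equations over free parameters in Sigma = Q Q^T + Psi."""
    return 0.5 * (p - k) ** 2 - 0.5 * (p + k)

# p = 4: only a one-factor model is identifiable.
assert factor_df(4, 1) == 2.0      # d > 0: approximate solution
assert factor_df(4, 2) == -1.0     # d < 0: undetermined

# p = 6: k = 1, 2 give approximate solutions, k = 3 a unique one.
assert factor_df(6, 1) == 9.0
assert factor_df(6, 2) == 4.0
assert factor_df(6, 3) == 0.0      # d = 0: unique up to rotation
assert factor_df(6, 4) < 0         # not allowed
```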
The solution in Example 10.1
may be unique (up to a rotation), but it is not proper in the sense that
it cannot be interpreted statistically.
Exercise 10.5 gives an example where the specific
variance is negative.
Even in the case of a unique solution ($d = 0$), the solution may be inconsistent with statistical interpretations.