10.4 Boston Housing
To illustrate how to implement factor analysis, we will use the
Boston housing data set and the by now well-known set of
transformations. Once again, variable 4 (the Charles River
indicator) will be excluded. As before,
standardized variables are used and the analysis is based on the correlation
matrix.
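Working with standardized variables and working with the correlation matrix amount to the same thing, which the following minimal sketch illustrates. The data here are synthetic stand-ins (the array `X` and its dimensions are assumptions for illustration, not the Boston housing data).

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for the transformed data: 200 observations of 4 correlated variables.
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))

# Standardize each column to mean 0 and unit sample variance.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

cov_Z = np.cov(Z, rowvar=False)        # covariance matrix of the standardized data
corr_X = np.corrcoef(X, rowvar=False)  # correlation matrix of the original data

# The two matrices coincide, so a factor analysis of Z is one of the correlation matrix.
print(np.allclose(cov_Z, corr_X))
```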
Table 10.2: Estimated factor loadings, communalities, and specific variances, MLM. MVAfacthous.xpl

| No. | Variable        | Loading, factor 1 | Loading, factor 2 | Loading, factor 3 | Communality | Specific variance |
|-----|-----------------|-------------------|-------------------|-------------------|-------------|-------------------|
| 1   | crime           | 0.9295            | 0.1653            | 0.1107            | 0.9036      | 0.0964            |
| 2   | large lots      | 0.5823            | 0.0379            | 0.2902            | 0.4248      | 0.5752            |
| 3   | nonretail acres | 0.8192            | 0.0296            | 0.1378            | 0.6909      | 0.3091            |
| 5   | nitric oxides   | 0.8789            | 0.0987            | 0.2719            | 0.8561      | 0.1439            |
| 6   | rooms           | 0.4447            | 0.5311            | 0.0380            | 0.4812      | 0.5188            |
| 7   | prior 1940      | 0.7837            | 0.0149            | 0.3554            | 0.7406      | 0.2594            |
| 8   | empl. centers   | 0.8294            | 0.1570            | 0.4110            | 0.8816      | 0.1184            |
| 9   | accessibility   | 0.7955            | 0.3062            | 0.4053            | 0.8908      | 0.1092            |
| 10  | tax-rate        | 0.8262            | 0.1401            | 0.2906            | 0.7867      | 0.2133            |
| 11  | pupil/teacher   | 0.5051            | 0.1850            | 0.1553            | 0.3135      | 0.6865            |
| 12  | blacks          | 0.4701            | 0.0227            | 0.1627            | 0.2480      | 0.7520            |
| 13  | lower status    | 0.7601            | 0.5059            | 0.0070            | 0.8337      | 0.1663            |
| 14  | value           | 0.6942            | 0.5904            | 0.1798            | 0.8628      | 0.1371            |

Table 10.3: Estimated factor loadings, communalities, and specific variances, MLM, varimax rotation. MVAfacthous.xpl

| No. | Variable        | Loading, factor 1 | Loading, factor 2 | Loading, factor 3 | Communality | Specific variance |
|-----|-----------------|-------------------|-------------------|-------------------|-------------|-------------------|
| 1   | crime           | 0.8413            | 0.0940            | 0.4324            | 0.9036      | 0.0964            |
| 2   | large lots      | 0.3326            | 0.1323            | 0.5447            | 0.4248      | 0.5752            |
| 3   | nonretail acres | 0.6142            | 0.1238            | 0.5462            | 0.6909      | 0.3091            |
| 5   | nitric oxides   | 0.5917            | 0.0221            | 0.7110            | 0.8561      | 0.1439            |
| 6   | rooms           | 0.3950            | 0.5585            | 0.1153            | 0.4812      | 0.5188            |
| 7   | prior 1940      | 0.4665            | 0.1374            | 0.7100            | 0.7406      | 0.2594            |
| 8   | empl. centers   | 0.4747            | 0.0198            | 0.8098            | 0.8816      | 0.1184            |
| 9   | accessibility   | 0.8879            | 0.2874            | 0.1409            | 0.8908      | 0.1092            |
| 10  | tax-rate        | 0.8518            | 0.1044            | 0.2240            | 0.7867      | 0.2133            |
| 11  | pupil/teacher   | 0.5090            | 0.2061            | 0.1093            | 0.3135      | 0.6865            |
| 12  | blacks          | 0.4834            | 0.0418            | 0.1122            | 0.2480      | 0.7520            |
| 13  | lower status    | 0.6358            | 0.5690            | 0.3252            | 0.8337      | 0.1663            |
| 14  | value           | 0.6817            | 0.6193            | 0.1208            | 0.8628      | 0.1371            |

Figure 10.3: Factor analysis for Boston housing data, MLM after varimax rotation. MVAfacthous.xpl

In Section 10.3, we described a practical implementation
of factor analysis. Based on principal components,
three factors were chosen and factor analysis was applied using the
maximum likelihood method (MLM), the principal factor method (PFM),
and the principal component method (PCM). For illustration, the MLM
will be presented with and without varimax rotation.
Table 10.2 gives the MLM factor loadings without rotation
and Table 10.3 gives the varimax version of this analysis.
The corresponding graphical representations of the loadings are displayed in
Figures 10.2 and 10.3. We can see
that the varimax rotation does not substantially change the interpretation
of the factors obtained by the MLM.
Factor 1 can be roughly interpreted as a ``quality of life factor''
because it is
positively correlated with variables like
and negatively correlated
with
, both having low specific variances. The second factor may be
interpreted as a ``residential factor'', since it is highly correlated with
variables
, and
. The most striking difference between
the results with and without varimax rotation can be seen by comparing
the lower left corners
of Figures 10.2 and 10.3. There is
a clear separation of the variables in the varimax version of the MLM.
Given this arrangement of the variables in Figure 10.3,
we can interpret factor 3 as an employment factor, since we observe high
correlations with
and
.
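Tables 10.2 and 10.3 report identical communalities and specific variances, which is what one expects because varimax is an orthogonal rotation of the loadings. The sketch below uses a common SVD-based varimax iteration, not necessarily the routine behind MVAfacthous.xpl. The function `varimax` and the loading matrix `Q` are hypothetical illustrations. The last lines check the crime row of both tables, and since only squared loadings enter, signs play no role.

```python
import numpy as np

def varimax(loadings, max_iter=500, tol=1e-10):
    """Return (rotated loadings, rotation matrix) from an SVD-based varimax iteration."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion with respect to the rotation matrix.
        grad = loadings.T @ (rotated**3 - rotated * (rotated**2).sum(axis=0) / p)
        u, s, vt = np.linalg.svd(grad)
        rotation = u @ vt  # product of orthogonal matrices, hence orthogonal
        if s.sum() < criterion * (1 + tol):
            break
        criterion = s.sum()
    return loadings @ rotation, rotation

# Hypothetical 5 x 3 loading matrix, used only to exercise the function.
Q = np.array([[0.8,  0.3,  0.1],
              [0.7, -0.4,  0.2],
              [0.6,  0.5, -0.3],
              [0.5, -0.2,  0.6],
              [0.4,  0.6,  0.4]])
Q_rot, G = varimax(Q)

h2_before = (Q**2).sum(axis=1)      # communalities before rotation
h2_after = (Q_rot**2).sum(axis=1)   # communalities after rotation
print(np.allclose(h2_before, h2_after))

# Crime row as printed in Table 10.2 (unrotated) and Table 10.3 (varimax).
crime_unrotated = np.array([0.9295, 0.1653, 0.1107])
crime_rotated = np.array([0.8413, 0.0940, 0.4324])
h2_crime_unrotated = (crime_unrotated**2).sum()
h2_crime_rotated = (crime_rotated**2).sum()
print(h2_crime_unrotated, h2_crime_rotated)
```

Both sums of squares agree with the tabulated communality 0.9036 up to the rounding of the four-digit entries, and one minus that value gives the specific variance 0.0964.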
Table 10.4: Estimated factor loadings, communalities, and specific variances, PCM, varimax rotation. MVAfacthous.xpl

| No. | Variable        | Loading, factor 1 | Loading, factor 2 | Loading, factor 3 | Communality | Specific variance |
|-----|-----------------|-------------------|-------------------|-------------------|-------------|-------------------|
| 1   | crime           | 0.9164            | 0.0152            | 0.2357            | 0.8955      | 0.1045            |
| 2   | large lots      | 0.6772            | 0.0762            | 0.4490            | 0.6661      | 0.3339            |
| 3   | nonretail acres | 0.8614            | 0.1321            | 0.1115            | 0.7719      | 0.2281            |
| 5   | nitric oxides   | 0.9172            | 0.0573            | 0.0874            | 0.8521      | 0.1479            |
| 6   | rooms           | 0.3590            | 0.7896            | 0.1040            | 0.7632      | 0.2368            |
| 7   | prior 1940      | 0.8392            | 0.0008            | 0.2163            | 0.7510      | 0.2490            |
| 8   | empl. centers   | 0.8928            | 0.1253            | 0.2064            | 0.8554      | 0.1446            |
| 9   | accessibility   | 0.7562            | 0.0927            | 0.4616            | 0.7935      | 0.2065            |
| 10  | tax-rate        | 0.7891            | 0.0370            | 0.4430            | 0.8203      | 0.1797            |
| 11  | pupil/teacher   | 0.4827            | 0.3911            | 0.1719            | 0.4155      | 0.5845            |
| 12  | blacks          | 0.4499            | 0.0368            | 0.5612            | 0.5188      | 0.4812            |
| 13  | lower status    | 0.6925            | 0.5843            | 0.0035            | 0.8209      | 0.1791            |
| 14  | value           | 0.5933            | 0.6720            | 0.1895            | 0.8394      | 0.1606            |

Figure 10.4: Factor analysis for Boston housing data, PCM after varimax rotation. MVAfacthous.xpl

Table 10.5: Estimated factor loadings, communalities, and specific variances, PFM, varimax rotation. MVAfacthous.xpl

| No. | Variable        | Loading, factor 1 | Loading, factor 2 | Loading, factor 3 | Communality | Specific variance |
|-----|-----------------|-------------------|-------------------|-------------------|-------------|-------------------|
| 1   | crime           | 0.8579            | 0.0270            | 0.4175            | 0.9111      | 0.0889            |
| 2   | large lots      | 0.2953            | 0.2168            | 0.5756            | 0.4655      | 0.5345            |
| 3   | nonretail acres | 0.5893            | 0.2415            | 0.5666            | 0.7266      | 0.2734            |
| 5   | nitric oxides   | 0.6050            | 0.0892            | 0.6855            | 0.8439      | 0.1561            |
| 6   | rooms           | 0.2902            | 0.6280            | 0.1296            | 0.4954      | 0.5046            |
| 7   | prior 1940      | 0.4702            | 0.1741            | 0.6733            | 0.7049      | 0.2951            |
| 8   | empl. centers   | 0.4988            | 0.0414            | 0.7876            | 0.8708      | 0.1292            |
| 9   | accessibility   | 0.8830            | 0.1187            | 0.1479            | 0.8156      | 0.1844            |
| 10  | tax-rate        | 0.8969            | 0.0136            | 0.1666            | 0.8325      | 0.1675            |
| 11  | pupil/teacher   | 0.4590            | 0.2798            | 0.1412            | 0.3090      | 0.6910            |
| 12  | blacks          | 0.4812            | 0.0666            | 0.0856            | 0.2433      | 0.7567            |
| 13  | lower status    | 0.5433            | 0.6604            | 0.3193            | 0.8333      | 0.1667            |
| 14  | value           | 0.6012            | 0.7004            | 0.0956            | 0.8611      | 0.1389            |

Figure 10.5: Factor analysis for Boston housing data, PFM after varimax rotation. MVAfacthous.xpl

We now turn to the PCM and PFM analyses. The results are presented in
Tables 10.4 and 10.5 and in
Figures 10.4 and 10.5.
We would like to focus on the PCM, because this
3-factor model yields only one specific variance (unexplained variation)
above 0.5. Looking at Figure 10.4, it turns out that
factor 1 remains a ``quality of life factor'' which is clearly visible
from the clustering of
,
,
and
on the right-hand
side of the graph, while the variables
,
,
,
and
are on the left-hand side. Again, the second factor is
a ``residential factor'',
clearly demonstrated by the location of variables
,
,
,
and
. The interpretation of the third factor is more difficult because
all of the loadings (except for
) are very small.
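The reason given above for focusing on the PCM can be checked directly from the specific-variance columns of the tables. The short sketch below copies those columns as printed (the dictionary name `specific_variances` is just an illustrative choice) and counts, for each method, how many specific variances exceed 0.5.

```python
# Specific variances as printed in the tables, in the row order 1, 2, 3, 5, ..., 14.
specific_variances = {
    "MLM": [0.0964, 0.5752, 0.3091, 0.1439, 0.5188, 0.2594, 0.1184,
            0.1092, 0.2133, 0.6865, 0.7520, 0.1663, 0.1371],
    "PCM": [0.1045, 0.3339, 0.2281, 0.1479, 0.2368, 0.2490, 0.1446,
            0.2065, 0.1797, 0.5845, 0.4812, 0.1791, 0.1606],
    "PFM": [0.0889, 0.5345, 0.2734, 0.1561, 0.5046, 0.2951, 0.1292,
            0.1844, 0.1675, 0.6910, 0.7567, 0.1667, 0.1389],
}

# Count the variables whose unexplained variation exceeds one half.
counts = {method: sum(v > 0.5 for v in values)
          for method, values in specific_variances.items()}
print(counts)  # {'MLM': 4, 'PCM': 1, 'PFM': 4}
```

For the PCM the single value above 0.5 belongs to pupil/teacher (0.5845), whereas the MLM and PFM solutions each leave four variables with more than half of their variance unexplained.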