10.4 Boston Housing

To illustrate how to implement factor analysis, we use the Boston housing data set and the by now well-known set of transformations. Once again, the variable $X_4$ (Charles River indicator) is excluded. As before, the variables are standardized and the analysis is based on the correlation matrix.
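The preprocessing step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the array `X` is a randomly generated stand-in for the transformed data matrix (it is not the Boston housing data), and NumPy is assumed as the tool. The sketch shows that the covariance matrix of the standardized variables coincides with the correlation matrix of the original ones.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for the transformed data matrix (n rows, p columns);
# these are random numbers, not the Boston housing data.
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))

# Standardize each column to mean 0 and sample variance 1.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Covariance matrix of the standardized data ...
R = np.cov(Z, rowvar=False)
# ... equals the correlation matrix of the original data.
R_direct = np.corrcoef(X, rowvar=False)

print(np.allclose(R, R_direct))
```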


Table 10.2: Estimated factor loadings, communalities, and specific variances, MLM. MVAfacthous.xpl
    Estimated factor loadings   Communalities   Specific variances
$j$ Variable   $\hat q_1$ $\hat q_2$ $\hat q_3$   $\hat{h}_j^2$   $\hat{\psi}_{jj}=1-\hat{h}_{j}^2$
1 crime 0.9295 $-$0.1653 0.1107 0.9036 0.0964
2 large lots $-$0.5823 $-$0.0379 0.2902 0.4248 0.5752
3 nonretail acres 0.8192 0.0296 $-$0.1378 0.6909 0.3091
5 nitric oxides 0.8789 $-$0.0987 $-$0.2719 0.8561 0.1439
6 rooms $-$0.4447 $-$0.5311 $-$0.0380 0.4812 0.5188
7 prior 1940 0.7837 0.0149 $-$0.3554 0.7406 0.2594
8 empl. centers $-$0.8294 0.1570 0.4110 0.8816 0.1184
9 accessibility 0.7955 $-$0.3062 0.4053 0.8908 0.1092
10 tax-rate 0.8262 $-$0.1401 0.2906 0.7867 0.2133
11 pupil/teacher 0.5051 0.1850 0.1553 0.3135 0.6865
12 blacks $-$0.4701 $-$0.0227 $-$0.1627 0.2480 0.7520
13 lower status 0.7601 0.5059 $-$0.0070 0.8337 0.1663
14 value $-$0.6942 $-$0.5904 $-$0.1798 0.8628 0.1371


Figure 10.2: Factor analysis for Boston housing data, MLM. MVAfacthous.xpl
\includegraphics[width=0.75\defpicwidth]{MVAfacthousmlmo.ps}


Table 10.3: Estimated factor loadings, communalities, and specific variances, MLM, varimax rotation. MVAfacthous.xpl
    Estimated factor loadings   Communalities   Specific variances
$j$ Variable   $\hat q_1$ $\hat q_2$ $\hat q_3$   $\hat{h}_j^2$   $\hat{\psi}_{jj}=1-\hat{h}_{j}^2$
1 crime 0.8413 $-$0.0940 $-$0.4324 0.9036 0.0964
2 large lots $-$0.3326 $-$0.1323 0.5447 0.4248 0.5752
3 nonretail acres 0.6142 0.1238 $-$0.5462 0.6909 0.3091
5 nitric oxides 0.5917 0.0221 $-$0.7110 0.8561 0.1439
6 rooms $-$0.3950 $-$0.5585 0.1153 0.4812 0.5188
7 prior 1940 0.4665 0.1374 $-$0.7100 0.7406 0.2594
8 empl. centers $-$0.4747 0.0198 0.8098 0.8816 0.1184
9 accessibility 0.8879 $-$0.2874 $-$0.1409 0.8908 0.1092
10 tax-rate 0.8518 $-$0.1044 $-$0.2240 0.7867 0.2133
11 pupil/teacher 0.5090 0.2061 $-$0.1093 0.3135 0.6865
12 blacks $-$0.4834 $-$0.0418 0.1122 0.2480 0.7520
13 lower status 0.6358 0.5690 $-$0.3252 0.8337 0.1663
14 value $-$0.6817 $-$0.6193 0.1208 0.8628 0.1371


Figure 10.3: Factor analysis for Boston housing data, MLM after varimax rotation. MVAfacthous.xpl
\includegraphics[width=0.75\defpicwidth]{MVAfacthousmlm.ps}

In Section 10.3, we described a practical implementation of factor analysis. Based on principal components, three factors were chosen and factor analysis was applied using the maximum likelihood method (MLM), the principal factor method (PFM), and the principal component method (PCM). For illustration, the MLM will be presented with and without varimax rotation.
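As a reminder of what one of these estimators does, the principal component method can be sketched as follows. This is a minimal illustration, not the code behind MVAfacthous.xpl: the function name `pcm_loadings` is hypothetical, the input is a randomly generated correlation matrix rather than the Boston housing data, and no rotation is applied. The loadings are taken as the leading eigenvectors of the correlation matrix scaled by the square roots of their eigenvalues.

```python
import numpy as np

def pcm_loadings(R, k):
    """Principal component method (hypothetical helper): loadings from the
    k largest eigenpairs of the correlation matrix R, without rotation."""
    eigvals, eigvecs = np.linalg.eigh(R)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]       # indices of the k largest
    Q = eigvecs[:, order] * np.sqrt(eigvals[order])
    h2 = (Q ** 2).sum(axis=1)                   # communalities
    psi = 1.0 - h2                              # specific variances
    return Q, h2, psi

rng = np.random.default_rng(1)
# Random stand-in data, used only to obtain a valid correlation matrix.
X = rng.normal(size=(300, 6)) @ rng.normal(size=(6, 6))
R = np.corrcoef(X, rowvar=False)

Q, h2, psi = pcm_loadings(R, k=3)
print(np.round(Q, 4))
print(np.round(psi, 4))
```

With `k` equal to the number of variables, the loadings reproduce `R` exactly and all specific variances vanish; choosing fewer factors leaves the remainder in `psi`.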

Table 10.2 gives the MLM factor loadings without rotation and Table 10.3 gives the varimax version of this analysis. The corresponding graphical representations of the loadings are displayed in Figures 10.2 and 10.3. We can see that the varimax rotation does not substantially change the interpretation of the factors obtained by the MLM. Factor 1 can be roughly interpreted as a ``quality of life factor'' because it is positively correlated with variables like $X_{11}$ and negatively correlated with $X_8$. Of these two, only $X_8$ has a low specific variance (0.1184); that of $X_{11}$ is 0.6865, so the three factors explain $X_{11}$ much less well. The second factor may be interpreted as a ``residential factor'', since it is highly correlated with variables $X_6$ and $X_{13}$. The most striking difference between the results with and without varimax rotation can be seen by comparing the lower left corners of Figures 10.2 and 10.3. There is a clear separation of the variables in the varimax version of the MLM. Given this arrangement of the variables in Figure 10.3, we can interpret factor 3 as an employment factor, since we observe high correlations with $X_8$ and $X_5$.
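The communalities are identical in Tables 10.2 and 10.3 because varimax is an orthogonal rotation, which redistributes the loadings across factors without changing their row-wise sums of squares. The following sketch illustrates this with the unrotated loadings of Table 10.2. The function `varimax` is a hypothetical helper implementing the common SVD-based iteration for the raw varimax criterion (no Kaiser normalization), which is an assumption: the source does not say which variant was used, so the rotated values need not match Table 10.3 digit for digit.

```python
import numpy as np

# Unrotated MLM loadings copied from Table 10.2 (rows: X_1, X_2, X_3, X_5, ..., X_14).
Q = np.array([
    [ 0.9295, -0.1653,  0.1107],
    [-0.5823, -0.0379,  0.2902],
    [ 0.8192,  0.0296, -0.1378],
    [ 0.8789, -0.0987, -0.2719],
    [-0.4447, -0.5311, -0.0380],
    [ 0.7837,  0.0149, -0.3554],
    [-0.8294,  0.1570,  0.4110],
    [ 0.7955, -0.3062,  0.4053],
    [ 0.8262, -0.1401,  0.2906],
    [ 0.5051,  0.1850,  0.1553],
    [-0.4701, -0.0227, -0.1627],
    [ 0.7601,  0.5059, -0.0070],
    [-0.6942, -0.5904, -0.1798],
])

def varimax(Q, max_iter=500, tol=1e-10):
    """Raw varimax (hypothetical helper): returns the rotated loadings and
    the orthogonal rotation matrix G."""
    p, k = Q.shape
    G = np.eye(k)
    crit_old = 0.0
    for _ in range(max_iter):
        L = Q @ G
        # Gradient of the varimax criterion, mapped back to a rotation via SVD.
        U, s, Vt = np.linalg.svd(Q.T @ (L ** 3 - L * (L ** 2).mean(axis=0)))
        G = U @ Vt
        crit_new = s.sum()
        if crit_old != 0.0 and crit_new < crit_old * (1 + tol):
            break
        crit_old = crit_new
    return Q @ G, G

Q_rot, G = varimax(Q)
h2 = (Q ** 2).sum(axis=1)          # communalities before rotation
h2_rot = (Q_rot ** 2).sum(axis=1)  # communalities after rotation

print(np.round(h2, 4))
print(np.allclose(h2, h2_rot))
```

The signs and the ordering of the rotated columns are not identified, so a comparison with Table 10.3 should be made up to column permutations and sign flips.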


Table 10.4: Estimated factor loadings, communalities, and specific variances, PCM, varimax rotation. MVAfacthous.xpl
    Estimated factor loadings   Communalities   Specific variances
$j$ Variable   $\hat q_1$ $\hat q_2$ $\hat q_3$   $\hat{h}_j^2$   $\hat{\psi}_{jj}=1-\hat{h}_{j}^2$
1 crime 0.9164 0.0152 0.2357 0.8955 0.1045
2 large lots $-$0.6772 0.0762 0.4490 0.6661 0.3339
3 nonretail acres 0.8614 $-$0.1321 $-$0.1115 0.7719 0.2281
5 nitric oxides 0.9172 0.0573 $-$0.0874 0.8521 0.1479
6 rooms $-$0.3590 0.7896 0.1040 0.7632 0.2368
7 prior 1940 0.8392 $-$0.0008 $-$0.2163 0.7510 0.2490
8 empl. centers $-$0.8928 $-$0.1253 0.2064 0.8554 0.1446
9 accessibility 0.7562 0.0927 0.4616 0.7935 0.2065
10 tax-rate 0.7891 $-$0.0370 0.4430 0.8203 0.1797
11 pupil/teacher 0.4827 $-$0.3911 0.1719 0.4155 0.5845
12 blacks $-$0.4499 0.0368 $-$0.5612 0.5188 0.4812
13 lower status 0.6925 $-$0.5843 0.0035 0.8209 0.1791
14 value $-$0.5933 0.6720 $-$0.1895 0.8394 0.1606


Figure 10.4: Factor analysis for Boston housing data, PCM after varimax rotation. MVAfacthous.xpl
\includegraphics[width=0.75\defpicwidth]{MVAfacthouspcm.ps}


Table 10.5: Estimated factor loadings, communalities, and specific variances, PFM, varimax rotation. MVAfacthous.xpl
    Estimated factor loadings   Communalities   Specific variances
$j$ Variable   $\hat q_1$ $\hat q_2$ $\hat q_3$   $\hat{h}_j^2$   $\hat{\psi}_{jj}=1-\hat{h}_{j}^2$
1 crime 0.8579 $-$0.0270 $-$0.4175 0.9111 0.0889
2 large lots $-$0.2953 0.2168 0.5756 0.4655 0.5345
3 nonretail acres 0.5893 $-$0.2415 $-$0.5666 0.7266 0.2734
5 nitric oxides 0.6050 $-$0.0892 $-$0.6855 0.8439 0.1561
6 rooms $-$0.2902 0.6280 0.1296 0.4954 0.5046
7 prior 1940 0.4702 $-$0.1741 $-$0.6733 0.7049 0.2951
8 empl. centers $-$0.4988 0.0414 0.7876 0.8708 0.1292
9 accessibility 0.8830 0.1187 $-$0.1479 0.8156 0.1844
10 tax-rate 0.8969 $-$0.0136 $-$0.1666 0.8325 0.1675
11 pupil/teacher 0.4590 $-$0.2798 $-$0.1412 0.3090 0.6910
12 blacks $-$0.4812 0.0666 0.0856 0.2433 0.7567
13 lower status 0.5433 $-$0.6604 $-$0.3193 0.8333 0.1667
14 value $-$0.6012 0.7004 0.0956 0.8611 0.1389


Figure 10.5: Factor analysis for Boston housing data, PFM after varimax rotation. MVAfacthous.xpl
\includegraphics[width=0.75\defpicwidth]{MVAfacthouspfm.ps}

We now turn to the PCM and PFM analyses. The results are presented in Tables 10.4 and 10.5 and in Figures 10.4 and 10.5. We focus on the PCM, because this 3-factor model yields only one specific variance (unexplained variation) above 0.5. Figure 10.4 shows that factor 1 remains a ``quality of life factor'', which is clearly visible from the clustering of $X_5$, $X_3$, $X_{10}$ and $X_1$ on the right-hand side of the graph, while the variables $X_8$, $X_2$, $X_{14}$, $X_{12}$ and $X_6$ are on the left-hand side. Again, the second factor is a ``residential factor'', clearly demonstrated by the location of variables $X_6$, $X_{14}$, $X_{11}$, and $X_{13}$. The interpretation of the third factor is more difficult because all of its loadings except that of $X_{12}$ ($-0.5612$) are below 0.5 in absolute value.
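The reason for preferring the PCM can be checked directly from the tabulated specific variances. The short sketch below copies the last column of Tables 10.3, 10.4 and 10.5 and lists, for each method, the variables whose specific variance exceeds 0.5. The variable labels and the dictionary layout are choices made for this illustration only.

```python
# Specific variances copied from Tables 10.3 (MLM), 10.4 (PCM) and 10.5 (PFM).
labels = ["X1", "X2", "X3", "X5", "X6", "X7", "X8",
          "X9", "X10", "X11", "X12", "X13", "X14"]
psi = {
    "MLM": [0.0964, 0.5752, 0.3091, 0.1439, 0.5188, 0.2594, 0.1184,
            0.1092, 0.2133, 0.6865, 0.7520, 0.1663, 0.1371],
    "PCM": [0.1045, 0.3339, 0.2281, 0.1479, 0.2368, 0.2490, 0.1446,
            0.2065, 0.1797, 0.5845, 0.4812, 0.1791, 0.1606],
    "PFM": [0.0889, 0.5345, 0.2734, 0.1561, 0.5046, 0.2951, 0.1292,
            0.1844, 0.1675, 0.6910, 0.7567, 0.1667, 0.1389],
}

# Variables with more than half of their variance left unexplained.
above = {
    method: [lab for lab, v in zip(labels, values) if v > 0.5]
    for method, values in psi.items()
}

for method, labs in above.items():
    print(method, len(labs), labs)
# MLM 4 ['X2', 'X6', 'X11', 'X12']
# PCM 1 ['X11']
# PFM 4 ['X2', 'X6', 'X11', 'X12']
```

Under the PCM only $X_{11}$ exceeds the threshold, whereas the MLM and the PFM each leave four variables with a specific variance above 0.5.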