We have motivated the transformation of the variables of the Boston housing
data many times before.
We now illustrate cluster analysis on the transformed data,
excluding the Charles River indicator variable.
Among the various algorithms,
we present the results of the
Ward algorithm, since it
gave the most sensible results. To be
consistent with our previous analysis, we standardize each variable.
The dendrogram of the Ward method is displayed in Figure 11.7.
Two dominant clusters are visible. A further refinement into, say, four
clusters could be considered at a lower level of distance.
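
The steps just described (standardize each variable, apply Ward linkage, cut the tree at two and then four clusters) can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic stand-ins, not the Boston housing values, and the use of SciPy's `linkage` and `fcluster` is our choice rather than the software behind Figure 11.7.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Synthetic stand-in for the transformed data (NOT the Boston housing values):
# two groups of observations measured on 13 variables.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(60, 13)),
    rng.normal(loc=3.0, scale=1.0, size=(40, 13)),
])

# Standardize each variable to mean 0 and unit variance.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Ward linkage on Euclidean distances between the standardized observations.
tree = linkage(Z, method="ward")

# Cut the tree into two clusters, then into four for a finer partition.
labels2 = fcluster(tree, t=2, criterion="maxclust")
labels4 = fcluster(tree, t=4, criterion="maxclust")

# Cluster sizes for the two-cluster cut (labels start at 1).
print(np.bincount(labels2)[1:])
```

Because both cuts come from the same tree, every cluster of the four-cluster solution lies entirely inside one of the two dominant clusters. Passing `tree` to `scipy.cluster.hierarchy.dendrogram` would draw the corresponding dendrogram.
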
To interpret the two clusters, we present
the mean values of the thirteen variables and their
respective standard errors
by group in Table 11.3.
Comparing the mean values for both groups shows that all the
differences in the means are individually significant and that
cluster one corresponds to housing districts with better living quality
and higher house prices, whereas cluster two corresponds to less favored districts
in Boston. This is confirmed, for instance, by a lower crime rate,
a higher proportion of residential land, and a lower proportion of blacks
in cluster one.
Cluster two is identified by a higher proportion of older houses, a higher
pupil/teacher ratio and a higher percentage of the lower status population.
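
A group comparison of this kind can be sketched as below: per-cluster means, their standard errors, and a two-sample test for each variable. The data and cluster labels are synthetic and hypothetical, and the text does not state which test underlies the significance statement; Welch's t-test is used here purely as an assumption.

```python
import numpy as np
from scipy import stats

# Hypothetical cluster labels and synthetic data (three variables for brevity).
rng = np.random.default_rng(1)
labels = np.repeat([1, 2], [60, 40])
X = rng.normal(size=(100, 3)) + (labels == 2)[:, None] * 2.0

# Mean and standard error of the mean for each variable, by cluster.
summary = {}
for g in (1, 2):
    Xg = X[labels == g]
    summary[g] = {
        "mean": Xg.mean(axis=0),
        "se": Xg.std(axis=0, ddof=1) / np.sqrt(len(Xg)),
    }
    print(g, np.round(summary[g]["mean"], 2), np.round(summary[g]["se"], 2))

# Welch's two-sample t-test, applied to each variable separately
# (an assumed choice of test, not taken from the text).
t_stat, p = stats.ttest_ind(X[labels == 1], X[labels == 2], equal_var=False)
print(np.round(p, 4))
```

Each entry of `p` refers to one variable, which matches the idea of differences that are individually significant rather than jointly tested.
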
This interpretation is supported by visual inspection of all the variables
presented in the scatterplot matrices in Figures 11.8 and 11.9. For example,
the lower right boxplot of Figure 11.9 and the correspondingly colored clusters
in the last row confirm the role of each variable in determining the clusters.
This interpretation agrees with the previous PC analysis (Figure 9.11).
The quality of life factor is clearly visible in Figure 11.10, where cluster
membership is distinguished by the shape and color of the points plotted
against the first two principal components. The first PC completely separates
the two clusters and corresponds, as discussed in Chapter 9, to a quality of
life and house indicator.
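
To make the last point concrete, the following sketch projects standardized data onto the first two principal components and checks whether the first PC alone separates the two Ward clusters. It again uses synthetic data as a stand-in, so it illustrates the check itself and does not reproduce Figure 11.10; the variable names are ours.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Synthetic stand-in data (NOT the Boston housing values), standardized.
rng = np.random.default_rng(2)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(60, 13)),
    rng.normal(3.0, 1.0, size=(40, 13)),
])
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Two clusters from the Ward algorithm.
labels = fcluster(linkage(Z, method="ward"), t=2, criterion="maxclust")

# Principal components of the standardized data via the SVD of Z.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
scores = Z @ Vt[:2].T                 # coordinates on the first two PCs
explained = s**2 / np.sum(s**2)       # share of variance per component

# The first PC separates the clusters if their score ranges do not overlap.
pc1_a = scores[labels == 1, 0]
pc1_b = scores[labels == 2, 0]
separated = bool(pc1_a.max() < pc1_b.min() or pc1_b.max() < pc1_a.min())
print(separated, np.round(explained[:2], 2))
```

Plotting `scores[:, 0]` against `scores[:, 1]` with one marker per cluster gives a display of the same type as the one described above.
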