18.4 Boston Housing

Returning to the Boston housing data set, we compare the results of exploratory projection pursuit on the original data ${\data{X}}$ and the transformed data $\widehat{\data{X}}$ motivated in Section 1.8. We exclude $X_4$ (the indicator of the Charles River) from the present analysis.

The aim of this analysis is to see, from a different angle, whether our proposed transformations yield distributions that are closer to normal and data with fewer outliers. Both effects will be visible in our projection pursuit analysis.

We first apply the Jones and Sibson index to the non-transformed data with 50 randomly chosen 13-dimensional directions. Figure 18.8 displays the results in the following form.

Figure 18.8: Projection pursuit with the Jones and Sibson index for the 13 original variables. MVAppsib.xpl
\includegraphics[width=1\defpicwidth]{MVAppsibbh2.ps}

In the lower part, we see the values of the Jones and Sibson index for the 50 directions. For 13-dimensional normal data the index should be constant across directions, and we observe that this is clearly not the case. In the upper part of Figure 18.8 we show the standard normal density as a green curve together with two densities corresponding to the two extreme index values. The red, slim curve corresponds to the maximal value of the index among the 50 projections. The blue curve, which is close to the normal, corresponds to the minimal value of the Jones and Sibson index. The corresponding index values have the same colors in the lower part of Figure 18.8. Below the densities, a jitter plot shows the distribution of the projected points $\alpha^\top x_i$ ($i=1,\dots,506$). We conclude from the outlying projected points in the red distribution that several observations are in conflict with the normality assumption.
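The design described above can be sketched in Python. This is only an illustration, not the MVAppsib.xpl quantlet: it assumes the moment-based form of the Jones and Sibson index, $\{\kappa_3^2 + \kappa_4^2/4\}/12$, computed on sphered data, and it uses a synthetic $506 \times 13$ matrix as a stand-in for the Boston housing data. The helper names `sphere`, `jones_sibson` and `random_direction_indices` are hypothetical.

```python
import numpy as np

def sphere(X):
    """Center X and rescale it to identity covariance (Mahalanobis sphering)."""
    Xc = X - X.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    return Xc @ vecs @ np.diag(vals ** -0.5) @ vecs.T

def jones_sibson(z):
    """Moment-based index (k3^2 + k4^2 / 4) / 12 of a one-dimensional projection."""
    z = (z - z.mean()) / z.std()
    k3 = np.mean(z ** 3)          # third cumulant of the standardized projection
    k4 = np.mean(z ** 4) - 3.0    # fourth cumulant of the standardized projection
    return (k3 ** 2 + k4 ** 2 / 4.0) / 12.0

def random_direction_indices(X, n_dir=50, seed=0):
    """Index values and projections for n_dir random unit directions."""
    rng = np.random.default_rng(seed)
    Z = sphere(X)
    A = rng.standard_normal((n_dir, X.shape[1]))
    A /= np.linalg.norm(A, axis=1, keepdims=True)   # rows are unit vectors
    proj = Z @ A.T                                  # shape (n, n_dir)
    idx = np.array([jones_sibson(proj[:, j]) for j in range(n_dir)])
    return idx, proj

# Synthetic stand-in for the data (assumption: not the Boston housing values).
rng = np.random.default_rng(1)
X = rng.standard_normal((506, 13))
X[:, 0] = np.exp(X[:, 0])        # one strongly skewed column

idx, proj = random_direction_indices(X)
j_max, j_min = idx.argmax(), idx.argmin()
print("max index:", idx[j_max], "min index:", idx[j_min])
```

The columns `proj[:, j_max]` and `proj[:, j_min]` play the role of the red and blue projections: a density estimate and a jitter plot of each could then be compared with the standard normal density.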

Figure 18.9: Projection pursuit with the Jones and Sibson index for the 13 transformed variables. MVAppsib.xpl
\includegraphics[width=1\defpicwidth]{MVAppsibbh1.ps}

Figure 18.9 presents an analysis with the same design for the transformed data. In the lower part of the figure, the values of the Jones and Sibson index are much lower (by a factor of about 10) and less variable, which suggests that the transformed data are closer to the normal. (``Closeness'' is interpreted here in the sense of the Jones and Sibson index.) This is confirmed by the upper part of Figure 18.9, which shows far fewer outlying points than Figure 18.8.
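The effect of a normalizing transformation on the index can be illustrated with a small synthetic experiment. The sketch below assumes the same moment-based index as above and uses made-up data: a lognormal matrix stands in for the original variables and its logarithm for the transformed ones. It does not reproduce the transformations of Section 1.8 or the values in the figures. The function name `js_indices` is hypothetical.

```python
import numpy as np

def js_indices(X, A):
    """Jones and Sibson moment index of the sphered data X along each row of A."""
    Xc = X - X.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    Z = Xc @ vecs @ np.diag(vals ** -0.5) @ vecs.T   # sphered data
    P = Z @ A.T                                      # one column per direction
    P = (P - P.mean(axis=0)) / P.std(axis=0)
    k3 = (P ** 3).mean(axis=0)
    k4 = (P ** 4).mean(axis=0) - 3.0
    return (k3 ** 2 + k4 ** 2 / 4.0) / 12.0

rng = np.random.default_rng(0)
X_trans = rng.standard_normal((506, 13))   # stand-in for the transformed data
X_raw = np.exp(X_trans)                    # skewed stand-in for the original data

# The same 50 random unit directions are used for both data sets.
A = rng.standard_normal((50, 13))
A /= np.linalg.norm(A, axis=1, keepdims=True)

idx_raw = js_indices(X_raw, A)
idx_trans = js_indices(X_trans, A)
print("mean index, raw vs. transformed:", idx_raw.mean(), idx_trans.mean())
print("std of index, raw vs. transformed:", idx_raw.std(), idx_trans.std())
```

Using identical directions for both data sets keeps the comparison fair: any difference in the level or spread of the index is then due to the data, not to the random choice of projections.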