7.3 Boston Housing
Returning to the Boston housing data set, we are now in a position
to test whether the means of the variables vary according to location,
for example, whether a district has high-valued houses.
In Chapter 1, we built two groups of observations according
to whether the value of $X_{14}$ is
less than or equal to the median of $X_{14}$
(a group of 256 districts) or greater than the median (a group of 250
districts).
In what follows, we use the transformed variables motivated in
Section 1.8.
Testing the equality of the means from the two groups was proposed in
a multivariate setup, so we restrict the analysis to a subset of the
transformed variables
to see if the differences between the two groups
that were identified in Chapter 1 can be confirmed by a formal test.
As in Test Problem 8, the hypothesis to be tested is
$$H_0: \mu_1 = \mu_2,$$
where the common covariance matrix $\Sigma$ is not known. The
$F$-statistic given in (7.13) is equal to
126.30, which is much higher than the corresponding critical value of the
$F$-distribution.
Therefore, we reject the hypothesis of equal means.
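As an illustration of how a statistic of this type is computed, the following Python sketch evaluates the two-sample $F$-statistic in its usual pooled-covariance (Hotelling) form, which is assumed here to be equivalent to (7.13). The function name `two_sample_f`, the synthetic data, and the choice of three variables are illustrative assumptions; only the group sizes 256 and 250 are taken from the text.

```python
import numpy as np

def two_sample_f(x1, x2):
    """F-statistic for H0: mu1 = mu2 with a common, unknown covariance.

    Returns (F, df1, df2); under H0 the statistic follows F(df1, df2).
    """
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    n1, p = x1.shape
    n2 = x2.shape[0]
    d = x1.mean(axis=0) - x2.mean(axis=0)
    # Pooled covariance estimate with divisor n1 + n2 - 2.
    s_pooled = np.atleast_2d(
        ((n1 - 1) * np.cov(x1, rowvar=False)
         + (n2 - 1) * np.cov(x2, rowvar=False)) / (n1 + n2 - 2)
    )
    # Hotelling's T^2 and its rescaling to an F-statistic.
    t2 = n1 * n2 / (n1 + n2) * d @ np.linalg.solve(s_pooled, d)
    df1, df2 = p, n1 + n2 - p - 1
    f_stat = df2 / (p * (n1 + n2 - 2)) * t2
    return f_stat, df1, df2

# Synthetic stand-in for the two groups of districts (not the Boston data).
rng = np.random.default_rng(0)
x1 = rng.normal(size=(256, 3))
x2 = rng.normal(size=(250, 3)) + 0.5   # second group has shifted means
f_stat, df1, df2 = two_sample_f(x1, x2)
print(f_stat, df1, df2)
```

The resulting value would be compared with the upper quantile of the $F$-distribution with `df1` and `df2` degrees of freedom.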
To see which components are responsible for
this rejection, take a look at the simultaneous confidence
intervals defined in (7.14).
These confidence intervals confirm that all of the differences
$\delta_j$, the components of $\delta = \mu_1 - \mu_2$,
are significantly different from
zero (note there is a negative effect for the transformed variable
$\widetilde{X}_8$:
weighted distances to employment centers); see
MVAsimcibh.xpl.
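The componentwise simultaneous intervals can be sketched in Python as follows. This is a minimal sketch, assuming the pooled-covariance form of the intervals, which should agree with (7.14) up to the convention used for the covariance estimate. The function name `simultaneous_ci` and the data are hypothetical, and the $F$-quantile is passed in by the caller rather than computed.

```python
import numpy as np

def simultaneous_ci(x1, x2, f_crit):
    """Simultaneous confidence intervals for each delta_j = mu1_j - mu2_j.

    f_crit is the upper quantile of F(p, n1 + n2 - p - 1) at the chosen
    level, supplied by the caller (for example from a table).
    """
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    n1, p = x1.shape
    n2 = x2.shape[0]
    n = n1 + n2
    d = x1.mean(axis=0) - x2.mean(axis=0)
    s_pooled = np.atleast_2d(
        ((n1 - 1) * np.cov(x1, rowvar=False)
         + (n2 - 1) * np.cov(x2, rowvar=False)) / (n - 2)
    )
    # Critical value of Hotelling's T^2 times the variance factor of d.
    scale = p * (n - 2) / (n - p - 1) * f_crit * (1 / n1 + 1 / n2)
    half = np.sqrt(scale * np.diag(s_pooled))
    return np.column_stack([d - half, d + half])

# Synthetic example with p = 2 and n1 + n2 - p - 1 = 60 degrees of freedom.
rng = np.random.default_rng(1)
x1 = rng.normal(size=(30, 2)) + np.array([2.0, 0.0])  # shift in component 1
x2 = rng.normal(size=(33, 2))
ci = simultaneous_ci(x1, x2, f_crit=3.15)  # F_{0.95;2,60} from standard tables
print(ci)
```

A component is flagged as responsible for the rejection when its interval does not contain zero.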
We could also check if the factor ``being bounded by the river''
(variable $\widetilde{X}_4$) has some effect on the other variables. To do this,
compare the means of a set of the other variables across two groups:
$n_1 = 35$ districts bounded by the river and
$n_2 = 471$ districts not bounded by the river. Test Problem 8
($H_0: \mu_1 = \mu_2$) is applied again.
The resulting test statistic is highly significant.
At a confidence level of 0.95, the simultaneous confidence intervals
indicate that only $\widetilde{X}_{14}$ (the
value of the houses) is responsible for the hypothesis being rejected!
In Chapter 3 a linear model was proposed that explained the
variations of the price $\widetilde{X}_{14}$
by the variations of the other variables.
Using the same procedure that was shown in Test Problem 7, we
are in a position to test a set of linear restrictions on the vector of
regression coefficients $\beta$.
The model we estimated in Section 3.7 provides the following results
(MVAlinregbh.xpl):
Variable | $\hat\beta_j$ | $SE(\hat\beta_j)$ | $t$ | p-value
constant | 4.1769 | 0.3790 | 11.020 | 0.0000
$\widetilde{X}_1$ | 0.0146 | 0.0117 | 1.254 | 0.2105
$\widetilde{X}_2$ | 0.0014 | 0.0056 | 0.247 | 0.8051
$\widetilde{X}_3$ | 0.0127 | 0.0223 | 0.570 | 0.5692
$\widetilde{X}_4$ | 0.1100 | 0.0366 | 3.002 | 0.0028
$\widetilde{X}_5$ | 0.2831 | 0.1053 | 2.688 | 0.0074
$\widetilde{X}_6$ | 0.4211 | 0.1102 | 3.822 | 0.0001
$\widetilde{X}_7$ | 0.0064 | 0.0049 | 1.317 | 0.1885
$\widetilde{X}_8$ | 0.1832 | 0.0368 | 4.977 | 0.0000
$\widetilde{X}_9$ | 0.0684 | 0.0225 | 3.042 | 0.0025
$\widetilde{X}_{10}$ | 0.2018 | 0.0484 | 4.167 | 0.0000
$\widetilde{X}_{11}$ | 0.0400 | 0.0081 | 4.946 | 0.0000
$\widetilde{X}_{12}$ | 0.0445 | 0.0115 | 3.882 | 0.0001
$\widetilde{X}_{13}$ | 0.2626 | 0.0161 | 16.320 | 0.0000
Recall that the estimated residuals
did not
show a big departure from normality, which means that the testing procedure
developed above can be used.
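The testing procedure for linear restrictions $\mathcal{A}\beta = a$ can be sketched in Python as follows. This is a minimal sketch on synthetic data, assuming the usual $F$-form of the test with $q$ restrictions and $n - k$ residual degrees of freedom, where $k$ counts the columns of the design matrix including the constant. The function name `linear_restriction_f` and all data are hypothetical and are not taken from the XploRe quantlets cited in the text.

```python
import numpy as np

def linear_restriction_f(X, y, A, a):
    """F-test of H0: A beta = a in the linear model y = X beta + eps.

    Returns (F, q, n - k, beta_hat); under H0, F follows F(q, n - k).
    """
    X, y = np.asarray(X, float), np.asarray(y, float)
    A = np.atleast_2d(np.asarray(A, float))
    a = np.atleast_1d(np.asarray(a, float))
    n, k = X.shape
    q = A.shape[0]
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y                 # unrestricted OLS estimate
    resid = y - X @ beta
    rss = resid @ resid
    gap = A @ beta - a                       # distance from the restriction
    num = gap @ np.linalg.solve(A @ xtx_inv @ A.T, gap) / q
    return num / (rss / (n - k)), q, n - k, beta

# Synthetic regression with a constant and three regressors.
rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 0.5, 0.0, 0.0]) + rng.normal(scale=0.5, size=n)

# Global test: all slopes are zero, A = (0, I_3) and a = 0.
A_global = np.hstack([np.zeros((3, 1)), np.eye(3)])
f_global, q, df2, beta = linear_restriction_f(X, y, A_global, np.zeros(3))

# Single restriction beta_1 = 0: the F-statistic equals the squared t-value.
A_single = np.array([[0.0, 1.0, 0.0, 0.0]])
f_single, _, _, _ = linear_restriction_f(X, y, A_single, [0.0])
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
t_1 = beta[1] / np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
print(f_global, f_single, t_1 ** 2)
```

The last lines illustrate why a test of a single coefficient in this setup reproduces the $p$-value of the individual $t$-test.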
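- First, a global test of significance for the regression
coefficients is performed,
$$H_0: (\beta_1, \ldots, \beta_{13}) = 0.$$
This is obtained by defining
$\mathcal{A} = (0_{13}, \mathcal{I}_{13})$
and
$a = 0_{13}$
so that
$H_0$
is equivalent to
$\mathcal{A}\beta = a$,
where
$\beta = (\beta_0, \beta_1, \ldots, \beta_{13})^\top$.
Based on the observed values, the test statistic is highly significant
when compared with the $F_{13,492}$ distribution, thus
we reject
$H_0$. Note that under
$H_0$ the restricted estimate is
$\hat\beta_{H_0} = (\bar{y}, 0, \ldots, 0)^\top$,
where
$\bar{y}$ is the sample mean of the response.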
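- Since we are interested in the effect that being located close to the
river has on the value of the houses, the second test is
$H_0: \beta_4 = 0$. This is done by fixing
$\mathcal{A} = (0,0,0,0,1,0,0,0,0,0,0,0,0,0)$
and
$a = 0$
to obtain the equivalent hypothesis
$H_0: \mathcal{A}\beta = a$.
The result
is again significant:
$F \approx 9.01$, the square of the $t$-value 3.002 in the table above
(compare $F_{0.95;1,492} \approx 3.86$), with a
$p$-value
of 0.0028. Note that this is the same
$p$-value obtained in the individual
test of
$\beta_4 = 0$
in Chapter 3, computed using a different setup.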
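- A third test addresses the fact that some of the regressors in the full
model (3.57) appear to be insignificant (that is, they have
high individual
$p$-values). A joint test can show
whether the corresponding reduced model, formulated by deleting the
insignificant variables, is rejected by the data. We want to test
$H_0: \beta_1 = \beta_2 = \beta_3 = \beta_7 = 0$. Hence,
$$\mathcal{A} = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}$$
and
$a = (0, 0, 0, 0)^\top$. The test statistic is 0.9344, which is not significant
for
$F_{4,492}$. Given that the
$p$-value is equal to 0.44,
we cannot reject the null hypothesis nor the corresponding reduced model.
Under the null hypothesis, $\hat\beta_{H_0}$ has zeros in the positions of
$\beta_1$, $\beta_2$, $\beta_3$ and $\beta_7$ and coincides elsewhere with the
OLS estimates of the reduced model.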
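A possible reduced model is
$$\widetilde{X}_{14} = \beta_0 + \beta_4 \widetilde{X}_4 + \beta_5 \widetilde{X}_5 + \beta_6 \widetilde{X}_6 + \beta_8 \widetilde{X}_8 + \cdots + \beta_{13} \widetilde{X}_{13} + \varepsilon.$$
Estimating this reduced model using OLS, as was done in Chapter 3,
provides the results shown in Table 7.1.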
Table 7.1: Linear regression for the Boston housing data set (MVAlinreg2bh.xpl).
Variable | $\hat\beta_j$ | $SE(\hat\beta_j)$ | $t$ | p-value
const | 4.1582 | 0.3628 | 11.462 | 0.0000
$\widetilde{X}_4$ | 0.1087 | 0.0362 | 2.999 | 0.0028
$\widetilde{X}_5$ | 0.3055 | 0.0973 | 3.140 | 0.0018
$\widetilde{X}_6$ | 0.4668 | 0.1059 | 4.407 | 0.0000
$\widetilde{X}_8$ | 0.1855 | 0.0327 | 5.679 | 0.0000
$\widetilde{X}_9$ | 0.0492 | 0.0183 | 2.690 | 0.0074
$\widetilde{X}_{10}$ | 0.2096 | 0.0446 | 4.705 | 0.0000
$\widetilde{X}_{11}$ | 0.0410 | 0.0078 | 5.280 | 0.0000
$\widetilde{X}_{12}$ | 0.0481 | 0.0112 | 4.306 | 0.0000
$\widetilde{X}_{13}$ | 0.2588 | 0.0149 | 17.396 | 0.0000
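Note that the reduced model has a coefficient of determination $r^2$
which is very close to the $r^2$
obtained from the full model. Clearly, including variables
$\widetilde{X}_1$, $\widetilde{X}_2$, $\widetilde{X}_3$, and
$\widetilde{X}_7$ does not provide valuable information in
explaining the variation of
$\widetilde{X}_{14}$, the price of the houses.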
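The comparison of a full and a reduced model can also be sketched through residual sums of squares. The following Python example is a minimal sketch on synthetic data, not the Boston housing data; the function names `ols_rss` and `nested_f` are hypothetical. It assumes the reduced model is obtained by deleting columns of the full design matrix, in which case the $F$-statistic for the deleted coefficients can be written with the two residual sums of squares.

```python
import numpy as np

def ols_rss(X, y):
    """Residual sum of squares of the OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def nested_f(X_full, X_reduced, y):
    """F-statistic for the deleted columns, plus r^2 of both models."""
    n, k = X_full.shape
    q = k - X_reduced.shape[1]               # number of deleted regressors
    rss_f, rss_r = ols_rss(X_full, y), ols_rss(X_reduced, y)
    f_stat = ((rss_r - rss_f) / q) / (rss_f / (n - k))
    tss = float(((y - y.mean()) ** 2).sum())
    return f_stat, q, n - k, 1 - rss_f / tss, 1 - rss_r / tss

# Synthetic data: the last two regressors carry no information about y.
rng = np.random.default_rng(3)
n = 300
Z = rng.normal(size=(n, 5))
X_full = np.column_stack([np.ones(n), Z])
y = (1.0 + Z[:, 0] - 0.5 * Z[:, 1] + 0.8 * Z[:, 2]
     + rng.normal(scale=0.5, size=n))
X_reduced = X_full[:, :4]                    # constant and first three regressors

f_stat, q, df2, r2_full, r2_reduced = nested_f(X_full, X_reduced, y)
print(f_stat, q, df2, r2_full, r2_reduced)
```

When the deleted regressors are uninformative, the two $r^2$ values are close and the $F$-statistic is small relative to the $F(q, n-k)$ quantile, which mirrors the conclusion drawn for the reduced model above.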