The main statistics presented so far can be computed for the data matrix
from our Boston Housing data set.
The sample means and the sample medians of each variable are
displayed in Table 3.3.
The table also provides the unbiased estimates of the variance of each
variable and the corresponding standard deviations.
The comparison of the means and the medians confirms the asymmetry of
the components of the data matrix
that was pointed out in Section 1.8.
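As a minimal sketch of how the entries of such a table can be obtained, the following Python snippet computes the mean, the median, the unbiased variance and the standard deviation of a single column. The helper name `describe` and the toy column are assumptions made for illustration only; the numbers are not taken from the Boston Housing data.

```python
import statistics

def describe(column):
    """Return mean, median, unbiased variance and standard deviation."""
    return {
        "mean": statistics.mean(column),
        "median": statistics.median(column),
        "variance": statistics.variance(column),  # divides by n - 1
        "std": statistics.stdev(column),          # square root of the above
    }

# Made-up, right-skewed toy column (NOT Boston Housing values).
toy = [1.0, 2.0, 2.0, 3.0, 12.0]
summary = describe(toy)
print(summary)
```

For a right-skewed column like this one the mean lies above the median, which is the kind of asymmetry the comparison in the table is meant to reveal.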
The (unbiased) sample covariance matrix is given by
the following matrix:
Analyzing the correlation matrix $\mathcal{R}$ confirms most of the comments made from examining
the scatterplot matrix in Chapter 1.
In particular, the correlations between $X_{14}$
(the value of the house) and all the other variables are given by the last row
(or column) of $\mathcal{R}$.
The highest correlations (in absolute values) are in
decreasing order
etc.
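The ranking step described above can be sketched in a few lines of Python: build the correlation matrix from the unbiased covariances, read off its last row, and sort the other variables by absolute correlation. The function names and the three toy columns are assumptions for illustration, with the last column playing the role of the response; they are not the Boston Housing variables.

```python
import math

def mean(v):
    return sum(v) / len(v)

def covariance(x, y):
    """Unbiased sample covariance (divides by n - 1)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def correlation(x, y):
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

def correlation_matrix(columns):
    return [[correlation(a, b) for b in columns] for a in columns]

# Made-up toy columns; the last one plays the role of the response.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
y = [10, 8, 6, 4, 2]

R = correlation_matrix([x1, x2, y])
last_row = R[-1][:-1]  # correlations of the response with the other variables
ranking = sorted(range(len(last_row)),
                 key=lambda j: abs(last_row[j]), reverse=True)
print(last_row, ranking)
```

Sorting on the absolute value matters because a strong negative correlation is as informative as a strong positive one.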
Applying Fisher's Z-transform to each of the
correlations between $X_{14}$ and
the other variables would confirm that all are significantly different from zero, except
the correlation between
$X_{14}$ and $X_4$
(the indicator variable for the Charles River). We know, however,
that the correlation and Fisher's Z-transform are
not appropriate for binary variables.
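A minimal sketch of such a test is given below, assuming the usual large-sample approximation in which $W = \tanh^{-1}(r)$ is roughly normal with variance $1/(n-3)$, so that $\sqrt{n-3}\,W$ is approximately standard normal when the true correlation is zero. The function name `fisher_z_test` and the input values are hypothetical and are not the correlations of the Boston Housing data.

```python
import math
from statistics import NormalDist

def fisher_z_test(r, n):
    """Two-sided test of H0: rho = 0 based on Fisher's Z-transform.

    Returns the standardized statistic and its approximate p-value.
    """
    w = math.atanh(r)            # W = 0.5 * log((1 + r) / (1 - r))
    z = w * math.sqrt(n - 3)     # approximately N(0, 1) under H0
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical sample correlations with n = 103 observations.
z_strong, p_strong = fisher_z_test(0.5, 103)
z_null, p_null = fisher_z_test(0.0, 103)
print(z_strong, p_strong)
print(z_null, p_null)
```

The approximation relies on both variables being roughly jointly normal, which is why it is questionable for a binary indicator.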
The same descriptive statistics can be calculated for
the transformed variables
(transformations were motivated in Section 1.8).
The results are given in Table 3.4.
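If we want to explain the variations of the price $\widetilde{X}_{14}$
by the variation
of all the other transformed variables $\widetilde{X}_1, \dots, \widetilde{X}_{13}$,
we could estimate
the linear model
$$\widetilde{X}_{14} = \beta_0 + \sum_{j=1}^{13} \beta_j \widetilde{X}_j + \varepsilon. \qquad (3.57)$$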
The values of $r^2$ (0.765) and of the adjusted $r^2_{\mathrm{adj}}$ (0.759)
show that most of the variance of the price
is explained by the linear model (3.57).
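The two reported values can be checked against each other with the standard adjustment formula $r^2_{\mathrm{adj}} = 1 - (1 - r^2)(n-1)/(n-p-1)$. The sketch below assumes $n = 506$ observations (the size of the Boston Housing data set) and $p = 13$ regressors plus an intercept; the function name `adjusted_r2` is chosen here for illustration.

```python
def adjusted_r2(r2, n, p):
    """Adjusted coefficient of determination.

    r2: ordinary coefficient of determination
    n:  number of observations
    p:  number of regressors, not counting the intercept
    """
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Assumed sizes: n = 506 observations, p = 13 explanatory variables.
adj = adjusted_r2(0.765, n=506, p=13)
print(round(adj, 3))  # 0.759
```

With 13 regressors and more than 500 observations the penalty is small, so the adjusted value stays close to the unadjusted one.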
Again we see that the variations of the price
are mostly explained by (in decreasing
order of the absolute value of the
$t$-statistic)
and
. The other variables
and
seem to have little
influence on the variations of
the price. This will be confirmed by the testing
procedures that will be developed in Chapter 7.
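To illustrate how regressors can be ordered by the absolute value of their $t$-statistics, here is a small self-contained sketch of ordinary least squares in plain Python. It is not the procedure used to produce the results above: the helper names `solve` and `ols_t_statistics` and the toy data are assumptions, constructed so that the response depends on the first regressor only.

```python
import math

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols_t_statistics(X, y):
    """Least squares with intercept; returns (beta, t), intercept first."""
    Z = [[1.0] + list(row) for row in X]
    n, k = len(Z), len(Z[0])
    ZtZ = [[sum(Z[i][a] * Z[i][b] for i in range(n)) for b in range(k)]
           for a in range(k)]
    Zty = [sum(Z[i][a] * y[i] for i in range(n)) for a in range(k)]
    beta = solve(ZtZ, Zty)
    resid = [y[i] - sum(Z[i][a] * beta[a] for a in range(k)) for i in range(n)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # unbiased error variance
    t = []
    for a in range(k):
        unit = [1.0 if j == a else 0.0 for j in range(k)]
        inv_col = solve(ZtZ, unit)  # a-th column of the inverse of Z'Z
        t.append(beta[a] / math.sqrt(sigma2 * inv_col[a]))
    return beta, t

# Made-up toy data: y = 2 * x1 + small noise, x2 is irrelevant.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1, -1, -1, 1, 1, -1, -1, 1]
noise = [0.1, -0.1, 0.1, -0.1, -0.1, 0.1, -0.1, 0.1]
y = [2 * a + e for a, e in zip(x1, noise)]

beta, t = ols_t_statistics(list(zip(x1, x2)), y)
order = sorted(range(1, len(t)), key=lambda j: abs(t[j]), reverse=True)
print(beta, t, order)
```

In this toy example the first regressor has by far the largest absolute $t$-statistic, while the second one is close to zero, mirroring the split between influential and negligible variables.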