Let us first summarize some properties which were already derived in the previous chapter. Often it is interesting to partition $X$ into sub-vectors $X_1$ and $X_2$. The following theorem tells us how to correct $X_2$ to obtain a vector which is independent of $X_1$.
THEOREM 5.1
Let $X = {X_1 \choose X_2} \sim N_p(\mu, \Sigma)$, $X_1 \in \mathbb{R}^r$, $X_2 \in \mathbb{R}^{p-r}$. Define $X_{2.1} = X_2 - \Sigma_{21}\Sigma_{11}^{-1}X_1$ from the partitioned covariance matrix
\[
\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}.
\]
Then
\[
X_1 \sim N_r(\mu_1, \Sigma_{11}), \tag{5.5}
\]
\[
X_{2.1} \sim N_{p-r}(\mu_{2.1}, \Sigma_{22.1}) \tag{5.6}
\]
are independent with
\[
\mu_{2.1} = \mu_2 - \Sigma_{21}\Sigma_{11}^{-1}\mu_1, \qquad \Sigma_{22.1} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}. \tag{5.7}
\]
PROOF:
$X_1 = \mathcal{A}X$ with $\mathcal{A} = (\mathcal{I}_r \quad 0)$ and $X_{2.1} = \mathcal{B}X$ with $\mathcal{B} = (-\Sigma_{21}\Sigma_{11}^{-1} \quad \mathcal{I}_{p-r})$. Then, by (5.2), $X_1$ and $X_{2.1}$ are both normal. Note that
\[
\mathop{\mathrm{Cov}}(X_1, X_{2.1}) = \mathcal{A}\Sigma\mathcal{B}^{\top}.
\]
Recall that $\Sigma_{12} = \Sigma_{21}^{\top}$. Hence
\[
\Sigma\mathcal{B}^{\top} =
\begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}
\begin{pmatrix} -\Sigma_{11}^{-1}\Sigma_{12} \\ \mathcal{I}_{p-r} \end{pmatrix}
= \begin{pmatrix} 0 \\ \Sigma_{22.1} \end{pmatrix},
\]
so that $\mathcal{A}\Sigma\mathcal{B}^{\top} = 0$. Using (5.2) again we also have the joint distribution of $(X_1, X_{2.1})$, namely
\[
{X_1 \choose X_{2.1}} = {\mathcal{A} \choose \mathcal{B}} X \sim N_p \left( {\mu_1 \choose \mu_{2.1}}, \begin{pmatrix} \Sigma_{11} & 0 \\ 0 & \Sigma_{22.1} \end{pmatrix} \right).
\]
With this block diagonal structure of the covariance matrix, the joint pdf of $(X_1, X_{2.1})$ can easily be factorized into
\[
f(x_1, x_{2.1}) = f_{X_1}(x_1)\, f_{X_{2.1}}(x_{2.1}),
\]
from which the independence between $X_1$ and $X_{2.1}$ follows.
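Theorem 5.1 is easy to check by simulation. The following is a minimal sketch in Python/NumPy (the particular $\mu$, $\Sigma$ and sample size are our own illustrative choices, not from the text): it draws from a trivariate normal, forms $X_{2.1} = X_2 - \Sigma_{21}\Sigma_{11}^{-1}X_1$, and confirms that the empirical cross-covariance with $X_1$ vanishes while the covariance of $X_{2.1}$ matches $\Sigma_{22.1}$.

```python
import numpy as np

# Minimal numerical check of Theorem 5.1; mu and Sigma are
# illustrative values chosen by us, not taken from the text.
rng = np.random.default_rng(0)
mu = np.array([1.0, -1.0, 0.5])
Sigma = np.array([[2.0, 0.8, 0.3],
                  [0.8, 1.5, 0.6],
                  [0.3, 0.6, 1.0]])
r = 1                                        # dim(X1) = r, dim(X2) = p - r

S11, S12 = Sigma[:r, :r], Sigma[:r, r:]
S21, S22 = Sigma[r:, :r], Sigma[r:, r:]

X = rng.multivariate_normal(mu, Sigma, size=200_000)
X1, X2 = X[:, :r], X[:, r:]
# X_{2.1} = X2 - Sigma_21 Sigma_11^{-1} X1 (applied row-wise)
X21 = X2 - X1 @ np.linalg.solve(S11, S12)

# Cross-covariance should be ~0 (independence), and
# Cov(X_{2.1}) should match Sigma_{22.1} = S22 - S21 S11^{-1} S12.
C = np.cov(np.hstack([X1, X21]).T)
print(np.round(C[:r, r:], 3))                # ~ 0
print(np.round(C[r:, r:], 3))                # ~ Sigma_{22.1}
print(S22 - S21 @ np.linalg.solve(S11, S12))
```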
The next two corollaries are direct consequences of Theorem 5.1.

COROLLARY 5.1
Let $X = {X_1 \choose X_2} \sim N_p(\mu, \Sigma)$ with $\Sigma_{12} = 0$. Then $X_1 \sim N_r(\mu_1, \Sigma_{11})$ and $X_2 \sim N_{p-r}(\mu_2, \Sigma_{22})$ are independent.

The independence of two linear transforms of a multinormal $X$ can be shown via the following corollary.
COROLLARY 5.2
If $X \sim N_p(\mu, \Sigma)$ and given some matrices $\mathcal{A}$ and $\mathcal{B}$, then $\mathcal{A}X$ and $\mathcal{B}X$ are independent if and only if $\mathcal{A}\Sigma\mathcal{B}^{\top} = 0$.
The following theorem is also useful. It generalizes Theorem 4.6.
The proof is left as an exercise.
THEOREM 5.2
If $X \sim N_p(\mu, \Sigma)$, $\mathcal{A}(q \times p)$, $c \in \mathbb{R}^q$ and $q \leq p$, then $Y = \mathcal{A}X + c$ is a $q$-variate normal, i.e.,
\[
Y \sim N_q(\mathcal{A}\mu + c,\ \mathcal{A}\Sigma\mathcal{A}^{\top}).
\]
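Corollary 5.2 can be illustrated the same way. In the sketch below (Python/NumPy again, with our own example values), $\mathcal{B}$ is chosen so that $\mathcal{A}\Sigma\mathcal{B}^{\top} = 0$, and the empirical correlation between $\mathcal{A}X$ and $\mathcal{B}X$ is then indistinguishable from zero:

```python
import numpy as np

# Illustrative check of Corollary 5.2: pick B with A Sigma B^T = 0,
# then AX and BX are uncorrelated, hence independent in the normal case.
rng = np.random.default_rng(1)
mu = np.zeros(3)
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])

A = np.array([[1.0, 0.0, 0.0]])          # A X = X_1
# A Sigma = (2.0, 0.5, 0.0), so b = (-0.5, 2.0, 0.0) gives A Sigma b^T = 0.
B = np.array([[-0.5, 2.0, 0.0]])
print(A @ Sigma @ B.T)                   # exactly 0

X = rng.multivariate_normal(mu, Sigma, size=100_000)
ax, bx = (X @ A.T).ravel(), (X @ B.T).ravel()
print(np.corrcoef(ax, bx)[0, 1])         # ~ 0
```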
The conditional distribution of $X_2$ given $X_1$ is given by the next theorem.

THEOREM 5.3
The conditional distribution of $X_2$ given $X_1 = x_1$ is normal with mean $\mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(x_1 - \mu_1)$ and covariance $\Sigma_{22.1}$, i.e.,
\[
(X_2 \mid X_1 = x_1) \sim N_{p-r}(\mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(x_1 - \mu_1),\ \Sigma_{22.1}). \tag{5.8}
\]
PROOF:
Since $X_2 = X_{2.1} + \Sigma_{21}\Sigma_{11}^{-1}X_1$, for a fixed value of $X_1 = x_1$, $X_2$ is equivalent to $X_{2.1}$ plus a constant term:
\[
(X_2 \mid X_1 = x_1) = X_{2.1} + \Sigma_{21}\Sigma_{11}^{-1}x_1,
\]
which has the normal distribution $N(\mu_{2.1} + \Sigma_{21}\Sigma_{11}^{-1}x_1,\ \Sigma_{22.1})$. Since $\mu_{2.1} + \Sigma_{21}\Sigma_{11}^{-1}x_1 = \mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(x_1 - \mu_1)$, the claim follows.

Note that the conditional mean of $(X_2 \mid X_1)$ is a linear function of $X_1$ and that the conditional variance does not depend on the particular value of $X_1$. In the following example we consider a specific distribution.
EXAMPLE 5.1
Suppose that $p = 2$, $r = 1$, $\mu = {0 \choose 0}$ and $\Sigma = \begin{pmatrix} 1 & -0.8 \\ -0.8 & 2 \end{pmatrix}$. Then $\Sigma_{11} = 1$, $\Sigma_{21} = -0.8$ and $\Sigma_{22.1} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12} = 2 - (0.8)^2 = 1.36$. Hence the marginal pdf of $X_1$ is
\[
f_{X_1}(x_1) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x_1^2}{2}\right)
\]
and the conditional pdf of $(X_2 \mid X_1 = x_1)$ is given by
\[
f(x_2 \mid x_1) = \frac{1}{\sqrt{2\pi (1.36)}} \exp\left\{-\frac{(x_2 + 0.8\, x_1)^2}{2 (1.36)}\right\}.
\]
As mentioned above, the conditional mean of $(X_2 \mid X_1)$ is linear in $X_1$. The shift in the density of $(X_2 \mid X_1)$ can be seen in Figure 5.1.
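A minimal numerical companion to Example 5.1 (Python with NumPy/SciPy; the helper name `conditional_density` is our own, not from the text) evaluates the conditional density and shows how its center shifts linearly with $x_1$:

```python
import numpy as np
from scipy import stats

# Example 5.1: mu = (0, 0), Sigma = [[1, -0.8], [-0.8, 2]].
s11, s21, s22 = 1.0, -0.8, 2.0
s22_1 = s22 - s21**2 / s11          # Sigma_{22.1} = 2 - 0.64 = 1.36

def conditional_density(x2, x1):
    """pdf of (X2 | X1 = x1): normal with mean -0.8*x1, variance 1.36."""
    mean = s21 / s11 * x1
    return stats.norm.pdf(x2, loc=mean, scale=np.sqrt(s22_1))

# The conditional mean shifts linearly with x1 (cf. Figure 5.1):
for x1 in (-1.0, 0.0, 1.0):
    print(x1, s21 / s11 * x1, conditional_density(0.0, x1))
```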
Sometimes it will be useful to reconstruct a joint distribution from the marginal distribution of $X_1$ and the conditional distribution $(X_2 \mid X_1)$. The following theorem shows under which conditions this can be easily done in the multinormal framework.

THEOREM 5.4
If $X_1 \sim N_r(\mu_1, \Sigma_{11})$ and $(X_2 \mid X_1 = x_1) \sim N_{p-r}(\mathcal{A}x_1 + b, \Omega)$ where $\Omega$ does not depend on $x_1$, then $X = {X_1 \choose X_2} \sim N_p(\mu, \Sigma)$, where
\[
\mu = {\mu_1 \choose \mathcal{A}\mu_1 + b}, \qquad
\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{11}\mathcal{A}^{\top} \\ \mathcal{A}\Sigma_{11} & \Omega + \mathcal{A}\Sigma_{11}\mathcal{A}^{\top} \end{pmatrix}.
\]
EXAMPLE 5.2
Consider the following random variables:
\[
X_1 \sim N_1(0, 1),
\]
\[
(X_2 \mid X_1 = x_1) \sim N_2\left( {2 x_1 \choose x_1 + 1}, \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right).
\]
Using Theorem 5.4, where $\mathcal{A} = (2 \quad 1)^{\top}$, $b = (0 \quad 1)^{\top}$ and $\Omega = \mathcal{I}_2$, we easily obtain the following result:
\[
X = {X_1 \choose X_2} \sim N_3\left( \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 & 2 & 1 \\ 2 & 5 & 2 \\ 1 & 2 & 2 \end{pmatrix} \right).
\]
In particular, the marginal distribution of $X_2$ is
\[
X_2 \sim N_2\left( {0 \choose 1}, \begin{pmatrix} 5 & 2 \\ 2 & 2 \end{pmatrix} \right),
\]
thus conditional on $X_1$, the two components of $X_2$ are independent but marginally they are not!

Note that the marginal mean vector and covariance matrix of $X_2$ could have also been computed directly by using (4.28)-(4.29). Using the derivation above, however, provides us with useful properties: we have multinormality!
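The reconstruction of Theorem 5.4 can be coded directly. A short sketch (Python/NumPy; our own illustration) assembles the joint moments of Example 5.2 from $\mathcal{A}$, $b$, $\Omega$ and the marginal moments of $X_1$:

```python
import numpy as np

# Rebuild the joint (mu, Sigma) of Example 5.2 via Theorem 5.4.
mu1 = np.array([0.0])                 # X1 ~ N(0, 1)
S11 = np.array([[1.0]])
A = np.array([[2.0], [1.0]])          # conditional mean A x1 + b
b = np.array([0.0, 1.0])
Omega = np.eye(2)                     # conditional covariance

mu = np.concatenate([mu1, A @ mu1 + b])
Sigma = np.block([[S11,      S11 @ A.T],
                  [A @ S11,  Omega + A @ S11 @ A.T]])
print(mu)       # [0. 0. 1.]
print(Sigma)    # [[1. 2. 1.] [2. 5. 2.] [1. 2. 2.]]
```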
Conditional Approximations

As we saw in Chapter 4 (Theorem 4.3), the conditional expectation $E(X_2 \mid X_1)$ is the mean squared error (MSE) best approximation of $X_2$ by a function of $X_1$. We have in this case that
\[
X_2 = E(X_2 \mid X_1) + U = \mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(X_1 - \mu_1) + U. \tag{5.9}
\]
Hence, the best approximation of $X_2 \in \mathbb{R}^{p-r}$ by $X_1 \in \mathbb{R}^r$ is the linear approximation that can be written as:
\[
X_2 = \beta_0 + \mathcal{B}\, X_1 + U \tag{5.10}
\]
with $\mathcal{B} = \Sigma_{21}\Sigma_{11}^{-1}$, $\beta_0 = \mu_2 - \mathcal{B}\mu_1$ and $U \sim N(0, \Sigma_{22.1})$.

Consider now the particular case where $r = p - 1$. Now $X_2 \in \mathbb{R}$ and $\mathcal{B}$ is a row vector $\beta^{\top}$ of dimension $(1 \times r)$:
\[
X_2 = \beta_0 + \beta^{\top}\, X_1 + U. \tag{5.11}
\]
This means, geometrically speaking, that the best MSE approximation of $X_2$ by a function of $X_1$ is a hyperplane. The marginal variance of $X_2$ can be decomposed via (5.11):
\[
\sigma_{22} = \beta^{\top} \Sigma_{11} \beta + \sigma_{22.1} = \sigma_{21} \Sigma_{11}^{-1} \sigma_{12} + \sigma_{22.1}. \tag{5.12}
\]
The ratio
\[
\rho^2_{2.1 \ldots r} = \frac{\sigma_{21} \Sigma_{11}^{-1} \sigma_{12}}{\sigma_{22}} \tag{5.13}
\]
is known as the square of the multiple correlation between $X_2$ and the $r$ variables $X_1$. It is the percentage of the variance of $X_2$ which is explained by the linear approximation $\beta_0 + \beta^{\top} X_1$. The last term in (5.12) is the residual variance of $X_2$. The square of the multiple correlation corresponds to the coefficient of determination introduced in Section 3.4, see (3.39), but here it is defined in terms of the r.v. $X_1$ and $X_2$. It can be shown that $\rho_{2.1 \ldots r}$ is also the maximum correlation attainable between $X_2$ and a linear combination of the elements of $X_1$, the optimal linear combination being precisely given by $\beta^{\top} X_1$. Note that when $r = 1$, the multiple correlation $\rho_{2.1}$ coincides with the usual simple correlation $\rho_{X_1 X_2}$ between $X_1$ and $X_2$.
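The quantities in (5.10)-(5.13) are one-liners in code. A sketch (Python/NumPy; the partitioning convention and the helper name `conditional_approximation` are our own) computes the regression coefficients and the squared multiple correlation, here applied to the joint $\Sigma$ derived in Example 5.2:

```python
import numpy as np

def conditional_approximation(mu, Sigma, r):
    """Best linear MSE approximation of the last p-r components by the
    first r, per (5.10): X2 = beta0 + B X1 + U, with U ~ N(0, Sigma_{22.1})."""
    S11, S12 = Sigma[:r, :r], Sigma[:r, r:]
    S21, S22 = Sigma[r:, :r], Sigma[r:, r:]
    B = S21 @ np.linalg.inv(S11)            # B = Sigma_21 Sigma_11^{-1}
    beta0 = mu[r:] - B @ mu[:r]             # beta_0 = mu_2 - B mu_1
    S22_1 = S22 - B @ S12                   # residual covariance (5.7)
    return beta0, B, S22_1

# Scalar X2 (r = p - 1): squared multiple correlation, cf. (5.13).
mu = np.array([0.0, 0.0, 1.0])
Sigma = np.array([[1.0, 2.0, 1.0],
                  [2.0, 5.0, 2.0],
                  [1.0, 2.0, 2.0]])         # joint Sigma from Example 5.2
beta0, B, S22_1 = conditional_approximation(mu, Sigma, r=2)
rho2 = (B @ Sigma[:2, 2])[0] / Sigma[2, 2]  # sigma_21 S11^{-1} sigma_12 / sigma_22
print(beta0, B, S22_1, rho2)
```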
EXAMPLE 5.3
Consider the ``classic blue'' pullover example (Example 3.15) and suppose that $X_1$ (sales), $X_2$ (price), $X_3$ (advertisement) and $X_4$ (sales assistants) are normally distributed with
\[
\mu = \begin{pmatrix} 172.7 \\ 104.6 \\ 104.0 \\ 93.8 \end{pmatrix}
\quad\textrm{and}\quad
\Sigma = \begin{pmatrix} 1037.2 & -80.2 & 1430.7 & 271.4 \\ -80.2 & 219.8 & 92.1 & -91.6 \\ 1430.7 & 92.1 & 2624.0 & 210.3 \\ 271.4 & -91.6 & 210.3 & 177.4 \end{pmatrix}.
\]
(These are in fact the sample mean and the sample covariance matrix but in this example we pretend that they are the true parameter values.)

The conditional distribution of $X_1$ given $(X_2, X_3, X_4)$ is thus a univariate normal with mean
\[
\mu_1 + \sigma_{12}\Sigma_{22}^{-1}\begin{pmatrix} x_2 - \mu_2 \\ x_3 - \mu_3 \\ x_4 - \mu_4 \end{pmatrix} = 65.670 - 0.216\, x_2 + 0.485\, x_3 + 0.844\, x_4
\]
and variance
\[
\sigma_{11.2} = \sigma_{11} - \sigma_{12}\Sigma_{22}^{-1}\sigma_{21} = 96.761.
\]
The linear approximation of the sales $(X_1)$ by the price $(X_2)$, advertisement $(X_3)$ and sales assistants $(X_4)$ is provided by the conditional mean above. (Note that this coincides with the results of Example 3.15 due to the particular choice of $\mu$ and $\Sigma$.) The quality of the approximation is given by the multiple correlation ${\rho_{1.234}^2} = \frac{\sigma_{12}\Sigma_{22}^{-1}\sigma_{21}}{\sigma_{11}} = 0.907$. (Note again that this coincides with the coefficient of determination $r^2$ found in Example 3.15.)
This example also illustrates the concept of partial correlation. The correlation matrix between the 4 variables is given by
\[
P = \begin{pmatrix} 1 & -0.168 & 0.867 & 0.633 \\ -0.168 & 1 & 0.121 & -0.464 \\ 0.867 & 0.121 & 1 & 0.308 \\ 0.633 & -0.464 & 0.308 & 1 \end{pmatrix},
\]
so that the correlation between $X_1$ (sales) and $X_2$ (price) is $-0.168$. We can compute the conditional distribution of $(X_1, X_2)$ given $(X_3, X_4)$, which is a bivariate normal with mean:
\[
\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} +
\begin{pmatrix} \sigma_{13} & \sigma_{14} \\ \sigma_{23} & \sigma_{24} \end{pmatrix}
\begin{pmatrix} \sigma_{33} & \sigma_{34} \\ \sigma_{43} & \sigma_{44} \end{pmatrix}^{-1}
\begin{pmatrix} x_3 - \mu_3 \\ x_4 - \mu_4 \end{pmatrix}
= \begin{pmatrix} 32.558 + 0.467\, x_3 + 0.976\, x_4 \\ 153.642 + 0.085\, x_3 - 0.617\, x_4 \end{pmatrix}
\]
and covariance matrix:
\[
\begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix} -
\begin{pmatrix} \sigma_{13} & \sigma_{14} \\ \sigma_{23} & \sigma_{24} \end{pmatrix}
\begin{pmatrix} \sigma_{33} & \sigma_{34} \\ \sigma_{43} & \sigma_{44} \end{pmatrix}^{-1}
\begin{pmatrix} \sigma_{31} & \sigma_{32} \\ \sigma_{41} & \sigma_{42} \end{pmatrix}
= \begin{pmatrix} 104.11 & -33.78 \\ -33.78 & 155.54 \end{pmatrix}.
\]
In particular, the last covariance matrix allows the partial correlation between $X_1$ and $X_2$ to be computed for a fixed level of $X_3$ and $X_4$:
\[
\rho_{12.34} = \frac{-33.78}{\sqrt{104.11 \times 155.54}} = -0.265,
\]
so that in this particular example with a fixed level of advertisement and sales assistance, the negative correlation between price and sales ($-0.265$) is more important than the marginal one ($-0.168$).
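All of these conditional quantities follow mechanically from $\mu$ and $\Sigma$. A sketch (Python/NumPy; our own code, not from the text) reproduces the partial correlation computation from the pullover moments quoted above:

```python
import numpy as np

# Pullover covariance from Example 5.3 (sales, price, advert., assistants).
Sigma = np.array([[1037.2,  -80.2, 1430.7,  271.4],
                  [ -80.2,  219.8,   92.1,  -91.6],
                  [1430.7,   92.1, 2624.0,  210.3],
                  [ 271.4,  -91.6,  210.3,  177.4]])

i, j = [0, 1], [2, 3]                     # (X1, X2) given (X3, X4)
S_ii = Sigma[np.ix_(i, i)]
S_ij = Sigma[np.ix_(i, j)]
S_jj = Sigma[np.ix_(j, j)]

# Conditional covariance of (X1, X2) given (X3, X4), cf. Theorem 5.3.
S_cond = S_ii - S_ij @ np.linalg.solve(S_jj, S_ij.T)
partial_corr = S_cond[0, 1] / np.sqrt(S_cond[0, 0] * S_cond[1, 1])
print(np.round(S_cond, 2))
print(round(partial_corr, 3))             # ~ -0.265, vs. the marginal -0.168
```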
Summary

* If $X \sim N_p(\mu, \Sigma)$, then a linear transformation $\mathcal{A}X + c$, $\mathcal{A}(q \times p)$, where $c \in \mathbb{R}^q$, has distribution $N_q(\mathcal{A}\mu + c, \mathcal{A}\Sigma\mathcal{A}^{\top})$.
* Two linear transformations $\mathcal{A}X$ and $\mathcal{B}X$ with $X \sim N_p(\mu, \Sigma)$ are independent if and only if $\mathcal{A}\Sigma\mathcal{B}^{\top} = 0$.
* If $X_1$ and $X_2$ are partitions of $X \sim N_p(\mu, \Sigma)$, then the conditional distribution of $X_2$ given $X_1 = x_1$ is again normal.
* In the multivariate normal case, $X_1$ is independent of $X_2$ if and only if $\Sigma_{12} = 0$.
* The conditional expectation of $(X_2 \mid X_1)$ is a linear function if ${X_1 \choose X_2} \sim N_p(\mu, \Sigma)$.
* The multiple correlation coefficient is defined as $\rho^2_{2.1 \ldots r} = \frac{\sigma_{21}\Sigma_{11}^{-1}\sigma_{12}}{\sigma_{22}}$.
* The multiple correlation coefficient is the percentage of the variance of $X_2$ explained by the linear approximation $\beta_0 + \beta^{\top} X_1$.