5.1 Elementary Properties of the Multinormal

Let us first summarize some properties which were already derived in the previous chapter.

Often it is interesting to partition $X$ into sub-vectors $X_{1}$ and $X_{2}$. The following theorem tells us how to correct $X_{2}$ to obtain a vector which is independent of $X_{1}$.

THEOREM 5.1   Let $X = {X_1\choose X_2} \sim N_p(\mu ,\Sigma)$, $X_1\in \mathbb{R}^r$, $X_2\in \mathbb{R}^{p-r}$. Define $X_{2.1}=X_2-\Sigma _{21}\Sigma _{11}^{-1}X_1$ from the partitioned covariance matrix

\begin{displaymath}\Sigma = \left( \begin{array}{cc}
\Sigma_{11}&\Sigma_{12}\\ \Sigma_{21}&\Sigma_{22}
\end{array} \right).\end{displaymath}

Then
$\displaystyle X_1\sim N_r(\mu _1,\Sigma_{11}),$     (5.5)
$\displaystyle X_{2.1}\sim N_{p-r}(\mu_{2.1},\Sigma_{22.1})$     (5.6)

are independent with
\begin{displaymath}
\mu_{2.1}=\mu_2-\Sigma_{21}\Sigma_{11}^{-1}\mu_1,\quad \Sigma_{22.1}
=\Sigma_{22}-\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}.
\end{displaymath} (5.7)

PROOF:

\begin{eqnarray*}
X_1&=&\data{A}X\quad \textrm{ with } \quad \data{A}=[\ \data{I}_r\ ,\ 0\ ],\\
X_{2.1}&=&\data{B}X\quad \textrm{ with } \quad \data{B}=[\ -\Sigma_{21}
\Sigma_{11}^{-1}\ ,\ \data{I}_{p-r}\ ].
\end{eqnarray*}



Then, by (5.2), $X_1$ and $X_{2.1}$ are both normal. Note that

\begin{eqnarray*}
\mathop{\mathit{Cov}}(X_{1},X_{2.1})&=&\data{A}\Sigma \data{B}^{\top}
= \left(\ \Sigma_{11}\ ,\ \Sigma_{12}\ \right)
\left( \begin{array}{c}
\left(-\Sigma_{21}\Sigma_{11}^{-1}\right)^{\top}\\
\data{I}_{p-r}
\end{array}\right)\\
&=& \Sigma_{11}\left(-\Sigma_{21}\Sigma_{11}^{-1}\right)^{\top}
+\Sigma_{12}.
\end{eqnarray*}



Recall that $\Sigma _{21}=\left (\Sigma _{12}\right )^{\top}$ and that $\Sigma_{11}$ is symmetric. Hence $\data{A}\Sigma \data{B}^{\top}
=-\Sigma _{11}\Sigma _{11}^{-1}\Sigma _{12}+\Sigma
_{12}= 0$.

Using (5.2) again we also have the joint distribution of ($X_1,X_{2.1}$), namely

\begin{displaymath}\left( X_1 \atop X_{2.1} \right)=\left( \data{A} \atop \data{B} \right)X
\sim N_p\left(\left( \mu_1 \atop \mu_{2.1} \right),
\left( \begin{array}{cc}
\Sigma_{11} & 0 \\
0 & \Sigma_{22.1}\end{array} \right)\right).\end{displaymath}

With this block diagonal structure of the covariance matrix, the joint pdf of ($X_1,X_{2.1}$) can easily be factorized into

\begin{eqnarray*}
f(x_1,x_{2.1}) &=& \vert 2\pi \Sigma_{11}\vert^{-\frac{1}{2}}
\exp\left\{-\frac{1}{2}(x_1-\mu_1)^{\top}\Sigma_{11}^{-1}(x_1-\mu_1)\right\}\\
& &\times\;\vert 2\pi \Sigma_{22.1}\vert^{-\frac{1}{2}}
\exp\left\{-\frac{1}{2}(x_{2.1}-\mu_{2.1})^{\top}
\Sigma_{22.1}^{-1}(x_{2.1}-\mu_{2.1})\right\}
\end{eqnarray*}



from which the independence between $X_1$ and $X_{2.1}$ follows.${\Box}$
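As a quick numerical illustration of Theorem 5.1 (a minimal sketch assuming Python with NumPy; the parameters below are illustrative, not taken from the text), one can simulate from $N_p(\mu,\Sigma)$, form $X_{2.1}$, and check that the empirical covariance between $X_1$ and $X_{2.1}$ vanishes while the covariance of $X_{2.1}$ is close to $\Sigma_{22.1}$ from (5.7).

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (p = 3, r = 1); any positive definite Sigma works.
mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[2.0, 0.8, 0.5],
                  [0.8, 1.5, 0.3],
                  [0.5, 0.3, 1.0]])
r = 1
S11, S12 = Sigma[:r, :r], Sigma[:r, r:]
S21, S22 = Sigma[r:, :r], Sigma[r:, r:]

X = rng.multivariate_normal(mu, Sigma, size=100_000)
X1, X2 = X[:, :r], X[:, r:]

# X_{2.1} = X_2 - Sigma_21 Sigma_11^{-1} X_1; note that
# (Sigma_21 Sigma_11^{-1})^T = Sigma_11^{-1} Sigma_12 by symmetry of Sigma.
X21 = X2 - X1 @ np.linalg.solve(S11, S12)

C = np.cov(np.hstack([X1, X21]).T)
print(C[:r, r:])                              # cross-covariance: close to 0
print(C[r:, r:])                              # close to Sigma_{22.1} below
print(S22 - S21 @ np.linalg.solve(S11, S12))  # Sigma_{22.1} from (5.7)
\end{verbatim}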

The next two corollaries are direct consequences of Theorem 5.1.

COROLLARY 5.1   Let $X ={\displaystyle \left( {X_1 \atop X_2} \right)}\sim N_p(\mu,\Sigma)$, $\Sigma = \left( \begin{array}{cc}
\Sigma_{11}&\Sigma_{12}\\ \Sigma_{21}&\Sigma_{22}
\end{array} \right)$. $\Sigma_{12}=0$ if and only if $X_1$ is independent of $X_2$.

The independence of two linear transforms of a multinormal $X$ can be shown via the following corollary.

COROLLARY 5.2   If $ X \sim N_p (\mu, \Sigma) $ and $\data{A}$ and $\data{B}$ are matrices with $p$ columns, then $\data{A}X$ and $\data{B}X$ are independent if and only if $\data{A}\Sigma
\data{B}^{\top}=0$.

The following theorem is also useful. It generalizes Theorem 4.6. The proof is left as an exercise.

THEOREM 5.2   If $ X \sim N_p (\mu, \Sigma) $, $\data{A}(q\times p)$, $c\in \mathbb{R}^q$ and $q\leq p$, then $Y = \data{A} X +c$ is a $q$-variate Normal, i.e.,

\begin{displaymath}Y\sim N_q(\data{A}\mu+c, \data{A}\Sigma\data{A}^{\top}).\end{displaymath}
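Theorem 5.2 can be illustrated in the same spirit. The sketch below (again assuming NumPy, with illustrative choices of $\data{A}$, $c$, $\mu$ and $\Sigma$) compares the empirical mean and covariance of $Y=\data{A}X+c$ with $\data{A}\mu+c$ and $\data{A}\Sigma\data{A}^{\top}$.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameters with q = 2 and p = 3.
mu = np.array([1.0, -1.0, 0.5])
Sigma = np.array([[1.0, 0.2, 0.1],
                  [0.2, 2.0, 0.3],
                  [0.1, 0.3, 1.5]])
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, -1.0]])
c = np.array([0.5, -0.5])

# Y = A X + c, applied row-wise to a large sample from N_p(mu, Sigma).
Y = rng.multivariate_normal(mu, Sigma, size=100_000) @ A.T + c

print(Y.mean(axis=0))   # close to A mu + c
print(A @ mu + c)
print(np.cov(Y.T))      # close to A Sigma A^T
print(A @ Sigma @ A.T)
\end{verbatim}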

The conditional distribution of $X_2$ given $X_1$ is given by the next theorem.

THEOREM 5.3   The conditional distribution of $X_2$ given $X_1=x_1$ is normal with mean $\mu _2+\Sigma _{21}\Sigma
_{11}^{-1}(x_1-\mu _1)$ and covariance $\Sigma _{22.1}$, i.e.,
\begin{displaymath}
(X_2\mid X_1=x_1)\sim N_{p-r}(\mu _2+\Sigma _{21}\Sigma _{11}^{-1}(x_1-\mu
_1), \Sigma _{22.1}). \end{displaymath} (5.8)

PROOF:
Since $X_2 = X_{2.1}+\Sigma_{21}\Sigma_{11}^{-1}X_1$, for a fixed value of $X_1=x_1$, $X_2$ is equivalent to $X_{2.1}$ plus a constant term:

\begin{displaymath}(X_2\vert X_1 = x_1) = (X_{2.1}+\Sigma_{21}\Sigma_{11}^{-1}x_1),\end{displaymath}

which has the normal distribution $N(\mu_{2.1}+
\Sigma_{21}\Sigma_{11}^{-1}x_1,\Sigma_{22.1})$.${\Box}$

Note that the conditional mean of $(X_2\mid X_1)$ is a linear function of $X_1$ and that the conditional variance does not depend on the particular value of $X_1$. In the following example we consider a specific distribution.

EXAMPLE 5.1   Suppose that $p=2$, $r=1$, $\mu ={\displaystyle \left( {0 \atop 0} \right)}$ and $\Sigma ={\displaystyle \left( {1 \atop -0.8}\ {-0.8 \atop 2} \right)}$. Then $\Sigma _{11}=1$, $\Sigma _{21}=-0.8$ and $\Sigma _{22.1}=\Sigma _{22}-\Sigma _{21}\Sigma _{11}^{-1}\Sigma
_{12}=2-(0.8)^2=1.36$. Hence the marginal pdf of $X_1$ is

\begin{displaymath}f_{X_1}(x_1) = \frac{1 }{ \sqrt {2\pi }}\
\exp \left (-\frac{x_1^2 }{ 2}
\right )\end{displaymath}

and the conditional pdf of $(X_2\mid X_1=x_1)$ is given by

\begin{displaymath}f(x_2\mid x_1) = \frac{ 1}{\sqrt {2\pi (1.36)}}\ \exp \left
\{-\frac{(x_2+0.8x_1)^2 }{2\times (1.36) }\right\}.\end{displaymath}

As mentioned above, the conditional mean of $(X_2\mid X_1)$ is linear in $X_{1}$. The shift in the density of $(X_2\mid X_1)$ can be seen in Figure 5.1.

Figure 5.1: Shifts in the conditional density.  MVAcondnorm.xpl
\includegraphics[width=1\defpicwidth]{MVAcondnorm.ps}
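Figure 5.1 was generated with the quantlet MVAcondnorm.xpl; as a rough substitute, the following sketch (assuming Python with NumPy and SciPy rather than XploRe) evaluates the conditional density of Example 5.1 for several values of $x_1$, showing the conditional mean shifting as $-0.8\,x_1$ while the conditional variance stays at $1.36$.

\begin{verbatim}
import numpy as np
from scipy.stats import norm

# Example 5.1: (X2 | X1 = x1) ~ N(-0.8 * x1, 1.36).
x2 = np.linspace(-4, 4, 9)
for x1 in (-1.0, 0.0, 1.0):
    dens = norm.pdf(x2, loc=-0.8 * x1, scale=np.sqrt(1.36))
    # dens[4] is the conditional density evaluated at x2 = 0
    print(f"x1 = {x1:+.1f}: conditional mean = {-0.8 * x1:+.2f}, "
          f"f(0 | x1) = {dens[4]:.4f}")
\end{verbatim}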

Sometimes it will be useful to reconstruct a joint distribution from the marginal distribution of $X_1$ and the conditional distribution $(X_2\vert X_1)$. The following theorem shows under which conditions this can be easily done in the multinormal framework.

THEOREM 5.4   If $ X_1 \sim N_r(\mu_1, \Sigma_{11})$ and $\left(X_2\vert X_1 = x_1\right) \sim N_{p-r}
({\cal{A}}x_1+b, \Omega)$ where $\Omega$ does not depend on $x_1$, then $X = {X_1\choose X_2} \sim N_p(\mu ,\Sigma)$, where

\begin{eqnarray*}
&& \mu = {\mu_1 \choose {\cal{A}}\mu_1 +b} \\
&& \Sigma = \left( \begin{array}{cc}
\Sigma_{11} & \Sigma_{11}{\cal{A}}^{\top}\\
{\cal{A}}\Sigma_{11} & \Omega + {\cal{A}}\Sigma_{11}{\cal{A}}^{\top}
\end{array} \right).
\end{eqnarray*}



EXAMPLE 5.2   Consider the following random variables

\begin{eqnarray*}
&&X_1 \sim N_1(0,1),\\
&&X_2 \vert X_1=x_1 \sim N_2\left(
\left(\begin{array}{c}
2x_1\\
x_1+1\end{array}\right),
\left(\begin{array}{cc}
1 & 0\\
0 & 1\end{array}\right)\right).
\end{eqnarray*}



Using Theorem 5.4, where ${\cal{A}}=(2\quad 1)^{\top}$, $b=(0\quad 1)^{\top}$ and $\Omega={\cal{I}}_2$, we easily obtain the following result:

\begin{eqnarray*}
X = \left( \begin{array}{c}
X_1\\
X_2\end{array}\right) \sim N_3\left(
\left(\begin{array}{c}
0\\
0\\
1\end{array}\right),
\left(\begin{array}{ccc}
1 & 2& 1\\
2 & 5& 2 \\
1 & 2& 2 \end{array}\right) \right).
\end{eqnarray*}



In particular, the marginal distribution of $X_2$ is

\begin{eqnarray*}
X_2 \sim N_2\left(
\left(\begin{array}{c}
0\\
1\end{array}\right),
\left(\begin{array}{cc}
5 & 2\\
2 & 2\end{array}\right)\right),
\end{eqnarray*}



thus conditional on $X_1$, the two components of $X_2$ are independent but marginally they are not!

Note that the marginal mean vector and covariance matrix of $X_2$ could also have been computed directly by using (4.28)-(4.29). The derivation above, however, provides more: it establishes the joint multinormality of $X$.
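The construction in Theorem 5.4 is easy to check numerically. The following sketch (a minimal illustration assuming Python with NumPy, not code from the original text) assembles $\mu$ and $\Sigma$ for Example 5.2 from ${\cal{A}}$, $b$ and $\Omega$ and reproduces the $3\times 3$ covariance matrix displayed above.

\begin{verbatim}
import numpy as np

# Example 5.2: X1 ~ N1(0, 1) and (X2 | X1 = x1) ~ N2(A x1 + b, I2).
mu1 = np.array([0.0])
S11 = np.array([[1.0]])
A = np.array([[2.0], [1.0]])   # (p - r) x r = 2 x 1
b = np.array([0.0, 1.0])
Omega = np.eye(2)

# Theorem 5.4: stack the mean and assemble Sigma blockwise.
mu = np.concatenate([mu1, A @ mu1 + b])
Sigma = np.block([[S11,     S11 @ A.T],
                  [A @ S11, Omega + A @ S11 @ A.T]])
print(mu)      # [0. 0. 1.]
print(Sigma)   # [[1. 2. 1.], [2. 5. 2.], [1. 2. 2.]]
\end{verbatim}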


Conditional Approximations

As we saw in Chapter 4 (Theorem 4.3), the conditional expectation $E(X_2\vert X_1)$ is the mean squared error (MSE) best approximation of $X_2$ by a function of $X_1$. We have in this case that

\begin{displaymath}
X_2 = E(X_2\vert X_1) + U = \mu_2 + \Sigma_{21} \Sigma_{11}^{-1} (X_1 - \mu_1) + U.
\end{displaymath} (5.9)

Hence, the best approximation of $X_2\in \mathbb{R}^{p-r}$ by a function of $X_1 \in \mathbb{R}^{r} $ is the linear approximation that can be written as:
\begin{displaymath}
X_2 = \beta_0 + {\cal{B}}\, X_1 + U
\end{displaymath} (5.10)

with ${\cal{B}} = \Sigma_{21} \Sigma_{11}^{-1}$, $\beta_0 = \mu_2 - {\cal{B}}\mu_1$ and $U \sim N(0,\Sigma_{22.1})$.

Consider now the particular case where $r = p-1$. Now $X_2 \in \mathbb{R}$ and ${\cal{B}}$ is a row vector $\beta^{\top}$ of dimension $ (1 \times r)$:

\begin{displaymath}
X_2 = \beta_0 + \beta^{\top}\, X_1 + U.
\end{displaymath} (5.11)

This means, geometrically speaking, that the best MSE approximation of $X_2$ by a function of $X_1$ is a hyperplane. The marginal variance of $X_2$ can be decomposed via (5.11):
\begin{displaymath}
\sigma_{22} = \beta^{\top} \Sigma_{11} \beta + \sigma_{22.1} = \sigma_{21}
\Sigma_{11}^{-1} \sigma_{12} + \sigma_{22.1}.
\end{displaymath} (5.12)
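As a numerical illustration of (5.10)-(5.12) (a sketch assuming NumPy; the parameters are illustrative and all means are set to zero), the code below checks the variance decomposition and confirms that $\beta=\Sigma_{11}^{-1}\sigma_{12}$ coincides with the least-squares slope computed from a large simulated sample.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)

# Illustrative parameters with r = 2 regressors and scalar X2;
# sigma12 = Cov(X1, X2), all means set to zero for simplicity.
S11 = np.array([[1.0, 0.3],
                [0.3, 2.0]])
s12 = np.array([0.5, 0.8])
s22 = 1.5

beta = np.linalg.solve(S11, s12)        # beta = Sigma_11^{-1} sigma_12
s221 = s22 - s12 @ beta                 # residual variance sigma_{22.1}
print(beta @ S11 @ beta + s221, s22)    # (5.12): both sides equal sigma_22

# beta also emerges as the least-squares slope on a large simulated
# sample (no intercept needed since the means are zero).
Sigma = np.block([[S11, s12[:, None]],
                  [s12[None, :], np.array([[s22]])]])
X = rng.multivariate_normal(np.zeros(3), Sigma, size=200_000)
slope, *_ = np.linalg.lstsq(X[:, :2], X[:, 2], rcond=None)
print(slope, beta)                      # close agreement
\end{verbatim}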

The ratio
\begin{displaymath}
\rho^2_{2.1 \ldots r} = \frac{\sigma_{21} \Sigma_{11}^{-1} \sigma_{12}}
{\sigma_{22}}
\end{displaymath} (5.13)

is known as the square of the multiple correlation between $X_2$ and the $r$ variables $X_1$. It is the percentage of the variance of $X_2$ which is explained by the linear approximation $\beta_0 + \beta^{\top} X_1$. The last term in (5.12) is the residual variance of $X_2$. The square of the multiple correlation corresponds to the coefficient of determination introduced in Section 3.4, see (3.39), but here it is defined in terms of the r.v. $X_1$ and $X_2$. It can be shown that $\rho_{2.1 \ldots r}$ is also the maximum correlation attainable between $X_2$ and a linear combination of the elements of $X_1$, the optimal linear combination being precisely given by $\beta^{\top} X_1$. Note that when $r=1$, the multiple correlation $\rho_{2.1}$ coincides with the usual simple correlation $\rho_{X_2 X_1}$ between $X_2$ and $X_1$.
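Indeed, for $r=1$ formula (5.13) reduces to

\begin{displaymath}
\rho^2_{2.1} = \frac{\sigma_{21}\,\sigma_{11}^{-1}\,\sigma_{12}}{\sigma_{22}}
= \frac{\sigma_{12}^2}{\sigma_{11}\sigma_{22}} = \rho^2_{X_2 X_1}.
\end{displaymath}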

EXAMPLE 5.3   Consider the ``classic blue'' pullover example (Example 3.15) and suppose that $X_1$ (sales), $X_2$ (price), $X_3$ (advertisement) and $X_4$ (sales assistants) are normally distributed with

\begin{displaymath}
{\mu}=
\left( \begin{array}{c}
172.7\\ 104.6\\ 104.0\\ 93.8 \end{array} \right),\qquad
{\Sigma}=
\left( \begin{array}{rrrr}
1037.21 & {-80.02} & 1430.70 & 271.44\\
{-80.02} & 219.84 & 92.10 & {-91.58}\\
1430.70 & 92.10 & 2624.00 & 210.30\\
271.44 & {-91.58} & 210.30 & 177.36 \end{array} \right).
\end{displaymath}

(These are in fact the sample mean and the sample covariance matrix but in this example we pretend that they are the true parameter values.)

The conditional distribution of $X_1$ given $(X_2, X_3, X_4)$ is thus a univariate normal with mean

\begin{displaymath}
{\mu_1+\sigma_{12}\Sigma_{22}^{-1}\left( \begin{array}{c}
X_2-\mu_2\\ X_3-\mu_3\\ X_4-\mu_4 \end{array} \right)} =
{65.670 - 0.216\, X_2 + 0.485\, X_3 + 0.844\, X_4}
\end{displaymath}

and variance

\begin{displaymath}
\sigma_{11.2}=\sigma_{11}-\sigma_{12}\Sigma_{22}^{-1}\sigma_{21}=96.761.
\end{displaymath}

The linear approximation of the sales $(X_1)$ by the price $(X_2)$, advertisement $(X_3)$ and sales assistants $(X_4)$ is provided by the conditional mean above. (Note that this coincides with the results of Example 3.15 due to the particular choice of $\mu$ and $\Sigma$.) The quality of the approximation is given by the multiple correlation ${\rho_{1.234}^2}=\frac{\sigma_{12}\Sigma_{22}^{-1}\sigma_{21}}{\sigma_{11}}=
0.907$. (Note again that this coincides with the coefficient of determination $r^2$ found in Example 3.15.)
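These figures can be reproduced from $\mu$ and $\Sigma$ with a few lines of code. The sketch below (assuming NumPy; the parameter values are those displayed above) computes the regression coefficients, the residual variance and the squared multiple correlation.

\begin{verbatim}
import numpy as np

mu = np.array([172.7, 104.6, 104.0, 93.8])
Sigma = np.array([[1037.21,  -80.02, 1430.70,  271.44],
                  [ -80.02,  219.84,   92.10,  -91.58],
                  [1430.70,   92.10, 2624.00,  210.30],
                  [ 271.44,  -91.58,  210.30,  177.36]])

s11, s12 = Sigma[0, 0], Sigma[0, 1:]
S22 = Sigma[1:, 1:]

beta = np.linalg.solve(S22, s12)   # coefficients of X2, X3, X4
beta0 = mu[0] - beta @ mu[1:]      # intercept
print(beta0, beta)                 # ~ 65.670 and (-0.216, 0.485, 0.844)
print(s11 - s12 @ beta)            # sigma_{11.2} ~ 96.761
print(s12 @ beta / s11)            # rho^2_{1.234} ~ 0.907
\end{verbatim}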

This example also illustrates the concept of partial correlation. The correlation matrix between the 4 variables is given by

\begin{displaymath}
{P}=
\left( \begin{array}{rrrr}
1 & {-0.168} & 0.867 & 0.633\\
{-0.168} & 1 & 0.121 & {-0.464}\\
0.867 & 0.121 & 1 & 0.308\\
0.633 & {-0.464} & 0.308 & 1
\end{array} \right),
\end{displaymath}

so that the correlation between $X_1$ (sales) and $X_2$ (price) is $-0.168.$ We can compute the conditional distribution of $(X_1,X_2)$ given $(X_3, X_4)$, which is a bivariate normal with mean:

\begin{displaymath}
{ \mu_1 \choose \mu_2 } + \left( \begin{array}{cc}
\sigma_{13} & \sigma_{14}\\
\sigma_{23} & \sigma_{24} \end{array} \right)
\left( \begin{array}{cc}
\sigma_{33} & \sigma_{34}\\
\sigma_{43} & \sigma_{44} \end{array} \right)^{-1}
\left( \begin{array}{c}
X_3-\mu_3\\ X_4-\mu_4 \end{array} \right) =
\left( \begin{array}{c}
32.516 + 0.467\, X_3 + 0.977\, X_4 \\
153.644 + 0.085\, X_3 - 0.617\, X_4 \end{array} \right)
\end{displaymath}

and covariance matrix:

\begin{displaymath}
\left( \begin{array}{cc}
\sigma_{11} & \sigma_{12} \\
\sigma_{21} & \sigma_{22} \end{array} \right) -
\left( \begin{array}{cc}
\sigma_{13} & \sigma_{14}\\
\sigma_{23} & \sigma_{24} \end{array} \right)
\left( \begin{array}{cc}
\sigma_{33} & \sigma_{34}\\
\sigma_{43} & \sigma_{44} \end{array} \right)^{-1}
\left( \begin{array}{cc}
\sigma_{31} & \sigma_{32}\\
\sigma_{41} & \sigma_{42} \end{array} \right) =
\left( \begin{array}{cc}
104.006 & -33.574 \\
-33.574 & 155.592 \end{array} \right).
\end{displaymath}

In particular, the last covariance matrix allows the partial correlation between $X_1$ and $X_2$ to be computed for a fixed level of $X_3$ and $X_4$:

\begin{displaymath}
\rho_{{X_1X_2}\mid{X_3X_4}} = \frac{-33.574}{\sqrt{104.006 \times 155.592}} = -0.264,
\end{displaymath}

so that in this particular example, with a fixed level of advertisement and sales assistants, the negative correlation between price and sales is stronger than the marginal one.
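The conditional covariance matrix and the partial correlation can be verified in the same way (again a NumPy sketch, with $\Sigma$ as in this example):

\begin{verbatim}
import numpy as np

Sigma = np.array([[1037.21,  -80.02, 1430.70,  271.44],
                  [ -80.02,  219.84,   92.10,  -91.58],
                  [1430.70,   92.10, 2624.00,  210.30],
                  [ 271.44,  -91.58,  210.30,  177.36]])

# Conditional covariance of (X1, X2) given (X3, X4): Schur complement.
S_aa, S_ab, S_bb = Sigma[:2, :2], Sigma[:2, 2:], Sigma[2:, 2:]
S_cond = S_aa - S_ab @ np.linalg.solve(S_bb, S_ab.T)
print(S_cond)    # ~ [[104.006, -33.574], [-33.574, 155.592]]

rho = S_cond[0, 1] / np.sqrt(S_cond[0, 0] * S_cond[1, 1])
print(rho)       # ~ -0.264
\end{verbatim}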

Summary
$\ast$
If $X\sim N_{p}(\mu,\Sigma)$, then a linear transformation $\data{A}X + c$, $\data{A}(q\times p)$, where $c\in \mathbb{R}^q$, has distribution $ N_{q}(\data{A}\mu+c, \data{A} \Sigma \data{A}^{\top}) $.
$\ast$
Two linear transformations $\data{A}X$ and $\data{B}X$ with $X\sim N_{p}(\mu,\Sigma)$ are independent if and only if $\data{A}\Sigma
\data{B}^{\top}=0$.
$\ast$
If $X_{1}$ and $X_{2}$ are partitions of $X\sim N_{p}(\mu,\Sigma)$, then the conditional distribution of $X_{2}$ given $X_{1}=x_{1}$ is again normal.
$\ast$
In the multivariate normal case, $X_1$ is independent of $X_2$ if and only if $\Sigma_{12}=0$.
$\ast$
The conditional expectation of $(X_2\vert X_1)$ is a linear function of $X_1$ if $\left( {X_1 \atop X_2}
\right) \sim N_p(\mu , \Sigma)$.
$\ast$
The squared multiple correlation between $X_2$ and the $r$ variables in $X_1$ is defined as $\rho^2_{2.1 \ldots r} = \frac{\sigma_{21} \Sigma_{11}^{-1}\sigma_{12}}{\sigma_{22}}.$
$\ast$
The squared multiple correlation $\rho^2_{2.1 \ldots r}$ is the percentage of the variance of $X_2$ explained by the linear approximation $\beta_0 + \beta^{\top} X_1$.