5.1 Elementary Properties of the Multinormal

Let us first summarize some properties which were already derived in the previous chapter.

Often it is interesting to partition $X$ into sub-vectors $X_{1}$ and $X_{2}$. The following theorem tells us how to correct $X_{2}$ to obtain a vector which is independent of $X_{1}$.

THEOREM 5.1   Let $X = {X_1\choose X_2} \sim N_p(\mu ,\Sigma)$, $X_1\in \mathbb{R}^r$, $X_2\in \mathbb{R}^{p-r}$. Define $X_{2.1}=X_2-\Sigma _{21}\Sigma _{11}^{-1}X_1$ from the partitioned covariance matrix

\begin{displaymath}\Sigma = \left( \begin{array}{cc}
\Sigma_{11}&\Sigma_{12}\\ \Sigma_{21}&\Sigma_{22}
\end{array} \right).\end{displaymath}

Then
$\displaystyle X_1\sim N_r(\mu _1,\Sigma_{11}),$     (5.5)
$\displaystyle X_{2.1}\sim N_{p-r}(\mu_{2.1},\Sigma_{22.1})$     (5.6)

are independent with
\begin{displaymath}
\mu_{2.1}=\mu_2-\Sigma_{21}\Sigma_{11}^{-1}\mu_1,\quad \Sigma_{22.1}
=\Sigma_{22}-\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}.
\end{displaymath} (5.7)

PROOF:

\begin{eqnarray*}
X_1&=&\data{A}X\quad \textrm{ with } \quad \data{A}=[\ \data{I}_r\ ,\ 0\ ],\\
X_{2.1}&=&\data{B}X\quad \textrm{ with } \quad \data{B}=[\ -\Sigma_{21}
\Sigma_{11}^{-1}\ ,\ \data{I}_{p-r}\ ].
\end{eqnarray*}



Then, by (5.2), $X_1$ and $X_{2.1}$ are both normal. Note that

\begin{eqnarray*}
\mathop{\mathit{Cov}}(X_{1},X_{2.1})&=&\data{A}\Sigma \data{B}^{\top}
= \left(\ \Sigma_{11}\ ,\ \Sigma_{12}\ \right)
\left( \begin{array}{c}
\left(-\Sigma_{21}\Sigma_{11}^{-1}\right)^{\top}\\
\data{I}_{p-r}
\end{array}\right)\\
&=& \Sigma_{11}\left(-\Sigma_{21}\Sigma_{11}^{-1}\right)^{\top}
+\Sigma_{12}.
\end{eqnarray*}



Recall that $\Sigma _{21}=\left (\Sigma _{12}\right )^{\top}$ and that $\Sigma_{11}$ is symmetric. Hence $\data{A}\Sigma \data{B}^{\top}
=-\Sigma _{11}\Sigma _{11}^{-1}\Sigma _{12}+\Sigma
_{12}= 0$.

Using (5.2) again we also have the joint distribution of ($X_1,X_{2.1}$), namely

\begin{displaymath}\left( X_1 \atop X_{2.1} \right)=\left( \data{A} \atop \data{B} \right)X
\sim N_p\left(\left( \mu_1 \atop \mu_{2.1} \right),
\left( \begin{array}{cc}
\Sigma_{11} & 0 \\
0 & \Sigma_{22.1}\end{array} \right)\right).\end{displaymath}

With this block diagonal structure of the covariance matrix, the joint pdf of ($X_1,X_{2.1}$) can easily be factorized into

\begin{eqnarray*}
f(x_1,x_{2.1}) &=& \vert 2\pi \Sigma_{11}\vert^{-\frac{1}{2}}
\exp\left\{-\frac{1}{2}(x_1-\mu_1)^{\top}\Sigma_{11}^{-1}(x_1-\mu_1)\right\}\\
& &\times\;\vert 2\pi \Sigma_{22.1}\vert^{-\frac{1}{2}}
\exp\left\{-\frac{1}{2}(x_{2.1}-\mu_{2.1})^{\top}
\Sigma_{22.1}^{-1}(x_{2.1}-\mu_{2.1})\right\}
\end{eqnarray*}



from which the independence between $X_1$ and $X_{2.1}$ follows.${\Box}$
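As a quick numerical illustration of Theorem 5.1 (a minimal sketch assuming Python with NumPy; the parameters below are illustrative, not taken from the text), one can simulate from $N_p(\mu,\Sigma)$, form $X_{2.1}$, and check that the empirical covariance between $X_1$ and $X_{2.1}$ vanishes while the covariance of $X_{2.1}$ is close to $\Sigma_{22.1}$ from (5.7).

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (p = 3, r = 1); any positive definite Sigma works.
mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[2.0, 0.8, 0.5],
                  [0.8, 1.5, 0.3],
                  [0.5, 0.3, 1.0]])
r = 1
S11, S12 = Sigma[:r, :r], Sigma[:r, r:]
S21, S22 = Sigma[r:, :r], Sigma[r:, r:]

X = rng.multivariate_normal(mu, Sigma, size=100_000)
X1, X2 = X[:, :r], X[:, r:]

# X_{2.1} = X_2 - Sigma_21 Sigma_11^{-1} X_1; note that
# (Sigma_21 Sigma_11^{-1})^T = Sigma_11^{-1} Sigma_12 by symmetry of Sigma.
X21 = X2 - X1 @ np.linalg.solve(S11, S12)

C = np.cov(np.hstack([X1, X21]).T)
print(C[:r, r:])                              # cross-covariance: close to 0
print(C[r:, r:])                              # close to Sigma_{22.1} below
print(S22 - S21 @ np.linalg.solve(S11, S12))  # Sigma_{22.1} from (5.7)
\end{verbatim}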

The next two corollaries are direct consequences of Theorem 5.1.

COROLLARY 5.1   Let $X ={\displaystyle \left( {X_1 \atop X_2} \right)}\sim N_p(\mu,\Sigma)$, $\Sigma = \left( \begin{array}{cc}
\Sigma_{11}&\Sigma_{12}\\ \Sigma_{21}&\Sigma_{22}
\end{array} \right)$. $\Sigma_{12}=0$ if and only if $X_1$ is independent of $X_2$.

The independence of two linear transforms of a multinormal $X$ can be shown via the following corollary.

COROLLARY 5.2   If $ X \sim N_p (\mu, \Sigma) $ and $\data{A}$ and $\data{B}$ are matrices with $p$ columns, then $\data{A}X$ and $\data{B}X$ are independent if and only if $\data{A}\Sigma
\data{B}^{\top}=0$.

The following theorem is also useful. It generalizes Theorem 4.6. The proof is left as an exercise.

THEOREM 5.2   If $ X \sim N_p (\mu, \Sigma) $, $\data{A}(q\times p)$, $c\in \mathbb{R}^q$ and $q\leq p$, then $Y = \data{A} X +c$ is a $q$-variate Normal, i.e.,

\begin{displaymath}Y\sim N_q(\data{A}\mu+c, \data{A}\Sigma\data{A}^{\top}).\end{displaymath}
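Theorem 5.2 can be illustrated in the same spirit. The sketch below (again assuming NumPy, with illustrative choices of $\data{A}$, $c$, $\mu$ and $\Sigma$) compares the empirical mean and covariance of $Y=\data{A}X+c$ with $\data{A}\mu+c$ and $\data{A}\Sigma\data{A}^{\top}$.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)

# Illustrative parameters with q = 2 and p = 3.
mu = np.array([1.0, -1.0, 0.5])
Sigma = np.array([[1.0, 0.2, 0.1],
                  [0.2, 2.0, 0.3],
                  [0.1, 0.3, 1.5]])
A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, -1.0]])
c = np.array([0.5, -0.5])

# Y = A X + c, applied row-wise to a large sample from N_p(mu, Sigma).
Y = rng.multivariate_normal(mu, Sigma, size=100_000) @ A.T + c

print(Y.mean(axis=0))   # close to A mu + c
print(A @ mu + c)
print(np.cov(Y.T))      # close to A Sigma A^T
print(A @ Sigma @ A.T)
\end{verbatim}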

The conditional distribution of $X_2$ given $X_1$ is given by the next theorem.

THEOREM 5.3   The conditional distribution of $X_2$ given $X_1=x_1$ is normal with mean $\mu _2+\Sigma _{21}\Sigma
_{11}^{-1}(x_1-\mu _1)$ and covariance $\Sigma _{22.1}$, i.e.,
\begin{displaymath}
(X_2\mid X_1=x_1)\sim N_{p-r}(\mu _2+\Sigma _{21}\Sigma _{11}^{-1}(x_1-\mu
_1), \Sigma _{22.1}). \end{displaymath} (5.8)

PROOF:
Since $X_2 = X_{2.1}+\Sigma_{21}\Sigma_{11}^{-1}X_1$, for a fixed value of $X_1=x_1$, $X_2$ is equivalent to $X_{2.1}$ plus a constant term:

\begin{displaymath}(X_2\vert X_1 = x_1) = (X_{2.1}+\Sigma_{21}\Sigma_{11}^{-1}x_1),\end{displaymath}

which has the normal distribution $N(\mu_{2.1}+
\Sigma_{21}\Sigma_{11}^{-1}x_1,\Sigma_{22.1})$.${\Box}$

Note that the conditional mean of $(X_2\mid X_1)$ is a linear function of $X_1$ and that the conditional variance does not depend on the particular value of $X_1$. In the following example we consider a specific distribution.

EXAMPLE 5.1   Suppose that $p=2$, $r=1$, $\mu ={\displaystyle \left( {0 \atop 0} \right)}$ and $\Sigma ={\displaystyle \left( {1 \atop -0.8}\ {-0.8 \atop 2} \right)}$. Then $\Sigma _{11}=1$, $\Sigma _{21}=-0.8$ and $\Sigma _{22.1}=\Sigma _{22}-\Sigma _{21}\Sigma _{11}^{-1}\Sigma
_{12}=2-(0.8)^2=1.36$. Hence the marginal pdf of $X_1$ is

\begin{displaymath}f_{X_1}(x_1) = \frac{1 }{ \sqrt {2\pi }}\
\exp \left (-\frac{x_1^2 }{ 2}
\right )\end{displaymath}

and the conditional pdf of $(X_2\mid X_1=x_1)$ is given by

\begin{displaymath}f(x_2\mid x_1) = \frac{ 1}{\sqrt {2\pi (1.36)}}\ \exp \left
\{-\frac{(x_2+0.8x_1)^2 }{2\times (1.36) }\right\}.\end{displaymath}

As mentioned above, the conditional mean of $(X_2\mid X_1)$ is linear in $X_{1}$. The shift in the density of $(X_2\mid X_1)$ can be seen in Figure 5.1.

Figure 5.1: Shifts in the conditional density.  MVAcondnorm.xpl
\includegraphics[width=1\defpicwidth]{MVAcondnorm.ps}
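Figure 5.1 was generated with the quantlet MVAcondnorm.xpl; as a rough substitute, the following sketch (assuming Python with NumPy and SciPy rather than XploRe) evaluates the conditional density of Example 5.1 for several values of $x_1$, showing the conditional mean shifting as $-0.8\,x_1$ while the conditional variance stays at $1.36$.

\begin{verbatim}
import numpy as np
from scipy.stats import norm

# Example 5.1: (X2 | X1 = x1) ~ N(-0.8 * x1, 1.36).
x2 = np.linspace(-4, 4, 9)
for x1 in (-1.0, 0.0, 1.0):
    dens = norm.pdf(x2, loc=-0.8 * x1, scale=np.sqrt(1.36))
    # dens[4] is the conditional density evaluated at x2 = 0
    print(f"x1 = {x1:+.1f}: conditional mean = {-0.8 * x1:+.2f}, "
          f"f(0 | x1) = {dens[4]:.4f}")
\end{verbatim}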

Sometimes it will be useful to reconstruct a joint distribution from the marginal distribution of $X_1$ and the conditional distribution $(X_2\vert X_1)$. The following theorem shows under which conditions this can be easily done in the multinormal framework.

THEOREM 5.4   If $ X_1 \sim N_r(\mu_1, \Sigma_{11})$ and $\left(X_2\vert X_1 = x_1\right) \sim N_{p-r}
({\cal{A}}x_1+b, \Omega)$ where $\Omega$ does not depend on $x_1$, then $X = {X_1\choose X_2} \sim N_p(\mu ,\Sigma)$, where

\begin{eqnarray*}
&& \mu = {\mu_1 \choose {\cal{A}}\mu_1 +b} \\
&& \Sigma = \left( \begin{array}{cc}
\Sigma_{11} & \Sigma_{11}{\cal{A}}^{\top}\\
{\cal{A}}\Sigma_{11} & \Omega + {\cal{A}}\Sigma_{11}{\cal{A}}^{\top}
\end{array} \right).
\end{eqnarray*}



EXAMPLE 5.2   Consider the following random variables

\begin{eqnarray*}
&&X_1 \sim N_1(0,1),\\
&&X_2 \vert X_1=x_1 \sim N_2\left(
\left(\begin{array}{c}
2x_1\\
x_1+1\end{array}\right),
\left(\begin{array}{cc}
1 & 0\\
0 & 1\end{array}\right)\right).
\end{eqnarray*}



Using Theorem 5.4, where ${\cal{A}}=(2\quad 1)^{\top}$, $b=(0\quad 1)^{\top}$ and $\Omega={\cal{I}}_2$, we easily obtain the following result:

\begin{eqnarray*}
X = \left( \begin{array}{c}
X_1\\
X_2\end{array}\right) \sim N_3\left(
\left(\begin{array}{c}
0\\
0\\
1\end{array}\right),
\left(\begin{array}{ccc}
1 & 2& 1\\
2 & 5& 2 \\
1 & 2& 2 \end{array}\right) \right).
\end{eqnarray*}



In particular, the marginal distribution of $X_2$ is

\begin{eqnarray*}
X_2 \sim N_2\left(
\left(\begin{array}{c}
0\\
1\end{array}\right),
\left(\begin{array}{cc}
5 & 2\\
2 & 2\end{array}\right)\right),
\end{eqnarray*}



thus conditional on $X_1$, the two components of $X_2$ are independent but marginally they are not!

Note that the marginal mean vector and covariance matrix of $X_2$ could also have been computed directly by using (4.28)-(4.29). The derivation above, however, provides more: it establishes the joint multinormality of $X$.
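The construction in Theorem 5.4 is easy to check numerically. The following sketch (a minimal illustration assuming Python with NumPy, not code from the original text) assembles $\mu$ and $\Sigma$ for Example 5.2 from ${\cal{A}}$, $b$ and $\Omega$ and reproduces the $3\times 3$ covariance matrix displayed above.

\begin{verbatim}
import numpy as np

# Example 5.2: X1 ~ N1(0, 1) and (X2 | X1 = x1) ~ N2(A x1 + b, I2).
mu1 = np.array([0.0])
S11 = np.array([[1.0]])
A = np.array([[2.0], [1.0]])   # (p - r) x r = 2 x 1
b = np.array([0.0, 1.0])
Omega = np.eye(2)

# Theorem 5.4: stack the mean and assemble Sigma blockwise.
mu = np.concatenate([mu1, A @ mu1 + b])
Sigma = np.block([[S11,     S11 @ A.T],
                  [A @ S11, Omega + A @ S11 @ A.T]])
print(mu)      # [0. 0. 1.]
print(Sigma)   # [[1. 2. 1.], [2. 5. 2.], [1. 2. 2.]]
\end{verbatim}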


Conditional Approximations

As we saw in Chapter 4 (Theorem 4.3), the conditional expectation $E(X_2\vert X_1)$ is the mean squared error (MSE) best approximation of $X_2$ by a function of $X_1$. We have in this case that

\begin{displaymath}
X_2 = E(X_2\vert X_1) + U = \mu_2 + \Sigma_{21} \Sigma_{11}^{-1} (X_1 - \mu_1) + U.
\end{displaymath} (5.9)

Hence, the best approximation of $X_2\in \mathbb{R}^{p-r}$ by a function of $X_1 \in \mathbb{R}^{r} $ is the linear approximation that can be written as:
\begin{displaymath}
X_2 = \beta_0 + {\cal{B}}\, X_1 + U
\end{displaymath} (5.10)

with ${\cal{B}} = \Sigma_{21} \Sigma_{11}^{-1}$, $\beta_0 = \mu_2 - {\cal{B}}\mu_1$ and $U \sim N(0,\Sigma_{22.1})$.

Consider now the particular case where $r = p-1$. Now $X_2 \in \mathbb{R}$ and ${\cal{B}}$ is a row vector $\beta^{\top}$ of dimension $ (1 \times r)$:

\begin{displaymath}
X_2 = \beta_0 + \beta^{\top}\, X_1 + U.
\end{displaymath} (5.11)

This means, geometrically speaking, that the best MSE approximation of $X_2$ by a function of $X_1$ is a hyperplane. The marginal variance of $X_2$ can be decomposed via (5.11):
\begin{displaymath}
\sigma_{22} = \beta^{\top} \Sigma_{11} \beta + \sigma_{22.1} = \sigma_{21}
\Sigma_{11}^{-1} \sigma_{12} + \sigma_{22.1}.
\end{displaymath} (5.12)
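As a numerical illustration of (5.10)-(5.12) (a sketch assuming NumPy; the parameters are illustrative and all means are set to zero), the code below checks the variance decomposition and confirms that $\beta=\Sigma_{11}^{-1}\sigma_{12}$ coincides with the least-squares slope computed from a large simulated sample.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)

# Illustrative parameters with r = 2 regressors and scalar X2;
# sigma12 = Cov(X1, X2), all means set to zero for simplicity.
S11 = np.array([[1.0, 0.3],
                [0.3, 2.0]])
s12 = np.array([0.5, 0.8])
s22 = 1.5

beta = np.linalg.solve(S11, s12)        # beta = Sigma_11^{-1} sigma_12
s221 = s22 - s12 @ beta                 # residual variance sigma_{22.1}
print(beta @ S11 @ beta + s221, s22)    # (5.12): both sides equal sigma_22

# beta also emerges as the least-squares slope on a large simulated
# sample (no intercept needed since the means are zero).
Sigma = np.block([[S11, s12[:, None]],
                  [s12[None, :], np.array([[s22]])]])
X = rng.multivariate_normal(np.zeros(3), Sigma, size=200_000)
slope, *_ = np.linalg.lstsq(X[:, :2], X[:, 2], rcond=None)
print(slope, beta)                      # close agreement
\end{verbatim}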

The ratio
\begin{displaymath}
\rho^2_{2.1 \ldots r} = \frac{\sigma_{21} \Sigma_{11}^{-1} \sigma_{12}}
{\sigma_{22}}
\end{displaymath} (5.13)

is known as the square of the multiple correlation between $X_2$ and the $r$ variables $X_1$. It is the percentage of the variance of $X_2$ which is explained by the linear approximation $\beta_0 + \beta^{\top} X_1$. The last term in (5.12) is the residual variance of $X_2$. The square of the multiple correlation corresponds to the coefficient of determination introduced in Section 3.4, see (3.39), but here it is defined in terms of the r.v. $X_1$ and $X_2$. It can be shown that $\rho_{2.1 \ldots r}$ is also the maximum correlation attainable between $X_2$ and a linear combination of the elements of $X_1$, the optimal linear combination being precisely given by $\beta^{\top} X_1$. Note that when $r=1$, the multiple correlation $\rho_{2.1}$ coincides with the usual simple correlation $\rho_{X_2 X_1}$ between $X_2$ and $X_1$.
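Indeed, for $r=1$ formula (5.13) reduces to

\begin{displaymath}
\rho^2_{2.1} = \frac{\sigma_{21}\,\sigma_{11}^{-1}\,\sigma_{12}}{\sigma_{22}}
= \frac{\sigma_{12}^2}{\sigma_{11}\sigma_{22}} = \rho^2_{X_2 X_1}.
\end{displaymath}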

EXAMPLE 5.3   Consider the ``classic blue'' pullover example (Example 3.15) and suppose that $X_1$ (sales), $X_2$ (price), $X_3$ (advertisement) and $X_4$ (sales assistants) are normally distributed with

\begin{displaymath}
{\mu}=
\left( \begin{array}{c}
172.7\\ 104.6\\ 104.0\\ 93.8 \end{array} \right),\qquad
{\Sigma}=
\left( \begin{array}{rrrr}
1037.21 & {-80.02} & 1430.70 & 271.44\\
{-80.02} & 219.84 & 92.10 & {-91.58}\\
1430.70 & 92.10 & 2624.00 & 210.30\\
271.44 & {-91.58} & 210.30 & 177.36 \end{array} \right).
\end{displaymath}

(These are in fact the sample mean and the sample covariance matrix but in this example we pretend that they are the true parameter values.)

The conditional distribution of $X_1$ given $(X_2, X_3, X_4)$ is thus a univariate normal with mean

\begin{displaymath}
{\mu_1+\sigma_{12}\Sigma_{22}^{-1}\left( \begin{array}{c}
X_2-\mu_2\\ X_3-\mu_3\\ X_4-\mu_4 \end{array} \right)} =
{65.670 - 0.216\, X_2 + 0.485\, X_3 + 0.844\, X_4}
\end{displaymath}

and variance

\begin{displaymath}
\sigma_{11.2}=\sigma_{11}-\sigma_{12}\Sigma_{22}^{-1}\sigma_{21}=96.761.
\end{displaymath}

The linear approximation of the sales $(X_1)$ by the price $(X_2)$, advertisement $(X_3)$ and sales assistants $(X_4)$ is provided by the conditional mean above. (Note that this coincides with the results of Example 3.15 due to the particular choice of $\mu$ and $\Sigma$.) The quality of the approximation is given by the multiple correlation ${\rho_{1.234}^2}=\frac{\sigma_{12}\Sigma_{22}^{-1}\sigma_{21}}{\sigma_{11}}=
0.907$. (Note again that this coincides with the coefficient of determination $r^2$ found in Example 3.15.)
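These figures can be reproduced from $\mu$ and $\Sigma$ with a few lines of code. The sketch below (assuming NumPy; the parameter values are those displayed above) computes the regression coefficients, the residual variance and the squared multiple correlation.

\begin{verbatim}
import numpy as np

mu = np.array([172.7, 104.6, 104.0, 93.8])
Sigma = np.array([[1037.21,  -80.02, 1430.70,  271.44],
                  [ -80.02,  219.84,   92.10,  -91.58],
                  [1430.70,   92.10, 2624.00,  210.30],
                  [ 271.44,  -91.58,  210.30,  177.36]])

s11, s12 = Sigma[0, 0], Sigma[0, 1:]
S22 = Sigma[1:, 1:]

beta = np.linalg.solve(S22, s12)   # coefficients of X2, X3, X4
beta0 = mu[0] - beta @ mu[1:]      # intercept
print(beta0, beta)                 # ~ 65.670 and (-0.216, 0.485, 0.844)
print(s11 - s12 @ beta)            # sigma_{11.2} ~ 96.761
print(s12 @ beta / s11)            # rho^2_{1.234} ~ 0.907
\end{verbatim}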

This example also illustrates the concept of partial correlation. The correlation matrix between the 4 variables is given by

\begin{displaymath}
{P}=
\left( \begin{array}{rrrr}
1 & {-0.168} & 0.867 & 0.633\\
{-0.168} & 1 & 0.121 & {-0.464}\\
0.867 & 0.121 & 1 & 0.308\\
0.633 & {-0.464} & 0.308 & 1
\end{array} \right),
\end{displaymath}

so that the correlation between $X_1$ (sales) and $X_2$ (price) is $-0.168.$ We can compute the conditional distribution of $(X_1,X_2)$ given $(X_3, X_4)$, which is a bivariate normal with mean:

\begin{displaymath}
{ \mu_1 \choose \mu_2 } + \left( \begin{array}{cc}
\sigma_{13} & \sigma_{14}\\
\sigma_{23} & \sigma_{24} \end{array} \right)
\left( \begin{array}{cc}
\sigma_{33} & \sigma_{34}\\
\sigma_{43} & \sigma_{44} \end{array} \right)^{-1}
\left( \begin{array}{c}
X_3-\mu_3\\ X_4-\mu_4 \end{array} \right) =
\left( \begin{array}{c}
32.516 + 0.467\, X_3 + 0.977\, X_4 \\
153.644 + 0.085\, X_3 - 0.617\, X_4 \end{array} \right)
\end{displaymath}

and covariance matrix:

\begin{displaymath}
\left( \begin{array}{cc}
\sigma_{11} & \sigma_{12} \\
\sigma_{21} & \sigma_{22} \end{array} \right) -
\left( \begin{array}{cc}
\sigma_{13} & \sigma_{14}\\
\sigma_{23} & \sigma_{24} \end{array} \right)
\left( \begin{array}{cc}
\sigma_{33} & \sigma_{34}\\
\sigma_{43} & \sigma_{44} \end{array} \right)^{-1}
\left( \begin{array}{cc}
\sigma_{31} & \sigma_{32}\\
\sigma_{41} & \sigma_{42} \end{array} \right) =
\left( \begin{array}{cc}
104.006 & -33.574 \\
-33.574 & 155.592 \end{array} \right).
\end{displaymath}

In particular, the last covariance matrix allows the partial correlation between $X_1$ and $X_2$ to be computed for a fixed level of $X_3$ and $X_4$:

\begin{displaymath}
\rho_{{X_1X_2}\mid{X_3X_4}} = \frac{-33.574}{\sqrt{104.006 \times 155.592}} = -0.264,
\end{displaymath}

so that in this particular example, with a fixed level of advertisement and sales assistants, the negative correlation between price and sales is stronger than the marginal one.
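The conditional covariance matrix and the partial correlation can be verified in the same way (again a NumPy sketch, with $\Sigma$ as in this example):

\begin{verbatim}
import numpy as np

Sigma = np.array([[1037.21,  -80.02, 1430.70,  271.44],
                  [ -80.02,  219.84,   92.10,  -91.58],
                  [1430.70,   92.10, 2624.00,  210.30],
                  [ 271.44,  -91.58,  210.30,  177.36]])

# Conditional covariance of (X1, X2) given (X3, X4): Schur complement.
S_aa, S_ab, S_bb = Sigma[:2, :2], Sigma[:2, 2:], Sigma[2:, 2:]
S_cond = S_aa - S_ab @ np.linalg.solve(S_bb, S_ab.T)
print(S_cond)    # ~ [[104.006, -33.574], [-33.574, 155.592]]

rho = S_cond[0, 1] / np.sqrt(S_cond[0, 0] * S_cond[1, 1])
print(rho)       # ~ -0.264
\end{verbatim}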

Summary
$\ast$
If $X\sim N_{p}(\mu,\Sigma)$, then a linear transformation $\data{A}X + c$, $\data{A}(q\times p)$, where $c\in \mathbb{R}^q$, has distribution $ N_{q}(\data{A}\mu+c, \data{A} \Sigma \data{A}^{\top}) $.
$\ast$
Two linear transformations $\data{A}X$ and $\data{B}X$ with $X\sim N_{p}(\mu,\Sigma)$ are independent if and only if $\data{A}\Sigma
\data{B}^{\top}=0$.
$\ast$
If $X_{1}$ and $X_{2}$ are partitions of $X\sim N_{p}(\mu,\Sigma)$, then the conditional distribution of $X_{2}$ given $X_{1}=x_{1}$ is again normal.
$\ast$
In the multivariate normal case, $X_1$ is independent of $X_2$ if and only if $\Sigma_{12}=0$.
$\ast$
The conditional expectation of $(X_2\vert X_1)$ is a linear function of $X_1$ if $\left( {X_1 \atop X_2}
\right) \sim N_p(\mu , \Sigma)$.
$\ast$
The squared multiple correlation between $X_2$ and the $r$ variables in $X_1$ is defined as $\rho^2_{2.1 \ldots r} = \frac{\sigma_{21} \Sigma_{11}^{-1}\sigma_{12}}{\sigma_{22}}.$
$\ast$
The squared multiple correlation $\rho^2_{2.1 \ldots r}$ is the percentage of the variance of $X_2$ explained by the linear approximation $\beta_0 + \beta^{\top} X_1$.