5.2 The Wishart Distribution

The Wishart distribution (named after its discoverer, John Wishart) plays a prominent role in the analysis of estimated covariance matrices. If the mean of $ X \sim N_p (\mu, \Sigma) $ is known to be $\mu =0$, then for a data matrix $\data{X}(n\times p)$ the estimated covariance matrix is proportional to $ \data{X}^{\top}\data{X}$. This is where the Wishart distribution comes in, since $\data{M}(p\times p) = \data{X}^{\top}\data{X}=\sum_{i=1}^n x_i x_i^\top$ has a Wishart distribution $W_p(\Sigma ,n)$.

EXAMPLE 5.4   Set $p=1$. Then for $X\sim N_1(0,\sigma ^2)$ the data matrix of the observations

\begin{displaymath}\data{X}=( x_1,\ldots,x_n)^{\top}
\quad \textrm{with} \quad
\data{M} = \data{X}^{\top}\data{X} = \sum_{i=1}^n x_{i}^2 \end{displaymath}

leads to the Wishart distribution $W_1(\sigma ^2,n)=\sigma^2\chi^2_n$. The one-dimensional Wishart distribution is thus in fact a $\chi^2$ distribution.
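This equivalence is easy to check by simulation. The following sketch (the values of $\sigma^2$ and $n$ are illustrative choices, not taken from the text) draws many replications of $\data{M}=\data{X}^{\top}\data{X}$ and compares its first two moments with those of $\sigma^2\chi^2_n$:

```python
import numpy as np

# Monte Carlo sketch of Example 5.4: W_1(sigma^2, n) = sigma^2 * chi^2_n.
# sigma2 and n are illustrative choices, not from the text.
rng = np.random.default_rng(0)
sigma2, n, reps = 2.0, 5, 200_000

x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
m = (x * x).sum(axis=1)        # M = X^T X = sum_i x_i^2, one value per replication

# sigma^2 * chi^2_n has mean n * sigma^2 and variance 2 * n * sigma^4
print(m.mean())                # close to n * sigma2 = 10
print(m.var())                 # close to 2 * n * sigma2**2 = 40
```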

When we talk about the distribution of a matrix, we mean of course the joint distribution of all its elements. More precisely: since $\data{M} =
\data{X}^{\top}\data{X}$ is symmetric, we need only consider the elements of the lower triangular matrix

\begin{displaymath}
\data{M} = \left( \begin{array}{cccc}
m_{11} & & & \\
m_{21} & m_{22} & & \\
\vdots & \vdots & \ddots & \\
m_{p1} & m_{p2} & \ldots & m_{pp}
\end{array}\right).
\end{displaymath} (5.14)

Hence the Wishart distribution is defined by the distribution of the vector
\begin{displaymath}
(m_{11}, \ldots, m_{p1}, m_{22}, \ldots, m_{p2}, \ldots, m_{pp})^{\top}.
\end{displaymath} (5.15)

Linear transformations of the data matrix $\data{X}$ also lead to Wishart matrices.

THEOREM 5.5   If $ \data{M}\sim W_p(\Sigma ,n)$ and $\data{B}(p\times q)$ is a fixed matrix, then the distribution of $\data{B}^{\top}\data{MB}$ is Wishart $W_q( \data{B}^{\top}\Sigma \data{B},n)$.

With this theorem we can standardize Wishart matrices since with ${\data{B}} = \Sigma ^{-1/2}$ the distribution of $\Sigma ^{-1/2} \data{M}\Sigma ^{-1/2}$ is $W_p(\data{I},n)$. Another connection to the $\chi^2$-distribution is given by the following theorem.
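The standardization can be illustrated numerically. The sketch below (with an illustrative $\Sigma$ and illustrative values of $n$ and the replication count) checks the first moment of $\Sigma^{-1/2}\data{M}\Sigma^{-1/2}$, which should be $n\data{I}$ if the standardized matrix is indeed $W_p(\data{I},n)$:

```python
import numpy as np

# Sketch of Theorem 5.5 with B = Sigma^{-1/2}: the standardized Wishart
# matrix should be W_p(I, n), hence have mean n * I.
# Sigma, n, and reps are illustrative choices, not from the text.
rng = np.random.default_rng(1)
p, n, reps = 2, 8, 20_000
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# Sigma^{-1/2} via the spectral decomposition of Sigma
vals, vecs = np.linalg.eigh(Sigma)
S_inv_half = vecs @ np.diag(vals ** -0.5) @ vecs.T

# reps data matrices X (each n x p) from N_p(0, Sigma)
X = rng.multivariate_normal(np.zeros(p), Sigma, size=(reps, n))
M_mean = np.einsum('rni,rnj->ij', X, X) / reps   # average of M = X^T X
mean = S_inv_half @ M_mean @ S_inv_half          # average standardized matrix

print(np.round(mean, 2))                         # close to n * I = 8 * I
```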

THEOREM 5.6   If $\data{M}\sim W_p(\Sigma ,m)$ and $a\in \mathbb{R}^p$ with $a^{\top}\Sigma a\ne 0$, then the distribution of ${\displaystyle \frac{a^{\top}\data{M}a}{a^{\top}\Sigma a }}$ is $\chi ^2_m$.

This theorem is an immediate consequence of Theorem 5.5 with $\data{B}=a$: then $a^{\top}\data{M}a\sim W_1(a^{\top}\Sigma a,m)=a^{\top}\Sigma a\,\chi^2_m$, and dividing by $a^{\top}\Sigma a$ gives $\chi^2_m$. Central to the analysis of covariance matrices is the next theorem.

THEOREM 5.7 (Cochran)   Let $\data{X}(n\times p)$ be a data matrix from a $ N_p(0, \Sigma)$ distribution and let $\data{C}(n\times n)$ be a symmetric matrix.
(a)
$\data{X}^{\top}\data{C}\data{X}$ is distributed as a weighted sum of independent Wishart matrices, i.e.

\begin{displaymath}\data{X}^{\top}\data{C}\data{X} \sim \sum_{i=1}^n \lambda _i W_p(\Sigma ,1),\end{displaymath}

where the $\lambda_i$, $i=1,\dots,n$, are the eigenvalues of $\data{C}$ and the $W_p(\Sigma,1)$ terms are independent.
(b)
$\data{X}^{\top}\data{C}\data{X}$ is Wishart if and only if $\data{C}^2=\data{C}$. In this case

\begin{displaymath}\data{X}^{\top}\data{C}\data{X}\sim W_p(\Sigma ,r),\end{displaymath}

and $r = \mathop{\rm {rank}}(\data{C})=\mathop{\hbox{tr}}(\data{C}).$
(c)
$n \data{S}=\data{X}^{\top}\data{HX}$ is distributed as $W_p(\Sigma,n-1)$, where $\data{H}=\data{I}_n-n^{-1}1_n 1_n^{\top}$ is the centering matrix and $\data{S}$ is the sample covariance matrix.
(d)
$\bar{x}$ and $\data{S}$ are independent.
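Part (c) follows from part (b) because the centering matrix $\data{H}$ is idempotent with $\mathop{\hbox{tr}}(\data{H})=n-1$. The sketch below (with an illustrative $\Sigma$ and illustrative values of $n$ and the replication count) verifies these two facts and checks $E(n\data{S})=(n-1)\Sigma$ by simulation:

```python
import numpy as np

# Sketch of Theorem 5.7(c): H = I - (1/n) 1 1^T satisfies H^2 = H and
# tr(H) = n - 1, so X^T H X = nS ~ W_p(Sigma, n - 1) by part (b).
# Sigma, n, and reps are illustrative choices, not from the text.
rng = np.random.default_rng(3)
p, n, reps = 2, 10, 20_000
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])

H = np.eye(n) - np.ones((n, n)) / n
assert np.allclose(H @ H, H)             # H is idempotent, so part (b) applies
assert np.isclose(np.trace(H), n - 1)    # r = rank(H) = tr(H) = n - 1

# reps data matrices X (each n x p); since H^2 = H and H = H^T,
# X^T H X = (HX)^T (HX), and HX is just the row-centred data matrix.
X = rng.multivariate_normal(np.zeros(p), Sigma, size=(reps, n))
Xc = X - X.mean(axis=1, keepdims=True)
mean_nS = np.einsum('rni,rnj->ij', Xc, Xc) / reps   # average of nS = X^T H X

print(np.round(mean_nS, 2))              # close to (n - 1) * Sigma
```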

The following properties are useful:

  1. If $ \data{M}\sim W_p(\Sigma ,n)$, then $E(\data{M}) =n\Sigma$.
  2. If $\data{M}_i$ are independent Wishart matrices $W_p(\Sigma,n_i)$, $i=1,\ldots,k$, then $\data{M}=\sum_{i=1}^{k}\data{M}_i \sim W_p(\Sigma,n)$ where $n=\sum_{i=1}^{k}n_i$.
  3. The density of $W_p(\Sigma,n-1)$ for a positive definite ${\data{M}}$ is given by:
    \begin{displaymath}
f_{\Sigma,n-1}({\data{M}})=
\frac{\left|{\data{M}}\right|^{\frac{1}{2}(n-p-2)}\exp\left\{-\frac{1}{2}\mathop{\hbox{tr}}({\data{M}}\Sigma^{-1})\right\}}{2^{\frac{1}{2}p(n-1)}\pi^{\frac{1}{4}p(p-1)}\left|\Sigma\right|^{\frac{1}{2}(n-1)}\prod_{i=1}^p\Gamma\left\{ \frac{n-i}{2} \right\}},
\end{displaymath} (5.16)

    where $\Gamma$ is the gamma function, see Feller (1966).
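For $p=1$, the density (5.16) should reduce to that of $\sigma^2\chi^2_{n-1}$, which gives a simple numerical consistency check. The sketch below (with illustrative values of $\sigma^2$, $n$, and the evaluation points) compares (5.16) against the scaled $\chi^2$ density obtained by a change of variables:

```python
import numpy as np
from math import gamma, exp, pi

def wishart_density(M, Sigma, n):
    """Density of W_p(Sigma, n-1) at a positive definite matrix M, per (5.16)."""
    p = M.shape[0]
    num = (np.linalg.det(M) ** ((n - p - 2) / 2)
           * exp(-0.5 * np.trace(M @ np.linalg.inv(Sigma))))
    den = (2 ** (p * (n - 1) / 2) * pi ** (p * (p - 1) / 4)
           * np.linalg.det(Sigma) ** ((n - 1) / 2)
           * np.prod([gamma((n - i) / 2) for i in range(1, p + 1)]))
    return num / den

def chi2_density(x, k):
    """Standard chi^2_k density at x."""
    return x ** (k / 2 - 1) * exp(-x / 2) / (2 ** (k / 2) * gamma(k / 2))

# sigma2, n, and the evaluation points are illustrative choices.
sigma2, n = 2.0, 7
for m_val in (1.0, 3.0, 8.0):
    w = wishart_density(np.array([[m_val]]), np.array([[sigma2]]), n)
    # density of sigma^2 * chi^2_{n-1} at m_val, by change of variables
    c = chi2_density(m_val / sigma2, n - 1) / sigma2
    print(round(w, 6), round(c, 6))   # the two values should agree
```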

For further details on the Wishart distribution see Mardia et al. (1979).

Summary
$\ast$
The Wishart distribution is a generalization of the $\chi^2$-distribution. In particular $W_{1}(\sigma^2,n) = \sigma^2
\chi^2_{n}$.
$\ast$
The empirical covariance matrix $\data{S}$ has a $\frac{1}{n} W_{p}(\Sigma,n-1)$ distribution.
$\ast$
In the normal case, $\bar{x}$ and $\data{S}$ are independent.
$\ast$
For $\data{M}\sim W_p(\Sigma ,m),\ \frac{a^{\top} \data{M}a}{a^{\top}\Sigma a } \sim \chi ^2_m$.