5.2 The Wishart Distribution

The Wishart distribution (named after its discoverer, John Wishart) plays a prominent role in the analysis of estimated covariance matrices. If the mean of $ X \sim N_p (\mu, \Sigma) $ is known to be $\mu =0$, then for a data matrix $\data{X}(n\times p)$ the estimated covariance matrix is proportional to $ \data{X}^{\top}\data{X}$. This is where the Wishart distribution comes in, since $\data{M}(p\times p) = \data{X}^{\top}\data{X}=\sum_{i=1}^n x_i x_i^\top$ has a Wishart distribution $W_p(\Sigma ,n)$.

EXAMPLE 5.4   Set $p=1$. Then for $X\sim N_1(0,\sigma ^2)$ the data matrix of the observations

\begin{displaymath}\data{X}=( x_1,\ldots,x_n)^{\top}
\quad \textrm{with} \quad
\data{M} = \data{X}^{\top}\data{X} = \sum_{i=1}^n x_{i}^2 \end{displaymath}

leads to the Wishart distribution $W_1(\sigma ^2,n)=\sigma^2\chi^2_n$. The one-dimensional Wishart distribution is thus in fact a $\chi^2$ distribution.
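This equivalence is easy to check by simulation. The following sketch (the values of $\sigma^2$ and $n$ are illustrative choices, not taken from the text) draws many replications of $\data{M}=\data{X}^{\top}\data{X}$ and compares its first two moments with those of $\sigma^2\chi^2_n$:

```python
import numpy as np

# Monte Carlo sketch of Example 5.4: W_1(sigma^2, n) = sigma^2 * chi^2_n.
# sigma2 and n are illustrative choices, not from the text.
rng = np.random.default_rng(0)
sigma2, n, reps = 2.0, 5, 200_000

x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
m = (x * x).sum(axis=1)        # M = X^T X = sum_i x_i^2, one value per replication

# sigma^2 * chi^2_n has mean n * sigma^2 and variance 2 * n * sigma^4
print(m.mean())                # close to n * sigma2 = 10
print(m.var())                 # close to 2 * n * sigma2**2 = 40
```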

When we talk about the distribution of a matrix, we mean of course the joint distribution of all its elements. More precisely: since $\data{M} =
\data{X}^{\top}\data{X}$ is symmetric, we need only consider the elements of the lower triangular matrix

\begin{displaymath}
\data{M} = \left( \begin{array}{cccc}
m_{11} & & & \\
m_{21} & m_{22} & & \\
\vdots & \vdots & \ddots & \\
m_{p1} & m_{p2} & \ldots & m_{pp}
\end{array}\right).
\end{displaymath} (5.14)

Hence the Wishart distribution is defined by the distribution of the vector
\begin{displaymath}
(m_{11}, \ldots, m_{p1}, m_{22}, \ldots, m_{p2}, \ldots, m_{pp})^{\top}.
\end{displaymath} (5.15)

Linear transformations of the data matrix $\data{X}$ also lead to Wishart matrices.

THEOREM 5.5   If $ \data{M}\sim W_p(\Sigma ,n)$ and $\data{B}(p\times q)$ is a fixed matrix, then the distribution of $\data{B}^{\top}\data{MB}$ is Wishart $W_q( \data{B}^{\top}\Sigma \data{B},n)$.

With this theorem we can standardize Wishart matrices since with ${\data{B}} = \Sigma ^{-1/2}$ the distribution of $\Sigma ^{-1/2} \data{M}\Sigma ^{-1/2}$ is $W_p(\data{I},n)$. Another connection to the $\chi^2$-distribution is given by the following theorem.
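The standardization can be illustrated numerically. The sketch below (with an illustrative $\Sigma$ and illustrative values of $n$ and the replication count) checks the first moment of $\Sigma^{-1/2}\data{M}\Sigma^{-1/2}$, which should be $n\data{I}$ if the standardized matrix is indeed $W_p(\data{I},n)$:

```python
import numpy as np

# Sketch of Theorem 5.5 with B = Sigma^{-1/2}: the standardized Wishart
# matrix should be W_p(I, n), hence have mean n * I.
# Sigma, n, and reps are illustrative choices, not from the text.
rng = np.random.default_rng(1)
p, n, reps = 2, 8, 20_000
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

# Sigma^{-1/2} via the spectral decomposition of Sigma
vals, vecs = np.linalg.eigh(Sigma)
S_inv_half = vecs @ np.diag(vals ** -0.5) @ vecs.T

# reps data matrices X (each n x p) from N_p(0, Sigma)
X = rng.multivariate_normal(np.zeros(p), Sigma, size=(reps, n))
M_mean = np.einsum('rni,rnj->ij', X, X) / reps   # average of M = X^T X
mean = S_inv_half @ M_mean @ S_inv_half          # average standardized matrix

print(np.round(mean, 2))                         # close to n * I = 8 * I
```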

THEOREM 5.6   If $\data{M}\sim W_p(\Sigma ,m)$ and $a\in \mathbb{R}^p$ with $a^{\top}\Sigma a\ne 0$, then the distribution of ${\displaystyle \frac{a^{\top}\data{M}a}{a^{\top}\Sigma a }}$ is $\chi ^2_m$.

This theorem is an immediate consequence of Theorem 5.5 with $\data{B}=a$: then $a^{\top}\data{M}a\sim W_1(a^{\top}\Sigma a,m)=a^{\top}\Sigma a\,\chi^2_m$, and dividing by $a^{\top}\Sigma a$ gives $\chi^2_m$. Central to the analysis of covariance matrices is the next theorem.

THEOREM 5.7 (Cochran)   Let $\data{X}(n\times p)$ be a data matrix from a $ N_p(0, \Sigma)$ distribution and let $\data{C}(n\times n)$ be a symmetric matrix.
(a)
$\data{X}^{\top}\data{C}\data{X}$ is distributed as a weighted sum of independent Wishart matrices, i.e.

\begin{displaymath}\data{X}^{\top}\data{C}\data{X} \sim \sum_{i=1}^n \lambda _i W_p(\Sigma ,1),\end{displaymath}

where the $\lambda_i$, $i=1,\dots,n$, are the eigenvalues of $\data{C}$ and the $W_p(\Sigma,1)$ terms are independent.
(b)
$\data{X}^{\top}\data{C}\data{X}$ is Wishart if and only if $\data{C}^2=\data{C}$. In this case

\begin{displaymath}\data{X}^{\top}\data{C}\data{X}\sim W_p(\Sigma ,r),\end{displaymath}

and $r = \mathop{\rm {rank}}(\data{C})=\mathop{\hbox{tr}}(\data{C}).$
(c)
$n \data{S}=\data{X}^{\top}\data{HX}$ is distributed as $W_p(\Sigma,n-1)$, where $\data{H}=\data{I}_n-n^{-1}1_n 1_n^{\top}$ is the centering matrix and $\data{S}$ is the sample covariance matrix.
(d)
$\bar{x}$ and $\data{S}$ are independent.
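Part (c) follows from part (b) because the centering matrix $\data{H}$ is idempotent with $\mathop{\hbox{tr}}(\data{H})=n-1$. The sketch below (with an illustrative $\Sigma$ and illustrative values of $n$ and the replication count) verifies these two facts and checks $E(n\data{S})=(n-1)\Sigma$ by simulation:

```python
import numpy as np

# Sketch of Theorem 5.7(c): H = I - (1/n) 1 1^T satisfies H^2 = H and
# tr(H) = n - 1, so X^T H X = nS ~ W_p(Sigma, n - 1) by part (b).
# Sigma, n, and reps are illustrative choices, not from the text.
rng = np.random.default_rng(3)
p, n, reps = 2, 10, 20_000
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])

H = np.eye(n) - np.ones((n, n)) / n
assert np.allclose(H @ H, H)             # H is idempotent, so part (b) applies
assert np.isclose(np.trace(H), n - 1)    # r = rank(H) = tr(H) = n - 1

# reps data matrices X (each n x p); since H^2 = H and H = H^T,
# X^T H X = (HX)^T (HX), and HX is just the row-centred data matrix.
X = rng.multivariate_normal(np.zeros(p), Sigma, size=(reps, n))
Xc = X - X.mean(axis=1, keepdims=True)
mean_nS = np.einsum('rni,rnj->ij', Xc, Xc) / reps   # average of nS = X^T H X

print(np.round(mean_nS, 2))              # close to (n - 1) * Sigma
```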

The following properties are useful:

  1. If $ \data{M}\sim W_p(\Sigma ,n)$, then $E(\data{M}) =n\Sigma$.
  2. If $\data{M}_i$ are independent Wishart matrices $W_p(\Sigma,n_i)$, $i=1,\ldots,k$, then $\data{M}=\sum_{i=1}^{k}\data{M}_i \sim W_p(\Sigma,n)$ where $n=\sum_{i=1}^{k}n_i$.
  3. The density of $W_p(\Sigma,n-1)$ for a positive definite ${\data{M}}$ is given by:
    \begin{displaymath}
f_{\Sigma,n-1}({\data{M}})=
\frac{\left|{\data{M}}\right|^{\frac{1}{2}(n-p-2)}\exp\left\{-\frac{1}{2}\mathop{\hbox{tr}}({\data{M}}\Sigma^{-1})\right\}}{2^{\frac{1}{2}p(n-1)}\pi^{\frac{1}{4}p(p-1)}\left|\Sigma\right|^{\frac{1}{2}(n-1)}\prod_{i=1}^p\Gamma\left\{ \frac{n-i}{2} \right\}},
\end{displaymath} (5.16)

    where $\Gamma$ is the gamma function, see Feller (1966).
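For $p=1$, the density (5.16) should reduce to that of $\sigma^2\chi^2_{n-1}$, which gives a simple numerical consistency check. The sketch below (with illustrative values of $\sigma^2$, $n$, and the evaluation points) compares (5.16) against the scaled $\chi^2$ density obtained by a change of variables:

```python
import numpy as np
from math import gamma, exp, pi

def wishart_density(M, Sigma, n):
    """Density of W_p(Sigma, n-1) at a positive definite matrix M, per (5.16)."""
    p = M.shape[0]
    num = (np.linalg.det(M) ** ((n - p - 2) / 2)
           * exp(-0.5 * np.trace(M @ np.linalg.inv(Sigma))))
    den = (2 ** (p * (n - 1) / 2) * pi ** (p * (p - 1) / 4)
           * np.linalg.det(Sigma) ** ((n - 1) / 2)
           * np.prod([gamma((n - i) / 2) for i in range(1, p + 1)]))
    return num / den

def chi2_density(x, k):
    """Standard chi^2_k density at x."""
    return x ** (k / 2 - 1) * exp(-x / 2) / (2 ** (k / 2) * gamma(k / 2))

# sigma2, n, and the evaluation points are illustrative choices.
sigma2, n = 2.0, 7
for m_val in (1.0, 3.0, 8.0):
    w = wishart_density(np.array([[m_val]]), np.array([[sigma2]]), n)
    # density of sigma^2 * chi^2_{n-1} at m_val, by change of variables
    c = chi2_density(m_val / sigma2, n - 1) / sigma2
    print(round(w, 6), round(c, 6))   # the two values should agree
```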

For further details on the Wishart distribution see Mardia et al. (1979).

Summary
$\ast$
The Wishart distribution is a generalization of the $\chi^2$-distribution. In particular $W_{1}(\sigma^2,n) = \sigma^2
\chi^2_{n}$.
$\ast$
The empirical covariance matrix $\data{S}$ has a $\frac{1}{n} W_{p}(\Sigma,n-1)$ distribution.
$\ast$
In the normal case, $\bar{x}$ and $\data{S}$ are independent.
$\ast$
For $\data{M}\sim W_p(\Sigma ,m),\ \frac{a^{\top} \data{M}a}{a^{\top}\Sigma a } \sim \chi ^2_m$.