4.5 Sampling Distributions and Limit Theorems

In multivariate statistics, we observe the values of a multivariate random variable $X$ and obtain a sample $\{x_i\}_{i=1}^n$, as described in Chapter 3. Under random sampling, these observations are considered to be realizations of a sequence of i.i.d. random variables $X_1,\ldots,X_n$, where each $X_i$ is a $p$-variate random variable which replicates the parent or population random variable $X$. Some notational confusion is hard to avoid: $X_i$ is not the $i$th component of $X$, but rather the $i$th replicate of the $p$-variate random variable $X$ which provides the $i$th observation $x_i$ of our sample.

For a given random sample $X_1,\ldots,X_n$, the idea of statistical inference is to analyze the properties of the population variable $X$. This is typically done by analyzing some characteristic $\theta$ of its distribution, like the mean, covariance matrix, etc. Statistical inference in a multivariate setup is considered in more detail in Chapters 6 and 7.

Inference can often be performed using some observable function of the sample $X_1,\ldots,X_n$, i.e., a statistic. Examples of such statistics were given in Chapter 3: the sample mean $\bar x$ and the sample covariance matrix ${\cal{S}}$. To get an idea of the relationship between a statistic and the corresponding population characteristic, one has to derive the sampling distribution of the statistic. The next example gives some insight into the relation of $(\bar x, {\cal{S}})$ to $(\mu,\Sigma)$.

EXAMPLE 4.15   Consider an i.i.d. sample of $n$ random vectors $X_i \in \mathbb{R}^p$ with $E(X_i)=\mu$ and $\Var (X_i) = \Sigma$. The sample mean $\bar{x}$ and the sample covariance matrix ${\cal{S}}$ have already been defined in Section 3.3. It is easy to prove the following results:

\begin{displaymath}\begin{array}{lcl}
E(\bar{x}) &=& \frac{1}{n}\sum\limits_{i=1}^n E(X_i) = \mu, \\[3mm]
\Var(\bar{x}) &=& \frac{1}{n^2}\sum\limits_{i=1}^n \Var(X_i) = \frac{1}{n}\Sigma, \\[3mm]
E({\cal{S}}) &=& E\left\{\frac{1}{n}\sum\limits_{i=1}^n (X_i-\bar{x})(X_i-\bar{x})^{\top}\right\}
= E\left\{\frac{1}{n}\left(\sum\limits_{i=1}^n X_i X_i^{\top}\right)-\bar{x}\bar{x}^{\top}\right\}\\[3mm]
&=& \frac{1}{n}\, n\left(\Sigma+\mu\mu^{\top}\right)-\left(\frac{1}{n}\Sigma+\mu\mu^{\top}\right)\\[3mm]
&=& \frac{n-1}{n}\Sigma.
\end{array}\end{displaymath}

This shows in particular that ${\cal{S}}$ is a biased estimator of $\Sigma$. By contrast, $ {\cal{S}}_u = \frac{n}{n-1}{\cal{S}}$ is an unbiased estimator of $\Sigma$.
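
This bias can also be illustrated by a short simulation. The following sketch (in Python with NumPy; the particular $n$, $p$ and $\Sigma$ are arbitrary choices, not taken from the text) averages the biased estimator ${\cal{S}}$ over many samples and shows that the correction factor $n/(n-1)$ removes the bias.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
p, n, reps = 3, 10, 20000
mu = np.zeros(p)
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])

S_sum = np.zeros((p, p))
for _ in range(reps):
    X = rng.multivariate_normal(mu, Sigma, size=n)   # one sample X_1,...,X_n
    xbar = X.mean(axis=0)
    S_sum += (X - xbar).T @ (X - xbar) / n           # biased estimator S

S_mean = S_sum / reps
print(S_mean)                       # approx. (n-1)/n * Sigma = 0.9 * Sigma
print(n / (n - 1) * S_mean)         # approx. Sigma (unbiased correction S_u)
\end{verbatim}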

Statistical inference often requires more than just the mean and/or the variance of a statistic. We need the sampling distribution of the statistic to derive confidence intervals or to define rejection regions in hypothesis testing for a given significance level. Theorem 4.9 gives the distribution of the sample mean for a multinormal population.

THEOREM 4.9   Let $X_1,\ldots,X_n$ be i.i.d. with $X_i \sim N_p(\mu,\Sigma)$. Then $\bar x \sim N_p(\mu,\frac{1}{n}\Sigma)$.

PROOF:
$\bar x=(1/n)\sum_{i=1}^n X_i$ is a linear combination of independent normal variables, so it has a normal distribution (see Chapter 5). The mean and the covariance matrix were given in the preceding example. ${\Box}$
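
A quick numerical check of Theorem 4.9 (again a hypothetical Python/NumPy sketch with an arbitrary $\mu$ and $\Sigma$): generate many independent samples of size $n$ from $N_p(\mu,\Sigma)$ and compare the empirical covariance of $\bar x$ with $\frac{1}{n}\Sigma$.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
n, reps = 25, 50000
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.7],
                  [0.7, 1.0]])

# sample means of reps independent samples, each of size n
xbars = rng.multivariate_normal(mu, Sigma, size=(reps, n)).mean(axis=1)
print(xbars.mean(axis=0))             # approx. mu
print(np.cov(xbars, rowvar=False))    # approx. Sigma / n
print(Sigma / n)
\end{verbatim}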

In multivariate statistics, the sampling distribution of a statistic is often more difficult to derive than in the preceding theorem. In addition, it might be so complicated that approximations have to be used. These approximations are provided by limit theorems. Since they are based on asymptotic limits, the approximations are valid only when the sample size is large enough. In spite of this restriction, they make complicated situations rather simple. The following central limit theorem shows that even if the parent distribution is not normal, the sample mean $\bar x$ has an approximate normal distribution when the sample size $n$ is large.

THEOREM 4.10 (Central Limit Theorem (CLT))   Let $ X_1, X_2, \ldots, X_n $ be i.i.d. with
$ X_i \sim (\mu, \Sigma) $. Then the distribution of $ \displaystyle \sqrt n (\overline x - \mu ) $ is asymptotically $ N_p(0, \Sigma)$, i.e.,

\begin{displaymath}\sqrt n (\overline x - \mu) \stackrel{\cal L}{\longrightarrow} N_p(0, \Sigma) \qquad \textrm{as} \quad n \longrightarrow \infty. \end{displaymath}

The symbol `` $\stackrel{\cal L}{\longrightarrow}$'' denotes convergence in distribution which means that the distribution function of the random vector $\sqrt{n}(\bar{x}-\mu)$ converges to the distribution function of $ N_p(0, \Sigma)$.

EXAMPLE 4.16   Assume that $X_{1},\ldots,X_{n}$ are i.i.d. and that each has a Bernoulli distribution with $p=\frac{1}{2}$ (this means that $P(X_i=1)=\frac{1}{2}$ and $P(X_i=0)=\frac{1}{2}$). Then $\mu=p=\frac{1}{2}$ and $\Sigma=p(1-p)=\frac{1}{4}$. Hence,

\begin{displaymath}\sqrt n \left(\overline x - \frac{1}{2}\right) \stackrel{\cal L}{\longrightarrow} N\left(0, \frac{1}{4}\right) \qquad \textrm{as} \quad n \longrightarrow
\infty. \end{displaymath}

Figure 4.4: The CLT for Bernoulli distributed random variables. Sample size $n=5$ (left) and $n=35$ (right). MVAcltbern.xpl
\includegraphics[width=0.6\defepswidth]{cltberna.ps} \includegraphics[width=0.6\defepswidth]{cltbernb.ps}

The results are shown in Figure 4.4 for varying sample sizes.
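
A simulation in the spirit of Figure 4.4 can be written in a few lines. The sketch below (Python with NumPy/SciPy, not the original MVAcltbern.xpl quantlet) measures how far the distribution of $\sqrt{n}\left(\overline x - \frac{1}{2}\right)$ is from its $N\left(0,\frac{1}{4}\right)$ limit for the two sample sizes.

\begin{verbatim}
import numpy as np
from scipy.stats import kstest, norm

rng = np.random.default_rng(2)
reps = 200000
for n in (5, 35):
    X = rng.binomial(1, 0.5, size=(reps, n))     # Bernoulli(1/2) observations
    z = np.sqrt(n) * (X.mean(axis=1) - 0.5)      # sqrt(n)(xbar - 1/2)
    # Kolmogorov distance between the empirical distribution and N(0, 1/4)
    d = kstest(z, norm(loc=0, scale=0.5).cdf).statistic
    print(f"n = {n:2d}: distance to N(0, 1/4) = {d:.3f}")   # shrinks as n grows
\end{verbatim}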

EXAMPLE 4.17   Now consider a two-dimensional random sample $X_{1},\ldots,X_{n}$ that is i.i.d. and created from two independent Bernoulli distributions with $p=0.5$. The joint distribution is given by $P(X_{i}=(0,0)^{\top}) =
\frac{1}{4}, P(X_{i}=(0,1)^{\top}) = \frac{1}{4}, P(X_{i}=(1,0)^{\top}) =
\frac{1}{4}, P(X_{i}=(1,1)^{\top}) = \frac{1}{4}$. Here we have

\begin{displaymath}\sqrt{n} \left\{ \bar{x}- {\frac{1}{2} \choose \frac{1}{2}} \right\}
\stackrel{\cal L}{\longrightarrow} N_2\left( {0 \choose 0},
\left( \begin{array}{cc} \frac{1}{4} & 0 \\ 0 & \frac{1}{4} \end{array} \right) \right)
\quad \textrm{as}\quad n \longrightarrow
\infty. \end{displaymath}

Figure 4.5 displays the estimated two-dimensional density for different sample sizes.

Figure 4.5: The CLT in the two-dimensional case. Sample size $n=5$ (left) and $n=85$ (right). MVAcltbern2.xpl
\includegraphics[width=0.6\defepswidth]{cltbern2.ps} \includegraphics[width=0.6\defepswidth]{cltbern3.ps}
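
A rough counterpart to Figure 4.5 (again only a sketch, not MVAcltbern2.xpl): simulate the bivariate standardized means and check that their empirical covariance approaches $\mathop{\rm diag}\left(\frac{1}{4},\frac{1}{4}\right)$.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(3)
reps, n = 100000, 85
X = rng.binomial(1, 0.5, size=(reps, n, 2))    # two independent Bernoulli(1/2) coordinates
Z = np.sqrt(n) * (X.mean(axis=1) - 0.5)        # sqrt(n){xbar - (1/2, 1/2)'}
print(Z.mean(axis=0))                          # approx. (0, 0)
print(np.cov(Z, rowvar=False))                 # approx. diag(1/4, 1/4)
\end{verbatim}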

The asymptotic normal distribution is often used to construct confidence intervals for the unknown parameters. A confidence interval at the level $1-\alpha,\,\alpha \in (0,1)$, is an interval that covers the true parameter with probability $1-\alpha$:

\begin{displaymath}P(\theta \in [\widehat{\theta}_{l} , \widehat{\theta}_{u}]) = 1 - \alpha,\end{displaymath}

where $\theta$ denotes the (unknown) parameter and $\widehat{\theta}_{l}$ and $\widehat{\theta}_{u}$ are the lower and upper confidence bounds respectively.

EXAMPLE 4.18   Consider the i.i.d. random variables $X_{1},\ldots,X_{n}$ with $X_{i} \sim (\mu,
\sigma^2)$ and $\sigma^2$ known. Since we have $\sqrt{n}(\bar{x}-\mu)
\stackrel{\cal L}{\rightarrow} N(0,\sigma^2)$ from the CLT, it follows that

\begin{displaymath}P(-u_{1-\alpha/2} \le \sqrt{n}\frac{(\bar{x}-\mu)}{\sigma} \le u_{1-\alpha/2}) \longrightarrow 1 - \alpha,\qquad \textrm{as} \quad n \longrightarrow \infty \end{displaymath}

where $u_{1-\alpha/2}$ denotes the $(1 - \alpha/2)$-quantile of the standard normal distribution. Hence the interval

\begin{displaymath}\left[\bar{x}-\frac{\sigma}{\sqrt{n}}\, u_{1-\alpha/2},\,
\bar{x}+\frac{\sigma}{\sqrt{n}}\, u_{1-\alpha/2}\right]\end{displaymath}

is an approximate $(1-\alpha)$-confidence interval for $\mu$.
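
The interval of Example 4.18 translates directly into code. In the sketch below (Python; the data-generating population and the known $\sigma$ are arbitrary illustrations), scipy's norm.ppf supplies the quantile $u_{1-\alpha/2}$.

\begin{verbatim}
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
n, mu, sigma = 200, 5.0, 2.0                   # sigma^2 assumed known
x = rng.gamma(shape=6.25, scale=0.8, size=n)   # a non-normal population with mean 5, sd 2
alpha = 0.05
u = norm.ppf(1 - alpha / 2)                    # u_{1-alpha/2}
xbar = x.mean()
half = sigma / np.sqrt(n) * u
print((xbar - half, xbar + half))              # approximate 95% confidence interval for mu
\end{verbatim}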

But what can we do if we do not know the variance $\sigma^2$? The following corollary gives the answer.

COROLLARY 4.1   If $\widehat{\Sigma}$ is a consistent estimate for $\Sigma$, then the CLT still holds, namely

\begin{displaymath}\sqrt{n}\;\widehat{\Sigma}^{-1/2} (\bar{x}-\mu) \stackrel{\cal L}{\longrightarrow} N_p(0,\data{I})\qquad \textrm{as} \quad n \longrightarrow \infty . \end{displaymath}

EXAMPLE 4.19   Consider the i.i.d. random variables $X_{1},\ldots,X_{n}$ with $X_{i} \sim (\mu,
\sigma^2)$, and now with an unknown variance $\sigma^2$. From Corollary 4.1 using $\widehat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^n (x_{i}-\bar{x})^2$ we obtain

\begin{displaymath}\sqrt{n}\left( \frac{\bar{x}-\mu}{\widehat{\sigma}} \right)
\mathrel{\mathop{\longrightarrow}\limits_{}^{\cal L}} N(0,1)\qquad \textrm{as} \quad n \longrightarrow \infty . \end{displaymath}

Hence we can construct an approximate $(1-\alpha)$-confidence interval for $\mu$ using the variance estimate $\widehat{\sigma}^2$:

\begin{displaymath}C_{1-\alpha} = \left[\bar{x}-\frac{\widehat{\sigma}}{\sqrt{n}}\, u_{1-\alpha/2},\,
\bar{x}+\frac{\widehat{\sigma}}{\sqrt{n}}\, u_{1-\alpha/2}\right].\end{displaymath}

Note that by the CLT

\begin{displaymath}P(\mu \in C_{1-\alpha}) \longrightarrow 1 - \alpha
\qquad \textrm{as} \quad n \longrightarrow \infty . \end{displaymath}
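
The convergence of the coverage probability can be checked by Monte Carlo. The following sketch (Python; the exponential population with mean 1 is an arbitrary non-normal test case) estimates $P(\mu \in C_{1-\alpha})$ for $\alpha = 0.05$.

\begin{verbatim}
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
mu_true, alpha, reps, n = 1.0, 0.05, 20000, 100
u = norm.ppf(1 - alpha / 2)

covered = 0
for _ in range(reps):
    x = rng.exponential(scale=mu_true, size=n)   # non-normal population with mean 1
    xbar = x.mean()
    sig_hat = x.std()                            # sqrt of (1/n) sum (x_i - xbar)^2
    half = sig_hat / np.sqrt(n) * u
    covered += (xbar - half <= mu_true <= xbar + half)

print(covered / reps)                            # close to 1 - alpha = 0.95
\end{verbatim}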

REMARK 4.1   One may wonder how large $n$ should be in practice to provide reasonable approximations. There is no definite answer to this question: it mainly depends on the problem at hand (the shape of the distribution of the $X_i$ and the dimension of $X_i$). If the $X_i$ are normally distributed, the normality of $\bar x$ holds exactly for every $n \ge 1$. In most situations, however, the approximation is valid in one-dimensional problems for $n$ larger than, say, 50.

Transformation of Statistics

Often in practical problems, one is interested in a function of parameters for which one has an asymptotically normal statistic. Suppose for instance that we are interested in a cost function depending on the mean $\mu$ of the process: $f(\mu)=\mu^{\top}\data{A}\mu$ where $\data{A}>0$ is given. To estimate $\mu$ we use the asymptotically normal statistic $\bar{x}$. The question is: how does $f(\bar{x})$ behave? More generally, what happens to a statistic $t$ that is asymptotically normal when we transform it by a function $f(t)$? The answer is given by the following theorem.

THEOREM 4.11   If $ \sqrt n (t - \mu) \stackrel{\cal L}{\longrightarrow}
N_p(0,\Sigma) $ and if $ f = (f_1, \ldots, f_q)^{\top} : \mathbb{R}^p \to \mathbb{R}^q $ is a vector of real-valued functions which are differentiable at $ \mu \in \mathbb{R}^p$, then $f(t)$ is asymptotically normal with mean $ f(\mu) $ and covariance $ \data{D}^{\top} \Sigma \data{D}$, i.e.,
\begin{displaymath}
\sqrt n \{f(t) - f(\mu)\} \stackrel{\cal L}{\longrightarrow} N_q(0, \data{D}^{\top} \Sigma \data{D} ) \qquad \textrm{for} \quad n
\longrightarrow \infty,
\end{displaymath} (4.56)

where

\begin{displaymath}\data{D} = \left .\left( \frac{\partial f_j}{\partial t_i}
\right)(t)
\right \vert _{t = \mu} \end{displaymath}

is the $ (p \times q) $ matrix of all partial derivatives.

EXAMPLE 4.20   We are interested in the asymptotic behavior of $f(\bar{x})=\bar{x}^{\top}\data{A}\bar{x}$ with respect to the quadratic cost function $f(\mu)=\mu^{\top}\data{A}\mu$ of $\mu$, where $\data{A}>0$. Here

\begin{displaymath}
\data{D}=\left.\frac{\partial f(\bar{x})}{\partial \bar{x}}\right\vert _{\bar{x}=\mu}=2\data{A}\mu.
\end{displaymath}

By Theorem 4.11 we have

\begin{displaymath}\sqrt{n}(\bar{x}^{\top}\data{A}\bar{x}-\mu^{\top}\data{A}\mu)
\mathrel{\mathop{\longrightarrow}\limits_{}^{\cal L}} N_1
(0,4\mu^{\top}\data{A}\Sigma\data{A}\mu).\end{displaymath}
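
Example 4.20 can be checked numerically: simulate many sample means, apply $f$, and compare the empirical variance of $\sqrt{n}\{f(\bar{x})-f(\mu)\}$ with the delta-method value $4\mu^{\top}\data{A}\Sigma\data{A}\mu$. The concrete $\data{A}$, $\mu$, $\Sigma$ and $n$ below are arbitrary illustrative choices.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(6)
n, reps = 200, 50000
mu = np.array([1.0, 2.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])                       # A > 0

xbars = rng.multivariate_normal(mu, Sigma, size=(reps, n)).mean(axis=1)
f_xbar = np.einsum('ij,jk,ik->i', xbars, A, xbars)   # xbar' A xbar for each replication
stat = np.sqrt(n) * (f_xbar - mu @ A @ mu)
print(stat.var())                                # empirical variance
print(4 * mu @ A @ Sigma @ A @ mu)               # delta-method variance 4 mu'A Sigma A mu
\end{verbatim}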

EXAMPLE 4.21   Suppose

\begin{displaymath}X_i \sim (\mu, \Sigma); \quad \mu = {0
\choose 0}, \quad
\Sigma = \left( \begin{array}{cc} 1 & 0.5 \\ 0.5 & 1 \end{array} \right),
\quad p = 2. \end{displaymath}

We have by the CLT (Theorem 4.10) for $n\to\infty$ that

\begin{displaymath}\sqrt n (\overline x - \mu) \mathrel{\mathop{\longrightarrow}\limits_{}^{\cal L}}
N(0, \Sigma).\end{displaymath}

Suppose that we would like to compute the distribution of $\left( \begin{array}{c} \overline x_{1}^2 - \overline x_{2}\\
\overline x_{1} + 3 \overline x_{2} \end{array} \right)$. According to Theorem 4.11 we have to consider $f = (f_1,f_2)^{\top}$ with

\begin{displaymath}f_1(x_1,x_2) = x_1^2 - x_2, \quad f_2(x_1,x_2) = x_1 + 3x_2,
\quad q = 2. \end{displaymath}

Given this, $ f(\mu) = {0 \choose 0} $ and

\begin{displaymath}\data{D} = (d_{ij}), \quad d_{ij}
= \left( \left . \frac{\partial f_j}{\partial x_i} \right \vert _{x=\mu} \right)
= \left . \left( \begin{array}{ll} 2x_{1}&1\\ -1&3 \end{array}\right)
\right \vert _{x=0}. \end{displaymath}

Thus

\begin{displaymath}\data{D} = \left( \begin{array}{rr} 0 & 1 \\ -1 & 3 \end{array}
\right). \end{displaymath}

The covariance is

\begin{displaymath}\begin{array}{ccccc}
\left( \begin{array}{rr} 0 & -1 \\ 1 & 3 \end{array} \right) &
\left( \begin{array}{cc} 1 & 0.5 \\ 0.5 & 1 \end{array} \right) &
\left( \begin{array}{rr} 0 & 1 \\ -1 & 3 \end{array} \right) & = &
\left( \begin{array}{rr} 1 & -\frac{7}{2} \\ -\frac{7}{2} & 13 \end{array} \right) \\[3mm]
\data{D}^{\top} & \Sigma & \data{D}
& &\data{D}^{\top}\Sigma\data{D}
\end{array}, \end{displaymath}

which yields

\begin{displaymath}\sqrt{n}
\left(\begin{array}{c} \overline x_{1}^2 - \overline x_{2}\\ \overline x_{1} + 3 \overline x_{2} \end{array}\right)
\mathrel{\mathop{\longrightarrow}\limits_{}^{\cal L}}
N_2\left( {0 \choose 0},
\left( \begin{array}{rr} 1 & -\frac{7}{2} \\
-\frac{7}{2} & 13 \end{array} \right) \right).\end{displaymath}
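
The matrix product $\data{D}^{\top}\Sigma\data{D}$ of Example 4.21 can be verified with a few lines of NumPy (a sketch for checking the arithmetic only):

\begin{verbatim}
import numpy as np

Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
D = np.array([[0.0, 1.0],
              [-1.0, 3.0]])            # d_ij = (partial f_j / partial x_i) at mu = 0
print(D.T @ Sigma @ D)                 # [[ 1. , -3.5], [-3.5, 13. ]]
\end{verbatim}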

EXAMPLE 4.22   Let us continue the previous example by adding one more component to the function $f$. Since $q = 3 > p = 2$, we might expect a singular normal distribution. Consider $f = (f_1,f_2,f_{3})^{\top}$ with

\begin{displaymath}f_1(x_1,x_2) = x_1^2 - x_2, \quad f_2(x_1,x_2) = x_1 + 3x_2, \quad
f_3(x_1,x_2) = x_2^3, \quad q = 3. \end{displaymath}

From this we have that

\begin{displaymath}\data{D} = \left( \begin{array}{rrr} 0 & 1 & 0 \\ -1 & 3 & 0 \end{array} \right)
\quad \textrm{and} \quad
\data{D}^{\top}\Sigma\data{D} = \left( \begin{array}{rrr} 1 & -\frac{7}{2} & 0 \\
-\frac{7}{2} & 13 & 0 \\ 0 & 0 & 0 \end{array} \right). \end{displaymath}

The limit is in fact a singular normal distribution!
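
That the limiting covariance in Example 4.22 is indeed singular can be confirmed by a rank check (a sketch as before; $\data{D}$ is the $2\times 3$ matrix given above):

\begin{verbatim}
import numpy as np

Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
D = np.array([[0.0, 1.0, 0.0],
              [-1.0, 3.0, 0.0]])       # third column: derivative of f_3 = x_2^3, zero at mu = 0
C = D.T @ Sigma @ D
print(C)                               # last row and column are zero
print(np.linalg.matrix_rank(C))        # 2 < q = 3, so the normal limit is singular
\end{verbatim}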

Summary
$\ast$
If $X_1,\ldots,X_n$ are i.i.d. random vectors with $X_i \sim N_p(\mu,\Sigma)$, then $\bar x \sim N_p(\mu,\frac{1}{n}\Sigma)$.
$\ast$
If $X_{1},\ldots,X_{n}$ are i.i.d. random vectors with $X_{i}\sim(\mu,\Sigma)$, then the distribution of $\sqrt{n}(\overline x
- \mu)$ is asymptotically $N_p(0,\Sigma)$ (Central Limit Theorem).
$\ast$
If $X_{1},\ldots,X_{n}$ are i.i.d. random variables with $X_{i}\sim(\mu,\sigma^2)$, then an asymptotic confidence interval for $\mu$ can be constructed by the CLT: $\bar{x} \pm
\frac{\widehat{\sigma}}{\sqrt{n}}\,u_{1-\alpha/2}$.
$\ast$
If $t$ is a statistic that is asymptotically normal, i.e., $\sqrt{n}
(t-\mu) \mathrel{\mathop{\longrightarrow}\limits_{}^{\cal L}} N_{p}(0,\Sigma)$, then this holds also for a function $f(t)$, i.e., $\sqrt{n} \{f(t)-f(\mu)\}$ is asymptotically normal.