7.1 Likelihood Ratio Test

Suppose that the distribution of $\{x_i\}_{i=1}^n$, $x_i\in\mathbb{R}^p$, depends on a parameter vector $\theta$. We will consider two hypotheses:

\[ H_0:\ \theta\in\Omega_0, \qquad H_1:\ \theta\in\Omega_1. \]

The hypothesis $H_0$ corresponds to the ``reduced model'' and $H_1$ to the ``full model''. This notation was already used in Chapter 3.
EXAMPLE 7.1
Consider a multinormal $N_{p}(\theta,\data{I})$. To test if $\theta$ equals a certain fixed value $\theta_{0}$ we construct the test problem:

\[ H_0:\ \theta=\theta_0, \qquad H_1:\ \textrm{no constraints on } \theta, \]

or, equivalently, $\Omega_{0}=\{\theta_{0}\}$, $\Omega_{1}=\mathbb{R}^p$.
Define $L^*_j=\max_{\theta\in\Omega_j}L(\data{X};\theta)$, $j=0,1$, the maxima of the likelihood for each of the hypotheses. Consider the likelihood ratio (LR)

\[ \lambda(\data{X})=\frac{L_0^*}{L_1^*}. \tag{7.1} \]
One tends to favor $H_0$ if the LR is high and $H_1$ if the LR is low. The likelihood ratio test (LRT) tells us when exactly to favor $H_0$ over $H_1$. A likelihood ratio test of size $\alpha$ for testing $H_0$ against $H_1$ has the rejection region

\[ R=\{\data{X}:\ \lambda(\data{X})<c\}, \]

where $c$ is determined so that $\sup_{\theta\in\Omega_0}P_{\theta}(\data{X}\in R)=\alpha$. The difficulty here is to express $c$ as a function of $\alpha$, because $\lambda(\data{X})$ might be a complicated function of $\data{X}$.

Instead of $\lambda$ we may equivalently use the log-likelihood

\[ -2\log\lambda=2(\ell_1^*-\ell_0^*). \]

In this case the rejection region will be $R=\{\data{X}:\ -2\log\lambda(\data{X})>k\}$.
What is the distribution of $\lambda$, or of $-2\log\lambda$, from which we need to compute $c$ or $k$? Theorem 7.1 (Wilks) answers this asymptotically: if $\Omega_1\subset\mathbb{R}^q$ is a $q$-dimensional space and $\Omega_0\subset\Omega_1$ is an $r$-dimensional subspace, then under regularity conditions, for all $\theta\in\Omega_0$,

\[ -2\log\lambda\mathrel{\mathop{\longrightarrow}\limits^{\cal L}}\chi^2_{q-r}\quad\textrm{as } n\to\infty. \]

An asymptotic rejection region can now be given by simply computing the $1-\alpha$ quantile $k=\chi^2_{1-\alpha;q-r}$. The LRT rejection region is therefore

\[ R=\{\data{X}:\ -2\log\lambda(\data{X})>\chi^2_{1-\alpha;q-r}\}. \]
Theorem 7.1 is thus very helpful: it gives a general way of building rejection regions in many problems. Unfortunately, it is only an asymptotic result, meaning that the size of the test is only approximately equal to $\alpha$, although the approximation becomes better when the sample size $n$ increases. The question is ``how large should $n$ be?''. There is no definite rule: we encounter here the same problem that was already discussed with respect to the Central Limit Theorem in Chapter 4.
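The quality of the asymptotic approximation can always be checked by simulation. The following sketch (not from the text; the exponential model, sample size, and seed are all made up for illustration) simulates the LRT of $H_0:\theta=1$ for the rate $\theta$ of an exponential distribution, where $-2\log\lambda=2n(\overline{x}-\log\overline{x}-1)$, and compares the empirical rejection rate with the nominal size $\alpha=0.05$ obtained from the $\chi^2_1$ approximation.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(42)
n, reps, alpha = 50, 5000, 0.05           # hypothetical sample size / replications
k = chi2.ppf(1 - alpha, df=1)             # asymptotic critical value chi^2_{0.95;1}

rejections = 0
for _ in range(reps):
    x = rng.exponential(scale=1.0, size=n)    # data generated under H0: theta = 1
    xbar = x.mean()
    stat = 2 * n * (xbar - np.log(xbar) - 1)  # -2 log lambda for Exp(theta), theta0 = 1
    rejections += stat > k

size = rejections / reps
print(f"empirical size: {size:.3f} (nominal {alpha})")
```

The empirical size comes out close to, but not exactly, the nominal $\alpha$, which is precisely the point of the discussion above.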
Fortunately, in many standard circumstances, we can derive exact tests even for finite samples because the test statistic $-2\log\lambda(\data{X})$, or a simple transformation of it, turns out to have a simple form. This is the case in most of the following standard testing problems. All of them can be viewed as an illustration of the likelihood ratio principle.

Test Problem 1 is an amuse-bouche: in testing the mean of a multinormal population with a known covariance matrix the likelihood ratio statistic has a very simple quadratic form with a known distribution under $H_0$.
TEST PROBLEM 1
Suppose that $X_{1},\ldots,X_{n}$ is an i.i.d. random sample from a $N_p(\mu,\Sigma)$ population where $\Sigma$ is known. Test $H_0:\ \mu=\mu_0$ against $H_1$: no constraints on $\mu$.

In this case $H_0$ is a simple hypothesis, i.e., $\Omega_0=\{\mu_0\}$, and therefore the dimension $r$ of $\Omega_0$ equals $0$. Since we have imposed no constraints in $H_1$, the space $\Omega_1$ is the whole $\mathbb{R}^p$, which leads to $q=p$.
From (6.6) we know that the maximum over $\Omega_1$ is attained at $\mu=\overline{x}$:

\[ \ell_1^*=\ell(\overline x,\Sigma)=-\frac{n}{2}\log|2\pi\Sigma|-\frac{n}{2}\mathop{\rm tr}(\Sigma^{-1}\data{S}). \]

Under $H_0$ the maximum of $\ell(\mu_0,\Sigma)$ is

\[ \ell_0^*=\ell(\mu_0,\Sigma)=-\frac{n}{2}\log|2\pi\Sigma|-\frac{n}{2}\mathop{\rm tr}(\Sigma^{-1}\data{S})-\frac{n}{2}(\overline x-\mu_0)^{\top}\Sigma^{-1}(\overline x-\mu_0). \]

Therefore,

\[ -2\log\lambda=2(\ell^*_1-\ell^*_0)=n(\overline x-\mu_0)^{\top}\Sigma^{-1}(\overline x-\mu_0), \tag{7.2} \]

which, by Theorem 4.7, has a $\chi^2_p$-distribution under $H_0$.
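For a concrete sense of how (7.2) is used, the following sketch computes the statistic and the $\chi^2_{1-\alpha;p}$ critical value; the data, the mean vector, and the covariance matrix are all hypothetical, with $\Sigma$ treated as known.

```python
import numpy as np
from scipy.stats import chi2

# hypothetical i.i.d. sample from N_p(mu, Sigma) with Sigma assumed known
rng = np.random.default_rng(0)
n, p = 40, 3
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
mu_true = np.array([1.0, 0.0, -1.0])
X = rng.multivariate_normal(mu_true, Sigma, size=n)

mu0 = np.array([1.0, 0.0, -1.0])          # value tested under H0
d = X.mean(axis=0) - mu0
stat = n * d @ np.linalg.solve(Sigma, d)  # -2 log lambda, eq. (7.2)
k = chi2.ppf(0.95, df=p)                  # reject H0 when stat > k
print(f"-2 log lambda = {stat:.3f}, critical value = {k:.3f}")
```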
EXAMPLE 7.2
Consider the bank data again. Let us test whether the population mean of the forged bank notes is equal to

\[ \mu_0=(214.9,\ 129.9,\ 129.7,\ 8.3,\ 10.1,\ 141.5)^{\top}. \]

(This is in fact the sample mean of the genuine bank notes.) Denote the sample mean of the forged bank notes by $\overline{x}_f$. Suppose for the moment that the estimated covariance matrix $\data{S}_f$ given in (3.5) is the true covariance matrix $\Sigma$. We construct the likelihood ratio test and obtain the quantile $k=\chi^2_{0.95;6}=12.592$. The rejection region consists of all values in the sample space which lead to values of the likelihood ratio test statistic larger than $12.592$. Under $H_0$ the observed value of $-2\log\lambda$ is highly significant. Hence, the true mean of the forged bank notes is significantly different from $\mu_0$!
Test Problem 2 is the same as the preceding one but in a more realistic situation where the covariance matrix is unknown: here Hotelling's $T^2$-distribution will be useful to determine an exact test and a confidence region for the unknown $\mu$.

TEST PROBLEM 2
Suppose that $X_{1},\ldots,X_{n}$ is an i.i.d. random sample from a $N_p(\mu,\Sigma)$ population where $\Sigma$ is unknown. Test $H_0:\ \mu=\mu_0$ against $H_1$: no constraints on $\mu$.
Under $H_0$ it can be shown that

\[ \ell^*_0=\ell(\mu_0,\data{S}+dd^{\top}),\quad d=(\overline x-\mu_0), \tag{7.3} \]

and under $H_1$ we have

\[ \ell^*_1=\ell(\overline x,\data{S}). \]

This leads after some calculation to

\[ -2\log\lambda=2(\ell^*_1-\ell_0^*)=n\log(1+d^{\top}\data{S}^{-1}d). \tag{7.4} \]

This statistic is a monotone function of $d^{\top}\data{S}^{-1}d$. This means that $-2\log\lambda>k$ if and only if $d^{\top}\data{S}^{-1}d>k'$. The latter statistic has, by Corollary 5.3, a Hotelling's $T^2$-distribution under $H_0$. Therefore,

\[ (n-1)(\bar{x}-\mu_0)^{\top}\data{S}^{-1}(\bar{x}-\mu_0)\sim T^2(p,n-1), \tag{7.5} \]

or equivalently

\[ \left(\frac{n-p}{p}\right)(\bar{x}-\mu_0)^{\top}\data{S}^{-1}(\bar{x}-\mu_0)\sim F_{p,n-p}. \tag{7.6} \]

In this case an exact rejection region may be defined as

\[ \left(\frac{n-p}{p}\right)(\bar{x}-\mu_0)^{\top}\data{S}^{-1}(\bar{x}-\mu_0)>F_{1-\alpha;p,n-p}. \]

Alternatively, we have from Theorem 7.1 that under $H_0$ the asymptotic distribution of the test statistic is

\[ -2\log\lambda\mathrel{\mathop{\longrightarrow}\limits^{\cal L}}\chi^2_p\quad\textrm{as } n\to\infty, \]

which leads to the (asymptotically valid) rejection region

\[ n\log\{1+(\bar{x}-\mu_0)^{\top}\data{S}^{-1}(\bar{x}-\mu_0)\}>\chi^2_{1-\alpha;p}, \]

but of course, in this case, we would prefer to use the exact $F$-test provided just above.
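The exact test (7.6) can be sketched as follows; the data are hypothetical, and $\data{S}$ is computed with divisor $n$ (the MLE form used in the derivation above).

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(1)
n, p = 30, 2
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.4], [0.4, 1.0]], size=n)
mu0 = np.zeros(p)                              # hypothesized mean

xbar = X.mean(axis=0)
S = (X - xbar).T @ (X - xbar) / n              # MLE covariance (divisor n)
d = xbar - mu0
quad = d @ np.linalg.solve(S, d)               # d' S^{-1} d

minus2loglam = n * np.log(1 + quad)            # eq. (7.4)
F_stat = (n - p) / p * quad                    # eq. (7.6), ~ F_{p, n-p} under H0
reject = F_stat > f.ppf(0.95, p, n - p)
print(f"-2 log lambda = {minus2loglam:.3f}, F = {F_stat:.3f}, reject: {reject}")
```

Note how both statistics are monotone functions of the same quadratic form `quad`, which is exactly why the exact $F$-test and the LRT have the same rejection region shape.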
EXAMPLE 7.3
Consider the problem of Example 7.2 again. We know that $\data{S}_f$ is the empirical analogue for $\Sigma_f$, the covariance matrix of the forged banknotes. The test statistic (7.5) has the value 1153.4; its equivalent for the $F$-distribution in (7.6) is 182.5, which is highly significant ($F_{0.95;6,94}=2.1966$), so that we conclude that $\mu_f\neq\mu_0$.
Confidence Region for $\mu$

When estimating a multidimensional parameter $\theta\in\mathbb{R}^k$ from a sample, we saw in Chapter 6 how to determine the estimator $\widehat{\theta}=\widehat{\theta}(\data{X})$. After the sample is observed we end up with a point estimate, which is the corresponding observed value of $\widehat{\theta}$. We know $\widehat{\theta}(\data{X})$ is a random variable and we often prefer to determine a confidence region for $\theta$. A confidence region (CR) is a random subset of $\mathbb{R}^k$ (determined by appropriate statistics) such that we are ``confident'', at a certain given level $1-\alpha$, that this region contains $\theta$:

\[ P(\theta\in\mathop{\rm CR})=1-\alpha. \]

This is just a multidimensional generalization of the basic univariate confidence interval. Confidence regions are particularly useful when a hypothesis $H_0$ on $\theta$ is rejected, because they help in eventually identifying which component of $\theta$ is responsible for the rejection.

There are only a few cases where confidence regions can be easily assessed, and these include most of the testing problems on the mean presented in this section.

Corollary 5.3 provides a pivotal quantity which allows confidence regions for $\mu$ to be constructed. Since $\left(\frac{n-p}{p}\right)(\bar{x}-\mu)^{\top}\data{S}^{-1}(\bar{x}-\mu)\sim F_{p,n-p}$, we have

\[ P\left\{\left(\frac{n-p}{p}\right)(\bar{x}-\mu)^{\top}\data{S}^{-1}(\bar{x}-\mu)<F_{1-\alpha;p,n-p}\right\}=1-\alpha. \]

Then,

\[ \mathop{\rm CR}=\left\{\mu\in\mathbb{R}^p\ \Big|\ (\mu-\bar{x})^{\top}\data{S}^{-1}(\mu-\bar{x})\le\frac{p}{n-p}F_{1-\alpha;p,n-p}\right\} \]

is a confidence region at level $1-\alpha$ for $\mu$. It is the interior of an iso-distance ellipsoid in $\mathbb{R}^p$ centered at $\bar{x}$, with a scaling matrix $\data{S}^{-1}$ and a distance constant $\left(\frac{p}{n-p}\right)F_{1-\alpha;p,n-p}$.
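A membership check for this ellipsoid follows directly from the defining inequality. The sketch below uses hypothetical data; any candidate mean vector can be tested for membership in the region.

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(2)
n, p, alpha = 25, 2, 0.05
X = rng.multivariate_normal([5.0, -3.0], [[1.0, 0.6], [0.6, 2.0]], size=n)

xbar = X.mean(axis=0)
S = (X - xbar).T @ (X - xbar) / n                  # divisor-n covariance
radius = p / (n - p) * f.ppf(1 - alpha, p, n - p)  # distance constant of the CR

def in_confidence_region(mu):
    """True if mu lies in the (1-alpha) confidence ellipsoid for the mean."""
    d = mu - xbar
    return d @ np.linalg.solve(S, d) <= radius

print(in_confidence_region(xbar))   # the center always belongs to the CR -> True
```

Rejecting $H_0:\mu=\mu_0$ with the exact test above is equivalent to `in_confidence_region(mu0)` returning `False`.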
When $p$ is large, ellipsoids are not easy to handle for practical purposes. One is thus interested in finding confidence intervals for $\mu_1,\mu_2,\ldots,\mu_p$ so that the simultaneous confidence on all the intervals reaches the desired level of, say, $1-\alpha$.

In the following, we consider a more general problem. We construct simultaneous confidence intervals for all possible linear combinations $a^{\top}\mu$, $a\in\mathbb{R}^p$, of the elements of $\mu$. Suppose for a moment that we fix a particular projection vector $a$. We are back to a standard univariate problem of finding a confidence interval for the mean $a^{\top}\mu$ of the univariate random variable $a^{\top}X$. We can use the $t$-statistic, and an obvious confidence interval for $a^{\top}\mu$ is given by the values of $a^{\top}\mu$ such that

\[ \left|\frac{\sqrt{n-1}\,(a^{\top}\bar{x}-a^{\top}\mu)}{\sqrt{a^{\top}\data{S}a}}\right|\le t_{1-\frac{\alpha}{2};n-1}, \]

or equivalently

\[ t^2(a)=\frac{(n-1)\{a^{\top}(\bar{x}-\mu)\}^2}{a^{\top}\data{S}a}\le t^2_{1-\frac{\alpha}{2};n-1}. \]

This provides the $(1-\alpha)$ confidence interval for $a^{\top}\mu$:

\[ a^{\top}\bar{x}-t_{1-\frac{\alpha}{2};n-1}\sqrt{\frac{a^{\top}\data{S}a}{n-1}}\ \le\ a^{\top}\mu\ \le\ a^{\top}\bar{x}+t_{1-\frac{\alpha}{2};n-1}\sqrt{\frac{a^{\top}\data{S}a}{n-1}}. \]
Now it is easy to prove (using Theorem 2.5) that:

\[ \max_{a}\ \frac{(n-1)\{a^{\top}(\bar{x}-\mu)\}^2}{a^{\top}\data{S}a}=(n-1)(\bar{x}-\mu)^{\top}\data{S}^{-1}(\bar{x}-\mu)\sim T^2(p,n-1). \]

Therefore, simultaneously for all $a\in\mathbb{R}^p$, the interval

\[ \left(a^{\top}\bar{x}-\sqrt{K_\alpha a^{\top}\data{S}a},\ a^{\top}\bar{x}+\sqrt{K_\alpha a^{\top}\data{S}a}\right), \tag{7.7} \]

where $K_\alpha=\frac{p}{n-p}F_{1-\alpha;p,n-p}$, will contain $a^{\top}\mu$ with probability $1-\alpha$.

A particular choice of $a$ are the columns of the identity matrix $\data{I}_p$, providing simultaneous confidence intervals for $\mu_1,\ldots,\mu_p$.
We have therefore with probability $1-\alpha$, for $j=1,\ldots,p$,

\[ \bar{x}_j-\sqrt{\frac{p}{n-p}F_{1-\alpha;p,n-p}\,s_{jj}}\le\mu_j\le\bar{x}_j+\sqrt{\frac{p}{n-p}F_{1-\alpha;p,n-p}\,s_{jj}}. \tag{7.8} \]

It should be noted that these intervals define a rectangle inscribing the confidence ellipsoid for $\mu$ given above. They are particularly useful when a null hypothesis $H_0$ of the type described above is rejected and one would like to see which component(s) are mainly responsible for the rejection.
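The intervals (7.8) are computed componentwise from the diagonal of $\data{S}$ and one $F$ quantile. A minimal sketch with hypothetical data:

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(3)
n, p, alpha = 50, 3, 0.05
X = rng.multivariate_normal([1.0, 2.0, 3.0],
                            [[1.0, 0.2, 0.0],
                             [0.2, 1.0, 0.3],
                             [0.0, 0.3, 1.0]], size=n)

xbar = X.mean(axis=0)
S = (X - xbar).T @ (X - xbar) / n                    # divisor-n covariance
K_alpha = p / (n - p) * f.ppf(1 - alpha, p, n - p)   # constant of (7.7)

half = np.sqrt(K_alpha * np.diag(S))                 # half-widths sqrt(K_alpha s_jj)
lower, upper = xbar - half, xbar + half
for j in range(p):
    print(f"{lower[j]:.3f} <= mu_{j+1} <= {upper[j]:.3f}")
```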
EXAMPLE 7.4
The $95\%$ confidence region for $\mu_f$, the mean of the forged banknotes, is given by the ellipsoid:

\[ \left\{\mu\in\mathbb{R}^6\ \Big|\ (\mu-\bar{x}_f)^{\top}\data{S}_f^{-1}(\mu-\bar{x}_f)\le\frac{6}{94}F_{0.95;6,94}\right\}. \]

The $95\%$ simultaneous confidence intervals are given by (7.8) (we use $F_{0.95;6,94}=2.1966$). Comparing the resulting inequalities with $\mu_0=(214.9,129.9,129.7,8.3,10.1,141.5)^{\top}$ shows that almost all components (except the first one) are responsible for the rejection of $\mu_0$ in Examples 7.2 and 7.3.

In addition, the method can provide other confidence intervals. We have, at the same level of confidence (choosing $a$ to contrast the lower and upper border measurements), an interval showing that for the forged bills the lower border is essentially smaller than the upper border.
REMARK 7.1
It should be noted that the confidence region is an ellipsoid whose characteristics depend on the whole matrix $\data{S}$. In particular, the slope of the axes depends on the eigenvectors of $\data{S}$ and therefore on the covariances $s_{ij}$. However, the rectangle inscribing the confidence ellipsoid provides the simultaneous confidence intervals for $\mu_j,\ j=1,\ldots,p$. They do not depend on the covariances $s_{ij}$, but only on the variances $s_{jj}$ (see (7.8)). In particular, it may happen that a tested value $\mu_0$ is covered by the intervals (7.8) but not covered by the confidence ellipsoid. In this case, $\mu_0$ is rejected by a test based on the confidence ellipsoid but not rejected by a test based on the simultaneous confidence intervals. The simultaneous confidence intervals are easier to handle than the full ellipsoid but we have lost some information, namely the covariance between the components (see Exercise 7.14).
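The situation described in this remark is easy to reproduce numerically: with strongly correlated components, a point near the "wrong" corner of the rectangle lies outside the ellipsoid. A small sketch, with all numbers hypothetical:

```python
import numpy as np
from scipy.stats import f

n, p, alpha = 20, 2, 0.05
xbar = np.zeros(p)
S = np.array([[1.0, 0.9],                # strongly correlated components
              [0.9, 1.0]])
K = p / (n - p) * f.ppf(1 - alpha, p, n - p)

# a tested value near a corner of the inscribing rectangle,
# placed against the direction of the correlation
mu0 = 0.9 * np.sqrt(K * np.diag(S)) * np.array([1.0, -1.0])

in_rectangle = np.all(np.abs(mu0 - xbar) <= np.sqrt(K * np.diag(S)))
d = mu0 - xbar
in_ellipsoid = d @ np.linalg.solve(S, d) <= K
print(in_rectangle, in_ellipsoid)        # True False
```

Here $\mu_0$ would not be rejected by the simultaneous intervals but is clearly rejected by the ellipsoid test, illustrating the information lost by ignoring the covariances.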
The following test problem concerns the covariance matrix $\Sigma$ in a multinormal population: in this situation the test statistic has a slightly more complicated distribution. We will therefore invoke the approximation of Theorem 7.1 in order to derive a test of approximate size $\alpha$.

TEST PROBLEM 3
Suppose that $X_{1},\ldots,X_{n}$ is an i.i.d. random sample from a $N_p(\mu,\Sigma)$ population where $\mu$ is unknown. Test $H_0:\ \Sigma=\Sigma_0$ against $H_1$: no constraints on $\Sigma$.
Under $H_0$ we have $\widehat{\mu}=\overline{x}$ and $\ell_0^*=\ell(\overline{x},\Sigma_0)$, whereas under $H_1$ we have $\widehat{\mu}=\overline{x}$, $\widehat{\Sigma}=\data{S}$ and $\ell_1^*=\ell(\overline{x},\data{S})$. Hence

\[ -2\log\lambda=2(\ell_1^*-\ell_0^*)=n\mathop{\rm tr}(\Sigma_0^{-1}\data{S})-n\log|\Sigma_0^{-1}\data{S}|-np. \]

Note that this statistic is a function of the eigenvalues of $\Sigma_0^{-1}\data{S}$! Unfortunately, the exact finite sample distribution of $-2\log\lambda$ is very complicated. Asymptotically, we have under $H_0$

\[ -2\log\lambda\mathrel{\mathop{\longrightarrow}\limits^{\cal L}}\chi_m^2\quad\textrm{as } n\to\infty, \]

with $m=\frac{1}{2}\,p(p+1)$, since a covariance matrix has only $m$ parameters as a consequence of its symmetry.
EXAMPLE 7.5
Consider the US companies data set (Table B.5) and suppose we are interested in the companies of the energy sector, analyzing their assets $(X_1)$ and sales $(X_2)$. The sample is of size 15 and provides the value of
$\data{S}=10^7\times\left[\begin{array}{cc} 1.6635 & 1.2410\\ 1.2410 & 1.3747 \end{array}\right]$.
We want to test if
$\Var{X_1 \choose X_2}=10^7\times\left[\begin{array}{cc} 1.2248 & 1.1425\\ 1.1425 & 1.5112 \end{array}\right]=\Sigma_0$.
($\Sigma_0$ is in fact the empirical covariance matrix for $X_1$ and $X_2$ for the manufacturing sector.) The test statistic turns out to be $-2\log\lambda=2.7365$, which is not significant for $\chi_3^2$ ($p$-value $=0.4341$). So we cannot conclude that $\Sigma\neq\Sigma_0$.
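The numbers of Example 7.5 can be reproduced directly from the formula for $-2\log\lambda$ given above; the sketch below plugs in the two matrices from the example.

```python
import numpy as np
from scipy.stats import chi2

n, p = 15, 2
S = 1e7 * np.array([[1.6635, 1.2410],
                    [1.2410, 1.3747]])       # energy sector (Example 7.5)
Sigma0 = 1e7 * np.array([[1.2248, 1.1425],
                         [1.1425, 1.5112]])  # manufacturing sector

A = np.linalg.solve(Sigma0, S)               # Sigma_0^{-1} S
stat = n * (np.trace(A) - np.log(np.linalg.det(A)) - p)
pval = chi2.sf(stat, df=p * (p + 1) // 2)    # m = p(p+1)/2 = 3 degrees of freedom
print(f"-2 log lambda = {stat:.4f}, p-value = {pval:.4f}")
```

Note that the statistic depends on $\Sigma_0^{-1}\data{S}$ only, so the $10^7$ scale factor cancels.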
In the next testing problem, we address a question that was already stated in Chapter 3, Section 3.6: testing a particular value of the coefficients $\beta$ in a linear model. The presentation is done in general terms so that it can be built upon in the next section, where we will test linear restrictions on $\beta$.

TEST PROBLEM 4
Suppose that $Y_1,\ldots,Y_n$ are independent with $Y_i\sim N(\beta^{\top}x_i,\sigma^2)$, $x_i\in\mathbb{R}^p$, where $\sigma^2$ is unknown. Test $H_0:\ \beta=\beta_0$ against $H_1$: no constraints on $\beta$.

Under $H_0$ we have $\ell_0^*=\ell(\beta_0,\|y-\data{X}\beta_0\|^2/n)$, and under $H_1$ we have $\ell_1^*=\ell(\widehat{\beta},\|y-\data{X}\widehat{\beta}\|^2/n)$ with $\widehat{\beta}=(\data{X}^{\top}\data{X})^{-1}\data{X}^{\top}y$ (see Example 6.3). Hence by Theorem 7.1

\[ -2\log\lambda=2(\ell_1^*-\ell_0^*)=n\log\left(\frac{\|y-\data{X}\beta_0\|^2}{\|y-\data{X}\widehat{\beta}\|^2}\right)\mathrel{\mathop{\longrightarrow}\limits^{\cal L}}\chi_p^2\quad\textrm{as } n\to\infty. \]

We draw upon the result (3.45), which gives us:

\[ \left(\frac{n-p}{p}\right)\left(\frac{\|y-\data{X}\beta_0\|^2-\|y-\data{X}\widehat{\beta}\|^2}{\|y-\data{X}\widehat{\beta}\|^2}\right)\sim F_{p,n-p}, \]

so that in this case we again have an exact distribution.
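Both the asymptotic $\chi^2$ statistic and the exact $F$ statistic are simple functions of the two residual sums of squares. A sketch with hypothetical regression data (design, coefficients, and noise are all made up):

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(4)
n, p = 10, 2
X = np.column_stack([np.ones(n), rng.normal(100, 10, size=n)])  # intercept + price
beta_true = np.array([200.0, 0.1])
y = X @ beta_true + rng.normal(0, 5, size=n)

beta0 = np.array([211.0, 0.0])                   # value tested under H0
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None) # OLS estimate (X'X)^{-1} X'y

rss0 = np.sum((y - X @ beta0) ** 2)              # residual SS under H0
rss1 = np.sum((y - X @ beta_hat) ** 2)           # residual SS under H1

minus2loglam = n * np.log(rss0 / rss1)           # asymptotically chi^2_p under H0
F_stat = (n - p) / p * (rss0 - rss1) / rss1      # ~ F_{p, n-p} under H0
reject = F_stat > f.ppf(0.95, p, n - p)
print(f"-2 log lambda = {minus2loglam:.3f}, F = {F_stat:.3f}, reject: {reject}")
```

Since $\widehat{\beta}$ minimizes the residual sum of squares, `rss1 <= rss0` always, and both statistics are nonnegative.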
EXAMPLE 7.6
Let us consider our ``classic blue'' pullovers again. In Example 3.11 we tried to model the dependency of sales on prices. As we have seen in Figure 3.5, the slope of the regression curve is rather small, hence we might ask if ${\alpha \choose \beta}={211 \choose 0}$. The test statistic for the LR test is significant under the $\chi_{2}^2$ distribution. The exact $F$-test statistic is also significant under the $F_{2,8}$ distribution $(F_{2,8;0.95}=4.46)$.
Summary

$\ast$ The hypotheses $H_0:\ \theta\in\Omega_0$ against $H_1:\ \theta\in\Omega_1$ can be tested using the likelihood ratio test (LRT). The likelihood ratio (LR) is the quotient $\lambda(\data{X})=L_0^*/L_1^*$ where the $L_j^*$ are the maxima of the likelihood for each of the hypotheses.

$\ast$ The test statistic in the LRT is $\lambda(\data{X})$ or equivalently its logarithm $\log\lambda(\data{X})$. If $\Omega_1$ is $q$-dimensional and $\Omega_0\subset\Omega_1$ is $r$-dimensional, then the asymptotic distribution of $-2\log\lambda$ is $\chi^2_{q-r}$. This allows $H_0$ to be tested against $H_1$ by calculating the test statistic $-2\log\lambda=2(\ell_1^*-\ell_0^*)$ where $\ell_j^*=\log L_j^*$.

$\ast$ The hypothesis $H_0:\ \mu=\mu_0$ for $X\sim N_p(\mu,\Sigma)$, where $\Sigma$ is known, leads to $-2\log\lambda=n(\overline{x}-\mu_0)^{\top}\Sigma^{-1}(\overline{x}-\mu_0)\sim\chi_p^2$.

$\ast$ The hypothesis $H_0:\ \mu=\mu_0$ for $X\sim N_p(\mu,\Sigma)$, where $\Sigma$ is unknown, leads to $-2\log\lambda=n\log\{1+(\overline{x}-\mu_0)^{\top}\data{S}^{-1}(\overline{x}-\mu_0)\}$, and $(n-1)(\overline{x}-\mu_0)^{\top}\data{S}^{-1}(\overline{x}-\mu_0)\sim T^2(p,n-1)$.

$\ast$ The hypothesis $H_0:\ \Sigma=\Sigma_0$ for $X\sim N_p(\mu,\Sigma)$, where $\mu$ is unknown, leads to $-2\log\lambda=n\mathop{\rm tr}(\Sigma_0^{-1}\data{S})-n\log|\Sigma_0^{-1}\data{S}|-np$.

$\ast$ The hypothesis $H_0:\ \beta=\beta_0$ for $Y_i\sim N(\beta^{\top}x_i,\sigma^2)$, where $\sigma^2$ is unknown, leads to $-2\log\lambda=n\log\left(\frac{\|y-\data{X}\beta_0\|^2}{\|y-\data{X}\widehat{\beta}\|^2}\right)$.