2.7 Linear Hypothesis Testing

The previous sections have developed, through point and interval estimation, a method to infer a population value from a sample. Hypothesis testing constitutes another method of inference, which consists of formulating some assumptions about the probability distribution of the population from which the sample was extracted, and then checking whether the sample supports these assumptions. In this sense, hypothesis testing can refer to the systematic component of the model as well as its random component. Some of these procedures will be studied in the following chapter of this book, whilst in this section we only focus on linear hypotheses about the coefficients and the parameter of dispersion of the MLRM.

In order to present how to carry out hypothesis tests about the coefficients, we begin by considering the general statistic which allows us to test any linear restrictions on $\beta$ . Afterwards, we will apply this method to particular cases of interest, such as the hypotheses about the value of a $\beta_{j}$ coefficient, or about all the coefficients except the intercept.

2.7.1 Hypothesis Testing about the Coefficients

In order to test any linear hypothesis about the coefficients, the problem is formulated as follows:

$\begin{displaymath}\begin{array}{c} H_{0}:R\beta=r \\ H_{A}:R\beta \neq r \end{array}\end{displaymath}$

(2.134)

where $R$ is a $q \times k$ ( $q \leq k$ ) matrix of known elements, with $q$ being the number of linear restrictions to test, and $r$ is a $q \times 1$ vector of known elements. The rank of $R$ is $q$ , which implies that the restrictions are linearly independent.

The matrix $R$ and the vector $r$ can be considered as artificial instruments which allow us to express any linear restrictions in matrix form. To illustrate the role of these instruments, consider an MLRM with 4 coefficients. For example, if we want to test

$\begin{displaymath}\begin{array}{c} H_{0}:6\beta_{3}-2\beta_{2}=12 \\ H_{A}:6\beta_{3}-2\beta_{2} \neq 12 \end{array}\end{displaymath}$

(2.135)

the $R$ matrix is a row vector of four elements:

$\displaystyle R= \begin{pmatrix} 0 & -2 & 6 & 0 \end{pmatrix}$

and $r=12$ . Then, the restriction we want to test can be expressed as $R\beta=r$ , given that:

$\displaystyle \begin{pmatrix} 0 & -2 & 6 & 0 \end{pmatrix}\begin{pmatrix} \beta_{1} \\ \beta_{2} \\ \beta_{3} \\ \beta_{4} \end{pmatrix}=12 \Rightarrow 6\beta_{3}-2\beta_{2}=12$

If the null hypothesis we test includes more than one restriction, the implementation is similar. For example, if we have the testing problem:

$\begin{displaymath}\begin{array}{cc} H_{0}:& 2\beta_{1}+\beta_{2}=1 \\ & \beta_{1}+3\beta_{4}=2 \\ H_{A}: & no H_{0} \end{array}\end{displaymath}$

(2.136)

it follows that

$\displaystyle R= \begin{pmatrix} 2 & 1 & 0 & 0 \\ 1 & 0 & 0 & 3 \end{pmatrix}$

$\displaystyle r= \begin{pmatrix} 1 \\ 2 \end{pmatrix}$
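The two examples above can be written down directly as arrays. The following is a minimal sketch, assuming NumPy is available; the coefficient vectors `beta_a` and `beta_b` and the helper `satisfies` are hypothetical and serve only to illustrate how $R$ and $r$ encode the restrictions.

```python
import numpy as np

# Restriction (2.135): 6*beta_3 - 2*beta_2 = 12, in a model with k = 4 coefficients.
R_single = np.array([[0.0, -2.0, 6.0, 0.0]])
r_single = np.array([12.0])

# Restrictions (2.136): 2*beta_1 + beta_2 = 1 and beta_1 + 3*beta_4 = 2.
R_joint = np.array([[2.0, 1.0, 0.0, 0.0],
                    [1.0, 0.0, 0.0, 3.0]])
r_joint = np.array([1.0, 2.0])


def satisfies(R, r, beta):
    """Return True when the coefficient vector beta fulfils R beta = r."""
    return bool(np.allclose(R @ beta, r))


# The restrictions must be linearly independent, i.e. rank(R) = q.
q = R_joint.shape[0]
rank_ok = bool(np.linalg.matrix_rank(R_joint) == q)

# Hypothetical coefficient vectors, chosen only to illustrate the check.
beta_a = np.array([0.5, 0.0, 2.0, 0.5])  # fulfils (2.135) and (2.136)
beta_b = np.array([1.0, 1.0, 1.0, 1.0])  # violates (2.136)

print(rank_ok)
print(satisfies(R_single, r_single, beta_a), satisfies(R_joint, r_joint, beta_a))
print(satisfies(R_joint, r_joint, beta_b))
```

Each row of $R$ corresponds to one restriction, so adding a restriction only means appending a row to $R$ and an element to $r$.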

In order to derive the statistic which allows us to test the hypothesis, we begin by obtaining the probability distribution of $R\hat{\beta}$ :

$\displaystyle R\hat{\beta} \sim N[R\beta, \sigma^{2}R(X^{\top }X)^{-1}R^{\top }]$

(2.137)

This result is obtained from (2.74), given that if the $\hat{\beta}$ vector follows a normal distribution, then a linear combination of it, such as $R\hat{\beta}$ , is also normally distributed, with moments:

$\displaystyle \textrm{E}(R\hat{\beta})=RE(\hat{\beta})=R\beta$

$\displaystyle V(R\hat{\beta})=\textrm{E}[(R\hat{\beta}-R\beta)(R\hat{\beta}-R\beta)^{\top }]=\textrm{E}[R(\hat{\beta}-\beta)(\hat{\beta}-\beta)^{\top }R^{\top }]=$

$\displaystyle RE[(\hat{\beta}-\beta)(\hat{\beta}-\beta)^{\top }]R^{\top }=\sigma^{2}R(X^{\top }X)^{-1}R^{\top }$

A general result establishes that, given an $m$ -dimensional vector $\nu$ , if $\nu \sim N(\mu,\Sigma)$ , with $\Sigma$ nonsingular, then $(\nu-\mu)^{\top }\Sigma^{-1}(\nu-\mu) \sim \chi^{2}_{m}$ . If we substitute $\nu$ by $R\hat{\beta}$ , we have

$\displaystyle (R\hat{\beta}-R\beta)^{\top }[\sigma^{2}R(X^{\top }X)^{-1}R^{\top }]^{-1}(R\hat{\beta}-R\beta) \sim \chi^{2}_{q}$

(2.138)

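Expression (2.138) includes the unknown parameter $\sigma ^{2}$ , so in order to obtain a value for the statistic, we use the independence between the quadratic form given in (2.138) and the statistic in (2.125), $\frac{\hat{u}^{\top }\hat{u}}{\sigma^{2}} \sim \chi^{2}_{n-k}$ (see Hayashi (2000)). Dividing each of them by its degrees of freedom, and imposing $R\beta=r$ under $H_{0}$ , the unknown $\sigma ^{2}$ cancels out in such a way that: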

$\displaystyle \frac{\frac{(R\hat{\beta}-r)^{\top }[\sigma^{2}R(X^{\top }X)^{-1}R^{\top }]^{-1}(R\hat{\beta}-r)}{q}}{\frac{\frac{\hat{u}^{\top }\hat{u}}{\sigma^{2}}}{n-k}}$

$\displaystyle =\frac{(R\hat{\beta}-r)^{\top }[R(X^{\top }X)^{-1}R^{\top }]^{-1}(R\hat{\beta}-r)}{q\hat{\sigma}^{2}} \sim F^{q}_{n-k}$

(2.139)
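The F-ratio (2.139) only requires the OLS estimates, $\hat{\sigma}^{2}$ and a few matrix products. The following is a minimal sketch, assuming NumPy; the data are synthetic (not the data of the book) and the function names `ols` and `f_statistic` are hypothetical.

```python
import numpy as np


def ols(X, y):
    """Return the OLS coefficients, the residuals and (X'X)^{-1}."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    return beta_hat, resid, XtX_inv


def f_statistic(X, y, R, r):
    """F-ratio (2.139) for H0: R beta = r."""
    n, k = X.shape
    q = R.shape[0]
    beta_hat, resid, XtX_inv = ols(X, y)
    sigma2_hat = resid @ resid / (n - k)        # unbiased estimator of sigma^2
    d = R @ beta_hat - r                        # discrepancy R beta_hat - r
    middle = np.linalg.inv(R @ XtX_inv @ R.T)   # [R (X'X)^{-1} R']^{-1}
    return float(d @ middle @ d) / (q * sigma2_hat)


# Synthetic data (an assumption made for illustration only).
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
beta_true = np.array([0.5, 0.0, 2.0, 0.5])      # fulfils the restrictions in (2.136)
y = X @ beta_true + rng.normal(size=n)

R = np.array([[2.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0, 3.0]])
r = np.array([1.0, 2.0])

F_true_h0 = f_statistic(X, y, R, r)                           # H0 holds in the simulated model
F_false_h0 = f_statistic(X, y, R, r + np.array([5.0, 5.0]))   # H0 is far from the truth
print(F_true_h0, F_false_h0)
```

When the restrictions are far from the values used to generate the data, $R\hat{\beta}-r$ is large and the F-ratio grows accordingly.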

If the null hypothesis $H_{0}:R\beta=r$ is true (that is to say, $R\beta-r=0$ ), then a small value of $R\hat{\beta}-r$ is expected. Consequently, small values for (2.139) are thought to be evidence in favour of $H_{0}$ . This means that this is a one-sided test. As in all tests, the decision rule to carry out the test can be summarized as follows:

a.: To calculate the value of the F-ratio ( $F^{*}$ ) expressed in (2.139).
b.: To search for the critical point $F_{\epsilon}$ of the F-Snedecor distribution for $(q, n-k)$ degrees of freedom, for a fixed level of significance $\epsilon$ .
c.: If $F^{*} < F_{\epsilon}$ , we conclude that there is evidence in favour of $H_{0}$ . Otherwise, we have evidence against it.

The previous stages constitute the general procedure of testing, which is based on the comparison of a statistic which is obtained from the sample, with the probability distribution which such a statistic should have if $H_{0}$ is true. Nevertheless, the result obtained can be very sensitive to the fixed level of significance, which is arbitrarily chosen (usually at 1, 5 or even 10 percent). In this sense, we could find that $H_{0}$ is rejected at $\epsilon=0.05$ , while it is accepted at $\epsilon=0.04$ , which leads researchers to obtain different conclusions if they have different opinions about the adequate value of $\epsilon$ .

A way of solving this question consists of employing the so-called p-value provided by a sample in a specific test. It can be defined as the lowest significance level which allows us to reject $H_{0}$ , with the available sample:

$\displaystyle \textrm{p-value}=\Pr(F \geq F^{*}\vert H_{0})$

which depends on the $F^{*}$ statistic value and the sample size.

If we use the p-value, the decision rule is modified in stages b and c as follows: calculate the p-value, and if $\textrm{p-value}>\epsilon$ , $H_{0}$ is accepted. Otherwise, it is rejected.
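The p-value rule can be sketched in a few lines. The example below is only an illustration under a simplifying assumption: it covers the case $q=2$ , for which the tail probability of an $F^{2}_{m}$ variable has the closed form $(1+2F^{*}/m)^{-m/2}$ ; for other values of $q$ a statistical library would be needed. The function names and the value of $F^{*}$ are hypothetical.

```python
def f_pvalue_q2(f_star, m):
    """Pr(F >= f_star) for an F distribution with (2, m) degrees of freedom.

    Closed form valid only for two numerator degrees of freedom (q = 2).
    """
    if f_star < 0:
        raise ValueError("an F-ratio cannot be negative")
    return (1.0 + 2.0 * f_star / m) ** (-m / 2.0)


def decide(p_value, eps):
    """Decision rule based on the p-value: accept H0 if p-value > eps."""
    return "accept H0" if p_value > eps else "reject H0"


# Hypothetical F-ratio with q = 2 restrictions and n - k = 46.
f_star, m = 3.3, 46
p_value = f_pvalue_q2(f_star, m)

print(p_value)                 # lies between 0.04 and 0.05
print(decide(p_value, 0.05))   # reject H0
print(decide(p_value, 0.04))   # accept H0
```

The example reproduces the situation described above: the same sample leads to rejection at $\epsilon=0.05$ and to acceptance at $\epsilon=0.04$ , whereas reporting the p-value leaves the choice of $\epsilon$ to the reader.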

Econometric software does not usually contain the general F-statistic, except for certain particular cases which we will discuss later. So, we must obtain it step by step, and it will not always be easy, because we have to calculate the inverses and products of matrices. Fortunately, there is a convenient alternative way involving two different residual sums of squares ( $RSS$ ): that obtained from the estimation of the MLRM, now denoted $RSS_{u}$ (unrestricted residual sum of squares), and that called restricted residual sum of squares, denoted $RSS_{R}$ . The latter is expressed as:

$\displaystyle RSS_{R}=\hat{u}_{R}^{\top }\hat{u}_{R}$

where $\hat{u}_{R}$ is the residuals vector corresponding to the restricted least squares estimator (RLS) which, as we will prove in the following section, is the coefficient vector value ( $\hat{\beta}_{R}$ ) that satisfies:

$\displaystyle \hat{\beta}_{R}=\arg\min_{\hat{\hat{\beta}}}S(\hat{\hat{\beta}})$

subject to

$\displaystyle R\hat{\hat{\beta}}=r$

From both residual sums of squares ( $RSS_{R}$ and $RSS_{u}$ ), we obtain an alternative way of expressing (2.139) as:

$\displaystyle \frac{\frac{RSS_{R}-RSS_{u}}{q}}{\frac{RSS_{u}}{n-k}}$

(2.140)

The equivalence between these two alternative ways of expressing the F-statistic will be shown in the following section.

If we use (2.140) to test a linear hypothesis about $\beta$ , we only need to obtain the $RSS$ corresponding to both the estimation of the specified MLRM, and the estimation once we have substituted the linear restriction into the model. The decision rule does not vary: if $H_{0}$ is true, $RSS_{R}$ should not be much different from $RSS_{u}$ , and consequently, small values of the statistic provide evidence in favour of $H_{0}$ .
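As a numerical illustration of (2.140), the sketch below tests the hypothetical restriction $H_{0}:\beta_{2}+\beta_{3}=1$ in a model with $k=3$ coefficients. Substituting $\beta_{3}=1-\beta_{2}$ into the model gives the restricted regression of $y-x_{3}$ on a constant and $x_{2}-x_{3}$ . The data are synthetic and NumPy is assumed; the quadratic form (2.139) is computed as well, only to check that both expressions coincide.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares of the OLS regression of y on X."""
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta_hat
    return float(resid @ resid)


# Synthetic data (an assumption made for illustration only).
rng = np.random.default_rng(1)
n, k, q = 40, 3, 1
x2, x3 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.3 * x2 + 0.7 * x3 + rng.normal(size=n)

# Unrestricted model: y = b1 + b2*x2 + b3*x3 + u.
X_u = np.column_stack([np.ones(n), x2, x3])
rss_u = rss(X_u, y)

# Restricted model after substituting b3 = 1 - b2:
# (y - x3) = b1 + b2*(x2 - x3) + u.
X_r = np.column_stack([np.ones(n), x2 - x3])
rss_r = rss(X_r, y - x3)

# F-ratio from the two residual sums of squares, as in (2.140).
F_rss = ((rss_r - rss_u) / q) / (rss_u / (n - k))

# The same statistic from the quadratic form (2.139).
R = np.array([[0.0, 1.0, 1.0]])
r = np.array([1.0])
XtX_inv = np.linalg.inv(X_u.T @ X_u)
beta_hat = XtX_inv @ X_u.T @ y
d = R @ beta_hat - r
sigma2_hat = rss_u / (n - k)
F_quad = float(d @ np.linalg.inv(R @ XtX_inv @ R.T) @ d) / (q * sigma2_hat)

print(F_rss, F_quad)
```

The restricted regression never fits better than the unrestricted one, so $RSS_{R} \geq RSS_{u}$ and the statistic is non-negative.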

Having established the general F statistic, we now analyze the most useful particular cases.

2.7.2 Hypothesis Testing about a Coefficient of the MLRM

When the hypothesis to test has the form

$\begin{displaymath}\begin{array}{c} H_{0}:\beta_{j}=\beta_{j}^{0} \\ H_{A}:\beta_{j} \neq \beta_{j}^{0} \end{array}\end{displaymath}$

(2.141)

that is to say, the null hypothesis only contains one coefficient, the general statistic given in (2.139) can be expressed as:

$\displaystyle \frac{(\hat{\beta}_{j}-\beta_{j}^{0})^{2}}{\hat{\sigma}^{2}((X^{\top }X)^{-1})_{jj}}$

(2.142)

which follows an $F^{1}_{n-k}$ distribution.

To obtain (2.142) we must note that, under $H_{0}$ , the $R$ matrix becomes a row vector with zero value for each element, except for the $j^{th}$ element which has value 1, and $r=\beta_{j}^{0}$ . Thus, the term $(R\hat{\beta}-r)$ becomes $(\hat{\beta}_{j}-\beta_{j}^{0})$ . Element $R(X^{\top }X)^{-1}R^{\top }$ becomes $((X^{\top }X)^{-1})_{jj}$ .

Moreover, we know that the square root of the F random variable expressed in (2.142) follows a Student's t distribution whose degrees of freedom are those of the denominator of the F distribution, that is to say,

$\displaystyle \frac{(\hat{\beta}_{j}-\beta_{j}^{0})}{\sqrt{\hat{\sigma}^{2}((X^{\top }X)^{-1})_{jj}}} \sim t_{n-k}$

(2.143)

This t-statistic is usually computed when we want to test $H_{0}:\beta_{j}=\beta_{j}^{0}$ .

It must be noted that, given the form of $H_{A}$ in (2.141), (2.143) is a two-tailed test, so once we have calculated the statistic value $t^{*}$ , $H_{0}$ is rejected if $\vert t^{*}\vert \geq t_{\frac{\epsilon}{2}}$ .
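The relationship between (2.142) and (2.143) can be verified numerically. The following is a minimal sketch with synthetic data, assuming NumPy; the helper `fit` and the tested value $\beta_{j}^{0}=0.8$ are hypothetical. Note that the column index `j` is 0-based in the code, so `j = 1` refers to $\beta_{2}$ .

```python
import numpy as np


def fit(X, y):
    """Return the OLS coefficients, the estimate of sigma^2 and (X'X)^{-1}."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    sigma2_hat = float(resid @ resid) / (n - k)
    return beta_hat, sigma2_hat, XtX_inv


# Synthetic data (an assumption made for illustration only).
rng = np.random.default_rng(2)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.8, 0.0]) + rng.normal(size=n)

beta_hat, sigma2_hat, XtX_inv = fit(X, y)

# t-statistic (2.143) for H0: beta_j = beta_j0.
j, beta_j0 = 1, 0.8
se_j = np.sqrt(sigma2_hat * XtX_inv[j, j])
t_star = float((beta_hat[j] - beta_j0) / se_j)

# General F-ratio (2.139) with R equal to the j-th unit row vector and q = 1.
R = np.zeros((1, X.shape[1]))
R[0, j] = 1.0
d = R @ beta_hat - beta_j0
F_star = float(d @ np.linalg.inv(R @ XtX_inv @ R.T) @ d) / sigma2_hat

print(t_star, F_star)  # F_star equals t_star squared up to rounding
```

Because the sign of $\hat{\beta}_{j}-\beta_{j}^{0}$ is lost when squaring, the one-sided F test and the two-tailed t test lead to the same decision.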

An interesting particular case of the t-statistic consists of testing $\beta_{j}^{0}=0$ , which simplifies (2.143), yielding:

$\displaystyle \frac{\hat{\beta}_{j}}{\hat{\sigma}\sqrt{((X^{\top }X)^{-1})_{jj}}}$

(2.144)

which is known as the "t-ratio". This is the appropriate statistic to test whether the corresponding explanatory variable $x_{j}$ has no statistically significant linear influence on the dependent variable. If we find evidence in favour of $H_{0}$ , we conclude that $x_{j}$ is not important to explain $y$ . If we test the intercept, the result only allows us to decide if we have to include a constant term in the MLRM.

The statistic given in (2.143) is the same as (2.122), which was derived in order to obtain the interval estimation for a $\beta_{j}$ coefficient. This leads us to conclude that there is an equivalence between creating a confidence interval and carrying out a two-tailed test of the hypothesis (2.141). In this sense, the confidence interval can be considered as an alternative way of testing (2.141). The decision rule will be: given a fixed level of significance $\epsilon$ and calculating a $100(1-\epsilon)$ percent confidence interval, if the $\beta_{j}$ value in $H_{0}$ ( $\beta_{j}^{0}$ ) belongs to the interval, we accept the null hypothesis, at a level of significance $\epsilon$ . Otherwise, $H_{0}$ should be rejected. Obviously, this equivalence holds if the significance level in the interval is the same as that of the test.

2.7.3 Testing the Overall Significance of the Model

This test tries to verify if all the coefficients, except the intercept, are jointly significant, that is to say,

$\begin{displaymath}\begin{array}{cc} H_{0}: \beta_{(2)}=0_{k-1} & H_{A}: no \quad H_{0} \end{array}\end{displaymath}$

(2.145)

where, as we know, $\beta_{(2)}$ is a $(k-1) \times 1$ vector which includes the coefficients of interest. In order to test (2.145), the matrix $R$ and the vector $r$ in (2.139) are:

$\displaystyle R= \begin{pmatrix}0 & 1 & 0 & \ldots & 0 \\ 0& 0 & 1 & \ldots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \ldots & 1 \end{pmatrix} = \begin{pmatrix}0_{k-1} & I_{k-1} \end{pmatrix}$

(2.146)

$\displaystyle r=0_{k-1}$

(2.147)

and then $(R\hat{\beta}-r)$ becomes $(\hat{\beta}_{2},\hat{\beta}_{3},\ldots,\hat{\beta}_{k})^{\top }$ = $\hat{\beta}_{(2)}$ . Matrix $X$ can be partitioned as $(\imath,X_{2})$ (as we have seen when we expressed the MLRM in deviations), in such a way that $R(X^{\top }X)^{-1}R^{\top }$ becomes matrix $(X^{\top }X)^{-1}$ adjusted by eliminating the first row and the first column. The results about the inverse of a partitioned matrix (see Greene (1993)) allow us to prove that $[R(X^{\top }X)^{-1}R^{\top }]^{-1}=((X_{2}^{D})^{\top }X_{2}^{D})$ , with $X_{2}^{D}$ being the $n\times(k-1)$ matrix with the variables in deviations, which was defined earlier. Thus, the statistic gives:

$\displaystyle \frac{\hat{\beta}_{(2)}^{\top }(X_{2}^{D})^{\top }X_{2}^{D}\hat{\beta}_{(2)}}{(k-1)\hat{\sigma}^{2}} \sim F^{k-1}_{n-k}$

(2.148)

If the value of (2.148) is larger than the corresponding critical point $F_{\epsilon}$ , we can accept that $\beta_{(2)}$ is significantly different from zero, that is to say, the set of regressors is important for explaining $y$ . In other words, we conclude that, as a whole, the model is adequate.

Nevertheless, the F statistic (2.148) has an alternative form as a function of the explained sum of squares ( $ESS$ ). To prove it, we begin by considering:

$\displaystyle \hat{y}^{D}=X_{2}^{D}\hat{\beta}_{(2)}$

in such a way that $ESS$ can be expressed as

$\displaystyle (\hat{y}^{D})^{\top }\hat{y}^{D}=\hat{\beta}_{(2)}^{\top }(X_{2}^{D})^{\top }X_{2}^{D}\hat{\beta}_{(2)}$

(2.149)

which is the numerator of expression (2.148), which can be rewritten as:

$\displaystyle \frac{\frac{(\hat{y}^{D})^{\top }\hat{y}^{D}}{k-1}}{\frac{\hat{u}^{\top }\hat{u}}{n-k}}= \frac{\frac{ESS}{k-1}}{\frac{RSS}{n-k}}$

(2.150)

Furthermore, from the definition of $R^{2}$ given in (2.130) we can deduce that:

$\displaystyle \frac{\frac{ESS}{k-1}}{\frac{RSS}{n-k}}=\frac{\frac{R^{2}}{k-1}}{\frac{1-R^{2}}{n-k}}$

(2.151)

We must note that the equivalence between (2.148) and (2.151) is only given when the MLRM has a constant term.
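The equivalence between (2.148), (2.150) and (2.151) can be checked numerically. The following is a minimal sketch with synthetic data, assuming NumPy and a model with a constant term; $R^{2}$ is computed here as $1-RSS/TSS$ , which is an assumption about the definition referred to as (2.130).

```python
import numpy as np

# Synthetic data with an intercept (an assumption made for illustration only).
rng = np.random.default_rng(3)
n, k = 60, 4
X2 = rng.normal(size=(n, k - 1))
X = np.column_stack([np.ones(n), X2])
y = X @ np.array([2.0, 0.5, -0.3, 0.0]) + rng.normal(size=n)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_hat
rss = float(resid @ resid)
sigma2_hat = rss / (n - k)

# (2.148): quadratic form in the slope estimates, regressors in deviations.
X2_dev = X2 - X2.mean(axis=0)
b2 = beta_hat[1:]
F_148 = float(b2 @ X2_dev.T @ X2_dev @ b2) / ((k - 1) * sigma2_hat)

# (2.150): ratio of ESS and RSS, each divided by its degrees of freedom.
y_dev = y - y.mean()
tss = float(y_dev @ y_dev)
ess = tss - rss          # holds because the model includes a constant term
F_150 = (ess / (k - 1)) / (rss / (n - k))

# (2.151): the same statistic as a function of R^2.
r2 = 1.0 - rss / tss
F_151 = (r2 / (k - 1)) / ((1.0 - r2) / (n - k))

print(F_148, F_150, F_151)
```

Expression (2.151) is convenient in practice because it only needs $R^{2}$ , $n$ and $k$ , which are reported by most regression outputs.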

2.7.4 Testing Hypothesis about $\sigma ^{2}$

The earlier mentioned relationship between the confidence interval and hypothesis testing, allows us to derive the test of the following hypothesis easily:

$\begin{displaymath}\begin{array}{c} H_{0}:\sigma^{2}=\sigma_{0}^{2}\\ H_{A}:\sigma^{2} \neq \sigma_{0}^{2} \end{array}\end{displaymath}$

(2.152)

with $\sigma_{0}^{2} \geq 0$ . Under $H_{0}$ , the statistic to test (2.152) is that given in (2.125), evaluated at $\sigma^{2}=\sigma_{0}^{2}$ :

$\displaystyle \frac{(n-k)\hat{\sigma}^{2}}{\sigma_{0}^{2}} \sim \chi^{2}_{n-k}$

The decision rule consists of rejecting $H_{0}$ if the value of the statistic $(\chi^{2})^{*} \leq \chi^{2}_{\frac{\epsilon}{2}}$ or $(\chi^{2})^{*} \geq \chi^{2}_{1-\frac{\epsilon}{2}}$ . Otherwise, $H_{0}$ will be accepted. In other words, fixing a level of significance, $H_{0}$ is accepted if $\sigma_{0}^{2}$ belongs to the confidence interval for $\sigma ^{2}$ .
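The two-sided decision rule can be sketched as follows. The function `sigma2_test` and the numerical inputs are hypothetical; the critical points are passed as arguments and, in the usage lines, are the usual table values of the $\chi^{2}_{10}$ distribution for $\epsilon=0.05$ (3.247 and 20.483), quoted here as an assumption rather than computed.

```python
def sigma2_test(sigma2_hat, sigma2_0, n, k, chi2_lower, chi2_upper):
    """Two-sided test of H0: sigma^2 = sigma2_0.

    chi2_lower and chi2_upper are the critical points of the chi-square
    distribution with n - k degrees of freedom that leave eps/2 in each tail.
    Returns the value of the statistic and whether H0 is rejected.
    """
    if sigma2_0 <= 0:
        raise ValueError("the statistic requires a strictly positive sigma2_0")
    stat = (n - k) * sigma2_hat / sigma2_0
    reject = stat <= chi2_lower or stat >= chi2_upper
    return stat, reject


# Hypothetical example: n = 13, k = 3, so n - k = 10 degrees of freedom.
# Critical points taken from a standard chi-square table for eps = 0.05.
lower, upper = 3.247, 20.483

stat_a, reject_a = sigma2_test(1.5, 1.0, 13, 3, lower, upper)
stat_b, reject_b = sigma2_test(2.5, 1.0, 13, 3, lower, upper)

print(stat_a, reject_a)  # 15.0 False
print(stat_b, reject_b)  # 25.0 True
```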

2.7.5 Example

Now, we present the quantlet linreg in the stats quantlib which allows us to obtain the main measures of fit and the hypothesis tests that we have just described in both this section and the previous section.

{beta,bse,bstan,bpval}= linreg ( x, y ): estimates the parameters of a MLRM and obtains the main statistics.

For the example of the consumption function which we presented in previous sections, the quantlet XEGmlrm05.xpl obtains the statistical information described below.

One column of the output reports the sums of squares: the squared sum of the regression (ESS), the squared sum of the residuals (RSS) and the total squared sum (TSS). Another column reports the corresponding mean squares, calculated by dividing each sum of squares by its degrees of freedom (df). The F-test is the statistic to test $H_{0}:\beta_{2}=\beta_{3}=0$ , which is followed by the corresponding p-value. Afterwards, we have the measures of fit we presented in the previous section, that is to say, $R^{2}$ , adjusted- $R^{2}$ ( $\bar{R}^{2}$ ), and Standard Error (SER). Moreover, multiple R represents the square root of $R^{2}$ .

Finally, the output presents the columns of the values of the estimated coefficients (beta) and their corresponding standard deviations (SE). It also presents the t-ratios (t-test) together with their corresponding p-values. All the p-values are very low, so we reject $H_{0}:\beta_{j}=0$ , whatever the significance level (usually 1, 5 or 10 percent), which means that all the coefficients are statistically significant. Moreover, the p-value of the F-test also allows us to conclude that we reject $H_{0}:\beta_{2}=\beta_{3}=0$ , or in other words, the overall regression explains the $y$ variable. Finally, with this quantlet it is also possible to illustrate the computation of the F statistic to test the hypothesis $H_{0}:\beta_{2}=1$ .