Under the classical assumptions of the MLRM, in the section on hypothesis testing we derived appropriate finite-sample test statistics to verify linear restrictions on the coefficients. Nevertheless, such exact tests are not always available, and in those cases it is very useful to consider the following three approaches, which allow us to derive large-sample tests that are asymptotically equivalent. Several situations which require making use of these tests will be presented in later chapters of this book. In this chapter we focus on their general derivation and on their illustration in the context of an MLRM under the classical assumptions.
All three test procedures are developed within the framework of ML estimation and they use the information included in the log-likelihood in different but asymptotically equivalent ways.
The general framework in which these principles are implemented is defined as follows: we wish to test the null hypothesis $H_{0}: g(\theta) = 0$ against the alternative $H_{1}: g(\theta) \neq 0$, where $g(\cdot)$ is a $q \times 1$ vector of restrictions on the parameter vector $\theta$.
Formal derivation of these tests can be found, for example, in Davidson and MacKinnon (1993).
The likelihood ratio (LR) test is based on the distance between the log-likelihood function evaluated at the ML and the RML estimators. Thus, it is defined as:
$$LR = 2\,[\ell(\hat{\theta}) - \ell(\tilde{\theta})] \qquad (2.181)$$
where $\hat{\theta}$ and $\tilde{\theta}$ denote the ML and RML estimators, respectively. Under $H_{0}$, LR is asymptotically distributed as a $\chi^{2}$ with $q$ degrees of freedom.
Taking (2.181), it can be seen that if the restriction $H_{0}: g(\theta) = 0$ is true, including it in the model should not reduce the log-likelihood by a significant amount, and thus $\ell(\hat{\theta})$ and $\ell(\tilde{\theta})$ should be similar. Given that the inequality $\ell(\tilde{\theta}) \leq \ell(\hat{\theta})$ always holds (because a maximum subject to restrictions is never larger than an unrestricted maximum), significant discrepancies between the two estimated log-likelihoods can be regarded as evidence against $H_{0}$, since the RML estimator moves far away from the unrestricted ML estimator.
Another way of understanding what underlies this test focuses on the asymptotic properties of the ML estimators under correct specification. Given several regularity conditions, the ML estimators are consistent, asymptotically efficient, and their asymptotic distribution is normal. Moreover, it can be shown that the RML estimators are consistent when the restrictions are true (correct a priori information). According to these results, if $H_{0}$ is true, then both the ML and RML estimators are consistent, so it is expected that $\hat{\theta} \simeq \tilde{\theta}$. Thus, small values of (2.181) provide evidence in favour of the null hypothesis.
As was described earlier, the decision rule consists of, for a fixed significance level $\alpha$, comparing the value of the LR statistic for a given sample ($LR^{*}$) with the corresponding critical point $\chi^{2}_{q;\alpha}$ (with $q$ degrees of freedom), and concluding the rejection of $H_{0}$ if $LR^{*} > \chi^{2}_{q;\alpha}$. Equivalently, we reject $H_{0}$ if the p-value is less than $\alpha$.
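As a minimal sketch of this decision rule, the following Python fragment computes the LR statistic and its p-value; the two maximized log-likelihood values and the number of restrictions $q$ are purely hypothetical illustrations, not data from the text:

```python
from scipy import stats

# Hypothetical maximized log-likelihoods (the restricted maximum can never exceed the unrestricted one)
loglik_ml = -102.3    # unrestricted maximum, l(theta_hat)
loglik_rml = -105.9   # restricted maximum, l(theta_tilde)
q = 2                 # number of restrictions under H0
alpha = 0.05          # significance level

LR = 2.0 * (loglik_ml - loglik_rml)
p_value = stats.chi2.sf(LR, df=q)            # P(chi2_q > LR)
reject = LR > stats.chi2.ppf(1 - alpha, q)   # same decision as p_value < alpha
print(f"LR = {LR:.3f}, p-value = {p_value:.4f}, reject H0: {reject}")
```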
The Wald (W) test is based on measuring the extent to which the unrestricted ML estimator satisfies the restrictions of $H_{0}$. It is defined as:
$$W = n\, g(\hat{\theta})^{\top}\left[G(\hat{\theta})\, I_{\infty}(\theta)^{-1}\, G(\hat{\theta})^{\top}\right]^{-1} g(\hat{\theta}) \qquad (2.182)$$
where $G(\theta) = \partial g(\theta)/\partial \theta^{\top}$. Under $H_{0}$, W is asymptotically distributed as a $\chi^{2}$ with $q$ degrees of freedom, according to the result:
$$\sqrt{n}\, g(\hat{\theta}) \xrightarrow{d} N\!\left(0,\; G(\theta)\, I_{\infty}(\theta)^{-1}\, G(\theta)^{\top}\right)$$
Given that $\hat{\theta}$ is consistent, if $H_{0}$ is true, we expect $g(\hat{\theta})$ to take a value close to zero, and consequently the value of W for a given sample ($W^{*}$) adopts a small value. However, $H_{0}$ is rejected if $W^{*} > \chi^{2}_{q;\alpha}$. This amounts to saying that $H_{0}$ is rejected if $g(\hat{\theta})$ is "very distant" from zero.
Finally, we must note that the asymptotic information matrix $I_{\infty}(\theta)$, which appears in (2.182), is usually not observable. In order to be able to implement the test, $I_{\infty}(\theta)$ is substituted by $n^{-1} I_{n}(\hat{\theta})$, where $I_{n}(\cdot)$ denotes the sample information matrix. Thus, the W statistic for a given sample of size $n$ is written as:
$$W = g(\hat{\theta})^{\top}\left[G(\hat{\theta})\, I_{n}(\hat{\theta})^{-1}\, G(\hat{\theta})^{\top}\right]^{-1} g(\hat{\theta}) \qquad (2.184)$$
which, under $H_{0}$, is also asymptotically distributed as a $\chi^{2}$ with $q$ degrees of freedom. This asymptotic distribution is obtained from the result:
$$\frac{1}{n}\, I_{n}(\hat{\theta}) \xrightarrow{p} I_{\infty}(\theta)$$
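To illustrate how (2.184) is evaluated in the leading case of linear restrictions, where $g(\theta) = R\theta - r$ and hence $G(\theta) = R$, consider the following minimal Python sketch; the estimate, the information matrix and the restriction below are invented for illustration:

```python
import numpy as np
from scipy import stats

# Hypothetical ML estimate and sample information matrix I_n(theta_hat)
theta_hat = np.array([1.2, -0.5, 0.8])
I_n = np.array([[50.0,  5.0,  2.0],
                [ 5.0, 40.0,  1.0],
                [ 2.0,  1.0, 30.0]])

# Linear restriction H0: R theta = r, so g(theta) = R theta - r and G = R
R = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
r = np.zeros(2)

g = R @ theta_hat - r                  # g(theta_hat)
V = R @ np.linalg.inv(I_n) @ R.T       # G I_n^{-1} G'
W = g @ np.linalg.solve(V, g)          # quadratic form of (2.184)
print(f"W = {W:.3f}, p-value = {stats.chi2.sf(W, df=len(g)):.4f}")
```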
The Lagrange multiplier (LM) test is based on the vector of multipliers $\tilde{\lambda}$ associated with the restrictions in the restricted maximization of the log-likelihood. It is defined as:
$$LM = \frac{1}{n}\, \tilde{\lambda}^{\top} G(\tilde{\theta})\, I_{\infty}(\theta)^{-1}\, G(\tilde{\theta})^{\top} \tilde{\lambda} \qquad (2.185)$$
which, under $H_{0}$, is asymptotically distributed as a $\chi^{2}$ with $q$ degrees of freedom. The idea which underlies this test can be thought of as follows: if the restrictions of $H_{0}$ are true, the penalization for including them in the model is minimal. Thus, $\tilde{\lambda}$ is close to zero, and the LM statistic also tends to zero. Consequently, large values of the statistic provide evidence against the null hypothesis.
Again, we must note that expression (2.185) contains the asymptotic information matrix, which is a problem for implementing the test. In a similar way to that described for the Wald test, we substitute $I_{\infty}(\theta)$ by $n^{-1} I_{n}(\tilde{\theta})$ and obtain:
$$LM = \tilde{\lambda}^{\top} G(\tilde{\theta})\, I_{n}(\tilde{\theta})^{-1}\, G(\tilde{\theta})^{\top} \tilde{\lambda}$$
If we remember the restricted maximization problem, which in our case is solved by means of the Lagrange function:
$$\max_{\theta,\lambda}\; \mathcal{L}(\theta, \lambda) = \ell(\theta) - \lambda^{\top} g(\theta) \qquad (2.188)$$
whose first-order conditions are $\partial \ell(\tilde{\theta})/\partial \theta - G(\tilde{\theta})^{\top} \tilde{\lambda} = 0$ and $g(\tilde{\theta}) = 0$.
From the first set of first-order conditions in (2.188) one can deduce that $s(\tilde{\theta}) = G(\tilde{\theta})^{\top} \tilde{\lambda}$, where $s(\cdot) = \partial \ell(\cdot)/\partial \theta$ denotes the score vector. Thus, $G(\tilde{\theta})^{\top} \tilde{\lambda}$ can be substituted by $s(\tilde{\theta})$, leading to the known score form of the LM test (or simply the score test):
$$LM = \frac{1}{n}\, s(\tilde{\theta})^{\top}\, I_{\infty}(\theta)^{-1}\, s(\tilde{\theta})$$
Again, $I_{\infty}(\theta)$ is substituted by $n^{-1} I_{n}(\tilde{\theta})$, and the expression of the statistic for a sample of size $n$ is given by:
$$LM = s(\tilde{\theta})^{\top}\, I_{n}(\tilde{\theta})^{-1}\, s(\tilde{\theta})$$
Very often, the LM test statistic is asymptotically equal to $n$ times the non-centered $R^{2}$ of an artificial linear regression (for a more detailed description of this approach, see Davidson and MacKinnon (1984)).
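The following Python sketch illustrates this auxiliary-regression version of the score test for exclusion restrictions in a linear model; the data are simulated, so the design and the numbers are illustrative assumptions rather than part of the original text:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])  # full regressor matrix
y = X @ np.array([1.0, 0.5, 0.0, 0.0]) + rng.normal(size=n)

# H0: the coefficients of the last two regressors are zero
X0 = X[:, :2]
u_tilde = y - X0 @ np.linalg.lstsq(X0, y, rcond=None)[0]    # restricted residuals

# Artificial regression of the restricted residuals on the FULL X
fitted = X @ np.linalg.lstsq(X, u_tilde, rcond=None)[0]
R2_unc = (fitted @ fitted) / (u_tilde @ u_tilde)            # non-centered R^2
LM = n * R2_unc
print(f"LM = {LM:.3f}, p-value = {stats.chi2.sf(LM, df=2):.4f}")
```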
The main property of these three statistics is that, under $H_{0}$, all of them converge to the same random variable as $n \to \infty$, and this random variable is distributed as a $\chi^{2}$ with $q$ degrees of freedom. In other words, in the asymptotic context we are dealing with the same statistic, although the three have very different definitions; thus, in large samples, it does not really matter which of the three tests we use. The choice among the three statistics therefore depends on computational convenience. The LR test requires obtaining two estimators (under $H_{0}$ and under $H_{1}$). If $\hat{\theta}$ is easy to compute but $\tilde{\theta}$ is not, as may be the case with non-linear restrictions in a linear model, then the Wald statistic becomes attractive. On the other hand, if $\tilde{\theta}$ is easier to compute than $\hat{\theta}$, as is often the case in tests for autocorrelation and heteroskedasticity, then the LM test is more convenient. When the sample size is not large, choosing among the three statistics is complicated by the fact that they may have very different finite-sample properties.
These tests satisfy the "consistency of size $\alpha$" property and, moreover, they are "locally uniformly more powerful". The first property means that, when $H_{0}$ is true, the probability of deciding erroneously (rejecting $H_{0}$) is equal to or less than the fixed significance level. The second property implies that these tests have maximum power (the probability of rejecting $H_{0}$ when it is false) against local alternative hypotheses of the form:
$$H_{1n}:\; g(\theta) = \frac{\delta}{\sqrt{n}}$$
where $\delta$ is a fixed vector.
We shall now examine some questions related to the use of these tests. In a finite-sample context, the asymptotic distribution used for the three tests differs from the exact distribution, which is unknown (except in some situations, such as an MLRM under the classical assumptions) and which, furthermore, may not be the same for the three tests.
Moreover, once a significance level $\alpha$ is adopted, the same critical point $\chi^{2}_{q;\alpha}$ is used in the three cases, because they have the same asymptotic distribution. But the values the three statistics take, given the same sample data, are different, so this can lead to opposite conclusions. Specifically, it has been shown that for most models the following holds:
$$W \geq LR \geq LM$$
In order to obtain the form which the LR, W and LM statistics adopt when the linear restrictions (2.191), $H_{0}: R\beta = r$, are tested, it is convenient to remember some results that were obtained in the previous sections referring to the ML and the RML estimation. First, the set of parameters of the MLRM is denoted:
$$\theta = (\beta^{\top}, \sigma^{2})^{\top}$$
The log-likelihood function evaluated at the ML and RML estimators is, respectively,
$$\ell(\hat{\theta}) = -\frac{n}{2}\ln 2\pi - \frac{n}{2}\ln \hat{\sigma}^{2} - \frac{n}{2}, \qquad \ell(\tilde{\theta}) = -\frac{n}{2}\ln 2\pi - \frac{n}{2}\ln \tilde{\sigma}^{2} - \frac{n}{2}$$
where $\hat{\sigma}^{2} = \hat{u}^{\top}\hat{u}/n$ and $\tilde{\sigma}^{2} = \tilde{u}^{\top}\tilde{u}/n$ are computed from the unrestricted and restricted residuals. We now substitute the two last expressions in the general form of the LR test, to obtain:
$$LR = 2\,[\ell(\hat{\theta}) - \ell(\tilde{\theta})] = n \ln \frac{\tilde{\sigma}^{2}}{\hat{\sigma}^{2}}$$
In order to derive the form of the Wald test in this context, we remember the general expression of this test, which is presented in (2.184). For the linear restrictions of $H_{0}$ we have $g(\beta) = R\beta - r$, so that $G(\beta) = R$. Then, from (2.197) and (2.206) we get:
$$W = \frac{(R\hat{\beta} - r)^{\top}\left[R\,(X^{\top}X)^{-1} R^{\top}\right]^{-1} (R\hat{\beta} - r)}{\hat{\sigma}^{2}}$$
With respect to the LM test, it must be remembered that it had two alternative forms, and both will be considered in this illustration.
If we focus on the first one, which was written in terms of the Lagrange multipliers, evaluating it at the RML estimators yields:
$$LM = \frac{(R\hat{\beta} - r)^{\top}\left[R\,(X^{\top}X)^{-1} R^{\top}\right]^{-1} (R\hat{\beta} - r)}{\tilde{\sigma}^{2}} \qquad (2.209)$$
Now, we shall consider the second form of the LM test, the score form, which is given by:
$$LM = s(\tilde{\theta})^{\top}\, I_{n}(\tilde{\theta})^{-1}\, s(\tilde{\theta})$$
According to the expressions (2.195) and (2.197), evaluated at the RML estimator vector, we have:
$$LM = \frac{\tilde{u}^{\top} X (X^{\top}X)^{-1} X^{\top} \tilde{u}}{\tilde{\sigma}^{2}}$$
where $\tilde{u}$ denotes the vector of restricted residuals.
Having presented the corresponding expressions of the three statistics for testing linear restrictions, it is convenient to derive a last result, which consists in obtaining each statistic as a function of the general F statistic given in (2.139) or (2.140). If we take this last expression, we have that:
$$F = \frac{(\tilde{u}^{\top}\tilde{u} - \hat{u}^{\top}\hat{u})/q}{\hat{u}^{\top}\hat{u}/(n-k)} \qquad (2.213)$$
and consequently the three statistics can be written as:
$$W = \frac{nqF}{n-k}, \qquad LR = n \ln\!\left(1 + \frac{qF}{n-k}\right), \qquad LM = \frac{nqF}{n-k+qF} \qquad (2.214)$$
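These monotone transformations of F make the systematic ordering $W \geq LR \geq LM$ easy to verify numerically; a small Python check, with hypothetical values of $F$, $q$, $n$ and $k$:

```python
import numpy as np

F, q, n, k = 3.5, 2, 100, 4   # hypothetical F statistic and dimensions

W = n * q * F / (n - k)
LR = n * np.log(1.0 + q * F / (n - k))
LM = n * q * F / (n - k + q * F)

print(f"W = {W:.3f}, LR = {LR:.3f}, LM = {LM:.3f}")
assert W >= LR >= LM          # holds for any F >= 0
```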
With the aim of testing the same restriction as in the previous sections, in the quantlet XEGmlrm07.xpl we calculate the three asymptotic tests LR, W and LM. Note that the RML and ML estimates of $\beta$ and $\sigma^{2}$ were obtained in the previous quantlet, XEGmlrm06.
The p-values associated with the LR, W and LM statistics show that the null hypothesis is rejected in all cases.
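The quantlet itself is written in XploRe; for readers without it, the following Python sketch reproduces the same kind of computation on simulated data (the design, the restriction and therefore the numerical output are all hypothetical, not those of the text):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 2.0, 0.5]) + rng.normal(size=n)

# H0: R beta = r (here: the last coefficient equals zero, q = 1)
R = np.array([[0.0, 0.0, 1.0]]); r = np.zeros(1); q = 1

XtX_inv = np.linalg.inv(X.T @ X)
b_hat = XtX_inv @ X.T @ y                                # unrestricted ML/OLS
adj = np.linalg.solve(R @ XtX_inv @ R.T, R @ b_hat - r)  # auxiliary vector
b_til = b_hat - XtX_inv @ R.T @ adj                      # restricted (RML)

s2_hat = np.sum((y - X @ b_hat) ** 2) / n                # ML variance estimate
s2_til = np.sum((y - X @ b_til) ** 2) / n                # RML variance estimate

d = R @ b_hat - r
quad = d @ np.linalg.solve(R @ XtX_inv @ R.T, d)         # common quadratic form
LR = n * np.log(s2_til / s2_hat)
W = quad / s2_hat                                        # ML variance in the denominator
LM = quad / s2_til                                       # RML variance in the denominator

for name, stat in (("LR", LR), ("W", W), ("LM", LM)):
    print(f"{name} = {stat:.3f}, p-value = {stats.chi2.sf(stat, q):.4f}")
```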