Prediction can be regarded as the final step of econometric methodology. Thus, in order to minimize prediction errors, we must have evidence in favour of the stability of the model during the sample period, as well as of the accuracy of the estimated coefficients.
The first of these two aspects refers to the maintenance of the classical assumption of coefficient stability, which can be verified (among other approaches) by means of dummy variables, as we saw in the previous sections. If we have evidence in favour of the stability of the model during the sample period, we will assume that this stability extends to the out-sample (prediction) period, although we cannot be sure that it actually holds there.
The second aspect refers to the maintenance of the classical assumptions in the specific application at hand. We have proved in previous sections of this chapter that, under the classical assumptions, the OLS and ML estimators of the coefficients satisfy desirable properties both in finite samples and in the asymptotic framework. The violation of one or more of these ideal conditions in an empirical application can therefore affect the properties of the estimators and become a source of prediction error.
As in the estimation stage, we can obtain two classes of predictions: point predictions and interval predictions.
If we assume that model (2.228) is stable, the relation for the out-sample period is given by:
$$y_{T+s} = x_{T+s}\beta + u_{T+s}, \qquad s = 1, 2, \ldots \qquad (2.229)$$
where $x_{T+s}$ denotes the row vector of values of the $k$ explanatory variables in period $T+s$.
In the following, we consider $s = 1$, i.e., we are interested in the prediction one period ahead; nevertheless, the results can be easily generalized.
First, we focus on the prediction of the mean value of $y_{T+1}$, which, from (2.229) and under the classical assumptions, is:
$$\mathrm{E}(y_{T+1}) = x_{T+1}\beta$$
Using the Gauss-Markov theorem, it follows that the best linear unbiased predictor of $\mathrm{E}(y_{T+1})$ is:
$$\hat{y}_{T+1} = x_{T+1}\hat{\beta}$$
If we are interested in the prediction of $y_{T+1}$ itself, the best linear unbiased predictor is still $\hat{y}_{T+1} = x_{T+1}\hat{\beta}$. This result can be easily explained: the only difference between $y_{T+1}$ and $\mathrm{E}(y_{T+1})$ is the error term $u_{T+1}$. Taking into account that the best prediction of $u_{T+1}$ is zero (given that the best prediction of a random variable is its expectation when we have no sample information), we conclude that the OLS predictor is still optimal for predicting $y_{T+1}$.
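To fix ideas, here is a minimal sketch of the point prediction in Python; the sample data and the conditioning row x_new are hypothetical placeholders, not values from the text:

```python
import numpy as np

# Hypothetical sample: T observations, k regressors (first column = intercept).
rng = np.random.default_rng(0)
T, k = 40, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
y = X @ np.array([2.0, 0.5, -1.0]) + rng.normal(size=T)

# OLS estimate: beta_hat = (X'X)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Point prediction for T+1, conditional on an assumed regressor row x_{T+1}.
x_new = np.array([1.0, 0.3, -0.2])   # hypothetical values
y_hat = x_new @ beta_hat             # \hat{y}_{T+1} = x_{T+1} \hat{\beta}
print(y_hat)
```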
The prediction error ($e_{T+1}$) is defined as the difference between the variable we want to predict and its prediction. Thus, we have:
$$e_{T+1} = y_{T+1} - \hat{y}_{T+1} = u_{T+1} - x_{T+1}(\hat{\beta} - \beta)$$
The previous result has been obtained from (2.54), which establishes $\hat{\beta} - \beta = (X^\top X)^{-1}X^\top u$. We have also taken into account that the new disturbance $u_{T+1}$ is not correlated with the disturbances in the sample, so the last term of the variance is null, and:
$$\mathrm{E}(e_{T+1}) = 0, \qquad \mathrm{var}(e_{T+1}) = \sigma^{2}\left[1 + x_{T+1}(X^\top X)^{-1}x_{T+1}^\top\right] \qquad (2.235)$$
Similarly to the interval estimation, and given that $\sigma^{2}$ is an unknown parameter, we obtain:
$$\frac{\hat{y}_{T+1} - \mathrm{E}(y_{T+1})}{\hat{\sigma}\sqrt{x_{T+1}(X^\top X)^{-1}x_{T+1}^\top}} \sim t_{T-k} \qquad (2.237)$$
Therefore, the $100(1-\varepsilon)$ percent confidence interval for $\mathrm{E}(y_{T+1})$ is given by:
$$\hat{y}_{T+1} \pm t_{\varepsilon/2}\,\hat{\sigma}\sqrt{x_{T+1}(X^\top X)^{-1}x_{T+1}^\top} \qquad (2.238)$$
Note that the statistic (2.237) and the interval prediction (2.238) allow us to test hypotheses about the value of $\mathrm{E}(y_{T+1})$.
With respect to the interval prediction for $y_{T+1}$ itself, we use (2.235) to derive:
$$\frac{y_{T+1} - \hat{y}_{T+1}}{\hat{\sigma}\sqrt{1 + x_{T+1}(X^\top X)^{-1}x_{T+1}^\top}} \sim t_{T-k} \qquad (2.240)$$
Therefore, the $100(1-\varepsilon)$ percent confidence interval for $y_{T+1}$ is given by:
$$\hat{y}_{T+1} \pm t_{\varepsilon/2}\,\hat{\sigma}\sqrt{1 + x_{T+1}(X^\top X)^{-1}x_{T+1}^\top} \qquad (2.241)$$
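Continuing the sketch above, both intervals can be computed as follows (again with the placeholder data):

```python
from scipy import stats

# Estimate sigma^2 from the sample residuals: u'u / (T - k).
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (T - k)

XtX_inv = np.linalg.inv(X.T @ X)
q = x_new @ XtX_inv @ x_new              # x_{T+1}(X'X)^{-1} x_{T+1}'
t_crit = stats.t.ppf(0.975, df=T - k)    # epsilon = 0.05, two-sided

half_mean = t_crit * np.sqrt(sigma2_hat * q)        # interval (2.238) for E(y_{T+1})
half_pred = t_crit * np.sqrt(sigma2_hat * (1 + q))  # interval (2.241) for y_{T+1}
print(y_hat - half_mean, y_hat + half_mean)
print(y_hat - half_pred, y_hat + half_pred)
```

The interval for $y_{T+1}$ is always wider, since it also accounts for the variance of the new disturbance $u_{T+1}$.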
The statistic (2.240) or the confidence interval (2.241) lead to a test of the hypothesis that a new data point is generated by the same structure as the sample data. This test is called the test for stability. We denote by $y^{0}_{T+1}$ the real value of the endogenous variable in period $T+1$, and by $x^{0}_{T+1}$ the row vector with the real observations of the $k$ explanatory variables. This vector need not be the same as the vector $x_{T+1}$, because the latter has been used to carry out a conditional prediction. In other words, we want to test the model stability hypothesis at the prediction period. In this context, the t-value for the new data point is given by:
$$t = \frac{y^{0}_{T+1} - x^{0}_{T+1}\hat{\beta}}{\hat{\sigma}\sqrt{1 + x^{0}_{T+1}(X^\top X)^{-1}x^{0\top}_{T+1}}} \qquad (2.242)$$
If there is no structural change in the out-sample period, the difference between $y^{0}_{T+1}$ (real value) and $x^{0}_{T+1}\hat{\beta}$ (predicted value) should not be large, so (2.242) should tend to adopt a small value. Thus, fixing a significance level $\varepsilon$, when $|t| > t_{\varepsilon/2}(T-k)$ we conclude that the new observation may have been generated by a different structure; that is to say, the null hypothesis of structural stability is rejected for the out-sample period.
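A sketch of this one-period test, reusing the quantities defined above (the realized value y0_new is a hypothetical placeholder):

```python
# Realized value and realized regressor row for period T+1 (hypothetical).
y0_new = 1.9
x0_new = x_new   # here taken equal to the conditioning vector, which need not hold

t_stat = (y0_new - x0_new @ beta_hat) / np.sqrt(
    sigma2_hat * (1 + x0_new @ XtX_inv @ x0_new))
p_value = 2 * stats.t.sf(abs(t_stat), df=T - k)
print(t_stat, p_value)   # reject stability when |t| > t_{eps/2}(T - k)
```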
We can generalize this test to the case of $s$ out-sample periods. In this situation, the test statistic becomes:
$$F = \frac{(SSR_{T+s} - SSR_{T})/s}{SSR_{T}/(T-k)} \sim F_{s,\,T-k}$$
where $SSR_{T+s}$ is the sum of squared residuals from the regression estimated with all $T+s$ observations, and $SSR_{T}$ is that from the regression estimated with the $T$ sample observations only.
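Assuming the Chow predictive-failure form of the statistic given above, a sketch for $s$ out-sample periods:

```python
def chow_forecast_test(X, y, X_new, y_new):
    """Predictive failure test: H0 is that the s new points share the sample structure."""
    T, k = X.shape
    s = len(y_new)
    b = np.linalg.solve(X.T @ X, X.T @ y)
    ssr_T = np.sum((y - X @ b) ** 2)              # SSR, sample period only
    Xa = np.vstack([X, X_new])
    ya = np.concatenate([y, y_new])
    ba = np.linalg.solve(Xa.T @ Xa, Xa.T @ ya)
    ssr_Ts = np.sum((ya - Xa @ ba) ** 2)          # SSR, pooled T + s observations
    F = ((ssr_Ts - ssr_T) / s) / (ssr_T / (T - k))
    return F, stats.f.sf(F, s, T - k)             # statistic and p-value
```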
Once the prediction periods have passed, the researcher knows the true values of the dependent variable and can then evaluate the goodness of the predictions obtained. With this aim, various measures have been proposed, most of them based on the forecast errors, such as the root mean squared error of prediction (RMSEP) and the Theil U statistic. The latter measure reflects the ability of the model to track the turning points in the data.
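A sketch of both measures follows; the change-based form of Theil's U used here is an assumption, chosen because it matches the turning-point interpretation:

```python
def rmsep(y_real, y_pred):
    y_real, y_pred = np.asarray(y_real, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_real - y_pred) ** 2))

def theil_u(y_real, y_pred, y_last):
    # Changes are measured with respect to the previous observed value,
    # starting from the last sample observation y_last.
    y_real, y_pred = np.asarray(y_real, float), np.asarray(y_pred, float)
    prev = np.concatenate([[y_last], y_real[:-1]])
    real_chg = y_real - prev     # realized changes
    pred_chg = y_pred - prev     # predicted changes
    return np.sqrt(np.sum((pred_chg - real_chg) ** 2) / np.sum(real_chg ** 2))
```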
On the basis of this information, we obtain the point and interval predictions of $y$ and $\mathrm{E}(y)$. When the period 1998 passes, we know that the true values of $y$ for the first, second, third and fourth quarters are 11432012, 12113995, 11813440 and 12585278. The explanatory variables adopt the values we previously supposed. This new information allows us to test the model stability hypothesis in the out-sample period. Additionally, we calculate some measures of accuracy. The following quantlet provides the corresponding results:
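The quantlet itself is an XploRe program; as a rough Python sketch of the computations it performs, reusing the helper functions above, one could write the following. The sample data X, y and the 1998 regressor values X98 are hypothetical placeholders; only the four realized values of the dependent variable come from the text:

```python
# True 1998 quarterly values of the dependent variable (from the text).
y98 = np.array([11432012, 12113995, 11813440, 12585278], dtype=float)

# Placeholder sample data on a comparable scale (not the data of the example).
rng = np.random.default_rng(1)
T, k = 40, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
y = X @ np.array([1.2e7, 2.0e5, -1.0e5]) + rng.normal(scale=2.0e5, size=T)
X98 = np.column_stack([np.ones(4), rng.normal(size=(4, k - 1))])  # supposed regressors

# Stability test over the four out-sample quarters.
F, p_value = chow_forecast_test(X, y, X98, y98)
print("stability F:", F, "p-value:", p_value)

# Accuracy measures for the point predictions.
y98_hat = X98 @ np.linalg.solve(X.T @ X, X.T @ y)
print("RMSEP  :", rmsep(y98, y98_hat))
print("Theil U:", theil_u(y98, y98_hat, y_last=y[-1]))
```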
Note that the p-value is high, so we do not reject the null hypothesis of stability in the out-sample periods at the usual significance levels. Additionally, the values of the RMSEP and of the Theil U statistic are low, which indicates good forecasting performance.