2.11 Forecasting

A possible use of an MLRM consists of obtaining predictions of the endogenous variable when the regressors take given values. That is to say, an MLRM can only provide predictions conditional on the values of the regressors.

Prediction should be thought of as the final step of the econometric methodology. Thus, in order to minimize possible prediction errors, we must have evidence in favour of the stability of the model during the sample period, and also of the accuracy of the estimated coefficients.

The first of these two aspects refers to the maintenance of the classical assumption of coefficient stability, which can be verified (among other approaches) by means of dummy variables, as we saw in the previous sections. If we have evidence in favour of the stability of the model during the sample period, we extend this evidence to the out-sample (prediction) period. Nevertheless, we cannot be sure that this stability actually holds in the out-sample period.

The second aspect refers to the maintenance of the classical assumptions in each specific application. We have proved in previous sections of this chapter that, under the classical assumptions, the OLS and ML estimators of the coefficients satisfy desirable properties both in finite samples and in the asymptotic framework. Thus, the violation of one or more of these "ideal conditions" in an empirical application can affect the properties of the estimators and can therefore be a source of prediction errors.

As in the estimation stage, we can obtain two classes of prediction: point prediction and interval prediction.


2.11.1 Point Prediction

We assume that the data set is divided into two subsets of sizes $ n$ (the sample we have used to estimate) and $ n_{p}$ (the out-sample period), where $ n_{p}\geq 1$ but is small relative to $ n$. The idea consists of using the fitted model to generate predictions of $ y_{p}$ from $ X_{p}$. The general expression of an MLRM is given by:

\begin{displaymath}\begin{array}{cc} y_{i}=x^{\top }_{i}\beta+u_{i}& (i=1,\ldots,n) \end{array}\end{displaymath} (2.228)

where $ x^{\top }_{i}$ represents the row vector of the $ i^{th}$ observation of every regressor, including the intercept. We also consider that classical assumptions are satisfied.

If we assume that model (2.228) is stable, the relation for the out-sample period is given by:

\begin{displaymath}\begin{array}{cc} y_{p}=x^{\top }_{p}\beta+u_{p} & (p=n+1,\ldots,n+n_{p}) \ \end{array}\end{displaymath} (2.229)

where $ \textrm{E}(u_{p})=0$, $ var(u_{p})=\sigma^{2}$ and $ cov(u_{p},u_{i})=0$ ( $ \forall i=1,\ldots,n$) are satisfied.

In the following, we consider $ p=n+1$, i.e. we are interested in the prediction one period ahead; nevertheless, this can be easily generalized.

First, we focus on the prediction of the mean value of $ y_{p}$ which, from (2.229) and under classical assumptions, is:

$\displaystyle \textrm{E}(y_{p})=x^{\top }_{p}\beta$

Using the Gauss-Markov theorem, it follows that

$\displaystyle \hat{y}_{p}=x^{\top }_{p}\hat{\beta}$ (2.230)

is the best linear unbiased predictor of $ \textrm{E}(y_{p})$. We can understand this result intuitively, given that $ \textrm{E}(y_{p})$ is an unknown parameter ( $ x^{\top }_{p}\beta$). Thus, if we substitute $ \hat{\beta}$, which is BLUE, for $ \beta $, we obtain the best predictor of $ \textrm{E}(y_{p})$. In other words, among the class of linear and unbiased predictors, the OLS predictor has minimum variance. This variance is obtained as:

$\displaystyle var(\hat{y}_{p})=\textrm{E}[(\hat{y}_{p}-\textrm{E}(\hat{y}_{p}))(\hat{y}_{p}-\textrm{E}(\hat{y}_{p}))^{\top }]=\textrm{E}[(x^{\top }_{p}\hat{\beta}-x^{\top }_{p}\beta)(x^{\top }_{p}\hat{\beta}-x^{\top }_{p}\beta)^{\top }]=$

$\displaystyle \textrm{E}[x^{\top }_{p}(\hat{\beta}-\beta)(\hat{\beta}-\beta)^{\top }x_{p}]=x^{\top }_{p}\textrm{E}[(\hat{\beta}-\beta)(\hat{\beta}-\beta)^{\top }]x_{p}=\sigma^{2}x^{\top }_{p}(X^{\top }X)^{-1}x_{p}$ (2.231)

where we have used result (2.56) about the variance-covariance matrix of $ \hat{\beta}$.

If we are interested in predicting $ y_{p}$ itself, the best linear unbiased predictor is still $ \hat{y}_{p}$. This result can be easily explained, because the only difference between $ y_{p}$ and $ \textrm{E}(y_{p})$ is the error term $ u_{p}$. Taking into account that the best prediction of $ u_{p}$ is zero (given that the best prediction of a random variable with no sample information is its expectation), we conclude that the OLS predictor $ \hat{y}_{p}$ is still optimal for predicting $ y_{p}$.

The prediction error ($ e_{p}$) is defined as the difference between the variable we want to predict and the prediction. Thus, we have:

$\displaystyle e_{p}=y_{p}-\hat{y}_{p}=x^{\top }_{p}\beta+u_{p}-x^{\top }_{p}\hat{\beta}=u_{p}-x^{\top }_{p}(\hat{\beta}-\beta)$ (2.232)

in such a way that its expected value is zero, and its variance is:

$\displaystyle var(e_{p})=\textrm{E}[(e_{p}-\textrm{E}(e_{p}))^{2}]=\textrm{E}(e_{p}^{2})=\textrm{E}[u_{p}-x^{\top }_{p}(\hat{\beta}-\beta)]^{2}=$

$\displaystyle \textrm{E}[u_{p}^{2}+x^{\top }_{p}(\hat{\beta}-\beta)(\hat{\beta}-\beta)^{\top }x_{p}-2x^{\top }_{p}(\hat{\beta}-\beta)u_{p}]=$

$\displaystyle \textrm{E}[u_{p}^{2}+x^{\top }_{p}(\hat{\beta}-\beta)(\hat{\beta}-\beta)^{\top }x_{p}-2x^{\top }_{p}(X^{\top }X)^{-1}X^{\top }uu_{p}]$

The last expression has been obtained from (2.54), which establishes that $ \hat{\beta}-\beta=(X^{\top }X)^{-1}X^{\top }u$. Taking into account that the new disturbance $ u_{p}$ is not correlated with the disturbances in the sample, the expectation of the last term is null, and:

$\displaystyle var(e_{p})=\sigma^{2}[1+x^{\top }_{p}(X^{\top }X)^{-1}x_{p}]$ (2.233)

We can see that the variance of the prediction error adds $ var(u_{p})$ to the variance of the predictor given in (2.231).
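The computation behind (2.230), (2.231) and (2.233) can be sketched in a few lines. The example of this chapter relies on an XploRe quantlet, which is not reproduced here, so the following is only an illustrative sketch in Python with NumPy; the names `point_prediction`, `X`, `y` and `x_p` are placeholders for the estimation-sample regressor matrix, the sample values of the endogenous variable and the out-sample regressor vector.

```python
import numpy as np

def point_prediction(X, y, x_p):
    """OLS point prediction of y_p, with the estimated variances (2.231) and (2.233).

    X   : (n, k) regressor matrix of the estimation sample (including the intercept)
    y   : (n,)   sample values of the endogenous variable
    x_p : (k,)   values of the regressors in the out-sample period
    """
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y              # OLS estimator of beta
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - k)      # unbiased estimator of sigma^2

    y_p_hat = x_p @ beta_hat                  # point prediction (2.230)
    h_p = x_p @ XtX_inv @ x_p                 # x_p' (X'X)^{-1} x_p
    var_y_p_hat = sigma2_hat * h_p            # estimated variance of the predictor (2.231)
    var_e_p = sigma2_hat * (1.0 + h_p)        # estimated variance of the prediction error (2.233)
    return y_p_hat, var_y_p_hat, var_e_p
```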


2.11.2 Interval Prediction

To provide an interval prediction of $ \textrm{E}(y_{p})$ or $ y_{p}$, we begin by establishing the sampling distributions of $ \hat{y}_{p}$ and $ e_{p}$. From (2.230) and (2.232) we can see that both $ \hat{y}_{p}$ and $ e_{p}$ are linear combinations of normal random variables, so they also follow normal distributions. Specifically:

$\displaystyle \hat{y}_{p} \sim N[x^{\top }_{p}\beta,\sigma^{2}x^{\top }_{p}(X^{\top }X)^{-1}x_{p}]$ (2.234)

$\displaystyle e_{p} \sim N[0,\sigma^{2}(1+x^{\top }_{p}(X^{\top }X)^{-1}x_{p})]$ (2.235)

As in interval estimation, and given that $ \textrm{E}(y_{p})=x^{\top }_{p}\beta$ is an unknown parameter, we obtain:

$\displaystyle \frac{\hat{y}_{p}-\textrm{E}(y_{p})}{\sigma\sqrt{x^{\top }_{p}(X^{\top }X)^{-1}x_{p}}}\sim N(0,1)$ (2.236)

Again, we must eliminate the unknown parameter $ \sigma ^{2}$, so from (2.125), and using the independence between the variables in (2.236) and (2.125), we have:

$\displaystyle \frac{\hat{y}_{p}-\textrm{E}(y_{p})}{\hat{\sigma}\sqrt{x^{\top }_{p}(X^{\top }X)^{-1}x_{p}}}=\frac{\hat{y}_{p}-\textrm{E}(y_{p})}{\hat{\sigma}_{\hat{y}_{p}}}\sim t_{n-k}$ (2.237)

where $ \hat{\sigma}_{\hat{y}_{p}}$ denotes the estimated standard deviation of the predictor.

Therefore, the $ 100(1-\epsilon)$ percent confidence interval for $ \textrm{E}(y_{p})$ is given by:

$\displaystyle \hat{y}_{p}\pm t_{\frac{\epsilon}{2}}\hat{\sigma}_{\hat{y}_{p}}$ (2.238)

with $ t_{\frac{\epsilon}{2}}$ being the critical point of the $ t_{n-k}$ distribution that leaves a probability of $ \frac{\epsilon}{2}$ in the upper tail.

Note that the statistic (2.237) or the interval prediction (2.238) allow us to test hypotheses about the value of $ \textrm{E}(y_{p})$.
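A minimal sketch of the interval (2.238), under the same assumptions as the previous snippet (the helper `point_prediction` is the hypothetical function sketched above); `scipy.stats.t.ppf` supplies the critical point of the $t_{n-k}$ distribution.

```python
import numpy as np
from scipy import stats

def ci_mean_prediction(X, y, x_p, eps=0.05):
    """100(1-eps)% confidence interval for E(y_p), following (2.238)."""
    n, k = X.shape
    y_p_hat, var_y_p_hat, _ = point_prediction(X, y, x_p)
    t_crit = stats.t.ppf(1.0 - eps / 2.0, df=n - k)   # critical point t_{eps/2} of t_{n-k}
    half_width = t_crit * np.sqrt(var_y_p_hat)
    return y_p_hat - half_width, y_p_hat + half_width
```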

With respect to the interval prediction for $ y_{p}$, we use (2.235) to derive:

$\displaystyle \frac{e_{p}}{\sigma\sqrt{1+x^{\top }_{p}(X^{\top }X)^{-1}x_{p}}}=\frac{{y}_{p}-\hat{y}_{p}}{\sigma\sqrt{1+x^{\top }_{p}(X^{\top }X)^{-1}x_{p}}}\sim N(0,1)$ (2.239)

and then, in order to eliminate $ \sigma ^{2}$, (2.239) is transformed to obtain:

$\displaystyle \frac{{y}_{p}-\hat{y}_{p}}{\hat{\sigma}\sqrt{1+x^{\top }_{p}(X^{\top }X)^{-1}x_{p}}}=\frac{{y}_{p}-\hat{y}_{p}}{\hat{\sigma}_{e_{p}}}\sim t_{n-k}$ (2.240)

with $ \hat{\sigma}_{e_{p}}$ being the estimated standard deviation of the prediction error associated with $ y_{p}$. Again, result (2.240) is based on the independence between the distributions given in (2.239) and (2.125).

Therefore, the $ 100(1-\epsilon)$ percent confidence interval for $ y_{p}$ is given by:

$\displaystyle \hat{y}_{p}\pm t_{\frac{\epsilon}{2}}\hat{\sigma}_{e_{p}}$ (2.241)
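The interval (2.241) differs from (2.238) only in the standard deviation that multiplies the critical point; continuing the illustrative sketch above (NumPy and scipy.stats already imported):

```python
def pi_y_prediction(X, y, x_p, eps=0.05):
    """100(1-eps)% prediction interval for y_p, following (2.241)."""
    n, k = X.shape
    y_p_hat, _, var_e_p = point_prediction(X, y, x_p)
    t_crit = stats.t.ppf(1.0 - eps / 2.0, df=n - k)
    half_width = t_crit * np.sqrt(var_e_p)    # uses the prediction-error standard deviation
    return y_p_hat - half_width, y_p_hat + half_width
```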

The statistic (2.240) or the confidence interval (2.241) lead to a test of the hypothesis that a new data point $ (y_{p},x_{p}^{*\top })$ is generated by the same structure as the sample data. This test is called a test for stability. We denote by $ y_{p}$ the real value of the endogenous variable in period $ p$, and by $ x_{p}^{*\top }$ the row vector with the $ p^{th}$ real observation of the $ k$ explanatory variables. The vector $ x_{p}^{*\top }$ need not be the same as the vector $ x_{p}^{\top }$, because the latter has been used to carry out a conditional prediction. In other words, we want to test the hypothesis of model stability in the prediction period. In this context, the t-value for the new data point is given by:

$\displaystyle t=\frac{{y}_{p}-x_{p}^{*\top }\hat{\beta}}{\hat{\sigma}\sqrt{1+x^{*\top }_{p}(X^{\top }X)^{-1}x^{*}_{p}}}$ (2.242)

If there is no structural change in the out-sample period, the difference between $ y_{p}$ (real value) and $ \hat{y}_{p}$ (predicted value) should not be large, and so (2.242) should tend to be small. Thus, fixing a significance level $ \epsilon$, when $ \vert t\vert>t_{\frac{\epsilon}{2}}$ we conclude that the new observation may have been generated by a different structure. That is to say, the null hypothesis of structural stability is rejected for the out-sample period.
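A sketch of the single-observation stability test based on (2.242), with the two-sided p-value computed from the $t_{n-k}$ distribution; the argument names are again placeholders, not the quantlet's interface, and the imports of the previous sketches are assumed.

```python
def stability_t_test(X, y, x_p_star, y_p):
    """t test (2.242) that the new point (y_p, x_p_star) follows the sample structure."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    sigma_hat = np.sqrt(resid @ resid / (n - k))

    t_stat = (y_p - x_p_star @ beta_hat) / (
        sigma_hat * np.sqrt(1.0 + x_p_star @ XtX_inv @ x_p_star))
    p_value = 2.0 * (1.0 - stats.t.cdf(abs(t_stat), df=n - k))   # reject stability if p_value < eps
    return t_stat, p_value
```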

We can generalize this test to the case of several out-sample periods. In this situation, the test statistic becomes:

$\displaystyle \frac{\frac{RSS_{R}-RSS_{U}}{n_{2}}}{\frac{RSS_{U}}{n_{1}-k}}$ (2.243)

which follows an F-Snedecor distribution with $ n_{2}$ (the out-sample size) and $ n_{1}-k$ degrees of freedom, where $ n_{1}$ is the sample size. In (2.243), $ RSS_{R}$ is the residual sum of squares from the regression based on the $ (n_{1}+n_{2})$ observations, and $ RSS_{U}$ is the residual sum of squares from the regression based on the $ n_{1}$ sample observations.
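The following sketch computes the statistic (2.243) from stacked data, assuming the first $n_1$ rows are the estimation sample and the remaining $n_2$ rows the out-sample observations (imports as above):

```python
def predictive_f_test(X_all, y_all, n1):
    """F test (2.243) of stability over the n2 = n - n1 out-sample periods."""
    n, k = X_all.shape
    n2 = n - n1

    def rss(X, y):
        beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
        return np.sum((y - X @ beta_hat) ** 2)

    rss_r = rss(X_all, y_all)               # RSS_R: regression on the n1 + n2 observations
    rss_u = rss(X_all[:n1], y_all[:n1])     # RSS_U: regression on the n1 sample observations
    f_stat = ((rss_r - rss_u) / n2) / (rss_u / (n1 - k))
    p_value = 1.0 - stats.f.cdf(f_stat, n2, n1 - k)
    return f_stat, p_value
```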


2.11.3 Measures of the Accuracy of Forecast

Once the prediction period has passed, the researcher knows the true values of the dependent variable and can then evaluate the goodness of the obtained predictions. With this aim, various measures have been proposed, most of them based on the forecast errors.

Among the most widely used are the root mean squared error of prediction (RMSEP) and Theil's U statistic; the latter measure reflects the ability of the model to track the turning points in the data.
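As an illustration, a sketch of these two measures. Several variants of Theil's U exist in the literature, and the exact version computed by the quantlet used in the example below is not reproduced here, so the one shown (forecast errors relative to a naive no-change forecast) is only one common choice.

```python
def rmsep(y_true, y_pred):
    """Root mean squared error of prediction over the out-sample periods."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def theil_u(y_true, y_pred, y_prev):
    """One common version of Theil's U: forecast errors relative to a naive
    no-change forecast that repeats the previous observed values y_prev.
    Values well below 1 indicate forecasts better than the naive rule."""
    y_true, y_pred, y_prev = (np.asarray(a, float) for a in (y_true, y_pred, y_prev))
    num = np.sqrt(np.mean(((y_pred - y_true) / y_prev) ** 2))
    den = np.sqrt(np.mean(((y_true - y_prev) / y_prev) ** 2))
    return num / den
```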


2.11.4 Example

To forecast the value of consumption in the out-sample period 1998, we need information about the values of the explanatory variables in this period. In this sense, we suppose that the values of $ x_{2}$ (exports) and $ x_{3}$ (M1) are those of Table 2.2.


Table 2.2: Assumed values for $x_2$ and $x_3$

              $x_2$ (exports)   $x_3$ (M1)
  quarter 1   5403791           19290.1904
  quarter 2   5924731           19827.894
  quarter 3   5626585           20356.7381
  quarter 4   5749398           20973.5399


On the basis of this information, we obtain the point and interval predictions of $ y$ and $ \textrm{E}(y)$. Once the period 1998 has passed, we know that the true values of $ y$ for the first, second, third and fourth quarters are 11432012, 12113995, 11813440 and 12585278, and that the explanatory variables took the values we previously supposed. This new information allows us to test the hypothesis of model stability in the out-sample period. Additionally, we calculate some measures of accuracy. The following quantlet provides the corresponding results:

XEGmlrm09.xpl

Note that the p-value is high, so we do not reject the null hypothesis of stability in the out-sample periods at the usual significance levels. Additionally, the values of the RMSEP and of Theil's U statistic are low, which indicates a good forecasting performance.