1.4 Forecasting

An apparently different problem, although in fact closely related to parameter estimation, is that of forecasting. We consider a situation where a data set on both $ Y$ and $ X$ is available for elements $ 1$ to $ n$. We can not only estimate the relationship between $ Y$ and $ X$, but also use this estimate to forecast, or predict, the value of $ Y$ for any given value of $ X$. Suppose that $ x^\star$ is a known value of the regressor, and we are interested in predicting $ y^\star$, the value of $ Y$ associated with $ x^\star$.

It is evident that, in general, if $ X$ takes the value $ x^\star$, the predicted value $ \hat y^\star$ is given by:

$\displaystyle \hat y^\star=\hat\alpha+\hat\beta x^\star$ (1.81)
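
In practice $ \hat\alpha$ and $ \hat\beta$ are the least squares estimates computed from the sample. As an illustration (the Quantlets in this e-book are written in XploRe; the sketch below uses Python, with hypothetical data and function names chosen only for this example):

import numpy as np

# Illustrative sketch (not an XploRe Quantlet): least squares estimates and the
# point forecast y_hat_star = alpha_hat + beta_hat * x_star.
def point_forecast(x, y, x_star):
    x_bar, y_bar = x.mean(), y.mean()
    beta_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat + beta_hat * x_star

# Hypothetical data, used only to make the example runnable
rng = np.random.default_rng(0)
x = np.arange(8.0, 28.0)
y = 2 + 0.5 * x + rng.normal(0, 1, size=x.size)
print(point_forecast(x, y, x_star=30.0))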

The conditional mean of the predictor of $ Y$ given $ X=x^\star$ is

$\displaystyle E(\hat y \vert X=x^\star )=E(\hat\alpha)+x^\star E(\hat\beta)= \alpha +\beta x^\star = E( Y\vert X=x^\star)$ (1.82)

Thus, $ \hat y^\star$ is an unbiased conditional predictor of $ y^\star$.
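
The unbiasedness in (1.82) can be illustrated by simulation: averaging $ \hat y^\star$ over many artificial samples should give a value close to $ \alpha+\beta x^\star$. A minimal Python sketch, assuming $ \alpha=2$, $ \beta=0.5$, standard normal errors and $ x^\star=30$ purely for illustration:

import numpy as np

# Monte Carlo check of E(y_hat_star | X = x_star) = alpha + beta * x_star.
# alpha = 2, beta = 0.5 and x_star = 30 are assumed purely for illustration.
rng = np.random.default_rng(1)
alpha, beta, x_star, n_rep = 2.0, 0.5, 30.0, 5000
x = np.arange(8.0, 28.0)                    # fixed regressor values, n = 20
forecasts = np.empty(n_rep)
for r in range(n_rep):
    y = alpha + beta * x + rng.normal(0, 1, size=x.size)
    beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    alpha_hat = y.mean() - beta_hat * x.mean()
    forecasts[r] = alpha_hat + beta_hat * x_star
# The average forecast should be close to alpha + beta * x_star = 17.
print(forecasts.mean())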


1.4.1 Confidence Interval for the Point Forecast

Because $ \alpha $ and $ \beta $ are estimated with error, $ \hat y^\star$ is also subject to error. To take account of this, we compute the variance of the prediction error and a confidence interval for the point predictor. The prediction error is:

$\displaystyle \hat u^\star=y^\star-\hat y^\star=(\alpha-\hat\alpha)+(\beta-\hat\beta)x^\star+u^\star$ (1.83)

Clearly the expected prediction error is zero. The variance of $ \hat u^\star$ is

$\displaystyle var(\hat u^\star)=\sigma^2\left(1+\frac{1}{n}+\frac{(x^\star-\bar x) ^2}{\sum_{i=1}^n(x_i-\bar x)^2}\right)$ (1.84)

We see that $ \hat u^\star$ is a linear combination of normally distributed variables. Thus, it is also normally distributed, and so

$\displaystyle \frac{\hat u^\star}{\sigma\sqrt{1+\frac{1}{n}+\frac{(x^\star-\bar x) ^2}{\sum_{i=1}^n(x_i-\bar x)^2}}}\sim \textrm{N}(0,1)$ (1.85)

By inserting the sample estimate $ \hat\sigma$ for $ \sigma$,

$\displaystyle \frac{\hat u^\star}{\hat\sigma \sqrt{1+\frac{1}{n}+\frac{(x^\star-\bar x) ^2}{\sum_{i=1}^n(x_i-\bar x)^2}}}\sim t_{(n-2)}$ (1.86)

We can construct a prediction interval for $ y^\star$ in the usual way: a $ 100(1-\epsilon)$ per cent forecast interval for $ y^\star$ is

$\displaystyle (\hat\alpha+\hat\beta x^\star)\pm t_{\epsilon /2,(n-2)} \hat\sigma \sqrt{1+\frac{1}{n}+\frac{(x^\star-\bar x) ^2}{\sum_{i=1}^n(x_i-\bar x)^2} }$ (1.87)

where $ t_{\epsilon /2,(n-2)}$ is the critical value from the $ t$ distribution with $ (n-2)$ degrees of freedom.
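
A possible Python sketch of interval (1.87); the function name forecast_interval is illustrative and not part of any library:

import numpy as np
from scipy.stats import t

# Sketch of the point-forecast interval (1.87).
def forecast_interval(x, y, x_star, eps=0.05):
    n, x_bar = x.size, x.mean()
    sxx = np.sum((x - x_bar) ** 2)
    beta_hat = np.sum((x - x_bar) * (y - y.mean())) / sxx
    alpha_hat = y.mean() - beta_hat * x_bar
    resid = y - alpha_hat - beta_hat * x
    sigma_hat = np.sqrt(np.sum(resid ** 2) / (n - 2))          # hat sigma
    se = sigma_hat * np.sqrt(1 + 1 / n + (x_star - x_bar) ** 2 / sxx)
    t_crit = t.ppf(1 - eps / 2, df=n - 2)                      # t_{eps/2,(n-2)}
    y_hat = alpha_hat + beta_hat * x_star
    return y_hat - t_crit * se, y_hat + t_crit * se

Because of the term $ (x^\star-\bar x)^2/\sum_{i=1}^n(x_i-\bar x)^2$, the interval widens as $ x^\star$ moves away from $ \bar x$; this is visible in Figure 1.11 below.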


1.4.2 Example

We implement the following experiment using the Quantlet listed below. We generate a sample of size $ n=20$ from the data generating process $ y_i=2+0.5 x_i+u_i$, where the vector of explanatory variables is $ X=[8,...,27]$. First we estimate $ \alpha $ and $ \beta $; then we obtain predictions for several values of $ X$.

XEGlinreg16.xpl

In this program, the vector $ X$ takes values from $ 8$ to $ 27$ for the estimation; afterwards we calculate a prediction interval for $ X=[1,...,60]$. This procedure yields Figure 1.11.

Figure 1.11: Interval prediction
\includegraphics[width=0.59\defpicwidth]{predic.ps}
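
The Quantlet XEGlinreg16.xpl itself is written in XploRe. A rough Python sketch of the same experiment (the error standard deviation is not stated in the text and is assumed to be 1 here; the seed is arbitrary) could look as follows:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import t

# Rough Python sketch of the experiment behind XEGlinreg16.xpl (the original
# is an XploRe Quantlet). Error standard deviation assumed equal to 1.
rng = np.random.default_rng(42)
x = np.arange(8.0, 28.0)                          # estimation sample, n = 20
y = 2 + 0.5 * x + rng.normal(0, 1, size=x.size)

n, x_bar = x.size, x.mean()
sxx = np.sum((x - x_bar) ** 2)
beta_hat = np.sum((x - x_bar) * (y - y.mean())) / sxx
alpha_hat = y.mean() - beta_hat * x_bar
sigma_hat = np.sqrt(np.sum((y - alpha_hat - beta_hat * x) ** 2) / (n - 2))

x_grid = np.arange(1.0, 61.0)                     # forecast points X = 1,...,60
se = sigma_hat * np.sqrt(1 + 1 / n + (x_grid - x_bar) ** 2 / sxx)
t_crit = t.ppf(0.975, df=n - 2)                   # 95% interval
y_hat = alpha_hat + beta_hat * x_grid

plt.scatter(x, y, label="sample")
plt.plot(x_grid, y_hat, label="point forecast")
plt.plot(x_grid, y_hat - t_crit * se, "--", label="95% prediction interval")
plt.plot(x_grid, y_hat + t_crit * se, "--")
plt.legend()
plt.show()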


1.4.3 Confidence Interval for the Mean Predictor

The interval derived in the previous section is for predicting a single value of $ Y$. We may also want the variance of the mean predictor, i.e. of the prediction of $ E(Y\vert X=x^\star)=\alpha+\beta x^\star$. The variance of the prediction error for the mean $ (\hat u_m^\star)$ is

$\displaystyle var(\hat u_m^\star)=\sigma^2\left(\frac{1}{n}+\frac{(x^\star-\bar x) ^2}{\sum_{i=1}^n(x_i-\bar x)^2}\right)$ (1.88)

We see that $ \hat u_m^\star$ is a linear combination of normally distributed variables. Thus, it is also normally distributed. By inserting the sample estimate $ \hat\sigma$ for $ \sigma$

$\displaystyle \frac{\hat u_m^\star}{\hat\sigma \sqrt{\frac{1}{n}+\frac{(x^\star-\bar x) ^2}{\sum_{i=1}^n(x_i-\bar x)^2}}}\sim t_{(n-2)}$ (1.89)

The $ 100(1-\epsilon)$ per cent confidence interval for the mean forecast is given by

$\displaystyle (\hat\alpha+\hat\beta x^\star)\pm t_{\epsilon /2,(n-2)} \hat\sigma \sqrt{\frac{1}{n}+\frac{(x^\star-\bar x) ^2}{\sum_{i=1}^n(x_i-\bar x)^2} }$ (1.90)

where $ t_{\epsilon /2,(n-2)}$ is the critical value from the $ t$ distribution with $ (n-2)$ degrees of freedom.
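
A sketch of interval (1.90) in the same style as before; the function name mean_forecast_interval is again illustrative only:

import numpy as np
from scipy.stats import t

# Sketch of the mean-forecast interval (1.90); compared with (1.87) the only
# change is dropping the leading "1 +" inside the square root.
def mean_forecast_interval(x, y, x_star, eps=0.05):
    n, x_bar = x.size, x.mean()
    sxx = np.sum((x - x_bar) ** 2)
    beta_hat = np.sum((x - x_bar) * (y - y.mean())) / sxx
    alpha_hat = y.mean() - beta_hat * x_bar
    sigma_hat = np.sqrt(np.sum((y - alpha_hat - beta_hat * x) ** 2) / (n - 2))
    se = sigma_hat * np.sqrt(1 / n + (x_star - x_bar) ** 2 / sxx)
    t_crit = t.ppf(1 - eps / 2, df=n - 2)
    y_hat = alpha_hat + beta_hat * x_star
    return y_hat - t_crit * se, y_hat + t_crit * se

Since the mean forecast does not involve the individual error $ u^\star$, this interval is always narrower than the point forecast interval (1.87) at the same $ x^\star$.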