6.2 ARCH(1) Model

The basic model we consider parameterises the conditional variance of the time series in terms of the squared perturbation at lag one. For the sake of simplicity, we assume that the time series $ y_{t} $ has no structure in the mean, is conditionally gaussian and, furthermore, that its conditional variance is time dependent.

$\displaystyle y_{t}$ $\displaystyle =$ $\displaystyle u_{t}$
$\displaystyle u_{t}$ $\displaystyle =$ $\displaystyle \sigma_{t}\epsilon_{t}, \qquad \epsilon_{t}\sim \textrm{i.i.d. } N(0,1)$
$\displaystyle \sigma_{t}^{2}$ $\displaystyle =$ $\displaystyle \alpha_{0}+\alpha_{1}u_{t-1}^{2}$ (6.1)

A process that satisfies these three conditions is called autoregressive conditional heteroscedastic of order one.
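
The recursion in (6.1) is straightforward to simulate. The quantlets used throughout this section are written in XploRe; purely as an illustration, the following minimal Python sketch generates an ARCH(1) series (the function simulate_arch1 and its arguments are our own names, not part of the original code).

import numpy as np

def simulate_arch1(alpha0, alpha1, T, burn_in=500, seed=None):
    """Simulate u_t = sigma_t * eps_t with sigma_t^2 = alpha0 + alpha1 * u_{t-1}^2,
    eps_t i.i.d. N(0,1), discarding a burn-in period."""
    rng = np.random.default_rng(seed)
    n = T + burn_in
    u = np.zeros(n)
    sigma2 = np.zeros(n)
    sigma2[0] = alpha0 / (1.0 - alpha1)        # start at the unconditional variance
    u[0] = np.sqrt(sigma2[0]) * rng.standard_normal()
    for t in range(1, n):
        sigma2[t] = alpha0 + alpha1 * u[t - 1] ** 2
        u[t] = np.sqrt(sigma2[t]) * rng.standard_normal()
    return u[burn_in:], sigma2[burn_in:]

# y_t = u_t; alpha0 = 0.5, alpha1 = 0.5 gives unconditional variance 1
y, sigma2 = simulate_arch1(0.5, 0.5, T=500, seed=0)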

This basic ARCH(1) process can be formulated as a linear model on squared perturbations. Let $ \upsilon_{t}=u_{t}^{2}-\sigma_{t}^{2}$, so that the square error can be written as

$\displaystyle u_{t}^{2}=\alpha_{0}+\alpha_{1}u_{t-1}^{2}+ \upsilon_t.$

Because $ \textrm{E}(\upsilon_{t}\vert I_{t-1})=0$, where $ I_{t-1}$ is the information set up to time $ t-1$, the law of iterated expectations reveals that $ \upsilon_{t}$ has zero mean and is serially uncorrelated. Therefore, $ u_{t}^{2}$ has an AR(1) representation in which $ \upsilon_{t}$ is a non-gaussian white noise.
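
When the fourth moment exists, this AR(1) representation implies that the autocorrelations of $ u_t^2$ decay geometrically at rate $ \alpha_1$. Reusing the series y from the simulation sketch above, a few lines of Python (the helper sample_acf is our own) make this visible.

import numpy as np

def sample_acf(x, max_lag=5):
    """Sample autocorrelations of x at lags 1..max_lag."""
    x = np.asarray(x) - np.mean(x)
    denom = np.sum(x ** 2)
    return np.array([np.sum(x[k:] * x[:-k]) / denom for k in range(1, max_lag + 1)])

# For ARCH(1) with alpha1 = 0.5 the ACF of u_t^2 should decay roughly like 0.5^k
print(sample_acf(y ** 2))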

This ARCH process can be used as the innovation model of several other linear models (ARMA models, regression models, $ \dots$).


6.2.1 Conditional and Unconditional Moments of the ARCH(1)

The unconditional moments of the ARCH(1) process can be derived through extensive use of the law of iterated expectations on conditional distributions. The following expressions are then satisfied:

$\displaystyle \textrm{E}(u_{t}\vert I_{t-1})$ $\displaystyle =$ $\displaystyle 0$
$\displaystyle V(u_{t}\vert I_{t-1})$ $\displaystyle =$ $\displaystyle \sigma_{t}^{2}=\alpha_{0}+\alpha_{1}u_{t-1}^{2}$ (6.2)

hence,

$\displaystyle u_{t}\vert I_{t-1} \sim N(0,\sigma_{t}^{2})$ (6.3)

Therefore,
$\displaystyle \textrm{E}(u_{t})$ $\displaystyle =$ $\displaystyle \textrm{E}[\textrm{E}(u_{t}\vert I_{t-1})] = 0$
$\displaystyle \textrm{E}(u_{t}^{2})$ $\displaystyle =$ $\displaystyle \textrm{E}[\textrm{E}(u_{t}^{2}\vert I_{t-1})] = \textrm{E}(\sigma_{t}^{2}) = \alpha_{0}+\alpha_{1}\textrm{E}(u_{t-1}^{2})$

which is a linear difference equation for the sequence of variances. Assuming the process began infinitely far in the past with a finite initial variance, the sequence of variances converges to the constant

$\displaystyle \sigma^{2}=\frac{\alpha_{0}}{1-\alpha_{1}},\quad \alpha_1< 1.$

When this unconditional variance exists, prior information provides no help in forecasting volatility at an infinite horizon.
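For instance, with $ \alpha_0=0.5$ and $ \alpha_1=0.5$ the unconditional variance is $ \sigma^{2}=0.5/(1-0.5)=1$; parameter values of this kind are used in the simulation examples later in this section.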

The difference between the conditional and the unconditional variance is a simple function of the deviation of squared innovations from their mean: in the ARCH(1) model with $ \alpha_1 >0$, $ \sigma_{t}^{2}-\sigma^{2}=\alpha_{1}(u_{t-1}^{2}-\sigma^{2})$. Then the variance of the current error $ u_{t}$, conditioned on the realised values of the lagged errors $ u_{t-1}$, is an increasing function of the magnitude of the lagged errors, irrespective of their signs. Hence, large errors of either sign tend to be followed by a large error of either sign and, similarly, small errors of either sign tend to be followed by a small error of either sign.

The nature of the unconditional density of an ARCH(1) process can be analysed through its higher order moments. Indeed,

$\displaystyle \textrm{E}(u_{t}^{4}\vert I_{t-1})=\textrm{E}(\epsilon_{t}^{4}\sigma_{t}^{4}\vert I_{t-1})=\textrm{E}(\epsilon_{t}^{4}\vert I_{t-1})\,\textrm{E}[(\sigma_{t}^{2})^{2}\vert I_{t-1}]=
3(\alpha_{0}+\alpha_{1}u_{t-1}^{2})^{2}$

Applying once again the law of iterated expectations, we have

$\displaystyle \textrm{E}(u_{t}^{4})=\textrm{E}[\textrm{E}(u_{t}^{4}\vert I_{t-1})]=
3\,\textrm{E}(\alpha_{0}+\alpha_{1}u_{t-1}^{2})^{2}=$

$\displaystyle 3[\alpha_{0}^{2}+2\alpha_{0}\alpha_{1}\textrm{E}(u_{t-1}^{2})+\alpha_{1}^{2}\textrm{E}(u_{t-1}^{4})]=
3[\alpha_{0}^{2}+2\alpha_{1}\frac{\alpha_{0}^{2}}{1-\alpha_{1}}+\alpha_{1}^{2}\textrm{E}(u_{t-1}^{4})].$

Assuming that the process is stationary both in variance and in the fourth moment, and writing $ \textrm{E}(u_{t}^{4})=c$, we obtain

$\displaystyle c=\frac{3\alpha_{0}^{2}[1-\alpha_{1}^{2}]}{(1-\alpha_{1})^{2}(1-3\alpha_{1}^{2})}.$

Simple algebra then reveals that the kurtosis is

$\displaystyle \kappa_{u}=\frac{\textrm{E}(u_{t}^{4})}{\sigma^{4}}=\frac{3(1-\alpha_{1}^{2})}{1-3\alpha_{1}^{2}}$

which is clearly greater than 3 (the kurtosis value of the normal distribution). Moreover, the condition $ 3\alpha_{1}^{2}<1$ is required for the fourth moment and, consequently, the unconditional kurtosis to be finite.

Hence, the unconditional distribution of $ u_{t}$ is leptokurtic. That is to say, the ARCH(1) process has tails heavier than the normal distribution. This property makes the ARCH process attractive because the distributions of asset returns frequently display tails heavier than the normal distribution.
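For example, with $ \alpha_1=0.5$ the condition $ 3\alpha_1^2=0.75<1$ holds and $ \kappa_u=3(1-0.25)/(1-0.75)=9$, three times the Gaussian value, whereas for $ \alpha_1\geq 1/\sqrt{3}\approx 0.577$ the fourth moment, and hence the kurtosis, no longer exists.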

The quantlet XEGarch05 generates an ARCH(1) series with unconditional variance equal to 1 and obtains the basic descriptive statistics.

=========================================================
 Variable 1
=========================================================

 Mean             0.0675013
 Std.Error        0.987465      Variance         0.975087

 Minimum         -4.59634       Maximum          4.19141
 Range            8.78775

 Lowest cases                   Highest cases
        278:     -4.59634              49:      2.69931
        383:     -3.34884             442:      2.76556
        400:     -3.33363             399:      3.69674
        226:     -3.2339              279:      4.17015
         40:     -2.82524             287:      4.19141

 Median           0.0871746
 25% Quartile    -0.506585      75% Quartile     0.675945

 Skewness        -0.123027      Kurtosis         8.53126

 Observations                    500
 Distinct observations           500

 Total number of {-Inf,Inf,NaN}    0

=========================================================
XEGarch05.xpl

We can see in the corresponding output that the unconditional standard error is not far from one. However, we also observe a higher kurtosis and a wider range than we would expect from a standardised gaussian white noise model.
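
The quantlet itself is not reproduced here; the following Python sketch computes comparable descriptive statistics, reusing simulate_arch1 from above and assuming, for illustration only, the parameter values $ \alpha_0=\alpha_1=0.5$ (which give unit unconditional variance).

import numpy as np
from scipy import stats

y_sim, _ = simulate_arch1(0.5, 0.5, T=500, seed=1)

print("Mean     ", np.mean(y_sim))
print("Variance ", np.var(y_sim, ddof=1))
print("Range    ", np.ptp(y_sim))
print("Skewness ", stats.skew(y_sim))
print("Kurtosis ", stats.kurtosis(y_sim, fisher=False))   # about 3 for Gaussian data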


6.2.2 Estimation for ARCH(1) Process

The process $ \{u_t\}_{t=1}^T$ is generated by the ARCH(1) process described in equation (6.1), where $ T$ is the total sample size. Although the process defined by (6.1) has all observations conditionally normally distributed, the vector of observations is not jointly normal. Therefore, conditioning on an initial observation, the joint density function can be written as

$\displaystyle f(u) = \prod_{t=1}^T f(u_t\vert I_{t-1}).$ (6.4)

Using this result, and ignoring a constant factor, the log-likelihood function $ L(\alpha_0,\alpha_1)$ for a sample of size $ T$ is

$\displaystyle L(\alpha_0,\alpha_1) = \sum_{t=1}^T l_t$

where the conditional log-likelihood of the $ t$th observation for $ (\alpha_0,\alpha_1)$ is,
$\displaystyle l_t$ $\displaystyle =$ $\displaystyle -\frac{1}{2}\log(\sigma_t^2) -\frac{1}{2}\frac{u_t^2}{\sigma^2_t} = -\frac{1}{2}\log(\alpha_0+\alpha_1 u^2_{t-1}) -\frac{1}{2}\frac{u_t^2}{\alpha_0+\alpha_1 u^2_{t-1}}$ (6.5)

The first order conditions to obtain the maximum likelihood estimator are:

$\displaystyle \frac{\partial l_t}{\partial \alpha_0}$ $\displaystyle =$ $\displaystyle \frac{1}{2(\alpha_0 + \alpha_1 u^2_{t-1})} \left(\frac{u_t^2}{\alpha_0 + \alpha_1 u^2_{t-1}}-1\right)$
$\displaystyle \frac{\partial l_t}{\partial \alpha_1}$ $\displaystyle =$ $\displaystyle \frac{u^2_{t-1}}{2(\alpha_0 + \alpha_1 u^2_{t-1})} \left(\frac{u_t^2}{\alpha_0 + \alpha_1 u^2_{t-1}}-1\right)$ (6.6)

More generally, the partial derivative of $ L$ is:

$\displaystyle \frac{\partial L}{\partial \alpha} = \sum_t \frac{1}{2\sigma_t^2}\frac{\partial \sigma_t^2}{\partial \alpha} \left(\frac{u_t^2}{\sigma_t^2}-1 \right) = \sum_t \frac{1}{2\sigma_t^2}\, z_t \left(\frac{u_t^2}{\sigma_t^2}-1 \right)$ (6.7)

where $ z_t^\top = (1, u_{t-1}^2)$.
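
For illustration, the conditional log-likelihood (6.5) and the score (6.7) can be coded directly. The Python sketch below (function name ours) conditions on the first observation and drops the constant term, as in the text.

import numpy as np

def arch1_loglik_and_score(alpha, u):
    """Log-likelihood (up to a constant) and score of an ARCH(1) model,
    conditioning on u[0] as the initial observation."""
    alpha0, alpha1 = alpha
    sigma2 = alpha0 + alpha1 * u[:-1] ** 2                     # sigma_t^2 for t = 2,...,T
    ut2 = u[1:] ** 2
    loglik = -0.5 * np.sum(np.log(sigma2) + ut2 / sigma2)
    z = np.column_stack((np.ones(len(sigma2)), u[:-1] ** 2))   # z_t = (1, u_{t-1}^2)
    score = np.sum(z * ((ut2 / sigma2 - 1.0) / (2.0 * sigma2))[:, None], axis=0)
    return loglik, score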

6.2.2.0.1 Example

In the quantlet XEGarch06, we simulate an ARCH(1) process and plot the likelihood function of the $ \alpha_1$ parameter. Although the log-likelihood function depends on $ \alpha=(\alpha_0,\alpha_1)$, we have simplified it by imposing the restriction $ \hat\alpha_0=\hat\sigma^2(1-\hat\alpha_1)$, where $ \hat\sigma^2$ is an unconditional variance estimate.

Figure 6.5: Log-likelihood function of ARCH(1) simulated data. Vertical line marks the true parameter value
\includegraphics[width=0.59\defpicwidth]{archlike.ps}
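
A plot of this kind is easy to reproduce with the sketches above (the series y and the function arch1_loglik_and_score are our own, not those of XEGarch06); the likelihood is concentrated by imposing $ \hat\alpha_0=\hat\sigma^2(1-\hat\alpha_1)$, and the vertical line marks the value $ \alpha_1=0.5$ used in our simulation.

import numpy as np
import matplotlib.pyplot as plt

sigma2_hat = np.var(y)                       # unconditional variance estimate
grid = np.linspace(0.05, 0.95, 91)
ll = [arch1_loglik_and_score((sigma2_hat * (1 - a1), a1), y)[0] for a1 in grid]

plt.plot(grid, ll)
plt.axvline(0.5, linestyle="--")             # true alpha_1 of the simulated series
plt.xlabel("alpha_1")
plt.ylabel("concentrated log-likelihood")
plt.show()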

The ML estimators $ \hat\alpha=
(\hat\alpha_0,\hat\alpha_1)^\top $, under the usual assumptions, are asymptotically normal

$\displaystyle \sqrt{T}(\hat\alpha-\alpha)\rightarrow N(0,I_{\alpha\alpha}^{-1})$

where

$\displaystyle I_{\alpha\alpha} = -\textrm{E} \left[ \frac{\partial^2 l_t}{\partial \alpha\,\partial \alpha^\top} \right] = \left(\begin{matrix} I_{\alpha_0\alpha_0} & I_{\alpha_0\alpha_1} \\ I_{\alpha_1\alpha_0} & I_{\alpha_1\alpha_1} \end{matrix}\right)$

In practice, $ I_{\alpha\alpha}$ must be approximated.

The elements of the Hessian matrix are:

$\displaystyle \frac{\partial^2 l_t}{\partial \alpha^2_0}$ $\displaystyle =$ $\displaystyle \frac{-1}{2\sigma^4_t} \left(\frac{2u_t^2}{\sigma^2_t}-1\right)$
$\displaystyle \frac{\partial^2 l_t}{\partial \alpha^2_1}$ $\displaystyle =$ $\displaystyle \frac{-u^4_{t-1}}{2\sigma^4_t} \left(\frac{2u_t^2}{\sigma^2_t}-1\right)$
$\displaystyle \frac{\partial^2 l_t}{\partial \alpha_0\,\partial \alpha_1}$ $\displaystyle =$ $\displaystyle \frac{-u^2_{t-1}}{2\sigma^4_t} \left(\frac{2u_t^2}{\sigma^2_t}-1\right)$

The information matrix is simply the negative of the expectation of the Hessian, averaged over all observations, that is to say,

$\displaystyle I_{\alpha \alpha} = -\frac{1}{T}\sum_{t=1}^T\textrm{E}\left[ \frac{\partial^2 l_t} {\partial \alpha\partial \alpha^\top } \vert I_{t-1} \right].$ (6.8)

Taking into account (6.3), the conditional expectation of the last factor in each of these expressions equals 1. Hence, to calculate the unconditional expectation of the Hessian matrix and, therefore, the information matrix, we approximate it by the average over all conditional expectations. Then, $ I_{\alpha\alpha}$ is consistently estimated by

$\displaystyle \hat{I}_{\alpha_0\alpha_0}$ $\displaystyle =$ $\displaystyle \frac{1}{2T}\sum_{t=1}^T \frac{1}{\hat\sigma_t^4}$
$\displaystyle \hat{I}_{\alpha_1\alpha_1}$ $\displaystyle =$ $\displaystyle \frac{1}{2T}\sum_{t=1}^T \frac{u_{t-1}^4}{\hat\sigma_t^4}$
$\displaystyle \hat{I}_{\alpha_0\alpha_1}$ $\displaystyle =$ $\displaystyle \frac{1}{2T}\sum_{t=1}^T \frac{u_{t-1}^2}{\hat\sigma_t^4}$ (6.9)

or, more generally,

$\displaystyle {\hat I}_{\alpha\alpha} = \frac{1}{2T}\sum_{t=1}^T \frac{z_t z_t^\top }{\hat\sigma^4_t}$ (6.10)
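
Expression (6.10) is straightforward to compute. The Python sketch below (names ours) also derives asymptotic standard errors from $ \hat I_{\alpha\alpha}^{-1}/T$, which is one way to obtain t-ratios such as those reported later in this section.

import numpy as np

def arch1_information(alpha, u):
    """Estimated information matrix (6.10) and asymptotic standard errors."""
    alpha0, alpha1 = alpha
    sigma2 = alpha0 + alpha1 * u[:-1] ** 2
    z = np.column_stack((np.ones(len(sigma2)), u[:-1] ** 2))
    T = len(sigma2)
    I_hat = (z.T / sigma2 ** 2) @ z / (2.0 * T)          # (1/2T) sum z_t z_t' / sigma_t^4
    se = np.sqrt(np.diag(np.linalg.inv(I_hat)) / T)      # Var(alpha_hat) ~ I^{-1} / T
    return I_hat, se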

In practice, the maximum likelihood estimator is computed by numerical methods and, in particular, gradient methods are preferred for their simplicity. These are iterative methods: at each step, the likelihood is increased by searching a step forward along the gradient direction. This leads to the following iteration scheme, which computes $ \alpha^{(k+1)}$ from $ \alpha^{(k)}$ by

$\displaystyle \alpha^{(k+1)}= \alpha^{(k)} + \lambda^{(k)} \big({\hat I}_{\alpha\alpha}^{(k)}\big)^{-1}\left(\frac{\partial L}{\partial \alpha}\right)^{(k)}$ (6.11)

where the step length $ \lambda^{(k)}$ is usually obtained by a one-dimensional search (for details, see Berndt, Hall, Hall and Hausman; 1974).
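
The scoring iteration (6.11) can be sketched in a few lines of Python, reusing arch1_loglik_and_score and arch1_information from above. The step length is chosen here by simple step halving, a crude stand-in for the one-dimensional search mentioned above; a production implementation would need further safeguards.

import numpy as np

def fit_arch1(u, alpha_init=(0.1, 0.1), max_iter=200, tol=1e-8):
    """Maximum likelihood estimation of ARCH(1) by the scoring iteration (6.11)."""
    alpha = np.array(alpha_init, dtype=float)
    loglik, score = arch1_loglik_and_score(alpha, u)
    for _ in range(max_iter):
        I_hat, _ = arch1_information(alpha, u)
        direction = np.linalg.solve(I_hat, score)
        lam = 1.0 / len(u)                 # the score is a sum, the information an average
        while lam > 1e-12:                 # halve the step until the likelihood improves
            candidate = alpha + lam * direction
            if candidate[0] > 0 and 0.0 <= candidate[1] < 1.0:
                new_loglik, new_score = arch1_loglik_and_score(candidate, u)
                if new_loglik > loglik:
                    break
            lam /= 2.0
        else:                              # no improving step found: stop
            break
        converged = new_loglik - loglik < tol
        alpha, loglik, score = candidate, new_loglik, new_score
        if converged:
            break
    return alpha

alpha_hat = fit_arch1(y)                   # y is the series simulated earlier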

6.2.2.0.2 Example

In the following example we show the joint sample distribution of the parameter estimates. To that end, we plot the marginal kernel density estimates of both $ \hat\alpha_0$ and $ \hat\alpha_1$ and the corresponding scatter plot with the regression line drawn (kernel density estimation is a valuable tool for exploring the unknown density function of a continuous variable; see Silverman; 1989). Data are obtained by simulating and estimating $ 100$ time series of length $ 400$ from the same model $ (\alpha_0=0.5, \alpha_1=0.5)$.

Figure 6.6: Kernel density of parameter estimators: the top left hand panel shows $ \hat\alpha_0$, the bottom right hand panel $ \hat\alpha_1$, and the bottom left hand panel a scatter plot of the parameter estimators
\includegraphics[width=0.59\defpicwidth]{archest.ps}

XEGarch07.xpl

In figure 6.6 we can see that, as the asymptotic theory states, the joint sampling distribution of the estimators approaches a bivariate normal density with a small correlation between the two components.
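
An experiment of this kind can be sketched with the helper functions defined above; the replication scheme below (seeds and fitting routine are ours, not those of XEGarch07) mimics the design of 100 series of length 400 from the model $ \alpha_0=\alpha_1=0.5$.

import numpy as np

estimates = np.array([
    fit_arch1(simulate_arch1(0.5, 0.5, T=400, seed=rep)[0])
    for rep in range(100)
])
print("means of (alpha0_hat, alpha1_hat):", estimates.mean(axis=0))
print("correlation between the two estimators:", np.corrcoef(estimates.T)[0, 1])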

6.2.2.0.3 Example

We now illustrate the use of the function archest to estimate an ARCH(1) process. The data were simulated beforehand by the function genarch (yt=genarch(0.5|0.5,0,500)). The function archest allows us to estimate a general ARCH process (see section 6.6.1 for a complete description of this model). The result is a list with different information about the estimated model. For example, the first element contains the parameter estimates and the second element the standard errors of those estimates, from which we easily obtain the t-ratio values that show the statistical significance of the parameters.


Table 6.1: Estimates and t-ratio values from an ARCH(1) model

Parameter      True Value    Estimate    t-ratio
$ \alpha_0$    0.5           0.4986      10.034
$ \alpha_1$    0.5           0.5491      6.9348


In this example, we see that the estimated parameters agree with the theoretical values and have very high t-ratios. The third component of the list contains the likelihood and the fourth component the estimated volatility of the model. For example, we can plot the time series and add two lines representing twice the square root of the estimated volatility around the mean value of the time series, as shown in figure 6.7.

Figure 6.7: Simulated time series with the volatility bands estimated from the ARCH(1) model.
\includegraphics[width=0.75\defpicwidth]{archvol.ps}
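
A plot in the spirit of figure 6.7 can be sketched as follows, reusing the simulated series y and the estimate alpha_hat obtained above: the bands are the sample mean plus and minus twice the square root of the fitted conditional variance.

import numpy as np
import matplotlib.pyplot as plt

alpha0_hat, alpha1_hat = alpha_hat
sigma_hat = np.sqrt(alpha0_hat + alpha1_hat * y[:-1] ** 2)   # fitted sigma_t, t = 2,...,T

plt.plot(y[1:], lw=0.7)
plt.plot(y.mean() + 2 * sigma_hat, "r", lw=0.7)
plt.plot(y.mean() - 2 * sigma_hat, "r", lw=0.7)
plt.show()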