13.3 Multivariate GARCH models

The generalization of univariate GARCH models to the multivariate case is straightforward. For the error term $ \varepsilon_t$ of a $ d$-dimensional time series model we assume that the conditional mean is zero and the conditional covariance matrix is given by the positive definite $ (d \times d)$ matrix $ H_t$, i.e.,

$\displaystyle \varepsilon_t=H_t^{1/2} \xi_t$ (13.36)

with an i.i.d. innovation vector $ \xi_t$ whose mean is zero and whose covariance matrix is the identity matrix $ I_d$. As in the univariate case, $ H_t$ depends on the lagged error terms $ \varepsilon_{t-i}, \, i=1,\ldots,q,$ and on the lagged conditional covariance matrices $ H_{t-i}, \, i=1,\ldots,p$. As we will see shortly, the general case with arbitrary dependencies can lead to very complex structures that may be too difficult to handle in practice. One therefore often tries to reduce the dimension of the parameter space. In the following, we first discuss a general specification and then a popular restriction, the BEKK model. We also briefly sketch a model that assumes constant conditional correlations.
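
The factorization (13.36) can be illustrated numerically. The following minimal Python/NumPy sketch assumes standard normal innovations $ \xi_t$ (the text only requires i.i.d. innovations with mean zero and identity covariance) and uses the symmetric square root of $ H_t$; the function names are hypothetical.

```python
import numpy as np

def sym_sqrt(H):
    """Symmetric square root H^{1/2} of a positive definite matrix via eigh."""
    vals, vecs = np.linalg.eigh(H)
    return vecs @ np.diag(np.sqrt(vals)) @ vecs.T

def draw_errors(H, n, rng):
    """Draw n vectors eps_t = H^{1/2} xi_t, assuming xi_t ~ N(0, I_d)."""
    xi = rng.standard_normal((n, H.shape[0]))   # rows are independent xi_t
    return xi @ sym_sqrt(H).T                    # row t equals H^{1/2} xi_t

H = np.array([[2.0, 0.6], [0.6, 1.0]])
root = sym_sqrt(H)
print(np.allclose(root @ root, H))  # True

draws = draw_errors(H, 100_000, np.random.default_rng(0))
print(np.cov(draws, rowvar=False))  # sample covariance, close to H
```

Any matrix $ P$ with $ PP^\top = H_t$, for example the Cholesky factor, gives the same conditional covariance; the symmetric root is just one choice.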

13.3.1 The Vec Specification

Let $ \mathrm{vech}(\cdot)$ denote the operator that stacks the lower triangular part of a symmetric $ d \times d$ matrix into a vector of dimension $ d^*=d(d+1)/2$. Furthermore, we use the notation $ h_t = \mathrm{vech}(H_{t})$ and $ \eta_t = \mathrm{vech}(\varepsilon_t\varepsilon_t^\top)$. The Vec specification of a multivariate GARCH($ p,q$) model is then given by

$\displaystyle h_t = \omega +\sum_{i=1}^{q} A_{i} \eta_{t-i} + \sum_{j=1}^{p} B_{j} h_{t-j},$ (13.37)

where $ A_{i}$ and $ B_{j}$ are $ (d^* \times d^*)$ parameter matrices, each containing $ (d^*)^2$ parameters. The vector $ \omega$ collects the constant components of the variances and covariances and contains $ d^*$ parameters.
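
A minimal sketch of the vech operator and of one step of (13.37) with $ p=q=1$ may help fix the ordering of the elements. The function names and the parameter values below are hypothetical and chosen only for illustration.

```python
import numpy as np

def vech(M):
    """Stack the lower triangle of a symmetric matrix column by column (length d(d+1)/2)."""
    d = M.shape[0]
    return np.concatenate([M[j:, j] for j in range(d)])

def vec_garch11_step(omega, A, B, eps_prev, h_prev):
    """One step of (13.37) with p = q = 1: h_t = omega + A eta_{t-1} + B h_{t-1}."""
    eta_prev = vech(np.outer(eps_prev, eps_prev))
    return omega + A @ eta_prev + B @ h_prev

# Hypothetical bivariate example (d = 2, d* = 3) with diagonal A and B.
omega = np.array([0.1, 0.02, 0.1])
A = 0.05 * np.eye(3)
B = 0.90 * np.eye(3)
h_prev = vech(np.array([[1.0, 0.2], [0.2, 1.0]]))   # (h11, h12, h22) = (1.0, 0.2, 1.0)
h_t = vec_garch11_step(omega, A, B, np.array([1.0, -0.5]), h_prev)
print(h_t)  # h_t = (1.05, 0.175, 1.0125)
```

For $ d=2$ the vech ordering is $ (h_{11}, h_{12}, h_{22})$, which matches the bivariate system written out below.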

For the bivariate case and $ p=q=1$ we can write the model explicitly as

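$\displaystyle \left(\begin{array}{c} h_{11,t} \\ h_{12,t} \\ h_{22,t}
\end{array}\right) = \left(\begin{array}{c} \omega_{1} \\ \omega_{2} \\ \omega_{3}
\end{array}\right) + \left(\begin{array}{ccc}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{array}\right) \left(\begin{array}{c} \varepsilon_{1,t-1}^2 \\
\varepsilon_{1,t-1}\varepsilon_{2,t-1} \\
\varepsilon_{2,t-1}^2
\end{array}\right)
$

$\displaystyle +\left(\begin{array}{ccc}
b_{11} & b_{12} & b_{13} \\
b_{21} & b_{22} & b_{23} \\
b_{31} & b_{32} & b_{33}
\end{array}\right) \left(\begin{array}{c} h_{11,t-1} \\ h_{12,t-1} \\ h_{22,t-1}
\end{array}\right)
$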

By rearranging terms, we can write the second order process $ \eta_t$ as a vector autoregressive moving average (VARMA) process of order ( $ \max(p,q),p$),

$\displaystyle \eta_t = \omega + \sum_{i=1}^{\max(p,q)} (A_i + B_i) \eta_{t-i} - \sum_{j=1}^p B_j u_{t-j} + u_t,$ (13.38)

where $ u_t = \eta_t - h_t$ is a vector white noise process, i.e., $ \mathop{\text{\rm\sf E}}[u_t]=0$, $ \mathop{\text{\rm\sf E}}[u_t u_t^\top ]=\Sigma_u$ and $ \mathop{\text{\rm\sf E}}[u_t u_s^\top ]=0$ for $ s \neq t$. In (13.38) we set $ A_{q+1}=\ldots=A_p=0$ if $ p>q$ and $ B_{p+1}=\ldots=B_q=0$ if $ q>p$. The VARMA representation of multivariate GARCH models often simplifies the derivation of their stochastic properties, since one can draw on known results from the VARMA literature.
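
Spelled out, the rearrangement behind (13.38) uses $ \eta_t = h_t + u_t$ and substitutes $ h_{t-j} = \eta_{t-j} - u_{t-j}$ into (13.37):

$\displaystyle \eta_t = \omega + \sum_{i=1}^{q} A_i \eta_{t-i} + \sum_{j=1}^{p} B_j (\eta_{t-j} - u_{t-j}) + u_t = \omega + \sum_{i=1}^{\max(p,q)} (A_i + B_i) \eta_{t-i} - \sum_{j=1}^p B_j u_{t-j} + u_t,
$

where the last step pads the shorter of the two coefficient sequences with zero matrices.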

In the Vec representation (13.37), the multivariate GARCH($ p,q$) process $ \varepsilon_t$ is covariance stationary if and only if all eigenvalues of the matrix

$\displaystyle \sum_{i=1}^{\max(p,q)} (A_i + B_i)$

are smaller than one in modulus, see Engle and Kroner (1995). In that case, the unconditional covariance matrix is given by

$\displaystyle \sigma = \mathrm{vech}(\Sigma) = \left(I_{d^*} - \sum_{i=1}^{\max(p,q)} (A_i + B_i) \right)^{-1} \omega.$ (13.39)
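
The stationarity condition and (13.39) translate directly into a few lines of NumPy. This is a sketch with hypothetical function names; passing the lags as lists lets $ p \neq q$, because summing the lists is equivalent to the zero padding described above.

```python
import numpy as np

def is_covariance_stationary(A_list, B_list):
    """Eigenvalue condition: all eigenvalues of sum_i (A_i + B_i) smaller than one in modulus."""
    S = sum(A_list) + sum(B_list)
    return bool(np.max(np.abs(np.linalg.eigvals(S))) < 1.0)

def unconditional_vech_cov(omega, A_list, B_list):
    """sigma = vech(Sigma) = (I - sum_i (A_i + B_i))^{-1} omega, eq. (13.39)."""
    S = sum(A_list) + sum(B_list)
    return np.linalg.solve(np.eye(len(omega)) - S, omega)

# Hypothetical GARCH(1,1) example with d = 2 (so d* = 3).
omega = np.array([0.1, 0.02, 0.1])
A, B = 0.05 * np.eye(3), 0.90 * np.eye(3)
print(is_covariance_stationary([A], [B]))      # True
print(unconditional_vech_cov(omega, [A], [B]))  # approximately (2.0, 0.4, 2.0)
```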

To illustrate the prediction of volatility, consider the frequently used GARCH(1,1) model. The optimal prediction with respect to the mean squared prediction error is the conditional expectation of volatility. By the law of iterated expectations, the $ k$-step prediction of $ \eta_{t+k}$ is identical to the $ k$-step prediction of $ h_{t+k}$, that is,

$\displaystyle \mathop{\text{\rm\sf E}}[\eta_{t+k} \mid {\cal F}_{t}] = \mathop{\text{\rm\sf E}}[\mathop{\text{\rm\sf E}}(\eta_{t+k} \mid {\cal F}_{t+k-1})\mid {\cal F}_{t}] = \mathop{\text{\rm\sf E}}[h_{t+k} \mid {\cal F}_{t}].
$

Having information up to time $ t$ and writing $ A=A_1$ and $ B=B_1$, the predictions for the next three time periods are given by

$\displaystyle \begin{aligned}
\mathop{\text{\rm\sf E}}[\eta_{t+1} \mid {\cal F}_{t}] &= h_{t+1}, \\
\mathop{\text{\rm\sf E}}[\eta_{t+2} \mid {\cal F}_{t}] &= \omega + (A+B)h_{t+1}, \\
\mathop{\text{\rm\sf E}}[\eta_{t+3} \mid {\cal F}_{t}] &= (I_{d^*}+A+B)\omega + (A+B)^2 h_{t+1},
\end{aligned}$

and it can be seen that in general, the $ k$-step prediction with $ k \ge 2$ is given by

$\displaystyle \mathop{\text{\rm\sf E}}[\eta_{t+k} \mid {\cal F}_{t}] =
\left\{I_{d^*}+(A+B)+\ldots+(A+B)^{k-2}\right\}\omega +
(A+B)^{k-1} h_{t+1}.
$

As $ k \to \infty$, this converges to $ \sigma = \mathrm{vech}(\Sigma) = (I_{d^*}-A-B)^{-1}\omega$, the vectorized unconditional covariance matrix, if and only if the process is covariance stationary.
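
The recursion behind these predictions can be sketched as follows. The function name and the parameter values are hypothetical; the loop iterates $ f_{j+1} = \omega + (A+B) f_j$ from $ f_1 = h_{t+1}$, which reproduces the closed-form $ k$-step prediction above.

```python
import numpy as np

def vec_garch11_forecast(omega, A, B, h_next, k):
    """k-step prediction E[eta_{t+k} | F_t] of a Vec GARCH(1,1), given h_{t+1}.

    Iterating f_1 = h_{t+1}, f_{j+1} = omega + (A + B) f_j reproduces
    {I + (A+B) + ... + (A+B)^{k-2}} omega + (A+B)^{k-1} h_{t+1}.
    """
    M = A + B
    f = np.asarray(h_next, dtype=float)
    for _ in range(k - 1):
        f = omega + M @ f
    return f

omega = np.array([0.1, 0.02, 0.1])
A, B = 0.05 * np.eye(3), 0.90 * np.eye(3)
h_next = np.array([1.0, 0.2, 1.0])
print(vec_garch11_forecast(omega, A, B, h_next, 2))  # (1.05, 0.21, 1.05)
sigma = np.linalg.solve(np.eye(3) - A - B, omega)
print(np.allclose(vec_garch11_forecast(omega, A, B, h_next, 2000), sigma))  # True
```

For long horizons the prediction approaches $ \sigma$, as stated above for a covariance stationary process.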

In the bivariate case ($ d=2$) with $ p=q=1$, there are already 21 parameters that characterize the dynamics of volatility. To obtain a feasible model for empirical work, one often imposes restrictions on the parameter matrices of the Vec model. Bollerslev et al. (1988) proposed diagonal parameter matrices, so that the conditional variance of each variable depends only on lagged squared values of the same variable (and on its own lagged conditional variance), and the conditional covariance between two variables depends only on lagged values of their cross-products. This substantially reduces the number of parameters (in the above case from 21 to 9), but excludes potentially important causalities between the variables.
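
The parameter counts follow from $ d^*$ constants in $ \omega$ plus $ (d^*)^2$ (unrestricted) or $ d^*$ (diagonal) free entries per matrix $ A_i$, $ B_j$. A small sketch with hypothetical function names shows how quickly the unrestricted model grows with $ d$:

```python
def vec_param_count(d, p=1, q=1):
    """Unrestricted Vec GARCH(p,q): d* constants plus (d*)^2 entries per A_i and B_j."""
    d_star = d * (d + 1) // 2
    return d_star + (p + q) * d_star ** 2

def diagonal_vec_param_count(d, p=1, q=1):
    """Diagonal Vec model: only the d* diagonal entries of each A_i and B_j are free."""
    d_star = d * (d + 1) // 2
    return d_star + (p + q) * d_star

for d in (2, 3, 5):
    print(d, vec_param_count(d), diagonal_vec_param_count(d))
# 2 21 9
# 3 78 18
# 5 465 45
```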

The quasi maximum likelihood (QML) method is suitable for parameter estimation. The conditional log-likelihood function for a sample of $ n$ observations is given by $ \log L = \sum_{t=1}^n l_t$ with

$\displaystyle l_t = -\frac{d}{2} \log(2 \pi) - \frac{1}{2} \log \{\det(H_{t})\} - \frac{1}{2} \varepsilon_{t}^\top H_{t}^{-1} \varepsilon_{t}.$ (13.40)
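
As a sketch, the contribution $ l_t$ in (13.40) can be evaluated with a log-determinant and a linear solve instead of an explicit inverse, which is numerically more stable. The function names are hypothetical.

```python
import numpy as np

def loglik_contribution(eps, H):
    """Gaussian (quasi) log-likelihood contribution l_t of eq. (13.40)."""
    eps = np.asarray(eps, dtype=float)
    d = eps.shape[0]
    sign, logdet = np.linalg.slogdet(H)
    if sign <= 0:
        raise ValueError("H_t must be positive definite")
    quad = eps @ np.linalg.solve(H, eps)     # eps' H^{-1} eps without forming the inverse
    return -0.5 * d * np.log(2 * np.pi) - 0.5 * logdet - 0.5 * quad

def loglik(eps_series, H_series):
    """log L = sum_t l_t for aligned sequences of residuals and covariance matrices."""
    return sum(loglik_contribution(e, H) for e, H in zip(eps_series, H_series))

print(loglik_contribution([0.0, 0.0], np.eye(2)))  # equals -log(2*pi), about -1.8379
```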

If the conditional distribution of $ \varepsilon_t$ is not normal, then (13.40) is interpreted as a quasi-likelihood function, which serves merely as the objective function in the numerical optimization and says nothing about the true distribution. In the multivariate case, the QML estimator is consistent and asymptotically normal under the main assumptions that the process is strictly stationary and ergodic with finite eighth moments. Writing all parameters in one vector $ \theta$, we obtain the following standard result:

$\displaystyle \sqrt{n}(\hat{\theta}-\theta) \stackrel{\cal{L}}{\rightarrow} \mathrm{N}(0,J^{-1}I J^{-1}),$ (13.41)

where $ I$ is the expected outer product of the score vector (i.e., of the vector $ \partial l_t/\partial \theta$), and $ J$ is the negative expected Hessian (i.e., of the matrix of second derivatives $ \partial^2 l_t/\partial\theta\,\partial\theta^\top$). If the innovations are normally distributed, then $ I=J$ and the asymptotic distribution simplifies to

$\displaystyle \sqrt{n}(\hat{\theta}-\theta) \stackrel{\cal{L}}{\rightarrow} \mathrm{N}(0,J^{-1}).$ (13.42)

These results are thus completely analogous to the univariate case, but the analytical expressions for $ I$ and $ J$ become much more complicated. One can also compute $ I$ and $ J$ numerically, but in the multivariate case this can lead to unreliable results, especially for $ J$.
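
Once per-observation scores and Hessians are available (analytically or numerically), the covariance in (13.41) is a "sandwich" of sample averages. The sketch below uses hypothetical function names; its toy example fits a normal mean with the variance fixed at one while the data have variance four, so the likelihood is misspecified and $ I \neq J$.

```python
import numpy as np

def sandwich_covariance(scores, hessians):
    """Estimate J^{-1} I J^{-1} from per-observation scores and Hessians at theta_hat.

    scores:   (n, m) array, row t is dl_t/dtheta
    hessians: (n, m, m) array, entry t is d^2 l_t / dtheta dtheta'
    """
    n = scores.shape[0]
    I_hat = scores.T @ scores / n          # average outer product of the scores
    J_hat = -hessians.mean(axis=0)         # negative average Hessian
    J_inv = np.linalg.inv(J_hat)
    return J_inv @ I_hat @ J_inv

def standard_errors(scores, hessians):
    """Approximate standard errors of theta_hat: sqrt(diag(J^{-1} I J^{-1}) / n)."""
    n = scores.shape[0]
    return np.sqrt(np.diag(sandwich_covariance(scores, hessians)) / n)

# Toy QML example: l_t = -0.5 log(2 pi) - 0.5 (x_t - mu)^2, but x_t has variance 4.
rng = np.random.default_rng(1)
x = rng.normal(0.0, 2.0, size=10_000)
mu_hat = x.mean()
scores = (x - mu_hat)[:, None]            # dl_t/dmu
hessians = -np.ones((x.size, 1, 1))       # d^2 l_t/dmu^2 = -1
print(sandwich_covariance(scores, hessians))  # close to 4, whereas J^{-1} = 1
```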

In empirical work, one often finds that the estimated standardized residuals are not normally distributed. In this case the Gaussian likelihood function is misspecified, and QML yields parameter estimators that are consistent but not efficient. Alternatively, one can assume that the true innovation distribution is some specific non-normal parametric distribution; in general, however, this does not guarantee consistent parameter estimates if the assumption is wrong.

13.3.2 The BEKK Specification

Engle and Kroner (1995) discuss the following specification of a multivariate GARCH model.

$\displaystyle H_{t} = C_{0}C_{0}^\top + \sum_{k=1}^{K} \sum_{i=1}^{q} A_{ki}^\top \varepsilon_{t-i}\varepsilon_{t-i}^\top A_{ki} + \sum_{k=1}^{K} \sum_{j=1}^{p} B_{kj}^\top H_{t-j} B_{kj}.$ (13.43)

In (13.43), $ C_{0}$ is a lower triangular matrix, and $ A_{ki}$ and $ B_{kj}$ are $ d \times d$ parameter matrices. For example, in the bivariate case with $ K=1$, $ p=0$ and $ q=1$, and with $ c_{ij}$ and $ a_{ij}$ denoting the elements of $ C_0$ and $ A_{11}$, the conditional variance of $ \varepsilon_{1t}$ can be written as

$\displaystyle h_{11,t}=c_{11}^2 + a_{11}^2\varepsilon_{1,t-1}^2 +
a_{21}^2\varepsilon_{2,t-1}^2+2a_{11}a_{21}\varepsilon_{1,t-1}\varepsilon_{2,t-1}
$

and the conditional covariance as

$\displaystyle h_{12,t}=c_{11}c_{21} + a_{11}a_{12}\varepsilon_{1,t-1}^2 +
a_{21}a_{22}\varepsilon_{2,t-1}^2+(a_{11}a_{22}+a_{12}a_{21})\varepsilon_{1,t-1}\varepsilon_{2,t-1}.
$
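
A minimal sketch of one step of (13.43) with $ K=1$ and $ p=q=1$ is given below. The function name and all parameter values are hypothetical; the example also checks that the resulting $ H_t$ is symmetric and positive definite, as the quadratic form suggests.

```python
import numpy as np

def bekk11_step(C0, A, B, eps_prev, H_prev):
    """BEKK recursion (13.43) with K = 1 and p = q = 1:
    H_t = C0 C0' + A' eps_{t-1} eps_{t-1}' A + B' H_{t-1} B."""
    e = np.asarray(eps_prev, dtype=float)
    return C0 @ C0.T + A.T @ np.outer(e, e) @ A + B.T @ H_prev @ B

# Hypothetical parameter values, chosen only for illustration.
C0 = np.array([[0.3, 0.0], [0.1, 0.2]])      # lower triangular
A = np.array([[0.30, 0.05], [-0.04, 0.25]])
B = np.array([[0.90, 0.02], [0.01, 0.92]])
H_prev = np.array([[1.0, 0.2], [0.2, 1.0]])
H_t = bekk11_step(C0, A, B, [1.0, -0.5], H_prev)
print(np.allclose(H_t, H_t.T), np.linalg.eigvalsh(H_t).min() > 0)  # True True
```

Setting $ B$ to the zero matrix gives the pure ARCH case $ p=0$, $ q=1$ used in the bivariate example.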

The so-called BEKK specification (13.43) guarantees under weak assumptions that $ H_t$ is positive definite. A sufficient condition is, for example, that at least one of the matrices $ C_0$ or $ B_{kj}$ has full rank and the initial matrices $ H_0, \ldots, H_{1-p}$ are positive definite. The BEKK model allows the conditional variances of one variable to depend on the lagged values of another variable, so that causalities in variances can be modelled. If the parameter matrices $ A_{ki}$ and $ B_{kj}$ are diagonal, the BEKK model is a restricted version of the Vec model with diagonal matrices.

Because of its quadratic form, the BEKK model is not identified without further restrictions. However, simple sign restrictions suffice for identification. For example, in the frequently used model with $ K=1$ and $ p=q=1$, it suffices to assume that the upper left elements of $ A_{11}$ and $ B_{11}$ are positive. Compared with the Vec model, the number of parameters is typically much smaller; in the case mentioned above, it drops from 21 to 11.
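
The identification problem and the parameter count can both be checked in a few lines. In this sketch (hypothetical names and values), replacing $ A$ by $ -A$ leaves the quadratic term unchanged, which is why a sign restriction is needed; the count assumes a lower triangular $ C_0$ and full $ d \times d$ matrices $ A_{ki}$, $ B_{kj}$.

```python
import numpy as np

def bekk_arch_term(A, eps):
    """Quadratic term A' eps eps' A of the BEKK model."""
    return A.T @ np.outer(eps, eps) @ A

def bekk_param_count(d, K=1, p=1, q=1):
    """Free parameters: d(d+1)/2 in the lower triangular C0, d^2 in each A_ki and B_kj."""
    return d * (d + 1) // 2 + K * (p + q) * d * d

A = np.array([[0.30, 0.05], [-0.04, 0.25]])   # hypothetical values
eps = np.array([0.8, -1.2])
print(np.allclose(bekk_arch_term(A, eps), bekk_arch_term(-A, eps)))  # True
print(bekk_param_count(2))                                          # 11
```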

For each BEKK model there is an equivalent Vec representation, but not vice versa, so the BEKK model is a special case of the Vec model. To see this, apply the vech operator to both sides of (13.43) and define $ \omega = L_d(C_0 \otimes C_0) D_d\,\mathrm{vech}(I_d)$, $ A_i = \sum_{k=1}^K L_d (A_{ki} \otimes A_{ki})^\top D_d$, and $ B_j = \sum_{k=1}^K L_d(B_{kj} \otimes B_{kj})^\top D_d$. Here $ \otimes$ denotes the Kronecker product, and $ L_d$ and $ D_d$ are the elimination and duplication matrices, which satisfy $ \mathrm{vech}(M) = L_d\,\mathrm{vec}(M)$ and $ \mathrm{vec}(M) = D_d\,\mathrm{vech}(M)$ for symmetric $ M$, where $ \mathrm{vec}(\cdot)$ stacks the columns of a matrix. The stochastic properties of the BEKK model can therefore be derived from those of the Vec model. In empirical work the BEKK model is preferable, because it is much easier to estimate while being sufficiently general.
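
The mapping can be checked numerically for $ K=1$ and $ p=q=1$. The sketch below builds $ L_d$ and $ D_d$ explicitly, obtains the Vec parameters by applying the identity $ \mathrm{vec}(PXQ) = (Q^\top \otimes P)\,\mathrm{vec}(X)$ to each term of (13.43), and confirms that one Vec step reproduces the vech of one BEKK step. Function names are hypothetical and the parameter values are random.

```python
import numpy as np

def vech_pairs(d):
    """(row, col) index pairs of the lower triangle in vech order (column by column)."""
    return [(i, j) for j in range(d) for i in range(j, d)]

def vec(M):
    """Stack the columns of M."""
    return M.reshape(-1, order="F")

def elimination_matrix(d):
    """L_d with vech(M) = L_d vec(M)."""
    pairs = vech_pairs(d)
    L = np.zeros((len(pairs), d * d))
    for r, (i, j) in enumerate(pairs):
        L[r, j * d + i] = 1.0          # element (i, j) sits at position j*d + i of vec(M)
    return L

def duplication_matrix(d):
    """D_d with vec(M) = D_d vech(M) for symmetric M."""
    pairs = vech_pairs(d)
    D = np.zeros((d * d, len(pairs)))
    for c, (i, j) in enumerate(pairs):
        D[j * d + i, c] = 1.0
        D[i * d + j, c] = 1.0
    return D

def bekk11_to_vec(C0, A, B):
    """Vec parameters (omega, A_1, B_1) of a BEKK(1,1) model with K = 1."""
    d = C0.shape[0]
    L, D = elimination_matrix(d), duplication_matrix(d)
    omega = L @ np.kron(C0, C0) @ D @ (L @ vec(np.eye(d)))
    return omega, L @ np.kron(A, A).T @ D, L @ np.kron(B, B).T @ D

rng = np.random.default_rng(2)
C0 = np.tril(rng.normal(size=(2, 2)))
A, B = 0.3 * rng.normal(size=(2, 2)), 0.3 * rng.normal(size=(2, 2))
eps, H_prev = rng.normal(size=2), np.array([[1.0, 0.3], [0.3, 2.0]])
H_t = C0 @ C0.T + A.T @ np.outer(eps, eps) @ A + B.T @ H_prev @ B   # BEKK step

omega, A1, B1 = bekk11_to_vec(C0, A, B)
L = elimination_matrix(2)
h_t = omega + A1 @ (L @ vec(np.outer(eps, eps))) + B1 @ (L @ vec(H_prev))   # Vec step
print(np.allclose(h_t, L @ vec(H_t)))  # True
```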

13.3.3 The CCC model

Bollerslev (1990) suggested a multivariate GARCH model in which all conditional correlations are constant and the conditional variances are modelled by univariate GARCH models. This so-called CCC model (constant conditional correlation) is not a special case of the Vec model but belongs to another, nonlinear model class. For example, the CCC(1,1) model is given by

$\displaystyle h_{ii,t}=\omega_{i}+\alpha_i \varepsilon_{i,t-1}^2 + \beta_i
h_{ii,t-1},
$

$\displaystyle h_{ij,t}= \rho_{ij}\sqrt{h_{ii,t}h_{jj,t}}
$

for $ i,j=1,\ldots,d$, where $ \rho_{ij}$ is the constant correlation between $ \varepsilon_{it}$ and $ \varepsilon_{jt}$, which can be estimated separately from the conditional variances. The advantage of the CCC model is that it can be applied without restriction to large systems of time series. On the other hand, the assumption of constant correlation may be quite restrictive. For example, in empirical analyses of financial markets one typically observes increasing correlations in times of crisis or during crashes.
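
A sketch of one CCC(1,1) step: each variance follows its own univariate GARCH(1,1) recursion, and the covariances are obtained by rescaling a constant correlation matrix $ R$ with $ \rho_{ii}=1$. The function name and parameter values are hypothetical. If $ R$ is positive definite and all $ h_{ii,t} > 0$, the resulting $ H_t$ is positive definite.

```python
import numpy as np

def ccc11_step(omega, alpha, beta, R, eps_prev, h_prev):
    """One step of the CCC(1,1) model.

    The diagonal h_{ii,t} follow univariate GARCH(1,1) recursions, and
    h_{ij,t} = rho_ij sqrt(h_{ii,t} h_{jj,t}) with a constant correlation matrix R.
    """
    h = omega + alpha * eps_prev**2 + beta * h_prev   # elementwise over i = 1, ..., d
    sd = np.sqrt(h)
    return h, R * np.outer(sd, sd)

# Hypothetical parameters for d = 2.
omega, alpha, beta = np.array([0.05, 0.02]), np.array([0.08, 0.10]), np.array([0.90, 0.85])
R = np.array([[1.0, 0.4], [0.4, 1.0]])
h, H = ccc11_step(omega, alpha, beta, R, np.array([1.0, -2.0]), np.array([1.0, 0.5]))
print(h)  # variances (1.03, 0.845)
print(H)
```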

13.3.4 An empirical illustration

We consider a bivariate exchange rate example with two European currencies, DEM and GBP, each quoted against the US dollar. The sample period is 01/01/1980 to 04/01/1994, with altogether $ n = 3720$ observations. Figure 12.9 shows the time series of returns on both exchange rates. Table 12.4 provides some simple descriptive statistics of the returns $ \varepsilon_t$. The empirical means of both series are close to zero.


Table: Descriptive statistics of the exchange rate returns. SFEmvol01.xpl
           Min.     Max.    Mean         Median   Std.Error
DEM/USD   -0.040    0.032   -4.718e-06   0        0.0071
GBP/USD   -0.047    0.039    0.000110    0        0.0070


Fig.: Exchange rate returns. SFEmvol01.xpl
\includegraphics[width=1\defpicwidth]{mvolmfxrate.ps}

As Figure 12.9 shows, the exchange rate returns follow a pattern that resembles a GARCH process: volatility clusters in both series, and the clusters tend to occur simultaneously. This motivates the application of a bivariate GARCH model.

A first, simple method to estimate the parameters of a BEKK model is the BHHH algorithm. This algorithm uses the first derivatives of the QML likelihood with respect to the 11 parameters contained in $ C_0, \, A_{11}$ and $ B_{11}$; recall equation (13.43). Since it is an iterative procedure, the BHHH algorithm needs suitable initial parameters. For the diagonal elements of the matrices $ A_{11}$ and $ B_{11}$, values between 0.3 and 0.9 are sensible, because this is the range often obtained in estimations. For the off-diagonal elements there is no rule of thumb, so one can try different starting values or simply set them to zero. The starting values for $ C_0$ can then be derived from those for $ A_{11}$ and $ B_{11}$ by matching the theoretical unconditional covariance matrix to the sample covariance matrix.
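
Taking unconditional expectations in (13.43) for a stationary BEKK(1,1) model with $ K=1$ gives $ \Sigma = C_0C_0^\top + A_{11}^\top \Sigma A_{11} + B_{11}^\top \Sigma B_{11}$. A possible way to obtain a lower triangular starting value for $ C_0$ is to replace $ \Sigma$ by the sample covariance matrix and take a Cholesky factor. This is a sketch under those assumptions; the function name, the starting matrices and the sample covariance below are hypothetical.

```python
import numpy as np

def bekk11_starting_C0(sample_cov, A, B):
    """Starting value for C0 in a BEKK(1,1) model with K = 1.

    Solves Sigma = C0 C0' + A' Sigma A + B' Sigma B for C0 C0', with Sigma
    replaced by the sample covariance, and returns the lower triangular
    Cholesky factor. np.linalg.cholesky raises LinAlgError if the implied
    C0 C0' is not positive definite, i.e. if A and B are not admissible.
    """
    S = np.asarray(sample_cov, dtype=float)
    M = S - A.T @ S @ A - B.T @ S @ B
    return np.linalg.cholesky(M)

# Hypothetical starting values: diagonal A and B, off-diagonal elements zero.
A0, B0 = np.diag([0.3, 0.3]), np.diag([0.9, 0.9])
S = np.array([[5.0e-5, 2.0e-5], [2.0e-5, 4.9e-5]])   # illustrative sample covariance
C0_start = bekk11_starting_C0(S, A0, B0)
print(np.allclose(C0_start @ C0_start.T, 0.1 * S))   # True, since 1 - 0.3**2 - 0.9**2 = 0.1
```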

For the bivariate exchange rate example, we obtain the following estimates:

$\displaystyle \hat{\theta}=\left(\begin{array}{r}
0.00115 \\
0.00031 \\
0.000...
...5\\
0.29344 \\
0.93878 \\
0.02512 \\
0.02750 \\
0.93910
\end{array}\right)$

$\displaystyle -\log L(\hat{\theta}) = -28599 $

SFEmvol02.xpl

The value $ -28599$ is the computed minimum of the negative log-likelihood function. The first three components of the displayed vector are the parameters in $ C_0$, the next four are the parameters in $ A_{11}$, and the last four are the parameters in $ B_{11}$.

In this example we thus obtain as estimated parameters of the BEKK model:

$\displaystyle C_0 = 10^{-3} \begin{pmatrix}1.15&0.31\\ 0.00&0.76\end{pmatrix},$    
$\displaystyle A_{11}= \begin{pmatrix}0.282&-0.050\\ -0.057&\ldots\end{pmatrix},\, B_{11}= \begin{pmatrix}0.939&0.028\\ 0.025&0.939\end{pmatrix}.$ (13.44)

Estimates of the conditional covariances are obtained by successively applying the difference equation (13.43), where the empirical covariance matrix

$\displaystyle \hat{H}_0=\frac{1}{n}\sum_{t=1}^n\varepsilon_t\varepsilon_t^\top $

of the observations $ \varepsilon_t$ is taken as initial value.
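
This filtering step, together with the negative log-likelihood built from (13.40), can be sketched as follows. It is a Python illustration, not a transcription of the quantlet SFEmvol02.xpl. As an assumption, the conditional covariance of the first observation is set to $ \hat{H}_0$, and later matrices follow (13.43) with $ K=1$, $ p=q=1$; function names, parameter values and the placeholder data are hypothetical.

```python
import numpy as np

def bekk11_filter(eps, C0, A, B):
    """Conditional covariances of a BEKK(1,1) model (K = 1) for returns eps (n x d).

    Assumption: H[0] is the empirical covariance (1/n) sum_t eps_t eps_t', and
    H[t] = C0 C0' + A' eps[t-1] eps[t-1]' A + B' H[t-1] B for t = 1, ..., n-1.
    """
    eps = np.asarray(eps, dtype=float)
    n, d = eps.shape
    H = np.empty((n, d, d))
    H[0] = eps.T @ eps / n
    const = C0 @ C0.T
    for t in range(1, n):
        H[t] = const + A.T @ np.outer(eps[t - 1], eps[t - 1]) @ A + B.T @ H[t - 1] @ B
    return H

def bekk11_neg_loglik(eps, C0, A, B):
    """Negative Gaussian log-likelihood -sum_t l_t, with l_t as in (13.40)."""
    eps = np.asarray(eps, dtype=float)
    d = eps.shape[1]
    total = 0.0
    for e, Ht in zip(eps, bekk11_filter(eps, C0, A, B)):
        _, logdet = np.linalg.slogdet(Ht)
        total += 0.5 * (d * np.log(2 * np.pi) + logdet + e @ np.linalg.solve(Ht, e))
    return total

rng = np.random.default_rng(3)
eps = 0.007 * rng.standard_normal((500, 2))      # placeholder data, not the DEM/GBP series
C0 = 1e-3 * np.array([[1.0, 0.0], [0.3, 0.8]])
A = np.array([[0.25, 0.05], [0.05, 0.25]])
B = np.array([[0.90, 0.02], [0.02, 0.90]])
H = bekk11_filter(eps, C0, A, B)
print(H.shape, bekk11_neg_loglik(eps, C0, A, B))
```

Passing `bekk11_neg_loglik` to a numerical optimizer over the 11 free parameters is one way to obtain QML estimates; the text itself uses the BHHH algorithm for this step.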

Figure 12.10 compares the estimated conditional variance and covariance processes. The upper and lower plots show the variances of the DEM/USD and GBP/USD returns, and the middle plot shows the estimated conditional covariance process. Apart from a very short period at the beginning of the sample, the covariance is positive and of non-negligible magnitude.

Fig.: Estimated variance and covariance processes, $ 10^5\hat{H_t}$. SFEmvol02.xpl
\includegraphics[width=1.3\defpicwidth]{mvolmcovar.ps}

Fig.: Simulated variance and covariance processes with a bivariate (blue) and two univariate (green) GARCH processes, $ 10^5\hat{H_t}$. SFEmvol03.xpl
\includegraphics[width=1.4\defpicwidth]{mvolmcovarsimul.ps}

This confirms our intuition of mutual dependence between exchange markets, which motivated the use of the bivariate GARCH model.

SFEmvol03.xpl

The estimated parameters can also be used to simulate volatility. At every time step, one draws a realization from a multivariate normal distribution with mean zero and covariance matrix $ \hat{H}_t$ and uses it to update $ \hat{H}_t$ according to equation (13.43). Next, a new realization is drawn from N$ (0,\hat{H}_{t+1})$, and so on. We apply this method with $ n=3000$. The simulation results in Figure 12.11 show patterns similar to those of the original process (Figure 12.10). For a further comparison, we include two independent univariate GARCH processes fitted to the two exchange rate return series; this corresponds to a bivariate Vec representation with diagonal parameter matrices. Both methods clearly capture the clustering of volatilities. However, the more general bivariate model also captures spillover effects, that is, increased uncertainty in one return series due to increased volatility in the other. This has an important impact on the amplitude of volatility.
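
The simulation scheme just described can be sketched in a few lines. The parameter values below are hypothetical and are not the estimates in (13.44); as a stationarity check we use the spectral radius of $ A\otimes A + B\otimes B$, in the spirit of the Vec condition above. Draws from N$ (0,H_t)$ are generated via the Cholesky factor of $ H_t$.

```python
import numpy as np

def simulate_bekk11(C0, A, B, H0, n, rng):
    """Simulate n observations of a BEKK(1,1) process (K = 1) with Gaussian innovations.

    At each step eps_t ~ N(0, H_t) is drawn via the Cholesky factor of H_t, and
    H_{t+1} = C0 C0' + A' eps_t eps_t' A + B' H_t B is updated as in (13.43).
    """
    d = H0.shape[0]
    eps, H = np.empty((n, d)), np.empty((n, d, d))
    Ht = np.array(H0, dtype=float)
    for t in range(n):
        H[t] = Ht
        eps[t] = np.linalg.cholesky(Ht) @ rng.standard_normal(d)
        Ht = C0 @ C0.T + A.T @ np.outer(eps[t], eps[t]) @ A + B.T @ Ht @ B
    return eps, H

# Hypothetical parameters (not the estimates in (13.44)).
C0 = 1e-3 * np.array([[1.0, 0.0], [0.3, 0.8]])
A = np.array([[0.25, 0.05], [0.05, 0.25]])
B = np.array([[0.90, 0.02], [0.02, 0.90]])
print(np.max(np.abs(np.linalg.eigvals(np.kron(A, A) + np.kron(B, B)))) < 1)  # True
eps, H = simulate_bekk11(C0, A, B, H0=5e-5 * np.eye(2), n=3000, rng=np.random.default_rng(4))
```

Plotting `H[:, 0, 0]`, `H[:, 0, 1]` and `H[:, 1, 1]` gives simulated variance and covariance paths comparable in spirit to Figure 12.11.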