Stochastic volatility (SV) models are an alternative to generalized autoregressive conditional heteroskedastic (GARCH) models for modelling the time-varying volatility of asset returns. Time series of asset returns feature stylized facts, the most important being volatility clustering, which produces a slowly decreasing positive autocorrelation function of the squared returns, starting at a low value. Another stylized fact is excess kurtosis of the return distribution (with respect to the Gaussian distribution). See [7] for a detailed list of the stylized facts and a survey of GARCH models, [47] for a comparative survey of GARCH and SV models, and [26] for a survey of SV models focused on their theoretical foundations and their applications in finance. The first four parts of this section deal with SV models, while in Sect. 2.3.5 we survey similar models for dynamic duration analysis.
The simplest version of a SV model is given by
$$y_t = \exp(h_t/2)\, u_t, \qquad u_t \sim N(0,1),$$
$$h_t = \alpha + \beta h_{t-1} + \sigma_v v_t, \qquad v_t \sim N(0,1), \qquad (2.19)$$
where $y_t$ is the return at time $t$, $h_t$ is the unobserved log-volatility, $u_t$ and $v_t$ are mutually independent, and $|\beta| < 1$ ensures stationarity.
For further use, let $y = (y_1, \ldots, y_T)'$ and $h = (h_1, \ldots, h_T)'$ denote the vectors of observed returns and unobserved log-volatilities, respectively.
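To fix ideas, here is a minimal simulation sketch of the canonical model in Python, under the parametrization $y_t = \exp(h_t/2)u_t$, $h_t = \alpha + \beta h_{t-1} + \sigma_v v_t$; the parameter values are illustrative, not estimates taken from the text:

```python
import numpy as np

def simulate_sv(T, alpha=-0.02, beta=0.97, sigma_v=0.15, seed=0):
    """Simulate y_t = exp(h_t/2) u_t with h_t = alpha + beta h_{t-1} + sigma_v v_t,
    u_t and v_t i.i.d. N(0,1).  Parameter values are illustrative only."""
    rng = np.random.default_rng(seed)
    h = np.empty(T)
    h_prev = alpha / (1.0 - beta)          # start at the stationary mean
    for t in range(T):
        h_prev = alpha + beta * h_prev + sigma_v * rng.standard_normal()
        h[t] = h_prev
    y = np.exp(h / 2.0) * rng.standard_normal(T)
    return y, h

y, h = simulate_sv(5000)
r2 = y**2
# volatility clustering: positive autocorrelation of the squared returns
acf1 = np.corrcoef(r2[:-1], r2[1:])[0, 1]
print(acf1)
```

With a persistence parameter close to one, the simulated squared returns display the slowly decaying positive autocorrelation mentioned above.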
Estimation of the parameters of the canonical SV model may be done by the maximum likelihood (ML) method or by Bayesian inference. Other methods have been used but they will not be considered here; we refer to [26], Sect. 5, for a review. ML and, in principle, Bayesian estimation require computing the likelihood function of an observed sample, which is a difficult task. Indeed, the density of $y$ given the parameter vector $\theta$ and an initial condition (not explicitly written in the following expressions) requires computing a multiple integral whose dimension is equal to the sample size:
$$f(y|\theta) = \int f(y, h|\theta)\, dh \qquad (2.21)$$
$$\phantom{f(y|\theta)} = \int f(y|h, \theta)\, f(h|\theta)\, dh \qquad (2.22)$$
$$\phantom{f(y|\theta)} = \int \prod_{t=1}^{T} f(y_t|h_t, \theta)\, f(h_t|h_{t-1}, \theta)\, dh. \qquad (2.23)$$
Two methods directly approximate (2.22): efficient importance sampling (EIS), and Monte Carlo maximum likelihood (MCML). Another approach, which can only be used for Bayesian inference, works with $f(y|h, \theta)$ as data density, and produces a posterior joint density for $(h, \theta)$ given $y$. The posterior density is simulated by a Monte Carlo Markov chain (MCMC) algorithm, which produces simulated values of $h$ and $\theta$. Posterior moments and marginal densities of $\theta$ are then estimated by their simulated sample counterparts. We proceed by describing each method.
A look at (2.23) suggests sampling $N$ sequences $\{h_t^{(i)}\}_{t=1}^{T}$, $i = 1, \ldots, N$, from the densities $f(h_t|h_{t-1}, \theta)$, and approximating it by $N^{-1} \sum_{i=1}^{N} \prod_{t=1}^{T} f(y_t|h_t^{(i)}, \theta)$. This direct method proves to be inefficient: intuitively, the sampled sequences of $h_t$ are not linked to the observations $y_t$. To improve upon this, the integral (2.22), which is the convenient expression to present EIS, is expressed as
$$f(y|\theta) = \int \frac{f(y|h, \theta)\, f(h|\theta)}{m(h|a)}\, m(h|a)\, dh, \qquad (2.24)$$
where $m(h|a) = \prod_{t=1}^{T} m(h_t|h_{t-1}, a_t)$ is an importance function (`efficient sampler') indexed by auxiliary parameters $a = (a_1, \ldots, a_T)$. The corresponding estimate is
$$\hat f_N(y|\theta) = \frac{1}{N} \sum_{i=1}^{N} \prod_{t=1}^{T} \frac{f(y_t|h_t^{(i)}, \theta)\, f(h_t^{(i)}|h_{t-1}^{(i)}, \theta)}{m(h_t^{(i)}|h_{t-1}^{(i)}, a_t)}, \qquad (2.25)$$
where the trajectories $h^{(i)}$ are drawn from $m(h|a)$.
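The inefficiency of the direct estimator based on (2.23) is easy to see in a small experiment. The Python sketch below (illustrative parametrization as above) draws $h$-paths from the natural samplers $f(h_t|h_{t-1},\theta)$, averages $\prod_t f(y_t|h_t)$, and repeats this over independent seeds to show how much the estimate fluctuates even for a short series:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta, sigma_v = -0.02, 0.97, 0.15     # illustrative values
T = 50
h = np.empty(T); hp = alpha / (1 - beta)
for t in range(T):                            # simulate a short series
    hp = alpha + beta * hp + sigma_v * rng.standard_normal()
    h[t] = hp
y = np.exp(h / 2) * rng.standard_normal(T)

def naive_loglik(y, N=1000, seed=0):
    """Log of the naive estimate of (2.23): average of prod_t f(y_t|h_t)
    over N paths drawn from the natural samplers f(h_t|h_{t-1})."""
    rng = np.random.default_rng(seed)
    hp = np.full(N, alpha / (1 - beta))
    logw = np.zeros(N)
    for t in range(len(y)):
        hp = alpha + beta * hp + sigma_v * rng.standard_normal(N)
        # accumulate log N(y_t; 0, exp(h_t)) in logs for stability
        logw += -0.5 * (np.log(2 * np.pi) + hp + y[t]**2 * np.exp(-hp))
    m = logw.max()                            # log-sum-exp trick
    return m + np.log(np.mean(np.exp(logw - m)))

est = [naive_loglik(y, seed=s) for s in range(20)]
print(np.std(est))   # dispersion of the estimator across replications
```

Because the sampled paths ignore the data, most of them receive negligible weight, and the dispersion across replications remains noticeable even for moderate $N$.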
A convenient choice for $m$ is the Gaussian family of distributions. A Gaussian approximation to $f(y_t|h_t, \theta)$, as a function of $h_t$, turns out to be efficient. It can be expressed as proportional to $\exp(a_{1t} h_t + a_{2t} h_t^2)$, where $a_t = (a_{1t}, a_{2t})$ are the auxiliary parameters. It is convenient to multiply it with $\exp\{-0.5 (h_t - m_t)^2/\sigma_v^2\}$, where $m_t = \alpha + \beta h_{t-1}$, which comes from the term $f(h_t|h_{t-1}, \theta)$ included in $f(y, h|\theta)$. The product of these two exponential functions can be expressed as a Gaussian density $m(h_t|h_{t-1}, a_t)$ with variance and mean given by
$$\sigma_t^2 = \frac{\sigma_v^2}{1 - 2 a_{2t} \sigma_v^2}, \qquad \mu_t = \sigma_t^2 \left( \frac{m_t}{\sigma_v^2} + a_{1t} \right). \qquad (2.26)$$
The choice of the auxiliary parameters can be split into $T$ separate problems, one for each $t$. It amounts to minimizing the sum of squared deviations between $\log f(y_t|h_t^{(i)}, \theta)$ plus a correction term, see (2.27), and $a_{0t} + a_{1t} h_t^{(i)} + a_{2t} (h_t^{(i)})^2$, where $a_{0t}$ is an auxiliary intercept term. This problem is easily solved by ordinary least squares. See [36] for a detailed explanation.
Let us summarize the core of the EIS algorithm in three steps (for given $y$ and $\theta$):
Step 1: Generate $N$ trajectories $\{h_t^{(i)}\}$ using the `natural' samplers $f(h_t|h_{t-1}, \theta)$.
Step 2: For each $t$ (starting from $t = T$ and ending at $t = 1$), using the trajectories generated in the previous step, estimate by OLS the regression
$$\log f(y_t|h_t^{(i)}, \theta) + c_t^{(i)} = a_{0t} + a_{1t} h_t^{(i)} + a_{2t} (h_t^{(i)})^2 + \text{residual}, \qquad (2.27)$$
where $c_t^{(i)}$ is a correction term involving the integrating constant of the period-$(t+1)$ sampler.
Step 3: Generate $N$ trajectories $\{h_t^{(i)}\}$ using the efficient samplers $m(h_t|h_{t-1}, a_t)$ and finally compute (2.25).
Steps 1 to 3 should be iterated about five times to improve the efficiency of the approximations. This is done by replacing the natural sampler in Step 1 by the importance functions built in the previous iteration. It is also possible to start Step 1 of the first iteration with a more efficient sampler than the natural one. This is achieved by multiplying the natural sampler by a normal approximation to $f(y_t|h_t, \theta) \propto \exp(-0.5\, h_t - 0.5\, y_t^2 e^{-h_t})$. The normal approximation is based on a second-order Taylor series expansion of the argument of the exponential around the mean of the natural sampler. In this way, the initial importance sampler links $h_t$ and $y_t$. This enables one to reduce to three (instead of five) the number of iterations over the three steps. In practical implementations, $N$ can be fixed to a small number of trajectories. When computing (2.25) for different values of $\theta$, such as in a numerical optimizer, it is important to use common random numbers to generate the set of trajectories that serve in the computations.
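The three steps above can be sketched in code. The following is a simplified, illustrative re-implementation in Python (not the authors' code), under the same assumed parametrization: the kernel $\exp(a_{1t}h_t + a_{2t}h_t^2)$ multiplies the natural sampler, the backward OLS pass carries the integrating-constant correction, and common random numbers are reused across iterations:

```python
import numpy as np

def eis_loglik(y, alpha, beta, sigma_v, N=100, iters=3, seed=0):
    """Simplified EIS log-likelihood estimate for the canonical SV model
    (illustrative sketch; kernel exp(a1*h + a2*h^2) times natural sampler)."""
    rng = np.random.default_rng(seed)
    T = len(y)
    sv2 = sigma_v**2
    a1 = np.zeros(T); a2 = np.zeros(T)         # auxiliary parameters
    u = rng.standard_normal((iters, T, N))     # common random numbers
    h0 = alpha / (1 - beta)
    for it in range(iters):
        # Steps 1/3: draw N trajectories from the current samplers
        H = np.empty((T, N)); hp = np.full(N, h0)
        for t in range(T):
            s2 = sv2 / (1 - 2 * a2[t] * sv2)
            mu = s2 * ((alpha + beta * hp) / sv2 + a1[t])
            hp = mu + np.sqrt(s2) * u[it, t]
            H[t] = hp
        # Step 2: backward pass of OLS regressions with correction term
        logchi = np.zeros(N)                   # log chi_{t+1}(h_t)
        for t in range(T - 1, -1, -1):
            z = -0.5 * (H[t] + y[t]**2 * np.exp(-H[t])) + logchi
            X = np.column_stack([np.ones(N), H[t], H[t]**2])
            coef = np.linalg.lstsq(X, z, rcond=None)[0]
            a1[t] = coef[1]
            a2[t] = min(coef[2], 0.49 / sv2)   # keep sampler variance positive
            if t > 0:
                s2 = sv2 / (1 - 2 * a2[t] * sv2)
                mn = alpha + beta * H[t - 1]   # natural-sampler mean
                mu = s2 * (mn / sv2 + a1[t])
                logchi = mu**2 / (2 * s2) - mn**2 / (2 * sv2) \
                         + 0.5 * np.log(s2 / sv2)
    # final likelihood estimate (2.25) with the efficient samplers
    logw = np.zeros(N); hp = np.full(N, h0)
    for t in range(T):
        s2 = sv2 / (1 - 2 * a2[t] * sv2)
        mu = s2 * ((alpha + beta * hp) / sv2 + a1[t])
        hn = mu + np.sqrt(s2) * u[-1, t]
        # log f(y_t|h_t) + log f(h_t|h_{t-1}) - log m(h_t|h_{t-1}, a_t)
        logw += (-0.5 * (np.log(2 * np.pi) + hn + y[t]**2 * np.exp(-hn))
                 - 0.5 * ((hn - alpha - beta * hp)**2 / sv2 + np.log(2 * np.pi * sv2))
                 + 0.5 * ((hn - mu)**2 / s2 + np.log(2 * np.pi * s2)))
        hp = hn
    m = logw.max()
    return m + np.log(np.mean(np.exp(logw - m)))

# usage on a short simulated series (illustrative parameter values)
rng = np.random.default_rng(3)
T = 100; h = np.empty(T); hp = -0.02 / (1 - 0.97)
for t in range(T):
    hp = -0.02 + 0.97 * hp + 0.15 * rng.standard_normal()
    h[t] = hp
y = np.exp(h / 2) * rng.standard_normal(T)
ll = eis_loglik(y, -0.02, 0.97, 0.15)
print(ll)
```

In the first iteration the auxiliary parameters are zero, so Step 1 uses the natural samplers; subsequent iterations replace them by the fitted importance functions, as described above.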
It is also easy to compute by EIS filtered estimates of functions of $h_t$, such as the conditional standard deviation $\exp(h_t/2)$, conditional on the past returns (but not on the lagged unobserved $h_t$), given a value of $\theta$ (such as the ML estimate). Diagnostics on the model specification are then obtained as a byproduct: if the model is correctly specified, $y_t$ divided by the filtered estimate of $\exp(h_t/2)$ is a residual that has zero mean, unit variance, and is serially uncorrelated (this also holds for the squared residual).
[43] contains a general presentation of EIS and its properties.
The likelihood to be computed at $y$ (the data) and any given $\theta$ is equal to $f(y|\theta)$ and is conveniently expressed as (2.21) for this method. This quantity is approximated by importance sampling with an importance function defined from an approximating model. The latter is obtained from the state space representation of the canonical SV model (parametrized with $\theta$), based on the measurement equation $\log y_t^2 = h_t + \log u_t^2$, by replacing the non-Gaussian error $\log u_t^2$ with a Gaussian one calibrated so that the approximating model is as close as possible to the true one.
In brief, the likelihood function is expressed as
$$f(y|\theta) = g(y|\theta)\, E_g\!\left[ \frac{f(y|h, \theta)}{g(y|h, \theta)} \right], \qquad (2.31)$$
where $g$ denotes the approximating Gaussian model, $g(y|\theta)$ its likelihood (computed by the Kalman filter), and $E_g$ the expectation with respect to $g(h|y, \theta)$. The likelihood function is then approximated by
$$\hat f_N(y|\theta) = g(y|\theta)\, \frac{1}{N} \sum_{i=1}^{N} \frac{f(y|h^{(i)}, \theta)}{g(y|h^{(i)}, \theta)}, \qquad (2.32)$$
where the trajectories $h^{(i)}$ are drawn from $g(h|y, \theta)$ by a simulation smoother.
For SML estimation, the approximation in (2.32) is transformed to the logarithmic scale. This induces a bias, since the expectation of the log of a sample mean is not the log of the corresponding integral in (2.31). The bias is corrected by adding $s_w^2/(2 N \bar w^2)$ to the log of (2.32), where $s_w^2$ is the sample variance of the ratios $f(y|h^{(i)}, \theta)/g(y|h^{(i)}, \theta)$ and $\bar w$ is the sample mean of the same ratios, i.e. the sample mean appearing in (2.32). Moreover, [20] use antithetic and control variates to improve the efficiency of the estimator of the log-likelihood function.
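The log-of-mean bias and its correction can be illustrated numerically. In the toy Python experiment below, lognormal "weights" stand in for the importance-sampling ratios (the setup is illustrative, not tied to the SV model), and the correction term $s_w^2/(2N\bar w^2)$ is added to the log of the sample mean:

```python
import numpy as np

rng = np.random.default_rng(0)
N, R = 50, 5000                # weights per estimate, Monte Carlo replications
log_mu = 0.5                   # log E[w] for w = exp(Z), Z ~ N(0,1)
raw, corrected = [], []
for _ in range(R):
    w = np.exp(rng.standard_normal(N))        # positive "weights"
    wbar, s2 = w.mean(), w.var(ddof=1)
    raw.append(np.log(wbar))
    # bias correction: add s^2 / (2 N wbar^2) to the log of the mean
    corrected.append(np.log(wbar) + s2 / (2 * N * wbar**2))
bias_raw = np.mean(raw) - log_mu
bias_cor = np.mean(corrected) - log_mu
print(bias_raw, bias_cor)
```

By Jensen's inequality the uncorrected log of the sample mean is biased downwards; the correction term, always positive, offsets the leading term of that bias.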
[21] present several generalizations of MCML (e.g. the case where the state variable is non-Gaussian) and develop analogous methods for Bayesian inference.
We present briefly the `Mixture Sampler', one of the three algorithms added by [35] to the six algorithms already in the literature at that time (see their paper for references). They approximate the density of $\varepsilon_t = \log u_t^2$ by a finite mixture of seven Gaussian densities, chosen such that in particular the first four moments of both densities are equal. The approximating density can be written as
$$f(\varepsilon_t) \approx \sum_{j=1}^{7} q_j\, \phi(\varepsilon_t; m_j, s_j^2), \qquad (2.33)$$
where $\phi(\cdot\,; m, s^2)$ denotes the Gaussian density with mean $m$ and variance $s^2$, and the weights $q_j$, means $m_j$, and variances $s_j^2$ are fixed constants tabulated in [35].
The crux of the algorithm is to add the vector $s = (s_1, \ldots, s_T)$ of mixture component indicators to $h$ and $\theta$ in the MCMC sampling space. This makes it possible to sample $\theta$, $h$, and $s$ within a Gibbs sampling algorithm. Remark that $\theta$ and $s$ are conditionally independent given $h$ and $y$. Moreover, $h$ can be sampled entirely as a vector. The intuition behind this property is that, once $s$ is known, the relevant term of the mixture (2.33) is known for each observation, and since this is a Gaussian density, the whole apparatus of the Kalman filter can be used. Actually, this is a bit more involved, since the relevant Gaussian density depends on $t$ (through $s_t$), but an augmented Kalman filter is available for this case.
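The need for a mixture, rather than a single Gaussian, is easy to check numerically: $\varepsilon_t = \log u_t^2$ follows a log chi-square(1) distribution, with mean $\psi(1/2) + \log 2 \approx -1.27$ ($\psi$ being the digamma function), variance $\pi^2/2$, and a markedly left-skewed density. A quick Python check:

```python
import numpy as np

rng = np.random.default_rng(0)
# eps_t = log u_t^2 with u_t ~ N(0,1), i.e. log chi-square(1)
eps = np.log(rng.standard_normal(1_000_000) ** 2)
mean_th = -1.27036            # psi(1/2) + log 2
var_th = np.pi**2 / 2         # variance of log chi-square(1)
skew = np.mean((eps - eps.mean())**3) / eps.std()**3
print(eps.mean(), mean_th)
print(eps.var(), var_th)
print(skew)                   # clearly negative: one Gaussian is a poor fit
```

The strong negative skewness is what the seven-component mixture (2.33) is designed to capture while keeping each conditional update Gaussian.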
Sampling $h$ as one block is a major improvement over previous algorithms, such as that of [33], where each element $h_t$ is sampled individually given the other elements of $h$ (plus $y$ and $\theta$). The slow convergence of such algorithms is due to the high correlations between the elements of $h$.
[35] write the model in state space form under a reparametrization of the volatility level (see their paper for details).
[35] also propose an algorithm to compute filtered estimates of $h_t$, from which model diagnostics can be obtained as described above for EIS.
For illustration, estimates of the canonical SV model parameters are reported in Table 2.8 for a series of centred daily returns of the Standard and Poor's 500 (SP500) composite price index (period: 02/01/80-30/05/03, source: Datastream). Returns are computed as 100 times the log of the price ratios.
Method | EIS | MCML | MCMC
Time | 2.36 min | 7.56 min | 6.23 min
Code | Gauss | Ox | Ox
We used computer codes provided by the authors cited above. For EIS, we received the code from R. Liesenfeld; for MCML and MCMC, we downloaded the codes from the web site staff.feweb.vu.nl/koopman/sv.
For SML estimation by EIS or MCML, identical initial values and optimization algorithms (BFGS) are used, but in different programming environments. Therefore, the computing times are not fully comparable, although a rough rule of thumb is that Ox is two to three times faster than Gauss (see [15]). Reported execution times imply that EIS appears to be at least six times faster than MCML. This is a reversal of a result reported by Sandmann and Koopman (1998, p. 289), but they compared MCML with a precursor of EIS implemented by [16]. More importantly, the two methods deliver quasi-identical results.
MCMC results are based on draws after dropping initial draws. The posterior means and standard deviations are also quite close to the ML results. The posterior density of (computed by kernel estimation) is shown in Fig. 2.1 together with the large sample normal approximation to the density of the ML estimator using the EIS results. The execution time for MCMC is difficult to compare with the other methods since it depends on the number of Monte Carlo draws. It is however quite competitive since reliable results are obtained in no more time than MCML in this example.
The canonical model presented in (2.19) is too restrictive to fit the excess kurtosis of many return series. Typically, the residuals of the model reveal that the distribution of $u_t$ has fatter tails than the Gaussian distribution. The assumption of normality is then most often replaced by the assumption of a Student-$t$ distribution with zero mean, unit variance, and degrees-of-freedom parameter $\nu$. SML estimates of $\nu$ are usually small for stock and foreign currency returns using daily data. Posterior means are larger, because the posterior density of $\nu$ has a long tail to the right.
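A quick simulation (illustrative Python sketch, with standardized Student-t innovations and $\nu = 8$ chosen arbitrarily) shows how the fat-tailed innovation inflates the unconditional kurtosis of the returns beyond what volatility clustering alone produces:

```python
import numpy as np

rng = np.random.default_rng(0)
T, nu = 200_000, 8.0
alpha, beta, sigma_v = -0.02, 0.97, 0.15     # illustrative values
h = np.empty(T); hp = alpha / (1 - beta)
for t in range(T):
    hp = alpha + beta * hp + sigma_v * rng.standard_normal()
    h[t] = hp
# scale by sqrt((nu-2)/nu) so that the t innovation has unit variance
u = rng.standard_t(nu, T) * np.sqrt((nu - 2) / nu)
y = np.exp(h / 2) * u
kurt = np.mean((y - y.mean())**4) / np.var(y)**2
print(kurt)       # well above the Gaussian value of 3
```

With Gaussian $u_t$ the unconditional kurtosis would already exceed 3 because of the random volatility; the Student-t innovation raises it further.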
Several other extensions of the simple SV model presented in (2.19) exist in the literature. The mean of $y_t$ need not be zero and may be a function of explanatory variables (often a lag of $y_t$ and an intercept term). Similarly, $h_t$ may be a function of observable variables in addition to its own lags. An extended model along these lines is
$$y_t = x_t'\delta + \exp(h_t/2)\, u_t,$$
$$h_t = \alpha + \beta h_{t-1} + z_t'\gamma + \sigma_v v_t,$$
where $x_t$ and $z_t$ are vectors of observable explanatory variables.
It should be obvious that all these extensions are very easy to incorporate in EIS (see [36]) and MCML (see [46]). Bayesian estimation by MCMC remains quite usable but becomes more demanding in research time, since the algorithm must be tailored to achieve a good level of efficiency of the Markov chain (see [12], in particular pp. 301-302, for such comments).
[12] also include a jump component term in the conditional mean part to allow for irregular, transient movements in returns. A binary random variable is equal to 1 with unknown probability $\kappa$ and to zero with probability $1 - \kappa$, whereas another random variable measures the size of the jump when it occurs. The time-varying jump sizes are assumed to be independent Gaussian draws, $\delta$ being an unknown parameter representing the standard deviation of the jump size. For daily SP500 returns (period: 03/07/1962-26/08/1997) and a Student-$t$ density for $u_t$, [12] report posterior means for $\kappa$ and $\delta$ implying that jumps occur rarely relative to the sample size. They also find that the removal of the jump component from the model reduces the posterior mean of $\nu$, which corresponds to the fact that the jumps capture some outliers.
Another extension consists of relaxing the restriction of zero correlation between the innovations $u_t$ and $v_t$. This may be useful for stock returns, for which a negative correlation corresponds to the leverage effect of the financial literature. If the correlation $\rho$ is negative, a drop of $y_t$, interpreted as a negative shock on the return, tends to increase the next log-volatility $h_{t+1}$ and therefore the volatility. Hence volatility increases more after a negative shock than after a positive shock of the same absolute value, which is a well-known stylized fact. [46] estimate such a model by MCML for daily returns of the SP500 index (period: 02/01/80-30/12/87), while [34] do it by Bayesian inference using MCMC on the same data and report a negative posterior mean for $\rho$. They use the same reparametrization as in (2.13) to impose that the first diagonal element of the covariance matrix of $u_t$ and $v_t$ must be equal to 1. This covariance matrix is given by
$$\Sigma = \begin{pmatrix} 1 & \rho\,\sigma_v \\ \rho\,\sigma_v & \sigma_v^2 \end{pmatrix}.$$
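The mechanism can be checked by simulation. The sketch below (illustrative Python; the timing convention assumed here is that the return shock $u_t$ is correlated with the volatility shock entering $h_{t+1}$) builds the correlated shocks from the covariance matrix via its Cholesky factor and verifies that volatility is higher after negative returns:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100_000
alpha, beta, sigma_v, rho = -0.02, 0.97, 0.15, -0.4   # illustrative values
# covariance matrix of (u_t, v_t): [[1, rho*sigma_v], [rho*sigma_v, sigma_v^2]]
Sigma = np.array([[1.0, rho * sigma_v], [rho * sigma_v, sigma_v**2]])
L = np.linalg.cholesky(Sigma)
e = rng.standard_normal((T, 2)) @ L.T        # rows: (u_t, v_t)
h = np.empty(T); y = np.empty(T); hp = alpha / (1 - beta)
for t in range(T):
    y[t] = np.exp(hp / 2) * e[t, 0]
    # timing assumption: the shock v enters the *next* period's h,
    # so a negative u_t raises h_{t+1} when rho < 0
    hp = alpha + beta * hp + e[t, 1]
    h[t] = hp                                # h[t] stores h_{t+1}
# volatility reacts more to negative returns than to positive ones
diff = h[y < 0].mean() - h[y > 0].mean()
print(diff)
```

With $\rho < 0$ the average log-volatility following negative returns exceeds the one following positive returns, which is the leverage effect described above.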
Multivariate SV models are also on the agenda of researchers. [36] estimate by EIS a one-factor model introduced by [47], using return series of four currencies. [35], Sect. 6.6, explain how to deal with the multi-factor model case by extending the MCMC algorithm reviewed in Sect. 2.3.2.
Models akin to the SV model have been used for dynamic duration analysis by [5] and [3]. The context of application is the analysis of a sequence of time spells between events (also called durations) occurring on stock trading systems like the New York Stock Exchange (NYSE). Time stamps of trades are recorded for each stock on the market during trading hours every day, resulting in an ordered series of durations. Marks, such as the price, the exchanged quantity, the prevailing bid and ask prices, and other observed features may also be available, making it possible to relate the durations to the marks in a statistical model. See [2] for a presentation of the issues.
Let $t_i$ denote the arrival times and $d_i = t_i - t_{i-1}$ the corresponding durations. The stochastic conditional duration (SCD) model of [5] is defined as
$$d_i = \exp(\psi_i)\, \varepsilon_i,$$
$$\psi_i = \omega + \beta \psi_{i-1} + \sigma_v v_i, \qquad v_i \sim N(0,1), \qquad (2.38)$$
where $\varepsilon_i$ is an i.i.d. positive random variable (e.g. Weibull or gamma distributed), independent of the Gaussian innovation $v_i$.
The similarity with the canonical SV model (2.19) is striking. A major difference is the non-normality of $\varepsilon_i$, since this is by definition a positive random variable; fixing its distribution makes it possible to identify the model. Therefore, the estimation methods available for the SV model can be adapted to the estimation of SCD models. [5] have estimated the SCD model by the quasi-maximum likelihood (QML) method, since the first equation of the model may be expressed as $\log d_i = \psi_i + \log \varepsilon_i$. If $\log \varepsilon_i$ were Gaussian, the model would be a Gaussian linear state space model and the Kalman filter could be directly applied. QML relies on maximizing the likelihood function as if $\log \varepsilon_i$ were Gaussian. The QML estimator is known to be consistent but inefficient relative to the ML estimator which would obtain if the correct distribution of $\varepsilon_i$ were used to define the likelihood function. [24], Chap. 3, has studied by simulation the loss of efficiency of QML relative to ML, where ML estimation assuming a Weibull distribution is done by applying the EIS algorithm; the efficiency loss varies across the parameters, except for one parameter, for which it is very small. He also applied the EIS method using the same data as in [5]. For example, for a dataset of volume durations of the Boeing stock (period: September-November 1996; source: TAQ database of NYSE), the ML estimates imply a high persistence in the conditional mean process (corresponding to duration clustering), a Weibull distribution with an increasing concave hazard function, and substantial heterogeneity. Notice that an interesting feature of the SCD model is that the distribution of $d_i$ conditional on the past information, but marginalized with respect to the latent process, is a Weibull distribution mixed by a lognormal one.
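A minimal simulation of an SCD specification in this spirit (illustrative Python sketch; Weibull innovations, parametrization $d_i = \exp(\psi_i)\varepsilon_i$ with $\psi_i$ a Gaussian AR(1), all parameter values arbitrary) also shows the log transform underlying QML:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000
omega, beta, sigma, gamma = 0.02, 0.95, 0.1, 0.9   # illustrative values
psi = np.empty(n); pp = omega / (1 - beta)
for i in range(n):
    pp = omega + beta * pp + sigma * rng.standard_normal()
    psi[i] = pp
eps = rng.weibull(gamma, n)          # positive Weibull(gamma) innovations
d = np.exp(psi) * eps                # durations are positive by construction
# QML representation: log d_i = psi_i + log eps_i, a linear state space
# model with a non-Gaussian measurement error log eps_i
x = np.log(d)
acf1 = np.corrcoef(x[:-1], x[1:])[0, 1]
print(d.min() > 0, acf1)             # duration clustering: positive ACF
```

The persistent latent process $\psi_i$ produces positively autocorrelated (log) durations, the duration-clustering analogue of volatility clustering.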
[48] have designed an MCMC algorithm for the SCD model (2.38), assuming a standard exponential distribution for $\varepsilon_i$. The design of their algorithm borrows features from Durbin and Koopman's MCML approach and from one of the MCMC algorithms used for the SV model.
As an alternative to modelling the sequence of durations, [3] model directly the arrival times through the intensity function of the point process. Their model specifies a dynamic intensity function that is the product of two components: an observable component that depends on past arrival times, and a latent component. The logarithm of the latter is a Gaussian autoregressive process similar to the second equation in (2.19) and (2.38). The observable component may be a Hawkes process ([31]) or an autoregressive intensity model ([45]). When the model is multivariate, there is an observable intensity component specific to each individual point process, while the latent component is common to all of them. Interactions between the processes occur through the observable components and through the common component; the latter induces similar dynamics in the individual processes, reflecting the impact of a common factor influencing all of them. Bauwens and Hautsch use intensity-based likelihood inference, with the EIS algorithm to deal with the latent component.