15.3 Estimating the volatility of financial time series

The locally time homogeneous approach also appears to be appropriate for estimating the volatility of financial time series. In order to provide some motivation we first describe the stylized facts of financial time series. Let $ S_t$ denote the price process of a financial asset, such as a stock or an exchange rate; the returns are then defined as follows:

$\displaystyle R_t = \ln S_t - \ln S_{t-1}. $

Stylized facts of financial asset returns are: a leptokurtic density, volatility clustering and highly persistent autocorrelation of squared and absolute returns (see Figure 15.6). Further details and examples on this topic can be found in Taylor (1986) and in Franke et al. (2001).
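For illustration, the following minimal Python sketch (not the XFGretacf.xpl quantlet used for Figure 15.6) computes the log returns of a hypothetical price series and the sample autocorrelations used to inspect these stylized facts.

\begin{verbatim}
import numpy as np

def log_returns(S):
    """R_t = ln S_t - ln S_{t-1} for a price series S."""
    return np.diff(np.log(np.asarray(S, dtype=float)))

def acf(x, max_lag=50):
    """Sample autocorrelations of x for lags 1, ..., max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom
                     for k in range(1, max_lag + 1)])

# For financial returns R, acf(R**2) and acf(np.abs(R)) typically
# decay slowly (volatility clustering), while acf(R) is near zero.
\end{verbatim}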

Figure 15.6: JPY/USD returns and their sample autocorrelations. XFGretacf.xpl


15.3.1 The standard approach

The returns of financial time series are usually modeled by the following equation:

$\displaystyle R_t = \sigma_t\varepsilon_t, $

where $ \sigma_t$ is a strictly positive process, which describes the dynamics of the variance of $ R_t$, and $ \varepsilon_t$ has a standard normal distribution: $ \varepsilon_t\sim N(0,1)$. Standard parametric models of the volatility are of (G)ARCH type:

$\displaystyle \sigma_t^2 = \omega + \alpha R_{t-1}^2 + \beta \sigma_{t-1}^2, $

like in Engle (1995) and Bollerslev (1995), and of stochastic volatility type:

$\displaystyle \ln \sigma_t^2 = \theta_0 + \theta_1 \ln \sigma_{t-1}^2 + \nu_t,
$

as described by Harvey et al. (1995). These models have been extended in order to incorporate further characteristics of financial return series: TARCH, EGARCH and QARCH explicitly allow for an asymmetric reaction of the volatility process to the sign of the observed returns, while IGARCH and FIGARCH model the long memory structure of the autocorrelations of the squared returns.
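For concreteness, the following Python sketch simulates returns under the GARCH(1,1) recursion above; the parameter values are purely illustrative assumptions.

\begin{verbatim}
import numpy as np

def simulate_garch11(n, omega=1e-6, alpha=0.1, beta=0.85, seed=0):
    """Simulate n returns from the GARCH(1,1) recursion above."""
    rng = np.random.default_rng(seed)
    R = np.empty(n)
    sigma2 = omega / (1.0 - alpha - beta)   # stationary variance
    for t in range(n):
        R[t] = np.sqrt(sigma2) * rng.standard_normal()
        sigma2 = omega + alpha * R[t]**2 + beta * sigma2
    return R
\end{verbatim}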


15.3.2 The locally time homogeneous approach

A feature common to all the models cited in the previous section is that they describe the volatility process completely by a finite set of parameters. The availability of very large samples of financial data has made it possible to construct models with quite complicated parameterizations in order to explain all the observed stylized facts. Obviously, those models rely on the assumption that the parametric structure of the process remains constant through the whole sample. This is a nontrivial and possibly dangerous assumption, in particular as far as forecasting is concerned, as pointed out in Clements and Hendry (1998). Furthermore, checking for parameter instability becomes quite difficult if the model is nonlinear and/or the number of parameters is large. Moreover, those characteristics of the returns which are often explained by the long memory and (fractionally) integrated nature of the volatility process could also be due to the parameters being time varying. We therefore suggest an alternative approach which relies on a locally time homogeneous parameterization, i.e. we assume that the volatility $ \sigma$ follows a jump process and is constant over some unknown interval of time homogeneity. The adaptive algorithm presented in the previous sections also applies in this case; its aim is the data-driven estimation of the interval of time homogeneity, after which the estimate of the volatility can be obtained simply by local averaging.


15.3.3 Modeling volatility via power transformation

Let $ \, S_{t} \,$ be an observed asset price process in discrete time, $ \, t=1,2,\ldots ,\tau \,$, and let $ \, R_{t} \,$ denote the corresponding returns: $ \, R_{t} = \log (S_{t}/S_{t-1}) \,$. We model this process via the conditional heteroscedasticity assumption

$\displaystyle R_{t} = \sigma_{t} \varepsilon_{t} \, ,$     (15.10)

where $ \, \varepsilon_{t} \,$, $ \, t \ge 1 \,$, is a sequence of independent standard Gaussian random variables and $ \, \sigma_{t} \,$ is the volatility process which is in general a predictable random process, that is, $ \, \sigma_{t}$ is measurable with respect to $ \mathcal{F}_{t-1} \,$ with $ \, \mathcal{F}_{t-1} = \sigma(R_{1},\ldots ,R_{t-1}) \,$.

The model equation (15.10) links the volatility $ \, \sigma_{t} \,$ with the observations $ \, R_{t} \,$ via the multiplicative errors $ \, \varepsilon_{t} \,$. In order to apply the theory presented in Section 15.1 we need a regression-like model with additive errors. For this reason we consider the power transformation, which leads to a regression with additive noise that is, moreover, close to Gaussian, see Carroll and Ruppert (1988). Due to (15.10) the random variable $ \, R_{t} \,$ is, conditionally on $ \, \mathcal{F}_{t-1} \,$, Gaussian and it holds

$\displaystyle \textrm{E}\left( R_{t}^{2} \vert \mathcal{F}_{t-1} \right) = \sigma_{t}^{2} .$      

Similarly, for every $ \, \gamma > 0 \,$,

$\displaystyle \textrm{E}\left( \vert R_{t} \vert^{\gamma} \,\vert\, \mathcal{F}_{t-1} \right) = \sigma_{t}^{\gamma}\, \textrm{E}\left( \vert \varepsilon_{t} \vert^{\gamma} \,\vert\, \mathcal{F}_{t-1} \right) = C_{\gamma} \sigma_{t}^{\gamma},$

$\displaystyle \textrm{E}\left\{ \left( \vert R_{t} \vert^{\gamma} - C_{\gamma} \sigma_{t}^{\gamma} \right)^{2} \,\vert\, \mathcal{F}_{t-1} \right\} = \sigma_{t}^{2\gamma}\, \textrm{E}\left( \vert\xi\vert^{\gamma} - C_{\gamma} \right)^{2} = \sigma_{t}^{2\gamma} D_{\gamma}^{2},$

where $ \, \xi \,$ denotes a standard Gaussian r.v., $ \, C_{\gamma} = \textrm{E}\vert\xi\vert^{\gamma} \,$ and $ \, D_{\gamma}^{2} = \textrm{Var}\vert\xi\vert^{\gamma} \,$. Therefore, the process $ \, \vert R_{t}\vert^{\gamma} \,$ allows for the representation
$\displaystyle \vert R_{t}\vert^{\gamma} = C_{\gamma} \sigma_{t}^{\gamma} + D_{\gamma} \sigma_{t}^{\gamma} \zeta_{t} \, ,$     (15.11)

where $ \, \zeta_{t} = \left( \vert\varepsilon_{t}\vert^{\gamma} - C_{\gamma} \right)/D_{\gamma} \,$. A suitable choice of the value of $ \gamma$ ensures that the distribution of $ \left( \vert\xi\vert^{\gamma} - C_{\gamma} \right)/D_{\gamma}$ is close to the normal one. In particular, the value $ \gamma = 0.5$ appears to be almost optimal, see Figure 15.7.
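The constants $ C_{\gamma}$ and $ D_{\gamma}$ admit a closed form via the absolute moments of the standard Gaussian law, $ \textrm{E}\vert\xi\vert^{p} = 2^{p/2}\, \Gamma\{(p+1)/2\}/\sqrt{\pi}$. The following Python sketch computes them for $ \gamma = 0.5$ and checks the near-normality of the standardized power transform by Monte Carlo; it is an illustration, not the XFGpowtrans.xpl quantlet.

\begin{verbatim}
import numpy as np
from scipy.special import gamma as Gamma
from scipy.stats import kurtosis, skew

def abs_moment(p):
    """E|xi|^p = 2^(p/2) Gamma((p+1)/2) / sqrt(pi), xi ~ N(0,1)."""
    return 2.0**(p / 2.0) * Gamma((p + 1.0) / 2.0) / np.sqrt(np.pi)

g = 0.5
C = abs_moment(g)                          # C_gamma
D = np.sqrt(abs_moment(2 * g) - C**2)      # D_gamma

# Monte Carlo check that (|xi|^g - C)/D is close to N(0,1)
xi = np.random.default_rng(0).standard_normal(1_000_000)
z = (np.abs(xi)**g - C) / D
print(C, D, z.mean(), z.std(), skew(z), kurtosis(z))
\end{verbatim}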

Figure 15.7: Normal and power-transformed densities for $ \gamma = 0.5$. XFGpowtrans.xpl


15.3.4 Adaptive estimation under local time-homogeneity

The assumption of local time homogeneity means that the function $ \, \sigma_{t} \,$ is constant within an interval $ \, I = [\tau-m,\tau] \,$, and that the process $ \, R_{t} \,$ follows the regression-like equation (15.11) with the constant trend $ \, \theta_{I} = C_{\gamma} \sigma_{I}^{\gamma} \,$, which can be estimated by averaging $ \vert R_{t}\vert^{\gamma}$ over this interval $ \, I \,$:

$\displaystyle \widehat{\theta}_{I} = \frac{1}{\vert I\vert} \sum_{t \in I} \vert R_{t}\vert^{\gamma} .$     (15.12)
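In code, the local estimate (15.12) reduces to a few lines. The following Python fragment is a minimal sketch, assuming the indexing convention $ I = \{\tau-m,\ldots,\tau-1\}$ for arrays; the plug-in estimate $ \widehat{v}_{I}$ anticipates the formula given later in this section.

\begin{verbatim}
import numpy as np

def local_estimate(R, tau, m, gamma, C_gamma, D_gamma):
    """theta_hat_I of (15.12) over I = {tau-m, ..., tau-1} and the
    plug-in conditional standard deviation v_hat_I (see below)."""
    y = np.abs(R[tau - m:tau])**gamma
    th = y.mean()                        # (15.12)
    v = (D_gamma / C_gamma) * th / np.sqrt(m)
    return th, v

# The volatility follows from theta_I = C_gamma * sigma_I**gamma:
# sigma_hat = (th / C_gamma)**(1.0 / gamma)
\end{verbatim}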

By (15.11), with $ \, \theta_{t} = C_{\gamma} \sigma_{t}^{\gamma} \,$,

$\displaystyle \widehat{\theta}_{I} = \frac{C_{\gamma}}{\vert I\vert} \sum_{t \in I} \sigma_{t}^{\gamma} + \frac{D_{\gamma}}{\vert I\vert} \sum_{t \in I} \sigma_{t}^{\gamma} \zeta_{t} = \frac{1}{\vert I\vert} \sum_{t \in I} \theta_{t} + \frac{s_{\gamma}}{\vert I\vert} \sum_{t \in I} \theta_{t} \zeta_{t}$     (15.13)

with $ \, s_{\gamma} = D_{\gamma} / C_{\gamma} \,$ so that
$\displaystyle \textrm{E}\,\widehat{\theta}_{I} = \textrm{E}\,\frac{1}{\vert I\vert} \sum_{t \in I} \theta_{t} \, ,$     (15.14)

$\displaystyle \frac{s_{\gamma}^{2}}{\vert I\vert^{2}}\, \textrm{E}\left( \sum_{t \in I} \theta_{t} \zeta_{t} \right)^{2} = \frac{s_{\gamma}^{2}}{\vert I\vert^{2}}\, \textrm{E}\sum_{t \in I} \theta_{t}^{2} \, .$     (15.15)

Define also
$\displaystyle v_{I}^{2} = \frac{s_{\gamma}^{2}}{\vert I\vert^{2}} \sum_{t \in I} \theta_{t}^{2} .$      

In view of (15.15) this value is called the conditional variance of $ \, \widehat{\theta}_{I} \,$. Under local homogeneity, $ \theta_{t}$ is constant and equal to $ \theta_{I}$ for $ \, t \in I \,$, and hence
$\displaystyle \textrm{E}\,\widehat{\theta}_{I} = \theta_{I} \, ,$

$\displaystyle v_{I}^{2} = \textrm{Var}\, \widehat{\theta}_{I} = \frac{s_{\gamma}^{2} \theta_{I}^{2}}{\vert I\vert} \, .$

A probability bound analogous to the one in Section 15.1 also holds in this case. Let the volatility coefficient $ \, \sigma_{t} \,$ satisfy the condition $ b \le \sigma_{t}^{2} \le bB $ with some constants $ \, b>0,\,B>1 \,$. Then there exists $ \, a_{\gamma} > 0 \,$ such that, for every $ \, \lambda \ge 0 \,$,

$\displaystyle \textrm{P}\left( \vert\widehat{\theta}_{I} - \theta_{\tau}\vert > \lambda v_{I} \right) \le 4\sqrt{e}\, \lambda (1 + \log B) \exp\left( - \frac{\lambda^{2}}{2 a_{\gamma}} \right) .$     (15.16)

The proof of the statement above and some related theoretical results can be found in Mercurio and Spokoiny (2000).

For practical applications one has to substitute the unknown conditional standard deviation with its estimate $ \widehat{v}_{I} = s_{\gamma} \widehat{\theta}_{I} \vert I\vert^{-1/2}$. Under the assumption of time homogeneity within an interval $ I = [\tau - m, \tau]$, equation (15.16) allows us to bound $ \, \vert\widehat{\theta}_{I} - \widehat{\theta}_{J}\vert \,$ by $ \, \lambda \widehat{v}_{I} + \mu \widehat{v}_{J} \,$ for any $ J\subset I$, provided that $ \, \lambda \,$ and $ \mu$ are sufficiently large. Therefore we can apply the same algorithm described in Section 15.1 in order to estimate the largest interval of time homogeneity and the corresponding value of $ \widehat\theta_\tau$. Here, as in the previous section, we are faced with the choice of three tuning parameters: $ m_0$, $ \lambda$, and $ \mu$. Simulation studies and repeated trials on real data by Mercurio and Spokoiny (2000) have shown that the choice of $ m_0$ is not particularly critical: it can be selected between 10 and 50 without affecting the overall results of the procedure. A sketch of the resulting procedure is given below.
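The following Python sketch illustrates one possible implementation of this adaptive search. The geometric grid of candidate intervals and the set of tested subintervals $ J$ are simplifying assumptions, not the exact scheme of Section 15.1, and the default values of $ \lambda$ and $ \mu$ are placeholders.

\begin{verbatim}
import numpy as np

def adaptive_volatility(R, tau, gamma, C_gamma, D_gamma,
                        m0=20, lam=2.5, mu=2.5):
    """Data-driven search for the interval of time homogeneity at
    time tau; lam and mu are placeholder values."""
    s = D_gamma / C_gamma
    accepted = m0
    m = m0
    while tau - 2 * m >= 0:
        m *= 2                  # candidate interval I = {tau-m,...,tau-1}
        yI = np.abs(R[tau - m:tau])**gamma
        thI, vI = yI.mean(), s * yI.mean() / np.sqrt(m)
        # test sub-intervals J = {tau-k,...,tau-1} against I
        homogeneous = True
        for k in range(m0, m, m0):
            yJ = np.abs(R[tau - k:tau])**gamma
            thJ, vJ = yJ.mean(), s * yJ.mean() / np.sqrt(k)
            if abs(thI - thJ) > lam * vI + mu * vJ:
                homogeneous = False
                break
        if not homogeneous:
            break
        accepted = m
    th = np.mean(np.abs(R[tau - accepted:tau])**gamma)
    sigma_hat = (th / C_gamma)**(1.0 / gamma)  # invert theta = C*sigma^g
    return sigma_hat, accepted
\end{verbatim}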

As described in Section 15.2.2, the choice of $ \lambda$ and $ \mu$ is more delicate. Their influence is similar to that of the smoothing parameters in nonparametric regression. The likelihood of rejecting a time homogeneous interval decreases as $ \lambda$ and/or $ \mu$ increase, as is clear from equation (15.6). Therefore, if $ \lambda$ and $ \mu$ are too large, the algorithm becomes too conservative, increasing the bias of the estimator, while too small values of $ \lambda$ and $ \mu$ lead to frequent rejections and to a high variability of the estimate. Once again, the optimal values of $ \lambda$ and $ \mu$ can be chosen by minimizing the squared forecast error. One defines a finite set $ \mathcal{S}$ of admissible pairs of $ \lambda$ and $ \mu$, computes for each pair belonging to $ \mathcal{S}$ the corresponding estimate $ \, \widehat{\theta}_{t}^{(\lambda,\mu)} \,$, and then selects the optimal pair and the corresponding estimate by the following criterion:

$\displaystyle (\widehat{\lambda},\widehat{\mu}) = \arg\min_{(\lambda,\mu)\in \mathcal{S}} \sum_{t} \left( \vert R_{t}\vert^{\gamma} - \widehat{\theta}_{t}^{(\lambda,\mu)} \right)^{2}.$
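This criterion can be implemented by a straightforward grid search. The following sketch reuses the hypothetical adaptive_volatility function from the previous fragment; the grid $ \mathcal{S}$ and the burn-in t0 are illustrative assumptions.

\begin{verbatim}
def select_lambda_mu(R, gamma, C_gamma, D_gamma, S, m0=20, t0=200):
    """Grid search over the admissible pairs in S; t0 is an
    illustrative burn-in so that enough history is available."""
    best, best_err = None, float("inf")
    for lam, mu in S:
        err = 0.0
        for t in range(t0, len(R)):
            # estimate from data before t, then forecast E|R_t|^gamma
            sigma_hat, _ = adaptive_volatility(R, t, gamma, C_gamma,
                                               D_gamma, m0, lam, mu)
            theta_fc = C_gamma * sigma_hat**gamma
            err += (abs(R[t])**gamma - theta_fc)**2
        if err < best_err:
            best, best_err = (lam, mu), err
    return best

# Example grid:
# S = [(l, m) for l in (2.0, 2.5, 3.0) for m in (2.0, 2.5, 3.0)]
\end{verbatim}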

Figure 15.8 shows the result of the on-line estimation of the locally time homogeneous volatility model for the JPY/USD exchange rate. The bottom plot, in particular, shows the estimated length $ \widehat m$ of the interval of time homogeneity at each time point.

Figure 15.8: From the top: returns, estimated locally time homogeneous volatility and estimated length of the interval of time homogeneity. XFGlochom.xpl