18.4 Extreme Value Theory for Time Series

Let $ Z_t,\ -\infty < t < \infty,$ be a strictly stationary time series as defined in Definition 10.6, i.e., the probability structure of the data does not change over time. In particular, each single observation $ Z_t$ has the same distribution function $ F$. For comparison, consider i.i.d. random variables $ X_1, X_2, \ldots $ with the same distribution $ F$. Let $ M_n = \max \{ Z_1, \ldots, Z_n\}$ and $ M_n^x = \max\{ X_1, \ldots, X_n\}$ be the maxima of $ n$ values from the time series and of $ n$ independent observations, respectively. A simple but fundamental relationship for the previous sections is (17.1), i.e.,

$\displaystyle {\P}(M_n^x \le y) = \{ {\P}(X_j \le y) \}^n = F^n (y), $

where the independence of the $ X_t$ is used. For dependent data this relationship does not hold, and the distribution of the maximum $ M_n$ is not determined by $ F$ alone but by the complete distribution of the time series. Fortunately, in many cases there is at least a comparable, approximate relationship:

$\displaystyle {\P}(M_n \le y) \approx F^{n\delta} (y) \ge F^n (y)$   for large $\displaystyle n, $

where $ \delta \in [0,1]$ is the so-called extremal index. To give an exact definition, recall Theorem 17.2 for the independent case, by which
    $\displaystyle n\overline{F} (u_n) \to \tau\quad$ if and only if $\quad {\P}(M_n^x \le u_n) \to e^{-\tau}.$

Definition 18.14 (Extremal Index)  
$ \delta \in [0,1]$ is called the extremal index of the time series $ Z_j,\ -\infty < j < \infty,$ if for certain $ \tau, u_n$

$\displaystyle n\overline{F} (u_n) \to \tau$   and$\displaystyle \quad {\P}(M_n \le u_n) \to e^{- \delta \tau}. $

(If $ \delta$ exists, then its value does not depend on the specific choice of $ \tau, u_n$.)

From the definition, the approximate relationship claimed above between the distribution of the maximum and the exceedance probability follows immediately:

$\displaystyle {\P}(M_n \le u_n) \approx e^{-\delta \tau} \approx e^{-\delta n \overline{F}(u_n)}
\approx (1 - \overline{F} (u_n))^{n\delta} = F^{n\delta} (u_n),
$

when $ u_n$ is large and thus $ \overline{F}(u_n) \approx 0$.
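
As a small numerical illustration with hypothetical values: for $ n = 250$ observations, a threshold $ u_n$ with $ F(u_n) = 0.99$ and an extremal index $ \delta = 0.5$, one obtains

$\displaystyle {\P}(M_n^x \le u_n) \approx 0.99^{250} \approx 0.081$   whereas$\displaystyle \quad {\P}(M_n \le u_n) \approx 0.99^{125} \approx 0.285. $

Because extreme observations of the dependent series tend to occur in clusters, it is noticeably more likely that the threshold is never exceeded during the observation period.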

Pure white noise automatically has extremal index $ \delta = 1$, since the $ Z_t$ are independent in this case. It is less obvious that all ARMA$ (p,q)$ processes (see Chapter 11) with normally distributed innovations also have extremal index $ \delta = 1$, i.e., that their maxima behave like maxima of independent data. Intuitively this is because, on the one hand, ARMA processes have an exponentially decaying memory, i.e., the observations $ Z_t, Z_{t+\tau}$ are practically independent for sufficiently large time lags $ \tau$, and, on the other hand, the probability that two extreme observations occur within the same, not too long, time interval is small. These qualitative statements can be formulated as two precise criteria under which a time series has extremal index 1; their exact formulation is not given here.

For financial time series models the second condition is not fulfilled, since it would contradict the presence of volatility clusters (see Chapter 12), i.e., the local clustering of extreme observations. The extremal index of an ARCH(1) process with parameters $ \omega, \alpha$ (see Definition 12.1) is therefore always $ \delta = \delta (\alpha) < 1$; for $ \alpha=0.5$, for example, $ \delta \approx 0.835.$

Finally, note that not every time series has an extremal index. A simple counterexample is $ Z_t = A \cdot X_t$ with i.i.d. random variables $ X_t$ multiplied by a random factor $ A > 0$ that is independent of the $ X_t$. Since the factor $ A$ enters all observations, even those in the most distant past, this time series has no decaying memory. If the distribution of the $ X_t$ has slowly decaying tails, i.e., belongs to the MDA of a Fréchet distribution, then it can be shown that $ Z_t$ cannot have an extremal index.

Extreme value theory for time series is still developing. The Fisher-Tippett theorem, however, remains valid as a central result in the following modified form:

Theorem 18.9  
Let $ \{Z_t\}$ be a strictly stationary time series with distribution function $ F$ and extremal index $ \delta>0.$ Let $ X_1, X_2, \ldots $ be i.i.d. with the same distribution function $ F,$ and set $ M_n^x=\max\{X_1, \dots, X_n\}.$ Let $ G_\gamma$ be a general extreme value distribution. We have

       $ {\P} \left( \frac{M_n^x - d_n}{c_n} \le x\right) \to G_\gamma (x)$    if and only if    $ {\P} \left( \frac{M_n - d_n}{c_n} \le x\right) \to G_\gamma^\delta (x)$

for all $ x$ with $ 0 < G_\gamma(x) < 1.$

The maxima of the time series are standardized with the same sequences $ c_n, d_n$ and converge in distribution to the same type of asymptotic distribution as the maxima of the corresponding independent data, since $ G_\gamma^\delta$ is itself a general extreme value distribution with the same shape parameter as $ G_\gamma$. For $ \gamma>0$, for example, it holds that

$\displaystyle G_\gamma^\delta (x) = \exp \{ - \delta (1+\gamma x)^{-1/\gamma} \} = G_\gamma
\left(\frac{x-\mu}{\sigma}\right),\ \, 1 + \gamma x > 0 $

with $ \sigma = \delta^\gamma$ and $ \mu = - (1-\delta^\gamma)/\gamma,$ i.e., except for the location and scale parameters the distributions are identical.
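
The values of $ \mu$ and $ \sigma$ follow from a short calculation: writing

$\displaystyle G_\gamma^\delta (x) = \exp \{ - \delta (1+\gamma x)^{-1/\gamma} \} = \exp \{ - (\delta^{-\gamma} (1+\gamma x))^{-1/\gamma} \} $

and comparing $ \delta^{-\gamma}(1+\gamma x)$ with $ 1 + \gamma \frac{x-\mu}{\sigma}$ for all $ x$ gives $ 1/\sigma = \delta^{-\gamma}$ and $ -\gamma\mu/\sigma = \delta^{-\gamma} - 1$, i.e., $ \sigma = \delta^\gamma$ and $ \mu = -(1-\delta^\gamma)/\gamma$.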

Many of the techniques of extreme value statistics that were developed for independent data can also be applied to time series. To do so, however, one needs more data, because the effective sample size is only $ n\delta$ instead of $ n$. In addition, new problems appear: the POT method is in principle still applicable, but the excesses are no longer independent, especially when a financial time series with volatility clusters is considered. For this reason the parameters of the generalized Pareto distribution, with which the excess distribution is approximated, cannot be estimated by simply maximizing the likelihood function for independent data. One way out is either to use special model assumptions under which the likelihood function of the dependent excesses can be calculated, or to use a reduction technique that makes the data more "independent" at the cost of the sample size. One such approach, for example, replaces each cluster of neighboring excesses by the maximum value of the cluster, where the cluster size is chosen such that the sample size of the excesses is reduced by approximately the factor $ \delta$. Afterwards the POT estimators, which were developed for independent data, can be calculated from the reduced excesses.
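
The following Python sketch illustrates one simple variant of such a reduction technique, a block-wise declustering of the excesses; the function name, the block length and the threshold are illustrative assumptions, not part of the text above.

import numpy as np

def decluster_excesses(z, u, block_length):
    # Replace each block containing excesses over u by the block maximum.
    # Neighboring excesses within one block are treated as a single cluster,
    # which thins the dependent excesses towards approximately independent ones.
    z = np.asarray(z, dtype=float)
    cluster_maxima = []
    for start in range(0, len(z), block_length):
        block_max = z[start:start + block_length].max()
        if block_max > u:                  # block contains at least one excess
            cluster_maxima.append(block_max)
    return np.array(cluster_maxima)

The declustered excesses (the cluster maxima minus $ u$) can then be passed to the POT estimators for the generalized Pareto distribution as if they were independent; the block length plays the role of the cluster size and would in practice be chosen so that the number of retained excesses is roughly $ \delta$ times the number of original excesses.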

Another problem is that the extremal index has to be estimated before techniques like the one just described can be applied. Several estimation methods are described in the literature. We introduce only one here, which can be described without much technical preparation: the so-called block method. First the time series data $ Z_1, \ldots, Z_n$ are divided into $ b$ blocks, each of length $ l$ (with $ n = b l$ and $ b,\ l$ large). Let $ M_l^{(k)}$ be the maximum of the observations in the $ k$-th block:

$\displaystyle M_l^ {(k)} = \max (Z_{(k-1)l+1}, \ldots, Z_{kl}),\ k = 1, \ldots, b.$

For a large threshold value $ u,$ let $ N(u) = \#\{ t \le n;\ Z_t > u\}$ be the number of observations beyond the threshold and let $ B(u) = \#\{ k \le b;\ M_l^{(k)} > u\}$ be the number of blocks with at least one observation beyond the threshold $ u$. The estimator for the extremal index is then

$\displaystyle \hat{\delta} = \frac{1}{l} \, \frac{\log \left(1-\frac{B(u)}{b}\right)}{\log \left(1-\frac{N(u)}{n}\right)} . $
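
A direct implementation of this estimator is straightforward; the following Python sketch (the function name and the handling of an incomplete last block are our own choices) computes $ \hat{\delta}$ from the exceedance counts $ N(u)$ and $ B(u)$ defined above.

import numpy as np

def extremal_index_blocks(z, u, block_length):
    # Block estimator of the extremal index:
    #   delta_hat = (1/l) * log(1 - B(u)/b) / log(1 - N(u)/n)
    # z: observations Z_1, ..., Z_n; u: high threshold; block_length: l
    z = np.asarray(z, dtype=float)
    l = block_length
    b = len(z) // l                        # number of complete blocks
    z = z[:b * l]                          # drop a possibly incomplete last block
    n = b * l
    block_maxima = z.reshape(b, l).max(axis=1)
    N_u = np.sum(z > u)                    # observations above the threshold
    B_u = np.sum(block_maxima > u)         # blocks with at least one exceedance
    # assumes 0 < N_u and B_u < b, so that both logarithms are finite and nonzero
    return (1.0 / l) * np.log(1.0 - B_u / b) / np.log(1.0 - N_u / n)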

Heuristically this estimator can be derived from the following three observations:
(i)
From the definition of the extremal index it follows that $ {\P}(M_n \le u) \approx F^{\delta n} (u)$ when $ n, u \to \infty$ such that $ n \overline{F} (u) \to \tau.$ Solving for $ \delta$ gives

$\displaystyle \delta \approx \frac{\log \ {\P}(M_n \le u)}{n \log \ F(u)}. $

(ii)
$ F$ can be estimated using the empirical distribution function $ \hat{F}_n$, so that $ F(u) = 1-{\P}(Z_t > u)\approx 1-\frac{N(u)}{n}$ .
(iii)
With $ n=b l$ it follows that
$\displaystyle {\P}(M_n \le u) \approx \prod_{k=1}^b {\P}(M_l^{(k)} \le u) \approx \{ {\P}(M_l^{(1)} \le u)\}^b \approx \left\{ \frac{1}{b} \sum^b_{k=1} \boldsymbol{1}(M_l^{(k)}\leq u)\right\}^b = \left(1-\frac{B(u)}{b}\right)^b.$

By combining the three observations we have

$\displaystyle \delta \approx \frac{b \, \log \left(1-\frac{B(u)}{b}\right)}{n \, \log \left(1-\frac{N(u)}{n}\right)} = \hat{\delta}. $
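
As an illustration, the estimator can be tried on simulated data, for example on an ARCH(1) process with $ \alpha = 0.5$, whose extremal index is approximately $ 0.835$ as mentioned above. The following sketch uses the function extremal_index_blocks from the sketch above; the sample size, threshold and block length are illustrative choices, and the resulting estimate depends noticeably on them.

import numpy as np

rng = np.random.default_rng(0)

# simulate an ARCH(1) process: Z_t = sigma_t * eps_t with sigma_t^2 = omega + alpha * Z_{t-1}^2
omega, alpha, n = 1.0, 0.5, 100000
z = np.zeros(n)
for t in range(1, n):
    z[t] = np.sqrt(omega + alpha * z[t - 1] ** 2) * rng.standard_normal()

u = np.quantile(z, 0.99)                   # high threshold: empirical 99% quantile
delta_hat = extremal_index_blocks(z, u, block_length=100)
print(delta_hat)                           # to be compared with delta(0.5) ~ 0.835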