18.2 Statistics of Extreme Events

Throughout the entire section $ X, X_1, \ldots, X_n$ are unbounded, i.i.d. random variables with distribution function $ F$.

Notation: $X_{(1)} \le \ldots \le X_{(n)}$ and $X^{(1)} \ge \ldots \ge X^{(n)}$ denote the order statistics, that is, the data sorted in increasing or decreasing order respectively. Obviously $X_{(1)} = X^{(n)},\ X_{(n)} = X^{(1)}$, etc.

Definition 18.7 (Empirical Average Excess Function)  
Let $K_n(u) = \{ j \le n;\ X_j > u\}$ be the index set of the observations exceeding the threshold $u$, let $N(u) = \# K_n(u)$ be their total number, and let

$\displaystyle \hat{F}_n (x) = \frac{1}{n} \sum^n_{j=1} \boldsymbol{1}(X_j \leq x)$

be the empirical distribution function, with $\overline{\hat{F}_n} = 1 - \hat{F}_n$. Then

$\displaystyle e_n (u) = \frac{\int^\infty_u \overline{\hat{F}_n}(y)\, dy}{\overline{\hat{F}_n}(u)} = \frac{1}{N(u)} \sum_{j\in K_n(u)} (X_j - u) = \frac{1}{N(u)} \sum_{j=1}^n \max\{X_j - u,\ 0\}$

is called the empirical average excess function.

$ e_n(u)$ estimates the average excess function $ e(u)$ from Section 17.1.
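The empirical average excess function is straightforward to compute from a data vector. The following is a minimal sketch assuming numpy; the function name empirical_mean_excess and the simulated data are illustrative only, not part of the original text.

\begin{verbatim}
import numpy as np

def empirical_mean_excess(x, u):
    """e_n(u): mean of the excesses X_j - u over all observations with X_j > u."""
    x = np.asarray(x, dtype=float)
    excesses = x[x > u] - u          # X_j - u for j in K_n(u)
    if excesses.size == 0:           # N(u) = 0: e_n(u) is undefined
        return np.nan
    return excesses.mean()

# example with simulated data and threshold u = 1.5
rng = np.random.default_rng(0)
sample = rng.standard_normal(1000)
print(empirical_mean_excess(sample, 1.5))
\end{verbatim}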

As exploratory data analysis the following graphs are considered:

Plot of the probability distribution function (PP plot):   $\displaystyle \big\{ F(X^{(k)}),\ \frac{n-k+1}{n+1} \big\}_{k=1}^n,$  
Quantile plot (QQ plot):   $\displaystyle \big\{X^{(k)},\ F^{-1} \big(\frac{n-k+1}{n+1}\big)\big\}_{k=1}^n,$  
Average excess plot:   $\displaystyle \big\{X^{(k)},\ e_n (X^{(k)})\big\}_{k=1}^n.$  

If the original model assumption, that $F$ is the distribution of the data, is correct, then the first two graphs should be approximately linear. If this is not the case, the distributional assumption must be changed. On the other hand, due to Theorem 17.5 b) the average excess plot is approximately linear for large values of $X^{(k)}$ with slope $1/(\alpha-1)$ if $F$ belongs to the maximum domain of attraction of a Fréchet distribution $G_{1,\alpha}$ with $\alpha > 1$, i.e. with a finite expectation.
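The three diagnostic plots can be produced along the following lines. This is a sketch using numpy, matplotlib and the scipy normal quantile function; the data file returns.txt is hypothetical, and the reference distribution $F = \Phi$ is only an assumption made for this example.

\begin{verbatim}
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# hypothetical data file; X^(1) >= ... >= X^(n)
x = np.sort(np.loadtxt("returns.txt"))[::-1]
n = x.size
k = np.arange(1, n + 1)
p = (n - k + 1) / (n + 1)                      # plotting positions

fig, ax = plt.subplots(1, 3, figsize=(12, 4))
ax[0].plot(norm.cdf(x), p, ".")                # PP plot: {F(X^(k)), (n-k+1)/(n+1)}
ax[1].plot(x, norm.ppf(p), ".")                # QQ plot: {X^(k), F^{-1}((n-k+1)/(n+1))}
e_n = [(x[x > u] - u).mean() if (x > u).any() else np.nan for u in x]
ax[2].plot(x, e_n, ".")                        # average excess plot: {X^(k), e_n(X^(k))}
plt.show()
\end{verbatim}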

Fig.: Daily log-returns of the JPY/USD exchange rate.   SFEjpyusd.xpl

As an example consider the daily returns of the exchange rate between the Yen and the U.S. dollar from December 1, 1978 to January 31, 1991 in Figure 17.4. Figure 17.5 shows the plot of the probability distribution function and the quantile plot for the cdf $F(x)=\Phi(x)$ of the standard normal distribution. The deviations from the straight line clearly show that the data are not normally distributed. Figure 17.6 shows the average excess plot of the data.

Fig.: PP plot and QQ plot.   SFEjpyusd.xpl

Fig.: Empirical mean excess function (solid line), GP mean excess function for the Hill estimator (dotted line) and the moment estimator (broken line).   SFEjpyusd.xpl

18.2.1 The POT (peaks-over-threshold) Method

In this section and the following one we consider estimators for extreme value characteristics such as the exceedance probability $\overline{F}(x) = 1- F(x)$ for large values $x$ or the extreme quantiles $F^{-1}(q)$ for $q \approx 1$.

First, we only consider distributions $F$ that are contained in the MDA of a GEV distribution $G_\gamma$ with $\gamma \ge 0$. The corresponding random variables are thus unbounded.

Definition 18.8 (Excess)  
Let $K_n(u)$ and $N(u)$ be, as before, the index set and the total number of observations beyond the threshold $u$, respectively. The excesses beyond the threshold $u$ are defined as the random variables $Y_l,\ l= 1, \ldots, N(u),$ with

$\displaystyle \{ Y_1, \ldots, Y_{N(u)}\} = \{ X_j - u;\ j \in K_n (u)\} = \{ X^{(1)}-u, \ldots, X^{(N(u))}-u \}. $

The excesses $Y_l,\ l \le N(u)$, describe by how much the observations that are larger than $u$ exceed the threshold $u$. The POT (peaks-over-threshold) method regards these excesses as the essential information contained in the original data. From the definition it immediately follows that, given their random total number $N(u)$, the $Y_1, \ldots, Y_{N(u)}$ are i.i.d. random variables with distribution $F_u$, i.e., the excess distribution from Definition 17.5 is the actual distribution of the excesses. Due to Theorem 17.6 it also holds that $F_u (y) \approx W_{\gamma,\beta(u)}(y)$ for a GP distribution $W_{\gamma,\beta(u)}$ and all sufficiently large $u$.

Let's first consider the problem of estimating the exceedance probability $\overline{F}(x)$ for large $x$. A natural estimator is $\overline{\hat{F}}_n(x)$, obtained by replacing the cdf at $x$ with the empirical distribution function. For large $x$, however, the empirical distribution function fluctuates strongly, because it is determined by the few extreme observations located around $x$. The effective size of the sub-sample of extreme, large observations is too small to use a purely non-parametric estimator such as the empirical distribution function. Therefore, we use the following relationship among the extreme exceedance probability $\overline{F}(x)$, the exceedance probability $\overline{F}(u)$ for a large but not extremely large threshold $u$, and the excess distribution. Due to Definition 17.5 the excess distribution is

    $\displaystyle \overline{F}_u (y) = {\P}(X - u > y \ \vert \ X > u) = \overline{F} (y + u) / \overline{F} (u),$   i.e.  

    $\displaystyle \overline{F} (x) = \overline{F} (u) \cdot \overline{F}_u (x - u),\ \, u < x < \infty.$ (18.4)

For large $u$ and using Theorem 17.6 we can approximate $F_u$ by $W_{\gamma, \beta}$ with appropriately chosen $\gamma, \beta$. $F(u)$ is replaced by the empirical distribution function $\hat{F}_n(u)$ at the threshold $u$, for which, due to the definition of $N(u)$, it holds that

$\displaystyle \hat{F}_n (u) = \frac{n-N(u)}{n} = 1 - \frac{N(u)}{n}. $

For $u$ itself this is a useful approximation, but not for values $x$ that are clearly larger than the moderately large threshold $u$: the estimator $1-\hat{F}_n (x)$ of $\overline{F}(x)$ for extreme $x$ depends on only a few observations and is therefore too unreliable. For this reason the POT method uses the identity (18.4) for $\overline{F}(x)$ and replaces both factors on the right-hand side with their corresponding approximations, whereby the unknown parameters of the generalized Pareto distribution are replaced with suitable estimators.

Definition 18.9 (POT Estimator)  
The POT estimator $\overline{F}^{\wedge} (x)$ of the exceedance probability $\overline{F} (x)$, for large $x$, is

$\displaystyle \overline{F}^{\wedge} (x) = \frac{N(u)}{n} \, \overline{W}_{\hat{\gamma},\hat{\beta}}(x-u) = \frac{N(u)}{n} \left\{ 1 + \frac{\hat{\gamma} (x-u)}{\hat{\beta}} \right\}^{-1/\hat{\gamma}},\ u < x < \infty, $

whereby $\hat{\gamma}, \hat{\beta}$ are suitable estimators for $\gamma$ and $\beta$, respectively.

$\hat{\gamma}, \hat{\beta}$ can, for example, be calculated as maximum likelihood estimators from the excesses $Y_1, \ldots, Y_{N(u)}$. Let's first consider the case where $N(u)=m$ is constant and where $Y_1,\ldots, Y_m$ is a sample of i.i.d. random variables with distribution $W_{\gamma, \beta}$, $\gamma > 0$. In this case $W_{\gamma, \beta}$ is a genuine Pareto distribution with probability density

$\displaystyle p(y) = \frac{1}{\beta} \Big(1 + \frac{\gamma y}{\beta} \Big)^{-\frac{1}{\gamma} - 1},\ \, y \ge 0. $

Therefore, the log-likelihood function is

$\displaystyle \ell (\gamma, \beta \ \vert \ Y_1, \ldots, Y_m) = - m \log \beta - \Big(\frac{1}{\gamma} + 1\Big) \sum^m_{j=1} \log \Big(1 + \frac{\gamma}{\beta} Y_j\Big).$

By maximizing this function with respect to $\gamma, \beta$ we obtain the maximum likelihood (ML) estimators $\hat{\gamma}, \hat{\beta}$. Analogously we can also define the ML estimators for the parameters of the generalized Pareto distributions with $\gamma \le 0$.
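In practice $\hat{\gamma}, \hat{\beta}$ can be obtained by numerically maximizing the log-likelihood above. The following is a sketch using scipy; the function name gpd_ml and the starting values are illustrative choices, not prescribed by the text.

\begin{verbatim}
import numpy as np
from scipy.optimize import minimize

def gpd_ml(y):
    """ML estimates (gamma_hat, beta_hat) for GP excesses y_1,...,y_m with gamma > 0."""
    y = np.asarray(y, dtype=float)
    m = y.size

    def neg_loglik(theta):
        gamma, beta = theta
        if gamma <= 0 or beta <= 0:      # restrict to the Pareto case gamma > 0, beta > 0
            return np.inf
        return m * np.log(beta) + (1.0 / gamma + 1.0) * np.sum(np.log1p(gamma * y / beta))

    start = np.array([0.1, y.mean()])    # simple, illustrative starting values
    res = minimize(neg_loglik, start, method="Nelder-Mead")
    return res.x                          # (gamma_hat, beta_hat)
\end{verbatim}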

Theorem 18.7  
For all $ \gamma > - \frac{1}{2}$ it holds for $ m \to \infty$

$\displaystyle \sqrt{m} \Big(\hat{\gamma} - \gamma,\ \frac{\hat{\beta}}{\beta} - 1\Big) \stackrel{{\cal L}}{\longrightarrow} N_2 (0, D^{-1}), $

with $D = (1+\gamma) \left( \begin{array}{rr} 1 + \gamma & -1 \\ -1 & 2 \end{array}\right)$, i.e. $(\hat{\gamma}, \hat{\beta})$ are asymptotically normally distributed. In addition, they are asymptotically efficient estimators.

In our initial problem $m = N(u)$ was random. Here the estimators just defined, $\hat{\gamma}$ and $\hat{\beta}$, are the conditional ML estimators given $N(u)$. The asymptotic distribution theory is also known in this case; in order to avoid an asymptotic bias, $\overline{F}$ must fulfill an additional regularity condition. Once we have an estimator of the exceedance probability, and thus of the cdf for large $x$, we immediately obtain an estimator for the extreme quantiles.

Definition 18.10 (POT Quantile Estimator)  
The POT quantile estimator $\hat{x}_q$ for the $q$-quantile $x_q = F^{-1}(q)$ is the solution of $\overline{F}^{\wedge}(\hat{x}_q) = 1 - q,$ i.e.

$\displaystyle \hat{x}_q = u + \frac{\hat{\beta}}{\hat{\gamma}} \left[ \left\{ \frac{n}{N(u)} (1-q) \right\}^{-\hat{\gamma}} - 1 \right]. $

SFEpotquantile.xpl
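Putting the pieces together, the POT estimators of Definitions 18.9 and 18.10 can be sketched as follows, assuming a function gpd_ml as above that returns $\hat{\gamma}, \hat{\beta}$ from the excesses; the function and variable names are illustrative only.

\begin{verbatim}
import numpy as np

def pot_estimators(x_data, u, x, q, gpd_ml):
    """POT estimates of the exceedance probability F_bar(x) and of the q-quantile x_q."""
    x_data = np.asarray(x_data, dtype=float)
    n = x_data.size
    excesses = x_data[x_data > u] - u          # Y_1, ..., Y_N(u)
    n_u = excesses.size                        # N(u)
    gamma, beta = gpd_ml(excesses)             # ML estimates from the excesses

    # Definition 18.9: F_bar_hat(x) = N(u)/n * {1 + gamma (x - u)/beta}^(-1/gamma)
    tail_prob = n_u / n * (1.0 + gamma * (x - u) / beta) ** (-1.0 / gamma)

    # Definition 18.10: x_q_hat = u + beta/gamma * [{n/N(u) * (1-q)}^(-gamma) - 1]
    quantile = u + beta / gamma * ((n / n_u * (1.0 - q)) ** (-gamma) - 1.0)

    return tail_prob, quantile
\end{verbatim}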

We can compare these estimators with the usual sample quantiles. To do this we select a threshold value $u$ such that exactly $k$ excesses lie beyond $u$, that is, $N(u) = k > n(1-q)$ and thus $u = X^{(k+1)}$. The POT quantile estimator, which depends on the choice of $u$ or, equivalently, of $k$, is then

$\displaystyle \hat{x}_{q,k} = X^{(k+1)} + \frac{\hat{\beta}_k}{\hat{\gamma}_k} \left[ \left\{ \frac{n}{k} (1-q)\right\}^{- \hat{\gamma}_k} -1\right],$

where $\hat{\gamma}_k, \hat{\beta}_k$ are the ML estimators, depending on the choice of $k$, for $\gamma$ and $\beta$. The corresponding sample quantile is

$\displaystyle \hat{x}_q^s = X^{([n(1-q)] + 1)}. $

This is in approximate agreement with $\hat{x}_{q,k}$ when the minimal value $k=[n(1-q)] + 1$ is chosen for $k$. Simulation studies show, however, that the value $k_0$ of $k$ which minimizes the mean squared error MSE$(\hat{x}_{q,k})= {\mathop{\text{\rm\sf E}}} (\hat{x}_{q,k} - x_q)^2$ is much larger than $[n(1-q)] + 1$, i.e., the POT estimator for $x_q$ differs distinctly from the sample quantile $\hat{x}_q^s$ and is superior to it with respect to the mean squared error when the threshold $u$, or equivalently $k$, is chosen appropriately.

We are interested in a threshold $u$ for which the mean squared error of $\hat{x}_q$ is as small as possible. The error can be split into the variance and the squared bias of $\hat{x}_q$:

MSE$\displaystyle (\hat{x}_q) = {\mathop{\text{\rm\sf E}}}( \hat{x}_q - x_q )^2 = {\mathop{\text{\rm Var}}}(\hat{x}_q) + \{{\mathop{\text{\rm\sf E}}}(\hat{x}_q) - x_q\}^2.$

Unfortunately the two components of the mean squared error move in opposite directions when we vary the threshold $u$ used in calculating the POT quantile estimator. We are therefore confronted with the following bias-variance dilemma:
-
when $u$ is too large, there are few excesses $Y_l,\ l \le N(u)$, and the estimator's variance is too large,
-
when $u$ is too small, the approximation of the excess distribution by a generalized Pareto distribution is not good enough, and the bias ${\mathop{\text{\rm\sf E}}} (\hat{x}_{q}) - x_q$ becomes too large.
An essential aid in selecting an appropriate threshold $u$ is the average excess plot, which is approximately linear beyond the appropriate threshold. This has already been discussed in Theorem 17.5, when one considers the relationship between the Fréchet distribution as the asymptotic distribution of the maxima and the Pareto distribution as the asymptotic distribution of the excesses. It is supported by the following result for the generalized Pareto distributions $W_{\gamma,\beta}$, $\gamma \ge 0$, which include the Pareto and the exponential distributions.

Theorem 18.8  
Let $ Z$ be a $ W_{\gamma, \beta}$ distributed random variable with $ 0 \le \gamma < 1$. The average excess function is linear:

$\displaystyle e(u) = {\mathop{\text{\rm\sf E}}} \{ Z-u \,\vert\, Z> u \} = \frac{\beta + \gamma u}{1-\gamma},\ \, u \ge 0,\ \, \text{\rm for\ } \, 0 \le \gamma < 1. $

With the usual parametrization of the Pareto distribution, $\gamma = \frac{1}{\alpha}$, the condition $\gamma < 1$ means that $\alpha > 1$ and thus ${\mathop{\text{\rm\sf E}}} \vert Z \vert< \infty$.
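For $0 < \gamma < 1$ the linear form can be checked directly from the GP survival function $\overline{W}_{\gamma,\beta}(y) = (1 + \gamma y/\beta)^{-1/\gamma}$ corresponding to the density given above; the case $\gamma = 0$ follows analogously from the exponential survival function $e^{-y/\beta}$:

$\displaystyle \int^\infty_u \Big(1 + \frac{\gamma y}{\beta}\Big)^{-1/\gamma} dy = \frac{\beta}{1-\gamma}\Big(1 + \frac{\gamma u}{\beta}\Big)^{1 - 1/\gamma}, \quad\text{so that}\quad e(u) = \frac{\int^\infty_u \overline{W}_{\gamma,\beta}(y)\, dy}{\overline{W}_{\gamma,\beta}(u)} = \frac{\beta}{1-\gamma}\Big(1 + \frac{\gamma u}{\beta}\Big) = \frac{\beta + \gamma u}{1-\gamma}.$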

This result motivates the following procedure for choosing the threshold: select the threshold $u$ of the POT estimator such that the empirical average excess function $e_n(v)$ is approximately linear for values $v \ge u$. An appropriate $u$ is chosen by inspecting the average excess plot, where it is recommended to exclude the largest points $(X_{(k)}, e_n (X_{(k)})),\ k \approx n,$ along the right-hand edge of the plot, since their large variability for the most part distorts the visual impression.

18.2.2 The Hill Estimator

The POT method for estimating the exceedance probability and the extreme quantiles can be applied to data whose cdf lies in the MDA of a Gumbel or a Fréchet distribution, as long as the expectation is finite. Based on empirical evidence, this estimator seems reasonable even for extreme financial data. A classic alternative to the POT method is the Hill estimator, which was already discussed in Chapter 12 in connection with the estimation of the tail exponents of the DAX stocks. It is, of course, only useful for distributions with slowly decaying tails, such as those in the MDA of the Fréchet distribution, and in simulations it often performs worse than the POT estimator. The details are briefly introduced in this section.

In this section we will always assume that the data $X_1,\ldots,X_n$ are i.i.d. with a distribution function $F$ in the MDA of $G_{1,\alpha}$ for some $\alpha>0$. Due to Theorem 17.3 this is the case when $\overline{F}(x) = x^{-\alpha} L(x)$ with a slowly varying function $L$. The decay behavior of $\overline{F}(x) = {\P}(X_t > x)$ for increasing $x$ is essentially determined by the so-called tail exponent $\alpha$. The starting point of the Hill method is the following estimator for $\alpha$.

Definition 18.11 (Hill Estimator)  
Let $X^{(1)} \ge X^{(2)} \ge \ldots \ge X^{(n)}$ be the order statistics in decreasing order. The Hill estimator $\hat{\alpha}_H$ of the tail exponent $\alpha$ for a suitable $k = k(n)$ is

$\displaystyle \hat{\alpha}_H = \Big\{ \frac{1}{k} \sum^k_{j=1} \log X^{(j)} - \log X^{(k)} \Big\}^{-1}. $
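A minimal numpy sketch of Definition 18.11; the function name hill_alpha is ours, and the $k$ largest observations are assumed to be positive so that the logarithms are defined.

\begin{verbatim}
import numpy as np

def hill_alpha(x, k):
    """Hill estimator alpha_hat_H based on the k largest order statistics."""
    x_desc = np.sort(np.asarray(x, dtype=float))[::-1]   # X^(1) >= ... >= X^(n)
    top = x_desc[:k]                                      # X^(1), ..., X^(k), assumed positive
    return 1.0 / (np.mean(np.log(top)) - np.log(x_desc[k - 1]))
\end{verbatim}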

The form of the estimator can be understood from the following simple special case. In general it holds that $\overline{F}(x) = L(x)/x^\alpha$, but we now assume that $L(x) = c^\alpha$ is constant for some fixed $c > 0$. Setting $V_j = \log (X_j/c)$, it holds that

$\displaystyle {\P}(V_j > v) = {\P}(X_j > c e^v) = \overline{F}(c e^v) = \frac{c^\alpha}{(c e^v)^\alpha} = e^{-\alpha v},\ v \ge 0, $

$V_1, \ldots, V_n$ are therefore independent exponentially distributed random variables with parameter $\alpha$. As is well known, $\alpha = ({\mathop{\text{\rm\sf E}}} V_j)^{-1}$, and the ML estimator $\hat{\alpha}$ for $\alpha$ is $1/\overline{V}_n$, where $\overline{V}_n$ stands for the sample average of $V_1, \ldots, V_n$; thus,

$\displaystyle \hat{\alpha} = \frac{1}{\overline{V}_n} = \Big\{ \frac{1}{n} \sum^n_{j=1} \log (X_j/c) \Big\}^{-1} = \Big\{ \frac{1}{n} \sum^n_{j=1} \log X^{(j)} - \log c \Big\}^{-1}, $

where for the last equality only the order of summation was changed. $\hat{\alpha}$ already resembles the Hill estimator. In general, of course, it only holds that $\overline{F}(x) \approx \frac{c^\alpha}{x^\alpha}$ for sufficiently large $x$. The argument for the special case then applies approximately to the largest observations $X^{(1)} \ge X^{(2)} \ge \ldots \ge X^{(k)} \ge u$ beyond the threshold $u$, so that only the $k$ largest order statistics enter the definition of the Hill estimator.

The Hill estimator is consistent, that is, it converges in probability to $\alpha$ when $n, k \to \infty$ such that $k/n \to 0$. Under an additional condition it can also be shown that $\sqrt{k} (\hat{\alpha}_H - \alpha) \stackrel{{\cal L}}{\longrightarrow} N(0, \alpha^2)$, i.e., $\hat{\alpha}_H$ is asymptotically normally distributed.
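Since $\sqrt{k} (\hat{\alpha}_H - \alpha)$ is asymptotically $N(0, \alpha^2)$, an approximate confidence interval for $\alpha$ can be sketched as follows, with $\alpha$ replaced by $\hat{\alpha}_H$ in the variance; the function name is illustrative.

\begin{verbatim}
import numpy as np
from scipy.stats import norm

def hill_confidence_interval(alpha_hat, k, level=0.95):
    """Approximate CI for alpha from sqrt(k)(alpha_hat - alpha) ~ N(0, alpha^2)."""
    z = norm.ppf(0.5 + level / 2.0)           # standard normal quantile
    half_width = z * alpha_hat / np.sqrt(k)   # plug-in estimate of the asymptotic std. dev.
    return alpha_hat - half_width, alpha_hat + half_width
\end{verbatim}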

As with the POT estimator, the question of how to choose the threshold $u = X^{(k)}$ arises for the Hill estimator, since only the observations beyond it enter the estimation. Once again we face a bias-variance dilemma:

-
When $k$ is too small, only a few observations influence $\hat{\alpha}_H$, and the variance of the estimator, which is asymptotically $\alpha^2/k$, is too large,
-
when $k$ is too large, the assumption underlying the derivation of the estimator, i.e., that $L(x)$ is approximately constant for all $x \ge X^{(k)}$, is in general not well met and the bias ${\mathop{\text{\rm\sf E}}} \hat{\alpha}_H - \alpha$ becomes too large.
Based on the Hill estimator for the tail exponent $\alpha$ we obtain direct estimators for the exceedance probability $\overline{F}(x)$ and for the quantiles of $F$. Since $\overline{F}(x) = x^{-\alpha} L(x)$ with a slowly varying function $L$, it holds for large $x \ge X^{(k)}$ that

$\displaystyle \frac{\overline{F}(x)}{\overline{F}(X^{(k)})} = \frac{L(x)}{L(X^{(k)})} \left( \frac{X^{(k)}}{x} \right)^\alpha \approx \left( \frac{X^{(k)}}{x} \right)^\alpha.$ (18.5)

Because exactly a fraction $k/n$ of the data is larger than or equal to the order statistic $X^{(k)}$, $X^{(k)}$ is the $(1-k/n)$ sample quantile, and the empirical distribution function takes the value $1-k/n$ at $X^{(k)}$. Since the empirical distribution function converges uniformly to the distribution function $F$, we obtain for sufficiently large $n$ and a $k$ that is not too large in comparison to $n$: $F(X^{(k)}) \approx 1- k/n$, i.e., $\overline{F}(X^{(k)}) \approx k/n$. Substituting this into (18.5), we obtain the Hill estimator for the exceedance probability $\overline{F}(x)$:

$\displaystyle \overline{F}_H^{\wedge} (x) = \frac{k}{n} \left( \frac{X^{(k)}}{x} \right)^{\hat{\alpha}_H}. $

By inverting this estimator we obtain the Hill quantile estimator for the $q$-quantile $x_q$ with $q \approx 1$:

$\displaystyle \hat{x}_{q,H} = X^{(k)} \left\{ \frac{n}{k} (1-q)\right\}^{-1/\hat{\alpha}_H} = X^{(k)} + X^{(k)} \left[ \left\{ \frac{n}{k} (1-q) \right\}^{-\hat{\gamma}_H} - 1 \right]$

with $ \hat{\gamma}_H = 1/\hat{\alpha}_H$, where the second representation clearly shows the similarities and differences to the POT quantile estimator.
SFEhillquantile.xpl
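The Hill tail and quantile estimators can be sketched together along the following lines, recomputing $\hat{\alpha}_H$ as above; all names are illustrative.

\begin{verbatim}
import numpy as np

def hill_tail_and_quantile(x, k, x0, q):
    """Hill estimates of the exceedance probability F_bar(x0) and of the q-quantile."""
    x_desc = np.sort(np.asarray(x, dtype=float))[::-1]   # X^(1) >= ... >= X^(n)
    n = x_desc.size
    x_k = x_desc[k - 1]                                   # X^(k), the chosen threshold
    alpha_h = 1.0 / (np.mean(np.log(x_desc[:k])) - np.log(x_k))

    tail_prob = k / n * (x_k / x0) ** alpha_h             # F_bar_H(x0) = k/n * (X^(k)/x0)^alpha_hat
    quantile = x_k * (n / k * (1.0 - q)) ** (-1.0 / alpha_h)   # x_hat_{q,H}

    return tail_prob, quantile
\end{verbatim}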