1.4 Estimation of Parameters

Like simulation, the estimation of stable law parameters is in general severely hampered by the lack of known closed-form density functions for all but a few members of the stable family. Either the pdf has to be numerically integrated (see the previous section) or the estimation technique has to be based on a different characteristic of stable laws.

All of the presented methods work quite well under the assumption that the sample under consideration is indeed $ \alpha $-stable. However, if the data come from a different distribution, these procedures may mislead more than the Hill and direct tail estimation methods. Since formal tests for assessing the $ \alpha $-stability of a sample are very time consuming, we suggest first applying the ``visual inspection'' tests to see whether the empirical densities resemble those of $ \alpha $-stable laws.

Figure 1.4: A double logarithmic plot of the right tail of an empirical symmetric $ 1.9$-stable distribution function for a sample of size $ N = 10^4$ (left panel) and $ N = 10^6$ (right panel). Thick red lines represent the linear regression fit. The tail index estimate ( $ \hat\alpha=3.7320$) obtained for the smaller sample is close to the initial power-law like decay of the larger sample ( $ \hat\alpha=3.7881$). The far tail estimate $ \hat\alpha=1.9309$ is close to the true value of $ \alpha $.
\includegraphics[width=.7\defpicwidth]{STFstab04a.ps} \includegraphics[width=.7\defpicwidth]{STFstab04b.ps}


1.4.1 Tail Exponent Estimation

The simplest and most straightforward method of estimating the tail index is to plot the right tail of the empirical cdf on double logarithmic paper. The slope of the linear regression for large values of $ x$ yields the estimate of the tail index $ \alpha $, through the relation $ \alpha = -\textrm{slope}$.

This method is very sensitive to the size of the sample and the choice of the number of observations used in the regression. For example, a slope of about $ -3.7$ may indicate a non-$ \alpha $-stable power-law decay in the tails or, on the contrary, an $ \alpha $-stable distribution with $ \alpha\approx 1.9$. This is illustrated in Figure 1.4. In the left panel a power-law fit to the tail of a sample of $ N = 10^4$ standard symmetric ( $ \beta =\mu =0$, $ \sigma =1$) $ \alpha $-stable distributed variables with $ \alpha=1.9$ yields an estimate of $ \hat\alpha = 3.732$. However, when the sample size is increased to $ N = 10^6$ the power-law fit to the extreme tail observations yields $ \hat\alpha=1.9309$, which is fairly close to the original value of $ \alpha $.

The true tail behavior (1.1) is observed only for very large observations (and, for the negative tail, very small ones), after a crossover from a temporary power-like decay (which surprisingly indicates $ \alpha \approx 3.7$). Moreover, the obtained estimates still have a slight positive bias, which suggests that perhaps even larger samples than $ 10^6$ observations should be used. In Figure 1.4 we used only the upper 0.15% of the records to estimate the true tail exponent. In general, the choice of the observations used in the regression is subjective and can yield large estimation errors, a fact which is often neglected in the literature.
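The log-log regression described above can be sketched in a few lines of code. The function name, the default tail fraction (mirroring the 0.15% used in Figure 1.4), and the use of a Pareto test sample are our illustrative choices, not part of the original text:

```python
import numpy as np

def loglog_tail_exponent(sample, tail_fraction=0.0015):
    """Estimate the tail index alpha from the slope of the empirical
    survival function 1 - F(x) on a double logarithmic plot, using
    only the upper `tail_fraction` of the observations."""
    x = np.sort(np.asarray(sample))
    n = len(x)
    k = max(int(n * tail_fraction), 10)          # number of upper order statistics
    tail_x = x[n - k:]
    # empirical survival probabilities at the k largest observations
    tail_p = 1.0 - (np.arange(n - k, n) + 0.5) / n
    slope, _ = np.polyfit(np.log(tail_x), np.log(tail_p), 1)
    return -slope                                # alpha = -slope
```

As the text stresses, the result depends strongly on `tail_fraction`: fitting over a wider range of a $1.9$-stable sample would recover the misleading $\hat\alpha\approx 3.7$ rather than the true exponent.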

Figure: Plots of the Hill statistics $ \hat\alpha_{n,k}$ vs. the maximum order statistic $ k$ for $ 1.8$-stable samples of size $ N = 10^4$ (top panel) and $ N = 10^6$ (left and right panels). Red horizontal lines represent the true value of $ \alpha $. For better exposition, the right panel is a magnification of the left panel for small $ k$. A close estimate is obtained only for $ k=500,...,1300$ (i.e. for $ k<0.13\%$ of sample size).
\includegraphics[width=.7\defpicwidth]{STFstab05a.ps}
\includegraphics[width=.7\defpicwidth]{STFstab05b.ps} \includegraphics[width=.7\defpicwidth]{STFstab05c.ps}

A well-known method for estimating the tail index that does not assume a parametric form for the entire distribution function, but focuses only on the tail behavior, was proposed by Hill (1975). The Hill estimator is used to estimate the tail index $ \alpha $, when the upper (or lower) tail of the distribution is of the form: $ 1-F(x) = C x^{-\alpha}$, see Figure 1.5. Like the log-log regression method, the Hill estimator tends to overestimate the tail exponent of the stable distribution if $ \alpha $ is close to two and the sample size is not very large. For a review of the extreme value theory and the Hill estimator see Härdle, Klinke, and Müller (2000, Chapter 13) or Embrechts, Klüppelberg, and Mikosch (1997).
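A minimal sketch of the Hill statistic $\hat\alpha_{n,k}$ based on the $k$ largest order statistics follows; the function name is ours:

```python
import numpy as np

def hill_estimator(sample, k):
    """Hill estimate of the tail index alpha: the reciprocal of the
    mean log-excess of the k largest order statistics over x_(n-k)."""
    x = np.sort(np.asarray(sample))
    top = np.log(x[-(k + 1):])          # log of x_(n-k) <= ... <= x_(n)
    return 1.0 / np.mean(top[1:] - top[0])
```

As Figure 1.5 illustrates, the estimate can depend heavily on the choice of $k$, so in practice it is plotted against a range of $k$ values.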

These examples clearly illustrate that the true tail behavior of $ \alpha $-stable laws is visible only for extremely large data sets. In practice, this means that in order to estimate $ \alpha $ we must use high-frequency data and restrict ourselves to the most ``outlying'' observations. Otherwise, inference of the tail index may be strongly misleading and rejection of the $ \alpha $-stable regime unfounded.

We now turn to the problem of parameter estimation. We start the discussion with the simplest, fastest and ... least accurate quantile methods, then develop the slower, yet much more accurate sample characteristic function methods and, finally, conclude with the slowest but most accurate maximum likelihood approach. Given a sample $ x_1,...,x_n$ of independent and identically distributed $ S_{\alpha}(\sigma,\beta,\mu)$ observations, in what follows, we provide estimates $ \hat\alpha$, $ \hat\sigma$, $ \hat\beta$, and $ \hat\mu$ of all four stable law parameters.


1.4.2 Quantile Estimation

Already in 1971 Fama and Roll provided very simple estimates for the parameters of symmetric ( $ \beta=0, \mu=0$) stable laws with $ \alpha >1$. McCulloch (1986) generalized and improved their method. He analyzed stable law quantiles and provided consistent estimators of all four stable parameters, under the restriction $ \alpha\ge 0.6$, while retaining the computational simplicity of Fama and Roll's method. Following McCulloch, define:

$\displaystyle v_\alpha=\frac{x_{0.95}-x_{0.05}}{x_{0.75}-x_{0.25}},$ (1.9)

which is independent of both $ \sigma$ and $ \mu$. In the above formula $ x_f$ denotes the $ f$-th population quantile, so that $ S_{\alpha}(\sigma,\beta,\mu)(x_f)=f$. Let $ \hat v_\alpha$ be the corresponding sample value. It is a consistent estimator of $ v_\alpha$. Now, define:

$\displaystyle v_\beta=\frac{x_{0.95}+x_{0.05}-2x_{0.50}}{x_{0.95}-x_{0.05}},$ (1.10)

and let $ \hat v_\beta$ be the corresponding sample value. $ v_\beta$ is also independent of both $ \sigma$ and $ \mu$. As a function of $ \alpha $ and $ \beta$ it is strictly increasing in $ \beta$ for each $ \alpha $. The statistic $ \hat v_\beta$ is a consistent estimator of $ v_\beta$.

Statistics $ v_\alpha$ and $ v_\beta$ are functions of $ \alpha $ and $ \beta$. This relationship may be inverted and the parameters $ \alpha $ and $ \beta$ may be viewed as functions of $ v_\alpha$ and $ v_\beta$:

$\displaystyle \alpha=\psi_1(v_\alpha,v_\beta), ~~~\beta=\psi_2(v_\alpha,v_\beta).$ (1.11)

Substituting $ v_\alpha$ and $ v_\beta$ by their sample values and applying linear interpolation between values found in tables provided by McCulloch (1986) yields estimators $ \hat\alpha$ and $ \hat\beta$.
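Computing the sample statistics $\hat v_\alpha$ and $\hat v_\beta$ from (1.9) and (1.10) is straightforward; the interpolation tables needed to invert (1.11) are given in McCulloch (1986) and are not reproduced in this sketch:

```python
import numpy as np

def mcculloch_statistics(sample):
    """Sample versions of v_alpha (1.9) and v_beta (1.10), computed
    from the empirical 5%, 25%, 50%, 75% and 95% quantiles."""
    q05, q25, q50, q75, q95 = np.quantile(sample,
                                          [0.05, 0.25, 0.50, 0.75, 0.95])
    v_alpha = (q95 - q05) / (q75 - q25)
    v_beta = (q95 + q05 - 2.0 * q50) / (q95 - q05)
    return v_alpha, v_beta
```

Estimates $\hat\alpha$ and $\hat\beta$ then follow by linear interpolation in McCulloch's tables, as described above.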

Scale and location parameters, $ \sigma$ and $ \mu$, can be estimated in a similar way. However, due to the discontinuity of the characteristic function for $ \alpha =1$ and $ \beta\ne 0$ in representation (1.2), this procedure is much more complicated. We refer the interested reader to the original work of McCulloch (1986).


1.4.3 Characteristic Function Approaches

Given a sample $ x_1,...,x_n$ of independent and identically distributed (i.i.d.) random variables, define the sample characteristic function by

$\displaystyle \hat\phi(t) = \frac{1}{n} \sum\limits_{j=1}^{n} e^{itx_j}.$ (1.12)

Since $ \vert\hat\phi(t)\vert$ is bounded by unity, all moments of $ \hat\phi(t)$ are finite and, for any fixed $ t$, it is the sample average of the i.i.d. random variables $ \exp(itx_j)$. Hence, by the law of large numbers, $ \hat\phi(t)$ is a consistent estimator of the characteristic function $ \phi(t)$.
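The sample characteristic function (1.12) reduces to one line of vectorized code; the function name is ours:

```python
import numpy as np

def ecf(t, sample):
    """Sample characteristic function (1.12) evaluated at the points t."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    # average exp(i * t_k * x_j) over the sample, for each t_k
    return np.exp(1j * np.outer(t, sample)).mean(axis=1)
```

This estimator is the common building block of all the characteristic function approaches discussed in this section.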

Press (1972) proposed a simple estimation method, called the method of moments, based on transformations of the characteristic function. The obtained estimators are consistent since they are based upon estimators of $ \phi(t)$, $ {\rm Im}\{\phi(t)\}$ and $ {\rm Re}\{\phi(t)\}$, which are known to be consistent. However, convergence to the population values depends on a choice of four points at which the above functions are evaluated. The optimal selection of these values is problematic and remains an open question. The obtained estimates are of poor quality and the method is not recommended for more than preliminary estimation.

Koutrouvelis (1980) presented a regression-type method which starts with an initial estimate of the parameters and proceeds iteratively until some prespecified convergence criterion is satisfied. Each iteration consists of two weighted regression runs. The number of points to be used in these regressions depends on the sample size and starting values of $ \alpha $. Typically no more than two or three iterations are needed. The speed of the convergence, however, depends on the initial estimates and the convergence criterion.

The regression method is based on the following observations concerning the characteristic function $ \phi(t)$. First, from (1.2) we can easily derive:

$\displaystyle \ln(-\ln\vert\phi(t)\vert^{2})=\ln(2\sigma^{\alpha})+\alpha\ln\vert t\vert.$ (1.13)

The real and imaginary parts of $ \phi(t)$ are, for $ \alpha\ne 1$, given by

$\displaystyle \Re\{\phi(t)\}=\exp(-\vert\sigma t\vert^\alpha) \cos\left[\mu t+\vert\sigma t\vert^\alpha \beta {\rm sign}(t)\tan\frac{\pi\alpha}{2}\right],$

and

$\displaystyle \Im\{\phi(t)\}=\exp(-\vert\sigma t\vert^\alpha) \sin\left[\mu t+\vert\sigma t\vert^\alpha \beta {\rm sign}(t)\tan\frac{\pi\alpha}{2}\right].$

The last two equations lead, apart from considerations of principal values, to

$\displaystyle \arctan\left(\frac{\Im\{\phi(t)\}}{\Re\{\phi(t)\}}\right) =\mu t+\beta\sigma^\alpha \tan\frac{\pi\alpha}{2} {\rm sign}(t) \vert t\vert^\alpha.$ (1.14)

Equation (1.13) depends only on $ \alpha $ and $ \sigma$ and suggests that we estimate these parameters by regressing $ y=\ln(-\ln\vert\hat\phi(t)\vert^{2})$ on $ w=\ln\vert t\vert$ in the model
$\displaystyle y_k=m+\alpha w_k + \epsilon_k,$   $\displaystyle k=1,2,...,K ,$ (1.15)

where $ t_k$ is an appropriate set of real numbers, $ m=\ln(2\sigma^\alpha)$, and $ \epsilon_k$ denotes an error term. Koutrouvelis (1980) proposed to use $ t_{k}=\frac{\pi k}{25},\; k=1,2,...,K$, with $ K$ ranging between 9 and 134 for different estimates of $ \alpha $ and sample sizes.
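A single pass of the regression (1.15), without the iterative refinement or the weighting, might look as follows; the choice $K=9$ and the function name are our illustrative assumptions:

```python
import numpy as np

def regression_alpha_sigma(sample, K=9):
    """One unweighted regression step of the Koutrouvelis method:
    regress y = ln(-ln|phi_hat(t)|^2) on w = ln|t| at t_k = pi*k/25."""
    t = np.pi * np.arange(1, K + 1) / 25.0
    phi = np.exp(1j * np.outer(t, sample)).mean(axis=1)   # sample cf (1.12)
    y = np.log(-np.log(np.abs(phi) ** 2))
    w = np.log(t)
    alpha, m = np.polyfit(w, y, 1)           # slope = alpha, intercept = m
    sigma = (np.exp(m) / 2.0) ** (1.0 / alpha)   # invert m = ln(2 sigma^alpha)
    return alpha, sigma
```

The full procedure additionally weights the regression and iterates, as described below, but already this single unweighted pass recovers $\alpha$ and $\sigma$ reasonably well.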

Once $ \hat\alpha$ and $ \hat\sigma$ have been obtained and $ \alpha $ and $ \sigma$ have been fixed at these values, estimates of $ \beta$ and $ \mu$ can be obtained using (1.14). Next, the regressions are repeated with $ \hat\alpha$, $ \hat\sigma$, $ \hat\beta$ and $ \hat\mu$ as the initial parameters. The iterations continue until a prespecified convergence criterion is satisfied.

Kogon and Williams (1998) eliminated this iteration procedure and simplified the regression method. For initial estimation they applied McCulloch's (1986) method, worked with the continuous representation (1.3) of the characteristic function instead of the classical one (1.2) and used a fixed set of only 10 equally spaced frequency points $ t_k$. In terms of computational speed their method compares favorably to the original method of Koutrouvelis (1980). It has significantly better performance near $ \alpha =1$ and $ \beta\ne 0$ due to the elimination of the discontinuity of the characteristic function. However, it returns slightly worse results for very small $ \alpha $.


1.4.4 Maximum Likelihood Method

The maximum likelihood (ML) estimation scheme for $ \alpha $-stable distributions does not differ from that for other laws, at least as far as the theory is concerned. For a vector of observations $ x=(x_1,...,x_n)$, the ML estimate of the parameter vector $ \theta=(\alpha, \sigma, \beta, \mu)$ is obtained by maximizing the log-likelihood function:

$\displaystyle L_\theta(x) = \sum_{i=1}^n \ln \tilde{f}(x_i; \theta),$ (1.16)

where $ \tilde{f}(\cdot; \theta)$ is the stable pdf. The tilde denotes the fact that, in general, we do not know the explicit form of the density and have to approximate it numerically. The ML methods proposed in the literature differ in the choice of the approximating algorithm. However, all of them have an appealing common feature: under certain regularity conditions the maximum likelihood estimator is asymptotically normal.
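As an illustration, the log-likelihood (1.16) can be evaluated with a numerically approximated stable density, for example the one provided by `scipy.stats.levy_stable`; this routine is not the FFT-based or direct integration implementation discussed below, and its availability and parameterization depend on the SciPy version, so treat this as a sketch only:

```python
import numpy as np
from scipy.stats import levy_stable

def stable_loglik(x, alpha, sigma, beta, mu):
    """Log-likelihood (1.16) using a numerically approximated stable pdf."""
    return np.sum(levy_stable.logpdf(x, alpha, beta, loc=mu, scale=sigma))
```

The ML estimate is then obtained by maximizing this function over $(\alpha, \sigma, \beta, \mu)$, e.g. with a gradient search routine applied to its negative; since every evaluation requires a numerical pdf approximation at each data point, the procedure is slow.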

Modern ML estimation techniques either utilize the FFT-based approach for approximating the stable pdf (Mittnik et al.; 1999) or use the direct integration method (Nolan; 2001). Both approaches are comparable in terms of efficiency. The differences in performance result from different approximation algorithms, see Section 1.2.2.

Simulation studies suggest that out of the five described techniques the method of moments yields the worst estimates, well outside any admissible error range (Stoyanov and Racheva-Iotova; 2004; Weron; 2004). McCulloch's method comes in next with acceptable results and computational time significantly lower than the regression approaches. On the other hand, both the Koutrouvelis and the Kogon-Williams implementations yield good estimators, with the latter performing considerably faster but slightly less accurately. Finally, the ML estimates are almost always the most accurate, in particular with respect to the skewness parameter. However, as we have already said, maximum likelihood estimation techniques are certainly the slowest of all the discussed methods. For example, ML estimation for a sample of a few thousand observations using a gradient search routine which utilizes the direct integration method is four orders of magnitude slower than the Kogon-Williams algorithm, i.e. a few minutes compared to a few hundredths of a second on a fast PC! Clearly, the higher accuracy does not justify the application of ML estimation in many real life problems, especially when calculations are to be performed on-line.