12.5 The Empirical Likelihood concept


12.5.1 Introduction to Empirical Likelihood

Let us now introduce the empirical likelihood (EL) concept as in Owen (1988) and Owen (1990). Suppose $ (U_1, \ldots ,U_n)$ is a sample of independent, identically distributed random variables in $ \mathbb{R}^1$ drawn from a probability law with unknown distribution function $ F$ and unknown density $ f$. For an observation $ (u_1, \ldots ,u_n)$ of $ (U_1, \ldots ,U_n)$ the likelihood function is given by

$\displaystyle {\bar L}(f) = \prod_{i=1}^{n} f(u_i)$ (12.8)

The empirical density calculated from the observations $ (u_1, \ldots ,u_n)$ is

$\displaystyle f_n(u) \stackrel{\mathrm{def}}{=}\frac{1}{n} \sum_{i=1}^n \boldsymbol{1}\{u_i = u\}$ (12.9)

where $ \boldsymbol{1}$ denotes the indicator function. It is easy to see that $ f_n$ maximizes $ {\bar L}(f)$ in the class of all probability density functions.
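To see this, note that any density placing probability weight $ p_i > 0$ on the observation $ u_i$ attains the likelihood $ \prod_{i=1}^n p_i$, and by the inequality between arithmetic and geometric means

$\displaystyle \prod_{i=1}^{n} p_i \leq \left( \frac{1}{n} \sum_{i=1}^{n} p_i \right)^{n} \leq n^{-n},$

with equality exactly when $ p_i = n^{-1}$ for all $ i$, i.e. for $ f_n$; probability mass placed outside the observations only decreases the product.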

The objective of the empirical likelihood concept is the construction of tests and confidence intervals for a parameter $ \theta = \theta(F)$ of the distribution of $ U_i$. To keep things simple we illustrate the empirical likelihood method for the expectation $ \textrm{E}[U_i]$. The null hypothesis is $ \textrm{E}[U_i] = \theta$. We can test this assumption based on the empirical likelihood ratio

$\displaystyle R(F) \stackrel{\mathrm{def}}{=}\frac{{\bar L} \{ f({\theta}) \} }{{\bar L}(f_n)}$ (12.10)

where $ f({\theta})$ maximizes $ {\bar L}(f)$ subject to

$\displaystyle \int U_i dF = \theta.$ (12.11)

On a heuristic level we can reject the null hypothesis ``under the true distribution $ F$, $ U$ has expectation $ \theta $'' if the ratio $ R(F)$ is small relative to $ 1$, i.e. the test rejects if $ R(F) < r$ for a certain level $ r \in (0,1)$. More precisely, Owen (1990) proves the following

THEOREM 12.1   Let $ (U_1, \ldots ,U_n)$ be iid one-dimensional random variables with expectation $ \theta $ and variance $ \sigma^2$. For a positive $ r<1$ let

$\displaystyle C_{r,n} = \left\{ \int U_i dF \; \Big\vert \; F \ll F_n, R(F) \geq r \right\} $

be the set of all possible expectations of $ U$ with respect to distributions $ F$ dominated by $ F_n$ ($ F \ll F_n$). Then it follows that
$\displaystyle \lim_{n \rightarrow \infty} \textrm{P}[ \theta \in C_{r,n} ]
= \textrm{P}[ \chi^2 \leq -2 \log r ]$     (12.12)

where $ \chi^2$ is a $ \chi^2$-distributed random variable with one degree of freedom.

From Theorem 12.1 it follows directly that

$\displaystyle \lim_{n \rightarrow \infty}
\textrm{P}\Big[
-2 \log \left\{\max_{\{F \vert F \ll F_n, \int U_i dF = \theta\}} R(F) \right\}
\leq r \; \Big\vert \; \textrm{E}U_i = \theta\Big]
= \textrm{P}[\chi^2 \leq r] .
$

This result therefore suggests using the log-EL ratio

$\displaystyle -2 \log \left\{\max_{\{F \vert F \ll F_n, \int U_i dF = \theta\}} R(F) \right\}
= -2 \log \left\{\max_{\{F \vert F \ll F_n, \int U_i dF = \theta\}} \frac{{\bar L} \{ f({\theta}) \} }{{\bar L}(f_n)} \right\}$    

as the basic element of a test about a parametric hypothesis for the drift function of a diffusion process.
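A minimal numerical sketch of this construction for the mean case (assuming NumPy and SciPy are available; the function name `log_el_ratio_mean` is illustrative, not from the text) computes $-2 \log \max R(F)$ by profiling out the single Lagrange multiplier of the constraint $ \int U_i dF = \theta$ and compares it with the $ \chi^2$ quantile suggested by (12.12):

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

def log_el_ratio_mean(u, theta):
    """Sketch: -2 log of the maximized empirical likelihood ratio
    for the hypothesis E[U_i] = theta (EL for the mean)."""
    u = np.asarray(u, dtype=float)
    n = len(u)
    z = u - theta
    if z.max() <= 0 or z.min() >= 0:
        # theta lies outside the convex hull of the data: the constraint
        # sum_i p_i (u_i - theta) = 0 cannot hold with positive weights
        return np.inf
    # The optimal weights are p_i = 1 / (n * (1 + lam * z_i)), where lam
    # solves sum_i z_i / (1 + lam * z_i) = 0; positivity of the weights
    # restricts lam to the open interval bracketed below.
    eps = 1e-10
    lam = brentq(lambda l: np.sum(z / (1.0 + l * z)),
                 -1.0 / z.max() + eps, -1.0 / z.min() - eps)
    # -2 log R = 2 * sum_i log(1 + lam * z_i)
    return 2.0 * np.sum(np.log1p(lam * z))

# usage sketch: reject E[U_i] = theta at the 5% level if the statistic
# exceeds the chi-square(1) quantile, in line with (12.12)
rng = np.random.default_rng(0)
u = rng.normal(loc=0.3, scale=1.0, size=200)
stat = log_el_ratio_mean(u, theta=0.0)
print(stat, stat > chi2.ppf(0.95, df=1))
```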


12.5.2 Empirical Likelihood for Time Series Data

We will now extend the results of Section 12.5.1 to the case of time series data. For an arbitrary $ x \in [0,1]$ and any function $ \mu$ with $ \mu(x) = \textrm{E}[Y_i\vert X_i=x]$ we have

$\displaystyle \textrm{E}\left[ K\left({x-X_i\over h}\right) \{Y_i-\mu(x)\} \; \Big\vert \; \textrm{E}[Y_i\vert X_i=x] = \mu(x) \right] = 0 .$ (12.13)

Let $ p_i(x)$ be nonnegative numbers representing a density for

$\displaystyle K\left({x-X_i\over h}\right) \{Y_i-\mu(x)\} \qquad i=1, \ldots, n$

The empirical likelihood for $ \mu(x)$ is

$\displaystyle L\{\mu(x)\} \stackrel{\mathrm{def}}{=}\max \prod^{n}_{i=1}p_i(x)$ (12.14)

subject to $ \sum^{n}_{i=1} p_i(x) = 1$ and $ \sum^{n}_{i=1} p_i(x) K\left({x-X_i\over h}\right) \{
Y_i-\mu(x)\} = 0$. The second condition reflects (12.13).

We find the maximum by introducing Lagrange multipliers and maximizing the Lagrangian function

$\displaystyle {\cal L} (p,\lambda_1, \lambda_2)$ $\displaystyle =$ $\displaystyle \sum_{i=1}^n \log p_i(x)$  
    $\displaystyle - \lambda_1 \sum_{i=1}^n p_i(x)
K\left({x-X_i\over h}\right) \{ Y_i-\mu(x)\}
- \lambda_2 \left\{ \sum_{i=1}^n p_i(x) -1 \right\}$  

The partial derivatives are

$\displaystyle \frac{\partial {\cal L} (p,\lambda_1, \lambda_2)}{\partial p_i(x)}
= \frac{1}{p_i(x)} - \lambda_1 K\left({x-X_i\over h}\right) \{ Y_i-\mu(x)\}
- \lambda_2 \qquad \forall i = 1, \ldots, n \; .$

Setting these derivatives to zero and writing $ \lambda = \lambda_1/\lambda_2$, we obtain as the solution to (12.14) the optimal weights

$\displaystyle p_i(x) = n^{-1} \left[ 1 + \lambda(x) K\left({x-X_i\over h}\right) \{ Y_i-\mu(x)\} \right]^{-1}$ (12.15)

where $ \lambda(x)$ is the root of

$\displaystyle \sum^{n}_{i=1} { K\left({x-X_i\over h}\right) \{ Y_i-\mu(x)\} \over 1 + \lambda(x) K\left({x-X_i\over h}\right) \{ Y_i-\mu(x)\} } = 0.$ (12.16)

Note that $ \lambda_2 = n$: multiplying the first order condition for $ p_i(x)$ by $ p_i(x)$ and summing over $ i$ gives $ n = \lambda_2 \bigl\{ \sum_{i=1}^n p_i(x) + \lambda \sum_{i=1}^n p_i(x) K\left({x-X_i\over h}\right) \{ Y_i-\mu(x)\} \bigr\}$, and by the two constraints the term in braces satisfies

$\displaystyle \sum_{i=1}^n p_i(x) + \lambda \sum_{i=1}^n p_i(x)
K\left({x-X_i\over h}\right) \{ Y_i-\mu(x)\} = 1 \; .$

The maximum empirical likelihood $ L\{\hat{m}(x)\} = n^{-n}$ is achieved at $ p_i(x) = n^{-1}$, corresponding to the nonparametric curve estimate $ \mu(x) = \hat{m}(x)$. For a parameter estimate $ {\hat \theta}$ we obtain the maximum empirical likelihood $ L\{\tilde{m}_{\hat{\theta}}(x)\}$ of the smoothed parametric model. The log-EL ratio is

$\displaystyle \ell\{\tilde{m}_{\hat{\theta}}(x)\} \stackrel{\mathrm{def}}{=}
-2 \log \frac{ L\{\tilde{m}_{\hat{\theta}}(x)\}}
{ L\{ {\hat m}(x) \}}
=
-2 \log[L \{\tilde{m}_{\hat{\theta}}(x)\} n^{n}].$
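As a computational aside, the quantities $ \lambda(x)$, $ p_i(x)$ and $ \ell\{\tilde{m}_{\hat{\theta}}(x)\}$ can be obtained as in the following minimal sketch (assuming NumPy and SciPy; the Gaussian kernel is chosen only for illustration, and names such as `local_log_el_ratio` are illustrative):

```python
import numpy as np
from scipy.optimize import brentq

def gauss_kernel(u):
    """Gaussian kernel; any other kernel K could be plugged in here."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def local_log_el_ratio(x, X, Y, h, m_tilde):
    """Sketch of the local log-EL ratio ell{m_tilde(x)}:
    solve (12.16) for lambda(x), form the weights (12.15) and
    return -2 log[ L{m_tilde(x)} n^n ] = 2 sum_i log(1 + lambda(x) g_i)."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    n = len(X)
    g = gauss_kernel((x - X) / h) * (Y - m_tilde)   # K((x-X_i)/h){Y_i - m_tilde(x)}
    if g.max() <= 0 or g.min() >= 0:
        return np.inf   # constraint sum_i p_i g_i = 0 not attainable with p_i > 0
    # positivity of the weights restricts lambda(x) to this open interval
    eps = 1e-10
    lam = brentq(lambda l: np.sum(g / (1.0 + l * g)),
                 -1.0 / g.max() + eps, -1.0 / g.min() - eps)   # root of (12.16)
    p = 1.0 / (n * (1.0 + lam * g))                 # optimal weights (12.15)
    assert np.isclose(p.sum(), 1.0)
    return 2.0 * np.sum(np.log1p(lam * g))
```

For $ \tilde{m}_{\hat{\theta}}(x)$ equal to the Nadaraya-Watson estimate $ \hat{m}(x)$ the root of (12.16) is $ \lambda(x) = 0$, the weights reduce to $ p_i(x) = n^{-1}$ and the ratio vanishes, in line with the discussion above.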

To study properties of the empirical likelihood based test statistic we need to evaluate $ \ell\{\tilde{m}_{\hat{\theta}}(x)\}$ at an arbitrary $ x$ first, which requires the following lemma on $ \lambda(x)$ that is proved in Chen et al. (2001).

LEMMA 12.1   Under the assumptions (i)-(vi),

$\displaystyle \sup_{x \in [0,1]} \vert\lambda(x)\vert ={\scriptstyle \mathcal{O}}_p\{ (nh)^{-1/2} \log(n)\}.$

Let $ \gamma(x)$, $ x \in [0,1]$, be a random process. Throughout this chapter we use the notation $ \gamma(x)=\tilde{{\mathcal{O}}}_p(\delta_n)$ (respectively $ \tilde{{\scriptstyle \mathcal{O}}}_p(\delta_n)$) to denote the fact that $ \sup_{x \in [0,1]} \vert\gamma(x)\vert = {\mathcal{O}}_p(\delta_n)$ (respectively $ {\scriptstyle \mathcal{O}}_p(\delta_n)$) for a sequence $ \delta_n$.

Let $ \bar{U}_j(x) = (nh)^{-1}\sum_{i=1}^n \biggl[ K\left({x-X_i\over h}\right) \lbrace Y_i - \tilde{m}_{\hat{\theta}}(x) \rbrace
\biggr]^j$ for $ j=1,2,\ldots $. Applying the power series expansion of $ 1/(1-\bullet)$ to (12.16) and using Lemma 12.1 yields

$\displaystyle \sum^{n}_{i=1} K\left({x-X_i\over h}\right) \{
Y_i-\tilde{m}_{\hat{\theta}}(x)\} \biggl[ \sum_{j=0}^{\infty} \{-\lambda(x)\}^j K^j \left({x-X_i\over h}\right) \{
Y_i-\tilde{m}_{\hat{\theta}}(x)\}^j \biggr ]=0.$

Inverting the above expansion, we have

$\displaystyle \lambda(x) = \bar{U}_2^{-1}(x) \bar{U}_1(x) + \tilde{{\scriptstyle \mathcal{O}}}_p\{ (nh)^{-1} \log^2(n)\}.$ (12.17)
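The inversion uses only the terms $ j=0$ and $ j=1$ of the expansion: dividing by $ nh$ gives

$\displaystyle \bar{U}_1(x) - \lambda(x) \bar{U}_2(x) \approx 0 ,$

so that $ \lambda(x) \approx \bar{U}_2^{-1}(x) \bar{U}_1(x)$, while by Lemma 12.1 the neglected terms with $ j \geq 2$ are of the order $ \tilde{{\scriptstyle \mathcal{O}}}_p\{ (nh)^{-1} \log^2(n)\}$.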

From (12.15), Lemma 12.1 and the Taylor expansion of $ \log(1+\bullet)$ we get
$\displaystyle \ell\{\tilde{m}_{\hat{\theta}}(x)\}$ $\displaystyle =$ $\displaystyle -2 \log[L \{\tilde{m}_{\hat{\theta}}(x)\} n^{n}]$  
  $\displaystyle =$ $\displaystyle 2\sum_{i=1}^n \log [ 1 + \lambda(x) K\left({x-X_i\over h}\right) \{ Y_i-\tilde{m}_{\hat{\theta}}(x)\}]$  
  $\displaystyle =$ $\displaystyle 2 nh \lambda(x) \bar{U}_1 - nh \lambda^2(x) \bar{U}_2 +\tilde{{\scriptstyle \mathcal{O}}}_p\{(nh)^{-1/2} \log^3(n)\}$  

Inserting (12.17) into this expansion yields

$\displaystyle \ell\{\tilde{m}_{\hat{\theta}}(x)\} = n h \bar{U}_2^{-1}(x) \bar{U}_1^2(x) + \tilde{{\scriptstyle \mathcal{O}}}_p \{ (nh)^{-1/2}\log^3(n)\}.$ (12.18)
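The leading term of (12.18) is directly computable from the data. The following self-contained sketch (reusing the illustrative Gaussian-kernel setup from the sketch above; not from the text) evaluates $ nh\, \bar{U}_2^{-1}(x) \bar{U}_1^2(x)$, which can be compared with the exact statistic returned by `local_log_el_ratio`:

```python
import numpy as np

def gauss_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def log_el_ratio_approx(x, X, Y, h, m_tilde):
    """Leading term nh * U2^{-1}(x) * U1^2(x) of (12.18)."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    n = len(X)
    g = gauss_kernel((x - X) / h) * (Y - m_tilde)
    U1 = g.sum() / (n * h)          # \bar U_1(x)
    U2 = (g**2).sum() / (n * h)     # \bar U_2(x)
    return n * h * U1**2 / U2
```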

For any $ x \in [0,1]$, let

$\displaystyle v(x;h) = h \int_{0}^1 K_h^2 (x-y) dy \textrm{ and }
b(x;h)=h \int_0^1 K_h (x-y) dy $

be the variance and the bias coefficient functions associated with the NW estimator, respectively; see Wand and Jones (1995). Let

$\displaystyle S_{I,h} = \{ x \in [0,1] \vert \min
\left(\vert x - 1\vert, \vert x\vert\right) >h \}.$

For $ h \rightarrow 0$, $ S_{I,h}$ converges to the set of interior points of $ [0,1]$. If $ x \in S_{I,h}$, we have $ v(x;h) = \int K^2(u) du$ and $ b(x;h)=1$. Define

$\displaystyle V(x;h) = \frac{v(x;h) \sigma^2(x)}{f(x) b^2(x;h)}.$

Clearly, $ V(x;h)/(nh)$ is the asymptotic variance of $ \hat{m}(x)$ as $ nh \to \infty$, which is one of the conditions we assumed.
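For interior points $ x \in S_{I,h}$ this takes the familiar form

$\displaystyle V(x;h) = \frac{\sigma^2(x) \int K^2(u) du}{f(x)} ,$

the usual asymptotic variance factor of the NW estimator.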

It was shown by Chen et al. (2001) that

$\displaystyle \bar{U}_1(x)$ $\displaystyle =$ $\displaystyle n^{-1} \sum_{i=1}^n K_h(x-X_i) \{ Y_i - \tilde{m}_{\hat{\theta}}(x)\}$  
  $\displaystyle =$ $\displaystyle n^{-1} \sum_{i=1}^n K_h(x-X_i) \{ Y_i - m_{\theta}(X_i)\} + \tilde{{\mathcal{O}}}_p(n^{-1/2})$  
  $\displaystyle =$ $\displaystyle \hat{f}(x) \{\hat{m}(x) - \tilde{m}_{{\theta}}(x)\} + \tilde{{\mathcal{O}}}_p(n^{-1/2})$  
  $\displaystyle =$ $\displaystyle f(x) b(x;h) \{ \hat{m}(x) - \tilde{m}_{{\theta}}(x)\} + \tilde{{\mathcal{O}}}_p\{n^{-1/2} + (nh)^{-1} \log^2(n)\}.$  

In the same paper it is shown that condition (iii) entails $ \sup_{x \in [0,1]} \vert\bar{U}_2(x) - f(x) v(x;h) \sigma^2(x)\vert ={\mathcal{O}}_p(h)$. These results and (12.18) mean that
$\displaystyle \ell\{\tilde{m}_{\hat{\theta}}(x)\}$ $\displaystyle =$ $\displaystyle (nh) \bar{U}_2^{-1} \bar{U}_1^2 + \tilde{{\scriptstyle \mathcal{O}}}_p \{ (nh)^{-1/2}\log^3(n)\}$  
  $\displaystyle =$ $\displaystyle (nh) V^{-1}(x;h) \{ \hat{m}(x) - \tilde{m}_{{\theta}}(x)\}^2
+ \tilde{{\mathcal{O}}}_p\{(nh)^{-1} h \log^2(n)\}$ (12.19)

Therefore, $ \ell\{\tilde{m}_{\hat{\theta}}(x)\}$ is asymptotically equivalent to a studentized $ L_2$ distance between $ \tilde{m}_{\hat{\theta}}(x)$ and $ \hat{m}(x)$. It is this property that leads us to use $ \ell\{\tilde{m}_{\hat{\theta}}(x)\}$ as the basic building block in the construction of a global test statistic for distinguishing between $ \tilde{m}_{\hat{\theta}}$ and $ \hat{m}$ in the next section. The use of the empirical likelihood as a distance measure and its comparison with other distance measures have been discussed in Owen (1991) and Baggerly (1998).