3.5 Confidence Intervals and Confidence Bands

In order to derive confidence intervals or confidence bands for $ \widehat{f}_{h}(x)$ we need to know its sampling distribution. The distribution for finite sample sizes is not known, but the following result concerning the asymptotic distribution of $ \widehat{f}_{h}(x)$ can be derived. Suppose that $ f''$ exists and $ h=cn^{-1/5}$. Then

$\displaystyle n^{2/5} \left\{\widehat{f}_{h}(x)-f(x)\right\} \stackrel{L}{\longrightarrow} N\left( \underbrace{\frac{c^{2}}{2}f''(x)\mu_{2}(K) }_{b_x}, \underbrace{\frac{1}{c}f(x)\Vert K \Vert^{2}_{2} }_{v^2_x} \right),$ (3.49)

where $ N(b_x,v^2_x)$ denotes the normal distribution with mean $ b_x$ and variance $ v^2_x$. Denoting the $ (1-\frac{\alpha}{2})$ quantile of the standard normal distribution by $ z_{1-\frac{\alpha}{2}}$ and probabilities by $ P$, we obtain
$\displaystyle \begin{aligned}
1-\alpha &\approx P\left(b_x-z_{1-\frac{\alpha}{2}}v_x\leq n^{2/5}\{\widehat{f}_{h}(x)-f(x)\}\leq b_x+z_{1-\frac{\alpha}{2}}v_x\right)\\
&= P\Big(\widehat{f}_{h}(x)-n^{-2/5}\{b_x+z_{1-\frac{\alpha}{2}}v_x\}\leq f(x)\leq \widehat{f}_{h}(x)-n^{-2/5}\{b_x-z_{1-\frac{\alpha}{2}}v_x\}\Big).
\end{aligned}$
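The terms $ b_x$ and $ v^2_x$ involve the kernel only through the constants $ \mu_{2}(K)$ and $ \Vert K \Vert^{2}_{2}$. As a minimal sketch, assuming the quartic kernel $ K(u)=\frac{15}{16}(1-u^2)^2$ on $ [-1,1]$ (the kernel used in Example 3.2), the following Python snippet approximates both constants with a midpoint rule; the helper names `quartic` and `integrate` are illustrative only.

```python
# Kernel constants mu_2(K) and ||K||_2^2 for the quartic kernel, computed numerically.
def quartic(u: float) -> float:
    """Quartic kernel K(u) = 15/16 (1 - u^2)^2 on [-1, 1], zero outside."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def integrate(g, a: float, b: float, m: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of g over [a, b] with m cells."""
    step = (b - a) / m
    return step * sum(g(a + (i + 0.5) * step) for i in range(m))


mu2 = integrate(lambda u: u * u * quartic(u), -1.0, 1.0)   # second moment mu_2(K)
norm2 = integrate(lambda u: quartic(u) ** 2, -1.0, 1.0)    # squared L2 norm ||K||_2^2
print(f"mu2(K) = {mu2:.6f} (exact 1/7), ||K||_2^2 = {norm2:.6f} (exact 5/7)")
```

The exact values $ \mu_{2}(K)=1/7$ and $ \Vert K \Vert^{2}_{2}=5/7$ follow from integrating the polynomials directly; the numerical version makes it easy to swap in another kernel.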

Employing the relation $ h=cn^{-1/5}$ we get an asymptotic confidence interval for $ f(x)$:
$\displaystyle \Bigg[\;\widehat{f}_{h}(x)-\frac{h^{2}}{2}f''(x)\mu_{2}(K)-z_{1-\frac{\alpha}{2}}\sqrt{\frac{f(x)\Vert K \Vert _{2}^{2}}{nh}}\, ,\quad \widehat{f}_{h}(x)-\frac{h^{2}}{2}f''(x)\mu_{2}(K)+z_{1-\frac{\alpha}{2}}\sqrt{\frac{f(x)\Vert K \Vert _{2}^{2}}{nh}}\;\Bigg].$ (3.50)
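To see that (3.50) is simply (3.49) rewritten via $ h=cn^{-1/5}$, the sketch below evaluates the interval both ways for a case where $ f$ and $ f''$ are known: the standard normal density at $ x=0$, for which $ f''(x)=(x^2-1)\varphi(x)$. The value 0.40 for $ \widehat{f}_{h}(0)$, the choices $ n=500$ and $ c=1$, and the function names are hypothetical; the quartic-kernel constants $ \mu_{2}(K)=1/7$ and $ \Vert K \Vert_{2}^{2}=5/7$ are assumed.

```python
import math
from statistics import NormalDist

MU2_QUARTIC = 1.0 / 7.0    # mu_2(K) for the quartic kernel (assumed kernel)
NORM2_QUARTIC = 5.0 / 7.0  # ||K||_2^2 for the quartic kernel


def oracle_interval(f_hat, f_x, f2_x, n, c, alpha=0.05):
    """Interval (3.50) with known f(x) and f''(x), using h = c * n**(-1/5)."""
    h = c * n ** (-0.2)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    bias = 0.5 * h * h * f2_x * MU2_QUARTIC
    half = z * math.sqrt(f_x * NORM2_QUARTIC / (n * h))
    return f_hat - bias - half, f_hat - bias + half


def interval_via_bx_vx(f_hat, f_x, f2_x, n, c, alpha=0.05):
    """Same interval written with b_x, v_x from (3.49) and the n^(-2/5) scaling."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    b_x = 0.5 * c * c * f2_x * MU2_QUARTIC
    v_x = math.sqrt(f_x * NORM2_QUARTIC / c)
    scale = n ** (-0.4)
    return f_hat - scale * (b_x + z * v_x), f_hat - scale * (b_x - z * v_x)


# Standard normal at x = 0: f(0) = phi(0) and f''(0) = (0^2 - 1) * phi(0) = -phi(0).
phi0 = NormalDist().pdf(0.0)
print(oracle_interval(0.40, phi0, -phi0, n=500, c=1.0))
print(interval_via_bx_vx(0.40, phi0, -phi0, n=500, c=1.0))
```

Both calls return the same endpoints up to rounding, because $ n^{-2/5}b_x=\frac{h^{2}}{2}f''(x)\mu_{2}(K)$ and $ n^{-2/5}v_x=\sqrt{f(x)\Vert K \Vert_{2}^{2}/(nh)}$.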

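Unfortunately, the interval boundaries still depend on $ f(x)$ and $ f''(x)$. The bias term $ \frac{h^{2}}{2}f''(x)\mu_{2}(K)$ is of order $ h^{2}$, whereas the stochastic term is of order $ (nh)^{-1/2}$. Hence, if $ h$ is small relative to $ n^{-1/5}$ (undersmoothing), the bias term in each boundary becomes negligible. Replacing $ f(x)$ by $ \widehat{f}_{h}(x)$ then gives an approximate confidence interval that is applicable in practice: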

$\displaystyle \left[\;\widehat{f}_{h}(x)-z_{1-\frac{\alpha}{2}}\sqrt{\frac{\widehat{f}_{h}(x)\Vert K \Vert _{2}^{2}}{nh}}\, ,\quad \widehat{f}_{h}(x)+z_{1-\frac{\alpha}{2}} \sqrt{\frac{\widehat{f}_{h}(x)\Vert K \Vert _{2}^{2}}{nh}}\;\right].$ (3.51)
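A minimal sketch of the practical interval (3.51), assuming the quartic kernel (so $ \Vert K \Vert_{2}^{2}=5/7$). The simulated normal sample, the bandwidth $ h=0.2$ and the function names are arbitrary illustrative choices, not taken from the text; `statistics.NormalDist().inv_cdf` supplies $ z_{1-\frac{\alpha}{2}}$.

```python
import math
import random
from statistics import NormalDist


def quartic(u):
    """Quartic kernel K(u) = 15/16 (1 - u^2)^2 on [-1, 1], zero outside."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def kde(x, data, h):
    """Kernel density estimate f_h(x) = (nh)^(-1) * sum_i K((x - X_i) / h)."""
    return sum(quartic((x - xi) / h) for xi in data) / (len(data) * h)


def pointwise_ci(x, data, h, alpha=0.05):
    """Approximate (1 - alpha) interval (3.51): bias dropped, f(x) replaced by f_h(x)."""
    n = len(data)
    f_hat = kde(x, data, h)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half = z * math.sqrt(f_hat * (5.0 / 7.0) / (n * h))  # ||K||_2^2 = 5/7 for the quartic kernel
    return f_hat - half, f_hat, f_hat + half


rng = random.Random(1)                         # fixed seed for reproducibility
sample = [rng.gauss(0.0, 1.0) for _ in range(400)]
lo, mid, hi = pointwise_ci(0.0, sample, h=0.2)
print(f"f_h(0) = {mid:.4f}, 95% interval [{lo:.4f}, {hi:.4f}]")
```

Because the bias term is dropped, the interval is only trustworthy when $ h$ is small relative to $ n^{-1/5}$; where $ \widehat{f}_{h}(x)$ is close to zero the lower limit can also become negative.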

Note that this is a confidence interval for $ f(x)$ at a single point $ x$ and not for the entire density $ f$. Confidence bands for $ f$ have only been derived under rather restrictive assumptions. Suppose that $ f$ is a density on $ [0,1]$ that satisfies certain regularity conditions, and let $ h=n^{-\delta}$ with $ \delta \in (\frac{1}{5},\frac{1}{2})$. Then Bickel & Rosenblatt (1973) derived the following result, in which the inequalities hold simultaneously for all $ x \in [0,1]$:

$\displaystyle \begin{aligned}
\lim_{n \to \infty} P\Bigg(&\widehat{f}_{h}(x)-\left\{\frac{\widehat{f}_{h}(x)\Vert K \Vert _{2}^{2}}{nh}\right\}^{1/2}\left\{\frac{z}{(2\delta\log{n})^{1/2}}+d_{n}\right\}\leq f(x)\\
&\leq \widehat{f}_{h}(x)+\left\{\frac{\widehat{f}_{h}(x)\Vert K \Vert _{2}^{2}}{nh}\right\}^{1/2}\left\{\frac{z}{(2\delta\log{n})^{1/2}}+d_{n}\right\}\ \text{for all } x\in[0,1]\Bigg)\\
&\qquad = \exp\{-2\exp(-z)\},
\end{aligned}$ (3.52)

with

$\displaystyle d_{n}=(2\delta\log{n})^{1/2}+(2\delta\log{n})^{-1/2}
\log{\left(\frac{1}{2\pi}\frac{\Vert K'\Vert _{2}}{\Vert K
\Vert _{2}}\right)}.$
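The band (3.52) can be evaluated on a grid of estimated density values. The sketch below is hypothetical helper code, assuming that the grid values $ \widehat{f}_{h}(x)$ have already been computed on $ [0,1]$ with $ h=n^{-\delta}$; it uses the half-width $ \{\widehat{f}_{h}(x)\Vert K \Vert_{2}^{2}/(nh)\}^{1/2}\{z/(2\delta\log n)^{1/2}+d_n\}$. For the quartic kernel, $ \Vert K \Vert_{2}^{2}=5/7$ and $ \Vert K' \Vert_{2}/\Vert K \Vert_{2}=\sqrt{3}$; the grid values, $ n$ and $ \delta$ below are made up.

```python
import math


def band_multiplier(n, delta, z, kernel_ratio):
    """Factor z / (2 delta log n)^(1/2) + d_n from (3.52).

    kernel_ratio is ||K'||_2 / ||K||_2 (sqrt(3) for the quartic kernel).
    """
    s = math.sqrt(2.0 * delta * math.log(n))
    d_n = s + math.log(kernel_ratio / (2.0 * math.pi)) / s
    return z / s + d_n


def confidence_band(f_hat_values, n, delta, z, kernel_norm2, kernel_ratio):
    """Lower and upper band limits at each grid value of f_h(x), with h = n**(-delta)."""
    h = n ** (-delta)
    m = band_multiplier(n, delta, z, kernel_ratio)
    half = [math.sqrt(f * kernel_norm2 / (n * h)) * m for f in f_hat_values]
    lower = [f - w for f, w in zip(f_hat_values, half)]
    upper = [f + w for f, w in zip(f_hat_values, half)]
    return lower, upper


# Hypothetical grid values of f_h(x) on [0, 1]; n, delta and z are illustrative.
lower, upper = confidence_band([0.8, 1.2, 1.0], n=500, delta=0.3, z=3.663,
                               kernel_norm2=5.0 / 7.0, kernel_ratio=math.sqrt(3.0))
print(lower, upper)
```

For these settings the multiplier $ z/(2\delta\log n)^{1/2}+d_n$ exceeds $ z_{0.975}\approx 1.96$, which is why a band is wider than a pointwise interval at the same level.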

A confidence band for a given significance level $ \alpha$ can be found by searching for the value of $ z$ that satisfies

$\displaystyle \exp\{-2\exp(-z)\}=1-\alpha.$

For instance, if we take $ \alpha=0.05$ then $ z\approx 3.663$.
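This equation has a closed-form solution, $ z=-\log\{-\frac{1}{2}\log(1-\alpha)\}$. A short Python check (the function name is illustrative) reproduces the value quoted for $ \alpha=0.05$:

```python
import math


def gumbel_z(alpha):
    """Solve exp{-2 exp(-z)} = 1 - alpha for z in closed form."""
    return -math.log(-math.log(1.0 - alpha) / 2.0)


print(round(gumbel_z(0.05), 3))  # 3.663
```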

When using nonparametric density estimators in practice, another important question arises: Can we find a parametric estimate that describes the data in a sufficiently satisfactory manner? That is, given a specific sample, could we justify the use of a parametric density function? Note that using parametric estimates is computationally far less intensive. Moreover, many important properties, e.g. the moments and derivatives, of parametric density functions are usually well known so they can easily be manipulated for analytical purposes. To verify whether a parametric density function describes the data accurately enough in a statistical sense, we can make use of confidence bands.

Figure 3.8: Parametric (lognormal, thin line) versus nonparametric density estimate for average hourly earnings (quartic kernel, $ h=5$, thick solid line). Quantlet: SPMcps85dist
\includegraphics[width=1.2\defpicwidth]{SPMcps85distA.ps}

EXAMPLE 3.2  
As an example, consider Figure 3.8. Here we use a sample of 534 randomly selected U.S. workers taken from the May 1985 Current Population Survey (CPS), see Berndt (1991). Each worker's average hourly earnings is marked by a $ +$ on the abscissa. The nonparametric density estimate of these values, computed with the quartic kernel and bandwidth $ h=5$, is shown by the solid line.

As with many size distributions, the nonparametric density estimate closely resembles a lognormal density. We therefore estimated the parameters of the lognormal distribution by

$\displaystyle \widehat{\mu}=\frac{1}{n}\sum_{i=1}^{n}\log(Y_i), \quad
\widehat{\sigma}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}(\log(Y_i)
-\widehat{\mu})^2}$

to fit a parametric estimate to the data. The resulting parametric density is represented by the thin line in Figure 3.8. $ \Box$
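A minimal sketch of the lognormal fit above: $ \widehat{\mu}$ is the mean of the $ \log(Y_i)$ and $ \widehat{\sigma}$ their root mean squared deviation with divisor $ n$, exactly as in the formulas, and the fitted curve is the lognormal density. The CPS data are not reproduced here; the three-point sample is a toy input chosen so that $ \widehat{\mu}=1$ and $ \widehat{\sigma}=\sqrt{2/3}$, and the function names are illustrative.

```python
import math


def fit_lognormal(y):
    """Estimates (mu_hat, sigma_hat) of the lognormal parameters from positive data y."""
    logs = [math.log(v) for v in y]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((l - mu) ** 2 for l in logs) / n)  # divisor n, as in the text
    return mu, sigma


def lognormal_pdf(y, mu, sigma):
    """Lognormal density at y; zero for y <= 0."""
    if y <= 0:
        return 0.0
    z = (math.log(y) - mu) / sigma
    return math.exp(-0.5 * z * z) / (y * sigma * math.sqrt(2.0 * math.pi))


mu_hat, sigma_hat = fit_lognormal([1.0, math.e, math.e ** 2])
print(mu_hat, sigma_hat)  # approximately 1.0 and 0.8165
```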

Note that the nonparametrically estimated density is considerably flatter than its parametric counterpart. Now, let us see whether the sample provides some justification for using the lognormal density. To this end, we computed the 95% confidence bands around our nonparametric estimate, as shown in Figure 3.9.

Figure 3.9: Confidence bands versus parametric (lognormal, thin line) and kernel (thick solid line) density estimates for average hourly earnings. Quantlet: SPMcps85dist
\includegraphics[width=1.2\defpicwidth]{SPMcps85distB.ps}

We find that in the neighborhood of the mode the parametric density exceeds the upper limit of the confidence band. Hence, in a strict sense, the data reject the lognormal distribution as the ``true'' distribution of average hourly earnings (at the 5% level). Yet the picture clearly shows that the lognormal density captures the shape of the nonparametrically estimated density quite well. Thus, if we are only interested in qualitative features of the underlying distribution, such as skewness or unimodality, the lognormal distribution seems to work sufficiently well.
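The visual check described above amounts to asking whether the fitted parametric density leaves the band anywhere. A hypothetical helper, assuming all four sequences are evaluated on the same grid of earnings values, could look like this; the toy numbers are made up for illustration.

```python
def points_outside_band(grid, parametric, lower, upper):
    """Grid points where the parametric density lies outside [lower, upper]."""
    return [x for x, p, lo, hi in zip(grid, parametric, lower, upper)
            if p < lo or p > hi]


# Toy values (made up): the parametric curve exceeds the upper limit at x = 2.0.
grid = [1.0, 2.0, 3.0]
parametric = [0.10, 0.50, 0.20]
lower = [0.00, 0.20, 0.10]
upper = [0.20, 0.40, 0.30]
outside = points_outside_band(grid, parametric, lower, upper)
print(outside)  # [2.0]
reject = bool(outside)  # reject the parametric model if the band is left anywhere
```

A finite grid only approximates the requirement that the band hold for all $ x$, so the grid should be fine enough to resolve features such as the mode.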

Let us remark that checking whether the parametric density estimate stays within the confidence bands is a very conservative test of its correct specification. In contrast to fully parametric approaches, it is often possible to find a nonparametric test that attains a better rate of convergence than the nonparametric density estimate itself. Moreover, there exist more powerful model checks than looking at confidence bands. However, since nonparametric testing is an extensive topic in its own right, we will not discuss nonparametric tests in detail in this book. Instead, we restrict ourselves to presenting the main ideas and refer to the bibliographic notes for further details.

Figure 3.10: Confidence bands (solid lines) versus confidence intervals (dashed lines) for average hourly earnings. Quantlet: SPMcps85dist
\includegraphics[width=1.2\defpicwidth]{SPMcps85distC.ps}

In the problem considered in the preceding paragraphs, we were concerned with how well the lognormal distribution works as an estimate of the true density over the entire range of possible values of average hourly earnings. This global check of the adequacy of the lognormal was provided by comparing it with confidence bands around the nonparametrically estimated density. Suppose instead that we are merely interested in how well the lognormal fits at a single given value of average hourly earnings. Then the proper comparison confronts the lognormal density at this point with a confidence interval around the nonparametric estimate. Since a confidence interval, by construction, refers to a single point only, it is narrower (at this point) than a confidence band, which must hold simultaneously at all points. Figure 3.10 shows both the asymptotic confidence bands (solid lines) and the asymptotic confidence intervals (broken lines) of our kernel estimate. Both are computed at the same 95% confidence level. As expected, the confidence intervals are narrower than the confidence bands.