The goal of density estimation is to approximate the probability density function f(x) of a random variable X. Assume we have n independent observations x_1, \ldots, x_n from the random variable X. The kernel density estimator \hat{f}_h(x) for the estimation of the density value f(x) at point x is defined as

\[ \hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right), \qquad (6.1) \]

where K denotes the kernel function and h the bandwidth.
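As a hedged illustration (a Python sketch, not part of the XploRe smoother library), the estimator in equation (6.1) with a Gaussian kernel can be written as:

```python
import numpy as np

def kde(x_eval, data, h):
    """Gaussian-kernel density estimate fh(x) = 1/(n*h) * sum_i K((x - x_i)/h)."""
    x_eval = np.atleast_1d(np.asarray(x_eval, dtype=float))
    u = (x_eval[:, None] - data[None, :]) / h        # scaled distances to each x_i
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)     # Gaussian kernel K(u)
    return K.sum(axis=1) / (len(data) * h)

# Sanity check: a density estimate integrates (approximately) to one.
rng = np.random.default_rng(0)
data = rng.normal(size=200)
grid = np.linspace(-4.0, 4.0, 401)
fh = kde(grid, data, h=0.4)
print(round(float(np.trapz(fh, grid)), 2))   # 1.0
```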
In order to illustrate how the kernel density estimator works, let us rewrite equation (6.1) for the Uniform kernel K(u) = \frac{1}{2}\,\mathbf{1}(|u| \le 1):

\[ \hat{f}_h(x) = \frac{1}{2nh} \sum_{i=1}^{n} \mathbf{1}(|x - x_i| \le h) = \frac{\#\{x_i : |x - x_i| \le h\}}{2nh}. \]
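With the Uniform kernel, the estimate at x is simply the proportion of observations falling into the window [x-h, x+h], rescaled by 2h. A minimal Python sketch (the data values here are made up for illustration):

```python
import numpy as np

def kde_uniform(x, data, h):
    """Uniform-kernel estimate: count observations in [x-h, x+h], divide by 2*n*h."""
    data = np.asarray(data, dtype=float)
    count = np.sum(np.abs(data - x) <= h)   # observations in the neighborhood of x
    return count / (2 * len(data) * h)

data = np.array([0.1, 0.2, 0.25, 0.9, 1.5])
print(kde_uniform(0.2, data, h=0.1))   # 3 of 5 points fall in [0.1, 0.3] -> 3.0
```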
The Uniform kernel implies that all observations in the neighborhood of x are given the same weight, since observations close to x also carry information about f(x). It is reasonable to attribute more weight to observations that are near to x and less weight to distant observations. All other kernels in Table 6.1 have this property. Figure 6.1 visualizes the construction of a kernel density estimate of 10 data points (red +) using the Gaussian kernel.
To understand the structure of the smoother library, we need to briefly explain how kernel estimates can be computed in practice. Consider equation (6.1) again. To obtain the kernel density estimates for all observations, we must compute

\[ \hat{f}_h(x_j) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x_j - x_i}{h}\right), \qquad j = 1, \ldots, n. \]
For exploratory purposes, i.e. for graphing the density estimate, it is not necessary to evaluate \hat{f}_h at all observations x_j. Instead, the estimate can be computed for example on an equidistant grid x_1^g, \ldots, x_m^g:

\[ \hat{f}_h(x_k^g) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x_k^g - x_i}{h}\right), \qquad k = 1, \ldots, m. \]
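The grid-based evaluation can be sketched in Python (a hedged illustration, not XploRe code); it needs n*m kernel evaluations instead of n*n:

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=500)
n, m, h = len(data), 100, 0.4
grid = np.linspace(data.min(), data.max(), m)   # equidistant evaluation grid
u = (grid[:, None] - data[None, :]) / h         # (m, n) array: n*m scaled distances
fh = (np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)).sum(axis=1) / (n * h)
print(fh.shape)   # (100,)
```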
The resulting n \cdot m evaluations of the kernel function are, however, time consuming if the sample size n is large. An alternative and faster way is to approximate the kernel density estimate by the WARPing method (Härdle and Scott; 1992).
The basic idea of WARPing (Weighted Average of Rounded Points) is the ``binning'' of the data in bins of length d starting at the origin 0. Each observation is then replaced by the bincenter of the corresponding bin (that means in fact that each point is rounded to the precision given by d). A usual choice for d is a small fraction of the bandwidth h or d = \{\max(x) - \min(x)\}/100. In the latter case, the effective sample size for the computation (the number of nonempty bins) can be at most 101.
For the WARPing method, the kernel function needs to be evaluated only at the points l \cdot d, l = 0, \pm 1, \ldots, \pm M, where M is the number of bins which contains the support of the kernel function. The calculation of the estimated density then reduces to

\[ \hat{f}_h(t_j) = \frac{1}{nh} \sum_{|l| \le M} n_{j+l}\, K\!\left(\frac{l\,d}{h}\right), \]

where t_j denotes the center of bin j and n_j the number of observations falling into bin j.
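A hedged Python sketch of the WARPing idea (a rough illustration with a Gaussian kernel truncated at 4h, not the XploRe implementation): the data are binned, each observation is rounded to its bincenter, and the kernel is evaluated only at the points l*d, so the sum runs over bin counts instead of raw observations.

```python
import numpy as np

def kde_warping(data, h, d):
    """Binned (WARPing-style) density estimate with a Gaussian kernel."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    idx = np.floor(data / d).astype(int)               # bin index of each observation
    counts = np.bincount(idx - idx.min())              # n_j: observations per bin
    M = int(np.ceil(4 * h / d))                        # bins covering the effective support
    l = np.arange(-M, M + 1)
    w = np.exp(-0.5 * (l * d / h) ** 2) / np.sqrt(2 * np.pi)   # kernel weights K(l*d/h)
    fh = np.convolve(counts, w, mode="same") / (n * h)         # weighted average of counts
    centers = (np.arange(idx.min(), idx.max() + 1) + 0.5) * d  # bincenters
    return centers, fh

rng = np.random.default_rng(2)
data = rng.normal(size=1000)
centers, fh = kde_warping(data, h=0.3, d=0.06)
print(round(float(np.trapz(fh, centers)), 1))   # 1.0
```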
To have a choice between both the exact and the WARPing computation of kernel estimates and related functions, the smoother library offers most functionality in two functions:

Functionality                  | Exact   | WARPing
density estimate               | denxest | denest
density confidence intervals   | denxci  | denci
density confidence bands       | denxcb  | dencb
density bandwidth selection    | -       | denbwsel
The easiest way to compute kernel density estimates is the function denest. This function is based on the WARPing method, which makes the computation very fast. Let us apply the denest routine to the nicfoo data, which contain observations on household netincome in the first column (see Data Sets (B.1)). denest requires the data as a minimum input. Optionally, the bandwidth h, the kernel function K and the discretization parameter d which is used for the WARPing can be specified. For example,

  netinc=read("nicfoo")
  netinc=netinc[,1]
  h=(max(netinc)-min(netinc))*0.15
  fh=denest(netinc,h)
To display the estimated function, you may use these lines of code:

  fh=setmask(fh,"line")
  plot(fh)
  tl="Density Estimate"
  xl="netincome"
  yl="density fh"
  setgopt(plotdisplay,1,1,"title",tl,"xlabel",xl,"ylabel",yl)

The resulting density is plotted in Figure 6.2.
As an alternative to denest, the function denxest can be used for obtaining a density estimate by exact computation. As with denest, the data are the minimum input. Optionally, the bandwidth h, the kernel function K and a grid v for the computation can be specified. The following code computes the density estimate for all observations:

  h=(max(netinc)-min(netinc))*0.15
  fh=denxest(netinc,h)

To compute the density on a grid of 30 equidistant points within the range of x, the previous example can be modified as follows:
  h=(max(netinc)-min(netinc))*0.15
  v=grid(min(netinc),(max(netinc)-min(netinc))/29,30)
  fh=denxest(netinc,h,"qua",v)
Finally, let us calculate the density estimate for a family of different bandwidths in order to see the dependence on the bandwidth parameter. We modify our first code in the following way:
  hfac=(max(netinc)-min(netinc))
  fh =setmask( denest(netinc,0.15*hfac) ,"line")
  fh1=setmask( denest(netinc,0.05*hfac) ,"line","thin","magenta")
  fh2=setmask( denest(netinc,0.10*hfac) ,"line","thin","red")
  fh3=setmask( denest(netinc,0.20*hfac) ,"line","thin","blue")
  fh4=setmask( denest(netinc,0.25*hfac) ,"line","thin","green")
  plot(fh1,fh2,fh3,fh4,fh)
  tl="Family of Density Estimates"
  xl="netincome"
  yl="density fh"
  setgopt(plotdisplay,1,1,"title",tl,"xlabel",xl,"ylabel",yl)
Kernel density estimation requires two parameters: the kernel function K and the bandwidth parameter h. In practice, the choice of the kernel is not nearly as important as the choice of the bandwidth. The theoretical background of this observation is that kernel functions can be rescaled such that the difference between two kernel density estimates using two different kernels is almost negligible (Marron and Nolan; 1988).
Instead of rescaling kernel functions, we can also rescale the bandwidths. This rescaling works as follows: Suppose we have estimated an unknown density using some kernel K^{old} and bandwidth h^{old}, and we are considering estimating with a different kernel, say K^{new}. Then the appropriate bandwidth to use with K^{new} is

\[ h^{new} = h^{old}\, \frac{\delta_0(K^{new})}{\delta_0(K^{old})}, \qquad (6.2) \]

where \delta_0(K) = \left\{ \|K\|_2^2 / \mu_2(K)^2 \right\}^{1/5} denotes the canonical bandwidth of the kernel K. XploRe offers the function canker to do this recomputation of bandwidths:
  hqua=canker(1.0,"gau","qua")
  hqua

computes the bandwidth

  Contents of hqua
  [1,]   2.6226

that can be used with the Quartic kernel "qua" in order to find the kernel density estimate that is equivalent to using the Gaussian kernel "gau" and bandwidth 1.0.
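The value 2.6226 is the ratio of the canonical bandwidths \delta_0(K) = \{\|K\|_2^2/\mu_2(K)^2\}^{1/5} of the two kernels. A Python sketch (not the XploRe implementation) that reproduces the canker result by numerical integration:

```python
import numpy as np

def canonical_bw(kernel, support):
    """delta0(K) = (||K||_2^2 / mu_2(K)^2)^(1/5), via numerical integration."""
    u = np.linspace(support[0], support[1], 200001)
    K = kernel(u)
    norm2 = np.trapz(K**2, u)        # ||K||_2^2
    mu2 = np.trapz(u**2 * K, u)      # second moment mu_2(K)
    return (norm2 / mu2**2) ** 0.2

gau = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
qua = lambda u: np.where(np.abs(u) <= 1, 15.0 / 16.0 * (1 - u**2) ** 2, 0.0)

# Equivalent Quartic-kernel bandwidth for h = 1.0 with the Gaussian kernel:
hqua = 1.0 * canonical_bw(qua, (-1, 1)) / canonical_bw(gau, (-8, 8))
print(round(hqua, 4))   # 2.6226
```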
Now we can consider the problem of bandwidth selection. The optimal bandwidth for a kernel density estimate is typically calculated on the basis of an estimate for the integrated squared error

\[ ISE(h) = \int \left\{ \hat{f}_h(x) - f(x) \right\}^2 dx. \]

Plug-in approaches attempt to estimate its expectation, the mean integrated squared error MISE(h) = E\{ISE(h)\}. Under the asymptotic conditions h \to 0 and nh \to \infty, it is possible to approximate MISE(h) by

\[ AMISE(h) = \frac{1}{nh}\, \|K\|_2^2 + \frac{h^4}{4}\, \mu_2(K)^2\, \|f''\|_2^2. \]

Minimizing this expression with respect to h and replacing the unknown term \|f''\|_2^2 by its value for the normal density yields Silverman's rule-of-thumb bandwidth \hat{h}_{ROT} = 1.06\,\hat{\sigma}\,n^{-1/5} for the Gaussian kernel, where \hat{\sigma} denotes the empirical standard deviation. If a different kernel function is used, the bandwidth can be rescaled by using equation (6.2). For example, the rule-of-thumb bandwidth for the Quartic kernel computes as

\[ \hat{h}_{ROT} = 2.62 \cdot 1.06\,\hat{\sigma}\,n^{-1/5} \approx 2.78\,\hat{\sigma}\,n^{-1/5}. \]
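A Python sketch of the rule-of-thumb computation (hedged; the constant 2.6226 is the canonical-bandwidth ratio returned by canker):

```python
import numpy as np

def rule_of_thumb(data, kernel="gau"):
    """Silverman-style rule-of-thumb bandwidth h = 1.06 * sigma * n^(-1/5),
    derived under the assumption of an approximately normal density."""
    n = len(data)
    sigma = np.std(data, ddof=1)              # empirical standard deviation
    h = 1.06 * sigma * n ** (-1.0 / 5.0)      # Gaussian-kernel rule of thumb
    if kernel == "qua":
        h *= 2.6226                           # rescale for the Quartic kernel
    return h

rng = np.random.default_rng(3)
data = rng.normal(loc=2.0, scale=1.5, size=400)
hg = rule_of_thumb(data)
hq = rule_of_thumb(data, "qua")
print(round(hq / hg, 4))   # 2.6226
```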
Of course, \hat{h}_{ROT} is a very crude estimate, in particular if the true density f is far from looking like a normal density. Refined plug-in methods such as the Park and Marron plug-in and the Sheather and Jones plug-in use nonparametric estimates for \|f''\|_2^2.
The quantlet denbwsel needs the data vector as input:

  netinc=read("nicfoo")
  netinc=netinc[,1]
  tmp=denbwsel(netinc)
Choose for instance the Least Squares CV item to use least squares cross-validation. A graphical display appears which shows the criterion function in the upper left, the chosen optimal bandwidth in the upper right, the resulting kernel density estimate in the lower left, and some information about the search grid and the kernel in the lower right. This graphical display is shown in Figure 6.4.
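The least squares cross-validation criterion can be sketched in Python (a hedged illustration of the standard LSCV score CV(h) = \int \hat{f}_h^2 - \frac{2}{n}\sum_i \hat{f}_{h,-i}(x_i), not the denbwsel implementation):

```python
import numpy as np

def lscv(data, h):
    """Least squares CV score: integral of fh^2 minus (2/n) times the sum of
    leave-one-out estimates fh_{-i}(x_i); smaller is better."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    K = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    grid = np.linspace(data.min() - 3 * h, data.max() + 3 * h, 400)
    fh = K((grid[:, None] - data[None, :]) / h).sum(axis=1) / (n * h)
    term1 = np.trapz(fh**2, grid)                              # integral of fh^2
    Kij = K((data[:, None] - data[None, :]) / h)
    loo = (Kij.sum(axis=1) - Kij.diagonal()) / ((n - 1) * h)   # leave-one-out values
    return term1 - 2.0 * loo.mean()

rng = np.random.default_rng(4)
data = rng.normal(size=300)
hs = np.linspace(0.05, 1.0, 20)
h_cv = hs[np.argmin([lscv(data, h) for h in hs])]
print(h_cv)
```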
You can play around now with the different bandwidth selectors. Try out the Park and Marron plug-in for example. Here, in contrast to the cross-validation methods, the aim is to find the root of the bandwidth selection criterion. The graphical display for the Park and Marron selector is shown in Figure 6.5.
During the use of
denbwsel
, all currently obtained results
are displayed in the output window.
Note that for some selectors, it will be necessary to increase the search grid for h. This can be done by using the menu item (Change h range or K).
It is difficult to give a general recommendation as to which bandwidth selector is the best among all. As mentioned above, Silverman's rule of thumb works well when the true density is approximately normal. It may oversmooth for bimodal or multimodal densities. A nice feature of the cross-validation methods is that the selected bandwidth automatically adapts to the smoothness of f. From a theoretical point of view, the refined plug-in bandwidth selectors have the optimal asymptotic rate of convergence, but need either a very large sample size or pay for it with a large variance. As a consequence, we recommend determining bandwidths by different selection methods and comparing the resulting density estimates.
For detailed overviews on plug-in and cross-validation methods of bandwidth selection, see Härdle (1991), Park and Turlach (1992), and Turlach (1993). Practical experiences are available from the simulation studies in Marron (1989), Jones et al. (1992), Park and Turlach (1992), and Cao et al. (1992).
To derive confidence intervals or confidence bands for \hat{f}_h(x), one has to know its sampling distribution. The following result about the asymptotic distribution of \hat{f}_h(x) is known from the literature (Härdle; 1991, p. 62). Suppose that f'' exists and h = c\,n^{-1/5}. Then

\[ n^{2/5} \left\{ \hat{f}_h(x) - f(x) \right\} \stackrel{L}{\longrightarrow} N(b_x, v_x^2), \qquad (6.7) \]

\[ b_x = \frac{c^2}{2}\, f''(x)\, \mu_2(K), \qquad (6.8) \]

\[ v_x^2 = \frac{1}{c}\, f(x)\, \|K\|_2^2. \qquad (6.9) \]
The functions denci and denxci compute pointwise confidence intervals for a grid of points (denci, using WARPing) or all observations (denxci, using exact computation). The following quantlet code computes the confidence intervals for bandwidth 0.2 (Gaussian kernel) and significance level \alpha = 0.05. For graphical exploration, the pointwise intervals are drawn as curves and displayed in a plot. The resulting graph is shown in Figure 6.6.
  {fh,fl,fu}=denci(netinc,0.2,0.05,"gau")
  fh=setmask(fh,"line","blue")
  fl=setmask(fl,"line","blue","thin","dashed")
  fu=setmask(fu,"line","blue","thin","dashed")
  plot(fl,fu,fh)
  setgopt(plotdisplay,1,1,"title","Density Confidence Intervals")
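The construction of pointwise intervals can be sketched in Python from the asymptotic normal approximation, estimating the variance of \hat{f}_h(x) by \hat{f}_h(x)\|K\|_2^2/(nh) and ignoring the bias term (a rough sketch, not the denci algorithm):

```python
import numpy as np

rng = np.random.default_rng(5)
data = rng.normal(size=500)
n, h, z = len(data), 0.3, 1.96                 # z: N(0,1) quantile for alpha = 0.05
grid = np.linspace(-3.0, 3.0, 200)
K = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
fh = K((grid[:, None] - data[None, :]) / h).sum(axis=1) / (n * h)
norm2 = 1.0 / (2.0 * np.sqrt(np.pi))           # ||K||_2^2 for the Gaussian kernel
se = np.sqrt(fh * norm2 / (n * h))             # estimated pointwise standard error
fl, fu = fh - z * se, fh + z * se              # lower / upper pointwise bounds
print(f"maximal interval width: {float((fu - fl).max()):.3f}")
```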
Uniform confidence bands for f can only be derived under some rather restrictive assumptions, see Härdle (1991, p. 65). Suppose that f is a density on [0,1] and h = n^{-\delta}, \delta \in (\frac{1}{5}, \frac{1}{2}). Then the following formula holds under some regularity conditions for all x (Bickel and Rosenblatt; 1973):

\[ P\left( f(x) \in \left[ \hat{f}_h(x) \pm z_{n,\alpha} \left\{ \frac{\hat{f}_h(x)\, \|K\|_2^2}{nh} \right\}^{1/2} \right] \ \text{for all } x \in [0,1] \right) \longrightarrow 1 - \alpha, \qquad (6.10) \]

where z_{n,\alpha} denotes a critical value derived from the extreme value distribution of the maximal deviation of \hat{f}_h from f.
  {fh,fl,fu}=dencb(netinc,0.2,0.05,"gau")
  fh=setmask(fh,"line","blue")
  fl=setmask(fl,"line","blue","thin","dashed")
  fu=setmask(fu,"line","blue","thin","dashed")
  plot(fl,fu,fh)
  setgopt(plotdisplay,1,1,"title","Density Confidence Bands")
The resulting graph is shown in Figure 6.7. It can be clearly seen that the confidence bands are wider than the confidence intervals (Figure 6.6). This is explained by the fact that here the whole function f lies within the bands with probability 95%.