# 3.3 Smoothing Parameter Selection

We have still not found a way to select the bandwidth that is both applicable in practice and theoretically desirable. In the following two subsections we will introduce two of the most frequently used methods of bandwidth selection: the plug-in method and the method of cross-validation. For each method we will describe one representative. In the case of the plug-in method we will focus on the "quick & dirty" plug-in method introduced by Silverman. With regard to cross-validation we will focus on least squares cross-validation. For more complete treatments of plug-in and cross-validation methods of bandwidth selection, see e.g. Härdle (1991) and Park & Turlach (1992).

## 3.3.1 Silverman's Rule of Thumb

Generally speaking, plug-in methods derive their name from their underlying principle: if you have an expression involving an unknown parameter, replace the unknown parameter with an estimate. Take (3.19) as an example. The expression on the right hand side involves the unknown quantity $\|f''\|_{2}^{2}$. Suppose we knew or assumed that the unknown density $f$ belongs to the family of normal distributions with mean $\mu$ and variance $\sigma^{2}$; then we have

$$\|f''\|_{2}^{2}=\sigma^{-5}\int \left\{\varphi''(x)\right\}^{2}dx \qquad (3.21)$$
$$=\sigma^{-5}\,\frac{3}{8\sqrt{\pi}}\approx 0.212\,\sigma^{-5}, \qquad (3.22)$$

where $\varphi$ denotes the pdf of the standard normal distribution. It remains to replace the unknown standard deviation $\sigma$ by an estimator $\hat{\sigma}$, such as

$$\hat{\sigma}=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(X_{i}-\bar{X}\right)^{2}}.$$

To apply (3.19) in practice we have to choose a kernel function. Taking the Gaussian kernel (which is identical to the standard normal pdf and will therefore be denoted by $\varphi$, too) we get the following "rule-of-thumb" bandwidth

$$\hat{h}_{ROT}=\left(\frac{4\,\hat{\sigma}^{5}}{3n}\right)^{1/5} \qquad (3.23)$$
$$\approx 1.06\,\hat{\sigma}\,n^{-1/5}, \qquad (3.24)$$

with $\hat{\sigma}$ denoting the estimated standard deviation.

You may object by referring to what we said at the beginning of Chapter 2. Isn't assuming normality of $f$ just the opposite of the philosophy of nonparametric density estimation? Yes, indeed. If we knew that $X$ had a normal distribution then we could estimate its density much more easily and efficiently by simply estimating $\mu$ with the sample mean and $\sigma^{2}$ with the sample variance, and plugging these estimates into the formula of the normal density.

What we have achieved by working under the normality assumption is an explicit, applicable formula for bandwidth selection. In practice, we do not know whether $X$ is normally distributed. If it is, then $\hat{h}_{ROT}$ in (3.24) gives the optimal bandwidth. If not, then (3.24) will still give a bandwidth not too far from the optimum, provided the distribution of $X$ is not too different from the normal distribution (the "reference distribution"). That is why we refer to (3.24) as a rule-of-thumb bandwidth: it will give reasonable results for all distributions that are unimodal, fairly symmetric and do not have tails that are too fat.
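As a concrete illustration, the rule-of-thumb bandwidth (3.24) takes only a few lines of code. The following is a minimal sketch (the function name `rule_of_thumb_bandwidth` is ours):

```python
import numpy as np

def rule_of_thumb_bandwidth(x):
    """Silverman's rule-of-thumb bandwidth (3.24) for the Gaussian kernel:
    h = 1.06 * sigma_hat * n^(-1/5), with the normal reference distribution."""
    x = np.asarray(x, dtype=float)
    sigma_hat = x.std(ddof=1)              # sample standard deviation
    return 1.06 * sigma_hat * x.size ** (-1 / 5)

# the bandwidth shrinks like n^(-1/5) as the sample grows
rng = np.random.default_rng(0)
h_rot = rule_of_thumb_bandwidth(rng.standard_normal(400))
```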

A practical problem with the rule-of-thumb bandwidth is its sensitivity to outliers. A single outlier may cause a too large estimate of $\sigma$ and hence imply a too large bandwidth. A more robust estimate is obtained from the interquartile range

$$\hat{R}=X_{[0.75n]}-X_{[0.25n]}, \qquad (3.25)$$

i.e. we simply calculate the sample interquartile range from the 75%-quantile $X_{[0.75n]}$ (upper quartile) and the 25%-quantile $X_{[0.25n]}$ (lower quartile). Still assuming that the true pdf is normal, we know that the 75%- and 25%-quantiles of the standard normal distribution are $z_{0.75}\approx 0.675$ and $z_{0.25}\approx -0.675$. Hence, asymptotically

$$\hat{R}\approx \sigma\,(z_{0.75}-z_{0.25})=\sigma\,\{0.675-(-0.675)\}=1.349\,\sigma \qquad (3.26)$$

and thus

$$\hat{\sigma}=\frac{\hat{R}}{1.349}. \qquad (3.27)$$

This relation can be plugged into (3.24) to give

$$\hat{h}_{ROT}=1.06\,\frac{\hat{R}}{1.349}\,n^{-1/5}\approx 0.79\,\hat{R}\,n^{-1/5}. \qquad (3.28)$$

We can combine (3.24) and (3.28) into a "better rule of thumb"

$$\hat{h}_{ROT}=1.06\,\min\left\{\hat{\sigma},\,\frac{\hat{R}}{1.349}\right\}n^{-1/5}. \qquad (3.29)$$

Again, both (3.24) and (3.29) will work quite well if the true density resembles the normal distribution but if the true density deviates substantially from the shape of the normal distribution (by being multimodal for instance) we might be considerably misled by estimates using the rule-of-thumb bandwidths.
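The robust variant (3.29) is equally simple to implement. A minimal sketch, using the constant 1.349 from the normal reference distribution (the function and variable names are ours):

```python
import numpy as np

def better_rule_of_thumb(x):
    """The "better rule of thumb" (3.29):
    h = 1.06 * min(sigma_hat, R_hat / 1.349) * n^(-1/5),
    where R_hat is the interquartile range, guarding against outliers."""
    x = np.asarray(x, dtype=float)
    sigma_hat = x.std(ddof=1)
    r_hat = np.quantile(x, 0.75) - np.quantile(x, 0.25)   # sample IQR as in (3.25)
    return 1.06 * min(sigma_hat, r_hat / 1.349) * x.size ** (-1 / 5)

# one gross outlier inflates sigma_hat but barely moves the interquartile
# range, so the robust bandwidth changes only marginally
rng = np.random.default_rng(1)
x = rng.standard_normal(500)
x_out = np.append(x, 50.0)
```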

## 3.3.2 Cross-Validation

As mentioned earlier, we will focus on least squares cross-validation. To get started, consider an alternative distance measure between $\hat{f}_{h}$ and $f$, the integrated squared error ($ISE$):

$$ISE(h)=\int \left\{\hat{f}_{h}(x)-f(x)\right\}^{2}dx. \qquad (3.30)$$

Comparing (3.30) with the definition of the $MISE$ you will notice that, as the name suggests, the $MISE$ is indeed the expected value of the $ISE$. Our aim is to choose a value for $h$ that makes the $ISE$ as small as possible. Let us rewrite (3.30):

$$ISE(h)=\int \hat{f}_{h}^{2}(x)\,dx-2\int \hat{f}_{h}(x)\,f(x)\,dx+\int f^{2}(x)\,dx. \qquad (3.31)$$

Apparently, $\int f^{2}(x)\,dx$ does not depend on $h$ and can be ignored as far as minimization over $h$ is concerned. Moreover, $\int \hat{f}_{h}^{2}(x)\,dx$ can be calculated from the data. This leaves us with one term, $\int \hat{f}_{h}(x)\,f(x)\,dx$, that depends on $h$ and involves the unknown density $f$.

If we look at this term more closely, we observe that $\int \hat{f}_{h}(x)\,f(x)\,dx$ is the expected value of $\hat{f}_{h}(X)$, where the expectation is calculated w.r.t. an independent random variable $X$ with density $f$. We can estimate this expected value by

$$\widehat{E_{X}\{\hat{f}_{h}(X)\}}=\frac{1}{n}\sum_{i=1}^{n}\hat{f}_{h,-i}(X_{i}), \qquad (3.32)$$

where

$$\hat{f}_{h,-i}(X_{i})=\frac{1}{(n-1)\,h}\sum_{\substack{j=1\\ j\neq i}}^{n}K\!\left(\frac{X_{i}-X_{j}}{h}\right). \qquad (3.33)$$

Here $\hat{f}_{h,-i}$ is the leave-one-out estimator. As its name suggests, the $i$th observation is not used in the calculation of $\hat{f}_{h,-i}(X_{i})$. This way we ensure that the observations used for calculating $\hat{f}_{h,-i}(X_{i})$ are independent of the observation $X_{i}$ at which we estimate the expected value in (3.32). (See also Exercise 3.15.)
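The leave-one-out estimates (3.33) can be computed for all observations at once by zeroing the diagonal of the matrix of pairwise kernel evaluations. A sketch for the Gaussian kernel (the helper name `loo_density` is ours):

```python
import numpy as np

def loo_density(x, h):
    """Leave-one-out estimates (3.33) with the Gaussian kernel.

    Returns the vector of f_hat_{h,-i}(X_i): the i-th observation is
    excluded from its own density estimate.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    u = (x[:, None] - x[None, :]) / h              # pairwise scaled differences
    k = np.exp(-u ** 2 / 2) / np.sqrt(2 * np.pi)   # Gaussian kernel values
    np.fill_diagonal(k, 0.0)                       # drop the j = i term
    return k.sum(axis=1) / ((n - 1) * h)

rng = np.random.default_rng(3)
x = rng.standard_normal(300)
est = loo_density(x, h=0.4)
# averaging est gives the estimate (3.32) of E_X{f_hat(X)}
```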

Let us repeat the formula of the integrated squared error ($ISE$), the criterion function we seek to minimize with respect to $h$:

$$ISE(h)=\int \hat{f}_{h}^{2}(x)\,dx-2\,E_{X}\{\hat{f}_{h}(X)\}+\int f^{2}(x)\,dx. \qquad (3.34)$$

As pointed out above, we do not have to worry about the third term of the sum since it does not depend on $h$. Hence, we might as well bring it to the left side of the equation and consider the criterion

$$ISE(h)-\int f^{2}(x)\,dx=\int \hat{f}_{h}^{2}(x)\,dx-2\,E_{X}\{\hat{f}_{h}(X)\}. \qquad (3.35)$$

Now we can reap the fruits of the work done above and plug in (3.32) and (3.33) to estimate $E_{X}\{\hat{f}_{h}(X)\}$. This gives the so-called cross-validation criterion

$$CV(h)=\int \hat{f}_{h}^{2}(x)\,dx-\frac{2}{n}\sum_{i=1}^{n}\hat{f}_{h,-i}(X_{i}). \qquad (3.36)$$

We have almost everything in place for an applicable formula that allows us to calculate an optimal bandwidth from a set of observations. It remains to replace $\int \hat{f}_{h}^{2}(x)\,dx$ by a term that employs sums rather than an integral. It can be shown (Härdle, 1991, p. 230ff) that

$$\int \hat{f}_{h}^{2}(x)\,dx=\frac{1}{n^{2}h}\sum_{i=1}^{n}\sum_{j=1}^{n}K\!\star\!K\!\left(\frac{X_{j}-X_{i}}{h}\right), \qquad (3.37)$$

where $K\!\star\!K$ denotes the convolution of $K$ with itself, i.e. $K\!\star\!K(u)=\int K(u-v)\,K(v)\,dv$. Inserting (3.37) into (3.36) gives the following criterion to minimize w.r.t. $h$:

$$CV(h)=\frac{1}{n^{2}h}\sum_{i=1}^{n}\sum_{j=1}^{n}K\!\star\!K\!\left(\frac{X_{j}-X_{i}}{h}\right)-\frac{2}{n(n-1)\,h}\sum_{i=1}^{n}\sum_{\substack{j=1\\ j\neq i}}^{n}K\!\left(\frac{X_{i}-X_{j}}{h}\right). \qquad (3.38)$$

Thus, we have found a way to choose a bandwidth based on a reasonable criterion without having to make any assumptions about the family to which the unknown density belongs.
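For the Gaussian kernel the convolution $K\!\star\!K$ is simply the $N(0,2)$ density, $K\!\star\!K(u)=\exp(-u^{2}/4)/(2\sqrt{\pi})$, so (3.38) can be evaluated directly and minimized over a grid of candidate bandwidths. A sketch (the names and the grid limits are our choices):

```python
import numpy as np

def cv_criterion(h, x):
    """Least squares cross-validation criterion (3.38), Gaussian kernel."""
    x = np.asarray(x, dtype=float)
    n = x.size
    u = (x[:, None] - x[None, :]) / h               # pairwise scaled differences
    conv = np.exp(-u ** 2 / 4) / (2 * np.sqrt(np.pi))   # K*K term, N(0, 2) density
    kern = np.exp(-u ** 2 / 2) / np.sqrt(2 * np.pi)     # Gaussian kernel term
    np.fill_diagonal(kern, 0.0)                     # leave-one-out: drop j = i
    return conv.sum() / (n ** 2 * h) - 2 * kern.sum() / (n * (n - 1) * h)

# minimize CV(h) over a grid of candidate bandwidths
rng = np.random.default_rng(2)
x = rng.standard_normal(200)
grid = np.linspace(0.05, 1.5, 60)
h_cv = grid[np.argmin([cv_criterion(h, x) for h in grid])]
```

A more careful implementation would refine the grid around the minimum or use a one-dimensional optimizer, since $CV(h)$ can have local minima.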

A nice feature of the cross-validation method is that the selected bandwidth automatically adapts to the smoothness of $f$. This is in contrast to plug-in methods like Silverman's rule-of-thumb or the refined methods presented in Subsection 3.3.3. Moreover, the cross-validation principle can analogously be applied to other density estimators (different from the kernel method). We will also see these advantages later in the context of regression function estimation.

Finally, it can be shown that the bandwidth selected by minimizing $CV(h)$ fulfills an optimality property. Denote the bandwidth selected by the cross-validation criterion by $\hat{h}_{CV}$ and assume that the density $f$ is a bounded function. Stone (1984) proved that this bandwidth is asymptotically optimal in the following sense:

$$\frac{ISE(\hat{h}_{CV})}{\inf_{h} ISE(h)} \xrightarrow{a.s.} 1,$$

where $\xrightarrow{a.s.}$ indicates convergence with probability 1 (almost sure convergence). In other words, this means that the $ISE$ obtained with $\hat{h}_{CV}$ asymptotically coincides with the $ISE$ obtained with the bandwidth which minimizes $ISE$, i.e. the optimal bandwidth.

## 3.3.3 Refined Plug-in Methods

With Silverman's rule-of-thumb we introduced in Subsection 3.3.1 the simplest possible plug-in bandwidth. Recall that essentially we assumed a normal density for a simple calculation of $\|f''\|_{2}^{2}$. This procedure yields a relatively good estimate of the optimal bandwidth if the true density function is nearly normal. However, if this is not the case (as for multimodal densities) Silverman's rule-of-thumb will fail dramatically. A natural refinement consists of using a nonparametric estimate for $\|f''\|_{2}^{2}$ as well. A further refinement is the use of a better approximation to the $MISE$. The following approaches apply these ideas.

In contrast to the cross-validation method, plug-in bandwidth selectors try to find a bandwidth that minimizes $AMISE$. This means we are looking at a different optimality criterion from that of the previous subsection.

A common method of assessing the quality of a selected bandwidth $\hat{h}$ is to compare it with $h_{opt}$, the optimal bandwidth, in relative value. We say that the convergence of $\hat{h}$ to $h_{opt}$ is of order $n^{-\alpha}$ if

$$n^{\alpha}\left(\frac{\hat{h}-h_{opt}}{h_{opt}}\right) \xrightarrow{L} Z,$$

where $Z$ is some random variable (independent of $n$). If $\alpha=\frac{1}{2}$ then this is usually called $\sqrt{n}$-convergence, and this rate of convergence is also the best achievable, as Hall & Marron (1991) have shown.

Park & Marron (1990) proposed to estimate $\|f''\|_{2}^{2}$ in $h_{opt}$ by using a nonparametric estimate of $f$ and taking the second derivative of this estimate. Suppose we use a bandwidth $g$ here; then the second derivative of $\hat{f}_{g}$ can be computed as

$$\hat{f}''_{g}(x)=\frac{1}{n g^{3}}\sum_{i=1}^{n}K''\!\left(\frac{x-X_{i}}{g}\right).$$

Of course, this yields a bandwidth choice problem as well; we have only transferred our problem to bandwidth selection for the second derivative. However, we can now use a rule-of-thumb bandwidth in this first stage. A further problem occurs due to the bias of $\|\hat{f}''_{g}\|_{2}^{2}$ as an estimator of $\|f''\|_{2}^{2}$, which can be overcome by using the bias-corrected estimate

$$\widehat{\|f''\|_{2}^{2}}=\|\hat{f}''_{g}\|_{2}^{2}-\frac{1}{n g^{5}}\,\|K''\|_{2}^{2}. \qquad (3.39)$$

Using this to replace $\|f''\|_{2}^{2}$ and optimizing w.r.t. $h$ in the $AMISE$ formula yields the bandwidth selector

$$\hat{h}_{PM}=\left(\frac{\|K\|_{2}^{2}}{\widehat{\|f''\|_{2}^{2}}\,\{\mu_{2}(K)\}^{2}\,n}\right)^{1/5}. \qquad (3.40)$$

Park & Marron (1990) showed that $\hat{h}_{PM}$ has a relative rate of convergence to $h_{opt}$ of order $n^{-4/13}$, which means a slower rate of convergence to the optimal bandwidth than the best achievable $n^{-1/2}$. The performance of $\hat{h}_{PM}$ in simulation studies is usually quite good. A disadvantage is that for small bandwidths $g$, the estimator $\widehat{\|f''\|_{2}^{2}}$ may give negative results.
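The refined plug-in idea can be sketched as follows, with a Gaussian kernel, a crude pilot bandwidth $g$, and a simple Riemann sum for the norm. The pilot rule, the integration grid and all names are our simplifying assumptions, not the exact first-stage rule of Park & Marron:

```python
import numpy as np

def plug_in_bandwidth(x, g=None):
    """Refined plug-in bandwidth in the spirit of Park & Marron (1990).

    Estimates ||f''||^2 via a kernel estimate of the second derivative
    (Gaussian kernel, pilot bandwidth g), applies the bias correction
    (3.39) and plugs the result into the AMISE-optimal bandwidth formula.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if g is None:
        # crude first-stage pilot bandwidth (an assumption, not the book's rule)
        g = 1.06 * x.std(ddof=1) * n ** (-1 / 7)

    # second-derivative estimate on a grid: K''(u) = (u^2 - 1) * phi(u)
    grid = np.linspace(x.min() - 3 * g, x.max() + 3 * g, 400)
    u = (grid[:, None] - x[None, :]) / g
    f2 = ((u ** 2 - 1) * np.exp(-u ** 2 / 2) / np.sqrt(2 * np.pi)).sum(axis=1) / (n * g ** 3)

    # ||f''_g||^2 by a Riemann sum, then bias correction (3.39);
    # ||K''||^2 = 3 / (8 sqrt(pi)) for the Gaussian kernel
    norm_f2 = (f2 ** 2).sum() * (grid[1] - grid[0])
    norm_f2 -= 3 / (8 * np.sqrt(np.pi)) / (n * g ** 5)
    norm_f2 = max(norm_f2, 1e-12)        # guard against negative estimates

    # h = (||K||^2 / (||f''||^2 mu_2(K)^2 n))^(1/5); ||K||^2 = 1/(2 sqrt(pi)), mu_2 = 1
    return (1 / (2 * np.sqrt(np.pi)) / (norm_f2 * n)) ** (1 / 5)

rng = np.random.default_rng(4)
h_pm = plug_in_bandwidth(rng.standard_normal(300))
```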

## 3.3.4 An Optimal Bandwidth Selector?!

In Subsection 3.3.2 we introduced

$$\frac{ISE(\hat{h})}{\inf_{h} ISE(h)} \xrightarrow{a.s.} 1 \qquad (3.41)$$

as a criterion of asymptotic optimality for a bandwidth selector $\hat{h}$. This property is fulfilled by the bandwidth selected by the least squares cross-validation criterion, which tries to minimize $ISE$.

Most of the other existing bandwidth choice methods attempt to minimize $MISE$. A condition analogous to (3.41) for $MISE$ is usually much more complicated to prove. Hence, most of the literature is concerned with investigating the relative rate of convergence of a selected bandwidth $\hat{h}$ to $h_{opt}$. Fan & Marron (1992) derived a Fisher-type lower bound for the relative errors of a bandwidth selector.

Considering the relative order of convergence to $h_{opt}$ as a criterion, the best selector should fulfill $\sqrt{n}$-convergence, i.e.

$$n^{1/2}\left(\frac{\hat{h}-h_{opt}}{h_{opt}}\right) \xrightarrow{L} Z$$

for some random variable $Z$.

The biased cross-validation method of Hall et al. (1991) has this property; however, this selector is only superior for very large samples. Another $\sqrt{n}$-convergent method is smoothed cross-validation, but this selector pays for the fast rate of convergence with a larger asymptotic variance.

In summary: one best method does not exist! Moreover, even asymptotically optimal criteria may show bad behavior in simulations. See the bibliographic notes for references on such simulation studies. As a consequence, we recommend determining bandwidths by different selection methods and comparing the resulting density estimates.