Contrary to the treatment of the histogram in statistics textbooks, we have shown that the histogram is more than just a convenient tool for giving a graphical representation of an empirical frequency distribution. It is a serious and widely used method for estimating an unknown probability density function (pdf). Yet the histogram has some shortcomings, and hopefully this chapter will persuade you that the method of kernel density estimation is in many respects preferable to the histogram.
Recall that the shape of the histogram is governed by two parameters: the binwidth $h$ and the origin of the bin grid, $x_0$. We have seen that the averaged shifted histogram (ASH) is a slick way to free the histogram from its dependence on the choice of an origin. You may recall that we have not had similar success in providing a convincing and applicable rule for choosing the binwidth $h$. There is no choice-of-an-origin problem in kernel density estimation, but you will soon discover that we will run into the binwidth-selection problem again. Hopefully, the second time around we will be able to give a better answer to this challenge.
Even if the ASH seemed to solve the choice-of-an-origin problem, the histogram retains some undesirable properties, most notably its dependence on the binwidth and the fact that it estimates a (typically smooth) pdf by a discontinuous step function.
Recall that our derivation of the histogram was based on the intuitively plausible idea that

$$ f(x) \approx \frac{1}{2h}\, P\bigl(X \in [x-h, x+h)\bigr), \qquad (3.1) $$

which suggested estimating the probability on the right-hand side by the relative frequency of observations falling into the interval $[x-h, x+h)$:

$$ \widehat{f}_h(x) = \frac{1}{2hn}\, \#\{X_i \in [x-h, x+h)\}. \qquad (3.2) $$

If we define the uniform kernel function

$$ K(u) = \frac{1}{2}\, I(|u| \le 1), \qquad (3.3) $$

then we can write (3.2) as

$$ \widehat{f}_h(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h}\, K\!\left(\frac{x - X_i}{h}\right). \qquad (3.5) $$
Note from (3.5) that for each observation $X_i$ that falls into the interval $[x-h, x+h)$ the indicator function takes on the value 1, and we get a contribution to our frequency count. But each contribution is weighted equally (namely by a factor of one), no matter how close the observation is to $x$ (provided that it is within $h$ of $x$). Maybe we should give more weight to contributions from observations very close to $x$ than to those coming from observations that are more distant.
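As a quick numerical sketch (the sample, bandwidth, and function name below are invented for illustration, not taken from the text), the uniform-kernel estimator in (3.5) can be coded directly; every observation within $h$ of $x$ contributes the same weight $\frac{1}{2nh}$:

```python
# Uniform-kernel density estimate: a "moving histogram" centered at x.
# Every observation within h of x contributes the same weight 1/(2nh).

def uniform_kde(x, data, h):
    """Evaluate the uniform-kernel estimate from (3.5) at the point x."""
    n = len(data)
    # K(u) = 1/2 * I(|u| <= 1), rescaled by 1/h and averaged over the sample
    return sum(0.5 / h for xi in data if abs((x - xi) / h) <= 1) / n

sample = [0.2, 0.3, 0.35, 0.8]          # invented sample for illustration
print(uniform_kde(0.3, sample, h=0.1))  # 3.75: three points lie within 0.1 of x
```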
For instance, consider the formula

$$ \widehat{f}_h(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h}\, \frac{3}{4} \left\{ 1 - \left(\frac{x - X_i}{h}\right)^{\!2} \right\} I\!\left( \left|\frac{x - X_i}{h}\right| \le 1 \right), \qquad (3.6) $$

where the weight function $K(u) = \frac{3}{4}(1 - u^2)\, I(|u| \le 1)$ is the so-called Epanechnikov kernel.
Table 3.1: Kernel functions.

Kernel | $K(u)$ |
Uniform | $\frac{1}{2}\, I(|u| \le 1)$ |
Triangle | $(1 - |u|)\, I(|u| \le 1)$ |
Epanechnikov | $\frac{3}{4}(1 - u^2)\, I(|u| \le 1)$ |
Quartic (Biweight) | $\frac{15}{16}(1 - u^2)^2\, I(|u| \le 1)$ |
Triweight | $\frac{35}{32}(1 - u^2)^3\, I(|u| \le 1)$ |
Gaussian | $\frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{u^2}{2}\right)$ |
Cosine | $\frac{\pi}{4} \cos\!\left(\frac{\pi}{2}u\right) I(|u| \le 1)$ |
If you look at (3.6) it will be clear that one could think of the procedure as a slick way of counting the number of observations that fall into the interval $[x-h, x+h]$ around $x$, where contributions from observations $X_i$ that are close to $x$ are weighted more than those that are further away. The latter property is shared by the Epanechnikov kernel with many other kernels, some of which we introduce in Table 3.1. Figure 3.1 displays some of the kernel functions.
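The weighting behavior is easy to check numerically. Below is a small sketch (the function names are ours) of a few kernels from Table 3.1; evaluating the uniform and Epanechnikov kernels near the center and near the edge of their support shows how the latter down-weights distant observations:

```python
import math

# A few kernels from Table 3.1, written as plain functions of u
# (compact support is enforced by returning 0 outside [-1, 1]).

def uniform(u):
    return 0.5 if abs(u) <= 1 else 0.0

def epanechnikov(u):
    return 0.75 * (1 - u**2) if abs(u) <= 1 else 0.0

def quartic(u):
    return (15 / 16) * (1 - u**2)**2 if abs(u) <= 1 else 0.0

def gaussian(u):
    return math.exp(-0.5 * u**2) / math.sqrt(2 * math.pi)

# The uniform kernel weights all points in [-1, 1] equally, while the
# Epanechnikov kernel down-weights points near the edge of the interval:
print(uniform(0.0), uniform(0.9))            # 0.5 0.5
print(epanechnikov(0.0), epanechnikov(0.9))  # ~0.75 vs ~0.14
```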
Now we can give the following general form of the kernel density estimator of a probability density $f$, based on a random sample $X_1, X_2, \ldots, X_n$ from $f$:

$$ \widehat{f}_h(x) = \frac{1}{n} \sum_{i=1}^{n} K_h(x - X_i) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right), $$

where $K$ is a kernel function, such as one of those listed in Table 3.1, and $h$ is the bandwidth.
Similar to the histogram, $h$ controls the smoothness of the estimate, and the choice of $h$ is a crucial problem. Figure 3.2 shows density estimates for the stock returns data using the Quartic kernel and different bandwidths.
Again, it is hard to determine which value of $h$ provides the optimal degree of smoothness without some formal criterion at hand. This problem will be handled further below, especially in Section 3.3.
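The effect of the bandwidth can be sketched in a few lines of code (the sample, grid, and bandwidths below are invented for illustration; they are not the stock returns of Figure 3.2):

```python
import math

def gaussian(u):
    return math.exp(-0.5 * u**2) / math.sqrt(2 * math.pi)

def kde(x, data, h, kernel):
    """General kernel density estimator: (1/(nh)) * sum of K((x - X_i)/h)."""
    return sum(kernel((x - xi) / h) for xi in data) / (len(data) * h)

sample = [-1.2, -0.5, 0.1, 0.4, 0.9, 2.3]   # invented sample for illustration
grid = [i / 10 for i in range(-30, 41)]     # evaluation grid on [-3, 4]

# A small bandwidth follows the data closely (wiggly curve); a large
# bandwidth averages over a wide window and oversmooths.
wiggly = [kde(x, sample, 0.1, gaussian) for x in grid]
smooth = [kde(x, sample, 1.0, gaussian) for x in grid]
```

The wiggly estimate shows a sharp spike near each observation, while the heavily smoothed one flattens all local structure into a single broad bump.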
Kernel functions are usually probability density functions, i.e. they integrate to one and $K(u) \ge 0$ for all $u$ in the domain of $K$. An immediate consequence of $\int K(u)\,du = 1$ is $\int \widehat{f}_h(x)\,dx = 1$, i.e. the kernel density estimator is a pdf, too. Moreover, $\widehat{f}_h$ will inherit all the continuity and differentiability properties of $K$. For instance, if $K$ is $\nu$ times continuously differentiable, then the same will hold true for $\widehat{f}_h$. On a more intuitive level this ``inheritance property'' of $\widehat{f}_h$ is reflected in the smoothness of its graph. Consider Figure 3.3 where, for the same data set (stock returns) and a given value of $h$, kernel density estimates have been graphed using different kernel functions.
Note how the estimate based on the Uniform kernel (right) reflects the box shape of the underlying kernel function with its ragged behavior. The estimate that employed the smooth Quartic kernel function (left), on the other hand, gives a smooth and continuous picture.
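The normalization property is easy to verify numerically. Here is a hedged sketch (with an invented sample) that integrates a Quartic-kernel estimate over a grid covering its support; since the kernel integrates to one, so does the estimate:

```python
def quartic(u):
    # Quartic (Biweight) kernel from Table 3.1
    return (15 / 16) * (1 - u**2)**2 if abs(u) <= 1 else 0.0

def kde(x, data, h):
    return sum(quartic((x - xi) / h) for xi in data) / (len(data) * h)

sample = [0.5, 1.0, 1.4, 2.2]   # invented sample for illustration
h = 0.3

# Riemann sum over [0, 3], which covers the support [0.2, 2.5] of the
# estimate; because the quartic kernel integrates to one, so does f_hat.
step = 0.001
integral = sum(kde(i * step, sample, h) for i in range(3001)) * step
print(round(integral, 3))  # approximately 1.0
```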
Differences are not confined to the contrast between continuous and discontinuous kernel functions. Even among estimates based on continuous kernels there are considerable differences in smoothness (for the same value of $h$), as you can confirm by looking at Figure 3.4. Here, density estimates are graphed for a fixed bandwidth $h$ for income data from the Family Expenditure Survey, using the Epanechnikov kernel (left) and the Triweight kernel (right), respectively. There is quite a difference in the smoothness of the graphs of the two estimates.
You might wonder how we will ever solve this dilemma: on the one hand we will be trying to find an optimal bandwidth, but obviously a given value of $h$ does not guarantee the same degree of smoothness if used with different kernel functions. We will come back to this problem in Section 3.4.2.
Before we turn to the statistical properties of kernel density estimators, let us present another view on kernel density estimation that provides both further motivation and insight into how the procedure works. Look at Figure 3.5, where the kernel density estimate for an artificial data set is shown along with the individual rescaled kernel functions.
What do we mean by a rescaled kernel function? The rescaled kernel function is simply

$$ K_h(u) = \frac{1}{h}\, K\!\left(\frac{u}{h}\right), $$

i.e. the kernel $K$ squeezed (for $h < 1$) or stretched (for $h > 1$) to the scale of the bandwidth. The estimate $\widehat{f}_h(x) = \frac{1}{n}\sum_{i=1}^{n} K_h(x - X_i)$ is then the average of $n$ rescaled kernel functions, one centered at each observation $X_i$.
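This decomposition can be sketched in code (the sample, point of evaluation, and function names below are invented for illustration): each observation contributes one bump of the form $\frac{1}{n} K_h(x - X_i)$, and the estimate is exactly the sum of the bumps.

```python
import math

def gaussian(u):
    return math.exp(-0.5 * u**2) / math.sqrt(2 * math.pi)

def rescaled_kernel(u, h):
    """K_h(u) = (1/h) K(u/h): the kernel squeezed to the scale of h."""
    return gaussian(u / h) / h

def kde(x, data, h):
    return sum(rescaled_kernel(x - xi, h) for xi in data) / len(data)

sample = [0.0, 1.0, 3.0]   # invented sample for illustration
h, x = 0.5, 1.0

# One bump of height K_h(x - X_i)/n per observation; summing the bumps
# at x reproduces the kernel density estimate there.
bumps = [rescaled_kernel(x - xi, h) / len(sample) for xi in sample]
assert abs(sum(bumps) - kde(x, sample, h)) < 1e-12
```

This is exactly the picture in Figure 3.5: the dashed bumps sit on the observations, and the solid estimate is their pointwise sum.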