There are several approaches to computing upper and lower confidence bands. One approach is to use pointwise confidence intervals on a very fine grid of the observation interval. The level of these confidence intervals can be adjusted by the Bonferroni method in order to obtain uniform confidence bands. The gaps between the grid points can be bridged via smoothness conditions on the regression curve.
Another approach is to consider the curve estimate as a stochastic process (in the point of estimation) and then to derive asymptotic Gaussian approximations to that process. The extreme value theory of Gaussian processes yields the level of these confidence bands.
A third approach is based on the bootstrap. By resampling one attempts to approximate the distribution of the deviation between the estimate and the true curve.
The approach by Hall and Titterington (1986b) is based on the discretization method, that is, on constructing simultaneous error bars at different locations. They considered a fixed design regression model on the unit interval. Divide the region of interest into cells, each cell containing the observations that fall into it.
If the error variance is not known, Hall and Titterington (1986b) recommend constructing an estimate of it using differences of neighboring observations.
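A common estimator of this type is the first-difference estimator sketched below. This is an illustration of the difference idea under the assumption of ordered design points, not necessarily the exact formula of Hall and Titterington; the function name is mine.

```python
import numpy as np

def difference_variance(y):
    """Difference-based estimate of the error variance.

    Assumes the observations are ordered by the design points, so that
    successive differences Y_{i+1} - Y_i cancel the smooth trend and
    leave (approximately) the difference of two independent errors,
    whose variance is 2 * sigma^2.
    """
    y = np.asarray(y, dtype=float)
    d = np.diff(y)                          # Y_{i+1} - Y_i
    return np.sum(d ** 2) / (2 * (len(y) - 1))

# Example: flat curve plus noise with sigma^2 = 0.25
rng = np.random.default_rng(0)
y = 1.0 + rng.normal(scale=0.5, size=2000)
print(round(difference_variance(y), 1))     # close to 0.25
```

Because the trend only approximately cancels, such estimators are slightly biased upward when the curve is steep relative to the spacing of the design points.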
Algorithm 4.3.1
STEP 1. Define the cells and compute the kernel estimator as in (4.3.15).
STEP 2. Define the variance estimate for each cell.
STEP 3. Let the bound of variation between grid points be given by the smoothness assumption on the regression curve.
STEP 4. Define the error bars as in (4.3.18).
STEP 5. Draw the upper and lower limits around the kernel estimate.
The band so obtained is a uniform confidence band for the regression curve at the prescribed level, as was shown by Hall and Titterington (1986b). The discretization method was also employed by Knafl, Sacks and Ylvisaker (1985) and Knafl et al. (1984).
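The discretization idea can be sketched as follows. This is a simplified illustration, not the book's algorithm: it Bonferroni-adjusts pointwise normal intervals at the cell midpoints and ignores the bias and between-cell fluctuation terms that the full method controls via the smoothness assumption. All function names and the crude local-sample-size variance are my choices.

```python
import numpy as np
from statistics import NormalDist

def quartic(u):
    # Quartic kernel K(u) = (15/16)(1 - u^2)^2 on [-1, 1]
    return np.where(np.abs(u) <= 1, 15 / 16 * (1 - u ** 2) ** 2, 0.0)

def kernel_smooth(xg, x, y, h):
    w = quartic((xg[:, None] - x[None, :]) / h)     # kernel weights
    return (w * y).sum(axis=1) / w.sum(axis=1)      # local weighted average

def bonferroni_band(x, y, h, n_cells, sigma2, alpha=0.05):
    # Cell midpoints, kept one bandwidth away from the boundary
    mids = np.linspace(x.min() + h, x.max() - h, n_cells)
    mhat = kernel_smooth(mids, x, y, h)
    # Pointwise level alpha / n_cells gives simultaneous level alpha
    z = NormalDist().inv_cdf(1 - alpha / (2 * n_cells))
    # Crude variance of the local average: sigma^2 / (local sample size)
    nloc = np.array([(np.abs(x - m) <= h).sum() for m in mids])
    half = z * np.sqrt(sigma2 / np.maximum(nloc, 1))
    return mids, mhat - half, mhat + half

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 400))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=400)
mids, lo, up = bonferroni_band(x, y, h=0.1, n_cells=8, sigma2=0.09)
```

The Bonferroni adjustment is conservative here precisely for the reason discussed later in this section: estimates at nearby midpoints are strongly positively correlated.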
Figure 4.5 gives an example of a uniform confidence band for the radiocarbon data set (Suess 1980). The variables are radiocarbon age and tree-ring age, both measured in years before 1950 A.D. and thinned and rounded so as to achieve equal spacing of the tree-ring ages. For more background on this calibration problem see Scott, Baxter and Aitchison (1984). Altogether, 180 points were included, and Hall and Titterington chose the discretization accordingly. The bands are constructed under the assumption of a single derivative with uniform bound on the regression curve.
A different approach to handling the fluctuation of the curve estimate between the grid points could be based on the arc length of the curve between two successive knot points. Adrian Bowman suggested that instead of bounding derivatives of the regression function, one could assume an upper bound on the arc length.
It was shown in Theorem 4.2.1 that the suitably scaled kernel smoother has an asymptotic normal distribution. In the setting of nonparametric regression, use of this approach was made by Johnston (1982) for the kernel weights (with known marginal density). The basic idea of the approach taken by these authors is to standardize the process and to approximate it by a suitable Gaussian process. More precisely, Johnston (1982) has shown that a suitably rescaled version of the process converges under the following assumptions:
(A1) the regression function and the marginal density are twice differentiable;
(A2) the kernel is differentiable with bounded support;
(A3) the bandwidth tends to zero at an appropriate rate;
(A4) the marginal density is strictly positive on the observation interval;
(A5) the conditional variance function is continuous.
Then the maximal deviation between the estimate and the true regression function over the observation interval has the limiting extreme value distribution given above. From this theorem one can obtain approximate confidence bands for the regression function. Take the quartic kernel $K(u) = \frac{15}{16}(1-u^2)^2 I(|u| \le 1)$, for instance. For this kernel the constants can be computed explicitly, and since it vanishes at the boundary of its support $[-1,1]$, the simpler correction term applies in Theorem 4.3.1. The following algorithm is designed for the quartic kernel.
Algorithm 4.3.2
STEP 1. Define the required variance and density estimates from the data and the kernel smooth.
STEP 2. Plot the asymptotic confidence bands around the kernel smooth.
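The constants entering the band for the quartic kernel can be checked numerically; the notation $c_K$ for $\int K^2$ in the sketch below is mine.

```python
import numpy as np

def K(u):
    # Quartic (biweight) kernel K(u) = (15/16)(1 - u^2)^2 on [-1, 1]
    return np.where(np.abs(u) <= 1, 15 / 16 * (1 - u ** 2) ** 2, 0.0)

u = np.linspace(-1, 1, 200001)
cK = np.sum(K(u) ** 2) * (u[1] - u[0])   # Riemann sum for the integral of K^2
print(round(cK, 4))                       # 5/7, i.e. 0.7143
print(K(np.array([-1.0, 1.0])))           # [0. 0.] -- vanishes at the boundary
```

The exact value $\int K^2 = 5/7$ follows from expanding $(1-u^2)^4$ and integrating term by term; the boundary check confirms why the simpler form of the correction term applies for this kernel.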
Note that this theorem does not involve a bias correction as in Algorithm 4.3.1. The bias term is suppressed by assuming that the bandwidth tends to zero slightly faster than the optimal rate $n^{-1/5}$. A bias term could be included, but it has the rather complicated form (4.2.8); see Theorem 4.2.1. A means of automatically correcting for the bias is presented subsequently in the wild bootstrap algorithm.
Figure 4.6 shows an application of Algorithm 4.3.2: a uniform confidence band for the food expenditure Engel curve.
The idea of bootstrap bands is to approximate the distribution of the maximal deviation between the estimate and the true curve by resampling.
REPEAT
b = b + 1
STEP 1. Sample bootstrap observations from the data.
STEP 2. Construct a kernel smooth from the bootstrap sample and define its deviation from the original smooth.
UNTIL b = B (= number of bootstrap samples).
STEP 3. Define the upper limit as an upper quantile of the B bootstrap deviations. By analogy, define the lower limit as the corresponding lower quantile.
STEP 4. Draw the interval between these limits around every point of the observation interval.
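A minimal sketch of such a band follows: it resamples pairs and takes an upper quantile of the sup-norm deviation of the bootstrap smooth from the original smooth. The Gaussian kernel, the constant-width band, and the helper names are my simplifications, not the book's exact procedure.

```python
import numpy as np

def nw_smooth(xg, x, y, h):
    # Nadaraya-Watson smoother with a Gaussian kernel
    w = np.exp(-0.5 * ((xg[:, None] - x[None, :]) / h) ** 2)
    return (w * y).sum(axis=1) / w.sum(axis=1)

def bootstrap_band(x, y, h, grid, B=200, alpha=0.10, seed=0):
    rng = np.random.default_rng(seed)
    n = len(x)
    mhat = nw_smooth(grid, x, y, h)
    maxdev = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, n)                  # resample pairs (X, Y)
        mstar = nw_smooth(grid, x[idx], y[idx], h)
        maxdev[b] = np.max(np.abs(mstar - mhat))     # sup-norm deviation
    d = np.quantile(maxdev, 1 - alpha)               # bootstrap critical value
    return mhat - d, mhat + d                        # constant-width band

rng = np.random.default_rng(2)
x = rng.uniform(0, 1, 300)
y = x ** 2 + rng.normal(scale=0.1, size=300)
grid = np.linspace(0.1, 0.9, 50)
lo, up = bootstrap_band(x, y, h=0.08, grid=grid)
```

In practice the deviations are usually studentized so that the band width varies with the local variance and design density; the constant-width variant above is only the simplest version, and it makes the computational cost visible: B full smooths over the whole grid.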
The above algorithm is extremely computer intensive, since the statistic has to be computed B times on a fine grid of points. A computationally less intensive procedure is to consider not a band but rather a collection of error bars based on bootstrapping. This method is based on a pointwise bootstrap approximation to the distribution of the estimate. In the following I describe the approach taken by Härdle and Marron (1988), which uses the wild bootstrap technique to construct pointwise confidence intervals. How can these pointwise confidence intervals be modified in order to cover the true curve with a prescribed simultaneous coverage probability?
A straightforward way of extending bootstrap pointwise intervals to simultaneous confidence intervals is by applying the Bonferroni method. A drawback to the Bonferroni approach is that the resulting intervals will quite often be too long. The reason is that this method does not make use of the substantial positive correlation of the curve estimates at nearby points.
A more direct approach to finding simultaneous error bars is to consider the simultaneous coverage of the pointwise error bars, and then adjust the pointwise level to give the desired simultaneous coverage probability. Fisher (1987, p. 394) called this a ``confidence ribbon,'' since the pointwise confidence intervals are extended until they have the desired simultaneous coverage probability. A general framework, which includes both the Bonferroni and direct methods, can be formulated by thinking in terms of groups of grid points.
First, partition the set of locations where error bars are to be computed into groups, as in the Hall and Titterington approach. Suppose the groups are indexed by a group label and the locations within each group by a second index. The groups should be chosen so that the locations in each group are within a few bandwidths of each other. In the one-dimensional case this is easily accomplished by dividing the axis into intervals of length roughly equal to the bandwidth.
In order to define a bootstrap procedure that takes advantage of this positive correlation, consider a set of grid points that have the same asymptotic location (not depending on the sample size) relative to some reference point in each group. Now, within each group, use the wild bootstrap replications to approximate the joint distribution of the deviations at these grid points. Recall Theorem 4.2.2 for the wild bootstrap approximation to this distribution; there it was shown that the approximation is asymptotically valid.
For each group, define the intervals to have endpoints which are the lower and upper quantiles of the corresponding bootstrap distribution. Then define the empirical simultaneous size as the proportion of bootstrap curves which lie outside at least one of the intervals in the group. Next find the value of the pointwise level which makes the empirical simultaneous size equal to the desired level divided by the number of groups. The resulting intervals within each group will then have the corresponding joint confidence coefficient. Hence, by the Bonferroni bound, the entire collection of intervals will simultaneously contain the required proportion of the distribution of the bootstrap curves about the estimate. Thus the intervals will be simultaneous confidence intervals with confidence coefficient at least the prescribed level. The result of this process is summarized as the following theorem.
As a practical method for finding this pointwise level for each group we suggest the following ``halving'' approach. First try a starting value and calculate the resulting empirical simultaneous size. If the result is more than the target, halve the level; otherwise move halfway back toward the previous larger value. Continue this halving approach until neighboring values are found which bracket the target (since only finitely many bootstrap replications are made, there is only a finite grid of possible sizes available). Finally, take a weighted average of the two bracketing sets of intervals, where the weights are proportional to the distances of their empirical sizes from the target.
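The halving search is essentially a bisection on the pointwise level. A sketch, in which `empirical_size` stands for the Monte Carlo proportion of bootstrap curves escaping at least one interval (all names are mine):

```python
def halving_search(empirical_size, target, n_steps=20):
    """Bisection for the pointwise level so that the empirical
    simultaneous size matches the target.  empirical_size is assumed
    to be monotone increasing in the level."""
    lo, hi = 0.0, 1.0
    level = 0.5
    for _ in range(n_steps):
        if empirical_size(level) > target:
            hi = level          # too many escaping curves: shrink the level
        else:
            lo = level          # below target: enlarge the level
        level = (lo + hi) / 2
    return level

# With an identity size function the search recovers the target itself
print(round(halving_search(lambda t: t, 0.3), 3))   # 0.3
```

With finitely many bootstrap curves the achievable sizes form a step function of the level, which is why the text finishes with a weighted average between the two bracketing levels rather than an exact solution.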
Note that Theorem 4.3.2 contains, as a special case, the asymptotic validity of both the Bonferroni and the direct simultaneous error bars. Bonferroni is the special case in which each group consists of a single location; the direct method is the case of one single group containing all locations.
The wild bootstrap simultaneous error bars are constructed according to the following algorithm.
Algorithm 4.3.4
REPEAT
b = b + 1
STEP 1. Sample bootstrap residuals from the two-point distribution (4.2.9).
STEP 2. Construct wild bootstrap observations from these residuals.
UNTIL b = B (= number of bootstrap samples).
STEP 3. Calculate the pointwise level as follows. First try a starting value and calculate the empirical simultaneous size. If the result is more than the target, halve the level; otherwise increase it halfway. Continue this halving approach until neighboring values bracketing the target are found. Finally, take a weighted average of the two corresponding sets of intervals and define the error bars accordingly.
STEP 4. Draw the interval around the kernel smooth at the distinct grid points.
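The resampling step can be sketched as follows. The two-point (``golden section'') distribution with atoms $(1 \mp \sqrt{5})/2$ matches the first three moments of the residual, which is what drives the automatic bias handling; the helper name is mine, and the pilot smooth providing the residuals is assumed given.

```python
import numpy as np

def wild_residuals(eps_hat, rng):
    """Multiply each estimated residual by an independent V with
    E V = 0, E V^2 = 1, E V^3 = 1 (two-point distribution)."""
    s5 = np.sqrt(5.0)
    a, b = (1 - s5) / 2, (1 + s5) / 2        # the two atoms
    p = (s5 + 1) / (2 * s5)                  # P(V = a)
    v = np.where(rng.random(len(eps_hat)) < p, a, b)
    return v * eps_hat

# Check the three moment conditions by simulation
rng = np.random.default_rng(0)
v = wild_residuals(np.ones(200000), rng)
print(round(abs(v.mean()), 1), round((v**2).mean(), 1), round((v**3).mean(), 1))
# 0.0 1.0 1.0
```

A wild bootstrap observation is then the pilot smooth at the design point plus the perturbed residual; note that each bootstrap error is drawn from a distribution fitted to a single residual, hence the name ``wild.''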
This wild bootstrap technique was applied to the potato versus net income example. Figure 4.7 displays the error bars for this data.
To study the practical difference between the various types of error bars, Härdle and Marron (1988) considered the distribution of the kernel estimate about the true curve at a grid of values for some specific examples. They chose the underlying curve to be of linear form, with an added bell-shaped hump.
The predictor variable was generated from a fixed marginal distribution, and the conditional distribution of the response was normal about the curve, with a different error variance in each of four settings. For each of these four distributions 200 observations were generated. Figure 4.8 shows one realization from each of the four settings. Figure 4.8 also shows the kernel smooth, as calculated from the crosses, for each setting, together with a plot of the kernel function at the bottom, which shows the effective amount of local averaging being done in each case. The bandwidth for these is the optimal bandwidth as in Härdle and Marron (1985b), where the weight function in that paper was taken to be an indicator function. As expected, more smoothing is required when the error variance is larger.
To study the differences among the various error bars, for each setting 500 pseudo data sets were generated. Then kernel estimates were calculated at a fixed set of points, using a standard normal density as the kernel. Figure 4.9 shows, for one of the four distributions, the true curve overlaid with error bars whose endpoints are various types of quantiles of the distribution of the kernel estimate. The centers of the error bars are at the means of these distributions, and show clearly the bias that is inherent in nonparametric regression estimation. Note, in particular, how substantial bias is caused both by the curvature of the regression curve near the hump and by the curvature of the design density. The bars in Figure 4.9a are 80% pointwise error bars. In Figure 4.9b they are 80% simultaneous bars. In Figure 4.9c, the grid points were split up into neighborhoods and the neighborhood method of Theorem 4.3.2 was used. Figure 4.9d shows the completely Bonferroni 80% error bars.
For easy comparison of the lengths of these intervals, consider Figure 4.10. This shows, for the same values, the lengths of the various bars in Figure 4.9. Of course, these bars are shorter near the center, which reflects the fact that there is more data there, so the estimates are more accurate. As expected, the lengths increase from pointwise, to actual simultaneous, to neighborhood, to Bonferroni bars. Also note that, as stated above, the difference between the actual simultaneous bars and the neighborhood simultaneous bars is really quite small, whereas the pointwise bars are a lot narrower.
Exercises
4.3.1 Refine Algorithm 4.3.2 to allow kernels that do not vanish at the boundary of their support.
4.3.2 Use the WARPing algorithm of Exercise 4.2.3 on the wild bootstrap smoothing to program Algorithm 4.3.4 for simultaneous error bars.
4.3.3 Compare the naive bootstrap bands with the wild bootstrap error bars. Where do you see the essential difference?
[Hint: Consider the bias of .]
4.3.4 Use Algorithm 4.3.2 to find a smooth uniform confidence band for the motorcycle data set.
4.3.5 Could you translate Theorem 4.3.1 into the world of $k$-NN smoothing using the equivalence statements of Section 3.11?
An important issue is how to fine tune the choice of the pilot bandwidth. Though it is true that the bootstrap works (in the sense of giving asymptotically correct coverage probabilities) with a rather crude choice of the pilot bandwidth, it is intuitively clear that its specification will play a role in how well the method works for finite samples. Since the main role of the pilot smooth is to provide a correct adjustment for the bias, we use the goal of bias estimation as a criterion. We think theoretical analysis of the above type will be more straightforward than allowing the number of locations per group to increase, which provides further motivation for considering this general grouping framework.
In particular, recall that the bias in the estimation of the regression curve by the kernel smoother is given by the expression in (4.2.8). An immediate consequence of Theorem 4.3.3 is a rate of convergence for the pilot bandwidth. This makes precise the above intuition, which indicated that the pilot smooth should be slightly oversmoothed. In addition, under these assumptions, reasonable choices of the pilot bandwidth will be of larger order than the bandwidth of the final estimate. Hence, Theorem 4.3.3 shows once again that the pilot bandwidth should tend to zero more slowly than the main bandwidth. A proof of Theorem 4.3.3 is contained in Härdle and Marron (1988).