A smoothing parameter that is selected by one of the
previously described methods optimizes a global error criterion.
Such a ``global'' choice need not necessarily be optimal
for the estimation of the regression curve at one particular point,
as the trivial inequality
$$\int \min_{h}\, d_M(x,h)\,dx \;\le\; \min_{h} \int d_M(x,h)\,dx$$
shows, where $d_M(x,h)=E\{[\hat m_h(x)-m(x)]^2\}$ denotes the mean squared error of the smoother $\hat m_h(x)$ at the point $x$: letting the bandwidth vary with $x$ can only decrease the accumulated pointwise error.
We have already seen that the so-called wild bootstrap method (Section 4.2) allows us to approximate the distribution of $\hat m_h(x)-m(x)$. In the following, though, I would like to present a slightly different bootstrap method in the simpler setting of i.i.d. error terms. This simpler setting has the advantage that resampling can be done from the whole set of observed residuals. Let $Y_i=m(X_i)+\varepsilon_i$ with i.i.d. errors $\varepsilon_i$, and let $g$ denote an oversmoothed pilot bandwidth. The stochastics of the observations are completely determined by the observation error. Resampling should therefore be performed with the estimated residuals
$$\hat\varepsilon_i = Y_i - \hat m_g(X_i), \qquad i=1,\ldots,n,$$
centered so that they have mean zero.
In the bootstrap, any occurrence of $\varepsilon_i$ is replaced by a draw $\varepsilon_i^*$ (with replacement) from the centered residuals, and therefore the bootstrap observations are
$$Y_i^* = \hat m_g(X_i) + \varepsilon_i^*.$$
The MSE $d_M(x,h)$ can then be estimated by its bootstrap counterpart
$$\hat d_M(x,h)=E^*\{[\hat m_h^*(x)-\hat m_g(x)]^2\},$$
where $\hat m_h^*$ denotes the smoother computed from the bootstrap sample $\{(X_i,Y_i^*)\}$ and $E^*$ the expectation under resampling.
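To make the resampling scheme concrete, here is a minimal Python sketch, assuming a Nadaraya-Watson smoother with a Gaussian kernel; the function names, the pilot bandwidth `g`, and the number of bootstrap replications are illustrative choices, not part of the method's specification:

```python
import numpy as np

def kernel_smooth(x_grid, X, Y, h):
    """Nadaraya-Watson estimate m_h at the points x_grid (Gaussian kernel)."""
    K = np.exp(-0.5 * ((x_grid[:, None] - X[None, :]) / h) ** 2)
    return (K * Y).sum(axis=1) / K.sum(axis=1)

def bootstrap_local_mse(x, X, Y, h, g, n_boot=200, seed=0):
    """Bootstrap estimate of d_M(x, h), resampling i.i.d. residuals.

    g is an oversmoothed pilot bandwidth; the pilot fit m_g plays the
    role of the unknown regression curve in the resampling world.
    """
    rng = np.random.default_rng(seed)
    m_g_X = kernel_smooth(X, X, Y, g)                 # pilot fit at the design points
    m_g_x = kernel_smooth(np.array([x]), X, Y, g)[0]  # pilot fit at x
    resid = Y - m_g_X
    resid -= resid.mean()                             # centre the residuals
    draws = np.empty(n_boot)
    for b in range(n_boot):
        eps_star = rng.choice(resid, size=len(X), replace=True)
        draws[b] = kernel_smooth(np.array([x]), X, m_g_X + eps_star, h)[0]
    return np.mean((draws - m_g_x) ** 2)              # E*[(m_h*(x) - m_g(x))^2]
```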
The local adaptive bandwidth is then chosen as
$$\hat h(x) = \arg\min_{h} \hat d_M(x,h).$$
This choice of local adaptive bandwidth is asymptotically optimal in the sense of Theorem 5.1.1, as Härdle and Bowman (1988) show; that is, the ratio of $\hat h(x)$ to the bandwidth minimizing $d_M(x,h)$ tends to one in probability.
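The pointwise minimization can be sketched as a simple grid search, reusing `bootstrap_local_mse` from the sketch above; the grid and its range are again arbitrary illustrative choices:

```python
def local_bandwidth(x, X, Y, g, h_grid, n_boot=200):
    """Pick h(x) as the minimizer of the bootstrap MSE estimate over h_grid."""
    scores = [bootstrap_local_mse(x, X, Y, h, g, n_boot) for h in h_grid]
    return h_grid[int(np.argmin(scores))]

# Example usage (g chosen larger than any candidate h):
# h_grid = np.linspace(0.02, 0.3, 15)
# h_local = [local_bandwidth(x, X, Y, g=0.3, h_grid=h_grid)
#            for x in np.linspace(0.0, 1.0, 25)]
```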
This adaptive choice of $\hat h(x)$ is illustrated in Figure 5.15, which displays some data simulated by adding a normally distributed error, with standard deviation 0.1, to a known curve $m$ evaluated at design points $x_i$, $i=1,\ldots,n$. Cross-validation was used to select a good global smoothing parameter, and the resulting estimate of the regression function shows the problems caused by bias at the peaks and troughs, where the curvature $|m''(x)|$ is high.
To see which local smoothing parameters were actually used, consider Figure 5.16. This figure plots the local smoothing parameters $\hat h(x)$, obtained by minimizing the bootstrap estimate $\hat d_M(x,h)$, as a function of $x$.
For comparison, Figure 5.16 also displays the asymptotically optimal local smoothing parameters, that is, the minimizers of $d_M(x,h)$.
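As a point of orientation (standard kernel-smoothing theory rather than a result quoted from this section), recall the form such minimizers take for a kernel smoother on an equispaced design with a second-order kernel $K$ and homoscedastic errors: balancing the leading squared-bias term $\tfrac{h^4}{4}\,d_K^2\,[m''(x)]^2$ against the leading variance term $\sigma^2 c_K/(nh)$ gives
$$h_0(x) \approx \left[\frac{\sigma^2\,c_K}{n\,d_K^2\,[m''(x)]^2}\right]^{1/5}, \qquad c_K=\int K^2(u)\,du,\quad d_K=\int u^2K(u)\,du,$$
which makes explicit why regions of high curvature $|m''(x)|$ call for smaller bandwidths.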
The so-called supersmoother proposed by Friedman (1984) is based on local linear $k$-NN fits in a variable neighborhood of the estimation point $x$. ``Local cross-validation'' is applied to estimate the optimal span as a function of the predictor variable. The algorithm is based on the updating formulas described in Section 3.4 and is therefore computationally highly efficient. The name ``supersmoother'' stems from the fact that it uses optimizing resampling techniques at a minimum of computational effort.
The basic idea of the supersmoother is the same as that of the bootstrap smoother: both methods attempt to minimize the local mean squared error. The supersmoother is constructed from three initial smooths, the tweeter, the midrange and the woofer. They are intended to reproduce the three main parts of the frequency spectrum of $m$ and are defined by $k$-NN smooths with spans $k=0.05n$, $0.2n$ and $0.5n$, respectively.
respectively. Next, the cross-validated residuals
Since a smooth based on the raw span sequence would, in practice, have an unnecessarily high variance, it is recommended to smooth the absolute residuals $|r_{(i)}(k)|$ against $X_i$ and to use the resulting smooth to select the best span value $\hat k(X_i)$ at each point. In a further step the span values $\hat k(X_i)$ are smoothed against $X_i$ (with a midrange smoother). The result is an estimated span for each observation, with a value between the tweeter and the woofer spans.
The resulting curve estimate,
the supersmoother, is obtained by interpolating between the two (out of
the three) smoothers with closest span values.
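The skeleton of this construction can be sketched in Python as follows. The sketch substitutes a plain $k$-NN running mean for Friedman's updated local linear fits and computes the leave-one-out residuals by brute force rather than through the fast updating formulas, so all names and details are illustrative, not the published algorithm:

```python
import numpy as np

SPANS = np.array([0.05, 0.2, 0.5])   # tweeter, midrange, woofer
MIDRANGE = 0.2

def knn_smooth(X, Y, span):
    """Symmetric k-NN running mean (stand-in for local linear fits).
    Neighborhoods are taken in index order, which is valid since X is sorted."""
    n = len(X)
    half = max(1, int(span * n) // 2)
    return np.array([Y[max(0, i - half):min(n, i + half + 1)].mean()
                     for i in range(n)])

def loo_residuals(X, Y, span):
    """Leave-one-out (cross-validated) residuals, by brute force."""
    n = len(X)
    r = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        m = knn_smooth(X[keep], Y[keep], span)
        j = min(np.searchsorted(X[keep], X[i]), n - 2)  # nearest remaining index
        r[i] = Y[i] - m[j]
    return r

def supersmoother_sketch(X, Y):
    order = np.argsort(X)
    X, Y = X[order], Y[order]
    # |cross-validated residuals| per span, stabilised by a midrange smooth
    abs_r = np.stack([knn_smooth(X, np.abs(loo_residuals(X, Y, s)), MIDRANGE)
                      for s in SPANS])
    best = SPANS[np.argmin(abs_r, axis=0)]       # best span at each X_i
    best = knn_smooth(X, best, MIDRANGE)         # smooth the span sequence
    smooths = [knn_smooth(X, Y, s) for s in SPANS]
    # interpolate between the two initial smooths with the closest spans
    fit = np.empty(len(X))
    for i, s in enumerate(np.clip(best, SPANS[0], SPANS[-1])):
        hi = np.searchsorted(SPANS, s)
        lo = max(hi - 1, 0)
        w = 0.0 if SPANS[hi] == SPANS[lo] else (s - SPANS[lo]) / (SPANS[hi] - SPANS[lo])
        fit[i] = (1 - w) * smooths[lo][i] + w * smooths[hi][i]
    return X, fit
```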
Figure 5.17 shows simulated pairs $\{(X_i,Y_i)\}$ with $X_i$ uniform on $[0,1]$; both the noise level and the curvature of the underlying curve vary over the design interval.
Figure 5.18 shows the estimated optimal span as a function of $X$. In the ``low-noise, high-curvature'' region the tweeter span is proposed. In the remaining regions a span value around the midrange is suggested.
When $m$ is very smooth, more accurate curve estimates can be obtained by biasing the smoothing parameter toward larger span values. One way of doing this would be to use a smoothing parameter selection criterion that penalizes small spans near the ``no smoothing'' point more heavily. For example, Rice's $T$ (Figure 5.10) would bias the estimator toward smoother curves. Friedman (1984) proposed parameterizing this ``selection bias'' in order to enhance the bass component of the smoother output. For this purpose a span is introduced that moves the cross-validated choice toward the woofer span, governed by a bass enhancement parameter between 0 and 10.
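One plausible reading of this parameterization, patterned after the bass control in Friedman's (1984) supersmoother but with an assumed functional form and constants, inflates each selected span toward the woofer span according to the ratio of the residuals at the selected span and at the woofer span:

```python
def bass_enhance(best_span, resid_best, resid_woofer, alpha, woofer_span=0.5):
    """Bias selected spans toward the woofer span (assumed functional form).

    alpha in [0, 10]: alpha = 0 leaves the spans essentially unchanged,
    alpha = 10 pushes every span all the way to the woofer value.
    resid_best / resid_woofer are smoothed absolute cross-validated
    residuals at the selected span and at the woofer span.
    """
    ratio = np.clip(resid_best / resid_woofer, 0.0, 1.0)
    return best_span + (woofer_span - best_span) * ratio ** (10.0 - alpha)
```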
Exercises
5.3.1 Prove that the term … is an approximation of lower order than … to … .
5.3.2 What is the difference between the method presented here in Section 5.3 and the wild bootstrap? Can you prove Theorem 5.3.1 without the bias estimate?
[Hint: Use an oversmoothed resampling mean $\hat m_g$ to construct the bootstrap observations $Y_i^* = \hat m_g(X_i) + \varepsilon_i^*$. The difference $\hat m_h^*(x) - \hat m_g(x)$ then mimics the difference $\hat m_h(x) - m(x)$.]
5.3.3 Show that the cross-validated residuals (5.3.18) stem from the leave-one-out technique applied to $k$-NN smoothing.
5.3.4 Try the woofer, midrange and tweeter on the simulated data set from Table 2, Appendix. Compare them with the supersmoother. Can you comment on where and why the supersmoother changed the smoothing parameter?
[Hint: Use XploRe (1989) or a similar interactive package.]