# 6.1 Resistant smoothing techniques

A linear local average of the response variable is, per se, not robust against outliers. Moving a response observation to infinity would drag the smooth to infinity as well. In this sense, local averaging smoothing has unbounded capacity to be influenced by "far out" observations. Resistance or "bounded influence" against outliers can be achieved by downweighting large residuals which would otherwise influence the smoother.

We have already encountered a straightforward resistant technique: median smoothing. It is highly robust since the extreme response observations (stemming from predictor variables in a neighborhood around $x$) do not have any effect on the (local) median of the response variables. A slight disadvantage of median smoothing, though, is that it produces a rough and wiggly curve. Resmoothing and twicing are data-analytic techniques to ameliorate median smoothing in this respect; see Velleman (1980) and Mallows (1980).

## 6.1.1 LOcally WEighted Scatter plot Smoothing (LOWESS)

Cleveland (1979) proposed the following algorithm, LOWESS, a resistant method based on local polynomial fits. The basic idea is to start with a local polynomial least squares fit and then to "robustify" it. "Local" means here a $k$-NN type neighborhood. The procedure starts from a $k$-NN pilot estimate and iteratively defines robustness weights and re-smoothes several times.

Algorithm 6.1.1

LOWESS

STEP 1. Fit a polynomial regression in a neighborhood of $x$, that is, find coefficients $\{\beta_j\}_{j=0}^p$ which minimize

$$n^{-1} \sum_{i=1}^n W_{ki}(x) \Bigl( Y_i - \sum_{j=0}^p \beta_j X_i^j \Bigr)^2,$$

where $\{W_{ki}(x)\}$ denote $k$-NN weights.

FOR $i = 1$ TO maxiter DO BEGIN

STEP 2.

Compute from the estimated residuals $\{\hat\varepsilon_i\}$ the scale estimate $\hat\sigma = \mathrm{med}_i\, \lvert \hat\varepsilon_i \rvert$ and define robustness weights $\delta_i = K(\hat\varepsilon_i / (6 \hat\sigma))$, where $K$ denotes the quartic kernel, $K(u) = (15/16)(1 - u^2)^2\, I(\lvert u \rvert \le 1)$.

STEP 3.

Fit a polynomial regression as in STEP 1 but with weights $\{\delta_i W_{ki}(x)\}$.

END (* i *).

Cleveland recommends the choice $p = 1$ (as for the supersmoother) as striking a good balance between computational ease and the need for flexibility to reproduce patterns in the data. The smoothing parameter $k$ can be determined by cross-validation as in Section 5.1. Figure 6.2 shows an application of Cleveland's algorithm to a simulated data set. It is quite obvious that the LOWESS smooth is resistant to the "far out" response variables at the upper borderline of the plot.
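Algorithm 6.1.1 can be sketched in a few lines. The following Python sketch uses $k$-NN windows, local linear fits ($p = 1$), and quartic robustness weights with the scale $6\,\mathrm{med}_i |\hat\varepsilon_i|$; the function names and the demo data are illustrative, not from the text.

```python
import numpy as np

def quartic(u):
    """Quartic (biweight) kernel K(u) = (15/16)(1 - u^2)^2 for |u| <= 1."""
    return np.where(np.abs(u) <= 1, (15.0 / 16.0) * (1.0 - u**2) ** 2, 0.0)

def lowess(X, Y, k, maxiter=3):
    """LOWESS sketch: local linear k-NN fits, robustified by iteratively
    downweighting large residuals (assumes distinct design points X)."""
    n = len(X)
    delta = np.ones(n)                      # robustness weights, start at 1
    for _ in range(maxiter + 1):            # pilot fit + maxiter robust passes
        fitted = np.empty(n)
        for j in range(n):
            d = np.abs(X - X[j])
            h = np.sort(d)[k - 1]           # distance to k-th nearest neighbor
            sw = np.sqrt(quartic(d / h) * delta)
            A = np.column_stack([np.ones(n), X - X[j]])
            beta, *_ = np.linalg.lstsq(A * sw[:, None], Y * sw, rcond=None)
            fitted[j] = beta[0]             # local intercept = fit at X[j]
        resid = Y - fitted
        s = np.median(np.abs(resid))        # resistant scale estimate
        if s <= 1e-12:                      # perfect fit: nothing to downweight
            break
        delta = quartic(resid / (6.0 * s))
    return fitted

# Demo (illustrative data): a linear signal with two wild responses.
rng = np.random.default_rng(0)
X = np.linspace(0.0, 1.0, 80)
Y = 2.0 * X + rng.normal(0.0, 0.05, 80)
Y[10] += 10.0
Y[40] -= 10.0
robust = lowess(X, Y, k=20)
```

After the robustifying passes, the two spikes receive weight zero and the fit tracks the linear signal; calling `lowess(X, Y, k=20, maxiter=0)` gives the non-robust pilot fit, which is visibly dragged toward the outliers.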

## 6.1.2 L-smoothing

Another class of resistant smoothers is given by local trimmed averages of the response variables. If $Z_{(1)} \le Z_{(2)} \le \cdots \le Z_{(n)}$ denotes the order statistic from $n$ observations $\{Z_i\}_{i=1}^n$, a trimmed average (mean) is defined by

$$Z_\alpha = (n - 2[\alpha n])^{-1} \sum_{i=[\alpha n]+1}^{n-[\alpha n]} Z_{(i)}, \qquad 0 < \alpha < 1/2,$$

the mean of the "inner $100(1 - 2\alpha)$ percent of the data." A local trimmed average at the point $x$ from regression data $\{(X_i, Y_i)\}_{i=1}^n$ is defined as a trimmed mean of the response variables $Y_i$ such that $X_i$ is "in a neighborhood of $x$." (The neighborhood could be parameterized, for instance, by a bandwidth sequence $h = h_n$.) Adopting terminology from the robust theory of estimation, this type of smoothing is called $L$-smoothing.
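As a concrete illustration, a local trimmed average takes only a few lines of code. This is a sketch under the definition above; the bandwidth neighborhood $\{i : |X_i - x| \le h\}$ and the function name are illustrative choices.

```python
import numpy as np

def local_trimmed_mean(x, X, Y, h, alpha=0.1):
    """alpha-trimmed mean of the Y_i whose X_i lie within h of x:
    sort the local responses and average the inner 100(1-2*alpha) percent."""
    z = np.sort(Y[np.abs(X - x) <= h])
    n = len(z)
    g = int(np.floor(alpha * n))       # observations trimmed from each tail
    return z[g:n - g].mean()

# A single wild response has no effect once it falls into the trimmed tail.
X = np.linspace(0.0, 1.0, 50)
Y = np.ones(50)
Y[25] = 100.0                          # outlier near x = 0.5
est = local_trimmed_mean(0.5, X, Y, h=0.2)
```

With `alpha=0` the same function reduces to an untrimmed local average, which is pulled far away from the bulk of the data by the single outlier.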

$L$-smoothing is a resistant technique: the "far out extremes" at a point $x$ do not enter the local averaging procedure. More generally, one considers a conditional $L$-functional

$$l(x) = \int_0^1 J(v)\, F^{-1}(v \mid x)\, dv, \qquad (6.1.1)$$

where $F^{-1}(v \mid x)$ denotes the conditional quantile function associated with $F(\cdot \mid x)$, the conditional distribution function of $Y$ given $X = x$. For $J(v) \equiv 1$, $l(x)$ reduces to the regression function $m(x)$, since by substituting $v = F(y \mid x)$,

$$\int_0^1 F^{-1}(v \mid x)\, dv = \int y\, dF(y \mid x) = m(x).$$

The same occurs in the case $J(v) = I(\alpha \le v \le 1 - \alpha)/(1 - 2\alpha)$ with symmetric conditional distribution function. Median smoothing is a special case of $L$-smoothing with $J$ a point mass at $v = 1/2$, that is, $l(x) = F^{-1}(1/2 \mid x)$.

In practice, we do not know $F(\cdot \mid x)$ and we have to estimate it. If $F_h(\cdot \mid x)$ denotes an estimator of $F(\cdot \mid x)$, one obtains from formula (6.1.1) the $L$-smoothers. Estimates of $F(\cdot \mid x)$ can be constructed, for example, by the kernel technique,

$$F_h(t \mid x) = \frac{n^{-1} \sum_{i=1}^n K_h(x - X_i)\, I(Y_i \le t)}{n^{-1} \sum_{i=1}^n K_h(x - X_i)},$$

to obtain

$$\hat{l}_h(x) = \int_0^1 J(v)\, F_h^{-1}(v \mid x)\, dv.$$

Stute (1984) and Owen (1987) show asymptotic normality of such conditional functionals. Härdle, Janssen and Serfling (1988) derive (optimal) uniform consistency rates for $L$-smoothers.
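Plugging a kernel estimate of the conditional distribution into (6.1.1) is straightforward to sketch. Below, a Gaussian kernel and the trimming score $J(v) = I(\alpha \le v \le 1-\alpha)/(1-2\alpha)$ are illustrative choices; since the estimated $F_h(\cdot \mid x)$ is a step function, the integral reduces to a weighted sum over the sorted responses.

```python
import numpy as np

def l_smoother(x, X, Y, h, alpha=0.1):
    """L-smoother from (6.1.1): integrate the estimated conditional quantile
    function against J(v) = 1{alpha <= v <= 1-alpha}/(1 - 2*alpha)."""
    w = np.exp(-0.5 * ((x - X) / h) ** 2)    # Nadaraya-Watson type weights
    w /= w.sum()
    order = np.argsort(Y)
    y, w = Y[order], w[order]
    W = np.cumsum(w)                          # F_h(y_(i) | x) at the jumps
    lo = np.concatenate(([0.0], W[:-1]))
    # The quantile function equals y_(i) on (lo_i, W_i]; clip each piece
    # to the untrimmed range [alpha, 1 - alpha] before integrating.
    seg = np.clip(W, alpha, 1.0 - alpha) - np.clip(lo, alpha, 1.0 - alpha)
    return float((y * seg).sum() / (1.0 - 2.0 * alpha))

# A wild response right at x carries less conditional mass than alpha,
# so it is trimmed away entirely (illustrative data).
X = np.linspace(0.0, 1.0, 41)
Y = np.full(41, 2.0)
Y[20] = 50.0
est = l_smoother(0.5, X, Y, h=0.2, alpha=0.1)
```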

## 6.1.3 R-smoothing

Yet another class of smoothers are the $R$-smoothers derived from $R$-estimates of location. Assume that $F(\cdot \mid x)$ is symmetric around $m(x)$ and that $J$ is a nondecreasing function defined on $(0,1)$ such that $J(1 - s) = -J(s)$. Then the score

$$T(\theta) = \int J\Bigl( \tfrac{1}{2} \bigl[ F(y \mid x) + 1 - F(2\theta - y \mid x) \bigr] \Bigr)\, dF(y \mid x)$$

is zero for $\theta = m(x)$. The idea now is to replace $F(\cdot \mid x)$ by an estimate $F_h(\cdot \mid x)$. If $F_h(\cdot \mid x)$ denotes such an estimate of the conditional distribution function, then this score should be roughly zero for a good estimate of $m(x)$. The motivation for this $R$-smoothing technique stems from rank tests.

Consider a two-sample rank test for shift based on the samples $\{Y_i\}_{i=1}^n$ and $\{2\theta - Y_i\}_{i=1}^n$, that is, a mirror image of the first sample serves as a stand-in for the second sample. Now try to adjust $\theta$ in such a way that the test statistic based on the scores $J(i/(2n+1))$ of the ranks of $\{Y_i\}$ in the combined sample is roughly zero (see Huber 1981, chapter 3.4). This would make the two samples $\{Y_i\}$ and $\{2\theta - Y_i\}$ almost indistinguishable or, in other words, would make $\theta$ a good estimate of location. If this is translated into the setting of smoothing, then the above form of $T(\theta)$ is obtained.

A solution of $T(\theta) = 0$ is, in general, not unique or may have irregular behavior. Cheng and Cheng (1986) therefore suggested

$$\hat m_h(x) = \tfrac{1}{2} \bigl( \sup\{ \theta : T_h(\theta) > 0 \} + \inf\{ \theta : T_h(\theta) < 0 \} \bigr) \qquad (6.1.2)$$

as an estimate for the regression curve $m(x)$, where $T_h$ denotes the score with $F(\cdot \mid x)$ replaced by $F_h(\cdot \mid x)$. Consistency and asymptotic normality of this smoothing technique are derived in Cheng and Cheng (1987).

## 6.1.4 M-smoothing

Resistant smoothing techniques based on $M$-estimates of location are called $M$-smoothers. Recall that all smoothers of the form

$$\hat m(x) = n^{-1} \sum_{i=1}^n W_{ni}(x)\, Y_i$$

can be viewed as solutions to (local) least squares problems; see (3.1.8). The basic idea of $M$-smoothers is to reduce the influence of outlying observations by the use of a non-quadratic loss function in (3.1.8). A well-known example (see Huber 1981) of such a loss function with "lighter tails" is

$$\rho(u) = \begin{cases} \tfrac{1}{2} u^2, & \text{if } \lvert u \rvert \le c; \\ c \lvert u \rvert - \tfrac{1}{2} c^2, & \text{if } \lvert u \rvert > c. \end{cases} \qquad (6.1.3)$$

The constant $c$ regulates the degree of resistance. For large values of $c$ one obtains the ordinary quadratic loss function. For small values ($c \approx$ one or two times the standard deviation of the observation errors) one achieves more robustness.

In the setting of spline smoothing, an $M$-type spline was defined by Cox (1983) as the minimizer of

$$n^{-1} \sum_{i=1}^n \rho\bigl( Y_i - g(X_i) \bigr) + \lambda \int \bigl( g''(x) \bigr)^2\, dx, \qquad (6.1.4)$$

where, again, $\rho$ is a loss function with "lighter" tails than the quadratic. Related types of $M$-smoothers were considered by Huber (1979), Nemirovskii, Polyak and Tsybakov (1983, 1985), and Silverman (1985).

Kernel smoothers can be made resistant by similar means. Assume that the conditional distribution $F(\cdot \mid x)$ is symmetric. This assumption ensures that we are still estimating $m(x)$, the conditional mean curve. Define a robust kernel $M$-smoother $\hat m_h(x)$ as

$$\hat m_h(x) = \arg\min_\theta\; n^{-1} \sum_{i=1}^n W_{hi}(x)\, \rho(Y_i - \theta), \qquad (6.1.5)$$

where $\{W_{hi}(x)\}$ denotes a positive kernel weight sequence. Differentiating (6.1.5) with respect to $\theta$ yields, with $\psi = \rho'$,

$$n^{-1} \sum_{i=1}^n W_{hi}(x)\, \psi\bigl( Y_i - \hat m_h(x) \bigr) = 0. \qquad (6.1.6)$$

Since the kernel $M$-smoother is implicitly defined, it requires iterative numerical methods. A fast algorithm based on the Fast Fourier Transform and a "one-step" approximation to $\hat m_h(x)$ are given in Härdle (1987a). A wide variety of possible $\psi$-functions yield consistent estimators $\hat m_h(x)$. (Consistency follows by arguments given in Huber 1981, chapter 3.) Note that the special case of linear $\psi(u) = u$ reproduces the ordinary kernel smoother. To understand what resistant $M$-smoothers are actually doing to the data, define unobservable pseudo-observations

$$\tilde Y_i = m(X_i) + \frac{\psi(\varepsilon_i)}{q(X_i)},$$

with $\varepsilon_i = Y_i - m(X_i)$ and

$$q(x) = E\bigl[ \psi'(\varepsilon) \mid X = x \bigr].$$
The following theorem can be derived using methods given in Tsybakov (1982b) and Härdle (1984b).

Theorem 6.1.1   Let $\hat m_h(x)$ be the kernel $M$-smoother computed from $\{(X_i, Y_i)\}_{i=1}^n$
and let $\tilde m_h(x)$ be the ordinary kernel smoother applied to the pseudo-data $\{(X_i, \tilde Y_i)\}_{i=1}^n$; then $\sqrt{nh}\,\bigl(\hat m_h(x) - m(x)\bigr)$ and $\sqrt{nh}\,\bigl(\tilde m_h(x) - m(x)\bigr)$ have the same asymptotic normal distribution with mean as in (4.2.1) and asymptotic variance

$$V(x) = \frac{c_K}{f(x)} \cdot \frac{E\bigl[ \psi^2(\varepsilon) \mid X = x \bigr]}{q^2(x)},$$

where $c_K = \int K^2(u)\, du$ and $f$ denotes the marginal density of $X$.

This result deserves some discussion. First, it shows that kernel $M$-smoothers can be interpreted as ordinary kernel smoothers applied to nonobservable pseudo-data with transformed errors $\psi(\varepsilon_i)/q(X_i)$. This sheds some light on how the resistance of $M$-smoothers is achieved: the "extreme" observation errors are "downweighted" by the nonlinear, bounded function $\psi$. Second, Theorem 6.1.1 reveals that the bias of the ordinary kernel smoother is the same as that of the kernel $M$-smoother. The nonlinear definition of $\hat m_h(x)$ does not affect the (asymptotic) bias properties. Third, the product form of the asymptotic variance, a product of $c_K / f(x)$ and $E[\psi^2(\varepsilon) \mid X = x]/q^2(x)$, allows optimization of $V(x)$ simply by considering the kernel $K$ and the $\psi$-function separately.

The first of these two separate problems was solved in Section 4.5. By utilizing classical theory for $M$-estimates of location, the second problem can be treated as in Huber (1981, chapter 4). The details of this optimization technique are rather delicate; the reader is referred to the standard literature on robust estimation. Optimization of the smoothing parameter $h$ is discussed in Härdle (1984c) and more recently by Leung (1988). Both authors consider the direct analogue of cross-validation, namely, to construct robust leave-one-out smoothers and then to proceed as in Section 5.1.

A natural question to ask is: how much is gained or lost in asymptotic accuracy when using an $M$-smoother? The bias is the same as for the kernel smoother. A way of comparing the nonresistant and the resistant technique is therefore to study the ratio of asymptotic variances,

$$\frac{\sigma^2(x)}{E\bigl[ \psi^2(\varepsilon) \mid X = x \bigr] / q^2(x)}, \qquad (6.1.7)$$

of the Nadaraya-Watson kernel smoother to the kernel $M$-smoother (based on the same kernel weights).

of the Nadaraya-Watson kernel smoother to the kernel -smoother (based on the same kernel weights). But this relative efficiency 6.1.7 is the same as for the estimation of location. The reader is therefore referred to the literature on robust estimation (see e.g. Huber 1981).

As an example, I would like to present a smoothing problem in physical chemistry. Raman spectra are an important diagnostic tool in that field. One would like to identify the location and size of peaks and troughs of spectral bands; see Hillig and Morris (1982) and Bussian and Härdle (1984). Unfortunately, small-scale instrumental noise and a certain proportion of observation error caused by random external events blur the observations. The latter type of error causes high-frequency signals or "bubbles" in the sample and produces single spikes like those in Figure 6.3.

Estimating the spectral curve with the ordinary Nadaraya-Watson kernel smoother results in the curve depicted in Figure 6.4.

The single spike outliers obviously produced two spurious neighboring peaks. The resistant smoothing technique, on the other hand, leads to Figure 6.5.

The influence of the outliers is obviously reduced. Uniform confidence bands -- based on asymptotic extreme value theory -- may be constructed using the methods presented in Section 4.3; see Härdle (1987b). Figure 6.6 depicts a kernel $M$-smoother together with uniform confidence bands, and the Nadaraya-Watson kernel smoother, for the data presented in Figure 6.1.

Optimal uniform convergence rates (see Section 4.1) for kernel $M$-smoothers have been derived in Härdle and Luckhaus (1984). In the context of time series, robust estimation and prediction have been discussed by Velleman (1977, 1980), Mallows (1980) and Härdle and Tuan (1986). Robust nonparametric prediction of time series by $M$-smoothers has been investigated by Robinson (1984, 1987b), Collomb and Härdle (1986) and Härdle (1986c). Robust kernel smoothers for estimation of derivatives have been investigated in Härdle and Gasser (1985) and Tsybakov (1986).

Exercises

6.1.1 Find conditions such that $L$-smoothers, as defined in (6.1.1), are consistent estimators of the regression curve.

6.1.2 Find conditions such that $R$-smoothers, as defined in (6.1.2), asymptotically converge to the true regression curve.

6.1.3 Do you expect the general $L$-smoothers (6.1.1) to produce smoother curves than the running median?

6.1.4 Construct a fast algorithm for $L$-smoothers (6.1.1). Based on the ideas of efficient running median smoothing (Section 3.8) you should be able to find a code that runs in $O(n \log k)$ steps ($k$ is the number of neighbors).

6.1.5 Prove consistency for the $M$-smoother 6.1.4 for monotone $\psi$-functions.

[Hint: Follow the proof of Huber (1981, chapter 3).]

6.1.6 Can you extend the proof of Exercise 6.1.5 to nonmonotone $\psi$-functions such as Hampel's "three part redescender"?