This section gives a short overview of nonparametric smoothing
techniques that differ from the kernel method. For further
references on these and other nonparametric smoothers
see the bibliographic notes at the end of this chapter.
As we have seen above, kernel regression estimation can be viewed as a
method of computing weighted averages of the response variables in a fixed
neighborhood around $x$, the width of this neighborhood being governed by
the bandwidth $h$. The $k$-nearest-neighbor ($k$-NN) estimator can also be
viewed as a weighted average of the response variables in a neighborhood
around $x$, with the important difference that the neighborhood width is
not fixed but variable. To be more specific, the values of $Y$ used in
computing the average are those which belong to the $k$ observed values of
$X$ that are nearest to the point $x$ at which we would like to estimate
$m(x)$. Formally, the $k$-NN estimator can be written as
$$
\hat{m}_k(x) = \frac{1}{n} \sum_{i=1}^{n} W_{ki}(x)\, Y_i ,
\qquad (4.23)
$$
where the weights $W_{ki}(x)$ are defined as
$$
W_{ki}(x) =
\begin{cases}
n/k & \text{if } i \in J_x , \\
0   & \text{otherwise},
\end{cases}
\qquad (4.24)
$$
with the set of indices
$$
J_x = \{\, i : X_i \text{ is one of the } k \text{ nearest observations to } x \,\}.
$$
If we estimate $m(x)$ at a point $x$ where the data are sparse, then it may
happen that the $k$ nearest neighbors are rather far away from $x$ (and from
each other), so that we end up with a wide neighborhood around $x$ over
which an average of the corresponding values of $Y$ is computed. Note that
$k$ is the smoothing parameter of this estimator. Increasing $k$ makes the
estimate smoother.
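To make the computation concrete, here is a minimal Python sketch (not part of the original text; the function name `knn_regression` and the simulated data are purely illustrative). It averages the responses of the $k$ observations closest to each evaluation point:

```python
import numpy as np

def knn_regression(x_grid, X, Y, k):
    """k-NN regression: average the Y values of the k observations
    whose X values are closest to each evaluation point."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    fits = np.empty(len(x_grid))
    for j, x in enumerate(x_grid):
        nearest = np.argsort(np.abs(X - x))[:k]   # index set J_x
        fits[j] = Y[nearest].mean()               # weights n/k inside J_x, 0 outside
    return fits

# toy usage: larger k gives a smoother estimate
rng = np.random.default_rng(0)
X = rng.uniform(0, 3, 200)
Y = np.sin(2 * X) + rng.normal(scale=0.3, size=200)
grid = np.linspace(0, 3, 100)
fit = knn_regression(grid, X, Y, k=25)
```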
EXAMPLE 4.5
A $k$-NN estimation of the Engel curve (net-income vs. food) is shown in
Figure 4.5.

Figure 4.5: $k$-nearest-neighbor regression, U.K. Family Expenditure Survey 1973. SPMknnreg
The $k$-NN estimator can be viewed as a kernel estimator with the uniform
kernel $K(u) = \tfrac{1}{2}\, I(|u| \le 1)$ and variable bandwidth $R$,
with $R$ being the distance between $x$ and its furthest $k$-nearest
neighbor:
$$
\hat{m}_k(x) = \frac{\sum_{i=1}^{n} K\!\left(\frac{x - X_i}{R}\right) Y_i}
                    {\sum_{i=1}^{n} K\!\left(\frac{x - X_i}{R}\right)} .
\qquad (4.25)
$$
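The representation (4.25) can be checked numerically; the following short sketch (illustrative only, assuming NumPy) computes the fit at a single point as a uniform-kernel weighted average with the data-driven bandwidth $R$:

```python
import numpy as np

def knn_uniform_kernel(x, X, Y, k):
    """k-NN fit at a single point x, written as a weighted average with
    the uniform kernel K(u) = 1/2 * I(|u| <= 1) and variable bandwidth R,
    where R is the distance from x to its k-th nearest neighbor."""
    d = np.abs(np.asarray(X, dtype=float) - x)
    R = np.sort(d)[k - 1]      # bandwidth: distance to the furthest of the k neighbors
    w = 0.5 * (d <= R)         # uniform kernel weights
    return np.sum(w * np.asarray(Y, dtype=float)) / np.sum(w)
```

With tied distances, more than $k$ observations may receive positive weight; replacing the uniform kernel by another kernel gives exactly the generalization discussed next.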
The $k$-NN estimator can be generalized in this sense by considering kernels
other than the uniform kernel. Bias and variance of this more general
$k$-NN estimator are given in the following theorem by Mack (1981).
THEOREM 4.4
Let $k \to \infty$, $k/n \to 0$ and $n \to \infty$. Then
$$
E\{\hat{m}_k(x)\} - m(x) \approx
\frac{\mu_2(K)}{8\, f_X(x)^2}
\left\{ m''(x) + 2\, \frac{m'(x)\, f_X'(x)}{f_X(x)} \right\}
\left( \frac{k}{n} \right)^{2},
\qquad (4.26)
$$
$$
\mathop{\mathrm{Var}}\{\hat{m}_k(x)\} \approx 2\, \|K\|_2^2\, \frac{\sigma^2(x)}{k}.
\qquad (4.27)
$$
Obviously, unlike the variance of the Nadaraya-Watson kernel regression
estimator, the variance of the $k$-NN regression estimator does not depend
on $f_X(x)$, which makes sense since the $k$-NN estimator always averages
over $k$ observations, regardless of how dense the data are in the
neighborhood of the point $x$ where we estimate $m(x)$. Consequently,
$\mathop{\mathrm{Var}}\{\hat{m}_k(x)\} \to 0$ as $k \to \infty$.
By choosing
$$
k = 2\, n\, h\, f_X(x)
\qquad (4.28)
$$
we obtain a $k$-NN estimator that is approximately identical to a kernel
estimator with bandwidth $h$ in the leading terms of the MSE.
Median smoothing may be described as the nearest-neighbor technique for
estimating the conditional median function, rather than the conditional
expectation function, which has been our target so far. The conditional
median $\mathop{\mathrm{med}}(Y \,|\, X = x)$ is more robust to outliers
than the conditional expectation $E(Y \,|\, X = x)$. Moreover, median
smoothing allows us to model discontinuities in the regression curve.
Formally, the median smoother is defined as
$$
\hat{m}(x) = \mathop{\mathrm{med}}\{\, Y_i : i \in J_x \,\},
\qquad (4.29)
$$
where
$$
J_x = \{\, i : X_i \text{ is one of the } k \text{ nearest neighbors of } x \,\}.
$$
That is, the median of those $Y_i$ is computed for which the corresponding
$X_i$ is one of the $k$ nearest neighbors of $x$.
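A median smoother along these lines can be sketched as follows (illustrative Python, assuming NumPy; compared with the $k$-NN sketch above, only the average is replaced by a median):

```python
import numpy as np

def median_smoother(x_grid, X, Y, k):
    """Median smoothing: at each evaluation point, take the median of the
    Y values belonging to the k nearest neighbors of x."""
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    fits = np.empty(len(x_grid))
    for j, x in enumerate(x_grid):
        nearest = np.argsort(np.abs(X - x))[:k]   # index set J_x
        fits[j] = np.median(Y[nearest])           # median instead of mean
    return fits
```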
EXAMPLE 4.6
We display such a median smoother for our running Engel curve example in
Figure 4.6. Note that in contrast to the $k$-NN estimator, extreme values
of food expenditures no longer affect the estimate.

Figure 4.6: Median smoothing regression, U.K. Family Expenditure Survey 1973. SPMmesmooreg
Spline smoothing can be motivated by considering the residual sum of
squares ($RSS$) as a criterion for the goodness of fit of a function $m$
to the data. The residual sum of squares is defined as
$$
RSS = \sum_{i=1}^{n} \{ Y_i - m(X_i) \}^2 .
$$
Yet, one can always find a function $m$ that minimizes the $RSS$ but merely
interpolates the data, without exploiting any structure that might be
present in the data. Spline smoothing solves this problem by adding a
stabilizer that penalizes non-smoothness of $m$. One possible stabilizer
is given by
$$
\| m'' \|_2^2 = \int \{ m''(x) \}^2 \, dx .
\qquad (4.30)
$$
The use of $m''$ can be motivated by the fact that the curvature of $m$ at
$x$ increases with $|m''(x)|$.
Using the penalty term (4.30) we may restate the minimization problem as
$$
\hat{m}_\lambda = \mathop{\arg\min}_{m}\; S_\lambda(m)
\qquad (4.31)
$$
with
$$
S_\lambda(m) = \sum_{i=1}^{n} \{ Y_i - m(X_i) \}^2
  + \lambda\, \| m'' \|_2^2 .
\qquad (4.32)
$$
If we consider the class of all twice differentiable functions on the
interval $[X_{(1)}, X_{(n)}]$ (where $X_{(i)}$ denotes the $i$th order
statistic), then the (unique) minimizer of (4.32) is given by the cubic
spline estimator $\hat{m}_\lambda(x)$, which consists of cubic polynomials
$p_i(\cdot)$ between adjacent $X_{(i)}$-values. The parameter $\lambda$
controls the weight given to the stabilizer in the minimization. The higher
$\lambda$ is, the more weight is given to $\| m'' \|_2^2$ and the smoother
the estimate. As $\lambda \to 0$, $\hat{m}_\lambda$ is merely an
interpolation of the observations of $Y$. If $\lambda \to \infty$,
$\hat{m}_\lambda$ tends to a linear function.
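For illustration, recent SciPy versions (1.10 or later) provide a penalized smoothing spline whose `lam` argument plays the role of $\lambda$ in (4.32); the following sketch on simulated data (all values illustrative) shows the two limiting behaviors:

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline   # SciPy >= 1.10

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 3, 100))                    # abscissae must be increasing
Y = np.sin(2 * X) + rng.normal(scale=0.3, size=100)

grid = np.linspace(0, 3, 200)
rough  = make_smoothing_spline(X, Y, lam=1e-4)(grid)   # small lambda: close to interpolation
smooth = make_smoothing_spline(X, Y, lam=1e2)(grid)    # large lambda: close to a straight line
```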
Let us now consider the spline estimator in more detail. For the estimator
to be twice continuously differentiable we have to make sure that there are
no jumps in the function, nor in its first and second derivative, when
evaluated at the points $X_{(i)}$. Formally, we require
$$
p_{i-1}(X_{(i)}) = p_i(X_{(i)}), \quad
p'_{i-1}(X_{(i)}) = p'_i(X_{(i)}), \quad
p''_{i-1}(X_{(i)}) = p''_i(X_{(i)}),
$$
where $p_i$ denotes the cubic polynomial on $[X_{(i)}, X_{(i+1)})$.
Additionally, a boundary condition has to be fulfilled. Typically this is
$$
m''(X_{(1)}) = m''(X_{(n)}) = 0 .
$$
These restrictions, along with the conditions for minimizing (4.32) w.r.t.
the coefficients of the $p_i$, define a system of linear equations which
can be solved in only $O(n)$ calculations.
To illustrate this, we present some details on the computational algorithm
introduced by Reinsch, see Green & Silverman (1994). Observe that the
residual sum of squares can be written as
$$
\sum_{i=1}^{n} \{ Y_{(i)} - m(X_{(i)}) \}^2
= (\boldsymbol{Y} - \boldsymbol{m})^\top (\boldsymbol{Y} - \boldsymbol{m}),
\qquad (4.33)
$$
where $\boldsymbol{m} = \{ m(X_{(1)}), \ldots, m(X_{(n)}) \}^\top$, with
$Y_{(i)}$ the value of $Y$ corresponding to $X_{(i)}$, and
$\boldsymbol{Y} = ( Y_{(1)}, \ldots, Y_{(n)} )^\top$.
If $m$ were indeed a piecewise cubic polynomial on the intervals
$[X_{(i)}, X_{(i+1)})$, then the penalty term could be expressed as a
quadratic form in $\boldsymbol{m}$,
$$
\int \{ m''(x) \}^2 \, dx
= \boldsymbol{m}^\top \boldsymbol{K}\, \boldsymbol{m},
\qquad (4.34)
$$
with a matrix $\boldsymbol{K}$ that can be decomposed as
$$
\boldsymbol{K} = \boldsymbol{Q}\, \boldsymbol{R}^{-1} \boldsymbol{Q}^\top .
$$
Here, $\boldsymbol{Q}$ and $\boldsymbol{R}$ are band matrices and functions
of $h_i = X_{(i+1)} - X_{(i)}$. More precisely, $\boldsymbol{Q}$ is an
$n \times (n-2)$ matrix with elements
$q_{j-1,j} = h_{j-1}^{-1}$, $q_{jj} = -h_{j-1}^{-1} - h_{j}^{-1}$,
$q_{j+1,j} = h_{j}^{-1}$, and $q_{ij} = 0$ for $|i - j| \ge 2$.
$\boldsymbol{R}$ is a symmetric $(n-2) \times (n-2)$ matrix with elements
$r_{ii} = (h_{i-1} + h_{i})/3$, $r_{i,i+1} = r_{i+1,i} = h_{i}/6$, and
$r_{ij} = 0$ for $|i - j| \ge 2$.
From (4.33) and (4.34) it follows that the smoothing spline is obtained by
$$
\hat{\boldsymbol{m}}_\lambda
= (\boldsymbol{I} + \lambda \boldsymbol{K})^{-1} \boldsymbol{Y},
\qquad (4.35)
$$
with $\boldsymbol{I}$ denoting the $n$-dimensional identity matrix. Because
of the band structure of $\boldsymbol{Q}$ and $\boldsymbol{R}$, (4.35) can
indeed be solved in $O(n)$ steps using a Cholesky decomposition.
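The following dense NumPy sketch illustrates (4.35) directly. It is illustrative only: it forms $\boldsymbol{K} = \boldsymbol{Q}\boldsymbol{R}^{-1}\boldsymbol{Q}^\top$ explicitly and uses a generic solver rather than the $O(n)$ banded Cholesky algorithm, and the indexing assumes the Green & Silverman parametrization given above:

```python
import numpy as np

def smoothing_spline_fit(X, Y, lam):
    """Dense illustration of (4.35): m_hat = (I + lam*K)^{-1} Y with
    K = Q R^{-1} Q^T built from the ordered design points. The O(n)
    banded Cholesky version is deliberately not attempted here."""
    order = np.argsort(X)
    x = np.asarray(X, dtype=float)[order]
    y = np.asarray(Y, dtype=float)[order]
    n = len(x)
    h = np.diff(x)                         # h_i = X_(i+1) - X_(i)

    Q = np.zeros((n, n - 2))               # columns belong to the interior points
    R = np.zeros((n - 2, n - 2))
    for j in range(1, n - 1):              # 0-based index of interior points
        Q[j - 1, j - 1] = 1.0 / h[j - 1]
        Q[j,     j - 1] = -1.0 / h[j - 1] - 1.0 / h[j]
        Q[j + 1, j - 1] = 1.0 / h[j]
        R[j - 1, j - 1] = (h[j - 1] + h[j]) / 3.0
        if j < n - 2:
            R[j - 1, j] = R[j, j - 1] = h[j] / 6.0
    K = Q @ np.linalg.solve(R, Q.T)        # K = Q R^{-1} Q^T
    m_hat = np.linalg.solve(np.eye(n) + lam * K, y)
    return x, m_hat                        # fitted values at the ordered X_(i)
```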
EXAMPLE 4.7
In Figure
4.7 we illustrate the resulting cubic
spline estimate for our running Engel curve example.

Figure 4.7: Spline regression, U.K. Family Expenditure Survey 1973. SPMspline
From (4.35) we see that the spline smoother is a linear estimator in $Y_i$,
i.e. weights $W_{\lambda i}(x)$ exist such that
$$
\hat{m}_\lambda(x) = \frac{1}{n} \sum_{i=1}^{n} W_{\lambda i}(x)\, Y_i .
$$
It can be shown that under certain conditions the spline smoother is
asymptotically equivalent to a kernel smoother that employs the so-called
spline kernel
$$
K_S(u) = \frac{1}{2} \exp\!\left( -\frac{|u|}{\sqrt{2}} \right)
\sin\!\left( \frac{|u|}{\sqrt{2}} + \frac{\pi}{4} \right),
$$
with local bandwidth $h(X_i) = \lambda^{1/4} n^{-1/4} f_X(X_i)^{-1/4}$
(Silverman, 1984).
Under regularity conditions, functions can be represented as a series of
basis functions (e.g. a Fourier series). Suppose that $m$ can be
represented by such a Fourier series. That is, suppose that
$$
m(x) = \sum_{j=1}^{\infty} \beta_j\, \varphi_j(x),
\qquad (4.36)
$$
where $\{ \varphi_j \}_{j=1}^{\infty}$ is a known basis of functions and
$\{ \beta_j \}_{j=1}^{\infty}$ are the unknown Fourier coefficients.
Our goal is to estimate the unknown Fourier coefficients. Note that we
indeed have an infinite sum in (4.36) if there are infinitely many non-zero
$\beta_j$'s. Obviously, an infinite number of coefficients cannot be
estimated from a finite number of observations. Hence, one has to choose
the number of terms $N = N(n)$ (which, as indicated, is a function of the
sample size $n$) that will be included in the Fourier series
representation.
Thus, in principle, series estimation proceeds in three steps:
- (a) select a basis of functions,
- (b) select $N$, where $N = N(n)$ is an integer less than $n$, and
- (c) estimate the $N$ unknown coefficients by a suitable method.
$N$ is the smoothing parameter of series estimation. The larger $N$ is, the
more terms are included in the Fourier series and the estimate tends toward
interpolating the data. On the contrary, small values of $N$ will produce
relatively smooth estimates.
Regarding the estimation of the coefficients $\beta_j$ there are basically
two methods. One method involves looking at the finite version (i.e. the
sum up to $N$) of (4.36) as a regression equation and estimating the
coefficients by regressing the $Y_i$ on
$\varphi_1(X_i), \ldots, \varphi_N(X_i)$.
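A sketch of this regression approach in Python follows (illustrative only; the helper names `series_regression` and `legendre_basis`, the rescaling of $X$ to $[-1, 1]$, and the simulated data are assumptions, not from the text):

```python
import numpy as np

def legendre_basis(j, x, lo, hi):
    """Legendre polynomial of degree j, evaluated after rescaling x
    from [lo, hi] to [-1, 1]."""
    u = 2 * (np.asarray(x, dtype=float) - lo) / (hi - lo) - 1
    return np.polynomial.legendre.Legendre.basis(j)(u)

def series_regression(x_grid, X, Y, N, basis):
    """Series estimator: regress Y on the first N basis functions
    evaluated at X, then reconstruct the fit on a grid."""
    Phi = np.column_stack([basis(j, X) for j in range(N)])          # design matrix
    beta_hat, *_ = np.linalg.lstsq(Phi, Y, rcond=None)              # OLS coefficients
    Phi_grid = np.column_stack([basis(j, x_grid) for j in range(N)])
    return Phi_grid @ beta_hat

# toy usage with N = 7 Legendre terms
rng = np.random.default_rng(0)
X = rng.uniform(0, 3, 200)
Y = np.sin(2 * X) + rng.normal(scale=0.3, size=200)
grid = np.linspace(0, 3, 100)
fit = series_regression(grid, X, Y, N=7,
                        basis=lambda j, x: legendre_basis(j, x, lo=0.0, hi=3.0))
```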
EXAMPLE 4.8
We have applied this method to the Engel curve example again. As basis
functions $\{ \varphi_j \}$ we used the Legendre polynomials
(orthogonalized polynomials, see Example 4.9 below); the resulting fit is
shown in Figure 4.8.

Figure 4.8: Orthogonal series regression using Legendre polynomials, U.K. Family Expenditure Survey 1973. SPMorthogon
An alternative approach is to choose the basis of functions to be
orthonormal. The orthonormality requirement can be formalized as
$$
\int \varphi_j(x)\, \varphi_k(x)\, dx = \delta_{jk} =
\begin{cases} 1 & \text{if } j = k, \\ 0 & \text{otherwise.} \end{cases}
$$
The following two examples show such orthonormal bases.
EXAMPLE 4.9
Consider the Legendre polynomials on $[-1, 1]$:
$$
\varphi_0(x) = \frac{1}{\sqrt{2}}, \quad
\varphi_1(x) = \sqrt{\tfrac{3}{2}}\, x, \quad
\varphi_2(x) = \sqrt{\tfrac{5}{8}}\, (3x^2 - 1), \quad \ldots
$$
These are the classical Legendre polynomials $P_j$, normalized to have unit
$L_2$-norm on $[-1, 1]$. Higher order Legendre polynomials can be computed
by means of the recurrence relation
$$
(j + 1)\, P_{j+1}(x) = (2j + 1)\, x\, P_j(x) - j\, P_{j-1}(x).
$$
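A small sketch of this recurrence (illustrative Python; the function names are not from the text):

```python
import numpy as np

def legendre(j, x):
    """Classical Legendre polynomial P_j on [-1, 1], computed via
    (m+1) P_{m+1}(x) = (2m+1) x P_m(x) - m P_{m-1}(x)."""
    x = np.asarray(x, dtype=float)
    p_prev, p = np.ones_like(x), x                  # P_0 and P_1
    if j == 0:
        return p_prev
    for m in range(1, j):
        p_prev, p = p, ((2 * m + 1) * x * p - m * p_prev) / (m + 1)
    return p

def legendre_orthonormal(j, x):
    """Normalized version with unit L2-norm on [-1, 1]."""
    return np.sqrt((2 * j + 1) / 2.0) * legendre(j, x)
```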
EXAMPLE 4.10
Consider the wavelet basis $\{ \psi_{jk} \}$ on $\mathbb{R}$ generated from
a mother wavelet $\psi$. A wavelet $\psi_{jk}$ can be computed by
$$
\psi_{jk}(x) = 2^{j/2}\, \psi(2^j x - k),
$$
where $2^j$ is a scale factor and $k$ a location shift. A simple example of
a mother wavelet is the Haar wavelet
$$
\psi(x) =
\begin{cases}
1 & \text{if } x \in [0, \tfrac{1}{2}], \\
-1 & \text{if } x \in (\tfrac{1}{2}, 1], \\
0 & \text{otherwise,}
\end{cases}
$$
which is a simple step function.
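For illustration, the Haar wavelets $\psi_{jk}$ can be evaluated as follows (illustrative Python, assuming NumPy):

```python
import numpy as np

def haar_mother(x):
    """Haar mother wavelet: +1 on [0, 1/2], -1 on (1/2, 1], 0 elsewhere."""
    x = np.asarray(x, dtype=float)
    return np.where((x >= 0) & (x <= 0.5), 1.0,
                    np.where((x > 0.5) & (x <= 1.0), -1.0, 0.0))

def psi_jk(x, j, k):
    """Dilated and translated wavelet psi_jk(x) = 2^{j/2} psi(2^j x - k)."""
    return 2.0 ** (j / 2) * haar_mother(2.0 ** j * np.asarray(x, dtype=float) - k)
```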

The coefficients $\beta_j$ can be calculated from
$$
\beta_j = \int m(x)\, \varphi_j(x)\, dx .
\qquad (4.37)
$$
If we find a way to estimate the unknown function $m$ in (4.37), then we
will end up with an estimate of $\beta_j$. For Fourier series expansions
and wavelets, the coefficients $\beta_j$ can be approximated using the fast
Fourier transform (FFT) and the fast wavelet transform (FWT), respectively.
See Härdle, Kerkyacharian, Picard & Tsybakov (1998) for more details.
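As an illustration of the wavelet case, the following sketch implements the discrete Haar transform by repeated pairwise averaging and differencing. It is a toy version under the assumption that the sample size is a power of two; in practice a dedicated library such as PyWavelets would typically be used:

```python
import numpy as np

def haar_fwt(y):
    """Discrete Haar wavelet transform of a vector whose length is a power
    of two: repeatedly split into pairwise averages and differences.
    Returns the coarsest average followed by the detail coefficients."""
    y = np.asarray(y, dtype=float)
    details = []
    while len(y) > 1:
        avg  = (y[0::2] + y[1::2]) / np.sqrt(2)    # smooth part
        diff = (y[0::2] - y[1::2]) / np.sqrt(2)    # detail coefficients
        details.append(diff)
        y = avg
    return np.concatenate([y] + details[::-1])
```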
EXAMPLE 4.11
Wavelets are particularly suited to fit regression functions that feature
varying frequencies and jumps. Figure 4.9 shows the wavelet fit (Haar
basis) for simulated data from a regression curve that combines a sine part
with varying frequency and a constant part. To apply the fast wavelet
transform, a dyadic (power-of-two) number of data points was generated.

Figure 4.9: Wavelet regression and original curve for simulated data. SPMwavereg