In the previous section several techniques for estimating the conditional expectation function of the bivariate distribution of the random variables $X$ and $Y$ were presented. Recall that the conditional expectation function is an interesting target for estimation since it tells us how $X$ and $Y$ are related on average. In practice, however, we will mostly be interested in specifying how the response variable $Y$ depends on a vector of exogenous variables, denoted by $\boldsymbol{X}=(X_1,\ldots,X_d)^{\top}$. This means we aim to estimate the conditional expectation

$$E(Y\mid \boldsymbol{X}) = E(Y\mid X_1,\ldots,X_d) = m(\boldsymbol{X}).$$
Note also that the multivariate Nadaraya-Watson estimator is a local constant estimator. The definition of local polynomial kernel regression is a straightforward generalization of the univariate case. Let us illustrate this with the example of a local linear regression estimate. The minimization problem here is
$$\min_{\beta_0,\,\boldsymbol{\beta}_1}\ \sum_{i=1}^{n}\left\{Y_i-\beta_0-\boldsymbol{\beta}_1^{\top}(\boldsymbol{X}_i-\boldsymbol{x})\right\}^{2}\,\mathcal{K}_{\mathbf{H}}(\boldsymbol{X}_i-\boldsymbol{x}). \qquad (4.70)$$
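The solution of (4.70) is an ordinary weighted least squares fit in which each observation is weighted by its kernel weight. The following minimal Python sketch illustrates this for a single evaluation point $\boldsymbol{x}$; the product Gaussian kernel, the diagonal bandwidth matrix and the function name `local_linear_at` are our own assumptions for illustration, not part of the text.

```python
import numpy as np

def local_linear_at(x, X, Y, h):
    """Local linear estimate of m(x) by weighted least squares, cf. (4.70).

    x : (d,) evaluation point, X : (n, d) regressors, Y : (n,) responses,
    h : (d,) bandwidths, i.e. diagonal bandwidth matrix H = diag(h).
    """
    U = X - x                                          # rows X_i - x
    # product Gaussian kernel weights K_H(X_i - x) (assumed kernel choice)
    w = np.exp(-0.5 * np.sum((U / h) ** 2, axis=1)) / np.prod(h)
    Z = np.column_stack([np.ones(len(Y)), U])          # rows (1, (X_i - x)^T)
    sw = np.sqrt(w)                                    # weighted least squares
    beta, *_ = np.linalg.lstsq(Z * sw[:, None], Y * sw, rcond=None)
    return beta[0]                                     # beta_0-hat = m-hat(x)
```

The fitted intercept is the local linear estimate of $m(\boldsymbol{x})$; the remaining coefficients estimate the partial derivatives of $m$ at $\boldsymbol{x}$.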
The asymptotic conditional variances of the Nadaraya-Watson estimator and the local linear estimator are identical; their derivation can be found in detail in Ruppert & Wand (1994):

$$\mathop{\mathrm{Var}}\{\widehat{m}(\boldsymbol{x})\mid X_1,\ldots,X_n\} \approx \frac{1}{n\det(\mathbf{H})}\,\Vert\mathcal{K}\Vert_2^2\,\frac{\sigma^2(\boldsymbol{x})}{f_X(\boldsymbol{x})}. \qquad (4.71)$$
In the following we sketch the derivation of the asymptotic conditional bias. It is here that the two estimators differ remarkably, as we have already seen in the univariate case.
Denote by $\widetilde{m}(\boldsymbol{X}_i)$ the second order Taylor expansion of $m(\boldsymbol{X}_i)$ around $\boldsymbol{x}$, i.e.

$$\widetilde{m}(\boldsymbol{X}_i) = m(\boldsymbol{x}) + (\boldsymbol{X}_i-\boldsymbol{x})^{\top}\nabla_m(\boldsymbol{x}) + \frac{1}{2}\,(\boldsymbol{X}_i-\boldsymbol{x})^{\top}\mathcal{H}_m(\boldsymbol{x})(\boldsymbol{X}_i-\boldsymbol{x}),$$

where $\nabla_m$ denotes the gradient and $\mathcal{H}_m$ the Hessian of $m$. Replacing $m(\boldsymbol{X}_i)$ by $\widetilde{m}(\boldsymbol{X}_i)$ and using the approximations

$$\frac{1}{n}\sum_{i=1}^{n}\mathcal{K}_{\mathbf{H}}(\boldsymbol{X}_i-\boldsymbol{x})\,(\boldsymbol{X}_i-\boldsymbol{x}) \approx \mu_2(\mathcal{K})\,\mathbf{H}\mathbf{H}^{\top}\nabla_f(\boldsymbol{x}),$$

$$\frac{1}{n}\sum_{i=1}^{n}\mathcal{K}_{\mathbf{H}}(\boldsymbol{X}_i-\boldsymbol{x})\,(\boldsymbol{X}_i-\boldsymbol{x})(\boldsymbol{X}_i-\boldsymbol{x})^{\top} \approx \mu_2(\mathcal{K})\,f_X(\boldsymbol{x})\,\mathbf{H}\mathbf{H}^{\top},$$

where $f_X$ denotes the density of $\boldsymbol{X}$ and $\nabla_f$ its gradient, one obtains the asymptotic conditional bias of the Nadaraya-Watson estimator

$$\mathop{\mathrm{Bias}}\{\widehat{m}(\boldsymbol{x})\mid X_1,\ldots,X_n\} \approx \mu_2(\mathcal{K})\,\frac{\nabla_m(\boldsymbol{x})^{\top}\mathbf{H}\mathbf{H}^{\top}\nabla_f(\boldsymbol{x})}{f_X(\boldsymbol{x})} + \frac{1}{2}\,\mu_2(\mathcal{K})\mathop{\mathrm{tr}}\{\mathbf{H}^{\top}\mathcal{H}_m(\boldsymbol{x})\mathbf{H}\}.$$
Let us now turn to the local linear case. Recall that we use the notation $\boldsymbol{e}_1$ for the first unit vector in $\mathbb{R}^{d+1}$. Then we can write the local linear estimator as
$$\widehat{m}_{1,\mathbf{H}}(\boldsymbol{x}) = \boldsymbol{e}_1^{\top}\left(\mathbf{X}_{\boldsymbol{x}}^{\top}\mathbf{W}_{\boldsymbol{x}}\mathbf{X}_{\boldsymbol{x}}\right)^{-1}\mathbf{X}_{\boldsymbol{x}}^{\top}\mathbf{W}_{\boldsymbol{x}}\boldsymbol{Y},$$

where

$$\mathbf{X}_{\boldsymbol{x}} = \begin{pmatrix}1 & (\boldsymbol{X}_1-\boldsymbol{x})^{\top}\\ \vdots & \vdots \\ 1 & (\boldsymbol{X}_n-\boldsymbol{x})^{\top}\end{pmatrix}, \qquad \mathbf{W}_{\boldsymbol{x}} = \mathop{\mathrm{diag}}\bigl(\mathcal{K}_{\mathbf{H}}(\boldsymbol{X}_1-\boldsymbol{x}),\ldots,\mathcal{K}_{\mathbf{H}}(\boldsymbol{X}_n-\boldsymbol{x})\bigr).$$

An analogous but simpler calculation than for the Nadaraya-Watson estimator yields the asymptotic conditional bias of the local linear estimator

$$\mathop{\mathrm{Bias}}\{\widehat{m}_{1,\mathbf{H}}(\boldsymbol{x})\mid X_1,\ldots,X_n\} \approx \frac{1}{2}\,\mu_2(\mathcal{K})\mathop{\mathrm{tr}}\{\mathbf{H}^{\top}\mathcal{H}_m(\boldsymbol{x})\mathbf{H}\}.$$
For all omitted details we refer again to Ruppert & Wand (1994). They also point out that the local linear estimate has a conditional bias of the same order in the interior as well as at the boundary of the support of the design density $f_X$.
The computation of local polynomial estimators can be done by any statistical package that is able to run weighted least squares regression. However, since we estimate a function, this weighted least squares regression has to be performed at all observation points or on a grid of points in $\mathbb{R}^d$. Therefore, explicit formulae, which can be derived at least for lower dimensions, are useful.
For the local linear case, for example, the estimator can be written explicitly in terms of kernel-weighted sums:

$$\widehat{m}_{1,\mathbf{H}}(\boldsymbol{x}) = \boldsymbol{e}_1^{\top}\begin{pmatrix}\sum_i w_i & \sum_i w_i(\boldsymbol{X}_i-\boldsymbol{x})^{\top}\\ \sum_i w_i(\boldsymbol{X}_i-\boldsymbol{x}) & \sum_i w_i(\boldsymbol{X}_i-\boldsymbol{x})(\boldsymbol{X}_i-\boldsymbol{x})^{\top}\end{pmatrix}^{-1}\begin{pmatrix}\sum_i w_i\,Y_i\\ \sum_i w_i(\boldsymbol{X}_i-\boldsymbol{x})\,Y_i\end{pmatrix}, \qquad w_i = \mathcal{K}_{\mathbf{H}}(\boldsymbol{X}_i-\boldsymbol{x}).$$
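The following Python sketch carries out exactly this kind of grid computation for both the Nadaraya-Watson (local constant) and the local linear estimator. The kernel, the bandwidth and the simulated regression function in the usage example are our own illustrative assumptions, not taken from the text or from Figure 4.18.

```python
import numpy as np

def nw_and_ll_on_grid(X, Y, grid, h):
    """Nadaraya-Watson (local constant) and local linear estimates on a grid.

    X : (n, d) regressors, Y : (n,) responses, grid : (m, d) evaluation
    points, h : scalar or (d,) bandwidths for a product Gaussian kernel.
    """
    nw = np.empty(len(grid))
    ll = np.empty(len(grid))
    for j, x in enumerate(grid):
        U = X - x
        w = np.exp(-0.5 * np.sum((U / h) ** 2, axis=1))   # kernel weights
        nw[j] = np.sum(w * Y) / np.sum(w)                 # local constant
        Z = np.column_stack([np.ones(len(Y)), U])         # rows (1, (X_i-x)^T)
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(Z * sw[:, None], Y * sw, rcond=None)
        ll[j] = beta[0]                                   # local linear
    return nw, ll

# illustrative usage with simulated bivariate data (toy regression function)
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))
Y = np.sin(2 * np.pi * X[:, 0]) + X[:, 1] + rng.normal(scale=0.3, size=200)
g1, g2 = np.meshgrid(np.linspace(0, 1, 25), np.linspace(0, 1, 25))
grid = np.column_stack([g1.ravel(), g2.ravel()])
mhat_nw, mhat_ll = nw_and_ll_on_grid(X, Y, grid, h=0.15)
```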
Figure 4.18 shows the Nadaraya-Watson and the local linear two-dimensional estimates for simulated data, computed from uniformly distributed design points and a known regression function.
Nonparametric kernel regression function estimation is not limited to bivariate distributions. Everything can be generalized to higher dimensions but unfortunately some problems arise. A practical problem is the graphical display for higher dimensional multivariate functions. This problem has already been considered in Chapter 3 when we discussed the graphical representation of multivariate density estimates. The corresponding remarks for plotting functions of up to three-dimensional arguments apply here again.
A general problem in multivariate nonparametric estimation is the so-called
curse of dimensionality.
Recall that the nonparametric regression estimators are based on the idea of
local (weighted) averaging. In higher dimensions the observations are
sparsely distributed even for large sample sizes, and consequently
estimators based on local averaging perform unsatisfactorily in this situation.
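To get a rough feeling for this sparsity, consider the following back-of-the-envelope calculation (our own numerical example, not taken from the text): if the regressors are uniformly distributed on $[0,1]^d$ and we average over a cube of half-width $h=0.1$ around an interior point, the expected proportion of observations contributing to this local average is

$$(2h)^d = 0.2^{\,d}: \qquad 20\%\ (d=1),\quad 4\%\ (d=2),\quad 0.8\%\ (d=3),\quad 0.032\%\ (d=5),$$

so the effective local sample size collapses rapidly as the dimension grows.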
Technically, one can explain this effect by looking at the AMISE again. Consider a multivariate regression estimator with the same bandwidth $h$ for all components, e.g. a Nadaraya-Watson or local linear estimator with bandwidth matrix $\mathbf{H}=h\,\mathbf{I}_d$. Here the asymptotic MISE will also depend on the dimension $d$:

$$\mathrm{AMISE}(n,h) = \frac{1}{n h^{d}}\,C_1 + h^{4}\,C_2,$$

where $C_1$ and $C_2$ are constants depending neither on $n$ nor on $h$.
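As a side calculation (the constants are left unspecified): minimizing this AMISE expression with respect to $h$ gives

$$h_{\mathrm{opt}} \sim n^{-1/(4+d)}, \qquad \mathrm{AMISE}(n,h_{\mathrm{opt}}) \sim n^{-4/(4+d)},$$

so the best attainable rate deteriorates from $n^{-4/5}$ for $d=1$ to, for example, $n^{-1/2}$ for $d=4$, which quantifies the curse of dimensionality.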
An introduction to kernel regression methods can be found in the monographs of Silverman (1986), Härdle (1990), Bowman & Azzalini (1997), Simonoff (1996) and Pagan & Ullah (1999). The books of Scott (1992) and Wand & Jones (1995) deal particularly with the multivariate case. For detailed derivations of the asymptotic properties we refer to Collomb (1985) and Gasser & Müller (1984). The latter reference also considers boundary kernels to reduce the bias at the boundary regions of the explanatory variables.
Locally weighted least squares were originally studied by Stone (1977), Cleveland (1979) and Lejeune (1985). Technical details of asymptotic expansions for bias and variance can be found in Ruppert & Wand (1994). Monographs concerning local polynomial fitting are Wand & Jones (1995), Fan & Gijbels (1996) and Simonoff (1996). Computational aspects, in particular the WARPing technique (binning) for kernel and local polynomial regression, are discussed in Härdle & Scott (1992) and Fan & Marron (1994). The monograph of Loader (1999) discusses local regression in combination with likelihood-based estimation.
For comprehensive works on spline smoothing see Eilers & Marx (1996), Wahba (1990) and Green & Silverman (1994). Good resources for wavelets are Daubechies (1992), Donoho & Johnstone (1994), Donoho & Johnstone (1995) and Donoho et al. (1995). The books of Eubank (1999) and Schimek (2000b) provide extensive overviews on a variety of different smoothing methods.
For a monograph on testing in nonparametric models see Hart (1997). The concepts presented by equations (4.63)-(4.66) are studied in particular in the following articles: González Manteiga & Cao (1993) and Härdle & Mammen (1993) introduced (4.63); Gozalo & Linton (2001) studied (4.64), motivated by Lagrange multiplier tests. Equation (4.65) was originally introduced by Zheng (1996) and independently discussed by Fan & Li (1996). Finally, (4.66) was proposed by Dette (1999) in the context of testing for parametric structures in the regression function. For an introductory presentation see Yatchew (2003). In general, all test approaches are also applicable to multivariate regressors.
More sophisticated is the minimax approach for testing nonparametric alternatives studied by Ingster (1993). This approach tries to maximize the power against the worst-case alternative, i.e. the alternative that is closest to the hypothesis but can still be detected.
Rather different approaches have been introduced by
Bierens (1990) and Bierens & Ploberger (1997), who consider
(integrated) conditional moment tests or by Stute (1997)
and Stute et al. (1998), who verify, via bootstrap, whether
the residuals of the hypothesis integrated over the empirical
distribution of the regressor variable converge to a centered
Gaussian process.
There is further literature about adaptive testing, which tries to find the smoothing parameter that maximizes the power while keeping the type one error under control; see for example Ledwina (1994), Kallenberg & Ledwina (1995), Spokoiny (1996) and Spokoiny (1998).