In this section we review kernel smoothing methods for density and regression function estimation for multidimensional variables.
As in the univariate case, density and regression functions can be estimated either by exact computation or by the WARPing approximation. In the multivariate case, however, WARPing behaves differently. It is still relatively fast in two dimensions, but since the number of bins in the WARPing grid grows exponentially with the dimension, exact computation may be preferable for three- and higher-dimensional estimates. To offer a choice between exact and WARPing computation, all estimation routines are available in two versions:
| Functionality | Exact | WARPing |
|---|---|---|
| density estimation | ![]() | ![]() |
| Nadaraya-Watson regression | ![]() | ![]() |
| local linear regression | ![]() | ![]() |
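The kernel density estimator can be generalized to the multivariate case in a straightforward way. Suppose we now have observations $X_1, \dots, X_n$, where each of the observations is a $d$-dimensional vector $X_i = (X_{i1}, \dots, X_{id})^\top$. The multivariate kernel density estimator at a point $x = (x_1, \dots, x_d)^\top$ is defined as

$$\hat f_h(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h_1 \cdots h_d}\, \mathcal{K}\left( \frac{X_{i1} - x_1}{h_1}, \dots, \frac{X_{id} - x_d}{h_d} \right).$$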
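What form should the multidimensional kernel function $\mathcal{K}(u) = \mathcal{K}(u_1, \dots, u_d)$ take on? The easiest solution is to use a multiplicative or product kernel

$$\mathcal{K}(u) = K(u_1) \cdot \ldots \cdot K(u_d),$$

where $K$ denotes a univariate kernel function.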
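To make the estimator concrete, here is a minimal Python sketch (an illustration only, not the XploRe implementation) that evaluates the multivariate kernel density estimate at a single point with a product Quartic kernel, $K(u) = \frac{15}{16}(1-u^2)^2$ for $|u| \le 1$ and zero otherwise. The function names `quartic`, `product_kernel` and `kde` are hypothetical.

```python
import math
from typing import Sequence


def quartic(u: float) -> float:
    """Univariate Quartic (biweight) kernel 15/16 (1 - u^2)^2 on |u| <= 1."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def product_kernel(u: Sequence[float]) -> float:
    """Product kernel K(u_1) * ... * K(u_d) built from the Quartic kernel."""
    result = 1.0
    for uj in u:
        result *= quartic(uj)
    return result


def kde(x: Sequence[float], data: Sequence[Sequence[float]], h: Sequence[float]) -> float:
    """Multivariate kernel density estimate at point x with bandwidth vector h."""
    n, d = len(data), len(h)
    total = 0.0
    for xi in data:
        # Scale each coordinate difference by its own bandwidth h_j.
        total += product_kernel([(xi[j] - x[j]) / h[j] for j in range(d)])
    # Divide by n * h_1 * ... * h_d so that the estimate integrates to one.
    return total / (n * math.prod(h))
```

For example, `kde([0.0, 0.0], [[0.0, 0.0]], [1.0, 1.0])` returns $(15/16)^2 = 0.87890625$, the value of the product kernel at the origin. Using a separate bandwidth per coordinate is what makes a bandwidth *vector* necessary when the coordinates are measured on different scales.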
The following quantlet computes a two-dimensional density estimate for the geyser data (see Data Sets (B.11)). These are two-dimensional data featuring a bimodal density. The function denxestp can be called with only the data as input. In this case, the bandwidth vector is computed by Scott's rule (Scott; 1992). This rule of thumb is also implemented separately in denrotp. The default kernel function is the product Quartic kernel "qua". The resulting surface plot is shown in Figure 6.13.
```
geyser = read("geyser")
fh = denxestp(geyser)                ; default bandwidths and kernel
fh = setmask(fh,"surface","blue")
axesoff()
cu = grcube(fh)                      ; box
plot(cu.box,cu.x,cu.y, fh)           ; plot box and fh
setgopt(plotdisplay,1,1,"title","2D Density Estimate")
axeson()
```
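Scott's rule itself is easy to state. In its common form for a Gaussian kernel, the bandwidth for the $j$-th coordinate is $h_j = \hat\sigma_j\, n^{-1/(d+4)}$, where $\hat\sigma_j$ is the sample standard deviation of that coordinate. The Python sketch below computes this vector. The name `scott_bandwidths` and its `scale` argument are hypothetical; `scale` is a hook for a kernel-specific constant, and we do not claim it reproduces whatever constant denrotp applies for the Quartic kernel.

```python
from statistics import stdev
from typing import List, Sequence


def scott_bandwidths(data: Sequence[Sequence[float]], scale: float = 1.0) -> List[float]:
    """Scott's rule of thumb h_j = scale * sigma_j * n^(-1/(d+4)).

    scale=1.0 gives the usual Gaussian-kernel form; any constant for
    other kernels is an assumption left to the caller.
    """
    n, d = len(data), len(data[0])
    factor = n ** (-1.0 / (d + 4))
    # One bandwidth per coordinate, proportional to that coordinate's spread.
    return [scale * stdev([row[j] for row in data]) * factor for j in range(d)]
```

Because each $h_j$ is proportional to $\hat\sigma_j$, rescaling one coordinate (for instance, changing its units) rescales only its own bandwidth.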
The second example of this subsection shows a three-dimensional density estimate. Since its graph would require four dimensions, this estimate can only be displayed in the form of a contour plot. See Graphics (3) for an introduction to contour plots. The data used are columns 4 to 6 of the bank2 data (see Data Sets (B.7)). This data set consists of two clusters, which can easily be detected in the contour plot in Figure 6.14.
```
bank = read("bank2.dat")
bank456 = bank[,4:6]                 ; columns 4 to 6
fh = denxestp(bank456,1.5)
axesoff()
fhr = (max(fh[,4])-min(fh[,4]))      ; range of fh
cf1 = grcontour3(fh,0.4*fhr,2)       ; contours at 40% of the range
cf2 = grcontour3(fh,0.6*fhr,4)       ; contours at 60% of the range
cu = grcube(cf1|cf2)                 ; box
plot(cu.box,cf1,cf2)                 ; graph contours
setgopt(plotdisplay,1,1,"title","3D Density Estimate")
axeson()
```
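In the quantlet, the contour levels are 40% and 60% of the range of the fourth column of fh, which (judging from the comment "range of fh") holds the estimated density values on the grid. The Python sketch below mimics this step under simplifying assumptions: it evaluates a product-Quartic estimate on a small regular 3D grid and then computes the two levels. The grid layout and the names `density_on_grid` and `contour_levels` are hypothetical and are not meant to reproduce the grid that denxestp actually uses.

```python
import itertools
import math
from typing import List, Sequence, Tuple


def quartic(u: float) -> float:
    """Univariate Quartic kernel 15/16 (1 - u^2)^2 on |u| <= 1."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def density_on_grid(
    data: Sequence[Sequence[float]],
    h: Sequence[float],
    axes: Sequence[Sequence[float]],
) -> List[Tuple[Tuple[float, ...], float]]:
    """Evaluate a product-Quartic density estimate at every point of the grid
    axes[0] x axes[1] x ... and return (grid point, estimate) pairs."""
    norm = len(data) * math.prod(h)
    result = []
    for point in itertools.product(*axes):
        total = 0.0
        for xi in data:
            w = 1.0
            for j, hj in enumerate(h):
                w *= quartic((xi[j] - point[j]) / hj)
            total += w
        result.append((point, total / norm))
    return result


def contour_levels(values: Sequence[float], fractions=(0.4, 0.6)) -> List[float]:
    """Levels as fractions of the range of the estimated values,
    mirroring 0.4*fhr and 0.6*fhr in the quantlet (not min + fraction*range)."""
    value_range = max(values) - min(values)
    return [f * value_range for f in fractions]
```

Grid points whose estimate exceeds a level lie inside the corresponding contour surface.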
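Multivariate nonparametric regression aims to estimate the functional relation between a univariate response variable $Y$ and a $d$-dimensional explanatory variable $X = (X_1, \dots, X_d)^\top$, i.e. the conditional expectation

$$m(x) = E(Y \mid X = x) = E(Y \mid X_1 = x_1, \dots, X_d = x_d).$$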
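The Nadaraya-Watson estimator of this conditional expectation is a locally weighted average of the responses, $\hat m_h(x) = \sum_i \mathcal{K}_h(X_i - x)\, Y_i \big/ \sum_i \mathcal{K}_h(X_i - x)$. The Python sketch below computes it at one point, assuming the product Quartic kernel; the names `product_weight` and `nadaraya_watson` are hypothetical and this is not the code behind regestp.

```python
from typing import Sequence


def quartic(u: float) -> float:
    """Univariate Quartic kernel 15/16 (1 - u^2)^2 on |u| <= 1."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def product_weight(xi: Sequence[float], x: Sequence[float], h: Sequence[float]) -> float:
    """Product-kernel weight of observation xi for the target point x.
    The factor 1/(h_1 ... h_d) is omitted because it cancels in the ratio."""
    w = 1.0
    for j, hj in enumerate(h):
        w *= quartic((xi[j] - x[j]) / hj)
    return w


def nadaraya_watson(x: Sequence[float], X: Sequence[Sequence[float]],
                    Y: Sequence[float], h: Sequence[float]) -> float:
    """Locally weighted average of Y around x; NaN if no observation has weight."""
    num = den = 0.0
    for xi, yi in zip(X, Y):
        w = product_weight(xi, x, h)
        num += w * yi
        den += w
    return num / den if den > 0.0 else float("nan")
```

For two observations at equal distance from $x$, the estimate is simply the average of their responses.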
The following quantlet compares the two-dimensional Nadaraya-Watson estimate with the two-dimensional local linear estimate for a simulated data set: $n = 200$ points drawn uniformly on the unit square, with responses generated from $m(x) = \sin(2\pi x_1) + x_2$ plus normal noise with standard deviation $1/4$. For the bandwidth vector and the kernel function we accept the defaults, which are 20% of the range of the data and the product Quartic kernel "qua", respectively. Figure 6.15 shows the surface plots of both estimates.
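```
randomize(0)                         ; fix the random seed
n=200
x=uniform(n,2)                       ; n points uniform on the unit square
m=sin(2*pi*x[,1])+x[,2]              ; true regression function
y=m+normal(n)/4                      ; add noise with standard deviation 1/4
mh= regestp(x~y)                     ; Nadaraya-Watson estimate
ml=lregestp(x~y)                     ; local linear estimate
mh=setmask(mh,"surface","red")
ml=setmask(ml,"surface","blue")
c=grcube(mh)
d=createdisplay(1,2)                 ; display with two panels
axesoff()
show(d,1,1,mh,c.box,c.x,c.y)
show(d,1,2,ml,c.box,c.x,c.y)
axeson()
setgopt(d,1,1,"title","Nadaraya-Watson")
setgopt(d,1,2,"title","Local Linear")
```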
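The local linear estimate replaces the local average by a weighted least-squares fit of a linear function of $X_i - x$ around $x$; the estimate at $x$ is the fitted intercept. The Python sketch below assumes the product Quartic kernel and solves the resulting $(d+1) \times (d+1)$ normal equations by Gaussian elimination. The names `solve` and `local_linear` are hypothetical, and the sketch illustrates the technique rather than reproducing lregestp.

```python
from typing import List, Sequence


def quartic(u: float) -> float:
    """Univariate Quartic kernel 15/16 (1 - u^2)^2 on |u| <= 1."""
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular local design; try a larger bandwidth")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = s / m[r][r]
    return beta


def local_linear(x: Sequence[float], X: Sequence[Sequence[float]],
                 Y: Sequence[float], h: Sequence[float]) -> float:
    """Local linear estimate at x: intercept of a kernel-weighted linear fit."""
    d = len(h)
    p = d + 1
    A = [[0.0] * p for _ in range(p)]  # accumulates Z^T W Z
    b = [0.0] * p                      # accumulates Z^T W Y
    for xi, yi in zip(X, Y):
        w = 1.0
        for j in range(d):
            w *= quartic((xi[j] - x[j]) / h[j])
        if w == 0.0:
            continue
        z = [1.0] + [xi[j] - x[j] for j in range(d)]  # regressors (1, X_i - x)
        for r in range(p):
            b[r] += w * z[r] * yi
            for c in range(p):
                A[r][c] += w * z[r] * z[c]
    return solve(A, b)[0]
```

Because the fit is locally linear, data generated by an exactly linear function are reproduced even at a corner of the design, where a local average such as the Nadaraya-Watson estimate is pulled towards the interior values.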