6.3 Multivariate Density and Regression Functions

In this section we review kernel smoothing methods for density and regression function estimation in the case of multidimensional variables $ X$.


6.3.1 Computational Aspects

As in the univariate case, density and regression functions can be estimated by exact computation or by WARPing approximation. However, the effect of WARPing is different in the multivariate case. WARPing is still relatively fast in the two-dimensional case. For three- and higher-dimensional estimates, exact estimation may be preferred. To have a choice between both the exact and the WARPing computation, all estimation routines are offered in two versions:

Functionality                 Exact        WARPing
density estimation            denxestp     denestp
Nadaraya-Watson regression    regxestp     regestp
local linear regression       lregxestp    lregestp


6.3.2 Multivariate Density Estimation


hrot = denrotp (x {,K {,opt}})
computes a rule-of-thumb bandwidth for multivariate density estimation
fh = denestp (x {,h {,K} {,d}})
computes the multivariate kernel density estimate on a grid using the WARPing method
fh = denxestp (x {,h {,K} {,v}})
computes the multivariate kernel density estimate for all observations or on a grid v by exact computation

The kernel density estimator can be generalized to the multivariate case in a straightforward way. Suppose we now have observations $ x_1,\ldots,x_n$ where each of the observations is a $ d$-dimensional vector $ x_i=(x_{i1},\ldots,x_{id})^T$. The multivariate kernel density estimator at a point $ x=(x_{1},\ldots,x_{d})^T$ is defined as

$\displaystyle \widehat{f}_{h}(x)= \frac{1}{n}\sum_{i=1}^{n}\frac{1}{h_{1}\cdots h_{d}}\, \mathcal{K}\left(\frac{x_{i1}-x_{1}}{h_{1}},\ldots,\frac{x_{id}-x_{d}}{h_{d}}\right)$ (6.18)

with $ \mathcal{K}$ denoting a multivariate kernel function, i.e. a function working on $ d$-dimensional arguments. Note that (6.18) assumes that the bandwidth $ h$ is a vector of bandwidths $ h=\left(h_{1},\ldots, h_{d}\right)^{T}$.
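To make the definition concrete, here is a minimal Python sketch (not XploRe code) of estimator (6.18) with a product Quartic kernel; the function names are illustrative only:

```python
import numpy as np

def quartic(u):
    """Univariate Quartic (biweight) kernel, supported on [-1, 1]."""
    return np.where(np.abs(u) <= 1, 15.0 / 16.0 * (1 - u**2) ** 2, 0.0)

def kde_product(x, data, h):
    """Multivariate kernel density estimate (6.18) with a product kernel.

    x    : (d,) evaluation point
    data : (n, d) array of observations x_1, ..., x_n
    h    : (d,) bandwidth vector (h_1, ..., h_d)
    """
    u = (data - x) / h                      # (n, d) scaled differences
    weights = np.prod(quartic(u), axis=1)   # product kernel K(u_1)...K(u_d)
    return weights.sum() / (len(data) * np.prod(h))
```

Since the Quartic kernel vanishes outside $[-1,1]$, only observations inside the cube with sides $[x_j-h_j,\,x_j+h_j]$ contribute to the estimate.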

What form should the multidimensional kernel function $ \mathcal{K}(u)=\mathcal{K}(u_{1},\dots,u_{d})$ take on? The easiest solution is to use a multiplicative or product kernel

$\displaystyle \mathcal{K}(u)=K(u_{1})\cdot \ldots \cdot K(u_{d})$

with $ K$ denoting a univariate kernel function. This means that if $ K$ is a univariate kernel with support $ [-1,1]$ (e.g. the Quartic kernel), observations in a cube around $ x$ are used to estimate the density at the point $ x$. An alternative is to use a genuine multivariate kernel function $ \mathcal{K}(u)$, e.g. the radial symmetric Quartic kernel

$\displaystyle \mathcal{K}(u) \propto (1-u^{T}u)^2\;{\boldsymbol{I}}(u^{T}u\le 1).$

Radial symmetric kernels can be obtained from univariate kernels by defining $ \mathcal{K}(u) \propto K(\Vert u\Vert)$, where $ \Vert u\Vert = \sqrt{u^Tu}$ denotes the Euclidean norm of the vector $ u$. The symbol $ \propto$ indicates that the kernel still has to be multiplied by the appropriate normalizing constant. Radial symmetric kernels use observations from a ball around $ x$ to estimate the density at $ x$. Table 6.4 shows which product and which radial symmetric kernel functions are available in XploRe.

Table 6.4: Radial symmetric kernel functions.
Kernel         Product   Radial symmetric
Uniform        uni       runi
Triangle       trian     rtrian
Epanechnikov   epa       repa
Quartic        qua       rqua
Triweight      tri       rtri
Gaussian       gau       gau
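The practical difference between a product and a radial symmetric kernel is the neighborhood it induces: the cube $[-1,1]^d$ versus the unit ball. A short Python illustration (the function names are hypothetical and the kernels are left unnormalized):

```python
import numpy as np

def product_quartic(u):
    """Unnormalized product Quartic kernel: nonzero on the cube [-1, 1]^d."""
    inside = np.all(np.abs(u) <= 1, axis=-1)
    return np.where(inside, np.prod((1 - np.minimum(u**2, 1)) ** 2, axis=-1), 0.0)

def radial_quartic(u):
    """Unnormalized radial symmetric Quartic kernel: nonzero on the unit ball."""
    r2 = np.sum(u**2, axis=-1)
    return np.where(r2 <= 1, (1 - r2) ** 2, 0.0)
```

For example, the point $(0.9, 0.9)$ lies inside the cube $[-1,1]^2$ but outside the unit ball, so the product kernel assigns it positive weight while the radial symmetric kernel assigns it zero.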


The following quantlet computes a two-dimensional density estimate for the geyser data (see Data Sets (B.11)). These are two-dimensional data featuring a bimodal density. The function denxestp can be called with only the data as input. In this case, the bandwidth vector is computed by Scott's rule (Scott; 1992). This rule of thumb is also separately implemented in denrotp. The default kernel function is the product Quartic kernel "qua". The resulting surface plot is shown in Figure 6.13.

  geyser = read("geyser") 
  fh = denxestp(geyser) 
  fh = setmask(fh,"surface","blue")
  axesoff()
  cu = grcube(fh)              ; box
  plot(cu.box,cu.x,cu.y, fh)   ; plot box and fh
  setgopt(plotdisplay,1,1,"title","2D Density Estimate")
  axeson()
XLGsmoo13.xpl
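Scott's rule of thumb used above takes the form $h_j = \hat\sigma_j\, n^{-1/(d+4)}$. A Python sketch of this basic form follows; note that the actual XploRe routine denrotp may rescale the result for the chosen kernel:

```python
import numpy as np

def scott_rot(x):
    """Scott's rule-of-thumb bandwidth vector: h_j = sigma_j * n^(-1/(d+4)).

    x : (n, d) array of observations; returns a (d,) bandwidth vector.
    (Sketch of the basic rule only, not the exact denrotp implementation.)
    """
    n, d = x.shape
    return x.std(axis=0, ddof=1) * n ** (-1.0 / (d + 4))
```

Each coordinate gets its own bandwidth proportional to its sample standard deviation, so the rule adapts automatically to differently scaled variables.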

Figure 6.13: Two-dimensional density estimate.
\includegraphics[scale=0.425]{smootherd2d}

The second example of this subsection shows a three-dimensional density estimate. This estimate can only be graphed in the form of a contour plot. See Graphics (3) for an introduction to contour plots. The estimate is computed from columns 4 to 6 of the bank2 data (see Data Sets (B.7)). This data set consists of two clusters, which can easily be detected in the contour plot in Figure 6.14.

  bank    = read("bank2.dat") 
  bank456 = bank[,4:6]                ; columns 4 to 6
  fh = denxestp(bank456,1.5)
  axesoff()
  fhr  = (max(fh[,4])-min(fh[,4]))    ; range of fh
  cf1= grcontour3(fh,0.4*fhr,2)       ; contours
  cf2= grcontour3(fh,0.6*fhr,4)       ; contours
  cu = grcube(cf1|cf2)                ; box
  plot(cu.box, cf1,cf2)               ; graph contours
  setgopt(plotdisplay,1,1,"title","3D Density Estimate")
  axeson()
XLGsmoo14.xpl

Figure 6.14: Contours of three-dimensional density estimate.
\includegraphics[scale=0.425]{smootherd3d}


6.3.3 Multivariate Regression


mh = regestp (x {,h {,K} {,d}})
computes the multivariate kernel regression on a grid using the WARPing method
mh = regxestp (x {,h {,K} {,v}})
computes the multivariate kernel regression for all observations or on a grid v by exact computation
mh = lregestp (x {,h {,K} {,d}})
computes the multivariate local linear kernel regression on a grid using the WARPing method
mh = lregxestp (x {,h {,K} {,v}})
computes the multivariate local linear kernel regression for all observations or on a grid v by exact computation

Multivariate nonparametric regression aims to estimate the functional relation between a univariate response variable $ Y$ and a $ d$-dimensional explanatory variable $ X$, i.e. the conditional expectation

$\displaystyle E(Y\vert X)=E\left(Y\vert X_{1},\ldots,X_{d}\right)
=m(X).$

The multivariate Nadaraya-Watson estimator can then be written as a generalization of the univariate case. Suppose that we have independent observations $ (x_1,y_1),\ldots,(x_n,y_n)$, then this estimator is defined as

$\displaystyle \widehat m_{h}(x)=
\frac{\sum\limits_{i=1}^n
\mathcal{K}\left(\frac{x_{i1}-x_{1}}{h_{1}},\ldots,
\frac{x_{id}-x_{d}}{h_{d}}\right) y_{i}}{\sum\limits_{i=1}^n
\mathcal{K}\left(\frac{x_{i1}-x_{1}}{h_{1}},\ldots,
\frac{x_{id}-x_{d}}{h_{d}}\right)}\,.$

As in the univariate case, local polynomial approaches can be used. Due to their computational complexity, typically only local linear estimates are computed in practice.
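Both estimators can be sketched in a few lines of Python (these are illustrations, not the XploRe routines), using a product Quartic kernel: the Nadaraya-Watson estimate is a kernel-weighted average of the $y_i$, while the local linear estimate solves a kernel-weighted least squares problem and returns the fitted intercept:

```python
import numpy as np

def quartic(u):
    """Univariate Quartic kernel, supported on [-1, 1]."""
    return np.where(np.abs(u) <= 1, 15.0 / 16.0 * (1 - u**2) ** 2, 0.0)

def nw_estimate(x, X, y, h):
    """Multivariate Nadaraya-Watson estimate at x (product Quartic kernel)."""
    w = np.prod(quartic((X - x) / h), axis=1)   # kernel weights K(.)
    return np.sum(w * y) / np.sum(w)            # weighted average of y_i

def loclin_estimate(x, X, y, h):
    """Multivariate local linear estimate at x: weighted LS on centered X."""
    w = np.prod(quartic((X - x) / h), axis=1)
    Z = np.column_stack([np.ones(len(X)), X - x])  # intercept + linear terms
    W = Z * w[:, None]                             # diag(w) @ Z
    beta = np.linalg.solve(Z.T @ W, W.T @ y)       # (Z'WZ)^{-1} Z'W y
    return beta[0]                                 # intercept = m-hat(x)
```

If the true regression function is linear, the local linear estimator reproduces it exactly (for any bandwidth with enough observations in the window), which is one reason for its better boundary behavior compared to Nadaraya-Watson.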

The following quantlet compares the two-dimensional Nadaraya-Watson estimate and the two-dimensional local linear estimate for a generated data set. For the bandwidth vector and the kernel function we accept the defaults, which are 20% of the range of the data and the product Quartic kernel "qua", respectively. Figure 6.15 shows the surface plots of both estimates.

  randomize(0)
  n=200
  x=uniform(n,2)
  m=sin(2*pi*x[,1])+x[,2]
  y=m+normal(n)/4
  mh= regestp(x~y)
  ml=lregestp(x~y)
  mh=setmask(mh,"surface","red")
  ml=setmask(ml,"surface","blue")
  c=grcube(mh)
  d=createdisplay(1,2)
  axesoff()
  show(d,1,1,mh,c.box,c.x,c.y)
  show(d,1,2,ml,c.box,c.x,c.y)
  axeson()
  setgopt(d,1,1,"title","Nadaraya-Watson")
  setgopt(d,1,2,"title","Local Linear")
XLGsmoo15.xpl

Figure 6.15: Bivariate Nadaraya-Watson and local linear estimate.
\includegraphics[scale=0.425]{smootherr2ld}