Next: References Up: 5. Smoothing: Local Regression Previous: 5.4 Statistics for Linear

# 5.5 Multivariate Smoothers

When there are multiple predictor variables, the smoothing problem becomes multivariate: the mean $\mu(x)$ is now a surface. The definition of kernel and local regression smoothers can be extended to estimate a regression surface with any number of predictor variables, although the methods become less useful for more than 2 or 3 variables. There are several reasons for this:

• Data sparsity - the curse of dimensionality.
• Visualization issues - how does one view and interpret a high-dimensional smooth regression surface?
• Computation is often much more expensive in high dimensions.
For these reasons, use of local polynomials and other smoothers to model high-dimensional surfaces is rarely recommended, and the presentation here is restricted to the two-dimensional case. In higher dimensions, smoothers can be used in conjunction with dimension-reduction procedures (Chap. III.6), which attempt to model the high-dimensional surface through low-dimensional components. Examples of this type of procedure include Projection Pursuit ([9]), Additive Models ([14]), Semiparametric Models ([26] and Chap. III.10) and recursive partitioning (Chap. III.14).

## 5.5.1 Two Predictor Variables

Suppose the dataset consists of vectors $(x_{i,1}, x_{i,2}, Y_i)$, $i = 1, \ldots, n$, where $x_{i,1}$ and $x_{i,2}$ are considered predictor variables, and $Y_i$ is the response. For simplicity, we'll use $x_i = (x_{i,1}, x_{i,2})^{\top}$ to denote the vector of predictor variables. The data are modeled as

$$Y_i = \mu(x_i) + \epsilon_i\,.$$
Bivariate smoothers attempt to estimate the surface $\mu(x)$. Kernel and local regression methods can be extended to the bivariate case, simply by defining smoothing weights on a plane rather than on a line. Formally, a bivariate local regression estimate at a point $x = (x_1, x_2)^{\top}$ can be constructed as follows:
1. Define a distance measure $d(x, x_i)$ between the data points and the fitting point. A common choice is Euclidean distance,

$$d(x, x_i) = \sqrt{(x_{i,1} - x_1)^2 + (x_{i,2} - x_2)^2}\,.$$
2. Define the smoothing weights using a kernel function $W(\cdot)$ and bandwidth $h$:

$$w_i(x) = W\!\left(\frac{d(x, x_i)}{h}\right).$$
3. Define a local polynomial approximation, such as a local linear approximation

$$\mu(x_i) \approx a_0 + a_1 (x_{i,1} - x_1) + a_2 (x_{i,2} - x_2)$$
when $x_i$ is close to $x$. More generally, a local polynomial approximation can be written

$$\mu(x_i) \approx \langle a, A(x_i - x)\rangle\,,$$

where $a$ is a vector of coefficients, and $A(\cdot)$ is a vector of basis polynomials.

4. Estimate the coefficient vector by local least squares. That is, choose $\hat{a}$ to minimize

$$\sum_{i=1}^{n} w_i(x)\left(Y_i - \langle a, A(x_i - x)\rangle\right)^2.$$
5. The local polynomial estimate is then

$$\hat{\mu}(x) = \langle \hat{a}, A(0)\rangle = \hat{a}_0\,.$$

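The five steps above can be sketched in a few lines of code. The following is a minimal illustration, not a production implementation: the function name, the Gaussian kernel, and the simulated data are all our assumptions.

```python
import numpy as np

def local_linear_2d(x1, x2, y, x0, h):
    """Bivariate local linear regression estimate at the fitting point x0.

    Steps 1-5: Euclidean distances, Gaussian kernel weights, local linear
    basis A(x_i - x0) = (1, dx1, dx2), weighted least squares, return a0.
    """
    dx1, dx2 = x1 - x0[0], x2 - x0[1]
    d = np.sqrt(dx1 ** 2 + dx2 ** 2)                    # step 1: distances
    w = np.exp(-0.5 * (d / h) ** 2)                     # step 2: kernel weights
    X = np.column_stack([np.ones_like(dx1), dx1, dx2])  # step 3: basis A(x_i - x0)
    sw = np.sqrt(w)
    # step 4: weighted least squares via square-root-weight rescaling
    a, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return a[0]                                         # step 5: muhat(x0) = a0

rng = np.random.default_rng(0)
x1 = rng.uniform(0.0, 1.0, 200)
x2 = rng.uniform(0.0, 1.0, 200)
y = x1 ** 2 + x2 + rng.normal(0.0, 0.05, 200)           # true surface mu = x1^2 + x2
est = local_linear_2d(x1, x2, y, np.array([0.5, 0.5]), h=0.2)
```

At $x = (0.5, 0.5)$ the true surface value is $0.75$; the estimate should be close, up to a small smoothing bias from the curvature in $x_1$.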
## 5.5.2 Likelihood Smoothing

A likelihood smoother replaces the model (5.1) with a distributional assumption

$$Y_i \sim f(y, \mu_i)\,,$$
where $f(y, \mu)$ is a specified family of densities, parameterized so that $E(Y_i) = \mu_i$. The family may be chosen depending on the response variable. If $Y_i$ is a count, then the Poisson family is a natural choice:

$$f(y, \mu) = \frac{\mu^{y} e^{-\mu}}{y!}\,; \quad y = 0, 1, 2, \ldots$$
If $Y_i$ is a $0/1$ (or no/yes) response, then the Bernoulli family is appropriate:

$$f(y, \mu) = \mu^{y} (1 - \mu)^{1 - y}\,; \quad y \in \{0, 1\}\,.$$
Given the data, the log-likelihood is

$$\mathcal{L}(\mu_1, \ldots, \mu_n) = \sum_{i=1}^{n} \log f(Y_i, \mu_i)\,.$$

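For concreteness, the two log-densities and the resulting log-likelihood can be written out directly. This is a small illustrative sketch; the function names are ours, not from the text.

```python
import math

def poisson_logf(y, mu):
    """Poisson family: log f(y, mu) = y*log(mu) - mu - log(y!)."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def bernoulli_logf(y, mu):
    """Bernoulli family: log f(y, mu) = y*log(mu) + (1 - y)*log(1 - mu)."""
    return y * math.log(mu) + (1 - y) * math.log(1 - mu)

def loglik(logf, ys, mus):
    """Log-likelihood: sum over observations of log f(Y_i, mu_i)."""
    return sum(logf(y, m) for y, m in zip(ys, mus))
```
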
The goal is to estimate the mean function $\mu(x)$, where $\mu_i = \mu(x_i)$ for an observed set of covariates $x_1, \ldots, x_n$. A generalized linear model (Chap. III.7) uses a parametric model for the mean function. Likelihood smoothers assume only that the mean is a smooth function of the covariates.

The earliest work on likelihood smoothing is that of [16], who used a penalized binomial likelihood to estimate mortality rates. The local likelihood method described below can be viewed as an extension of local polynomial regression, and was introduced by [30].

### 5.5.2.1 Local Likelihood Estimation.

Local likelihood estimation is based on a locally weighted version of the log-likelihood:

$$\mathcal{L}_x(\mu) = \sum_{i=1}^{n} w_i(x) \log f\big(Y_i, \mu(x_i)\big)\,.$$
A local polynomial approximation is then used for a transformation of the mean function. For example, a local quadratic approximation is

$$g\big(\mu(x_i)\big) \approx a_0 + a_1 (x_i - x) + \frac{a_2}{2} (x_i - x)^2\,.$$
The function $g(\mu)$ is the link function. Its primary goal is to remove constraints on the mean by mapping the parameter space to $(-\infty, \infty)$. For example, in the Poisson case, the parameter space is $0 < \mu < \infty$. If the log transformation $\theta = \log(\mu)$ is used, then the parameter space becomes $-\infty < \theta < \infty$.

Let $l(y, \theta) = \log f\big(y, g^{-1}(\theta)\big)$, where $\theta = g(\mu)$, so that the locally weighted log-likelihood becomes

$$\mathcal{L}_x(a) = \sum_{i=1}^{n} w_i(x)\, l\big(Y_i, \langle a, A(x_i - x)\rangle\big)\,.$$

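As a concrete illustration (not spelled out in the text), consider the Bernoulli family with the logit link $\theta = g(\mu) = \log\big(\mu/(1-\mu)\big)$. Then $g^{-1}(\theta) = e^{\theta}/(1 + e^{\theta})$, and

$$l(y, \theta) = y\theta - \log\left(1 + e^{\theta}\right), \qquad \dot{l}(y, \theta) = y - \frac{e^{\theta}}{1 + e^{\theta}}, \qquad \ddot{l}(y, \theta) = -\frac{e^{\theta}}{\left(1 + e^{\theta}\right)^2}\,.$$
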
The maximizer $\hat{a}$ satisfies the likelihood equations,

$$\sum_{i=1}^{n} w_i(x)\, A(x_i - x)\, \dot{l}\big(Y_i, \langle \hat{a}, A(x_i - x)\rangle\big) = 0\,, \qquad (5.20)$$

where

$$\dot{l}(y, \theta) = \frac{\partial}{\partial \theta}\, l(y, \theta)\,.$$

In matrix notation, this system of equations can be written in a form similar to (5.7):

$$\boldsymbol{X}^{\top} \boldsymbol{W}\, \dot{l}\big(\boldsymbol{Y}, \boldsymbol{X}\hat{a}\big) = 0\,. \qquad (5.21)$$

This system of equations is solved to find the parameter estimates $\hat{a}$. The local likelihood estimate is defined as

$$\hat{\mu}(x) = g^{-1}(\hat{a}_0)\,.$$
### 5.5.2.2 Solving the Local Likelihood Equations.

The local likelihood equations (5.20) are usually non-linear, and so the solution must be obtained through iterative methods. The Newton-Raphson updating formula is

$$a^{(j+1)} = a^{(j)} + \left(\boldsymbol{X}^{\top} \boldsymbol{W} \boldsymbol{V} \boldsymbol{X}\right)^{-1} \boldsymbol{X}^{\top} \boldsymbol{W}\, \dot{l}\big(\boldsymbol{Y}, \boldsymbol{X} a^{(j)}\big)\,, \qquad (5.22)$$

where $\boldsymbol{V}$ is a diagonal matrix with entries

$$V_{ii} = -\ddot{l}\big(Y_i, \langle a^{(j)}, A(x_i - x)\rangle\big)\,.$$

For many common likelihoods, $l(y, \theta)$ is concave as a function of $\theta$. Under mild conditions on the design points, this implies that the local likelihood is also concave, and has a unique global maximizer. If the Newton-Raphson algorithm converges, it must converge to this global maximizer.

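For example (our illustration), in the Poisson case with log link, $l(y, \theta) = y\theta - e^{\theta} - \log y!$, so $\ddot{l}(y, \theta) = -e^{\theta} < 0$ for all $\theta$, and the locally weighted log-likelihood is concave in $a$.
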
The Newton-Raphson algorithm (5.22) cannot be guaranteed to converge from arbitrary starting values. But for concave likelihoods, the increment $\left(\boldsymbol{X}^{\top} \boldsymbol{W} \boldsymbol{V} \boldsymbol{X}\right)^{-1} \boldsymbol{X}^{\top} \boldsymbol{W}\, \dot{l}\big(\boldsymbol{Y}, \boldsymbol{X} a^{(j)}\big)$ is guaranteed to be an ascent direction, and convergence can be ensured by controlling the step size.

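To make the iteration concrete, here is a minimal sketch of local linear logistic regression fitted by the Newton-Raphson update (5.22), with step halving to control the step size. This is our illustration, not code from the chapter; the tricube kernel, ridge safeguard, and all names are assumptions.

```python
import numpy as np

def local_logistic(x, y, x0, h, iters=50):
    """Local linear logistic regression at x0 via Newton-Raphson (5.22).

    Bernoulli likelihood with logit link: l(y, t) = y*t - log(1 + e^t).
    Tricube kernel weights; step halving keeps each update an ascent step.
    """
    dx = x - x0
    w = np.clip(1.0 - np.abs(dx / h) ** 3, 0.0, None) ** 3  # tricube weights
    X = np.column_stack([np.ones_like(dx), dx])             # basis A(x_i - x0)

    def loglik(a):
        t = X @ a
        return np.sum(w * (y * t - np.logaddexp(0.0, t)))   # sum w_i l(Y_i, theta_i)

    a = np.zeros(2)
    for _ in range(iters):
        t = X @ a
        p = 1.0 / (1.0 + np.exp(-t))
        grad = X.T @ (w * (y - p))                    # X' W ldot(Y, X a)
        V = w * p * (1.0 - p)                         # W V diagonal; -lddot = p(1-p)
        H = X.T @ (V[:, None] * X)                    # X' W V X
        step = np.linalg.solve(H + 1e-10 * np.eye(2), grad)
        lam, cur = 1.0, loglik(a)
        while loglik(a + lam * step) < cur and lam > 1e-8:
            lam *= 0.5                                # control the step size
        a = a + lam * step
    return 1.0 / (1.0 + np.exp(-a[0]))                # muhat(x0) = g^{-1}(a0)

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 400)
y = (rng.uniform(size=400) < 1.0 / (1.0 + np.exp(-2.0 * x))).astype(float)
est = local_logistic(x, y, x0=0.0, h=0.6)
```

At $x_0 = 0$ the true success probability is $0.5$, and the local linear fit on the logit scale is nearly unbiased here since the true logit is linear in $x$.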
### 5.5.2.3 Statistics for the Local Likelihood Estimate.

Since the local likelihood estimate does not have an explicit representation, statistical properties cannot be derived as easily as in the local regression case. But a Taylor series expansion of the local likelihood gives an approximate linearization of the estimate, leading to theory parallel to that developed in Sects. 5.3 and 5.4 for local regression. See Chap. 4 of [21].

## 5.5.3 Extensions of Local Likelihood

The local likelihood method has been formulated for regression models. But variants of the method have been derived for numerous other settings, including robust regression, survival models, censored data, proportional hazards models, and density estimation. References include [30], [17], Loader ([19], [21]).

### 5.5.3.1 Robust Smoothing.

Robust smoothing combines the ideas of robust estimation (Chap. III.9) with smoothing. One method is local M-estimation: choose $\hat{a}$ to minimize

$$\sum_{i=1}^{n} w_i(x)\, \rho\big(Y_i - \langle a, A(x_i - x)\rangle\big)\,,$$

and estimate $\hat{\mu}(x) = \hat{a}_0$. If $\rho(u) = u^2$, this corresponds to local least squares estimation. If $\rho(u)$ is a symmetric function that increases more slowly than $u^2$, then the resulting estimate is more robust to outliers in the data. One popular choice of $\rho(u)$ is the Huber function:

$$\rho(u) = \begin{cases} u^2 & |u| \le c \\ c\,(2|u| - c) & |u| > c\,. \end{cases}$$
References include [11] and [21]. Another variant of M-estimation for local regression is the iterative procedure of [4].

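The minimization above can be carried out by iteratively reweighted least squares. The following sketch is our construction (not one of the cited procedures), assuming a local linear fit, a tricube kernel, and a fixed residual scale:

```python
import numpy as np

def huber_local_fit(x, y, x0, h, c=1.345, iters=30):
    """Local linear M-estimate at x0 with the Huber rho (sketch).

    Iteratively reweighted least squares: each residual r gets the extra
    Huber weight psi(r)/r = min(1, c/|r|) on top of the kernel weight.
    c is on the raw residual scale here; a real implementation would
    rescale by a robust estimate of the residual scale.
    """
    dx = x - x0
    w = np.clip(1.0 - np.abs(dx / h) ** 3, 0.0, None) ** 3  # tricube weights
    X = np.column_stack([np.ones_like(dx), dx])             # basis A(x_i - x0)
    a = np.zeros(2)
    for _ in range(iters):
        r = y - X @ a
        hw = np.minimum(1.0, c / np.maximum(np.abs(r), 1e-12))  # Huber weight
        sw = np.sqrt(w * hw)
        a, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return a[0]                                             # muhat(x0) = a0

rng = np.random.default_rng(3)
x = rng.uniform(-1.0, 1.0, 200)
y = x + rng.normal(0.0, 0.1, 200)
y[:10] += 5.0                          # gross outliers
est = huber_local_fit(x, y, x0=0.0, h=0.8)
```

With 5 % gross outliers, the Huber fit at $x_0 = 0$ stays near the true value $0$, whereas a local least squares fit would be pulled noticeably upward.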
### 5.5.3.2 Density Estimation.

Suppose $X_1, \ldots, X_n$ are an independent sample from a density $f(x)$. The goal is to estimate $f(x)$. The local likelihood for this problem is

$$\mathcal{L}_x(a) = \sum_{i=1}^{n} w_i(x)\, \langle a, A(X_i - x)\rangle - n \int W\!\left(\frac{u - x}{h}\right) e^{\langle a, A(u - x)\rangle}\, du\,.$$

Letting $\hat{a}$ be the maximizer of the local log-likelihood, the local likelihood estimate is $\hat{f}(x) = e^{\hat{a}_0}$. See [17] and [19].

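In the local constant case $A(u) = 1$, the maximization can be done in closed form: setting the derivative with respect to $a_0$ to zero gives $e^{\hat{a}_0} = \sum_i w_i(x)/(nh)$ for a kernel integrating to one, which is the ordinary kernel density estimate. A minimal sketch of this special case (our illustration, Gaussian kernel assumed):

```python
import numpy as np

def local_constant_density(data, x, h):
    """Local likelihood density estimate in the local constant case.

    With basis A(u) = 1 and log f(u) = a0 near x, setting the derivative
    of the local likelihood to zero gives exp(a0) = sum_i w_i(x) / (n h)
    for a kernel integrating to one -- the ordinary kernel density estimate.
    """
    z = (data - x) / h
    w = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)  # Gaussian kernel weights
    return w.sum() / (len(data) * h)                  # fhat(x) = exp(a0hat)

rng = np.random.default_rng(2)
data = rng.normal(0.0, 1.0, 2000)
est = local_constant_density(data, 0.0, h=0.3)
```

For standard normal data, the true density at $0$ is $1/\sqrt{2\pi} \approx 0.399$, and the estimate should be close up to the usual kernel smoothing bias.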
The density estimation problem is discussed in detail, together with graphical techniques for visualizing densities, in Chap. III.4.

Acknowledgements. This work was supported by National Science Foundation Grant DMS 0306202.
