
5.1 Smoothing

Given a dataset consisting of several variables and multiple observations, the goal of smoothing is to construct a functional relationship among the variables.

The most common situation for smoothing is that of a classical regression setting, where one assumes that observations occur in (predictor, response) pairs. That is, the available data has the form

$\displaystyle \{ (x_i,Y_i){};\quad i = 1,\ldots,n\}{},$

where $x_i$ is a measurement of the predictor (or independent) variable, and $Y_i$ is the corresponding response. A functional model relating the variables takes the form

$\displaystyle Y_i = \mu(x_i) + \epsilon_i{},$

(5.1)

where $\mu$ is the mean function and $\epsilon_i$ is a random error term. In classical regression analysis, one assumes a parametric form for the mean function; for example, $\mu(x) = a_0 + a_1 x$. The problem of estimating the mean function then reduces to estimating the coefficients $a_0$ and $a_1$.
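To make the parametric setting concrete, the following minimal sketch simulates data from model (5.1) with a linear mean function and recovers the two coefficients by ordinary least squares. The true values $a_0 = 1$ and $a_1 = 2$, the Gaussian errors with standard deviation 0.1, the equally spaced design, and the helper names `simulate` and `fit_line` are all assumptions chosen for illustration; none of them come from the text.

```python
import random

def simulate(mu, xs, sigma, rng):
    """Draw Y_i = mu(x_i) + eps_i with Gaussian errors (an illustrative assumption)."""
    return [mu(x) + rng.gauss(0.0, sigma) for x in xs]

def fit_line(xs, ys):
    """Ordinary least-squares estimates (a0_hat, a1_hat) for mu(x) = a0 + a1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a1 = sxy / sxx
    a0 = y_bar - a1 * x_bar
    return a0, a1

rng = random.Random(0)                     # fixed seed for reproducibility
xs = [i / 99 for i in range(100)]          # 100 equally spaced predictors on [0, 1]
ys = simulate(lambda x: 1.0 + 2.0 * x, xs, sigma=0.1, rng=rng)
a0_hat, a1_hat = fit_line(xs, ys)
print(a0_hat, a1_hat)                      # estimates of the assumed a0 = 1, a1 = 2
```

Once the linear form is assumed, the whole mean function is summarised by two numbers. The estimates are only as good as that assumption.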

The idea of smoothing methods is not to specify a parametric model for the mean function, but to allow the data to determine an appropriate functional form. Loosely stated, one assumes only that the mean function is smooth. Formal mathematical analysis may state the smoothness condition as a bound on derivatives of $\mu$; for example, $\vert\mu''(x)\vert \le M$ for all $x$ and a specified constant $M$.
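As a rough illustration of letting the data determine the functional form, the sketch below estimates the mean at a point by a kernel-weighted average of nearby responses. This is only one simple smoother, chosen here for brevity, and is not necessarily the method developed in the following sections. The Gaussian weight function, the bandwidth `h = 0.05`, the sinusoidal true mean, and the function name `local_average` are all illustrative assumptions.

```python
import math
import random

def local_average(x0, xs, ys, h):
    """Kernel-weighted average of the responses near x0.

    Uses Gaussian weights with bandwidth h; both are illustrative choices.
    """
    weights = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

def mu(x):
    """Assumed smooth mean function, not expressible as a0 + a1 * x."""
    return math.sin(2 * math.pi * x)

rng = random.Random(1)                       # fixed seed for reproducibility
xs = [i / 199 for i in range(200)]           # 200 equally spaced predictors on [0, 1]
ys = [mu(x) + rng.gauss(0.0, 0.2) for x in xs]

# Estimate the mean function on the design points without assuming its form.
mu_hat = [local_average(x0, xs, ys, h=0.05) for x0 in xs]
print(local_average(0.25, xs, ys, h=0.05), mu(0.25))
```

No coefficients are estimated here: the fitted value at each point depends only on observations whose predictors lie within a few bandwidths of it. A small `h` follows the data closely but gives a noisier estimate, while a large `h` averages over more observations at the cost of blurring features of $\mu$.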

Section 5.2 describes some of the most important smoothing methods. These all fall into a class of linear smoothers, and Sect. 5.3 develops important properties, including bias and variance. These results are applied to derive statistical procedures, including bandwidth selection, model diagnostics and goodness-of-fit testing in Sect. 5.4. Multivariate smoothing, when there are multiple predictor variables, is discussed in Sect. 5.5. Finally, Sect. 5.5.2 discusses extensions to likelihood smoothing.
