1.1 Motivation

The nonparametric approach to estimating a regression curve has four main purposes. First, it provides a versatile method of exploring a general relationship between two variables. Second, it gives predictions of observations yet to be made without reference to a fixed parametric model. Third, it provides a tool for finding spurious observations by studying the influence of isolated points. Fourth, it constitutes a flexible method of substituting for missing values or interpolating between adjacent $X$-values.

The flexibility of the method is extremely helpful in a preliminary and exploratory statistical analysis of a data set. If no a priori model information about the regression curve is available, the nonparametric analysis can help in suggesting simple parametric formulations of the regression relationship. An example is depicted in Figure 1.3. In that study of human longitudinal growth curves the target of interest was the first (respectively, second) derivative of the regression function (Gasser et al. 1984a; Pflug 1985).

Figure 1.3: Human height growth versus age. The small graph shows the raw height data connected by straight lines (solid line) together with cross-sectional sample quantiles (dashed lines). Velocity of height growth of a girl (above) and acceleration (below), modeled by a nonparametric smoother (solid line) and a parametric fit (dashed line). Units are cm (height), cm/year (velocity) and cm/year$^2$ (acceleration). From Gasser and Müller (1984), Figure 1, with the permission of the Scandinavian Journal of Statistics.
\includegraphics[scale=0.2]{ANR1,3.ps}

The nonparametric regression smoothing method revealed an extra peak in the first derivative, the so-called mid-growth spurt at the age of about eight years. Other approaches based on ad hoc parametric modeling made it extremely difficult to detect this extra peak (dashed line in Figure 1.3).
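
The basic idea behind such a smoother can be sketched in a few lines of code. The following is a minimal Nadaraya-Watson kernel smoother in Python applied to hypothetical age/height data; the growth-curve study itself used a different kernel estimator tailored to derivative estimation (Gasser et al. 1984a), and the data, names and bandwidth below are purely illustrative.

\begin{verbatim}
import numpy as np

def nadaraya_watson(grid, x, y, h):
    # Gaussian kernel weights for every (grid point, observation) pair
    u = (grid[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * u**2)
    # weighted average of the responses at each grid point
    return (w @ y) / w.sum(axis=1)

# purely illustrative age/height data and bandwidth
age = np.linspace(1, 18, 60)
height = 75 + 6 * age + np.random.normal(scale=2, size=age.size)
grid = np.linspace(1, 18, 200)
smooth = nadaraya_watson(grid, age, height, h=1.0)
velocity = np.gradient(smooth, grid)  # crude growth-velocity estimate
\end{verbatim}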

An analogous situation in the related field of density estimation was reported by Hildenbrand (1986) for the income density of British households. It is important in economic theory, especially in demand and equilibrium theory, to have good approximations to income distributions. A traditional parametric fit, the Singh-Madalla model, resulted in Figure 1.4.

Figure 1.4: Net income densities over time. A Singh-Madalla fit to the densities of $X={}$net income from 1969 to 1983. Units are mean income for each year. Survey (1968-1983). ANRnilognormal.xpl
\includegraphics[scale=0.7]{ANRnilognormal.ps}

The parametric model class of Singh-Madalla densities can, by construction, produce only unimodal densities. By contrast, the more flexible nonparametric smoothing method produced Figure 1.5. The nonparametric approach makes it possible to estimate functions of greater complexity and suggests instead a bimodal income distribution. This bimodality is present over the thirteen years from 1968-1981 and changes its shape! More people enter the ``lower income range'' and the ``middle class'' peak becomes less dominant.

Figure 1.5: Net income densities over time. A nonparametric kernel fit (bandwidth $h=0.2$) to the densities of $X={}$net income from 1969 to 1981. Units are mean income for each year. Survey (1968-1983). ANRnidensity.xpl
\includegraphics[scale=0.7]{ANRnidensity.ps}
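
For readers who want to reproduce the flavor of such a picture, here is a minimal sketch of a Gaussian kernel density estimator in Python with the bandwidth $h=0.2$ quoted in the caption; the income data below are simulated for illustration, not the survey data behind Figure 1.5.

\begin{verbatim}
import numpy as np

def kernel_density(grid, x, h):
    # average of Gaussian kernels centered at the observations
    u = (grid[:, None] - x[None, :]) / h
    k = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return k.mean(axis=1) / h

# simulated incomes, rescaled by their mean as in the figure
income = np.random.lognormal(mean=0.0, sigma=0.5, size=1000)
income = income / income.mean()
grid = np.linspace(0, 3, 300)
density = kernel_density(grid, income, h=0.2)
\end{verbatim}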

An example which once more underlines this flexibility of modeling regression curves is presented in Engle et al. (1986). They consider a nonlinear relationship between electricity sales and temperature using a parametric-nonparametric estimation procedure. Figure 1.6 shows the result of a spline smoothing procedure that nicely models a kink in the electricity sales.


Figure 1.6: Temperature response function for Georgia. The nonparametric estimate is given by the solid curve and two parametric estimates by the dashed curves. From Engle et al. (1986) with the permission of the American Statistical Association.
\includegraphics[scale=0.2]{ANR1,6.ps}

Another example arises in modeling alcohol concentration curves. A common practice in forensic medicine is to approximate ethanol reduction curves with parametric models. More specifically, a linear regression model is used which in a simple way yields the so-called $\beta_{60}$ value, the ethanol reduction rate per hour. In practice, of course, this model can be used only in a very limited time interval; an extension into the ``late ethanol reduction region'' would not be possible. A nonparametric analysis based on splines suggested a mixture of a linear and an exponential reduction curve (Mattern et al., 1983).

The prediction of new observations is of particular interest in time series analysis. It has been observed by a number of people that in certain applications classical parametric models are too restrictive to give reasonable explanations of observed phenomena. The nonparametric prediction of time series has been investigated by Robinson (1983) and Doukhan and Ghindès (1983). Ullah (1987) applies kernel smoothing to a time series of stock market prices and estimates certain risk indexes. Deaton (1988) uses smoothing methods to examine demand patterns in Thailand and investigates how knowledge of those patterns affects the assessment of pricing policies. Yakowitz (1985b) applies smoothing techniques for one-day-ahead prediction of river flow. Figure 1.7 below shows a nonparametric estimate of the flow probability for the St. Mary's river.

Figure 1.7: Nonparametric flow probability for the St. Mary's river. From Yakowitz (1985b) with the permission of Water Resources Research.
\includegraphics[scale=0.2]{ANR1,7.ps}
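
As a rough illustration of nonparametric one-step-ahead prediction, the sketch below regresses $X_{t+1}$ on $X_t$ with kernel weights. This is only a schematic version of the general idea, not the procedure actually used by Yakowitz (1985b); the flow series and bandwidth are invented for the example.

\begin{verbatim}
import numpy as np

def kernel_predict(x_today, lagged, nxt, h):
    # kernel regression of X_{t+1} on X_t, evaluated at today's value
    w = np.exp(-0.5 * ((x_today - lagged) / h) ** 2)
    return np.sum(w * nxt) / np.sum(w)

# invented daily flow series; predict tomorrow's flow from today's
flow = 100 + np.cumsum(np.random.normal(size=500))
lagged, nxt = flow[:-1], flow[1:]
forecast = kernel_predict(flow[-1], lagged, nxt, h=2.0)
\end{verbatim}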

A treatment of outliers is an important step in highlighting features of a data set. Extreme points affect the scale of plots so that the structure of the main body of the data can become invisible. There is a rich literature on robust parametric methods in which different kinds of outlier influence are discussed. There are a number of diagnostic techniques for parametric models which can usually cope with outliers. However, with some parametric models one may not even be able to diagnose an implausible value, since the parameters could be completely distorted by the outliers. This is true in particular for isolated (leverage) points in the predictor variable $X$. An example is given in Rousseeuw and Yohai (1984) in which a linear regression line fitted a few outliers but missed the main body of the data. Nonparametric smoothing provides a versatile pre-screening method for outliers in the $x$-direction without reference to a specific parametric model. Figure 1.8 shows a nonparametric smoother applied to an analysis of simulated side impact studies. The curve shown is an approximation to the probability of a fatal injury as a function of anthropometric and biokinetic parameters. The $Y$-ordinates are binary in this case ($Y=1$ denoting fatal injury). The curve shows visually what could also be derived from an influence analysis: it makes a dip at the isolated $x$-points on the far right. The points could be identified as observations from young persons who had a rather unusual reaction behavior in these experiments; see Kallieris and Mattern (1984). This example is discussed in more detail in Section 10.4.

Figure 1.8: Indicators of fatal injury $(Y=1)$ as a function of an injury stress index together with an estimate of the regression curve. From Härdle and Scott (1992).
\includegraphics[scale=0.2]{ANR1,8.ps}

Missing data is a problem quite often encountered in practice. Some response variables may not have been recorded since an instrument broke down or a certain entry on an inquiry form was not answered. Nonparametric smoothing bridges the gap of missing data by interpolating between adjacent data points, whereas parametric models would involve all the observations in the interpolation. An approach in spatial statistics is to interpolate points by the ``kriging'' method. This method is used by statisticians in hydrology, mining and petroleum engineering, and it is related to predicting values of noisy data in a nonparametric fashion; see Yakowitz and Szidarovsky (1986). Schmerling and Peil (1985) use local polynomial interpolation in anatomy to extrapolate missing data.