# 4.4 Confidence Regions and Tests

As in the case of density estimation, confidence intervals and bands can be based on the asymptotic normal distribution of the regression estimator. We restrict ourselves to the Nadaraya-Watson case in order to show the essential concepts. In the latter part of this section we address the related topic of specification tests, which test a parametric regression function against a nonparametric alternative.

## 4.4.1 Pointwise Confidence Intervals

Now that you have become familiar with nonparametric regression, you may want to know: How close is the smoothed curve to the true curve? Recall that we asked the same question when we introduced the method of kernel density estimation. There, we made use of (pointwise) confidence intervals and (global) confidence bands. But to construct this measure, we first had to derive the (asymptotic) sampling distribution.

The following theorem establishes the asymptotic distribution of the Nadaraya-Watson kernel estimator for one-dimensional predictor variables.

THEOREM 4.5
Suppose that $m$ and $f_X$ are twice differentiable, $\int |K(u)|^{2+\delta}\,du < \infty$ for some $\delta > 0$, $x$ is a continuity point of $\sigma^2(x)$ and $E(|Y|^{2+\delta}\,|\,X=x)$, and $f_X(x) > 0$. Take $h = c\,n^{-1/5}$. Then

$$\sqrt{nh}\left\{\widehat{m}_h(x) - m(x)\right\} \;\overset{L}{\longrightarrow}\; N(b_x, v_x^2)$$

with

$$b_x = c^{5/2}\,\mu_2(K)\left\{\frac{m''(x)}{2} + \frac{m'(x)\,f_X'(x)}{f_X(x)}\right\}, \qquad v_x^2 = \frac{\sigma^2(x)\,\|K\|_2^2}{f_X(x)}.$$
The asymptotic bias is proportional to the second moment of the kernel and a measure of local curvature of $m$. This measure of local curvature is not a function of $m$ alone but also of the marginal density. At maxima or minima (where $m'(x) = 0$), the bias is a multiple of $m''(x)$ alone; at inflection points (where $m''(x) = 0$) it is just a multiple of $m'(x)\,f_X'(x)/f_X(x)$.

We now use this result to define confidence intervals. Suppose that the bias is of negligible magnitude compared to the variance, e.g. if the bandwidth $h$ is sufficiently small. Then we can compute approximate confidence intervals with the following formula:

$$\left[\,\widehat{m}_h(x) - z_{1-\frac{\alpha}{2}}\sqrt{\frac{\widehat{\sigma}^2(x)\,\|K\|_2^2}{nh\,\widehat{f}_h(x)}},\;\; \widehat{m}_h(x) + z_{1-\frac{\alpha}{2}}\sqrt{\frac{\widehat{\sigma}^2(x)\,\|K\|_2^2}{nh\,\widehat{f}_h(x)}}\,\right] \qquad (4.55)$$

where $z_{1-\frac{\alpha}{2}}$ is the $(1-\frac{\alpha}{2})$-quantile of the standard normal distribution and the estimate of the variance $\sigma^2(x)$ is given by

$$\widehat{\sigma}^2(x) = \frac{1}{n}\sum_{i=1}^{n} W_{hi}(x)\left\{Y_i - \widehat{m}_h(x)\right\}^2$$

with $W_{hi}(x)$ denoting the weights from the Nadaraya-Watson estimator.
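As a numerical illustration, interval (4.55) can be computed directly from the data. The sketch below uses a Gaussian kernel (so $\|K\|_2^2 = 1/(2\sqrt{\pi})$) and ignores the smoothing bias, as assumed above; the function names are ours, not from the text.

```python
import numpy as np

def gauss_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def nw_ci(x_grid, X, Y, h, z=1.96):
    """Nadaraya-Watson estimate with approximate pointwise confidence
    intervals as in (4.55); the smoothing bias is ignored, i.e. the
    bandwidth is assumed small enough for the bias to be negligible."""
    K2 = 1.0 / (2.0 * np.sqrt(np.pi))   # ||K||_2^2 for the Gaussian kernel
    n = len(X)
    mhat = np.empty_like(x_grid)
    lower = np.empty_like(x_grid)
    upper = np.empty_like(x_grid)
    for j, x in enumerate(x_grid):
        w = gauss_kernel((x - X) / h)           # kernel weights at x
        fhat = w.sum() / (n * h)                # kernel density estimate
        m = (w * Y).sum() / w.sum()             # Nadaraya-Watson estimate
        s2 = (w * (Y - m) ** 2).sum() / w.sum() # local variance estimate
        half = z * np.sqrt(s2 * K2 / (n * h * fhat))
        mhat[j], lower[j], upper[j] = m, m - half, m + half
    return mhat, lower, upper
```

The default `z=1.96` corresponds to a pointwise 95% interval; for other levels pass the appropriate normal quantile.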

EXAMPLE 4.14
Figure 4.15 shows the Engel curve from the 1973 U.K. net-income versus food expenditure example with confidence intervals. As we can see, the bump in the right part of the regression curve is not significant at the 5% level.

## 4.4.2 Confidence Bands

As we have seen in the density case, uniform confidence bands for $m(\cdot)$ need rather restrictive assumptions. The derivation of uniform confidence bands is again based on Bickel & Rosenblatt (1973).

THEOREM 4.6
Suppose that the support of $X$ is $[0,1]$, $f_X(x) > 0$ on $[0,1]$, and that $m(\cdot)$, $f_X(\cdot)$ and $\sigma(\cdot)$ are twice differentiable. Moreover, assume that $K$ is differentiable with support $[-A, A]$ with $K(-A) = K(A) = 0$, and that $E(|Y|^k\,|\,X=x)$ is bounded for all $k$. Then for $h = n^{-\delta}$, $\delta \in \left(\frac{1}{5}, \frac{1}{2}\right)$,

$$P\left(\text{for all } x \in [0,1]:\; \widehat{m}_h(x) - z_{n,\alpha}\sqrt{\frac{\widehat{\sigma}^2(x)\,\|K\|_2^2}{nh\,\widehat{f}_h(x)}} \;\le\; m(x) \;\le\; \widehat{m}_h(x) + z_{n,\alpha}\sqrt{\frac{\widehat{\sigma}^2(x)\,\|K\|_2^2}{nh\,\widehat{f}_h(x)}}\right) \longrightarrow 1 - \alpha,$$

where

$$z_{n,\alpha} = d_n + \frac{-\log\left\{-\frac{1}{2}\log(1-\alpha)\right\}}{(2\delta\log n)^{1/2}}, \qquad d_n = (2\delta\log n)^{1/2} + (2\delta\log n)^{-1/2}\log\!\left(\frac{\|K'\|_2}{2\pi\,\|K\|_2}\right).$$
In practice, the data are transformed to the interval $[0,1]$, then the confidence bands are computed and rescaled to the original scale of $X$.
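For orientation, such a Gumbel-type critical value can be evaluated numerically. The sketch below follows the classical Bickel & Rosenblatt form, with the kernel constants $\|K\|_2$ and $\|K'\|_2$ supplied by the caller; treat the exact constants inside $d_n$ as an assumption to be checked against the theorem, and the function name as ours.

```python
import numpy as np

def band_quantile(n, delta, alpha, K_norm, dK_norm):
    """Gumbel-type critical value z_{n,alpha} for uniform confidence
    bands, following the classical Bickel & Rosenblatt (1973) form
    (a sketch; constants should be checked against the theorem).
    K_norm = ||K||_2 and dK_norm = ||K'||_2 for the chosen kernel."""
    c = 2.0 * delta * np.log(n)
    d_n = np.sqrt(c) + np.log(dK_norm / (2.0 * np.pi * K_norm)) / np.sqrt(c)
    # Gumbel quantile: solve exp(-2 exp(-x)) = 1 - alpha for x
    x_alpha = -np.log(-0.5 * np.log(1.0 - alpha))
    return d_n + x_alpha / np.sqrt(c)
```

For the quartic kernel $K(u) = \frac{15}{16}(1-u^2)^2$ on $[-1,1]$ one has $\|K\|_2^2 = 5/7$ and $\|K'\|_2^2 = 15/7$; the resulting $z_{n,\alpha}$ is noticeably larger than the pointwise normal quantile, reflecting the uniformity of the band.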

The following comprehensive example covers local polynomial kernel regression as well as optimal smoothing parameter selection and confidence bands.

EXAMPLE 4.15
The behavior of foreign exchange (FX) rates has been the subject of many recent investigations. A correct understanding of the foreign exchange rate dynamics has important implications for international asset pricing theories, the pricing of contingent claims and policy-oriented questions.

In the past, one of the most important exchange rates was that of the Deutsche Mark (DM) to the US Dollar (USD). The data that we consider here are from Olsen & Associates, Zürich. They contain the quotes recorded during the period from October 1, 1992 to September 30, 1993. The data have been transformed as described in Bossaerts et al. (1996).

We present now the regression smoothing approach with local linear estimation of the conditional mean (mean function) and the conditional variance (variance function) of the FX returns

$$Y_t = \log(S_t / S_{t-1})$$

with $S_t$ being the FX rates. An extension of the autoregressive conditional heteroscedasticity model (ARCH model) is the conditional heteroscedastic autoregressive nonlinear model (CHARN model)

$$Y_t = m(Y_{t-1}) + \sigma(Y_{t-1})\,\xi_t. \qquad (4.56)$$

The task is to estimate the mean function $m(\cdot)$ and the variance function $\sigma^2(\cdot)$. As already mentioned, we use local linear estimation here. For details on the assumptions and asymptotics of the local polynomial procedure in time series see Härdle & Tsybakov (1997). Here, local linear estimation means solving the following weighted least squares problems:

$$\widehat{\beta} = \arg\min_{\beta}\sum_{t=2}^{n}\left\{Y_t - \beta_0 - \beta_1(Y_{t-1} - x)\right\}^2 K_h(Y_{t-1} - x),$$
$$\widehat{\gamma} = \arg\min_{\gamma}\sum_{t=2}^{n}\left\{Y_t^2 - \gamma_0 - \gamma_1(Y_{t-1} - x)\right\}^2 K_h(Y_{t-1} - x).$$

Denoting the true regression function of $Y_t^2$ on $Y_{t-1}$ by

$$s(x) = E(Y_t^2\,|\,Y_{t-1} = x) = m^2(x) + \sigma^2(x),$$

then the estimators of $m(x)$ and $s(x)$ are the first elements of the vectors $\widehat{\beta}$ and $\widehat{\gamma}$, respectively. Consequently, a possible variance estimate is

$$\widehat{\sigma}^2(x) = \widehat{s}(x) - \widehat{m}^2(x)$$

with

$$\widehat{m}(x) = e_0^{\top}\widehat{\beta}, \qquad \widehat{s}(x) = e_0^{\top}\widehat{\gamma},$$

and $e_0 = (1, 0)^{\top}$ the first unit vector in $\mathbb{R}^2$.

The estimated functions are plotted together with approximate 95% confidence bands, which can be obtained from the asymptotic normal distribution of the local polynomial estimator. The cross-validation optimal bandwidth is used for the local linear estimation of the mean function in Figure 4.16. As indicated by the 95% confidence bands, the estimation is not very robust at the boundaries. Therefore, Figure 4.16 covers a truncated range. Analogously, the variance estimate is shown in Figure 4.17, using the cross-validation optimal bandwidth .

The basic results are the mean reversion and the "smiling" shape of the conditional variance. Conditional heteroscedasticity appears to be very distinct. For DM/USD a "reverted leverage effect" can be observed, meaning that the conditional variance is higher for positive lagged returns than for negative ones of the same size. But note that the difference is still within the 95% confidence bands.

## 4.4.3 Hypothesis Testing

In this book we do not treat the topic of testing as a topic of its own, being aware that this would be an enormous task. Instead, we concentrate on cases where regression estimators have a direct application in specification testing. We concentrate on methodology only and skip any discussion of efficiency.

As this is the first section in which we deal with testing, let us start with some brief but general considerations about non- and semiparametric testing. Firstly, you should free your mind of what you know about testing in the parametric world. No parameter has been estimated so far; consequently, significance tests or linear restrictions on parameters cannot be the target of interest. Looking at our nonparametric estimates, typical questions that may arise are:

• Is there indeed an impact of $X$ on $Y$?
• Is the estimated function $\widehat{m}$ significantly different from the traditional parameterization (e.g. the linear or log-linear model)?
Secondly, in contrast to parametric regression, with non- and semiparametrics the problems of estimation and testing are not equivalent anymore. We speak here of equivalence in the sense that, in the parametric world, interval estimation corresponds to parameter testing. It turns out that the optimal rates of convergence are different for nonparametric estimation and nonparametric testing. As a consequence, the choice of smoothing parameter is an issue to be discussed separately in both cases. Moreover, the optimality discussion for nonparametric testing is in general quite a controversial one and far from being obvious. This unfortunately concerns all aspects of nonparametric testing. For instance, the construction of confidence bands around a nonparametric function estimate to decide whether it is significantly different from being linear, can lead to a much too conservative and thus inefficient testing procedure.

Let us now turn to the fundamentals of nonparametric testing. Indeed, the appropriateness of a parametric model may be judged by comparing the parametric fit with a nonparametric estimator. This can be done in various ways, e.g. you may use a (weighted) squared deviation between the two models. A simple (but in many situations inefficient) approach would be to use critical values from the asymptotic distribution of this statistic. Better results are usually obtained by approximating the distribution of the test statistic using a resampling method.

Before introducing a specific test statistic we have to specify the null hypothesis $H_0$ and the alternative $H_1$. To make it easy, let us start with a nonparametric regression $E(Y|X=x) = m(x)$. Our first null hypothesis is that $X$ has no impact on $Y$. If we assume $EY = 0$ (otherwise take $Y - \overline{Y}$), then we may be interested to test

$$H_0: m(\cdot) \equiv 0 \quad \textrm{versus} \quad H_1: m(\cdot) \neq 0.$$

As throughout this chapter, we do not want to make any assumptions about the function $m$ other than smoothness conditions. Having an estimate of $m$ at hand, e.g. the Nadaraya-Watson estimate $\widehat{m}_h$ from (4.6), a natural measure for the deviation from zero is

$$T = \int \widehat{m}_h^2(x)\,w(x)\,dx \qquad (4.57)$$

where $w(x)$ denotes a weight function (typically chosen by the empirical researcher). This weight function often serves to trim the boundaries or regions of sparse data. If the weight function is equal to $f_X(x)\bar{w}(x)$, i.e.

$$w(x) = f_X(x)\,\bar{w}(x)$$

with $f_X$ being the density of $X$ and $\bar{w}$ another weight function, one could take the empirical version of (4.57),

$$\widehat{T} = \frac{1}{n}\sum_{i=1}^{n} \widehat{m}_h^2(X_i)\,\bar{w}(X_i), \qquad (4.58)$$

as a test statistic.
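A direct empirical version of (4.58) is easy to code. The sketch below uses a Gaussian-kernel Nadaraya-Watson estimate and a constant weight $\bar{w} \equiv 1$; the function names are ours.

```python
import numpy as np

def nw(x, X, Y, h):
    """Nadaraya-Watson estimate at x with a Gaussian kernel."""
    w = np.exp(-0.5 * ((x - X) / h) ** 2)
    return (w * Y).sum() / w.sum()

def T_hat(X, Y, h, wbar=lambda x: 1.0):
    """Empirical deviation-from-zero statistic (4.58): the average of
    mhat(X_i)^2 weighted by wbar(X_i)."""
    mhat = np.array([nw(x, X, Y, h) for x in X])
    wts = np.array([wbar(x) for x in X])
    return np.mean(mhat**2 * wts)
```

Under $H_0$ the statistic is close to zero (up to the variance of the smoother), while any systematic regression function inflates it.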

It is clear that under $H_0$ both test statistics $T$ and $\widehat{T}$ must converge to zero, whereas under $H_1$ the condition $\int m^2(x)\,w(x)\,dx > 0$ lets the statistics increase to infinity. Note that under $H_0$ our estimate $\widehat{m}_h$ does not have any bias (cf. Theorem 4.3) that could matter in the squared deviation $\widehat{m}_h^2$. Actually, with the same assumptions we needed for the kernel estimator $\widehat{m}_h$, we find that under the null hypothesis $nh^{1/2}T$ and $nh^{1/2}\widehat{T}$ converge to a normal distribution,

$$nh^{1/2}\,\widehat{T} \;\overset{L}{\longrightarrow}\; N(e_h, V), \qquad e_h = h^{-1/2}\,\|K\|_2^2 \int \sigma^2(x)\,\bar{w}(x)\,dx, \qquad V = 2\,\|K \star K\|_2^2 \int \sigma^4(x)\,\bar{w}^2(x)\,dx, \qquad (4.59)$$

where $K \star K$ denotes the convolution of the kernel with itself. As used previously, $\sigma^2(x)$ denotes the conditional variance $\mathop{Var}(Y|X=x)$.

Let us now consider the more general null hypothesis. Suppose we are interested in a specific parametric model given by $E(Y|X=x) = m_\theta(x)$, where $m_\theta$ is a (parametric) function, known up to the parameter $\theta$. This means

$$H_0: m(\cdot) = m_\theta(\cdot) \quad \textrm{versus} \quad H_1: m(\cdot) \neq m_\theta(\cdot).$$

A consistent estimator $\widehat{\theta}$ for $\theta$ is usually easy to obtain (by least squares, maximum likelihood, or as a moment estimator, for example). The analog to statistic (4.58) is then obtained by using the deviation of $\widehat{m}_h$ from $m_{\widehat{\theta}}$, i.e.

$$\widehat{T}_1 = \frac{1}{n}\sum_{i=1}^{n}\left\{\widehat{m}_h(X_i) - m_{\widehat{\theta}}(X_i)\right\}^2 \bar{w}(X_i). \qquad (4.60)$$

However, this test statistic involves the following problem: whereas $m_{\widehat{\theta}}$ is (asymptotically) unbiased and converges at rate $\sqrt{n}$, our nonparametric estimate $\widehat{m}_h$ has a "kernel smoothing" bias and converges at rate $\sqrt{nh}$. For that reason, Härdle & Mammen (1993) propose to introduce an artificial bias by replacing $m_{\widehat{\theta}}$ with its Nadaraya-Watson smooth

$$\widetilde{m}_{\widehat{\theta}}(x) = \frac{\sum_{i=1}^{n} K_h(x - X_i)\,m_{\widehat{\theta}}(X_i)}{\sum_{i=1}^{n} K_h(x - X_i)} \qquad (4.61)$$

in statistic (4.60). More specifically, we use

$$\widehat{T}_2 = \frac{1}{n}\sum_{i=1}^{n}\left\{\widehat{m}_h(X_i) - \widetilde{m}_{\widehat{\theta}}(X_i)\right\}^2 \bar{w}(X_i). \qquad (4.62)$$

As a result of this, under $H_0$ the bias of $\widehat{m}_h$ cancels out that of $\widetilde{m}_{\widehat{\theta}}$ and the convergence rates are also the same.
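The bias-matching construction can be sketched as follows: the same Nadaraya-Watson smoother is applied both to the observations and to the fitted parametric values, so that under $H_0$ both terms carry the same smoothing bias. Function names are illustrative, and $\bar{w} \equiv 1$ is used.

```python
import numpy as np

def nw_smooth(X, targets, h):
    """Apply the Nadaraya-Watson smoother at the sample points; with
    targets = m_thetahat(X_i) this is the artificially biased
    parametric fit of (4.61)."""
    out = np.empty(len(X))
    for j, x in enumerate(X):
        w = np.exp(-0.5 * ((x - X) / h) ** 2)
        out[j] = (w * targets).sum() / w.sum()
    return out

def T2_hat(X, Y, m_param, h):
    """Statistic (4.62): squared difference between the kernel fit of
    the data and the kernel-smoothed parametric fit."""
    return np.mean((nw_smooth(X, Y, h) - nw_smooth(X, m_param(X), h)) ** 2)
```

When the parametric model is misspecified, the statistic stays away from zero; under a correct model only smoothed noise remains.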

EXAMPLE 4.16
Consider the expected wage ($Y$) as a function of years of professional experience ($X$). The common parameterization for this relationship is the quadratic

$$E(Y|X=x) = m_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2,$$

and we are interested in verifying this quadratic form. So, we firstly estimate $\theta$ by least squares and set $m_{\widehat{\theta}}(x) = \widehat{\theta}_0 + \widehat{\theta}_1 x + \widehat{\theta}_2 x^2$. Secondly, we calculate the kernel estimates $\widehat{m}_h$ and $\widetilde{m}_{\widehat{\theta}}$, as in (4.6) and (4.61). Finally, we apply test statistic (4.62) to these two smoothers. If the statistic is "large", we reject $H_0$.
The remaining question of the example is: how do we find the critical value for "large"? The typical approach in parametric statistics is to obtain the critical value from the asymptotic distribution. This is principally possible in our nonparametric problem as well:

THEOREM 4.7
Assume the conditions of Theorem 4.3, and further that $\widehat{\theta}$ is a $\sqrt{n}$-consistent estimator for $\theta$ and $h = O(n^{-1/5})$. Then

$$nh^{1/2}\,\widehat{T}_2 \;\overset{L}{\longrightarrow}\; N(e_h, V)$$

with $e_h$ and $V$ as in (4.59).

As in the parametric case, we have to estimate the variance expression in the normal distribution. However, with an appropriate estimate for $V$ this is no obstacle. The main practical problem here is the very slow convergence of $\widehat{T}_2$ towards the normal distribution.

For that reason, approximations of the critical values corresponding to the finite sample distribution are used. The most popular way to approximate this finite sample distribution is via a resampling scheme: simulate the distribution of your test statistic under the hypothesis (i.e. "resample") and determine the critical values based on that simulated distribution. This method is called a Monte Carlo method or bootstrap, depending on how the distribution of the test statistic is simulated. Depending on the context, different resampling procedures have to be applied. Later on, for each particular case we will introduce not only the test statistic but also an appropriate resampling method.

For our current testing problem possibly the most popular resampling method is the so-called wild bootstrap introduced by Wu (1986). One of its advantages is that it allows for a heterogeneous variance in the residuals. Härdle & Mammen (1993) introduced the wild bootstrap into the context of nonparametric hypothesis testing as considered here. The principal idea is to resample from the residuals $\widehat{\varepsilon}_i = Y_i - m_{\widehat{\theta}}(X_i)$, $i = 1, \ldots, n$, that we got under the null hypothesis. Each bootstrap residual $\varepsilon_i^*$ is drawn from a distribution that coincides with the distribution of $\widehat{\varepsilon}_i$ up to the first three moments. The testing procedure then consists of the following steps:

(a)
Estimate the regression function $m_{\widehat{\theta}}$ under the null hypothesis and construct the residuals $\widehat{\varepsilon}_i = Y_i - m_{\widehat{\theta}}(X_i)$.
(b)
For each $i$, draw a bootstrap residual $\varepsilon_i^*$ so that

$$E(\varepsilon_i^*) = 0, \qquad E(\varepsilon_i^{*2}) = \widehat{\varepsilon}_i^{\,2}, \qquad E(\varepsilon_i^{*3}) = \widehat{\varepsilon}_i^{\,3}.$$

(c)
Generate a bootstrap sample $\{(X_i, Y_i^*)\}_{i=1}^{n}$ by setting

$$Y_i^* = m_{\widehat{\theta}}(X_i) + \varepsilon_i^*.$$

(d)
From this sample, calculate the bootstrap test statistic $\widehat{T}^*$ in the same way as the original $\widehat{T}$ is calculated.
(e)
Repeat steps (b) to (d) $n_{boot}$ times ($n_{boot}$ being several hundred or thousand) and use the $n_{boot}$ generated test statistics to determine the quantiles of the test statistic under the null hypothesis. This gives you approximate values for the critical values for your test statistic $\widehat{T}$.
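Steps (a) to (e) can be wired together generically. In this sketch, `m_param_fit` and `statistic` are caller-supplied functions (hypothetical names, not from the text), and a two-point distribution matching the first three moments, the golden cut described below, is used for step (b).

```python
import numpy as np

def wild_bootstrap_pvalue(X, Y, m_param_fit, statistic, n_boot=500, seed=0):
    """Wild bootstrap p-value for a specification test, following
    steps (a)-(e); m_param_fit(X, Y) returns the fitted values of the
    null model, statistic(X, Y) returns the test statistic."""
    rng = np.random.default_rng(seed)
    fitted = m_param_fit(X, Y)               # (a) fit under the null
    resid = Y - fitted                       #     residuals under H0
    t_obs = statistic(X, Y)
    # golden cut two-point distribution (matches first three moments)
    a, b = (1 - np.sqrt(5)) / 2, (1 + np.sqrt(5)) / 2
    q = (5 + np.sqrt(5)) / 10
    t_star = np.empty(n_boot)
    for k in range(n_boot):
        scale = np.where(rng.random(len(Y)) < q, a, b)  # (b) draw eps*
        y_star = fitted + resid * scale                 # (c) bootstrap sample
        t_star[k] = statistic(X, y_star)                # (d) bootstrap statistic
    return np.mean(t_star >= t_obs)                     # (e) p-value
```

The returned value is the bootstrap p-value; rejecting when it falls below $\alpha$ is equivalent to comparing $\widehat{T}$ with the simulated $(1-\alpha)$-quantile.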

One famous method which fulfills the conditions in step (b) is the so-called golden cut method. Here we draw $\varepsilon_i^*$ from the two-point distribution with probability mass at

$$a = \frac{1-\sqrt{5}}{2}\,\widehat{\varepsilon}_i \quad \textrm{and} \quad b = \frac{1+\sqrt{5}}{2}\,\widehat{\varepsilon}_i,$$

occurring with probabilities $q = \frac{5+\sqrt{5}}{10}$ and $1-q$, respectively. In the second part of this book you will see more examples of Monte Carlo and bootstrap methods.
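These mass points and probabilities can be checked mechanically: with $a = \frac{1-\sqrt{5}}{2}$, $b = \frac{1+\sqrt{5}}{2}$ and $q = \frac{5+\sqrt{5}}{10}$ one verifies $qa + (1-q)b = 0$, $qa^2 + (1-q)b^2 = 1$ and $qa^3 + (1-q)b^3 = 1$, which gives the three moment conditions pointwise. A sketch of the draw (function name is ours):

```python
import numpy as np

def golden_cut_draw(resid, rng):
    """Draw wild bootstrap residuals by the golden cut method: a
    two-point distribution matching E(eps*) = 0, E(eps*^2) = resid^2
    and E(eps*^3) = resid^3 pointwise."""
    a = (1 - np.sqrt(5)) / 2   # negative mass point factor
    b = (1 + np.sqrt(5)) / 2   # positive mass point factor (golden ratio)
    q = (5 + np.sqrt(5)) / 10  # probability of the negative point
    pick = rng.random(resid.shape) < q
    return np.where(pick, a * resid, b * resid)
```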

Let us mention that, besides the type of test statistics that we introduced here, other distance measures are plausible. However, all test statistics can be considered as estimates of one of the following expressions:

$$\int \left\{m(x) - m_\theta(x)\right\}^2 w(x)\,dx, \qquad (4.63)$$
$$E\left[\left\{m(X) - m_\theta(X)\right\}\bar{\varepsilon}\,w(X)\right], \qquad (4.64)$$
$$E\left[\bar{\varepsilon}^{\,2}\,w(X)\right] - E\left[\varepsilon^2\,w(X)\right], \qquad (4.65)$$
$$\int \left\{\bar{\sigma}^2(x) - \sigma^2(x)\right\} w(x)\,dx, \qquad (4.66)$$

with $\bar{\varepsilon} = Y - m_\theta(X)$ being the residuum under $H_0$ at point $X$, and $w$ the weight function as above. Furthermore, $\bar{\sigma}^2$ is the error variance under the hypothesis, and $\sigma^2$ the one under the alternative. Obviously, our test statistics (4.58) and (4.62) are estimates of expression (4.63).

The question "Which is the best test statistic?" has no simple answer. An optimal test should keep the nominal significance level under the hypothesis and provide the highest power under the alternative. In practice, however, it turns out that the behavior of a specific test may depend on the model, the error distribution, the design density and the weight function. This has led to an increasing number of proposals for the testing problem considered here. We refer to the bibliographic notes for additional references to (4.63)-(4.66) and for further test approaches.