14.1 Nonparametric Regression

In this section we introduce several basic terms and ideas from the theory of nonparametric regression and explain in particular the method of local polynomial regression. To conclude, we explain how this method can be applied to (financial) time series. A detailed treatment can be found in Härdle et al. (2004).

In nonparametric regression one is interested in the (functional) relationship between an explanatory variable $ X$ and a dependent variable $ Y$, i.e., one is interested in obtaining an estimate of the unknown function $ m(x) = \mathsf{E}[Y \, \vert \, X=x]$. In doing this, in contrast to parametric statistics, no special assumptions on the form of the function $ m$ are made. Only certain regularity and smoothness assumptions are imposed on $ m$.

One way to estimate $ m$ is the method of local polynomial regression (LP method). The idea is based on the fact that the function $ m$ can be locally approximated by a Taylor polynomial, i.e., in a neighborhood of a given point $ x_0$ it holds that

$\displaystyle m(x) \approx \sum_{k=0}^p \frac{m^{(k)}(x_0)}{k!}(x-x_0)^k.$ (14.2)
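For $ p=1$, for example, (14.2) reduces to the local linear approximation

$\displaystyle m(x) \approx m(x_0) + m'(x_0)(x-x_0), $

so that the constant term of the approximating polynomial corresponds to $ m(x_0)$ itself and the slope to the first derivative $ m'(x_0)$.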

In order to find an estimate for $ m$ at the point $ x_0$, one therefore tries to find, based on the observations $ (X_1, Y_1), \dots, (X_n, Y_n)$, a polynomial that approximates $ m$ well around $ x_0$. As a measure of the quality of the approximation one usually chooses a least squares (LS) criterion, i.e., one wants to minimize the expression

$\displaystyle \sum_{i=1}^n \Big\{ Y_{i} - \sum_{j=0}^p \beta_j (X_{i} - x_0)^j \Big\}^2$ (14.3)

with respect to $ \beta = (\beta_0,\dots,\beta_p)^\top $. Since the representation (14.2) holds only locally, one still has to take into account that some of the observations $ X_i$ may not lie close enough to $ x_0$, so that (14.2) no longer applies to them. The observations therefore have to be localized appropriately, i.e., only those observations that lie close enough to $ x_0$ should be considered.

One of the classical methods for localization is based on weighting the data with the help of a kernel. A kernel is a function $ K: \mathbb{R}\longrightarrow [0,\infty)$ with $ \int K(u)\, du =1$. The most commonly used kernels are in addition symmetric and vanish outside a suitable interval around zero.

If $ K$ is a kernel and $ h > 0$, then the kernel $ K_h$ re-scaled with the bandwidth $ h$,

$\displaystyle K_h (u) = \frac{1}{h} K\Big( \frac{u}{h} \Big), $

again integrates to 1. If, for example, the initial kernel $ K$ vanishes outside of the interval $ [-1,1]$, then $ K_h$ is zero outside of the interval $ [-h,h]$. By weighting the $ i$-th term in (14.3) with $ K_h (x-X_i)$, one obtains a minimization problem which, thanks to this localization, can be formulated for an arbitrary point $ x$ rather than for a fixed point $ x_0$. The coefficient vector $ \hat{\beta} = \hat{\beta}(x) = (\hat{\beta}_0 (x), \dots, \hat{\beta}_p (x))^\top $ that determines the approximating polynomial at the point $ x$ is thus given by

$\displaystyle \hat{\beta} = \arg \min_{\beta} \sum_{i=1}^n \Big\{ Y_{i} - \sum_{j=0}^p \beta_j (X_i-x)^j \Big\}^2 \, K_h(x-X_i).$ (14.4)

It is obvious that $ \hat{\beta}$ depends heavily on the choice of the kernel $ K$ and the bandwidth $ h$. Different methods for choosing $ K$ and $ h$ are discussed in Härdle et al. (2004).
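For illustration, the following is a minimal Python sketch (not taken from the text) of a kernel that vanishes outside $ [-1,1]$, here the Epanechnikov kernel, together with its re-scaled version $ K_h$; the function names are our own choices.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def kernel_h(u, h, kernel=epanechnikov):
    """Re-scaled kernel K_h(u) = K(u / h) / h; zero outside [-h, h]."""
    return kernel(np.asarray(u, dtype=float) / h) / h
```

With a bandwidth $ h$, the weights $ K_h(x-X_i)$ in (14.4) are then positive only for observations within distance $ h$ of $ x$; enlarging $ h$ includes more observations and produces a smoother estimate.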

With the representation

$\displaystyle \mathbf{X} = \left( \begin{array}{ccccc}
1 & X_1-x & (X_1-x)^2 & \dots & (X_1-x)^p \\
\vdots & \vdots & \vdots & & \vdots \\
1 & X_n-x & (X_n-x)^2 & \dots & (X_n-x)^p \\
\end{array} \right), \qquad
\mathbf{Y} = \left( \begin{array}{c}
Y_1 \\
\vdots \\
Y_n \\
\end{array} \right),$

$\displaystyle \mathbf{W} = \left( \begin{array}{ccc}
K_h (x-X_1) & & 0 \\
& \ddots & \\
0 & & K_h(x-X_n) \\
\end{array} \right)$

the solution $ \hat{\beta}$ of the weighted least squares problem (14.4) can be written explicitly as

$\displaystyle \hat{\beta}(x) = \big( \mathbf{X}^\top \mathbf{W X} \big)^{-1} \, \mathbf{X}^\top \mathbf{W Y}$ (14.5)

The estimate $ \hat{m}(x)$ of $ m(x)$ is then obtained simply by evaluating the approximating polynomial at $ x$:

$\displaystyle \hat{m}(x) = \hat{\beta}_0 (x).$ (14.6)

Because of equations (14.2) and (14.3), the remaining components of $ \hat{\beta}(x)$ deliver estimators for the derivatives of $ m$, namely $ \hat{m}^{(j)}(x) = j! \, \hat{\beta}_j (x)$, $ j=1, \dots, p$, which will not be discussed in further detail here. In the special case $ p=0$, $ \hat{m}(x)$ is the classical kernel estimator of Nadaraya-Watson type, see Härdle (1990).
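To make the construction concrete, here is a minimal Python sketch of the estimator (14.5)-(14.6), computed directly from the weighted least squares formula. It reuses the `epanechnikov` helper from the sketch above; the function name `local_polynomial` and the NumPy-based implementation are our own choices, not part of the text.

```python
import numpy as np

def local_polynomial(x, X, Y, h, p=1, kernel=epanechnikov):
    """Local polynomial fit at the point x: returns beta_hat(x), cf. (14.4)-(14.5).

    beta[0] is the estimate m_hat(x) from (14.6);
    j! * beta[j] estimates the j-th derivative of m at x.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    d = X - x                          # centred regressors X_i - x
    w = kernel(d / h) / h              # weights K_h(x - X_i), K symmetric
    keep = w > 0                       # drop observations with zero weight
    d, w, y = d[keep], w[keep], Y[keep]
    # requires at least p + 1 observations with positive weight
    D = np.vander(d, N=p + 1, increasing=True)   # design matrix X of (14.5)
    W = np.diag(w)
    # solve (X^T W X) beta = X^T W Y rather than inverting explicitly
    beta = np.linalg.solve(D.T @ W @ D, D.T @ W @ y)
    return beta
```

Evaluating `local_polynomial(x, X, Y, h)[0]` on a grid of points $ x$ traces out the curve $ \hat{m}$; with `p=0` the same code reproduces the Nadaraya-Watson type estimator mentioned above.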

The method of local polynomial approximation derived in this way, the LP method for short, will now be applied to a time series $ (Y_i)$. As mentioned before, one is primarily interested in producing forecasts.

In the simplest case, a one-step-ahead forecast means that the functional relationship between $ Y_{i-1}$ and a function $ \lambda(Y_i)$ of $ Y_i$ is analyzed, i.e., we want to obtain an estimate of the unknown function

$\displaystyle m(x) = \mathsf{E}\big[ \lambda(Y_i)\, \vert\, Y_{i-1} = x \big].$

In order to apply the LP method described above, consider a given sample $ Y_0, \dots, Y_n$ as observations of the form $ (Y_0, Y_1),\dots,(Y_{n-1}, Y_n)$. The process $ (Y_i)$ must fulfil certain conditions so that these observations are identically distributed and, in particular, so that the function $ m$ does not depend on the time index $ i$. This is the case when $ (Y_i)$ is stationary. Substituting $ X_{i} = Y_{i-1}$ into (14.4) and replacing $ Y_i$ by $ \lambda(Y_i)$, we obtain in this situation

$\displaystyle \hat{\beta} = \arg \min_{\beta} \sum_{i=1}^n \Big\{\lambda(Y_i) - \sum_{j=0}^p \beta_j (Y_{i-1}-x)^j \Big\}^2 \, K_h(x-Y_{i-1}) ,$ (14.7)

and the estimate for $ m(x)$ is again given by $ \hat{\beta}_0 (x).$
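Under the stationarity assumption above, the one-step-ahead forecast can be sketched in a few lines of Python; this again reuses the `local_polynomial` helper from the previous sketch, and the function name `one_step_forecast` as well as the identity as default choice of $ \lambda$ are our own.

```python
import numpy as np

def one_step_forecast(y, h, p=1, lam=lambda v: v):
    """Forecast of E[lambda(Y_{n+1}) | Y_n = y[-1]] via (14.7).

    y : observed sample Y_0, ..., Y_n of a stationary series
    """
    y = np.asarray(y, dtype=float)
    X_lag = y[:-1]            # explanatory values Y_0, ..., Y_{n-1}
    Z = lam(y[1:])            # dependent values lambda(Y_1), ..., lambda(Y_n)
    beta = local_polynomial(y[-1], X_lag, Z, h=h, p=p)
    return beta[0]            # m_hat evaluated at the last observation Y_n
```

Choosing, for example, `lam=np.square` yields a forecast of the conditional second moment $ \mathsf{E}[Y_{n+1}^2 \,\vert\, Y_n]$, the kind of quantity needed for volatility forecasts.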