3.3 Proof of Proposition 3.1.1

The proof of this proposition (3.1.1) follows a technique used by Parzen (1962) in the setting of density estimation. Recall the definition of the kernel weights,

\begin{displaymath}W_{hi}(x)=K_h(x-X_i)/ \hat f_h(x).\end{displaymath}
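Written out (assuming, as in the definition of the kernel smoother, that $\hat m_h(x)=n^{-1}\sum^n_{i=1}W_{hi}(x)\,Y_i$), the estimator is the ratio of the two quantities studied below:

\begin{displaymath}
\hat m_h(x)=n^{-1}\sum^n_{i=1} W_{hi}(x)\,Y_i
={n^{-1}\sum^n_{i=1} K_h(x-X_i)\,Y_i \over \hat f_h(x)}
={\hat r_h(x)\over \hat f_h(x)}.
\end{displaymath}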

Consider the denominator and numerator separately. I show that
\begin{displaymath}
\hat r_h(x) = n^{-1} \sum^n_{i=1} K_h(x-X_i)
Y_i\ {\buildrel p \over \to} \ m(x) f(x)=r(x),
\end{displaymath} (3.3.14)


\begin{displaymath}
\hat f_h(x) = n^{-1} \sum^n_{i=1} K_h(x-X_i)
\ {\buildrel p \over \to} \ f(x).
\end{displaymath} (3.3.15)

From (3.3.14) and (3.3.15) it follows by Slutzky's Theorem (Schönfeld, 1969, Chapter 6) that

\begin{displaymath}\hat r_h(x)/\hat f_h(x)
\ {\buildrel p \over \to} \ r(x)/f(x) = {m(x)f(x)\over f(x)} = m(x). \end{displaymath}

Only (3.3.14) is shown; the statement (3.3.15) can be proved very similarly. Note that

\begin{displaymath}E \hat r_h(x)=\int \int K_h(x-u)y f(u,y) du dy,\end{displaymath}

where $f(u,y)$ denotes the joint density of the distribution of $(X,Y).$ Conditioning on $u$ gives

\begin{displaymath}\int K_h(x-u) r(u)du,\end{displaymath}

since

\begin{displaymath}m(u) = \int yf(y \vert u) dy = \int yf(u,y) dy/\int f(u,y) dy. \end{displaymath}
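In more detail (using only the definitions $r=mf$ and $f(u)=\int f(u,y)\,dy$):

\begin{displaymath}
E \hat r_h(x)=\int K_h(x-u)\left(\int y\,f(u,y)\,dy\right) du
=\int K_h(x-u)\,m(u)f(u)\,du=\int K_h(x-u)\,r(u)\,du.
\end{displaymath}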

Using integration by substitution it can be shown (see Lemma 3.1 in these Complements) that for $\delta >0$

\begin{eqnarray*}
\left\vert E \hat r_h(x)-r(x) \right\vert &\le& \sup_{\left\vert s \right\vert \le \delta}
\left\vert r(x-s)-r(x) \right\vert \int \left\vert K(s) \right\vert ds\\
& &{}+\delta^{-1} \sup_{\left\vert s \right\vert \ge \delta/h} \left\vert sK(s) \right\vert
\int \left\vert r(s) \right\vert ds
+\left\vert r(x) \right\vert \int_{\left\vert s \right\vert \ge \delta/h} \left\vert K(s)
\right\vert ds.
\end{eqnarray*}



The last two terms of this bound tend to zero, by (A1) and (A2), as $n \to \infty$. Now let $\delta$ tend to zero; then, by continuity of $r(\cdot)$, the first term tends to zero as well. This proves that $E\hat r_h(x) - r(x) = o(1)$ as $n \to \infty$. Now let $s^2(x)=E(Y^2 \vert X=x)$. Using integration by substitution and the asymptotic unbiasedness of $\hat r_h(x)$ just established, the variance of $\hat r_h(x)$ satisfies

\begin{eqnarray*}
var(\hat r_h(x))&=& n^{-2} \sum^n_{i=1} var(K_h(x-X_i) Y_i) \\
&\approx& n^{-1}\, E\left[ K_h^2(x-X)\, Y^2\right] \\
&=& n^{-1} \int K_h^2(x-u)\, s^2(u)\, f(u)\, du \\
&\approx& n^{-1} h^{-1} \int K^2(u)s^2(x+uh)f(x+uh)du.\\
\end{eqnarray*}



This is asymptotically equal to $n^{-1}h^{-1}\, s^2(x)f(x) \int K^2(u)\, du$, by the same technique of splitting up the integral as above. Observe now that the variance tends to zero as $nh \to \infty$. This completes the argument, since the mean squared error $E(\hat r_h(x)-r(x))^2 = var(\hat r_h(x)) + [E\hat r_h(x)-r(x)]^2 \to 0$ as $n \to \infty,\ nh \to \infty,\ h\to 0$. Thus we have seen that

\begin{displaymath}\hat r_h(x)
\ {\buildrel 2 \over \to} \ r(x).\end{displaymath}

This implies (see Schönfeld, 1969, Chapter 6)

\begin{displaymath}\hat r_h(x)
\ {\buildrel p \over \to} \ r(x).\end{displaymath}
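This last implication is the standard fact that convergence in mean square entails convergence in probability; a one-line sketch via Chebyshev's inequality: for any $\epsilon>0$,

\begin{displaymath}
P\left(\left\vert \hat r_h(x)-r(x)\right\vert>\epsilon\right)
\le \epsilon^{-2}\,E\left(\hat r_h(x)-r(x)\right)^2 \to 0.
\end{displaymath}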

The proof can be adapted to kernel estimation with higher dimensional $X$. If $X$ is $d$-dimensional, change $K_h(u)$ to $h^{-d} K(u/h)$, where $K\colon \mathbb{R}^d\to \mathbb{R}$ and the ratio in the argument of $K$ is understood coordinatewise.
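For illustration (one possible choice, not prescribed by the text), a product kernel built from a univariate kernel $K_1$ may be used; writing $X_{ij}$ for the $j$th coordinate of $X_i$,

\begin{displaymath}
K(u)=\prod_{j=1}^d K_1(u_j),\qquad
K_h(x-X_i)=h^{-d}\prod_{j=1}^d K_1\left({x_j-X_{ij}\over h}\right).
\end{displaymath}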

LEMMA 3.1   The estimator $\hat r_h(x)$ is asymptotically unbiased as an estimator for $r(x)$.

Use integration by substitution and the fact that the kernel integrates to one to bound

\begin{eqnarray*}
\left\vert E \hat r_h(x)-r(x) \right\vert &=&\left\vert \int K_h(x-u)(r(u)-r(x))\, du \right\vert\\
&\le& \int_{\left\vert s \right\vert \le \delta} \left\vert K_h(s)\right\vert
\left\vert r(x-s)-r(x) \right\vert ds
+\int_{\left\vert s \right\vert > \delta} \left\vert K_h(s)\right\vert
\left\vert r(x-s) \right\vert ds
+\int_{\left\vert s \right\vert > \delta} \left\vert K_h(s)\right\vert
\left\vert r(x) \right\vert ds\\
&=&T_{1n}+T_{2n}+T_{3n}.
\end{eqnarray*}



The first term can be bounded in the following way:

\begin{displaymath}T_{1n} \le \sup_{\left\vert s \right\vert \le \delta} \left\vert r(x-s)-r(x) \right\vert
\int \left\vert K(s) \right\vert ds. \end{displaymath}

The third term satisfies

\begin{displaymath}T_{3n} \le \left\vert r(x) \right\vert \int_{\left\vert s \right\vert \ge \delta/h} \left\vert K(s) \right\vert ds. \end{displaymath}

The second term can be bounded as follows:

\begin{eqnarray*}
T_{2n}&=& \int_{\left\vert s \right\vert > \delta} \left\vert K_h(s)\right\vert
\left\vert r(x-s) \right\vert ds\\
&\le& \delta^{-1} \sup_{\left\vert s \right\vert \ge \delta/h} \left\vert sK(s) \right\vert \int
\left\vert r(s) \right\vert ds.
\end{eqnarray*}



Note that the last integral exists by assumption (A3) of Proposition 3.1.1.
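The inequality used for $T_{2n}$ rests only on the following elementary estimate (spelled out here for completeness): for $\left\vert s\right\vert>\delta$,

\begin{displaymath}
\left\vert K_h(s)\right\vert = h^{-1}\left\vert K(s/h)\right\vert
=\left\vert s\right\vert^{-1}\left\vert {s\over h}\right\vert\left\vert K(s/h)\right\vert
\le \delta^{-1}\sup_{\left\vert u\right\vert\ge\delta/h}\left\vert u\,K(u)\right\vert,
\end{displaymath}

together with $\int_{\left\vert s\right\vert>\delta}\left\vert r(x-s)\right\vert ds\le\int\left\vert r(s)\right\vert ds$.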

3.3.1 Sketch of Proof for Proposition

The derivative estimator $\hat m_h^{(k)}(x)$ is asymptotically unbiased.

\begin{eqnarray*}
E\hat m_h^{(k)}(x) &=& n^{-1} h^{-(k+1)}
\sum_{i=1}^n K^{(k)}\left({x-X_i \over h}\right)\ m(X_i)\\
&\approx& h^{-k}\ \int K^{(k)}(u)\ m(x-uh)\, du\\
&=& h^{-k+1}\ \int K^{(k-1)}(u) \ m^{(1)}(x-uh)\, du\\
&=& \int K(u)\ m^{(k)}(x-uh)\, du\\
&\sim& m^{(k)}(x) + h^{d^{(k)}_K}\, m^{(k+2)}(x)/(k+2)!, \qquad h \to 0,
\end{eqnarray*} (3.3.16)

using integration by parts $k$ times together with (A0) and (A4).
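For completeness, one instance of these integrations by parts (assuming, as (A0) presumably ensures, that the boundary terms vanish):

\begin{displaymath}
\int K^{(k)}(u)\, m(x-uh)\, du
=\Big[K^{(k-1)}(u)\, m(x-uh)\Big]_{-\infty}^{\infty}
+h\int K^{(k-1)}(u)\, m^{(1)}(x-uh)\, du,
\end{displaymath}

so that $h^{-k}\int K^{(k)}(u)\,m(x-uh)\,du = h^{-k+1}\int K^{(k-1)}(u)\,m^{(1)}(x-uh)\,du$; iterating this $k$ times leads to $\int K(u)\,m^{(k)}(x-uh)\,du$.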

The variance of $\hat m_h^{(k)}(x)$ tends to zero if $nh^{2k+1}\to \infty$, as the following calculations show:

\begin{eqnarray*}
var\{ \hat m_h^{(k)}(x)\} &=& n^{-2}h^{-2(k+1)}\,\sum\limits_{i=1}^n
\left[ K^{(k)}\left({x-X_i\over h}\right) \right]^2 \sigma^2\\
&\approx& n^{-1}h^{-2k-1}\ \int [K^{(k)}(u)]^2 du \, \sigma^2.
\end{eqnarray*} (3.3.17)
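The approximation in the last line can be made plausible as follows (a sketch assuming an approximately equispaced fixed design on an interval of unit length, so that the sum may be replaced by an integral):

\begin{displaymath}
n^{-1}\sum_{i=1}^n\left[K^{(k)}\left({x-X_i\over h}\right)\right]^2
\approx \int\left[K^{(k)}\left({x-u\over h}\right)\right]^2 du
= h\int[K^{(k)}(u)]^2\,du,
\end{displaymath}

so that $var\{\hat m_h^{(k)}(x)\}\approx n^{-2}h^{-2(k+1)}\,\sigma^2\, n h\int[K^{(k)}(u)]^2\,du = n^{-1}h^{-2k-1}\,\sigma^2\int[K^{(k)}(u)]^2\,du$.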