3.3 Proof of Proposition

The proof of Proposition 3.1.1 follows a technique used by Parzen (1962) in the setting of density estimation. Recall the definition of the kernel weights,

\begin{displaymath}W_{hi}(x)=K_h(x-X_i)/ \hat f_h(x).\end{displaymath}

Consider the denominator and numerator separately. I show that

\begin{displaymath}\hat r_h(x) = n^{-1} \sum^n_{i=1} K_h(x-X_i)
Y_i\ {\buildrel p \over \to} \ m(x) f(x)=r(x),
\end{displaymath} (3.3.14)

\begin{displaymath}\hat f_h(x) = n^{-1} \sum^n_{i=1} K_h(x-X_i)
\ {\buildrel p \over \to} \ f(x).
\end{displaymath} (3.3.15)

From (3.3.14) and (3.3.15) it follows by Slutzky's Theorem (Schönfeld; 1969, Chapter 6) that

\begin{displaymath}\hat r_h(x)/\hat f_h(x)
\ {\buildrel p \over \to} \ r(x)/f(x) = {m(x)f(x)\over f(x)} = m(x). \end{displaymath}
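The ratio structure $\hat m_h(x)=\hat r_h(x)/\hat f_h(x)$ can be made concrete with a small simulation. The following is only a sketch under stated assumptions: the Epanechnikov kernel, the uniform design, the regression curve $m(x)=\sin(2\pi x)$, the noise level and the helper names `kernel_parts` and `m_hat` are all illustrative choices that do not come from the text.

```python
import math
import random

def epanechnikov(u):
    """Epanechnikov kernel (assumed choice): integrates to one, support [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def kernel_parts(x, xs, ys, h, kernel=epanechnikov):
    """Return (r_hat, f_hat) at the point x for bandwidth h."""
    n = len(xs)
    # K_h(x - X_i) = h^{-1} K((x - X_i) / h)
    weights = [kernel((x - xi) / h) / h for xi in xs]
    r_hat = sum(w * yi for w, yi in zip(weights, ys)) / n   # numerator estimate
    f_hat = sum(weights) / n                                # density estimate
    return r_hat, f_hat

def m_hat(x, xs, ys, h, kernel=epanechnikov):
    """Kernel regression estimate r_hat / f_hat at the point x."""
    r_hat, f_hat = kernel_parts(x, xs, ys, h, kernel)
    if f_hat == 0.0:
        raise ValueError("no observations within the kernel window")
    return r_hat / f_hat

# Hypothetical data: X ~ U(0, 1), Y = sin(2 pi X) + Gaussian noise.
rng = random.Random(0)
n = 2000
xs = [rng.random() for _ in range(n)]
ys = [math.sin(2.0 * math.pi * xi) + rng.gauss(0.0, 0.2) for xi in xs]

estimate = m_hat(0.25, xs, ys, h=0.1)   # true value m(0.25) = 1
```

Under these assumptions `estimate` should lie close to $m(0.25)=1$, with $\hat f_h(0.25)$ close to the uniform density value one.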

Only (3.3.14) is shown; the statement (3.3.15) can be proved very similarly. Note that

\begin{displaymath}E \hat r_h(x)=\int \int K_h(x-u)y f(u,y) du dy,\end{displaymath}

where $f(u,y)$ denotes the joint density of the distribution of $(X,Y).$ Conditioning on $u$ gives

\begin{displaymath}E \hat r_h(x)=\int K_h(x-u) r(u)du,\end{displaymath}

since

\begin{displaymath}m(u) = \int yf(y \vert u) dy = \int yf(u,y) dy/\int f(u,y) dy. \end{displaymath}

Using integration by substitution it can be shown (see Lemma 3.1 in these Complements) that for $\delta >0$

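\begin{eqnarray*}
\left\vert E \hat r_h(x)-r(x) \right\vert &\le& \sup_{\left\vert s \right\vert \le \delta} \left\vert r(x-s)-r(x) \right\vert \int \left\vert K(s) \right\vert ds \\
&& + \ \delta^{-1} \sup_{\left\vert s \right\vert \ge \delta/h} \left\vert sK(s) \right\vert \int \left\vert r(s) \right\vert ds \\
&& + \ \left\vert r(x) \right\vert \int_{\left\vert s \right\vert \ge \delta/h} \left\vert K(s) \right\vert ds.
\end{eqnarray*}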

The last two terms of this bound tend to zero, by (A1) and (A2), as $n \to \infty$. Now let $\delta$ tend to zero; then the first term by continuity of $r(\cdot)$ will tend to zero. This proves that $E\hat r_h(x) - r(x) = o(1)$, as $n \to \infty$. Now let $s^2(x)=E(Y^2 \vert X=x)$. Use integration by substitution and the above asymptotic unbiasedness of $\hat r_h(x)$ to see that the variance of $\hat r_h(x)$ is

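\begin{eqnarray*}
var(\hat r_h(x))&=& n^{-2} \sum^n_{i=1} var(K_h(x-X_i) Y_i) \\
&=& n^{-1} \left\{ \int K_h^2(x-u) s^2(u) f(u) du - \left( \int K_h(x-u) r(u) du \right)^2 \right\} \\
&\approx& n^{-1} h^{-1} \int K^2(u)s^2(x+uh)f(x+uh)du.
\end{eqnarray*}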

This is asymptotically equal to $n^{-1}h^{-1} \int K^2(u) du \ s^2(x)f(x)$, by the same technique of splitting up the integral as above. Observe now that the variance tends to zero as $nh \to \infty$. This completes the argument since the mean squared error $E(\hat r_h(x)-r(x))^2 = var(\hat r_h(x)) + [E\hat r_h(x)-r(x)]^2 \to 0$ as $n \to \infty,\ nh \to \infty,\ h\to 0$. Thus we have seen that

\begin{displaymath}\hat r_h(x)
\ {\buildrel 2 \over \to} \ r(x).\end{displaymath}

This implies

\begin{displaymath}\hat r_h(x)
\ {\buildrel p \over \to} \ r(x),\end{displaymath}

(see Schönfeld; 1969, chapter 6). The proof can be adapted to kernel estimation with higher dimensional $X$. If $X$ is $d$-dimensional, change $K_h$ to $h^{-d} K(x/h)$, where $K$: $\mathbb{R}^d\to \mathbb{R}$ and the ratio in the argument of $K$ has to be understood coordinatewise.
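Returning to the one-dimensional case, the variance approximation used in the proof can be checked numerically. The sketch below compares $n^{-1}\{E[K_h^2(x-X)Y^2] - (E[K_h(x-X)Y])^2\}$, evaluated by a midpoint rule, with the leading term $n^{-1}h^{-1}\int K^2(u)du\ s^2(x)f(x)$. Everything specific here is an assumption made for illustration: the Epanechnikov kernel, a standard normal design density, $m(u)=\sin(u)$, the noise variance `SIGMA2` and the helper names.

```python
import math

SIGMA2 = 0.25  # assumed conditional noise variance of Y given X

def epanechnikov(u):
    """Epanechnikov kernel (assumed choice), support [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def f(u):
    """Assumed design density: standard normal."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def m(u):
    """Assumed regression curve."""
    return math.sin(u)

def s2(u):
    """s^2(u) = E(Y^2 | X = u) = m(u)^2 + noise variance."""
    return m(u) ** 2 + SIGMA2

def integrate(g, grid=4000):
    """Midpoint rule on the kernel support [-1, 1]."""
    step = 2.0 / grid
    return step * sum(g(-1.0 + (j + 0.5) * step) for j in range(grid))

def exact_variance(x, h, n):
    """n^{-1} { E[K_h^2 Y^2] - (E[K_h Y])^2 } after substituting u -> x + u h."""
    second_moment = integrate(
        lambda u: epanechnikov(u) ** 2 * s2(x + u * h) * f(x + u * h)) / h
    mean = integrate(lambda u: epanechnikov(u) * m(x + u * h) * f(x + u * h))
    return (second_moment - mean ** 2) / n

def asymptotic_variance(x, h, n):
    """Leading term n^{-1} h^{-1} int K^2(u) du s^2(x) f(x)."""
    c_K = integrate(lambda u: epanechnikov(u) ** 2)
    return c_K * s2(x) * f(x) / (n * h)

ratios = {h: exact_variance(1.0, h, 1000) / asymptotic_variance(1.0, h, 1000)
          for h in (0.5, 0.1, 0.01)}
```

The entries of `ratios` should approach one as $h$ decreases, since the squared-mean term is of order $n^{-1}$ and is dominated by the $n^{-1}h^{-1}$ term.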

LEMMA 3.1   The estimator $\hat r_h(x)$ is asymptotically unbiased as an estimator for $r(x)$.

Use integration by substitution and the fact that the kernel integrates to one to bound

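\begin{eqnarray*}
\left\vert E \hat r_h(x)-r(x) \right\vert &=& \left\vert \int K_h(x-u)(r(u)-r(x)) du \right\vert \\
&=& \left\vert \int K_h(s)(r(x-s)-r(x)) ds \right\vert \\
&\le& \int_{\left\vert s \right\vert \le \delta} \left\vert K_h(s) \right\vert \left\vert r(x-s)-r(x) \right\vert ds \\
&& + \int_{\left\vert s \right\vert > \delta} \left\vert K_h(s) \right\vert \left\vert r(x-s) \right\vert ds \\
&& + \int_{\left\vert s \right\vert > \delta} \left\vert K_h(s) \right\vert \left\vert r(x) \right\vert ds \\
&=& T_{1n}+T_{2n}+T_{3n}.
\end{eqnarray*}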

The first term can be bounded in the following way:

\begin{displaymath}T_{1n} \le \sup_{\left\vert s \right\vert \le \delta} \left\vert r(x-s)-r(x) \right\vert
\int \left\vert K(s) \right\vert ds. \end{displaymath}

The third term satisfies

\begin{displaymath}T_{3n} \le \left\vert r(x) \right\vert \int_{\left\vert s \right\vert \ge \delta/h} \left\vert K(s) \right\vert ds. \end{displaymath}

The second term can be bounded as follows:

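\begin{eqnarray*}
T_{2n}&=& \int_{\left\vert s \right\vert > \delta} \left\vert K_h(s) r(x-s) \right\vert ds \\
&=& \int_{\left\vert s \right\vert > \delta} \left\vert (s/h) K(s/h) \right\vert \left\vert r(x-s) \right\vert / \left\vert s \right\vert ds \\
&\le& \delta^{-1} \sup_{\left\vert s \right\vert \ge \delta/h} \left\vert sK(s) \right\vert \int
\left\vert r(s) \right\vert ds.
\end{eqnarray*}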

Note that the last integral exists by assumption (A3) of Proposition 3.1.1.
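The statement of Lemma 3.1 can be illustrated numerically by evaluating $E\hat r_h(x)=\int K(s)\,r(x-sh)\,ds$ for a decreasing sequence of bandwidths. This is a sketch only: the Epanechnikov kernel, the particular function $r(u)=\sin(u)\varphi(u)$ with $\varphi$ the standard normal density, the evaluation point and the helper name `expected_r_hat` are assumptions chosen for illustration.

```python
import math

def epanechnikov(u):
    """Epanechnikov kernel (assumed choice), support [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def r(u):
    """Hypothetical r(u) = m(u) f(u) with m = sin and f the N(0, 1) density."""
    return math.sin(u) * math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def expected_r_hat(x, h, grid=2000):
    """Midpoint-rule approximation of int K(s) r(x - s h) ds over [-1, 1]."""
    step = 2.0 / grid
    total = 0.0
    for j in range(grid):
        s = -1.0 + (j + 0.5) * step
        total += epanechnikov(s) * r(x - s * h)
    return total * step

x0 = 1.0
# Absolute bias |E r_hat_h(x0) - r(x0)| for shrinking bandwidths.
biases = [abs(expected_r_hat(x0, h) - r(x0)) for h in (0.8, 0.4, 0.2, 0.1)]
```

For a smooth $r$ such as this one the entries of `biases` shrink roughly like $h^2$, which is stronger than the $o(1)$ statement of the lemma; the lemma itself only requires continuity of $r$ at $x$.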

3.3.1 Sketch of Proof for Proposition

The derivative estimator $\hat m_h^{(k)}(x)$ is asymptotically unbiased.

\begin{eqnarray*}
E\hat m_h^{(k)}(x) &=& n^{-1} h^{-(k+1)} \sum_{i=1}^n K^{(k)}\left({x-X_i \over h}\right)\ m(X_i) \qquad (3.3.16) \\
&\approx& h^{-k}\ \int K^{(k)}(u)\ m(x-uh) du \\
&=& h^{-k+1}\ \int K^{(k-1)}(u) \ m^{(1)}(x-uh)du \\
&=& \int K(u)\ m^{(k)}(x-uh)du \\
&\sim& m^{(k)}(x) + h^{2} d^{(k)}_K m^{(k+2)}(x)/(k+2)!, \quad h \to 0,
\end{eqnarray*}

using partial integration, (A0) and (A4).
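The integration by parts step can be checked numerically for $k=1$: the quantity $h^{-1}\int K^{(1)}(u)\,m(x-uh)\,du$ should approach $m^{(1)}(x)$ as $h \to 0$. The sketch below assumes the quartic kernel $K(u)=(15/16)(1-u^2)^2$ on $[-1,1]$, which is differentiable and vanishes at the boundary, and the test function $m=\sin$; both are illustrative choices, as is the helper name `smoothed_derivative`.

```python
import math

def quartic_deriv(u):
    """First derivative of the quartic kernel K(u) = 15/16 (1 - u^2)^2 on [-1, 1]."""
    return -3.75 * u * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def smoothed_derivative(m, x, h, grid=4000):
    """Midpoint approximation of h^{-1} int K'(u) m(x - u h) du (the case k = 1)."""
    step = 2.0 / grid
    total = 0.0
    for j in range(grid):
        u = -1.0 + (j + 0.5) * step
        total += quartic_deriv(u) * m(x - u * h)
    return total * step / h

x0 = 0.5
# Distance to the true derivative cos(x0) for shrinking bandwidths.
errors = [abs(smoothed_derivative(math.sin, x0, h) - math.cos(x0))
          for h in (0.4, 0.2, 0.1)]
```

With these choices the entries of `errors` decrease at the rate $h^2$, in line with the order of the bias term in the expansion above.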

The variance of $\hat m_h^{(k)}(x)$ tends to zero if $nh^{2k+1}\to \infty$, as the following calculations show:

\begin{eqnarray*}
var\{ \hat m_h^{(k)}(x)\} &=& n^{-2}h^{-2(k+1)}\,\sum\limits_{i=1}^n
\left[ K^{(k)}\ \left({x-X_i\over h}\right) \right]^2 \sigma^2 \qquad (3.3.17) \\
&\approx& n^{-1}h^{-2k-1}\ \int [K^{(k)}(u)]^2 du \, \sigma^2.
\end{eqnarray*}
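The rate $n^{-1}h^{-2k-1}$ can be illustrated for $k=1$ on a fixed design. The sketch below treats the estimator as the linear combination $\sum_i w_i Y_i$ with weights $w_i = n^{-1}h^{-2}K^{(1)}((x-X_i)/h)$ and independent errors of variance $\sigma^2$, so that its variance is $\sigma^2\sum_i w_i^2$. The equispaced design $X_i=(i-1/2)/n$, the quartic kernel, and the values of $n$, $h$ and $\sigma^2$ are assumptions made for illustration.

```python
def quartic_deriv(u):
    """First derivative of the quartic kernel K(u) = 15/16 (1 - u^2)^2 on [-1, 1]."""
    return -3.75 * u * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def derivative_estimator_variance(x, design, h, sigma2):
    """Variance of sum_i w_i Y_i with w_i = K'((x - X_i) / h) / (n h^2)."""
    n = len(design)
    weights = [quartic_deriv((x - xi) / h) / (n * h * h) for xi in design]
    return sigma2 * sum(w * w for w in weights)

def asymptotic_variance(n, h, sigma2):
    """n^{-1} h^{-3} int [K'(u)]^2 du sigma^2, with int [K']^2 = 15/7 for the quartic kernel."""
    return (15.0 / 7.0) * sigma2 / (n * h ** 3)

n, h, sigma2 = 2000, 0.1, 1.0
design = [(i - 0.5) / n for i in range(1, n + 1)]   # assumed equispaced design on [0, 1]

ratio = derivative_estimator_variance(0.5, design, h, sigma2) / asymptotic_variance(n, h, sigma2)
```

The constant $15/7$ follows from $\int_{-1}^{1} (15/4)^2 u^2(1-u^2)^2 du = (225/16)(16/105)$. On this design `ratio` should be very close to one, because the sum over the design points is a Riemann sum for the integral.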