6.2 Complements

To make an $M$-type kernel estimate scale invariant, one must couple it with an estimate of scale. This coupling can be achieved by estimating the regression curve and the scale curve simultaneously. To fix ideas, assume that

\begin{displaymath}
f(y\vert x)=(1/\sigma(x))\, f_0\left((y-m(x))/ \sigma(x)\right),
\quad x \in \mathbb{R}^d,
\end{displaymath}

with an unknown $f_0$ and regression curve $m(x)$ and scale curve $\sigma (x)$. Define, for the moment,

\begin{displaymath}\psi(u) = -(d/du) \log f_0(u)\end{displaymath}

and

\begin{displaymath}\chi(u) = (\psi(u) u - 1).\end{displaymath}

Also define for $v \in \mathbb{R}, w \in \mathbb{R}^+$ and fixed $x \in \mathbb{R}^d$,
\begin{eqnarray*}
T_1(v,w) &=& \int \psi\left({y-v \over w}\right)\, dF(y \vert x), \qquad (6.2.8) \\
T_2(v,w) &=& \int \chi\left({y-v \over w}\right)\, dF(y \vert x). \qquad (6.2.9)
\end{eqnarray*}

By the definition of $\psi$ and $\chi$, the curves $(m(x),\sigma(x))$ satisfy

\begin{displaymath}T_1(m(x), \sigma(x)) =T_2(m(x),\sigma(x))=0.\end{displaymath}
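This can be checked directly (a short verification under the additional assumptions, satisfied for smooth light-tailed $f_0$, that $f_0$ is differentiable and $u f_0(u)\to 0$ as $\vert u\vert \to\infty$): substituting the model density and the change of variables $u=(y-m(x))/\sigma(x)$ gives

\begin{eqnarray*}
T_1(m(x),\sigma(x)) &=& \int \psi(u)\, f_0(u)\, du = -\int f_0'(u)\, du = 0, \\
T_2(m(x),\sigma(x)) &=& \int \left(\psi(u)\,u-1\right) f_0(u)\, du = -\int u\, f_0'(u)\, du - 1 = 0,
\end{eqnarray*}

the second line because $-\int u\, f_0'(u)\, du = \int f_0(u)\, du = 1$ by integration by parts.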

In practice, one does not know $F(\cdot\vert x)$ and hence cannot compute $T_1$ or $T_2$. The approach taken is to replace $F(\cdot\vert x)$ by $F_n(\cdot \vert x)$, a kernel estimate of the conditional distribution function, and to choose bounded functions $\psi$ and $\chi$ in order to achieve desirable robustness properties. Huber (1981, Section 6.4) gives examples of such functions $\psi$ and $\chi$. One of them is

\begin{eqnarray*}
\psi(u) &=& \min (c, \max(-c,u)), \quad c>0, \\
\chi(u) &=& \psi^2(u) - \beta,
\end{eqnarray*}
with $\beta= E_{\Phi} \psi^2(u)$, where $\Phi$ denotes the standard normal distribution. Consistency of the scale estimate can be obtained at the normal model: if the error is standard normally distributed, the functions $\psi(u)=u$ and $\chi(u)= \psi^2(u)- \beta = u^2-1$ (since then $\beta = E_{\Phi} u^2 = 1$) give the conditional mean as regression curve $m(x)$ and the conditional standard deviation as scale curve $\sigma (x)$. In fact, the parameter $\beta$ plays the role of a normalizing constant: if one wishes to ``interpret'' the scale curve with respect to some distribution $G$ other than the normal $\Phi$, one can set $\beta = E_G \psi^2(u)$.
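As a numerical illustration (a minimal sketch in Python, not part of the original text: numpy/scipy are assumed, and the names \texttt{huber\_psi}, \texttt{huber\_chi}, \texttt{huber\_beta} are ours), the constant $\beta$ can be computed by integrating $\psi^2$ against the standard normal density; for Huber's $\psi$ it also has the closed form $\beta=2\Phi(c)-1-2c\varphi(c)+2c^2(1-\Phi(c))$:

\begin{verbatim}
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def huber_psi(u, c=1.5):
    # Huber's psi: the identity in the middle, clipped at +/- c
    return np.clip(u, -c, c)

def huber_beta(c=1.5):
    # beta = E_Phi psi^2(u); makes chi(u) = psi^2(u) - beta
    # have expectation zero under the standard normal
    val, _ = quad(lambda u: huber_psi(u, c)**2 * norm.pdf(u),
                  -np.inf, np.inf)
    return val

def huber_chi(u, c=1.5, beta=None):
    # chi(u) = psi^2(u) - beta
    beta = huber_beta(c) if beta is None else beta
    return huber_psi(u, c)**2 - beta

c = 1.5
closed_form = (2*norm.cdf(c) - 1 - 2*c*norm.pdf(c)
               + 2*c**2*(1 - norm.cdf(c)))
print(huber_beta(c), closed_form)   # both approx 0.7785
\end{verbatim}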

The functions $T_1$ and $T_2$ can be estimated with Nadaraya-Watson kernel weights $\{ W_{hi} (x) \}^n_{i=1}$ (as in 3.1.1):

\begin{eqnarray*}
\hat T_{1h}(v,w) &=& n^{-1} \sum^n_{i=1} W_{hi}(x)\, \psi\left({Y_i-v \over w}\right), \qquad (6.2.10) \\
\hat T_{2h}(v,w) &=& n^{-1} \sum^n_{i=1} W_{hi}(x)\, \chi\left({Y_i-v \over w}\right). \qquad (6.2.11)
\end{eqnarray*}
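These empirical versions are exactly the functionals (6.2.8) and (6.2.9) with $F(\cdot \vert x)$ replaced by a kernel estimate of the conditional distribution function; one common choice (an assumption here, the text does not single one out) is
\begin{displaymath}
F_n(y \vert x)= n^{-1} \sum^n_{i=1} W_{hi}(x)\, I(Y_i \le y).
\end{displaymath}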

Call a joint solution $(\hat m^M_h (x), \hat\sigma^M_h(x))$ of $\hat T_{1h}(v,w)=\hat T_{2h}(v,w)=0$ a resistant regression and scale curve smoother. Consistency and asymptotic normality of this smoother were shown, under regularity conditions on the kernel and on the curves $(m(x),\sigma(x))$, by Härdle and Tsybakov (1988). Optimization of the smoothing parameter for this procedure was considered by Tsybakov (1987).
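To make the whole procedure concrete, the following sketch (our illustration under stated assumptions, not the authors' code: it reuses \texttt{huber\_psi}, \texttt{huber\_chi}, \texttt{huber\_beta} and the imports from the sketch above, and the Gaussian kernel, the starting values and the \texttt{fsolve} root finder are all choices of ours) solves $\hat T_{1h}(v,w)=\hat T_{2h}(v,w)=0$ at a single point $x$:

\begin{verbatim}
from scipy.optimize import fsolve

def nw_weights(x, X, h):
    # Nadaraya-Watson weights W_hi(x) with a Gaussian kernel,
    # normalized so that n^{-1} * sum_i W_hi(x) = 1
    k = np.exp(-0.5 * ((x - X) / h)**2)
    return k / k.mean()

def resistant_smoother(x, X, Y, h, c=1.5):
    # joint solution (m_h^M(x), sigma_h^M(x)) of
    # hat T_1h(v, w) = hat T_2h(v, w) = 0
    W = nw_weights(x, X, h)
    beta = huber_beta(c)
    def t_hat(theta):
        v, logw = theta
        u = (Y - v) / np.exp(logw)     # log-scale keeps w > 0
        return [np.mean(W * huber_psi(u, c)),
                np.mean(W * huber_chi(u, c, beta))]
    # start from the local (non-robust) mean and standard deviation
    v0 = np.average(Y, weights=W)
    w0 = np.sqrt(np.average((Y - v0)**2, weights=W))
    v, logw = fsolve(t_hat, [v0, np.log(w0)])
    return v, np.exp(logw)

# example: heteroscedastic data with heavy-tailed errors
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, 400)
Y = np.sin(2*np.pi*X) + 0.2*(1 + X)*rng.standard_t(3, size=400)
print(resistant_smoother(0.5, X, Y, h=0.1))
\end{verbatim}

Solving in $\log w$ is one simple way to keep the scale estimate positive; in practice one would evaluate the smoother on a grid of $x$-values, with the bandwidth $h$ chosen along the lines of Tsybakov (1987).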