17.1 Deconvolution density and regression estimates

Deconvolution kernel estimates have been described and extensively discussed in the context of estimating a probability density from independent and identically distributed data (Stefanski and Carroll, 1990; Carroll and Hall, 1988). To explain the basic idea behind this type of estimate, we first consider the deconvolution problem. Let $ \xi_1, \ldots, \xi_N$ be independent and identically distributed real random variables with density $ p_\xi(x)$, which we want to estimate. We do not, however, observe the $ \xi_k$ directly, but only with additive errors $ \eta_1, \ldots, \eta_N$. Let us assume that the $ \eta_k$ are likewise independent and identically distributed with density $ p_\eta(x)$ and independent of the $ \xi_k$. Hence, the available data are

$\displaystyle X_k = \xi_k + \eta_k\ ,\quad k = 1, \ldots, N. $

To be able to identify the distribution of the $ \xi_k$ in the presence of the errors $ \eta_k$ at all, we have to assume that $ p_\eta(x)$ is known. The density of the observations $ X_k$ is simply the convolution of $ p_\xi$ with $ p_\eta$:

$\displaystyle p_x (x) = (p_\xi \star p_\eta)(x)\,. $

We can therefore try to estimate $ p_x(x)$ by a common kernel estimate and extract an estimate of $ p_\xi(x)$ from it. This kind of deconvolution operation is preferably performed in the frequency domain, i.e. after applying a Fourier transform. As the subsequent inverse Fourier transform already includes a smoothing step, we can start from the empirical distribution of $ X_1, \ldots, X_N$ instead of a smoothed version of it. In detail, we calculate the Fourier transform or characteristic function of the empirical law of $ X_1, \ldots, X_N$, i.e. the sample characteristic function

$\displaystyle \widehat{\phi}_x (\omega) = \frac{1}{N} \sum^N_{k=1} e^{i\omega X_k} \,. $
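Numerically, $ \widehat{\phi}_x(\omega)$ is a one-line computation. The following minimal Python sketch (the function name `sample_char_fn` and the vectorized evaluation over a frequency grid are our own illustrative choices, not part of the original text) evaluates it on an array of frequencies:

```python
import numpy as np

def sample_char_fn(omega, X):
    """Sample characteristic function (1/N) * sum_k exp(i * omega * X_k).

    omega : array of frequencies; X : array of observations X_1, ..., X_N.
    """
    # the outer product gives a (len(omega), N) grid of omega * X_k values
    return np.mean(np.exp(1j * np.multiply.outer(omega, X)), axis=-1)
```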

Let

$\displaystyle \phi_\eta (\omega) = \textrm{E}( e^{i\omega \eta_k}) = \int^\infty_{-\infty} e^{i\omega u} p_\eta (u) \, du $

denote the (known) characteristic function of the $ \eta_k$. Furthermore, let $ K$ be a common kernel function, i.e. a nonnegative continuous function which is symmetric around 0 and integrates to 1, $ \int K(u)\, du = 1$, and let

$\displaystyle \phi_K (\omega) = \int e^{i\omega u} K(u)\, du $

be its Fourier transform. Then, the deconvolution kernel density estimate of $ p_\xi(x)$ is defined as

$\displaystyle \widehat{p}_h (x) = \frac{1}{2\pi} \int^\infty_{-\infty} e^{-i\omega x}\, \phi_K (\omega h) \, \frac{\widehat{\phi}_x (\omega)}{\phi_\eta (\omega)}\, d\omega\ . $
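To make the construction concrete, here is a minimal numerical sketch of this estimator. It assumes Gaussian $ N(0, \sigma_\eta^2)$ measurement errors and a kernel with the compactly supported Fourier transform $ \phi_K(\omega) = (1-\omega^2)^3$ on $ [-1,1]$, a choice common in the deconvolution literature (the compact support keeps the division by $ \phi_\eta$ well behaved); both choices, and all names below, are illustrative assumptions rather than part of the text:

```python
import numpy as np

def deconv_density(x_grid, X, h, sigma_eta):
    """Deconvolution kernel density estimate p_hat_h at the points x_grid."""
    # phi_K(omega * h) vanishes for |omega| > 1/h, so integrate over [-1/h, 1/h]
    w = np.linspace(-1.0 / h, 1.0 / h, 2001)
    dw = w[1] - w[0]
    # sample characteristic function of the observations X_k
    phi_x_hat = np.mean(np.exp(1j * w[None, :] * X[:, None]), axis=0)
    # known characteristic function of the N(0, sigma_eta^2) errors
    phi_eta = np.exp(-0.5 * (sigma_eta * w) ** 2)
    # Fourier transform of the kernel, supported on [-1/h, 1/h]
    phi_K = (1.0 - (w * h) ** 2) ** 3
    # inverse Fourier transform, evaluated by a Riemann sum on the grid
    integrand = phi_K * phi_x_hat / phi_eta
    return np.real(np.exp(-1j * np.outer(x_grid, w)) @ integrand) * dw / (2 * np.pi)
```

For example, drawing `xi = rng.normal(0.0, 1.0, 500)` and `X = xi + rng.normal(0.0, 0.5, 500)` with `rng = np.random.default_rng(0)`, the call `deconv_density(np.linspace(-4, 4, 81), X, h=0.4, sigma_eta=0.5)` estimates the density of the unobserved $ \xi_k$, not of the noisy $ X_k$.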

The name of this estimate is explained by the fact that it may be written equivalently as a kernel density estimate

$\displaystyle \widehat{p}_h (x) = \frac{1}{Nh} \sum^N_{k=1} K^h \left(\frac{x-X_k}{h}\right) $

with deconvolution kernel

$\displaystyle K^h (u) = \frac{1}{2\pi} \int^\infty_{-\infty} e^{-i\omega u}\, \frac{\phi_K (\omega)}{\phi_\eta (\omega /h)} \, d\omega $

depending explicitly on the smoothing parameter $ h$.
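Although $ K^h$ typically has no simple closed form, it is easy to evaluate numerically. The sketch below does so under the same illustrative assumptions as before (Gaussian $ N(0, \sigma_\eta^2)$ errors and $ \phi_K(\omega) = (1-\omega^2)^3$ on $ [-1,1]$; the function name `deconv_kernel` is ours):

```python
import numpy as np

def deconv_kernel(u, h, sigma_eta):
    """Numerically evaluate the deconvolution kernel K^h(u)."""
    # phi_K is supported on [-1, 1], so the integral runs over that interval only
    w = np.linspace(-1.0, 1.0, 2001)
    dw = w[1] - w[0]
    phi_K = (1.0 - w ** 2) ** 3
    # known characteristic function of the errors, evaluated at omega / h
    phi_eta = np.exp(-0.5 * (sigma_eta * w / h) ** 2)
    vals = np.exp(-1j * np.multiply.outer(u, w)) @ (phi_K / phi_eta)
    return np.real(vals) * dw / (2.0 * np.pi)
```

Plotting $ K^h$ for decreasing $ h$, one can observe increasingly oscillatory tails: the division by $ \phi_\eta(\omega/h)$ amplifies high frequencies, reflecting the well-known ill-posedness of deconvolution.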

Based on this kernel estimate for probability densities, Fan and Truong (1993) considered the analogous deconvolution kernel regression estimate defined as

$\displaystyle \widehat{m}_h (x) = \frac{1}{Nh} \sum^N_{k=1} K^h \left(\frac{x-X_k}{h}\right) Y_k \ \Big/ \ \widehat{p}_h (x)\,. $

This Nadaraya-Watson-type estimate is consistent for the regression function $ m(x)$ in an errors-in-variables regression model

$\displaystyle Y_k = m (\xi_k) + W_k,\quad X_k = \xi_k + \eta_k,\quad k=1, \ldots, N, $

where $ W_1, \ldots, W_N$ are independent, identically distributed random variables with mean zero, independent of the $ X_k, \xi_k, \eta_k$, which are specified as above. The $ X_k, Y_k$ are observed, and the probability density of the $ \eta_k$ has to be known.
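Computationally, the numerator $ \frac{1}{Nh} \sum_k K^h \left(\frac{x-X_k}{h}\right) Y_k$ admits the same frequency-domain representation as $ \widehat{p}_h(x)$, with $ \widehat{\phi}_x(\omega)$ replaced by the $ Y_k$-weighted average $ \frac{1}{N} \sum^N_{k=1} Y_k\, e^{i\omega X_k}$. The following sketch exploits this, again under the illustrative assumptions used above (Gaussian errors, $ \phi_K(\omega) = (1-\omega^2)^3$ on $ [-1,1]$; all names are ours):

```python
import numpy as np

def deconv_regression(x_grid, X, Y, h, sigma_eta):
    """Deconvolution Nadaraya-Watson estimate m_hat_h at the points x_grid."""
    w = np.linspace(-1.0 / h, 1.0 / h, 2001)
    dw = w[1] - w[0]
    phi_eta = np.exp(-0.5 * (sigma_eta * w) ** 2)
    phi_K = (1.0 - (w * h) ** 2) ** 3
    basis = np.exp(1j * w[None, :] * X[:, None])       # e^{i omega X_k}
    num_cf = np.mean(Y[:, None] * basis, axis=0)       # (1/N) sum_k Y_k e^{i omega X_k}
    den_cf = np.mean(basis, axis=0)                    # sample characteristic function
    kern = np.exp(-1j * np.outer(x_grid, w)) * (phi_K / phi_eta)
    numer = np.real(kern @ num_cf) * dw / (2 * np.pi)  # (1/(Nh)) sum_k K^h(...) Y_k
    denom = np.real(kern @ den_cf) * dw / (2 * np.pi)  # p_hat_h(x)
    return numer / denom
```

With `Y = np.ones_like(X)` the numerator coincides with $ \widehat{p}_h(x)$ itself, which provides a quick consistency check against `deconv_density` above.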