In this section we consider the problem of recovering the regression function from noisy data based on a wavelet decomposition.
The upper two plots in the display show the underlying regression function (left) and noisy data (right). The lower two plots show the distribution of the wavelet coefficients in the time-scale domain for the regression function and the noisy data.
For simplicity we deal with a regression estimation problem. The noise is assumed to be additive and Gaussian with zero mean and unknown variance. Since the wavelet transform is an orthogonal transform we can consider the filtering problem in the space of wavelet coefficients. One takes the wavelet transform of the noisy data and tries to estimate the wavelet coefficients. Once the estimator is obtained one takes the inverse wavelet transform and recovers the unknown regression function.
To suppress the noise in the data two approaches are normally used. The first one is the so-called linear method. We have seen in Section 14.6 that the wavelet decomposition reflects well the properties of the signal in the frequency domain. Roughly speaking, higher decomposition scales correspond to higher frequency components of the regression function. If we assume that the underlying regression function is concentrated in the low-frequency domain, then the filtering procedure becomes evident: all empirical wavelet coefficients beyond some resolution scale are estimated by zero. This procedure works well if the signal is sufficiently smooth and there is no boundary effect in the data. For many practical problems, however, such an approach is not fully appropriate; images, for example, cannot be considered as smooth functions.
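The linear method described above amounts to keeping the detail levels up to a chosen resolution scale and estimating all finer-scale coefficients by zero. A minimal numpy sketch (the list-of-arrays layout for levelwise coefficients, and the toy values, are assumptions for illustration):

```python
import numpy as np

def linear_cutoff(details, j1):
    """Keep detail levels 0..j1-1 and estimate all empirical wavelet
    coefficients at finer scales by zero (linear wavelet smoothing)."""
    return [d if j < j1 else np.zeros_like(d)
            for j, d in enumerate(details)]

# toy coefficients: one array per resolution level, coarse to fine
details = [np.ones(2 ** j) for j in range(5)]
smoothed = linear_cutoff(details, j1=3)   # levels 3 and 4 are zeroed
```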
To avoid this shortcoming, a nonlinear filtering procedure is often used to suppress the noise in the empirical wavelet coefficients. The main idea is based on a fundamental property of the wavelet transform: the father and mother functions are well localized in the time domain. Therefore one can estimate the empirical wavelet coefficients independently of each other. To do this, let us compare the absolute value of an empirical wavelet coefficient with the standard deviation of the noise. It is clear that if the wavelet coefficient is of the same order as the noise level, then we cannot separate the signal from the noise. In this situation a good estimator for the wavelet coefficient is zero. When an empirical wavelet coefficient is clearly greater than the noise level, a natural estimator for the wavelet coefficient is the empirical wavelet coefficient itself. This idea is called thresholding.
We divide the different modifications of thresholding into three main methods: hard thresholding, soft thresholding and levelwise thresholding using Stein's risk estimator, see Subsection 14.7.3.
To suppress the noise we apply the following nonlinear transform to the empirical wavelet coefficients:
\[
\eta_H(\hat\beta_{jk},\lambda) = \hat\beta_{jk}\,\mathbf{1}\big(|\hat\beta_{jk}| > \lambda\big),
\]
where $\lambda$ is a certain threshold. The choice of the threshold is a very delicate and important statistical problem.
On the one hand, a big threshold leads to a large bias of the estimator. But on the other hand, a small threshold increases the variance of the smoother. Theoretical considerations yield the following value of the threshold:
\[
\lambda = \sigma\sqrt{2\log n},
\]
where $\sigma$ is the standard deviation of the noise and $n$ the sample size.
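The hard thresholding rule and the threshold $\sigma\sqrt{2\log n}$ can be sketched in a few lines of numpy. This is only an illustrative sketch: the coefficient array below is simulated directly in the coefficient domain rather than obtained from a real wavelet transform.

```python
import numpy as np

def hard_threshold(coeffs, lam):
    """eta_H(y, lam) = y * 1(|y| > lam): keep a coefficient if its
    magnitude exceeds the threshold, otherwise estimate it by zero."""
    return np.where(np.abs(coeffs) > lam, coeffs, 0.0)

def universal_threshold(sigma, n):
    """The conservative threshold lam = sigma * sqrt(2 log n)."""
    return sigma * np.sqrt(2.0 * np.log(n))

# simulated empirical wavelet coefficients: a sparse signal plus noise
rng = np.random.default_rng(0)
n, sigma = 1024, 1.0
theta = np.zeros(n)
theta[:5] = 10.0                      # five large "true" coefficients
y = theta + sigma * rng.standard_normal(n)

lam = universal_threshold(sigma, n)   # about 3.72 for n = 1024
estimate = hard_threshold(y, lam)
```

Most pure-noise coefficients fall below the threshold and are set to zero, while the five large ones survive unchanged.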
We provide two possibilities for choosing a threshold. First of all you can do it by ``eye'' using the Hard threshold item and entering the desired value of the threshold. The threshold offered is the one described in the paragraph above. Note that this threshold value is in most cases conservative, so one will often want to choose a value below the offered one. The item Automatic means that the threshold will be chosen as $\hat\sigma\sqrt{2\log n}$ with a suitably estimated noise standard deviation $\hat\sigma$.
The plots of the underlying regression function and the data corrupted by noise are shown at the top of the display. You can change the variance of the noise and the regression function by using the Noise level and Change function items.
On the display above you can see the estimated regression function after the thresholding procedure. The left upper window shows how the threshold procedure distorts the underlying regression function when there is no noise in the data. The final estimator is shown in the right upper window.
The two plots at the bottom of the display show the thresholded and the nonthresholded wavelet coefficients. The thresholded coefficients are shown as blue circles and the nonthresholded ones as red circles. The radii of the circles correspond to the magnitudes of the empirical wavelet coefficients.
You can try to improve the estimator using the items Change basis and Change level. With an appropriate choice of these parameters you can decrease the bias of your smoother.
Along with hard thresholding, soft thresholding procedures are often used in statistical applications. In this section we study the so-called wavelet shrinkage procedure for recovering the regression function from noisy data.
The display shows the same regression function as in the lesson before, without and with noise, after soft thresholding.
The only difference between the hard and the soft thresholding procedures is in the choice of the nonlinear transform applied to the empirical wavelet coefficients. For soft thresholding the following nonlinear transform is used:
\[
\eta_S(\hat\beta_{jk},\lambda) = \operatorname{sign}(\hat\beta_{jk})\,\big(|\hat\beta_{jk}| - \lambda\big)_+ ,
\]
where $(x)_+ = \max(x, 0)$.
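In code the two rules differ by a single line. A minimal numpy sketch of the soft thresholding transform:

```python
import numpy as np

def soft_threshold(coeffs, lam):
    """eta_S(y, lam) = sign(y) * (|y| - lam)_+ : coefficients below the
    threshold are set to zero, the rest are shrunk toward zero by lam."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - lam, 0.0)

y = np.array([3.0, -3.0, 0.5, -0.2])
shrunk = soft_threshold(y, 1.0)   # [2.0, -2.0, 0.0, -0.0]
```

Unlike the hard rule, $\eta_S$ is continuous in the coefficient, which is exactly the property exploited by Stein's unbiased risk estimation below.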
In addition you can use a data-driven procedure for choosing the threshold. This procedure is based on Stein's principle of unbiased risk estimation (Donoho and Johnstone, 1995). The main idea is the following. Since $\eta_S(\cdot,\lambda)$ is a continuous function, one can obtain, given the noise variance, an unbiased estimate of the risk of the soft thresholding procedure. The optimal threshold is then obtained by minimizing this estimated risk.
We first give an intuitive idea about adaptive thresholding using Stein's principle. Later on we give a detailed mathematical description of the procedure.
We define the risk function
\[
R(\lambda) = \sum_{k} E\{\eta_S(y_k,\lambda) - \theta_k\}^2
\]
for observations $y_k = \theta_k + \sigma_k \xi_k$, where $\theta_k$ is the unknown coefficient, the $\sigma_k$ are known (or to be estimated) scale parameters, the $\xi_k$ are i.i.d. standard Gaussian random variables and $\lambda$ is the threshold.
Stein's identity enables us to estimate the risk $R(\lambda)$ from the observations alone, without knowledge of the $\theta_k$; the adaptive threshold is then taken as the minimizer of this estimated risk.
This procedure can be performed for each resolution level.
Mathematical derivation
Now we discuss the data driven choice of threshold, initial level and the wavelet basis by the Stein (1981) method of unbiased risk estimation. The argument below follows Donoho and Johnstone (1995).
We first explain the Stein method for the idealized one-level observation model discussed in the previous section:
\[
y_k = \theta_k + \sigma_k \xi_k, \qquad k = 1,\dots,n,
\]
where $\theta = (\theta_1,\dots,\theta_n)$ is the vector of unknown parameters, the $\sigma_k$ are known scale parameters and the $\xi_k$ are i.i.d. standard Gaussian random variables.
Let $\hat\theta = (\hat\theta_1,\dots,\hat\theta_n)$ be an estimator of $\theta$. Introduce the mean squared risk of the estimate $\hat\theta$ of $\theta$:
\[
R(\hat\theta,\theta) = \sum_{k=1}^{n} E(\hat\theta_k - \theta_k)^2 .
\]
Assume that the estimators have the form
\[
\hat\theta_k = \hat\theta_k(\lambda) = y_k + g(y_k,\lambda), \qquad k = 1,\dots,n,
\]
where $g(\cdot,\lambda)$ is a weakly differentiable function and $\lambda$ a tuning parameter.
If the true parameters $\theta_k$ were known, one could compute the risk $R(\hat\theta(\lambda),\theta)$ explicitly and minimize it over $\lambda$. In practice this is not possible, and one instead chooses an approximation $\hat\lambda$ to the optimal $\lambda$ as a minimizer of an unbiased estimator $\hat R(\lambda)$ of the risk $R(\hat\theta(\lambda),\theta)$. To construct $\hat R(\lambda)$, note that, by Stein's identity for Gaussian observations,
\[
E\{\hat\theta_k(\lambda) - \theta_k\}^2 = \sigma_k^2 + E\{g^2(y_k,\lambda) + 2\sigma_k^2\, g'(y_k,\lambda)\},
\tag{14.3}
\]
where $g'$ denotes the derivative of $g$ with respect to its first argument. Hence
\[
\hat R(\lambda) = \sum_{k=1}^{n} \{\sigma_k^2 + g^2(y_k,\lambda) + 2\sigma_k^2\, g'(y_k,\lambda)\}
\]
is an unbiased estimator of the risk: $E\,\hat R(\lambda) = R(\hat\theta(\lambda),\theta)$.
The Stein principle is to minimize $\hat R(\lambda)$ with respect to $\lambda$ and take the minimizer
\[
\hat\lambda = \operatorname*{arg\,min}_{\lambda \ge 0} \hat R(\lambda)
\tag{14.4}
\]
as a data-driven substitute for the optimal threshold.
In the rest of this paragraph we formulate the Stein principle for the example of soft thresholding wavelet estimators. For soft thresholding (see above) we have $\hat\theta_k = \eta_S(y_k,\lambda)$, i.e. $g(y,\lambda) = \eta_S(y,\lambda) - y$, so that $g^2(y_k,\lambda) = \min(y_k^2,\lambda^2)$ and $g'(y_k,\lambda) = -\mathbf{1}(|y_k| \le \lambda)$, and therefore
\[
\hat R(\lambda) = \sum_{k=1}^{n} \{\sigma_k^2 + \min(y_k^2,\lambda^2) - 2\sigma_k^2\,\mathbf{1}(|y_k| \le \lambda)\}.
\]
An equivalent expression is
\[
\hat R(\lambda) = \Big[\sum_{k=1}^{n} \sigma_k^2\Big] + \sum_{k=1}^{n} \{\min(y_k^2,\lambda^2) - 2\sigma_k^2\,\mathbf{1}(|y_k| \le \lambda)\}.
\tag{14.5}
\]
The expression in square brackets in (14.5) does not depend on $\lambda$. Thus, the definition (14.4) is equivalent to
\[
\hat\lambda = \operatorname*{arg\,min}_{\lambda \ge 0} \sum_{k=1}^{n} \{\min(y_k^2,\lambda^2) - 2\sigma_k^2\,\mathbf{1}(|y_k| \le \lambda)\}.
\tag{14.6}
\]
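The unbiasedness of the risk estimate for soft thresholding is easy to check numerically. The following sketch (arbitrary illustrative parameter values, Gaussian noise) compares the Monte Carlo risk of the soft thresholding rule with the averaged Stein risk estimate; the two should agree up to simulation error.

```python
import numpy as np

rng = np.random.default_rng(1)
theta, sigma, lam = 2.0, 1.0, 1.5     # arbitrary illustrative values
reps = 200_000
y = theta + sigma * rng.standard_normal(reps)

# realized squared error of the soft thresholding estimate
est = np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)
mc_risk = np.mean((est - theta) ** 2)

# Stein's unbiased risk estimate, averaged over the same draws
sure = np.mean(sigma**2 + np.minimum(y**2, lam**2)
               - 2 * sigma**2 * (np.abs(y) <= lam))
# mc_risk and sure agree up to Monte Carlo error
```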
Let $\pi$ be the permutation that orders the array of absolute values $|y_1|,\dots,|y_n|$, such that $|y_{\pi(1)}| \le |y_{\pi(2)}| \le \dots \le |y_{\pi(n)}|$. Between consecutive values $|y_{\pi(m)}|$ the criterion in (14.6) is nondecreasing in $\lambda$, so its minimum is attained at one of these values. According to (14.6) one obtains $\hat\lambda = |y_{\pi(\hat m)}|$, where $\hat m$ minimizes
\[
S_m = \sum_{i=1}^{m} y_{\pi(i)}^2 + (n-m)\,y_{\pi(m)}^2 - 2\sum_{i=1}^{m} \sigma_{\pi(i)}^2 .
\]
In particular, for equal scale parameters $\sigma_k \equiv \sigma$ the above equation yields the simpler form
\[
S_m = \sum_{i=1}^{m} y_{\pi(i)}^2 + (n-m)\,y_{\pi(m)}^2 - 2m\sigma^2 .
\]
It is easy to see that the computation of the estimate $\hat\lambda$ defined above requires approximately $n\log_2 n$ operations, provided that a quick-sort algorithm is used to order the array $|y_k|$, $k = 1,\dots,n$.
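The search over the ordered array can be sketched in numpy as follows. This is a minimal version assuming equal scale parameters $\sigma_k = \sigma$; following the text, the candidate thresholds are the observed magnitudes, together with $\lambda = 0$ for the case where nothing should be thresholded.

```python
import numpy as np

def sure_threshold(y, sigma=1.0):
    """Threshold minimizing the lam-dependent part of Stein's unbiased
    risk estimate for soft thresholding, with equal scales sigma_k = sigma.
    Sorting |y| once makes the whole search O(n log n)."""
    n = len(y)
    a = np.sort(np.abs(y))
    a2 = a ** 2
    m = np.arange(n)
    # S_m at lam = a[m]:  sum_{i<=m} a_i^2 + (n-1-m) a_m^2 - 2 (m+1) sigma^2
    risks = np.cumsum(a2) + (n - 1 - m) * a2 - 2.0 * sigma**2 * (m + 1)
    best = int(np.argmin(risks))
    # lam = 0 (threshold nothing) has lam-dependent risk exactly 0
    return a[best] if risks[best] < 0.0 else 0.0
```

For example, for coefficients [0.1, 0.2, 5.0] with unit noise the rule returns the threshold 0.2, killing the two small coefficients and keeping the large one.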
We proceed from the idealized model to a more realistic density estimation model. In the context of wavelet smoothing the principle of unbiased risk estimation gives the following possibilities for adaptation:

1. a level-by-level choice of the thresholds $\lambda_j$;
2. the choice of the initial resolution level $j_0$;
3. the choice of the wavelet basis.
Please notice that in XploRe we have implemented only the first of these possibilities. To demonstrate them, consider the family of wavelet estimators
\[
\hat f_{j_0,\lambda}(x) = \sum_{k} \hat\alpha_{j_0 k}\,\varphi_{j_0 k}(x) + \sum_{j=j_0}^{j_1} \sum_{k} \eta_S(\hat\beta_{jk},\lambda_j)\,\psi_{jk}(x).
\tag{14.7}
\]
In the third case we assume that the minimum is taken over a finite number of given wavelet bases.
Remark 1: Since in practice the values of the different variances $\sigma_{jk}^2$ are not available, one can use their empirical versions instead. For example, if (14.7) is the wavelet density estimator based on the sample $X_1,\dots,X_n$, with $\hat\beta_{jk} = n^{-1}\sum_{i=1}^{n} \psi_{jk}(X_i)$, one can replace $\sigma_{jk}^2$ by its estimator
\[
\hat\sigma_{jk}^2 = \frac{1}{n^2} \sum_{i=1}^{n} \{\psi_{jk}(X_i) - \hat\beta_{jk}\}^2 .
\]
It is clear that this yields a consistent estimator of the variances under rather general assumptions on the basis functions and on the underlying density of the $X_i$'s.
Remark 2: If one wants to threshold only the coefficients $\hat\beta_{jk}$, which is usually the case, the function $g$ for the coefficients $\hat\alpha_{j_0 k}$ should be identically zero. Therefore, the corresponding terms in the risk estimator reduce to the constants $\sigma_{j_0 k}^2$, and Stein's unbiased risk estimator takes the form
\[
\hat R(\lambda) = \sum_{k} \sigma_{j_0 k}^2 + \sum_{j=j_0}^{j_1} \sum_{k} \{\sigma_{jk}^2 + \min(\hat\beta_{jk}^2,\lambda_j^2) - 2\sigma_{jk}^2\,\mathbf{1}(|\hat\beta_{jk}| \le \lambda_j)\}.
\]
Let us now apply the Stein principle to a regression estimation example. We choose the following step function for the regression function $f$:
The function was observed at $n = 128$ equispaced points and disturbed with additive Gaussian noise. We use the Stein rule only for threshold choice 1 (level by level), and not for cases 2 and 3, where the adaptive choice of $j_0$ and of the basis is considered. We thus choose the threshold $\hat\lambda_j$ at level $j$ as the minimizer with respect to $\lambda$ of
\[
\sum_{k} \{\min(\hat\beta_{jk}^2,\lambda^2) - 2\sigma_{jk}^2\,\mathbf{1}(|\hat\beta_{jk}| \le \lambda)\}.
\]
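An experiment of this kind can be sketched end to end in numpy. The sketch below uses the Haar basis for brevity and an illustrative step function of its own (the book's step function and basis are not reproduced here), applies the levelwise SURE threshold with soft thresholding, and reconstructs the estimate.

```python
import numpy as np

def haar_dwt(x):
    """Full orthonormal Haar decomposition of x (length a power of two);
    returns the coarsest scaling coefficient and detail levels, coarse to fine."""
    a = np.asarray(x, dtype=float)
    details = []
    while len(a) > 1:
        details.append((a[0::2] - a[1::2]) / np.sqrt(2.0))
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)
    return a, details[::-1]

def haar_idwt(a, details):
    """Inverse of haar_dwt."""
    for d in details:                      # coarse to fine
        s = np.empty(2 * len(a))
        s[0::2] = (a + d) / np.sqrt(2.0)
        s[1::2] = (a - d) / np.sqrt(2.0)
        a = s
    return a

def soft(y, lam):
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def sure_threshold(y, sigma):
    """Levelwise SURE threshold with equal scales, as in the text."""
    n = len(y)
    a = np.sort(np.abs(y))
    m = np.arange(n)
    risks = np.cumsum(a**2) + (n - 1 - m) * a**2 - 2.0 * sigma**2 * (m + 1)
    best = int(np.argmin(risks))
    return a[best] if risks[best] < 0.0 else 0.0

rng = np.random.default_rng(2)
n, sigma = 128, 0.3
t = np.arange(n) / n
f = np.where(t < 0.5, 0.0, 1.0)            # illustrative step function
y = f + sigma * rng.standard_normal(n)

a0, details = haar_dwt(y)                  # orthonormal: noise level stays sigma
details = [soft(d, sure_threshold(d, sigma)) for d in details]
fhat = haar_idwt(a0, details)
```

Because the step function is very sparse in the Haar basis, the thresholded reconstruction should have a substantially smaller mean squared error than the raw data.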