14.4 Data Compression

The degree of data compression achievable in a given basis is an important feature for several reasons. Obviously, it is always preferable to store information with as few bytes as possible, and for huge data sets, e.g. for Internet transfers, this is even a necessity. Compression can be attained if the information to be stored, i.e. a collection of signals, can be described by a small number of basis functions.

The ability to provide a sparse representation of functions is also important in nonparametric statistics, where an unknown function is observed with noise. Standard problems are nonparametric regression, density estimation and spectral density estimation. In order to improve the rough information about a possibly smooth function given by noisy data, one usually applies smoothing methods. A particular class of such methods is given by truncated or otherwise regularized orthonormal series estimators. These estimators are based on empirical versions of the coefficients defining the orthonormal series. Analogously to the classical Fourier series, the true wavelet coefficients are given as integrals

$\displaystyle \alpha_{jk}=\int f(t)\varphi_{jk}(t)\,dt$

and their empirical versions based on data $ X_1, \dots, X_n$ are calculated as

$\displaystyle \tilde{\alpha}_{jk}=\frac{1}{n}\sum_{i=1}^{n}\varphi_{jk}(X_i).$
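As a minimal sketch of how such empirical coefficients can be computed, the following Python code evaluates the sample mean of $ \varphi_{jk}(X_i)$ for the Haar scaling function. The choice of the Haar basis, the uniform toy sample, and all function names (`haar_phi`, `phi_jk`, `empirical_coefficient`) are assumptions made for illustration only and are not part of XploRe.

```python
import random

def haar_phi(x):
    """Haar scaling function: the indicator of the interval [0, 1)."""
    return 1.0 if 0.0 <= x < 1.0 else 0.0

def phi_jk(x, j, k):
    """Dilated and translated basis function 2^(j/2) * phi(2^j * x - k)."""
    return 2.0 ** (j / 2.0) * haar_phi(2.0 ** j * x - k)

def empirical_coefficient(sample, j, k):
    """Sample mean of phi_jk(X_i), i.e. the empirical version of alpha_jk."""
    return sum(phi_jk(x, j, k) for x in sample) / len(sample)

random.seed(0)                                    # reproducible toy data
sample = [random.random() for _ in range(2000)]   # X_i uniform on [0, 1)

# For the uniform density, the true coefficients at level j = 2 all equal 0.5,
# so the empirical values should scatter around 0.5.
level = 2
coeffs = [empirical_coefficient(sample, level, k) for k in range(2 ** level)]
print(coeffs)
```

For the uniform density on $ [0,1)$ the true coefficients at this level are $ 2\cdot\frac{1}{4}=0.5$, so the printed values give a rough impression of the sampling error of the empirical coefficients.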

The smoothing step is then performed in the domain of the coefficients, e.g. by thresholding. Finally, the estimator is synthesized from these modified empirical coefficients by the inverse transform.
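The three steps (transform, thresholding in the coefficient domain, inverse transform) can be sketched as follows. This is only an illustration under stated assumptions: it uses a discrete orthonormal Haar transform on a noisy step signal, hard thresholding, and the threshold $ \sigma\sqrt{2\log n}$, none of which is prescribed by the text. All function names are hypothetical.

```python
import math
import random

SQRT2 = math.sqrt(2.0)

def haar_forward(signal):
    """Orthonormal Haar transform; len(signal) must be a power of two."""
    approx, details = list(signal), []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Finer details are computed first, so coarser ones go in front.
        details = [(a - b) / SQRT2 for a, b in pairs] + details
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return approx + details       # [scaling coeff, coarse ... fine details]

def haar_inverse(coeffs):
    """Inverse of haar_forward."""
    approx, pos = list(coeffs[:1]), 1
    while pos < len(coeffs):
        detail = coeffs[pos:pos + len(approx)]
        pos += len(approx)
        approx = [v for s, d in zip(approx, detail)
                  for v in ((s + d) / SQRT2, (s - d) / SQRT2)]
    return approx

def hard_threshold(coeffs, t):
    """Keep the scaling coefficient; zero every detail with magnitude <= t."""
    return coeffs[:1] + [c if abs(c) > t else 0.0 for c in coeffs[1:]]

def mse(u, v):
    """Mean squared difference between two equally long sequences."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)

random.seed(1)
n, sigma = 64, 0.5
clean = [0.0] * (n // 2) + [4.0] * (n // 2)        # one jump in the middle
noisy = [v + random.gauss(0.0, sigma) for v in clean]

threshold = sigma * math.sqrt(2.0 * math.log(n))   # assumed threshold choice
kept = hard_threshold(haar_forward(noisy), threshold)
denoised = haar_inverse(kept)

print(sum(c != 0.0 for c in kept), mse(noisy, clean), mse(denoised, clean))
```

Because the clean signal has only two nonzero Haar coefficients, zeroing the small empirical coefficients removes mostly noise, and the error of the synthesized estimate falls below that of the raw data.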

Usually, the risk of a truncated orthonormal series estimator is approximately proportional to the squared noise level times the number of coefficients included in this estimator, plus some approximation error due to the truncation of the expansion. Hence, a basis that compresses the data efficiently directly makes it possible to construct good statistical estimators.
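To make this trade-off concrete, consider, purely as an illustrative assumption, the Gaussian sequence model in which each coefficient is observed as $ y_k=\alpha_k+\varepsilon\xi_k$ with independent standard normal $ \xi_k$ and noise level $ \varepsilon$ (a single index $ k$ replaces the pair $ jk$ to keep the notation short). For the estimator $ \hat{f}_M$ that keeps the observed coefficients with index in a fixed set $ M$ and sets all others to zero, orthonormality of the basis gives

$\displaystyle E\Vert\hat{f}_M-f\Vert^2=\varepsilon^2\,\vert M\vert+\sum_{k\notin M}\alpha_k^2.$

The first term is the squared noise level times the number of retained coefficients, and the second is the approximation error. A basis in which few coefficients carry most of the energy keeps both terms small at the same time.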

There are two important mathematical characterizations of the ability of a basis to compress data.

The following display compares how well a function with spatially inhomogeneous smoothness properties is compressed by a wavelet transform (left) and by the Fourier transform (right). The lower windows show the magnitudes of the ordered coefficients. The fatter tail on the right reflects that the Fourier transform needs more coefficients to capture the functional form.
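As a rough numerical counterpart to this comparison, the following sketch counts how many of the largest coefficients are needed to capture 99% of the energy of a step function, once in an orthonormal Haar basis and once in the orthonormal discrete Fourier basis. The test signal, the 99% level, and all function names are assumptions chosen for illustration. The jump is placed at a dyadic point, which is the most favourable case for the Haar basis.

```python
import cmath
import math

SQRT2 = math.sqrt(2.0)

def haar_forward(signal):
    """Orthonormal Haar transform; len(signal) must be a power of two."""
    approx, details = list(signal), []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details = [(a - b) / SQRT2 for a, b in pairs] + details
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return approx + details

def dft_orthonormal(signal):
    """Discrete Fourier transform scaled by 1/sqrt(n), so energy is preserved."""
    n = len(signal)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal)) / math.sqrt(n)
            for k in range(n)]

def coefficients_needed(coeffs, fraction=0.99):
    """Smallest number of largest coefficients holding `fraction` of the energy."""
    energies = sorted((abs(c) ** 2 for c in coeffs), reverse=True)
    target, running = fraction * sum(energies), 0.0
    for count, energy in enumerate(energies, start=1):
        running += energy
        if running >= target:
            return count
    return len(energies)

n = 64
step = [0.0] * (n // 2) + [1.0] * (n // 2)   # jump at the midpoint

n_haar = coefficients_needed(haar_forward(step))
n_fourier = coefficients_needed(dft_orthonormal(step))
print(n_haar, n_fourier)
```

In this toy case the Haar representation needs only two coefficients, whereas the Fourier coefficients of the jump decay slowly and many more of them are required, which corresponds to the fatter tail described above.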


You can eXploRe the possibilities of data compression with the following interactive menu.

