1.1 Density Estimation

Consider a continuous random variable and its probability density function (pdf). The pdf tells you ``how the random variable is distributed''. From the pdf you can calculate not only statistical characteristics such as the mean and variance, but also the probability that this variable takes on values in a certain interval.
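As a small illustration of the last sentence, the sketch below computes the mean $E(X)=\int x f(x)\,dx$, the variance $\int (x-E X)^2 f(x)\,dx$ and an interval probability $P(a<X<b)=\int_a^b f(x)\,dx$ directly from a pdf by numerical integration. The exponential density with rate 1, the composite Simpson's rule and the cutoff at 50 are illustrative choices, not part of the text.

```python
import math


def integrate(f, a, b, n=10_000):
    """Composite Simpson's rule on [a, b] with n (made even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def exp_pdf(x, rate=1.0):
    """Exponential density, used here purely as an example pdf."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0


# Effective upper integration limit: the exponential tail beyond 50 is negligible.
UPPER = 50.0

mean = integrate(lambda x: x * exp_pdf(x), 0.0, UPPER)
variance = integrate(lambda x: (x - mean) ** 2 * exp_pdf(x), 0.0, UPPER)
prob = integrate(exp_pdf, 0.5, 2.0)  # P(0.5 < X < 2)

print(f"mean     = {mean:.4f}")      # exact value: 1
print(f"variance = {variance:.4f}")  # exact value: 1
print(f"P(0.5 < X < 2) = {prob:.4f} (exact: {math.exp(-0.5) - math.exp(-2):.4f})")
```

The same three integrals apply to any pdf; only `exp_pdf` and the integration limits change.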

The pdf is thus very useful, as it completely characterizes the ``behavior'' of a random variable. This fact might provide enough motivation to study nonparametric density estimation. Moreover, nonparametric density estimates can serve as a building block in nonparametric regression estimation, since regression functions are fully characterized through the joint distribution of two (or more) variables.
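To make the last point concrete for two variables $X$ and $Y$: the regression function is the conditional mean, which can be written entirely in terms of the joint density $f(x,y)$ and the marginal density $f_X(x)$. This is a standard identity, stated here as a sketch; it assumes $f_X(x) > 0$.

```latex
m(x) = E(Y \mid X = x) = \int y \, f(y \mid x) \, dy
     = \frac{\int y \, f(x, y) \, dy}{f_X(x)},
\qquad f_X(x) = \int f(x, y) \, dy .
```

Replacing $f(x,y)$ and $f_X(x)$ by density estimates is one way to turn this identity into a nonparametric regression estimate.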

The following example, which uses data from the Family Expenditure Survey for each year from 1969 to 1983, illustrates that density estimation has substantial applications in its own right.

**Figure 1.1:** Log-normal density estimates (upper graph) versus kernel density estimates (lower graph) of net income, U.K. Family Expenditure Survey 1969-83 `SPMfesdensities`

EXAMPLE 1.1
Imagine that we have to answer the following question: Did the structure of the income distribution change during the period from 1969 to 1983? (You may recall that many people argued that the neo-liberal policies of former Prime Minister Margaret Thatcher promoted income inequality in the early 1980s.)

To answer this question, we have estimated the distribution of net income for each year from 1969 to 1983, both parametrically and nonparametrically. For the parametric estimate we followed standard practice and fitted a log-normal distribution to the data. To estimate the income distribution nonparametrically, we employed kernel density estimation (a generalization of the familiar histogram, as we will soon see). In the upper graph of Figure 1.1 we have plotted the estimated log-normal densities for each of the 15 years: note that they are all very similar. In contrast, the analogous plot of the kernel density estimates (Figure 1.1, lower graph) shows the net-income mode (the maximum of the density) shifting to the left. This indicates that the net-income distribution has in fact changed during this 15-year period. $\Box$
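A minimal sketch of the two approaches compared in the example, assuming a maximum-likelihood log-normal fit and a Gaussian kernel with a hand-picked bandwidth `h` (the text specifies neither). The data are synthetic, a two-group sample drawn with `random.lognormvariate`, and have nothing to do with the Family Expenditure Survey; the helper names `lognormal_fit`, `lognormal_pdf` and `kde` are hypothetical. The kernel estimate at a point $x$ is taken to be $\hat f_h(x) = \frac{1}{nh}\sum_{i=1}^n K\{(x - X_i)/h\}$ with $K$ the standard normal density.

```python
import math
import random
import statistics

# NOTE: synthetic data only; this is NOT the Family Expenditure Survey data.
# A two-group "income" sample: 600 draws around 1 and 400 draws around 3.
random.seed(1)
sample = ([random.lognormvariate(0.0, 0.2) for _ in range(600)] +
          [random.lognormvariate(math.log(3.0), 0.1) for _ in range(400)])


def lognormal_fit(data):
    """Maximum-likelihood log-normal fit: mean and ML std. dev. of log(data)."""
    logs = [math.log(v) for v in data]
    mu = statistics.fmean(logs)
    return mu, statistics.pstdev(logs, mu)


def lognormal_pdf(x, mu, sigma):
    """Log-normal density with log-scale parameters mu and sigma."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))


def kde(x, data, h):
    """Gaussian kernel density estimate at x: (1/(n*h)) * sum_i K((x - X_i)/h)."""
    norm = 1.0 / (len(data) * h * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)


mu, sigma = lognormal_fit(sample)
h = 0.1  # bandwidth picked by hand for this illustration (an assumption)

for x in (1.0, 2.0, 3.0):
    print(f"x = {x:.1f}: log-normal fit {lognormal_pdf(x, mu, sigma):.3f}, "
          f"kernel estimate {kde(x, sample, h):.3f}")
```

A log-normal density is always unimodal, so on this two-group sample the fitted curve simply decreases from $x=2$ to $x=3$, whereas the kernel estimate dips near 2 and rises again near 3, picking up the second group. The fixed parametric shape can hide structure that the kernel estimate reveals, which is the kind of contrast the two panels of Figure 1.1 display.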