# 2.4 Averaged Shifted Histogram

Before turning to the details, it helps to look at the end product of the procedure. Figure 2.6 shows a "histogram" that has been obtained by averaging over eight histograms corresponding to different origins (four of these eight histograms are plotted in Figure 2.7).

The resulting averaged shifted histogram (ASH) is far less dependent on the origin and appears to correspond to a smaller binwidth than the histograms from which it was constructed. Although the ASH can in some sense (made more precise below) be viewed as having a smaller binwidth, it is not simply an ordinary histogram with a smaller binwidth. This is easy to see in Figure 2.7, where an ordinary histogram with a comparable binwidth and origin $x_0 = 0$ is graphed.

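Let us move on to the details. Consider a bin grid corresponding to a histogram with origin $x_0 = 0$ and bins $B_j = [(j-1)h, jh)$, $j \in \mathbb{Z}$, i.e.

$$\ldots,\quad B_1 = [0, h),\quad B_2 = [h, 2h),\quad B_3 = [2h, 3h),\quad \ldots$$

Let us generate $M-1$ new bin grids by shifting each $B_j$ by the amount $lh/M$ to the right:

$$B_{jl} = \left[(j-1+l/M)h,\ (j+l/M)h\right), \qquad l \in \{1, \ldots, M-1\}. \tag{2.27}$$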

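EXAMPLE 2.1
As an example take $M = 10$:

$$\begin{array}{llll}
B_{11} = [0.1h, 1.1h), & B_{21} = [1.1h, 2.1h), & B_{31} = [2.1h, 3.1h), & \ldots \\
B_{12} = [0.2h, 1.2h), & B_{22} = [1.2h, 2.2h), & B_{32} = [2.2h, 3.2h), & \ldots \\
\quad\vdots & & & \\
B_{19} = [0.9h, 1.9h), & B_{29} = [1.9h, 2.9h), & B_{39} = [2.9h, 3.9h), & \ldots
\end{array}$$

Of course, if we take $l = 0$ then we get the original bin grid, i.e. $B_j = B_{j0}$.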
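
The shifted grids can be enumerated with a few lines of Python. This is only a minimal sketch: it assumes bins of the form $[(j-1+l/M)h, (j+l/M)h)$, i.e. the grid with origin 0 shifted to the right by $lh/M$, and the helper name `shifted_bin` as well as the values of `h` and `M` are illustrative choices, not part of the text.

```python
def shifted_bin(j, l, h, M):
    """Return (left, right) edges of the bin [(j-1+l/M)h, (j+l/M)h).

    Assumption: the unshifted grid has origin 0, so l = 0 gives [(j-1)h, jh).
    """
    left = (j - 1 + l / M) * h
    return left, left + h


h, M = 1.0, 10  # illustrative values
for l in (0, 1, 2, 9):
    bins = [shifted_bin(j, l, h, M) for j in (1, 2, 3)]
    # Print the first three bins of the l-th grid, rounded for readability.
    print(l, [(round(a, 2), round(b, 2)) for a, b in bins])
```

Setting `l = 0` reproduces the original grid, and every other `l` moves all bin edges by the same amount `l * h / M`.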

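Now suppose we calculate a histogram for each of the $M$ bin grids. Then we get $M$ different estimates for $f$ at each $x$:

$$\hat f_{h,l}(x) = \frac{1}{nh} \sum_{i=1}^{n} \Big\{ \sum_{j} I(X_i \in B_{jl})\, I(x \in B_{jl}) \Big\}, \qquad l \in \{0, 1, \ldots, M-1\}. \tag{2.28}$$

The ASH is obtained by averaging over these estimates:

$$\hat f_h(x) = \frac{1}{M} \sum_{l=0}^{M-1} \frac{1}{nh} \sum_{i=1}^{n} \Big\{ \sum_{j} I(X_i \in B_{jl})\, I(x \in B_{jl}) \Big\} \tag{2.29}$$

$$\phantom{\hat f_h(x)} = \frac{1}{n} \sum_{i=1}^{n} \Big\{ \frac{1}{Mh} \sum_{l=0}^{M-1} \sum_{j} I(X_i \in B_{jl})\, I(x \in B_{jl}) \Big\}. \tag{2.30}$$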
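
The averaging step can be sketched directly in Python with NumPy. The code below is a minimal illustration under stated assumptions, not a reference implementation: it assumes the unshifted grid has origin 0, that the $l$-th grid is shifted right by $lh/M$, and the function names `histogram_estimate` and `ash_estimate`, the sample, and the values of `h` and `M` are all hypothetical choices.

```python
import numpy as np


def histogram_estimate(x, data, h, shift=0.0):
    """Histogram density estimate at points x for bins [shift+(j-1)h, shift+jh)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    data = np.asarray(data, dtype=float)
    # Bin index of every evaluation point and of every observation.
    x_bin = np.floor((x - shift) / h)
    data_bin = np.floor((data - shift) / h)
    # For each evaluation point, count the observations that share its bin.
    counts = (x_bin[:, None] == data_bin[None, :]).sum(axis=1)
    return counts / (len(data) * h)


def ash_estimate(x, data, h, M):
    """Average the M histograms whose grids are shifted by l*h/M, l = 0..M-1."""
    shifts = np.arange(M) * h / M
    return np.mean([histogram_estimate(x, data, h, s) for s in shifts], axis=0)


rng = np.random.default_rng(0)
data = rng.normal(size=200)            # illustrative sample
grid = np.linspace(-5.0, 5.0, 2001)
f_ash = ash_estimate(grid, data, h=0.8, M=8)

# Riemann sum of the estimate over the grid; it should be close to 1.
print(np.sum(f_ash) * (grid[1] - grid[0]))
```

With `M = 1` the function reduces to the ordinary histogram, which is a convenient sanity check. The pairwise comparison of bin indices costs memory proportional to `len(x) * len(data)`, which is fine for a sketch but not for large samples.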

As $M \to \infty$, the ASH no longer depends on the origin and changes from a step function into a continuous function. This asymptotic behavior can be achieved directly by a different technique: kernel density estimation, studied in detail in Chapter 3.
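
The limit can be checked numerically. For two points at distance $d < h$, the share of shifted grids in which they fall into the same bin tends to $1 - d/h$ as the number of shifts grows, so the limit of the ASH is a kernel density estimate with the triangular kernel $K(u) = (1 - |u|)\,I(|u| \le 1)$ and bandwidth $h$. The sketch below compares the two; the function names, the simulated sample, and all parameter values are illustrative assumptions.

```python
import numpy as np


def ash_estimate(x, data, h, M):
    """ASH at points x: average of M histograms with grids shifted by l*h/M."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    data = np.asarray(data, dtype=float)
    total = np.zeros_like(x)
    for l in range(M):
        shift = l * h / M
        x_bin = np.floor((x - shift) / h)
        data_bin = np.floor((data - shift) / h)
        # Observations that share a bin with each evaluation point.
        total += (x_bin[:, None] == data_bin[None, :]).sum(axis=1)
    return total / (M * len(data) * h)


def triangular_kde(x, data, h):
    """Kernel density estimate with K(u) = (1 - |u|) for |u| <= 1, else 0."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    data = np.asarray(data, dtype=float)
    u = (x[:, None] - data[None, :]) / h
    return np.maximum(1.0 - np.abs(u), 0.0).sum(axis=1) / (len(data) * h)


rng = np.random.default_rng(1)
data = rng.normal(size=200)             # illustrative sample
grid = np.linspace(-4.0, 4.0, 321)
h = 0.8
kde = triangular_kde(grid, data, h)

# Largest pointwise gap between the ASH and the triangular KDE for growing M.
gaps = {M: float(np.max(np.abs(ash_estimate(grid, data, h, M) - kde)))
        for M in (1, 8, 64, 512)}
for M, gap in gaps.items():
    print(M, gap)
```

The printed gaps should shrink as `M` grows, since each observation's weight differs from the triangular weight by at most `1/M`.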

Additional material on the histogram can be found in Scott (1992), who specifically covers rules for the optimal number of bins, goodness-of-fit criteria and multidimensional histograms.

A related density estimator is the frequency polygon, which is constructed by linearly interpolating the histogram values $\hat f_h(m_j)$ at the bin centers $m_j$. This yields a piecewise linear, and hence continuous, estimate of the density function. For details and asymptotic properties see Scott (1992, Chapter 4).
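
A frequency polygon is straightforward to compute from histogram heights. The following is a minimal sketch assuming linear interpolation between bin centers, with one empty bin added on each side so that the polygon falls to zero; the function name `frequency_polygon`, the sample, and the binwidth are illustrative assumptions.

```python
import numpy as np


def frequency_polygon(x, data, h, origin=0.0):
    """Linearly interpolate histogram heights between the bin centers."""
    data = np.asarray(data, dtype=float)
    # Bin indices of the smallest and largest observation.
    k_min = np.floor((data.min() - origin) / h)
    k_max = np.floor((data.max() - origin) / h)
    # Edges cover the data plus one empty bin on each side.
    edges = origin + h * np.arange(k_min - 1, k_max + 3)
    counts, _ = np.histogram(data, bins=edges)
    heights = counts / (len(data) * h)
    centers = edges[:-1] + h / 2
    # Outside the outermost bin centers the estimate is set to zero.
    return np.interp(x, centers, heights, left=0.0, right=0.0)


rng = np.random.default_rng(2)
data = rng.normal(size=200)             # illustrative sample
grid = np.linspace(-6.0, 6.0, 4801)
fp = frequency_polygon(grid, data, h=0.5)

# Riemann sum of the polygon over the grid; it should be close to 1.
print(np.sum(fp) * (grid[1] - grid[0]))
```

Padding with empty bins keeps the total area equal to that of the histogram, because each histogram height then contributes a full triangle of area `height * h`.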

The idea of averaged shifted histograms can be used to motivate the kernel density estimators introduced in the following Chapter 3. For this application we refer to Härdle (1991) and Härdle & Scott (1992).

EXERCISE 2.1   Show that equation (2.13) holds.

EXERCISE 2.2   Derive equation (2.14).

EXERCISE 2.3   Show that

EXERCISE 2.4   Derive equation (2.21).

EXERCISE 2.5   Prove that for every density function , which is a step function, i.e.

the histogram defined on the bins is the maximum likelihood estimate.

EXERCISE 2.6   Simulate a sample of standard normally distributed random variables and compute an optimal histogram corresponding to the optimal binwidth $h_0$ in this case.

EXERCISE 2.7   Consider and histograms using binwidths for starting at . Calculate

and the optimal binwidth . (Hint: The solution is

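EXERCISE 2.8   Recall that for $\hat f_h(x)$ to be a consistent estimator of $f(x)$, it must hold for any $\epsilon > 0$ that $P(|\hat f_h(x) - f(x)| > \epsilon) \to 0$, i.e. $\hat f_h(x)$ must converge to $f(x)$ in probability. Why is it sufficient to show that $\mathrm{MSE}\{\hat f_h(x)\}$ converges to 0?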

EXERCISE 2.9   Compute for

and derive the optimal binwidth.

EXERCISE 2.10   Explain in detail why for the standard normal pdf we obtain

$$\|f'\|_2^2 = \frac{1}{4\sqrt{\pi}}.$$

EXERCISE 2.11   The optimal binwidth $h_0$ that minimizes $\mathrm{AMISE}$ for $N(0,1)$ is $h_0 \approx 3.5\, n^{-1/3}$. How does this rule of thumb change for $N(0, \sigma^2)$ and $N(\mu, \sigma^2)$?

EXERCISE 2.12   How would the formula for the histogram change if we based it on intervals of the form instead of ?

EXERCISE 2.13   Show that the histogram $\hat f_h$ is a maximum likelihood estimator of $f$ for an arbitrary discrete distribution, supported by $\{0, 1, \ldots\}$, if one considers $h = 1$ and $B_j = [j, j+1)$, $j = 0, 1, \ldots$.

EXERCISE 2.14   Consider an exponential distribution with parameter $\lambda$.
a) Compute the bias, the variance, and the $\mathrm{AMISE}$ of $\hat f_h$.
b) Compute the optimal binwidth $h_0$ that minimizes $\mathrm{AMISE}$.

Summary
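A histogram with binwidth $h$ and origin $x_0$ is defined by $\hat f_h(x) = \frac{1}{nh} \sum_{i=1}^{n} \sum_{j} I(X_i \in B_j)\, I(x \in B_j)$, where $B_j = [x_0 + (j-1)h,\ x_0 + jh)$ and $j \in \mathbb{Z}$.
The bias of a histogram is $E\{\hat f_h(x)\} - f(x) \approx f'(m_j)(m_j - x)$, where $m_j$ denotes the center of the bin containing $x$.
The variance of a histogram is $\mathrm{Var}\{\hat f_h(x)\} \approx \frac{1}{nh} f(x)$.
The asymptotic $\mathrm{MISE}$ is given by $\mathrm{AMISE}(\hat f_h) = \frac{1}{nh} + \frac{h^2}{12} \|f'\|_2^2$.
The optimal binwidth $h_0$ that minimizes $\mathrm{AMISE}$ is $h_0 = \left( \frac{6}{n \|f'\|_2^2} \right)^{1/3} \sim n^{-1/3}$.
The optimal binwidth $h_0$ that minimizes $\mathrm{AMISE}$ for $N(0,1)$ is $h_0 \approx 3.5\, n^{-1/3}$.
The averaged shifted histogram (ASH) is given by $\hat f_h(x) = \frac{1}{n} \sum_{i=1}^{n} \Big\{ \frac{1}{Mh} \sum_{l=0}^{M-1} \sum_{j} I(X_i \in B_{jl})\, I(x \in B_{jl}) \Big\}$. The ASH is less dependent on the origin than the ordinary histogram.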
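
The rule of thumb for the normal case can be checked numerically. The sketch below assumes the standard results $h_0 = \{6 / (n \|f'\|_2^2)\}^{1/3}$ and, for the standard normal density, $\|f'\|_2^2 = 1/(4\sqrt{\pi})$; the function name `optimal_binwidth`, the integration grid, and the sample size are illustrative choices.

```python
import numpy as np


def optimal_binwidth(n, norm_fprime_sq):
    """AMISE-optimal histogram binwidth h0 = (6 / (n * ||f'||^2))^(1/3)."""
    return (6.0 / (n * norm_fprime_sq)) ** (1.0 / 3.0)


# Numerically integrate (phi'(x))^2 for the standard normal density phi.
x = np.linspace(-10.0, 10.0, 200001)
phi = np.exp(-x**2 / 2.0) / np.sqrt(2.0 * np.pi)
dphi = -x * phi                               # derivative of phi
norm_sq = np.sum(dphi**2) * (x[1] - x[0])     # Riemann sum

# Compare the numerical integral with the closed form 1 / (4 sqrt(pi)).
print(norm_sq, 1.0 / (4.0 * np.sqrt(np.pi)))

n = 500                                       # illustrative sample size
# Compare the exact optimum with the rounded rule of thumb 3.5 * n^(-1/3).
print(optimal_binwidth(n, norm_sq), 3.5 * n ** (-1.0 / 3.0))
```

The constant in front of $n^{-1/3}$ is $(24\sqrt{\pi})^{1/3}$, which the rule of thumb rounds to 3.5.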