2.4 Averaged Shifted Histogram

Before we get to the details, it might be a good idea to take a look at the end product of the procedure. If you look at Figure 2.6 you can see a ``histogram'' that has been obtained by averaging over eight histograms corresponding to different origins (four of these eight histograms are plotted in Figure 2.5).

Figure: Averaged shifted histogram for stock returns data; average of 8 histograms with origins 0, $ 0.005$, $ 0.01$, $ 0.015$, $ 0.02$, $ 0.025$, $ 0.03$, $ 0.035$ and binwidth $ h=0.04$
\includegraphics[width=0.03\defepswidth]{quantlet.ps}SPMashstock
\includegraphics[width=1.2\defpicwidth]{SPMashstock.ps}

The resulting averaged shifted histogram (ASH) is free of the dependence on the origin and seems to correspond to a smaller binwidth than the histograms from which it was constructed. Even though the ASH can in some sense (which will be made more precise below) be viewed as having a smaller binwidth, you should be aware that it is not simply an ordinary histogram with a smaller binwidth, as you can easily see from Figure 2.7, where we graphed an ordinary histogram with a comparable binwidth and origin $ x_{0}=0$.

Figure: Ordinary histogram for stock returns; binwidth $ h=0.005$
\includegraphics[width=0.03\defepswidth]{quantlet.ps}SPMhiststock
\includegraphics[width=1.2\defpicwidth]{SPMhiststock.ps}

Let us move on to the details. Consider a bin grid corresponding to a histogram with origin $ x_{0}=0$ and bins $ B_{j} = [(j-1)h,jh)$, $ j \in \mathbb{Z}$, i.e.

$\displaystyle \ldots\ B_{1}=[0,h), \qquad B_{2}=[h,2h), \qquad B_{3}=[2h,3h),
\ \ldots$

Let us generate $ M-1$ new bin grids by shifting each $ B_{j}$ to the right by the amount $ lh/M$:

$\displaystyle B_{jl} = \left[(j-1+l/M)h,(j+l/M)h\right), \quad l \in \{1,\ldots,M-1\}.$ (2.27)

EXAMPLE 2.1  
As an example take $ M=10$:

$\displaystyle \ldots B_{11}=[0.1h,1.1h), \qquad B_{21}=[1.1h,2.1h), \qquad
B_{31}=[2.1h,3.1h), \ldots$

$\displaystyle \ldots B_{12}=[0.2h,1.2h), \qquad B_{22}=[1.2h,2.2h), \qquad
B_{32}=[2.2h,3.2h), \ldots$

$\displaystyle \vdots$

$\displaystyle \ldots B_{19}=[0.9h,1.9h), \qquad B_{29}=[1.9h,2.9h), \qquad
B_{39}=[2.9h,3.9h), \ldots$

Of course, if we take $ l=0$ then we get the original bin grid, i.e. $ B_{j}=B_{j0}$. $ \Box$
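The shifted bin grids of (2.27) are easy to generate numerically. The following Python sketch (our own illustration, not part of the original quantlets; the function name is hypothetical) prints the bin edges of Example 2.1 for a few shifts $ l$:

```python
import numpy as np

def shifted_bin_edges(h, M, l, n_bins=3):
    """Edges of the bins B_{1l}, ..., B_{n_bins,l}, i.e. the grid
    [(j-1+l/M)h, (j+l/M)h) obtained by shifting the origin by l*h/M."""
    j = np.arange(0, n_bins + 1)   # edge indices 0, 1, ..., n_bins
    return (j + l / M) * h         # shifted edge positions

h, M = 1.0, 10                     # take h = 1 and M = 10 as in Example 2.1
for l in (0, 1, 2, 9):
    print(f"l = {l}:", shifted_bin_edges(h, M, l))
```

For $ l=1$ this reproduces the edges $ 0.1h, 1.1h, 2.1h, \ldots$ listed above.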

Now suppose we calculate a histogram for each of the $ M$ bin grids. Then we get $ M$ different estimates for $ f$ at each $ x$

$\displaystyle \widehat f_{h,l}(x)=\frac{1}{nh} \sum_{i=1}^{n} \left\{ \sum_{j}\Ind(X_{i}\in B_{jl})\Ind(x\in B_{jl})\right\}.$ (2.28)

The ASH is obtained by averaging over these estimates
$\displaystyle \widehat f_{h}(x)$ $\displaystyle =$ $\displaystyle \frac{1}{M} \sum_{l=0}^{M-1}
\frac{1}{nh} \sum_{i=1}^{n} \left\{ \sum_{j}\Ind(X_{i}\in
B_{jl})\Ind(x\in B_{jl})\right\}$ (2.29)
  $\displaystyle =$ $\displaystyle \frac{1}{n} \sum_{i=1}^{n} \left\{ \frac{1}{Mh}
\sum_{l=0}^{M-1}
\sum_{j} \Ind(X_{i}\in B_{jl})\Ind(x\in B_{jl})\right\}.$ (2.30)
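Formula (2.30) translates almost literally into code. The following Python function is a minimal sketch of the ASH (our own illustration; it is not the quantlet SPMashstock, and the names are hypothetical):

```python
import numpy as np

def ash(x_grid, data, h, M):
    """Averaged shifted histogram, formula (2.30): average of M
    histograms with binwidth h and origins shifted by l*h/M."""
    data = np.asarray(data, dtype=float)
    x_grid = np.asarray(x_grid, dtype=float)
    n = len(data)
    fhat = np.zeros(len(x_grid))
    for l in range(M):                          # l = 0, ..., M-1
        shift = l * h / M
        # bin index of each point in the grid with origin `shift`
        bx = np.floor((x_grid - shift) / h)
        bd = np.floor((data - shift) / h)
        # histogram value at x: # observations in the same bin, / (n h)
        fhat += np.array([(bd == b).sum() for b in bx]) / (n * h)
    return fhat / M                             # average over the M grids

rng = np.random.default_rng(0)
sample = rng.standard_normal(200)
grid = np.linspace(-4, 4, 801)
density = ash(grid, sample, h=0.5, M=8)
```

Each pass of the loop evaluates one shifted histogram $ \widehat f_{h,l}$ on the grid; the final division by $ M$ performs the averaging in (2.29).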

As $ M\to\infty$, the ASH no longer depends on the origin and turns from a step function into a continuous function. The same asymptotic behavior can be achieved directly by a different technique: kernel density estimation, which is studied in detail in Chapter 3.
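The limit can be made explicit (a sketch; see Scott, 1992, for the precise statement). If $ \vert x-X_{i}\vert=\delta<h$, then $ x$ and $ X_{i}$ fall into the same bin $ B_{jl}$ for a fraction of roughly $ 1-\delta/h$ of the $ M$ shifted grids, so letting $ M\to\infty$ in (2.30) yields

$\displaystyle \widehat f_{h}(x)=\frac{1}{nh} \sum_{i=1}^{n}
\left(1-\frac{\vert x-X_{i}\vert}{h}\right) \Ind(\vert x-X_{i}\vert<h),$

which is a kernel density estimator with the so-called triangle kernel.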

Additional material on the histogram can be found in Scott (1992), who specifically covers rules for the optimal number of bins, goodness-of-fit criteria and multidimensional histograms.

A related density estimator is the frequency polygon which is constructed by interpolating the histogram values $ \widehat{f}(m_j)$. This yields a piecewise linear but now continuous estimate of the density function. For details and asymptotic properties see Scott (1992, Chapter 4).
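A frequency polygon can be sketched in a few lines of Python (our own illustration; the function name is hypothetical). Padding the histogram with one empty bin on each side before interpolating makes the resulting estimate integrate to one:

```python
import numpy as np

def frequency_polygon(x, data, h, x0=0.0):
    """Linear interpolation of the histogram values at the bin
    midpoints m_j; one empty bin is added on each side so that
    the piecewise linear estimate integrates to 1."""
    data = np.asarray(data, dtype=float)
    lo = x0 + (np.floor((data.min() - x0) / h) - 1) * h  # empty bin on the left
    hi = x0 + (np.ceil((data.max() - x0) / h) + 1) * h   # empty bin on the right
    edges = np.arange(lo, hi + h / 2, h)
    counts, _ = np.histogram(data, bins=edges)
    heights = counts / (len(data) * h)                   # histogram values
    mids = (edges[:-1] + edges[1:]) / 2                  # bin midpoints m_j
    return np.interp(x, mids, heights)                   # continuous, piecewise linear
```

In contrast to the histogram, the returned estimate is continuous in $ x$, at the price of being a piecewise linear rather than piecewise constant function.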

The idea of averaged shifted histograms can be used to motivate the kernel density estimators introduced in the following Chapter 3. For this application we refer to Härdle (1991) and Härdle & Scott (1992).

EXERCISE 2.1   Show that equation (2.13) holds.

EXERCISE 2.2   Derive equation (2.14).

EXERCISE 2.3   Show that

$\displaystyle \mathop{\mathit{Var}}\{\widehat f_{h}(x)\} = \frac{1}{nh^{2}}
\int_{B_{j}}f(u) \, du
\left( 1-\int_{B_{j}}f(u) \, du \right)\approx \frac{1}{nh}\, f(x).$

EXERCISE 2.4   Derive equation (2.21).

EXERCISE 2.5   Prove that for every density function $ f$, which is a step function, i.e.

$\displaystyle f(x)=\sum_{j=1}^m a_{j}\Ind(x\in A_{j}) \quad, \quad
A_{j}=[(j-1)h,jh) ,$

the histogram $ \widehat f_{h}$ defined on the bins $ B_{j}=A_{j}$ is the maximum likelihood estimate.

EXERCISE 2.6   Simulate a sample of standard normal distributed random variables and compute an optimal histogram corresponding to the optimal binwidth $ h_{0}$ in this case.
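One possible starting point for this simulation in Python (our own sketch, using the $ N(0,1)$ rule-of-thumb binwidth $ h_{0}=(24\sqrt{\pi})^{1/3}n^{-1/3}$ stated in Exercise 2.11):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.standard_normal(n)                    # simulated N(0,1) sample

# optimal (AMISE-minimizing) binwidth for the standard normal
h0 = (24 * np.sqrt(np.pi)) ** (1 / 3) * n ** (-1 / 3)
# bin grid with binwidth h0; the origin is placed on the h0-grid below x.min()
edges = np.arange(np.floor(x.min() / h0) * h0, x.max() + h0, h0)
counts, _ = np.histogram(x, bins=edges)
fhat = counts / (n * h0)                      # histogram heights

print(f"h0 = {h0:.4f} with {len(counts)} bins")
```

The heights `fhat` can then be plotted as a step function and compared with the standard normal density.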

EXERCISE 2.7   Consider $ f(x)=2x\cdotp\Ind(x\in[0,1])$ and histograms using binwidths $ h=\frac{1}{m}$ for $ m=1,2,\ldots$ starting at $ x_0=0$. Calculate

$\displaystyle \mise(\widehat{f}_{h}) = \int_{0}^1 \mse\left\{ \widehat{f}_{h}\left( x \right) \right\} \,dx$

and the optimal binwidth $ h_{0}$. (Hint: The solution is $ \mise(\widehat{f}_{h})= (nh)^{-1} + \frac{1}{3}h^2 - \frac{4}{3}n^{-1} + \frac{1}{3}
n^{-1}h^2.)$

EXERCISE 2.8   Recall that for $ \widehat f_{h}(x)$ to be a consistent estimator of $ f(x)$, it has to hold for any $ \epsilon>0$ that $ P(\vert\widehat
f_{h}(x)-f(x)\vert>\epsilon)\to 0$, i.e. $ \widehat f_{h}(x)$ has to converge to $ f(x)$ in probability. Why is it sufficient to show that $ \mse\{\widehat{f}_{h}(x)\}$ converges to 0?

EXERCISE 2.9   Compute $ \Vert f'\Vert^2$ for

$\displaystyle f(x)=\frac{2}{3}\left[ \left(\frac{x}{2} +1\right)
\Ind\{x\in[-2,0)\} + (1-x)\Ind\{x\in[0,1)\}\right]$

and derive the $ \mise$ optimal binwidth.

EXERCISE 2.10   Explain in detail why for the standard normal pdf $ f(x)=\varphi(x)$ we obtain

$\displaystyle \Vert f'\Vert^{2}_{2}
=\frac{1}{\sqrt{2\pi}}\sqrt{\frac{1}{2}}
\cdot\frac{1}{2}=\frac{1}{4\sqrt{\pi}}\,.
$

EXERCISE 2.11   The optimal binwidth $ h_0$ that minimizes $ \amise$ for $ N(0,1)$ is $ h_{0} = \left(24\sqrt{\pi}\right)^{1/3} n^{-1/3}$. How does this rule of thumb change for $ N(0,\sigma^2)$ and $ N(\mu,\sigma^2)$?

EXERCISE 2.12   How would the formula for the histogram change if we based it on intervals of the form $ [m_j-h,m_j+h)$ instead of $ [m_j-\frac{h}{2},m_j+\frac{h}{2})$?

EXERCISE 2.13   Show that the histogram $ \widehat{f}_h(x)$ is a maximum likelihood estimator of $ f(x)$ for an arbitrary discrete distribution supported on $ \{0,1,\ldots\}$, if one considers $ h=1$ and $ B_j=[j,j+1)$, $ j=0,1,\ldots$

EXERCISE 2.14   Consider an exponential distribution with parameter $ \lambda$.
a) Compute the bias, the variance, and the $ \amise$ of $ \widehat{f}_h$.
b) Compute the optimal binwidth $ h_0$ that minimizes the $ \amise$.


Summary
$ \ast$
A histogram with binwidth $ h$ and origin $ x_0$ is defined by $ \widehat f_{h}(x)=\frac{1}{nh} \sum_{i=1}^n \sum_{j} \Ind(X_{i}\in
B_{j}) \Ind(x\in B_{j}) $ where $ B_{j}= [x_{0}+(j-1)h,x_{0}+jh)$ and $ j \in \mathbb{Z}$.
$ \ast$
The bias of a histogram is $ E\{\widehat f_{h}(x)-f(x)\}
\approx f'\left\{\left(j-\frac{1}{2}\right)h\right\}
\left\{(j-\frac{1}{2})h-x\right\}.$
$ \ast$
The variance of a histogram is $ \mathop{\mathit{Var}}\{\widehat f_{h}(x)\} \approx
\frac{1}{nh}f(x)$.
$ \ast$
The asymptotic $ \mise$ is given by $ \amise(\widehat
f_{h})=\frac{1}{nh}+\frac{h^{2}}{12}\Vert f'\Vert^{2}_{2} $.
$ \ast$
The optimal binwidth $ h_0$ that minimizes $ \amise$ is $ h_{0} = \left(\frac{6}{n\Vert f' \Vert^{2}_{2}}\right)^{1/3}
\sim n^{-1/3}.$
$ \ast$
The optimal binwidth $ h_0$ that minimizes $ \amise$ for $ N(0,1)$ is $ h_{0} \approx 3.5\, n^{-1/3}.$
$ \ast$
The averaged shifted histogram (ASH) is given by $ \widehat f_{h}(x) =\frac{1}{n} \sum_{i=1}^{n} \left\{ \frac{1}{Mh}
\sum_{l=0}^{M-1}
\sum_{j} \Ind(X_{i}\in B_{jl})\Ind(x\in B_{jl})\right\}.$ The ASH is less dependent on the origin than the ordinary histogram.