Recall that we need large sample sizes in order for the critical values computed via the CLT to be a good approximation. Here large means $n = 50$ for one-dimensional data. How can we construct confidence intervals in the case of smaller sample sizes? One way is to use a method called the Bootstrap. The Bootstrap algorithm uses the data twice:
1. estimate the true distribution by the empirical distribution $F_n$ of the observed sample;
2. draw new samples from this empirical distribution $F_n$.
Now draw with replacement a new sample of size $n^*$ from this empirical distribution. That is, we sample with replacement $n^*$ observations from the original sample. This is called a Bootstrap sample. Usually one takes $n^* = n$.
Since we sample with replacement, a single observation from the original sample may appear several times in the Bootstrap sample. For instance, if the original sample consists of the three observations $\{x_1, x_2, x_3\}$, then a Bootstrap sample might look like $\{x_2, x_3, x_3\}$. Computationally, we find the Bootstrap sample by using a uniform random number generator to draw from the indices of the original sample.
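This index-drawing step can be sketched as follows; the sample values and variable names are hypothetical, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([2.1, 4.5, 3.3])   # hypothetical original sample, n = 3
n = len(x)

# Draw n* = n indices uniformly with replacement from {0, ..., n-1};
# the Bootstrap sample consists of the observations at those indices.
idx = rng.integers(0, n, size=n)
x_star = x[idx]
```

Because the indices are drawn with replacement, `x_star` may contain repeated values from `x`.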
The Bootstrap observations are drawn randomly from the empirical distribution, i.e., the probability for each original observation to be selected into the Bootstrap sample is $1/n$ for each draw.
It is easy to compute that, conditionally on the original sample, each Bootstrap observation $x_i^*$ has mean $E(x_i^* \mid x_1,\dots,x_n) = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}$.
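As a quick numerical check of this computation, with a hypothetical three-point sample:

```python
import numpy as np

x = np.array([3.0, 5.0, 10.0])   # hypothetical original sample
n = len(x)
p = np.full(n, 1.0 / n)          # each observation has probability 1/n per draw

# Expectation of a single Bootstrap observation under the empirical
# distribution F_n equals the sample mean x-bar.
mean_Fn = float(np.sum(p * x))
print(mean_Fn, x.mean())         # both equal 6.0
```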
Figure 4.8 shows the cdf of the original observations as a solid line and two bootstrap cdf's as thin lines.
The CLT also holds for the Bootstrap sample. Analogously to Corollary 4.1, we have the following corollary.
How do we find a confidence interval for $\mu$ using the Bootstrap method? Recall that the quantile $u_{1-\alpha/2}$ might be bad for small sample sizes because the true distribution of $\sqrt{n}\,(\bar{x}-\mu)/\hat{\sigma}$ might be far away from the limit distribution $N(0,1)$.
The Bootstrap idea enables us to ``simulate'' this distribution by computing $\sqrt{n}\,(\bar{x}^{*}-\bar{x})/\hat{\sigma}^{*}$ for many Bootstrap samples. In this way we can estimate an empirical $(1-\alpha/2)$-quantile $u^{*}_{1-\alpha/2}$. The Bootstrap improved confidence interval is then
$$C^{*}_{1-\alpha} = \left[\,\bar{x} - \frac{\hat{\sigma}}{\sqrt{n}}\, u^{*}_{1-\alpha/2},\ \bar{x} + \frac{\hat{\sigma}}{\sqrt{n}}\, u^{*}_{1-\alpha/2}\,\right].$$
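The whole procedure can be sketched as a small simulation; the sample, the number of replications $B$, and all variable names below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.exponential(size=20)     # hypothetical small sample, n = 20
n, B = len(x), 2000              # B Bootstrap replications
xbar, s = x.mean(), x.std(ddof=1)

# Bootstrap analogue of sqrt(n)(xbar - mu)/sigma-hat:
# compute sqrt(n)(xbar* - xbar)/sigma-hat* for each Bootstrap sample.
t_star = np.empty(B)
for b in range(B):
    xs = x[rng.integers(0, n, size=n)]
    t_star[b] = np.sqrt(n) * (xs.mean() - xbar) / xs.std(ddof=1)

alpha = 0.05
# empirical (1 - alpha/2)-quantile of the simulated distribution
u_star = np.quantile(t_star, 1 - alpha / 2)
ci = (xbar - s / np.sqrt(n) * u_star, xbar + s / np.sqrt(n) * u_star)
```

Because the quantile is estimated from the Bootstrap distribution rather than taken from $N(0,1)$, the interval adapts to the skewness of the small sample.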