Recall that we need
large sample sizes in order to sufficiently approximate the critical values
computable by the CLT.
Here large means $n = 50$ for one-dimensional data. How can
we construct confidence intervals in the case of smaller sample
sizes? One way is to use a method called the Bootstrap. The
Bootstrap algorithm uses the data twice:

1. to estimate the parameter of interest,
2. to simulate the distribution of this estimator.

The simulation is based on the empirical distribution function, which puts mass $1/n$ at each of the original observations $x_1,\ldots,x_n$:
$$\hat F_n(x) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}(x_i \le x).$$
Now draw with replacement a new sample from this empirical distribution.
That is, we sample with replacement $n^{*}$ observations $x_1^{*},\ldots,x_{n^{*}}^{*}$
from the original sample. This is called a Bootstrap sample.
Usually one takes $n^{*} = n$.
Since we sample with replacement, a single observation from the original sample
may appear several times in the Bootstrap sample. For instance, if the original sample
consists of the three observations $x_1, x_2, x_3$,
then a Bootstrap sample
might look like $x_1^{*} = x_3,\ x_2^{*} = x_3,\ x_3^{*} = x_1$.
Computationally, we find the Bootstrap sample by using a uniform random number
generator to draw from the indices $1, 2, \ldots, n$
of the original sample.
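A minimal sketch of this resampling step (NumPy assumed; the data values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
x = np.array([1.2, 3.4, 2.2])   # original sample, n = 3
n = len(x)

# Draw n* = n indices uniformly with replacement from {0, ..., n-1};
# this is equivalent to sampling from the empirical distribution.
idx = rng.integers(low=0, high=n, size=n)
x_star = x[idx]                 # the Bootstrap sample
```

Because the indices are drawn with replacement, some original observations may appear several times in `x_star` and others not at all.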
The Bootstrap observations are drawn randomly from the empirical distribution,
i.e., the probability for each original observation to be selected into the
Bootstrap sample is $1/n$ for each draw.
It is easy to compute that
$$E_{\hat F_n}(x_i^{*}) = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}.$$
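This expectation can be checked numerically (hypothetical data; NumPy assumed): weight each observation by $1/n$ and compare with a Monte Carlo average of draws from the empirical distribution.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.array([1.0, 4.0, 7.0])
n = len(x)

# Each original observation is selected with probability 1/n per draw,
# so the exact expectation of one Bootstrap draw is the sample mean.
exact = np.sum(x * (1.0 / n))

# Monte Carlo check: the average of many draws from the empirical
# distribution approaches x-bar.
draws = x[rng.integers(0, n, size=100_000)]
print(exact, draws.mean())
```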
Figure 4.8 shows the cdf of the original observations as a solid line and two bootstrap cdf's as thin lines.
The CLT holds for the Bootstrap sample. Analogously to Corollary 4.1 we have the following corollary.

COROLLARY 4.2. If $x_1^{*},\ldots,x_n^{*}$ is a Bootstrap sample from $x_1,\ldots,x_n$, then the distribution of
$$\sqrt{n}\left(\frac{\bar{x}^{*}-\bar{x}}{\hat\sigma^{*}}\right)$$
also becomes $N(0,1)$ asymptotically, where $\bar{x}^{*} = \frac{1}{n}\sum_{i=1}^{n} x_i^{*}$ and $(\hat\sigma^{*})^{2} = \frac{1}{n}\sum_{i=1}^{n}\left(x_i^{*}-\bar{x}^{*}\right)^{2}$.
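A quick numerical illustration of this asymptotic normality (hypothetical data; NumPy assumed): simulate $\sqrt{n}\,(\bar{x}^{*}-\bar{x})/\hat\sigma^{*}$ over many Bootstrap samples and compare its location and spread with the standard normal.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(size=100)                 # original sample
n = len(x)
xbar = x.mean()

B = 5000
t_star = np.empty(B)
for b in range(B):
    xs = x[rng.integers(0, n, size=n)]   # Bootstrap sample
    # studentized Bootstrap statistic sqrt(n) * (xbar* - xbar) / sigma*
    t_star[b] = np.sqrt(n) * (xs.mean() - xbar) / xs.std()

# Approximately standard normal: mean near 0, standard deviation near 1.
print(t_star.mean(), t_star.std())
```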
How do we find a confidence interval for $\mu$ using the
Bootstrap method? Recall that the quantile $u_{1-\alpha/2}$ might be
bad for small sample sizes because the true distribution of
$\sqrt{n}\left(\frac{\bar{x}-\mu}{\hat\sigma}\right)$ might be far away
from the limit distribution $N(0,1)$.
The Bootstrap idea enables us to ``simulate'' this
distribution by computing $\sqrt{n}\left(\frac{\bar{x}^{*}-\bar{x}}{\hat\sigma^{*}}\right)$ for many Bootstrap
samples. In
this way we can estimate an empirical $\left(1-\frac{\alpha}{2}\right)$-quantile
$u^{*}_{1-\alpha/2}$. The Bootstrap improved confidence interval is then
$$C^{*}_{1-\alpha} = \left[\bar{x} - \frac{\hat\sigma}{\sqrt{n}}\, u^{*}_{1-\alpha/2},\;\ \bar{x} + \frac{\hat\sigma}{\sqrt{n}}\, u^{*}_{1-\alpha/2}\right].$$
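The whole procedure can be sketched as follows (NumPy assumed; the function name, the choice $B = 1000$, and the test data are illustrative):

```python
import numpy as np

def bootstrap_ci(x, alpha=0.05, B=1000, rng=None):
    """Bootstrap improved confidence interval for the mean.

    Simulates the distribution of sqrt(n)*(xbar* - xbar)/sigma* over B
    Bootstrap samples and plugs its empirical (1 - alpha/2)-quantile
    u*_{1-alpha/2} into [xbar -/+ sigma_hat/sqrt(n) * u*_{1-alpha/2}].
    """
    if rng is None:
        rng = np.random.default_rng()
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar = x.mean()
    sigma_hat = x.std()                      # 1/n variance, as in the text

    t_star = np.empty(B)
    for b in range(B):
        xs = x[rng.integers(0, n, size=n)]   # Bootstrap sample
        t_star[b] = np.sqrt(n) * (xs.mean() - xbar) / xs.std()

    u_star = np.quantile(t_star, 1 - alpha / 2)   # empirical quantile
    half = sigma_hat / np.sqrt(n) * u_star
    return xbar - half, xbar + half
```

The only change relative to the CLT-based interval is that the normal quantile $u_{1-\alpha/2}$ is replaced by the empirical quantile `u_star` of the simulated Bootstrap statistics.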