In the last section we pointed out how resampling can offer additional insights in a data analysis. We now want to discuss applications of the bootstrap that are more in the tradition of classical statistics. We will introduce resampling approaches for the construction of confidence intervals and of testing procedures. The majority of the vast bootstrap literature is devoted to these topics. There exist two basic approaches for the construction of confidence regions:
Approaches based on pivot statistics are classical methods for the construction of confidence sets. In a statistical model $\{P_\theta : \theta \in \Theta\}$ a pivot statistic is a random quantity $T = T(\theta, X)$ that depends on the unknown parameter $\theta$ and on the observation (vector) $X$ and that has the following property: the distribution of $T(\theta, X)$ under $P_\theta$ does not depend on $\theta$. Thus the distribution of $T$ is known and one can calculate quantiles $q_{\alpha/2}, q_{1-\alpha/2}$ such that $P_\theta\{q_{\alpha/2} \le T(\theta, X) \le q_{1-\alpha/2}\} = 1-\alpha$. Then $C = \{\theta : q_{\alpha/2} \le T(\theta, X) \le q_{1-\alpha/2}\}$ is a confidence set of the unknown parameter $\theta$ with coverage probability $P_\theta(\theta \in C) = 1-\alpha$. A classical example are i.i.d. normal observations $X_1, \dots, X_n$ with mean $\mu$ and variance $\sigma^2$. Then $T = \sqrt{n}(\bar{X} - \mu)/s$ is a pivot statistic. Here $\bar{X}$ is the sample mean and $s^2$ is the sample variance. Then we get, e.g., that $[\bar{X} - n^{-1/2} s\, t_{n-1, 1-\alpha/2},\ \bar{X} + n^{-1/2} s\, t_{n-1, 1-\alpha/2}]$ is a confidence interval for $\mu$ with exact coverage probability $1-\alpha$. Here $t_{n-1, 1-\alpha/2}$ is the $(1-\alpha/2)$-quantile of the t-distribution with $n-1$ degrees of freedom.
Pivot statistics only exist in very rare cases. However, for a very rich class of settings one can find statistics $T = T(\theta, X)$ that have a limiting distribution $L(\theta)$ that depends smoothly on $\theta$. Such statistics are called asymptotic pivot statistics. If now $q_{\alpha/2}(\theta), q_{1-\alpha/2}(\theta)$ are chosen such that under $L(\theta)$ the interval $[q_{\alpha/2}(\theta), q_{1-\alpha/2}(\theta)]$ has probability $1-\alpha$, then we get that $P_\theta(\theta \in C)$ converges to $1-\alpha$. Here $\hat{\theta}$ is a consistent estimate of $\theta$ and the confidence set $C = \{\theta : q_{\alpha/2}(\hat{\theta}) \le T(\theta, X) \le q_{1-\alpha/2}(\hat{\theta})\}$ is defined as above. A standard example can be easily given if an estimate $\hat{\theta}$ of a (one-dimensional, say) parameter $\theta$ is given that is asymptotically normal. Then $\sqrt{n}(\hat{\theta} - \theta)$ converges in distribution towards a normal limit with mean zero and variance $\sigma^2(\theta)$ depending on the unknown parameter $\theta$. Here $T = \sqrt{n}(\hat{\theta} - \theta)$ or the studentized version $T = \sqrt{n}(\hat{\theta} - \theta)/\hat{\sigma}$ with a consistent estimate $\hat{\sigma}$ of $\sigma(\theta)$ could be used as asymptotic pivot. Asymptotic pivot confidence intervals are based on the quantiles of the asymptotic distribution of $T$. The bootstrap idea is to simulate the finite sample distribution $\mathcal{L}_n(\theta)$ of the pivot statistic $T$ instead of using the asymptotic distribution of $T$. This distribution depends on $n$ and on the unknown parameter $\theta$. The bootstrap idea is to estimate the unknown parameter and to plug it in. Then bootstrap quantiles for $T$ are defined as the (random) quantiles of $\mathcal{L}_n(\hat{\theta})$. For the unstudentized statistic $T = \sqrt{n}(\hat{\theta} - \theta)$ we get the bootstrap confidence interval $[\hat{\theta} - n^{-1/2} q^*_{1-\alpha/2},\ \hat{\theta} - n^{-1/2} q^*_{\alpha/2}]$, where $q^*_{\alpha/2}$ is the $\alpha/2$ bootstrap quantile and $q^*_{1-\alpha/2}$ is the $1-\alpha/2$ bootstrap quantile. This confidence interval has an asymptotic coverage probability equal to $1-\alpha$. We want to illustrate this approach by the data example of the last section. Suppose we fit a GARCH(1,1) model to the logreturns and we want to have a confidence interval for $\alpha + \beta$. It is known that a GARCH(1,1) process is covariance stationary if and only if $\alpha + \beta < 1$. For values of $\alpha + \beta$ that approximate 1, one gets a very high persistency of shocks on the process. We now construct a bootstrap confidence interval for $\alpha + \beta$. We used $T = \sqrt{n}\{(\hat{\alpha} + \hat{\beta}) - (\alpha + \beta)\}$ as asymptotic pivot statistic. The results are summarized in Table 2.1.
We also applied the GARCH(1,1) bootstrap to the first half and to the second half of our data set. The results are summarized in Table 2.2. The value of $\hat{\alpha} + \hat{\beta}$ is quite similar for both halves. The fitted parameter is always contained in the confidence interval based on the other half of the sample. Both confidence intervals have a broad overlap. So there seems to be no reason to expect different values of $\alpha + \beta$ for the two halves of the data. The situation becomes a little confusing if we compare Table 2.2 with Table 2.1. Both fitted values of $\hat{\alpha} + \hat{\beta}$, the value for the first half and the value for the second half, are not contained in the confidence interval that is based on the whole sample. This suggests that a GARCH(1,1) model with parameters that are fixed over the whole sample is not an appropriate model. A model with time-changing values of $\alpha + \beta$ seems to be more realistic. When a GARCH(1,1) model is fitted to the whole time series, the change of the parameters in time forces the persistency parameter $\hat{\alpha} + \hat{\beta}$ closer to 1, and this effect increases for GARCH fits over longer periods. We do not want to discuss this point further here and refer to [61] for more details.
In [19] another approach for confidence intervals was suggested. It was proposed to use the bootstrap quantiles of the estimate itself directly as bounds of the bootstrap confidence interval (percentile intervals). In our example the estimate $\hat{\alpha}^* + \hat{\beta}^*$ then has to be calculated repeatedly for the bootstrap resamples, and the $\alpha/2$ and $1-\alpha/2$ empirical quantiles are used as lower and upper bound of the bootstrap confidence interval. It can be easily checked that we then get $[\hat{\theta} + n^{-1/2} q^*_{\alpha/2},\ \hat{\theta} + n^{-1/2} q^*_{1-\alpha/2}]$ as bootstrap confidence interval, where the quantiles $q^*_{\alpha/2}$ and $q^*_{1-\alpha/2}$ are defined as above; see also [22]. Note that this interval is just the pivot interval reflected around $\hat{\theta}$. The resulting confidence interval for $\alpha + \beta$ is shown in Table 2.3. For asymptotically normal test statistics both bootstrap confidence intervals are asymptotically equivalent. Using higher order Edgeworth expansions it was shown that bootstrap pivot intervals achieve a higher order of level accuracy. Modifications of percentile intervals have been proposed that achieve level accuracy of the same order, see [22]. For a recent discussion on bootstrap confidence intervals see also [21,18]. In our data example there is only a minor difference between the two intervals, cf. Tables 2.1 and 2.3. This may be caused by the very large sample size.
The basic idea of bootstrap tests is rather simple. Suppose that for a statistical model a testing hypothesis $H_0$ and a test statistic $T$ are given. Then the bootstrap is used to calculate critical values for $T$. This can be done by fitting a model on the hypothesis and by generating bootstrap resamples under the fitted hypothesis model. The $1-\alpha$ quantile of the test statistic in the bootstrap samples can be used as critical value. The resulting test is called a bootstrap test. Alternatively, a testing approach can be based on the duality of testing procedures and confidence regions. Each confidence region defines a testing procedure by the following rule: a hypothesis is rejected if no hypothesis parameter lies in the confidence region. We shortly describe this method for bootstrap confidence intervals based on an asymptotic pivot statistic, say $\sqrt{n}(\hat{\theta} - \theta)$, and the hypothesis $H_0: \theta = \theta_0$. Bootstrap resamples are generated (in the unrestricted model) and are used for estimating the $1-\alpha$ quantile of $\sqrt{n}(\hat{\theta} - \theta)$ by $q^*_{1-\alpha}$, say. The bootstrap test rejects the hypothesis if $\sqrt{n}(\hat{\theta} - \theta_0)$ is larger than $q^*_{1-\alpha}$. Higher order performance of bootstrap tests has been discussed in Hall (1992) [32]. For a discussion of bootstrap tests we also refer to Beran (1988) and Beran and Ducharme (1991) [2,4].
We now compare bootstrap testing with a more classical resampling approach for testing (''conditional tests''). There exist some (important) examples where, for all test statistics, resampling can be used to achieve a correct level on the whole hypothesis for finite samples. Such tests are called similar. For some testing problems resampling tests turn out to be the only way to get similar tests. This situation arises when a statistic $S$ is available that is sufficient on the hypothesis $H_0$. Then, by the definition of sufficiency, the conditional distribution of the data set given this statistic is fixed on the hypothesis and does not depend on the parameter of the underlying distribution, as long as the parameter lies on the hypothesis. Furthermore, because this conditional distribution is unique and thus known, resamples can be drawn from it. The resampling test then has correct level on the whole hypothesis. We will now give a more formal description.
A test $\phi(X)$ for a vector of observations $X$ is called similar if $E_\theta \phi(X) = \alpha$ for all $\theta \in \Theta_0$, where $\Theta_0$ is the set of parameters on the null hypothesis. We suppose that a statistic $S$ is available that is sufficient on the hypothesis. Let $\{P_\theta : \theta \in \Theta_0\}$ be the family of distributions of $X$ on the hypothesis. Then the conditional distribution of $X$ given $S = s$ does not depend on the underlying parameter $\theta \in \Theta_0$ because $S$ is sufficient. In particular, $E(\phi(X) \mid S = s)$ does not depend on $\theta \in \Theta_0$. Then any test satisfying $$E(\phi(X) \mid S = s) = \alpha \quad \text{for (almost) all } s$$ is similar, since $E_\theta \phi(X) = E_\theta\{E(\phi(X) \mid S)\} = \alpha$ for all $\theta \in \Theta_0$.
For a given test statistic $T = T(X)$, similar tests can be constructed by choosing a critical value $c(s)$ such that $$P\{T(X) > c(s) \mid S = s\} = \alpha$$ (if necessary with additional randomization on the event $T(X) = c(s)$) and rejecting the hypothesis if $T(X) > c(S)$. In practice the conditional quantile $c(s)$ is approximated by drawing resamples from the conditional distribution of $X$ given $S = s$.
We will consider two examples of conditional tests. The first example are permutation tests. For a sample $X = (X_1, \dots, X_n)$ of $n$ observations, the order statistic $S = (X_{(1)}, \dots, X_{(n)})$ containing the ordered sample values is sufficient on the hypothesis of i.i.d. observations. Given $S$, the conditional distribution of $X$ is that of a random permutation of $X_{(1)}, \dots, X_{(n)}$. The resampling scheme is very similar to the nonparametric bootstrap: in the resampling, $n$ pseudo observations are drawn from the original data sample, but now this is done without replacement, whereas in the bootstrap scheme it is done with replacement. For a comparison of bootstrap and permutation tests see also [41]. Also for subsampling (i.e. resampling with a resample size that is smaller than the sample size), both schemes (with and without replacement) have been considered. For a detailed discussion of subsampling without replacement see [72].
The second example is a popular approach in the physics literature on nonlinear time series analysis. For odd sample size $n$ a series $X_1, \dots, X_n$ can be written as $$X_t = \bar{X} + \sum_{k=1}^{(n-1)/2} r_k \cos(\omega_k t - \phi_k),$$ where $\omega_k = 2\pi k/n$ are the Fourier frequencies, $r_k \ge 0$ the amplitudes and $\phi_k \in [0, 2\pi)$ the phases. On the hypothesis that the series is a circular stationary Gaussian process, the phases $\phi_1, \dots, \phi_{(n-1)/2}$ are i.i.d. uniform on $[0, 2\pi)$ and independent of $(\bar{X}, r_1, \dots, r_{(n-1)/2})$, and the latter vector is sufficient on the hypothesis. Resamples can therefore be generated by keeping the observed amplitudes and drawing new i.i.d. uniform random phases. This resampling scheme is known as the surrogate data method.
We would like to highlight a major difference between bootstrap and conditional tests. Bootstrap tests work if they are based on resampling of an asymptotic pivot statistic. Then the bootstrap critical values stabilize asymptotically and converge to the quantile of the limiting distribution of the test statistic. For conditional tests the situation is quite different. They work for all test statistics. However, it is not guaranteed for all test statistics that the critical value converges to a deterministic limit. In [60] this is discussed for surrogate data tests. It is shown that even for very large data sets the surrogate data quantile $c(S)$ may have a variance of the same order as the test statistic $T$. Thus the randomness of $c(S)$ may change the nature of a test. This is illustrated by a test statistic for the kurtosis of the observations that is transformed into a test for circular stationarity.