
2.4 Bootstrap for Dependent Data

The bootstrap for dependent data is a lively research area. Many ideas are in circulation and have led to quite different proposals. In this section we do not give a detailed overview and description of the different proposals; we only sketch the main ideas. Models for dependent data may differ in principle from i.i.d. models. For dependent data the data generating process is often not fully specified, and then there is no unique natural way of resampling. The resampling should be carried out in such a way that the dependence structure is captured. This can easily be done for classical finite-dimensional ARMA models with i.i.d. residuals. In these models the resamples can be generated by fitting the parameters and by using i.i.d. residuals in the resampling. We will discuss the situation when no finite-dimensional model is assumed. For other overviews on the bootstrap for time series analysis, see [12,30,70], the time series chapter in [18] and the book [49]. In particular, [30] gives an overview of the higher order performance of the different resampling schemes.

The most popular bootstrap methods for dependent data are block, sieve, local, wild and Markov bootstrap and subsampling. They all are nonparametric procedures.

2.4.1 The Subsampling

The method that works under the weakest assumptions is subsampling. It is used to approximate the distribution of an estimate $\widehat{\theta}_n$ of an unknown parameter $\theta$ . In subsampling, subsamples of consecutive observations of length $l$ are taken. These subsamples are drawn randomly from the whole time series. For the subsamples estimates $\widehat{\theta}^{\ast}$ are calculated. If it is known that for a sequence $a_n$ the statistic $a_n (\widehat{\theta}_n- \theta)$ has a limiting distribution, then under very weak conditions the conditional distribution of $a_l(\widehat{\theta}^{\ast}- \widehat{\theta}_n)$ has the same limiting distribution. Higher order considerations show that subsampling has a very poor rate of convergence, see [35]. It does not even achieve the rate of convergence of a normal approximation. It may be argued that this poor performance is the price for its quite universal applicability. Subsampling has also been used in i.i.d. settings where the classical bootstrap does not work. For a detailed discussion of subsampling see [72].
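The subsampling recipe can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `subsampling_distribution` and its arguments are hypothetical, the normalizing sequence $a_n$ is passed in as a function `rate`, and the number of random subsamples `n_sub` is a free tuning choice that the text does not prescribe.

```python
import numpy as np

def subsampling_distribution(x, statistic, l, rate, n_sub=500, rng=None):
    """Approximate the law of a_n * (theta_hat_n - theta) by subsampling.

    x         : 1-d array, the observed time series
    statistic : function mapping a 1-d array to a scalar estimate
    l         : subsample length (l < n)
    rate      : function giving the normalizing sequence a_n, e.g. np.sqrt
    n_sub     : number of randomly drawn subsamples (illustrative default)
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    n = len(x)
    theta_hat = statistic(x)
    # Random start indices of subsamples of l consecutive observations.
    starts = rng.integers(0, n - l + 1, size=n_sub)
    theta_star = np.array([statistic(x[s:s + l]) for s in starts])
    # Subsampling analogue a_l * (theta_star - theta_hat).
    return rate(l) * (theta_star - theta_hat)
```

For the sample mean one might call `subsampling_distribution(x, np.mean, l=20, rate=np.sqrt)` and read off empirical quantiles of the returned values.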

2.4.2 The Block Bootstrap

The basic idea of the block bootstrap is closely related to the i.i.d. nonparametric bootstrap. Both procedures are based on drawing observations with replacement. In the block bootstrap, however, blocks of consecutive observations are drawn instead of single observations. This is done to capture the dependence structure of neighboring observations. Different versions of this idea have been proposed in [31,13,45,54] and [71]. It has been shown that this approach works for a large class of stationary processes. The blocks of consecutive observations are drawn with replacement from a set of blocks. In the first proposal this was done for a set of nonoverlapping blocks of fixed length $l$ : $\{X_j: j=1,\ldots,l\}$ , $\{X_{l+j}: j=1,\ldots,l\},\ldots$ Later papers proposed to use all (also overlapping) blocks of length $l$ , i.e. the $k$ -th block consists of the observations $\{X_{k-1+j}: j=1,\ldots,l\}$ (moving block bootstrap). The bootstrap resample is obtained by sampling blocks randomly with replacement and joining them into a time series of length $n$ . By construction, the bootstrap time series has a nonstationary (conditional) distribution. The resample becomes stationary if the block length is random and generated from a geometric distribution. This version of the block bootstrap is called the stationary bootstrap and was introduced in [71]. Recently, [65,67] proposed another modification that uses tapering methods to smooth the effects of boundaries between neighboring blocks. With respect to higher order properties the moving block bootstrap outperforms the version with nonoverlapping blocks, and both achieve higher order accuracy than the stationary bootstrap (see [34,46,47,49]).

The block bootstrap has turned out to be a very powerful method for dependent data. It does not achieve the accuracy of the bootstrap for i.i.d. data but it outperforms subsampling. It works reasonably well under very weak conditions on the dependency structure and has been applied in a very broad range of settings. For the block bootstrap no specific assumption is made on the structure of the data generating process.
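A minimal Python sketch of two of the block schemes described above, the moving block bootstrap and the stationary bootstrap, may help to fix ideas. The function names are hypothetical. Two details are assumptions of this sketch rather than statements of the text: the last block is truncated so that the resample has exactly the length of the original series, and the stationary bootstrap wraps blocks around the end of the series (circular indexing).

```python
import numpy as np

def moving_block_bootstrap(x, l, rng=None):
    """One moving block bootstrap resample of the same length as x."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x)
    n = len(x)
    n_blocks = -(-n // l)  # ceil(n / l)
    # Start indices of overlapping blocks (X_k, ..., X_{k+l-1}).
    starts = rng.integers(0, n - l + 1, size=n_blocks)
    idx = (starts[:, None] + np.arange(l)[None, :]).ravel()[:n]
    return x[idx]

def stationary_bootstrap(x, mean_block_length, rng=None):
    """One stationary bootstrap resample with geometric block lengths."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x)
    n = len(x)
    p = 1.0 / mean_block_length  # success probability of the geometric law
    idx = np.empty(n, dtype=int)
    t = 0
    while t < n:
        start = rng.integers(0, n)
        length = min(int(rng.geometric(p)), n - t)  # geometric on 1, 2, ...
        # Assumption: blocks wrap around the end of the series.
        idx[t:t + length] = (start + np.arange(length)) % n
        t += length
    return x[idx]
```

Repeating either call many times and recomputing the statistic of interest on each resample gives the bootstrap distribution; the block length `l` (or `mean_block_length`) has to be chosen by the user.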

We now describe some methods that use more specific assumptions on the dependency structure.

2.4.3 The Sieve Bootstrap

The i.i.d. resampling can also be applied to models of dependent data where the stochastics is driven by i.i.d. innovations. The distribution of the innovations can be estimated by using fitted residuals. In the resampling i.i.d. innovations can be generated by i.i.d. resampling from this fitted distribution. An example is an autoregressive linear model:

$\displaystyle X_t - \mu_X = \sum_{j=1}^p \rho_j \left(X_{t-j} - \mu_X\right) + \varepsilon_t{},\quad t \in {\mathbb{Z}}$

(2.3)

where $\mu_X=E(X_t)$ is the observation mean and where $\{\varepsilon_t\}$ is a sequence of i.i.d. innovations with $E(\varepsilon_t)=0$ and $\varepsilon_t$ is independent of $\{X_s, s < t\}$ . The parameters $\rho_1, \ldots, \rho_p$ can be estimated by least squares or by using Yule-Walker equations. Residuals can be fitted by putting

$\displaystyle \tilde{\varepsilon}_t = X_t -\widehat{\mu}_X - \sum_{j=1}^p \widehat{\rho}_j \left(X_{t-j} -\widehat{\mu}_X\right){},$

where $\widehat{\mu}_X =n^{-1}\sum_{t=1}^n X_t$ and $\widehat{\rho}_1, \ldots, \widehat{\rho}_p$ are the fitted parameters. Bootstrap resamples can be generated by

$\displaystyle X_t^{\ast} - \widehat{\mu}_X = \sum_{j=1}^{{p}} \widehat{\rho}_j \left(X_{t-j}^{\ast} - \widehat{\mu}_X\right) + \varepsilon_t^{\ast}$

(2.4)

where $\varepsilon_t^{\ast}$ are drawn with replacement from the estimated centered residuals $\widehat{\varepsilon}_t= \tilde{\varepsilon}_t - n^{-1} \sum_{i=1}^n \tilde{\varepsilon}_i$ . For a study of this bootstrap procedure in model (2.3), see e.g. [24] and references cited therein.
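The residual bootstrap (2.4) can be sketched in Python as follows. The function names are hypothetical, the parameters are fitted by least squares, and two details are assumptions of this sketch: the recursion is started from zeros and a burn-in stretch (`burn_in`) is discarded, and the residuals are centered over the $n-p$ time points for which they can be computed. For the sieve bootstrap one would call the same routine with an order `p` that grows with the sample size, for example chosen by an information criterion; that selection step is not shown.

```python
import numpy as np

def fit_ar_least_squares(x, p):
    """Least squares fit of model (2.3).

    Returns (mu_hat, rho_hat, centered fitted residuals).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    mu_hat = x.mean()
    z = x - mu_hat
    # Row for time t holds (z[t-1], ..., z[t-p]) for t = p, ..., n-1.
    lagged = np.column_stack([z[p - j:n - j] for j in range(1, p + 1)])
    rho_hat, *_ = np.linalg.lstsq(lagged, z[p:], rcond=None)
    resid = z[p:] - lagged @ rho_hat
    return mu_hat, rho_hat, resid - resid.mean()

def ar_residual_bootstrap(x, p, burn_in=100, rng=None):
    """One bootstrap series generated according to (2.4)."""
    rng = np.random.default_rng(rng)
    n = len(x)
    mu_hat, rho_hat, resid = fit_ar_least_squares(x, p)
    # i.i.d. resampling from the centered fitted residuals.
    eps_star = rng.choice(resid, size=n + burn_in, replace=True)
    z_star = np.zeros(n + burn_in + p)  # assumption: zero start values
    for t in range(p, n + burn_in + p):
        # z_star[t-p:t][::-1] is (z*_{t-1}, ..., z*_{t-p}).
        z_star[t] = rho_hat @ z_star[t - p:t][::-1] + eps_star[t - p]
    return mu_hat + z_star[-n:]
```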

In a series of papers this approach has been studied for the case that model (2.3) only approximately holds. This is the case if the underlying time series is a stationary linear process, i.e. $\{X_t\}$ has an infinite order autoregressive representation:

$\displaystyle X_t - \mu_X = \sum_{j=1}^{\infty} \rho_j \left(X_{t-j} - \mu_X\right) + \varepsilon_t{},\quad t \in {\mathbb{Z}}{}.$

(2.5)

The bootstrap scheme (2.4) has been proposed for this AR( $\infty$ ) model. In a first step a model (2.3) of finite order $p$ is fitted to the time series. Bootstrap resamples are generated as in (2.4) according to model (2.3). This resampling scheme has been called the sieve bootstrap because the AR( $\infty$ ) model (2.5) is approximated by an AR( $p$ ) model, where, in the asymptotics, $p$ converges to infinity for increasing sample size $n$ . It is argued that this asymptotic approach reflects practical uses of AR models where the order $p$ is selected data adaptively and one is only thinking of the finite order AR model as an approximation to the truth. The sieve bootstrap and its asymptotic consistency were first considered in [42,43] and further analyzed in [10,11,5,63,69,15], where it was shown that under appropriate conditions the sieve bootstrap achieves nearly the rates of convergence of the i.i.d. resampling. In particular, it usually outperforms the block bootstrap. [10] studied higher order performance of sieve bootstrap variance estimates for the sample mean under assumptions on the decay of the coefficients $\rho_j \le c j^{-v}$ for constants $c$ and $v$ .

2.4.4 The Nonparametric Autoregressive Bootstrap

Another residual based bootstrap scheme has been proposed for a nonparametric autoregression model:

$\displaystyle X_t = m(X_{t-1}, \ldots, X_{t-p})+\sigma(X_{t-1}, \ldots, X_{t-q})\varepsilon_t\quad t=1,2 \ldots$

(2.6)

where $\{\varepsilon_t\}$ is a sequence of i.i.d. error variables with zero mean and unit variance and where $m$ and $\sigma$ are unknown smooth functions. The functions $m$ and $\sigma$ can be estimated by nonparametric smoothing estimates $\widehat{m}$ and $\widehat{\sigma}$ . These estimates can be used to fit residuals. In the nonparametric autoregressive bootstrap resamples are generated according to

$\displaystyle X_t^{\ast}= \tilde{m} \left(X_{t-1}^{\ast}, \ldots, X_{t-p}^{\ast}\right) + \tilde{\sigma} \left(X_{t-1}^{\ast}, \ldots, X_{t-q}^{\ast}\right)\varepsilon_t^{\ast}{},\quad t=1,2, \ldots$

where $\tilde{m}$ and $\tilde{\sigma}$ are nonparametric smoothing estimates and where $\varepsilon_t^{\ast}$ are drawn with replacement from the centered fitted residuals. The choice of the bootstrap autoregression function $\tilde{m}$ and of the bootstrap volatility function $\tilde{\sigma}^2$ is rather delicate because inappropriate choices can lead to explosive dynamics for the bootstrap time series. The nonparametric autoregressive bootstrap was discussed in [25], where conditions are given under which this bootstrap approach is consistent. [26] used this bootstrap approach for the construction of uniform confidence bands for the regression function $m$ .

2.4.5 The Regression-type Bootstrap, the Wild Bootstrap and the Local Bootstrap

[25] also consider two other bootstrap procedures for the model (2.6): the regression bootstrap and the wild bootstrap. In the regression bootstrap, a nonparametric regression model is generated with (conditionally) fixed design. We describe this approach for the case of a homoscedastic autoregression model:

$\displaystyle X_t = m(X_{t-1}, \ldots, X_{t-p})+\varepsilon_t \quad t=1,2, \ldots$

(2.7)

where again $\{\varepsilon_t\}$ is a sequence of i.i.d. error variables with zero mean and $m$ is an unknown smooth autoregression function. Bootstrap error variables $\varepsilon_t^{\ast}$ can be generated by drawing with replacement from centered fitted residuals in model (2.7). In contrast to the autoregression bootstrap the resamples are now generated in a regression model

$\displaystyle X^{\ast}_t = \tilde{m}(X_{t-1},\ldots, X_{t-p})+\varepsilon_t^{\ast} \quad t=1,2,\ldots{},$

(2.8)

where $\tilde{m}$ is again a nonparametric smoothing estimate of $m$ . The stochastic behavior of the autoregression estimates in model (2.7) is fitted by the bootstrap regression estimates in (2.8). Thus regression of $X_t$ onto $(X_{t-1}, \ldots, X_{t-p})$ is mimicked in the bootstrap by regression of $X_t^{\ast}$ onto the same covariable $(X_{t-1}, \ldots, X_{t-p})$ . The regression bootstrap differs in principle from the autoregressive bootstrap because no autoregressive scheme is generated in the resampling. Because the original time series is used as covariables in a regression problem, the regression bootstrap has the advantage that there is no danger of the bootstrap process becoming unstable or exploding. Thus the choice of the bootstrap error distribution and of the estimate $\tilde{m}$ is not as crucial as for the autoregression bootstrap. On the other hand the randomness of the covariables is not mimicked in the resampling. This leads to a poorer finite sample performance, see [25].

Modifications of the regression bootstrap are the local bootstrap ([64]) and the wild bootstrap. The wild bootstrap also uses a regression model with (conditionally) fixed covariables, but it is designed to work also for heteroscedastic errors. It was first proposed for regression models with independent but not identically distributed error variables, see [78,1]. For nonparametric models it was first proposed in [29]. In the nonparametric autoregression model (2.7) wild bootstrap resamples are generated as in (2.8). But now the error variables $\varepsilon_t^{\ast}$ are generated as $\varepsilon_t^{\ast}= \widehat{\varepsilon}_t \eta_t$ where $\widehat{\varepsilon}_t$ are centered fitted residuals and where $\eta_1,\ldots,\eta_n$ are (conditionally) i.i.d. variables with conditional zero mean and conditional unit variance (given the original sample). To achieve higher order accuracy it has also been proposed to use $\eta_t$ with conditional third moment equal to $1$ . One could argue that in this resampling scheme the distribution of $\varepsilon_t$ is fitted by the conditional distribution of $\eta_t$ . Then $n$ different distributions are fitted in a model where only $n$ observations are available. This is the reason why in [29] this approach was called wild bootstrap. For a more detailed discussion of the wild bootstrap, see [52,53,56,57,58]. The asymptotic analysis of the wild bootstrap and other regression type bootstrap methods in model (2.7) is much simpler than that of the autoregression bootstrap. In the bootstrap world it only requires mathematical analysis of a nonparametric regression model. Only the discussion of uniform nonparametric confidence bands remains rather complicated because it involves strong approximations of the bootstrap nonparametric regression estimates by Gaussian processes, see [62].

The wild bootstrap works under quite weak model assumptions. Essentially it is only assumed that the conditional expectation of an observation given the past is a smooth function of the last $p$ observations (for some finite $p$ ). Generality has its price. Resampling schemes that use more detailed modeling may achieve a better accuracy. We now consider resampling under the stronger assumption that not only the mean but also the whole conditional distribution of an observation smoothly depends on the last $p$ observations (for some finite $p$ ). Resampling schemes that work under this smooth Markov assumption are the Markov bootstrap schemes.
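The wild bootstrap in model (2.7) can be illustrated by the following Python sketch for order $p=1$. Several choices here are assumptions made for illustration only: the function names are hypothetical, $\widehat{m}$ and $\tilde{m}$ are Nadaraya-Watson estimates with a Gaussian kernel and two separate bandwidths `h` and `g`, and the multipliers $\eta_t$ follow a two-point distribution with mean zero, unit variance and third moment one.

```python
import numpy as np

def nadaraya_watson(x_design, y, x_eval, h):
    """Nadaraya-Watson regression estimate with Gaussian kernel."""
    u = (x_eval[:, None] - x_design[None, :]) / h
    w = np.exp(-0.5 * u ** 2)
    return (w @ y) / w.sum(axis=1)

def two_point_multipliers(size, rng):
    """Two-point variables with mean 0, variance 1 and third moment 1."""
    s5 = np.sqrt(5.0)
    a, b = -(s5 - 1) / 2, (s5 + 1) / 2
    prob_a = (s5 + 1) / (2 * s5)
    return np.where(rng.random(size) < prob_a, a, b)

def wild_bootstrap_resample(x, h, g, rng=None):
    """One wild bootstrap resample as in (2.8), for order p = 1.

    Returns the (fixed) design X_{t-1} and the bootstrap responses X_t^*.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    design, response = x[:-1], x[1:]
    m_hat = nadaraya_watson(design, response, design, h)    # fits residuals
    m_tilde = nadaraya_watson(design, response, design, g)  # bootstrap mean
    resid = response - m_hat
    resid = resid - resid.mean()                            # centering
    eta = two_point_multipliers(len(resid), rng)
    # The covariables stay the original observations (fixed design).
    return design, m_tilde + resid * eta
```

Replacing `resid * eta` by values drawn with replacement from `resid` would give the regression bootstrap for the homoscedastic model instead.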

2.4.6 The Markov Bootstrap

We discuss the Markov bootstrap for a Markov model of order $1$ . We will describe two implementations of the Markov bootstrap. For both implementations one has to assume that the conditional distribution of $X_{t+1}$ given $X_1,\ldots,X_t$ smoothly depends on $X_t$ . The first version was introduced by [73]. It is based on a nonparametric estimate of the transition density $f(y\vert x)$ of $X_{t+1}=y$ given $X_t=x$ . Using kernel density estimates of the density of $X_t$ and of the joint density of $(X_t,X_{t+1})$ one can estimate $f(y\vert x)$ by

$\displaystyle \widehat{f}(y\vert x) = \frac{\widehat{f}(x,y)}{\widehat{f}(x)}{},$

where

$\displaystyle \widehat{f}(x,y) = \frac{1}{n-1}\sum_{t=1}^{n-1} K_h(X_t-x) K_g(X_{t+1}-y){},$
$\displaystyle \widehat{f}(x) = \frac{1}{n}\sum_{t=1}^{n} K_h(X_t-x)$

are kernel density estimates with kernel functions $K_r(u)= r^{-1}K\left(r^{-1}u\right)$ for bandwidths $r=h,g$ . In the bootstrap resampling one starts with an observation $X_1^{\ast}$ from the density $\widehat{f}(\cdot)$ and then one iteratively generates $X_{t+1}^{\ast}$ by sampling from $\widehat{f}(\cdot\vert X_{t}^{\ast})$ . Higher order performance of this resampling scheme has been discussed in [39]. It turns out that it achieves faster rates of convergence compared with the block bootstrap. This is in accordance with intuition because the Markov bootstrap requires an additional model assumption, namely the Markov property.

The second version of the Markov bootstrap can be described as a limiting version of the first for $g \to 0$ . In the limiting case the bootstrap process takes values only in the set of observations $\{X_1,\ldots,X_n\}$ . Given $X_t^{\ast}=x$ , the next observation $X_{t+1}^{\ast}$ is equal to $X_s$ $(2\leq s \leq n)$ with probability $K_h(X_{s-1}-x) / \sum_{r=1}^{n-1} K_h(X_{r}-x)$ . This resampling scheme was introduced in [66,68]. Higher order properties are not yet known. It may be expected that it has asymptotic properties similar to those of the smoothed version of the Markov bootstrap. The unsmoothed version has the advantage that the bootstrap time series is forced to live on the observed values of the original time series. This leads to more stable dynamics of the bootstrap time series, in particular for smaller sample sizes. Furthermore, for higher dimensional Markov processes the unsmoothed version is based on only $d$ -dimensional kernel density smoothing whereas the smoothed bootstrap requires $2d$ -dimensional kernel smoothing. Here, $d$ denotes the dimension of the Markov process. Again, one can argue that this leads to a more stable finite sample performance of the unsmoothed bootstrap. On the other hand, the smoothed Markov bootstrap takes advantage of smoothness of $f(y\vert x)$ with respect to $y$ . For larger data sets this may lead to improvements in the case of smooth transition densities.
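The unsmoothed version can be sketched in Python for a scalar series. The function name is hypothetical, the Gaussian kernel is an illustrative choice, and drawing the start value uniformly from the observations is an assumption of this sketch (the text only specifies the transition step).

```python
import numpy as np

def local_markov_bootstrap(x, h, rng=None):
    """One resample of the unsmoothed Markov bootstrap (order 1, scalar)."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    n = len(x)
    x_star = np.empty(n)
    # Assumption: start at a uniformly chosen observed value.
    x_star[0] = x[rng.integers(0, n)]
    for t in range(n - 1):
        # Log kernel weights K_h(X_{s-1} - x) over predecessors X_1..X_{n-1};
        # subtracting the maximum avoids an all-zero weight vector.
        logw = -0.5 * ((x[:-1] - x_star[t]) / h) ** 2
        w = np.exp(logw - logw.max())
        s = rng.choice(n - 1, p=w / w.sum())
        x_star[t + 1] = x[s + 1]  # successor of the selected predecessor
    return x_star
```

By construction every value of the resample is one of the original observations, which is the stability property mentioned above.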

2.4.7 The Frequency Domain Bootstrap

For the periodogram $I_X(\omega) = \frac{1}{2 \pi n} \left \vert \sum_{t=1}^n X_t \exp (-i \omega t) \right \vert^2$ it is known that its values at the frequencies $\omega_j= 2 \pi j /n$ , $0 < j < n/2$ , are asymptotically independent. For the first two moments one gets that for $0< j, k < n/2, j \not = k$

$\displaystyle \mathrm{E} [I_X(\omega_j) ]= f(\omega_j) + o(n^{-1/2}){},$

(2.9)

$\displaystyle \mathrm{Var} [I_X(\omega_j)] = f(\omega_j)^2 + o(1){},$

(2.10)

$\displaystyle \mathrm{Cov} [I_X(\omega_j), I_X(\omega_k)] = n^{-1} f(\omega_j) f(\omega_k) \left [ \frac{\mathrm{E} \left[\varepsilon_j^4\right]}{\mathrm{E} \left[\varepsilon_j^2\right]^2} - 3 \right ] + o(n^{-1}){},$

(2.11)

where $f(\omega) = (2\pi)^{-1} \sum_{k=-\infty}^{\infty}\mathrm{Cov}(X_t,X_{t+k}) \exp(- {\text{i}} k\omega)$ is the spectral density of the time series $\{X_t\}$ and where $\varepsilon_j = X_j - \mathrm{E} [X_j\vert X_t:t\leq j-1]$ are the innovations of the time series. These expansions hold under some regularity conditions on $\{X_t\}$ . In particular, it is needed that $\{X_t\}$ is a linear process. Thus approximately, we get that $\eta_j = I_X(\omega_j)/ f(\omega_j)$ , $0 < j < n/2$ , is an i.i.d. sequence. This suggests the following bootstrap scheme, called the frequency domain bootstrap or the periodogram bootstrap.

In this resampling scheme bootstrap values $I_X^{\ast}(\omega_j)$ of the periodogram $I_X(\omega_j)$ are generated. The resampling uses two estimates $\widehat{f}$ and $\tilde{f}$ of the spectral density. In some implementations these estimates can be chosen identically. The first estimate is used for fitting residuals $\widehat{\eta}_j = I_X(\omega_j)/ \widehat{f}(\omega_j)$ . The bootstrap residuals $\eta_1^{\ast},\ldots$ are drawn with replacement from the centered fitted residuals $\widehat{\eta}_j / \widehat{\eta}_{\cdot}$ where $\widehat{\eta}_{\cdot}$ is the average of $\widehat{\eta}_j$ over $j$ . The bootstrap periodogram is then calculated by putting $I_X^{\ast}(\omega_j) = \tilde{f}(\omega_j) \eta_j^{\ast}$ .
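The steps of the frequency domain bootstrap can be sketched in Python as follows. The function names are hypothetical, and the spectral density estimates $\widehat{f}$ and $\tilde{f}$ are taken here, purely for illustration, to be moving averages of the periodogram over `2 * span + 1` neighboring Fourier frequencies with reflection at the boundaries; any other smoothing estimate could be used in their place.

```python
import numpy as np

def periodogram(x):
    """Periodogram I_X(omega_j) at omega_j = 2*pi*j/n for 0 < j < n/2."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    dft = np.fft.fft(x)
    j = np.arange(1, (n - 1) // 2 + 1)
    # The FFT sums over t = 0..n-1; the modulus equals that of t = 1..n.
    return 2 * np.pi * j / n, np.abs(dft[j]) ** 2 / (2 * np.pi * n)

def moving_average(values, span):
    """Average over 2*span+1 neighbors (illustrative smoother)."""
    kernel = np.ones(2 * span + 1) / (2 * span + 1)
    padded = np.pad(values, span, mode="reflect")
    return np.convolve(padded, kernel, mode="valid")

def frequency_domain_bootstrap(x, span_hat=5, span_tilde=10, rng=None):
    """One bootstrap periodogram I_X^*(omega_j) = f_tilde(omega_j) * eta_j^*."""
    rng = np.random.default_rng(rng)
    omega, pgram = periodogram(x)
    f_hat = moving_average(pgram, span_hat)      # used to fit residuals
    f_tilde = moving_average(pgram, span_tilde)  # used to rebuild I^*
    eta_hat = pgram / f_hat
    eta_hat = eta_hat / eta_hat.mean()           # rescale to mean one
    eta_star = rng.choice(eta_hat, size=len(pgram), replace=True)
    return omega, f_tilde * eta_star
```

A statistic of the form $\sum_j w_j I_X(\omega_j)$ is then recomputed on each bootstrap periodogram.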

The frequency domain bootstrap can be used to estimate the distribution of statistics $n^{-1/2} \sum_{0< j < n/2} w_j I_X(\omega_j)$ . Then the distribution of $n^{-1/2} \sum_{0< j < n/2} [w_j I_X(\omega_j) - w_j f(\omega_j) ]$ is estimated by the conditional distribution of $n^{-1/2} \sum_{0< j < n/2} [w_j I_X^{\ast}(\omega_j) - w_j \tilde{f}(\omega_j) ]$ . Unfortunately, in general this approach does not work. This can be easily seen by a comparison of the asymptotic variances of the statistics. The original statistic $n^{-1/2} \sum_{0< j < n/2} w_j I_X(\omega_j)$ has variance that is asymptotically equivalent to

$\displaystyle n^{-1} \sum w_j^2 f(\omega_j)^2 + \left [ \frac{\mathrm{E} \left[\varepsilon_j^4\right]}{\mathrm{E} \left[\varepsilon_j^2\right]^2}- 3 \right ]\left [ n^{-1} \sum w_j f(\omega_j)\right]^2{},$

see (2.9)-(2.11). In the bootstrap world the variance is approximately

$\displaystyle n^{-1} \sum w_j^2 \tilde{f}(\omega_j)^2{}.$

Thus in general there are differences between the variances that do not vanish asymptotically. The reason is that the term on the right hand side of (2.11) contributes an additional term to the variance for the original time series. This term does not appear in the bootstrap because an i.i.d. resampling is used that produces conditionally uncorrelated $I_X^{\ast}(\omega_j)$ .
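The origin of the additional term can be made explicit by a short calculation, sketched here under the expansions (2.9)-(2.11) and for bounded weights $w_j$ . The symbol $\kappa_4$ is shorthand introduced only for this sketch: $\kappa_4 = \mathrm{E}[\varepsilon_j^4]/\mathrm{E}[\varepsilon_j^2]^2 - 3$ . Splitting the variance of the sum into diagonal and off-diagonal terms gives

$\displaystyle \mathrm{Var} \Big[ n^{-1/2} \sum_j w_j I_X(\omega_j) \Big] = n^{-1} \sum_j w_j^2\, \mathrm{Var} [I_X(\omega_j)] + n^{-1} \sum_{j \not= k} w_j w_k\, \mathrm{Cov} [I_X(\omega_j), I_X(\omega_k)]$

$\displaystyle \approx n^{-1} \sum_j w_j^2 f(\omega_j)^2 + \kappa_4\, n^{-2} \sum_{j \not= k} w_j w_k f(\omega_j) f(\omega_k) \approx n^{-1} \sum_j w_j^2 f(\omega_j)^2 + \kappa_4 \Big[ n^{-1} \sum_j w_j f(\omega_j) \Big]^2{}.$

In the last step the missing diagonal terms $\kappa_4 n^{-2} \sum_j w_j^2 f(\omega_j)^2$ are of order $n^{-1}$ and can be neglected. Each covariance is only of order $n^{-1}$ , but there are of order $n^2$ pairs $(j,k)$ , so their sum contributes at the same order as the diagonal part.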

Although the frequency domain bootstrap does not work in general, there exist three important examples where it works. In all three examples the second term in the asymptotic expansion of the variance vanishes. This happens e.g. if the kurtosis of the innovations is equal to zero:

$\displaystyle \frac{\mathrm{E} \left[\varepsilon_j^4\right]}{\mathrm{E} \left[ \varepsilon_j^2\right]^2}- 3 = 0{}.$

In particular, this is the case if the innovations have a normal distribution. Another more general example where the bootstrap works is given by statistics where it holds that $n^{-1} \sum w_j f(\omega_j)= o(1)$ . A large class of examples for this case are ratio statistics

$\displaystyle n^{1/2} \frac{\sum_{0< j < n/2} r_j I_X(\omega_j)}{\sum_{0< j < n/2} I_X(\omega_j)}{}.$

By some Taylor expansion calculus one can see that

$\displaystyle n^{1/2} \left [\frac{\sum_{0< j < n/2} r_j I_X(\omega_j)} {\sum_{0< j < n/2} I_X(\omega_j)} - \frac{\sum_{0< j < n/2} r_j f(\omega_j)}{\sum_{0< j < n/2} f(\omega_j)}\right ]$
$\displaystyle \approx n^{-1/2} \sum_{0< j < n/2} \left[w_j I_X(\omega_j) - w_j f(\omega_j) \right]$

with $w_j$ proportional to $r_j-\sum_k r_k\, f (\omega_k)\big/\sum_k f(\omega_k)$ . Then $\sum_j w_j f(\omega_j) = 0$ and the bootstrap consistently estimates the variance of the ratio statistic. Consistency of the frequency domain bootstrap for ratio statistics has been shown in [16]. They also showed that the frequency domain bootstrap achieves higher order accuracy. But for this it is necessary that the third moment of the innovations vanishes. This is a rather restrictive assumption. Examples of ratio statistics are autocorrelation estimates, see [16] where other examples are also given. Modifications of the frequency domain bootstrap have been proposed that work for a larger class of statistics. An example is the proposal of [44] where ideas of the frequency domain bootstrap are combined with ideas of the sieve bootstrap.

There is also another example where the frequency domain bootstrap works. Nonparametric smoothing estimates of the spectral density are linear statistics where the weights are now local. For example, for kernel smoothing weights $w_j= h^{-1} K[(\omega_j-x)/h]$ with bandwidth $h$ and kernel function $K$ one has $n^{-1} \sum_j w_j^2 f(\omega_j)^2 = O(h^{-1})$ . On the other hand, $n^{-1} \sum_j w_j f(\omega_j) = O(1)$ is of lower order. Now, both the variance of the original spectral density estimate and the variance of the bootstrap spectral density estimate are, up to terms of lower order, equal to the same quantity $(2\pi)^2 n^{-1} \sum w_j^2 f(\omega_j)^2$ . The correlation between $I_X(\omega_j)$ and $I_X(\omega_k)$ for $j \not = k$ (see (2.11)) only contributes to higher order terms. [23] first observed this relation and used this fact to show that the frequency domain bootstrap works for nonparametric spectral density estimation. In their approach, both $\widehat{f}$ and $\tilde{f}$ are nonparametric kernel smoothing estimates. For $\tilde{f}$ a bandwidth has been chosen that is of larger order than the bandwidth $h$ . Then the bootstrap consistently estimates the bias of the spectral density estimate. Similar approaches have been used in bootstrap schemes for other settings of nonparametric curve estimation, see [59]. For the frequency domain bootstrap for parametric problems one can choose $\widehat{f} = \tilde{f}$ , see [16].

We have now discussed a large class of resampling schemes for dependent data. They are designed for different assumptions on the dependency structure, ranging from quite general stationarity assumptions (subsampling), mixing conditions (block bootstrap), linearity assumptions (sieve bootstrap, frequency domain bootstrap) and a conditional mean Markov property (wild bootstrap) to Markov properties (Markov bootstrap) and autoregressive structure (autoregressive bootstrap). It may be generally conjectured that resampling schemes for more restrictive models are more accurate as long as these more restrictive assumptions really apply. These conjectures are supported by asymptotic results based on higher order Edgeworth expansions. (These results should be interpreted with care, however, because of the poor performance of higher order Edgeworth expansions for finite samples, see also the discussion in the introduction.) The situation is also complicated by the fact that in time series analysis models are typically used as approximations to the truth and are not interpreted as true models. Thus one has to study the much more difficult problem of how resampling schemes perform if the underlying assumptions are only approximately fulfilled.

Resampling for dependent data has stimulated very creative ideas and discussions and it has led to a large range of different approaches. In part, the resampling structure is quite different from the stochastic structure of the original time series. In the regression bootstrap regression data are used instead of autoregression series. In the sieve bootstrap and in the frequency domain bootstrap models are used that only approximate the original model.

For dependent data the bootstrap has broadened the field of possible statistical applications. The bootstrap offered new ways of implementing statistical procedures and made it possible to treat new types of applied problems by statistical inference.

The discussion of the bootstrap for dependent data is not yet finished. For the comparison of the proposed resampling schemes a complete understanding is still missing and theoretical research is still going on. Applications of time series analysis will also require new approaches. Examples are unit root tests, cointegration analysis and the modeling of financial time series.
