15.1 Intervals of homogeneity

An adaptive estimation algorithm for time series is presented in this chapter. The basic idea is the following: given a time series and a linear model, we select on-line the largest sample of the most recent observations for which the model is not rejected. Assume, for example, that the data can be well fitted by a regression, an autoregression or even by a constant on some unknown interval. The main problem is then to detect the time interval where the model approximately holds. We call such an interval an interval of time homogeneity.

This approach appears to be well suited to financial econometrics, where an on-line analysis of large data sets, such as in backtesting, has to be performed. In this setting, as soon as a new observation becomes available, the model is checked, the sample size is optimally adapted and a revised forecast is produced.

In the remainder of the chapter we briefly present the theoretical foundations of the proposed algorithm, which are due to Liptser and Spokoiny (1999), and we describe its implementation. Then we provide two applications to financial data: in the first one we estimate the possibly time-varying coefficients of an exchange rate basket, while in the second one the volatility of an exchange rate time series is fitted to a locally constant model. The main references are Härdle et al. (2001), Mercurio and Spokoiny (2000), Härdle et al. (2000) and Mercurio and Torricelli (2001).

Let us consider the following linear regression equation:

$\displaystyle Y_t = X_t^{\top}\theta + \sigma\varepsilon_t,\quad t =1,\ldots,T$ (15.1)

where $ Y_t$ is real valued, $ X_t = (X_{1,t}\ldots X_{p,t})^\top$ and $ \theta = (\theta_1\ldots\theta_p)^\top$ are $ \mathbb{R}^p$ valued and $ \varepsilon _t$ is a standard normally distributed random variable. If the matrix $ \sum_{t=1}^T X_t X_t^{\top}$ is nonsingular with inverse $ W$, then the least squares estimator of $ \theta $ is:

$\displaystyle \widehat\theta = W \sum_{t=1}^T X_tY_t.$ (15.2)
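For concreteness, (15.2) translates directly into a few lines of code. The following minimal numpy sketch (function and variable names are ours) returns both $ \widehat\theta$ and $ W$, whose diagonal elements $ w_{kk}$ appear in the bound below:

```python
import numpy as np

def ols(X, Y):
    """Least squares estimator (15.2): theta_hat = W sum_t X_t Y_t,
    where W is the inverse of sum_t X_t X_t^T and X has shape (T, p)."""
    W = np.linalg.inv(X.T @ X)   # assumes sum_t X_t X_t^T is nonsingular
    return W @ (X.T @ Y), W      # diag(W) gives the w_kk used in (15.3)
```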

Define $ w_{kk}$ as the $ k$-th diagonal element of $ W$ and let $ \lambda$ be a positive scalar. For nonrandom regressors, the following exponential probability bound is easy to prove:

$\displaystyle \textrm{P}(\vert\widehat\theta_k - \theta_k\vert > \lambda\sigma \sqrt{w_{kk}}) \leq 2e^{-\frac{\lambda^2}{2}}, \quad k = 1,\ldots,p.$ (15.3)

Indeed, the estimation error $ \widehat\theta_k - \theta_k$ is $ \textrm{N}(0, \sigma^2 w_{kk})$ distributed, so that $ (\widehat\theta_k - \theta_k)/(\sigma\sqrt{w_{kk}})$ is standard normal and, by the normal moment generating function, $ \textrm{E}\exp\{\lambda(\widehat\theta_k - \theta_k)/(\sigma\sqrt{w_{kk}})\} = e^{\lambda^2/2}$. Therefore:

$\displaystyle 1 = \textrm{E}\exp\left(\frac{\lambda(\widehat\theta_k - \theta_k)}{\sigma \sqrt{w_{kk}}} - \frac{\lambda^2}{2} \right) \geq \textrm{E}\exp\left(\frac{\lambda(\widehat\theta_k - \theta_k)}{\sigma \sqrt{w_{kk}}} - \frac{\lambda^2}{2} \right) \boldsymbol{1}( \widehat\theta_k - \theta_k > \lambda\sigma \sqrt{w_{kk}}) \geq \exp\left( \frac{\lambda^2}{2} \right) \textrm{P}( \widehat\theta_k - \theta_k > \lambda\sigma \sqrt{w_{kk}} ).$

The result in (15.3) follows from the symmetry of the normal distribution. Equation (15.3) has been generalized by Liptser and Spokoiny (1999) to the case of random regressors. More precisely, they only require the $ X_t$ to be conditionally independent of $ \varepsilon _t$, and they allow lagged values of $ Y_t$ among the regressors. In this case the bound reads roughly as follows:

$\displaystyle \textrm{P}(\vert\widehat\theta_k - \theta_k\vert > \lambda\sigma \sqrt{w_{kk}};\, W$ is nonsingular $\displaystyle ) \leq \mathcal{P}(\lambda) e^{-\frac{\lambda^2}{2}}.$ (15.4)

where $ \mathcal{P}(\lambda)$ is a polynomial in $ \lambda$. It must be noticed that (15.4) is not as sharp as (15.3). Furthermore, because of the randomness of $ W$, (15.4) holds only on the set where $ W$ is nonsingular; nevertheless, this set has a large probability in many cases, for example when $ Y_t$ follows an ergodic autoregressive process and the number of observations is at least moderately large. More technical details are given in Section 15.4.
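To get a feeling for the Gaussian bound (15.3), it can be checked by a small Monte Carlo experiment. The sketch below (our own illustration, with an arbitrary fixed design) compares the empirical exceedance frequency for $ \lambda = 2$ with the bound $ 2e^{-\lambda^2/2} \approx 0.27$:

```python
import numpy as np

rng = np.random.default_rng(0)
T, sigma, lam = 100, 1.0, 2.0
X = rng.normal(size=(T, 2))          # fixed (nonrandom) design, drawn once
W = np.linalg.inv(X.T @ X)
theta = np.array([1.0, -0.5])

hits, n_rep = 0, 10_000
for _ in range(n_rep):
    Y = X @ theta + sigma * rng.normal(size=T)
    theta_hat = W @ (X.T @ Y)
    # exceedance event for the first coefficient (k = 1)
    hits += abs(theta_hat[0] - theta[0]) > lam * sigma * np.sqrt(W[0, 0])

print(hits / n_rep, "<=", 2 * np.exp(-lam**2 / 2))
```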

We now describe how the bound (15.4) can be used in order to estimate the coefficients $ \theta $ in the regression equation (15.1) when the regressors are (possibly) stochastic and the coefficients are not constant, but follow a jump process.

Figure 15.1: Example of a locally homogeneous process: a schematic plot of a coefficient path $ \theta_{i,t}$ that is constant on intervals and jumps between them.


The procedure that we describe does not require an explicit expression for the law of the process $ \theta_t$; it only assumes that $ \theta_t$ is constant on some unknown time interval $ I = [\tau - m, \tau]$, with $ \tau,\,m\in\mathbb{N}$ and $ \tau - m >0$. Such an interval is referred to as an interval of time homogeneity, and a model which is constant only on some time interval is called locally time homogeneous.

Let us now define some notation. The expression $ \widehat\theta_\tau$ denotes the (filtering) estimator of the process $ (\theta_t)_{t\in \mathbb{N}}$ at time $ \tau$, that is to say, the estimator which uses only observations up to time $ \tau$. For example, if $ \theta $ is constant, the recursive estimator of the form:

$\displaystyle \widehat\theta_{\tau} = \left(\sum_{s=1}^\tau X_s X_s^{\top}\right)^{-1}\sum_{s=1}^\tau X_sY_s,$

represents the best linear estimator of $ \theta $. But if the coefficients are not constant and follow a jump process, as in Figure 15.1, a recursive estimator cannot provide good results. Ideally, only the observations in the interval $ I = [\tau - m, \tau]$ should be used for the estimation of $ \theta_{\tau}$. Indeed, an estimator of $ \theta_\tau$ using the observations of a subinterval $ J\subset I$ would be less efficient, while an estimator using the observations of a larger interval $ K \supset I$ would be biased. The main objective is therefore to estimate the largest interval of time homogeneity; we denote this estimate by $ \widehat I = [\tau - \widehat m, \tau] $. On this interval $ \widehat I$ we estimate $ \theta_\tau$ with ordinary least squares (OLS):

$\displaystyle \widehat\theta_\tau = \widehat\theta_{\widehat I} = \left(\sum_{s \in \widehat I} X_s X_s^{\top}\right)^{-1}\sum_{s\in \widehat I } X_sY_s.$ (15.5)

In order to determine $ \widehat I$ we use the idea of pointwise adaptive estimation described in Lepski (1990), Lepski and Spokoiny (1997) and Spokoiny (1998). The idea of the method can be explained as follows.

Suppose that $ I$ is an interval-candidate, that is, we expect time-homogeneity in $ I$ and hence in every subinterval $ J\subset I$. This implies that the mean values of $ \widehat\theta_I$ and $ \widehat\theta_J$ nearly coincide. Furthermore, we know on the basis of equation (15.4) that the events

$\displaystyle \vert\widehat\theta_{i,I} - \theta_{i,\tau}\vert \leq\mu \sigma\sqrt{w_{ii,I}} \quad\textrm{and}\quad \vert\widehat\theta_{i,J} - \theta_{i,\tau}\vert \leq\lambda \sigma\sqrt{w_{ii,J}}$

occur with high probability for sufficiently large constants $ \lambda$ and $ \mu$. The adaptive estimation procedure therefore roughly corresponds to a family of tests checking whether $ \widehat\theta_I$ differs significantly from $ \widehat\theta_J$. The latter is done on the basis of the triangle inequality and of equation (15.4), which assigns a large probability to the event

$\displaystyle \vert\widehat\theta_{i,I} - \widehat\theta_{i,J}\vert \leq \mu \sigma\sqrt{w_{ii,I}} + \lambda \sigma\sqrt{w_{ii,J}}$

under the assumption of homogeneity within $ I$, provided that $ \mu$ and $ \lambda$ are sufficiently large. Therefore, if there exists an interval $ J\subset I$ such that the hypothesis $ \widehat\theta_{i,I}=\widehat\theta_{i,J}$ cannot be accepted, we reject the hypothesis of time homogeneity for the interval $ I$. Finally, our adaptive estimator corresponds to the largest interval $ I$ such that the hypothesis of homogeneity is not rejected for $ I$ itself and all smaller intervals.
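In code, each of these tests amounts to checking the displayed inequality coefficient by coefficient. Below is a minimal Python sketch (the function names and the index-array interface are ours, and $ \sigma$ is assumed known for the moment):

```python
import numpy as np

def ols_with_w(X, Y):
    """OLS estimate and inverse design matrix W on the given observations."""
    W = np.linalg.inv(X.T @ X)
    return W @ (X.T @ Y), W

def homogeneous(X, Y, I, J, sigma, mu, lam):
    """True if |theta_hat_{i,I} - theta_hat_{i,J}| <= mu*sigma*sqrt(w_ii,I)
    + lam*sigma*sqrt(w_ii,J) for every coefficient i; I and J are index
    arrays, with J a subinterval of I."""
    th_I, W_I = ols_with_w(X[I], Y[I])
    th_J, W_J = ols_with_w(X[J], Y[J])
    bound = (mu * sigma * np.sqrt(np.diag(W_I))
             + lam * sigma * np.sqrt(np.diag(W_J)))
    return bool(np.all(np.abs(th_I - th_J) <= bound))
```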


15.1.1 The adaptive estimator

Now we present a formal description. Suppose that a family $ \mathcal{I}$ of interval candidates $ I$ is fixed. Each of them is of the form $ I = [\tau - m, \tau]$, so that the set $ \mathcal{I}$ is ordered by $ m$. With every such interval we associate an estimate $ \widehat\theta_{i,I}$ of the parameter $ \theta_{i,\tau}$ and the corresponding conditional standard deviation $ \sqrt{w_{ii,I}}$. Next, for every interval $ I$ from $ \mathcal{I}$, we assume that a set $ \mathcal{J}(I)$ of testing subintervals $ J$ is given. For every $ J \in \mathcal{J}(I)$, we construct the corresponding estimate $ \widehat\theta_{i,J}$ from the observations for $ t \in J$ and compute $ \sqrt{w_{ii,J}}$. Now, with two constants $ \mu$ and $ \lambda$, define the adaptive choice of the interval of homogeneity by the following iterative procedure: start with the smallest interval in $ \mathcal{I}$ and accept it as time homogeneous; then take the next larger interval $ I$ and check, for every $ J \in \mathcal{J}(I)$ and every $ i = 1,\ldots,p$, the condition

$\displaystyle \vert\widehat\theta_{i,I} - \widehat\theta_{i,J}\vert \leq \mu \sigma\sqrt{w_{ii,I}} + \lambda \sigma\sqrt{w_{ii,J}}.$ (15.6)

If (15.6) holds for all $ J$ and $ i$, accept $ I$ as time homogeneous and proceed with the next larger interval; otherwise stop. The selected interval $ \widehat{I}$ is the last accepted one, i.e. the largest interval in $ \mathcal{I}$ such that (15.6) is satisfied for it and for all smaller intervals.

The adaptive estimator $ \widehat\theta_{\tau}$ of $ \theta_{\tau}$ is then defined by ordinary least squares on the selected interval $ \widehat{I}$:

$\displaystyle \widehat\theta_{i,\tau} = \widehat\theta_{i,\widehat{I}} \quad\textrm{for}\quad i = 1,\ldots, p.$

As for the variance estimation, note that the procedure described above requires knowledge of the variance $ \sigma^2$ of the errors. In practical applications, $ \sigma^2$ is typically unknown and has to be estimated from the data. The regression representation (15.1) and local time homogeneity suggest applying a residual-based estimator. Given an interval $ I = [\tau - m, \tau]$, we construct the parameter estimate $ \widehat\theta_I$ and define the pseudo-residuals as $ \widehat{\varepsilon}_t = Y_t - X_t^{\top}\widehat\theta_I$. The variance estimator is then obtained by averaging the squared pseudo-residuals:

$\displaystyle \widehat{\sigma}^2 = \frac{1}{\vert I\vert} \sum_{t\in I}\widehat{\varepsilon}_t^2.$    
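The complete procedure, combining the interval search, the test (15.6) and the residual-based variance estimator, can be sketched as follows. This is our own schematic implementation under the grid convention of Section 15.1.2, with candidate intervals of lengths $ m_0, 2m_0, \ldots$ ending at $ \tau$; in particular, estimating $ \sigma$ once on the smallest candidate interval is a simplifying assumption of the sketch:

```python
import numpy as np

def ols_with_w(X, Y):
    """OLS estimate and inverse design matrix W on the given observations."""
    W = np.linalg.inv(X.T @ X)
    return W @ (X.T @ Y), W

def adaptive_estimate(X, Y, tau, m0=30, mu=4.0, lam=2.0):
    """Schematic adaptive estimator of theta_tau from observations 0..tau-1.
    Candidate intervals are I_k = [tau - k*m0, tau); testing subintervals
    share either the right or the left end point of I_k."""
    I_hat = np.arange(tau - m0, tau)       # smallest candidate, always accepted
    theta_hat, _ = ols_with_w(X[I_hat], Y[I_hat])
    res = Y[I_hat] - X[I_hat] @ theta_hat
    sigma = np.sqrt(np.mean(res ** 2))     # residual-based estimate of sigma

    k = 2
    while tau - k * m0 >= 0:               # next larger candidate interval
        I = np.arange(tau - k * m0, tau)
        th_I, W_I = ols_with_w(X[I], Y[I])
        accept = True
        for kp in range(1, k):             # interior grid points of I
            s = tau - kp * m0
            for J in (np.arange(s, tau),               # right end point tau
                      np.arange(tau - k * m0, s)):     # left end point of I
                th_J, W_J = ols_with_w(X[J], Y[J])
                bound = (mu * sigma * np.sqrt(np.diag(W_I))
                         + lam * sigma * np.sqrt(np.diag(W_J)))
                if np.any(np.abs(th_I - th_J) > bound):
                    accept = False         # (15.6) violated: I is rejected
        if not accept:
            break
        I_hat, theta_hat = I, th_I         # largest accepted interval so far
        k += 1
    return theta_hat, I_hat
```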


15.1.2 A small simulation study

The performance of the adaptive estimator is evaluated with data from the following process:

$\displaystyle Y_t = \theta_{1,t} + \theta_{2,t} X_{2,t} + \theta_{3,t} X_{3,t} + \sigma\varepsilon_t.$    

The length of the sample is 300. The regressors $ X_2$ and $ X_3$ are two independent random walks. The regression coefficients are constant in the first half of the sample, jump at $ t = 151$, and then remain constant until the end of the sample. We simulate three models with jumps of different magnitude. The values of the simulated models are presented in Table 15.1.


Table 15.1: Simulated models.
                               $ 151 \leq t \leq 300 $
 $ 1\leq t\leq 150$        large jump              medium jump            small jump
 $ \theta_{1,t} = 1 $      $ \theta_{1,t}= .85 $   $ \theta_{1,t}=.99$    $ \theta_{1,t}=.9995$
 $ \theta_{2,t} = .006 $   $ \theta_{2,t}=.0015$   $ \theta_{2,t}= .004$  $ \theta_{2,t}=.0055$
 $ \theta_{3,t} = .025 $   $ \theta_{3,t}= .04$    $ \theta_{3,t}=.028$   $ \theta_{3,t}= .0255$


The error term $ \varepsilon _t$ is a standard Gaussian white noise and $ \sigma = 10^{-2}$. Note that the average value of $ \sigma\vert\varepsilon_t\vert$ equals $ 10^{-2}\sqrt{2/\pi}\approx 0.008$, so the small jump of magnitude $ 0.0005$ is clearly not visible to the eye. For each of the three models, $ 100$ realizations of the white noise $ \varepsilon _t$ are generated and the adaptive estimation is performed.
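To make the setup concrete, one realization of, say, the medium-jump model can be generated as follows (the code is our own illustration; the parameter values are those of Table 15.1):

```python
import numpy as np

rng = np.random.default_rng(1)
T, sigma = 300, 1e-2

# intercept plus two independent random-walk regressors
X = np.column_stack([np.ones(T),
                     np.cumsum(rng.normal(size=T)),
                     np.cumsum(rng.normal(size=T))])

# medium-jump coefficients of Table 15.1: constant, then jump at t = 151
t = np.arange(1, T + 1)
theta = np.where(t <= 150,
                 np.array([[1.0], [0.006], [0.025]]),
                 np.array([[0.99], [0.004], [0.028]])).T   # shape (T, 3)

Y = np.sum(X * theta, axis=1) + sigma * rng.normal(size=T)
```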

In order to implement the procedure we need two parameters, $ \mu$ and $ \lambda$, and two sets of intervals, $ \mathcal{I}$ and $ \mathcal{J}(I)$. As far as the latter are concerned, the simplest proposal is to use a regular grid $ G = \{t_k \}$ with $ t_k = m_0 k$ for some integer $ m_0$ and with $ \tau = t_{k^*}$ belonging to the grid. We then consider the intervals $ I_k = [t_k, t_{k^*}[ = [t_k, \tau[$ for all $ t_k<t_{k^*}= \tau$. Every interval $ I_k$ contains the smaller intervals $ J' = [t_{k'}, t_{k^*}[$ with $ k<k'<k^*$. For every interval $ I_k$ we define the set $ \mathcal{J}(I_k)$ of testing subintervals $ J'$ by taking all smaller intervals with right end point $ t_{k^*}$, i.e. $ J' = [t_{k'}, t_{k^*}[$, and all smaller intervals with left end point $ t_k$, i.e. $ J'=[t_k, t_{k'}[$:

$\displaystyle \mathcal{J}(I_k) =\{ J=[t_{k'}, t_{k^*}[ \textrm{ or } J=[t_k, t_{k'}[ \,:\, k<k'<k^*\} .$

The sets $ \mathcal{I}$ and $ \mathcal{J}(I)$ are therefore identified by a single parameter: the grid step $ m_0$.
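Concretely, the grid and the testing sets are easy to enumerate. The following sketch (the function name is ours) uses Python's half-open range convention, which matches the $ [t_k, t_{k^*}[$ notation:

```python
def interval_family(tau, m0):
    """Candidate intervals I_k = [t_k, tau) on the grid t_k = m0*k, together
    with their testing sets J(I_k); tau is assumed to lie on the grid."""
    family = {}
    for t_k in range(0, tau, m0):             # left end points t_k < tau
        subs = []
        for t_kp in range(t_k + m0, tau, m0): # grid points with k < k' < k*
            subs.append((t_kp, tau))          # right end point t_{k*} = tau
            subs.append((t_k, t_kp))          # left end point t_k
        family[(t_k, tau)] = subs
    return family
```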

We are now left with the choice of three parameters: $ \lambda$, $ \mu$ and $ m_0$. These act like the smoothing parameters of classical nonparametric estimation. The value of $ m_0$ determines the number of points at which time homogeneity is tested, and it defines the minimal delay after which a jump can be discovered. Simulation results have shown that small changes of $ m_0$ do not essentially affect the estimation results; depending on the number of parameters to be estimated, $ m_0$ can be set between 10 and 50.

The choice of $ \lambda$ and $ \mu$ is more critical because these parameters determine the acceptance or rejection of the interval of time homogeneity, as can be seen from equation (15.6). Large values of $ \lambda$ and $ \mu$ reduce the sensitivity of the algorithm and may delay the detection of the change point, while small values make the procedure more sensitive to small changes in the estimated parameters and may increase the probability of a type-I error.

For the simulation we set $ m_0=30$, $ \lambda = 2$ and $ \mu = 4$; a rule for selecting $ \lambda$ and $ \mu$ in real applications will be discussed in the next section. Figure 15.2 shows the results of the simulation. The true values of the coefficients ( $ \theta_{1,t}$: first row, $ \theta_{2,t}$: second row, $ \theta_{3,t}$: third row) are plotted along with the median, the maximum and the minimum of the estimates over all realizations for each model at each time point.

Figure 15.2: On-line estimates of the regression coefficients with jumps of different magnitude. Median (thick dotted line), maximum and minimum (thin dotted line) among all estimates.

The simulation results are very satisfactory. The change point is detected quickly, almost within the minimal delay of $ 30$ periods, for all three models, so that the adaptive estimation procedure shows good performance even for the small-jump model.