4.2 Linear Stationary Models for Time Series

A stochastic process $\left\{y_{t}\right\}_{t= -\infty}^{\infty}$ is a model that describes the probability structure of a sequence of observations over time. A time series $y_{t}$ is a sample realization of a stochastic process that is observed only for a finite number of periods, indexed by $t= 1, \dots, T$ .

Any stochastic process can be partially characterized by the first and second moments of the joint probability distribution: the set of means, $\mu_t \, =\textrm{E}y_t$ , and the set of variances and covariances $cov(y_t, y_s) \, =\,\textrm{E}(y_t - \mu_t) (y_s - \mu_s), \, \forall t, \, s$ . In order to get consistent forecast methods, we need the underlying probabilistic structure to be stable over time. So a stochastic process is called weakly stationary or covariance stationary when the mean, the variance and the covariance structure of the process are stable over time, that is:

$\displaystyle \textrm{E}\, y_t = \mu < \infty \qquad (4.1)$
$\displaystyle \textrm{E} (y_t - \mu)^2 = \gamma_0 < \infty \qquad (4.2)$
$\displaystyle \textrm{E} (y_t - \mu) (y_s - \mu) = \gamma_{\vert t-s\vert} \qquad \forall t, s, \quad t \neq s \qquad (4.3)$

Given condition (4.3), the covariance between $y_t$ and $y_s$ depends only on the displacement $\vert t-s\vert=j$ and it is called the autocovariance at lag $j$, $\gamma_j$ . The set of autocovariances $\gamma_j$ , $j =0, \pm 1, \pm 2, \dots$ , is called the autocovariance function of a stationary process.

The general Autoregressive Moving Average model $ARMA(p,q)$ is a linear stochastic model where the variable $y_t$ is modelled in terms of its own past values and a disturbance. It is defined as follows:

$\displaystyle y_t = \delta + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + u_t \qquad (4.4)$
$\displaystyle u_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \dots + \theta_q \varepsilon_{t-q}$
$\displaystyle \varepsilon_t \sim i.i.d.(0, \sigma^2_{\varepsilon})$

where the random variable $\varepsilon_t$ is called the innovation because it represents the part of the observed variable $y_t$ that is unpredictable given the past values $y_{t-1}, y_{t-2}, \dots$ .

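The general $ARMA(p,q)$ model (4.4) assumes that $y_t$ is the output of a linear filter that transforms the past innovations $\varepsilon_{t-i}, i= 0, 1, \dots, \infty$ , that is, $y_t$ is a linear process. This linearity assumption is based on Wold's decomposition theorem (Wold; 1938), which states that any discrete covariance stationary process $y_t$ can be expressed as the sum of two uncorrelated processes,

$\displaystyle y_t = d_t + u_t \qquad (4.5)$

where $d_t$ is purely deterministic and $u_t$ is a purely indeterministic process that can be written as a linear sum of the innovation process $\varepsilon_t$:

$\displaystyle u_t = \sum_{i=0}^\infty \psi_i \varepsilon_{t-i} \quad \hbox{with} \quad \psi_0 = 1, \qquad \sum_{i=0}^\infty \psi_i^2 < \infty \qquad (4.6)$

where $\varepsilon_t$ is a sequence of serially uncorrelated random variables with zero mean and common variance $\sigma^2_{\varepsilon}$.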

The $ARMA(p,q)$ formulation (4.4) is a finite reparametrization of the infinite representation (4.5)-(4.6) with $d_t$ constant. It is usually written in terms of the lag operator $L$, defined by $L^j y_t=y_{t-j}$ , which gives a shorter expression:

$\displaystyle (1 - \phi_1 L - \dots - \phi_p L^p)\, y_t = \delta + (1 + \theta_1 L + \dots + \theta_q L^q)\, \varepsilon_t$
$\displaystyle \Phi(L)\, y_t = \delta + \Theta(L)\, \varepsilon_t \qquad (4.7)$

where the lag operator polynomials $\Theta(L)$ and $\Phi(L)$ are called the $MA$ polynomial and the $AR$ polynomial, respectively. In order to avoid parameter redundancy, we assume that there are no common factors between the $AR$ and the $MA$ components.

Next, we will study the plots of some time series generated by stationary $ARMA$ models with the aim of determining the main patterns of their temporal evolution. Figure 4.2 includes two series generated from the following stationary processes by means of the genarma quantlet:

Series 1: $y1_t = 1.4\, y1_{t-1} - 0.8\, y1_{t-2} + \varepsilon_t, \qquad \varepsilon_t \sim N.I.D.(0,1)$
Series 2: $y2_t = 0.9+ 0.7\, y2_{t-1} + 0.5\, \varepsilon_{t-1} + \varepsilon_t, \qquad \varepsilon_t \sim N.I.D.(0,1)$

**Figure 4.2:** Time series generated by $ARMA$ models
$\includegraphics[width=0.7\defpicwidth]{genseries1.ps}$ $\includegraphics[width=0.7\defpicwidth]{genseries2.ps}$ `XEGutsm02.xpl`

As expected, both time series move around a constant level without changes in variance, due to the stationarity property. Moreover, this level is close to the theoretical mean of the process, $\mu$ , and the distance of each point to this value is very rarely outside the bounds $\pm 2 \sigma$ , where $\sigma$ is the standard deviation of the process. Furthermore, the series show local departures from the mean of the process followed by returns towards it, which is known as the mean reversion behavior that characterizes stationary time series.
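The genarma quantlet is XploRe code and is not reproduced in the text. As a rough stand-in, the following Python sketch simulates the two processes by direct recursion of (4.4), assuming Gaussian innovations, zero pre-sample values and a burn-in period; the helper name `simulate_arma` and the burn-in length are hypothetical choices, not part of the original. The theoretical means are 0 for Series 1 and $0.9/(1-0.7) = 3$ for Series 2.

```python
import random


def simulate_arma(delta, phi, theta, n, burn_in=500, seed=None):
    """Hypothetical helper: simulate n values of
    y_t = delta + phi_1 y_{t-1} + ... + phi_p y_{t-p}
          + e_t + theta_1 e_{t-1} + ... + theta_q e_{t-q},
    with e_t ~ N.I.D.(0, 1). Pre-sample values of y and e are taken as
    zero and the first `burn_in` draws are discarded so that the start-up
    effect dies out."""
    rng = random.Random(seed)
    y, e = [], []
    for t in range(n + burn_in):
        eps = rng.gauss(0.0, 1.0)
        ar = sum(c * y[t - 1 - i] for i, c in enumerate(phi) if t - 1 - i >= 0)
        ma = sum(c * e[t - 1 - j] for j, c in enumerate(theta) if t - 1 - j >= 0)
        y.append(delta + ar + ma + eps)
        e.append(eps)
    return y[burn_in:]


# Series 1: y1_t = 1.4 y1_{t-1} - 0.8 y1_{t-2} + e_t        (mean 0)
y1 = simulate_arma(0.0, [1.4, -0.8], [], 150, seed=1)
# Series 2: y2_t = 0.9 + 0.7 y2_{t-1} + 0.5 e_{t-1} + e_t   (mean 3)
y2 = simulate_arma(0.9, [0.7], [0.5], 150, seed=2)

print(f"Series 1 sample mean: {sum(y1) / len(y1):.3f}")
print(f"Series 2 sample mean: {sum(y2) / len(y2):.3f}")
```

With only 150 observations the sample means can still deviate noticeably from 0 and 3; longer simulated series settle much closer to the theoretical values.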

Let us study in some detail the properties of the different $ARMA$ processes, in particular the autocovariance function, which captures the dynamic properties of a stationary stochastic process. This function depends on the units of measure, so the usual measure of the degree of linear association between variables is the correlation coefficient. In the case of stationary processes, the autocorrelation coefficient at lag $j$ , denoted by $\rho_j$ , is defined as the correlation between $y_t$ and $y_{t-j}$ :

$\displaystyle \rho_{j} = \frac{cov(y_t,y_{t-j})}{\sqrt{V(y_t)}\sqrt{V(y_{t-j})}}\, =\, \frac{\gamma_{j}}{\gamma_0}, \qquad j = 0, \pm 1, \pm 2, \dots$

Thus, the autocorrelation function (ACF) is the autocovariance function standardized by the variance $\gamma_0$ . The properties of the ACF are:

$\displaystyle \rho_0 = 1 \qquad (4.8)$
$\displaystyle \vert\rho_j\vert \leq 1 \qquad (4.9)$
$\displaystyle \rho_j = \rho_{-j} \qquad (4.10)$

Given the symmetry property (4.10), the ACF is usually represented by means of a bar graph at the nonnegative lags that is called the simple correlogram.
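In practice the population ACF is estimated from a finite series. A minimal sketch of the usual sample estimator $r_j = c_j / c_0$, with $c_j = T^{-1}\sum_{t=j+1}^{T}(y_t-\bar y)(y_{t-j}-\bar y)$, together with a crude text version of the simple correlogram, could look as follows. The function names are hypothetical, and the divisor $T$ (rather than $T-j$) is a common convention assumed here.

```python
def sample_acf(y, max_lag):
    """Hypothetical helper: sample autocorrelations r_0, ..., r_max_lag."""
    T = len(y)
    mean = sum(y) / T
    d = [v - mean for v in y]
    c0 = sum(v * v for v in d) / T          # sample estimate of gamma_0
    acf = []
    for j in range(max_lag + 1):
        cj = sum(d[t] * d[t - j] for t in range(j, T)) / T
        acf.append(cj / c0)
    return acf


def print_correlogram(acf, width=30):
    """Hypothetical helper: text bar graph of the ACF at nonnegative lags."""
    for j, r in enumerate(acf):
        bar = "#" * int(round(abs(r) * width))
        if r >= 0:
            line = " " * width + "|" + bar
        else:
            line = " " * (width - len(bar)) + bar + "|"
        print(f"{j:3d} {r:+.3f} {line}")


print_correlogram(sample_acf([1.0, 2.0, 3.0, 4.0, 5.0], 2))
```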

Another useful tool to describe the dynamics of a stationary process is the partial autocorrelation function (PACF). The partial autocorrelation coefficient at lag $j$ measures the linear association between $y_t$ and $y_{t-j}$ adjusted for the effects of the intermediate values $y_{t-1}, \dots, y_{t-j+1}$ . Therefore, it is just the coefficient $\phi_{jj}$ in the linear regression model:

$\displaystyle y_t = \alpha + \phi_{j1} y_{t-1} + \phi_{j2} y_{t-2} + \dots +\phi_{jj} y_{t-j} + e_t \qquad (4.11)$

The properties of the PACF are equivalent to those of the ACF (4.8)-(4.10) and it is easy to prove that $\phi_{11} = \rho_1$ (Box and Jenkins; 1976). Like the ACF, the partial autocorrelation function does not depend on the units of measure, and it is represented by means of a bar graph at the nonnegative lags that is called the partial correlogram.
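Instead of running regression (4.11) for every lag, the PACF of a process with known ACF can be computed recursively. The sketch below uses the Durbin-Levinson recursion, a standard technique swapped in here (it is not described in the text), which yields the same coefficients $\phi_{jj}$ as solving the population version of (4.11) lag by lag. The function name is hypothetical.

```python
def pacf_from_acf(rho):
    """Hypothetical helper: partial autocorrelations [phi_11, ..., phi_kk]
    from rho = [rho_0, rho_1, ..., rho_k] (rho_0 = 1), via Durbin-Levinson."""
    pacf = []
    prev = []  # prev[i] holds phi_{k-1, i+1}
    for k in range(1, len(rho)):
        num = rho[k] - sum(prev[i] * rho[k - 1 - i] for i in range(k - 1))
        den = 1.0 - sum(prev[i] * rho[i + 1] for i in range(k - 1))
        phi_kk = num / den
        # update phi_{k, j} = phi_{k-1, j} - phi_kk * phi_{k-1, k-j}
        prev = [prev[i] - phi_kk * prev[k - 2 - i] for i in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


# AR(1) with phi = 0.6: rho_j = 0.6**j, so the PACF cuts off after lag 1
print(pacf_from_acf([0.6 ** j for j in range(6)]))
# MA(1) with theta = 0.5: rho_1 = 0.5 / 1.25 = 0.4, rho_j = 0 for j > 1
print(pacf_from_acf([1.0, 0.4, 0.0, 0.0, 0.0]))
```

Note that the first element returned is always $\rho_1$, in line with $\phi_{11} = \rho_1$.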

The dynamic properties of each stationary model determine a particular shape of the correlograms. Moreover, it can be shown that, for any stationary $ARMA$ process, both functions, ACF and PACF, approach zero as the lag $j$ tends to infinity. The $ARMA$ models are not always stationary processes, so it is necessary first to determine the conditions for stationarity. There are subclasses of $ARMA$ models which have special properties, so we shall study them separately. Thus, when $p = q = 0$ and $\delta = 0$ , it is a white noise process; when $p = 0$ , it is a pure moving average process of order $q$ , $MA(q)$ ; and when $q = 0$ , it is a pure autoregressive process of order $p$ , $AR(p)$ .

4.2.1 White Noise Process

The simplest $ARMA$ model is a white noise process, where $y_t$ is a sequence of uncorrelated zero mean variables with constant variance $\sigma ^2$ . It is denoted by $y_t \sim WN(0, \sigma^2)$ . This process is stationary if its variance is finite, $\sigma^2< \infty$ , since, given that:

$\begin{displaymath}\begin{array}{rclcl} E y_t &=& 0 &\qquad& \forall t\\ V(y_t) &=& \sigma^2 & & \forall t\\ Cov(y_t, y_s) &=& 0 & & \forall t\neq s \end{array}\end{displaymath}$

it verifies conditions (4.1)-(4.3). Moreover, $y_t$ is uncorrelated over time, so its autocovariance function is:

$\displaystyle \gamma_j = \left\{\begin{array}{lll} \sigma^2 && j = 0\\ 0 && j\neq 0 \end{array}\right.$

and therefore its ACF and PACF are:

$\rho_j \, = \, \left\{\begin{array}{lll} 1 && j = 0\\ 0 && j\neq 0 \end{array}\right.$

$\phi_{jj} \, = \, \left\{\begin{array}{lll} 1 && j = 0\\ 0 && j\neq 0 \end{array}\right.$

To understand the behavior of a white noise, we will generate a time series of size 150 from a Gaussian white noise process $y_t \sim N.I.D.(0,1)$ . Figure 4.3 shows the simulated series, which moves randomly around a constant level without any kind of pattern, as corresponds to the absence of correlation over time. Economic time series very rarely follow white noise patterns, but this process is the key to the formulation of more complex models. In fact, it is the starting point for deriving the properties of $ARMA$ processes, given that we are assuming that the innovation of the model is a white noise.

**Figure 4.3:** Realization from a white noise process
$\includegraphics[width=0.8\defpicwidth]{wn1.ps}$ `XEGutsm03.xpl`

4.2.2 Moving Average Model

The general (finite-order) moving average model of order $q$ , $MA(q)$ , is:

$\displaystyle y_t = \delta + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q} \qquad (4.12)$
$\displaystyle y_t = \delta + \Theta(L)\, \varepsilon_t, \qquad \varepsilon_t \,\sim \, \hbox{WN}(0, \sigma^2_{\varepsilon})$

It can be easily shown that $MA$ processes are always stationary, given that the parameters of any finite $MA$ process always verify condition (4.6). Moreover, we are interested in invertible $MA$ processes. When a process is invertible, it is possible to invert it, that is, to express the current value of the variable $y_t$ in terms of a current shock $\varepsilon_t$ and its observable past values $y_{t-1}, y_{t-2}, \dots$ . Then, we say that the model has an autoregressive representation. This requirement provides a sensible way of associating present events with past happenings. An $MA(q)$ model is invertible if the $q$ roots of the characteristic equation $\Theta(L) = 0$ lie outside the unit circle. When a root $R_j$ is real, this condition means that its absolute value must be greater than unity, $\vert R_j\vert >1$ . If there is a pair of complex roots, they may be written as $R_j = a \pm b i$ , where $a$ and $b$ are real numbers and $i=\sqrt{-1}$ , and then the invertibility condition means that their modulus must be greater than unity, $\sqrt{a^2 + b^2} >1$ .
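For $q \le 2$ the roots of $\Theta(L)=0$ can be obtained with the quadratic formula, so the invertibility check described above can be sketched as follows. The helper names are hypothetical; complex arithmetic from the standard `cmath` module handles the case $R_j = a \pm bi$, and `abs` returns the modulus $\sqrt{a^2+b^2}$. Higher orders would need a general polynomial root finder, which this sketch deliberately omits.

```python
import cmath


def ma_roots(theta):
    """Hypothetical helper: roots of Theta(L) = 1 + theta_1 L + ... + theta_q L^q = 0,
    for q <= 2."""
    theta = list(theta)
    while theta and theta[-1] == 0.0:   # drop trailing zero coefficients
        theta.pop()
    if not theta:
        return []                        # Theta(L) = 1: white noise, no roots
    if len(theta) == 1:                  # 1 + theta_1 L = 0
        return [complex(-1.0 / theta[0])]
    if len(theta) == 2:                  # theta_2 L^2 + theta_1 L + 1 = 0
        t1, t2 = theta
        disc = cmath.sqrt(t1 * t1 - 4.0 * t2)
        return [(-t1 + disc) / (2.0 * t2), (-t1 - disc) / (2.0 * t2)]
    raise ValueError("this sketch only handles q <= 2")


def is_invertible(theta):
    """Hypothetical helper: True if every root lies outside the unit circle."""
    return all(abs(root) > 1.0 for root in ma_roots(theta))


print(is_invertible([0.8]))        # root -1.25
print(is_invertible([2.0]))        # root -0.5
print(is_invertible([0.5, 0.4]))   # complex pair with modulus sqrt(2.5)
```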

The simplest case is the first-order moving average process, $MA(1)$ :

$\displaystyle y_t = \delta + \varepsilon_t + \theta \varepsilon_{t-1}, \qquad \varepsilon_t \sim \, WN(0, \sigma^2_{\varepsilon})$
$\displaystyle y_t = \delta + (1 + \theta L)\, \varepsilon_t$

Let us study this simple $MA(1)$ process in detail. Figure 4.4 plots simulated series of length 150 from two $MA(1)$ processes where the parameters $(\delta, \theta)$ take the values (0, 0.8) in the first model and (4, -0.5) in the second one. It can be noted that the series show the general patterns associated with stationary and mean reverting processes. More specifically, given that only one past innovation, $\varepsilon_{t-1}$ , affects the current value of the series $y_t$ (positively for $\theta >0$ and negatively for $\theta<0$ ), the $MA(1)$ process is known as a very short memory process, and so there is no 'strong' dynamic pattern in the series. Nevertheless, it can be observed that the time evolution is smoother for the positive value of $\theta$ .

**Figure 4.4:** Realizations of $MA(1)$ models with $\varepsilon _{t}\sim N.I.D.(0,1)$
$\includegraphics[width=0.7\defpicwidth]{ma18.ps}$ $\includegraphics[width=0.7\defpicwidth]{ma1-5.ps}$ `XEGutsm04.xpl`

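The mean and the autocovariances of the $MA(1)$ process are:

$\begin{displaymath} \begin{array}{rclcl} E y_t &=& \textrm{E}(\delta + \varepsilon_t + \theta \varepsilon_{t-1}) &=& \delta\\ \gamma_0 &=& \textrm{E}(\varepsilon_t + \theta \varepsilon_{t-1})^2 &=& \sigma^2_{\varepsilon}(1 + \theta^2)\\ \gamma_1 &=& \textrm{E}(\varepsilon_t + \theta \varepsilon_{t-1})(\varepsilon_{t-1} + \theta \varepsilon_{t-2}) &=& \theta \sigma^2_{\varepsilon}\\ \gamma_j &=& \textrm{E}(\varepsilon_t + \theta \varepsilon_{t-1})(\varepsilon_{t-j} + \theta \varepsilon_{t-j-1}) & =&0 \qquad \forall j >1 \end{array}\end{displaymath}$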

given that, for all $j > 1$ and for all $t$ , the innovations $\varepsilon_t, \varepsilon_{t-1}$ are uncorrelated with $\varepsilon_{t-j}, \varepsilon_{t-j-1}$ . Then, the autocorrelation function is:

$\displaystyle \rho_j = \left\{\begin{array}{cll} \displaystyle\frac{\theta}{1 + \theta^2} && j = 1\\ 0 && j > 1 \end{array}\right.$

**Figure 4.5:** Population ACF and PACF for $MA(1)$ processes
$\includegraphics[width=0.7\defpicwidth]{ma8s.ps}$ $\includegraphics[width=0.7\defpicwidth]{ma8p.ps}$ $\includegraphics[width=0.7\defpicwidth]{ma-5s.ps}$ $\includegraphics[width=0.7\defpicwidth]{ma-5p.ps}$

That is, there is a cutoff in the ACF at the first lag. Finally, the partial autocorrelation function shows an exponential decay. Figure 4.5 shows typical profiles of this ACF jointly with the PACF.
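The result $\rho_1 = \theta/(1+\theta^2)$ can be checked by simulation. The sketch below (hypothetical helpers, Gaussian innovations with $\sigma^2_{\varepsilon}=1$, and a long series so that sampling error is small) compares the sample autocorrelations at lags 1 and 2 with the theoretical values for the two parameterizations of Figure 4.4. Note also that $\vert\theta/(1+\theta^2)\vert \le 0.5$ for every $\theta$, so an $MA(1)$ can never produce a first-order autocorrelation larger than 0.5 in absolute value.

```python
import random


def simulate_ma1(delta, theta, n, seed=None):
    """Hypothetical helper: y_t = delta + e_t + theta * e_{t-1}, e_t ~ N.I.D.(0, 1)."""
    rng = random.Random(seed)
    e_prev = rng.gauss(0.0, 1.0)
    y = []
    for _ in range(n):
        e = rng.gauss(0.0, 1.0)
        y.append(delta + e + theta * e_prev)
        e_prev = e
    return y


def lag_corr(y, j):
    """Hypothetical helper: sample autocorrelation at lag j (divisor T)."""
    T = len(y)
    m = sum(y) / T
    d = [v - m for v in y]
    c0 = sum(v * v for v in d) / T
    return sum(d[t] * d[t - j] for t in range(j, T)) / T / c0


results = {}
for delta, theta in [(0.0, 0.8), (4.0, -0.5)]:
    y = simulate_ma1(delta, theta, 50_000, seed=7)
    results[theta] = (theta / (1 + theta ** 2), lag_corr(y, 1), lag_corr(y, 2))
    rho1, r1, r2 = results[theta]
    print(f"theta={theta:+.1f}: rho_1={rho1:+.3f}, r_1={r1:+.3f}, r_2={r2:+.3f}")
```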

**Figure 4.6:** Population ACF and PACF for $MA(2)$ processes
$\includegraphics[width=0.7\defpicwidth]{ma2-1s.ps}$ $\includegraphics[width=0.7\defpicwidth]{ma2-1p.ps}$ $\includegraphics[width=0.7\defpicwidth]{ma2-2s.ps}$ $\includegraphics[width=0.7\defpicwidth]{ma2-2p.ps}$

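It can be shown that the general stationary and invertible $MA(q)$ process has the following properties (Box and Jenkins; 1976): its mean is $\textrm{E}\, y_t = \delta$ and its variance is $\gamma_0 = \sigma^2_{\varepsilon}(1 + \theta_1^2 + \dots + \theta_q^2)$ ; its ACF has a cutoff at lag $q$ , that is, $\rho_j = 0$ for all $j > q$ ; and its PACF decays towards zero as a mixture of exponentials and/or damped sine-cosine waves, depending on the roots of $\Theta(L) = 0$ .

Figure 4.6 shows the simple and partial correlograms for two different $MA(2)$ processes. Both ACFs exhibit a cutoff at lag two. The roots of the $MA$ polynomial of the first series are real, so the PACF decays exponentially, while for the second series, with complex roots, the PACF decays as a damped sine-cosine wave.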
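The cutoff property follows from $\gamma_j = \sigma^2_{\varepsilon}\sum_{i=0}^{q-j}\psi_i\psi_{i+j}$ for $j \le q$ and $\gamma_j = 0$ for $j > q$ , with $\psi_0 = 1$ and $\psi_i = \theta_i$ in the notation of (4.6). A minimal sketch of the theoretical ACF (hypothetical function name; $\sigma^2_{\varepsilon}$ cancels in the ratio $\gamma_j/\gamma_0$):

```python
def ma_acf(theta, max_lag):
    """Hypothetical helper: theoretical ACF rho_0..rho_max_lag of an MA(q)
    with theta = [theta_1, ..., theta_q]; psi_0 = 1 and psi_i = theta_i."""
    psi = [1.0] + list(theta)
    q = len(theta)
    gamma = []
    for j in range(max_lag + 1):
        if j <= q:
            gamma.append(sum(psi[i] * psi[i + j] for i in range(q - j + 1)))
        else:
            gamma.append(0.0)            # cutoff: no common innovations
    return [g / gamma[0] for g in gamma]


print(ma_acf([0.8], 3))          # MA(1): rho_1 = 0.8 / 1.64
print(ma_acf([0.5, 0.4], 4))     # MA(2): zero beyond lag 2
```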

4.2.3 Autoregressive Model

The general (finite-order) autoregressive model of order $p$ , $AR(p)$ , is:

$\displaystyle y_t = \delta + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} + \varepsilon_t \qquad (4.13)$
$\displaystyle \Phi(L)\, y_t = \delta + \varepsilon_t, \qquad \varepsilon_t \sim \hbox{WN}(0, \sigma^2_{\varepsilon})$

Let us begin with the simplest $AR$ process, the autoregressive process of first order, $AR(1)$ , which is defined as:

$\displaystyle y_t = \delta + \phi \,y_{t-1} + \varepsilon_t \qquad (4.14)$
$\displaystyle (1 - \phi L)\, y_t = \delta + \varepsilon_t, \qquad \varepsilon_t \sim \, WN(0, \sigma^2_{\varepsilon})$

**Figure 4.7:** Realizations of $AR(1)$ models with $\varepsilon _{t}\sim N.I.D.(0,1)$
$\includegraphics[width=0.7\defpicwidth]{ar17.ps}$ $\includegraphics[width=0.7\defpicwidth]{ar1-7.ps}$ `XEGutsm05.xpl`

Figure 4.7 shows two simulated time series generated from $AR(1)$ processes with zero mean and parameters $\phi=0.7$ and $\phi=-0.7$ , respectively. The autoregressive parameter measures the persistence of past events into the current values. For example, if $\phi>0$ , a positive (or negative) shock $\varepsilon_t$ affects the series positively (or negatively) for a period of time which is longer the larger the value of $\phi$ . When $\phi <0$ , the series moves more roughly around the mean due to the alternation in the direction of the effect of $\varepsilon_t$ , that is, a shock that has a positive effect at time $t$ has a negative effect at $t+1$ , a positive effect at $t+2$ , and so on.

The $AR(1)$ process is always invertible, and it is stationary when the parameter of the model is constrained to lie in the region $-1<\phi <1$ . To prove the stationarity condition, first we write the $AR(1)$ in the moving average form by recursive substitution of $y_{t-i}$ in (4.14):

$\displaystyle y_t =\delta \sum_{i=0}^{\infty} \phi^i + \sum_{i=0}^{\infty} \phi^i \varepsilon_{t-i} \qquad (4.15)$

**Figure 4.8:** Population correlograms for $AR(1)$ processes
$\includegraphics[width=0.7\defpicwidth]{ar7s.ps}$ $\includegraphics[width=0.7\defpicwidth]{ar7p.ps}$ $\includegraphics[width=0.7\defpicwidth]{ar-7s.ps}$ $\includegraphics[width=0.7\defpicwidth]{ar-7p.ps}$

That is, $y_t$ is a weighted sum of past innovations. The weights depend on the value of the parameter $\phi$ : when $\vert\phi\vert>1$ (or $\vert\phi\vert <1$ ), the influence of a given innovation $\varepsilon_t$ increases (or decreases) through time. Taking expectations in (4.15) in order to compute the mean of the process, we get:

$\displaystyle \textrm{E}\, y_t = \delta \sum_{i=0}^{\infty} \phi^i + \sum_{i=0}^{\infty} \phi^i\textrm{E}\, \varepsilon_{t-i}$

Given that $\textrm{E}\, \varepsilon_{t-i}= 0$ , the result is a sum of infinitely many terms which, for $\delta \neq 0$ , converges only if $\vert\phi\vert <1$ , in which case $\textrm{E}\, y_t = \delta (1-\phi)^{-1}$ . A similar problem appears when we compute the second moment. The proof can be simplified by assuming that $\delta = 0$ , that is, $\textrm{E}\, y_t = 0$ . Then, the variance is:

$\displaystyle V(y_t) = \textrm{E}\left(\sum_{i=0}^{\infty} \phi^i \varepsilon_{t-i} \right)^2 = \sum_{i=0}^{\infty} \phi^{2i} V(\varepsilon_{t-i}) \, =\, \sigma^2_{\varepsilon} \sum_{i=0}^{\infty}\phi^{2i}$

Again, the variance goes to infinity except for $-1<\phi <1$ , in which case $V(y_t)= \sigma^2_{\varepsilon} (1-\phi^2)^{-1}$ . It is easy to verify that both the mean and the variance explode when that condition does not hold.
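A quick numerical check of these two formulas can be sketched by simulating a long realization of (4.14), assuming illustrative values $\delta = 1$, $\phi = 0.7$ and $\sigma^2_{\varepsilon} = 1$, so that $\textrm{E}\, y_t = 1/0.3 \approx 3.33$ and $V(y_t) = 1/0.51 \approx 1.96$. The seed and sample size are arbitrary choices.

```python
import random

delta, phi = 1.0, 0.7          # illustrative values with |phi| < 1
T, burn_in = 100_000, 1_000
rng = random.Random(123)       # arbitrary seed

y = []
prev = delta / (1 - phi)       # start the recursion at the theoretical mean
for t in range(T + burn_in):
    prev = delta + phi * prev + rng.gauss(0.0, 1.0)   # e_t ~ N(0, 1)
    if t >= burn_in:
        y.append(prev)

mean = sum(y) / T
var = sum((v - mean) ** 2 for v in y) / T
print(f"sample mean {mean:.3f} vs delta/(1-phi)      = {delta / (1 - phi):.3f}")
print(f"sample var  {var:.3f} vs sigma^2/(1-phi^2) = {1 / (1 - phi ** 2):.3f}")
```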

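The autocovariances of the stationary $AR(1)$ process with $\delta = 0$ follow from (4.14), since $\varepsilon_t$ is uncorrelated with past values of $y$ :

$\displaystyle \gamma_j = \textrm{E}\left\{(\phi y_{t-1} + \varepsilon_t)\, y_{t-j}\right\} = \phi\, \textrm{E}(y_{t-1} y_{t-j}) + \textrm{E}(\varepsilon_t y_{t-j}) = \phi \gamma_{j-1} = \sigma^2_{\varepsilon} (1-\phi^2)^{-1} \; \phi^j \qquad \forall j >0$

Hence the autocorrelation function is:

$\displaystyle \rho_j = \frac{\phi \gamma_{j-1}}{\gamma_0} = \phi \rho_{j-1} = \phi^j \qquad \forall j > 0$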

That is, the correlogram shows an exponential decay, with positive values always if $\phi$ is positive and with negative-positive oscillations if $\phi$ is negative (see figure 4.8). Furthermore, the rate of decay decreases as $\vert\phi\vert$ increases, so the greater the value of $\vert\phi\vert$ , the stronger the dynamic correlation in the process. Finally, there is a cutoff in the partial autocorrelation function at the first lag.

**Figure 4.9:** Population correlograms for $AR(2)$ processes
$\includegraphics[width=0.7\defpicwidth]{ar2-1s.ps}$ $\includegraphics[width=0.7\defpicwidth]{ar2-1p.ps}$ $\includegraphics[width=0.7\defpicwidth]{ar2-2s.ps}$ $\includegraphics[width=0.7\defpicwidth]{ar2-2p.ps}$ $\includegraphics[width=0.7\defpicwidth]{ar2-3s.ps}$ $\includegraphics[width=0.7\defpicwidth]{ar2-3p.ps}$

Some examples of correlograms for more complex $AR$ models, such as the $AR(2)$ , can be seen in figure 4.9. They are very similar to the $AR(1)$ patterns when the processes have real roots, but take a very different shape when the roots are complex (see the first pair of graphics of figure 4.9).
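For an $AR(2)$ process the ACF can be generated from the Yule-Walker recursion $\rho_1 = \phi_1/(1-\phi_2)$ , $\rho_j = \phi_1\rho_{j-1} + \phi_2\rho_{j-2}$ for $j \ge 2$ , and the nature of the roots of $\Phi(L) = 1 - \phi_1 L - \phi_2 L^2 = 0$ tells which pattern to expect; stationarity requires both roots to lie outside the unit circle. The sketch below (hypothetical helpers) uses Series 1 of Figure 4.2, $(\phi_1,\phi_2) = (1.4, -0.8)$ , which has complex roots, and an illustrative real-root case $(0.5, 0.3)$ .

```python
import cmath


def ar2_acf(phi1, phi2, max_lag):
    """Hypothetical helper: theoretical ACF of a stationary AR(2)
    via the Yule-Walker recursion."""
    rho = [1.0, phi1 / (1.0 - phi2)]
    for j in range(2, max_lag + 1):
        rho.append(phi1 * rho[j - 1] + phi2 * rho[j - 2])
    return rho[: max_lag + 1]


def ar2_roots(phi1, phi2):
    """Hypothetical helper: roots of Phi(L) = 1 - phi1 L - phi2 L^2 = 0,
    assuming phi2 != 0."""
    disc = cmath.sqrt(phi1 ** 2 + 4.0 * phi2)
    return [(-phi1 + disc) / (2.0 * phi2), (-phi1 - disc) / (2.0 * phi2)]


for phi1, phi2 in [(1.4, -0.8), (0.5, 0.3)]:
    roots = ar2_roots(phi1, phi2)
    kind = "complex" if any(abs(z.imag) > 1e-12 for z in roots) else "real"
    stationary = all(abs(z) > 1.0 for z in roots)
    print(f"phi=({phi1}, {phi2}): {kind} roots, stationary={stationary}")
    print("  ACF:", [round(r, 3) for r in ar2_acf(phi1, phi2, 8)])
```

With complex roots the ACF changes sign in a damped wave, whereas with these real, positive-coefficient parameters it stays positive and decays smoothly.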

4.2.4 Autoregressive Moving Average Model

The general (finite-order) autoregressive moving average model of orders $p$ and $q$ , $ARMA(p,q)$ , is:

$\displaystyle y_t = \delta + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}$
$\displaystyle \Phi(L)\, y_t = \delta + \Theta(L)\, \varepsilon_t, \qquad \varepsilon_t \,\sim \, \hbox{WN}(0, \sigma^2_{\varepsilon})$

Its simplest member is the $ARMA(1,1)$ process:

$\displaystyle y_t = \delta + \phi y_{t-1} + \theta \varepsilon_{t-1} + \varepsilon_t$
$\displaystyle (1 - \phi L)\, y_t = \delta + (1 + \theta L)\, \varepsilon_t, \qquad \varepsilon_t \sim \, WN(0, \sigma^2_{\varepsilon})$

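This model is stationary if $\vert\phi\vert <1$ and invertible if $\vert\theta\vert<1$ . The mean of the stationary $ARMA(1,1)$ process can be derived as follows:

$\displaystyle \textrm{E}\, y_t = \delta + \phi\,\textrm{E}\,y_{t-1} + \theta\,\textrm{E}\,\varepsilon_{t-1} + \textrm{E}\,\varepsilon_t$

so that, since stationarity implies $\textrm{E}\, y_t = \textrm{E}\, y_{t-1} = \mu$ and the innovations have zero mean, $\mu = \delta (1-\phi)^{-1}$ .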
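Equivalently, the mean can be found by iterating the expectation of the recursion, $m \leftarrow \delta + \phi\, m$ , which converges to $\delta(1-\phi)^{-1}$ only when $\vert\phi\vert < 1$ ; the MA parameter $\theta$ plays no role because $\textrm{E}\,\varepsilon_{t-1} = 0$ . A minimal sketch with the values of Series 2 ( $\delta = 0.9$ , $\phi = 0.7$ ), whose mean is $0.9/0.3 = 3$ (the function name is hypothetical):

```python
def arma11_mean_by_iteration(delta, phi, n_iter=200):
    """Hypothetical helper: iterate E y_t = delta + phi * E y_{t-1};
    the epsilon terms drop out because they have zero mean."""
    m = 0.0
    for _ in range(n_iter):
        m = delta + phi * m
    return m


print(arma11_mean_by_iteration(0.9, 0.7))              # close to 0.9 / (1 - 0.7) = 3
print(arma11_mean_by_iteration(0.9, 1.1, n_iter=50))   # grows with n_iter: |phi| > 1
```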

**Figure 4.10:** Population correlograms for $ARMA(1,1)$ processes
$\includegraphics[width=0.7\defpicwidth]{arma11-1s.ps}$ $\includegraphics[width=0.7\defpicwidth]{arma11-1p.ps}$ $\includegraphics[width=0.7\defpicwidth]{arma11-2s.ps}$ $\includegraphics[width=0.7\defpicwidth]{arma11-2p.ps}$

The autocovariance function of a stationary $ARMA(1,1)$ process (assuming $\delta = 0$ ) is as follows:

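$\begin{displaymath} \begin{array}{rclcl} \gamma_0 &=& \textrm{E}(\phi y_{t-1} + \varepsilon_t + \theta \varepsilon_{t-1})^2 &=& \sigma^2_{\varepsilon}\, \displaystyle\frac{1 + \theta^2 + 2\theta\phi}{1-\phi^2}\\ \gamma_1 &=& \textrm{E}\left\{(\phi y_{t-1} + \varepsilon_t + \theta \varepsilon_{t-1})\, y_{t-1}\right\} &=& \phi \gamma_0 + \theta \sigma^2_{\varepsilon}\\ \gamma_j &=& \textrm{E}\left\{(\phi y_{t-1} + \varepsilon_t + \theta \varepsilon_{t-1})\, y_{t-j}\right\} & =&\phi \gamma_{j-1} \qquad \forall j>1 \end{array}\end{displaymath}$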

Dividing by $\gamma_0$ , the autocorrelation function is:

$\displaystyle \rho_j = \left\{\begin{array}{lcr} \phi + \displaystyle\frac{\theta (1-\phi^2)}{1 + \theta^2 + 2\theta \phi} && j=1 \\ \phi \rho_{j-1} && j>1 \end{array}\right.$
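These expressions can be coded directly. The sketch below (hypothetical function name) evaluates the $ARMA(1,1)$ ACF and can be checked against two special cases of the text: with $\theta = 0$ it reduces to the $AR(1)$ result $\rho_j = \phi^j$ , and with $\phi = 0$ it reduces to the $MA(1)$ result $\rho_1 = \theta/(1+\theta^2)$ , $\rho_j = 0$ for $j > 1$ . After the first lag the ACF decays geometrically at rate $\phi$ , as in an $AR(1)$ .

```python
def arma11_acf(phi, theta, max_lag):
    """Hypothetical helper: theoretical ACF of a stationary ARMA(1,1),
    (1 - phi L) y_t = (1 + theta L) e_t."""
    rho1 = phi + theta * (1 - phi ** 2) / (1 + theta ** 2 + 2 * theta * phi)
    rho = [1.0, rho1]
    for _ in range(2, max_lag + 1):
        rho.append(phi * rho[-1])       # rho_j = phi * rho_{j-1}, j > 1
    return rho[: max_lag + 1]


print([round(r, 3) for r in arma11_acf(0.7, 0.5, 6)])   # Series 2 parameters
```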