17.2 Nonparametric ARMA Estimates

GARCH processes are closely related to ARMA processes. If we square a GARCH(1,1) process $ \{ \varepsilon _t\} $ given by (17.1), we obtain an ARMA(1,1) process

$\displaystyle \varepsilon _t^ 2 = \omega + (\alpha + \beta) \, \varepsilon _{t-1}^ 2 - \beta \, \zeta
_{t-1} + \zeta _t, $

where $ \zeta_t = \sigma_t^ 2 (Z_t^2 - 1)$ is white noise, i.e. a sequence of pairwise uncorrelated random variables with mean 0. As an intermediate step towards GARCH processes, we therefore study nonparametric estimation for ARMA models, which is closely related to the errors-in-variables regression of Fan and Truong (1993). A linear ARMA(1,1) model with non-vanishing mean $ \omega$ is given by
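The ARMA representation can be verified numerically. The following sketch (assuming standard normal $Z_t$ and initialization at the stationary variance, with illustrative parameter values) simulates a GARCH(1,1) path and checks the recursion term by term:

```python
import numpy as np

rng = np.random.default_rng(0)
omega, alpha, beta = 0.1, 0.1, 0.8      # illustrative GARCH(1,1) parameters
n = 1000

# simulate eps_t = sigma_t Z_t with sigma_t^2 = omega + alpha eps_{t-1}^2 + beta sigma_{t-1}^2
Z = rng.standard_normal(n)
sig2 = np.empty(n)
eps = np.empty(n)
sig2[0] = omega / (1 - alpha - beta)    # start at the stationary variance
eps[0] = np.sqrt(sig2[0]) * Z[0]
for t in range(1, n):
    sig2[t] = omega + alpha * eps[t - 1] ** 2 + beta * sig2[t - 1]
    eps[t] = np.sqrt(sig2[t]) * Z[t]

# zeta_t = sigma_t^2 (Z_t^2 - 1) = eps_t^2 - sigma_t^2
zeta = eps**2 - sig2

# the ARMA(1,1) recursion for eps_t^2 holds as an exact identity
lhs = eps[1:] ** 2
rhs = omega + (alpha + beta) * eps[:-1] ** 2 - beta * zeta[:-1] + zeta[1:]
print(np.max(np.abs(lhs - rhs)))        # numerically zero
```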

$\displaystyle X_{t+1} = \omega + a\ X_t + b\ e_t + e_{t+1} $

with zero-mean white noise $ e_t$. We consider the nonparametric generalization of this model

$\displaystyle X_{t+1} = f(X_t, e_t) + e_{t+1}$ (17.3)

for some unknown function $ f(x,u)$ which is monotone in the second argument $ u$. Assume we have a sample $ X_1, \ldots, X_{N+1}$ observed from (17.3). If $ f$ does not depend on the second argument, (17.3) reduces to a nonparametric autoregression of order 1

$\displaystyle X_{t+1} = f(X_t) + e_{t+1} $

and the autoregression function $ f(x)$ may be estimated by common kernel estimates or local polynomials. There exists extensive literature about that type of estimation problem, and we refer to the review paper of Härdle, Lütkepohl and Chen (1997). In the general case of (17.3) we again have the problem of estimating a function of (partially) non-observable variables. As $ f$ depends also on the observable time series $ X_t$, the basic idea of constructing a nonparametric estimate of $ f(x,u)$ is to combine a common kernel smoothing in the first variable $ x$ with a deconvolution kernel smoothing in the second variable $ u.$ To define the estimate we have to introduce some notation and assumptions.
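For the pure autoregression case, a common kernel estimate is the Nadaraya-Watson estimator. A minimal sketch, with a toy linear $f$ and an Epanechnikov kernel (both assumptions made only for this illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
f = lambda x: 0.6 * x                        # toy autoregression function (assumption)
X = np.empty(n + 1)
X[0] = 0.0
for t in range(n):
    X[t + 1] = f(X[t]) + rng.standard_normal()

def nw_estimate(x, X0, X1, b):
    """Nadaraya-Watson estimate of E[X_{t+1} | X_t = x] with an Epanechnikov kernel."""
    u = (x - X0) / b
    w = np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)
    return np.sum(w * X1) / np.sum(w)

b = 0.5
fhat = nw_estimate(0.0, X[:-1], X[1:], b)    # estimate of f at x = 0, true value 0
print(fhat)
```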

We assume that the innovations $ e_t$ have a known probability density $ p_e$ with distribution function $ P_e(v) = \int_{-\infty}^v p_e(u)\, du$ and with Fourier transform $ \phi_e (\omega) \not= 0 $ for all $ \omega$ and

$\displaystyle \vert\phi_e (\omega) \vert \ge c \cdot \vert\omega \vert ^ {\beta _0} \exp (-\vert\omega \vert^ \beta /\gamma ) \quad \textrm{for} \quad \vert\omega\vert \longrightarrow \infty $

for some constants $ c, \beta, \gamma > 0$ and some $ \beta_0.$ The nonlinear ARMA process (17.3) has to be stationary and strongly mixing with exponentially decaying mixing coefficients. Let $ p(x)$ denote the stationary marginal density of $ X_t.$
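Standard normal innovations satisfy this condition: $\phi_e(\omega) = \exp(-\omega^2/2)$, so the lower bound holds with $c = 1$, $\beta_0 = 0$, $\beta = 2$ and $\gamma = 2$ (these constants are our choice for this example). A quick numerical check of the Fourier transform:

```python
import numpy as np

# Fourier transform of the standard normal density, computed by a Riemann sum,
# compared with the closed form exp(-omega^2/2)
u = np.linspace(-10, 10, 20001)
du = u[1] - u[0]
p_e = np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)

omegas = np.array([0.5, 1.0, 2.0, 4.0])
phi = np.array([np.abs(np.sum(np.exp(1j * w * u) * p_e) * du) for w in omegas])
print(phi)          # matches exp(-omega^2/2) at each omega
```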

The smoothing kernel $ K^ x $ in $ x$-direction is a common kernel function with compact support $ [-1, +1]$ satisfying $ 0 \le K^ x (u) \le K^ x (0)$ for all $ u$. The kernel $ K$ which is used in the deconvolution part has a Fourier transform $ \phi _K (\omega)$ which is symmetric around 0, has compact support $ [-1, +1]$ and satisfies some smoothness conditions (Holzberger, 2001). We have chosen a kernel with the following Fourier transform:

\begin{displaymath}\begin{array}{rcll}
\phi _K (u) & = & 1 - u^2 & \textrm{for} \quad \vert u\vert \le 0.5, \\
\phi _K (u) & = & 8\vert u\vert^ 3 - 17 u^2 + 10 \vert u\vert - 1 & \textrm{for} \quad 0.5 \le \vert u\vert \le 1.
\end{array}\end{displaymath}

For convenience, we choose the smoothing kernel $ K^ x $ proportional to that function: $ K^x(u) \propto \phi _K (u)$. The kernel $ K^ x $ is hence an Epanechnikov kernel with modified boundaries.

Let $ b = C/N^ {1/5}$ be the bandwidth for smoothing in $ x$-direction, and let $ h = A/\log (N)$ be the smoothing parameter for deconvolution in $ u$-direction where $ A > \pi/2$ and $ C>0$ are some constants. Then,

$\displaystyle \widehat{p}_b (x) = \frac{1}{(N+1)b} \, \sum^ {N+1}_{t=1} K^ x \left(\frac{x-X_t}{b}\right) $

is a common Rosenblatt-Parzen density estimate for the stationary density $ p(x).$
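A minimal sketch of such a density estimate, using a plain Epanechnikov kernel and a rule-of-thumb constant $C$ (both assumptions for this illustration):

```python
import numpy as np

def parzen_density(x, X, b):
    """Rosenblatt-Parzen estimate of p(x) with an Epanechnikov kernel K^x."""
    u = (x - X[:, None]) / b                  # shape (len(X), len(x))
    Kx = np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)
    return Kx.mean(axis=0) / b

rng = np.random.default_rng(2)
X = rng.standard_normal(5000)
b = 1.06 * X.std() * len(X) ** (-1 / 5)       # rule-of-thumb bandwidth of the form C/N^{1/5}
dens0 = parzen_density(np.array([0.0]), X, b)[0]
print(dens0)            # close to the N(0,1) density 1/sqrt(2 pi) ~ 0.399 at x = 0
```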

Let $ q(u)$ denote the stationary density of the random variable $ f(X_t,
e_t),$ and let $ q(u\vert x)$ be its conditional density given $ X_t = x.$ An estimate of the latter is given by

$\displaystyle \widehat{q}_{b,h} (u\vert x) = \frac{1}{Nhb} \, \sum^ N_{t=1} K^ h \left(\frac{u-X_{t+1}}{h}\right)\, K^ x \left( \frac{x-X_t}{b} \right)\ /\ \widehat{p}_b (x)$ (17.4)

where the deconvolution kernel $ K^h$ is

$\displaystyle K^h (u) = \frac{1}{2\pi} \int^ \infty_{-\infty} e^ {-i\omega u}
\frac{\phi_K (\omega)}{\phi_e (\omega / h)} \,d\omega \, . $
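The deconvolution kernel can be evaluated by numerical integration. The sketch below assumes standard normal innovations, $\phi_e(\omega) = \exp(-\omega^2/2)$, and uses the simple transform $\phi_K(\omega) = 1 - \omega^2$ on $[-1,1]$ purely for illustration (the kernel actually used in the text has a smoother Fourier transform):

```python
import numpy as np

def K_h(u, h, m=4001):
    """Deconvolution kernel K^h(u) for Gaussian innovations, via a Riemann sum."""
    w = np.linspace(-1.0, 1.0, m)             # support of phi_K
    phi_K = 1.0 - w**2                        # illustrative choice, not the text's kernel
    phi_e = np.exp(-(w / h) ** 2 / 2.0)       # Fourier transform of N(0,1), at w/h
    integrand = np.exp(-1j * w * u) * phi_K / phi_e
    return float(np.real(np.sum(integrand)) * (w[1] - w[0])) / (2.0 * np.pi)

h = 1.0
print(K_h(0.0, h))      # positive, since phi_K / phi_e >= 0 here
```

Note that dividing by $\phi_e(\omega/h)$ amplifies high frequencies; the compact support of $\phi_K$ is what keeps the integral finite.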

In (17.4) we use a deconvolution smoothing in the direction of the second argument of $ f(x,u)$ using only pairs of observations $ (X_t, X_{t+1})$ for which $ \vert x - X_t \vert \le b,$ i.e. $ X_t \approx x. $ By integration, we get the conditional distribution function of $ f(X_t, e_t)$ given $ X_t = x $

$\displaystyle Q(v\vert x) = \textrm{P}(f(x,e_t) \le v\vert X_t = x)
= \int^ v_{-\infty} q(u\vert x) \,du $

and its estimate

$\displaystyle \widehat{Q}_{b,h} (v\vert x) = \int^ v_{-a_N} \widehat{q}_{b,h} (u\vert x) du \bigg/
\int^ {a_N}_{-a_N} \widehat{q}_{b,h} (u\vert x) \,du
$

for some $ a_N \sim N^ {1/6} $ as $ N\rightarrow \infty.$ For technical reasons we have to cut off the density estimate in regions where it is still unreliable for a given $ N$. The particular choice of the denominator guarantees that $ \widehat{Q}_{b,h} (a_N\vert x) = 1$, as is required for a distribution function.
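The truncation and normalization step can be sketched on a grid; the density below is a toy stand-in for $\widehat{q}_{b,h}(\cdot\vert x)$:

```python
import numpy as np

# integrate a density estimate only over [-a_N, a_N] and divide by its total
# mass there, so that the resulting Q-hat reaches exactly 1 at a_N
a_N = 3.0
u = np.linspace(-a_N, a_N, 601)
q_hat = np.exp(-u**2 / 2)            # toy (unnormalized) conditional density estimate
Q_hat = np.cumsum(q_hat)
Q_hat = Q_hat / Q_hat[-1]            # denominator = integral over [-a_N, a_N]
print(Q_hat[-1])                     # exactly 1 by construction
```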

To estimate the unconditional density $ q(u)$ of $ f(X_t, e_t) = X_{t+1} - e_{t+1},$ we use a standard deconvolution density estimate with smoothing parameter $ h^* = A^*/\log (N)$

$\displaystyle \widehat{q}_{h^*} (u) = \frac{1}{Nh^*} \, \sum^ N_{t=1} K^ {h^*} \left( \frac{u-X_t}{h^*} \right). $

Let $ p_e (u\vert x)$ be the conditional density of $ e_t$ given $ X_t = x,$ and let $ P_e (v\vert x) = \int^ v_{-\infty} p_e(u\vert x) \,du $ be the corresponding conditional distribution function. An estimate of it is given as

$\displaystyle \widehat{P}_{e,h^*} (v\vert x) = \int^ v_{-a_N} \widehat{q}_{h^*} (x-u) \, p_e (u) \,du \bigg/
\int^ {a_N}_{-a_N} \widehat{q}_{h^*} (x-u) \, p_e (u) \,du $

where again we truncate at $ a_N \sim N^ {1/6}.$

To obtain the ARMA function $ f$, we can now compare $ Q(v\vert x)$ and $ P_e (v\vert x)$. In practice, this means relating $ \widehat{Q}_{b,h} (v\vert x)$ and $ \widehat{P}_{e,h^*} (v\vert x)$. The nonparametric estimate of the ARMA function $ f(x,v)$, depending on the smoothing parameters $ b, h$ and $ h^*$, is hence given by

$\displaystyle \widehat{f} _{b,h,h^*} (x, v) = \widehat{Q}_{b,h}^ {-1} (\widehat{P}_{e,h^*} (v\vert x) \, \vert x) $

if $ f(x,v)$ is increasing in the second argument, and

$\displaystyle \widehat{f} _{b,h,h^*} (x, v) = \widehat{Q}_{b,h}^ {-1} (1 - \widehat{P}_{e,h^*} (v\vert x) \, \vert x) $

if $ f(x,v)$ is a decreasing function of $ v$ for any $ x$. Here $ \widehat{Q}_{b,h}^ {-1} (\cdot \vert x)$ denotes the inverse of the function $ \widehat{Q}_{b,h} (\cdot \vert x)$ for fixed $ x$. Holzberger (2001) has shown that $ \widehat{f}_{b,h,h^*} (x,v)$ is a consistent estimate of $ f(x,v)$ under suitable assumptions and has given upper bounds on the rates of bias and variance of the estimate. We remark that the monotonicity assumption on $ f$ is not a strong restriction. In the application to GARCH processes which we have in mind, it seems intuitively reasonable that today's volatility is an increasing function of yesterday's volatility, which translates into an ARMA function $ f$ that is decreasing in the second argument.
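The inversion step can be illustrated with known distributions in place of the estimates. With $f(x,e) = 0.3 + 0.6x + 1.6e$ increasing in $e$ and standard normal innovations (a toy setting chosen for this sketch), $Q(\cdot\vert x)$ is the $N(0.3 + 0.6x,\, 1.6^2)$ distribution function, and $Q^{-1}(P_e(v)\vert x)$ recovers $f(x,v)$ exactly:

```python
import numpy as np
from math import erf, sqrt

# standard normal distribution function, vectorized for grid evaluation
Phi = np.vectorize(lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0))))

x, v = 1.0, 0.5
grid = np.linspace(-10.0, 10.0, 200001)
Q = Phi((grid - (0.3 + 0.6 * x)) / 1.6)      # conditional cdf of f(x, e_t)
f_hat = np.interp(Phi(v), Q, grid)           # monotone inversion Q^{-1}(P_e(v))
print(float(f_hat))                          # close to 0.3 + 0.6*1.0 + 1.6*0.5 = 1.7
```

In practice the same `np.interp` trick inverts the estimated, grid-valued $\widehat{Q}_{b,h}(\cdot\vert x)$, since it is monotone by construction.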

Let us illustrate the steps for estimating a nonparametric ARMA process. First we generate time series data and plot $ X_{t+1}$ versus $ X_t$.

  library("times")
  n=1000
  x=genarma(0.7,0.7,normal(n))
  XFGnpg01.xpl

The result is shown in Figure 17.1. The scatterplot in the right panel of Figure 17.1 defines the region where we can estimate the function $ f(x,v)$.

Figure 17.1: ARMA(1,1) process.
\includegraphics[width=0.7\defpicwidth]{XFGnpg01a.ps}\includegraphics[width=0.7\defpicwidth]{XFGnpg01b.ps}

To compare the deconvolution density estimate with the density of $ f(X_t, e_t)$, we now use our own routine (myarma) for generating ARMA(1,1) data from a known function (f):

  proc(f)=f(x,e,c)
    f=c[1]+c[2]*x+c[3]*e 
  endp

  proc(x,f)=myarma(n,c)
    x=matrix(n+1)-1
    f=x
    e=normal(n+1)
    t=1
    while (t<n+1)
      t=t+1
      f[t]=f(x[t-1],e[t-1],c)
      x[t]=f[t]+e[t]
    endo
    x=x[2:(n+1)]
    f=f[2:(n+1)]
  endp

  n=1000
  {x,f}=myarma(n,0|0.7|0.7)

  h=0.4
  library("smoother")
  dh=dcdenest(x,h)        // deconvolution estimate
  fh=denest(f,3*h)        // kernel estimate
  XFGnpg02.xpl
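For readers without XploRe, a Python counterpart of the myarma routine (assuming standard normal innovations, as above) looks like this; it returns both the series and the values $f(X_t, e_t)$, which are unobservable in practice but useful for checking the estimator:

```python
import numpy as np

def myarma_py(n, c, rng):
    """Simulate X_{t+1} = f(X_t, e_t) + e_{t+1} with linear f given by c."""
    f = lambda x, e: c[0] + c[1] * x + c[2] * e
    e = rng.standard_normal(n + 1)
    x = np.empty(n + 1)
    fv = np.empty(n + 1)
    x[0] = fv[0] = -1.0                  # same initialization as the XploRe code
    for t in range(1, n + 1):
        fv[t] = f(x[t - 1], e[t - 1])
        x[t] = fv[t] + e[t]
    return x[1:], fv[1:]

x, fv = myarma_py(1000, (0.0, 0.7, 0.7), np.random.default_rng(0))
```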

Figure 17.2: Deconvolution density estimate (solid) and kernel density estimate (dashed) of the known mean function of an ARMA(1,1) process.
\includegraphics[width=1\defpicwidth]{XFGnpg02.ps}

Figure 17.2 shows both density estimates. Note that the smoothing parameter (bandwidth $ h$) is different for both estimates since different kernel functions are used.



f = nparmaest (x {,h {,g {,N {,R } } } } )
estimates a nonparametric ARMA process


The function nparmaest computes the function $ f(x,v)$ for an ARMA process according to the algorithm described above. Let us first consider an ARMA(1,1) with $ f(x,v) = 0.3 + 0.6x + 1.6 v$, i.e.

$\displaystyle X_t = 0.3 + 0.6 X_{t-1} + 1.6e_{t-1}+ e_t.$

Hence, we use myarma with c=0.3|0.6|1.6 and call the estimation routine by
  f=nparmaest(x)
  XFGnpg03.xpl

The optional parameters N and R are set to 50 and 250, respectively. N is the grid size used for $ x$ and $ v$; R is an additional grid size for internal computations. The resulting function is therefore computed on a grid of size N $ \times$ N. For comparison, we also calculate the true function on the same grid. Figure 17.3 shows the resulting graphs. The bandwidth h (corresponding to $ h^*$) for the one-dimensional deconvolution kernel estimator $ \widehat{q}$ and the bandwidths g for the two-dimensional estimator (corresponding to $ h$ and $ b$) are chosen according to the rates derived in Holzberger (2001).
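Evaluating the true linear function on such a grid is straightforward; the grid ranges below are assumptions made for this sketch:

```python
import numpy as np

# the true linear ARMA function f(x,v) = 0.3 + 0.6 x + 1.6 v on an N x N grid,
# as used for the comparison in Figure 17.3 (grid ranges are illustrative)
N = 50
xg = np.linspace(-4.0, 4.0, N)
vg = np.linspace(-3.0, 3.0, N)
true_f = 0.3 + 0.6 * xg[:, None] + 1.6 * vg[None, :]
print(true_f.shape)     # (50, 50)
```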

Figure 17.3: Nonparametric estimation of a (linear) ARMA process. True vs. estimated function and data.
\includegraphics[width=1.25\defpicwidth]{XFGnpg03.ps}

As a second example consider an ARMA(1,1) with a truly nonlinear function $ f(x,v) = -2.8 +8 F(6v)$, i.e.

$\displaystyle X_t = -2.8 + 8 F(6\,e_{t-1}) + e_t,$

where $ F$ denotes the sigmoid function $ F(u)=(1+e^{-u})^{-1}$. In contrast to the previous example, this function obviously does not depend on the first argument. The code above has to be modified by using
  proc(f)=f(x,e,c)
    f=c[2]/(1+exp(-c[3]*e))+c[1]
  endp
  c=-2.8|8|6
  XFGnpg04.xpl
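The same nonlinear example in Python (again assuming standard normal innovations):

```python
import numpy as np

# simulate X_t = -2.8 + 8 F(6 e_{t-1}) + e_t with sigmoid F(u) = 1/(1 + exp(-u));
# here f does not depend on the first argument
rng = np.random.default_rng(1)
n = 1000
e = rng.standard_normal(n + 1)
F = lambda u: 1.0 / (1.0 + np.exp(-u))
f = -2.8 + 8.0 * F(6.0 * e[:-1])     # f(X_{t-1}, e_{t-1}), free of X
x = f + e[1:]
# E[F(6 e)] = 0.5 by symmetry, so the mean of X_t is -2.8 + 4 = 1.2
print(x.mean())
```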

The resulting graphs for this nonlinear function are shown in Figure 17.4. The estimated surface obviously varies only in the second dimension and follows the $ s$-shaped underlying true function. However, the sample size used and the internal grid sizes of the estimation procedure allow only for a rather imprecise reconstruction of the tails of the surface.

Figure 17.4: Nonparametric estimation of a (nonlinear) ARMA process. True vs. estimated function and data.
\includegraphics[width=1.25\defpicwidth]{XFGnpg04.ps}