16.2 Nonlinear Autoregressive Models of Higher Order

In Subsection 16.1.3 we briefly discussed diagnostics to check for the correct specification of a time series model. There we found for the lynx data set that the nonlinear autoregressive model of order one (16.2) is of too low an order to capture the linear correlation in the data. For practical flexible time series modelling it is therefore necessary to allow for higher order nonlinear autoregressive models (16.1). Their estimation and the selection of relevant lags will be discussed in this section. To simplify notation, we introduce the vector of lagged variables $ {X}_t = (Y_{t-i_1}, Y_{t-i_2},\ldots,Y_{t-i_m})^T$ such that (16.1) can be written as

$\displaystyle Y_t = f({X}_t) + \sigma({X}_t)\xi_t.$ (16.16)


16.2.1 Estimation of the Conditional Mean


mh = regxestp (x{, h, K, v})
computes the Nadaraya-Watson estimator for multivariate autoregression.
mh = regestp (x{, h, K, d})
Nadaraya-Watson estimator for multivariate regression. The computation uses WARPing.
mh = lregxestp (x{, h, K, v})
estimates a multivariate regression function using local polynomial kernel regression with quartic kernel.
mh = lregestp (x{, h, K, d})
estimates a multivariate regression function using local polynomial kernel regression. The computation uses WARPing.
{mA, gsqA, denA, err} = fvllc (Xsj, Yorig, h, Xtj, kernreg, lorq, fandg, loo)
estimates a multivariate regression function using local linear regression with Gaussian kernel.
It is not difficult to extend the Nadaraya-Watson estimator (16.4) and the local linear estimator (16.5) to several lags in the conditional mean function $ f(\cdot)$. One then simply uses Taylor expansions of order $ p$ in several variables. In the weighted minimization problem of the local constant estimator (16.3) one has to extend the kernel function $ K_h(\cdot)$ to several lagged variables. The simplest way of doing this is to use a product kernel

$\displaystyle K_{h}({X}_t - {x}) = \prod_{j=1}^m h_j^{-1} K\left(\frac{X_{t,j}-x_j}{h_j}\right)$ (16.17)

where $ {h}=(h_1,h_2,\ldots, h_m)^T$ is a vector of bandwidths, one for each lag or variable. Of course, one may also use the same bandwidth $ {h}=(h,h,\ldots, h)^T$ for all lags, in which case we write $ K_{h}({X}_t - {x})$. Using a scalar bandwidth, (16.3) becomes

$\displaystyle \widehat{c}_{0}=\textrm{arg min}_{\left\{ c_{0}\right\} } \sum_{t=i_m+1}^{T}\left\{ Y_{t}-c_{0}\right\} ^{2}K_{h}({X}_{t}-{ x})$ (16.18)

and the Nadaraya-Watson estimator is given by

$\displaystyle \widehat{f}_1({x},h)=\widehat{c}_0=\frac{\sum_{t=i_m+1}^T K_h({X}_{t}-{ x}) Y_t} {\sum_{t=i_m+1}^T K_h({X}_{t}-{x})}.$ (16.19)

Note that from now on we indicate the Nadaraya-Watson estimator and local linear estimator by the indices $ 1$ and $ 2$, respectively.
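To make the multivariate formulas concrete, the following minimal Python sketch implements the product-kernel Nadaraya-Watson estimator (16.17)-(16.19). It is an illustration only, not the XploRe quantlet; all function names are hypothetical.

  import numpy as np

  def quartic(u):
      # quartic kernel K(u) = 15/16 (1 - u^2)^2 on [-1, 1]
      return np.where(np.abs(u) <= 1, 15.0 / 16.0 * (1 - u**2)**2, 0.0)

  def nw_estimate(X, y, x, h):
      # Nadaraya-Watson estimate fhat_1(x) as in (16.19);
      # X: (n, m) matrix of lagged variables, y: (n,) responses,
      # x: (m,) evaluation point, h: scalar or (m,) bandwidth vector
      h = np.broadcast_to(np.asarray(h, dtype=float), (X.shape[1],))
      w = np.prod(quartic((X - x) / h) / h, axis=1)  # product kernel (16.17)
      return np.sum(w * y) / np.sum(w)

For the lynx example, X would contain the two log-lags in its columns and y the current log-observation.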

The local linear estimator with $ p = 1$ is derived from the weighted minimization

$\displaystyle \{\widehat{c}_{0},\widehat{c}_1\}=\textrm{arg min}_{\left\{ c_{0},{c}_1\right\} } \sum_{t=i_m+1}^{T}\left\{ Y_{t}-c_{0}-{c}_1({X}_{t}-{x})\right\} ^{2}K_{h}({X}_{t}-{ x}).$ (16.20)

Using the notation

$\displaystyle {Z}_2=\left( \begin{array}{ccc} 1 & \cdots & 1 \\ {X}_{i_m+1}-{x} & \cdots & {X}_{T}-{x} \end{array}\right) ^T, \quad {Y} = (Y_{i_m+1},\ldots,Y_T)^T,$

$\displaystyle e=(1,0_{1\times m})^T, \quad {W}=\textrm{diag}\left\{
\frac{1}{T-i_m}K_h({X}_t-{x})\right\}
_{t=i_m+1}^T,
$

the estimate $ \widehat{f}_2({x},h)=\widehat{c}_0$ can be written for any $ {x}\in \mathbb{R}^m$ as

$\displaystyle \widehat{f}_2(x)=e^T\left( {Z}_2^T{W}{Z}_2\right) ^{-1}{Z}_2^T{W}{Y}.$ (16.21)
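A corresponding illustrative sketch of (16.21) solves the weighted least squares problem directly. A product Gaussian kernel is used here (the fvllc quantlet also uses a Gaussian kernel); the factor $ 1/(T-i_m)$ in $ {W}$ cancels in the formula.

  import numpy as np

  def loclin_estimate(X, y, x, h):
      # local linear estimate fhat_2(x) = e^T (Z^T W Z)^{-1} Z^T W Y, (16.21)
      n, m = X.shape
      U = X - x
      h = np.broadcast_to(np.asarray(h, dtype=float), (m,))
      w = np.prod(np.exp(-0.5 * (U / h)**2) / (np.sqrt(2 * np.pi) * h),
                  axis=1)
      Z = np.column_stack([np.ones(n), U])          # design matrix Z_2
      ZtW = Z.T * w                                 # Z_2^T W
      c = np.linalg.solve(ZtW @ Z, ZtW @ y)         # (chat_0, chat_1)
      return c[0]                                   # fhat_2(x) = chat_0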

Under suitable conditions, which are listed in Subsection 16.2.2, the Nadaraya-Watson estimator (16.19) and the local linear estimator (16.21) have an asymptotic normal distribution

$\displaystyle T^{2/(4+m)}\left(\widehat{f}_a({x},h) - f({x})\right) \rightarrow N\left(\frac{\sigma_K^2 c^2}{2}\, r_a({x}),\; c^{-m}\frac{\sigma^2({x})}{\mu({x})} \vert\vert K\vert\vert _2^{2m} \right),\quad a=1,2$ (16.22)

where $ h = c\,T^{-1/(4+m)}$ for some constant $ c>0$ and

$\displaystyle r_1({x}) = \textrm{Tr} \left\{\nabla^2 f({ x})\right\} + 2 \nabla f({x})^T \nabla \mu({x})/\mu({x}), \quad r_2({x}) = \textrm{Tr} \left\{\nabla^2 f({ x})\right\}.$ (16.23)

Thus, the rate of convergence deteriorates with the number of lags. This feature is commonly called the `curse of dimensionality' and often viewed as a substantial drawback of nonparametric methods. One should keep in mind, however, that the $ \sqrt{T}$-rate of parametric models only holds if one estimates a model with an a priori chosen finite number of parameters which may imply a large estimation bias in case of misspecified models. If, however, one allows the number of parameters of parametric models to grow with sample size, $ \sqrt{T}$-convergence may no longer hold.

The quantlets regxestp and lregxestp compute the Nadaraya-Watson estimator (16.19) and the local linear estimator (16.21) for higher order autoregressions. They are called by

  mh = regxestp(x{, h, K, v})
or
mh = lregxestp(x{, h, K, v})
with input variables
x
$ (T-i_m) \times (m+1)$ matrix of the data with the $ m$ lagged variables in the first $ m$ columns and the dependent variable in the last column,
h
scalar, or $ m \times 1$ or $ 1 \times m$ vector, of bandwidths; if not given, 20% of the range of the values in the first column of x is used,
K
string, kernel function on [-1,1] or Gaussian kernel "gau"; if not given, the quartic kernel "qua" is used,
v
$ n \times m$ matrix of values of the independent variables on which to compute the regression; if not given, a grid of length 100 ($ m = 1$), length 30 ($ m=2$) or length 8 ($ m=3$) is used in case of $ m<4$. When $ m\geq 4$, v is set to x.
The output variable is
mh
$ (T-i_m) \times (m+1)$ or $ n \times (m+1)$ matrix where the first $ m$ columns contain the grid or the sorted first $ m$ columns of x, and the $ (m+1)$-st column contains the regression estimate at the values of the first $ m$ columns.
As before, there are also quantlets which apply WARPing. They are called regestp and lregestp, respectively.

Since we found in Subsection 16.1.3 that a NAR(1) model is not sufficient to capture the dynamics of the lynx trappings, the following quantlet computes and plots the autoregression function for lags 1 and 2 for both estimators, using the crude bandwidth of 20% of the data range. Note that you have to click on the graph and rotate it in order to see the regression surface.

  library("smoother")
  library("plot")
  setsize(640,480)

;                       data preparation
  lynx      = read("lynx.dat")
  lynxrows  = rows(lynx)
  lag1      = lynx[1:lynxrows-2]        ; vector of first lag
  lag2      = lynx[2:lynxrows-1]        ; vector of second lag
  y         = lynx[3:lynxrows]          ; vector of dep. var.
  data      = lag1~lag2~y
  data      = log(data)

;                       estimation
  h         = 0.2*(max(data[,1])-min(data[,1])) ; crude bandwidth
  mh        = regxestp(data,h)        ; local constant estimation
  mhlp      = lregxestp(data,h)       ; local linear estimation

;                       graphics
  mhplot    = createdisplay(1,1)
  mh        = setmask(mh,"surface","blue")
  show(mhplot,1,1,data,mh)              ; surface plot
  setgopt(mhplot,1,1,"title",
                           "Nadaraya-Watson estimate -- ROTATE!")
  mhlpplot  = createdisplay(1,1)
  mhlp      = setmask(mhlp,"surface","red")
  show(mhlpplot,1,1,data,mhlp)          ; surface plot
  setgopt(mhlpplot,1,1,"title",
                           "Local linear estimate -- ROTATE!")

XAGflts08.xpl

Figures 16.9 and 16.10 show three-dimensional plots of the observations and the estimated regression function. In Figure 16.9 one can clearly see the problem of boundary effects, i.e. in regions where there are no or only few data points the estimated function values may easily become erratic if the bandwidth is too small. Therefore, a selected bandwidth may be appropriate for regions with plenty of observations while inappropriate elsewhere. As can be seen from Figure 16.10, this boundary problem turns out to be worse for the local linear estimator, where one observes a large outlier for one grid point. Such poor estimates occur if the matrix inversion in (16.21) is imprecise due to a too small bandwidth. One then has to increase the bandwidth. Try the quantlet XAGflts08.xpl after replacing the factor 0.2 in the crude bandwidth choice by 2. Note that increasing the bandwidth makes the estimated regression surfaces of the two estimators look flat and closer to linearity, respectively. This, however, can increase the estimation bias. Therefore, an appropriate bandwidth choice is important. It will be discussed in the next section.

Figure 16.9: Observations and Nadaraya-Watson estimate of NAR(2) regression function for the lynx data
\includegraphics[scale=.55]{lynx_nw_2}

Figure 16.10: Observations and local linear estimate of NAR(2) regression function for the lynx data
\includegraphics[scale=.55]{lynx_loclin_2}


16.2.2 Bandwidth and Lag Selection


{Bhat, Bhatr, hB, Chat, sumwc, hC, hA} = hoptest (xsj, yorig, xtj, estimator, kernel, ntotal, sigy2, perB, lagmax, robden)
quantlet to compute the plug-in bandwidth for multivariate regression or nonlinear autoregressive processes of higher order.
{crmin, crpro} = cafpe (y, truedat, xdataln, xdatadif, xdatastand, lagmax, searchmethod, dmax)
quantlet for local linear lag selection for the conditional mean function based on the Asymptotic Final Prediction Error ($ AFPE_2$) or its corrected version ($ CAFPE_2$) using default settings.
{crmin, crpro, crstore, crstoreadd, hstore, hstoretest} = cafpefull (y, truedat, xresid, trueres, xdataln, xdatadif, xdatastand, lagmax, volat, searchmethod, dmax, selcrit, robden, perA, perB, startval, noutputf, outpath)
quantlet for local linear lag selection for the conditional mean or volatility function based on the asymptotic final prediction error ($ AFPE_2$) or its corrected version ($ CAFPE_2$).
{mA, gsqA, denA, err} = fvllc (Xsj, Yorig, h, Xtj, kernreg, lorq, fandg, loo)
estimates the multivariate regression function and its first or second direct derivatives using local linear or partial local quadratic regression with Gaussian kernel.
The example of the previous section showed that the bandwidth choice is very important for higher order autoregressive models. Equally important is the selection of the relevant lags. Both will be discussed in this section. The presented procedures are based on Tschernig and Yang (2000). We start with the problem of selecting the relevant lags. For this step it is necessary to specify a priori a set of possible lag vectors by choosing the maximal lag $ M$. Denote the full lag vector containing all lags up to $ M$ by $ {X}_{t,M}=(Y_{t-1},Y_{t-2},\ldots,Y_{t-M})^T$. The lag selection task is now to eliminate from the full lag vector $ {X}_{t,M}$ all lags that are redundant. Let us first state the assumptions that Tschernig and Yang (2000) require:
(A1)
For some $ M\geq i_m$ the vector process $ {X}_{t,M}$ is strictly stationary and $ \beta$-mixing with $ \beta(T) \leq k_0 T^{-(2+\delta)/\delta}$ for some $ \delta > 0$, $ k_0>0$.
(A2)
The stationary distribution of the process $ {X}_{t,M}$ has a continuous density $ \mu_M({x}_M)$, $ {x}_M \in \mathbb{R}^M$. Note that $ \mu(\cdot)$ is used for denoting $ \mu_M(\cdot)$ and all of its marginal densities.
(A3)
The function $ f(\cdot)$ is twice continuously differentiable while $ \sigma(\cdot)$ is continuous and positive on the support of $ \mu(\cdot)$.
(A4)
The errors $ \{\xi_t\}_{t\geq i_m}$ have a finite fourth moment $ m_4$.
(A5)
The support of the weight function $ w(\cdot)$ is compact with nonempty interior. The function $ w(\cdot)$ is continuous, nonnegative and $ \mu({x}_M) > 0$ for $ {x}_M$ in the support of $ w(\cdot)$.
(A6)
The kernel function $ K: \mathbb{R}\rightarrow \mathbb{R}$ is a symmetric probability density and the bandwidth $ h$ is a positive number with $ h\rightarrow 0$, $ Th^m\rightarrow \infty$ as $ T\rightarrow \infty$.
For the definition of $ \beta$-mixing see Section 16.1.1 or Doukhan (1994). Conditions (A1) and (A2) can be checked using e.g. Doukhan (1994, Theorem 7 and Remark 7, pp. 102, 103). Further conditions can be found in Lu (1998).

For comparing the quality of competing lag specifications, one needs an appropriate measure of fit, as for example the final prediction error (FPE)

$\displaystyle FPE_a(h,i_1,\ldots,i_m) = E\left[\left(\breve{Y}_t - \widehat{f}_a(\breve{{X}}_t,h)\right)^2w(\breve{{ X}}_{t,M})\right], \quad a=1,2.$ (16.24)

In the definition of the $ FPE(\cdot)$ the process $ \{\breve{Y}_t\}$ is assumed to be independent of the process $ \{Y_t\}$ but to have the same stochastic properties. If we now indicate the vector of lagged values of the data generating process by the superscript $ ^{\ast}$ and assume its largest lag is smaller than the chosen $ M$, we can easily relate the definition of the FPE (16.24) to the MISE

$\displaystyle d_{a,M}(h,i_1,\ldots,i_m) = E\left[ \int \left\{f({x}^*) - \widehat{f}_a({x})\right\}^2 w({x}_M) \mu({x}_M) d{ x}_M\right],$ (16.25)

which here extends (16.11) to functions with several lags. First note that
$\displaystyle FPE_a(h,i_1,\ldots,i_m) = E\left\{ E\left[ \left(\breve{Y}_t - \widehat{f}_a(\breve{{X}}_t,h)\right)^2 w(\breve{{X}}_{t,M})\,\Big\vert\, Y_1,\ldots, Y_T \right]\right\} = E\left\{ \int \left(y-\widehat{f}_a({x})\right)^2 w({x}_M)\, \mu(y,{x}_M)\,dy\, d{x}_M \right\}.$

Using $ \left\{y- \widehat{f}({x})\right\}^2=\left\{y-f({x}^*)+ f({x}^*)-\widehat{f}({x})\right\}^2$ and noting that the cross term vanishes, since $ E[y-f({x}^*)\vert{x}_M]=0$, one obtains the decomposition

$\displaystyle FPE_a(h,i_1,\ldots,i_m) = A + d_{a,M}(h,i_1,\ldots,i_m),$ (16.26)

where
$\displaystyle A = \int \sigma^2({x}^*)w({x}_M)\mu({x}_M)\,d{x}_M$ (16.27)

denotes the mean variance or final prediction error for the true function $ f({x}^*)$. Therefore, it follows from (16.26) that the FPE measures the sum of the mean variance and the MISE.

In the literature, mainly two approaches have been suggested for estimating the unknown $ FPE_a(\cdot)$ or variants thereof, namely cross-validation (Vieu; 1994), (Yao and Tong; 1994) and estimation of an asymptotic expression of the $ FPE_a(\cdot)$ (Auestad and Tjøstheim; 1990), (Tjøstheim and Auestad; 1994), (Tschernig and Yang; 2000). Given Assumptions (A1) to (A6), Tschernig and Yang (2000, Theorem 2.1) showed that for the local constant estimator, $ a=1$, and the local linear estimator, $ a=2$, one has $ FPE_a(h,i_1,\ldots,i_m) = AFPE_a(h,i_1,\ldots,i_m) + o\{h^4+(T-i_m)^{-1}h^{-m}\}$, where

$\displaystyle AFPE_a(h,i_1,\ldots,i_m) = A + b(h)B + c(h)C_a$ (16.28)

denotes the asymptotic final prediction error. The terms $ b(h)B$ and $ c(h)C_a$ denote the expected variance and squared bias of the estimator, respectively, with the constants
$\displaystyle B = \int \sigma^2({x}^*)w({x}_M)\mu({x}_M)/\mu({x})\,d{x}_M,$ (16.29)

$\displaystyle C_a = \int r_a({x})^2 w({x}_M)\mu({x}_M)\,d{x}_M$ (16.30)

and the variable terms

$\displaystyle b(h) = \vert\vert K\vert\vert _2^{2m} (T-i_m)^{-1}h^{-m}, \quad c(h) = \sigma_K^4h^4/4$ (16.31)

with $ \vert\vert K\vert\vert _2^2=\int K(u)^2du$ and $ \sigma_K^2=\int K(u)u^2du$. The sum of the expected variance and squared bias of the estimator just represents the asymptotic mean squared error. Note that if the vector of correct lags $ {X}_t^*$ is included in $ {X}_t$, then $ AFPE_a(h,\cdot)$ tends to $ A$ as both $ b(h)B$ and $ c(h)C_a$ tend to zero.

From (16.28) it is possible to determine the asymptotically optimal bandwidth $ h_{opt}$ by minimizing the asymptotic MISE, i.e. by solving the variance-bias tradeoff between $ b(h)B$ and $ c(h)C_a$. The asymptotically optimal bandwidth is given by

$\displaystyle h_{a,opt} = \left\{m\vert\vert K\vert\vert _2^{2m}B(T-i_m)^{-1}C_a^{-1}\sigma_K^{-4}\right\}^{1/(m+4)}.$ (16.32)
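To see where (16.32) comes from, set the derivative of the asymptotic mean squared error $ b(h)B + c(h)C_a$ with respect to $ h$ to zero:

$\displaystyle \frac{d}{dh}\left\{ \vert\vert K\vert\vert _2^{2m} B (T-i_m)^{-1} h^{-m} + \frac{\sigma_K^4 h^4}{4}\, C_a \right\} = -m \vert\vert K\vert\vert _2^{2m} B (T-i_m)^{-1} h^{-m-1} + \sigma_K^4 h^3 C_a = 0,$

which gives $ h^{m+4} = m\vert\vert K\vert\vert _2^{2m}B(T-i_m)^{-1}C_a^{-1}\sigma_K^{-4}$ and hence (16.32).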

Note that for a finite asymptotically optimal bandwidth to exist one has to assume that
(A7)
$ C_a$ defined in (16.30) is positive and finite.
This requirement implies that in the case of local linear estimation a finite $ h_{2,opt}$ does not exist for linear processes: since $ r_2(\cdot)=0$ there is no approximation bias, and thus a larger bandwidth has no cost.

In order to obtain the plug-in bandwidth $ \widehat{h}_{a,opt}$ one has to estimate the unknown constants $ B$ and $ C_a$. A local linear estimate of $ B$ (16.29) is obtained from

$\displaystyle \widehat{B}_2({h}_{B})=T^{-1}\sum_{t=1}^{T}\left\{ Y_{t}- \widehat{f}_2({X}_{t},{h}_{B})\right\} ^{2} w({X}_{t,M})/\widehat{\mu }({X}_{t},{h}_{B}),$

where $ \widehat{\mu }(\cdot )$ is the Gaussian kernel estimator (16.40) of the density $ \mu({x}) $. For estimating $ h_B$ one may use Silverman's (1986) rule-of-thumb bandwidth

$\displaystyle \widehat{h}_{B}= \widehat{\sigma} \left(\frac{4}{m+2}\right)^{1/(m+4)}T^{-1/(m+4)}$ (16.33)

with $ \widehat{\sigma }=\left( \prod_{j=1}^{m}\sqrt{Var({X}_{j})} \right) ^{1/m}$ denoting the geometric mean of the standard deviations of the regressors.

For the local linear estimator (16.21), $ C_2$ (16.30) can be consistently estimated by

$\displaystyle \widehat {C}_2({h}_C)= \frac{1}{T}\sum_{t=i_m+1}^T \left[\sum_{j=1}^m \widehat{f}^{(jj)}({X}_t,{h}_C) \right]^2 w({X}_{t,M}),$ (16.34)

where $ f^{(jj)}(\cdot)$ denotes the second direct derivative of the function $ f(\cdot)$. It can be estimated using the partial local quadratic estimator

$\displaystyle \{\widehat{c}_{0},\widehat{c}_{11},\ldots,\widehat{c}_{1m},\widehat{c}_{21},\ldots,\widehat{c}_{2m}\} = \textrm{arg min}_{\left\{ c_{0},{c}_{11},\ldots,{c}_{1m},{c}_{21},\ldots,{c}_{2m}\right\} } \sum_{t=i_m+1}^{T}\left\{ Y_{t}-c_{0}-c_{11}(X_{t1}-x_1)- \cdots - c_{1m}(X_{tm}-x_m) - c_{21}(X_{t1}-x_1)^2 - \cdots - c_{2m}(X_{tm}-x_m)^2 \right\} ^{2}K_{h}({X}_{t}-{x}).$ (16.35)

The estimates of the direct second derivatives are then given by $ \widehat{f}^{(jj)}({x},h)=2\widehat c_{2j}$, $ j=1,\ldots,m$. Excluding all cross terms has no asymptotic effect while keeping the increase in the `parameters' $ c_{0}, c_{1j}, c_{2j}$, $ j=1,\ldots,m$, linear in the number of lags $ m$. This approach is a simplification of the partial cubic estimator proposed by Yang and Tschernig (1999), who also showed that the rule-of-thumb bandwidth

$\displaystyle \widehat{h}_C=2\widehat{\sigma}\left(\frac{4}{m+4}\right)^{1/(m+6)}T^{-1/(m+6)}$ (16.36)

has the optimal rate. We note that for the estimation of $ C_1$ of the Nadaraya-Watson estimator one additionally has to estimate the derivative of the density, as it occurs in (16.23). Therefore, we exclusively use the local linear estimator (16.21). The direct second derivatives $ f^{(jj)}({x})$ can be estimated with the quantlet tp/cafpe/fvllc; a sketch of the computation follows.
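A minimal Python sketch of the partial local quadratic estimator (16.35), again purely illustrative with hypothetical function names:

  import numpy as np

  def gauss_prod(U, h):
      # product Gaussian kernel weights for the rows of U
      return np.prod(np.exp(-0.5 * (U / h)**2) / (np.sqrt(2 * np.pi) * h),
                     axis=1)

  def second_derivatives(X, y, x, h):
      # partial local quadratic fit (16.35): regressors 1, (X-x), (X-x)^2,
      # no cross terms; returns fhat(x) and fhat^{(jj)}(x) = 2 chat_{2j}
      n, m = X.shape
      U = X - x
      h = np.broadcast_to(np.asarray(h, dtype=float), (m,))
      Z = np.column_stack([np.ones(n), U, U**2])
      ZtW = Z.T * gauss_prod(U, h)
      c = np.linalg.solve(ZtW @ Z, ZtW @ y)
      return c[0], 2.0 * c[m + 1:]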

The plug-in bandwidth $ \widehat{h}_{2,opt}$ is then given by

$\displaystyle \widehat{h}_{2,opt} = \left\{m\vert\vert K\vert\vert _2^{2m}\widehat{B}_2(\widehat{h}_B) (T-i_m)^{-1}\widehat{C}_2(\widehat{h}_C)^{-1}\sigma_K^{-4}\right\}^{1/(m+4)}.$ (16.37)
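Putting the pieces together, here is a compressed, purely illustrative sketch of the plug-in bandwidth computation; the hoptest quantlet additionally implements the robustifications discussed below. It reuses loclin_estimate() and second_derivatives() from the earlier sketches, takes $ w(\cdot)=1$ on the sample, and lets n play the role of $ T-i_m$:

  import numpy as np

  def plugin_bandwidth(X, y):
      # plug-in bandwidth hhat_{2,opt} (16.37) with Gaussian kernel, for
      # which ||K||_2^2 = 1/(2 sqrt(pi)) and sigma_K^2 = 1
      n, m = X.shape
      sig = np.prod(np.std(X, axis=0)) ** (1.0 / m)  # geometric mean of sd's
      hB = sig * (4.0 / (m + 2))**(1.0 / (m + 4)) * n**(-1.0 / (m + 4))
      hC = 2 * sig * (4.0 / (m + 4))**(1.0 / (m + 6)) * n**(-1.0 / (m + 6))
      res2 = np.array([(y[t] - loclin_estimate(X, y, X[t], hB))**2
                       for t in range(n)])
      mu = np.array([np.mean(gauss_prod(X - X[t], hB)) for t in range(n)])
      Bhat = np.mean(res2 / mu)                      # Bhat_2(hB)
      trf = np.array([np.sum(second_derivatives(X, y, X[t], hC)[1])
                      for t in range(n)])
      Chat = np.mean(trf**2)                         # Chat_2(hC), (16.34)
      K2m = (0.5 / np.sqrt(np.pi))**m                # ||K||_2^{2m}
      hopt = (m * K2m * Bhat / (n * Chat))**(1.0 / (m + 4))   # (16.37)
      return hopt, Bhat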

It now turns out that when taking into account the estimation bias of $ A$, the local linear estimator of $ AFPE_2(h,\cdot)$ (16.28) becomes

$\displaystyle AFPE_2=\widehat{A}_2(h_{2,opt})+2K(0)^{m}(T-i_m)^{-1}h_{2,opt}^{-m} \widehat{B}_2(h_{B})$ (16.38)

and the expected squared bias of estimation drops out. In practice, $ h_{2,opt}$ is replaced by the plug-in bandwidth (16.37). Note that one can interpret the second term in (16.38) as a penalty term to punish overfitting or choosing superfluous lags. This penalty term decreases with sample size as $ h_{2,opt}$ is of order $ T^{-1/(m+4)}$. The final prediction error for the true function $ A$ (16.27) is estimated by taking the sample average

$\displaystyle \widehat{A}_2(h)=T^{-1}\sum_{t=1}^{T}\left\{ Y_{t}-\widehat{f}_2({X}_{t},h)\right\} ^{2}w({X}_{t,M})$

of the residuals from the local linear estimator $ \widehat{f}_2({X}_t,h)$. The asymptotic properties of the lag selection method rely on the fact that the argument of $ w(\cdot)$ is the full lag vector $ {X}_{t,M}$.

In order to select the adequate lag vector, one computes (16.38) for all possible lag combinations with $ m\leq M$ and chooses the lag vector with the smallest $ AFPE_2$. Given Assumptions (A1) to (A7) and a further technical condition, Tschernig and Yang (2000, Theorem 3.2) showed that this procedure is weakly consistent, i.e. the probability of choosing the correct lag vector, if it is included in the set of lags considered, approaches one with increasing sample size. This consistency result may look surprising since the linear FPE is known to be inconsistent. However, in the present case the rate of the penalty term in (16.38) depends on the number of lags $ m$. Thus, if one includes $ l$ lags in addition to the $ m^*$ correct ones, the rate of the penalty term becomes slower, which implies that too large models are ruled out asymptotically. Note that this feature is intrinsic to the local estimation approach since the number of lags influences the rate of convergence, see (16.22). We remark that the consistency result breaks down if Assumption (A7) is violated, e.g. if the stochastic process is linear. In this case overfitting (including superfluous lags in addition to the correct ones) is more likely. The breakdown of consistency can be avoided if one uses the Nadaraya-Watson estimator instead of the local linear estimator, since the former is also biased in case of linear processes.

Furthermore, Tschernig and Yang (2000) show that asymptotically it is more likely to overfit than to underfit (miss some correct lags). In order to reduce overfitting and thereby increase correct fitting, they suggest correcting the AFPE and estimating the Corrected Asymptotic FPE

$\displaystyle CAFPE_a=AFPE_a\left\{ 1+m(T-i_m)^{-4/(m+4)}\right\}, \quad a=1,2.$ (16.39)

The correction does not affect consistency under the stated assumptions while additional lags are punished more heavily in finite samples. One chooses the lag vector with the smallest $ CAFPE_a$, $ a=1,2$.
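The following illustrative Python sketch mirrors the full-search selection based on $ CAFPE_2$. It reuses loclin_estimate() and plugin_bandwidth() from the earlier sketches and again takes $ w(\cdot)=1$ and a Gaussian kernel; it is a sketch of the procedure, not the cafpe quantlet.

  import numpy as np
  from itertools import combinations

  def cafpe_select(y, M):
      # full search over all lag subsets with m <= M, minimizing (16.39)
      T = len(y)
      best = (np.var(y[M:]), ())       # m = 0: Ahat is the sample variance
      for m in range(1, M + 1):
          for lags in combinations(range(1, M + 1), m):
              im = max(lags)
              X = np.column_stack([y[im - l:T - l] for l in lags])
              yy = y[im:]
              n = len(yy)
              h, Bhat = plugin_bandwidth(X, yy)
              fit = np.array([loclin_estimate(X, yy, X[t], h)
                              for t in range(n)])
              Ahat = np.mean((yy - fit)**2)            # cf. Ahat_2(h)
              K0m = (1.0 / np.sqrt(2 * np.pi))**m      # K(0)^m, Gaussian
              afpe = Ahat + 2 * K0m * Bhat / (n * h**m)        # (16.38)
              cafpe = afpe * (1 + m * n**(-4.0 / (m + 4)))     # (16.39)
              if cafpe < best[0]:
                  best = (cafpe, lags)
      return best[1], best[0]          # selected lags and their CAFPE_2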

We note that if one allows the maximal lag $ M$ to grow with the sample size, then one faces a twofold nonparametric problem: nonparametric function estimation and nonparametric lag selection.

The nonparametric lag selection criterion $ CAFPE_2$ can be computed using the quantlet cafpe. The quantlet cafpefull also allows one to use $ AFPE_a$. Both are part of the third party quantlib tp/cafpe which contains various quantlets for lag and bandwidth selection for nonlinear autoregressive models (16.16). The quantlet cafpe is called as

  {crmin, crpro} = cafpe(y, truedat, xdataln, xdatadif,
                         xdatastand, lagmax, searchmethod, dmax)
with the input variables:
y
$ T \times 1$ matrix of the observed time series or set to zero if truedat is used,
truedat
character variable that contains path and name of ascii data file if y=0,
xdataln
character variable where "yes" takes natural logs, "no" doesn't,
xdatadif
character variable where the value "yes" takes first differences of data, "no" doesn't,
xdatastand
character variable where "yes" standardizes data, "no" doesn't,
lagmax
scalar variable, largest lag to be considered,
searchmethod
character variable where "full" considers all possible lag combinations,
"directed" does directed search (recommended if lagmax $ >
10$),
dmax
scalar variable with maximum number of possible lags,
and output variables
crmin
(dmax+3)$ \times 1$ vector that stores in the first dmax rows the selected lag vector, in the dmax+1 row the estimated $ CAFPE_2$, in the dmax+2 row $ \widehat A$, and in the dmax+3 row the bias corrected estimate of $ A$, see TY (equation 3.3),
crpro
(dmax+1)$ \times$(dmax+6) matrix that stores for each number of lags $ (0, 1, \dots $ $ , {\tt dmax} )$ in the first dmax columns the selected lag vector, in the dmax+1 column the plug-in bandwidth $ \widehat{h}_{2,opt}$ for estimating the final prediction error for the true function $ A$ and $ CAFPE_2$, in the dmax+2 column the bandwidth $ \widehat{h}_B$ for estimating the constant $ B$, which is used for computing $ CAFPE_2$ and the plug-in bandwidth $ \widehat{h}_{2,opt}$, in the dmax+3 column the bandwidth $ \widehat{h}_C$ for estimating the constant $ C$, which is used for computing the plug-in bandwidth $ \widehat{h}_{2,opt}$, in the dmax+4 column the estimated $ CAFPE_2$, in the dmax+5 column $ \widehat A$, and in the dmax+6 column the bias corrected estimate of $ A$, see TY (equation 3.3).
Some comments may be appropriate. The weight function $ w(\cdot)$ is the indicator function on the range of the observed data. If $ M$ is large or the time series is long, conducting a full search over all possible lag combinations may take an extraordinarily long time. In this case, one should use the directed search suggested by Tjøstheim and Auestad (1994): lags are added as long as they reduce the selection criterion, where one adds the lag from the remaining ones which delivers the largest reduction; a sketch is given below.
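The directed search is easy to sketch (illustrative Python; criterion() stands for any of the criteria above, e.g. a wrapper around the $ CAFPE_2$ computation of the previous sketch that returns the sample variance for the empty lag vector):

  def directed_search(M, criterion):
      # greedily add the remaining lag with the largest criterion reduction;
      # stop as soon as no remaining lag reduces the criterion
      chosen, best = (), criterion(())
      while len(chosen) < M:
          trials = [(criterion(tuple(sorted(chosen + (l,)))), l)
                    for l in range(1, M + 1) if l not in chosen]
          val, lag = min(trials)
          if val >= best:
              break
          chosen, best = tuple(sorted(chosen + (lag,))), val
      return chosen, best

This evaluates at most $ M(M+1)/2$ models instead of the $ 2^M-1$ models of the full search.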

For computing $ CAFPE_2$ TY follow Tjøstheim and Auestad (1994) and implement two additional features for robustification. For estimating $ {\mu}({x},h)$ the kernel estimator

$\displaystyle \widehat{\mu }({x},h)=(T-i_{m}+i_{1})^{-1} \sum_{i=i_{m}+1}^{T+i_{1}}K_{h}({X}_{i}-{x})$ (16.40)

is used, where the vectors $ {X}_{i}$, $ i=T+1,\ldots ,T+i_{1}$, are all available from the observations $ Y_{t}$, $ t=1,\ldots ,T$. For example, $ {X}_{T+i_{1}}$ is given by $ (Y_{T},\ldots ,Y_{T+i_{1}-i_{m}})^{T}$. This robustification is switched off if the sum stops at $ T$. Furthermore, the 5% of the observations whose density values $ \widehat{\mu }(\cdot )$ are lowest are screened off; a sketch of this screening is given below. These features can easily be switched off or modified in the quantlet cafpefull. This quantlet also allows one to select the lags of the conditional standard deviation $ \sigma(\cdot)$ and is therefore discussed in detail in Subsection 16.2.4.
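The screening step can be emulated in the plug-in sketch above by replacing the computation of $ \widehat B_2$ with a screened version (illustrative; per corresponds to the perB parameter of cafpefull described below):

  import numpy as np

  def screened_B(res2, mu, per=0.05):
      # Bhat_2 with the fraction `per` of lowest-density points dropped
      keep = mu >= np.quantile(mu, per)
      return np.mean(res2[keep] / mu[keep])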

If one is only interested in computing the plug-in bandwidth $ \widehat h_{2,opt}$, one can directly use the quantlet hoptest. However, before it can be called, the time series has to be prepared accordingly, so it is easier to run the lag selection, which automatically delivers the plug-in bandwidth for the chosen lag vector as well. For the definition of its variables the reader is referred to the helpfile of hoptest.

We are now ready to run the quantlet cafpe on the lynx data set. The following quantlet conducts a full search among the first six lags

  pathcafpe       = "tp/cafpe/" ; path of CAFPE quantlets
;       load required quantlibs
  library("xplore")
  library("times")
  func(pathcafpe + "cafpeload") ; load  XploRe files of CAFPE
  cafpeload(pathcafpe)

  setenv("outheadline","")   ; no header for each output file
  setenv("outlineno","")        ; no numbering of output lines
;       set parameters
  truedat         = "lynx.dat"  ; name of data file
  y               = 0
  xdataln         = "yes";      ; take logarithms
  xdatadif        = "no";       ; don't take first differences
  xdatastand      = "no";       ; don't standardize data
  lagmax          = 6        ; the largest lag considered is 6
  searchmethod    = "full"   ; consider all possible lag comb.
  dmax            = 6           ; consider at most 6 lags
;       conduct lag selection
  { crmin,crpro } = cafpe(y,truedat,xdataln,xdatadif,xdatastand,
                         lagmax,searchmethod,dmax)
  "selected lag vector,               estimated CAFPE "
  crmin[,1:dmax+1]
  "number of lags, chosen lag vector, estimated CAFPE,
                                             plug-in bandwidth"
  (0:dmax)~crpro[,1:dmax|(dmax+4)|(dmax+1)]
XAGflts09.xpl

A screenshot of the output, which shows the criteria for each number of lags, is contained in Figure 16.11. The selected lags are 1 to 4 with plug-in bandwidth $ \widehat{h}_{2,opt}=0.90975$ and $ CAFPE_2=0.2163$. However, the largest decrease in $ CAFPE_2$ occurs if one allows for two lags instead of one and lag 2 is added. In this case, $ CAFPE_2$ drops from 0.64125 to 0.24936. Therefore, lag 2 seems to capture the autocorrelation in the residuals of the NAR(1) model which was estimated in Subsections 16.1.1 to 16.1.3. For this reason a NAR(2) model could be sufficient for the lynx data. Its graphical representation is discussed in the next section.

Figure 16.11: Results of the lag selection procedure using $ CAFPE_2$ for lynx data
\includegraphics[scale=0.6]{cafpelynx}


16.2.3 Plotting and Diagnostics


{hplugin, hB, hC, xs, resid} = plotloclin (xdata, xresid, xdataln, xdatadif, xdatastand, volat, lags, h, xsconst, gridnum, gridmax, gridmin)
computes a 1- or 2-dimensional plot of the regression function of a nonlinear autoregressive process for a given lag vector on the range of the data; if more than 2 lags are used, only two lags are allowed to vary and the others have to be kept fixed.
Once the relevant lags and an appropriate bandwidth are determined, one would like to have a closer look at the implied conditional mean function as well as to check the residuals for potential model misspecification as discussed in Subsection 16.1.3. The latter may be done by inspecting the autocorrelation function and testing the normality of the residuals. The quantlet plotloclin of the quantlib tp/cafpe allows one to do both. It generates two- or three-dimensional plots of the autoregression function on a grid that covers the range of the data and computes the residuals for the given time series. Both are done either with a bandwidth specified by the user or with the plug-in bandwidth $ \widehat{h}_{2,opt}$, which is automatically computed if required. The quantlet plotloclin also allows one to compute three-dimensional plots of functions with more than two lags by keeping $ m-2$ lags fixed at user-selected values. It is called by
  {hplugin,hB,hC,xs,resid} = plotloclin(xdata,xresid,xdataln,
                       xdatadif,xdatastand,volat,lags,h,xsconst,
                       gridnum,gridmax,gridmin)
with the input variables
xdata
$ T \times 1$ vector of the observed time series
xresid
$ T' \times 1$ vector of residuals or observations for plotting conditional volatility function, if not needed set xresid = 0,
xdataln
character variable, "yes" takes natural logs, "no" doesn't,
xdatadif
character variable, "yes" takes first differences of data, "no" doesn't,
xdatastand
character variable, "yes" standardizes data, "no" doesn't,
volat
character variable, "no" plots conditional mean function, "resid" plots conditional volatility function, the residuals of fitting a conditional mean function have to be contained in xresid,
lags
$ m \times 1$ vector of lags,
h
scalar bandwidth or $ m \times 1$ bandwidth vector; if set to zero, a scalar plug-in bandwidth is computed using hoptest,
xsconst
$ m \times 1$ vector (only needed if $ m>2$) that indicates which lags vary and which are kept fixed: for lags kept fixed, the entry in the corresponding row contains the value at which the lag is fixed; for lags to be varied, the entry in the corresponding row is 1e-100,
gridnum
scalar, number of grid points in one direction,
gridmax
scalar, maximum of grid,
gridmin
scalar, minimum of grid,

and output variables
hplugin
scalar plug-in bandwidth $ \widehat{h}_{2,opt}$ (16.37) or chosen scalar or vector bandwidth,
hB
scalar, rule-of-thumb bandwidth (16.33) for nonparametrically estimating the constant $ B$ in $ CAFPE_2$ and for computing the plug-in bandwidth,
hC
scalar, rule-of-thumb bandwidth (16.36) for nonparametrically estimating the constant $ C$ for computing the plug-in bandwidth,
xs
$ T' \times m$ matrix with lagged values of time series which are used to compute plug-in bandwidth and residuals for potential diagnostics,
resid
$ T' \times 1$ vector with residuals after fitting a local linear regression at xs.

Figure 16.12 shows the plot of the conditional mean function for a NAR(2) model of the lynx data on a grid covering all observations. The autocorrelation function of the residuals is shown in Figure 16.13. These graphs and a plot of the standardized residuals are computed with the following quantlet. It also returns the Jarque-Bera test statistic of 2.31 with a $ p$-value of 0.32.

  pathcafpe   = "tp/cafpe/"   ; path of CAFPE quantlets
;       load required quantlibs
  library("xplore")
  library("times")
  func("jarber")
  func(pathcafpe + "cafpeload"); load XploRe files of CAFPE
  cafpeload(pathcafpe)

  setenv("outheadline","")    ; no header for each output file
  setenv("outlineno","")      ; no numbering of output lines
;       set parameters
  lynx        = read("lynx.dat");
  xresid      = 0
  xdataln     = "yes";        ; take logarithms
  xdatadif    = "no";         ; don't take first differences
  xdatastand  = "no";         ; don't standardize data
  lags        = 1|2     ; lag vector for regression function
  h           = 0
  volat       = "no"          ; plot conditional mean function
  xsconst     = 1e-100|1e-100 ; 1e-100 for the lags which are
                              ; varied; for those kept fixed it
                              ; contains the chosen constant
  gridnum     = 30            ; number of gridpoints in one dir.
  gridmax     = 9             ; maximum of grid
  gridmin     = 4             ; minimum of grid
; compute opt. bandwidth and plot regression fct. for given lags
  { hplugin,hB,hC,xs,resid } = plotloclin(lynx,xresid,xdataln,
                               xdatadif,xdatastand,volat,lags,h,
                               xsconst,gridnum,gridmax,gridmin)
  "plug-in bandwidth" hplugin
;       diagnostics
  acfplot(resid) ; compute and plot acf of residuals
  {jb,probjb,sk,k} = jarber(resid,1)
         ; compute Jarque-Bera test for normality of residuals

XAGflts10.xpl

From inspecting Figure 16.13 one can conclude that a NAR(2) model captures most of the linear correlation structure. However, the autocorrelation at lags 3 and 4 is close to the boundaries of the confidence intervals for white noise, which explains why the CAFPE procedure suggests lags one to four. The regression surface in Figure 16.12 nicely shows the nonlinearity in the conditional mean function, which may be difficult to capture with standard parametric nonlinear models.

Figure 16.12: Plot of the conditional mean function of a NAR(2) model for the logged lynx data
\includegraphics[scale=.55]{plotloclinlynx}

Figure 16.13: Plot of the autocorrelation function of the residuals of a NAR(2) model for the logged lynx data
\includegraphics[scale=.55]{plotloclinlynxacf}


16.2.4 Estimation of the Conditional Volatility

So far we have considered estimation and lag selection for the conditional mean function $ f({x})$. Finally, we turn our attention to modelling the conditional standard deviation function $ \sigma({x})$. The conditional standard deviation plays an important role in financial modelling, e.g. for computing option prices. As an example we consider 300 logged observations dmus58-300 of a 20-minute spaced sample of the Deutschemark/US-Dollar exchange rate. Figures 16.14 and 16.15 display the logged observations and their first differences. The figures are generated with the quantlet

  library("plot")
  library("times")
  setsize(640,480)
  fx        = read("dmus58-300.dat"); read data
  d1        = createdisplay(1,1)
  x1        = #(1:300)~fx
  setmaskl (x1, (1:rows(x1))', 0, 1)
  show(d1,1,1,x1)           ; plot data
  setgopt(d1,1,1,"title",
                    "20 min. spaced sample of DM/US-Dollar rate")
  setgopt(d1,1,1,"xlabel","Periods","ylabel","levels")

  d2        = createdisplay(1,1)
  x2        = #(2:300)~tdiff(fx)
  setmaskl (x2, (1:rows(x2))', 0, 1)
  show(d2,1,1,x2)           ; plot data
  setgopt(d2,1,1,"title","20 min. spaced sample of 
                          DM/US-Dollar rate - first differences")
  setgopt(d2,1,1,"xlabel","Periods","ylabel","first differences")

XAGflts11.xpl

Figure 16.14: Time series of logarithm of 20 minutes spaced sample of DM/US-Dollar rate
\includegraphics[scale=.55]{plotdmus}

Figure 16.15: Time series of 20 minutes spaced sample of exchange rate returns
\includegraphics[scale=.55]{plotdmusdif}

In the following we assume that the conditional mean function $ f(\cdot)$ is known and subtracted from $ Y_t$. Thus, we obtain $ \tilde Y_t = Y_t - f({X}_t)$. After squaring (16.16) and rearranging we have

$\displaystyle \tilde Y_t^2 = \sigma^2({X}_t)+\sigma^2({X}_t)(\xi_t^2 -1).$ (16.41)
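As the next paragraph explains, the second term in (16.41) has expectation zero, so $ \sigma^2(\cdot)$ can be estimated by regressing the squared de-meaned observations on the lagged values. An illustrative sketch, reusing loclin_estimate() from the sketch in Subsection 16.2.1 (the truncation at zero is a pragmatic safeguard, since local linear fits of squares can become negative):

  import numpy as np

  def cond_volatility(X, ytilde, h):
      # local linear fit of squared de-meaned observations, cf. (16.41)
      s2 = np.array([loclin_estimate(X, ytilde**2, X[t], h)
                     for t in range(len(ytilde))])
      return np.sqrt(np.maximum(s2, 0.0))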

Since $ \sigma^2({X}_t)(\xi_t^2 -1)$ has expectation zero, the stochastic process (16.41) can be modelled with the methods described in Subsections 16.2.1 and 16.2.2 by simply replacing the dependent variable $ Y_t$ by its squares. However, we have to remark that the existence of the expectation $ E\left[\left(\tilde Y_t^2-\sigma^2({X}_t)\right)^2\right]$ is a necessary condition for applying $ CAFPE_2$; otherwise, the FPE cannot be finite. We note that if $ f({x})$ has to be estimated, the asymptotic properties of $ CAFPE_2$ are expected to remain the same. Therefore, it may be used in practice after replacing $ \tilde Y_t$ by the residuals $ Y_t-\widehat{f}_2({X}_t)$. This is possible with the quantlet cafpefull, which extends the functionality of the quantlet cafpe and allows the user to change additional tuning parameters. The quantlet cafpefull is called by
{crmin,crpro,crstore,crstoreadd,hstore,hstoretest} =
cafpefull(y,truedat,xresid,trueres,xdataln,xdatadif,xdatastand,
          lagmax,volat,searchmethod,dmax,selcrit,robden,perA,
          perB,startval,noutputf,outpath)
and has input variables
y
$ T \times 1$ vector of univariate time series,
truedat
character variable that contains path and name of ascii data file if y=0,
xresid
$ T' \times 1$ vector of residuals or observations for selecting lags of conditional volatility function, if not needed set xresid = 0,
trueres
character variable that contains path and name of an ascii file with residuals if xresid = 0,
xdataln
character variable, "yes" takes natural logs, "no" doesn't,
xdatadif
character variable, "yes" takes first differences of data, "no" doesn't,
xdatastand
character variable, "yes" standardizes data, "no" doesn't,
lagmax
scalar, largest lag to be considered,
volat
character variable, "no" conducts lag selection for conditional mean function, "resid" conducts lag selection for conditional volatility function, the residuals of fitting a conditional mean function have to be contained in xresid or a file name has to be given in trueres,
searchmethod
character variable for determining search method, "full" conducts full search over all possible input variable combinations, "directed" does directed search,
dmax
scalar, maximal number of lags
selcrit
character variable to select the lag selection criterion, "lqafpe" estimates the asymptotic Final Prediction Error $ AFPE_2$ (16.38) using local linear estimation and the plug-in bandwidth $ \widehat{h}_{2,opt}$ (16.37), "lqcafpe" estimates the corrected asymptotic Final Prediction Error $ CAFPE_2$ (16.39) using local linear estimation and the plug-in bandwidth $ \widehat{h}_{2,opt}$ (16.37),
robden
character variable, "yes" and "no" switch on and off robustification in density estimation (16.40),
perA
scalar, parameter used for screening off a fraction of 0 $ \leq$ perA $ \leq$ 1 observations with the lowest density in computing $ \widehat{A}_2$
perB
scalar, parameter like perA but for screening off a fraction of perB observations with lowest density in computing $ \widehat{B}_2$,
startval
character variable to control the treatment of starting values, "different" uses for each lag vector as few starting values as necessary, "same" uses for each lag vector the same starting value, which is determined by the largest lag used in the lag selection quantlet xorigxe,
noutputf
character variable, name of output file,
outpath
character variable, path for output file.
The output variables are
crmin
vector that stores for all considered lag combinations in the first dmax rows the selected lag vector, in the dmax+1 row the estimated criterion, in the dmax+2 row $ \widehat{A}_2$, in the dmax+3 row the bias corrected estimate of $ A$,
crpro
matrix that stores for each number of lags in the first dmax rows the selected lag vector, in the dmax+1 row the plug-in bandwidth $ \widehat{h}_{2,opt}$ for estimating $ A$ and $ (C)AFPE$, in the dmax+2 row the bandwidth $ \widehat{h}_B$ used for estimating $ B$, in the dmax+3 row the bandwidth $ \widehat{h}_C$ for estimating $ C$, in the dmax+4 row the estimated criterion $ AFPE_2$ or $ CAFPE_2$, in the dmax+5 row $ \widehat{A}_2$, in the dmax+6 row the bias corrected estimate of $ A$,
crstore
matrix that stores the lag vector and criterion value for all lag combinations and bandwidth values considered; in the first dmax rows all considered lag vectors are stored, in the dmax+1 row the estimated criterion for each lag vector is stored,
crstoreadd
matrix that stores those criteria that are evaluated in passing for all lag combinations where all values for one lag combination are stored in one column (see program for details),
hstore
row vector that stores the bandwidths used in computing (C)AFPE for each lag vector
hstoretest
matrix that stores for each lag vector in one column the plug-in bandwidth $ \widehat{h}_{2,opt}$, $ \widehat{h}_B$ and $ \widehat{h}_C$.

The quantlet XAGflts12.xpl (for brevity not shown) conducts a lag selection for the conditional mean function $ f({x})$ and finds lags 1 and 3 with bandwidth $ \widehat{h}_{2,opt}=0.000432$. If you run the quantlet, you will obtain the XploRe warning ``quantlet fvllc: inversion in local linear estimator did not work because probably the bandwidth is too small''. This means that for one of the checked combinations of lags, one of the rule-of-thumb bandwidths or the plug-in bandwidth was too small, so that the matrix $ Z_2^TWZ_2$ in the local linear estimator (16.21) is near singular and the matrix inversion failed. In this case, the relevant bandwidth is doubled (at most 30 times) until the near singularity disappears. Lag selection for the conditional volatility function $ \sigma({x})$ is then done by replacing the observations $ Y_t$ in model (16.41) by the estimated residuals $ Y_t - \widehat{f}({X}_t)$. The computations are carried out with the following quantlet, which also generates a plot of the conditional mean function on the range $ [-0.0015,0.0015]$, displayed in Figure 16.16, and plots the autocorrelation function of the residuals (not shown). The latter plot does not show significant autocorrelation.

  pathcafpe     = "tp/cafpe/"   ; path of CAFPE quantlets

;   load required quantlibs
  library("xplore")
  library("times")
  func("jarber")
  func(pathcafpe + "cafpeload") ;load XploRe files of CAFPE
  cafpeload(pathcafpe)

;   set output format
  setenv("outheadline","")  ; no header for each output file
  setenv("outlineno","")    ; no numbering of output lines

;   load data
  x             = read("dmus58-300.dat")  ; name of data file
  y             = tdiff(x)  ; compute first differences
  xresid        = 0
  truedat       = ""        ; name of potential data file
  trueres       = ""        ; name of potential residuals file
  xdataln       = "no"      ; don't take logarithms
  xdatadif      = "no"      ; don't take first differences
  xdatastand    = "no"      ; don't standardize data
  lagmax        = 6         ; the largest lag considered is 6
  searchmethod  = "full"    ; consider all possible lag comb.
  dmax          = 6         ; consider at most 6 lags
  volat         = "no"      ; plot cond. mean function
  selcrit       = "lqcafpe" ; use CAFPE with plug-in bandwidth
  robden        = "yes"     ; robustify density estimation
  perA          = 0
  perB          = 0.05      ; screen off data with lowest density
  startval      = "different"
  noutputf      = ""        ; name of output file
  outpath       = "test"    ; path for output file

  lags          = 1|3       ; lag vector for regression function
  h             = 0
  xsconst       = 1e-100|1e-100 ; 1e-100 for the lags which are
                            ; varied for those kept fixed it
                            ; includes the chosen constant
  gridnum       = 30     ; number of gridpoints in one direction
  gridmax       = 0.0015    ; maximum of grid
  gridmin       = -0.0015   ; minimum of grid

; compute optimal bandwidth and plot cond. mean for given lags
 { hplugin,hB,hC,xs,resid } = plotloclin(y,xresid,xdataln,
               xdatadif,xdatastand,volat,lags,h,xsconst,gridnum,
                                                gridmax,gridmin)
  "plug-in bandwidth for conditional mean" hplugin

;   diagnostics
  acfplot(resid); compute and plot acf of residuals
  {jb,probjb,sk,k} = jarber(resid,1)
           ; compute Jarque-Bera test for normality of residuals

;   conduct lag selection for cond. standard deviation
  xresid        = resid
  volat         = "resid" ; conduct lat selection for cond. vol.
  {crmin,crpro,crstore,crstoreadd,hstore,hstoretest}
                = cafpefull(y,truedat,xresid,trueres,xdataln,
                            xdatadif,xdatastand,lagmax,volat,
                            searchmethod,dmax,selcrit,robden,
                            perA,perB,startval,noutputf,outpath)
  "Lag selection for cond. standard deviation using residuals"
  "selected lag vector,               estimated CAFPE "
  crmin[,1:dmax+1]
  "number of lags, chosen lag vector,  estimated CAFPE,
                                              plug-in bandwidth"
  (0:dmax)~crpro[,1:dmax|(dmax+4)|(dmax+1)]
XAGflts13.xpl

For the conditional standard deviation one obtains lags 2 and 6 with bandwidth $ \widehat{h}_{2,opt}=0.000456$. Figures 16.17, 16.18 and 16.19 display the plot of the estimated conditional standard deviation $ \widehat{\sigma}_2({x})$, of the standardized residuals of the modified model (16.41) and of their autocorrelation. The plots are generated with the following quantlet

  pathcafpe = "tp/cafpe/"   ; path of CAFPE quantlets

;   load required quantlets
  library("xplore")
  library("times")
  func("jarber")
  func(pathcafpe + "cafpeload"); load XploRe files of CAFPE
  cafpeload(pathcafpe)

  setenv("outheadline","")  ; no header for each output file
  setenv("outlineno","")    ; no numbering of output lines

;   set parameters
  x         = read("dmus58-300.dat");
  y         = tdiff(x)
  xresid    = 0
  xdataln   = "no"      ; don't take logarithms
  xdatadif  = "no"      ; don't take first differences
  xdatastand= "no"      ; don't standardize data
  volat     = "no"      ; compute cond. standard deviation
  lags      = 1|3       ; lag vector for regression function
  h         = 0         ; compute plug-in bandwidths
  xsconst   = 1e-100|1e-100 
                        ; 1e-100 for the lags which are varied;
                        ; for those kept fixed it contains the
                        ; chosen constant
  gridnum   = 30        ; number of gridpoints in one direction
  gridmax   = 0.0015    ; maximum of grid
  gridmin   = -0.0015   ; minimum of grid

; compute optimal bandwidth and plot cond. mean for given lags
  { hplugin,hB,hC,xs,resid } = plotloclin(y,xresid,xdataln,
            xdatadif,xdatastand,volat,lags,h,xsconst,gridnum,
                                             gridmax,gridmin)
  "plug-in bandwidth for mean" hplugin

; compute plug-in bandwidth and
; plot cond. standard deviation for given lags
  lags      = 2|6       ; lags for cond. volatility
  xresid    = resid
  volat     = "resid"
  gridmax   = 0.0008     ; maximum of grid
  gridmin   = -0.0008    ; minimum of grid

  { hplugin,hB,hC,xs,resid } = plotloclin(y,xresid,xdataln,
            xdatadif,xdatastand,volat,lags,h,xsconst,gridnum,
                                             gridmax,gridmin)
  "plug-in bandwidth for conditional volatility" hplugin

;   diagnostics
  acfplot(resid); compute and plot acf of residuals
  {jb,probjb,sk,k} = jarber(resid,1)
        ; compute Jarque-Bera test for normality of residuals

XAGflts14.xpl

The surface plot of the conditional standard deviation is computed on the range $ [-0.0008,0.0008]$ in order to avoid boundary effects. Inspecting the range of the standardized residuals in Figure 16.18 indicates that the analysis may be strongly influenced by outliers, which may also explain the extreme increase of the conditional standard deviation in one corner of Figure 16.17. Moreover, Figure 16.19 shows some significant autocorrelation in the residuals. One explanation for this finding could be the presence of long memory in the squared observations; this topic is treated in detail in Chapter 14. Therefore, one should continue to improve the current function estimates by excluding extreme observations and by using models that allow for many lags in the conditional standard deviation function, such as, for example, Yang, Härdle and Nielsen (1999).

Figure 16.16: Plot of the conditional mean function of a NAR model with lags 1 and 3 for the returns of the Deutschemark/US-Dollar exchange rate
\includegraphics[scale=.55]{plotloclindmus}

Figure 16.17: Plot of the conditional standard deviation of a NAR model with lags 2 and 6 for the returns of the Deutschemark/US-Dollar exchange rate
\includegraphics[scale=.55]{plotloclindmusvol}

Figure 16.18: Plot of the standardized residuals of the modified model (16.41)
\includegraphics[scale=.55]{plotloclindmusvolres}

Figure 16.19: Plot of the autocorrelation function of residuals of the modified model (16.41)
\includegraphics[scale=.55]{plotloclindmusvolacf}