# 16.1 Nonlinear Autoregressive Models of Order One

## 16.1.1 Estimation of the Conditional Mean

mh = regxest (x{, h, K, v}) computes the univariate conditional mean function using the Nadaraya-Watson estimator
mh = regest (x{, h, K, v}) computes the univariate conditional mean function using the Nadaraya-Watson estimator and WARPing
mh = lpregxest (x{, h, p, v}) computes the univariate conditional mean function using local polynomial estimation
mh = lpregest (x{, h, p, K, d}) computes the univariate conditional mean function using local polynomial estimation and WARPing

Let us turn to estimating the conditional mean function of a nonlinear autoregressive process of order one (NAR(1) process)

$$
Y_t = f(Y_{t-1}) + \xi_t \qquad (16.2)
$$

using nonparametric techniques, where $\xi_t$ denotes an i.i.d. zero-mean error term. The basic idea is to estimate a Taylor approximation of order $p$ of the unknown function $f(\cdot)$ around a given point $x$. The simplest Taylor approximation is obtained if its order is chosen to be zero. One then approximates the unknown function by a constant $c$. Of course, this approximation may turn out to be very bad if one includes observations that are distant from $x$, since this might introduce a large approximation bias. One therefore weights those observations less in the estimation. Using the least squares principle, the estimated function value $\hat f_h(x)$ is provided by the estimated constant $\hat c$ of a local constant estimate around $x$

$$
\hat f_h(x) = \hat c = \arg\min_{c} \sum_{t=2}^{T} \left\{ Y_t - c \right\}^2 K_h(Y_{t-1} - x) \qquad (16.3)
$$

where $K_h(\cdot)$ denotes the weighting function, which is commonly called a kernel function, and $K_h(u) = K(u/h)/h$. A number of kernel functions are used in practice, e.g. the Gaussian density function or the quartic kernel $K(u) = \frac{15}{16}(1-u^2)^2$ on the range $[-1,1]$ and zero elsewhere. $\hat f_h(x)$ is known as the Nadaraya-Watson or local constant function estimator and can be written as

$$
\hat f_h(x) = \frac{\sum_{t=2}^{T} K_h(Y_{t-1} - x)\, Y_t}{\sum_{t=2}^{T} K_h(Y_{t-1} - x)} \qquad (16.4)
$$

The parameter $h$ is called the bandwidth parameter and controls the weighting of the lagged variables with respect to their distance to $x$. While choosing $h$ too small, and therefore including only few observations in the estimation procedure, leads to a large estimation variance, taking $h$ too large implies a large approximation bias. Methods for bandwidth selection are presented in Subsection 16.1.2.
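The estimator (16.4) is straightforward to implement. The following is a minimal Python sketch (not the XploRe quantlet regxest); the simulated data and all names are illustrative only.

```python
import numpy as np

def quartic(u):
    # Quartic kernel K(u) = 15/16 (1 - u^2)^2 on [-1, 1], zero elsewhere
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def nadaraya_watson(x, y, grid, h):
    # Local constant estimate of E[Y | X = x0] at each point of 'grid':
    # a kernel-weighted average of the responses y, cf. (16.4)
    w = quartic((grid[:, None] - x[None, :]) / h)
    return w @ y / w.sum(axis=1)

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 300)          # stand-in for the lagged values Y_{t-1}
y = x**2 + 0.1 * rng.normal(size=300)    # stand-in for Y_t = f(Y_{t-1}) + noise
grid = np.linspace(-0.5, 0.5, 5)
fhat = nadaraya_watson(x, y, grid, h=0.3)
```

With the crude bandwidth used here the estimate tracks the true conditional mean closely; shrinking h inflates the variance exactly as described above.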

Before one applies Nadaraya-Watson estimation one should be aware of the conditions that the underlying data generating mechanism has to fulfil such that the estimator has nice asymptotic properties: most importantly, the function $f(\cdot)$ has to be continuous, the stochastic process $\{Y_t\}$ has to be stationary, and the dependence among the observations must decline fast enough as the distance between the observations increases. For measuring dependence in nonlinear time series one commonly uses various mixing concepts. For example, a sequence $\{Y_t\}$ is said to be $\alpha$-mixing (strong mixing) (Robinson; 1983) if

$$
\sup_{A \in \mathcal{F}_1^k,\; B \in \mathcal{F}_{k+n}^{\infty}} \left| P(A \cap B) - P(A)\,P(B) \right| = \alpha(n) \longrightarrow 0 \quad \text{for } n \to \infty,
$$

where $k \ge 1$ and $\mathcal{F}_k^l$ is the $\sigma$-field generated by $Y_k, \ldots, Y_l$. An alternative and stronger condition is given by the $\beta$-mixing condition (absolute regularity)

$$
E\left[\, \sup_{B \in \mathcal{F}_{k+n}^{\infty}} \left| P(B \mid \mathcal{F}_1^k) - P(B) \right| \,\right] = \beta(n) \longrightarrow 0
$$

for any $k$ and $n \to \infty$. An even stronger condition is the $\phi$-mixing (uniformly mixing) condition (Billingsley; 1968) where

$$
\left| P(B \mid A) - P(B) \right| \le \phi(n)
$$

for any $A \in \mathcal{F}_1^k$ and $B \in \mathcal{F}_{k+n}^{\infty}$, where $\phi(n)$ tends to zero for $n \to \infty$. The rate at which $\alpha(n)$, $\beta(n)$ or $\phi(n)$ goes to zero plays an important role in showing asymptotic properties of the nonparametric smoothing procedures. We note that these conditions are in general difficult to check. However, if the process follows a stationary Markov chain, then geometric ergodicity implies absolute regularity, which in turn implies the strong mixing condition. Techniques exist for checking geometric ergodicity, see e.g. Doukhan (1994) or Lu (1998). Further and more detailed conditions will be discussed in Subsection 16.2.2.

The quantlet regxest allows one to compute Nadaraya-Watson estimates of $f(x)$ for an array of different $x$'s. Its syntax is

  mh = regxest(x{, h, K, v})

with the input variables
x
matrix, in the first column the independent, in the second column the dependent variable,
h
scalar, bandwidth; if not given, 20% of the range of the values in the first column of x is used,
K
string, kernel function on [-1,1] or Gaussian kernel "gau"; if not given, the quartic kernel "qua" is used,
v
vector of values of the independent variable on which to compute the regression; if not given, x is used.
This quantlet returns an $n \times 2$ or $m \times 2$ matrix mh (with $n$ the number of observations and $m$ the number of values in v), where the first column is the sorted first column of x or the sorted v and the second column contains the regression estimate on the values of the first column.

In order to illustrate the methods presented in this chapter, we model the dynamics underlying the famous annual Canadian lynx trappings in 1821-1934, see e.g. Brockwell and Davis (1991, Appendix, Series G). Figures 16.1 and 16.2 of the original and logged time series are obtained with the quantlet

  library("plot")
setsize(640,480)
d1          = createdisplay(1,1)
x1          = #(1821:1934)~lynx
show(d1,1,1,x1)                 ; plot data
setgopt(d1,1,1,"title","Lynx Trappings, 1821-1934")
setgopt(d1,1,1,"xlabel","Years","ylabel","Lynx")
d2          = createdisplay(1,1)
x2          = #(1821:1934)~log(lynx)
show(d2,1,1,x2)                 ; plot data
setgopt(d2,1,1,"title","Log Lynx Trappings, 1821-1934")
setgopt(d2,1,1,"xlabel","Years","ylabel","Lynx")


Their inspection indicates that taking logarithms is required to make the time series look stationary.

The following quantlet reads the lynx data set, constructs the vectors of the dependent and lagged variables, computes the Nadaraya-Watson estimator and plots the resulting function including the scatter plot which is displayed in Figure 16.3. For selecting the bandwidth we use here the primitive rule to take one fifth of the data range.
  library("smoother")
library("plot")
setsize(640,480)
;                       data preparation
lynxrows  = rows(lynx)
lag1      = lynx[1:lynxrows-1]    ; vector of first lag
y         = lynx[2:lynxrows]      ; vector of dep. var.
data      = lag1~y
data      = log(data)
;                       estimation
h         = 0.2*(max(data[,1])-min(data[,1])); crude bandwidth
"Bandwidth used" h
mh        = regxest(data,h)      ; N-W estimation
;                       graphics
plot(data,mh)                    ; scatter plot and N-W estimate
setgopt(plotdisplay,1,1,"title","Estimated NAR(1) mean function")
setgopt(plotdisplay,1,1,"xlabel","First Lag","ylabel","Lynx")


For long time series the computation of the Nadaraya-Watson estimates may become quite slow, since there are more points at which to estimate the function and each estimation involves more data. In this case one may use the WARPing (weighted average of rounded points) technique. The basic idea is the "binning" of the data in bins of length $d$. Each observation is then replaced by the bincenter of the corresponding bin, which means that each point is rounded to the precision given by $d$. A typical choice for $d$ is a small fraction of the data range, e.g. one hundredth of it; in that case the effective sample size, i.e. the number of nonempty bins used in the computation, is at most 101. If WARPing is desired, just call the quantlet regest which has the same parameters as the quantlet regxest.
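The binning step behind WARPing can be sketched as follows; this is an illustrative Python fragment, not the implementation used by regest.

```python
import numpy as np

def bin_data(x, y, d):
    # Assign each x to a bin of width d; keep only the nonempty bins,
    # with per-bin counts and response sums for later weighted smoothing
    idx = np.floor(x / d).astype(int)
    bins, inv = np.unique(idx, return_inverse=True)
    counts = np.bincount(inv)                # observations per nonempty bin
    ysums = np.bincount(inv, weights=y)      # response sum per nonempty bin
    centers = (bins + 0.5) * d               # each point rounded to its bincenter
    return centers, counts, ysums

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 10000)
y = 2.0 * x + 0.1 * rng.normal(size=10000)
centers, counts, ysums = bin_data(x, y, d=0.01)
```

Smoothing then runs over at most range/d nonempty bins instead of all 10000 observations, which is where the speed-up comes from.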

While the Nadaraya-Watson function estimate is simple to compute, it may suffer from a substantial estimation bias due to the zero order Taylor expansion. Therefore, it seems natural to increase the order of the expansion. For example, by selecting $p = 1$ one obtains the local linear estimator which corresponds to the following weighted minimization problem

$$
\min_{c_0, c_1} \sum_{t=2}^{T} \left\{ Y_t - c_0 - c_1 (Y_{t-1} - x) \right\}^2 K_h(Y_{t-1} - x) \qquad (16.5)
$$

where the estimated function value $\hat f_h(x)$ is provided as before by the estimated constant $\hat c_0$. In a similar way one obtains the local quadratic estimator if one chooses $p = 2$. The quantlet lpregxest allows one to compute local linear or local quadratic function estimates using the quartic kernel. Its syntax is
  y = lpregxest (x,h {,p {,v}})

where the inputs are:
x
matrix, in the first column the independent, in the second column the dependent variable,
h
scalar, bandwidth; if not given, the rule-of-thumb bandwidth computed by the quantlet lpregrot is used,
p
integer, order of polynomial: p=0 yields the Nadaraya-Watson estimator, p=1 yields local linear estimation (the default), p=2 (local quadratic) is the highest possible order,
v
vector, values of the independent variable on which to compute the regression; if not given, x is used.
The output is given by
mh
$n \times 2$ or $m \times 2$ matrix, the first column is the sorted first column of x or the sorted v, the second column contains the regression estimate on the values of the first column.
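Local polynomial estimation amounts to a kernel-weighted least squares fit around each point. The following is a hedged Python sketch of this idea with made-up data, not the lpregxest implementation itself.

```python
import numpy as np

def quartic(u):
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def local_poly_fit(x, y, x0, h, p=1):
    # Weighted least squares fit of a degree-p polynomial in (x - x0);
    # the intercept estimates f(x0), cf. the local linear problem (16.5)
    u = x - x0
    sw = np.sqrt(quartic(u / h))                 # square-root weights
    X = np.vander(u, p + 1, increasing=True)     # columns: 1, u, u^2, ...
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef

rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 1.0, 500)
y = np.sin(2.0 * x) + 0.05 * rng.normal(size=500)
c = local_poly_fit(x, y, x0=0.3, h=0.25, p=1)
fhat_ll = c[0]                                   # local linear estimate of f(0.3)
```

Setting p=0 in this sketch recovers the Nadaraya-Watson estimate, mirroring the p parameter of lpregxest.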

The following quantlet allows one to visualize the difference between local constant and local linear estimation of the first order nonlinear autoregressive mean function for the lynx data. It produces Figure 16.4, where the solid and dotted lines display the local linear and local constant estimates, respectively. One notices that the local linear function estimate shows less variation.

  library("smoother")
library("plot")
setsize(640,480)
;                       data preparation
lynxrows  = rows(lynx)
lag1      = lynx[1:lynxrows-1]    ; vector of first lag
y         = lynx[2:lynxrows]      ; vector of dep. var.
data      = lag1~y
data      = log(data)
;                       estimation
h         = 0.2*(max(data[,1])-min(data[,1])); crude bandwidth
mh        = regxest(data,h)       ; N-W estimation
mhlp      = lpregxest(data,h)     ; local linear estimation
;                       graphics
plot(data,mh,mhlp)               ; scatter plot and both estimates
setgopt(plotdisplay,1,1,"title","Estimated NAR(1) mean function")
setgopt(plotdisplay,1,1,"xlabel","First Lag","ylabel","Lynx")


Like Nadaraya-Watson estimation, local linear estimation may become slow for long time series. In this case, one may use the quantlet lpregest which uses the WARPing technique.

## 16.1.2 Bandwidth Selection

{hcrit, crit} = regxbwsel (x{, h, K}) interactive tool for bandwidth selection in univariate kernel regression estimation
{hcrit, crit} = regbwsel (x{, h, K, d}) interactive tool for bandwidth selection in univariate kernel regression estimation using the WARPing method

So far we have used a primitive way of selecting the bandwidth parameter $h$. Of course, there are better methods for bandwidth choice. They are all based on minimizing some estimated distance measure. Since we are interested in one bandwidth for various $x$, we look at "global" distances like, for instance, the integrated squared error (ISE)

$$
ISE(h) = \int \left\{ \hat f_h(x) - f(x) \right\}^2 w(x)\, \mu(x)\, dx \qquad (16.6)
$$

Here $\mu(\cdot)$ denotes the density of the stationary distribution and $w(\cdot)$ is a weight function with compact support. Note that the bandwidth $\hat h_{ISE}$ which minimizes the ISE in general varies from sample to sample. In practice, one may want to avoid the integration and consider an approximation of the ISE, namely the average squared error (ASE)

$$
ASE(h) = \frac{1}{T-1} \sum_{t=2}^{T} \left\{ \hat f_h(Y_{t-1}) - f(Y_{t-1}) \right\}^2 w(Y_{t-1}) \qquad (16.7)
$$

Since the measure of accuracy involves the unknown autoregression function $f(\cdot)$, it cannot be used directly. Instead, one may estimate $f(Y_{t-1})$ by $Y_t$. One then obtains the average squared error of prediction (ASEP)

$$
ASEP(h) = \frac{1}{T-1} \sum_{t=2}^{T} \left\{ \hat f_h(Y_{t-1}) - Y_t \right\}^2 w(Y_{t-1}) \qquad (16.8)
$$

This, however, implies the new problem that $ASEP(h)$ can be driven to zero by choosing $h$ small enough. To see this, consider the Nadaraya-Watson estimator (16.4) and imagine that the bandwidth is chosen so small that (16.4) becomes $\hat f_h(Y_{t-1}) = Y_t$. This implies $ASEP(h) = 0$. This estimation problem can easily be solved by always leaving out the $t$-th observation in computing (16.4), which leads to

$$
\hat f_{h,-t}(Y_{t-1}) = \frac{\sum_{s=2,\, s \ne t}^{T} K_h(Y_{s-1} - Y_{t-1})\, Y_s}{\sum_{s=2,\, s \ne t}^{T} K_h(Y_{s-1} - Y_{t-1})} \qquad (16.9)
$$

and is called the leave-one-out cross-validation estimate of the autoregression function. One therefore estimates $ASE(h)$ with the cross-validation function

$$
CV(h) = \frac{1}{T-1} \sum_{t=2}^{T} \left\{ Y_t - \hat f_{h,-t}(Y_{t-1}) \right\}^2 w(Y_{t-1}) \qquad (16.10)
$$

Let $\hat h_{CV}$ be the bandwidth that minimizes $CV(h)$. Härdle (1990) and Härdle and Vieu (1992) proved that under an $\alpha$-mixing condition,

$$
\frac{ASE(\hat h_{CV})}{\min_h ASE(h)} \longrightarrow 1 \quad \text{almost surely.}
$$
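The leave-one-out idea in (16.9)-(16.10) is easy to implement by zeroing each observation's own kernel weight. A Python sketch with illustrative data and an arbitrary bandwidth grid (not the regxbwsel quantlet):

```python
import numpy as np

def quartic(u):
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def cv(x, y, h):
    # Leave-one-out Nadaraya-Watson prediction for every observation,
    # obtained by zeroing the diagonal of the kernel weight matrix
    w = quartic((x[:, None] - x[None, :]) / h)
    np.fill_diagonal(w, 0.0)
    yhat = w @ y / w.sum(axis=1)
    return float(np.mean((y - yhat) ** 2))

rng = np.random.default_rng(3)
x = rng.uniform(-1.0, 1.0, 300)
y = x**2 + 0.1 * rng.normal(size=300)
grid = [0.1, 0.2, 0.4, 0.8, 1.6]
scores = [cv(x, y, h) for h in grid]
h_cv = grid[int(np.argmin(np.array(scores)))]
```

The CV curve is flat near its minimum but penalizes gross oversmoothing clearly, which is why a coarse bandwidth grid already works well in practice.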

The interactive quantlet regxbwsel offers cross-validation and other bandwidth selection methods. The latter may be used in case of independent data. It is called by

  {hcrit, crit} = regxbwsel(x{, h, K})

with the input variables:
x
vector of the data,
h
vector of bandwidths,
K
string, kernel function on [-1,1], e.g. the quartic kernel "qua" (default), or the Gaussian kernel "gau".
The output variables are:
hcrit
vector, selected bandwidths by the different criteria,
crit
string vector, criteria considered for bandwidth selection.
If one wants to use WARPing, one has to use the quantlet regbwsel. Using the following quantlet one may estimate the cross-validation bandwidth for the lynx data set and obtains $\hat h_{CV} = 1.12085$.
  library("smoother")
library("plot")
setsize(640,480)
;                       data preparation
lynxrows  = rows(lynx)
lag1      = lynx[1:lynxrows-1]            ; vector of first lag
y         = lynx[2:lynxrows]              ; vector of dep. var.
data      = lag1~y
data      = log(data)
;
tmp       = regxbwsel(data)


It was already noted that the optimal bandwidth with respect to ISE (16.6) or ASE (16.7) may vary across samples. In order to obtain a sample independent optimal bandwidth one may consider the mean integrated squared error (MISE)

$$
MISE(h) = E\left[ \int \left\{ \hat f_h(x) - f(x) \right\}^2 w(x)\, \mu(x)\, dx \right] \qquad (16.11)
$$

Like ISE or ASE, it also cannot be used directly. It is, however, possible to derive the asymptotic expansion of $MISE(h)$. This allows one to obtain an explicit formula for the asymptotically optimal bandwidth $h_{opt}$ which, however, contains unknown constants. In Subsection 16.2.2 we show how one can estimate these unknown quantities in order to obtain a plug-in bandwidth $\hat h_{opt}$.

## 16.1.3 Diagnostics

acfplot (x) generates a plot of the autocorrelation function of the time series contained in vector x
{jb, probjb, sk, k} = jarber (x, 1) checks for normality of the data contained in vector x using the Jarque-Bera test

It is well known that if a fitted model is misspecified, then the resulting inference, for example confidence intervals or significance tests, can be misleading. One way to check whether a chosen model is correctly specified is to investigate the resulting residuals. Most importantly, one checks for autocorrelation remaining in the residuals. This can easily be done by inspecting the graph of the autocorrelation function using the quantlet acfplot. It only requires the vector x with the estimated residuals as input variable. The quantlet also draws 95% confidence intervals for the case of no autocorrelation.
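The computations behind such an autocorrelation plot are elementary; the following Python sketch (an illustration, not acfplot) also shows the approximate 95% bands under the white-noise null.

```python
import numpy as np

def acf(x, max_lag):
    # Sample autocorrelations r_1, ..., r_max_lag; under white noise each
    # r_k is approximately N(0, 1/T), giving 95% bands of +-1.96/sqrt(T)
    x = x - x.mean()
    denom = float(np.dot(x, x))
    return np.array([np.dot(x[:-k], x[k:]) / denom
                     for k in range(1, max_lag + 1)])

rng = np.random.default_rng(4)
e = rng.normal(size=1000)           # white-noise stand-in for residuals
r = acf(e, 20)
band = 1.96 / np.sqrt(1000)         # approx. 0.062
```

For genuine white noise about 95% of the sample autocorrelations should fall inside the bands; systematic excursions, as for the lynx residuals below, signal remaining dependence.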

Another issue is to check the normality of the residuals. This is commonly done by using the test suggested by Bera and Jarque (1982), commonly called the JB test, which can be computed with the quantlet jarber. It is called by

  {jb, probjb, sk, k} = jarber(resid, printout)

with input variables
resid
matrix of residuals,
printout
scalar, 0 no printout, 1 printout,
and output variables
jb
scalar, test statistic of Jarque-Bera test,
probjb
scalar, probability value of the test statistic,
sk
scalar, skewness,
k
scalar, kurtosis.
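As a hedged illustration (not the jarber quantlet), the JB statistic can be computed from sample skewness and kurtosis as follows; under normality it is asymptotically chi-squared with 2 degrees of freedom.

```python
import numpy as np

def jarque_bera(e):
    # JB = n/6 * (S^2 + (K - 3)^2 / 4), with S the sample skewness
    # and K the sample kurtosis of the residuals e
    e = e - e.mean()
    s2 = float(np.mean(e**2))
    skew = float(np.mean(e**3)) / s2**1.5
    kurt = float(np.mean(e**4)) / s2**2
    jb = e.size / 6.0 * (skew**2 + (kurt - 3.0) ** 2 / 4.0)
    return jb, skew, kurt

rng = np.random.default_rng(5)
jb_norm, skew, kurt = jarque_bera(rng.normal(size=5000))   # small under normality
jb_skewed, *_ = jarque_bera(rng.exponential(size=5000))    # large for skewed data
```

Large JB values reject normality; the p-value reported by jarber comes from the chi-squared(2) reference distribution.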
In the following quantlet these diagnostics are applied to the residuals of the NAR(1) model fitted to the lynx data using the Nadaraya-Watson estimator (16.4) with the cross-validation bandwidth $\hat h_{CV} = 1.12085$:
;               load required quantlets
library("smoother")
library("plot")
func("acfplot")
func("jarber")
setsize(640,480)
;               data preparation
lynxrows  = rows(lynx)
lag1      = lynx[1:lynxrows-1]        ; vector of first lag
y         = lynx[2:lynxrows]          ; vector of dep. var.
data      = lag1~y
data      = log(data)
datain    = data~#(1:lynxrows-1)      ; add index to data
dataso    = sort(datain,1)            ; sorted data
;               estimation
h         = 1.12085               ; Cross-validation bandwidth
mhlp      = regxest(dataso[,1|2],h)
; local constant estimation
;               graphics
plot(data,mhlp)
setgopt(plotdisplay,1,1,"title","Estimated NAR(1) mean function")
setgopt(plotdisplay,1,1,"xlabel","First Lag","ylabel","Lynx")
;               diagnostics
yhatso    = mhlp[,2]~dataso[,3]       ; sorted est. fct. values
yhat      = sort(yhatso,2)            ; undo sorting
eps       = data[,2] - yhat[,1]       ; compute residuals
acfplot(eps)            ; plot autocorrelation function of res.
setgopt(dacf,1,1,"title","Autocorrelation function of NAR(1) residuals")
;
{jb,probjb,sk,k} = jarber(eps,1)
; compute Jarque-Bera test for normality of residuals


The plot of the resulting autocorrelation function of the residuals is shown in Figure 16.5. It clearly shows that the residuals are not white noise. This indicates that one should use a higher order nonlinear autoregressive process for modelling the dynamics of the lynx data. This will be discussed in Section 16.2. Moreover, normality is rejected even at the 1% significance level since the JB test statistic is 11.779, which implies a $p$-value of 0.003.

## 16.1.4 Confidence Intervals

{mh, clo, cup} = regxci (x{, h, alpha, K, xv}) computes pointwise confidence intervals with prespecified confidence level for univariate regression using the Nadaraya-Watson estimator
{mh, clo, cup} = regci (x{, h, alpha, K, d}) computes pointwise confidence intervals with prespecified confidence level for univariate regression using the Nadaraya-Watson estimator; the computation uses WARPing

Once one has selected the bandwidth and checked the residuals, one often wants to investigate the variance of the estimator of the autoregression function. Under appropriate conditions, the variance of both the Nadaraya-Watson and the local linear estimator can be approximated by

$$
\text{Var}\left\{ \hat f_h(x) \right\} \approx \frac{\sigma^2(x)\, \|K\|_2^2}{T h\, \mu(x)} \qquad (16.12)
$$

as will be seen in Subsection 16.2.1. Here $\sigma^2(x)$ denotes the conditional variance $\text{Var}[Y_t \mid Y_{t-1} = x]$ and $\|K\|_2^2 = \int K(u)^2\, du$. (16.12) can be used for constructing confidence intervals for $f(x)$, since one can estimate the conditional variance $\sigma^2(x)$ by the kernel estimate

$$
\hat\sigma_h^2(x) = \frac{\sum_{t=2}^{T} K_h(Y_{t-1} - x)\, Y_t^2}{\sum_{t=2}^{T} K_h(Y_{t-1} - x)} - \left\{ \hat f_h(x) \right\}^2 \qquad (16.13)
$$

and the density $\mu(x)$ by the kernel estimate

$$
\hat\mu_h(x) = \frac{1}{(T-1)\, h} \sum_{t=2}^{T} K\!\left( \frac{Y_{t-1} - x}{h} \right) \qquad (16.14)
$$

Based on these estimates the quantlet regxci computes pointwise confidence intervals using the Nadaraya-Watson estimator. It is called with

  {mh, clo, cup} = regxci(x{, h, alpha, K, xv})

with input variables:
x
matrix of the data with the independent and the dependent variable in the first and second column, respectively,
h
scalar, bandwidth; if not given, 20% of the range of the values in the first column of x is used,
alpha
confidence level with 0.05 as default value,
K
string, kernel function on [-1,1] with the quartic kernel "qua" as default,
xv
matrix of the values of the independent variable on which to compute the regression, with x as default.
The output variables are:
mh
$n \times 2$ or $m \times 2$ matrix, the first column is the sorted first column of x or the sorted xv, the second column contains the regression estimate on the values of the first column,
clo
$n \times 2$ or $m \times 2$ matrix, the first column is the sorted first column of x or the sorted xv, the second column contains the lower confidence bounds on the values of the first column,
cup
$n \times 2$ or $m \times 2$ matrix, the first column is the sorted first column of x or the sorted xv, the second column contains the upper confidence bounds on the values of the first column.
If the WARPing technique is required, one uses the quantlet regci .
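Combining (16.12)-(16.14) yields pointwise intervals directly; the Python sketch below is an illustration of this recipe with made-up data, not the regxci implementation, and hardcodes the normal quantile for alpha = 0.05.

```python
import numpy as np

def quartic(u):
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def nw_with_ci(x, y, grid, h):
    # Nadaraya-Watson estimate plus pointwise 95% intervals based on the
    # asymptotic variance sigma^2(x) ||K||_2^2 / (T h mu(x)), cf. (16.12)
    T = x.size
    w = quartic((grid[:, None] - x[None, :]) / h)
    wsum = w.sum(axis=1)
    fhat = w @ y / wsum                      # conditional mean, cf. (16.4)
    s2 = w @ (y**2) / wsum - fhat**2         # conditional variance, cf. (16.13)
    mu = wsum / (T * h)                      # density estimate, cf. (16.14)
    k_norm = 5.0 / 7.0                       # ||K||_2^2 for the quartic kernel
    se = np.sqrt(s2 * k_norm / (T * h * mu))
    return fhat, fhat - 1.96 * se, fhat + 1.96 * se

rng = np.random.default_rng(6)
x = rng.uniform(-1.0, 1.0, 500)
y = np.sin(2.0 * x) + 0.1 * rng.normal(size=500)
grid = np.linspace(-0.5, 0.5, 11)
mh, clo, cup = nw_with_ci(x, y, grid, h=0.25)
```

Note that the intervals widen where the estimated density is small, i.e. where few lagged observations are available.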

In Subsection 16.1.3 we found that the NAR(1) model for the lynx data is misspecified. Therefore, it is not appropriate for illustrating the computation of pointwise confidence intervals. Instead we will use a simulated time series. The quantlet below generates 150 observations of a stationary exponential AR(1) process

$$
Y_t = \left( \phi_1 + \phi_2\, e^{-\gamma Y_{t-1}^2} \right) Y_{t-1} + \xi_t, \qquad \phi_1 = 0.3,\ \phi_2 = 2.2,\ \gamma = 0.1, \qquad (16.15)
$$

calls the interactive quantlet regxbwsel for bandwidth selection, where one has to choose cross-validation the first time and stop the second time, computes the confidence intervals, and plots the true and estimated functions (solid and dashed lines) as well as the pointwise confidence intervals (dotted lines) as shown in Figure 16.6.

  library("smoother")
library("plot")
library("times")
setsize(640,480)

;                   generate exponential AR(1) process
phi1    = 0.3
phi2    = 2.2
g       = 0.1
randomize(0)
x       = genexpar(1,g,phi1,phi1+phi2,normal(150))

;                   data preparation
xrows   = rows(x)
lag1    = x[1:xrows-1]             ; vector of first lag
y       = x[2:xrows]               ; vector of dep. var.
data    = lag1~y

;                   true function
f       = sort(lag1~(phi1*lag1 + phi2*lag1.*exp(-g*lag1^2)),1)

;                   estimation
{hcrit,crit}    = regxbwsel(data)
{mh, clo, cup}  = regxci(data,hcrit)

plot(data,f,mh,clo,cup)
setgopt(plotdisplay,1,1,"title","Confidence intervals of estimated NAR(1) mean function")
setgopt(plotdisplay,1,1,"xlabel","First Lag","ylabel","Y")

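For readers without XploRe, the exponential AR(1) process (16.15) can be simulated by direct iteration; this Python sketch uses the same parameter values as the quantlet above but is not genexpar itself.

```python
import numpy as np

def gen_expar(n, phi1, phi2, gamma, rng):
    # Iterate Y_t = (phi1 + phi2 * exp(-gamma * Y_{t-1}^2)) * Y_{t-1} + xi_t
    # with standard normal innovations xi_t, starting from Y_0 = 0
    y = np.zeros(n)
    xi = rng.normal(size=n)
    for t in range(1, n):
        coef = phi1 + phi2 * np.exp(-gamma * y[t - 1] ** 2)
        y[t] = coef * y[t - 1] + xi[t]
    return y

rng = np.random.default_rng(7)
x = gen_expar(150, 0.3, 2.2, 0.1, rng)
lag1, dep = x[:-1], x[1:]            # regression pairs (Y_{t-1}, Y_t)
```

Because the autoregressive coefficient shrinks towards phi1 = 0.3 for large |Y_{t-1}|, the process stays bounded and stationary even though it is locally explosive near zero.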

## 16.1.5 Derivative Estimation

mh = lpderxest (x, h{, q, p, K, v}) estimates the q-th derivative of a regression function using local polynomial kernel regression with the quartic kernel
mh = lpderest (x, h{, q, p, K, d}) estimates the q-th derivative of an autoregression function using local polynomial kernel regression; the computation uses WARPing

When investigating the properties of a conditional mean function, one is often interested in its derivatives. The estimation of derivatives can be accomplished by using local polynomial estimation as long as the order of the polynomial is at least as large as the order of the derivative to be estimated. Using a local quadratic estimator

$$
\min_{c_0, c_1, c_2} \sum_{t=2}^{T} \left\{ Y_t - c_0 - c_1 (Y_{t-1} - x) - c_2 (Y_{t-1} - x)^2 \right\}^2 K_h(Y_{t-1} - x),
$$

one estimates the first and second derivative of $f(\cdot)$ at $x$ with

$$
\widehat{f^{(1)}}_h(x) = \hat c_1, \qquad \widehat{f^{(2)}}_h(x) = 2\, \hat c_2.
$$
In general, one uses a $(q+1)$-th instead of a $q$-th order polynomial for the estimation of the $q$-th derivative since this reduces the complexity of the estimation bias, see e.g. Fan and Gijbels (1995). The estimated derivative is then obtained as $\widehat{f^{(q)}}_h(x) = q!\, \hat c_q$. The quantlet lpderxest allows one to estimate first and second order derivatives, where maximally a second order polynomial is used. It is called by
  mh = lpderxest (x, h{, q, p, K, v})

with input variables
x
matrix of the data with the independent and dependent variable in the first and second column, respectively.
h
scalar, bandwidth; if not given, the rule-of-thumb bandwidth is computed with lpderrot,
q
integer, order of the derivative; if not given, q=1 (first derivative) is chosen,
p
integer, order of the polynomial; if not given, p=q+1 is used for q=1 and p=q for q=2,
v
vector, values of the independent variable on which to compute the regression; if not given, x is used.
The output variable is
mh
$n \times 2$ or $m \times 2$ matrix where the first column is the sorted first column of x or the sorted v and the second column contains the derivative estimate on the values of the first column.
The quantlet lpderest, which applies the WARPing technique (Fan and Marron; 1994), allows for larger values of p and q. We note, however, that WARPing may waste a lot of information. Bandwidth selection remains an important issue and can be done using the quantlet lpderrot.
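The derivative estimator can be sketched in Python as q! times the q-th coefficient of a local polynomial fit; this is an illustration with made-up data, not the lpderxest implementation.

```python
import numpy as np
from math import factorial

def quartic(u):
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u**2) ** 2, 0.0)

def lp_derivative(x, y, x0, h, q=1, p=2):
    # Fit a degree-p polynomial around x0 by kernel-weighted least squares;
    # the q-th derivative estimate is q! times the q-th coefficient
    u = x - x0
    sw = np.sqrt(quartic(u / h))
    X = np.vander(u, p + 1, increasing=True)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return factorial(q) * coef[q]

rng = np.random.default_rng(8)
x = rng.uniform(-1.0, 1.0, 800)
y = np.sin(2.0 * x) + 0.05 * rng.normal(size=800)
d1 = lp_derivative(x, y, x0=0.2, h=0.3, q=1, p=2)  # true f'(0.2)  = 2 cos(0.4)
d2 = lp_derivative(x, y, x0=0.2, h=0.4, q=2, p=2)  # true f''(0.2) = -4 sin(0.4)
```

Derivative estimates are noisier than estimates of the function itself, which is why larger bandwidths are typically used for higher q.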

In the following quantlet we estimate the first and second derivatives of the conditional mean function of the exponential AR(1) process (16.15) based on 150 observations. The true derivatives (solid lines) and their estimates (dashed lines) are shown in Figures 16.7 and 16.8.

  library("smoother")
library("plot")
library("times")
setsize(640,480)
;                   generate exponential AR(1) process
phi1    = 0.3
phi2    = 2.2
g       = 0.1
randomize(0)
x       = genexpar(1,g,phi1,phi1+phi2,normal(150))

;                       data preparation
xrows   = rows(x)
lag1    = x[1:xrows-1]             ; vector of first lag
y       = x[2:xrows]               ; vector of dep. var.
data    = lag1~y
ffder   = sort(lag1~(phi1 + exp(-g*lag1^2).*
phi2.*(1-2.*g.*lag1^2)),1)
fsder   = sort(lag1~(exp(-g*lag1^2).*(-2*g.*lag1)*
phi2.*(3-2.*g.*lag1^2)),1)

;                       estimate first derivative
mhfder  = lpderxest(data)
plotder = createdisplay(1,1)
show(plotder,1,1,ffder,mhfder)
setgopt(plotder,1,1,"title","Estimated first derivative of mean function")
setgopt(plotder,1,1,"xlabel","First lag","ylabel","First derivative")
;                       estimate second derivative