16.1 Nonlinear Autoregressive Models of Order One

16.1.1 Estimation of the Conditional Mean

- mh = regxest(x{, h, K, v})
  computes the univariate conditional mean function using the Nadaraya-Watson estimator
- mh = regest(x{, h, K, v})
  computes the univariate conditional mean function using the Nadaraya-Watson estimator and WARPing
- mh = lpregxest(x{, h, p, v})
  computes the univariate conditional mean function using local polynomial estimation
- mh = lpregest(x{, h, p, K, d})
  computes the univariate conditional mean function using local polynomial estimation and WARPing
Let us turn to estimating the conditional mean function $f(\cdot)$ of a nonlinear autoregressive process of order one (NAR(1) process)

$$Y_t = f(Y_{t-1}) + \xi_t, \qquad E[\xi_t \mid Y_{t-1}] = 0, \qquad (16.2)$$

using nonparametric techniques. The basic idea is to estimate a Taylor approximation of order $p$ of the unknown function $f$ around a given point $y$. The simplest Taylor approximation is obtained if its order is chosen to be zero. One then approximates the unknown function by a constant. Of course, this approximation may turn out to be very bad if one includes observations $Y_{t-1}$ that are distant from $y$, since this might introduce a large approximation bias. One therefore weights those observations less in the estimation. Using the least squares principle, the estimated function value $\widehat{f}(y,h)$ is provided by the estimated constant $\widehat{c}_0$ of a local constant estimate around $y$,

$$\widehat{c}_0 = \arg\min_{c_0}\; \sum_{t=2}^{T} \{Y_t - c_0\}^2\, K_h(Y_{t-1} - y), \qquad (16.3)$$
where $K(\cdot)$ denotes the weighting function, which is commonly called a kernel function, and $K_h(u) = K(u/h)/h$. A number of kernel functions are used in practice, e.g. the Gaussian density function or the quartic kernel $K(u) = \frac{15}{16}(1-u^2)^2$ on the range $[-1,1]$ and $K(u) = 0$ elsewhere. The solution $\widehat{f}(y,h) = \widehat{c}_0$ is known as the Nadaraya-Watson or local constant function estimator and can be written as

$$\widehat{f}(y,h) = \frac{\sum_{t=2}^{T} K_h(Y_{t-1} - y)\, Y_t}{\sum_{t=2}^{T} K_h(Y_{t-1} - y)}. \qquad (16.4)$$
The parameter $h$ is called the bandwidth parameter and controls the weighting of the lagged variables $Y_{t-1}$ with respect to their distance to $y$. While choosing $h$ too small, and therefore including only few observations in the estimation procedure, leads to a large estimation variance, taking $h$ too large implies a large approximation bias. Methods for bandwidth selection are presented in Subsection 16.1.2.
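To make the mechanics of (16.4) concrete, here is a minimal Python sketch (our own illustration; the XploRe quantlets used in this chapter implement the same idea). The function names and the plain-text reading of lynx.dat are assumptions of the example.

import numpy as np

def quartic(u):
    # quartic kernel K(u) = 15/16 (1 - u^2)^2 on [-1, 1], zero elsewhere
    return np.where(np.abs(u) <= 1, 15 / 16 * (1 - u**2) ** 2, 0.0)

def nw_estimate(y_lag, y, grid, h):
    # Nadaraya-Watson estimator (16.4): kernel-weighted mean of Y_t around each y;
    # the 1/h factor of K_h cancels in the ratio, so plain K(u) suffices
    fhat = np.empty(len(grid))
    for i, g in enumerate(grid):
        w = quartic((y_lag - g) / h)          # weights K_h(Y_{t-1} - y) up to 1/h
        fhat[i] = np.sum(w * y) / np.sum(w)   # local constant fit
    return fhat

# usage on NAR(1) pairs (Y_{t-1}, Y_t) built from a series x
x = np.log(np.loadtxt("lynx.dat"))            # assumes the lynx data as plain text
y_lag, y = x[:-1], x[1:]
h = 0.2 * (y_lag.max() - y_lag.min())         # crude bandwidth rule used below
fhat = nw_estimate(y_lag, y, np.sort(y_lag), h)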
Before one applies Nadaraya-Watson estimation, one should be aware of the conditions that the underlying data generating mechanism has to fulfil such that the estimator has nice asymptotic properties: most importantly, the function $f(\cdot)$ has to be continuous, the stochastic process has to be stationary, and the dependence among the observations must decline fast enough as the distance between the observations increases. For measuring dependence in nonlinear time series one commonly uses various mixing concepts. For example, a sequence is said to be $\alpha$-mixing (strong mixing) (Robinson; 1983) if

$$\sup_{A \in \mathcal{F}_1^k,\; B \in \mathcal{F}_{k+n}^{\infty}} \left| P(A \cap B) - P(A)P(B) \right| \le \alpha(n), \qquad \alpha(n) \to 0,$$

where $\mathcal{F}_a^b$ is the $\sigma$-field generated by $Y_a, \ldots, Y_b$. An alternative and stronger condition is given by the $\beta$-mixing condition (absolute regularity)

$$E\left[\sup_{B \in \mathcal{F}_{k+n}^{\infty}} \left| P(B \mid \mathcal{F}_1^k) - P(B) \right|\right] \le \beta(n)$$

for any $k$ and $n$. An even stronger condition is the $\phi$-mixing (uniform mixing) condition (Billingsley; 1968), where

$$\left| P(B \mid A) - P(B) \right| \le \phi(n)$$

for any $A \in \mathcal{F}_1^k$ with $P(A) > 0$ and $B \in \mathcal{F}_{k+n}^{\infty}$, and $\phi(n)$ tends to zero for $n \to \infty$. The rate at which $\alpha(n)$, $\beta(n)$ or $\phi(n)$ goes to zero plays an important role in showing asymptotic properties of the nonparametric smoothing procedures. We note that these conditions are in general difficult to check. However, if the process follows a stationary Markov chain, then geometric ergodicity implies absolute regularity, which in turn implies strong mixing. Techniques exist for checking geometric ergodicity, see e.g. Doukhan (1994) or Lu (1998). Further and more detailed conditions will be discussed in Subsection 16.2.2.
The quantlet regxest allows one to compute Nadaraya-Watson estimates of $f$ for an array of different $y$'s. Its syntax is
mh = regxest(x{, h, K, v})
with the input variables
- x
  n x 2 matrix, in the first column the independent, in the second column the dependent variable,
- h
  scalar, bandwidth; if not given, 20% of the range of the values in the first column of x is used,
- K
  string, kernel function on [-1,1] or Gaussian kernel "gau"; if not given, the quartic kernel "qua" is used,
- v
  m x 1 vector of values of the independent variable on which to compute the regression; if not given, x is used.
This quantlet returns an n x 2 or m x 2 matrix mh, where the first column is the sorted first column of x or the sorted v, and the second column contains the regression estimate on the values of the first column.
In order to illustrate the methods presented in this chapter, we model the dynamics underlying the famous annual Canadian lynx trappings in 1821-1934, see e.g. Brockwell and Davis (1991, Appendix, Series G). Figures 16.1 and 16.2 of the original and the logged time series are obtained with the following quantlet:
library("plot")
setsize(640,480)
lynx = read("lynx.dat") ; read data
d1 = createdisplay(1,1)
x1 = #(1821:1934)~lynx
setmaskl (x1, (1:rows(x1))', 0, 1)
show(d1,1,1,x1) ; plot data
setgopt(d1,1,1,"title","Annual Canadian Lynx
Trappings, 1821-1934")
setgopt(d1,1,1,"xlabel","Years","ylabel","Lynx")
d2 = createdisplay(1,1)
x2 = #(1821:1934)~log(lynx)
setmaskl (x2, (1:rows(x2))', 0, 1)
show(d2,1,1,x2) ; plot data
setgopt(d2,1,1,"title","Logs of Annual Canadian
Lynx Trappings, 1821-1934")
setgopt(d2,1,1,"xlabel","Years","ylabel","Lynx")
Their inspection indicates that taking logarithms is required to make the time series look stationary.
Figure 16.1: Time series of annual Canadian Lynx Trappings, 1821-1934

Figure 16.2: Time series of logarithm of annual Canadian Lynx Trappings, 1821-1934
The following quantlet reads the lynx data set, constructs the vectors of the dependent and lagged variables, computes the Nadaraya-Watson estimator, and plots the resulting function together with the scatter plot displayed in Figure 16.3. For selecting the bandwidth we use here the primitive rule of taking one fifth of the data range.
library("smoother")
library("plot")
setsize(640,480)
; data preparation
lynx = read("lynx.dat")
lynxrows = rows(lynx)
lag1 = lynx[1:lynxrows-1] ; vector of first lag
y = lynx[2:lynxrows] ; vector of dep. var.
data = lag1~y
data = log(data)
; estimation
h = 0.2*(max(data[,1])-min(data[,1])); crude bandwidth
"Bandwidth used" h
mh = regxest(data,h) ; N-W estimation
; graphics
mh = setmask(mh,"line","blue")
xy = setmask(data,"cross","small")
plot(xy,mh)
setgopt(plotdisplay,1,1,"title","Estimated NAR(1)
mean function")
setgopt(plotdisplay,1,1,"xlabel","First Lag","ylabel","Lynx")
Figure 16.3: Nadaraya-Watson estimates of the NAR(1) mean function for the lynx data and scatter plot
For long time series the computation of the Nadaraya-Watson estimates may become quite slow, since there are more points at which to estimate the function and each estimation involves more data. In this case one may use the WARPing (weighted average of rounded points) technique. The basic idea is the ``binning'' of the data in bins of length $d$: each observation is replaced by the bincenter of the corresponding bin, which means that each point is rounded to the precision given by $d$. A typical choice for $d$ is a small fraction of the data range, e.g. $d = (\max_t Y_t - \min_t Y_t)/100$. In the latter case, the effective sample size, i.e. the number of nonempty bins used in the computation, is at most 101. If WARPing is necessary, just call the quantlet regest, which has the same parameters as the quantlet regxest.
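To illustrate the binning step, here is a small Python sketch (our own illustration of the rounding idea, not the quantlet's actual implementation); y_lag and y are the lagged pairs constructed in the first sketch.

import numpy as np

def bin_data(y_lag, y, d):
    # round each regressor to a grid of width d and accumulate per-bin
    # counts and response sums; estimation then loops over the nonempty
    # bins instead of over all observations
    idx = np.round(y_lag / d).astype(int)
    bins = np.unique(idx)
    centers = bins * d                         # bincenters replace the raw points
    counts = np.array([(idx == b).sum() for b in bins])
    ysums = np.array([y[idx == b].sum() for b in bins])
    return centers, counts, ysums

d = (y_lag.max() - y_lag.min()) / 100          # binwidth: at most 101 nonempty bins
centers, counts, ysums = bin_data(y_lag, y, d)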
While the Nadaraya-Watson function estimate is simple to compute, it may suffer from a substantial estimation bias due to the zero order Taylor expansion. Therefore, it seems natural to increase the order of the expansion. For example, by selecting $p=1$ one obtains the local linear estimator, which corresponds to the following weighted minimization problem

$$\min_{c_0,\, c_1}\; \sum_{t=2}^{T} \{Y_t - c_0 - c_1 (Y_{t-1} - y)\}^2\, K_h(Y_{t-1} - y), \qquad (16.5)$$

where the estimated function value $\widehat{f}(y,h)$ is provided as before by the estimated constant $\widehat{c}_0$. In a similar way one obtains the local quadratic estimator if one chooses $p=2$. The quantlet lpregxest allows one to compute local linear or local quadratic function estimates using the quartic kernel; a sketch of the underlying least squares computation follows the parameter list below. Its syntax is
mh = lpregxest(x{, h, p, v})
where the inputs are:
- x
  n x 2 matrix, in the first column the independent, in the second column the dependent variable,
- h
  scalar, bandwidth; if not given, the rule-of-thumb bandwidth computed by the quantlet lpregrot is used,
- p
  integer, order of polynomial: p=0 yields the Nadaraya-Watson estimator, p=1 yields local linear estimation (the default), p=2 (local quadratic) is the highest possible order,
- v
  m x 1 vector, values of the independent variable on which to compute the regression; if not given, x is used.
The output is given by
- mh
  n x 2 or m x 2 matrix, the first column is the sorted first column of x or the sorted v, the second column contains the regression estimate on the values of the first column.
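As announced above, here is a sketch of the weighted least squares computation behind local polynomial estimation (our Python illustration; quartic as defined in the first sketch):

import numpy as np

def local_poly(y_lag, y, grid, h, p=1):
    # local polynomial estimator of order p, cf. (16.5): regress Y_t on
    # powers of (Y_{t-1} - y) with kernel weights; f(y) is estimated by c_0
    fhat = np.empty(len(grid))
    for i, g in enumerate(grid):
        w = quartic((y_lag - g) / h)
        X = np.vander(y_lag - g, N=p + 1, increasing=True)  # 1, (Y-y), ..., (Y-y)^p
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        fhat[i] = coef[0]                                   # local intercept c_0
    return fhat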
The following quantlet allows one to visualize the difference between local constant and local linear estimation of the first order nonlinear autoregressive mean function for the lynx data. It produces Figure 16.4, where the solid and dotted lines display the local linear and local constant estimates, respectively. One notices that the local linear function estimate shows less variation.
library("smoother")
library("plot")
setsize(640,480)
; data preparation
lynx = read("lynx.dat")
lynxrows = rows(lynx)
lag1 = lynx[1:lynxrows-1] ; vector of first lag
y = lynx[2:lynxrows] ; vector of dep. var.
data = lag1~y
data = log(data)
; estimation
h = 0.2*(max(data[,1])-min(data[,1])); crude bandwidth
mh = regxest(data,h) ; N-W estimation
mhlp = lpregxest(data,h) ; local linear estimation
; graphics
mh = setmask(mh,"line","blue","dashed")
mhlp = setmask(mhlp,"line","red")
xy = setmask(data,"cross","small")
plot(xy,mh,mhlp)
setgopt(plotdisplay,1,1,"title","Estimated NAR(1)
mean function")
setgopt(plotdisplay,1,1,"xlabel","First Lag","ylabel","Lynx")
Figure 16.4: Local linear estimates (solid line) and Nadaraya-Watson estimates (dotted line) of the NAR(1) mean function for the lynx data and scatter plot
Like Nadaraya-Watson estimation, local linear estimation may become slow for long time series. In this case, one may use the quantlet lpregest, which uses the WARPing technique.
16.1.2 Bandwidth Selection

- {hcrit, crit} = regxbwsel(x{, h, K})
  interactive tool for bandwidth selection in univariate kernel regression estimation
- {hcrit, crit} = regbwsel(x{, h, K, d})
  interactive tool for bandwidth selection in univariate kernel regression estimation using the WARPing method
So far we have used a primitive way of selecting the bandwidth parameter $h$. Of course, there are better methods for bandwidth choice. They are all based on minimizing some estimated distance measure. Since we are interested in one bandwidth for various $y$'s, we look at ``global'' distances like, for instance, the integrated squared error (ISE)

$$d_I(h) = \int \left\{ f(y) - \widehat{f}(y,h) \right\}^2 w(y)\, \mu(y)\, dy. \qquad (16.6)$$
Here $\mu(\cdot)$ denotes the density of the stationary distribution and $w(\cdot)$ is a weight function with compact support. Note that the bandwidth which minimizes the ISE in general varies from sample to sample. In practice, one may want to avoid the integration and consider an approximation of the ISE, namely the average squared error (ASE)

$$d_A(h) = \frac{1}{T-1} \sum_{t=2}^{T} \left\{ f(Y_{t-1}) - \widehat{f}(Y_{t-1},h) \right\}^2 w(Y_{t-1}). \qquad (16.7)$$
Since the measure of accuracy $d_A(h)$ involves the unknown autoregression function $f(\cdot)$, it cannot be used directly. Instead, one may estimate $f(Y_{t-1})$ by $Y_t$. One then obtains the average squared error of prediction (ASEP)

$$d_P(h) = \frac{1}{T-1} \sum_{t=2}^{T} \left\{ Y_t - \widehat{f}(Y_{t-1},h) \right\}^2 w(Y_{t-1}). \qquad (16.8)$$

This, however, implies the new problem that $d_P(h)$ can be driven to zero by choosing $h$ small enough. To see this, consider the Nadaraya-Watson estimator (16.4) and imagine that the bandwidth $h$ is chosen so small that (16.4) becomes $\widehat{f}(Y_{t-1},h) = Y_t$; this implies $d_P(h) = 0$. This estimation problem can easily be solved by always leaving out $Y_t$ in computing (16.4), which leads to

$$\widehat{f}_{-t}(Y_{t-1},h) = \frac{\sum_{s=2,\, s \neq t}^{T} K_h(Y_{s-1} - Y_{t-1})\, Y_s}{\sum_{s=2,\, s \neq t}^{T} K_h(Y_{s-1} - Y_{t-1})} \qquad (16.9)$$
and is called the leave-one-out cross-validation estimate of the autoregression function. One therefore estimates $d_A(h)$ with the cross-validation function

$$CV(h) = \frac{1}{T-1} \sum_{t=2}^{T} \left\{ Y_t - \widehat{f}_{-t}(Y_{t-1},h) \right\}^2 w(Y_{t-1}). \qquad (16.10)$$

Let $\widehat{h}$ be the bandwidth that minimizes $CV(h)$. Härdle (1990) and Härdle and Vieu (1992) proved that, under an $\alpha$-mixing condition, $\widehat{h}$ is asymptotically optimal with respect to the ASE criterion, i.e. $d_A(\widehat{h}) / \inf_h d_A(h) \to 1$.
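A minimal Python sketch of the criterion (16.10) with $w \equiv 1$ (our illustration; the bandwidth grid is an arbitrary choice, and quartic, y_lag, y are as in the first sketch):

import numpy as np

def cv_score(y_lag, y, h):
    # leave-one-out cross-validation function CV(h), cf. (16.9)-(16.10);
    # assumes h is large enough that every point keeps some neighbours
    sq = np.empty(len(y))
    for t in range(len(y)):
        w = quartic((y_lag - y_lag[t]) / h)
        w[t] = 0.0                            # leave out observation t
        sq[t] = (y[t] - np.sum(w * y) / np.sum(w)) ** 2
    return sq.mean()

hs = np.linspace(0.3, 3.0, 28)                # illustrative bandwidth grid
h_cv = hs[np.argmin([cv_score(y_lag, y, h) for h in hs])]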
The interactive quantlet regxbwsel offers cross-validation and other bandwidth selection methods. The latter may be used in the case of independent data. It is called by
{hcrit, crit} = regxbwsel(x{, h, K})
with the input variables:
- x
  n x 2 matrix of the data with the independent and the dependent variable in the first and second column, respectively,
- h
  vector of bandwidths,
- K
  string, kernel function on [-1,1], e.g. quartic kernel "qua" (default) or Gaussian kernel "gau".
The output variables are:
- hcrit
  vector of the bandwidths selected by the different criteria,
- crit
  string vector of the criteria considered for bandwidth selection.
If one wants to use WARPing, one has to use the quantlet regbwsel. Using the following quantlet one may estimate the cross-validation bandwidth for the lynx data set; one obtains $\widehat{h} = 1.12085$.
library("smoother")
library("plot")
setsize(640,480)
; data preparation
lynx = read("lynx.dat")
lynxrows = rows(lynx)
lag1 = lynx[1:lynxrows-1] ; vector of first lag
y = lynx[2:lynxrows] ; vector of dep. var.
data = lag1~y
data = log(data)
;
tmp = regxbwsel(data)
It was already noted that the optimal bandwidth with respect to the ISE (16.6) or the ASE (16.7) may vary across samples. In order to obtain a sample-independent optimal bandwidth, one may consider the mean integrated squared error (MISE)

$$d_M(h) = E\left[\int \left\{ f(y) - \widehat{f}(y,h) \right\}^2 w(y)\, \mu(y)\, dy\right]. \qquad (16.11)$$

Like $d_I(h)$ or $d_A(h)$, it cannot be used directly. It is, however, possible to derive the asymptotic expansion of $d_M(h)$. This allows one to obtain an explicit formula for the asymptotically optimal bandwidth $h_{opt}$ which, however, contains unknown constants. In Subsection 16.2.2 we show how one can estimate these unknown quantities in order to obtain a plug-in bandwidth $\widehat{h}_{opt}$.
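To indicate the structure of this expansion (the precise constants are derived in Subsection 16.2.2), $d_M(h)$ has the usual variance plus squared-bias form

$$d_M(h) \approx \frac{c_V}{Th} + \frac{h^4}{4}\, c_B,$$

where $c_V$ collects the variance constants (cf. (16.12) below) and $c_B$ the squared-bias constants, both depending on $f$, $\mu$, $\sigma^2$ and the kernel. Minimizing over $h$ gives

$$h_{opt} = \left(\frac{c_V}{c_B\, T}\right)^{1/5},$$

so the asymptotically optimal bandwidth shrinks at the rate $T^{-1/5}$.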
16.1.3 Diagnostics

- acfplot(x)
  generates a plot of the autocorrelation function of the time series contained in vector x
- {jb, probjb, sk, k} = jarber(x, 1)
  checks for normality of the data contained in vector x using the Jarque-Bera test
It is well known that if a fitted model is misspecified, the resulting inference can be misleading, for example for confidence intervals or significance tests. One way to check whether a chosen model is correctly specified is to investigate the resulting residuals. Most importantly, one checks for autocorrelation remaining in the residuals. This can easily be done by inspecting the graph of the autocorrelation function using the quantlet acfplot. It only requires the n x 1 vector x with the estimated residuals as input variable. The quantlet also draws 95% confidence intervals for the case of no autocorrelation.
Another issue is to check the normality of the residuals. This is commonly done using the test suggested by Bera and Jarque (1982). It is commonly called the JB-test and can be computed with the quantlet jarber, which is called by
{jb, probjb, sk, k} = jarber(resid, printout)
with input variables
- resid
  n x 1 matrix of residuals,
- printout
  scalar, 0 for no printout, 1 for printout,
and output variables
- jb
  scalar, test statistic of the Jarque-Bera test,
- probjb
  scalar, probability value of the test statistic,
- sk
  scalar, skewness,
- k
  scalar, kurtosis.
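The statistic itself is the classical Jarque-Bera formula; a minimal Python sketch of what jarber computes (our illustration):

import numpy as np
from scipy.stats import chi2

def jarque_bera(resid):
    # JB = T/6 * (S^2 + (K - 3)^2 / 4) with skewness S and kurtosis K;
    # asymptotically chi-squared with 2 degrees of freedom under normality
    T = len(resid)
    e = resid - resid.mean()
    s2 = np.mean(e**2)
    sk = np.mean(e**3) / s2**1.5              # skewness
    k = np.mean(e**4) / s2**2                 # kurtosis
    jb = T / 6 * (sk**2 + (k - 3) ** 2 / 4)
    return jb, chi2.sf(jb, df=2), sk, k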
In the following quantlet these diagnostics are applied to the residuals of the NAR(1) model fitted to the lynx data using the Nadaraya-Watson estimator (16.4) with the cross-validation bandwidth $\widehat{h} = 1.12085$.
; load required quantlets
library("smoother")
library("plot")
func("acfplot")
func("jarber")
setsize(640,480)
; data preparation
lynx = read("lynx.dat")
lynxrows = rows(lynx)
lag1 = lynx[1:lynxrows-1] ; vector of first lag
y = lynx[2:lynxrows] ; vector of dep. var.
data = lag1~y
data = log(data)
datain = data~#(1:lynxrows-1) ; add index to data
dataso = sort(datain,1) ; sorted data
; estimation
h = 1.12085 ; Cross-validation bandwidth
mhlp = regxest(dataso[,1|2],h)
; local constant estimation
; graphics
mhlp = setmask(mhlp,"line","red")
xy = setmask(data,"cross","small")
plot(xy,mhlp)
setgopt(plotdisplay,1,1,"title",
"Estimated NAR(1) mean function")
setgopt(plotdisplay,1,1,"xlabel","First Lag","ylabel","Lynx")
; diagnostics
yhatso = mhlp.data[,2]~dataso[,3] ; sorted est. fct. values
yhat = sort(yhatso,2) ; undo sorting
eps = data[,2] - yhat[,1] ; compute residuals
acfplot(eps) ; plot autocorrelation function of res.
setgopt(dacf,1,1,"title","Autocorrelation function of NAR(1)
residuals")
;
{jb,probjb,sk,k} = jarber(eps,1)
; compute Jarque-Bera test for normality of residuals
The plot of the resulting autocorrelation function of the residuals is shown in Figure 16.5. It clearly shows that the residuals are not white noise. This indicates that one should use a higher order nonlinear autoregressive process for modelling the dynamics of the lynx data, as will be discussed in Section 16.2. Moreover, normality is rejected even at the 1% significance level, since the JB-test statistic is 11.779, which implies a $p$-value of 0.003.
Figure 16.5: Autocorrelation function of estimated residuals based on a NAR(1) model for the lynx data
16.1.4 Confidence Intervals

- {mh, clo, cup} = regxci(x{, h, alpha, K, xv})
  computes pointwise confidence intervals with prespecified confidence level for univariate regression using the Nadaraya-Watson estimator
- {mh, clo, cup} = regci(x{, h, alpha, K, d})
  computes pointwise confidence intervals with prespecified confidence level for univariate regression using the Nadaraya-Watson estimator; the computation uses WARPing
Once one has selected the bandwidth and checked the residuals, one often wants to investigate the variance of the estimated autoregression function. Under appropriate conditions, the variance of both the Nadaraya-Watson and the local linear estimator $\widehat{f}(y,h)$ can be approximated by

$$\mathop{Var}\left\{\widehat{f}(y,h)\right\} \approx \frac{1}{Th}\, \frac{\sigma^2(y)\, \Vert K \Vert_2^2}{\mu(y)}, \qquad (16.12)$$

as will be seen in Subsection 16.2.1. Here $\Vert K \Vert_2^2 = \int K^2(u)\,du$, $\sigma^2(y)$ denotes the conditional variance of $Y_t$ given $Y_{t-1} = y$, and $\mu(\cdot)$ the stationary density. (16.12) can be used for constructing confidence intervals for $f(y)$, since one can estimate the conditional variance $\sigma^2(y)$ by the kernel estimate

$$\widehat{\sigma}^2(y,h) = \frac{\sum_{t=2}^{T} K_h(Y_{t-1}-y)\, \left\{Y_t - \widehat{f}(Y_{t-1},h)\right\}^2}{\sum_{t=2}^{T} K_h(Y_{t-1}-y)} \qquad (16.13)$$

and the density $\mu(y)$ by the kernel estimate

$$\widehat{\mu}(y,h) = \frac{1}{(T-1)\,h} \sum_{t=2}^{T} K\left(\frac{Y_{t-1}-y}{h}\right). \qquad (16.14)$$
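Combining (16.12)-(16.14), a pointwise confidence interval at level $1-\alpha$ is $\widehat{f}(y,h) \pm z_{1-\alpha/2} \{\widehat{\sigma}^2(y,h)\, \Vert K \Vert_2^2 / (Th\, \widehat{\mu}(y,h))\}^{1/2}$. A minimal Python sketch of this construction (our illustration; quartic as before, for which $\Vert K \Vert_2^2 = 5/7$):

import numpy as np
from scipy.stats import norm

def nw_confidence(y_lag, y, grid, h, alpha=0.05):
    # Nadaraya-Watson estimate with pointwise confidence bands (16.12)-(16.14)
    T = len(y) + 1
    z = norm.ppf(1 - alpha / 2)
    # fitted values at the observations give the residuals needed in (16.13)
    W = quartic((y_lag[None, :] - y_lag[:, None]) / h)
    resid2 = (y - W @ y / W.sum(axis=1)) ** 2
    fhat = np.empty(len(grid)); half = np.empty(len(grid))
    for i, g in enumerate(grid):
        w = quartic((y_lag - g) / h)
        fhat[i] = np.sum(w * y) / w.sum()
        sigma2 = np.sum(w * resid2) / w.sum()        # conditional variance (16.13)
        mu = w.sum() / ((T - 1) * h)                 # density estimate (16.14)
        half[i] = z * np.sqrt(sigma2 * (5 / 7) / ((T - 1) * h * mu))  # (16.12)
    return fhat, fhat - half, fhat + half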
Based on these estimates, the quantlet regxci computes pointwise confidence intervals using the Nadaraya-Watson estimator. It is called by
{mh, clo, cup} = regxci(x{, h, alpha, K, xv})
with input variables:
- x
  n x 2 matrix of the data with the independent and the dependent variable in the first and second column, respectively,
- h
  scalar, bandwidth; if not given, 20% of the range of the values in the first column of x is used,
- alpha
  confidence level with 0.05 as default value,
- K
  string, kernel function on [-1,1] with the quartic kernel "qua" as default,
- xv
  m x 1 matrix of the values of the independent variable on which to compute the regression, with x as default.
The output variables are:
- mh
  n x 2 or m x 2 matrix, the first column is the sorted first column of x or the sorted xv, the second column contains the regression estimate on the values of the first column,
- clo
  n x 2 or m x 2 matrix, the first column is the sorted first column of x or the sorted xv, the second column contains the lower confidence bounds on the values of the first column,
- cup
  n x 2 or m x 2 matrix, the first column is the sorted first column of x or the sorted xv, the second column contains the upper confidence bounds on the values of the first column.
If the WARPing technique is required, one uses the quantlet regci.
In Subsection 16.1.3 we found that the NAR(1) model for the lynx data is misspecified. Therefore, it is not appropriate for illustrating the computation of pointwise confidence intervals. Instead we will use a simulated time series. The quantlet below generates 150 observations of a stationary exponential AR(1) process

$$Y_t = \left\{ \phi_1 + \phi_2\, e^{-\gamma Y_{t-1}^2} \right\} Y_{t-1} + \xi_t, \qquad \xi_t \sim N(0,1), \qquad (16.15)$$

with $\phi_1 = 0.3$, $\phi_2 = 2.2$ and $\gamma = 0.1$, calls the interactive quantlet regxbwsel for bandwidth selection (where one has to choose cross-validation the first time and stop the second time), computes the confidence intervals, and plots the true and estimated function (solid and dashed lines) as well as the pointwise confidence intervals (dotted lines), as shown in Figure 16.6.
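For readers without XploRe at hand, the data generation step of (16.15) can be sketched directly in Python (genexpar is the XploRe generator used below; the seed is arbitrary):

import numpy as np

rng = np.random.default_rng(0)
phi1, phi2, g, T = 0.3, 2.2, 0.1, 150

y = np.zeros(T)
for t in range(1, T):                          # recursion (16.15)
    y[t] = (phi1 + phi2 * np.exp(-g * y[t - 1] ** 2)) * y[t - 1] + rng.standard_normal()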
library("smoother")
library("plot")
library("times")
setsize(640,480)
; generate exponential AR(1) process
phi1 = 0.3
phi2 = 2.2
g = 0.1
randomize(0)
x = genexpar(1,g,phi1,phi1+phi2,normal(150))
; data preparation
xrows = rows(x)
lag1 = x[1:xrows-1] ; vector of first lag
y = x[2:xrows] ; vector of dep. var.
data = lag1~y
; true function
f = sort(lag1~(phi1*lag1 + phi2*lag1.*exp(-g*lag1^2)),1)
; estimation
{hcrit,crit} = regxbwsel(data)
{mh, clo, cup} = regxci(data,hcrit)
f = setmask(f,"line","solid","red")
data = setmask(data,"cross")
mh = setmask(mh,"line","dashed","blue")
clo = setmask(clo,"line","blue","thin","dotted")
cup = setmask(cup,"line","blue","thin","dotted")
plot(data,f,mh,clo,cup)
setgopt(plotdisplay,1,1,"title","Confidence intervals of
estimated NAR(1) mean function")
setgopt(plotdisplay,1,1,"xlabel","First Lag","ylabel","Y")
Figure 16.6: True and estimated mean function plus pointwise confidence intervals for a generated exponential AR(1) process
16.1.5 Derivative Estimation

- mh = lpderxest(x, h{, q, p, K, v})
  estimates the q-th derivative of a regression function using local polynomial kernel regression with quartic kernel
- mh = lpderest(x, h{, q, p, K, d})
  estimates the q-th derivative of an autoregression function using local polynomial kernel regression; the computation uses WARPing
When investigating the properties of a conditional mean function, one is often interested in its derivatives. The estimation of derivatives can be accomplished by using local polynomial estimation as long as the order $p$ of the polynomial is at least as large as the order $q$ of the derivative to be estimated. Using a local quadratic estimator ($p=2$), one estimates the first and second derivative of $f$ at $y$ with

$$\widehat{f}^{(1)}(y,h) = \widehat{c}_1 \qquad \text{and} \qquad \widehat{f}^{(2)}(y,h) = 2\,\widehat{c}_2.$$

In general, one uses a $(q+1)$-th instead of a $q$-th order polynomial for the estimation of the $q$-th derivative, since this reduces the complexity of the estimation bias, see e.g. Fan and Gijbels (1995). The estimated derivative is then obtained as $\widehat{f}^{(q)}(y,h) = q!\,\widehat{c}_q$. The quantlet lpderxest allows one to estimate first and second order derivatives, where maximally a second order polynomial is used. It is called by
mh = lpderxest(x, h{, q, p, K, v})
with input variables
- x
  n x 2 matrix of the data with the independent and dependent variable in the first and second column, respectively,
- h
  scalar, bandwidth; if not given, the rule-of-thumb bandwidth is computed with lpderrot,
- q
  integer, order of the derivative; if not given, q=1 (first derivative) is chosen,
- p
  integer, order of polynomial; if not given, p=q+1 is used for q=1 and p=q is used for q=2,
- v
  m x 1 vector, values of the independent variable on which to compute the regression; if not given, x is used.
The output variable is
- mh
  n x 2 or m x 2 matrix, where the first column is the sorted first column of x or the sorted v and the second column contains the derivative estimate on the values of the first column.
The quantlet lpderest, which applies the WARPing technique (Fan and Marron; 1994), also allows for further choices of p and q. We note, however, that WARPing may waste a lot of information. Bandwidth selection remains an important issue and can be done using the quantlet lpderrot.
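A sketch of the computation behind these defaults (our Python illustration, with quartic as in the earlier sketches): fit a local polynomial of order p and rescale its q-th coefficient by q!.

import math
import numpy as np

def local_poly_deriv(y_lag, y, grid, h, q=1, p=None):
    # q-th derivative estimate f^(q)(y) = q! * c_q from a local polynomial
    # fit of order p; default p = q + 1 as in the text (lpderxest uses p = q for q = 2)
    p = q + 1 if p is None else p
    der = np.empty(len(grid))
    for i, g in enumerate(grid):
        w = quartic((y_lag - g) / h)
        X = np.vander(y_lag - g, N=p + 1, increasing=True)
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        der[i] = math.factorial(q) * coef[q]   # q! * c_q
    return der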
In the following quantlet we estimate the first and second derivatives of the conditional mean function of the exponential AR(1) process (16.15), based on 150 observations. The true derivatives (solid lines) and their estimates (dashed lines) are shown in Figures 16.7 and 16.8.
library("smoother")
library("plot")
library("times")
setsize(640,480)
; generate exponential AR(1) process
phi1 = 0.3
phi2 = 2.2
g = 0.1
randomize(0)
x = genexpar(1,g,phi1,phi1+phi2,normal(150))
; data preparation
xrows = rows(x)
lag1 = x[1:xrows-1] ; vector of first lag
y = x[2:xrows] ; vector of dep. var.
data = lag1~y
ffder = sort(lag1~(phi1 + exp(-g*lag1^2).*phi2.*(1-2.*g.*lag1^2)),1) ; true first derivative
fsder = sort(lag1~(exp(-g*lag1^2).*(-2*g.*lag1)*phi2.*(3-2.*g.*lag1^2)),1) ; true second derivative
; estimate first derivative
ffder = setmask(ffder,"line","solid","red")
mhfder = lpderxest(data)
mhfder = setmask(mhfder, "line","dashed","blue")
plotder = createdisplay(1,1)
show(plotder,1,1,ffder,mhfder)
setgopt(plotder,1,1,"title","Estimated first derivative
of mean function")
setgopt(plotder,1,1,"xlabel","First lag","ylabel",
"First derivative")
; estimate second derivative
fsder = setmask(fsder,"line","solid","red")
hrot = 2*lpderrot(data,2)
mhsder = lpderxest(data,hrot,2)
mhsder = setmask(mhsder, "line","dashed","blue")
plot(fsder,mhsder)
setgopt(plotdisplay,1,1,"title","Estimated second
derivative of mean function")
setgopt(plotdisplay,1,1,"xlabel","First lag","ylabel",
"Second derivative")
Figure 16.7: True and estimated first derivative for a generated exponential AR(1) process

Figure 16.8: True and estimated second derivative for a generated exponential AR(1) process