16.2 Nonlinear Autoregressive Models of Higher Order
In Subsection 16.1.3 we briefly discussed diagnostics for checking the correct specification of a time series model. There we found for the lynx data set that the nonlinear autoregressive model of order one (16.2) is of too low an order to capture the linear correlation in the data. For flexible time series modelling in practice it is therefore necessary to allow for higher order nonlinear autoregressive models (16.1). Their estimation and the selection of relevant lags are discussed in this section. To simplify notation, we introduce the vector of lagged variables

$$ X_t = \left(Y_{t-i_1}, Y_{t-i_2}, \ldots, Y_{t-i_m}\right)^\top $$

such that (16.1) can be written as

$$ Y_t = f(X_t) + \sigma(X_t)\,\xi_t. \qquad (16.16) $$
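For concreteness, with the lag vector $(i_1, i_2) = (1, 2)$ that is used for the lynx data below, (16.16) reads

$$ X_t = (Y_{t-1}, Y_{t-2})^\top, \qquad Y_t = f(Y_{t-1}, Y_{t-2}) + \sigma(Y_{t-1}, Y_{t-2})\,\xi_t. $$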
16.2.1 Estimation of the Conditional Mean
- mh = regxestp(x{, h, K, v}): computes the Nadaraya-Watson estimator for multivariate autoregression.
- mh = regestp(x{, h, K, d}): computes the Nadaraya-Watson estimator for multivariate regression. The computation uses WARPing.
- mh = lregxestp(x{, h, K, v}): estimates a multivariate regression function using local polynomial kernel regression with quartic kernel.
- mh = lregestp(x{, h, K, d}): estimates a multivariate regression function using local polynomial kernel regression. The computation uses WARPing.
- {mA, gsqA, denA, err} = fvllc(Xsj, Yorig, h, Xtj, kernreg, lorq, fandg, loo): estimates a multivariate regression function using local linear regression with Gaussian kernel.
It is not difficult to extend the Nadaraya-Watson estimator (16.4) and the local linear estimator (16.5) to several lags in the conditional mean function $f$. One then simply uses Taylor expansions of the corresponding order for several variables. In the weighted minimization problem of the local constant estimator (16.3) one has to extend the kernel function to several lagged variables. The simplest way of doing this is to use a product kernel

$$ K_h(u) = \prod_{j=1}^{m} \frac{1}{h_j}\, K\!\left(\frac{u_j}{h_j}\right), \qquad (16.17) $$

where $h = (h_1, \ldots, h_m)^\top$ is a vector of bandwidths, one for each lag or variable. Of course, one may also use the same bandwidth $h$ for all lags, in which case we write $K_h(u) = \prod_{j=1}^{m} h^{-1} K(u_j/h)$. Using a scalar bandwidth, (16.3) becomes

$$ \widehat f_1(x, h) = \arg\min_{c} \sum_{t=i_m+1}^{T} \left\{Y_t - c\right\}^2 K_h(x - X_t), \qquad (16.18) $$

and the Nadaraya-Watson estimator is given by

$$ \widehat f_1(x, h) = \frac{\sum_{t=i_m+1}^{T} Y_t\, K_h(x - X_t)}{\sum_{t=i_m+1}^{T} K_h(x - X_t)}. \qquad (16.19) $$
Note that from now on we indicate the Nadaraya-Watson estimator and the local linear estimator by the indices $1$ and $2$, respectively.
The local linear estimator with $m$ lags is derived from the weighted minimization

$$ \min_{c_0,\, c_1} \sum_{t=i_m+1}^{T} \left\{Y_t - c_0 - (X_t - x)^\top c_1\right\}^2 K_h(x - X_t). \qquad (16.20) $$

Using the notation $e = (1, 0, \ldots, 0)^\top$,

$$ Z = \begin{pmatrix} 1 & (X_{i_m+1} - x)^\top \\ \vdots & \vdots \\ 1 & (X_T - x)^\top \end{pmatrix}, \qquad W = \mathrm{diag}\left\{K_h(x - X_{i_m+1}), \ldots, K_h(x - X_T)\right\}, \qquad Y = \begin{pmatrix} Y_{i_m+1} \\ \vdots \\ Y_T \end{pmatrix}, $$

the estimate $\widehat f_2(x,h) = \widehat c_0$ can be written for any $x$ as

$$ \widehat f_2(x, h) = e^\top \left(Z^\top W Z\right)^{-1} Z^\top W\, Y. \qquad (16.21) $$
Under suitable conditions, which are listed in Subsection 16.2.2, the Nadaraya-Watson estimator (16.19) and the local linear estimator (16.21) have an asymptotic normal distribution

$$ \sqrt{T h^m}\left\{\widehat f_a(x, h) - f(x) - b_a(x)\, h^2\right\} \;\stackrel{L}{\longrightarrow}\; N\!\left(0,\; \frac{\sigma^2(x)\, \|K\|_2^{2m}}{\mu(x)}\right), \quad a = 1, 2, \qquad (16.22) $$

where $\mu$ denotes the density of the stationary distribution, $\|K\|_2^2 = \int K^2(u)\, du$, $\sigma_K^2 = \int u^2 K(u)\, du$, and

$$ b_2(x) = \frac{\sigma_K^2}{2} \sum_{j=1}^{m} f^{(jj)}(x), \qquad b_1(x) = b_2(x) + \sigma_K^2 \sum_{j=1}^{m} \frac{f^{(j)}(x)\, \mu^{(j)}(x)}{\mu(x)}. \qquad (16.23) $$
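The kernel constants appearing in (16.22) and (16.23) are easily computed for specific kernels. For example, for the quartic kernel $K(u) = \frac{15}{16}(1 - u^2)^2$ on $[-1, 1]$, the default kernel of the quantlets above, one obtains

$$ \|K\|_2^2 = \int_{-1}^{1} K^2(u)\, du = \frac{5}{7}, \qquad \sigma_K^2 = \int_{-1}^{1} u^2 K(u)\, du = \frac{1}{7}. $$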
Thus, the rate of convergence deteriorates with the number of lags. This feature is commonly called the `curse of dimensionality' and is often viewed as a substantial drawback of nonparametric methods. One should keep in mind, however, that the $\sqrt{T}$-rate of parametric models only holds if one estimates a model with an a priori chosen finite number of parameters, which may imply a large estimation bias in case of misspecified models. If, however, one allows the number of parameters of parametric models to grow with sample size, $\sqrt{T}$-convergence may no longer hold.
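To quantify the deterioration of the rate, insert a bandwidth of the asymptotically optimal order $h \sim T^{-1/(m+4)}$, cf. (16.32) below, into the normalizing factor of (16.22):

$$ \sqrt{T h^m} \sim T^{\frac{1}{2}\left(1 - \frac{m}{m+4}\right)} = T^{\frac{2}{m+4}}, $$

so the estimator converges at rate $T^{-2/5}$ with one lag, but only at rate $T^{-1/4}$ with four lags, compared to the parametric $T^{-1/2}$.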
The quantlets regxestp and lregxestp compute the Nadaraya-Watson estimator (16.19) and the local linear estimator (16.21) for higher order autoregressions. They are called by

mh = regxestp(x{, h, K, v})

or

mh = lregxestp(x{, h, K, v})

with input variables
- x: $(T \times (m+1))$ matrix of the data with the $m$ lagged variables in the first $m$ columns and the dependent variable in the last column,
- h: scalar or $(m \times 1)$ vector of bandwidths; if not given, 20% of the range of the values in the first column of x is used,
- K: string, kernel function on $[-1,1]$ or Gaussian kernel "gau"; if not given, the quartic kernel "qua" is used,
- v: matrix of values of the independent variables on which to compute the regression; if not given, a grid of length 100 ($m = 1$), length 30 ($m = 2$) or length 8 ($m = 3$) is used in case of $m \le 3$. When $m > 3$, v is set to x.

The output variable is

- mh: matrix where the first $m$ columns contain the grid or the sorted first $m$ columns of x, and the last column contains the regression estimate on the values of the first $m$ columns.
As before, there are also quantlets which apply WARPing. They are called regestp and lregestp, respectively.
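The argument v is convenient if one needs fitted values at a few selected points rather than on a grid. The following minimal sketch for the lynx data illustrates this; the bandwidths and evaluation points are chosen ad hoc for illustration and are not recommendations:

library("smoother")
lynx = read("lynx.dat")
n = rows(lynx)
data = log(lynx[1:n-2]~lynx[2:n-1]~lynx[3:n]) ; lags 1, 2 and dep. var.
h = 0.5|0.7                    ; one (ad hoc) bandwidth per lag
v = (6~6)|(7~7)                ; two evaluation points (illustrative)
mh = regxestp(data,h,"qua",v)  ; Nadaraya-Watson with quartic kernel
mh[,3]                        ; fitted values at the two points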
Since we found in Subsection 16.1.3 that a NAR(1) model is not sufficient to capture the dynamics of the lynx trappings, the following quantlet computes and plots the autoregression function for lags 1 and 2 for both estimators, using the crude bandwidth of 20% of the data range. Note that you have to click on the graph and rotate it in order to see the regression surface.
library("smoother")
library("plot")
setsize(640,480)
; data preparation
lynx = read("lynx.dat")
lynxrows = rows(lynx)
lag1 = lynx[1:lynxrows-2] ; vector of first lag
lag2 = lynx[2:lynxrows-1] ; vector of second lag
y = lynx[3:lynxrows] ; vector of dep. var.
data = lag1~lag2~y
data = log(data)
; estimation
h = 0.2*(max(data[,1])-min(data[,1])) ; crude bandwidth
mh = regxestp(data,h) ; local constant estimation
mhlp = lregxestp(data,h) ; local linear estimation
; graphics
mhplot = createdisplay(1,1)
mh = setmask(mh,"surface","blue")
show(mhplot,1,1,data,mh) ; surface plot
setgopt(mhplot,1,1,"title",
"Nadaraya-Watson estimate -- ROTATE!")
mhlpplot = createdisplay(1,1)
mhlp = setmask(mhlp,"surface","red")
show(mhlpplot,1,1,data,mhlp) ; surface plot
setgopt(mhlpplot,1,1,"title",
"Local linear estimate -- ROTATE!")
Figures 16.9 and 16.10 show three-dimensional plots of the observations and the estimated regression function. In Figure 16.9 one can clearly see the problem of boundary effects: in regions where there are no or only few data points the estimated function values may easily become erratic if the bandwidth is too small. Therefore, a selected bandwidth may be appropriate for regions with plenty of observations but inappropriate elsewhere. As can be seen from Figure 16.10, this boundary problem turns out to be worse for the local linear estimator, where one observes a large outlier for one grid point. Such erratic estimates occur if the matrix inversion underlying (16.20) is imprecise due to a too small bandwidth. One then has to increase the bandwidth. Try the quantlet XAGflts08.xpl after replacing the factor 0.2 in the crude bandwidth choice by 2. Note that increasing the bandwidth makes the estimated regression surfaces of the two estimators look flat and closer to linearity, respectively. This, however, can increase the estimation bias. Therefore, an appropriate bandwidth choice is important. It will be discussed in the next section.
Figure 16.9: Observations and Nadaraya-Watson estimate of the NAR(2) regression function for the lynx data

Figure 16.10: Observations and local linear estimate of the NAR(2) regression function for the lynx data
16.2.2 Bandwidth and Lag Selection
- {Bhat, Bhatr, hB, Chat, sumwc, hC, hA} = hoptest(xsj, yorig, xtj, estimator, kernel, ntotal, sigy2, perB, lagmax, robden): computes the plug-in bandwidth for multivariate regression or nonlinear autoregressive processes of higher order.
- {crmin, crpro} = cafpe(y, truedat, xdataln, xdatadif, xdatastand, lagmax, searchmethod, dmax): local linear lag selection for the conditional mean function based on the asymptotic final prediction error (AFPE) or its corrected version (CAFPE), using default settings.
- {crmin, crpro, crstore, crstoreadd, hstore, hstoretest} = cafpefull(y, truedat, xresid, trueres, xdataln, xdatadif, xdatastand, lagmax, volat, searchmethod, dmax, selcrit, robden, perA, perB, startval, noutputf, outpath): local linear lag selection for the conditional mean or volatility function based on the asymptotic final prediction error (AFPE) or its corrected version (CAFPE).
- {mA, gsqA, denA, err} = fvllc(Xsj, Yorig, h, Xtj, kernreg, lorq, fandg, loo): estimates the multivariate regression function and its first or second direct derivatives using local linear or partial local quadratic regression with Gaussian kernel.
The example of the previous section showed that the bandwidth choice is very important for higher order autoregressive models. Equally important is the selection of the relevant lags. Both are discussed in this section. The presented procedures are based on Tschernig and Yang (2000).

We start with the problem of selecting the relevant lags. For this step it is necessary to specify a priori a set of possible lag vectors by choosing the maximal lag $M$. Denote the full lag vector containing all lags up to $M$ by $X_{t,M} = (Y_{t-1}, \ldots, Y_{t-M})^\top$. The lag selection task is now to eliminate from the full lag vector $X_{t,M}$ all lags that are redundant. Let us first state the assumptions that Tschernig and Yang (2000) require:
- (A1) For some $m \ge 1$ the vector process $X_t = (Y_{t-i_1}, \ldots, Y_{t-i_m})^\top$ is strictly stationary and $\beta$-mixing with $\beta(n) \le c_0\, n^{-(2+\delta)/\delta}$ for some $\delta > 0$, $c_0 > 0$.
- (A2) The stationary distribution of the process has a continuous density $\mu(x_M)$, $x_M \in \mathbb{R}^M$. Note that $\mu$ is used for denoting this density and all of its marginal densities.
- (A3) The function $f$ is twice continuously differentiable while $\sigma$ is continuous and positive on the support of $w$.
- (A4) The errors $\xi_t$ have a finite fourth moment $m_4 = E[\xi_t^4]$.
- (A5) The support of the weight function $w$ is compact with nonempty interior. The function $w$ is continuous, nonnegative, and $\mu(x_M) > 0$ for $x_M$ in the support of $w$.
- (A6) The kernel function $K$ is a symmetric probability density and the bandwidth $h$ is a positive number with $h \to 0$, $T h^m \to \infty$ as $T \to \infty$.
For the definition of $\beta$-mixing see Section 16.1.1 or Doukhan (1994). Conditions (A1) and (A2) can be checked using e.g. Doukhan (1994, Theorem 7 and Remark 7, pp. 102, 103). Further conditions can be found in Lu (1998).
For comparing the quality of competing lag specifications, one needs an appropriate measure of fit, as for example the final prediction error (FPE)

$$ FPE_a(h, i_1, \ldots, i_m) = E\left[\left(\breve Y_t - \widehat f_a(\breve X_t, h)\right)^2 w(\breve X_{t,M})\right], \quad a = 1, 2. \qquad (16.24) $$
In the definition of the $FPE_a$ the process $\{\breve Y_t\}$ is assumed to be independent of the process $\{Y_t\}$ but to have the same stochastic properties. If we now indicate the vector of lagged values of the data generating process by the superscript $*$ and assume its largest lag to be smaller than the chosen $M$, we can easily relate the definition of the FPE (16.24) to the MISE

$$ d_{a,M}(h, i_1, \ldots, i_m) = E\left[\int \left\{f(x^*) - \widehat f_a(x)\right\}^2 w(x_M)\, \mu(x_M)\, dx_M\right], \qquad (16.25) $$
which here extends (16.11) to functions with several lags. First note that $\breve Y_t - \widehat f_a(\breve X_t) = \sigma(\breve X^*_t)\, \breve\xi_t + f(\breve X^*_t) - \widehat f_a(\breve X_t)$. Using $E[\breve\xi_t] = 0$ one obtains the decomposition

$$ FPE_a(h, i_1, \ldots, i_m) = A + d_{a,M}(h, i_1, \ldots, i_m), \qquad (16.26) $$

where

$$ A = \int \sigma^2(x^*)\, w(x_M)\, \mu(x_M)\, dx_M \qquad (16.27) $$

denotes the mean variance or final prediction error for the true function $f$. Therefore, it follows from (16.26) that the FPE measures the sum of the mean variance and the MISE.
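A sketch of the argument behind (16.26): expanding the square in (16.24) and using that $\{\breve Y_t\}$ is independent of the estimation sample with $E[\breve\xi_t] = 0$ and $E[\breve\xi_t^2] = 1$, the cross term vanishes, so that

$$ E\left[\left(\sigma(\breve X^*_t)\breve\xi_t + f(\breve X^*_t) - \widehat f_a(\breve X_t)\right)^2 w(\breve X_{t,M})\right] = \underbrace{E\left[\sigma^2(\breve X^*_t)\, w(\breve X_{t,M})\right]}_{A} + \underbrace{E\left[\left\{f(\breve X^*_t) - \widehat f_a(\breve X_t)\right\}^2 w(\breve X_{t,M})\right]}_{d_{a,M}}. $$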
In the literature mainly two approaches have been suggested for estimating the unknown $FPE_a$ or variants thereof, namely cross-validation (Vieu; 1994), (Yao and Tong; 1994) or estimation of an asymptotic expression of the $FPE_a$ (Auestad and Tjøstheim; 1990), (Tjøstheim and Auestad; 1994), (Tschernig and Yang; 2000). Given Assumptions (A1) to (A6), Tschernig and Yang (2000, Theorem 2.1) showed that for the local constant estimator, $a = 1$, and the local linear estimator, $a = 2$, one has

$$ FPE_a(h, i_1, \ldots, i_m) = AFPE_a(h, i_1, \ldots, i_m) + o\left\{h^4 + (T h^m)^{-1}\right\}, $$

where

$$ AFPE_a(h, i_1, \ldots, i_m) = A + b(h)\, B + c(h)\, C_a \qquad (16.28) $$

denotes the asymptotic final prediction error.
The terms $b(h)B$ and $c(h)C_a$ denote the expected variance and squared bias of the estimator, respectively, with the constants

$$ B = \int \sigma^2(x)\, \frac{\mu(x_M)}{\mu(x)}\, w(x_M)\, dx_M, \qquad (16.29) $$

$$ C_a = \int \left\{\frac{2\, b_a(x)}{\sigma_K^2}\right\}^2 w(x_M)\, \mu(x_M)\, dx_M, \qquad (16.30) $$

and the variable terms

$$ b(h) = \frac{\|K\|_2^{2m}}{T h^m}, \qquad c(h) = \frac{h^4 \sigma_K^4}{4}, \qquad (16.31) $$

with $\|K\|_2^2 = \int K^2(u)\, du$ and $\sigma_K^2 = \int u^2 K(u)\, du$. The sum of the expected variance and squared bias of the estimator just represents the asymptotic mean squared error.
Note that if the vector of correct lags is included in $X_t$, then $AFPE_a$ tends to $A$, as both $b(h)B$ and $c(h)C_a$ tend to zero.
From (16.28) it is possible to determine the asymptotically optimal bandwidth $h_{opt}$ by minimizing the asymptotic MISE, i.e. by solving the variance-bias tradeoff between $b(h)B$ and $c(h)C_a$. The asymptotically optimal bandwidth is given by

$$ h_{opt} = \left\{\frac{m\, \|K\|_2^{2m}\, B}{T\, \sigma_K^4\, C_a}\right\}^{1/(m+4)}. \qquad (16.32) $$
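As a check, (16.32) follows directly from the first-order condition of minimizing $b(h)B + c(h)C_a$ with respect to $h$:

$$ \frac{d}{dh}\left\{\frac{\|K\|_2^{2m} B}{T h^m} + \frac{h^4 \sigma_K^4 C_a}{4}\right\} = -\frac{m \|K\|_2^{2m} B}{T h^{m+1}} + h^3 \sigma_K^4 C_a = 0 \quad\Longleftrightarrow\quad h^{m+4} = \frac{m \|K\|_2^{2m} B}{T \sigma_K^4 C_a}. $$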
Note that for a finite asymptotically optimal bandwidth to exist one has to assume that

- (A7) $C_a$ defined in (16.30) is positive and finite.

This requirement implies that in case of local linear estimation a finite $h_{opt}$ does not exist for linear processes. This is because there is no approximation bias, and thus a larger bandwidth has no cost.
In order to obtain the plug-in bandwidth $\widehat h_{opt}$ one has to estimate the unknown constants $B$ and $C_a$. A local linear estimate of $B$ (16.29) is obtained from

$$ \widehat B(h_B) = \frac{1}{T} \sum_{t=i_m+1}^{T} \frac{\left\{Y_t - \widehat f_2(X_t, h_B)\right\}^2 w(X_{t,M})}{\widehat\mu(X_t, h_B)}, $$

where $\widehat\mu$ is the Gaussian kernel estimator (16.40) of the density $\mu$. For estimating $\widehat B$ one may use Silverman's (1986) rule-of-thumb bandwidth

$$ \widehat h_B = \widehat\sigma \left(\frac{4}{m+2}\right)^{1/(m+4)} T^{-1/(m+4)}, \qquad (16.33) $$

with $\widehat\sigma$ denoting the geometric mean of the standard deviations of the regressors.
For the local linear estimator (16.21), the constant (16.30) can be consistently estimated by

$$ \widehat{C}_2(h_C) = \frac{1}{T}\sum_{t=i_m+1}^{T} \left[\sum_{j=1}^{m} \widehat f^{(jj)}(X_t, h_C)\right]^2 w(X_{t,M}), \qquad (16.34) $$

where $f^{(jj)}$ denotes the second direct derivative of the function $f$ with respect to its $j$-th argument. It can be estimated using the partial local quadratic estimator

$$ \min_{c_0,\, c_1,\, c_2} \sum_{t=i_m+1}^{T} \left\{Y_t - c_0 - (X_t - x)^\top c_1 - \left((X_t - x) \odot (X_t - x)\right)^\top c_2\right\}^2 K_h(x - X_t), \qquad (16.35) $$

where $\odot$ denotes elementwise multiplication. The estimates of the direct second derivatives are then given by $\widehat f^{(jj)}(x, h_C) = 2\, \widehat c_{2,j}$, $j = 1, \ldots, m$. Excluding all cross terms has no asymptotic effects while keeping the increase in the `parameters' $c_0$, $c_1$, $c_2$ linear in the number of lags $m$.
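To see the computational gain of excluding the cross terms, compare the number of local `parameters' per estimation point: a full local quadratic fit requires $1 + m + m(m+1)/2$ coefficients, the partial version only $1 + 2m$. For $m = 6$ lags this amounts to 28 versus 13 coefficients.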
This approach is a simplification of the partial cubic estimator proposed by Yang and Tschernig (1999), who also showed that the rule-of-thumb bandwidth

$$ \widehat h_C = 2\, \widehat\sigma\, T^{-1/(m+6)} \qquad (16.36) $$

has the optimal rate. We note that for the estimation of the bias term $b_1$ of the Nadaraya-Watson estimator one additionally has to estimate the derivative of the density, as it occurs in (16.23). Therefore, we exclusively use the local linear estimator (16.21). The direct second derivatives can be estimated with the quantlet tp/cafpe/fvllc.
The plug-in bandwidth $\widehat h_{opt}$ is then given by

$$ \widehat h_{opt} = \left\{\frac{m\, \|K\|_2^{2m}\, \widehat B(\widehat h_B)}{T\, \sigma_K^4\, \widehat C_2(\widehat h_C)}\right\}^{1/(m+4)}. \qquad (16.37) $$
It now turns out that, when taking into account the estimation bias of $\widehat A$, the local linear estimator of the $AFPE$ (16.28) becomes

$$ \widehat{AFPE} = \widehat A(\widehat h_{opt}) + 2 K(0)^m\, \frac{\widehat B(\widehat h_B)}{T\, \widehat h_{opt}^m}, \qquad (16.38) $$

and the expected squared bias of estimation drops out. In practice, $h_{opt}$ is replaced by the plug-in bandwidth (16.37). Note that one can interpret the second term in (16.38) as a penalty term to punish overfitting, i.e. choosing superfluous lags. This penalty term decreases with sample size, since $\widehat h_{opt}$ is of order $T^{-1/(m+4)}$. The final prediction error for the true function, $A$ (16.27), is estimated by taking the sample average

$$ \widehat A(h) = \frac{1}{T} \sum_{t=i_m+1}^{T} \left\{Y_t - \widehat f_2(X_t, h)\right\}^2 w(X_{t,M}) $$

of the squared residuals from the local linear estimator (16.21). The asymptotic properties of the lag selection method rely on the fact that the argument of $w(\cdot)$ is the full lag vector $X_{t,M}$.
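To get a feeling for the size of the penalty in (16.38), recall that the CAFPE quantlets employ the Gaussian kernel, cf. fvllc and (16.40), for which $K(0) = (2\pi)^{-1/2}$. For $m = 2$ lags the penalty term equals

$$ 2 K(0)^2\, \frac{\widehat B}{T \widehat h_{opt}^2} = \frac{1}{\pi}\, \frac{\widehat B}{T \widehat h_{opt}^2} \approx \frac{0.318\, \widehat B}{T \widehat h_{opt}^2}, $$

which for fixed $\widehat B$ vanishes at rate $T^{-4/(m+4)} = T^{-2/3}$, since $\widehat h_{opt}$ is of order $T^{-1/(m+4)}$.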
In order to select the adequate lag vector, one computes (16.38) for all possible lag combinations with $i_m \le M$ and chooses the lag vector with the smallest $\widehat{AFPE}$. Given Assumptions (A1) to (A7) and a further technical condition, Tschernig and Yang (2000, Theorem 3.2) showed that this procedure is weakly consistent, i.e. the probability of choosing the correct lag vector, if it is included in the set of lags considered, approaches one with increasing sample size. This consistency result may look surprising since the linear FPE is known to be inconsistent. However, in the present case the rate of the penalty term in (16.38) depends on the number of lags $m$. Thus, if one includes superfluous lags in addition to the correct ones, the rate of the penalty term becomes slower, which implies that too large models are ruled out asymptotically. Note that this feature is intrinsic to the local estimation approach, since the number of lags influences the rate of convergence, see (16.22). We remark that the consistency result breaks down if Assumption (A7) is violated, e.g. if the stochastic process is linear. In this case overfitting (including superfluous lags in addition to the correct ones) is more likely. The breakdown of consistency can be avoided if one uses the Nadaraya-Watson instead of the local linear estimator, since the former is also biased in case of linear processes.
Furthermore, Tschernig and Yang (2000) show that asymptotically it is more likely to overfit than to underfit (miss some correct lags). In order to reduce overfitting and therefore increase correct fitting, they suggest to correct the AFPE and estimate the Corrected Asymptotic FPE

$$ CAFPE = \widehat{AFPE}\left\{1 + m\, T^{-4/(m+4)}\right\}. \qquad (16.39) $$

The correction does not affect consistency under the stated assumptions, while additional lags are punished more heavily in finite samples. One chooses the lag vector with the smallest $CAFPE$.
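For illustration, at $T = 100$ the correction factor in (16.39) equals $1 + 1 \cdot 100^{-4/5} \approx 1.03$ for $m = 1$, but $1 + 4 \cdot 100^{-1/2} = 1.4$ for $m = 4$: models with many lags are punished considerably harder in small samples.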
We note that if one allows the maximal lag $M$ to grow with sample size, then one has a doubled nonparametric problem of nonparametric function estimation and nonparametric lag selection.

The nonparametric lag selection criterion $CAFPE$ can be computed using the quantlet tp/cafpe/cafpe. The quantlet tp/cafpe/cafpefull also allows to use $\widehat{AFPE}$. Both are part of the third party quantlib tp/cafpe/cafpe, which contains various quantlets for lag and bandwidth selection for nonlinear autoregressive models (16.16). The quantlet tp/cafpe/cafpe is called as

{crmin, crpro} = cafpe(y, truedat, xdataln, xdatadif,
       xdatastand, lagmax, searchmethod, dmax)

with the input variables:
- y: $(T \times 1)$ matrix of the observed time series, or set to zero if truedat is used,
- truedat: character variable that contains path and name of an ascii data file if y = 0,
- xdataln: character variable where "yes" takes natural logs, "no" doesn't,
- xdatadif: character variable where "yes" takes first differences of the data, "no" doesn't,
- xdatastand: character variable where "yes" standardizes the data, "no" doesn't,
- lagmax: scalar variable, largest lag to be considered,
- searchmethod: character variable where "full" considers all possible lag combinations and "directed" does a directed search (recommended if lagmax is large),
- dmax: scalar variable with the maximum number of possible lags,
and output variables

- crmin: vector that stores in the first dmax columns the selected lag vector, in column dmax+1 the estimated $CAFPE$, in column dmax+2 the estimate $\widehat A$, and in column dmax+3 the bias corrected estimate of $A$, see TY (equation 3.3),
- crpro: ((dmax+1) $\times$ (dmax+6)) matrix that stores for each number of lags in the first dmax columns the selected lag vector, in column dmax+1 the plug-in bandwidth $\widehat h_{opt}$ for estimating the final prediction error for the true function and $CAFPE$, in column dmax+2 the bandwidth $\widehat h_B$ for estimating the constant $B$, which is used for computing $\widehat{AFPE}$ and the plug-in bandwidth $\widehat h_{opt}$, in column dmax+3 the bandwidth $\widehat h_C$ for estimating the constant $C$, which is used for computing the plug-in bandwidth $\widehat h_{opt}$, in column dmax+4 the estimated $CAFPE$, in column dmax+5 the estimate $\widehat A$, and in column dmax+6 the bias corrected estimate of $A$, see TY (equation 3.3).
Some comments may be appropriate. The weight function $w$ is the indicator function on the range of the observed data. If lagmax is large or the time series is long, then conducting a full search over all possible lag combinations may take extraordinarily long. In this case, one should use the directed search suggested by Tjøstheim and Auestad (1994): lags are added as long as they reduce the selection criterion, and one adds that lag from the remaining ones which delivers the largest reduction.
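A short count makes this concrete: a full search with lagmax = 6, as in the example below, evaluates all $2^6 - 1 = 63$ nonempty lag combinations, whereas a directed search evaluates at most $6 + 5 + \cdots + 1 = 21$ of them.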
For computing $\widehat A$, TY follow Tjøstheim and Auestad (1994) and implement two additional features for robustification. For estimating $\mu$ the kernel estimator

$$ \widehat\mu(x, h) = \frac{1}{T + i_1 - i_m} \sum_{t=i_m+1}^{T+i_1} \prod_{j=1}^{m} \frac{1}{h}\, \varphi\!\left(\frac{x_j - Y_{t-i_j}}{h}\right) \qquad (16.40) $$

is used, where $\varphi$ denotes the Gaussian kernel and the vectors $X_t$, $t = i_m+1, \ldots, T+i_1$, are all available from the observations $Y_t$, $t = 1, \ldots, T$. For example, $X_{T+i_1}$ is given by $(Y_T, Y_{T+i_1-i_2}, \ldots, Y_{T+i_1-i_m})^\top$. This robustification is switched off if the sum stops at $T$. Furthermore, 5% of those observations whose density values $\widehat\mu(X_t, h)$ are the lowest are screened off.
These features can easily be switched off or modified in the quantlet tp/cafpe/cafpefull. This quantlet also allows to select the lags of the conditional standard deviation $\sigma$ and is therefore discussed in detail in Subsection 16.2.4.
If one is only interested in computing the plug-in bandwidth $\widehat h_{opt}$, then one can directly use the quantlet tp/cafpe/hoptest. However, before it can be called, the time series has to be prepared accordingly, so it is easier to run the lag selection, which automatically delivers the plug-in bandwidth for the chosen lag vector as well. For the definition of its variables the reader is referred to the helpfile of tp/cafpe/hoptest.
We are now ready to run the quantlet tp/cafpe/cafpe on the lynx data set. The following quantlet conducts a full search among the first six lags:
pathcafpe = "tp/cafpe/" ; path of CAFPE quantlets
; load required quantlibs
library("xplore")
library("times")
func(pathcafpe + "cafpeload") ; load XploRe files of CAFPE
cafpeload(pathcafpe)
setenv("outheadline","") ; no header for each output file
setenv("outlineno","") ; no numbering of output lines
; set parameters
truedat = "lynx.dat" ; name of data file
y = 0
xdataln = "yes"; ; take logarithms
xdatadif = "no"; ; don't take first differences
xdatastand = "no"; ; don't standardize data
lagmax = 6 ; the largest lag considered is 6
searchmethod = "full" ; consider all possible lag comb.
dmax = 6 ; consider at most 6 lags
; conduct lag selection
{ crmin,crpro } = cafpe(y,truedat,xdataln,xdatadif,xdatastand,
lagmax,searchmethod,dmax)
"selected lag vector, estimated CAFPE "
crmin[,1:dmax+1]
"number of lags, chosen lag vector, estimated CAFPE,
plug-in bandwidth"
(0:dmax)~crpro[,1:dmax|(dmax+4)|(dmax+1)]
A screenshot of the output, which shows the criteria for all considered numbers of lags, is contained in Figure 16.11. The selected lags are 1 to 4, together with the corresponding plug-in bandwidth and estimated $CAFPE$. However, the largest decrease in the $CAFPE$ occurs if one allows for two lags instead of one and lag 2 is added. In this case, the $CAFPE$ drops from 0.64125 to 0.24936. Therefore, lag 2 seems to capture the autocorrelation in the residuals of the NAR(1) model which was estimated in Subsections 16.1.1 to 16.1.3. For this reason a NAR(2) model could be sufficient for the lynx data. Its graphical representation is discussed in the next section.
Figure 16.11: Results of the lag selection procedure using CAFPE for the lynx data
16.2.3 Plotting and Diagnostics
- {hplugin, hB, hC, xs, resid} = plotloclin(xdata, xresid, xdataln, xdatadif, xdatastand, volat, lags, h, xsconst, gridnum, gridmax, gridmin): computes a 1- or 2-dimensional plot of the regression function of a nonlinear autoregressive process for a given lag vector on the range of the data; if more than 2 lags are used, then only two lags are allowed to vary, the others have to be kept fixed.
Once the relevant lags and an appropriate bandwidth are determined, one would like to take a closer look at the implied conditional mean function as well as to check the residuals for potential model misspecification, as discussed in Subsection 16.1.3. The latter may be done by inspecting the autocorrelation function and testing the normality of the residuals. The quantlet tp/cafpe/plotloclin of the quantlib tp/cafpe/cafpe allows to do both. It generates two- or three-dimensional plots of the autoregression function on a grid that covers the range of the data and computes the residuals for the given time series. Both are done either with a bandwidth specified by the user or with the plug-in bandwidth $\widehat h_{opt}$, which is automatically computed if required. The quantlet tp/cafpe/plotloclin also allows to compute three-dimensional plots of functions with more than two lags by keeping the remaining lags fixed at user-selected values. It is called by

{hplugin,hB,hC,xs,resid} = plotloclin(xdata,xresid,xdataln,
       xdatadif,xdatastand,volat,lags,h,xsconst,
       gridnum,gridmax,gridmin)

with the input variables
- xdata: $(T \times 1)$ vector of the observed time series,
- xresid: vector of residuals or observations for plotting the conditional volatility function; if not needed, set xresid = 0,
- xdataln: character variable, "yes" takes natural logs, "no" doesn't,
- xdatadif: character variable, "yes" takes first differences of the data, "no" doesn't,
- xdatastand: character variable, "yes" standardizes the data, "no" doesn't,
- volat: character variable, "no" plots the conditional mean function, "resid" plots the conditional volatility function; in the latter case the residuals of fitting a conditional mean function have to be contained in xresid,
- lags: $(m \times 1)$ vector of lags,
- h: scalar bandwidth, for which a scalar plug-in bandwidth is computed using hoptest if it is set to zero, or $(m \times 1)$ vector bandwidth,
- xsconst: $(m \times 1)$ vector (only needed if $m > 2$) that indicates which lags vary and which are kept fixed: for those kept fixed, the entry in the corresponding row contains the value at which it is fixed; for those to be varied, the entry in the corresponding row is 1e-100,
- gridnum: scalar, number of grid points in one direction,
- gridmax: scalar, maximum of the grid,
- gridmin: scalar, minimum of the grid,
and output variables

- hplugin: scalar plug-in bandwidth $\widehat h_{opt}$ (16.37) or the chosen scalar or vector bandwidth,
- hB: scalar, rule-of-thumb bandwidth (16.33) for nonparametrically estimating the constant $B$ and for computing the plug-in bandwidth,
- hC: scalar, rule-of-thumb bandwidth (16.36) for nonparametrically estimating the constant $C$ for computing the plug-in bandwidth,
- xs: matrix with the lagged values of the time series which are used to compute the plug-in bandwidth and the residuals for potential diagnostics,
- resid: vector with the residuals after fitting a local linear regression at xs.
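For a model with a single lag the quantlet produces an ordinary two-dimensional plot. A minimal sketch for a NAR(1) fit of the logged lynx data, with the grid covering the range 4 to 9 as in the NAR(2) example below; the number of grid points is chosen for illustration:

pathcafpe = "tp/cafpe/" ; path of CAFPE quantlets
library("xplore")
library("times")
func(pathcafpe + "cafpeload")
cafpeload(pathcafpe)
lynx = read("lynx.dat")
lags = 1 ; single lag: two-dimensional plot
h = 0 ; compute the plug-in bandwidth internally
xsconst = 1e-100 ; the single lag is varied
{hplugin,hB,hC,xs,resid} = plotloclin(lynx,0,"yes","no","no",
       "no",lags,h,xsconst,100,9,4)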
Figure 16.12 shows the plot of the conditional mean function of an NAR(2) model for the lynx data on a grid covering all observations. The autocorrelation function of the residuals is shown in Figure 16.13. These graphs and a plot of the standardized residuals are computed with the following quantlet. It also returns the Jarque-Bera test statistic of 2.31 with a $p$-value of 0.32.
pathcafpe = "tp/cafpe/" ; path of CAFPE quantlets
; load required quantlibs
library("xplore")
library("times")
func("jarber")
func(pathcafpe + "cafpeload"); load XploRe files of CAFPE
cafpeload(pathcafpe)
setenv("outheadline","") ; no header for each output file
setenv("outlineno","") ; no numbering of output lines
; set parameters
lynx = read("lynx.dat")
xresid = 0
xdataln = "yes" ; take logarithms
xdatadif = "no" ; don't take first differences
xdatastand = "no" ; don't standardize data
volat = "no" ; plot cond. mean function
lags = 1|2 ; lag vector for regression function
h = 0 ; compute plug-in bandwidth
xsconst = 1e-100|1e-100 ; 1e-100 for the lags which are varied;
 ; for lags kept fixed it contains
 ; the chosen constant
gridnum = 30 ; number of gridpoints in one dir.
gridmax = 9 ; maximum of grid
gridmin = 4 ; minimum of grid
; compute opt. bandwidth and plot regression fct. for given lags
{ hplugin,hB,hC,xs,resid } = plotloclin(lynx,xresid,xdataln,
xdatadif,xdatastand,volat,lags,h,
xsconst,gridnum,gridmax,gridmin)
"plug-in bandwidth" hplugin
; diagnostics
acfplot(resid) ; compute and plot acf of residuals
{jb,probjb,sk,k} = jarber(resid,1)
; compute Jarque-Bera test for normality of residuals
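For reference, the statistic returned by jarber is the usual Jarque-Bera statistic, which combines the skewness $S$ and the kurtosis $K$ of the residuals,

$$ JB = T\left(\frac{S^2}{6} + \frac{(K-3)^2}{24}\right), $$

and is asymptotically $\chi^2$-distributed with two degrees of freedom under normality; hence the value 2.31 with $p$-value 0.32 gives no evidence against normal residuals.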
From inspecting Figure 16.13 one can conclude that a NAR(2) model captures most of the linear correlation structure. However, the autocorrelation at lags 3 and 4 is close to the boundaries of the confidence intervals of white noise, which explains why the CAFPE procedure suggests lags one to four. The regression surface in Figure 16.12 nicely shows the nonlinearity in the conditional mean function, which may be difficult to capture with standard parametric nonlinear models.
Figure 16.12: Plot of the conditional mean function of a NAR(2) model for the logged lynx data

Figure 16.13: Plot of the autocorrelation function of the residuals of a NAR(2) model for the logged lynx data
16.2.4 Estimation of the Conditional Volatility
So far we have considered the estimation and lag selection for the conditional mean function $f$. Finally, we turn our attention to modelling the function of the conditional standard deviation $\sigma$. The conditional standard deviation plays an important role in financial modelling, e.g. for computing option prices. As an example we consider the 300 logged observations dmus58-300 of a 20-minute spaced sample of the Deutschemark/US-Dollar exchange rate. Figures 16.14 and 16.15 display the logged observations and their first differences. The figures are generated with the quantlet
library("plot")
library("times")
setsize(640,480)
fx = read("dmus58-300.dat"); read data
d1 = createdisplay(1,1)
x1 = #(1:300)~fx
setmaskl (x1, (1:rows(x1))', 0, 1)
show(d1,1,1,x1) ; plot data
setgopt(d1,1,1,"title",
"20 min. spaced sample of DM/US-Dollar rate")
setgopt(d1,1,1,"xlabel","Periods","ylabel","levels")
d2 = createdisplay(1,1)
x2 = #(2:300)~tdiff(fx)
setmaskl (x2, (1:rows(x2))', 0, 1)
show(d2,1,1,x2) ; plot data
setgopt(d2,1,1,"title","20 min. spaced sample of
DM/US-Dollar rate - first differences")
setgopt(d2,1,1,"xlabel","Periods","ylabel","first differences")
Figure 16.14: Time series of the logarithm of a 20-minute spaced sample of the DM/US-Dollar rate

Figure 16.15: Time series of a 20-minute spaced sample of exchange rate returns
In the following we assume that the conditional mean function $f$ is known and subtracted from $Y_t$. Thus, we obtain $Z_t = Y_t - f(X_t) = \sigma(X_t)\,\xi_t$. After squaring (16.16) and rearranging we have

$$ Z_t^2 = \sigma^2(X_t) + \sigma^2(X_t)\left(\xi_t^2 - 1\right). \qquad (16.41) $$

Since $\sigma^2(X_t)(\xi_t^2 - 1)$ has expectation zero, the stochastic process (16.41) can be modelled with the methods described in Subsections 16.2.1 and 16.2.2 by simply replacing the dependent variable by the squares $Z_t^2$. However, we have to remark that the existence of the fourth moment $m_4 = E[\xi_t^4]$, see Assumption (A4), is a necessary condition for applying the $CAFPE$. Otherwise, the FPE cannot be finite. We note that if $f$ has to be estimated, the asymptotic properties of the $CAFPE$ are expected to remain the same. Therefore, it may be used in practice, however, after replacing $Z_t$ by the residuals $\widehat Z_t = Y_t - \widehat f(X_t)$. This is possible with the quantlet tp/cafpe/cafpefull, which extends the functionality of the quantlet tp/cafpe/cafpe and allows the user to change additional tuning parameters. The quantlet tp/cafpe/cafpefull is called by
{crmin,crpro,crstore,crstoreadd,hstore,hstoretest} =
cafpefull(y,truedat,xresid,trueres,xdataln,xdatadif,xdatastand,
lagmax,volat,searchmethod,dmax,selcrit,robden,perA,
perB,startval,noutputf,outpath)
and has input variables

- y: $(T \times 1)$ vector of the univariate time series,
- truedat: character variable that contains path and name of an ascii data file if y = 0,
- xresid: vector of residuals or observations for selecting the lags of the conditional volatility function; if not needed, set xresid = 0,
- trueres: character variable that contains path and name of an ascii file with residuals if xresid = 0,
- xdataln: character variable, "yes" takes natural logs, "no" doesn't,
- xdatadif: character variable, "yes" takes first differences of the data, "no" doesn't,
- xdatastand: character variable, "yes" standardizes the data, "no" doesn't,
- lagmax: scalar, largest lag to be considered,
- volat: character variable, "no" conducts lag selection for the conditional mean function, "resid" conducts lag selection for the conditional volatility function; in the latter case the residuals of fitting a conditional mean function have to be contained in xresid or a file name has to be given in trueres,
- searchmethod: character variable for determining the search method, "full" conducts a full search over all possible input variable combinations, "directed" does a directed search,
- dmax: scalar, maximal number of lags,
- selcrit: character variable to select the lag selection criterion, "lqafpe" estimates the asymptotic final prediction error $\widehat{AFPE}$ (16.38) using local linear estimation and the plug-in bandwidth $\widehat h_{opt}$ (16.37), "lqcafpe" estimates the corrected asymptotic final prediction error $CAFPE$ (16.39) using local linear estimation and the plug-in bandwidth $\widehat h_{opt}$ (16.37),
- robden: character variable, "yes" and "no" switch on and off the robustification in the density estimation (16.40),
- perA: scalar, parameter used for screening off a fraction of 0 $\le$ perA $\le$ 1 observations with the lowest density in computing $\widehat A$,
- perB: scalar, parameter like perA but for screening off a fraction of perB observations with the lowest density in computing $\widehat B$,
- startval: character variable to control the treatment of starting values, "different" uses for each lag vector as few starting values as necessary, "same" uses for each lag vector the same starting value, which is determined by the largest lag used in the lag selection quantlet tp/cafpe/xorigxe,
- noutputf: character variable, name of the output file,
- outpath: character variable, path for the output file.
The output variables are

- crmin: vector that stores in the first dmax rows the selected lag vector, in row dmax+1 the estimated criterion, in row dmax+2 the estimate $\widehat A$, and in row dmax+3 the bias corrected estimate of $A$,
- crpro: matrix that stores for each number of lags in the first dmax rows the selected lag vector, in row dmax+1 the plug-in bandwidth $\widehat h_{opt}$ for estimating $\widehat A$ and $\widehat{AFPE}$ or $CAFPE$, in row dmax+2 the bandwidth $\widehat h_B$ used for estimating the constant $B$, in row dmax+3 the bandwidth $\widehat h_C$ for estimating the constant $C$, in row dmax+4 the estimated criterion $\widehat{AFPE}$ or $CAFPE$, in row dmax+5 the estimate $\widehat A$, and in row dmax+6 the bias corrected estimate of $A$,
- crstore: matrix that stores the lag vector and criterion value for all lag combinations and bandwidth values considered; in the first dmax rows all considered lag vectors are stored, in row dmax+1 the estimated criterion for each lag vector is stored,
- crstoreadd: matrix that stores those criteria that are evaluated in passing for all lag combinations, where all values for one lag combination are stored in one column (see the program for details),
- hstore: row vector that stores the bandwidths used in computing (C)AFPE for each lag vector,
- hstoretest: matrix that stores for each lag vector in one column the plug-in bandwidth $\widehat h_{opt}$, $\widehat h_B$ and $\widehat h_C$.
The quantlet XAGflts12.xpl (for brevity not shown) conducts a lag selection for the conditional mean function of the exchange rate returns and finds lags 1 and 3. If you run the quantlet, you will obtain the XploRe warning ``quantlet fvllc: inversion in local linear estimator did not work because probably the bandwidth is too small''. This means that for one of the checked combinations of lags, one of the rule-of-thumb bandwidths or the plug-in bandwidth was too small, so that the matrix $Z^\top W Z$ in the local linear estimator (16.21) is near singular and the matrix inversion failed. In this case, the relevant bandwidth is doubled (at most 30 times) until the near singularity disappears.
Therefore, lag selection for the conditional volatility function is done by replacing the observations $Z_t$ in model (16.41) by the estimated residuals $\widehat Z_t$. The computations are carried out with the following quantlet, which also generates a plot of the conditional mean function on the range $[-0.0015, 0.0015]$, displayed in Figure 16.16, and plots the autocorrelation function of the residuals (not shown). The latter plot does not show significant autocorrelation.
pathcafpe = "tp/cafpe/" ; path of CAFPE quantlets
; load required quantlibs
library("xplore")
library("times")
func("jarber")
func(pathcafpe + "cafpeload") ;load XploRe files of CAFPE
cafpeload(pathcafpe)
; set output format
setenv("outheadline","") ; no header for each output file
setenv("outlineno","") ; no numbering of output lines
; load data
x = read("dmus58-300.dat") ; name of data file
y = tdiff(x) ; compute first differences
xresid = 0
truedat = "" ; name of potential data file
trueres = "" ; name of potential residuals file
xdataln = "no" ; don't take logarithms
xdatadif = "no" ; don't take first differences
xdatastand = "no" ; don't standardize data
lagmax = 6 ; the largest lag considered is 6
searchmethod = "full" ; consider all possible lag comb.
dmax = 6 ; consider at most 6 lags
volat = "no" ; plot cond. mean function
selcrit = "lqcafpe" ; use CAFPE with plug-in bandwidth
robden = "yes" ; robustify density estimation
perA = 0
perB = 0.05 ; screen off data with lowest density
startval = "different"
noutputf = "" ; name of output file
outpath = "test" ; path for output file
lags = 1|3 ; lag vector for regression function
h = 0
xsconst = 1e-100|1e-100 ; 1e-100 for the lags which are
 ; varied; for lags kept fixed it
 ; contains the chosen constant
gridnum = 30 ; number of gridpoints in one direction
gridmax = 0.0015 ; maximum of grid
gridmin = -0.0015 ; minimum of grid
; compute optimal bandwidth and plot cond. mean for given lags
{ hplugin,hB,hC,xs,resid } = plotloclin(y,xresid,xdataln,
xdatadif,xdatastand,volat,lags,h,xsconst,gridnum,
gridmax,gridmin)
"plug-in bandwidth for conditional mean" hplugin
; diagnostics
acfplot(resid); compute and plot acf of residuals
{jb,probjb,sk,k} = jarber(resid,1)
; compute Jarque-Bera test for normality of residuals
; conduct lag selection for cond. standard deviation
xresid = resid
volat = "resid" ; conduct lat selection for cond. vol.
{crmin,crpro,crstore,crstoreadd,hstore,hstoretest}
= cafpefull(y,truedat,xresid,trueres,xdataln,
xdatadif,xdatastand,lagmax,volat,
searchmethod,dmax,selcrit,robden,
perA,perB,startval,noutputf,outpath)
"Lag selection for cond. standard deviation using residuals"
"selected lag vector, estimated CAFPE "
crmin[,1:dmax+1]
"number of lags, chosen lag vector, estimated CAFPE,
plug-in bandwidth"
(0:dmax)~crpro[,1:dmax|(dmax+4)|(dmax+1)]
For the conditional standard deviation one obtains lags 2 and 6. Figures 16.17, 16.18 and 16.19 display the plot of the estimated conditional standard deviation $\widehat\sigma$, of the standardized residuals of the modified model (16.41), and of their autocorrelation function. The plots are generated with the following quantlet:
pathcafpe = "tp/cafpe/" ; path of CAFPE quantlets
; load required quantlets
library("xplore")
library("times")
func("jarber")
func(pathcafpe + "cafpeload"); load XploRe files of CAFPE
cafpeload(pathcafpe)
setenv("outheadline","") ; no header for each output file
setenv("outlineno","") ; no numbering of output lines
; set parameters
x = read("dmus58-300.dat");
y = tdiff(x)
xresid = 0
xdataln = "no" ; don't take logarithms
xdatadif = "no" ; don't take first differences
xdatastand= "no" ; don't standardize data
volat = "no" ; compute cond. standard deviation
lags = 1|3 ; lag vector for regression function
h = 0 ; compute plug-in bandwidths
xsconst = 1e-100|1e-100
 ; 1e-100 for the lags which are varied;
 ; for lags kept fixed it contains the
 ; chosen constant
gridnum = 30 ; number of gridpoints in one direction
gridmax = 0.0015 ; maximum of grid
gridmin = -0.0015 ; minimum of grid
; compute optimal bandwidth and plot cond. mean for given lags
{ hplugin,hB,hC,xs,resid } = plotloclin(y,xresid,xdataln,
xdatadif,xdatastand,volat,lags,h,xsconst,gridnum,
gridmax,gridmin)
"plug-in bandwidth for mean" hplugin
; compute plug-in bandwidth and
; plot cond. standard deviation for given lags
lags = 2|6 ; lags for cond. volatility
xresid = resid
volat = "resid"
gridmax = 0.0008 ; maximum of grid
gridmin = -0.0008 ; minimum of grid
{ hplugin,hB,hC,xs,resid } = plotloclin(y,xresid,xdataln,
xdatadif,xdatastand,volat,lags,h,xsconst,gridnum,
gridmax,gridmin)
"plug-in bandwidth for conditional volatility" hplugin
; diagnostics
acfplot(resid); compute and plot acf of residuals
{jb,probjb,sk,k} = jarber(resid,1)
; compute Jarque-Bera test for normality of residuals
The surface plot of the conditional standard deviation is computed on the range $[-0.0008, 0.0008]$ in order to avoid boundary effects. Inspecting the range of the standardized residuals in Figure 16.18 indicates that the analysis may be strongly influenced by outliers, which may also explain the extreme increase of the conditional standard deviation in one corner of Figure 16.17. Moreover, Figure 16.19 shows some significant autocorrelation in the residuals. One explanation for this finding could be the presence of long memory in the squared observations. This topic is treated in detail in Chapter 14. Therefore, one should continue to improve the current function estimates by excluding extreme observations and by using models that allow for many lags in the function of the conditional standard deviation, such as, for example, Yang, Härdle and Nielsen (1999).
Figure 16.16: Plot of the conditional mean function of a NAR model with lags 1 and 3 for the returns of the Deutschemark/US-Dollar exchange rate

Figure 16.17: Plot of the conditional standard deviation of a NAR model with lags 2 and 6 for the returns of the Deutschemark/US-Dollar exchange rate

Figure 16.18: Plot of the standardized residuals of the modified model (16.41)

Figure 16.19: Plot of the autocorrelation function of the residuals of the modified model (16.41)