16.1 Nonlinear Autoregressive Models of Order One


16.1.1 Estimation of the Conditional Mean


mh = regxest(x{, h, K, v})
computes the univariate conditional mean function using the Nadaraya-Watson estimator
mh = regest(x{, h, K, v})
computes the univariate conditional mean function using the Nadaraya-Watson estimator and WARPing
mh = lpregxest(x{, h, p, v})
computes the univariate conditional mean function using local polynomial estimation
mh = lpregest(x{, h, p, K, d})
computes the univariate conditional mean function using local polynomial estimation and WARPing

Let us turn to estimating the conditional mean function $ f(\cdot)$ of a nonlinear autoregressive process of order one (NAR(1) process)

$\displaystyle Y_t = f(Y_{t-1}) + \sigma(Y_{t-1})\xi_t$ (16.2)

using nonparametric techniques. The basic idea is to estimate a Taylor approximation of order $ p$ of the unknown function $ f(\cdot)$ around a given point $ y$. The simplest Taylor approximation is obtained if its order $ p$ is chosen to be zero; one then approximates the unknown function by a constant. Of course, this approximation may turn out to be very bad if one includes observations $ Y_{t-1}$ that are distant from $ y$, since this might introduce a large approximation bias. One therefore gives those observations less weight in the estimation. Using the least squares principle, the estimated function value $ \widehat{f}(y,h)$ is provided by the estimated constant $ \widehat{c}_0$ of a local constant fit around $ y$

$\displaystyle \widehat{c}_{0}=\textrm{arg min}_{\left\{ c_{0}\right\} } \sum_{t=2}^{T}\left\{ Y_{t}-c_{0}\right\} ^{2}K_{h}({Y}_{t-1}-{y}),$ (16.3)

where $ K$ denotes the weighting function, which is commonly called a kernel function, and $ K_{h}({Y}_{t-1}-{y})=h^{-1} K\left\{ (Y_{t-1}-y)/h\right\}$. A number of kernel functions are used in practice, e.g. the Gaussian density function or the quartic kernel $ K(u) = \frac{15}{16}(1-u^2)^2$ on $ [-1,1]$ and $ K(u)=0$ elsewhere. Solving the first-order condition of (16.3) for $ c_0$ shows that $ \widehat{f}({y},h)=\widehat c_0$, known as the Nadaraya-Watson or local constant function estimator, can be written as

$\displaystyle \widehat{f}(y,h) = \frac{\sum_{t=2}^T K_h(Y_{t-1}-y) Y_t} {\sum_{t=2}^T K_h(Y_{t-1}-y)}.$ (16.4)

The parameter $ h$ is called the bandwidth and controls the weighting of the lagged variables $ Y_{t-1}$ with respect to their distance from $ y$. Choosing $ h$ too small, so that only few observations enter the estimate, leads to a large estimation variance, while taking $ h$ too large implies a large approximation bias. Methods for bandwidth selection are presented in Subsection 16.1.2.
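
To make the mechanics of (16.4) concrete, the following minimal Python sketch (outside XploRe, assuming only numpy; the function name nw_estimate and the data file location are hypothetical) computes the Nadaraya-Watson estimate with the quartic kernel and the crude 20%-of-range bandwidth used later in this section:

  import numpy as np

  def quartic(u):
      # quartic kernel K(u) = 15/16 (1 - u^2)^2 on [-1,1], zero elsewhere
      return np.where(np.abs(u) <= 1, 15 / 16 * (1 - u ** 2) ** 2, 0.0)

  def nw_estimate(y_lag, y, grid, h):
      # Nadaraya-Watson estimator (16.4): kernel-weighted average of Y_t
      u = (y_lag[None, :] - grid[:, None]) / h   # (m, T-1) scaled distances
      w = quartic(u) / h                         # weights K_h(Y_{t-1} - y)
      return w @ y / w.sum(axis=1)

  # lagged pairs (Y_{t-1}, Y_t) from a series x, crude 20% rule for h
  x    = np.log(np.loadtxt("lynx.dat"))          # hypothetical file location
  grid = np.sort(x[:-1])
  fhat = nw_estimate(x[:-1], x[1:], grid, h=0.2 * np.ptp(x[:-1]))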

Before applying Nadaraya-Watson estimation one should be aware of the conditions that the underlying data generating mechanism has to fulfil for the estimator to have nice asymptotic properties: most importantly, the function $ f(\cdot)$ has to be continuous, the stochastic process has to be stationary, and the dependence among the observations must decline fast enough as the distance between them increases. For measuring dependence in nonlinear time series one commonly uses various mixing concepts. For example, a sequence is said to be $ \alpha$-mixing (strong mixing) (Robinson, 1983) if

$\displaystyle \sup_{A\in {\cal F}_1^n,\, B\in {\cal F}_{n+k}^\infty} \vert P(A\cap B)-P(A)P(B)\vert \leq \alpha_k, $

where $ \alpha_k\rightarrow 0$ and $ {\cal F}_i^j$ is the $ \sigma$-field generated by $ X_i,\dots, X_j$. An alternative and stronger condition is given by the $ \beta$-mixing condition (absolute regularity)

$\displaystyle E\sup \left\{ \left\vert P(B\vert A)-P(B)\right\vert\right\} \leq \beta_k $

for any $ A\in {\cal F}_1^n$ and $ B\in {\cal F}_{n+k}^\infty $. An even stronger condition is the $ \phi$-mixing (uniform mixing) condition (Billingsley, 1968) where

$\displaystyle \vert P(A\cap B)-P(A)P(B)\vert\leq \phi_kP(A) $

for any $ A\in {\cal F}_1^n$ and $ B\in {\cal F}_{n+k}^\infty $, where $ \phi_k$ tends to zero as $ k \rightarrow \infty$. The rate at which $ \alpha_k$, $ \beta_k$ or $ \phi_k$ tends to zero plays an important role in deriving asymptotic properties of the nonparametric smoothing procedures. We note that these conditions are in general difficult to check. However, if the process follows a stationary Markov chain, then geometric ergodicity implies absolute regularity, which in turn implies the strong mixing condition. Techniques exist for checking geometric ergodicity, see e.g. Doukhan (1994) or Lu (1998). Further and more detailed conditions will be discussed in Subsection 16.2.2.

The quantlet regxest allows one to compute Nadaraya-Watson estimates of $ f(\cdot)$ for an array of different $ y$'s. Its syntax is

  mh = regxest(x{, h, K, v})
with the input variables
x
$ (T-1) \times 2$ matrix, in the first column the independent, in the second column the dependent variable,
h
scalar, bandwidth; if not given, 20% of the range of the values in the first column of x is used,
K
string, kernel function on [-1,1] or Gaussian kernel "gau"; if not given, the quartic kernel "qua" is used,
v
$ m \times 1$ vector of values of the independent variable on which to compute the regression; if not given, x is used.
This quantlet returns a $ (T-1) \times 2$ or $ m \times 2$ matrix mh, where the first column is the sorted first column of x or the sorted v, and the second column contains the regression estimate on the values of the first column.

In order to illustrate the methods presented in this chapter, we model the dynamics underlying the famous annual Canadian lynx trappings in 1821-1934, see e.g. Brockwell and Davis (1991, Appendix, Series G). Figures 16.1 and 16.2, showing the original and the logged time series, are obtained with the quantlet

  library("plot")
  setsize(640,480)
  lynx        = read("lynx.dat")  ; read data
  d1          = createdisplay(1,1)
  x1          = #(1821:1934)~lynx
  setmaskl (x1, (1:rows(x1))', 0, 1)
  show(d1,1,1,x1)                 ; plot data
  setgopt(d1,1,1,"title","Annual Canadian Lynx
                                  Trappings, 1821-1934")
  setgopt(d1,1,1,"xlabel","Years","ylabel","Lynx")
  d2          = createdisplay(1,1)
  x2          = #(1821:1934)~log(lynx)
  setmaskl (x2, (1:rows(x2))', 0, 1)
  show(d2,1,1,x2)                 ; plot data
  setgopt(d2,1,1,"title","Logs of Annual Canadian
                                  Lynx Trappings, 1821-1934")
  setgopt(d2,1,1,"xlabel","Years","ylabel","Lynx")
XAGflts01.xpl

Their inspection indicates that taking logarithms is required to make the time series look stationary.

Figure 16.1: Time series of annual Canadian Lynx Trappings, 1821-1934

Figure 16.2: Time series of logarithm of annual Canadian Lynx Trappings, 1821-1934

The following quantlet reads the lynx data set, constructs the vectors of the dependent and lagged variables, computes the Nadaraya-Watson estimator, and plots the resulting function estimate together with the scatter plot, as displayed in Figure 16.3. For selecting the bandwidth we use here the primitive rule of taking one fifth of the data range.
  library("smoother")
  library("plot")
  setsize(640,480)
;                       data preparation
  lynx      = read("lynx.dat")
  lynxrows  = rows(lynx)
  lag1      = lynx[1:lynxrows-1]    ; vector of first lag
  y         = lynx[2:lynxrows]      ; vector of dep. var.
  data      = lag1~y
  data      = log(data)
;                       estimation
  h         = 0.2*(max(data[,1])-min(data[,1])); crude bandwidth
  "Bandwidth used" h
  mh        = regxest(data,h)      ; N-W estimation
;                       graphics
  mh        = setmask(mh,"line","blue")
  xy        = setmask(data,"cross","small")
  plot(xy,mh)
  setgopt(plotdisplay,1,1,"title","Estimated NAR(1) 
                                                 mean function")
  setgopt(plotdisplay,1,1,"xlabel","First Lag","ylabel","Lynx")
XAGflts02.xpl

Figure 16.3: Nadaraya-Watson estimates of NAR(1) mean function for lynx data and scatter plot

For long time series the computation of the Nadaraya-Watson estimates may become quite slow, since there are more points at which to estimate the function and each estimation involves more data. In this case one may use the WARPing (weighted averaging of rounded points) technique. The basic idea is the ``binning'' of the data in bins of length $ d$. Each observation is then replaced by the bincenter of the corresponding bin, which means that each point is rounded to the precision given by $ d$. A typical choice for $ d$ is $ h/5$ or $ (\max Y_{t-1}-\min Y_{t-1})/100$. In the latter case, the effective sample size $ r$, i.e. the number of nonempty bins, is at most 101. If WARPing is desired, just call the quantlet regest which has the same parameters as the quantlet regxest.
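
The binning step itself is only a rounding operation; the following Python sketch illustrates the idea (an illustration only, not the quantlets' actual implementation):

  import numpy as np

  def bin_data(y, d):
      # replace each observation by the center of its bin of width d
      centers = (np.floor(y / d) + 0.5) * d
      grid, counts = np.unique(centers, return_counts=True)
      return centers, grid, counts

  y = np.random.randn(10000)
  d = np.ptp(y) / 100                    # d = (max Y - min Y)/100
  centers, grid, counts = bin_data(y, d)
  print(len(grid))                       # effective sample size r, at most 101

All subsequent kernel evaluations then run over the at most $ r$ distinct bin centers, with the counts as weights, rather than over all $ T$ observations, which is where the speed-up comes from.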

While the Nadaraya-Watson function estimate is simple to compute, it may suffer from a substantial estimation bias due to the zero order Taylor expansion. It therefore seems natural to increase the order $ p$ of the expansion. For example, selecting $ p = 1$ yields the local linear estimator, which corresponds to the following weighted minimization problem

$\displaystyle \{\widehat{c}_{0},\widehat{c}_1\}=\textrm{arg min}_{\left\{ c_{0},c_1\right\} } \sum_{t=2}^{T}\left\{ Y_{t}-c_{0}-c_1({Y}_{t-1}-{y})\right\} ^{2}K_{h}({Y}_{t-1}-{y}),$ (16.5)

where the estimated function value $ \widehat{f}(y,h)$ is provided as before by the estimated constant $ \widehat{c}_0$. In a similar way one obtains the local quadratic estimator by choosing $ p=2$. The quantlet lpregxest allows one to compute local linear or local quadratic function estimates using the quartic kernel. Its syntax is
  mh = lpregxest(x{, h, p, v})
where the inputs are:
x
$ (T-1) \times 2$ matrix, in the first column the independent, in the second column the dependent variable,
h
scalar, bandwidth; if not given, the rule-of-thumb bandwidth computed by the quantlet lpregrot is used,
p
integer, order of polynomial: p=0 yields the Nadaraya-Watson estimator, p=1 yields local linear estimation (the default), p=2 (local quadratic) is the highest possible order,
v
$ m \times 1$ vector of values of the independent variable on which to compute the regression; if not given, x is used.
The output is
mh
$ (T-1) \times 2$ or $ m \times 2$ matrix, where the first column is the sorted first column of x or the sorted v, and the second column contains the regression estimate on the values of the first column.
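
Under the hood, (16.5) is an ordinary weighted least squares problem solved at each point $ y$; here is a minimal Python sketch (numpy only, the name lp_estimate is hypothetical) of local polynomial estimation of order p:

  import numpy as np

  def lp_estimate(y_lag, y, grid, h, p=1):
      # local polynomial fit: solve the weighted LS problem (16.5) at each y0
      fhat = np.empty(len(grid))
      for i, y0 in enumerate(grid):
          u = (y_lag - y0) / h
          w = np.where(np.abs(u) <= 1, 15 / 16 * (1 - u ** 2) ** 2, 0.0)
          X = np.vander(y_lag - y0, p + 1, increasing=True)  # 1, (Y-y0), ...
          sw = np.sqrt(w)                                    # sqrt-weights
          c = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)[0]
          fhat[i] = c[0]                                     # c0 estimates f(y0)
      return fhat

Multiplying both the design matrix and the responses by the square root of the kernel weights turns the weighted problem into an unweighted least squares problem, so a standard solver can be used.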

The following quantlet visualizes the difference between local constant and local linear estimation of the first order nonlinear autoregressive mean function for the lynx data. It produces Figure 16.4, where the solid and dashed lines display the local linear and local constant estimates, respectively. One notices that the local linear function estimate shows less variation.

  library("smoother")
  library("plot")
  setsize(640,480)
;                       data preparation
  lynx      = read("lynx.dat")
  lynxrows  = rows(lynx)
  lag1      = lynx[1:lynxrows-1]    ; vector of first lag
  y         = lynx[2:lynxrows]      ; vector of dep. var.
  data      = lag1~y
  data      = log(data)
;                       estimation
  h         = 0.2*(max(data[,1])-min(data[,1])); crude bandwidth
  mh        = regxest(data,h)       ; N-W estimation
  mhlp      = lpregxest(data,h)     ; local linear estimation
;                       graphics
  mh        = setmask(mh,"line","blue","dashed")
  mhlp      = setmask(mhlp,"line","red")
  xy        = setmask(data,"cross","small")
  plot(xy,mh,mhlp)
  setgopt(plotdisplay,1,1,"title","Estimated NAR(1) 
                                                mean function")
  setgopt(plotdisplay,1,1,"xlabel","First Lag","ylabel","Lynx")

XAGflts03.xpl

Figure 16.4: Local linear estimates (solid line) and Nadaraya-Watson estimates (dashed line) of NAR(1) mean function for lynx data and scatter plot

Like Nadaraya-Watson estimation, local linear estimation may become slow for long time series. In this case, one may use the quantlet lpregest which applies the WARPing technique.


16.1.2 Bandwidth Selection


{hcrit, crit} = regxbwsel(x{, h, K})
interactive tool for bandwidth selection in univariate kernel regression estimation.
{hcrit, crit} = regbwsel(x{, h, K, d})
interactive tool for bandwidth selection in univariate kernel regression estimation using the WARPing method.

So far we have used a primitive way of selecting the bandwidth parameter $ h$. Of course, there are better methods for bandwidth choice. They are all based on minimizing some estimated distance measure. Since we are interested in one bandwidth for various $ y$, we look at ``global'' distances such as the integrated squared error (ISE)

$\displaystyle d_I(h) = \int \left\{ f(y)-\widehat{f}(y,h)\right\}^2 w(y)\mu(y)dy.$ (16.6)

Here $ \mu(\cdot)$ denotes the density of the stationary distribution and $ w(\cdot)$ is a weight function with compact support. Note that the bandwidth which minimizes the ISE $ d_I(h)$ in general varies from sample to sample. In practice, one may want to avoid the integration and consider an approximation of the ISE, namely the average squared error (ASE)

$\displaystyle d_A(h) = \frac{1}{T-1}\sum_{t=2}^T \left\{ f(Y_{t-1})-\widehat{f}(Y_{t-1},h)\right\}^2 w(Y_{t-1}).$ (16.7)

Since the measure of accuracy $ d_A(h)$ involves the unknown autoregression function $ f(\cdot)$, it cannot be used directly. Instead, one may replace $ f(Y_{t-1})$ by the observable $ Y_t$. One then obtains the average squared error of prediction (ASEP)

$\displaystyle d_{AP}(h) = \frac{1}{T-1}\sum_{t=2}^T \left\{ Y_t-\widehat{f}(Y_{t-1},h)\right\}^2 w(Y_{t-1}).$ (16.8)

This, however, introduces a new problem: $ d_{AP}(h)$ can be driven to zero by choosing $ h$ small enough. To see this, consider the Nadaraya-Watson estimator (16.4) and imagine that the bandwidth $ h$ is chosen so small that (16.4) becomes $ \widehat{f}(Y_{t-1},h)=Y_t$. This implies $ d_{AP}(h)=0$. This problem can easily be solved by always leaving out $ Y_t$ in computing (16.4), which leads to

$\displaystyle \widehat{f}_{-t}(y,h) = \frac{\sum_{i=2,i\neq t}^T K_h(Y_{i-1}-y) Y_i} {\sum_{i=2,i\neq t}^T K_h(Y_{i-1}-y)}$ (16.9)

and is called the leave-one-out cross-validation estimate of the autoregression function. One therefore estimates $ d_{AP}(h)$ with the cross-validation function

$\displaystyle CV(h) = \frac{1}{T-1}\sum_{t=2}^T \left\{ Y_t-\widehat{f}_{-t}(Y_{t-1},h)\right\}^2 w(Y_{t-1}).$ (16.10)

Let $ \widehat{h}$ be the bandwidth that minimizes $ CV(h)$. Härdle (1990) and Härdle and Vieu (1992) proved that under an $ \alpha$-mixing condition,

$\displaystyle \frac{d_{A}(\widehat{h})}{\inf_h d_A(h)}\rightarrow 1\quad \textrm{in probability}. $
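
A minimal Python sketch of the cross-validation bandwidth search (grid search over (16.10) with the leave-one-out Nadaraya-Watson estimator (16.9), quartic kernel, weight function $ w(\cdot)\equiv 1$; all names hypothetical):

  import numpy as np

  def cv_bandwidth(y_lag, y, h_grid):
      # return the bandwidth on h_grid that minimizes CV(h) in (16.10)
      best_h, best_cv = None, np.inf
      for h in h_grid:
          u = (y_lag[None, :] - y_lag[:, None]) / h
          w = np.where(np.abs(u) <= 1, 15 / 16 * (1 - u ** 2) ** 2, 0.0)
          np.fill_diagonal(w, 0.0)           # leave out Y_t, cf. (16.9)
          denom = w.sum(axis=1)
          fhat = w @ y / np.where(denom > 0, denom, np.nan)
          cv = np.nanmean((y - fhat) ** 2)   # CV(h); empty windows skipped
          if cv < best_cv:
              best_h, best_cv = h, cv
      return best_h

  # with x as in the Nadaraya-Watson sketch of Subsection 16.1.1:
  # h_cv = cv_bandwidth(x[:-1], x[1:], np.linspace(0.3, 3.0, 28))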

The interactive quantlet regxbwsel offers cross-validation and other bandwidth selection methods; the other methods are intended for independent data. It is called by

  {hcrit, crit} = regxbwsel(x{, h, K})
with the input variables:
x
$ (T-1) \times 2$ matrix of the data,
h
$ m \times 1$ vector of bandwidths,
K
string, kernel function on $ [-1,1]$ e.g. quartic kernel "qua" (default) or Gaussian kernel "gau".
The output variables are:
hcrit
$ p \times 1$ vector, bandwidths selected by the different criteria,
crit
$ p \times 1$ string vector, criteria considered for bandwidth selection.
If one wants to use WARPing, one has to use the quantlet regbwsel. Using the following quantlet one may estimate the cross-validation bandwidth for the lynx data set; one obtains $ \widehat h=1.12085$.
  library("smoother")
  library("plot")
  setsize(640,480)
;                       data preparation
  lynx      = read("lynx.dat")
  lynxrows  = rows(lynx)
  lag1      = lynx[1:lynxrows-1]            ; vector of first lag
  y         = lynx[2:lynxrows]              ; vector of dep. var.
  data      = lag1~y
  data      = log(data)
;
  tmp       = regxbwsel(data)
XAGflts04.xpl

It was already noted that the optimal bandwidth with respect to the ISE (16.6) or the ASE (16.7) may vary across samples. In order to obtain a sample-independent optimal bandwidth one may consider the mean integrated squared error (MISE)

$\displaystyle d_M(h) = E\left[\int \left\{ f(y)-\widehat{f}(y,h)\right\}^2 w(y)\mu(y)dy\right].$ (16.11)

Like $ d_I(h)$ or $ d_A(h)$, it cannot be used directly. It is, however, possible to derive the asymptotic expansion of $ d_M(h)$. This allows one to obtain an explicit formula for the asymptotically optimal bandwidth $ h_{opt}$ which, however, contains unknown constants. In Subsection 16.2.2 we show how one can estimate these unknown quantities in order to obtain a plug-in bandwidth $ \widehat{h}_{opt}$.


16.1.3 Diagnostics


acfplot(x)
generates a plot of the autocorrelation function of the time series contained in vector x.
{jb, probjb, sk, k} = jarber(x, 1)
checks for normality of the data contained in vector x using the Jarque-Bera test.

It is well known that if a fitted model is misspecified, the resulting inference, e.g. confidence intervals or significance tests, can be misleading. One way to check whether a chosen model is correctly specified is to investigate the resulting residuals. Most importantly, one checks for autocorrelation remaining in the residuals. This can easily be done by inspecting the graph of the autocorrelation function using the quantlet acfplot. It only requires the $ (T-1) \times 1$ vector x with the estimated residuals as input variable. The quantlet also draws 95% confidence intervals for the case of no autocorrelation.
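
For reference, the sample autocorrelations and the approximate 95% bounds $ \pm 1.96/\sqrt{T}$ drawn in such plots can be computed directly; a minimal Python sketch (numpy only, the name acf is hypothetical):

  import numpy as np

  def acf(eps, max_lag=20):
      # sample autocorrelation function of the residuals
      eps = eps - eps.mean()
      c0 = np.sum(eps ** 2)
      return np.array([np.sum(eps[k:] * eps[:-k]) / c0
                       for k in range(1, max_lag + 1)])

  eps  = np.random.randn(113)            # placeholder residuals
  rho  = acf(eps)
  band = 1.96 / np.sqrt(len(eps))        # 95% bounds under white noise
  print(np.abs(rho) > band)              # lags with significant autocorrelation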

Another issue is to check the normality of the residuals. This is commonly done using the test suggested by Bera and Jarque (1982), commonly called the JB-test. It can be computed with the quantlet jarber which is called by

  {jb, probjb, sk, k} = jarber(resid, printout)
with input variables
resid
$ (T-1) \times 1$ matrix of residuals,
printout
scalar, 0 no printout, 1 printout,
and output variables
jb
scalar, test statistic of Jarque-Bera test,
probjb
scalar, probability value ($ p$-value) of the test statistic,
sk
scalar, skewness,
k
scalar, kurtosis.
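
The JB statistic combines the sample skewness and kurtosis in the standard way, $ JB = \frac{T}{6}\left\{ sk^2 + (k-3)^2/4\right\}$, and is asymptotically $ \chi^2$ distributed with two degrees of freedom under normality. A minimal Python sketch (assuming scipy for the $ p$-value):

  import numpy as np
  from scipy.stats import chi2

  def jarque_bera(resid):
      # JB = T/6 * (sk^2 + (k - 3)^2 / 4), asymptotically chi-squared(2)
      T = len(resid)
      e = resid - resid.mean()
      s2 = np.mean(e ** 2)
      sk = np.mean(e ** 3) / s2 ** 1.5     # skewness
      k = np.mean(e ** 4) / s2 ** 2        # kurtosis
      jb = T / 6 * (sk ** 2 + (k - 3) ** 2 / 4)
      return jb, 1 - chi2.cdf(jb, df=2), sk, k
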
In the following quantlet these diagnostics are applied to the residuals of the NAR(1) model fitted to the lynx data using the Nadaraya-Watson estimator (16.4) with the cross-validation bandwidth $ \widehat h=1.12085$.
;               load required quantlets
  library("smoother")
  library("plot")
  func("acfplot")
  func("jarber")
  setsize(640,480)
;               data preparation
  lynx      = read("lynx.dat")
  lynxrows  = rows(lynx)
  lag1      = lynx[1:lynxrows-1]        ; vector of first lag
  y         = lynx[2:lynxrows]          ; vector of dep. var.
  data      = lag1~y
  data      = log(data)
  datain    = data~#(1:lynxrows-1)      ; add index to data
  dataso    = sort(datain,1)            ; sorted data
;               estimation
  h         = 1.12085               ; Cross-validation bandwidth
  mhlp      = regxest(dataso[,1|2],h)   
                                    ; local constant estimation
;               graphics
  mhlp      = setmask(mhlp,"line","red")
  xy        = setmask(data,"cross","small")
  plot(xy,mhlp)
  setgopt(plotdisplay,1,1,"title",
                                "Estimated NAR(1) mean function")
  setgopt(plotdisplay,1,1,"xlabel","First Lag","ylabel","Lynx")
;               diagnostics
  yhatso    = mhlp.data[,2]~dataso[,3]  ; sorted est. fct. values
  yhat      = sort(yhatso,2)            ; undo sorting
  eps       = data[,2] - yhat[,1]       ; compute residuals
  acfplot(eps)            ; plot autocorrelation function of res.
  setgopt(dacf,1,1,"title","Autocorrelation function of NAR(1)
                                                      residuals")
;
  {jb,probjb,sk,k} = jarber(eps,1)
        ; compute Jarque-Bera test for normality of residuals

XAGflts05.xpl

The plot of the resulting autocorrelation function of the residuals is shown in Figure 16.5. It clearly shows that the residuals are not white noise. This indicates that one should use a higher order nonlinear autoregressive process for modelling the dynamics of the lynx data, which will be discussed in Section 16.2. Moreover, normality is rejected even at the 1% significance level, since the JB-test statistic is 11.779, which implies a $ p$-value of 0.003.

Figure 16.5: Autocorrelation function of estimated residuals based on a NAR(1) model for the lynx data


16.1.4 Confidence Intervals


{mh, clo, cup} = regxci(x{, h, alpha, K, xv})
computes pointwise confidence intervals with prespecified confidence level for univariate regression using the Nadaraya-Watson estimator.
{mh, clo, cup} = regci(x{, h, alpha, K, d})
computes pointwise confidence intervals with prespecified confidence level for univariate regression using the Nadaraya-Watson estimator. The computation uses WARPing.

Once one has selected the bandwidth and checked the residuals, one often wants to assess the variance of the estimated autoregression function. Under appropriate conditions, the variance of both the Nadaraya-Watson and the local linear estimator can be approximated by

$\displaystyle \textrm{Var}(\widehat{f}(y,h)) \approx \frac{1}{Th}\frac{\sigma^2(y)}{\mu(y)}\Vert K\Vert _2^2$ (16.12)

as will be seen in Subsection 16.2.1. (16.12) can be used for constructing confidence intervals for $ f(\cdot)$, since one can estimate the conditional variance $ \sigma^2(y)$ by the kernel estimate

$\displaystyle \widehat{\sigma}^2(y,h) = \frac{\sum_{t=2}^T K_h(Y_{t-1}-y) Y_t^2} {\sum_{t=2}^T K_h(Y_{t-1}-y)} - \widehat{f}^2(y,h)$ (16.13)

and the density $ \mu(y)$ by the kernel estimate

$\displaystyle \widehat{\mu}(y,h) = \frac{1}{T}\sum_{t=1}^T K_h(Y_{t}-y).$ (16.14)
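
Putting (16.12)-(16.14) together yields the pointwise interval $ \widehat{f}(y,h) \pm z_{1-\alpha/2}\sqrt{\widehat{\sigma}^2(y,h)\Vert K\Vert _2^2/\{Th\widehat{\mu}(y,h)\}}$. A minimal Python sketch (quartic kernel, for which $ \Vert K\Vert _2^2=5/7$; scipy assumed for the normal quantile; the density (16.14) is evaluated here on the lagged sample):

  import numpy as np
  from scipy.stats import norm

  K2 = 5 / 7                           # ||K||_2^2 of the quartic kernel

  def nw_ci(y_lag, y, grid, h, alpha=0.05):
      # pointwise intervals from (16.12)-(16.14) around the N-W estimate
      T = len(y) + 1                   # length of the original series
      u = (y_lag[None, :] - grid[:, None]) / h
      w = np.where(np.abs(u) <= 1, 15 / 16 * (1 - u ** 2) ** 2, 0.0) / h
      sw = w.sum(axis=1)
      fhat = w @ y / sw                               # (16.4)
      sigma2 = w @ (y ** 2) / sw - fhat ** 2          # (16.13)
      mu = sw / (T - 1)                               # (16.14), lagged sample
      half = norm.ppf(1 - alpha / 2) * np.sqrt(sigma2 * K2 / (T * h * mu))
      return fhat, fhat - half, fhat + half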

Based on these estimates the quantlet regxci computes pointwise confidence intervals using the Nadaraya-Watson estimator. It is called with

  {mh, clo, cup} = regxci(x{, h, alpha, K, xv})
with input variables:
x
$ (T-1) \times 2$ matrix of the data with the independent and the dependent variable in the first and second column, respectively,
h
scalar, bandwidth; if not given, 20% of the range of the values in the first column of x is used,
alpha
confidence level, with 0.05 as default value,
K
string, kernel function on $ [-1,1]$, with the quartic kernel "qua" as default,
xv
$ m \times 1$ matrix of the values of the independent variable on which to compute the regression, with x as default.
The output variables are:
mh
$ (T-1) \times 2$ or $ m \times 2$ matrix, the first column is the sorted first column of x or the sorted xv, the second column contains the regression estimate on the values of the first column,
clo
$ (T-1) \times 2$ or $ m \times 2$ matrix, the first column is the sorted first column of x or the sorted xv, the second column contains the lower confidence bounds on the values of the first column,
cup
$ (T-1) \times 2$ or $ m \times 2$ matrix, the first column is the sorted first column of x or the sorted xv, the second column contains the upper confidence bounds on the values of the first column.
If the WARPing technique is required, one uses the quantlet regci.

In Subsection 16.1.3 we found that the NAR(1) model for the lynx data is misspecified. Therefore, it is not appropriate for illustrating the computation of pointwise confidence intervals. Instead we use a simulated time series. The quantlet below generates 150 observations of a stationary exponential AR(1) process

$\displaystyle Y_t = 0.3 Y_{t-1} + 2.2 Y_{t-1}\exp\left(-0.1 Y_{t-1}^2\right) + \xi_t, \quad \xi_t \sim N(0,1),$ (16.15)

calls the interactive quantlet regxbwsel for bandwidth selection, where one chooses cross-validation at the first prompt and stop at the second, computes the confidence intervals, and plots the true and estimated function (solid and dashed line) as well as the pointwise confidence intervals (dotted lines), as shown in Figure 16.6.
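
For cross-checking, the data generating process (16.15) is easy to simulate directly; a minimal Python sketch (numpy only, names hypothetical; the XploRe code below uses the quantlet genexpar instead):

  import numpy as np

  def gen_expar(T, phi1=0.3, phi2=2.2, g=0.1, burn=100, seed=0):
      # Y_t = (phi1 + phi2 * exp(-g * Y_{t-1}^2)) * Y_{t-1} + xi_t
      rng = np.random.default_rng(seed)
      y = np.zeros(T + burn)
      for t in range(1, T + burn):
          y[t] = (phi1 + phi2 * np.exp(-g * y[t - 1] ** 2)) * y[t - 1] \
                 + rng.standard_normal()
      return y[burn:]                  # drop the burn-in phase

  x = gen_expar(150)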

  library("smoother")
  library("plot")
  library("times")
  setsize(640,480)

;                   generate exponential AR(1) process
  phi1    = 0.3
  phi2    = 2.2
  g       = 0.1
  randomize(0)
  x       = genexpar(1,g,phi1,phi1+phi2,normal(150))

;                   data preparation
  xrows   = rows(x)
  lag1    = x[1:xrows-1]             ; vector of first lag
  y       = x[2:xrows]               ; vector of dep. var.
  data    = lag1~y

;                   true function
  f       = sort(lag1~(phi1*lag1 + phi2*lag1.*exp(-g*lag1^2)),1)

;                   estimation
  {hcrit,crit}    = regxbwsel(data)
  {mh, clo, cup}  = regxci(data,hcrit)

  f       = setmask(f,"line","solid","red")
  data    = setmask(data,"cross")
  mh      = setmask(mh,"line","dashed","blue")
  clo     = setmask(clo,"line","blue","thin","dotted")
  cup     = setmask(cup,"line","blue","thin","dotted")
  plot(data,f,mh,clo,cup)
  setgopt(plotdisplay,1,1,"title","Confidence intervals of
                                estimated NAR(1) mean function")
  setgopt(plotdisplay,1,1,"xlabel","First Lag","ylabel","Y")

XAGflts06.xpl

Figure 16.6: True and estimated mean function plus pointwise confidence intervals for a generated exponential AR(1) process


16.1.5 Derivative Estimation


mh = lpderxest(x, h{, q, p, K, v})
estimates the q-th derivative of a regression function using local polynomial kernel regression with quartic kernel.
mh = lpderest(x, h{, q, p, K, d})
estimates the q-th derivative of an autoregression function using local polynomial kernel regression. The computation uses WARPing.

When investigating the properties of a conditional mean function, one is often interested in its derivatives. The estimation of derivatives can be accomplished by using local polynomial estimation as long as the order $ p$ of the polynomial is at least as large as the order $ q$ of the derivative to be estimated. Using a local quadratic estimator

$\displaystyle \{\widehat{c}_{0},\widehat{c}_1,\widehat{c}_2\} =\textrm{arg min}_{\left\{ c_{0},c_1,c_2\right\} } \sum_{t=2}^{T}\left\{ Y_{t}-c_{0}-c_1({Y}_{t-1}-{y})-c_2({Y}_{t-1}-{y})^2\right\} ^{2}K_{h}({Y}_{t-1}-{y}) $

one estimates the first and second derivative of $ f(y)$ at $ y$ with

$\displaystyle \widehat{f}'(y,h) = \widehat{c}_1, \quad \widehat{f}''(y,h)=2\widehat{c}_2. $

In general, one uses a polynomial of order $ q+1$ instead of $ q$ for the estimation of the $ q$-th derivative, since this reduces the complexity of the estimation bias, see e.g. Fan and Gijbels (1995). The estimated derivative is then obtained as $ \widehat{f}^{(q)}=q! \widehat{c}_q$. The quantlet lpderxest allows one to estimate first and second order derivatives, where at most a second order polynomial is used. It is called by
  mh = lpderxest (x, h{, q, p, K, v})
with input variables
x
$ (T-1) \times 2$ matrix of the data with the independent and dependent variable in the first and second column, respectively.
h
scalar, bandwidth; if not given, the rule-of-thumb bandwidth is computed with lpderrot,
q
integer $ \leq 2$, order of derivative; if not given, q=1 (first derivative) is chosen,
p
integer, order of polynomial; if not given, p=q+1 is used for q$ <2$ and p=q is used for q=2,
v
$ m \times 1$ vector of values of the independent variable on which to compute the regression; if not given, x is used.
The output variable is
mh
$ (T-1) \times 2$ or $ m \times 2$ matrix where the first column is the sorted first column of x or the sorted v and the second column contains the derivative estimate on the values of the first column.
The quantlet lpderest, which applies the WARPing technique (Fan and Marron, 1994), allows for p $ \leq 5$ and q $ \leq 4$. We note, however, that the binning involved in WARPing may discard a lot of information. Bandwidth selection remains an important issue and can be done using the quantlet lpderrot.
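
A minimal Python sketch of the derivative estimator (reusing the weighted least squares idea of the local polynomial sketch in Subsection 16.1.1; the name lp_derivative is hypothetical) that extracts $ \widehat{f}^{(q)}=q! \widehat{c}_q$ from a local polynomial fit of order $ p\geq q$:

  import numpy as np
  from math import factorial

  def lp_derivative(y_lag, y, grid, h, q=1, p=None):
      # estimate f^(q)(y0) as q! * c_q from a local polynomial fit of order p
      p = q + 1 if p is None else p    # default order p = q + 1
      der = np.empty(len(grid))
      for i, y0 in enumerate(grid):
          u = (y_lag - y0) / h
          w = np.where(np.abs(u) <= 1, 15 / 16 * (1 - u ** 2) ** 2, 0.0)
          X = np.vander(y_lag - y0, p + 1, increasing=True)
          sw = np.sqrt(w)
          c = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)[0]
          der[i] = factorial(q) * c[q] # q-th Taylor coefficient times q!
      return der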

In the following quantlet we estimate the first and second derivatives of the conditional mean function of the exponential AR(1) process (16.15) based on 150 observations. The true derivatives (solid lines) and their estimates (dashed lines) are shown in Figures 16.7 and 16.8.
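
For reference, the true derivatives follow by differentiating the mean function $ f(y)=\phi_1 y + \phi_2 y e^{-gy^2}$ of (16.15) with $ \phi_1=0.3$, $ \phi_2=2.2$ and $ g=0.1$:

$\displaystyle f'(y) = \phi_1 + \phi_2 e^{-gy^2}\left(1-2gy^2\right), \qquad f''(y) = \phi_2 e^{-gy^2}(-2gy)\left(3-2gy^2\right), $

which is exactly what the vectors ffder and fsder in the code compute.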

  library("smoother")
  library("plot")
  library("times")
  setsize(640,480)
;                   generate exponential AR(1) process
  phi1    = 0.3
  phi2    = 2.2
  g       = 0.1
  randomize(0)
  x       = genexpar(1,g,phi1,phi1+phi2,normal(150))

;                       data preparation
  xrows   = rows(x)
  lag1    = x[1:xrows-1]             ; vector of first lag
  y       = x[2:xrows]               ; vector of dep. var.
  data    = lag1~y
  ffder   = sort(lag1~(phi1 + exp(-g*lag1^2).*
                                  phi2.*(1-2.*g.*lag1^2)),1)
  fsder   = sort(lag1~(exp(-g*lag1^2).*(-2*g.*lag1)*
                                  phi2.*(3-2.*g.*lag1^2)),1)

;                       estimate first derivative
  ffder   = setmask(ffder,"line","solid","red")
  mhfder  = lpderxest(data)
  mhfder  = setmask(mhfder, "line","dashed","blue")
  plotder = createdisplay(1,1)
  show(plotder,1,1,ffder,mhfder)
  setgopt(plotder,1,1,"title","Estimated first derivative
                                  of mean function")
  setgopt(plotder,1,1,"xlabel","First lag","ylabel",
                                  "First derivative")
;                       estimate second derivative
  fsder   = setmask(fsder,"line","solid","red")
  hrot    = 2*lpderrot(data,2)
  mhsder  = lpderxest(data,hrot,2)
  mhsder  = setmask(mhsder, "line","dashed","blue")
  plot(fsder,mhsder)
  setgopt(plotdisplay,1,1,"title","Estimated second
                                  derivative of mean function")
  setgopt(plotdisplay,1,1,"xlabel","First lag","ylabel",
                                  "Second derivative")

XAGflts07.xpl

Figure 16.7: True and estimated first derivative for a generated exponential AR(1) process

Figure 16.8: True and estimated second derivative for a generated exponential AR(1) process