12.5 Dynamic Panel Data Models


{output, beta} = pandyn (z, p, IVmeth {,T})
computes the first-stage GMM estimate of a dynamic linear model with p lags of the dependent variable
output = pandyn2 (z, p, IVmeth, beta {,T})
computes the second-stage (robust) GMM estimate of a dynamic linear model

The dynamic model is given by

$\displaystyle y_{it} = \gamma_1 y_{i,t-1} + \cdots + \gamma_p y_{i,t-p} + x_{it}^T \beta +
\alpha_i + \varepsilon_{it}
$

For such a model the within-group estimator (for the fixed effects model) and the GLS estimator (for the random effects model) are not applicable. Therefore, Arellano and Bond (1991) suggest estimating the model using a GMM procedure. The idea is to estimate the differenced model

$\displaystyle \Delta y_{it} = \gamma_1 \Delta y_{i,t-1} + \cdots + \gamma_p \Delta
y_{i,t-p} + \Delta x_{it}^T \beta + \Delta \varepsilon_{it},
$

where $ \Delta$ is the difference operator such that $ \Delta y_{it} =
y_{it} - y_{i,t-1}$ by using the instruments

$\displaystyle y_{i,t-2},y_{i,t-3},\ldots,y_{i1},\Delta x_{it}.
$
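The differencing step and the construction of the lagged-level instruments can be illustrated with a small pandas sketch (all column names here are hypothetical; this is not the quantlet's internal code):

```python
import numpy as np
import pandas as pd

# Toy balanced panel: N = 3 units, T = 5 periods
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "id": np.repeat([1, 2, 3], 5),
    "t":  np.tile(np.arange(1, 6), 3),
    "y":  rng.normal(size=15),
    "x":  rng.normal(size=15),
})

# First differences within each unit: Delta y_it = y_it - y_{i,t-1}
df["dy"] = df.groupby("id")["y"].diff()
df["dx"] = df.groupby("id")["x"].diff()

# The lagged level y_{i,t-2} is a valid instrument for Delta y_{i,t-1}
df["y_lag2"] = df.groupby("id")["y"].shift(2)

# With p = 1, usable rows start at t = 3 (need dy_{t-1} and y_{t-2})
est = df.dropna(subset=["dy", "dx", "y_lag2"])
```

Each unit loses its first two periods, so with $T=5$ the estimation sample has three rows per unit.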

Using these instruments, different GMM estimators can be constructed. Let $ z_{it}=[y_{it},y_{i,t-1},\ldots,y_{i1}]^T$. Then, Arellano and Bond (1991) suggest using the instrumental variable (IV) matrix

$\displaystyle \textrm{Method 4:} \quad \left[ \begin{array}{ccccc}
z_{ip} & 0 & 0 & . & 0 \\
0 & z_{i,p+1} & 0 & . & 0 \\
. & . & . & . & . \\
0 & 0 & 0 & . & z_{i,T-2} \\
\Delta x_{i,p+2} & \Delta x_{i,p+3} & \Delta x_{i,p+4} & . & \Delta x_{iT}
\end{array}\right]\,.
$
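The following numpy sketch builds the full set of Arellano-Bond lagged-level instruments for a single unit, arranged with one row per differenced equation (i.e., a transposed variant of the Method 4 layout); the function name and the $p=1$ restriction are assumptions for illustration:

```python
import numpy as np

def ab_instruments(y_i, dx_i, p=1):
    """Arellano-Bond style instrument matrix for one unit (sketch, p = 1).

    y_i  : levels y_{i1}, ..., y_{iT}
    dx_i : differences Delta x_{i2}, ..., Delta x_{iT}
    Each row corresponds to one differenced equation t = p+2, ..., T and
    holds the lagged levels y_{i1}, ..., y_{i,t-2} in its own block,
    plus Delta x_{it} in the last column.
    """
    T = len(y_i)
    n_z = sum(t - 2 for t in range(p + 2, T + 1))  # total lagged-level instruments
    rows, offset = [], 0
    for t in range(p + 2, T + 1):                  # 1-based time index t
        z_t = y_i[: t - 2]                         # y_{i1}, ..., y_{i,t-2}
        row = np.zeros(n_z + 1)
        row[offset: offset + len(z_t)] = z_t
        row[-1] = dx_i[t - 2]                      # dx_i[0] holds Delta x_{i2}
        rows.append(row)
        offset += len(z_t)
    return np.vstack(rows)
```

For $T=5$ and $p=1$ there are three equations ($t=3,4,5$) with $1+2+3=6$ lagged-level instruments, illustrating how the instrument count grows quadratically with $T$.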

A problem with this IV matrix is that the number of rows grows with $ T^2$, so that for large $ T$ the number of instruments may be larger than $ N$ and the GMM estimator cannot be computed. Therefore, in many cases a more parsimonious arrangement of the instruments is required:

$\displaystyle \textrm{Method 3:} \quad \left[ \begin{array}{ccccc}
y_{ip} & 0 & 0 & . & 0 \\
0 & y_{i,p+1} & 0 & . & 0 \\
. & . & . & . & . \\
0 & 0 & 0 & . & y_{i,T-2} \\
\Delta x_{i,p+2} & 0 & 0 & . & 0 \\
0 & \Delta x_{i,p+3} & 0 & . & 0 \\
. & . & . & . & . \\
0 & 0 & 0 & . & \Delta x_{iT}
\end{array} \right]\,.
$

In this IV matrix the number of moments grows with $ k\cdot T$. However, the number of instruments is still rather large so that a further reduction may be necessary. In these cases the following IV matrix is used:

$\displaystyle \textrm{Method 2:} \quad \left[ \begin{array}{ccccc}
y_{ip} & 0 & 0 & . & 0 \\
0 & y_{i,p+1} & 0 & . & 0 \\
. & . & . & . & . \\
0 & 0 & 0 & . & y_{i,T-2} \\
\Delta x_{i,p+2} & \Delta x_{i,p+3} & \Delta x_{i,p+4} & . & \Delta x_{iT}
\end{array}\right]\,.
$

Finally, we may have a panel with $ T>N$. In this case these three approaches (methods 2-4) are not applicable. Therefore, another method is implemented that uses only $ y_{i,t-2}$ and $ x_{i,t-1}$ as instruments for the lagged differences:

$\displaystyle \textrm{Method 1:} \quad
\left[ \begin{array}{ccccc}
y_{ip} & y_{i,p+1} & y_{i,p+2} & . & y_{i,T-2} \\
\Delta x_{i,p+2} & \Delta x_{i,p+3} & \Delta x_{i,p+4} & . & \Delta x_{iT}
\end{array}\right]\,.
$

Accordingly, only $ k$ over-identifying moment conditions are used and thus the resulting estimator should be applicable in almost all cases.

The computation of the GMM estimator may become burdensome when the number of instruments is large. Therefore, it is highly recommended to start with the simplest GMM estimator (i.e., method 1) and then to try the more computer-intensive methods 2-4. A more efficient estimator should yield smaller standard errors for the coefficients than a less efficient one, so the standard errors are expected to decrease with a more efficient GMM method. However, in small samples the difference may be small, and one may even encounter situations where the standard errors increase with an (asymptotically) more efficient estimator. This may occur by chance in a limited sample, or it may indicate a serious misspecification of the model.

To estimate the optimal weight matrix of the GMM estimator, two different approaches can be used. First, under standard assumptions on the errors, the weight matrix may be estimated as

$\displaystyle W_N = \left[ \sum_{i=1}^N \widehat \sigma_\varepsilon^2 Z_i D Z_i^T \right]^{-1},
\quad \textrm{where} \quad
D = \left[ \begin{array}{cccccc}
2 & -1 & 0 & 0 & . & 0 \\
-1 & 2 & -1 & 0 & . & 0 \\
0 & -1 & 2 & -1 & . & 0 \\
. & . & . & . & . & . \\
0 & 0 & 0 & 0 & . & 2
\end{array} \right] ,
$
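A minimal numpy sketch of this first-step weight matrix, assuming each $Z_i$ is stored with instruments in rows and differenced time periods in columns (the function name and the unit variance default are illustrative assumptions):

```python
import numpy as np

def first_step_weight(Z_list, sigma2=1.0):
    """First-step GMM weight matrix W_N = [ sum_i sigma2 * Z_i D Z_i^T ]^{-1}.

    Z_list : per-unit instrument matrices Z_i (instruments x differenced periods)
    D captures the MA(1) structure that differencing induces in i.i.d. errors:
    2 on the diagonal, -1 on the first off-diagonals, 0 elsewhere.
    """
    m = Z_list[0].shape[1]
    D = 2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
    A = sum(sigma2 * Z @ D @ Z.T for Z in Z_list)
    return np.linalg.inv(A)
```

Since $D$ is the same for every unit, only the instrument matrices vary across $i$ in the sum.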

where $ Z_i$ is the IV matrix. If, however, the errors are heteroskedastic, or if $ \Delta \varepsilon_{it}$ is uncorrelated with, but not independent of, $ z_{i,t-2}$, then the weight matrix is estimated as

$\displaystyle W_N = \left[ \sum_{i=1}^N Z_i \Delta \widehat \varepsilon_i
\Delta \widehat \varepsilon_i^T Z_i^T \right]^{-1} ,
$

where $ \Delta \widehat \varepsilon_i = [\Delta \widehat \varepsilon_{i,p+2},\ldots,\Delta \widehat
\varepsilon_{iT}]^T$ is the residual vector obtained from a consistent first-step estimation of the model. This estimation is more cumbersome and may have poor small-sample properties if the number of instruments is large relative to the sample size $ N$.
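The robust (second-step) weight matrix can be sketched the same way, replacing $\widehat\sigma_\varepsilon^2 D$ with the outer product of each unit's first-step residual vector (function and argument names are illustrative assumptions):

```python
import numpy as np

def robust_weight(Z_list, de_list):
    """Robust second-step weight W_N = [ sum_i Z_i de_i de_i^T Z_i^T ]^{-1}.

    Z_list  : per-unit instrument matrices Z_i (instruments x periods)
    de_list : first-step residual vectors Delta eps_hat_i, one per unit
    """
    A = sum(Z @ np.outer(de, de) @ Z.T for Z, de in zip(Z_list, de_list))
    return np.linalg.inv(A)
```

Each term in the sum has rank one, so the matrix inside the brackets is only invertible when $N$ is at least as large as the number of instruments, which is the small-sample concern noted above.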

To assess the validity of the model specification, Hansen's misspecification statistic is used. This statistic tests the validity of the over-identifying restrictions, whose number is the difference between the number of moment conditions and the number of estimated coefficients. If the model is correctly specified, the statistic is asymptotically $ \chi^2$ distributed, and the $ p$-value of the statistic is given in the output string of the pandyn quantlet. Furthermore, a Hausman test (computed as a conditional moments test) can be used to test the hypothesis that the individual effects are correlated with the explanatory variables.
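The Hansen statistic can be sketched as follows; this is a generic GMM $J$-statistic under the stated degrees of freedom, not the quantlet's internal code, and the scaling convention is an assumption:

```python
import numpy as np
from scipy import stats

def hansen_test(gbar, W_N, N, n_coef):
    """Hansen's J statistic for the over-identifying restrictions (sketch).

    gbar   : averaged moments (1/N) * sum_i Z_i Delta eps_hat_i
    W_N    : optimal GMM weight matrix
    J = N * gbar' W_N gbar is asymptotically chi^2 with
    (moments - coefficients) degrees of freedom under correct
    specification; returns (J, p-value).
    """
    J = float(N * gbar @ W_N @ gbar)
    df = len(gbar) - n_coef
    return J, stats.chi2.sf(J, df)
```

A $p$-value near zero signals that the over-identifying restrictions, and hence the instruments, are suspect.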

The data set z is arranged similarly to the static panel data case. If the data set is unbalanced, the identification number of the cross-section unit and the time period are given in the first two columns. However, the quantlet pandyn only uses the time index, so the first column may have arbitrary values. The following columns are the dependent variable $ y_{it}$ and the explanatory variables $ x_{1it}, \ldots,x_{kit}$, where all explanatory variables must vary in time: variables that are constant over time are eliminated by differencing.

If the data set is in balanced form, the first two columns can be dropped and the common number of time periods is given in the optional argument T. Furthermore, the number of lagged dependent variables $ p$ must be indicated. Accordingly, the quantlet is called by

  {output,beta} = pandyn(z,p,IVmeth {,T})
The output table is returned in the string output and the coefficient estimates are stored in the vector $ \beta$. The variable IVmeth specifies the method for constructing the instrument matrix. For example, IVmeth=1 gives the GMM estimator with the smallest set of instruments, while IVmeth=4 gives the Arellano-Bond estimator. If IVmeth=0, then the program will select an appropriate instrument matrix by choosing the (asymptotically) most efficient GMM procedure subject to the constraint that the number of instruments does not exceed $ 0.9N$.

For the two-stage GMM estimator the weight matrix is computed using a consistent first-step estimate of the model. This can be done using the following estimation stages:

  {out1,beta} = pandyn(z,p,IVmeth {,T})
  out2 = pandyn2(z,p,IVmeth,beta {,T})
  out2
The output for the second estimation stage is presented in the string out2. The two-stage GMM estimator may have poor small-sample properties, so if the results of the two stages differ substantially, it is recommended to use the (more stable) first-stage estimator.