13.3 Estimation with Kalman Filter Techniques


13.3.1 Kalman Filtering given all parameters

Given the above SSF and all unknown parameters $ \psi\stackrel{\mathrm{def}}{=}
(\phi_1,\phi_2,\sigma^2_{\nu},\sigma^2_{\varepsilon})$, we can use Kalman filter techniques to estimate the unknown coefficients $ \beta $ and the process $ I_t$. The Kalman filter is an algorithm for estimating the unobservable state vectors by calculating their expectations conditional on the information available up to some time $ s\leqslant T$. In what follows, we use the following general notation:

$\displaystyle a_{t\vert s}\stackrel{\mathrm{def}}{=}\textrm{E}[\alpha_t\vert\mathcal{F}_s]\;,\qquad P_{t\vert s}\stackrel{\mathrm{def}}{=}\textrm{E}[(\alpha_t-a_{t\vert s}) (\alpha_t-a_{t\vert s})^\top \vert\mathcal{F}_s]\;,$    

where $ a_{t\vert s}$ denotes the conditional expectation of the state vector, $ P_{t\vert s}$ denotes the covariance matrix of the estimation error, and $ \mathcal{F}_s$ is a shorthand for the information available at time $ s$.

Generally, the estimators delivered by Kalman filtering techniques have minimum mean-squared error among all linear estimators (Shumway and Stoffer; 2000, Chapter 4.2). If the initial state vector and the noise terms $ \varepsilon^m$ and $ \varepsilon^s$ are multivariate Gaussian, then the Kalman filter delivers the optimal estimator among all estimators, linear and nonlinear (Hamilton; 1994, Chapter 13).

The Kalman filter techniques can handle missing observations in the measurement equation (13.3b). For periods with fewer than $ N$ observations, one has to adjust the measurement equations. One can do this by simply deleting all elements of the measurement matrices $ d_t$, $ Z_t$, $ H_t$ for which the corresponding entry in $ y_t$ is a missing value. The quantlets in XploRe use this procedure. Another way to take missing values into account is proposed by Shumway and Stoffer (2000,1982): replace all missing values with zeros and adjust the other measurement matrices accordingly. We show in Appendix 13.6.1 that both methods deliver the same results. For periods with no observations at all, the Kalman filter techniques recursively calculate an estimate given the most recent information (Durbin and Koopman; 2001).
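For illustration, the deletion approach can be sketched as follows, assuming a hypothetical period with $ N=3$ measurement equations of which the second observation is missing (the names $ y_t$, $ d_t$, $ Z_t$, $ H_t$ follow the chapter's notation; the numbers are made up):

```python
import numpy as np

# Hypothetical period with N = 3 measurement equations;
# the second entry of y_t is missing (NaN).
d_t = np.array([0.1, 0.2, 0.3])
Z_t = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [0.0, 1.0]])
H_t = np.diag([0.5, 0.5, 0.5])
y_t = np.array([1.2, np.nan, 0.8])

# Keep only the rows (and, for H_t, rows and columns) that
# correspond to observed entries of y_t.
obs = ~np.isnan(y_t)
y_red = y_t[obs]
d_red = d_t[obs]
Z_red = Z_t[obs, :]
H_red = H_t[np.ix_(obs, obs)]

print(y_red.shape, Z_red.shape, H_red.shape)  # (2,) (2, 2) (2, 2)
```

The reduced system is then passed through the usual filter recursions for that period; the Shumway-Stoffer zero-replacement variant leads to the same filtered estimates.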


13.3.2 Filtering and state smoothing

The Kalman filter is an algorithm for sequentially updating our knowledge of the system given a new observation $ y_t$. It calculates filtered estimates conditional on $ s=t$. Using our general expressions, we have

$\displaystyle a_t= \textrm{E}[\alpha_t\vert\mathcal{F}_t]\;,\qquad P_t= \textrm{E}[(\alpha_t-a_t) (\alpha_t-a_t)^\top \vert\mathcal{F}_t]\;.$    

Here we use the standard simplified notation $ a_t$ and $ P_t$ for $ a_{t\vert t}$ and $ P_{t\vert t}$. As a by-product, the filter recursions also calculate

$\displaystyle a_{t\vert t-1}= \textrm{E}[\alpha_t\vert\mathcal{F}_{t-1}]\;,\qquad P_{t\vert t-1}= \textrm{E}[(\alpha_t-a_{t\vert t-1}) (\alpha_t-a_{t\vert t-1})^\top \vert\mathcal{F}_{t-1}]\;.$    

We give the filter recursions in detail in Subsection 13.5.3.
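To fix ideas, the following is a minimal sketch of these recursions for a time-invariant SSF, written in Python with NumPy. The function and variable names are ours and the usage example is a simple local level model; this is not one of the XploRe quantlets:

```python
import numpy as np

def kalman_filter(y, c, T, R, Q, d, Z, H, a0, P0):
    """Sketch of the filter recursions for the time-invariant SSF
         alpha_t = c + T alpha_{t-1} + R eta_t,   Var(eta_t) = Q,
         y_t     = d + Z alpha_t + eps_t,         Var(eps_t) = H.
    Returns the filtered estimates a_t and, as by-products, the
    one-step predictions a_{t|t-1}, the innovations v_t and their
    covariance matrices F_t."""
    a, P = a0, P0
    a_filt, a_pred, v_all, F_all = [], [], [], []
    for y_t in y:
        ap = c + T @ a                     # a_{t|t-1}
        Pp = T @ P @ T.T + R @ Q @ R.T     # P_{t|t-1}
        v = y_t - d - Z @ ap               # innovation v_t
        F = Z @ Pp @ Z.T + H               # innovation covariance F_t
        K = Pp @ Z.T @ np.linalg.inv(F)    # Kalman gain
        a = ap + K @ v                     # filtered a_t
        P = Pp - K @ Z @ Pp                # filtered P_t
        a_filt.append(a); a_pred.append(ap); v_all.append(v); F_all.append(F)
    return a_filt, a_pred, v_all, F_all

# Usage: a univariate local level model with unit variances.
c = np.zeros(1); T = np.eye(1); R = np.eye(1); Q = np.eye(1)
d = np.zeros(1); Z = np.eye(1); H = np.eye(1)
a_f, a_p, v, F = kalman_filter([np.array([1.0])], c, T, R, Q, d, Z, H,
                               np.zeros(1), np.eye(1))
print(F[0], a_f[0])  # [[3.]] [0.66666667]
```

The innovations $ v_t$ and their covariances $ F_t$ collected here are exactly the quantities reused for maximum likelihood estimation below.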

The Kalman smoother is an algorithm for predicting the state vector $ \alpha_t$ given the whole information set up to time $ T$. In our general notation we thus have $ s=T$ and

$\displaystyle a_{t\vert T}=\textrm{E}[\alpha_t\vert\mathcal{F}_T]\;,\qquad P_{t\vert T}=\textrm{E}[(\alpha_t-a_{t\vert T}) (\alpha_t-a_{t\vert T})^\top \vert\mathcal{F}_T]\;.$    

We see that the filter makes one-step predictions given the information up to $ t\in\{1,\ldots,T\}$, whereas the smoother is backward looking. We give the smoother recursions in detail in Subsection 13.5.5.
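As an illustration of this backward pass, the following sketch combines a forward filter pass with a fixed-interval smoothing recursion for a scalar local level model. The variances $ q$ and $ h$ are assumed known; this is our own minimal example, not the recursions of Subsection 13.5.5:

```python
import numpy as np

def smooth(y, q=1.0, h=1.0, a0=0.0, p0=1.0):
    """Scalar local level model: forward Kalman filter followed by the
    backward fixed-interval smoothing recursion
        a_{t|T} = a_t + (P_t / P_{t+1|t}) (a_{t+1|T} - a_{t+1|t}).
    Returns the filtered means a_t and the smoothed means a_{t|T}."""
    n = len(y)
    a_f = np.zeros(n); p_f = np.zeros(n)   # filtered a_t, P_t
    a_p = np.zeros(n); p_p = np.zeros(n)   # predictions a_{t|t-1}, P_{t|t-1}
    a, p = a0, p0
    for t in range(n):                     # forward pass: Kalman filter
        a_p[t], p_p[t] = a, p + q
        k = p_p[t] / (p_p[t] + h)
        a = a_p[t] + k * (y[t] - a_p[t])
        p = (1 - k) * p_p[t]
        a_f[t], p_f[t] = a, p
    a_s = a_f.copy()                       # a_{T|T} = a_T
    for t in range(n - 2, -1, -1):         # backward pass: smoother
        j = p_f[t] / p_p[t + 1]
        a_s[t] = a_f[t] + j * (a_s[t + 1] - a_p[t + 1])
    return a_f, a_s

a_f, a_s = smooth(np.array([1.0, 1.0, 1.0]))
print(np.allclose(a_s[-1], a_f[-1]))  # True: they coincide at t = T
```

Note how each smoothed value $ a_{t\vert T}$ uses observations from the whole sample, while the filtered value $ a_t$ uses only those up to $ t$.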


13.3.3 Maximum likelihood estimation of the parameters

Given the system matrices $ c_t$, $ T_t$, $ R_t$, $ d_t$, $ Z_t$, and $ H_t$, Kalman filtering techniques are the right tool to estimate the elements of the state vector. However, in our model some of these system matrices contain unknown parameters $ \psi $. These parameters have to be estimated by maximum likelihood.

Given a multivariate Gaussian error distribution, the value of the log likelihood function $ l(\psi)$ for a general SSF is, up to an additive constant, equal to:

$\displaystyle -\frac{1}{2}\sum_{t=1}^T\ln{\vert F_t\vert}-\frac{1}{2}\sum_{t=1}^T v_t^\top F_t^{-1}v_t\;.$ (13.9)

Here,

$\displaystyle v_t\stackrel{\mathrm{def}}{=}y_t-d_t-Z_ta_{t\vert t-1}$ (13.10)

are the innovations of the filtering procedure and $ a_{t\vert t-1}$ is the conditional expectation of $ \alpha_t$ given information up to $ t-1$. As we have already mentioned, these expressions are a by-product of the filter recursions. The matrix $ F_t$ is the covariance matrix of the innovations at time $ t$ and also a by-product of the Kalman filter. The above log likelihood is known as the prediction error decomposition form (Harvey; 1989). Periods with no observations do not contribute to the log likelihood function.
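As a sketch, for a scalar local level model the prediction error decomposition (13.9) can be evaluated in a single filter pass. The parameters $ q$ and $ h$ play the role of $ \psi $ here, and the model is our own illustration, not the chapter's house-price SSF:

```python
import numpy as np

def loglik(y, q, h, a0=0.0, p0=1.0):
    """Log likelihood l(psi) of a scalar local level model, up to an
    additive constant, via the prediction error decomposition: the
    innovations v_t and variances F_t are by-products of the filter."""
    a, p, ll = a0, p0, 0.0
    for y_t in y:
        p_pred = p + q                 # P_{t|t-1}
        v = y_t - a                    # innovation v_t = y_t - a_{t|t-1}
        f = p_pred + h                 # innovation variance F_t
        ll += -0.5 * (np.log(f) + v * v / f)
        k = p_pred / f                 # filter update for the next step
        a, p = a + k * v, (1 - k) * p_pred
    return ll

print(loglik(np.array([1.0]), 1.0, 1.0))  # -0.5 * (log 3 + 1/3)
```

Periods with no observations would simply skip the likelihood increment, in line with the remark above.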

Starting with some initial value, one can use numerical maximization methods to obtain an estimate of the parameter vector $ \psi $. Under certain regularity conditions, the maximum likelihood estimator $ \tilde{\psi}$ is consistent and asymptotically normal. One can use the information matrix to calculate standard errors of $ \tilde{\psi}$ (Hamilton; 1994).
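A sketch of this maximization step, again for the scalar local level model and using a Nelder-Mead search over log-variances so that the variance estimates stay positive. The use of scipy is our assumption for illustration; XploRe provides its own maximization routines:

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(log_psi, y, a0=0.0, p0=1e4):
    """Negative log likelihood of a scalar local level model;
    psi = (q, h) is parametrized in logs to keep variances positive."""
    q, h = np.exp(log_psi)
    a, p, ll = a0, p0, 0.0
    for y_t in y:
        p_pred = p + q
        v = y_t - a                    # innovation v_t
        f = p_pred + h                 # innovation variance F_t
        ll += -0.5 * (np.log(f) + v * v / f)
        k = p_pred / f
        a, p = a + k * v, (1 - k) * p_pred
    return -ll

# Simulated data with true q = 1 and true h = 0.5.
rng = np.random.default_rng(0)
alpha = np.cumsum(rng.normal(0.0, 1.0, 200))
y = alpha + rng.normal(0.0, np.sqrt(0.5), 200)

res = minimize(neg_loglik, x0=np.zeros(2), args=(y,), method="Nelder-Mead")
q_hat, h_hat = np.exp(res.x)           # estimates of psi
print(q_hat > 0 and h_hat > 0)         # True
```

Standard errors could then be obtained from the information matrix, e.g. via a numerical Hessian of the log likelihood at $ \tilde{\psi}$.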


13.3.4 Diagnostic checking

After fitting a SSF, one should check the appropriateness of the results by looking at the standardized residuals

$\displaystyle v^{st}_t=F_t^{-1/2}v_t\;.$ (13.11)

If all parameters of the SSF were known, $ v^{st}_t$ would follow a standard multivariate normal distribution (Harvey; 1989, see also (13.9)). We know that $ F_t$ is a symmetric matrix and that it should be positive definite (recall that it is just the covariance matrix of the innovations $ v_t$). Hence

$\displaystyle F_t^{-1/2}=C_t\Lambda_t^{-1/2}C^\top _t\;,$ (13.12)

where the diagonal matrix $ \Lambda_t$ contains the eigenvalues of $ F_t$ and $ C_t$ is the matrix of corresponding normalized eigenvectors (Greene; 2000, p. 43). The standardized residuals should be normally distributed with constant variance and should show no serial correlation. If the residuals do not possess these properties, this signals a misspecified model. To check these properties, one can use standard test procedures. For example, a Q-Q plot indicates whether the quantiles of the residuals deviate from the corresponding theoretical quantiles of a normal distribution; this plot can thus be used to detect non-normality. The Jarque-Bera test (Bera and Jarque; 1982) can also be used to test the residuals for non-normality. This test is implemented in XploRe as jarber.
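To illustrate (13.11)-(13.12), $ F_t^{-1/2}$ can be computed from the eigendecomposition of the innovation covariance matrix. The $ 2\times 2$ matrix below is a made-up example, not output of the chapter's model:

```python
import numpy as np

# Assumed innovation covariance F_t and innovation v_t for one period.
F_t = np.array([[2.0, 0.5],
                [0.5, 1.0]])
v_t = np.array([1.0, -0.5])

# Eigendecomposition F_t = C_t Lambda_t C_t', then
# F_t^{-1/2} = C_t Lambda_t^{-1/2} C_t' as in (13.12).
lam, C = np.linalg.eigh(F_t)
F_inv_sqrt = C @ np.diag(lam ** -0.5) @ C.T

v_st = F_inv_sqrt @ v_t               # standardized residual v_t^{st}

# Sanity check: F_t^{-1/2} F_t F_t^{-1/2} = I, so v_t^{st} has unit
# covariance when F_t is the true covariance of v_t.
print(np.allclose(F_inv_sqrt @ F_t @ F_inv_sqrt, np.eye(2)))  # True
```

Collecting $ v^{st}_t$ over all periods gives the residual series on which the Q-Q plot and the Jarque-Bera test are applied.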

In the empirical part, we combine Kalman filter techniques and maximum likelihood to estimate the unknown parameters and coefficients of the SSF for the house prices in a district of Berlin.