5.3 Identification of Multiplicative SARIMA Models

This section deals with the identification of a multiplicative SARIMA model. The required procedure is explained step by step, using the famous airline data of Box and Jenkins (1976, Series G) for illustrative purposes. The data give the number of airline passengers (in thousands) in international air travel from 1949:1 to 1960:12. In the following, $ G_t$ denotes the original series.

The identification procedure comprises the following steps: plotting the data, possibly transforming the data, identifying the dependence order of the model, parameter estimation, and diagnostics. Generally, selecting the appropriate model for a given data set is quite difficult. The task becomes less complicated, however, if one proceeds in two stages: first, find difference operators that produce a roughly stationary series; second, find a simple ARMA or multiplicative SARMA model that fits the resulting residual series.

As with any data analysis, the time series has to be plotted first so that the graph can be inspected. Figure 5.5 shows the airline data of Box and Jenkins.

Figure 5.5: Number of airline passengers $ G_t$ (in thousands) in international air travel from 1949:1 to 1960:12.
\includegraphics[width=1.5\defpicwidth]{XEGmsarimadisplay5.ps}

The series $ G_t$ shows a strong seasonal pattern and a definite upward trend. Furthermore, the variability in the data grows with time. Therefore, it is necessary to transform the data in order to stabilize the variance. Here, the natural logarithm is used for transforming the data. The new time series is defined as follows

$\displaystyle g_t\stackrel{\mathrm{def}}{=}\ln{G_t}\;.$    

Figure 5.6 displays the logarithmically transformed data $ g_t$. The strong seasonal pattern and the obvious upward trend remain unchanged, but the variability is now stabilized.
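The variance-stabilizing effect of the logarithm can be illustrated with a minimal sketch. The series below is hypothetical (it is not the airline data): its level grows by an assumed 12% per year and is multiplied by a fixed seasonal factor, so that the seasonal swings scale with the level. The names `levels`, `seasonal` and `yearly_std` are introduced only for this illustration.

```python
import math
import statistics

# Hypothetical monthly series (NOT the airline data): a yearly level that grows
# by 12% per year, multiplied by a fixed seasonal factor for each month.
n_years, period = 12, 12
levels = [100.0 * 1.12 ** y for y in range(n_years)]
seasonal = [1.0 + 0.2 * math.sin(2.0 * math.pi * m / period) for m in range(period)]

G = [lev * s for lev in levels for s in seasonal]  # original scale
g = [math.log(x) for x in G]                       # g_t = ln(G_t)

def yearly_std(series, year):
    """Standard deviation of the 12 observations within one year."""
    return statistics.pstdev(series[year * period:(year + 1) * period])

# Spread within the last year relative to the spread within the first year.
ratio_raw = yearly_std(G, n_years - 1) / yearly_std(G, 0)
ratio_log = yearly_std(g, n_years - 1) / yearly_std(g, 0)
print(f"raw scale: {ratio_raw:.2f}   log scale: {ratio_log:.2f}")
```

On the original scale the within-year spread grows in proportion to the level, whereas on the log scale the multiplicative seasonal factor becomes an additive term of constant size, so the ratio is one up to rounding error.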

Figure 5.6: Log number of airline passengers $ g_t$ in international air travel from 1949:1 to 1960:12.
\includegraphics[width=1.5\defpicwidth]{XEGmsarimadisplay6.ps}

Now, the first difference of the time series $ g_t$ has to be taken in order to remove its nonseasonal unit root, i.e. we set $ d=1$. The new variable

$\displaystyle \Delta g_t\equiv (1-L)g_t$ (5.7)

has a nice interpretation: it gives approximately the monthly growth rate of the number of airline passengers.
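The approximation can be made explicit. Writing $ r_t$ for the simple monthly growth rate (a symbol introduced here only for illustration), a first-order expansion of the logarithm gives

$\displaystyle \Delta g_t = \ln G_t - \ln G_{t-1} = \ln\left(1+r_t\right) \approx r_t\;, \qquad r_t \stackrel{\mathrm{def}}{=} \frac{G_t-G_{t-1}}{G_{t-1}}\;,$

so the approximation is accurate as long as $ r_t$ is small in absolute value. For instance, $ r_t = 0.05$ corresponds to $ \Delta g_t = \ln 1.05 \approx 0.0488$.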

The next step is plotting the sample ACF of the monthly growth rate $ \Delta g_t$.

Figure 5.7: Sample ACF of the monthly growth rate of the number of airline passengers $ \Delta g_t$.
\includegraphics[width=1.5\defpicwidth]{XEGmsarimadisplay7.ps}

The sample ACF in Figure 5.7 displays a recurrent pattern: there are significant peaks at the seasonal frequencies (lags 12, 24, 36, etc.), which decay slowly. The autocorrelation coefficients of the months in between are much smaller and follow a regular pattern. This characteristic pattern of the ACF indicates that the underlying time series possesses a seasonal unit root. Typically, $ D = 1$ is sufficient to obtain seasonal stationarity. Therefore, we take the seasonal difference and obtain the following time series

$\displaystyle \Delta_{12}\Delta g_t = (1-L)(1-L^{12})g_t$    

that incorporates neither an ordinary nor a seasonal unit root.
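The two difference operators can be sketched in a few lines. The helper `difference` and the series `g` below are assumptions made for illustration: `g` is a hypothetical log series (not the airline data) consisting of a linear trend plus a pattern that repeats exactly every 12 months.

```python
def difference(series, lag=1):
    """Apply (1 - L^lag): return x_t - x_{t-lag} for t = lag, ..., n-1."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Hypothetical log series (NOT the airline data): linear trend plus a fixed
# pattern that repeats every 12 months.
pattern = [0.0, -0.1, 0.1, 0.05, 0.0, 0.2, 0.3, 0.3, 0.1, -0.05, -0.2, -0.1]
g = [4.7 + 0.01 * t + pattern[t % 12] for t in range(48)]

d1 = difference(g, lag=1)                          # (1 - L) g_t
d1_d12 = difference(d1, lag=12)                    # (1 - L^12)(1 - L) g_t
d12_d1 = difference(difference(g, lag=12), lag=1)  # same operators, other order

# One observation is lost by the ordinary difference, twelve by the seasonal one.
print(len(g), len(d1), len(d1_d12))
# For this purely deterministic series nothing but rounding error remains.
print(max(abs(x) for x in d1_d12))
```

The order in which the two differences are taken does not matter, since $ (1-L)$ and $ (1-L^{12})$ commute. For real data the doubly differenced series is of course not zero; it is the series whose ACF and PACF are inspected next.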

After that, the sample ACF and PACF of $ \Delta _{12}\Delta g_t$ have to be inspected in order to explore the remaining dependencies in the stationary series. The autocorrelation functions are given in Figures 5.8 and 5.9. Compared with the characteristic pattern of the ACF of $ \Delta g_t$ (Figure 5.7), the patterns of the ACF and PACF of $ \Delta _{12}\Delta g_t$ are far more difficult to interpret. Both ACF and PACF show significant peaks at lags 1 and 12. Furthermore, the PACF displays autocorrelation for many lags. Even though these patterns are not that clear, they suggest a seasonal moving average combined with an ordinary MA(1). Another possible specification is an ordinary MA(12), where only the coefficients $ \theta_1 $ and $ \theta_{12}$ are different from zero.
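How such sample functions are obtained can be sketched as follows. This is a minimal illustration under stated assumptions, not the routine used to produce the figures: the sample ACF is computed directly from its definition, the sample PACF by the Durbin-Levinson recursion, and the input is a simulated stand-in series (not the airline data) generated from a multiplicative moving average with hypothetical coefficients `theta_1` and `theta_s1`. The bound $ 2/\sqrt{n}$ is the usual rough significance band.

```python
import math
import random

def sample_acf(x, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]

def sample_pacf(x, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    rho = sample_acf(x, max_lag)
    pacf, phi_prev = [1.0], []
    for k in range(1, max_lag + 1):
        # phi_prev[j] holds phi_{k-1, j+1}; both sums are empty for k = 1.
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

# Simulated stand-in for the doubly differenced series (NOT the airline data):
# w_t = (1 + theta_1 L)(1 + theta_s1 L^12) a_t with hypothetical coefficients.
random.seed(0)
theta_1, theta_s1 = -0.4, -0.6
n = 600
a = [random.gauss(0.0, 1.0) for _ in range(n + 13)]
w = [a[t] + theta_1 * a[t - 1] + theta_s1 * a[t - 12]
     + theta_1 * theta_s1 * a[t - 13] for t in range(13, n + 13)]

acf = sample_acf(w, 24)
pacf = sample_pacf(w, 24)
bound = 2.0 / math.sqrt(len(w))  # rough band for "significant" peaks
for k in (1, 2, 11, 12, 13):
    print(k, round(acf[k], 3), round(pacf[k], 3), abs(acf[k]) > bound)
```

For a series of this kind the ACF is expected to show clear peaks at lags 1 and 12, while the PACF tends to tail off over many lags, which is the qualitative picture described above.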

Figure 5.8: Sample ACF of the seasonally differenced growth rate of the airline data $ \Delta _{12}\Delta g_t$.
\includegraphics[width=1.5\defpicwidth]{XEGmsarimadisplay8.ps}

Figure 5.9: Sample PACF of the seasonally differenced growth rate of the airline data $ \Delta _{12}\Delta g_t$.
\includegraphics[width=1.5\defpicwidth]{XEGmsarimadisplay9.ps}

Thus, the identification procedure leads to two different multiplicative SARIMA specifications. The first one is a SARIMA(0,1,1)$ \times$(12,0,1,1). Using the lag operator, this model can be written as follows:


$\displaystyle \begin{aligned}
(1-L)(1-L^{12})g_t &= (1+\theta_1 L)(1+\theta_{s,1}L^{12})a_t \\
 &= (1+\theta_1L+\theta_{s,1}L^{12}+\theta_1\theta_{s,1}L^{13})a_t.
\end{aligned}$
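The expansion of the product of the two MA polynomials can be checked numerically. The sketch below represents a lag polynomial as a list of coefficients indexed by the power of $ L$; the helper `poly_mul` and the coefficient values are hypothetical and serve only as an illustration, they are not estimates.

```python
def poly_mul(a, b):
    """Multiply two lag polynomials given as coefficient lists (index = power of L)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

theta_1, theta_s1 = -0.4, -0.6  # hypothetical values, not estimates

ordinary = [1.0, theta_1]                    # 1 + theta_1 L
seasonal = [1.0] + [0.0] * 11 + [theta_s1]   # 1 + theta_{s,1} L^12
product = poly_mul(ordinary, seasonal)

# Keep only the lags with a nonzero coefficient.
nonzero = {lag: coef for lag, coef in enumerate(product) if coef != 0.0}
print(sorted(nonzero))   # lags that enter the expanded MA polynomial
print(nonzero[13])       # coefficient on L^13, the product theta_1 * theta_{s,1}
```

The multiplicative model therefore has nonzero MA coefficients at lags 1, 12 and 13, but the coefficient at lag 13 is not a free parameter: it is restricted to the product of the other two.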

The second specification is a SARIMA(0,1,12)$ \times$(12,0,1,0). This model has the following representation:

$\displaystyle (1-L)(1-L^{12})g_t=(1+\theta_1L+\theta_{12}L^{12})a_t.$    

Note that in the last equation all MA coefficients other than $ \theta_1 $ and $ \theta_{12}$ are zero. With the specification of the SARIMA models the identification process is finished. We saw that modeling the seasonal ARMA structure after removing the nonseasonal and the seasonal unit root was quite difficult, because the sample ACF and PACF did not display any clear pattern. Therefore, two different SARIMA models were identified, both of which have to be tested in the further analysis.
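One way to see how the two specifications differ is to compare the theoretical autocorrelations they imply for the differenced series. The following sketch uses the standard formula for the autocovariances of a finite MA process; the helper `ma_acf` and all coefficient values are hypothetical choices for illustration, not estimates from the airline data.

```python
def ma_acf(psi, max_lag):
    """Theoretical autocorrelations of an MA process with weights psi[0..q]."""
    gamma = [
        sum(psi[j] * psi[j + k] for j in range(len(psi) - k)) if k < len(psi) else 0.0
        for k in range(max_lag + 1)
    ]
    return [value / gamma[0] for value in gamma]

theta_1, theta_12 = -0.4, -0.6  # hypothetical values, not estimates

# Multiplicative specification: (1 + theta_1 L)(1 + theta_{s,1} L^12),
# with theta_{s,1} set equal to theta_12 so that only lag 13 differs.
psi_mult = [1.0, theta_1] + [0.0] * 10 + [theta_12, theta_1 * theta_12]
# Second specification: 1 + theta_1 L + theta_12 L^12, no term in L^13.
psi_add = [1.0, theta_1] + [0.0] * 10 + [theta_12]

rho_mult = ma_acf(psi_mult, 14)
rho_add = ma_acf(psi_add, 14)
for k in (1, 11, 12, 13):
    print(k, round(rho_mult[k], 3), round(rho_add[k], 3))
```

Both models imply nonzero autocorrelations at lags 1, 11 and 12. Only the multiplicative model also implies a nonzero autocorrelation at lag 13, which comes from the restricted coefficient $ \theta_1\theta_{s,1}$ on $ L^{13}$.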