19.4 Forecasts of Financial Time Series with Neural Networks

To forecast the future development of a financial time series, an autoregressive model is particularly suitable. The value of the time series at date $ t+1$ is a function of finitely many past observations plus an innovation that is independent of the past:

$\displaystyle Z_{t+1} = f(Z_t, \ldots, Z_{t-p+1}) + \varepsilon _{t+1},\quad - \infty < t < \infty,$ (19.2)

where $ \varepsilon _t,\ - \infty < t < \infty,$ is independently and identically distributed with $ {\mathop{\text{\rm\sf E}}} (\varepsilon _t) = 0$, $ \mathop{\mathit{Var}}(\varepsilon _t) = \sigma_\varepsilon ^ 2 < \infty$. The analogy between this non-linear autoregressive model of order $ p$ (NLAR($ p$)) and the regression model considered in a previous section is obvious: the $ p$-variate random vector $ (Z_t, \ldots, Z_{t-p+1})^\top $ takes the place of the $ d$-variate independent variable $ X_t$. The autoregression function $ f: \mathbb{R}^p \rightarrow \mathbb{R}$ in this model immediately gives the best forecast (in the mean squared error sense) for $ Z_{t+1}$ given the values of the time series up to date $ t$:

$\displaystyle \hat{Z}^ 0_{t+1\vert t} = f (Z_{t}, \ldots, Z_{t-p+1}). $

Since $ f$ is in general not known, it is natural in view of the last section to approximate the autoregression function with a neural network when observations $ Z_1, \ldots, Z_{n+1}$ of the time series are available. For training the network, i.e., for estimating the network weights, the vectors $ (Z_t, \ldots, Z_{t-p+1})^\top $ are used as input values and $ Z_{t+1}$ as the corresponding output values for $ t = p, \ldots, n$. For simplicity we restrict ourselves to the MLP with one hidden layer. $ \hat{\vartheta}_n$ again denotes the least squares estimator of the weight vector:

$\displaystyle \hat{\vartheta }_n = \mathop{\rm arg min}_{\vartheta \in\Theta_H} \frac{1}{n-p+1} \sum^n_{t=p} \left\{Z_{t+1} - \nu_H \left( Z_t, \ldots, Z_{t-p+1}; \vartheta \right)\right\}^ 2$

where $ \nu_H$ is defined as in the previous section. We thus obtain a non-parametric forecast based on a neural network for $ Z_{t+1}:$

$\displaystyle \hat{Z}_{t+1\vert t} = \nu_H (Z_t, \ldots, Z_{t-p+1}; \hat{\vartheta}_n ). $
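The estimation and forecasting steps above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used for the figure in this section: it assumes a logistic activation function in the hidden layer, minimises the least squares criterion by plain gradient descent, and runs on a simulated series with a hypothetical autoregression function. The names `make_lagged_pairs`, `fit_mlp` and the tuning values (`steps`, `lr`) are assumptions introduced for the example.

```python
import math
import random


def make_lagged_pairs(z, p):
    """Return inputs (Z_t, ..., Z_{t-p+1}) and targets Z_{t+1} for t = p, ..., n.

    z is the list Z_1, ..., Z_{n+1}, so z[0] holds Z_1.
    """
    inputs, targets = [], []
    for t in range(p - 1, len(z) - 1):          # 0-based position of Z_t
        inputs.append([z[t - j] for j in range(p)])
        targets.append(z[t + 1])
    return inputs, targets


def logistic(u):
    return 1.0 / (1.0 + math.exp(-u))


def hidden_outputs(x, W):
    # Each row of W is (bias, w_1, ..., w_p) for one hidden neuron.
    return [logistic(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))) for w in W]


def nu_H(x, W, v):
    """One-hidden-layer MLP: v_0 + sum_h v_h * psi(w_0h + sum_i w_ih * x_i)."""
    return v[0] + sum(vh * hh for vh, hh in zip(v[1:], hidden_outputs(x, W)))


def fit_mlp(inputs, targets, H=3, steps=200, lr=0.1, seed=0):
    """Minimise the mean squared one-step forecast error by gradient descent."""
    rng = random.Random(seed)
    p, n = len(inputs[0]), len(targets)
    W = [[rng.uniform(-0.5, 0.5) for _ in range(p + 1)] for _ in range(H)]
    v = [rng.uniform(-0.5, 0.5) for _ in range(H + 1)]
    history = []
    for _ in range(steps):
        loss = 0.0
        gW = [[0.0] * (p + 1) for _ in range(H)]
        gv = [0.0] * (H + 1)
        for x, y in zip(inputs, targets):
            hid = hidden_outputs(x, W)
            resid = v[0] + sum(vh * hh for vh, hh in zip(v[1:], hid)) - y
            loss += resid ** 2 / n
            gv[0] += 2.0 * resid / n
            for h in range(H):
                gv[h + 1] += 2.0 * resid * hid[h] / n
                back = 2.0 * resid * v[h + 1] * hid[h] * (1.0 - hid[h]) / n
                gW[h][0] += back
                for i, xi in enumerate(x):
                    gW[h][i + 1] += back * xi
        history.append(loss)                     # loss before this update
        W = [[w - lr * g for w, g in zip(wr, gr)] for wr, gr in zip(W, gW)]
        v = [a - lr * g for a, g in zip(v, gv)]
    return W, v, history


# Simulated NLAR(2) series (hypothetical f, for illustration only).
rng = random.Random(1)
z = [0.0, 0.0]
for _ in range(150):
    z.append(0.8 * math.tanh(z[-1]) - 0.3 * z[-2] + rng.gauss(0.0, 0.1))

inputs, targets = make_lagged_pairs(z, p=2)
W, v, history = fit_mlp(inputs, targets)
forecast = nu_H([z[-1], z[-2]], W, v)            # forecast of the next, unobserved value
print(history[0], history[-1], forecast)
```

In practice one would replace the gradient descent loop by a more efficient optimiser and choose $ H$ by a model selection criterion; the sketch only makes the structure of the input and output pairs and of the least squares criterion explicit.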

Fig.: Approximation of the exchange rate JPY/USD (red) by an RBF neural network (blue): training set (above) and forecasts (below). SFEnnjpyusd.xpl

The result of this procedure is illustrated in Figure 19.9: it shows a forecast of the exchange rate time series JPY/USD by a neural network that takes a time dependency of $ 3$ periods into account.

The asymptotic normality of the parameter and function estimators and the consistency of $ \nu_H (\cdot ;\hat{\vartheta}_n)$ as an estimator of $ f$ for increasing $ H$ remain valid when the stochastic process $ \{ Z_t,\ -\infty < t< \infty\}$ is $ \alpha$-mixing with exponentially decreasing mixing coefficients, see White (1989b) and White (1990). For the case $ p=1$, Franke, Kreiss, Mammen and Neumann (2003) have formulated conditions on the autoregression function $ f$ and on the distribution of the innovations $ \varepsilon_t$ which guarantee that the NLAR(1) process even has the stronger $ \beta$-mixing property with exponentially decreasing coefficients. Apart from technical details, it is essential that

$\displaystyle \lim_{\vert x\vert\rightarrow\infty} \vert f(x)/x\vert < 1
$

is fulfilled. For the innovation distribution it is sufficient that its density does not vanish anywhere; this last condition can be considerably weakened.

The condition on the autoregression function is comparatively weak and is plausible in view of the stationarity condition $ \vert\alpha\vert < 1$ for linear AR(1) processes $ Z_{t+1} = \alpha \, Z_t + \varepsilon _{t+1} $, where $ f(x) = \alpha \, x$. Accordingly, for NLAR($ p$) processes of higher order $ (p>1)$ weak conditions on $ f$, which above all guarantee stationarity, are also sufficient to make the neural network a useful tool as a non-parametric estimator of $ f$.
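The role of the growth condition on $ f$ can be illustrated numerically. The following sketch is only an illustration and not a proof of any mixing property: it evaluates $ \vert f(x)/x\vert$ at a large argument as a crude proxy for the limit and simulates NLAR(1) paths with standard normal innovations. Both autoregression functions and all function names are hypothetical choices made for the example.

```python
import math
import random


def growth_ratio(f, x=1e6):
    """Crude numerical proxy for lim |f(x)/x|, evaluated at +x and -x."""
    return max(abs(f(x) / x), abs(f(-x) / (-x)))


def simulate_nlar1(f, n, sigma=1.0, z0=0.0, seed=0):
    """Simulate Z_{t+1} = f(Z_t) + eps_{t+1} with Gaussian innovations."""
    rng = random.Random(seed)
    z = [z0]
    for _ in range(n):
        z.append(f(z[-1]) + rng.gauss(0.0, sigma))
    return z


def f_bounded_slope(x):
    # Non-linear, but |f(x)/x| tends to 0.5 < 1 for large |x|.
    return 0.5 * x + 2.0 * math.sin(x)


def f_explosive(x):
    # Linear AR(1) with alpha = 1.1, which violates |alpha| < 1.
    return 1.1 * x


for f in (f_bounded_slope, f_explosive):
    path = simulate_nlar1(f, 500)
    print(f.__name__, growth_ratio(f), max(abs(val) for val in path))
```

The first path keeps fluctuating in a bounded range, while the second one diverges, which mirrors the behaviour of linear AR(1) processes with $ \vert\alpha\vert < 1$ and $ \vert\alpha\vert > 1$.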

For practical forecasts one wants to use not only the last values of the time series but also economic data available at time $ t$, such as exchange rates, index values, oil prices or non-linear transformations of prices. The non-linear autoregressive process with exogenous components of order $ p$ (NLARX($ p$)) is suitable for this purpose:

$\displaystyle Z_{t+1} = f (Z_t, \ldots, Z_{t-p+1}, X_t) + \varepsilon _{t+1},\ \, - \infty < t < \infty ,$ (19.3)

where the innovations $ \varepsilon _t,\ - \infty < t < \infty,$ are again independently and identically distributed with $ {\mathop{\text{\rm\sf E}}}
(\varepsilon _t) = 0,\ \, \mathop{\mathit{Var}}(\varepsilon _t) = \sigma
_\varepsilon ^2 < \infty,$ and $ X_t$ is the value of a $ d$-variate stochastic process that contains all external information available at date $ t$, which is used in the forecast.
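For the NLARX($ p$) model the training pairs are built as in the purely autoregressive case, except that the exogenous vector $ X_t$ is appended to the lagged values. A minimal sketch, assuming that the series and the exogenous data are stored as Python lists aligned by date (the function name `make_nlarx_pairs` is hypothetical):

```python
def make_nlarx_pairs(z, x, p):
    """Build inputs (Z_t, ..., Z_{t-p+1}, X_t) and targets Z_{t+1}.

    z[i] and x[i] must refer to the same date; x[i] is a list with the
    d exogenous values available at that date.
    """
    if len(z) != len(x):
        raise ValueError("z and x must have the same length")
    inputs, targets = [], []
    for t in range(p - 1, len(z) - 1):
        lags = [z[t - j] for j in range(p)]      # Z_t, ..., Z_{t-p+1}
        inputs.append(lags + list(x[t]))         # append X_t (d values)
        targets.append(z[t + 1])
    return inputs, targets


# Toy data: p = 2 lags and d = 1 exogenous variable.
z = [1.0, 2.0, 3.0, 4.0]
x = [[10.0], [20.0], [30.0], [40.0]]
inputs, targets = make_nlarx_pairs(z, x, p=2)
print(inputs)   # [[2.0, 1.0, 20.0], [3.0, 2.0, 30.0]]
print(targets)  # [3.0, 4.0]
```

Only information available at date $ t$ enters the input for the target $ Z_{t+1}$, so the last exogenous observation is not used for training.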

The practical application of forecasting financial time series with neural networks is illustrated by a pilot study that was carried out in cooperation with Commerzbank AG, see Franke (1999). The goal was to develop a trading strategy for a portfolio made up of 28 of the most important stocks from the Dutch CBS Index. We restrict ourselves here to a buy-and-hold strategy with a time horizon of a quarter of a year (60 trading days), i.e., the portfolio is created at the beginning of a quarter and then held for three months without alteration. At the end of the three months the value of the portfolio should be as large as possible.

A three-month forecast of the stock prices serves as the basis for the trading strategy. Let $ S_t$ denote the price of one of the 28 stocks. To model the time series $ S_t$ we use an NLARX process of the form (19.3); the system function $ f$ is approximated by a network function $ \nu_H (S_t, A_t, X_t; \vartheta )$. Here $ A_t$ is a vector of fixed non-linear transformations of $ S_t, \ldots, S_{t-p+1}$ taken from technical market analysis, for example moving averages, momentum or Bollinger bands, Müller and Nietzer (1993), Welcker (1994). The random vector $ X_t$ contains the chosen market data such as index prices, exchange rates, international interest rates, etc. As is to be expected with a forecast horizon of 60 time units, the actual forecast of the stock price in 60 days,

$\displaystyle \hat{S}_{t+60\vert t} = \nu_H (S_t, A_t, X_t; \hat{\vartheta}_n),$

is not very reliable. For deciding whether a stock should be included in the portfolio, the general trend of the price development is more important than the actual price of the stock at the end of the holding period. To reflect this when forming the portfolio, one considers whether, according to the network-based forecast $ \hat{S}_{t+60\vert t}$, the price is expected to increase considerably (by more than 5 %), to decrease considerably (by more than 5 %) or to stay essentially at the same level. The network-based portfolio consists of those stocks for which $ ( \hat{S}_{t+60\vert t} - S_t )/S_t > 0.05$, with relative proportions taken from the stocks' corresponding weights in the CBS Index. The same network function $ \nu_H (S_t, A_t, X_t; \vartheta )$ is used for all 28 stocks under consideration, with the price-dependent arguments $ S_t$ taking on the stock-specific values.
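Two ingredients of the strategy described above can be sketched in code: the technical inputs $ A_t$ and the 5 % selection rule. The sketch is hedged in several ways. The indicator definitions (window length, momentum as a price difference, bands at two population standard deviations) are common conventions and not necessarily those of the pilot study, rescaling the index weights of the selected stocks so that they sum to one is an assumption, and all names and numbers are hypothetical.

```python
import statistics


def technical_features(prices, window=20):
    """Moving average, momentum and Bollinger bands from the last `window` prices."""
    recent = prices[-window:]
    ma = statistics.mean(recent)
    sd = statistics.pstdev(recent)               # population standard deviation
    return {
        "moving_average": ma,
        "momentum": recent[-1] - recent[0],      # price change over the window
        "bollinger_lower": ma - 2.0 * sd,
        "bollinger_upper": ma + 2.0 * sd,
    }


def classify_trend(forecast, price, threshold=0.05):
    """Classify the forecast relative price change as 'up', 'down' or 'flat'."""
    change = (forecast - price) / price
    if change > threshold:
        return "up"
    if change < -threshold:
        return "down"
    return "flat"


def network_portfolio(prices, forecasts, index_weights, threshold=0.05):
    """Keep stocks with a forecast rise above the threshold, weighted like the index."""
    selected = [name for name in prices
                if classify_trend(forecasts[name], prices[name], threshold) == "up"]
    total = sum(index_weights[name] for name in selected)
    return {name: index_weights[name] / total for name in selected}


# Hypothetical prices, 60-day forecasts and index weights for three stocks.
prices = {"A": 100.0, "B": 50.0, "C": 20.0}
forecasts = {"A": 110.0, "B": 55.0, "C": 18.0}
index_weights = {"A": 0.5, "B": 0.3, "C": 0.2}
print(network_portfolio(prices, forecasts, index_weights))
print(technical_features([1.0, 2.0, 3.0, 4.0, 5.0], window=5))
```

Here stocks A and B are forecast to rise by 10 % and enter the portfolio with weights proportional to 0.5 and 0.3, while stock C is forecast to fall and is left out.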

The data from 1993 to 1995 are used for choosing a suitable network and for estimating the network weight vector $ \vartheta$. The network structure was chosen using a statistical model selection technique together with the experience of the experts. The resulting network is a multilayer perceptron with one hidden layer made up of $ H=3$ neurons. The input vector $ (S_t, A_t, X_t)$ has dimension 25, so that a parameter vector $ \vartheta \in \mathbb{R}^{82}$ needed to be estimated.
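The dimension 82 of the parameter vector can be checked by counting weights. The short sketch below assumes a single output neuron and one bias (constant) term for each hidden neuron and for the output; under this assumption the count agrees with the number stated above. The function name is hypothetical.

```python
def mlp_parameter_count(input_dim, hidden_units):
    """Number of weights of an MLP with one hidden layer and one output."""
    hidden_layer = (input_dim + 1) * hidden_units   # input weights plus bias per neuron
    output_layer = hidden_units + 1                 # output weights plus bias
    return hidden_layer + output_layer


print(mlp_parameter_count(25, 3))  # 82
```

With 25 inputs and $ H=3$ this gives $ 26 \cdot 3 + 4 = 82$ parameters.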

To check the quality of the network-based trading strategy, it is applied to the data from 1996. At the beginning of every quarter a portfolio is formed from the 28 stocks based on the network-based forecast. At the end of the quarter the percentage increase in value is considered. For comparison, the increase in value of a portfolio that replicates the CBS Index exactly is considered. Since the market was rising for most of the period considered, experience suggests that the index is hard to beat. As Table 19.1 shows, the network portfolio achieved a higher percentage increase in value than the index portfolio in every quarter: both in quarters such as the first and fourth, in which the index increased substantially, and in quarters such as the second, in which the index decreased slightly. Nevertheless, the results need to be interpreted with caution. Even in the training phase (1993-1995) the CBS Index tended to increase, so that the network was able to specialize in trend forecasts in a generally rising market. Presumably one would need a different network as the basis for the trading strategy when the market moves sideways over a long period or when the index falls dramatically.

Table: Quarterly returns of the network portfolio and the index portfolio in 1996.
                     Quarterly returns
                     I.      II.     III.    IV.
Network portfolio    0.147   0.024   0.062   0.130
Index portfolio      0.109  -0.004   0.058   0.115
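The quarterly figures in the table can be summarised with a short calculation. The sketch below assumes that the entries are simple returns and that the proceeds of each quarter are fully reinvested in the next one; the compounded annual figures are therefore an illustration derived from the table, not results reported by the study.

```python
network = [0.147, 0.024, 0.062, 0.130]
index = [0.109, -0.004, 0.058, 0.115]


def compound(returns):
    """Compound simple per-period returns into one total return."""
    value = 1.0
    for r in returns:
        value *= 1.0 + r
    return value - 1.0


# Quarterly excess return of the network portfolio over the index portfolio.
excess = [round(n - i, 3) for n, i in zip(network, index)]
print(excess)                        # [0.038, 0.028, 0.004, 0.015]

print(round(compound(network), 4))   # about 0.41 for the year
print(round(compound(index), 4))     # about 0.30 for the year
```

The excess return is positive in every quarter but varies considerably in size, which is in line with the caution expressed above.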