19.5 Quantifying Risk with Neural Networks

In the previous chapters the most popular measures of risk, volatility and Value-at-Risk, have been introduced. Volatility is most often defined as a conditional standard deviation and Value-at-Risk as a conditional quantile, both with respect to a given historical information set. As with other non-parametric methods, neural networks can also be used to estimate these risk measures. The advantage of neural network based volatility and VaR estimators lies in the fact that the information used for estimating the risk can be represented by a high-dimensional data vector without hurting the practicality of the method. It is possible, for example, to estimate the conditional 5% quantile of the return process of a stock from the DAX given the individual returns of all of the DAX stocks and additional macroeconomic data such as interest rates, exchange rates, etc. In the following we briefly outline the necessary procedure.

As in (13.1) we assume a model of the form

$\displaystyle Z_{t+1} = f(Z_t, \ldots, Z_{t-p+1}, X_t) + s(Z_t, \ldots, Z_{t-p+1}, X_t) \, \xi _{t+1}$ (19.4)

to estimate the volatility, where $ \xi_t$ are independent, identically distributed random variables with $ {\mathop{\text{\rm\sf E}}}(\xi_t) = 0$, $ {\mathop{\text{\rm\sf E}}}(\xi_t^ 2) = 1$. $ X_t \in \mathbb{R}^ d $ represents, as in the previous section, the exogenous information available at date $ t$ which we use in estimating the risk of the time series $ Z_t$. The time series given by (19.4) is a non-linear AR(p) ARCH(p) process with exogenous components.
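To make the model concrete, the following minimal Python sketch simulates a path of such a non-linear AR(1)-ARCH(1) process with one exogenous variable; the particular choices of $ f$ and $ s$ below are purely illustrative and not taken from the text.

```python
# Minimal simulation sketch of a non-linear AR(p)-ARCH(p) process with an
# exogenous component, as in (19.4), for p = 1 and d = 1.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=n)                  # exogenous variable X_t (d = 1)
Z = np.zeros(n + 1)                     # Z_0 = 0 as a starting value

def f(z, x):                            # illustrative conditional mean function
    return 0.3 * np.tanh(z) + 0.1 * x

def s(z, x):                            # illustrative conditional volatility function
    return np.sqrt(0.1 + 0.4 * z**2 + 0.2 * x**2)

for t in range(n):
    xi = rng.standard_normal()          # innovation with E(xi) = 0, Var(xi) = 1
    Z[t + 1] = f(Z[t], X[t]) + s(Z[t], X[t]) * xi
```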

To simplify we use $ Z_t (p) = (Z_t, \ldots, Z_{t-p+1})^\top \in \mathbb{R}^ p$. Then it holds for $ z \in \mathbb{R}^ p,\ x \in \mathbb{R}^ d$ that

$\displaystyle {\mathop{\text{\rm\sf E}}}[Z_{t+1} \vert Z_t (p) = z,\ X_t = x] = f(z,x), $

$\displaystyle \mathop{\mathit{Var}}[Z_{t+1} \vert Z_t (p) = z,\ X_t = x] = s^ 2 (z,x) = {\mathop{\text{\rm\sf E}}} [Z_{t+1}^ 2 \vert Z_t(p) = z,\ X_t = x] - f^ 2 (z,x) . $

The conditional expectation function $ f(z,x)$ is approximated, as in the previous section, by a neural network function $ \nu _H (z,x; \vartheta)$ of the form (19.1). Using the non-linear least squares estimator $ \hat{\vartheta}_n$ for $ \vartheta$ we obtain an estimator for $ f:$

$\displaystyle \hat{f}_H (z,x) = \nu_H (z,x; \hat{\vartheta}_n). $
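As an illustration of this step, the sketch below fits such a network to the simulated data from the sketch above. scikit-learn's MLPRegressor with logistic hidden units and a linear output is used here merely as one convenient stand-in for the network function $ \nu_H$ and its non-linear least squares fit; it is not the text's own implementation.

```python
# Sketch: estimate the conditional mean f by a one-hidden-layer network with
# H neurons, fitted by (penalized) non-linear least squares.
import numpy as np
from sklearn.neural_network import MLPRegressor

p, H = 1, 10
inputs  = np.column_stack([Z[:-1], X])   # rows (Z_t(p), X_t), t = 0, ..., n-1
targets = Z[1:]                          # corresponding Z_{t+1}

f_net = MLPRegressor(hidden_layer_sizes=(H,), activation='logistic',
                     solver='lbfgs', alpha=1e-4, max_iter=5000)
f_net.fit(inputs, targets)
f_hat = f_net.predict(inputs)            # \hat f_H(Z_t(p), X_t)
```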

Analogously we can estimate the conditional second moment

$\displaystyle {\mathop{\text{\rm\sf E}}}[Z_{t+1}^ 2 \vert Z_t (p) = z,\ X_t = x] = g(z,x) $

by approximating the function with a neural network with output function $ \nu_G (z,x; \delta)$ and estimating its parameter $ \delta$ with the least squares estimator $ \hat{\delta}_n$ over a sufficiently large compact subset $ \Delta_G \subset \mathbb{R}^ {(p+d+1) G + G+1} $ which, like $ \Theta_H$, is chosen as a suitable fundamental range:

$\displaystyle \hat{\delta}_n = \mathop{\rm arg min}_{\delta \in \Delta_G} \frac{1}{n-p+1} \sum^ n_{t=p} \left\{ Z_{t+1}^ 2 - \nu _G (Z_t (p), X_t; \delta ) \right\}^ 2 , $

$\displaystyle \hat{g}_G (z,x) = \nu _G (z,x; \hat{\delta}_n ). $

As an estimator for the conditional volatility we immediately obtain:

$\displaystyle \hat{s}_{H,G}^ 2 (z,x) = \hat{g}_G (z,x) - \hat{f}_H ^ 2 (z,x). $
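Continuing the sketch above, a second network with $ G$ neurons can be fitted to the squared observations to obtain $ \hat{g}_G$ and, from it, the plug-in variance estimator. The truncation at zero in the last line is a pragmatic addition for the sketch and not part of the procedure described in the text.

```python
# Sketch: second network with G neurons for g(z,x) = E[Z_{t+1}^2 | ...],
# giving the plug-in variance estimator s^2 = g - f^2.
import numpy as np
from sklearn.neural_network import MLPRegressor

G = 10
g_net = MLPRegressor(hidden_layer_sizes=(G,), activation='logistic',
                     solver='lbfgs', alpha=1e-4, max_iter=5000)
g_net.fit(inputs, targets**2)            # regress Z_{t+1}^2 on (Z_t(p), X_t)
g_hat = g_net.predict(inputs)
s2_hat = np.maximum(g_hat - f_hat**2, 0.0)   # \hat s^2 = \hat g - \hat f^2, truncated at 0
```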

In general this estimator is guaranteed to be non-negative only for $ G=H$. In order to avoid this restriction one can follow the procedure used by Fan and Yao (1998), who have studied a similar problem for the kernel estimator of the conditional variance in a heteroscedastic regression model. Following this approach, the residuals

$\displaystyle \varepsilon_{t+1} = Z_{t+1} - f(Z_t (p), X_t) = s(Z_t (p), X_t) \, \xi _{t+1} $

are approximated by the sample residuals

$\displaystyle \hat{\varepsilon}_{t+1} = Z_{t+1} - \hat{f}_H (Z_t (p), X_t),\ t = p, \ldots, n. $

Since $ \xi_{t+1} $ has mean 0 and variance 1, it holds that

$\displaystyle {\mathop{\text{\rm\sf E}}}[\varepsilon _{t+1}^ 2 \vert Z_t (p) = z,\ X_t = x] = s^ 2 (z,x). $

We can approximate this function directly with a neural network with $ G$ neurons and output function $ \nu_G (z,x; \delta)$, whose parameter $ \delta$ is estimated by

$\displaystyle \hat{\delta}_n = \mathop{\rm arg min}_{\delta \in \Delta _G} \frac{1}{n-p+1} \sum^ n_{t=p} \left\{ \hat{\varepsilon}_{t+1}^ 2 - \nu_G (Z_t(p), X_t; \delta)\right\}^ 2. $

The resulting estimator for the conditional volatility, which through the $ \hat{\varepsilon}_t$ also depends on $ H$, is then

$\displaystyle \hat{s}_{H,G} (z,x) = \nu_G (z,x; \hat{\delta}_n). $
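A minimal sketch of this residual-based variant, continuing the sketches above (MLPRegressor again stands in for $ \nu_G$; the clipping before the square root is an added safeguard, not part of the text's procedure):

```python
# Sketch of the Fan-Yao-style residual regression: estimate s^2 directly by
# regressing the squared sample residuals on (Z_t(p), X_t).
import numpy as np
from sklearn.neural_network import MLPRegressor

eps2 = (targets - f_hat)**2              # squared sample residuals
s2_net = MLPRegressor(hidden_layer_sizes=(G,), activation='logistic',
                      solver='lbfgs', alpha=1e-4, max_iter=5000)
s2_net.fit(inputs, eps2)                 # regress residuals^2 on (Z_t(p), X_t)
s_hat = np.sqrt(np.clip(s2_net.predict(inputs), 0.0, None))   # \hat s_{H,G}
```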

Figure 19.10 shows the conditional volatilities estimated from the log returns of the BP/USD exchange rate time series together with some financial indicators, using the procedure described above (a time dependency of $ 3$ periods is considered, and radial basis function networks are used).

Fig.: Log returns of the BP/USD exchange rate and the conditional variances estimated by an RBF neural network. SFEnnarch.xpl

For arbitrary $ G,H$ it is automatically non-negative. Since the number of neurons essentially determines the smoothness of the network function, it can make sense when approximating $ f$ and $ s^ 2$ to choose different networks with $ H\not= G$ neurons when the smoothness of the two functions is believed to differ considerably.

When the distribution of the innovations $ \xi_t$ is additionally specified in the model (19.4), we immediately obtain, together with the estimators of $ f$ and $ s^ 2$, an estimator of the conditional Value-at-Risk. If the distribution of $ \xi_t$ is, for example, N$ (0,1),$ then the conditional distribution of $ Z_{t+1}$ given the information $ Z_t(p)$ and $ X_t$ at date $ t$ is also a normal distribution with mean $ f(Z_t(p),X_t)$ and variance $ s^ 2 (Z_t(p), X_t).$ If $ q_\alpha ^ \circ $ is the $ \alpha$ quantile of the standard normal distribution, then the Value-at-Risk of the process $ \{Z_t\}$, i.e., the conditional $ \alpha$ quantile of $ Z_{t+1}$ given $ Z_t(p), X_t$, is:

$\displaystyle VaR_{t+1} = f(Z_t(p), X_t) + s(Z_t(p), X_t) q_\alpha^ \circ.$

An estimator for this conditional Value-at-Risk based on a neural network can be obtained by replacing $ f$ and $ s$ with the appropriate estimator:

$\displaystyle \widehat{VaR}_{t+1} = \hat{f}_H (Z_t(p), X_t) + \hat{s}_{H,G} (Z_t (p), X_t) \, q_\alpha ^ \circ.$ (19.5)

Here we can replace the standard normal distribution with another distribution, for example with a standardized $ t$-distribution with mean 0 and variance 1. $ q_\alpha ^ \circ $ is then the corresponding $ \alpha$ quantile of the innovation distribution, i.e., the distribution of $ \xi_t$.
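Continuing the sketches above, the plug-in estimator (19.5) can then be computed directly; the standard normal quantile used below could equally be replaced by the quantile of a standardized $ t$-distribution.

```python
# Sketch: plug-in Value-at-Risk estimator (19.5) under an assumed
# innovation distribution (standard normal here).
from scipy.stats import norm

alpha = 0.05
q_alpha = norm.ppf(alpha)                # alpha-quantile of the assumed innovation law
VaR_hat = f_hat + s_hat * q_alpha        # estimated conditional alpha-quantile of Z_{t+1}
```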

The estimator (19.5) for the Value-at-Risk assumes that $ Z_t$ is a non-linear ARX-ARCHX process of the form (19.4). Above all, however, it has the disadvantage of depending on the critical assumption of a specific distribution for $ \xi_t$. In particular, the assumption of standard normally distributed innovations in stochastic volatility models has recently been criticized in financial statistics on the basis of empirical findings. The tails of the distributions of financial time series appear at times to be so heavy that, in order to model them adequately, even the distribution of the innovations must be assumed to be leptokurtic. Due to the simplicity of its representation, a $ t$-distribution with only a few degrees of freedom is often considered. In order to avoid the arbitrariness in the choice of the innovation distribution, it is possible to estimate the conditional quantile directly without relying on a model of the form (19.4). This approach goes back to the regression quantiles of Koenker and Bassett and has been applied by Abberger (1997) to time series in connection with kernel estimation.
We assume that $ Z_t$ is a stationary time series. As in Chapter 17, $ P_{t+1}$ denotes the forecast distribution, i.e., the conditional distribution of $ Z_{t+1}$ given $ Z_t(p), X_t.$ With $ F_{t+1}$ we denote the corresponding conditional distribution function

$\displaystyle F_{t+1} (y\vert z,x) = \P(Z_{t+1} \le y \vert Z_t (p) = z, X_t = x) $

for $ y \in \mathbb{R},\ z \in \mathbb{R}^ p,\ x \in \mathbb{R}^ d.$ $ q_\alpha (z,x)$ is the conditional $ \alpha$ quantile, i.e., the solution to the equation $ F_{t+1} (q_\alpha (z,x) \vert z,x) = \alpha.$ The conditional quantile function $ q_\alpha (z,x)$ solves the minimization problem

$\displaystyle {\mathop{\text{\rm\sf E}}}\{ \alpha (Z_{t+1} - q)^+ + (1-\alpha) (Z_{t+1}-q)^ - \vert Z_t (p) = z,\ X_t = x\} = \min_{q\in \mathbb{R}} !$ (19.6)

where $ y^ + = y \cdot \boldsymbol{1}(y\geq 0) $ and $ y^ - = \vert y\vert\cdot \boldsymbol{1}(y\leq 0)$ represent the positive and negative parts of $ y \in \mathbb{R}$. In order to estimate the quantile function directly with a neural network with $ H$ neurons, we approximate $ q_\alpha (z,x)$ with a network function $ \nu _H (z,x; \gamma)$ of the form (19.1), whose weight parameter $ \gamma$ lies in a fundamental range $ \Gamma _H \subset \mathbb{R} ^ {(p+d+1) H+H+1}$. $ \gamma$ is estimated, however, not with the least squares method, but by minimizing the corresponding sample version of (19.6):

$\displaystyle \hat{\gamma}_n = \mathop{\rm arg min}_{\gamma \in \Gamma_H} \frac{1}{n-p+1} \sum^ n_{t=p} \left\{ \alpha \left[Z_{t+1} - \nu _H (Z_t (p),X_t; \gamma)\right]^ + + (1-\alpha ) \left[Z_{t+1} - \nu_H (Z_t(p), X_t; \gamma)\right]^ - \right\} . $

As an estimator for the quantile function we obtain

$\displaystyle \hat{q}_{H\alpha} (z,x) = \nu_H (z,x; \hat{\gamma}_n) $

and with this the estimator for the conditional Value-at-Risk given $ Z_t, \ldots, Z_{t-p+1}$, $ X_t$

$\displaystyle \widehat{VaR}_{t+1} = \hat{q}_{H\alpha} (Z_t, \ldots, Z_{t-p+1}, X_t). $
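A minimal sketch of this direct quantile approach, again assuming the arrays inputs and targets from the sketches above: a one-hidden-layer network with logistic activation is parameterized explicitly, and its weights are chosen by numerically minimizing the empirical check-function criterion. The general-purpose optimizer used here is only one possible choice, not the method prescribed in the text.

```python
# Sketch: neural quantile regression by minimizing the empirical
# check-function criterion from (19.6) over the network weights gamma.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit               # logistic activation

H, alpha = 10, 0.05
d_in = inputs.shape[1]                        # dimension of (Z_t(p), X_t)

def unpack(gamma):
    W = gamma[:d_in * H].reshape(d_in, H)     # input-to-hidden weights
    b = gamma[d_in * H:d_in * H + H]          # hidden biases
    v = gamma[d_in * H + H:-1]                # hidden-to-output weights
    c = gamma[-1]                             # output bias
    return W, b, v, c

def nu_H(gamma, x):
    W, b, v, c = unpack(gamma)
    return expit(x @ W + b) @ v + c           # network output nu_H(z, x; gamma)

def check_loss(gamma):
    u = targets - nu_H(gamma, inputs)         # Z_{t+1} - nu_H(Z_t(p), X_t; gamma)
    return np.mean(np.where(u >= 0, alpha * u, (alpha - 1.0) * u))

rng = np.random.default_rng(1)
gamma0 = 0.1 * rng.standard_normal(d_in * H + 2 * H + 1)   # (p+d+1)H + H + 1 weights
res = minimize(check_loss, gamma0, method='L-BFGS-B')
VaR_hat_q = nu_H(res.x, inputs)               # \hat q_{H,alpha}(Z_t(p), X_t)
```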

White has shown that under suitable assumptions the function estimators $ \hat{q}_{H\alpha}(z,x)$ converge in probability to $ q_\alpha(z,x)$ when the number of observations $ n \rightarrow \infty$ and at the same time the number of neurons $ H\rightarrow \infty$ at a suitable rate.