1.5 Variance Reduction Techniques in Monte-Carlo Simulation


1.5.1 Monte-Carlo Sampling Method

The partial Monte-Carlo method is a Monte-Carlo simulation in which scenarios of the underlying risk factors are generated from the statistical model and the portfolio is then revalued using the simple delta-gamma approximation. We denote by $ X$ the vector of risk factors, by $ \Delta V$ the change in portfolio value resulting from $ X$, by $ L=-\Delta V$ the loss, by $ \alpha$ the confidence level and by $ l$ the loss threshold.

We also use the delta-gamma approximation of equation (1.1), which defines the class of Delta-Gamma normal methods. The detailed procedure to implement the partial Monte-Carlo method is as follows; a minimal code sketch is given after the list.

  1. Generate $ N$ scenarios by simulating risk factors $ X_{1},\ldots,X_{N}$ according to $ \Sigma_{X}$;
  2. Revalue the portfolio and determine the loss in the portfolio values $ L_{1},...,L_{N}$ using the simple delta-gamma approximation;
  3. Calculate the fraction of scenarios in which losses exceed $ l$:

    $\displaystyle N^{-1}\sum_{i=1}^{N} \boldsymbol{1}(L_{i}>l),$ (1.44)

    where $ \boldsymbol{1}(L_{i}>l)=1$ if $ L_{i}>l$ and 0 otherwise.
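
The following minimal Python sketch (not part of XploRe) illustrates these three steps, assuming the delta-gamma approximation has already been diagonalized into per-factor sensitivities $ \delta_{i}$ and curvatures $ \lambda_{i}$ as in equation (1.45); the function name and arguments are illustrative.

    import numpy as np

    def partial_mc_exceedance(delta, lam, l, n_scenarios=10000, rng=None):
        # Plain partial Monte-Carlo estimate of P(L > l) for the diagonalized
        # delta-gamma loss L = -sum_i (delta_i z_i + 0.5 * lambda_i * z_i^2).
        rng = np.random.default_rng() if rng is None else rng
        delta, lam = np.asarray(delta, float), np.asarray(lam, float)
        # Step 1: generate N scenarios of the standardized risk factors.
        z = rng.standard_normal((n_scenarios, len(delta)))
        # Step 2: revalue the portfolio with the delta-gamma approximation.
        loss = -(z @ delta + 0.5 * (z ** 2) @ lam)
        # Step 3: fraction of scenarios whose loss exceeds the threshold l.
        return np.mean(loss > l)

The routine returns the exceedance probability; the VaR itself is obtained by searching for the threshold $ l$ at which this probability equals the prescribed level $ \alpha$.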

The partial Monte-Carlo method is flexible and easy to implement. It provides an accurate estimate of the VaR when the loss function is approximately quadratic. One drawback, however, is that for a large number of risk factors it requires a large number of replications and hence a long computation time. According to Boyle et al. (1998), the convergence rate of the Monte-Carlo estimate is $ 1/\sqrt{N}$. Different variance reduction techniques have been developed to increase the precision and speed up the process. Below we give a brief overview of the main types of variance reduction techniques, following Boyle et al. (1998); a code sketch illustrating several of them is given after the list.

  1. Antithetic Method

    We assume $ W_{i}=f(z_{i})$, where $ z_{i}\in\mathbb{R}^{m}$ are independent samples from the $ m$-dimensional standard normal distribution. In our case, the function $ f$ is defined as

    $\displaystyle f(z_{i})=I(L_{i}>l)=I[-\sum_{k=1}^{m}(\delta_{k}z_{ik}+\frac{1}{2}\lambda_{k}z_{ik}^{2})>l].$ (1.45)

    Based on $ N$ replications, an unbiased estimator of $ \mu=E(W)$ is given by

    $\displaystyle \hat{\mu}=\frac{1}{N}\sum_{i=1}^{N}W_{i}=\frac{1}{N}\sum_{i=1}^{N}f(z_{i}).$ (1.46)

    In this context, the method of antithetic variates is based on the observation that if $ z_{i}$ has a standard normal distribution, then so does $ -z_{i}$. Consequently,

    $\displaystyle \tilde{\mu}=\frac{1}{N}\sum_{i=1}^{N}f(-z_{i})$ (1.47)

    is also an unbiased estimator of $ \mu$. Therefore,

    $\displaystyle \hat{\mu}_{AV}=\frac{\hat{\mu}+\tilde{\mu}}{2}$ (1.48)

    is an unbiased estimator of $ \mu$ as well.

    The intuition behind the antithetic method is that the random inputs obtained from the collection of antithetic pairs $ (z_{i},-z_{i})$ are more regularly distributed than a collection of $ 2N$ independent samples. In particular, the sample mean over the antithetic pairs always equals the population mean of 0, whereas the mean over finitely many independent samples is almost surely different from 0.

  2. Control Variates

    The basic idea of control variates is to replace the evaluation of an unknown expectation with the evaluation of the difference between the unknown quantity and another expectation whose value is known. The standard Monte-Carlo estimate of $ \mu=E[W_{i}]=E[f(z_{i})]$ is $ \frac{1}{N}\sum_{i=1}^{N}W_{i}$. Suppose we know $ \tilde{\mu}=E[\tilde{W_{i}}]=E[g(z_{i})]$ for some related function $ g$. The method of control variates uses the known error

    $\displaystyle \frac{1}{N}\sum_{i=1}^{N}\tilde{W_{i}}-\tilde{\mu}$ (1.49)

    to reduce the unknown error

    $\displaystyle \frac{1}{N}\sum_{i=1}^{N}W_{i}-\mu.$ (1.50)

    The controlled estimator has the form

    $\displaystyle \frac{1}{N}\sum_{i=1}^{N}W_{i}-\beta(\frac{1}{N}\sum_{i=1}^{N}\tilde{W_{i}}-\tilde{\mu}).$ (1.51)

    Since the term in parentheses has expectation zero, equation (1.51) provides an unbiased estimator of $ \mu$ as long as $ \beta$ is a constant, i.e. independent of the samples. In practice, if the function $ g(z_{i})$ provides a close approximation of $ f(z_{i})$, we usually set $ \beta=1$ to simplify the calculation.

  3. Moment Matching Method

    Let $ z_{i}, i=1,\ldots,n,$ denote independent standard normal random vectors used to drive a simulation. Their sample moments will not exactly match those of the standard normal distribution. The idea of moment matching is to transform the $ z_{i}$ to match a finite number of moments of the underlying population. For example, the first and second moments of the normal random numbers can be matched by defining

    $\displaystyle \tilde{z_{i}}=(z_{i}-\tilde{z})\frac{\sigma_{z}}{s_{z}}+\mu_{z}, \qquad i=1,\ldots,n,$ (1.52)

    where $ \tilde{z}$ is the sample mean of the $ z_{i}$, $ \sigma_{z}$ is the population standard deviation, $ s_{z}$ is the sample standard deviation of $ z_{i}$, and $ \mu_{z}$ is the population mean.

    The moment matching method can be extended to match covariance and higher moments as well.

  4. Stratified Sampling

    Like many variance reduction techniques, stratified sampling seeks to make the inputs to the simulation more regular than purely random inputs. In stratified sampling, rather than drawing the $ z_{i}$ randomly and independently from a given distribution, the method ensures that fixed fractions of the samples fall within specified ranges. Suppose, for example, that we want to generate $ N$ $ m$-dimensional normal random vectors as simulation input. The empirical distribution of an independent sample $ (z_{1},\ldots,z_{N})$ will look only roughly like the true normal density; the rare events, which are important for calculating the VaR, will inevitably be underrepresented. Stratified sampling can be used to ensure that exactly one observation $ z_{i}^{k}$ lies between the $ (i-1)/N$ and $ i/N$ quantiles ($ i=1,\ldots,N$) of the $ k$-th marginal distribution for each of the $ m$ components. One way to implement this is to generate $ Nm$ independent uniform random numbers $ u_{i}^{k}$ on $ [0,1]$ ( $ k=1,\ldots,m, i=1,\ldots,N$) and set

    $\displaystyle \tilde{z}_{i}^{k}=\Phi^{-1}[(i+u_{i}^{k}-1)/N], \qquad i=1,\ldots,N,$ (1.53)

    where $ \Phi^{-1}$ is the inverse of the standard normal cdf. (In order to achieve satisfactory sampling results, we need a good numerical procedure for calculating $ \Phi^{-1}$.) An alternative is to apply the stratification only to the most important components (directions), usually those associated with the eigenvalues of largest absolute value.

  5. Latin Hypercube Sampling

    The Latin Hypercube Sampling method was first introduced by McKay et al. (1979). In Latin Hypercube Sampling, the range of possible values of each component $ u_{i}^{k}$ is divided into $ N$ segments of equal probability. Thus, the $ m$-dimensional parameter space is partitioned into $ N^{m}$ cells, each having equal probability. For example, for dimension $ m=2$ and $ N=10$ segments, the parameter space is divided into $ 10\times10$ cells. The next step is to choose 10 cells from these $ 10\times10$ cells. First, uniform random numbers are generated to calculate the cell number. The cell number indicates the segment the sample belongs to with respect to each of the parameters. For example, the cell number (1,8) indicates that the sample lies in segment 1 with respect to the first parameter and in segment 8 with respect to the second parameter. At each successive step, a random sample is generated and is accepted only if it does not agree with any previous sample on any of the segment numbers.

  6. Importance sampling

    The technique builds on the observation that an expectation under one probability measure can be expressed as an expectation under another through the use of a likelihood ratio. The intuition behind the method is to generate more samples from the region that is most important for the problem at hand. In the next section, we give a detailed description of calculating the VaR by the partial Monte-Carlo method with importance sampling.
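
Before turning to importance sampling, the following Python sketch (not part of XploRe; all function names are illustrative) indicates how the antithetic, moment matching and Latin Hypercube estimators of $ P(L>l)$ might be implemented for the diagonalized delta-gamma loss of equation (1.45).

    import numpy as np
    from scipy.stats import norm

    def dg_loss(z, delta, lam):
        # Delta-gamma loss of equation (1.45): L = -sum_i (delta_i z_i + 0.5 lambda_i z_i^2).
        return -(z @ delta + 0.5 * (z ** 2) @ lam)

    def p_exceed_antithetic(delta, lam, l, n, rng):
        # Average the exceedance indicator over antithetic pairs (z_i, -z_i), equation (1.48).
        z = rng.standard_normal((n, len(delta)))
        return 0.5 * (np.mean(dg_loss(z, delta, lam) > l) +
                      np.mean(dg_loss(-z, delta, lam) > l))

    def p_exceed_moment_matching(delta, lam, l, n, rng):
        # Rescale the draws so that every component has sample mean 0 and sample
        # standard deviation 1, matching the first two moments as in equation (1.52).
        z = rng.standard_normal((n, len(delta)))
        z = (z - z.mean(axis=0)) / z.std(axis=0, ddof=1)
        return np.mean(dg_loss(z, delta, lam) > l)

    def p_exceed_latin_hypercube(delta, lam, l, n, rng):
        # One draw per equiprobable segment in every marginal; the segments of the
        # m components are combined through independent random permutations.
        m = len(delta)
        u = rng.uniform(size=(n, m))
        ranks = np.column_stack([rng.permutation(n) for _ in range(m)])
        z = norm.ppf((ranks + u) / n)
        return np.mean(dg_loss(z, delta, lam) > l)

Stratified sampling in the sense of equation (1.53) corresponds to the last routine with the random permutations replaced by the fixed order $ 0,\ldots,N-1$ in each stratified component.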


1.5.2 Partial Monte-Carlo with Importance Sampling

In the basic partial Monte-Carlo method, the problem of sampling changes in the market risk factors $ X_{i}$ is transformed into a problem of sampling the vector $ z$ of underlying standard normal random variables. In importance sampling, we change the distribution of $ z$ from $ \textrm{N}(0,I)$ to $ \textrm{N}(\mu, \Sigma)$. The key step proposed by Glasserman et al. (2000) is to calculate

$\displaystyle P(L>l)=E_{\mu,\Sigma}[\theta(z)I(L>l)]$ (1.54)

The expectation is taken with $ z$ sampled from $ \textrm{N}(\mu, \Sigma)$ rather than from its original distribution $ \textrm{N}(0,I)$. To correct for this change of distribution, we weight the loss indicator $ I(L>l)$ by the likelihood ratio

$\displaystyle \theta(z)=\vert\Sigma\vert^{1/2}e^{-\frac{1}{2}\mu^{\top}\Sigma^{-1}\mu}e^{-\frac{1}{2}[z^{\top} (I-\Sigma^{-1})z-2\mu^{\top}\Sigma^{-1}z]},$ (1.55)

which is simply the ratio of $ \textrm{N}[0,I]$ and $ \textrm{N}[\mu,\Sigma]$ densities evaluated at $ z$.

The next task is to choose $ \mu$ and $ \Sigma$ so that the Monte-Carlo estimator has minimum variance. The key to reducing the variance is to make the likelihood ratio small when $ L>l$. Equivalently, $ \mu$ and $ \Sigma$ should be chosen so as to make the event $ L>l$ more likely under $ \textrm{N}(\mu, \Sigma)$ than under $ \textrm{N}(0,I)$. The steps of the algorithm are as follows:

  1. Decomposition Process

    We follow the decomposition steps described in Section 1.2 and find the cumulant generating function of $ L$, given by

    $\displaystyle \kappa(\omega)=\sum_{i=1}^{m}\frac{1}{2}[\frac{(\omega\delta_{i})^{2}}{1-\omega\lambda_{i}} -\log(1-\omega\lambda_{i})]$ (1.56)

  2. Transform $ \textrm{N}(0,I)$ to $ \textrm{N}(\mu, \Sigma)$

    The first derivative of $ \kappa(\omega)$ with respect to $ \omega$ is the expected loss under the new measure; we choose $ \omega$ so that it equals the loss threshold:

    $\displaystyle \frac{d}{d\omega}\kappa(\omega)=E_{\mu(\omega),\Sigma(\omega)}[L]=l$ (1.57)

    where $ \Sigma(\omega)=(I-\omega\Lambda)^{-1}$ and $ \mu(\omega)=\omega\Sigma(\omega)\delta$. Since our objective is to estimate $ P(L>l)$, we choose $ \omega$ to be the solution of equation (1.57). The loss-exceeding scenarios $ (L>l)$, which were rare under $ \textrm{N}(0,I)$, become typical under $ \textrm{N}(\mu, \Sigma)$, since the expected value of the approximate loss $ L$ is now $ l$. According to Glasserman et al. (2000), the effectiveness of this importance sampling procedure is not very sensitive to the choice of $ \omega$.

    Once $ \textrm{N}(\mu(\omega),\Sigma(\omega))$ has been determined, we can follow the same steps as in the basic partial Monte-Carlo simulation to calculate the VaR. The only difference is that the fraction of scenarios in which losses exceed $ l$ is calculated by

    $\displaystyle \frac{1}{N}\sum_{i=1}^{N}[\exp(-\omega L_{i}+\kappa(\omega))I(L_{i}>l)]$ (1.58)

    An important feature of this method is that it can easily be added to an existing implementation of the partial Monte-Carlo simulation. The importance sampling algorithm differs only in how it generates scenarios and in how it weights them, as in equation (1.58); a minimal sketch of the procedure is given below.
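
The following Python sketch (illustrative names, not the XploRe quantlet) outlines the procedure. It assumes that $ \delta$ and $ \lambda$ are NumPy arrays of the diagonalized coefficients, that their signs are such that equation (1.56) is indeed the cumulant generating function of the loss $ L=\sum_{i}(\delta_{i}z_{i}+\frac{1}{2}\lambda_{i}z_{i}^{2})$, and that the threshold $ l$ exceeds the expected loss.

    import numpy as np
    from scipy.optimize import brentq

    def kappa(w, delta, lam):
        # Cumulant generating function of L, equation (1.56).
        return 0.5 * np.sum((w * delta) ** 2 / (1 - w * lam) - np.log(1 - w * lam))

    def kappa_prime(w, delta, lam):
        # d kappa / d w, i.e. the expected loss under N(mu(w), Sigma(w)).
        return np.sum(0.5 * w * delta ** 2 * (2 - w * lam) / (1 - w * lam) ** 2
                      + 0.5 * lam / (1 - w * lam))

    def p_exceed_importance_sampling(delta, lam, l, n, rng):
        # Choose w so that the expected loss under the new measure equals l
        # (equation (1.57)); w must stay below 1/max(lambda_i) for kappa to exist.
        w_hi = 0.999 / lam.max() if lam.max() > 0 else 50.0
        w = brentq(lambda w: kappa_prime(w, delta, lam) - l, 0.0, w_hi)
        # Sample z from N(mu(w), Sigma(w)), where Sigma(w) = (I - w*Lambda)^{-1} is diagonal.
        sigma2 = 1.0 / (1.0 - w * lam)
        mu = w * sigma2 * delta
        z = mu + np.sqrt(sigma2) * rng.standard_normal((n, len(delta)))
        loss = z @ delta + 0.5 * (z ** 2) @ lam
        # Weight the exceedance indicator by the likelihood ratio exp(-w*L + kappa(w)),
        # equation (1.58).
        weights = np.exp(-w * loss + kappa(w, delta, lam))
        return np.mean(weights * (loss > l))

The weighted fraction in the last line is exactly the estimator of equation (1.58), so this routine can replace the plain estimator of Section 1.5.1 without any other changes to the simulation.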


1.5.3 XploRe Examples


VaRMC = VaRestMC (VaRdelta, VaRgamma, VaRcovmatrix, smethod, opt)
    Partial Monte-Carlo method to calculate the VaR based on the Delta-Gamma approximation.

The function VaRestMC uses different types of variance reduction to calculate the VaR by partial Monte-Carlo simulation. We employ the variance reduction techniques of moment matching, Latin Hypercube Sampling and importance sampling. The output is the estimated VaR. In order to test the efficiency of the different Monte-Carlo sampling methods, we collect data from MD*BASE and construct a portfolio consisting of three German stocks (Bayer, Deutsche Bank, Deutsche Telekom) and 156 corresponding options on these underlying stocks, with maturities ranging from 18 to 211 days, as of May 29, 1999. The total portfolio value is 62,476 EUR. The covariance matrix for the stocks is provided as well. Using the Black-Scholes model, we also construct the aggregate delta and aggregate gamma as inputs to the quantlet. Choosing the importance sampling method, a 0.01 confidence level, a 1-day forecast horizon and 1,000 simulations, the estimation result is as follows.




XFGVaRMC.xpl

Contents of VaRMC

[1,]   771.73

This tells us that, with less than 1% probability, we expect the loss over 1 day to exceed 771.73 EUR, or 1.24% of the portfolio value. The key question of the empirical example, however, is how much variance reduction is achieved by the different sampling methods. We ran each of the four sampling methods 1,000 times and estimated the standard error of the estimated VaR for each method. Table 1.1 summarizes the results.


Table 1.1: Variance Reduction of Estimated VaR for German Stock Option Portfolio

  Sampling method        Estimated VaR   Standard Error   Variance Reduction
  Plain-Vanilla                 735.75            36.96                   0%
  Moment Matching               734.92            36.23                1.96%
  Latin Hypercube               757.83            21.32               42.31%
  Importance Sampling           761.75             5.66               84.68%


As we see from Table 1.1, the standard error of importance sampling is 84.68% smaller than that of plain-vanilla sampling. This means that approximately 42 times more scenarios would have to be generated with the plain-vanilla method to achieve the precision obtained by importance sampling based on the Delta-Gamma approximation. These results clearly indicate the potential speed-up in estimating the VaR by using the importance sampling method, which is why importance sampling is the default sampling method in the function VaRestMC. The Latin Hypercube sampling method also achieved a variance reduction of 42.31%. One advantage of the Latin Hypercube sampling method is that the decomposition process is not necessary. Especially when the number of risk factors ($ m$) is large, the decomposition ( $ {\mathcal{O}}(m^{3})$) dominates the sampling ( $ {\mathcal{O}}(m)$) and the summation ( $ {\mathcal{O}}(1)$) in terms of computational time. In this case, Latin Hypercube sampling may offer better performance in terms of precision for a given computational time.