2.8 Non-uniform Random Variate Generation

As in the uniform case, non-uniform variate generation often involves approximations and compromises. The first requirement is, of course, correctness. This does not mean that the generated random variate $ X$ must always have exactly the required distribution, because this would sometimes be much too costly or even impossible. But we must have a good approximation and, preferably, some understanding of the quality of that approximation. Robustness is also important: when the accuracy depends on the parameters of the distribution, it must be good uniformly over the entire range of parameter values of interest.

The method must also be efficient in terms of both speed and memory usage. Often, it is possible to increase the speed by using more memory (e.g., for larger precomputed tables) or by relaxing the accuracy requirements. Some methods need a one-time setup to compute constants and construct tables. The setup time can be significant but may be well worth spending if it is amortized by a large number of subsequent calls to the generator. For example, it makes more sense to invest in an extensive setup if we plan to make a million calls to a given generator than if we expect to make only a few calls, assuming that this investment can improve the speed of the generator sufficiently.

In general, compromises must be made between simplicity of the algorithm, quality of the approximation, robustness with respect to the distribution parameters, and efficiency (generation speed, memory requirements, and setup time).

In many situations, compatibility with variance reduction techniques is another important issue ([3,34]). We may be willing to sacrifice the speed of the generator to preserve inversion, because the gain in efficiency obtained via the variance reduction methods may more than compensate (sometimes by orders of magnitude) for the slightly slower generator.


2.8.1 Inversion

The inversion method, defined in the introduction, should be the method of choice for generating non-uniform random variates in a majority of situations. The fact that $ X = F^{\,-1}(U)$ is a monotone (non-decreasing) function of $ U$ makes this method compatible with important variance reduction techniques such as common random numbers, antithetic variates, Latin hypercube sampling, and randomized quasi-Monte Carlo methods ([3,34,52]).

For some distributions, an analytic expression can be obtained for the inverse distribution function $ F^{\,-1}$ and inversion can be easily implemented. As an example, consider the Weibull distribution function with parameters $ \alpha>0$ and $ \beta>0$, defined by $ F(x) = 1 - \exp[-(x/\beta)^{\alpha}]$ for $ x>0$. It is easy to see that $ F^{\,-1}(U) = \beta[-\ln(1-U)]^{1/\alpha}$. For $ \alpha=1$, we have the special case of the exponential distribution with mean $ \beta$.
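
For illustration, here is a minimal Python sketch of Weibull generation by inversion (the function and parameter names are ours):

    import math, random

    def weibull_inv(alpha, beta):
        # Inversion: X = F^{-1}(U) = beta * (-ln(1-U))^(1/alpha), U ~ U(0,1).
        u = random.random()
        return beta * (-math.log(1.0 - u)) ** (1.0 / alpha)

With alpha = 1, this reduces to an exponential generator with mean beta.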

For an example of a simple discrete distribution, suppose that $ P[X=i] = p_i$ where $ p_0 = 0.6$, $ p_1 = 0.3$, $ p_2 = 0.1$, and $ p_i = 0$ elsewhere. The inversion method in this case will return 0 if $ U < 0.6$, $ 1$ if $ 0.6\le U < 0.9$, and $ 2$ if $ U \ge 0.9$. For the discrete uniform distribution over $ \{0,\ldots,k-1\}$, return $ X = \lfloor kU\rfloor$. As another example, let $ X$ have the geometric distribution with parameter $ p$, so $ P[X=x] = p(1-p)^x$ for $ x=0,1,2,\ldots$, where $ 0<p<1$. Then, $ F(x) = 1 - (1-p)^{\lfloor x+1\rfloor}$ for $ x \ge 0$ and one can show that $ X = F^{\,-1}(U) = \lceil\ln(1-U)/\ln(1-p)\rceil - 1$.
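
A corresponding Python sketch for the geometric case (names ours; the max(0, ...) guards the measure-zero event U = 0):

    import math, random

    def geometric_inv(p):
        # Inversion: X = ceil(ln(1-U)/ln(1-p)) - 1 for P[X=x] = p(1-p)^x.
        u = random.random()
        return max(0, math.ceil(math.log(1.0 - u) / math.log(1.0 - p)) - 1)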

There are other distributions (e.g., the normal, Student, chi-square) for which there is no closed-form expression for $ F^{\,-1}$ but good numerical approximations are available ([3,23,74]). When the distribution has only scale and location parameters, we need to approximate $ F^{\,-1}$ only for a standardized version of the distribution. For the normal distribution, for example, it suffices to have an efficient method for evaluating the inverse distribution function of a $ N(0,1)$ random variable $ Z$, since a normal with mean $ \mu$ and variance $ \sigma ^2$ can be generated by $ X = \sigma Z + \mu$. When shape parameters are involved (e.g., the gamma and beta distributions), things are more complicated because $ F^{\,-1}$ then depends on the parameters in a more fundamental manner.

[28] propose a general adaptive and automatic method that constructs a highly accurate Hermite interpolation of $ F^{\,-1}$. In a one-time setup, their method produces tables of interpolation points and coefficients. Random variate generation using these tables is then quite fast.

A less efficient but simpler way of implementing inversion when a method is available for computing $ F$ is via binary search ([6]). If the density is also available and if it is unimodal with a known mode, a Newton-Raphson iteration can advantageously replace the binary search ([6,16]).

To implement inversion for general discrete distributions, sequential search and binary search with look-up tables are the standard methods ([3,6]). For a discrete distribution over the values $ x_1 < \cdots < x_k$, one first tabulates the pairs $ (x_i, F(x_i))$, where $ F(x_i) = P[X \le x_i]$ for $ i=1,\ldots,k$. To generate $ X$, it then suffices to generate $ U\sim U(0,1)$, find $ I = \min\{i\mid F(x_i)\ge U\}$, and return $ X = x_I$. The following algorithms do that.


Sequential search (needs $ O(k)$ iterations in the worst case);

    generate $ U\sim U(0,1)$;    let $ i = 1$;

    while $ F(x_i) < U$ do $ i = i+1$;

    return $ x_i$.
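
In Python, assuming xs holds the sorted values and cdf the tabulated values $ F(x_i)$ with cdf[-1] = 1 (names ours), this reads:

    import random

    def sequential_search(xs, cdf):
        # Return the smallest x_i such that F(x_i) >= U; O(k) worst case.
        u = random.random()
        i = 0
        while cdf[i] < u:
            i += 1
        return xs[i]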


Binary search (needs $ O(\log k)$ iterations in the worst case);

    generate $ U\sim U(0,1)$;    let $ L = 0$ and $ R = k$;

    while $ L < R-1$ do

         $ m = \lfloor(L+R)/2\rfloor$;

        if $ F(x_m) < U$ then $ L = m$ else $ R = m$;

        /* Invariant: at this stage, the index $ I$ is in $ \{L+1,\ldots,R\}$. */

    return $ x_R$.
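
A Python transcription with 0-based arrays (so L starts at -1 rather than 0; names ours):

    import random

    def binary_search(xs, cdf):
        # Inversion by binary search on the tabulated cdf; O(log k) worst case.
        u = random.random()
        L, R = -1, len(cdf) - 1
        while L < R - 1:
            m = (L + R) // 2
            if cdf[m] < u:
                L = m
            else:
                R = m
            # Invariant: the answer index lies in {L+1, ..., R}.
        return xs[R]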


These algorithms can be modified in many ways. For example, if $ k=\infty$, in the binary search, one can start with an arbitrary value of $ R$, double it until $ F(x_R)\ge U$, and start the algorithm with this $ R$ and $ L = R/2$. Of course, only a finite portion of the table (a portion that contains most of the probability mass) would be precomputed in this case; the other values can be computed when needed. This can also be done if $ k$ is finite but large.

Another class of techniques uses indexing or buckets to speed up the search ([5,3,16]). For example, one can partition the interval $ (0,1)$ into $ c$ subintervals of equal sizes and use (pre-tabulated) initial values of $ (L, R)$ that depend on the subinterval in which $ U$ falls. For the subinterval $ [\,j/c,\, (\,j+1)/c)$, the values of $ L$ and $ R$ would be $ L_j = F^{\,-1}(\,j/c)$ and $ R_j = F^{\,-1}((\,j+1)/c)$, for $ j=0,\ldots,c-1$. The subinterval number that corresponds to a given $ U$ is simply $ J = \lfloor cU\rfloor$. Once we know that subinterval, we can search it by linear or binary search. With a larger value of $ c$, the search is faster (on average) but the setup is more costly and a larger amount of memory is needed. So a compromise must be made depending on the situation (e.g., the value of $ k$, the number of variates we expect to generate, etc.). For $ c=1$, we recover the basic sequential and binary search algorithms given above. A well-implemented indexed search with a large enough $ c$ is usually competitive with the alias method (described in the next subsection). A combined indexed/binary search algorithm is given below, followed by a Python sketch. An easy adaptation gives the combined indexed/sequential search, which is generally preferable when $ k/c$ is small, because it has smaller overhead.


Indexed search (combined with binary search);

    generate $ U\sim U(0,1)$;     let $ J = \lfloor cU\rfloor$, $ L = L_J$, and $ R = R_J$;

    while $ L < R-1$ do

         $ m = \lfloor(L+R)/2\rfloor$;

        if $ F(x_m) < U$ then $ L = m$ else $ R = m$;

    return $ x_R$.
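
Below is a Python sketch of the bucket setup and of the combined indexed/binary search; the table layout and names are our own choices, and the one-time setup itself uses the standard bisect module.

    import bisect, random

    def make_buckets(cdf, c):
        # For bucket j (i.e., U in [j/c, (j+1)/c)), precompute bounds (L_j, R_j)
        # such that the answer index is guaranteed to lie in {L_j+1, ..., R_j}.
        Ls = [bisect.bisect_left(cdf, j / c) - 1 for j in range(c)]
        Rs = [min(bisect.bisect_left(cdf, (j + 1) / c), len(cdf) - 1)
              for j in range(c)]
        return Ls, Rs

    def indexed_search(xs, cdf, Ls, Rs, c):
        u = random.random()
        j = int(c * u)                  # bucket number J = floor(cU)
        L, R = Ls[j], Rs[j]
        while L < R - 1:                # binary search within the bucket
            m = (L + R) // 2
            if cdf[m] < u:
                L = m
            else:
                R = m
        return xs[R]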


These search methods are also useful for piecewise-linear (or piecewise-polynomial) distribution functions. Essentially, it suffices to add an interpolation step at the end of the algorithm, after the appropriate linear (or polynomial) piece has been determined ([3]).

Finally, the stochastic model itself can sometimes be selected in a way that makes inversion easier. For example, one can fit a parametric, highly-flexible, and easily computable inverse distribution function $ F^{\,-1}$ to the data, directly or indirectly ([79,94]).

There are situations where speed is important and where non-inversion methods are appropriate. In forthcoming subsections, we outline the main non-inversion methods.


2.8.2 The Alias Method

Sequential and binary search require $ O(k)$ and $ O(\log k)$ time, respectively, in the worst case, to generate a random variate $ X$ by inversion over the finite set $ \{x_1,\ldots,x_k\}$. The alias method ([95,96]) can generate such an $ X$ in $ O(1)$ time per variate, after a table setup that takes $ O(k)$ time and space. On the other hand, it does not implement inversion, i.e., the transformation from $ U$ to $ X$ is not monotone.

To explain the idea, consider a bar diagram of the distribution, where each index $ i$ has a bar of height $ p_i = P[X = x_i]$. The idea is to "equalize" the bars so that they all have height $ 1/k$, by cutting off bar pieces and transferring them to other bars. This is done in such a way that, in the new diagram, each bar $ i$ contains one piece of size $ q_i$ (say) from the original bar $ i$ and one piece of size $ 1/k - q_i$ from another bar whose index $ j$, denoted $ A(i)$, is called the alias value of $ i$. The setup procedure initializes two tables, $ A$ and $ R$, where $ A(i)$ is the alias value of $ i$ and $ R(i) = (i-1)/k + q_i$. See [16] and [34] for the details. To generate $ X$, we generate $ U \sim U[0,1]$, define $ I = \lceil k U\rceil$, and return $ X = x_I$ if $ U < R(I)$ and $ X = x_{A(I)}$ otherwise.
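
The following Python sketch implements a common variant of the table setup (often attributed to Walker and Vose) together with an $ O(1)$ generator; unlike the single-$ U$ scheme above, this variant spends two uniforms per call, and all names are ours:

    import random

    def alias_setup(p):
        # Equalize the bars: q[i] is the probability of keeping value i
        # when bar i is selected; A[i] is its alias.
        k = len(p)
        q = [k * pi for pi in p]
        A = list(range(k))
        small = [i for i in range(k) if q[i] < 1.0]
        large = [i for i in range(k) if q[i] >= 1.0]
        while small and large:
            i, j = small.pop(), large.pop()
            A[i] = j                     # top up bar i with a piece of bar j
            q[j] -= 1.0 - q[i]
            (small if q[j] < 1.0 else large).append(j)
        return q, A

    def alias_draw(xs, q, A):
        i = random.randrange(len(q))     # pick a bar uniformly
        return xs[i] if random.random() < q[i] else xs[A[i]]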

There is a version of the alias method for continuous distributions, called the acceptance-complement method ([32,16,23]). The idea is to decompose the density $ f$ of the target distribution as the convex combination of two densities $ f_1$ and $ f_2$, $ f = w f_1 + (1-w) f_2$ for some real number $ w \in (0,1)$, in such a way that $ w f_1\le g$ for some other density $ g$ and so that it is easy to generate from $ g$ and $ f_2$. The algorithm works as follows: Generate $ X$ from density $ g$ and $ U\sim U(0,1)$; if $ U g(X) \le w f_1(X)$, return $ X$; otherwise generate a new $ X$ from density $ f_2$ and return it.


2.8.3 Kernel Density Estimation and Generation

Instead of selecting a parametric distribution that is hard to invert and estimating its parameters, one can estimate the density via a kernel density estimation method for which random variate generation is very easy ([16,27]). In the case of a Gaussian kernel, for example, one can generate variates simply by selecting one observation at random from the data and adding random noise generated from a normal distribution with mean zero. However, this method is not equivalent to inversion. Because of the added noise, selecting a larger observation does not necessarily guarantee a larger value for the generated variate.
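
A minimal Python sketch with a Gaussian kernel, assuming the bandwidth h has already been chosen (both names ours):

    import random

    def kde_sample(data, h):
        # Pick one observation at random, then add N(0, h^2) noise.
        return random.choice(data) + random.gauss(0.0, h)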


2.8.4 The Rejection Method

Now suppose we want to generate $ X$ from a complicated density $ f$. In fact, $ f$ may be known only up to a multiplicative constant $ \kappa>0$, i.e., we know only $ \kappa f$. If we know $ f$, we may just take $ \kappa=1$. We select another density $ r$ such that $ \kappa f(x) \le t(x) {\stackrel {\text{def}}{=}} ar(x)$ for all $ x$, for some constant $ a$, and such that generating variates $ Y$ from the density $ r$ is easy. The function $ t$ is called a hat function or majorizing function. By integrating this inequality with respect to $ x$ on both sides, we find that $ \kappa\le a$. The following rejection method generates $ X$ with density $ f$ ([93,16,19]):


Rejection method;

    repeat

        generate $ Y$ from the density $ r$ and $ U\sim U(0,1)$, independently;

    until $ U t(Y) \le \kappa f(Y)$;

    return $ X = Y$.
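
As a concrete instance, here is a Python sketch for the Beta(2,2) density $ f(x) = 6x(1-x)$ on $ (0,1)$, with $ \kappa=1$, $ r$ the $ U(0,1)$ density, and the constant hat $ t(x) = a = 1.5 = \max f$ (our choice of example):

    import random

    def beta22_rejection():
        # Expected number of trials is a/kappa = 1.5 with this hat.
        while True:
            y = random.random()                  # Y from density r = U(0,1)
            u = random.random()                  # independent U(0,1)
            if u * 1.5 <= 6.0 * y * (1.0 - y):   # accept when U t(Y) <= kappa f(Y)
                return y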


The number $ R$ of turns through the "repeat" loop is one plus a geometric random variable with parameter $ \kappa/a$, so $ E[R] = a/\kappa$. Thus, we want $ a/\kappa$, which is always $ \ge 1$, to be as small as possible, i.e., we want to minimize the area between $ \kappa f$ and $ t$. There is generally a compromise between bringing $ a/\kappa$ close to $ 1$ and keeping $ r$ simple.

When $ \kappa f$ is expensive to compute, we can also use squeeze functions $ q_1$ and $ q_2$ that are faster to evaluate and such that $ q_1(x) \le \kappa f(x)\le q_2(x)\le t(x)$ for all $ x$. To verify the condition $ U t(Y) \le \kappa f(Y)$, we first check if $ U t(Y) \le q_1(Y)$, in which case we accept $ Y$ immediately; otherwise we check if $ U t(Y) \ge q_2(Y)$, in which case we reject $ Y$ immediately. The value of $ \kappa f(Y)$ must be computed only when $ U t(Y)$ falls between the two squeezes. Sequences of embedded squeezes can also be used, where the primary ones are the least expensive to compute, the secondary ones are a little more expensive but closer to $ \kappa f$, etc.

It is common practice to transform the density $ f$ by a smooth increasing function $ T$ (e.g., $ T(x)=\log x$ or $ T(x) = - x^{-1/2}$) selected so that it is easier to construct good hat and squeeze functions (often piecewise linear) for the transformed density $ T(\,f(\cdot))$. By transforming back to the original scale, we get hat and squeeze functions for $ f$. This is the transformed density rejection method, which has several variants and extensions ([16,19,29]).

The rejection method works for discrete distributions as well; it suffices to replace densities by probability mass functions.


2.8.5 Thinning for Point Processes with Time-varying Rates

Thinning is a cousin of acceptance-rejection, often used for generating events from a non-homogeneous Poisson process. Suppose the process has rate $ \lambda(t)$ at time $ t$, with $ \lambda(t) \le \bar\lambda$ for all $ t$, where $ \bar\lambda$ is a finite constant. One can generate Poisson pseudo-arrivals at constant rate $ \bar\lambda$ by generating interarrival times that are i.i.d. exponentials of mean $ 1/\bar\lambda$. Then, a pseudo-arrival at time $ t$ is accepted (becomes an arrival) with probability $ \lambda(t)/\bar\lambda$ (i.e., if $ U \le \lambda(t)/\bar\lambda$, where $ U$ is an independent $ U[0,1]$), and rejected with probability $ 1-\lambda(t)/\bar\lambda$. Non-homogeneous Poisson processes can also be generated by inversion ([3]). The idea is to apply a nonlinear transformation to the time scale to make the process homogeneous with rate $ 1$ in the new time scale. Arrival times are generated in this new time scale (which is easy), and then transformed back to the original time scale. The method can be adapted to other types of point processes with time-varying rates.
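
A Python sketch of thinning over a finite horizon, with a hypothetical sinusoidal rate as usage example (all names ours):

    import math, random

    def nhpp_thinning(rate, rate_bar, horizon):
        # Pseudo-arrivals at constant rate rate_bar; each one at time t is
        # kept (becomes an arrival) with probability rate(t)/rate_bar.
        arrivals, t = [], 0.0
        while True:
            t += random.expovariate(rate_bar)   # exponential gap, mean 1/rate_bar
            if t > horizon:
                return arrivals
            if random.random() <= rate(t) / rate_bar:
                arrivals.append(t)

    # Example: rate(t) = 50 + 40 sin(t) is bounded by rate_bar = 90.
    # events = nhpp_thinning(lambda t: 50.0 + 40.0 * math.sin(t), 90.0, 10.0)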


2.8.6 The Ratio-of-Uniforms Method

If $ f$ is a density over the real-line, $ \kappa$ an arbitrary positive constant, and the pair $ (U,V)$ has the uniform distribution over the set

$\displaystyle {\mathcal{C}} = \left\{(u,v) \in{\mathbb{R}}^2 \quad\text{such that}\quad 0\le u\le \sqrt{\kappa f(v/u)}\right\}\;,$    

then $ V/U$ has density $ f$ ([30,16,23]). This interesting property can be exploited to generate $ X$ with density $ f$: generate $ (U,V)$ uniformly over $ {\mathcal{C}}$ and return $ X = V/U$. This is the ratio-of-uniforms method. The key issue is how to generate a point uniformly over $ {\mathcal{C}}$. In the cases where this can be done efficiently, we immediately have an efficient way of generating $ X$.

The most frequent approach for generating $ (U,V)$ uniformly over $ {\mathcal{C}}$ is the rejection method: define a region $ {\mathcal{C}}_2$ that contains $ {\mathcal{C}}$ and in which it is easy to generate a point uniformly (for example, a rectangular box or a polygonal region). To generate $ X$, repeat: generate $ (U,V)$ uniformly over $ {\mathcal{C}}_2$, until it belongs to $ {\mathcal{C}}$. Then return $ X = V/U$. If there is another region $ {\mathcal{C}}_1$ contained in $ {\mathcal{C}}$ and for which it is very fast to check if a point $ (U,V)$ is in $ {\mathcal{C}}_1$, this $ {\mathcal{C}}_1$ can also be used as a squeeze to accelerate the verification that the point belongs to $ {\mathcal{C}}$. Several special cases and refinements are described in [16,23,66] and other references given there.
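
For the standard normal, taking $ \kappa f(x) = \exp(-x^2/2)$, the set $ {\mathcal{C}}$ fits in the box $ [0,1]\times[-\sqrt{2/e},\,\sqrt{2/e}]$, which gives the following Python sketch (function name ours):

    import math, random

    def normal_rou():
        # Accept (u,v) when u <= sqrt(f(v/u)), i.e., v^2 <= -4 u^2 ln u.
        b = math.sqrt(2.0 / math.e)
        while True:
            u = 1.0 - random.random()        # uniform over (0,1]; avoids ln(0)
            v = random.uniform(-b, b)
            if v * v <= -4.0 * u * u * math.log(u):
                return v / u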


2.8.7 Composition and Convolution

Suppose $ F$ is a convex combination of several distributions, i.e., $ F(x) = \sum_{j} p_j F_j(x)$, or more generally $ F(x) = \int F_y(x)\, dH(y)$. To generate from $ F$, one can generate $ J=j$ with probability $ p_j$ (or $ Y$ from $ H$), then generate $ X$ from $ F_J$ (or $ F_Y$). This method, called the composition algorithm, is useful for generating from compound distributions such as the hyperexponential or from compound Poisson processes. It is also frequently used to design specialized algorithms for generating from complicated densities. The idea is to partition the area under the complicated density into pieces, where piece $ j$ has area $ p_j$. To generate $ X$, first select a piece (choose piece $ j$ with probability $ p_j$), then draw a random point uniformly over that piece and project it to the horizontal axis. If the partition is defined so that it is fast and easy to generate from the large pieces, then $ X$ will be returned very quickly most of the time. The rejection method with a squeeze is often used to generate from some of the pieces.
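
For instance, a hyperexponential variate can be generated by composition in Python as follows (names ours; ps must sum to 1):

    import random

    def hyperexponential(ps, means):
        # Composition: select component j with probability ps[j], then
        # generate an exponential with mean means[j].
        j = random.choices(range(len(ps)), weights=ps)[0]
        return random.expovariate(1.0 / means[j])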

A dual method to composition is the convolution method, which can be used when $ X = Y_1 + Y_2 + \cdots + Y_n$, where the $ Y_i$'s are independent with specified distributions. With this method, one just generates the $ Y_i$'s and sums up. This requires at least $ n$ uniforms. Examples of random variables that can be expressed as sums like this include the hypoexponential, Erlang, and binomial distributions.
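
For example, an Erlang variate with n phases and overall mean `mean` is the sum of n i.i.d. exponentials, each with mean mean/n (a minimal sketch; names ours):

    import random

    def erlang(n, mean):
        # Convolution: sum of n i.i.d. exponentials with rate n/mean each.
        return sum(random.expovariate(n / mean) for _ in range(n))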


2.8.8 Other Special Techniques

Besides these general methods, several specialized and sometimes very elegant techniques have been designed for commonly used distributions such as the Poisson with small mean and the normal (e.g., the Box-Muller method), as well as for generating points uniformly on a $ k$-dimensional sphere, generating random permutations, and so on. Details can be found, e.g., in [3,6,16,21,23].
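
As one example, the Box-Muller method turns two independent uniforms into two independent $ N(0,1)$ variates (a minimal sketch):

    import math, random

    def box_muller():
        u1, u2 = random.random(), random.random()
        r = math.sqrt(-2.0 * math.log(1.0 - u1))   # 1 - u1 avoids log(0)
        return (r * math.cos(2.0 * math.pi * u2),
                r * math.sin(2.0 * math.pi * u2))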

Recently, there has been an effort in developing automatic or black-box algorithms for generating variates from an arbitrary (known) density, and reliable software that implements these methods ([27,29,67,68]).

Acknowledgements. This work has been supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) Grant No. ODGP0110050, NATEQ-Québec grant No. 02ER3218, and a Canada Research Chair to the author. Wolfgang Hörmann, Josef Leydold, François Panneton, and Richard Simard made helpful comments and corrections on an earlier draft. The author has been asked to write chapters on Random Number Generation for several handbooks and encyclopedias recently. Inevitably, there is a large amount of duplication between these chapters.

