2.8 Non-uniform Random Variate Generation

As in the uniform case, non-uniform variate generation often involves approximations and compromises. The first requirement is, of course, correctness. This does not mean that the generated random variate $ X$ must always have exactly the required distribution, because this would sometimes be much too costly or even impossible. But we must have a good approximation and, preferably, some understanding of the quality of that approximation. Robustness is also important: when the accuracy depends on the parameters of the distribution, it must be good uniformly over the entire range of parameter values of interest.

The method must also be efficient in terms of both speed and memory usage. Often, it is possible to increase the speed by using more memory (e.g., for larger precomputed tables) or by relaxing the accuracy requirements. Some methods need a one-time setup to compute constants and construct tables. The setup time can be significant but may be well worth spending if it is amortized by a large number of subsequent calls to the generator. For example, it makes more sense to invest in an extensive setup if we plan to make a million calls to a given generator than if we expect to make only a few calls, assuming that this investment can improve the speed of the generator sufficiently.

In general, compromises must be made between simplicity of the algorithm, quality of the approximation, robustness with respect to the distribution parameters, and efficiency (generation speed, memory requirements, and setup time).

In many situations, compatibility with variance reduction techniques is another important issue ([3,34]). We may be willing to sacrifice the speed of the generator to preserve inversion, because the gain in efficiency obtained via the variance reduction methods may more than compensate (sometimes by orders of magnitude) for the slightly slower generator.


2.8.1 Inversion

The inversion method, defined in the introduction, should be the method of choice for generating non-uniform random variates in a majority of situations. The fact that $ X = F^{\,-1}(U)$ is a monotone (non-decreasing) function of $ U$ makes this method compatible with important variance reduction techniques such as common random numbers, antithetic variates, Latin hypercube sampling, and randomized quasi-Monte Carlo methods ([3,34,52]).

For some distributions, an analytic expression can be obtained for the inverse distribution function $ F^{\,-1}$ and inversion can be easily implemented. As an example, consider the Weibull distribution function with parameters $ \alpha>0$ and $ \beta>0$, defined by $ F(x) = 1 - \exp[-(x/\beta)^{\alpha}]$ for $ x>0$. It is easy to see that $ F^{\,-1}(U) = \beta[-\ln(1-U)]^{1/\alpha}$. For $ \alpha=1$, we have the special case of the exponential distribution with mean $ \beta$.
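
For illustration, here is a minimal Python sketch of Weibull generation by inversion (the function and parameter names are ours):

    import math, random

    def weibull_inv(alpha, beta):
        # Inversion: X = F^{-1}(U) = beta * (-ln(1-U))^(1/alpha), U ~ U(0,1).
        u = random.random()
        return beta * (-math.log(1.0 - u)) ** (1.0 / alpha)

With alpha = 1, this reduces to an exponential generator with mean beta.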

For an example of a simple discrete distribution, suppose that $ P[X=i] = p_i$ where $ p_0 = 0.6$, $ p_1 = 0.3$, $ p_2 = 0.1$, and $ p_i = 0$ elsewhere. The inversion method in this case will return 0 if $ U < 0.6$, $ 1$ if $ 0.6\le U < 0.9$, and $ 2$ if $ U \ge 0.9$. For the discrete uniform distribution over $ \{0,\ldots,k-1\}$, return $ X = \lfloor kU\rfloor$. As another example, let $ X$ have the geometric distribution with parameter $ p$, so $ P[X=x] = p(1-p)^x$ for $ x=0,1,2,\ldots$, where $ 0<p<1$. Then, $ F(x) = 1 - (1-p)^{\lfloor x+1\rfloor}$ for $ x \ge 0$ and one can show that $ X = F^{\,-1}(U) = \lceil\ln(1-U)/\ln(1-p)\rceil - 1$.
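
A corresponding Python sketch for the geometric case (names ours; the max(0, ...) guards the measure-zero event U = 0):

    import math, random

    def geometric_inv(p):
        # Inversion: X = ceil(ln(1-U)/ln(1-p)) - 1 for P[X=x] = p(1-p)^x.
        u = random.random()
        return max(0, math.ceil(math.log(1.0 - u) / math.log(1.0 - p)) - 1)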

There are other distributions (e.g., the normal, Student, chi-square) for which there is no closed-form expression for $ F^{\,-1}$ but good numerical approximations are available ([3,23,74]). When the distribution has only scale and location parameters, we need to approximate $ F^{\,-1}$ only for a standardized version of the distribution. For the normal distribution, for example, it suffices to have an efficient method for evaluating the inverse distribution function of a $ N(0,1)$ random variable $ Z$, since a normal with mean $ \mu$ and variance $ \sigma ^2$ can be generated by $ X = \sigma Z + \mu$. When shape parameters are involved (e.g., the gamma and beta distributions), things are more complicated because $ F^{\,-1}$ then depends on the parameters in a more fundamental manner.

[28] propose a general adaptive and automatic method that constructs a highly accurate Hermite interpolation of $ F^{\,-1}$. In a one-time setup, their method produces tables of interpolation points and coefficients. Random variate generation using these tables is then quite fast.

A less efficient but simpler way of implementing inversion when a method is available for computing $ F$ is via binary search ([6]). If the density is also available and if it is unimodal with a known mode, a Newton-Raphson iteration can advantageously replace the binary search ([6,16]).

To implement inversion for general discrete distributions, sequential search and binary search with look-up tables are the standard methods ([3,6]). For a discrete distribution over the values $ x_1 < \cdots < x_k$, one first tabulates the pairs $ (x_i, F(x_i))$, where $ F(x_i) = P[X \le x_i]$ for $ i=1,\ldots,k$. To generate $ X$, it then suffices to generate $ U\sim U(0,1)$, find $ I = \min\{i\mid F(x_i)\ge U\}$, and return $ X = x_I$. The following algorithms do that.


Sequential search (needs $ O(k)$ iterations in the worst case);

    generate $ U\sim U(0,1)$;    let $ i = 1$;

    while $ F(x_i) < U$ do $ i = i+1$;

    return $ x_i$.
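
In Python, assuming xs holds the sorted values and cdf the tabulated values $ F(x_i)$ with cdf[-1] = 1 (names ours), this reads:

    import random

    def sequential_search(xs, cdf):
        # Return the smallest x_i such that F(x_i) >= U; O(k) worst case.
        u = random.random()
        i = 0
        while cdf[i] < u:
            i += 1
        return xs[i]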


Binary search (needs $ O(\log k)$ iterations in the worst case);

    generate $ U\sim U(0,1)$;    let $ L = 0$ and $ R = k$;

    while $ L < R-1$ do

         $ m = \lfloor(L+R)/2\rfloor$;

        if $ F(x_m) < U$ then $ L = m$ else $ R = m$;

        /* Invariant: at this stage, the index $ I$ is in $ \{L+1,\ldots,R\}$. */

    return $ x_R$.
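
A Python transcription with 0-based arrays (so L starts at -1 rather than 0; names ours):

    import random

    def binary_search(xs, cdf):
        # Inversion by binary search on the tabulated cdf; O(log k) worst case.
        u = random.random()
        L, R = -1, len(cdf) - 1
        while L < R - 1:
            m = (L + R) // 2
            if cdf[m] < u:
                L = m
            else:
                R = m
            # Invariant: the answer index lies in {L+1, ..., R}.
        return xs[R]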


These algorithms can be modified in many ways. For example, if $ k=\infty$, in the binary search, one can start with an arbitrary value of $ R$, double it until $ F(x_R)\ge U$, and start the algorithm with this $ R$ and $ L = R/2$. Of course, only a finite portion of the table (a portion that contains most of the probability mass) would be precomputed in this case; the other values can be computed when needed. This can also be done if $ k$ is finite but large.

Another class of techniques uses indexing or buckets to speed up the search ([5,3,16]). For example, one can partition the interval $ (0,1)$ into $ c$ subintervals of equal sizes and use (pre-tabulated) initial values of $ (L, R)$ that depend on the subinterval in which $ U$ falls. For the subinterval $ [\,j/c,\, (\,j+1)/c)$, the values of $ L$ and $ R$ would be $ L_j = F^{\,-1}(\,j/c)$ and $ R_j = F^{\,-1}((\,j+1)/c)$, for $ j=0,\ldots,c-1$. The subinterval number that corresponds to a given $ U$ is simply $ J = \lfloor cU\rfloor$. Once we know that subinterval, we can search it by linear or binary search. With a larger value of $ c$, the search is faster (on average) but the setup is more costly and a larger amount of memory is needed. So a compromise must be made depending on the situation (e.g., the value of $ k$, the number of variates we expect to generate, etc.). For $ c=1$, we recover the basic sequential and binary search algorithms given above. A well-implemented indexed search with a large enough $ c$ is usually competitive with the alias method (described in the next subsection). A combined indexed/binary search algorithm is given below, followed by a Python sketch. An easy adaptation gives the combined indexed/sequential search, which is generally preferable when $ k/c$ is small, because it has smaller overhead.


Indexed search (combined with binary search);

    generate $ U\sim U(0,1)$;     let $ J = \lfloor cU\rfloor$, $ L = L_J$, and $ R = R_J$;

    while $ L < R-1$ do

         $ m = \lfloor(L+R)/2\rfloor$;

        if $ F(x_m) < U$ then $ L = m$ else $ R = m$;

    return $ x_R$.
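
Below is a Python sketch of the bucket setup and of the combined indexed/binary search; the table layout and names are our own choices, and the one-time setup itself uses the standard bisect module.

    import bisect, random

    def make_buckets(cdf, c):
        # For bucket j (i.e., U in [j/c, (j+1)/c)), precompute bounds (L_j, R_j)
        # such that the answer index is guaranteed to lie in {L_j+1, ..., R_j}.
        Ls = [bisect.bisect_left(cdf, j / c) - 1 for j in range(c)]
        Rs = [min(bisect.bisect_left(cdf, (j + 1) / c), len(cdf) - 1)
              for j in range(c)]
        return Ls, Rs

    def indexed_search(xs, cdf, Ls, Rs, c):
        u = random.random()
        j = int(c * u)                  # bucket number J = floor(cU)
        L, R = Ls[j], Rs[j]
        while L < R - 1:                # binary search within the bucket
            m = (L + R) // 2
            if cdf[m] < u:
                L = m
            else:
                R = m
        return xs[R]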


These search methods are also useful for piecewise-linear (or piecewise-polynomial) distribution functions. Essentially, it suffices to add an interpolation step at the end of the algorithm, after the appropriate linear (or polynomial) piece has been determined ([3]).

Finally, the stochastic model itself can sometimes be selected in a way that makes inversion easier. For example, one can fit a parametric, highly-flexible, and easily computable inverse distribution function $ F^{\,-1}$ to the data, directly or indirectly ([79,94]).

There are situations where speed is important and where non-inversion methods are appropriate. In forthcoming subsections, we outline the main non-inversion methods.


2.8.2 The Alias Method

Sequential and binary search require $ O(k)$ and $ O(\log k)$ time, respectively, in the worst case, to generate a random variate $ X$ by inversion over the finite set $ \{x_1,\ldots,x_k\}$. The alias method ([95,96]) can generate such an $ X$ in $ O(1)$ time per variate, after a table setup that takes $ O(k)$ time and space. On the other hand, it does not implement inversion, i.e., the transformation from $ U$ to $ X$ is not monotone.

To explain the idea, consider a bar diagram of the distribution, where each index $ i$ has a bar of height $ p_i = P[X = x_i]$. The idea is to "equalize" the bars so that they all have height $ 1/k$, by cutting off bar pieces and transferring them to other bars. This is done in such a way that, in the new diagram, each bar $ i$ contains one piece of size $ q_i$ (say) from the original bar $ i$ and one piece of size $ 1/k - q_i$ from another bar whose index $ j$, denoted $ A(i)$, is called the alias value of $ i$. The setup procedure initializes two tables, $ A$ and $ R$, where $ A(i)$ is the alias value of $ i$ and $ R(i) = (i-1)/k + q_i$. See [16] and [34] for the details. To generate $ X$, we generate $ U \sim U[0,1]$, define $ I = \lceil k U\rceil$, and return $ X = x_I$ if $ U < R(I)$ and $ X = x_{A(I)}$ otherwise.
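
The following Python sketch implements a common variant of the table setup (often attributed to Walker and Vose) together with an $ O(1)$ generator; unlike the single-$ U$ scheme above, this variant spends two uniforms per call, and all names are ours:

    import random

    def alias_setup(p):
        # Equalize the bars: q[i] is the probability of keeping value i
        # when bar i is selected; A[i] is its alias.
        k = len(p)
        q = [k * pi for pi in p]
        A = list(range(k))
        small = [i for i in range(k) if q[i] < 1.0]
        large = [i for i in range(k) if q[i] >= 1.0]
        while small and large:
            i, j = small.pop(), large.pop()
            A[i] = j                     # top up bar i with a piece of bar j
            q[j] -= 1.0 - q[i]
            (small if q[j] < 1.0 else large).append(j)
        return q, A

    def alias_draw(xs, q, A):
        i = random.randrange(len(q))     # pick a bar uniformly
        return xs[i] if random.random() < q[i] else xs[A[i]]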

There is a version of the alias method for continuous distributions, called the acceptance-complement method ([32,16,23]). The idea is to decompose the density $ f$ of the target distribution as the convex combination of two densities $ f_1$ and $ f_2$, $ f = w f_1 + (1-w) f_2$ for some real number $ w \in (0,1)$, in such a way that $ w f_1\le g$ for some other density $ g$ and so that it is easy to generate from $ g$ and $ f_2$. The algorithm works as follows: Generate $ X$ from density $ g$ and $ U\sim U(0,1)$; if $ U g(X) \le w f_1(X)$, return $ X$; otherwise generate a new $ X$ from density $ f_2$ and return it.


2.8.3 Kernel Density Estimation and Generation

Instead of selecting a parametric distribution that is hard to invert and estimating its parameters, one can estimate the density via a kernel density estimation method for which random variate generation is very easy ([16,27]). In the case of a Gaussian kernel, for example, one can generate variates simply by selecting one observation at random from the data and adding random noise generated from a normal distribution with mean zero. However, this method is not equivalent to inversion. Because of the added noise, selecting a larger observation does not necessarily guarantee a larger value for the generated variate.
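
A minimal Python sketch with a Gaussian kernel, assuming the bandwidth h has already been chosen (both names ours):

    import random

    def kde_sample(data, h):
        # Pick one observation at random, then add N(0, h^2) noise.
        return random.choice(data) + random.gauss(0.0, h)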


2.8.4 The Rejection Method

Now suppose we want to generate $ X$ from a complicated density $ f$. In fact, $ f$ may be known only up to a multiplicative constant $ \kappa>0$, i.e., we know only $ \kappa f$. If we know $ f$, we may just take $ \kappa=1$. We select another density $ r$ such that $ \kappa f(x) \le t(x) {\stackrel {\text{def}}{=}} ar(x)$ for all $ x$, for some constant $ a$, and such that generating variates $ Y$ from the density $ r$ is easy. The function $ t$ is called a hat function or majorizing function. By integrating this inequality with respect to $ x$ on both sides, we find that $ \kappa\le a$. The following rejection method generates $ X$ with density $ f$ ([93,16,19]):


Rejection method;

    repeat

        generate $ Y$ from the density $ r$ and $ U\sim U(0,1)$, independently;

    until $ U t(Y) \le \kappa f(Y)$;

    return $ X = Y$.
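
As a concrete instance, here is a Python sketch for the Beta(2,2) density $ f(x) = 6x(1-x)$ on $ (0,1)$, with $ \kappa=1$, $ r$ the $ U(0,1)$ density, and the constant hat $ t(x) = a = 1.5 = \max f$ (our choice of example):

    import random

    def beta22_rejection():
        # Expected number of trials is a/kappa = 1.5 with this hat.
        while True:
            y = random.random()                  # Y from density r = U(0,1)
            u = random.random()                  # independent U(0,1)
            if u * 1.5 <= 6.0 * y * (1.0 - y):   # accept when U t(Y) <= kappa f(Y)
                return y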


The number $ R$ of turns through the "repeat" loop is one plus a geometric random variable with parameter $ \kappa/a$, so $ E[R] = a/\kappa$. Thus, we want $ a/\kappa$, which is always $ \ge 1$, to be as small as possible, i.e., we want to minimize the area between $ \kappa f$ and $ t$. There is generally a compromise between bringing $ a/\kappa$ close to $ 1$ and keeping $ r$ simple.

When $ \kappa f$ is expensive to compute, we can also use squeeze functions $ q_1$ and $ q_2$ that are faster to evaluate and such that $ q_1(x) \le \kappa f(x)\le q_2(x)\le t(x)$ for all $ x$. To verify the condition $ U t(Y) \le \kappa f(Y)$, we first check if $ U t(Y) \le q_1(Y)$, in which case we accept $ Y$ immediately; otherwise we check if $ U t(Y) \ge q_2(Y)$, in which case we reject $ Y$ immediately. The value of $ \kappa f(Y)$ must be computed only when $ U t(Y)$ falls between the two squeezes. Sequences of embedded squeezes can also be used, where the primary ones are the least expensive to compute, the secondary ones are a little more expensive but closer to $ \kappa f$, etc.

It is common practice to transform the density $ f$ by a smooth increasing function $ T$ (e.g., $ T(x)=\log x$ or $ T(x) = - x^{-1/2}$) selected so that it is easier to construct good hat and squeeze functions (often piecewise linear) for the transformed density $ T(\,f(\cdot))$. By transforming back to the original scale, we get hat and squeeze functions for $ f$. This is the transformed density rejection method, which has several variants and extensions ([16,19,29]).

The rejection method works for discrete distributions as well; it suffices to replace densities by probability mass functions.


2.8.5 Thinning for Point Processes with Time-varying Rates

Thinning is a cousin of acceptance-rejection, often used for generating events from a non-homogeneous Poisson process. Suppose the process has rate $ \lambda(t)$ at time $ t$, with $ \lambda(t) \le \bar\lambda$ for all $ t$, where $ \bar\lambda$ is a finite constant. One can generate Poisson pseudo-arrivals at constant rate $ \bar\lambda$ by generating interarrival times that are i.i.d. exponentials of mean $ 1/\bar\lambda$. Then, a pseudo-arrival at time $ t$ is accepted (becomes an arrival) with probability $ \lambda(t)/\bar\lambda$ (i.e., if $ U \le \lambda(t)/\bar\lambda$, where $ U$ is an independent $ U[0,1]$), and rejected with probability $ 1-\lambda(t)/\bar\lambda$. Non-homogeneous Poisson processes can also be generated by inversion ([3]). The idea is to apply a nonlinear transformation to the time scale to make the process homogeneous with rate $ 1$ in the new time scale. Arrival times are generated in this new time scale (which is easy), and then transformed back to the original time scale. The method can be adapted to other types of point processes with time-varying rates.
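
A Python sketch of thinning over a finite horizon, with a hypothetical sinusoidal rate as usage example (all names ours):

    import math, random

    def nhpp_thinning(rate, rate_bar, horizon):
        # Pseudo-arrivals at constant rate rate_bar; each one at time t is
        # kept (becomes an arrival) with probability rate(t)/rate_bar.
        arrivals, t = [], 0.0
        while True:
            t += random.expovariate(rate_bar)   # exponential gap, mean 1/rate_bar
            if t > horizon:
                return arrivals
            if random.random() <= rate(t) / rate_bar:
                arrivals.append(t)

    # Example: rate(t) = 50 + 40 sin(t) is bounded by rate_bar = 90.
    # events = nhpp_thinning(lambda t: 50.0 + 40.0 * math.sin(t), 90.0, 10.0)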


2.8.6 The Ratio-of-Uniforms Method

If $ f$ is a density over the real-line, $ \kappa$ an arbitrary positive constant, and the pair $ (U,V)$ has the uniform distribution over the set

$\displaystyle {\mathcal{C}} = \left\{(u,v) \in{\mathbb{R}}^2 \quad\text{such that}\quad 0\le u\le \sqrt{\kappa f(v/u)}\right\}\;,$    

then $ V/U$ has density $ f$ ([30,16,23]). This interesting property can be exploited to generate $ X$ with density $ f$: generate $ (U,V)$ uniformly over $ {\mathcal{C}}$ and return $ X = V/U$. This is the ratio-of-uniforms method. The key issue is how to generate a point uniformly over $ {\mathcal{C}}$. In the cases where this can be done efficiently, we immediately have an efficient way of generating $ X$.

The most frequent approach for generating $ (U,V)$ uniformly over $ {\mathcal{C}}$ is the rejection method: define a region $ {\mathcal{C}}_2$ that contains $ {\mathcal{C}}$ and in which it is easy to generate a point uniformly (for example, a rectangular box or a polygonal region). To generate $ X$, repeat: generate $ (U,V)$ uniformly over $ {\mathcal{C}}_2$, until it belongs to $ {\mathcal{C}}$. Then return $ X = V/U$. If there is another region $ {\mathcal{C}}_1$ contained in $ {\mathcal{C}}$ and for which it is very fast to check if a point $ (U,V)$ is in $ {\mathcal{C}}_1$, this $ {\mathcal{C}}_1$ can also be used as a squeeze to accelerate the verification that the point belongs to $ {\mathcal{C}}$. Several special cases and refinements are described in [16,23,66] and other references given there.
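
For the standard normal, taking $ \kappa f(x) = \exp(-x^2/2)$, the set $ {\mathcal{C}}$ fits in the box $ [0,1]\times[-\sqrt{2/e},\,\sqrt{2/e}]$, which gives the following Python sketch (function name ours):

    import math, random

    def normal_rou():
        # Accept (u,v) when u <= sqrt(f(v/u)), i.e., v^2 <= -4 u^2 ln u.
        b = math.sqrt(2.0 / math.e)
        while True:
            u = 1.0 - random.random()        # uniform over (0,1]; avoids ln(0)
            v = random.uniform(-b, b)
            if v * v <= -4.0 * u * u * math.log(u):
                return v / u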


2.8.7 Composition and Convolution

Suppose $ F$ is a convex combination of several distributions, i.e., $ F(x) = \sum_{j} p_j F_j(x)$, or more generally $ F(x) = \int F_y(x)\, dH(y)$. To generate from $ F$, one can generate $ J=j$ with probability $ p_j$ (or $ Y$ from $ H$), then generate $ X$ from $ F_J$ (or $ F_Y$). This method, called the composition algorithm, is useful for generating from compound distributions such as the hyperexponential or from compound Poisson processes. It is also frequently used to design specialized algorithms for generating from complicated densities. The idea is to partition the area under the complicated density into pieces, where piece $ j$ has area $ p_j$. To generate $ X$, first select a piece (choose piece $ j$ with probability $ p_j$), then draw a random point uniformly over that piece and project it to the horizontal axis. If the partition is defined so that it is fast and easy to generate from the large pieces, then $ X$ will be returned very quickly most of the time. The rejection method with a squeeze is often used to generate from some of the pieces.
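
For instance, a hyperexponential variate can be generated by composition in Python as follows (names ours; ps must sum to 1):

    import random

    def hyperexponential(ps, means):
        # Composition: select component j with probability ps[j], then
        # generate an exponential with mean means[j].
        j = random.choices(range(len(ps)), weights=ps)[0]
        return random.expovariate(1.0 / means[j])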

A dual method to composition is the convolution method, which can be used when $ X = Y_1 + Y_2 + \cdots + Y_n$, where the $ Y_i$'s are independent with specified distributions. With this method, one just generates the $ Y_i$'s and sums up. This requires at least $ n$ uniforms. Examples of random variables that can be expressed as sums like this include the hypoexponential, Erlang, and binomial distributions.
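
For example, an Erlang variate with n phases and overall mean `mean` is the sum of n i.i.d. exponentials, each with mean mean/n (a minimal sketch; names ours):

    import random

    def erlang(n, mean):
        # Convolution: sum of n i.i.d. exponentials with rate n/mean each.
        return sum(random.expovariate(n / mean) for _ in range(n))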


2.8.8 Other Special Techniques

Besides these general methods, several specialized and sometimes very elegant techniques have been designed for commonly used distributions such as the Poisson with small mean and the normal (e.g., the Box-Muller method), as well as for generating points uniformly on a $ k$-dimensional sphere, generating random permutations, and so on. Details can be found, e.g., in [3,6,16,21,23].
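
As one example, the Box-Muller method turns two independent uniforms into two independent $ N(0,1)$ variates (a minimal sketch):

    import math, random

    def box_muller():
        u1, u2 = random.random(), random.random()
        r = math.sqrt(-2.0 * math.log(1.0 - u1))   # 1 - u1 avoids log(0)
        return (r * math.cos(2.0 * math.pi * u2),
                r * math.sin(2.0 * math.pi * u2))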

Recently, there has been an effort in developing automatic or black-box algorithms for generating variates from an arbitrary (known) density, and reliable software that implements these methods ([27,29,67,68]).

Acknowledgements. This work has been supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) Grant No. ODGP0110050, NATEQ-Québec grant No. 02ER3218, and a Canada Research Chair to the author. Wolfgang Hörmann, Josef Leydold, François Panneton, and Richard Simard made helpful comments and corrections on an earlier draft. The author has been asked to write chapters on Random Number Generation for several handbooks and encyclopedias recently. Inevitably, there is a large amount of duplication between these chapters.

