As in the uniform case, non-uniform variate generation often
involves approximations and compromises. The first requirement is, of
course, correctness. This does not mean that the generated
random variate must always have exactly the required
distribution, because this would sometimes be much too costly or even
impossible. But we must have a good approximation and,
preferably, some understanding of the quality of that approximation.
Robustness is also important: when the accuracy depends on
the parameters of the distribution, it must be good uniformly
over the entire range of parameter values that we are interested in.
The method must also be efficient both in terms of speed and memory usage. Often, it is possible to increase the speed by using more memory (e.g., for larger precomputed tables) or by relaxing the accuracy requirements. Some methods need a one-time setup to compute constants and construct tables. The setup time can be significant but may be well worth spending if it is amortized by a large number of subsequent calls to the generator. For example, it makes sense to invest in a more extensive setup if we plan to make a million calls to a given generator than if we expect to make only a few calls, assuming that this investment can improve the speed of the generator sufficiently.
In general, compromises must be made between simplicity of the algorithm, quality of the approximation, robustness with respect to the distribution parameters, and efficiency (generation speed, memory requirements, and setup time).
In many situations, compatibility with variance reduction techniques is another important issue ([3,34]). We may be willing to sacrifice the speed of the generator to preserve inversion, because the gain in efficiency obtained via the variance reduction methods may more than compensate (sometimes by orders of magnitude) for the slightly slower generator.
The inversion method, defined in the introduction, should be the method of choice for generating non-uniform random variates in a majority of situations. The fact that $X = F^{-1}(U)$ is a monotone (non-decreasing) function of $U$ makes this method compatible with important variance reduction techniques such as common random numbers, antithetic variates, Latin hypercube sampling, and randomized quasi-Monte Carlo methods ([3,34,52]).
For some distributions, an analytic expression can be obtained for the inverse distribution function $F^{-1}$ and inversion can be easily implemented. As an example, consider the Weibull distribution function with parameters $\alpha > 0$ and $\beta > 0$, defined by $F(x) = 1 - \exp[-(x/\beta)^\alpha]$ for $x > 0$. It is easy to see that $F^{-1}(U) = \beta\,[-\ln(1-U)]^{1/\alpha}$. For $\alpha = 1$, we have the special case of the exponential distribution with mean $\beta$.
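For concreteness, here is a minimal Python sketch of this inversion formula (the function name is ours; `log1p` is used for accuracy when $U$ is near 0):

```python
import math
import random

def weibull_variate(alpha, beta):
    # F(x) = 1 - exp(-(x/beta)^alpha)  =>  F^{-1}(u) = beta * (-ln(1-u))^(1/alpha)
    u = random.random()
    return beta * (-math.log1p(-u)) ** (1.0 / alpha)
```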
For an example of a simple discrete distribution, suppose that $P[X = i] = p_i$ for $i = 0, 1, 2$, where $p_0, p_1, p_2 > 0$ and $p_0 + p_1 + p_2 = 1$, and $P[X = x] = 0$ elsewhere. The inversion method in this case will return 0 if $U \le p_0$, 1 if $p_0 < U \le p_0 + p_1$, and 2 if $U > p_0 + p_1$. For the discrete uniform distribution over $\{0, 1, \dots, k-1\}$, return $X = \lfloor kU \rfloor$. As another example, let $X$ have the geometric distribution with parameter $p$, so $P[X = x] = p(1-p)^x$ for $x = 0, 1, 2, \dots$, where $0 < p < 1$. Then, $F(x) = 1 - (1-p)^{\lfloor x \rfloor + 1}$ for $x \ge 0$ and one can show that $X = F^{-1}(U) = \lceil \ln(1-U)/\ln(1-p) \rceil - 1$.
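A corresponding Python sketch for the geometric case (with a guard for the measure-zero event $U = 0$, where the formula would return $-1$):

```python
import math
import random

def geometric_variate(p):
    # X = F^{-1}(U) = ceil(ln(1-U) / ln(1-p)) - 1, for 0 < p < 1
    u = random.random()
    x = math.ceil(math.log1p(-u) / math.log1p(-p)) - 1
    return max(x, 0)  # guard: u == 0.0 would otherwise give -1
```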
There are other distributions (e.g., the normal, Student, chi-square) for which there is no closed-form expression for $F^{-1}$ but good numerical approximations are available ([3,23,74]). When the distribution has only scale and location parameters, we need to approximate $F^{-1}$ only for a standardized version of the distribution. For the normal distribution, for example, it suffices to have an efficient method for evaluating the inverse distribution function of a $N(0,1)$ random variable $Z$, since a normal with mean $\mu$ and variance $\sigma^2$ can be generated by $X = \sigma Z + \mu$.
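As an illustration, a sketch using the standardized inverse normal CDF available in Python's standard library (`statistics.NormalDist.inv_cdf`, Python 3.8+), which requires $U$ strictly inside $(0,1)$:

```python
import random
from statistics import NormalDist

def normal_variate(mu, sigma):
    # X = mu + sigma * Phi^{-1}(U), where Phi is the N(0,1) CDF
    u = random.random()
    while u == 0.0:            # inv_cdf requires u strictly in (0, 1)
        u = random.random()
    return mu + sigma * NormalDist().inv_cdf(u)
```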
When shape parameters are involved (e.g., the gamma and beta distributions), things are more complicated because then $F^{-1}$ depends on the parameters in a more fundamental manner. The authors of [28] propose a general adaptive and automatic method that constructs a highly accurate Hermite interpolation of $F^{-1}$. In a one-time setup, their method produces tables for the interpolation points and coefficients. Random variate generation using these tables is then quite fast. A less efficient but simpler way of implementing inversion when a method is available for computing $F$ is via binary search ([6]). If the density is also available and if it is unimodal with known mode, a Newton-Raphson iteration method can advantageously replace the binary search ([6,16]).
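A minimal sketch of such a binary (bisection) search over the reals, assuming only that $F$ can be evaluated and that a bracketing interval with $F(\mathit{lo}) \le u \le F(\mathit{hi})$ is known:

```python
def inverse_cdf_bisection(F, u, lo, hi, tol=1e-12):
    # Solve F(x) = u by bisection, assuming F is a continuous CDF
    # with F(lo) <= u <= F(hi).
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```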
To implement inversion for general discrete distributions, sequential search and binary search with look-up tables are the standard methods ([3,6]). For a discrete distribution over the values $x_1 < \cdots < x_k$, one first tabulates the pairs $(x_i, F(x_i))$, where $F(x_i) = P[X \le x_i]$ for $1 \le i \le k$. To generate $X$, it then suffices to generate $U \sim U(0,1)$, find $I = \min\{i \mid F(x_i) \ge U\}$, and return $X = x_I$. The following algorithms do that.
Sequential search (needs $O(k)$ iterations in the worst case);
generate $U \sim U(0,1)$; let $i = 1$;
while $F(x_i) < U$ do $i = i + 1$;
return $X = x_i$.
Binary search (needs $O(\log k)$ iterations in the worst case);
generate $U \sim U(0,1)$; let $L = 0$ and $R = k$;
while $L < R - 1$ do
$m = \lfloor (L + R)/2 \rfloor$;
if $F(x_m) < U$ then $L = m$ else $R = m$;
/* Invariant: at this stage, the index $I$ is in $\{L+1, \dots, R\}$. */
return $X = x_R$.
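In Python, both searches can be sketched as follows, with `values` and `cdf` holding the tabulated $x_i$ and $F(x_i)$ (the standard `bisect` module performs the binary search):

```python
import bisect
import random

def sequential_search(values, cdf):
    # values sorted; cdf[i] = F(values[i]); cdf[-1] == 1.0
    u = random.random()
    i = 0
    while cdf[i] < u:
        i += 1
    return values[i]

def binary_search(values, cdf):
    u = random.random()
    i = bisect.bisect_left(cdf, u)  # smallest i with cdf[i] >= u
    return values[i]
```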
These algorithms can be modified in many different ways. For example, if $k = \infty$, in the binary search, one can start with an arbitrary value of $R$, double it until $F(x_R) \ge U$, and start the algorithm with this $R$ and $L = R/2$. Of course, only a finite portion of the table (a portion that contains most of the probability mass) would be precomputed in this case; the other values can be computed only when needed. This can also be done if $k$ is finite but large.
Another class of techniques uses indexing or buckets to speed up the search ([5,3,16]). For example, one can partition the interval $(0,1)$ into $c$ subintervals of equal sizes and use (pre-tabulated) initial values of $(L, R)$ that depend on the subinterval in which $U$ falls. For the subinterval $[j/c,\, (j+1)/c)$, the values of $L$ and $R$ would be $L_j = \min\{i \mid F(x_i) \ge j/c\} - 1$ and $R_j = \min\{i \mid F(x_i) \ge (j+1)/c\}$, for $j = 0, \dots, c-1$. The subinterval number that corresponds to a given $U$ is simply $j = \lfloor cU \rfloor$. Once we know that subinterval, we can search it by linear or binary search. With a larger value of $c$, the search is faster (on average) but the setup is more costly and a larger amount of memory is needed. So a compromise must be made depending on the situation (e.g., the value of $k$, the number of variates we expect to generate, etc.). For $c = 1$, we recover the basic sequential and binary search algorithms given above. A well-implemented indexed search with a large enough $c$ is usually competitive with the alias method (described in the next paragraph). A combined indexed/binary search algorithm is given below. An easy adaptation gives the combined indexed/sequential search, which is generally preferable when $k/c$ is small, because it has smaller overhead.
Indexed search (combined with binary search);
generate $U \sim U(0,1)$;
let $j = \lfloor cU \rfloor$, $L = L_j$, and $R = R_j$;
while $L < R - 1$ do
$m = \lfloor (L + R)/2 \rfloor$;
if $F(x_m) < U$ then $L = m$ else $R = m$;
return $X = x_R$.
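A Python sketch of this combined algorithm, with the bucket tables $(L_j, R_j)$ built in the one-time setup (the class name and default $c$ are ours; indices are 0-based here):

```python
import bisect
import random

class IndexedSearch:
    """Inversion by indexed (bucket) + binary search; a sketch.
    cdf[i] = P[X <= values[i]], sorted, with cdf[-1] == 1.0."""

    def __init__(self, values, cdf, c=256):
        self.values, self.cdf, self.c = values, cdf, c
        # For bucket j, the target index lies in {L_j + 1, ..., R_j}.
        self.buckets = []
        for j in range(c):
            L = bisect.bisect_left(cdf, j / c) - 1
            R = bisect.bisect_left(cdf, (j + 1) / c)
            self.buckets.append((L, min(R, len(cdf) - 1)))

    def draw(self):
        u = random.random()
        L, R = self.buckets[int(self.c * u)]
        while L < R - 1:                 # binary search within the bucket
            m = (L + R) // 2
            if self.cdf[m] < u:
                L = m
            else:
                R = m
        return self.values[R]
```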
These search methods are also useful for piecewise-linear (or piecewise-polynomial) distribution functions. Essentially, it suffices to add an interpolation step at the end of the algorithm, after the appropriate linear (or polynomial) piece has been determined ([3]).
Finally, the stochastic model itself can sometimes be selected in
a way that makes inversion easier. For example, one can fit
a parametric, highly flexible, and easily computable inverse
distribution function to the data, directly or indirectly
([79,94]).
There are situations where speed is important and where non-inversion methods are appropriate. In forthcoming subsections, we outline the main non-inversion methods.
Sequential and binary search require $O(k)$ and $O(\log k)$ time, respectively, in the worst case, to generate a random variate $X$ by inversion over the finite set $\{x_1, \dots, x_k\}$. The alias method ([95,96]) can generate such an $X$ in $O(1)$ time per variate, after a table setup that takes $O(k)$ time and space. On the other hand, it does not implement inversion, i.e., the transformation from $U$ to $X$ is not monotone.
To explain the idea, consider a bar diagram of the distribution, where each index $i$ has a bar of height $p_i = P[X = x_i]$. The idea is to ``equalize'' the bars so that they all have height $1/k$, by cutting off bar pieces and transferring them to other bars. This is done in a way that in the new diagram, each bar $i$ contains one piece of size $r_i$ (say) from the original bar $i$ and one piece of size $1/k - r_i$ from another bar whose index $j$, denoted $A(i)$, is called the alias value of $i$. The setup procedure initializes two tables, $A$ and $R$, where $A(i)$ is the alias value of $i$ and $R(i) = (i-1)/k + r_i$. See [16] and [34] for the details. To generate $X$, we generate $U \sim U(0,1)$, define $I = \lceil kU \rceil$, and return $X = x_I$ if $U < R(I)$ and $X = x_{A(I)}$ otherwise.
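A compact Python sketch of a standard alias-table construction (the Walker/Vose style; indices run from 0 to $k-1$, and the test compares the fractional part of $kU$ with $k r_i$, which is equivalent to the test $U < R(I)$ above):

```python
import random

def alias_setup(p):
    # Build alias tables for probabilities p[0..k-1] summing to 1;
    # O(k) time and space.
    k = len(p)
    prob, alias = [0.0] * k, [0] * k
    scaled = [pi * k for pi in p]
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l   # bar s keeps mass scaled[s]
        scaled[l] += scaled[s] - 1.0       # the rest is donated by bar l
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:                # leftovers (rounding): full bars
        prob[i] = 1.0
    return prob, alias

def alias_draw(prob, alias):
    # O(1) per variate; one uniform yields both the bucket and the test.
    k = len(prob)
    u = random.random() * k
    i = int(u)
    return i if u - i < prob[i] else alias[i]
```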
There is a version of the alias method for continuous distributions, called the acceptance-complement method ([32,16,23]). The idea is to decompose the density $f$ of the target distribution as the convex combination of two densities $f_1$ and $f_2$, $f = w f_1 + (1-w) f_2$, for some real number $w \in (0,1)$, in a way that $w f_1(x) \le g(x)$ for some other density $g$ and so that it is easy to generate from $f_2$ and $g$. The algorithm works as follows: Generate $X$ from density $g$ and $U \sim U(0,1)$; if $U g(X) \le w f_1(X)$, return $X$; otherwise generate a new $X$ from density $f_2$ and return it.
Instead of selecting a parametric distribution that is hard to invert and estimating the parameters, one can estimate the density via a kernel density estimation method for which random variate generation is very easy ([16,27]). In the case of a Gaussian kernel, for example, one can generate variates simply by selecting one observation at random from the data and adding random noise generated from a normal distribution with mean zero. However, this method is not equivalent to inversion. Because of the added noise, selecting a larger observation does not necessarily guarantee a larger value for the generated variate.
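This method is short enough to sketch in two lines of Python (the bandwidth parameter `h` is an assumption of the sketch):

```python
import random

def kde_variate(data, h):
    # Gaussian kernel: pick an observation at random, add N(0, h^2) noise;
    # h is the kernel bandwidth.
    return random.choice(data) + random.gauss(0.0, h)
```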
Now suppose we want to generate $X$ from a complicated density $f$. In fact, $f$ may be known only up to a multiplicative constant $\kappa > 0$; i.e., we may know only $\kappa f$. If we know $f$, we may just take $\kappa = 1$. We select another density $r$ such that $\kappa f(x) \le t(x) \stackrel{\text{def}}{=} a\, r(x)$ for all $x$, for some constant $a$, and such that generating variates $X$ from the density $r$ is easy. The function $t$ is called a hat function or majorizing function. By integrating this inequality with respect to $x$ on both sides, we find that $\kappa \le a$. The following rejection method generates $X$ with density $f$ ([93,16,19]):

Rejection method;
repeat
generate $X$ from the density $r$ and $U \sim U(0,1)$, independently;
until $U t(X) \le \kappa f(X)$;
return $X$.
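As a concrete instance, a Python sketch for the density $f(x) = 6x(1-x)$ on $(0,1)$ (so $\kappa = 1$), with the uniform density as $r$ and $a = 1.5 = \max f$:

```python
import random

def beta22_rejection():
    # Target f(x) = 6x(1-x) on (0,1); hat t(x) = a*r(x) with r = U(0,1)
    # density and a = 1.5, so the expected number of turns is a = 1.5.
    while True:
        x = random.random()                   # X from density r
        u = random.random()                   # U ~ U(0,1), independent of X
        if u * 1.5 <= 6.0 * x * (1.0 - x):    # accept iff U t(X) <= f(X)
            return x
```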
The number of turns into the ``repeat'' loop is one plus a geometric random variable with parameter $\kappa/a$, so $a/\kappa$ is the expected number of turns. Thus, we want $a/\kappa$ to be as small as possible, i.e., we want to minimize the area between $\kappa f$ and $t$. There is generally a compromise between bringing $t$ close to $\kappa f$ and keeping $t$ simple.
When $f$ is expensive to compute, we can also use squeeze functions $q_1$ and $q_2$ that are faster to evaluate and such that $q_1(x) \le \kappa f(x) \le q_2(x) \le t(x)$ for all $x$. To verify the condition $U t(X) \le \kappa f(X)$, we first check if $U t(X) \le q_1(X)$, in which case we accept $X$ immediately; otherwise we check if $U t(X) > q_2(X)$, in which case we reject $X$ immediately. The value of $f(X)$ must be computed only when $U t(X)$ falls between the two squeezes. Sequences of embedded squeezes can also be used, where the primary ones are the least expensive to compute, the secondary ones are a little more expensive but closer to $\kappa f$, etc.
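A Python sketch combining rejection with a lower squeeze only, for the unnormalized standard normal density $f(x) = e^{-x^2/2}$ with a Laplace hat; the inequalities used are the standard ones $e^{-x^2/2} \le e^{1/2 - |x|}$ (from $(|x|-1)^2 \ge 0$) and $e^{-x^2/2} \ge 1 - x^2/2$:

```python
import math
import random

def normal_rejection_squeeze():
    # Hat t(x) = exp(1/2 - |x|), a multiple of the Laplace density;
    # lower squeeze q1(x) = max(0, 1 - x^2/2) avoids most exp() calls.
    while True:
        u1 = random.random()
        while u1 == 0.0:
            u1 = random.random()
        # Laplace(1) variate by inversion:
        x = math.log(2.0 * u1) if u1 < 0.5 else -math.log(2.0 * (1.0 - u1))
        ut = random.random() * math.exp(0.5 - abs(x))   # U * t(X)
        if ut <= max(0.0, 1.0 - 0.5 * x * x):           # quick accept (squeeze)
            return x
        if ut <= math.exp(-0.5 * x * x):                # full test
            return x
```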
It is common practice to transform the density by a smooth increasing function $T$ (e.g., $T(x) = \log x$ or $T(x) = -1/x$) selected so that it is easier to construct good hat and squeeze functions (often piecewise linear) for the transformed density $T \circ f$. By transforming back to the original scale, we get hat and squeeze functions for $f$. This is the transformed density rejection method, which has several variants and extensions ([16,19,29]).
The rejection method works for discrete distributions as well; it suffices to replace densities by probability mass functions.
Thinning is a cousin of acceptance-rejection, often used for generating events from a non-homogeneous Poisson process. Suppose the process has rate $\lambda(t)$ at time $t$, with $\lambda(t) \le \bar\lambda$ for all $t$, where $\bar\lambda$ is a finite constant. One can generate Poisson pseudo-arrivals at constant rate $\bar\lambda$ by generating interarrival times that are i.i.d. exponentials of mean $1/\bar\lambda$. Then, a pseudo-arrival at time $t$ is accepted (becomes an arrival) with probability $\lambda(t)/\bar\lambda$ (i.e., if $U \le \lambda(t)/\bar\lambda$, where $U$ is an independent $U[0,1]$ random variable), and rejected with probability $1 - \lambda(t)/\bar\lambda$. Non-homogeneous Poisson processes can also be generated by inversion ([3]). The idea is to apply a nonlinear transformation to the time scale to make the process homogeneous with rate 1 in the new time scale. Arrival times are generated in this new time scale (which is easy), and then transformed back to the original time scale. The method can be adapted to other types of point processes with time-varying rates.
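A Python sketch of thinning over a finite horizon, assuming a rate function `lam(t)` bounded by `lam_bar` (the function names and signature are ours):

```python
import random

def thinning_arrivals(lam, lam_bar, horizon):
    # Arrival times on [0, horizon] of a Poisson process with rate lam(t),
    # assuming lam(t) <= lam_bar for all t.
    t, arrivals = 0.0, []
    while True:
        t += random.expovariate(lam_bar)         # pseudo-arrival, rate lam_bar
        if t > horizon:
            return arrivals
        if random.random() <= lam(t) / lam_bar:  # accept w.p. lam(t)/lam_bar
            arrivals.append(t)
```

For instance, `thinning_arrivals(lambda t: 5.0 + 4.0 * math.sin(t), 9.0, 100.0)` would generate arrivals from a sinusoidal rate (an illustrative choice, not from the text).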
If $f$ is a density over the real line, $\kappa$ an arbitrary positive constant, and the pair $(U, V)$ has the uniform distribution over the set
$$\mathcal{C} = \left\{ (u, v) \in \mathbb{R}^2 : 0 \le u \le \sqrt{\kappa f(v/u)} \right\},$$
then $V/U$ has density $f$. To generate $X$ with density $f$, it thus suffices to generate the pair $(U, V)$ uniformly over $\mathcal{C}$ and return $X = V/U$. This is the ratio-of-uniforms method.
The most frequent approach for generating the pair uniformly over $\mathcal{C}$ is the rejection method: define a region $\mathcal{B}$ that contains $\mathcal{C}$ and in which it is easy to generate a point uniformly (for example, a rectangular box or a polygonal region). To generate $X$, repeat: generate $(U, V)$ uniformly over $\mathcal{B}$, until $(U, V)$ belongs to $\mathcal{C}$. Then return $X = V/U$. If there is another region $\mathcal{C}'$ contained in $\mathcal{C}$ and for which it is very fast to check if a point is in $\mathcal{C}'$, this $\mathcal{C}'$ can also be used as a squeeze to accelerate the verification that the point belongs to $\mathcal{C}$. Several special cases and refinements are described in [16,23,66] and other references given there.
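A Python sketch for the standard normal with a rectangular box $\mathcal{B}$; for $f(x) = e^{-x^2/2}$ (with $\kappa = 1$), the standard box bounds are $0 \le u \le 1$ and $|v| \le \sqrt{2/e}$:

```python
import math
import random

def normal_ratio_of_uniforms():
    # C = {(u,v): 0 <= u <= sqrt(f(v/u))} with f(x) = exp(-x^2/2)
    # fits in the box [0, 1] x [-b, b] with b = sqrt(2/e).
    b = math.sqrt(2.0 / math.e)
    while True:
        u = 1.0 - random.random()              # uniform on (0, 1]
        v = (2.0 * random.random() - 1.0) * b
        x = v / u
        if u * u <= math.exp(-0.5 * x * x):    # (u,v) in C <=> u <= sqrt(f(x))
            return x
```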
Suppose $F$ is a convex combination of several distributions, i.e., $F(x) = \sum_j p_j F_j(x)$, or more generally $F(x) = \int F_y(x)\, dH(y)$. To generate from $F$, one can generate $J = j$ with probability $p_j$ (or $Y$ from $H$), then generate $X$ from $F_J$ (or $F_Y$). This method, called the composition algorithm, is useful for generating from compound distributions such as the hyperexponential or from compound Poisson processes. It is also frequently used to design specialized algorithms for generating from complicated densities. The idea is to partition the area under the complicated density into pieces, where piece $j$ has surface $p_j$. To generate $X$, first select a piece (choose piece $j$ with probability $p_j$), then draw a random point uniformly over that piece and project it to the horizontal axis. If the partition is defined so that it is fast and easy to generate from the large pieces, then $X$ will be returned very quickly most of the time. The rejection method with a squeeze is often used to generate from some of the pieces.
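For illustration, a Python sketch of composition for the hyperexponential (mixture weights `probs` summing to 1 and exponential rates `rates` are the sketch's inputs):

```python
import random

def hyperexponential(probs, rates):
    # Pick component j with probability probs[j], then draw from
    # the exponential distribution with rate rates[j].
    u, acc = random.random(), 0.0
    for pj, rj in zip(probs, rates):
        acc += pj
        if u <= acc:
            return random.expovariate(rj)
    return random.expovariate(rates[-1])   # guard against rounding error
```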
A dual method to composition is the convolution method, which can be used when $X = Y_1 + Y_2 + \cdots + Y_n$, where the $Y_i$'s are independent with specified distributions. With this method, one just generates the $Y_i$'s and sums up. This requires at least $n$ uniforms. Examples of random variables that can be expressed as sums like this include the hypoexponential, Erlang, and binomial distributions.
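A one-line convolution sketch for the Erlang case:

```python
import random

def erlang_variate(n, lam):
    # Erlang(n, lam) as the sum of n i.i.d. exponentials of rate lam;
    # uses n uniforms.
    return sum(random.expovariate(lam) for _ in range(n))
```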
Besides the general methods, several specialized and sometimes very elegant techniques have been designed for commonly used distributions such as the Poisson distribution with small mean and the normal (e.g., the Box-Muller method), for generating points uniformly on a $d$-dimensional sphere, for generating random permutations, and so on. Details can be found, e.g., in [3,6,16,21,23].
Recently, there has been an effort in developing automatic or black-box algorithms for generating variates from an arbitrary (known) density, and reliable software that implements these methods ([27,29,67,68]).
Acknowledgements. This work has been supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) Grant No. ODGP0110050, NATEQ-Québec Grant No. 02ER3218, and a Canada Research Chair to the author. Wolfgang Hörmann, Josef Leydold, François Panneton, and Richard Simard made helpful comments and corrections on an earlier draft. The author has been asked to write chapters on Random Number Generation for several handbooks and encyclopedias recently. Inevitably, there is a large amount of duplication between these chapters.