next up previous contents index
Next: 2.7 Available Software and Up: 2. Random Number Generation Previous: 2.5 Nonlinear RNGs


2.6 Examples of Statistical Tests

As mentioned earlier, a statistical test for RNGs is defined by a random variable $ X$ whose distribution under  $ {\mathcal{H}}_0$ can be well approximated. When $ X$ takes the value $ x$, we define the right and left $ p$-values of the test by

$\displaystyle p_{\text{R}} = P[X\ge x\mid {\mathcal{H}}_0] \quad\text{and}\quad p_{\text{L}} = P[X\le x\mid {\mathcal{H}}_0]\;.$    

When testing RNGs, there is no need to prespecify the level of the test. If either the right or the left $ p$-value is extremely close to zero, e.g., less than $ 10^{-15}$, then it is clear that $ {\mathcal{H}}_0$ (and the RNG) must be rejected. When a suspicious $ p$-value is obtained, e.g., near $ 10^{-2}$ or $ 10^{-3}$, one can simply repeat this particular test a few more times, perhaps with a larger sample size. Almost always, the situation then clarifies itself.

Most tests are defined by partitioning the possible realizations of $ (u_0, \ldots, u_{\tau-1})$ into a finite number of subsets (where the integer $ \tau$ can be random or deterministic), computing the probability $ p_j$ of each subset $ j$ under $ {\mathcal{H}}_0$, and measuring the discrepancy between these probabilities and empirical frequencies from realizations simulated by the RNG.

A special case that immediately comes to mind is to take $ \tau=t$ (a constant) and cut the interval $ [0,1)$ into $ d$ equal segments for some positive integer $ d$, in order to partition the hypercube $ [0,1)^t$ into $ k = d^t$ subcubes of volume $ 1/k$. We then generate $ n$ points $ \boldsymbol{u}_i = (u_{ti},\ldots,u_{ti+t-1}) \in [0,1)^t$, for $ i=0,\ldots,n-1$, and count the number $ N_j$ of points falling in subcube $ j$, for $ j=0,\ldots, k-1$. Any measure of distance (or divergence) between the numbers $ N_j$ and their expectations $ n/k$ can define a test statistic $ X$. The tests thus defined are generally called serial tests of uniformity ([31,60]). They can be sparse (if $ n/k \ll 1$), or dense (if $ n/k \gg 1$), or somewhere in between. There are also overlapping versions, where the points are defined by $ \boldsymbol{u}_i = (u_{i},\ldots,u_{i+t-1})$ for $ i=0,\ldots,n-1$ (they have overlapping coordinates).
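To make the construction concrete, here is a small sketch in Python of a dense serial test with a chi-square statistic (the helper name and the parameter choices $ t=2$, $ d=8$ are mine for illustration, not from any published test suite):

```python
import random
from collections import Counter

def serial_chi_square(u, t=2, d=8):
    """Dense serial test: partition [0,1)^t into k = d^t subcubes,
    count the points N_j falling in each, and return the chi-square
    distance between the counts N_j and their expectation n/k."""
    k = d ** t
    n = len(u) // t  # number of non-overlapping points
    counts = Counter()
    for i in range(n):
        j = 0  # index of the subcube containing point i
        for coord in u[t * i : t * (i + 1)]:
            j = j * d + int(coord * d)
        counts[j] += 1
    expected = n / k
    return sum((counts[j] - expected) ** 2 / expected for j in range(k))

rng = random.Random(12345)           # Mersenne Twister as the RNG under test
u = [rng.random() for _ in range(100_000)]
x = serial_chi_square(u)             # compare against chi-square, k-1 df
```

Under $ {\mathcal{H}}_0$ and in the dense case ($ n/k \gg 1$), the statistic is approximately chi-square with $ k-1$ degrees of freedom, so here (with $ k = 64$) a value very far from 63 would be suspicious.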

Special instances for which the distribution of $ X$ under $ {\mathcal{H}}_0$ is well known are the chi-square, the (negative) empirical entropy, and the number of collisions ([51,60,85]). For the latter, the test statistic $ X$ is the number of times a point falls in a subcube that already had a point in it. Its distribution under $ {\mathcal{H}}_0$ is approximately Poisson with mean $ \lambda_1 = n^2/(2k)$, if $ n$ is large and $ \lambda_1$ is not too large.
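A sketch of the collision test in the sparse case may help; the function below and its parameters are illustrative choices of mine, not those of a published suite:

```python
import math
import random

def collision_test(u, t=2, d=1024):
    """Sparse collision test: hash n points in [0,1)^t into k = d^t
    subcubes and count how often a point lands in an already-occupied
    subcube. Under H0 the count is approximately Poisson(n^2 / (2k))."""
    k = d ** t
    n = len(u) // t
    occupied = set()
    collisions = 0
    for i in range(n):
        j = 0
        for coord in u[t * i : t * (i + 1)]:
            j = j * d + int(coord * d)
        if j in occupied:
            collisions += 1
        else:
            occupied.add(j)
    lam = n * n / (2 * k)
    # Right p-value P[X >= collisions] for X ~ Poisson(lam).
    p_right = 1.0 - sum(math.exp(-lam) * lam ** i / math.factorial(i)
                        for i in range(collisions))
    return collisions, lam, p_right

rng = random.Random(42)
u = [rng.random() for _ in range(2 * 2048)]   # n = 2048 points, k = 2^20
c, lam, p = collision_test(u)
```

Here $ \lambda_1 = 2048^2/(2\cdot 2^{20}) = 2$, so a handful of collisions is expected from a good generator; an excessively regular one would give too few.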

A variant is the birthday spacings test, defined as follows ([71,31,57]). Let $ I_{(1)}\le \cdots \le I_{(n)}$ be the numbers of the subcubes that contain the points, sorted in increasing order. Define the spacings $ S_j = I_{(j+1)}-I_{(j)}$, for $ j=1,\ldots,n-1$, and let $ X$ be the number of collisions between these spacings. Under $ {\mathcal{H}}_0$, $ X$ is approximately Poisson with mean $ \lambda_2 = n^3/(4k)$, if $ n$ is large and $ \lambda_2$ is not too large.
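The same machinery adapts directly; the sketch below (again with illustrative names and parameters of my own) sorts the subcube numbers and counts repeated spacings:

```python
import random

def birthday_spacings(u, t=2, d=16384):
    """Birthday spacings test: sort the subcube numbers
    I_(1) <= ... <= I_(n), form the spacings S_j = I_(j+1) - I_(j),
    and count collisions among these spacings. The count is
    approximately Poisson(n^3 / (4k)) under H0."""
    k = d ** t
    n = len(u) // t
    cells = []
    for i in range(n):
        j = 0
        for coord in u[t * i : t * (i + 1)]:
            j = j * d + int(coord * d)
        cells.append(j)
    cells.sort()
    spacings = [cells[i + 1] - cells[i] for i in range(n - 1)]
    collisions = (n - 1) - len(set(spacings))  # repeats among spacings
    lam = n ** 3 / (4 * k)
    return collisions, lam

rng = random.Random(7)
u = [rng.random() for _ in range(2 * 1024)]   # n = 1024 points, k = 2^28
x, lam = birthday_spacings(u)
```

With these parameters $ \lambda_2 = 1024^3/(4\cdot 2^{28}) = 1$, so a good generator should produce only a collision or two among the spacings.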

Consider now an MRG, for which $ \Psi_t$ has a regular lattice structure. Because of this regularity, the points of $ \Psi_t$ will tend to be more evenly distributed among the subcubes than random points. For a well-chosen $ k$ and large enough $ n$, we expect the collision test to detect this: it is likely that there will be too few collisions. In fact, the same applies to any RNG whose set $ \Psi_t$ is very evenly distributed. When a birthday spacings test with a very large $ k$ is applied to an MRG, the numbers of the subcubes that contain one point of $ \Psi_t$ tend to be too evenly spaced, and the test detects this by finding too many collisions.

These specific interactions between the test and the structure of the RNG lead to systematic patterns in the $ p$-values of the tests. To illustrate this, suppose that we take $ k$ slightly larger than the cardinality of $ \Psi_t$ (so $ k\approx\rho$) and that due to the excessive regularity, no collision is observed in the collision test. The left $ p$-value will then be $ p_{\text{L}} \approx P[X \le 0\mid X\sim$ Poisson $ (\lambda_1)] = \exp[-n^2 / (2k)]$. For this $ p$-value to be smaller than a given $ \epsilon $, we need a sample size $ n$ proportional to the square root of the period length $ \rho$. And after that, $ p_{\text{L}}$ decreases exponentially fast in $ n^2$.
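This back-of-the-envelope bound is easy to compute: solving $ \exp[-n^2/(2k)] < \epsilon$ for $ n$ gives $ n > \sqrt{2k\ln(1/\epsilon)}$. The helper below is my own illustration of that calculation:

```python
import math

def min_sample_size(k, eps=1e-15):
    """Smallest n with exp(-n^2 / (2k)) < eps: the sample size at
    which observing zero collisions becomes conclusive evidence
    against H0 at threshold eps."""
    return math.ceil(math.sqrt(2 * k * math.log(1.0 / eps)))

# For k ~ rho = 2**31, a few hundred thousand points suffice;
# n grows only as sqrt(k), so quadrupling k merely doubles n.
n = min_sample_size(2 ** 31)
```

With $ \epsilon = 10^{-15}$ this gives $ n \approx 8.3\sqrt{k}$, consistent in order of magnitude with the empirical rule $ n_0 \approx 16\rho^{1/2}$ reported below.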

Extensive experiments with LCGs, MRGs, and LFSR generators confirm that this is actually what happens with these RNGs ([51,44,60]). For example, if we take $ \epsilon = 10^{-15}$ and define $ n_0$ as the minimal sample size $ n$ for which $ p_{\text{L}} < \epsilon$, we find that $ n_0 \approx 16 \rho^{1/2}$ (plus some noise) for LCGs that behave well in the spectral test, as well as for LFSR generators. For the birthday spacings test, the rule for LCGs is $ n_0 \approx 16 \rho^{1/3}$ instead ([57]). So to be safe with respect to these tests, the period length $ \rho$ must be so large that generating more than $ \rho^{1/3}$ numbers is practically infeasible. This certainly disqualifies all LCGs with modulus smaller than $ 2^{100}$ or so.
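To get a feel for the numbers, one can plug period lengths into the empirical rule $ n_0 \approx 16\rho^{1/3}$ (the helper is mine; the constant 16 is the one reported above):

```python
def n0_birthday(rho):
    """Empirical rule n0 ~ 16 * rho**(1/3): approximate sample size at
    which the birthday spacings test starts to fail an LCG of period rho."""
    return round(16 * rho ** (1.0 / 3.0))

# An LCG with period near 2**31 fails after only ~20,000 values;
# with period 2**100 the rule gives about 1.7e11 values, which is
# why such large moduli put the test out of practical reach.
small = n0_birthday(2 ** 31)
```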

Other types of tests for RNGs include tests based on the closest pairs of points among $ n$ points generated in the hypercube, tests based on random walks on the real line or over the integers, tests based on the linear complexity of a binary sequence, tests based on the simulation of dice or poker hands, and many others ([23,31,58,72,86,92]).

When testing RNGs, there is no specific alternative hypothesis to $ {\mathcal{H}}_0$. Different tests are needed to detect different types of departures from $ {\mathcal{H}}_0$. Test suites for RNGs include a selection of tests, with predetermined parameters and sample sizes. The best known are probably DIEHARD ([72]) and the NIST test suite ([86]). The library TestU01 ([58]) implements a large selection of tests in the C language and provides a variety of test suites, some designed for i.i.d. $ U(0,1)$ output sequences and others for strings of bits.

