Introduction

Author's notes…

This web page is a casual essay on my interest in random numbers and related topics. The article will progress through time from my childhood to current times and discuss major breakthroughs and discoveries along the way. It's interesting to see how a topic that was once on the edge of obscurity has matured to become a subject of serious research in the computer age.

The tone of this article is mostly conversational and is written from the viewpoint of a software developer.

To compare different random number generators I will sometimes display their output in a hobby app of mine called RandPlot (Wiki page and source code). This app plots random numbers in a unit square and it can reveal generator flaws like poor distribution or coarse lattice structure. The app is not backed by any solid theory; it was created for entertainment, but it generally works and is fun to play with.

Definitions

Some acronyms will appear frequently, so it's worth giving them some brief definitions first.

PRNG: Pseudorandom Number Generator

An algorithm (computer program) for generating a sequence of numbers whose properties approximate the properties of sequences of random numbers.

LCG: Linear Congruential Generator

One of the oldest and best-known PRNG algorithms. An LCG loops through a recurrence relation like this:
Xₙ₊₁ = a·Xₙ mod m
Each step multiplies the current value X by a and assigns the remainder after division by m back to X. The step is repeated. When the values of a and m are chosen correctly, the resulting sequence has reasonably good quality. The LCG is often the building block of more sophisticated PRNGs.
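As a concrete sketch, here is one step of an LCG in C. The parameters a = 16807 and m = 2³¹ − 1 are the classic Park–Miller "minimal standard" values, chosen here purely for illustration (they are not a generator discussed in this article):

```c
#include <stdint.h>

/* One step of a multiplicative LCG: X <- a*X mod m.
   a = 16807, m = 2^31 - 1 (a Mersenne prime) are the classic
   Park-Miller "minimal standard" parameters, used here only
   as an illustrative example. */
static uint32_t lcg_step(uint32_t x)
{
    const uint64_t a = 16807;
    const uint64_t m = 2147483647; /* 2^31 - 1 */
    return (uint32_t)((a * (uint64_t)x) % m);
}

/* Park and Miller published a handy self-check: starting from
   seed 1, the 10,000th value should be 1043618065. */
static uint32_t lcg_check(void)
{
    uint32_t x = 1;
    for (int i = 0; i < 10000; i++)
        x = lcg_step(x);
    return x;
}
```

Note the widening to 64 bits before the multiply: a·X can exceed 32 bits, and truncating it before the mod would silently change the sequence.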

LFSR: Linear-Feedback Shift Register

Another classical PRNG algorithm where in each step a register of binary bits is shifted and then XORed with parts of itself. The step is repeated. When the register length and XOR taps are chosen correctly, the resulting sequence is also of reasonably good quality. A combination of LFSRs is used in the Content Scramble System that protects playback of some commercial DVDs.
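A minimal sketch of a Galois-form LFSR, using the textbook 16-bit register with taps at bits 16, 14, 13 and 11 (mask 0xB400); these parameters are an illustrative assumption on my part, not the DVD scheme, and they give the maximal period of 2¹⁶ − 1:

```c
#include <stdint.h>

/* One step of a 16-bit Galois LFSR with taps 16,14,13,11
   (mask 0xB400) -- the textbook maximal-period example. */
static uint16_t lfsr_step(uint16_t s)
{
    uint16_t lsb = s & 1u;  /* bit shifted out this step */
    s >>= 1;                /* shift the register right */
    if (lsb)                /* feed the bit back via the XOR taps */
        s ^= 0xB400u;
    return s;
}

/* Count steps until the start state recurs. A maximal-period
   16-bit LFSR visits every nonzero state: 2^16 - 1 = 65535. */
static uint32_t lfsr_period(uint16_t start)
{
    uint16_t s = start;
    uint32_t n = 0;
    do { s = lfsr_step(s); n++; } while (s != start);
    return n;
}
```

With badly chosen taps the register falls into much shorter cycles, which is why the tap polynomial matters so much.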

CSPRNG: Cryptographically Secure Pseudorandom Number Generator

An algorithm that generates a random number sequence where it is computationally infeasible to predict the future of the sequence from the history of the sequence. That property makes a CSPRNG specifically useful in the fields of security and cryptography. Outside of those fields, a CSPRNG is unlikely to be useful. If you're using a CSPRNG to simply generate random numbers, then it's probably time for a code review.

School Days

In the latter years of high school we were progressively introduced to subjects like elementary probability and statistics, permutations and combinations, the Normal and Poisson distributions, correlation, expectation, and so on. Quite often during these years the expression random number was used, but the subject of random numbers was never a formal part of any course and the exact meaning of random number remained tantalisingly unclear. I had this comical image in my head of hundreds of monkeys throwing dice or hammering away at typewriters to create randomness (that wouldn't work of course! See this Simpsons episode clip).

The topic of random numbers or randomness in general was not discussed in any publications I had access to, so the whole subject remained idle for many years.

I was bemused to find a list of random numbers on page 38 of my Four-Figure Mathematical Tables book.

Cover of log tables book Partial scan of the log tables book with random digits

If you look closely you'll see that the numbers are familiar. You're looking at the decimal part of π. Firstly, it's really lazy of the authors to simply stick π digits into a table and claim they're random. Secondly, this opens up a random can of worms, because it is suspected that the decimal digits of π are actually random, but this remains unproven. For a technical discussion of this, see Wikipedia's article on Normal Number and for a more casual discussion see Math is Fun.

Jumping forward to 1986 for a moment: that's when I purchased a copy of the famous Handbook of Mathematical Functions by Abramowitz and Stegun, originally published in 1964. I never knew about this book in high school; had I seen one back then I would have thought it was a set of log tables that fell through a wormhole from the distant future. I was bemused once again to see 2500 five-digit random numbers occupying pages 991 to 995. A footnote says:

Compiled from Rand Corporation. A million random digits with 100,000 normal deviates. The Free Press, Glencoe, Ill., 1955 (with permission).

Cover of Handbook of Mathematical Functions Partial scan of page 991 showing some random digits

Programming

In 1974 I was lucky enough to be able to use a Monash University Minitran computer to learn to write my first computer programs (see my Computers History page for more details). I was fascinated by the function F4 which generated random numbers, so I used it to print random bars and points made from asterisks. Therefore the first hobby program I ever wrote plotted random numbers. Sadly, I don't have any copies of the 15×11 inch fanfold paper printouts from the time.

In 1992, the very first C program I wrote with Microsoft C/C++ 7 under DOS 5.0 generated random coloured pixels on the screen. A few months later, the first Windows program I wrote with the Microsoft SDK 3.1 filled the window client area with random lines. My first Windows screen saver filled the screen with customisable random shapes and colours. The first Java applet I wrote in early 1996 mimicked the workings of my screen saver in miniature. It seems that every "hello world" program I have written in the last 30 years while learning a new programming language plays with random numbers.

Until late 2021 I had some Silverlight applications called Boxes, Hypno and SilverRand which used random numbers to generate entertaining visual effects. Sadly, Silverlight was deprecated and all that remains on those web pages are screenshots of the original animations.

Early Research (an aside)

I could not possibly have known while I was in high school, but a small and dedicated group of people in specialist occupations in the 1960s was researching how to generate pseudorandom numbers and how to evaluate their quality. The research was performed on mainframe computers, which were evolving rapidly at the time.

The researchers knew that computers could not generate truly random numbers, because by design they were completely deterministic and any program fed the same input would produce the same output. However, the search was on to find algorithms that could produce pseudorandom number sequences that had sufficient length (period) and quality that they would fool statistical tests into thinking they were truly random. In that case, the pseudorandom numbers could be safely used in practical applications such as simulation and modelling.

Photo of the boxed set of Knuth's four volumes One of the prominent names in early computer research was Donald Knuth (born 1938). His epic and respected book series The Art of Computer Programming, first published in 1968, devotes 193 dense pages of chapter three to random numbers. That chapter contains the first serious and definitive study of computer algorithms for generating pseudorandom numbers. Knuth presents a wide variety of now classical algorithms and analyses them in great detail. He also discusses a variety of tests for randomness. Section 3.5, titled What Is a Random Sequence?, contains a fascinating discussion that almost bridges mathematics and philosophy. If you ever get into an argument about what 'random' means, just smugly point your antagonist to this section of TAOCP (as insiders know it).

1970s

In 1975 during first year at Monash University I purchased Schaum's Outline Series book titled Numerical Analysis. The final chapter 30 devotes 6 pages to Monte Carlo Methods. The introduction contains the first formal mention of a PRNG algorithm I had seen.

Partial page scan of the Schaum book page on random numbers

It's quaint to see a distinction between decimal and binary computers, which has no meaning now. Both of the algorithms are LCGs as previously defined.

Binary Algorithm

To try the binary computer algorithm I picked N=2³¹ (2147483648) and r=46333. A slice of RandPlot with a horizontal magnification of 1000 looks like this at about 10% through the period:

Plot slice of the Schaum binary algorithm from RandPlot

The density is quite high, but a lattice is clearly visible. Wikipedia explains that this type of PRNG can only have maximum period N/4 and that the generated bit patterns are not very random, making the algorithm quite poor by modern standards. I confirmed that the period of this PRNG is 536,870,913 (as predicted).
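The poor bit patterns are easy to demonstrate directly. With the multiplicative recurrence X ← 46333·X mod 2³¹ and an odd seed (an arbitrary choice of mine for this sketch), every value produced is odd, so the lowest bit never changes at all:

```c
#include <stdint.h>

/* One step of the Schaum-style binary algorithm:
   X <- 46333 * X mod 2^31. The mod is a cheap bit mask
   because the modulus is a power of two. */
static uint32_t schaum_step(uint32_t x)
{
    return (uint32_t)((46333ull * x) & 0x7FFFFFFFull);
}

/* With an odd seed, odd * odd stays odd, and masking by
   2^31 - 1 preserves the low bit -- so bit 0 of every output
   is 1. This is one concrete sense in which the generated
   bit patterns are "not very random". */
static int low_bit_stuck(uint32_t seed, int steps)
{
    uint32_t x = seed;
    for (int i = 0; i < steps; i++) {
        x = schaum_step(x);
        if ((x & 1u) == 0) return 0; /* found an even value */
    }
    return 1; /* low bit was 1 throughout */
}
```

In general the low-order bits of any power-of-two-modulus LCG have much shorter cycles than the high-order bits, which is why modern designs either avoid such moduli or discard the low bits.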

Decimal Algorithm

Out of curiosity I tried the decimal algorithm with N=10⁹ and r=7⁹ (40353607). The corresponding plot shows that the resulting sequence is dreadful when compared to the binary algorithm, and it only has a period of 5 million.

Plot slice of the Schaum decimal algorithm from RandPlot

Completely on a whim, I added 2 to the r value and I got a much denser plot with a period of 25 million (Plot 1). On a whim again I added another 2 to r and got an even denser plot with a period of 50 million (Plot 2). So the authors of the original decimal algorithm accidentally picked an appalling value for the multiplier r. In seconds I found a better value, then an even better one. I can't find any r that gives a period longer than 50 million (a mere 5% of the full possible sequence), so I presume it's the limit and I'll bet there's a proof of this I'm not aware of.
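There is in fact a proof of the 50-million ceiling (my aside, not from the book): for a purely multiplicative recurrence, the period is the multiplicative order of r mod 10⁹, and every order must divide the Carmichael value λ(10⁹) = 2⁷·5⁸ = 50,000,000. A sketch that measures the period directly and checks it against that bound:

```c
#include <stdint.h>

/* x <- r * x mod 10^9; the products fit comfortably in 64 bits. */
static uint64_t dec_step(uint64_t x, uint64_t r)
{
    return (r * x) % 1000000000ull;
}

/* Measure the period of the multiplicative sequence started at 1.
   For r coprime to 10^9 this is the multiplicative order of r,
   which must divide lambda(10^9) = 2^7 * 5^8 = 50,000,000 --
   hence no multiplier can beat a 50 million period. */
static uint64_t dec_period(uint64_t r)
{
    uint64_t x = 1, n = 0;
    do {
        x = dec_step(x, r);
        n++;
    } while (x != 1 && n <= 50000000ull); /* guard against bad r */
    return n;
}
```

Running this for r = 7⁹ and r = 7⁹+2 reproduces the short periods described above, and every period found divides 50,000,000 exactly.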

Plot 3 is a slight mutation of the book's decimal algorithm to use nearby numbers where N is a prime number and r is a Primitive Root modulo N, which guarantees a maximum period generator. It's not in the same spirit as the decimal algorithm in the book, but it's shown for comparison. The plot was taken when the generator was about 10% through the period.

Plot slice of a slightly better decimal algorithm r choice Plot slice of an even better decimal algorithm r choice Plot slice of a good LCG close to the decimal algorithm
Plot 1: N=10⁹, r=7⁹+2
Plot 2: N=10⁹, r=7⁹+4
Plot 3: N=10⁹+3, r=7⁹+2

Combined PRNGs

Around 1990 while working at Fujitsu I stumbled upon the June 1988 issue of Communications of the ACM which contained an article titled Efficient and Portable Combined Random Number Generators by Pierre L'Ecuyer. Scans of the article pages are at the top of my Maths Pages web page. Pierre investigates the properties of LCGs to find the best ones, then combines them to create a PRNG with a period of about 2.3×10¹⁸. The original code is so short I will include it here:

// Global state: seed s1 in the range [1,2147483562] and s2 in the range [1,2147483398]
static int s1, s2;
float Uniform()
{
  int z,k;
  k = s1 / 53668;
  s1 = 40014 * (s1 - k * 53668) - k * 12211;
  if (s1 < 0) s1 = s1 + 2147483563;
  k = s2 / 52774;
  s2 = 40692 * (s2 - k * 52774) - k * 3791;
  if (s2 < 0) s2 = s2 + 2147483399;
  z = s1 - s2;
  if (z < 1) z = z + 2147483562;
  return z * 4.65661305956E-10;
}

A RandPlot slice of the original L'Ecuyer output stretched by a factor of 1000 horizontally shows no patterns. This is heuristic evidence that it doesn't suffer from the typical flaws of many original LCGs.

Plot slice of Ecuyer's plain algorithm from RandPlot

Below left is a sample of a stretched plot that appeared in L'Ecuyer's article. It shows his own algorithm plotted with a horizontal stretch factor of 1000. This is the plot that inspired me to create my own RandPlot application.

Square plot scanned from L'Ecuyer's original article Despite being a combined PRNG with a very long period, flaws were later found in the original algorithm due to correlations in the bit patterns of sequential outputs. Numerical Recipes in C (PDF) recommended adding a Bays-Durham Shuffle (PDF) as a cure.

Around 2000 I emailed Pierre L'Ecuyer to ask if he had further PRNGs, especially 64‑bit ones. He pointed me to his web page which contained many new 32 and 64‑bit PRNGs, some written specifically for Java, which was quite popular at the time. I have preserved code for five of his algorithms in my Orthogonal.Common.Basic DevOps repository. They are only preserved for historical interest, as modern advances have rendered them generally obsolete.

I've noticed that Pierre L'Ecuyer is still active in this area of random algorithm research and his name pops up in many projects, collaborations, articles and web sites in 2022.

Comparing PRNGs

Around 2002 a friend who works in IT security asked me to evaluate the Visual Basic 6 Rnd function to see how well it behaved. As part of the exercise I used L'Ecuyer's plot technique again to make a heuristic visual comparison of different popular algorithms.

MLCG plot: A good primitive LCG with a=40692 and m=2147483399. The lattice structure of points is quite obvious.
C rand plot: The standard C runtime rand() function. The coarse lattice is as obvious as in the previous LCG.
Combined MLCG plot: L'Ecuyer's combined LCG generator. No structure is visible at this scale.
VB Rnd plot: The Visual Basic 6 Rnd function. I was quite surprised to see that the plot shows no obvious structure at this scale.

The plots were generated using Mathematica. The very old Version 4 Notebook is available for download. It still loads and runs in Version 11.

Mersenne Twister

In 2004 I was shocked to discover that a whole new class of random number generator existed that I had not heard of, even though it had been published for about 5 years. This new algorithm called Mersenne Twister (MT) is a linear-feedback shift register (LFSR) with a very large period and excellent statistical properties in 623 dimensions.

The MT was a significant breakthrough at the time, performing so well that it basically crushed previous PRNGs into insignificance. It has been well researched and has become so respected that it's used in many commercial products and software developer platforms. I met a programmer at a conference in the late 2000s who told me that the MT was used in gambling slot machines (pokies) in Australian casinos.

The downside of the MT is that the code is slightly longer and more cryptic than code based upon LCGs, and it's slightly slower as well. Another problem is that the state of the MT generator is held in an array of 624×32‑bit words, which might make it unsuitable in constrained environments such as microcontrollers.

There are 32‑bit and 64‑bit versions of the Mersenne Twister.

Quick and Dirty

Around 2007 I needed some code to generate random numbers as simply as possible, with the least code possible. The algorithm didn't need to be high quality, as it was just going to be used for shuffling test data and other trivial tasks.

I remembered that NRC had a few paragraphs on quick and dirty generators in chapter 7. Their example was:

uint u = [32‑bit seed value]
u = 1664525u * u + 1013904223u;

Despite the simplicity, this generator is amazingly good. The following slices from RandPlot show the difference in distribution between the standard C language rand function and the quick and dirty. If you need a really simple but effective bit of code to "shuffle things up" without the overkill need for high-quality random numbers, then you can't go past a quick and dirty generator. Don't forget though that very simple LCGs like these produce bit patterns, such as alternating odd-even numbers in this case (if that matters).

Plot slice of the C rand function from RandPlot
C rand function
Plot slice of the quick and dirty from RandPlot
Quick and Dirty
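The odd-even flaw mentioned above is easy to verify. Because both the multiplier and the increment are odd, the parity of the state flips on every single step, a sketch of which is:

```c
#include <stdint.h>

/* NRC-style quick and dirty LCG. Arithmetic wraps mod 2^32
   naturally via uint32_t overflow. */
static uint32_t qd_step(uint32_t u)
{
    return 1664525u * u + 1013904223u;
}

/* Both constants are odd, so odd*u + odd has the opposite
   parity to u: the low bit strictly alternates 0,1,0,1,...
   Harmless for shuffling, fatal if you use the low bit. */
static int parity_alternates(uint32_t seed, int steps)
{
    uint32_t u = seed;
    for (int i = 0; i < steps; i++) {
        uint32_t next = qd_step(u);
        if (((u ^ next) & 1u) != 1u) return 0; /* parity repeated */
        u = next;
    }
    return 1;
}
```

So if you need random booleans from a generator like this, take them from the high bits, not the low ones.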

In May 2015 while updating my Blog post titled Implementing SymmetricAlgorithm I needed a quick and dirty algorithm to perform mock encryption of 64‑bit blocks. As the Blog describes, I picked two large primes of the same relative order of magnitude as the 32‑bit version and ran the output through 5 billion iterations in the RandPlot application and they showed no obvious structure or cycles. So if you need a 64‑bit quick and dirty random generator, try this:

ulong s = [64‑bit seed value]
s = s * 2770643476577ul + 4354685564936844689ul;

You still get the odd-even flaw. As an experiment, I found that taking the two 32‑bit halves of the number and XORing them together like this:

uint u = (uint)(s >> 32) ^ (uint)s;

This extra step eradicates the odd-even flaw, and the output easily passes the runs test. Running 10MB of output through the FI MU online tests shows that it passes most (though not all) of the very strict tests by a low but acceptable margin.
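Putting the 64‑bit step and the xor fold together in C (same constants as above, function names are mine):

```c
#include <stdint.h>

/* 64-bit quick and dirty step using the two primes quoted above;
   arithmetic wraps mod 2^64 naturally. */
static uint64_t qd64_step(uint64_t s)
{
    return s * 2770643476577ull + 4354685564936844689ull;
}

/* Fold the two halves together to scrub the odd-even flaw
   out of the low bits of the 32-bit result. */
static uint32_t qd64_fold(uint64_t s)
{
    return (uint32_t)(s >> 32) ^ (uint32_t)s;
}

/* The raw state still alternates parity (both constants are odd),
   which is exactly what the fold is hiding. */
static int raw_parity_alternates(uint64_t seed, int steps)
{
    uint64_t s = seed;
    for (int i = 0; i < steps; i++) {
        uint64_t next = qd64_step(s);
        if (((s ^ next) & 1ull) != 1ull) return 0;
        s = next;
    }
    return 1;
}
```

The fold costs one shift and one xor per value, which is negligible next to the 64‑bit multiply.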

Guids

Guids (or UUIDs) are 128-bit values that promise to be globally unique. Operating systems such as Windows® and Linux have Guid generators built into them. This one-liner generates a random Int32:

int r = Guid.NewGuid().GetHashCode();

Be aware that generating new Guid hashcodes in a tight loop runs at about 2% of the speed of a typical PRNG. For more information see my blog post PRNG Relative Performance.

As my blog post titled Guid constant bits discusses, there is evidence that Windows Guids are cryptographically strong, which implies that they could be used as a CSPRNG and therefore as a PRNG. Microsoft Whitepaper MS-SECO Windows Security Overview (2014) is vague reading on this subject. As a result, I prefer to use a famous PRNG for bulk random number generation, but I enjoy using the Guid one-liner for casual or less demanding scenarios.

Real Random Numbers

Around 2009 I ran a web search for something like "real random numbers" and I was pleased to discover many web services that delivered real random numbers distilled from natural processes.

The first one I tried was random.org which samples noise generated by radio receivers tuned between stations. Later I found the ANU's Quantum Random Number Generator which samples quantum fluctuations of the vacuum at 5.7 GBit/sec. I was so impressed by the science fiction background of the QRNG that I wrote a pair of C# classes that wrapped the service as a buffered stream and published it in a style similar to the standard .NET Random class (both withdrawn now). I also added a tab to my Randgen application that provided the facility to download random bytes from both of the services.

Sadly, in 2022 I discovered that both services had moved behind a paywall and their free usage tiers required registration and had limits so cripplingly small that they were useless in hobby projects. In response, I removed the web service download feature from Randgen, but several weeks later I noticed that random.org offered free download of integers (not bytes) with a quota large enough for hobby use, so I put the feature back. My 2014 vintage blog post about Real Random Numbers has been updated to document the payment model change. The following screenshot shows the RandGen tab where you can download real random numbers from random.org.

Partial screenshot of the RandGen application

Paywall issues aside, the ability to retrieve 'real' random numbers is a subtle but wonderful new facility for software developers. No longer do you have to worry about the various pros and cons of different PRNG algorithms; you can simply pluck truly random data out of a web service. The downside is of course the possible bandwidth limits, as a typical PRNG can continuously generate millions of values per second, but a web service is subject to unpredictable speed limits.

Speed is probably not a problem in realistic usage scenarios as you'd probably only need to get small amounts of real random values for seeds or keys rather than read from a continuous high-speed stream. A strategy could be to lazily cache true random values for future use.

Recent News

Not much happened for a decade, until June 2022 when I stumbled upon the surprising news in these web pages:

The first article revealed to me that there has been a new class of PRNG since 2017 that I was not aware of called xoroshiro, which is a contraction of XOR/rotate/shift/rotate. There are links at the top left of the page to C code for a dozen variants of the algorithm. The article explains the subtle differences between the versions and their best use cases.

I was stunned by how brief the code is, and how the sequences they generate pass the most stringent tests available. Here is the C code in the first link translated to C#:

// Seed the 's' array with any values that aren't all zero.
readonly ulong[] s = new ulong[4];
ulong r;    // The current rand value
      
static ulong Rotl(ulong x, int k) => (x << k) | (x >> (64 - k));
      
ulong Step()
{
    r = Rotl(s[0] + s[3], 23) + s[0];
    ulong t = s[1] << 17;
    s[2] ^= s[0];
    s[3] ^= s[1];
    s[1] ^= s[2];
    s[0] ^= s[3];
    s[2] ^= t;
    s[3] = Rotl(s[3], 45);
    return r;
}

At the time of writing, the xoroshiro algorithms seem to be the best general-purpose ones available. This PDF article suggests that the algorithm was created for use in hardware, but it's also easily implemented efficiently in software.

The second link above really shocked and surprised me. It explains that the .NET Random class from .NET 6.0 onwards has internally switched from using the Knuth subtractive algorithm to one of the xoroshiro versions. I think this reveals how well respected xoroshiro is, and how quickly it has been certified and adopted by a major vendor.

The first article also mentions an algorithm called SplitMix, which the authors recommend for creating seeds. There isn't much discussion of SplitMix, but it seems to have been translated from Java and the C source code claims it passes the BigCrush tests. It looks like a high-quality quick and dirty way of generating 64‑bit random values. The code is very short:

ulong x; /* The state can be seeded with any value. */
      
ulong next() {
  ulong z = (x += 0x9e3779b97f4a7c15);
  z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9;
  z = (z ^ (z >> 27)) * 0x94d049bb133111eb;
  return z ^ (z >> 31);
}
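Here is how the two pieces fit together in practice: SplitMix expands one 64‑bit seed into the four nonzero state words that xoshiro needs. This C sketch mirrors the C# translation shown earlier (the seeding helper name is mine):

```c
#include <stdint.h>

static uint64_t sm_x; /* SplitMix64 state */

/* SplitMix64: a counter plus an invertible bit-mixing finalizer. */
static uint64_t splitmix64(void)
{
    uint64_t z = (sm_x += 0x9e3779b97f4a7c15ull);
    z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ull;
    z = (z ^ (z >> 27)) * 0x94d049bb133111ebull;
    return z ^ (z >> 31);
}

static uint64_t s[4]; /* xoshiro256++ state */

static uint64_t rotl(uint64_t x, int k) { return (x << k) | (x >> (64 - k)); }

/* One xoshiro256++ step, matching the C# translation above. */
static uint64_t xoshiro_step(void)
{
    uint64_t r = rotl(s[0] + s[3], 23) + s[0];
    uint64_t t = s[1] << 17;
    s[2] ^= s[0]; s[3] ^= s[1]; s[1] ^= s[2]; s[0] ^= s[3];
    s[2] ^= t;
    s[3] = rotl(s[3], 45);
    return r;
}

/* Expand one seed into the four state words. SplitMix64's output
   mixing is a bijection of its state, so four consecutive outputs
   are guaranteed distinct -- and therefore cannot all be zero. */
static void seed_xoshiro(uint64_t seed)
{
    sm_x = seed;
    for (int i = 0; i < 4; i++) s[i] = splitmix64();
}
```

Seeding this way avoids the classic mistake of filling the state with four copies of the same small number, which the xoshiro authors warn against.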

Floating-Point

Historically it has been popular to generate a random floating-point in the range 0.0 to 1.0 inclusive using a simple division technique like this:

uint u = [current 32‑bit value]
double r = u / (double)uint.MaxValue;

This converts a 32‑bit value into a floating-point number that has 53 bits of precision (see IEEE 754). You can get up to 2³² distinct values, but you obviously can't get all of the 2⁵³ equally spaced values that the 53‑bit significand allows in the range 0 to 1. I discuss this 32‑bit division as well as 64‑bit division below.
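As a quick sanity check of the divide technique (a sketch of mine), the endpoints behave exactly: every 32‑bit numerator converts to a double without rounding, so 0 maps to exactly 0.0 and MaxValue maps to exactly 1.0:

```c
#include <stdint.h>

/* Classic divide conversion: maps [0, 2^32-1] onto [0.0, 1.0]
   inclusive. Every 32-bit numerator fits exactly in a double's
   53-bit significand, so each input yields a distinct quotient. */
static double divide32(uint32_t u)
{
    return u / (double)UINT32_MAX;
}
```

Note that this form includes both endpoints; some applications specifically need the half-open range [0,1), in which case the shift-and-multiply method below is the better choice.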

Shift and Multiply

Some websites suggest this is the correct way of converting 64 random bits into a normalised double:

ulong u = [current 64‑bit value]
double d = (u >> 11) * 1.11022302462515654e-16;   // * 2⁻⁵³

An experiment in LINQPad shows that ulong values starting at zero and incrementing by 2048 (which allows for the shift right by 11) produce an incrementing set of significand bit patterns. This demonstrates we're getting the desired behaviour at the start of the range.

Table of good ascending floating-point numbers

A reversed table from the maximum 53‑bit value decrementing shows that there is a corresponding decrementing set of significand bit patterns. This suggests that the technique of shifting right 11 and multiplying by 2⁻⁵³ produces an evenly spaced set of double-precision floating-point numbers.

Table of good descending floating-point numbers
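A nice property of this method (my observation): multiplying by 2⁻⁵³ only adjusts the exponent, so the conversion involves no rounding at all, and the largest possible input maps to 1 − 2⁻⁵³, safely below 1.0:

```c
#include <stdint.h>

/* Shift-and-multiply conversion: keep the top 53 bits and scale
   by 2^-53. Multiplying by an exact power of two only changes
   the exponent, so the result is exactly (u >> 11) / 2^53. */
static double shift_multiply(uint64_t u)
{
    return (u >> 11) * (1.0 / 9007199254740992.0); /* 9007199254740992 = 2^53 */
}
```

Writing the constant as 1.0/2⁵³ rather than a decimal literal avoids any chance of a mistyped digit; the compiler folds it to the same exact value.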

I'm still wondering if there's an efficient way of converting 53 random bits directly into a double precision floating point number via bit manipulation and avoiding a multiply.
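For what it's worth, one commonly cited bit-manipulation trick (an aside of mine, not from any of the sources above) splices random bits directly into the significand of a double in [1,2) and then subtracts 1.0: one subtraction instead of a multiply, at the cost of one bit of resolution (52 bits rather than 53):

```c
#include <stdint.h>
#include <string.h>

/* Build a double in [1.0, 2.0) by pairing the exponent bits of 1.0
   (0x3FF << 52) with 52 random significand bits, then shift the
   range down to [0.0, 1.0) with a single subtraction. */
static double bit_splice(uint64_t u)
{
    uint64_t bits = 0x3FF0000000000000ull | (u >> 12);
    double d;
    memcpy(&d, &bits, sizeof d); /* type-pun safely via memcpy */
    return d - 1.0;
}
```

Whether this beats the multiply in practice depends on the CPU; on modern hardware the multiply is a single cheap instruction, so this is mostly a curiosity.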

32-Bit Divide

As expected, dividing an integer from 0 to uint.MaxValue (2³²−1) by MaxValue produces a distinct set of floating-point numbers, because every 32‑bit numerator fits exactly in a double's 53‑bit significand. The following table shows the start of the range ascending and the end of the range descending.

Table of floating-point numbers generated by 32‑bit divide

There are significant errors in the start-range numbers. The row 1 value, for example, should be 2.32830643653869628906250000×10⁻¹⁰. The end-range values are accurate to all digits shown.

64-Bit Divide

Dividing an integer from 0 to ulong.MaxValue (2⁶⁴−1) by MaxValue cannot possibly produce a complete set of distinct values, so it's interesting to see what happens at the limits in comparison to the 32‑bit case.

Table of floating-point numbers generated by 64‑bit divide

The start of the range from 0 produces distinct floating-point values which are accurate to all digits shown (unlike the 32‑bit case). The end of the range is more problematic. It is not until row 1024 (not shown) that the floating-point 1 drops to 0.9999999999999999. Quick experiments show that each ulong maps to a unique double up until 0x0020000000000000 (2⁵³), after which duplicates begin to appear, with blocks of duplicates growing to a length of 1024 as MaxValue is approached. I think this effect is caused by representable floating-point numbers not being equidistributed on the number line, but more research is needed to explain it and what practical risk it might present.
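The duplicate effect at the top of the range is easy to reproduce (a sketch of mine): above 2⁵³ the numerator itself rounds when converted to a double, so runs of neighbouring inputs collapse to the same quotient:

```c
#include <stdint.h>

/* 64-bit divide conversion. Note that (double)u already rounds
   for u > 2^53 -- that conversion, not the division, is where
   the duplicates creep in. */
static double divide64(uint64_t u)
{
    return u / (double)UINT64_MAX;
}
```

At the bottom of the range the conversion is exact, so small inputs still map to distinct quotients; only the top of the range collapses.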

Converting a 64‑bit integer to a floating-point number in [0,1) should therefore be done by the shift-and-multiply method at the top of this section, not by simple division which does not produce a flat distribution. Links:

Summary

After about 60 years of serious research into random number generation and testing, we have arrived at a point where we have too much choice. There are dozens of well-known PRNG algorithms available, many produced by respected authorities. Some algorithms are based upon LCGs, some on LFSRs, and there are hybrids and some outliers. And don't forget that real random data is available online. The difficulty is choosing the source of random data that is right for your needs. Is speed important, or the level of trust, or the size of the generated numbers? Or perhaps your software runs in some constrained environment with little memory or CPU power.

For general .NET development of business applications, I will use the standard Random class for production code, as I've trusted the original Knuth subtractive algorithm it used for the last 20 years, and I'm technically thrilled to see it has migrated to one of the modern xoroshiro algorithms in .NET 6.0 onwards.

For quick and dirty numbers I like using the Guid.NewGuid().GetHashCode() one-liner. It's slower than all popular PRNGs and it could be regarded as overkill because it's probably crypto secure, but it's wonderfully convenient for knocking-out some ad-hoc random numbers.

I mostly develop using the .NET platform on Windows, which explains my favourite PRNG choices. If you work in macOS or Linux and your development platform is something completely different like Java, JavaScript, C, C++, Rust, Kotlin, Swift, etc, then you'll need to research which sources of random data are available and choose the best one for your needs.

It can be entertaining and educational to research your choices.

If you made it this far, here's a reward: Some random blinking squares with random colours. Web searches hint that the JavaScript random function in most modern web browsers has switched from using a traditional LCG to either the xorshift128+ or xoroshiro128+ algorithms. It's unclear which one might be used by any specific browser version. See Wikipedia's discussion.