Pitfalls of (Pseudo)Random Number Generation

Von Neumann’s Middle-Square pseudorandom number generator

Take a moment to think up a few random digits. (Here’s mine: 6, 1, 2, 8, 0, 3, 3, 0, 8, 0, 0, 4, 5, 7, 9.) You might think that the numbers you are generating are random, but there is some process going on in your brain to produce these numbers that is anything but random. In fact, experiments have shown that individuals tend to underrepresent the presence of repeated numbers. After all, who would ever pick the lottery numbers of the previous day’s winner?

Remembering the work of the ancient Greeks in Chapter 0, we might hope for a genius mathematical algorithm to save us. This was also the wish of computational pioneer John von Neumann, who in the late 1940s was working on an early computer called ENIAC (see figure below). At that time, random number generation was so primitive that the state of the art was A Million Random Digits with 100,000 Normal Deviates, a book that was in development by the RAND Corporation. This book eventually contained 400 pages of a million random digits between 0 and 9 generated by spins of an electronic roulette wheel; 20 of the 32 resulting numbers were converted to digits, whereas the other 12 were ignored. If you wanted a random number, you opened the book to some arbitrary page of this book and started transcribing digits. Needless to say, a better computational approach for generating random numbers was sorely needed.

The Electronic Numerical Integrator and Computer (ENIAC), constructed in 1945. ENIAC cost nearly $500,000 to build, and although it was originally constructed for nuclear ballistics, it soon became applied to other fields, from weather prediction to pseudorandom number generation.

One method that von Neumann considered for random number generation is called the Middle-Square approach. It starts with an arbitrary n-digit integer called the seed and generates a new “random” n-digit integer by first squaring the number to produce a 2n-digit number, and then taking the middle n digits of the result. For example, if n = 4 and our seed is 1600, then 1600² = 2,560,000; this number viewed as an 8-digit number is represented as 02,560,000, whose middle four digits are 5600. This number becomes our first randomly generated integer; to generate a new random integer, we square 5600 and take its middle four digits.

The numbers generated by the Middle-Square approach may appear random, but they are generated by a deterministic method that always produces the same sequence of numbers from a given initial seed. We will use therefore the term pseudorandom number generator, or PRNG for short, to refer to a deterministic method like the Middle-Square approach that generates a sequence of numbers in an effort to mimic randomness.

Exercise: With a seed of 1600, we saw that the next four-digit random number generated by the Middle-Square approach was 5600. What are the next three numbers that will be generated?

An ideal PRNG would, over the long term, produce each n-digit number with equal frequency, independently of the seed. But as the preceding exercise indicates, the problem with the Middle-Square approach is that it often generates short cycles, in which the same numbers come up repeatedly. Consider that 8100² = 65,610,000, 6100² = 37,210,000, 2100² = 04,410,000, and 4100² = 16,810,000, so we are only able to generate four numbers from the seed 8100. Worse, if we pick 3792 as our seed, then 3792² = 14,379,264, and so we generate the same number over and over in the name of randomness.

The poor performance of the Middle-Square approach is unfortunately the norm when we try to use functions based on arithmetic operations to generate random numbers, leading von Neumann to pen the following lament:

Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin. For, as has been pointed out several times, there is no such thing as a random number.

Lagged Fibonacci generators

In a truly random sequence of numbers, we would expect to see repeated numbers — it would be anything but random if a PRNG were to produce a sequence containing every four-digit number between 0 and 9999 exactly once without any repeats. However, the Middle-Square approach suffers from the critical weakness that whenever a number is encountered that has already been generated, the entire sequence of numbers following that number will be generated again.

In other words, we currently think about a PRNG as generating a new number y from only one preceding number x, and this means that whenever we encounter the same x, we will produce the same y. What if the PRNG instead produced a new integer from more than one previous integer? “Remembering’’ more than one previous number would allow us to generate the same number x multiple times without necessarily causing a cycle.

A simple PRNG with a longer memory is based on the Fibonacci sequence. The first two terms of this sequence are defined as equal to 1, F(0) = F(1) = 1. Subsequent terms are defined as the sum of the previous two terms; that is, for n ≥ 2, F(n) = F(n – 1) + F(n – 2). Of course, the terms of this sequence continue on toward infinity (1, 1, 2, 3, 5, 8, 13, 21, 34, …), but we could form a PRNG called a Fibonacci generator by letting F(n) be the remainder of F(n – 1) + F(n – 2) after dividing by some fixed value m. For example, if m = 1,000,000, then for F(n – 1) = 832,040 and F(n – 2) = 346,269, we would set F(n) equal to the remainder when dividing 832,040 + 346,269 by 1,000,000, which is equal to 178,309.

STOP: How many initial seeds will we need for the Fibonacci generator?

Note that the Fibonacci generator will require not one but two seeds. If we change F(0) and F(1), then this will likely cause F(2) to change, and as a result we will obtain a different sequence of integers generated by the PRNG.

STOP: Do you see any issues with the Fibonacci generator?

Even though we now have a PRNG with a longer memory, the Fibonacci generator has several flaws, chiefly that once we have seen two integers in a sequence, we will immediately know the next number in the sequence. Hardly a random number generator!

We can make a slight adjustment to the Fibonacci generator if we ask it to “remember” farther back in the sequence of random integers.

That is, for some fixed positive integers j and k, we set F(n) equal to the remainder of the sum F(n–j) + F(n–k) divided by m. The resulting PRNG is called a lagged Fibonacci generator that is surprisingly practical given its simplicity. We generally avoid mention of specific languages, but it is worth pointing out that Go’s built-in PRNG uses a lagged Fibonacci generator with j =273, k = 607, and m = 2³¹. That is, F(n) is equal to the remainder when dividing F(n – 273) + F(n – 607) by 2³¹.

Exercise: How many seeds do you think we will need for a lagged Fibonacci generator?

A short detour on PRNG security

This discussion makes us wonder what makes a PRNG acceptable for use in applications like Monte Carlo simulation. For one, we would hope that if we use different seeds, then we will obtain sequences of numbers that are different from each other. We would also hope to avoid repeating sequences of numbers like what we saw with the Middle-Square approach. The German Federal Office for Information Security identified four “levels” that PRNGs should have to be secure enough to use in applications.

Level 1: Different sequences generated by the PRNG should usually be different from each other.

Level 2: The PRNG should generate numbers with the same properties of a sequence of truly random numbers (e.g., how often a number repeats, or how often a given triplet of numbers repeats).

Level 3: A hacker shouldn’t be able to use the sequence of numbers to guess any future numbers.

Level 4: A hacker shouldn’t be able to use the sequence of numbers to guess any previous numbers.

PRNGs satisfying all four levels of security do exist, although they are slower than simpler PRNGs. Furthermore, it took decades of research to obtain advanced PRNGs, which are mathematically complicated — yet one more reason why mathematics is vital for an understanding of computer science.

Finally, before returning to our work with Monte Carlo simulation, we note that part of the reason we emphasize an understanding of the fundamentals of PRNGs is that they are still relevant today. A 2017 Wired article spins a fascinating tale of a team of hackers based in St. Petersburg who allegedly made millions on a particular slot machine by applying an understanding of the machine’s PRNG to predict when the machine would make large payouts.

STOP: If we are able to use past results to time a slot machine payout, then what level does the machine’s PRNG not achieve?

Next lesson