To get started, create a folder named hw2
under your go/src
directory. Then create a text file named main.go
inside this folder. This is the file that you should edit for the purposes of this assignment.
Timing functions on pseudorandom inputs
In Chapter 0, we saw that Euclid’s approach to computing a GCD was much faster than the trivial algorithm. (We are reproducing the pseudocode below for both algorithms for convenience.)
TrivialGCD(a, b) d ← 1 m ← Min2(a,b) for every integer p between 1 and m if p is a divisor of both a and b d ← p return d EuclidGCD(a, b) while a ≠ b if a > b a ← a − b else b ← b − a return a
We would like to quantify just how much faster Euclid’s algorithm is, but the problem is that the number of steps that it requires depends on the input values given. For example, if we compute the GCD of a = 1,000,000 and b = 1,000,000 with Euclid’s algorithm, then the while loop will never be entered, and the algorithm will immediately return 1,000,000.
Our solution is to generate two pseudorandom integers a and b in the range between some integers lowerBound and upperBound, and then to time the two algorithms on these values. We implement this idea in the window below.
Exercise: TimeTrivialGCD()
andEuclidGCD()
for randomly chosen pairs of numbers from each of the following ranges: 1,000 to 2,000; 10,000 to 20,000; 100,000 to 200,000; and 1,000,000 to 2,000,000. What is the ratio of the runtime ofTrivialGCD()
toEuclidGCD()
each time, and how does this ratio increase as the values of lowerBound and upperBound increase?
Warming up: weighted dies and estimating pi
Say that we would like to simulate a weighted die that has probability 0.5 in a given roll of resulting in a 3 and probability 0.1 of resulting in any of 1, 2, 4, 5, or 6.
Code Challenge: Write and implement a functionWeightedDie()
that models this die by taking no inputs and returning a pseudorandom integer between 1 and 6, inclusively, according to these probabilities.
You may be aware that the area of a circle is equal to the mathematical constant 𝝅 times the circle’s radius squared, but how could 𝝅 be determined? One way to do so would be to create a physical circle and measure its area using small rectangular blocks, but we will describe a way that 𝝅 can be estimated using Monte Carlo simulation.
Imagine throwing a dart at a circular dart board that is inscribed within a square (see figure below). If you were as poor of a dart player as your author, then the likelihood of the dart landing anywhere within the square (including off the board) is uniform. Under this assumption, the likelihood of the dart hitting the dartboard is equal to the ratio of the area of the circle to the area of the square.
Let us make the radius of the dart board be equal to 1, so that the square has side length equal to 2. As a result, the area of the circle is equal to 𝝅, and the area of the square is equal to 4, making the probability of our dart hitting the dartboard equal to 𝝅/4. We can write this as an equation:
Probability(dart hits dartboard) = 𝝅/4.
Multiplying both sides of the above equation by 4 yields an innocuous enough result:
4 · Probability(dart hits dartboard) = 𝝅.
However, if we are able to estimate the probability that the dart hits the dartboard, then we can simply multiply that estimate by 4, and it will give us an estimate for 𝝅. In this way, even if we didn’t know 𝝅, we could estimate it.
To estimate Probability(dart hits dartboard), we will apply Monte Carlo simulation. We will assume that the dartboard is centered at (0, 0), so that we can simulate throwing the dart at the square by independently generating two decimal numbers a and b that are both between -1 and 1 to obtain the point (a, b) in the square struck by the dart. The question is: how do we know whether the dart hits the circle?
Fortunately, this question can be answered easily, since the equation for the boundary of the circle of radius 1 centered at the origin is x2 + y2 = 1, and a little long-forgotten high school mathematics tells us that (a, b) lies inside this boundary, and therefore on the dart board, precisely when a2 + b2 ≤ 1.
We can therefore iterate this process over a number of simulations, repeatedly generating a and b while counting the number of times that (a, b) falls within the circle. At the end, we can divide this count by the number of simulations to obtain an estimate for Probability(dart hits dartboard) , and as we observed above, multiplying this estimate by 4 yields our desired estimate for 𝝅.
Code Challenge: Write and implement a functionEstimatePi()
that takes an integernumPoints
as input and returns a decimal estimating 𝝅 according to the above process of selectingnumPoints
points uniformly from within a square of side length 2 centered at the origin.
The following lesson allows you to check your solutions to these challenges. It can also be found using a direct link.
Estimating the probability that two integers are relatively prime
Two positive integers are called relatively prime (also called coprime) if they do not share any divisors other than 1. To determine whether two integers a and b are relatively prime, we could consult every possible divisor d between 2 and the minimum of a and b. If any such d is a shared divisor of a and b, we conclude that a and b are not relatively prime; on the other hand, if no such d is a shared divisor of a and b, then we conclude that a and b are relatively prime.
However, this sort of brute force approach of trying every possible divisor suffers from the same flaw as TrivialGCD()
. Instead, we note that a and b are relatively prime precisely when their GCD is equal to 1. Accordingly, we can use our existing fast EuclidGCD()
algorithm as a subroutine in order to quickly determine whether a and b are relatively prime.
Code Challenge: Write and implement a functionRelativelyPrime()
that takes as input two integersa
andb
and returnstrue
ifa
andb
are relatively prime andfalse
otherwise.
A natural question is to determine the probability that two randomly selected integers are relatively prime. If we did not see a way of calculating this probability mathematically, then we could attempt to estimate the probability by generating every pair of integers in some range and counting the number of relatively prime pairs. For example, we could generate all pairs of numbers between 1 trillion and 2 trillion. However, there are about 5 · 1023 pairs of integers in this range, and so such a brute force approach would prove prohibitively slow.
Instead, Monte Carlo simulation will provide an estimate for this probability by generating many pairs of integers in the range, testing whether each pair of integers are relatively prime, and returning the ratio of the number of relatively prime pairs to the total number of pairs considered.
Code Challenge: Write and implement a functionRelativelyPrimeProbability()
that takes as input three integerslowerBound
,upperBound
, andnumPairs
, and returns an estimate of the probability that two randomly chosen numbers betweenx
andy
are relatively prime by selectingnumPairs
pairs of numbers betweenlowerBound
andupperBound
, inclusively.
We would like to estimate the probability of any two integers being relatively prime. To do so, we would call RelativelyPrimeProbability()
on a large range of integers (e.g., setting x
equal to 1 and y
equal to 1 trillion) and then choose many pairs of integers when testing whether given pairs are prime (e.g., setting
equal to 100 million).numPairs
Exercise: Callon your computer on a very wide range of values with
RelativelyPrimeProbability()
lowerBound
equal to 1,upperBound
equal to 1 trillion, andequal to 1 billion (even with an efficient
numPairs
function, your code may take a few minutes to run). What is your estimated probability that two integers are relatively prime?
RelativelyPrime()
Although we have estimated this probability using Monte Carlo simulation, it can be determined exactly mathematically. The correct answer is strange and beautiful, and understanding why it is the way it is relies on some very advanced mathematics. We suggest that you look it up after trying the preceding exercise.
The following lesson allows you to check your solutions to these challenges. It can also be found using a direct link.
The birthday problem
When we generate pseudorandom numbers, it is natural to think that a PRNG that produces repeated numbers is bad. After all, the brain’s method for generating “random” numbers tends to avoid repeats. But as we will see in this section, repetition is a natural component of randomness.
We can represent a sequence of random integers generated by a PRNG using an array. The following exercise will help us detect whether a sequence of integers has a repeated element.
Code Challenge: Write and implement a functionthat takes as input an array of integers
HasRepeat()
a
and returnstrue
if the array contains a repeated value andfalse
otherwise.
Imagine a room full of people. What are the chances that there are two people in the room who share a birthday on the same day of the year? We will estimate this likelihood, which depends on the number of people in the room, using Monte Carlo simulation.
Code Challenge: Write and implement a functionthat takes as input two integers,
SharedBirthdayProbability()
numPeople
andnumTrials
. It runssimulations and returns an estimate of the probability that in a room of
numTrials
people with randomly chosen birthdays, at least one pair of people have the same birthday. For simplicity, you should assume that a year always has 365 days. (Hint: the month is irrelevant when running these simulations.)
numPeople
Exercise: What is the smallest value ofnumPeople
for which there is a greater than 50% chance of two people sharing the same birthday? Are you surprised?
The following lesson allows you to check your solutions to these challenges. It can also be found using a direct link.
In a state of sin with the Middle-Square PRNG
We would like to know how we can we test the quality of a PRNG; we will apply this approach to the Middle-Square PRNG discussed in the main text, since we know that this approach is flawed.
We first would like to implement the Middle-Square PRNG, which means that we will need to write a function
that takes integers SquareMiddle()
x
and numDigits
and returns the result of squaring x
and taking its middle numDigits
digits. This is a simple task by hand, but how can we teach it to a computer?
First, we focus on a subroutine that we will need.
Code Challenge: Write and implement a functionthat takes as input an integer
CountNumDigits()
x
and returns the number of digits inx
. Your function should work for both positive and negative numbers.
Hint: first write your function for positive values ofx
, and then make a slight adjustment to it to accommodate negative values ofx
.
We can divide the process of taking the middle numDigits
digits of an integer into two steps: “crossing off” the first numDigits
/2 digits, then crossing off the last numDigits
/2 digits. For example, we would convert the number x
= 12345678 into 345678 by removing its first two digits, and then 3456 by removing its final two digits. Note that the first process corresponds to taking the remainder when we divide x
by 106 = 108-2, and the second process corresponds to taking the integer division of the resulting number by 102. In general, we can now see that we are taking the remainder of x
by 102·numDigits
– numDigits
/2, and then dividing the resulting number by 10numDigits
/2. We only need to generalize this idea for an arbitrary value of numDigits
to write
.SquareMiddle()
Code Challenge: Write and implement. When squaring the input number, you should add zeroes as needed to the resulting number so that it has 2 ·
SquareMiddle()
numDigits
total digits. Your function should return -1 ifnumDigits
is odd, ifx
is negative, ifnumDigits
is not positive, or if the number of digits inx
is greater thannumDigits
. You may also find it helpful to write a subroutinethat takes as input an integer
Pow10()
n
and returns 10n
.
We can now use
as a subroutine in the following function, which generates the sequence of random numbers for the middle-square sequence corresponding to a given seed and number of digits.SquareMiddle()
GenerateMiddleSquareSequence(seed, numDigits) seq ← array of length 1 seq[0] ← seed while HasRepeat(seq) is false seed ← SquareMiddle(seed, numDigits) seq ← append(seq, seed) return seq
Code Challenge: Implement.
GenerateMiddleSquareSequence()
Now that we have implemented the Middle-Square approach, we ask whether it is a good number generator. In other words, are the short sequences that we have seen this PRNG produce a general flaw? We can answer this question by determining how many seeds cause short “cycles” of numbers.
One way to determine the quality of a seed is to count the number of the sequence produced by the PRNG for this seed. However, say that our PRNG generates the sequence of numbers 140, 278, 14, 28, 19, 28, 19, 28, 19, … We could report this sequence as having length 6, the number of elements in this sequence up to the first repeat: 140, 278, 14, 28, 19, 28. However, this would be misleading, since once it generates 28, the sequence repeats the integers 28 and 19 indefinitely.
Given a seed, its period is the length of the cycle of numbers generated by a PRNG with respect to a given seed, once that cycle is produced. For this reason, in the above example, the seed 140 would have a period equal to 2, since it eventually produces a cycle containing just two elements.
We have already written a function HasRepeat()
to determine whether a sequence of integers has a repeated value. If so, then we would like to compute the period of the cycle generated by these numbers.
Code Challenge: Write a functionthat takes as input an array
ComputePeriodLength()
a
of integers. The function should return 0 if there are no values ina
that repeat; ifa
has a repeated value, then the function should return the period of the sequence.
Exercise: Callon every four-digit seed between 1 and 9999. How many seeds produce a sequence of period 10 or smaller? Is the Middle-Square approach a good PRNG?
GenerateMiddleSquareSequence()
The following lesson allows you to check your solutions to these challenges. It can also be found using a direct link.
Linear congruential generators
Another PRNG, called a linear congruential generator, produces the next integer y from the current integer x according to the equation
y = Remainder(a·x+c,m).
That is, we take the remainder when dividing a · x + c by m. In the linear congruential generator, a, c, and m are all constants. Different values of these constants will produce better or worse results, but a common choice is to set m equal to either a (large) prime number or a (large) power of 2.
Below, we provide pseudocode for a function GenerateLinearCongruenceSequence()
that takes an initial seed as input as well as the constants a, c, and m. This function returns the sequence of integers produced by the linear congruential generator. It stops as soon as we encounter a number that we have encountered before, which means that we will need to use our function
as a subroutine.HasRepeat()
GenerateLinearCongruenceSequence(seed, a, c, m) seq ← array of length 1 seq[0] ← seed while HasRepeat(seq) is false seed ← Remainder(a · seed + c, m) seq ← append(seq, seed) return seq
Code Challenge: Implement.
GenerateLinearCongruenceSequence()
Exercise: Let a = 5, c = 1, and m = 213 = 8192. What is the period when the seed is equal to 1? What can we conclude about the period of every seed between 1 and m − 1?
The following lesson allows you to check your solutions to these challenges. It can also be found using a direct link.
Up next
In this chapter, we have started encountering longer programs, with functions that call each other. As we transition to write longer programs to solve more complex tasks, we must plan carefully and organize our work.
In the next chapter, we will introduce a paradigm for planning larger coding projects called top-down programming. We will then apply this paradigm to see how to produce a simple simulation of a system that can replicate itself. Although we will cover this task in a single chapter, it took researchers 30 years to build such a system.