## Eratosthenes meets Sagan

Who first computed the circumference of the Earth? When asked this question, some students will still cheerily reply, “Christopher Columbus”! But the correct answer is Eratosthenes of Cyrene, a Greek polymath living in Alexandria, who in the 3rd Century BCE devised an ingenious experiment. No one has better explained Eratosthenes’ work than Carl Sagan, and we leave this task to him.

## A trivial algorithm for finding prime numbers

Although Sagan’s story is compelling, and one that is not told often enough, the reason why we have introduced you to Eratosthenes is because of another of his interests: finding prime numbers. We remind you that an integer greater than 1 is **prime** if its only positive divisors are 1 and itself, and an integer greater than 1 is **composite** if it is not prime.

The ancient Greeks were perennially looking for order in the world around them, and yet the primes stubbornly refused to follow any discernible pattern. What is special about the number 1423 that makes it prime, while 1421 is kept out of the club?

Finding primes is a natural task to transform into a computational problem, which we state for a single input integer variable.

ProblemPrime NumberInput:An integerp.Output:“Yes” ifpis prime, and “no” otherwise.

The Prime Number Problem is an example of a **decision problem**, a computational problem that always outputs “yes” or “no”. These problems may seem simple, but we will see much later in this work that they lie at the dark and mysterious heart of computer science.

For now, we will use the keywords

and `true`

to represent “yes” and “no”, respectively. These keywords can be returned or assigned as values to a variable; a variable that can take either `false`

or `true`

as a value is called a `false`

**boolean variable**.

The following pseudocode uses the

/`true`

keywords to solve the Prime Number Problem.`false`

IsPrime(p) for every integer k between 2 and p − 1 if k is a divisor of p return false return true

STOP:Change the first line of`to the following:`

`IsPrime()`

`. How would this change the output of`

`for every integer k between 1 and p − 1`

`? What if we instead change this line to`

`IsPrime()`

`?`

`for every integer k between 2 and p`

By running `IsPrime()`

for larger and larger input values, we can find as many primes as we like (see table below). However, this method would certainly be called the trivial algorithm for finding primes. Can we find a nontrivial algorithm to find primes faster?

One way to improve the trivial prime-finding algorithm is to make a tweak to

. If an integer `IsPrime()`

*x *is a divisor of *p*, then there must be some other integer *y* such that *x* · *y* = *p*. If *x *< *y*, then consider ranging through the following loop. We will conclude that *p *is composite (and return

) when `false`

*k* = *x* and never need to consider the case that *k* = *y*. Furthermore, since *x *< *y*, *x *must be at most √*p*.

for every integer k between 2 and p − 1 if k is a divisor of p return false

As a result, we don’t need to consider all possible divisors *k* of *p *up to *p *− 1; it suffices to stop at √*p*, and we can therefore revise

as follows.`IsPrime()`

IsPrime(p) for every integer k between 2 and √p if k is a divisor of p return false return true

This insight has helped us speed up our approach, but Eratosthenes’ idea was more clever. Yet before we describe his algorithm, which we will recklessly call “the world’s second nontrivial algorithm”, we will first ask ourselves how we know that the prime numbers go on forever. After all, if it were true that there were some *largest *prime, then we would not care so much about fast prime-finding algorithms: we could just find all the primes once and put them into a big database.

It may seem ridiculous that the prime numbers would eventually stop, but this idea is not such a radical one. If we count the prime numbers that we find up to a given point, we see that they become rarer and rarer as the integers become larger (see figure below). What is to say that we don’t eventually run out of prime numbers? (And why, if your teacher told you that the primes go on forever, did you believe them?)

## The infinitude of the prime numbers

The proof that there are infinitely many prime numbers is another one of Euclid’s gems. This chapter has become long, but the proof is so beautiful that we must show it in case you have not seen it. (If you have seen this proof, then feel free to skip this section.)

Before proving that there are infinitely many primes, we state a mathematical fact that we will use soon.

Fact:Every composite integer larger than 1 has at least one prime factor.

Why is this fact true? Consider any composite integer *n*; because *n *is composite, it has factors other than itself and 1. Consider the smallest of these factors, which we call *p*. There is no way that *p *can be composite, since if it were, it would itself have a divisor other than itself and 1, which would also be a factor of *n*. This cannot be true since *p *is the smallest factor of *n*.

The above fact will prove useful when we replicate Euclid’s argument in the following theorem.

Theorem: There are infinitely many prime numbers.

To prove this theorem, we will employ an approach called **proof by contradiction**, in which we assume the opposite of what we want to demonstrate and show that it must be false. In this case, the opposite of what we want to prove is that there are *finitely *many prime numbers. How can we show that this statement is false?

Since we are assuming that there are finitely many primes, we can use the variable *n *to refer to the total number of these primes. Furthermore, we can label all of these primes using the notation *p*_{1} , *p*_{2} , . . . , *p _{n} *. Consider the number

*q*formed as the product of all the primes:

*q *= *p*_{1} · *p*_{2} · ⋯ · *p _{n}* .

STOP:Isqcomposite? Why or why not?

Certainly, *q *is composite because all of the primes *p _{i} *must be divisors of

*q*. However, consider the number

*p*that is one more than

*q*:

*p *= *q* + 1 = (*p*_{1} · *p*_{2} · ⋯ · *p _{n}*) + 1 .

STOP:Is p composite? Why or why not?

Because *p *is clearly larger than all of the finitely many primes *p _{i}*, it must be composite. Yet consider what happens when we divide each prime into

*p*. If we divide

*p*

_{1}into

*p*, we obtain

*p*

_{2}·

*p*

_{3}· ⋯ ·

*p*with a remainder of 1. If we divide

_{n}*p*

_{2}into

*p*, we obtain

*p*

_{1}·

*p*

_{3}·

*p*

_{4}· ⋯ ·

*p*with a remainder of 1. And so on; any prime

_{n}*p*that we divide into

_{i }*p*leaves a remainder of 1.

In short, none of the finitely many primes *p _{i} *is a divisor of

*p*. But according to the fact presented before this theorem,

*p*must have some prime factor. It is impossible for these two statements to be true, and so we obtain a

**contradiction**. Because our only assumption was that there are finitely many primes, we can conclude that the primes are infinite.

## Factorials, arrays, and Eratosthenes’ insight

Now that we have proven that the primes really do march on forever, we return to Eratosthenes and his desire to find primes quickly. To help illustrate his insight, we will use a simpler example.

Say that you had gone to the effort of computing 100! by hand, and then someone asked you what 101! is. The last thing you would do would be to start multiply 1 by 2 by 3, and so on. Rather, you would simply multiply 100! by 101. What we are getting at is that it makes sense to store intermediate factorials that we obtain in a table rather than discarding them along the way.

In general, a table or ordered list of *n *variables is called an **array** of length *n*. If the name of our array is *a*, then the first variable in the array is denoted *a*[0], the second variable is denoted *a*[1], and so on. The numbers 0, 1, and so on are called the **indices** of the array. You have probably noticed that we use **0-based indexing** of the array, in which we start counting indices at 0 instead of 1, a paradigm that may be alien but that is used by most modern programming languages.

STOP:What is the index of the last variable in an array of lengthn?

For the factorial example, we want *a*[0] to equal 0! = 1, *a*[1] to equal 1! = 1, *a*[2] to equal 2! = 2, *a*[3] to equal 3! = 6, and so on, up until *a*[*n*] = *n*!. Note that the length of this array is *n *+ 1. We can state the generation of this array as a computational problem.

ProblemFactorial ArrayInput:An integern.Output:An array of lengthn+ 1 containing the values 0!, 1!, 2!, . . . ,n!

The following pseudocode solves the Factorial Array Problem, and it requires only a slight modification to the

function.`AnotherFactorial()`

FactorialArray(n) a ← array of length n a[0] ← 1 for every integer k between 1 and n a[k] ← a[k−1]·k return a

Exercise:TheFibonacci numbersare the classical sequence of integers given by (1, 1, 2, 3, 5, 8, 13, 21, …), in which the first two members of the sequence are equal to 1, and every subsequent number is equal to the sum of the two preceding numbers. Write a function`that takes as input an integer`

`FibonacciArray()`

`n`

and returns an array containing the first`n`

Fibonacci numbers.

We can now state our problem of finding primes as an array problem.

ProblemPrime Number ArrayInput:An integern.Output:An arrayprimeBooleansof lengthn+ 1 such that for every positive integerp≤n,primeBooleans[p] is`if`

`true`

pis prime andprimeBooleans[p] is`otherwise.`

`false`

The following pseudocode implements the trivial algorithm for prime finding to solve the Prime Number Array Problem. Note that this function calls

as a subroutine.`IsPrime()`

TrivialPrimeFinder(n) primeBooleans ← array of n + 1 false boolean variables for every integer p from 2 to n if IsPrime(p) is true primeBooleans[p] ← true return primeBooleans

Much like our reasoning with

, the duplication of effort in `FactorialArray`

is the motivation behindEratosthenes’s idea for finding primes. The inefficiency in the trivial prime-finding approach is that once we find that a number is prime, we can automatically conclude that all multiples of this number are composite. For example, once we have determined that 1423 is prime, we automatically know that 2 · 1423 = 2846, 3 · 1423 = 4269, 4 · 1423 = 5692, and so on are all composite. In other words, every multiple of a prime number is composite. The generalization of this idea to an algorithm is called the `TrivialPrimeFinder`

**sieve of Eratosthenes**, which is illustrated in the figure below and sometimes still taught to students as a way of finding primes by hand.

In this example, our largest number is *n *= 120. We start with *p *= 2 and cross off all multiples of 2, immediately concluding that almost half of the numbers considered are composite. We then look at the next number, *p *= 3, which is prime, and so we cross off all multiples of 3. Since *p *= 4 is composite, we skip it, as multiples of 4 will have *already *been labeled composite since they are also multiples of 2. In this manner, we cross off multiples of 5 and 7. Since 8, 9, and 10 are composite, we do not need to cross off their multiples. The next prime number would be 11, but note that 11 is larger than √120 = 10.954… The multiples of this number stop at 11 · 10 = 110, and therefore will already have been labeled composite in a previous step. We can therefore halt the process and look at all remaining numbers that have not been crossed off; these numbers (11, 13, 17, 19, etc.) must all be prime.

The following pseudocode implementing the Sieve of Eratosthenes starts by assuming that every number (other than 1) is prime. At each step, it looks for the smallest prime that we have not already identified, and then crosses off all multiples of this prime (for which it employs a subroutine called `CrossOffMultiples`

).

SieveOfEratosthenes(n) primeBooleans ← array of n + 1 true boolean variables primeBooleans[0] ← false primeBooleans[1] ← false for every integer p between 2 and √n if primeBooleans[p] = true primeBooleans ← CrossOffMultiples(primeBooleans, p) return primeBooleans

As for the `CrossOffMultiples()`

subroutine, it takes an existing array `primeBooleans`

of Boolean variables as input along with an integer `p`

. For every integer `k`

that is a multiple of `p`

between 2`p`

and the final index of the array, this function sets `primeBooleans[k]`

equal to `false`

, which corresponds to “crossing off” these integers in the Sieve of Eratosthenes table. It then returns the updated `primeBooleans`

array.

Note:A tricky part about the`CrossOffMultiples()`

function is its first line. Recall that in 0-based indexing, the indices of an array having lengthkrange from 0 tok– 1. As a result, in the pseudocode below, we setnequal to 1 less than the length of`primeBooleans`

, which we access using a`length`

function that is typically built into programming languages for determining the length of an input array.

CrossOffMultiples(primeBooleans, p) n ← length(primeBooleans) - 1 for every multiple k of p (from 2p to n) primeBooleans[k] ← false return primeBooleans

Exercise:It is possible to make`even faster. If you are mathematically inclined, see if you can see how. (Hint: do we ever cross off integers that have already been crossed off?)`

`SieveOfEratosthenes()`

## Forming a list of prime numbers

We now have two prime-finding algorithms. Yet if someone were to ask us what the prime numbers up to some positive integer *n* are, we would currently only be able to give them an array of boolean variables, where a variable is `true`

if the number is prime and `false`

otherwise. It would be better to provide a list of the integers that are prime.

The list of primes can be represented by an array *primes*. Although there are ways of estimating the number of primes up to and including *n*, we will not know how long this array should be in advance. Instead, we will start *primes* as having no elements; we will then *append* every prime number *p* found to the end of the array. After this process, *primes* will store the prime numbers between 2 and *n* in increasing order. The following pseudocode for a function

implements this idea, using `ListPrimes()`

as a subroutine as well as a function called `SieveOfEratosthenes()`

that is built into many languages. The notation`append`

primes ← append(primes, p)

indicates that we take an array `primes`

and an integer `p`

as input, append `p`

to `primes`

, and then return this array, which we use to update `primes`

.

ListPrimes(n) primes ← array of integers of length 0 primeBooleans ← SieveOfEratosthenes(n) for every integer p from 0 to n+1 if primeBooleans[p] = true primes ← append(primes, p) return primes

## It’s time to code

You may still be skeptical that two algorithms for something as straightforward as finding prime numbers could really be different in terms of performance. So let’s learn some coding basics, and use them to implement these two algorithms and time them so that we can see for ourselves how fast the ancient Greeks’ algorithms for calculating a GCD and finding primes really are!