Learning objectives
We are now ready to return to the two algorithms for finding the GCD of two integers that we presented as pseudocode in the core text. In TrivialGCD()
, we consider every possible divisor p
between 1 and the minimum of the two input integers a
and b
, and take the largest common divisor.
TrivialGCD(a, b) d ← 1 m ← Min2(a,b) for every integer p between 1 and m if p is a divisor of both a and b d ← p return d
We then introduced EuclidGCD()
, which iteratively subtracts the smaller of the two integers from the larger until they are both equal.
EuclidGCD(a, b) while a ≠ b if a > b a ← a − b else b ← b − a return a
We claimed that EuclidGCD()
is a more efficient algorithm than TrivialGCD()
. In this code along, we will implement these two algorithms in Python and then time them to test our hypothesis.
Code along summary
Set up
Create a new folder called gcd
in your python/src
directory, and then create a new file within the python/src/gcd
folder called main.py
, which we will edit in this lesson.
In main.py
, add the following starter code.
def main(): print("Implementing two GCD algorithms.") if __name__ == "__main__": main()
Implementing the trivial GCD algorithm
We start with TrivialGCD()
, reproduced below.
TrivialGCD(a: int, b: int) -> int d ← 1 m ← Min2(a,b) for every integer p between 1 and m if p is a divisor of both a and b d ← p return d
Our introductory work has equipped us with most of what we need to know to implement this function, since we have discussed for loops and if statements, and we have already implemented a
function that min_2()
uses as a subroutine. We reproduce the Python implementation of trivial_gcd()
min_2()
below; make sure that it is present in main.py
so that it will be visible to our implementation of trivial_gcd()
.
Note: Of course, instead of usingmin_2()
, we could use themin()
function that’s built into Python.
def min_2(a: int, b: int) -> int: """ Takes two integers as input and returns their minimum. Parameters: - a (int): first integer - b (int): second integer Returns: int: minimum of a and b """ if a < b: return a else: return b
We are nearly ready to implement
, but we must first determine how to implement the line that tests whether trivial_gcd()
p
is a divisor of both input integers,
if p is a divisor of both a and b
First, note that we can rewrite trivial_gcd()
using two nested if statements instead of a single statement, as follows. This change allows us to focus solely on determining whether one integer is a divisor of another.
TrivialGCD(a: int,b: int) -> int d ← 1 m ← Min2(a,b) for every integer p between 1 and m if p is a divisor of a if p is a divisor of b //if we make it here, we know that p is a divisor of both a and b d ← p return d
From the previous code along, we know how to test when p
is a divisor of a
: it is when a % p
, the remainder of the division of a
by p
, is equal to zero.
def trivial_gcd(a: int, b: int) -> int: """ Returns the GCD of two integers by applying a trivial algorithm attempting each possible divisor of a and b up to their minimum. Parameters: - a (int): first integer - b (int): second integer Returns: int: GCD of a and b """ d = 1 m = min_2(a, b) for p in range(1, m + 1): if a % p == 0: if b % p == 0: # if we make it here, we know that p is a divisor of both a and b d = p return d
STOP: Let’s call our function on a couple of integers. Type the following code intodef main()
. Then, open a command line terminal and navigate into the folder containing your code by executing the command. Then, run the code by executing the command
cd python/src/gcd
(macOS/Linux) or
python3 main.py
python main.py
(Windows). You should see the correct value of 21 printed.
def main(): x = 63 y = 42 print(trivial_gcd(x, y))
Improving TrivialGCD(), and the “and” and “or” keywords
In practice, we do not need to use two nested if statements to test whether p
is a divisor of both a
and b
. Instead, we can test both statements as part of a single if statement using an and
keyword, which we encountered briefly when introducing conditionals. If x
and y
are two Boolean expressions (i.e., evaluating to True
or False
), then the expression
is x and y
precisely when both True
x
and y
are True
; it is
otherwise. We will use this idea to update our implementation of False
with a single if statement as follows.trivial_gcd()
def trivial_gcd(a: int, b: int) -> int: """ Returns the GCD of two integers by applying a trivial algorithm attempting each possible divisor of a and b up to their minimum. Parameters: - a (int): first integer - b (int): second integer Returns: int: GCD of a and b """ d = 1 m = min_2(a, b) for p in range(1, m + 1): if a % p == 0 and b % p == 0: d = p return d
Note: For a given value ofa
andp
, ifa % p == 0
isFalse
, then Python will not bother to check whetherb % p
is equal to 0, since there is no way that both statements can be true. This is called short circuiting.
Python also has an or
keyword. The expression
is x or y
precisely when at least one of True
x
and y
are True
; it is
when both False
x
and y
are False
. A truth table illustrating all four different outcomes for the and
and or
keywords is shown in the figure below.
x | y | x and y | x or y |
True | True | True | True |
True | False | False | True |
False | True | False | True |
False | False | False | False |
Note: When checking the truth ofx or y
, Python uses short circuiting as well. Ifx
isTrue
, thenx or y
is automaticallyTrue
, regardless of the value ofy
, and Python immediately considersx or y
to beTrue
.
Note: A common pitfall for introductory programmers is ensuring that both statements connected by a logical connective likemust be syntactically correct statements in the language. For example, the expression
and
if a % p and b % p == 0:
may produce an unexpected result in Python because although it may make sense to a human, Python views the expressionas an integer, which is always
a % p
True
unless it is equal to 0.
STOP: Say that we replaced the lineif a % p == 0 and b % p == 0:
intrivial_gcd()
with the incorrect lineif a % p == 0 or b % p == 0:
. What would the function return instead of the GCD?
STOP: Technically,trivial_gcd()
contains a bug because according to the strictest mathematical definition, for any integer n, the GCD of 0 and n is equal to n. What doestrivial_gcd()
currently return in this situation, and how could we fix it?
Implementing Euclid’s nontrivial GCD algorithm
We now return to implementing Euclid’s algorithm. The pseudocode for EuclidGCD()
is reproduced below.
EuclidGCD(a, b) while a ≠ b if a > b a ← a − b else b ← b − a return a
The only aspect of this function that we have not yet introduced is the test for inequality of a
and b
, while a ≠ b
. In a previous lesson, we introduced a collection of comparison operators. In particular, we saw that ≠
can be represented in Python using the operator !=
. We use this operator in our implementation of EuclidGCD()
below.
def euclid_gcd(a: int, b: int) -> int: """ Returns the GCD of two integers by applying Euclid's algorithm. Parameters: - a (int): first integer - b (int): second integer Returns: int: GCD of a and b """ while a != b: if a > b: a = a - b else: b = b - a return a
We can now check that euclid_gcd()
is working by calling it on a couple of real inputs; update main() as follows.
def main(): x = 63 y = 42 print(euclid_gcd(x, y))
STOP: As withtrivial_gcd()
,euclid_gcd(0, n)
should returnn
for any positive integern
. What currently happens when we calleuclid_gcd()
in this situation, and how could we fix it?
Timing the two GCD algorithms
You may still be wondering what the big deal is. Is Euclid’s approach for finding a GCD really that much better? Aren’t modern computers so fast that they make this problem obsolete?
To answer this question, let’s time
and trivial_gcd()
on the same inputs. The code that we will use for timing will prove handy throughout our programming future for measuring the efficiency of different functions.euclid_gcd()
First, we will need to add a statement to the top of main.py
to import Python’s time
module; a module is a collection of pre-written functions that we can use without having to write from scratch.
import time
In def main()
, we declare two positive integer variables and set them to values that are on the order of a million; it would be better to use randomly generated integers, but we will discuss how to generate random integers in a later chapter.
def main(): x = 3782026 y = 2731479
Next, we will time trivial_gcd()
. First, we declare a variable named start
that is equal to the current time using the built-in function
. This function takes no inputs, and returns a time.time()
float
equal to the number of seconds that have elapsed since January 1, 1970, 00:00:00 (UTC). We can think of declaring start
as clicking the “start” button on a stopwatch.
def main(): x = 3782026 y = 2731479 # timing trivial_gcd start = time.time()
We then will call trivial_gcd(x, y)
. We do not need the output of trivial_gcd(x, y)
, and so we will not set a variable equal to this output. (Don’t worry, Python’s interpreter won’t get angry.)
def main(): x = 3782026 y = 2731479 # timing trivial_gcd start = time.time() trivial_gcd(x, y)
Finally, we need to click the “stop” button on our conceptual stopwatch. To do so, we call the function
again, subtracting the time.time()
variable from the result, which will tell us how long start
trivial_gcd()
ran. We store the result of this subtraction in a variable elapsed
.
def main(): x = 3782026 y = 2731479 # timing trivial_gcd start = time.time() trivial_gcd(x, y) elapsed = time.time() - start
We now need to display the amount of time that has gone by. We could always simply call print(elapsed)
, but we will use this example as an opportunity to show how to format this output more nicely. Python provides a powerful feature called f-strings, which allow us to add formatting to certain parts of a string based on the values of variables. In this case, the line
print(f"trivial_gcd took {elapsed:.6f} seconds")
prints the value of elapsed
to six decimal places. This command is similar to the printf
command from other programming languages like C, which is why you’ll sometimes hear this style called “printf-style formatting.” We will adapt the following code any time we need to time functions.
def main(): x = 3782026 y = 2731479 # timing trivial_gcd start = time.time() trivial_gcd(x, y) elapsed = time.time() - start print(f"trivial_gcd took {elapsed:.6f} seconds")
STOP: Run this code, either on your computer or by using the window below. What do you find?
We now will repeat this process and time euclid_gcd()
, updating def main()
as shown below. Note that we are very careful to run euclid_gcd()
on the exact same input integers that we used for trivial_gcd()
, and that we place only the call to euclid_gcd()
between the lines invoking the start time.time()
and end time.time()
so that we obtain accurate results.
def main(): x = 3782026 y = 2731479 # timing trivial_gcd start = time.time() trivial_gcd(x, y) elapsed_trivial = time.time() - start print(f"trivial_gcd took {elapsed_trivial:.6f} seconds") # timing euclid_gcd start = time.time() euclid_gcd(x, y) elapsed_euclid = time.time() - start print(f"euclid_gcd took {elapsed_euclid:.6f} seconds")
Finally, we print the speedup, which is the ratio of the runtime of trivial_gcd()
to that of euclid_gcd()
.
def main(): x = 3782026 y = 2731479 # timing trivial_gcd start = time.time() trivial_gcd(x, y) elapsed_trivial = time.time() - start print(f"trivial_gcd took {elapsed_trivial:.6f} seconds") # timing euclid_gcd start = time.time() euclid_gcd(x, y) elapsed_euclid = time.time() - start print(f"euclid_gcd took {elapsed_euclid:.6f} seconds") # compute and print speedup if elapsed_euclid > 0: speedup = elapsed_trivial / elapsed_euclid print(f"Speedup: {speedup:.2f}x faster")
STOP: Run this code, either on your computer or by using the window below. How do the two approaches compare?
Furthermore, let’s see what happens as we make the numbers larger.
STOP: Add a digit to each ofx
andin your code. How does it affect the runtime of the two algorithms? What about if you add two digits to
y
x
and? What is this demonstrating?
y
The previous question is hinting at a critical theme of programming, which is that the benefits of a faster algorithm provides tend to increase as the size of the input parameters increases. Computing the GCD of two small numbers may be relatively fast with either trivial_gcd()
or euclid_gcd()
. However, as we increase the size of the input integers to the order of 10 million, trivial_gcd()
takes several seconds on a typical computer, while euclid_gcd()
finishes in about a thousandth of a second.
This phenomenon is illustrated in the following table, which illustrates the runtime of the two algorithms on input integers having different numbers of digits. We can quantify how much faster euclid_gcd()
is by taking the ratio of the runtime of trivial_gcd()
to euclid_gcd()
on the same input; this ratio is called the speedup of euclid_gcd()
compared to trivial_gcd()
. As we continue to increase the size of the input integers, trivial_gcd()
will start to exhaust considerable computational resources, whereas euclid_gcd()
will finish in the blink of an eye.
STOP: Time the two approaches for numbers with a variety of differing numbers of digits. Are your results similar to those of the table below?
Number of Digits | trivial_gcd() Runtime | euclid_gcd() Runtime | Speedup |
6 | 15.4 ms | 43 μs | 358 |
7 | 0.154 s | 43 μs | 3,580 |
8 | 1.56 s | 42 μs | 37,143 |
9 | (Times out in online IDE) | 43 μs | Unknown |
trivial_gcd()
and euclid_gcd()
for two integers having an indicated number of digits. The final column shows the “speedup” offered by euclid_gcd()
, which is the ratio between the runtime of trivial_gcd()
and euclid_gcd()
. Observe that the speedup increases as the size of the input increases. The units ms and μs denote milliseconds (thousandths of seconds) and microseconds (millionths of seconds), respectively.As we move toward the final code-along in this introductory chapter, we are now ready to consider the second nontrivial algorithm of this chapter, the sieve of Eratosthenes. Will we find that it provides a considerable speedup as well?
For the precocious: an even faster Euclid GCD algorithm
Say that we are taking the GCD of 50 and 375. Euclid’s algorithm would subtract 50 from 375 to obtain 325, concluding that GCD(50, 375) = GCD (50, 325). In turn, it would subtract 50 from 325 to obtain 275, concluding that GCD(50, 325) = GCD(50, 275). In this manner, it would by repeated subtractions obtain that the original GCD is the same as GCD(50, 75), which is equal to GCD(50, 25).
Yet remembering that repeated subtraction is just division, we note that we obtain 25 by taking the remainder after dividing 375 by 50. In other words, Euclid’s algorithm is relying on the fact that GCD(a, b) = GCD(a, b–a), but there is a stronger result that GCD(a, b) = GCD(a, b%a). By repeatedly applying this fact, we can obtain an optimized version of Euclid’s algorithm that we call faster_euclid_gcd()
. In fact, the algorithm below is implemented by the gcd() function contained within Python’s math
module.
def faster_euclid_gcd(a: int, b: int) -> int: """ Computes the greatest common divisor (GCD) of two positive integers using an optimized version of Euclid's algorithm that replaces subtraction with the modulus operator. Parameters: - a (int): a positive integer - b (int): a positive integer Returns: int: the greatest common divisor of a and b """ while a != 0 and b != 0: if a > b: a = a % b else: b = b % a if a == 0: return b else: return a
STOP: Timefaster_euclid_gcd()
againsteuclid_gcd()
. What is the speedup that it provides for integers with 6, 7, 8, and 9 digits?
Looking ahead
We have implemented one elegant mathematical algorithm from antiquity, but one remains: the sieve of Eratosthenes for finding prime numbers. To implement this algorithm and store multiple prime numbers, we will need to learn more about how Python implements arrays, which allow us to store more than one value at a time.
Check your work from the code along
We provide autograders in the window below (or via a direct link) allowing you to check your work for the following functions:
trivial_gcd()
euclid_gcd()