Code Along - Dictionaries in Python

Code along video

Although we strongly suggest coding along with us by following the video above, you can find completed code from the code along in our course code repository.

Learning Objectives

In the previous code along, we discussed how to identify the occurrences of a substring within a larger text. We would like to return to the more general problem of finding the most frequent strings (of some fixed length) within the text, which we called the Frequent Words Problem.

Frequent Words Problem

Input: A string text and an integer k.

Output: All most frequent k-mers in text.

To solve the Frequent Words Problem, we will need to build a frequency table associating each k-mer substring of a string with its number of occurrences. Recall the following example from the core text, which shows the frequency table for text equal to "ACGTTTCACGTTTTACGG" and k equal to 3.

**Figure:** A table corresponding to counting the number of occurrences of every 3-mer in *text* = “`ACGTTTCACGTTTTACGG"`.

The frequency table exemplifies a generalized version of an array called a map for which the indices are allowed to be arbitrary values (in this case, they are strings); the indices of a map are called its keys. In Python, maps are called dictionaries. In this lesson, we will learn about how Python implements dictionaries, which will prepare us to solve the Frequent Words Problem.

Code along summary

Setup

Create a folder called dictionaries in your python/src directory and create a text file called main.py in the python/src/dictionaries folder. We will edit main.py, which should have the following starter code.

def main():
    print("Dictionaries in Python!")


if __name__ == "__main__":
    main()

Declaring and working with dictionaries

Declaring a dictionary is similar to declaring a list in Python. For example, let’s declare a dictionary polls that associates a state name (a string) to the polling percentage for a candidate (a decimal). We will work more with this dictionary in the next chapter when we simulate elections from polling data.

def main():
    print("Dictionaries in Python!")

    # Declare an empty dictionary.
    polls = {}

The declaration polls = {} will work to declare any dictionary, but it’s a great coding practice to signal to ourselves and anyone examining our code that we know that the keys are strings and the values are integers. To do so, we can use the special declaration polls: dict[str, float] = {}.

def main():
    print("Dictionaries in Python!")

    # Declare an empty dictionary.
    polls: dict[str, float] = {}

Accessing the value corresponding to a key in a dictionary is similar to accessing the value corresponding to some index of an array or list. With a list, we need to ensure that the i-th element of the slice exists before setting it; in the case of a dictionary, we can set the value corresponding to a key even if the dictionary does not yet contain the key. We illustrate this below by setting a few values of polls.

def main():
    print("Dictionaries in Python!")

    # Declare an empty dictionary.
    polls: dict[str, float] = {}

    polls["Pennsylvania"] = 0.517
    polls["Ohio"] = 0.488
    polls["Texas"] = 0.378
    polls["Florida"] = 0.5

Although we can set the value of any key we choose, dictionaries do have a length that is equal to the number of keys that we have been assigned values. As with lists, we can use the len() function to determine the number of key-value pairs a dictionary has.

def main():
    print("Dictionaries in Python!")

    # Declare an empty dictionary.
    polls: dict[str, float] = {}

    polls["Pennsylvania"] = 0.517
    polls["Ohio"] = 0.488
    polls["Texas"] = 0.378
    polls["Florida"] = 0.5

    print("The number of states in the dictionary is", len(polls)) # Prints 4

Click Run 👇 to try it!

STOP: Open a terminal, and navigate to the dictionaries directory using the command cd python/src/dictionaries. Then, run your code using the command python3 main.py (on macOS/Linux) or python main.py (on Windows).

Note: Some beginner programmers think that dictionaries can replace arrays in all contexts. Why bother with arrays if we can just declare a dictionary associating integer keys with integer values? Your author had this same thought once and can vouch that it is a horrible idea; the inherent ordering structure of indices that arrays provide is a feature to be embraced.

You might be wondering what happens if we attempt to access an entry of a dictionary that does not exist. If we add the following line of code to main(), then running the code raises a KeyError because "Vermont" is not a key in the polls dictionary.

def main():
    print("Dictionaries in Python!")

    # Declare an empty dictionary.
    polls: dict[str, float] = {}

    polls["Pennsylvania"] = 0.517
    polls["Ohio"] = 0.488
    polls["Texas"] = 0.378
    polls["Florida"] = 0.5

    print("The number of states in the dictionary is " + str(len(polls))) # Prints 4

    print(polls["Vermont"]) # raises KeyError

Click Run 👇 to try it!

This can be fixed by ensuring that we have set a value associated with "Vermont".

def main():
    print("Dictionaries in Python!")

    # Declare an empty dictionary.
    polls: dict[str, float] = {}

    polls["Pennsylvania"] = 0.517
    polls["Ohio"] = 0.488
    polls["Texas"] = 0.378
    polls["Florida"] = 0.5

    print("The number of states in the dictionary is " + str(len(polls))) # Prints 4

    polls["Vermont"] = 0.69

    print(polls["Vermont"]) # works!

Click Run 👇 to try it!

Removing an element from a dictionary

Let’s do the world a favor and rid ourselves of Florida once and for all. We can do so using a built in del operator that requires two things: the name of the dictionary variable and the key that we wish to delete from the dictionary. (Note that this is only useful if you know that the key exists, if it does not it will raise a KeyError. In this case, to remove the entry associated with “Florida”, we call del polls["Florida"].

def main():
    # Code omitted for clarity

    # Getting rid of Florida.
    del polls["Florida"]

    print("The number of states in the dictionary after removing Florida is " + str(len(polls))) # Should print 4.

We could even be a little more careful with deleting Florida by checking if "Florida" is a key of the dictionary. The check "Florida" in polls will return True if so, and False if not.

def main():
    # Code omitted for clarity

    # Getting rid of Florida.
    if "Florida" in polls:
        del polls["Florida"]

    print("The number of states in the dictionary after removing Florida is " + str(len(polls))) # Should print 4.

Click Run 👇 to try it!

List and dictionary literals

We can also set the values of lists and dictionaries when they are defined, which is helpful when the number of values is small or fixed; these lists or dictionaries are called literals. For example, we could declare the following list containing five integers.

def main():
    # code omitted for clarity

    primes = [2, 3, 5, 7, 11]

Note: We will need to comment out this line, since we don’t intend to use symbols further in this code along.

We can declare short dictionaries too. For example, the following dictionary associates states with their number of electoral votes (more on this in the next chapter).

def main():
    # code omitted for clarity

    # Dictionary literal.

    electoral_votes: dict[str, int] = {
        "Pennsylvania": 20,
	"Ohio":         18,
	"Texas":        38
    }

Dictionaries are useful as replacements for match statements

When introducing strings, we introduced a match statement to replace lengthy if/elif checks.

def complement(dna: str) -> str:
    """
    Finds the complementary strand of the given string.
    
    Parameters:
    - dna (str): A dna string.
    
    Returns:
    - dna2: the string whose i-th symbol is the complementary 
    nucleotide of the i-th symbol of the input string. (A-T, C-G, T-A, G-C)
    """

    dna2 = ""

    for symbol in dna:
        match symbol:
            case "A":
                dna2 += "T"
            case "C":
                dna2 += "G"
            case "G":
                dna2 += "C"
            case "T":
                dna2 += "A"
            case _:
                raise ValueError("Invalid symbol in string given to complement().")

    return dna2

Dictionaries provide yet another way of handling multiple cases succinctly. For example, we could rewrite the above function as follows by using a dictionary called comp_dict associating each nucleotide symbol with its complementary nucleotide.

def complement(dna: str) -> str:
    """
    Finds the complementary strand of the given string.
    
    Parameters:
    - dna (str): A dna string.
    
    Returns:
    - dna2: the string whose i-th symbol is the complementary 
    nucleotide of the i-th symbol of the input string. (A-T, C-G, T-A, G-C)
    """
    comp_dict = {
        "A": "T",
        "T": "A",
        "C": "G",
        "G": "C",
    }

    dna2 = ""

    for symbol in dna:
        if symbol not in comp:
            raise ValueError("Invalid symbol in string given to complement().")
        dna2 += comp[symbol]

    return dna2

Dictionaries are pass by reference

The number of electoral votes that each state receives is a function of its population. The number of votes that we presented in the previous section are from 2020, but in 2024, the electoral votes changed as voters fled miserable rust belt states like Pennsylvania and Ohio for the uninhabitable heat, low taxes, and plentiful ammunition of Texas.

Let’s use this demographic fact to write a function updating the electoral_votes dictionary to the appropriate 2024 values, in which Pennsylvania and Ohio each lost a vote and Texas gained two votes. We do so in the following update_votes_2024() function.

This function also allows us to discuss a couple of points related to type hints. First, note that we specify not just that the input dictionary is a dictionary, but that it maps strings to integers via the type hint dict[str, int]. Second, we introduce a new concept: if a function does not return anything, we use the keyword None in place of a return type. This allows the user to know that we are changing the input dictionary in place, meaning within the function.

def update_votes_2024(electoral_votes: dict[str, int]) -> None:
    """
    Modifies the votes in the input dictionary in place to reflect 2024 Electoral College votes.

    Parameters:
    - electoral_votes: A dictionary with a set of votes for a given area.

    Returns:
    - None: The dictionary is modified in place.
    """

    electoral_votes["Pennsylvania"] = 19
    electoral_votes["Ohio"] = 17
    electoral_votes["Texas"] = 40

We give this example not to make a cute political point but rather to illustrate a fact about dictionaries input to functions. After we call this function in main(), we will print out the number of votes allocated to Pennsylvania.

STOP: What do you think will be printed? 19 or 20?

def main():
    print("Dictionaries in Python!")

    # ...
    # Dictionary literal.

    electoral_votes: dict = {
        "Pennsylvania": 20,
		"Ohio":         18,
		"Texas":        38
    }

    update_votes_2024(electoral_votes)

    print("The number of electoral votes in Pennsylvania is now", electoral_votes["Pennsylvania"])

Click Run 👇 to try it!

When we run our code, we see that the number of votes printed is 19, so that the dictionaries that we created in main() has been changed by a function. This behavior is different from that of basic types like integers and strings, which we learned in a previous code along are passed by value to functions. In contrast, like lists, dictionaries are passed by reference, which means that we can edit dictionaries that are passed into functions. Python is primarily pass by object reference. This means that immutable objects like mentioned before are pass by value and mutable objects, like dictionaries and lists, are pass by reference.

Note: Technically, Python is “pass by object reference”, a concept that we will describe in a later chapter.

Ranging over a dictionary

Rather than printing only Pennsylvania’s electoral votes, let’s print the votes of all states. As we can range over the indices (and values) of an array, we can range over the keys (and values) of a dictionary. In particular, we can range over the keys of a dictionary dict by using the notation for key in dict:, where key is a variable that we can call what we like.

In the code below, we range over the keys of our electoral_votes dictionary in the same way that we ranged over the indices and values of a list. Inside the for loop, we print each state’s name and its polling percentage.

def main():
    print("Dictionaries in Python!")

    # ...

    # Dictionary literal.

    electoral_votes: dict = {
        "Pennsylvania": 20,
		"Ohio":         18,
		"Texas":        38
    }

    update_votes_2024(electoral_votes)

    for key in electoral_votes:
        print("The number of electoral votes in " + key + " is " + str(electoral_votes[key]))

STOP: In what order do you think the keys of the dictionary will be printed?

A convenience provided in Python 3.7 and later is that when ranging over a dictionary, the ranging occurs in the order in which the keys were inserted into the dictionary. As a result, we can expect to see Python print the number of electoral votes associated with Pennsylvania, then Ohio, and then Texas.

Click Run 👇 to try it!

Note: If we later update the value associated with a key, then the order of keys when iterating over the dictionary does not change.

Just as we can range over both the indices and the values of a list, we can also range over the keys and values of a dictionary dict using the notation for key, val in dict.items(), where key and val are variable names. If we don’t add the .items() to the dictionary name, then Python will only allow us to range over the keys of the dictionary.

def main():
    print("Dictionaries in Python!")

    # ...

    # Dictionary literal.

    electoral_votes: dict = {
        "Pennsylvania": 20,
		"Ohio":         18,
		"Texas":        38
    }

    update_votes_2024(electoral_votes)

    for key, val in electoral_votes.items():
        print("The number of electoral votes in " + key + " is " + str(val))

Click Run 👇 to try it!

Ranging over the keys of a dictionary in alphabetical order

Say that instead of ranging over a dictionary in the order in which keys were added to the dictionary, we would like to range over the keys in some other order, such as alphabetical order.

First, we should obtain the keys of the dictionary. For a dictionary dict, Python provides a built-in operator dict.keys() that returns the dictionary’s keys, which we can convert to a list using the built-in list() function. We start by calling this function.

def main():
    print("Dictionaries in Python!")

    # ...

    # Dictionary literal.

    electoral_votes: dict = {
        "Pennsylvania": 20,
		"Ohio":         18,
		"Texas":        38
    }

    update_votes_2024(electoral_votes)

    for key, val in electoral_votes.items():
        print("The number of electoral votes in " + key + " is " + str(val))

    keys = list(electoral_votes.keys())
    print(keys) # ["Pennsylvania", "Ohio", "Texas"]

Next, we will sort the keys of the list that we produced in alphabetical order. Python provides support for this task with the built-in function sort().

def main():
    print("Dictionaries in Python!")

    # ...

    # Dictionary literal.

    electoral_votes: dict = {
        "Pennsylvania": 20,
		"Ohio":         18,
		"Texas":        38
    }

    update_votes_2024(electoral_votes)

    for key, val in electoral_votes.items():
        print("The number of electoral votes in " + key + " is " + str(val))

    keys = list(electoral_votes.keys())
    print(keys) # ["Pennsylvania", "Ohio", "Texas"]

    keys.sort()
    print(keys) # ["Ohio", "Pennsylvania", "Texas"]

Now that we have the ordering of the keys that we want, we will range over the list keys, and for each value key that we find, print the value electoral_votes[key].

def main():
    print("Dictionaries in Python!")

    # ...

    # Dictionary literal.

    electoral_votes: dict = {
        "Pennsylvania": 20,
		"Ohio":         18,
		"Texas":        38
    }

    update_votes_2024(electoral_votes)

    for key, val in electoral_votes.items():
        print("The number of electoral votes in " + key + " is " + str(val))

    keys = list(electoral_votes.keys())
    print(keys) # ["Pennsylvania", "Ohio", "Texas"]

    keys.sort()
    print(keys) # ["Ohio", "Pennsylvania", "Texas"]

    for key in keys:
        print("The number of electoral votes in", key, "is", electoral_votes[key])

Although the above approach is intuitive and we suggest knowing it, Python provides an even faster approach. The built-in Python function sorted() takes as input a dictionary and returns the sorted list of its keys. We can therefore shorten the above function by simply ranging over sorted(electoral_votes), as shown below.

def main():
    print("Dictionaries in Python!")

    # ...

    # Dictionary literal.

    electoral_votes: dict = {
        "Pennsylvania": 20,
		"Ohio":         18,
		"Texas":        38
    }

    update_votes_2024(electoral_votes)

    for key, val in electoral_votes.items():
        print("The number of electoral votes in " + key + " is " + str(val))

    for state in sorted(electoral_votes):
        print("The number of electoral votes in " + state + " is " + str(electoral_votes[state]))

Click Run 👇 to try it!

Accessing the values of a dictionary

As we can obtain the keys of a dictionary dict, we can also obtain its values using the built-in function dict.values(). For example, we could sum the total collection of votes over all states using the following code.

def main():
    print("Dictionaries in Python!")

    # ...

    total_votes = 0
    for state_votes in electoral_votes.values():
        total_votes += state_votes
    print(total_votes)

Looking ahead

We are now ready to move on to the next code along and find frequent words in a string. You may like to try implementing the following functions on your own before joining us in the code along.

Exercise: Try implementing the following three pseudocode functions on your own.

FindFrequentWords(text, k)
    frequentPatterns ← an array of strings of length 0
    freqMap ← frequency_table(text, k)
    max ← max_map_value(freqMap)
    for all strings pattern in freqMap
        if freqMap[pattern] = max
            frequentPatterns ← append(frequentPatterns, pattern)
    return frequentPatterns

FrequencyTable(text, k)
    freqMap ← empty map
    n ← length(text)
    for every integer i between 0 and n − k
        pattern ← text[i, i + k]
        if freqMap[pattern] doesn't exist
            freqMap[pattern] = 1
        else
            freqMap[pattern]++
    return freqMap

MaxMapValue(dict)
    m ← 0
    firstTime = true
    for every key pattern in dict
        if firstTime = true or dict[pattern] > m
            firstTime= false
            m ← dict[pattern]
    return m

Next lesson