Code Along - Simulating an Election from Polling Data in Go

Learning objectives

In this code along, we will use Monte Carlo simulation to estimate the winner of the 2016 US Presidential election using polling data at three different points in time. To do so, we will revisit the SimulateMultipleElections() function that we introduced in the core text, which we reproduce below. This function largely consists of appealing to a SimulateOneElection() subroutine, which returns the electoral college votes for each of two US presidential candidates in a simulated election.

SimulateMultipleElections(polls, electoralVotes, numTrials, marginOfError)
    winCount1 ← 0
    winCount2 ← 0
    tieCount ← 0
    for numTrials total trials
        votes1,votes2 ← SimulateOneElection(polls, electoralVotes, marginOfError)
        if votes1 > votes2
            winCount1 ← winCount1 + 1
        else if votes2 > votes1
            winCount2 ← winCount2 + 1
        else (tie!)
            tieCount ← tieCount + 1
    probability1 ← winCount1/numTrials
    probability2 ← winCount2/numTrials
    probabilityTie ← tieCount/numTrials
    return probability1, probability2, probabilityTie

The SimulateOneElection() function examines the polling percentage (for candidate 1) in each state and adds noise to this percentage, reflecting the fact that a poll samples only a small portion of the electorate and is therefore subject to random sampling error.

SimulateOneElection(polls, electoralVotes, marginOfError)
    votes1 ← 0
    votes2 ← 0
    for every key state in polls
        poll ← candidate 1's polling percentage
        adjustedPoll ← AddNoise(poll, marginOfError)
        if adjustedPoll ≥ 0.5 (candidate 1 wins state)
            votes1 ← votes1 + electoralVotes[state]
        else (candidate 2 wins state)
            votes2 ← votes2 + electoralVotes[state]
    return votes1, votes2

As we saw in the code along on simulating craps, the work of generating pseudorandom numbers is passed to a low-level subroutine. In this case, that subroutine is AddNoise(), which takes a polling average and a margin of error and simulates a true polling number with a 95% chance of being within the margin of error. This function requires RandNormal(), a built-in function that generates a pseudorandom decimal according to the standard normal distribution (which has a mean equal to 0 and a standard deviation equal to 1).

AddNoise(poll, marginOfError)
    x ← RandNormal()
    x ← x/2 (approximately 95% chance of x being between -1 and 1)
    x ← x * marginOfError (approximately 95% chance of x being between -marginOfError and marginOfError)
    return x + poll

Setup

To complete this code along, you will need to build upon the starter code that we provided in the previous code along on parsing election data. Ensure that you have an election directory under your go/src source code folder and that it contains a main.go file, an io.go file (with completed functions from the previous code along), and a data folder containing four data files that are explained further in the previous code along.

Your main.go file should contain the following code from our work on parsing election data.

package main 

import (
    "fmt"
)

func main() {
    fmt.Println("Simulating the 2016 US Presidential election.")

    electoralVoteFile := "data/electoralVotes.csv"
    pollFile := "data/earlyPolls.csv"

    electoralVotes := ReadElectoralVotes(electoralVoteFile)
    polls := ReadPollingData(pollFile)
}

Code along video

Beneath the video, we provide a detailed summary of the topics and code covered in the code along.

At the bottom of this page, you will have the opportunity to validate your work via auto-graded assessments that evaluate the functions covered in the code along.

Although we strongly suggest completing the code along on your own, you can find completed code from the code along in our course code repository.

Code along summary

Writing a function to simulate multiple elections

We start with implementing SimulateMultipleElections(), and we will implement subroutines as we encounter them. SimulateMultipleElections() takes four input parameters:

a map polls that maps the name of each state to that state’s polling percentages of candidate 1 (Clinton), where the polling percentage for candidate 2 (Trump) can be obtained by subtracting candidate 1’s polling percentage from 1;
a map electoralVotes that maps the name of each state to the number of Electoral College votes (as an unsigned integer) that the winner of that state receives;
an integer numTrials representing the number of Monte Carlo simulations to run;
a decimal marginOfError representing the margin of error of all polls, which we assume is a constant.

Note: There is no inherent reason for numTrials to be a signed integer while the values of electoralVotes are unsigned integers. In this course, we will generally only use signed integers, even if the underlying variable is never negative. However, we make the values of electoralVotes unsigned to give us practice with this variable type.
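One practical consequence of mixing the two types is that Go never converts between int and uint implicitly. The sketch below illustrates this with a hypothetical helper SignedMargin(), which is not part of the code along and is introduced only for illustration.

```go
package main

import "fmt"

// SignedMargin is a hypothetical helper (not part of the code along).
// It returns votes1 - votes2 as a signed integer, converting first so that
// the subtraction cannot wrap around when votes2 is larger than votes1.
func SignedMargin(votes1, votes2 uint) int {
    return int(votes1) - int(votes2)
}

func main() {
    var votes1, votes2 uint = 3, 5

    // Subtracting unsigned integers wraps around instead of going negative,
    // so this prints a very large positive number.
    fmt.Println(votes1 - votes2)

    // Converting to int first gives the expected negative margin.
    fmt.Println(SignedMargin(votes1, votes2)) // prints -2
}
```

Because our simulation only ever adds vote counts, the wraparound shown here never arises in the code along.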

//SimulateMultipleElections takes polling data as a map of state names to floats (for candidate 1), along with a map of state names to electoral votes, a number of trials, and a margin of error in the polls.
//It returns three values: the estimated probabilities of candidate 1 winning, candidate 2 winning, and a tie.
func SimulateMultipleElections(polls map[string]float64, electoralVotes map[string]uint, numTrials int, marginOfError float64) (float64, float64, float64) {
    // to fill in
}

We will start by declaring three variables winCount1, winCount2, and tieCount, which respectively correspond to the number of simulations won by candidate 1, the number of simulations won by candidate 2, and the number of simulations in which the two candidates tie.

//SimulateMultipleElections takes polling data as a map of state names to floats (for candidate 1), along with a map of state names to electoral votes, a number of trials, and a margin of error in the polls.
//It returns three values: the estimated probabilities of candidate 1 winning, candidate 2 winning, and a tie.
func SimulateMultipleElections(polls map[string]float64, electoralVotes map[string]uint, numTrials int, marginOfError float64) (float64, float64, float64) {
    // keep track of number of simulated elections won by each candidate (and ties)
    winCount1 := 0
    winCount2 := 0
    tieCount := 0 // oh no!

    // to fill in
}

Eventually, we will normalize each of these counts by dividing them by the total number of trials, and then return the resulting ratios.

//SimulateMultipleElections takes polling data as a map of state names to floats (for candidate 1), along with a map of state names to electoral votes, a number of trials, and a margin of error in the polls.
//It returns three values: the estimated probabilities of candidate 1 winning, candidate 2 winning, and a tie.
func SimulateMultipleElections(polls map[string]float64, electoralVotes map[string]uint, numTrials int, marginOfError float64) (float64, float64, float64) {
    // keep track of number of simulated elections won by each candidate (and ties)
    winCount1 := 0
    winCount2 := 0
    tieCount := 0 // oh no!

    // to fill in

    //divide number of wins (and ties) by number of trials
    probability1 := float64(winCount1) / float64(numTrials)
    probability2 := float64(winCount2) / float64(numTrials)
    tieProbability := float64(tieCount) / float64(numTrials)

    return probability1, probability2, tieProbability
}
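The conversions to float64 in the division above matter because dividing one Go integer by another discards the fractional part. A minimal sketch of the difference, using a hypothetical helper Ratio() that is not part of the code along:

```go
package main

import "fmt"

// Ratio is a hypothetical helper (not part of the code along) that divides
// a count by a total after converting both to float64.
func Ratio(count, total int) float64 {
    return float64(count) / float64(total)
}

func main() {
    winCount, numTrials := 3, 4

    // Integer division discards the fractional part.
    fmt.Println(winCount / numTrials) // prints 0

    // Converting to float64 first preserves it.
    fmt.Println(Ratio(winCount, numTrials)) // prints 0.75
}
```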

We fill in the interior of SimulateMultipleElections() by running numTrials total simulations. In each simulation, we call SimulateOneElection(), which takes all of the inputs of SimulateMultipleElections() except for numTrials and returns the number of electoral votes for each of candidates 1 and 2 in a simulated election. Based on who has more votes in this simulation (or if there is a tie), we then update the appropriate count variable.

//SimulateMultipleElections takes polling data as a map of state names to floats (for candidate 1), along with a map of state names to electoral votes, a number of trials, and a margin of error in the polls.
//It returns three values: the estimated probabilities of candidate 1 winning, candidate 2 winning, and a tie.
func SimulateMultipleElections(polls map[string]float64, electoralVotes map[string]uint, numTrials int, marginOfError float64) (float64, float64, float64) {
    // keep track of number of simulated elections won by each candidate (and ties)
    winCount1 := 0
    winCount2 := 0
    tieCount := 0 // oh no!

    //simulate an election numTrials times and update counts each time
    for i := 0; i < numTrials; i++ {
        //call SimulateOneElection as a subroutine
        votes1, votes2 := SimulateOneElection(polls, electoralVotes, marginOfError)
        // did candidate 1 or candidate 2 win?
        if votes1 > votes2 {
            winCount1++
        } else if votes1 < votes2 {
            winCount2++
        } else { //tie!
            tieCount++
        }
    }

    //divide number of wins (and ties) by number of trials
    probability1 := float64(winCount1) / float64(numTrials)
    probability2 := float64(winCount2) / float64(numTrials)
    tieProbability := float64(tieCount) / float64(numTrials)

    return probability1, probability2, tieProbability
}
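One way to convince ourselves that the counting logic is right is to run it on outcomes that we choose by hand. The sketch below uses a hypothetical function TallyOutcomes(), not part of the code along, which mirrors the loop above but takes precomputed vote pairs instead of calling SimulateOneElection(). The vote totals are made up for illustration.

```go
package main

import "fmt"

// TallyOutcomes is a hypothetical stand-in for the loop in
// SimulateMultipleElections: instead of simulating, it takes precomputed
// (votes1, votes2) pairs so that the counting logic can be checked by hand.
func TallyOutcomes(outcomes [][2]uint) (float64, float64, float64) {
    winCount1, winCount2, tieCount := 0, 0, 0
    for _, o := range outcomes {
        if o[0] > o[1] {
            winCount1++
        } else if o[0] < o[1] {
            winCount2++
        } else {
            tieCount++
        }
    }
    n := float64(len(outcomes))
    return float64(winCount1) / n, float64(winCount2) / n, float64(tieCount) / n
}

func main() {
    // Four made-up outcomes: two wins for candidate 1, one for candidate 2, one tie.
    outcomes := [][2]uint{{300, 238}, {270, 268}, {200, 338}, {269, 269}}
    p1, p2, pTie := TallyOutcomes(outcomes)
    fmt.Println(p1, p2, pTie) // prints 0.5 0.25 0.25
}
```

Since every trial lands in exactly one of the three branches, the three returned values always sum to 1.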

Simulating a single election

We now turn to implementing SimulateOneElection(). As we mentioned above, this function takes all of the same parameters as SimulateMultipleElections() except for numTrials. It returns two unsigned integers corresponding to the number of electoral college votes for candidate 1 and 2, respectively. We begin by declaring two unsigned integers to hold these votes, which we will eventually return.

//SimulateOneElection takes a map of state names to polling percentages, 
//along with a map of state names to electoral college votes and a margin of error.
//It returns the number of EC votes for each of the two candidates in one simulated election.
func SimulateOneElection(polls map[string]float64, electoralVotes map[string]uint, marginOfError float64) (uint, uint) {
    var collegeVotes1 uint = 0
    var collegeVotes2 uint = 0

    //to fill in

    return collegeVotes1, collegeVotes2
}

SimulateOneElection() needs to run the simulation over all states, and we can grab the state names and the current polling value by ranging over the keys and values of polls. We can then access the state’s number of electoral votes with electoralVotes[state].

//SimulateOneElection takes a map of state names to polling percentages, 
//along with a map of state names to electoral college votes and a margin of error.
//It returns the number of EC votes for each of the two candidates in one simulated election.
func SimulateOneElection(polls map[string]float64, electoralVotes map[string]uint, marginOfError float64) (uint, uint) {
    var collegeVotes1 uint = 0
    var collegeVotes2 uint = 0

    // range over all the states, and simulate the election in each state.
    for state, poll := range polls {
        //grab the number of electoral votes in the state
        numVotes := electoralVotes[state]

        // to fill in
    }

    return collegeVotes1, collegeVotes2
}
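One caveat about the lookup electoralVotes[state]: if a state appears in polls but not in electoralVotes, Go returns the zero value 0 rather than reporting an error, and the state silently contributes no votes. The sketch below shows how the two-value form of a map lookup could detect this. The helper LookupVotes() and the state names are hypothetical and not part of the code along.

```go
package main

import "fmt"

// LookupVotes is a hypothetical helper (not part of the code along).
// It reports whether state is present in electoralVotes, using the
// two-value form of a map lookup.
func LookupVotes(electoralVotes map[string]uint, state string) (uint, bool) {
    numVotes, ok := electoralVotes[state]
    return numVotes, ok
}

func main() {
    // Made-up data for illustration only.
    electoralVotes := map[string]uint{"StateA": 10}

    // A missing key yields the zero value.
    fmt.Println(electoralVotes["StateB"]) // prints 0

    if _, ok := LookupVotes(electoralVotes, "StateB"); !ok {
        fmt.Println("StateB has no entry in electoralVotes")
    }
}
```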

Because the polling value is not a precise estimate, we will first adjust it by adding randomized noise: a normally distributed random variable with mean equal to zero and standard deviation equal to half of the polls’ margin of error. We delegate this task to a subroutine AddNoise().

//SimulateOneElection takes a map of state names to polling percentages, 
//along with a map of state names to electoral college votes and a margin of error.
//It returns the number of EC votes for each of the two candidates in one simulated election.
func SimulateOneElection(polls map[string]float64, electoralVotes map[string]uint, marginOfError float64) (uint, uint) {
    var collegeVotes1 uint = 0
    var collegeVotes2 uint = 0

    // range over all the states, and simulate the election in each state.
    for state, poll := range polls {
        //grab the number of electoral votes in the state
        numVotes := electoralVotes[state]

        // let's adjust polling value with some randomized "noise"
        adjustedPoll := AddNoise(poll, marginOfError)

        // to fill in
    }

    return collegeVotes1, collegeVotes2
}

Now that we have an adjusted polling value, we must check whether it is greater than or equal to 0.5. If so, then we can conclude that candidate 1 won the state in this simulation, and otherwise, we can conclude that candidate 2 won the state in this simulation.

//SimulateOneElection takes a map of state names to polling percentages, 
//along with a map of state names to electoral college votes and a margin of error.
//It returns the number of EC votes for each of the two candidates in one simulated election.
func SimulateOneElection(polls map[string]float64, electoralVotes map[string]uint, marginOfError float64) (uint, uint) {
    var collegeVotes1 uint = 0
    var collegeVotes2 uint = 0

    // range over all the states, and simulate the election in each state.
    for state, poll := range polls {
        //grab the number of electoral votes in the state
        numVotes := electoralVotes[state]

        // let's adjust polling value with some randomized "noise"
        adjustedPoll := AddNoise(poll, marginOfError)

        // now we check who won state based on the adjusted poll ...
        if adjustedPoll >= 0.5 {
            // candidate 1 wins! give them the EC votes for the state
            collegeVotes1 += numVotes
        } else {
            //candidate 2 wins!
            collegeVotes2 += numVotes
        }
    }

    return collegeVotes1, collegeVotes2
}

Note: We would obtain essentially the same result if we were instead to check whether adjustedPoll is greater than 0.5. Because we are generating a random decimal number, the chances that adjustedPoll is exactly equal to 0.5 are essentially zero.

Adding random noise to a polling value

We now turn to implementing AddNoise(), which takes as input a polling percentage in a state and the margin of error and returns a simulated polling percentage with random noise added. We first generate a random number x from the standard normal distribution, which has mean equal to 0 and standard deviation equal to 1. Go implements this with the NormFloat64() function in the "math/rand" package, which must appear in the import block of the file containing AddNoise().

//AddNoise takes a polling value for candidate 1 and a margin of error. It returns an adjusted polling value for candidate 1 after adding random noise.
func AddNoise(pollingValue, marginOfError float64) float64 {
    x := rand.NormFloat64() // random number from standard normal distribution

    // to fill in
}

About 95% of the values sampled from the standard normal distribution lie within two standard deviations of the mean, that is, between -2 and 2. In other words, the margin of error of x is approximately 2, and so we can obtain a number with margin of error equal to 1 by halving x.

//AddNoise takes a polling value for candidate 1 and a margin of error. It returns an adjusted polling value for candidate 1 after adding random noise.
func AddNoise(pollingValue, marginOfError float64) float64 {
    x := rand.NormFloat64() // random number from standard normal distribution
    x = x / 2               // x has ~95% chance of being between -1 and 1

    // to fill in
}
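The claim in the comment above can be checked empirically. The sketch below counts how often a halved standard normal sample lands between -1 and 1. The helper FractionWithinOne() is hypothetical and not part of the code along, and it uses its own seeded generator so that repeated runs give the same answer.

```go
package main

import (
    "fmt"
    "math/rand"
)

// FractionWithinOne is a hypothetical helper (not part of the code along).
// It draws n samples from the standard normal distribution, halves each,
// and returns the fraction that falls between -1 and 1.
func FractionWithinOne(n int, seed int64) float64 {
    r := rand.New(rand.NewSource(seed)) // fixed seed so that runs are repeatable
    count := 0
    for i := 0; i < n; i++ {
        x := r.NormFloat64() / 2
        if x >= -1 && x <= 1 {
            count++
        }
    }
    return float64(count) / float64(n)
}

func main() {
    // The printed fraction should be close to 0.95.
    fmt.Println(FractionWithinOne(1000000, 1))
}
```

The fraction comes out slightly above 0.95 because exactly 95% of a normal distribution lies within about 1.96 standard deviations of the mean; dividing by 2 is a convenient approximation.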

We then can ensure that the process of generating x has margin of error equal to marginOfError by multiplying x by marginOfError.

//AddNoise takes a polling value for candidate 1 and a margin of error. It returns an adjusted polling value for candidate 1 after adding random noise.
func AddNoise(pollingValue, marginOfError float64) float64 {
    x := rand.NormFloat64() // random number from standard normal distribution
    x = x / 2               // x has ~95% chance of being between -1 and 1
    x = x * marginOfError   // x has ~95% chance of being between -marginOfError and marginOfError

    // to fill in
}

We have now obtained our desired noise value, and so we add the value of x to the existing polling value to ensure that the value that we return has mean equal to pollingValue and margin of error equal to marginOfError.

//AddNoise takes a polling value for candidate 1 and a margin of error. It returns an adjusted polling value for candidate 1 after adding random noise.
func AddNoise(pollingValue, marginOfError float64) float64 {
    x := rand.NormFloat64() // random number from standard normal distribution
    x = x / 2               // x has ~95% chance of being between -1 and 1
    x = x * marginOfError   // x has ~95% chance of being between -marginOfError and marginOfError
    return x + pollingValue
}
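With AddNoise() complete, a quick sanity check is possible before touching the real data. When marginOfError is 0, the noise vanishes, so SimulateOneElection() should simply award each state to whoever leads its poll. The sketch below condenses the two functions from this code along so that it runs on its own; the three states and their numbers are made up for illustration.

```go
package main

import (
    "fmt"
    "math/rand"
)

// AddNoise and SimulateOneElection are condensed from the code along so that
// this sketch runs on its own.
func AddNoise(pollingValue, marginOfError float64) float64 {
    x := rand.NormFloat64() / 2
    return x*marginOfError + pollingValue
}

func SimulateOneElection(polls map[string]float64, electoralVotes map[string]uint, marginOfError float64) (uint, uint) {
    var collegeVotes1, collegeVotes2 uint
    for state, poll := range polls {
        if AddNoise(poll, marginOfError) >= 0.5 {
            collegeVotes1 += electoralVotes[state]
        } else {
            collegeVotes2 += electoralVotes[state]
        }
    }
    return collegeVotes1, collegeVotes2
}

func main() {
    // Made-up three-state data for illustration only.
    polls := map[string]float64{"StateA": 0.6, "StateB": 0.4, "StateC": 0.55}
    electoralVotes := map[string]uint{"StateA": 10, "StateB": 7, "StateC": 3}

    // With a margin of error of 0, the leader in each poll wins that state.
    votes1, votes2 := SimulateOneElection(polls, electoralVotes, 0)
    fmt.Println(votes1, votes2) // prints 13 7
}
```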

Running our election simulator

We are now ready to run our Monte Carlo simulation. We revisit our main.go file, which reads in the electoral votes and polling data. In addition to electoralVotes.csv, the data directory contains three polling files, and we will begin our work by reading in the first one.

earlyPolls.csv: polls from summer 2016.
conventions.csv: polls from around the Republican and Democratic National Conventions in mid- and late July 2016.
debates.csv: polls from around the presidential debates, in late September through mid-October 2016.

package main 

import (
    "fmt"
)

func main() {
    fmt.Println("Simulating the 2016 US Presidential election.")

    electoralVoteFile := "data/electoralVotes.csv"
    pollFile := "data/earlyPolls.csv"

    electoralVotes := ReadElectoralVotes(electoralVoteFile)
    polls := ReadPollingData(pollFile)
}

We next set the number of trials to 1 million and the margin of error to 5%.

func main() {
    fmt.Println("Simulating the 2016 US Presidential election.")

    electoralVoteFile := "data/electoralVotes.csv"
    pollFile := "data/earlyPolls.csv"

    electoralVotes := ReadElectoralVotes(electoralVoteFile)
    polls := ReadPollingData(pollFile)

    numTrials := 1000000
    marginOfError := 0.05

    // to fill in
}
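To put the choice of one million trials in context, we can estimate how much an estimated probability would fluctuate from run to run. The sketch below uses the standard formula for the standard error of a proportion, sqrt(p(1-p)/n), which assumes independent trials. The helper StandardError() is hypothetical and not part of the code along.

```go
package main

import (
    "fmt"
    "math"
)

// StandardError is a hypothetical helper (not part of the code along).
// It returns the standard error of a proportion p estimated from
// numTrials independent trials: sqrt(p(1-p)/numTrials).
func StandardError(p float64, numTrials int) float64 {
    return math.Sqrt(p * (1 - p) / float64(numTrials))
}

func main() {
    // p = 0.5 is the worst case, since it maximizes p(1-p).
    for _, n := range []int{1000, 1000000} {
        fmt.Println(n, StandardError(0.5, n))
    }
}
```

With one million trials, the worst-case standard error is 0.0005, so the printed probabilities should be stable to roughly three decimal places. This measures only the randomness of the simulation itself, not the accuracy of the underlying polls.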

Now that all its inputs are set, we call SimulateMultipleElections() and store the resulting probabilities of each candidate winning (and the probability of a tie). We then print these probabilities to the console.

STOP: After completing main.go with the code below, we are now ready to run our code. In a terminal, navigate to our go/src/election directory. Compile all of the code in this directory using go build, and then execute the command ./election (Mac) or election.exe (Windows). What do you find? Is it what you expected?

func main() {
    fmt.Println("Simulating the 2016 US Presidential election.")

    electoralVoteFile := "data/electoralVotes.csv"
    pollFile := "data/earlyPolls.csv"

    electoralVotes := ReadElectoralVotes(electoralVoteFile)
    polls := ReadPollingData(pollFile)

    numTrials := 1000000
    marginOfError := 0.05

    probability1, probability2, probabilityTie := SimulateMultipleElections(polls, electoralVotes, numTrials, marginOfError)

    fmt.Println("Estimated probability of a candidate 1 win is", probability1)
    fmt.Println("Estimated probability of a candidate 2 win is", probability2)
    fmt.Println("Estimated probability of a tie is", probabilityTie)
}

When we run our code, we obtain a surprising result: Clinton wins 99.9% of the simulations!

Perhaps our simulation is too confident. Let us therefore increase the margin of error to 10%, which produces a very conservative simulation: even if a candidate is polling at 60% in a state, this margin of error implies that there is still a 5% chance of the true value being either greater than 70% or less than 50%; that is, there is a 2.5% chance that the other candidate is actually leading.
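The chance that a leading candidate is actually trailing can also be computed directly under the model that AddNoise() uses, in which the adjusted poll is normally distributed with standard deviation marginOfError/2. The sketch below does so with a hypothetical helper TrailingProbability(), which is not part of the code along, by expressing the normal cumulative distribution function through math.Erfc.

```go
package main

import (
    "fmt"
    "math"
)

// TrailingProbability is a hypothetical helper (not part of the code along).
// Under the model used by AddNoise, the adjusted poll is normally distributed
// with mean poll and standard deviation marginOfError/2. The function returns
// the probability that the adjusted poll falls below 0.5.
func TrailingProbability(poll, marginOfError float64) float64 {
    sigma := marginOfError / 2
    z := (0.5 - poll) / sigma
    // Standard normal CDF expressed through the complementary error function.
    return 0.5 * math.Erfc(-z/math.Sqrt2)
}

func main() {
    // A candidate polling at 60% with a 10% margin of error.
    fmt.Println(TrailingProbability(0.6, 0.1))
}
```

The result is roughly 2.3%, a little under the 2.5% rule of thumb, because halving the margin of error corresponds to two standard deviations rather than the 1.96 that captures exactly 95% of a normal distribution.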

func main() {
    fmt.Println("Simulating the 2016 US Presidential election.")

    electoralVoteFile := "data/electoralVotes.csv"
    pollFile := "data/earlyPolls.csv"

    electoralVotes := ReadElectoralVotes(electoralVoteFile)
    polls := ReadPollingData(pollFile)

    numTrials := 1000000
    marginOfError := 0.1

    probability1, probability2, probabilityTie := SimulateMultipleElections(polls, electoralVotes, numTrials, marginOfError)

    fmt.Println("Estimated probability of a candidate 1 win is", probability1)
    fmt.Println("Estimated probability of a candidate 2 win is", probability2)
    fmt.Println("Estimated probability of a tie is", probabilityTie)
}

Even after we increase the margin of error, however, Clinton’s dominance over the simulation is still pronounced, as she wins 98.7% of simulations.

Perhaps Clinton simply had an early lead. To test this hypothesis, let us change pollFile, the input to ReadPollingData(), to "data/conventions.csv", and then compile and run our simulation again.

func main() {
    fmt.Println("Simulating the 2016 US Presidential election.")

    electoralVoteFile := "data/electoralVotes.csv"
    pollFile := "data/conventions.csv"

    electoralVotes := ReadElectoralVotes(electoralVoteFile)
    polls := ReadPollingData(pollFile)

    numTrials := 1000000
    marginOfError := 0.1

    probability1, probability2, probabilityTie := SimulateMultipleElections(polls, electoralVotes, numTrials, marginOfError)

    fmt.Println("Estimated probability of a candidate 1 win is", probability1)
    fmt.Println("Estimated probability of a candidate 2 win is", probability2)
    fmt.Println("Estimated probability of a tie is", probabilityTie)
}

Clinton’s lead has actually widened — she now wins 99.3% of the simulations!

STOP: Verify that the lead gets even wider when we change the input to ReadPollingData() to "data/debates.csv".

These simulations should give us pause. Even though our simulation seems conservative, it predicts a Clinton victory very confidently, and more confidently than major media outlets, whose forecasts gave Clinton a 60 to 90 percent chance of winning. To understand why our simulation is so confident, we will reflect in an epilogue on the assumptions of our model and the inherent difficulties of simulating an election from polls.

Check your work from the code along

We now provide autograders in the window below (or via a direct link) allowing you to check your work for the following functions:

AddNoise()
SimulateOneElection()
SimulateMultipleElections()
