The U.S. presidential election and the dreaded electoral college
Polls are inherently volatile. Because we are only able to sample a tiny percentage of the voting population, we can only estimate the true percentage of voters favoring a candidate. For this reason, most polls include a margin of error, or a percentage value on either side of the polling percentage. For example, a poll may indicate 51% support for one candidate with a 3% margin of error, in which case the poll projects with some degree of confidence (say, 95% confidence) that the true percentage supporting that candidate is between 48% and 54%.
We will take a moment to review how the U.S. presidential election works. Each of the 50 states, along with the District of Columbia (D.C.), receives two fixed electoral votes and a number of additional of electoral votes that are directly related to their population. With two exceptions (Maine and Nebraska), each state is winner take all; the candidate who receives the most votes in a state earns all of the electoral votes in that state. There are 538 total votes (hence FiveThirtyEight’s name), and so a candidate must win 270 votes to win the presidency.
The actual election process is more complicated, since electoral votes are used to “elect” party members from the winning candidate’s party. This is by design — the Electoral College was created to provide a final barrier to prevent Americans from electing a highly undesirable candidate. In 2016, a record seven of the 538 electors defected (five Democrats and two Republicans), all of whom voting for someone who was not running in the election.
Monte Carlo simulation
Because the presidential election is determined by the Electoral College, it makes sense to use state polling data to forecast the election. Our plan is therefore to simulate the election state-by-state many times, determining who wins the most Electoral College votes each time (or whether there was a tie). At the end, we can estimate our confidence in a candidate winning the election as the fraction of all simulations that candidate wins (see figure below).
A key point is that we must allow for some variation in our simulations by incorporating randomness, or else the simulations will all turn out the same. The principle of using a large number of randomized simulations to provide an estimate of a desired result is called Monte Carlo simulation.
Monte Carlo simulation is ubiquitous in practical applications. When your phone tells you that there is a 40% chance of rain at 3 PM, it is because meteorologists have run many simulations and found that it is raining at 3 PM in approximately 40% of these simulations. If you want to gain an edge in a daily fantasy sports contest, you might run many Monte Carlo simulations based on previous performance to identify a strong lineup of players. And for another political example, if you wanted to prove that legislative districts have been drawn unfairly, then you could draw many political maps randomly and compare how many districts favor one party in the simulations and in reality, an argument made by a Carnegie Mellon professor that led to a 2018 redistricting of Pennsylvania (see figures below).
In this chapter, our goal is to apply Monte Carlo simulation to implement and interpret our own election forecast algorithm. But before we get ahead of ourselves, let’s learn a bit more about Monte Carlo simulation by applying it to a simpler example.
An example of Monte Carlo simulation: estimating the house edge of craps
ComputeCrapsHouseEdge(numTrials) count ← 0 for numTrials total trials outcome ← PlayCrapsOnce() if outcome = true count ← count + 1 else count ← count − 1 return count/numTrials
PlayCrapsOnce() firstRoll ← SumTwoDice() if firstRoll = 2, 3, or 12 return false (player loses) else if firstRoll = 7 or 11 return true (player wins) else while true newRoll ← SumTwoDice() if newRoll = firstRoll return true else if newRoll = 7 return false