1. Introduction
Numerous models aim to forecast nationwide popular vote (PV) in the USA presidential elections. They derive it from economic data, foreign policy, ethnic composition and other factors: the book [4] and articles [2, 5, 7, 12, 13]. In the USA, uniquely among developed nations, the winner is decided by the Electoral College (EC), rather than PV; see [9].
The EC consists of 538 people, with each of the 50 states delegating a number equal to its House and Senate members combined, and Washington, District of Columbia (DC) delegating 3 members. (DC is not a state, and it has no voting representatives in the House or the Senate.) For each state and DC, its members of the EC must vote for the winner of the popular vote in the state. (Exceptions are Maine and Nebraska; see Section 2.) The current two-party system of Democrats and Republicans dates back to the Civil War era, and third-party candidates have never won a presidential election; only on a few occasions have they won some EC votes. A candidate who wins 270 or more Electoral College votes is declared president. If there is a tie, with each major-party candidate winning exactly 269 votes, then the decision is deferred to the House of Representatives.
This peculiarity of the election system in the USA moved to the forefront in 2000, which was the first time since 1888 that the winner of the Electoral College lost the popular vote (that is, got a lower % of votes than the other major-party candidate). Such a possibility was discussed in the book [1], published in 1991. In 2016, the same situation repeated. Both times, the winners of the EC were Republicans. This stirred a debate in the media about whether the Electoral College system currently has a built-in bias against Democrats. It is impossible to do full justice to this literature; let us mention as examples [6, 10, 11, 15, 18, 19, 21, 22].
Another feature of the USA electoral system is the division of states into ‘red’, ‘blue’, and ‘purple’. A ‘red’ state has a very high probability of being won by Republicans, so that its EC members will vote for Republicans by a large margin. A ‘blue’ state is the opposite: It is very likely to be won by Democrats, and the margin is likely to be large. ‘Purple’ states, otherwise known as ‘swing’ or ‘battleground’ states, have a significant probability of being won by either party. This classification is somewhat informal, and there are no generally accepted thresholds. In particular, political analysts often split states into more than three categories: ‘Solid Democrat’, ‘Likely Democrat’, ‘Lean Democrat’, ‘Swing’, ‘Lean Republican’, ‘Likely Republican’, and ‘Solid Republican’.
In the election season, American and international media often feature maps of the USA with states colored in various shades of red, blue, and purple, as election predictions. For example, California is considered a solidly blue state: It was won by Democrats in every presidential election since 1992, and the margin is now overwhelming (over 20%). Conversely, Texas is considered a solidly red state.
The Cook Political Report (www.cookpolitical.com) created a somewhat more formal version of this classification: the Cook Partisan Voting Index, based on the last two presidential elections, which compares statewide and nationwide election results. The popular web site 538 (www.fivethirtyeight.com) uses a similar version of this index. However, we would like to make this research more formal and statistically rigorous. We would like to use not only presidential elections, but also other federal elections: House and Senate.
We classify states into solid R, lean R, swing, lean D, and solid D. We define the partisan lean of each state and model its time evolution, as well as its correlation with the nationwide popular vote percentages. We use not only presidential elections (which happen every 4 years in the USA), but also House and Senate elections (which happen every 2 years). We collect publicly available data from the House Clerk, the Federal Election Commission (FEC), and Wikipedia, starting from 1992. See [15, 16] on the political evolution of states.
For each of these elections (House, Senate, presidential), take $D$ and $R$, the numbers of votes for the two major parties, and compute the quantity $\ln(D/R)$ for each state. Then regress it upon the nationwide $\ln(D/R)$ and the year of election. We obtain both point estimates and a Bayesian posterior, given a noninformative Jeffreys prior. Then we simulate the 2020 and 2024 EC given four different scenarios of PV:

Even PV;

2016 PV;

2008 PV;

2004 PV.
We remark that in 2024, the EC will be different. Texas and Florida are projected to gain 3 and 2 votes respectively; Arizona, Colorado, Montana, North Carolina, Oregon each are projected to gain 1 vote, New York state is projected to lose 2 votes, and Alabama, Illinois, Missouri, Minnesota, Ohio, Pennsylvania, Rhode Island, West Virginia will each lose 1 vote.
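Since the Electoral College has a fixed size, the projected gains and losses quoted above should cancel out. A minimal check, using only the numbers from the preceding paragraph (the dictionary names are ours, for illustration):

```python
# Projected changes in Electoral College votes for 2024, as quoted in the text.
gains = {"Texas": 3, "Florida": 2, "Arizona": 1, "Colorado": 1,
         "Montana": 1, "North Carolina": 1, "Oregon": 1}
losses = {"New York": 2, "Alabama": 1, "Illinois": 1, "Missouri": 1,
          "Minnesota": 1, "Ohio": 1, "Pennsylvania": 1,
          "Rhode Island": 1, "West Virginia": 1}

total_gained = sum(gains.values())
total_lost = sum(losses.values())
net_change = total_gained - total_lost  # should be zero if the total stays fixed
print(total_gained, total_lost, net_change)
```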
We also simulate past elections in 2012 and 2016, given even PV. This allows us to see whether the EC is biased towards Republicans (this is true if the win probability for Democrats is significantly less than 50%). This is actually true for 2012 and 2016, but not for 2020 and 2024.
Finally, we investigate which swing states are more important for winning, given each scenario. That is, which of them can be most easily swung given additional resources. (The answer is North Carolina, Pennsylvania, and Florida, for most scenarios.)
We stress that our research is not a prediction of 2020 elections; for this, we need to use polls, fundraising, endorsements, economic indicators, and other available data. Among many books on this topic, see [20].
1.1. Organization
In Section 2, we describe the data collection and organization, including some special data points which we had to modify for consistency. In Section 3, we introduce our linear regression model. Section 4 is devoted to fitting Bayesian linear regression for each state; and Section 5 contains simulations of the Electoral College and related discussions. Section 6 concludes the article and proposes future research.
1.2. Acknowledgements
I am thankful to my undergraduate students Jaucelyn Canfield and Franklin Fuchs for collecting the data and helping me code the program. I am also thankful to my undergraduate student Akram Reshad for pointing me to the data source, the FEC, which made data collection much easier. I would also like to thank Professors Aleksey Kolpakov and Thomas Kozubowski for useful discussion and for pointing out relevant literature.
2. Election Data
2.1. Data description
We collect statewide % of popular vote for each of the two major parties, for each of the 50 states for each presidential, House, and Senate election since 1992.
A House election happens every two years. Each state is split into several congressional districts, for each of which a representative for the House is elected for two years. The state with the most districts is California (53). Several low-population states (Alaska, Wyoming, and others) have only one congressional district, called an at-large district. In other states, districts are numbered: California 1, …, California 53; and similarly for other states.
For each state, we sum votes in all congressional districts of this state, for each major party. Then we divide these two numbers by the overall popular vote (for all major and minor parties together) in this state. The number of such congressional districts and their shapes are determined by the population of the state relative to the USA. Every 10 years, after the Census, the number and the shape of these districts are recalculated. In particular, after the coming 2020 Census, Texas is projected to gain 3 congressional districts.
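The aggregation from districts to a statewide share can be sketched as follows. This is only an illustration: the function name, the tuple layout of the input, and the numbers for the made-up state are our assumptions, not the format of the actual data files.

```python
from collections import defaultdict

def statewide_shares(district_rows):
    """district_rows: iterable of (state, dem_votes, rep_votes, total_votes),
    one tuple per congressional district (hypothetical layout).
    Returns {state: (dem_share, rep_share)} as fractions of the total vote
    cast for all parties in the state."""
    sums = defaultdict(lambda: [0, 0, 0])
    for state, dem, rep, total in district_rows:
        sums[state][0] += dem
        sums[state][1] += rep
        sums[state][2] += total
    return {state: (d / t, r / t) for state, (d, r, t) in sums.items()}

# Made-up numbers for a hypothetical two-district state.
rows = [("Exampleland", 120_000, 90_000, 220_000),
        ("Exampleland", 80_000, 110_000, 200_000)]
shares = statewide_shares(rows)
print(shares["Exampleland"])
```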
Every state has two senators, who are elected statewide. Each senator has a six-year term. All 100 senators in the Senate are split roughly evenly into three so-called Classes: Class I, Class II, and Class III, which determine their election years. This means that each state has Senate elections in two out of every three even years.
The President of the USA is elected every four years: 1992, 1996, 2000, etc. As described in the Introduction, each state is assigned EC members, equal in number to the total sum of House and Senate members. For the 48 states other than Maine and Nebraska, these EC members vote for the popular vote winner in this state.
There is an issue of faithless electors, who break this rule. However, so far there have been only very few such electors, and this has not influenced the outcome. It is hard to model the behavior of potential faithless electors, and we shall not attempt this.
Maine and Nebraska use a hybrid system: They assign two EC votes to the statewide winner, and other EC votes to winners of congressional districts. This can lead to splits, for example 2008, Nebraska 2 vs the state of Nebraska; 2016, Maine 2 vs the state of Maine. However, the boundaries of the districts change with each redistricting, and thus we cannot compare the same district in different years. For the purposes of this article, we simply assume that Maine and Nebraska assign all EC votes to the statewide winner. This will introduce an error to our analysis, but it is of order 1 electoral vote, which is not much.
Finally, we have a benchmark: the nationwide popular vote. For a Presidential election, this is self-explanatory. For a House election, this is the sum of votes in all 435 congressional districts. A Senate election is not nationwide (as explained above, each state has Senate elections in 2 out of every 3 even years). Thus we use the nationwide House PV of the same year.
We do not model DC because it overwhelmingly voted Democratic in recent elections.
2.2. Data sources
The GitHub repository ElectoralCollegeVsPopularVote of the user asarantsev contains data in data.xlsx: sheet RawData contains numbers or % of votes $D$ and $R$ in each state in each election; sheet Logarithms contains $\ln(D/R)$; sheet EC contains EC votes in each state for 2012–2020 (current) and 2024 (future), as well as state population in the 2000 and 2010 Census, and the Cook Partisan Voting Index (we compare it with our own version of the partisan lean index).
For 2018, we take the House and the Senate data from Wikipedia pages, which exist for every state, as well as for the whole nation. For 2000–2016, the data is taken from the FEC: https://transition.fec.gov/pubrec/electionresults.shtml in Excel:

2004, 2008, 2012, 2016: Tables 2, 6, 7;

2006, 2010, 2014: Tables 4, 5;

2002: Tables 2, 3;

2000: Sheets 2, 4, 5.
For 1992–1998, the data is taken from the House Clerk web page:
2.3. Special elections
There are some elections for which we cannot literally take available data, but have to modify it.

A top-two primary system has the two top vote-getters in the primary advance to the general election, regardless of party. Such a system has been used in Washington House and Senate elections since 2008, and in California House and Senate elections since 2014. In the 2016 and 2018 California Senate elections, this led to both general election candidates being Democrats. For these Senate races, we use the primary election, summing votes for all candidates from the two major parties.

For House races, same-party runoffs happened, too, but in fewer than half of the districts in each state, for each election. Since the general election has a much higher turnout than the primary election, we consider the general election in this case more representative. In this research, we set the following rule: We ignore a House election if in at least half of the districts there was no candidate from either Democrats or Republicans.

Louisiana has a system similar to California and Washington: There, the November election takes the form of a jungle primary: All candidates run together, not separated by party. If no candidate gets more than 50% of the vote, a runoff election is held in December. We sum votes for all candidates from the two major parties in both the jungle primary and the runoff, if it happens, and divide these by the total vote.

We ignore Louisiana House elections for 1996, 1998, 2002, 2012, 2014, and 2016: In each of these years, in at least half of the districts, runoff elections were one-party.

We treat Bernie Sanders of Vermont and Angus King of Maine as Democrats. Bernie Sanders took part in House elections for the Vermont at-large district in 1992–2004 and in Senate elections for Vermont in 2006, 2012, and 2018. Angus King took part in Senate elections for Maine in 2012 and 2018. For each of these elections, we sum the votes of Bernie Sanders or Angus King and a Democrat in the same race, if such a Democrat existed; and assign this percentage to Democrats.
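The rule for discarding a state's House election (at least half of the districts lacking a candidate from one of the two major parties) can be written as a small predicate. This is a sketch under our reading of the rule; the function name and the boolean-pair input format are hypothetical.

```python
def keep_house_election(districts):
    """districts: list of (has_democrat, has_republican) booleans, one pair
    per congressional district of the state (hypothetical input format).
    Returns False when at least half of the districts lack a candidate
    from one of the two major parties, so the election is ignored."""
    missing = sum(1 for has_d, has_r in districts if not (has_d and has_r))
    return 2 * missing < len(districts)

# Four districts, three of them without a Republican: ignored.
print(keep_house_election([(True, False), (True, False), (True, True), (True, False)]))
# Ten districts, two of them one-party: kept.
print(keep_house_election([(True, True)] * 8 + [(True, False), (False, True)]))
```

With this convention, a two-district state with a single uncontested district is also ignored, since one out of two is exactly half.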
There are some other elections which we have to ignore.

The 2016 House election in the Vermont at-large district had Peter Welch as the candidate of both major parties: He is a Democrat, but won the Republican primary on write-in votes.

2006 Senate election in Connecticut featured incumbent Joe Lieberman running (and winning) as an independent, because he lost the Democratic primary. Same applies to Lisa Murkowski in 2010, Alaska Senate race: She lost the Republican primary.

The following Senate elections did not have a Democratic candidate: 2010, South Dakota; 2002, Virginia; 2002, Mississippi; 2006, Indiana; 2000, Arizona; 2014, Alabama; 2004, Idaho; 2014, Kansas. In each case, there was no opposition, or other candidates were not ideologically similar to Democrats.

The following Senate elections did not have a Republican candidate: 2002, Massachusetts; 2008, Arkansas. In each case, there was no opposition, or other candidates were not ideologically similar to Republicans.

The following House elections did not have a Republican candidate in at least half of the districts in the state: 2008, Vermont at-large; 2008, Arkansas 1, 2, 4; 2006, Rhode Island 2; 1996 and 1998, West Virginia 1, 3; Massachusetts, 2000–2008, 2014, 2016.

The following House elections did not have a Democratic candidate in at least half of the districts in the state: 2016, Arkansas 1, 3, 4; 1998, Nevada 2; 2012, Kansas 1, 3; 2002, Nebraska 1, 3.
3. Regression Model
We index election years 1992–2018 by a time variable $t$, for convenience, chosen so that the latest election year (2018, as of this article) is $t = 0$.
Let $S$ be the set of 50 states. Let $E$ be the set of all 35 House, Senate, and President elections. For each election $e \in E$, we denote its year by $t_e$, and the party vote percentages nationwide by $D_e$ and $R_e$. Recall that we use House nationwide popular vote percentages for the corresponding Senate election. For each state $s \in S$, we denote the elections used in this state by $E_s \subseteq E$. For a state $s$ and an election $e \in E_s$, we denote by $D_{s,e}$ and $R_{s,e}$ the percentages of the statewide vote for Democrats and Republicans. The corresponding logarithms of ratios are denoted by
$$L_e = \ln\frac{D_e}{R_e}, \qquad L_{s,e} = \ln\frac{D_{s,e}}{R_{s,e}}.$$
We consider the following linear regression:
$$L_{s,e} = \alpha_s + \beta_s L_e + \gamma_s t_e + \sigma_s\varepsilon_{s,e}, \qquad \varepsilon_{s,e} \sim \mathcal{N}(0, 1) \ \text{i.i.d.} \tag{1}$$
Here, $\alpha_s$ is the current partisan lean, $\beta_s$ is the elasticity of the state: its responsiveness to changes in the national environment (measured by $L_e$), and $\gamma_s$ is the rate of increase or decrease of the partisan lean. The two terms $\alpha_s + \gamma_s t_e$ together make up the partisan lean of this state in election $e$. We borrow terms from quantitative finance: A stock's alpha is the excess return compared with the whole market, and its beta is its sensitivity to changes in the whole market. We find point estimates for $(\alpha_s, \beta_s, \gamma_s, \sigma_s)$ for each state $s$.
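For one state, the point estimates come from an ordinary least squares fit of the statewide log ratio on an intercept, the nationwide log ratio, and the election time. A minimal pure-Python sketch follows. The function names are ours, the time index is assumed to be 0 for the latest election, and estimating the standard error with the divisor (number of elections minus 3) is an assumption rather than something stated in the text.

```python
import math

def solve_3x3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x

def fit_state(national_log_ratio, time_index, state_log_ratio):
    """OLS fit of y = alpha + beta * L + gamma * t for one state.
    Returns (alpha, beta, gamma, sigma); sigma uses the divisor n - 3 (assumed)."""
    rows = [(1.0, l, t) for l, t in zip(national_log_ratio, time_index)]
    n = len(rows)
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, state_log_ratio)) for i in range(3)]
    alpha, beta, gamma = solve_3x3(xtx, xty)
    rss = sum((y - alpha - beta * l - gamma * t) ** 2
              for (_, l, t), y in zip(rows, state_log_ratio))
    return alpha, beta, gamma, math.sqrt(rss / (n - 3))

# Synthetic, noise-free data: the fit should recover the generating coefficients.
L = [0.10, -0.05, 0.02, 0.08, -0.10]
t = [-4, -3, -2, -1, 0]
y = [0.3 + 1.2 * l - 0.02 * ti for l, ti in zip(L, t)]
alpha, beta, gamma, sigma = fit_state(L, t, y)
print(round(alpha, 6), round(beta, 6), round(gamma, 6))
```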
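Next, we simulate national elections. We make predictions for 2020 and 2024, as well as back-simulations for 2012 and 2016, each with its own value of $t$. We fix nationwide PV percentages $D$ and $R$, and compute $L = \ln(D/R)$. Democrats win state $s$ if
$$\alpha_s + \beta_s L + \gamma_s t + \sigma_s\varepsilon_s > 0, \tag{2}$$
which has probability $p_s = \Phi\left(\frac{\alpha_s + \beta_s L + \gamma_s t}{\sigma_s}\right)$, where $\Phi$ is the CDF of $\mathcal{N}(0, 1)$:
$$\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-u^2/2}\,\mathrm{d}u.$$
Then we simulate each state, independently of others, and sum the corresponding EC votes of Democrats. Repeating this simulation many times, we get the distribution of EC votes. This, in turn, gives us the probability that Democrats win (getting more than 269 EC votes).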
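The simulation step can be sketched as a simple Monte Carlo loop. The code below is illustrative only: the function names, the tuple layout, and the `base_dem_votes` argument (a hypothetical way to account for votes that are not simulated, such as DC) are our assumptions, and the parameters are treated as fixed rather than drawn from a posterior.

```python
import random
from statistics import NormalDist

def dem_win_probability(alpha, beta, gamma, sigma, national_log_ratio, t):
    """Probability that alpha + beta*L + gamma*t + sigma*eps > 0 for standard normal eps."""
    return NormalDist().cdf((alpha + beta * national_log_ratio + gamma * t) / sigma)

def simulate_ec(states, national_log_ratio, t, n_sims=10_000, base_dem_votes=0, seed=0):
    """states: list of (alpha, beta, gamma, sigma, ec_votes) tuples (hypothetical layout).
    Each state is decided independently of the others. Returns the fraction of
    simulations in which Democrats get more than 269 EC votes."""
    rng = random.Random(seed)
    probs = [(dem_win_probability(a, b, g, s, national_log_ratio, t), ec)
             for a, b, g, s, ec in states]
    wins = 0
    for _ in range(n_sims):
        dem_votes = base_dem_votes + sum(ec for p, ec in probs if rng.random() < p)
        if dem_votes > 269:
            wins += 1
    return wins / n_sims

# Toy example with three made-up "states" that together hold 538 votes.
toy = [(0.5, 1.0, 0.0, 0.2, 200), (-0.5, 1.0, 0.0, 0.2, 200), (0.0, 1.0, 0.0, 0.2, 138)]
print(simulate_ec(toy, national_log_ratio=0.0, t=0))
```

In the toy example the first two states are nearly certain, so the outcome hinges on the third one, which is a coin flip at an even national vote.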
4. Parameter Estimates
4.1. Point estimates
For each state $s$, estimates of $(\alpha_s, \beta_s, \gamma_s, \sigma_s)$ from (1), the number of elections, and current EC votes $N_s$ are in Table 1, taken from output.csv.

The reddest state in 2018, measured by $\alpha_s$, is Wyoming.

The bluest state in 2018 is Hawaii.

For 2020, the results above stay the same, measured by $\alpha_s + \gamma_s t$ at the 2020 value of $t$.

The most neutral state (with $\alpha_s$ closest to $0$) in 2018 is Nevada.

The most rapidly blueing state is Vermont.

The most rapidly reddening state is North Dakota.

The state with the least change rate is Massachusetts.

The state which is most sensitive to the national environment is Alaska.

The least sensitive state is Hawaii.

The states with the highest and lowest $\sigma_s$ are Delaware and Washington, respectively.
State  $\alpha_s$  $\beta_s$  $\gamma_s$  $\sigma_s$  EC votes  # data

Alabama  0.778  0.864  0.057  0.208  9  29 
Alaska  0.480  2.334  0.032  0.508  3  30 
Arizona  0.196  1.205  0.018  0.193  11  30 
Arkansas  0.572  0.329  0.067  0.291  6  27 
California  0.562  0.664  0.038  0.119  55  31 
Colorado  0.115  1.282  0.029  0.179  9  30 
Connecticut  0.509  0.885  0.018  0.411  7  31 
Delaware  0.328  2.305  0.035  0.773  3  29 
Florida  0.215  1.677  0.010  0.283  29  31 
Georgia  0.315  0.983  0.013  0.142  16  30 
Hawaii  0.966  0.030  0.022  0.344  4  31 
Idaho  0.817  1.229  0.022  0.403  4  30 
Illinois  0.329  1.173  0.011  0.200  20  30 
Indiana  0.235  0.944  0.004  0.288  11  30 
Iowa  0.083  1.414  0.018  0.317  6  30 
Kansas  0.577  0.939  0.004  0.284  6  29 
Kentucky  0.576  1.061  0.039  0.209  8  30 
Louisiana  0.820  0.975  0.097  0.517  8  30 
Maine  0.159  0.574  0.003  0.522  4  30 
Maryland  0.599  0.675  0.022  0.208  10  31 
Massachusetts  1.045  0.212  0.016  0.585  11  29 
Michigan  0.128  0.788  0.001  0.167  16  30 
Minnesota  0.189  0.825  0.003  0.159  10  30 
Mississippi  0.440  0.621  0.024  0.372  6  29 
Missouri  0.242  0.660  0.009  0.316  10  31 
Montana  0.307  1.017  0.017  0.365  3  30 
Nebraska  0.702  1.854  0.008  0.480  5  30 
Nevada  0.007  0.795  0.007  0.239  6  30 
New Hampshire  0.039  1.54  0.023  0.215  4  30 
New Jersey  0.279  0.599  0.018  0.118  14  30 
New Mexico  0.367  0.862  0.034  0.295  5  30 
New York  0.746  0.736  0.039  0.208  29  31 
North Carolina  0.08  0.894  0.002  0.118  15  30 
North Dakota  0.516  0.797  0.068  0.493  3  31 
Ohio  0.209  1.555  0.008  0.146  18  31 
Oklahoma  0.912  1.120  0.049  0.194  7  30 
Oregon  0.368  0.344  0.015  0.236  7  30 
Pennsylvania  0.127  0.903  0.017  0.222  20  32 
Rhode Island  0.623  0.835  0.008  0.462  4  29 
South Carolina  0.403  1.243  0.009  0.205  9  31 
South Dakota  0.535  1.676  0.055  0.502  3  29 
Tennessee  0.595  0.977  0.043  0.218  11  30 
Texas  0.445  0.952  0.017  0.173  38  30 
Utah  0.745  0.596  0.02  0.248  6  31 
Vermont  0.836  0.656  0.043  0.565  3  29 
Virginia  0.003  1.363  0.016  0.214  13  29 
Washington  0.312  0.975  0.015  0.105  12  31 
West Virginia  0.543  0.5  0.138  0.576  5  30 
Wisconsin  0.038  1.036  0.001  0.218  10  31 
Wyoming  1.197  1.243  0.073  0.314  3  30 
State  $\alpha_s$  $\beta_s$  $\gamma_s$

California  (0.486, 0.590)  (0.594, 1.062)  (0.027, 0.040) 
Nevada  (0.140, 0.139)  (0.069, 1.396)  (0.011, 0.025) 
Texas  (0.543, 0.333)  (0.469, 1.392)  (0.029, 0.003) 
Table 2. 90% Confidence Intervals for Regression Parameters
4.2. Size effect
Let $P_s^{(2000)}$ and $P_s^{(2010)}$ be the Census population of the state $s$ in 2000 and 2010, respectively. We compute the Pearson correlation coefficient and the $R^2$ for the regression of $\sigma_s$ vs state population, for each of the two Census years. We see that small states have, on average, larger standard error.
4.3. Comparison with Cook Partisan Voting Index
The current partisan lean of the state $s$ is $\alpha_s$. We ran a linear regression of these 50 values of $\alpha_s$ vs the Cook Partisan Voting Index (based on the last two presidential elections, 2012 and 2016). This dependence is very strong, as measured by both $R^2$ and the Pearson correlation. However, we believe that $\alpha_s$ is a better index, since it includes more data than the last two presidential elections.
4.4. Deficiencies of the regression normal model
Each state has only a few observations, at most 31 (14 House + 7 President + 10 Senate elections). Thus we cannot assume that the estimates of $\alpha_s$, $\beta_s$, and $\gamma_s$ for each state are very precise. In particular, confidence intervals are large. We computed them for confidence level 90%. They are in the file output.csv. As we see, they are large, particularly for $\beta_s$.
Also, in output.csv we have $p$-values for the Shapiro–Wilk normality test for regression residuals of each of the 50 states; and 21 of them have $p < 0.05$. Therefore the residuals are not normal (5% of 50 is 2.5, which is far less than 21).
4.5. Bayesian analysis
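To account at least to some extent for these model deficiencies, we use Bayesian linear regression. For background, see the textbook [14]; we also found [3, Chapter 4] useful. Assume a noninformative prior for each state $s$:
$$\pi(\alpha_s, \beta_s, \gamma_s, \sigma_s^2) \propto \frac{1}{\sigma_s^2}. \tag{3}$$
The symbol $\propto$ stands for "proportional to". The term noninformative refers to the fact that we do not use any information (such as an empirical mean) in the distribution (3). This prior is improper:
$$\int \frac{1}{\sigma_s^2}\,\mathrm{d}\alpha_s\,\mathrm{d}\beta_s\,\mathrm{d}\gamma_s\,\mathrm{d}\sigma_s^2 = \infty.$$
That is, the integral of this density over all possible values is infinite. This seems to be a contradiction in terms: A probability distribution must integrate (or sum, if it is a discrete distribution) to $1$. But we can still use the prior from (3) in Bayesian analysis.
The next step is to compute the likelihood of the observed $L_{s,e}$, $e \in E_s$, given the parameters. This is a product of Gaussian densities. We get the posterior distribution from Bayes' formula:
$$p(\alpha_s, \beta_s, \gamma_s, \sigma_s^2 \mid \text{data}) \propto \pi(\alpha_s, \beta_s, \gamma_s, \sigma_s^2)\, p(\text{data} \mid \alpha_s, \beta_s, \gamma_s, \sigma_s^2). \tag{4}$$
From the choice of the prior in (3), the posterior distribution is already known explicitly, see for example [14, Chapter 9] or [3, (4.8)–(4.10)]. Let us introduce the following notation:

$\chi^{-2}(\nu, s^2)$, the scaled inverse $\chi^2$ distribution with $\nu$ degrees of freedom, scale parameter $s^2$, and density (with $\Gamma$ the Gamma function):
$$f(x) = \frac{(\nu s^2/2)^{\nu/2}}{\Gamma(\nu/2)}\, x^{-\nu/2 - 1} \exp\left(-\frac{\nu s^2}{2x}\right), \quad x > 0;$$

$n_s$, the number of elections for state $s$;

an $n_s \times 3$ matrix $X_s = [\mathbf{1} \mid \mathbf{L} \mid \mathbf{T}]$, where $\mathbf{1}$ is a vector of $n_s$ unit numbers, $\mathbf{L}$ is the vector of the quantities $L_e$ for each election valid for the state $s$, and $\mathbf{T}$ is the vector of the times $t_e$ of these elections.
Then the posterior distribution from (4), as known from [14, 3], is
$$\sigma_s^2 \sim \chi^{-2}(n_s - 3, \hat\sigma_s^2), \qquad (\alpha_s, \beta_s, \gamma_s) \mid \sigma_s^2 \sim \mathcal{N}_3\left((\hat\alpha_s, \hat\beta_s, \hat\gamma_s),\ \sigma_s^2 (X_s^{T} X_s)^{-1}\right), \tag{5}$$
where the hats denote the point estimates of Section 4.1.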
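Sampling from a posterior of this type takes two steps: draw the variance from a scaled inverse chi-square distribution, then draw the coefficients from a normal distribution whose covariance is that variance times the inverse of $X^T X$. The sketch below assumes exactly this standard form (the one given in the cited textbooks for a prior proportional to $1/\sigma^2$); the function names and arguments are ours.

```python
import math
import random

def cholesky(a):
    """Lower-triangular C with C C^T = a, for a symmetric positive definite matrix."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(c[i][k] * c[j][k] for k in range(j))
            c[i][j] = math.sqrt(s) if i == j else s / c[j][j]
    return c

def sample_posterior(xtx, coef_hat, s2, n_obs, rng):
    """One posterior draw (coefficients, sigma), assuming
    sigma^2 ~ scaled inverse chi-square(n_obs - k, s2) and
    coefficients | sigma^2 ~ Normal(coef_hat, sigma^2 * inverse(xtx))."""
    k = len(coef_hat)
    dof = n_obs - k
    chi2 = rng.gammavariate(dof / 2.0, 2.0)  # chi-square draw with dof degrees of freedom
    sigma = math.sqrt(dof * s2 / chi2)       # scaled inverse chi-square draw for sigma^2
    c = cholesky(xtx)
    z = [rng.gauss(0.0, 1.0) for _ in range(k)]
    u = [0.0] * k
    for i in reversed(range(k)):             # solve C^T u = z, so that Cov(u) = inverse(xtx)
        u[i] = (z[i] - sum(c[j][i] * u[j] for j in range(i + 1, k))) / c[i][i]
    return [b + sigma * ui for b, ui in zip(coef_hat, u)], sigma

# Made-up inputs: 30 elections, diagonal X^T X, point estimates (0.1, 1.0, 0.02).
rng = random.Random(1)
xtx = [[30.0, 0.0, 0.0], [0.0, 30.0, 0.0], [0.0, 0.0, 30.0]]
draws = [sample_posterior(xtx, [0.1, 1.0, 0.02], 0.04, 30, rng) for _ in range(2000)]
mean_alpha = sum(coefs[0] for coefs, _ in draws) / len(draws)
print(round(mean_alpha, 2))
```

Solving the triangular system instead of inverting the matrix avoids an explicit matrix inverse, which is a design choice of this sketch and not something prescribed by the text.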
One can show that the posterior (unconditional) marginal distribution of $(\alpha_s, \beta_s, \gamma_s)$ is multivariate Student, with heavy tails. Simulated parameters for Nevada are given in Figure 2. One can compute 90% confidence intervals from the corresponding Student quantiles. These coincide with the confidence intervals from the file output.csv, given in Table 2 for three out of 50 states.
5. Electoral College Simulations
We simulate each $\sigma_s$, then $(\alpha_s, \beta_s, \gamma_s)$, for each of the 50 states $s$. Then we simulate the nationwide result in the following 6 scenarios:

(A) 2020 election with even PV;

(B) 2020 election with 2016 PV;

(C) 2020 election with 2008 PV;

(D) 2020 election with 2004 PV;

(E) 2024 election with even PV;

(F) 2024 election with 2016 PV.
For each scenario, we repeat this simulation 40000 times. Probabilities of Democrats winning each state are in the file stateProb.csv. Histograms for EC votes are in Figure 3. Red vertical lines show 269 votes: winning threshold. For each scenario, we compute the probability for Democrats winning EC (by getting more than 269 EC votes): Area to the right of the red line.
Electoral maps for these 6 scenarios are given in Figure 4. These maps are created using mapchart.com. We classify states by $p_s$ (the probability of Democrats winning) into 5 categories, similarly to the New York Times and other news agencies:

solid D, dark blue;

lean D, light blue;

swing states, green;

lean R, yellow;

solid R, red.
5.1. Evolution of the Electoral College bias
If we simulate the EC for even PV in 2012, 2016, 2020, 2024, then Democrats’ win probability is 42.4%, 46.2%, 50.2%, 49.9%, respectively. Thus the EC was biased in favor of Republicans in 2012, but much less so in 2016, and the bias will disappear in 2020 and 2024.
For previous elections (2012 with actual 2012 PV, and 2016 with actual 2016 PV), we have Democrats' EC win probabilities 88.5% and 73.6%, respectively. Thus we see that the Democrats' lead in 2016 was less robust than in 2012. This is consistent with some election predictions, see for example fivethirtyeight.com: 90.9% and 71.4%, respectively, for final election predictions on the Election Day morning in 2012 and 2016.
Let us discuss whether the EC bias (deviation from 50.0%) is due to statistical error. Assuming that the EC is, in fact, unbiased, the standard significance test for the fraction of simulations (out of 40000) in which Democrats win gives an acceptance interval around 50%. The results of 2012 and 2016 fall outside this interval. Thus in these scenarios, the EC is really biased towards Republicans.
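The acceptance interval for such a test can be computed with the normal approximation to the binomial distribution. In the sketch below, the 5% significance level is our assumption, and the function name is hypothetical; the win fractions are the even-PV results quoted in this section.

```python
from math import sqrt
from statistics import NormalDist

def acceptance_interval(n_sims, level=0.05):
    """Two-sided acceptance interval for the simulated win fraction under the
    null hypothesis p = 0.5, using the normal approximation.
    The default 5% level is an assumption made for this illustration."""
    z = NormalDist().inv_cdf(1.0 - level / 2.0)
    half_width = z * sqrt(0.25 / n_sims)
    return 0.5 - half_width, 0.5 + half_width

low, high = acceptance_interval(40_000)
print(round(low, 4), round(high, 4))
for year, frac in [(2012, 0.424), (2016, 0.462), (2020, 0.502), (2024, 0.499)]:
    print(year, "inside" if low <= frac <= high else "outside")
```

Under this assumed level, the interval is roughly half a percentage point wide on each side of 50%, so only the 2012 and 2016 fractions fall outside it.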
5.2. State-by-state highlights
We summarize the table of state-by-state win probabilities in these 6 scenarios from stateProb.csv; see also Electoral College maps in Figure 4.

The safest states for Democrats are California and New York, with win probability greater than 99.9%. Hawaii, Maryland, Washington are also very safe for Democrats, with probability greater than 99%.

The safest state for Republicans is Oklahoma, with win probability greater than 99.9%. Alabama and Idaho are also safe, with win probability greater than 99%.

Even though the media discussed the idea of turning Texas blue, see for example [17], Democrats win Texas in 2020 for even or 2008 PV only with probabilities 1.0% and 4.9%, respectively. Thus the path to EC win for Democrats does not lie through Texas. Instead, they should follow a traditional path through Midwest and Northeast, trying to get back the states that Republicans won in the 2016 presidential election. We discuss this below when we speak of state importance coefficients.

The states with closest to 50% win probability: Nevada for 2020 and 2024 election with even PV, Iowa for 2020 and 2024 election with PV as in 2016, New Hampshire and Arizona for 2020 election with 2004 and 2008 PV.

For the even nationwide PV, the states with the most likely D or R win in 2020 are not the same as the states with the strongest partisan lean measured by $\alpha_s$ (for 2018, $t = 0$) or $\alpha_s + \gamma_s t$ (for 2020). This is due to the fact that standard errors are different from state to state, and these win probabilities depend on $\sigma_s$ too.
5.3. Electoral College maps for various scenarios
Here we analyze the maps and states' ratings in Figure 4, in particular how states' ratings change from the 2020 election with even PV to other scenarios. The letters below correspond to subfigures in Figure 4.

(A) Except for Nevada, all swing states are in the Northeast and Midwest: Virginia, Pennsylvania, Maine, New Hampshire, Iowa, Wisconsin. Although Ohio and Florida went D in 2008 and 2012, we rate Ohio solid R and Florida lean R. But Michigan and Wisconsin (two states which were crucial for the 2016 election) are leaning D. We wish to see more than one election won by R in Michigan and Wisconsin to conclude they became swing states. Conversely, we rate Arizona, Montana, Mississippi, and the Dakotas only lean R, despite them voting R in all presidential elections since 2000.

(B) Since the 2016 PV was in favor of Democrats, the Electoral College map becomes more Dem-favorable as compared to (A). Arizona, Florida, North Carolina change from lean R to swing, New Mexico changes from lean D to solid D, Pennsylvania and New Hampshire change from swing to lean D, Ohio changes from solid R to lean R.

(C) Year 2008 was even more in favor of Democrats than 2016, thus the map in (C) shifts to Democrats compared to (B). Ohio, Indiana, Montana switch to swing from lean R; Colorado, Delaware, and Pennsylvania switch from lean D to solid D; Virginia and North Carolina switch from swing to lean D; South Carolina and Georgia switch from solid R to lean R.

(D) Year 2004, on the contrary, was more favorable to Republicans. Thus the map becomes more R-leaning as compared to the map from (A): Colorado and Michigan become swing states, and West Virginia switches from lean R to solid R.

(E) This map is similar to (A), but New Mexico is solid D instead of lean D, West Virginia is solid R instead of lean R, and North Carolina is swing instead of lean R.

(F) This map is more favorable to Democrats than (E): Colorado and Delaware change to solid D, Ohio changes from solid R to lean R, Pennsylvania and New Hampshire switch from swing to lean D, and Florida switches from lean R to swing.
5.4. Pivotal states
Let us discuss which states the parties should focus their resources on, given the expected nationwide PV. For the sake of convenience, we shall discuss this from the viewpoint of Democrats; but since elections are a zero-sum game, the same analysis applies to Republicans. Assume Democrats can increase the $\alpha_s$ of this state to $\alpha_s + \mathrm{d}\alpha_s$. Then the probability of Democrats winning this state has changed from $\Phi(x_s)$ to $\Phi(x_s + \mathrm{d}\alpha_s/\sigma_s)$, where $x_s = (\alpha_s + \beta_s L + \gamma_s t)/\sigma_s$, or, in other words, by
$$\frac{\varphi(x_s)}{\sigma_s}\,\mathrm{d}\alpha_s,$$
where the normal density is defined by
$$\varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}.$$
Thus the expected rate of gain in Electoral College votes per change $\mathrm{d}\alpha_s$ is
$$N_s\,\frac{\varphi(x_s)}{\sigma_s},$$
where $N_s$ is the number of EC votes in state $s$. This importance coefficient is calculated for each state $s$, for all 6 scenarios in the file importance.csv. In Table 3, we list the three most pivotal states together with their importance coefficients.
Scenario  Rank 1  Rank 2  Rank 3 

2020 with even PV  NC (55)  PA (54)  FL (30) 
2020 with 2016 PV  NC (65)  PA (44)  FL (36) 
2020 with 2008 PV  NC (50)  OH (49)  FL (40) 
2020 with 2004 PV  PA (57)  NC (39)  VA (24) 
2024 with even PV  NC (64)  PA (49)  FL (31) 
2024 with 2016 PV  NC (70)  PA (38)  FL (37) 
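One reading of the importance coefficient is the derivative of the expected number of EC votes with respect to the state's partisan lean: the number of EC votes times the standard normal density at the standardized lean, divided by the state's standard error. The sketch below assumes this reading and fixed (not posterior-simulated) parameters; the function name and the made-up inputs are ours, so its output is not meant to reproduce Table 3.

```python
from statistics import NormalDist

def importance(alpha, beta, gamma, sigma, ec_votes, national_log_ratio, t):
    """Rate of gain in expected EC votes per unit increase in alpha,
    assuming win probability Phi((alpha + beta*L + gamma*t) / sigma)."""
    x = (alpha + beta * national_log_ratio + gamma * t) / sigma
    return ec_votes * NormalDist().pdf(x) / sigma

# Made-up states: a toss-up with 15 votes versus a safe state with 55 votes.
toss_up = importance(0.0, 1.0, 0.0, 0.1, 15, national_log_ratio=0.0, t=0)
safe = importance(0.6, 1.0, 0.0, 0.1, 55, national_log_ratio=0.0, t=0)
print(round(toss_up, 1), safe < toss_up)
```

The example illustrates why a mid-sized toss-up state can matter more than a large safe state: the density factor vanishes quickly once the standardized lean is far from zero.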
6. Conclusion
We found the partisan lean of each state, its time dynamics, and its dependence on the national political environment, using Bayesian linear regression of statewide election PV vs election year and national PV. To test whether the EC is biased, we set equal major-party nationwide PV, and simulate the EC votes. In 2012 and 2016, the EC was biased towards Republicans, but in 2020 and 2024, it will be unbiased.
Our version of PVI (partisan voting index) almost coincides with Cook PVI, and our backcasts for Presidential elections in 2012 and 2016 closely match final predictions from FiveThirtyEight. Small states have significantly higher variance than large states.
Again, we emphasize that this model is very simple. We do not pretend to describe voting behavior and its change over time in a comprehensive way. To this end, we need polls, ethnic and income composition data, etc. Some references with research along these lines are cited in the Introduction.
Subsequent research might focus on more sophisticated models, which include ethnic and economic statewide data, or the power of incumbency. Other possible lines of research: (a) make sense of unusual elections which we disregarded in our analysis (see Section 2); (b) include state and local elections (county and city councils, governors, state legislatures, judges); (c) capture correlations between states with similar ethnic and economic patterns, such as Wisconsin and Michigan, or Nevada and Arizona; (d) study the size effect further (whether large states vote differently than small ones); (e) model Maine and Nebraska voting, since these states split their EC votes.
References
 [1] David W. Abbott, James C. Levine (1991). Wrong Winner: The Coming Debacle in the Electoral College. Praeger.
 [2] Scott J. Armstrong, Alfred G. Cuzán, Andreas Graefe, Randall J. Jones (2014). Accuracy of Combined Forecasts for the 2012 Presidential Elections: The PollyVote. Political Science & Politics 47 (2), 427–431.
 [3] Biliana S. Bagasheva, Frank J. Fabozzi, John S. J. Hsu, Svetlozar T. Rachev (2008). Bayesian Methods in Finance. Wiley.
 [4] James E. Campbell (2008). The American Campaign: U.S. Presidential Campaigns and the National Vote. Texas A&M University Press.
 [5] James E. Campbell (2016). The TrialHeat and SeatsinTrouble Forecasts of the 2016 Presidential and Congressional Elections. Political Science & Politics 49 (4), 664–668.
 [6] George C. Edwards III (2011). Why the Electoral College Is Bad for America. Yale University Press.
 [7] Robert S. Erikson, Christopher Wlezien (2016). Forecasting the Presidential Vote with Leading Economic Indicators and the Polls. Political Science & Politics 49 (4), 449–472.
 [8] Scott L. Feld, Bernard Grofman (2005). Thinking about the Political Impacts of the Electoral College. Public Choice 123 (1–2), 1–18.
 [9] Luis FuentesRohwer, GuyUriel Charles (2001). The Electoral College, The Right to Vote, and Our Federalism: A Comment on a Lasting Institution. Florida State University Law Review 29, 879–924.
 [10] Andrew Gelman, Jonathan Katz, Gary King (2004). Empirically Evaluating the Electoral College. In Rethinking the Vote: The Politics and Prospects of American Electoral Reform, 75–88. Oxford University Press.
 [11] Andrew Gelman, Jonathan N Katz, Gary King, Andrew C. Thomas (2012). Estimating Partisan Bias of the Electoral College Under Proposed Changes in Elector Apportionment. Statistics, Politics, and Policy, 4 (1), 1–13.
 [12] Andreas Graefe (2014). Accuracy of Vote Expectation Surveys in Forecasting Elections. Public Opinion Quarterly 78 (S1), 204–232.
 [13] Andreas Graefe (2018). Predicting Elections: Experts, Polls, and Fundamentals. Judgment and Decision Making 13 (4), 334–344.
 [14] Peter D. Hoff (2009). A First Course in Bayesian Statistical Methods. Springer.
 [15] Jason R. Jurjevich, David A. Plane (2012). Voters on the Move: The Political Effectiveness of Migration and its Effects on State Partisan Composition. Political Geography 31 (7), 429–443.
 [16] Richard L. Morrill, Gerald R. Webster (2015). Spatial and Political Realignment of the U.S. Electorate, 1988–2012. Political Geography 48, 93–107.
 [17] Mary Beth Rogers (2016). Turning Texas Blue. St. Martin’s Press.
 [18] Tara Ross (2012). Enlightened Democracy: The Case for the Electoral College. Colonial Press.
 [19] Alan Siaroff (2001). 1876, 1916 and now 2000: Decisive Small State Bias in the US Electoral College. Representation 38 (2), 131–139.
 [20] Nate Silver (2015). The Signal and the Noise. Penguin Press.
 [21] Barney Warf (2009). The U.S. Electoral College and Spatial Biases in Voter Power. Annals of the Association of American Geographers 99 (1), 184–204.
 [22] Joshua Zingher (2016). The Relationship Between Bias and Swing Ratio in the Electoral College and the Outcome of Presidential Elections. Journal of Elections, Public Opinion and Parties 26 (2), 232–252.