- (1) Christopher Mele and Annie Correal. “‘Not our president’: Protests spread after Donald Trump’s election,” New York Times; nytimes.com/2016/11/10/us/trump-election-protests.html.
- (2) Matt Motyl. “If he wins, I’m moving to Canada”: Ideological migration threats following the 2012 US presidential election. Analyses of Social Issues and Public Policy, 14(1):123–136, 2014.
- (3) Harold Hotelling. Stability in competition. The Economic Journal, 39(153):41–57, 1929.
- (4) Duncan Black. On the rationale of group decision-making. Journal of Political Economy, 56(1):23–34, 1948.
- (5) Anthony Downs. An economic theory of political action in a democracy. Journal of Political Economy, 65(2):135–150, 1957.
- (6) George Rabinowitz and Stuart E Macdonald. A directional theory of issue voting. American Political Science Review, 83(1):93–121, 1989.
- (7) Stefan Napel and Mika Widgrén. Power measurement as sensitivity analysis: a unified approach. Journal of Theoretical Politics, 16(4):517–538, 2004.
- (8) Guillermo Owen and Lloyd S Shapley. Optimal location of candidates in ideological space. International Journal of Game Theory, 18(3):339–356, 1989.
- (9) Peter Coughlin and Shmuel Nitzan. Electoral outcomes with probabilistic voting and Nash social welfare maxima. Journal of Public Economics, 15(1):113–121, 1981.
- (10) James M Enelow and Melvin J Hinich. A general probabilistic spatial theory of elections. Public Choice, 61(2):101–113, 1989.
- (11) Jeffrey S Banks and John Duggan. Probabilistic voting in the spatial model of elections: The theory of office-motivated candidates. In D. Austen-Smith and J. Duggan, editors, Social Choice and Strategic Decisions, pages 15–56. Springer, 2005.
- (12) Peter J Coughlin. Probabilistic Voting Theory. Cambridge University Press, 1992.
- (13) Melvin J Hinich. Some evidence on non-voting models in the spatial theory of electoral competition. Public Choice, 33(2):83–102, 1978.
- (14) Priscilla L Southwell. The politics of alienation: Nonvoting and support for third-party candidates among 18–30-year-olds. The Social Science Journal, 40(1):99–107, 2003.
- (15) James Adams, Jay Dow, and Samuel Merrill. The political consequences of alienation-based and indifference-based voter abstention: Applications to presidential elections. Political Behavior, 28(1):65–86, 2006.
- (16) Leo P Kadanoff. More is the same; phase transitions and mean field theories. Journal of Statistical Physics, 137(5-6):777, 2009.
- (17) Robert M May, Simon A Levin, and George Sugihara. Complex systems: Ecology for bankers. Nature, 451(7181):893, 2008.
- (18) Marten Scheffer, Jordi Bascompte, William A Brock, Victor Brovkin, Stephen R Carpenter, Vasilis Dakos, Hermann Held, Egbert H Van Nes, Max Rietkerk, and George Sugihara. Early-warning signals for critical transitions. Nature, 461(7260):53, 2009.
- (19) Thierry Mora and William Bialek. Are biological systems poised at criticality? Journal of Statistical Physics, 144(2):268–302, 2011.
- (20) Jean-Philippe Bouchaud. Crises and collective socio-economic phenomena: simple models and challenges. Journal of Statistical Physics, 151(3-4):567–606, 2013.
- (21) “Political polarization in the American public,” Pew Research Center, Washington, D.C.; people-press.org/2014/06/12/political-polarization-in-the-american-public/.
- (22) Elisabeth R Gerber and Jeffrey B Lewis. Beyond the median: Voter preferences, district heterogeneity, and political representation. Journal of Political Economy, 112(6):1364–1383, 2004.
- (23) Soren Jordan, Clayton M Webb, and B. Dan Wood. The president, polarization and the party platforms. The Forum, 12(1):169–189, 2014.
- (24) Keith T Poole and Howard Rosenthal. A spatial model for legislative roll call analysis. American Journal of Political Science, 29(2):357–384, 1985.
- (25) Mehran Kardar. Statistical Physics of Fields. Cambridge University Press, 2007.
- (26) James M Enelow and Melvin J Hinich. The Spatial Theory of Voting: An Introduction. Cambridge University Press, 1984.
- (27) Dan S Felsenthal and Moshé Machover. A priori voting power: what is it all about? Political Studies Review, 2(1):1–23, 2004.
- (28) Lionel S Penrose. The elementary statistics of majority voting. Journal of the Royal Statistical Society, 109(1):53–57, 1946.
- (29) Lloyd S Shapley and Martin Shubik. A method for evaluating the distribution of power in a committee system. The American Political Science Review, 48(3):787–792, 1954.
- (30) Andrew Gelman, Jonathan N Katz, and Francis Tuerlinckx. The mathematics and statistics of voting power. Statistical Science, 17(4):420–435, 2002.
- (31) Stefano Benati and Giuseppe V Marzetti. Probabilistic spatial power indexes. Social Choice and Welfare, 40(2):391–410, 2013.
S1.1 Distributions and functional derivatives
In this section, we give a brief introduction to the mathematics behind Dirac delta functions and functional derivatives. The Dirac delta function is not technically a function, but is rather a generalized function, also known as a distribution. A distribution may not be well-defined if evaluated at a particular point (e.g. $\delta(x)$ is not defined for $x = 0$), but is instead defined through the integral of its product with ordinary functions. (Technically, a distribution is a map from the set of smooth functions with compact support to $\mathbb{R}$.) For instance, the Dirac delta function is defined by eq. 13:
$$\int_{-\infty}^{\infty} \delta(x)\, g(x)\, dx = g(0) \qquad (13)$$
for all continuous functions $g$. We note that $\delta(x)$ can be approximated by an arbitrarily narrow gaussian, which has a total area under its curve of $1$. (The narrower the gaussian, the taller it must be, so that the product of its height and width remains constant.) Formally, for small $\sigma$, $\delta(x) \approx \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-x^2/(2\sigma^2)}$ in the sense that
$$\lim_{\sigma \to 0} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-x^2/(2\sigma^2)}\, g(x)\, dx = g(0) \qquad (14)$$
for continuous $g$.
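The gaussian approximation of the delta function can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function names `gaussian` and `smeared_value`, the midpoint-rule integration, and the choice of cosine as the test function are all illustrative assumptions. It integrates a unit-area gaussian of width `sigma` against a continuous function and shows the result approaching the function's value at zero as `sigma` shrinks.

```python
import math


def gaussian(x: float, sigma: float) -> float:
    """Unit-area gaussian of width sigma centred at 0."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))


def smeared_value(func, sigma: float, half_width: float = 8.0, steps: int = 4000) -> float:
    """Midpoint-rule estimate of the integral of gaussian(x, sigma) * func(x).

    The integration range is +/- half_width * sigma, outside of which the
    gaussian is negligible.
    """
    lower = -half_width * sigma
    dx = 2.0 * half_width * sigma / steps
    total = 0.0
    for k in range(steps):
        x = lower + (k + 0.5) * dx
        total += gaussian(x, sigma) * func(x)
    return total * dx


if __name__ == "__main__":
    # For func = cos the exact integral is exp(-sigma**2 / 2), which tends to cos(0) = 1.
    for sigma in (1.0, 0.1, 0.01):
        print(sigma, smeared_value(math.cos, sigma))
```

For a cosine test function the integral is known in closed form, which makes the convergence easy to verify by hand.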
For a set of voters with opinions , the distribution of voter opinions is , which is the only distribution satisfying the property that is the number of voters with opinions in the interval . However, it is often useful to choose to be a smooth function that approximates , in the sense that the difference between and the number of voters with opinions in the interval is no greater than for all . One way to achieve this smoothing is to replace the delta functions by their gaussian approximations (see fig. S1). For large enough , the error of up to voter opinion will generally not be significant. Whether or not is chosen to be smooth does not matter for the results of the text, although for the results that rely on the assumption that the number of voters is large, the mathematics are simpler if is assumed to be a function rather than a distribution. For instance, the expression for representation in the case of median voting involves evaluating at its median, an operation which is not well-defined if is a sum of Dirac delta functions.
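A functional $F$ is a map from a space of functions or distributions to $\mathbb{R}$. In analogy to an ordinary derivative, we can define the functional derivative $\frac{\delta F}{\delta f(x)}$ as a function of $x$ that satisfies the following equation for all $\phi$:
$$\int \frac{\delta F}{\delta f(x)}\, \phi(x)\, dx = \lim_{\epsilon \to 0} \frac{F[f + \epsilon \phi] - F[f]}{\epsilon} \qquad (15)$$
Just as the ordinary derivative $\frac{dg}{dx}$ of a function $g$ will in general depend on $x$, the functional derivative $\frac{\delta F}{\delta f(x)}$ will in general depend on $f$. The expression $\frac{\delta F[f]}{\delta f(x)}$ refers to the value obtained when the functional derivative of $F$ is evaluated at $f$, just as $\frac{dg(x)}{dx}$ refers to the value obtained when the derivative of $g$ is evaluated at $x$. By substituting $\phi(x) = \delta(x - x_0)$ into eq. 15, a simpler formulation for the functional derivative can be obtained:
$$\frac{\delta F}{\delta f(x_0)} = \lim_{\epsilon \to 0} \frac{F[f(x) + \epsilon\, \delta(x - x_0)] - F[f(x)]}{\epsilon} \qquad (16)$$
Thus, we can express a functional derivative in terms of ordinary derivatives: if, for a particular $f$ and $x_0$, we define $h(\epsilon) = F[f(x) + \epsilon\, \delta(x - x_0)]$, then $\frac{\delta F}{\delta f(x_0)} = h'(0)$. This function $h$ is shown in fig. S2.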
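The reduction of a functional derivative to an ordinary derivative can be illustrated numerically. The sketch below assumes an opinion distribution made of weighted point masses and uses the mean opinion as the functional; the names `mean_outcome` and `functional_derivative` are hypothetical. It adds a small mass `eps` at position `x0` and takes a central difference in `eps`. For the mean of N unit-weight voters, the exact functional derivative at `x0` is (x0 - mean) / N, which the estimate should reproduce.

```python
def mean_outcome(masses):
    """Mean-opinion functional for a distribution of point masses.

    `masses` is a list of (position, weight) pairs standing in for a sum of
    weighted delta functions.
    """
    total_weight = sum(w for _, w in masses)
    return sum(x * w for x, w in masses) / total_weight


def functional_derivative(functional, masses, x0, eps=1e-6):
    """Central-difference estimate of h'(0), where h(eps) is the functional
    evaluated on the distribution with an extra point mass eps at x0."""
    plus = functional(masses + [(x0, eps)])
    minus = functional(masses + [(x0, -eps)])
    return (plus - minus) / (2.0 * eps)


if __name__ == "__main__":
    voters = [(-1.0, 1.0), (0.0, 1.0), (4.0, 1.0)]  # three voters, mean opinion 1.0
    # Exact value for the mean functional: (2.5 - 1.0) / 3 = 0.5
    print(functional_derivative(mean_outcome, voters, 2.5))
```

The same `functional_derivative` helper works for any functional that accepts a list of point masses, so other election rules can be probed the same way.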
S1.2 Translational invariance
In the main text, we make the assumption of translational invariance (eq. 1). Technically, translational invariance is defined only in relation to a particular metric. Thus, the assumption of translational invariance can be relaxed without invalidating the paper’s results. For the proof that negative representation implies instability, all that is required is that the election be continuous (rather than invariant) under translations or, mathematically, that for , be continuous in (note that this property is independent of the metric). The proof given in the manuscript then follows in the limit . For the proof that the total representation sums to (eq. 5), there need only exist some metric on the opinion space under which the election is translationally invariant. The total representation will then equal assuming the election is translationally invariant under the metric used to define representation.
S1.3 Representation in the large-population limit
In this section, we derive properties of our representation measure when the number of voters is large. In the limit of a large population (), the change in the opinion distribution arising from an individual opinion will be small compared to the opinion distribution as a whole, and so we expand to first order in :
Note that eq. 17 does not apply to cases in which is not differentiable, e.g. when the election is unstable and small changes in the opinion distribution can have an outsized impact; thus we do not use results derived from these equations when analyzing instability.
We now derive eq. 4. Note that when an individual opinion changes from to , the opinion distribution changes by
By the fundamental theorem of calculus, we see from eq. 19 that is the average of over .
We now prove eq. 5 (). For small , , and thus . Since has compact support, which follows from being an approximation of the opinions of a finite number of voters, integrating by parts yields , which, combined with eq. 1, yields eq. 5. This proof assumes that is differentiable for illustrative purposes, but the result will generally hold whenever is well-defined, if is treated as a distribution (generalized function).
S1.4 Nash equilibria of the electoral game
Consider a two-candidate election with endogenous candidacy: candidate positions (or, equivalently, candidates) are chosen in order to maximize the probability of victory. In this framework, the winner of the election will have adopted an unbeatable position , provided such a position exists (i.e. a candidate with position will have at least a 50% chance of winning against a candidate with any other position). Formally, is a Nash equilibrium, since no candidate can improve her chances by changing her position. Because the voting game is symmetric, if is a Nash equilibrium, then so are and ; see, for instance, section 2.3 of ref. (9). Thus, if there is a unique Nash equilibrium, it must be of the form .
S1.5 Utility difference model examples
Here, we give examples of the utility difference model given by eq. 8 describing median voting, mean voting, and an election between median and mean voting. For where
is a constant, the mean opinion is selected, since for a random variable, is minimized for . (Note that and the support of must be confined to an interval of length at most so that no probabilities are greater than 1.) For (where again, and the support of must be confined to an interval of length at most ), the median opinion is selected (since the median minimizes ), although by a different mechanism than the deterministic voting assumptions of the Median Voter Theorem. Both of these functions can be viewed as limiting cases of the hyperbolic , with approximating median voting and approximating mean voting. Under mean voting, for a voter either to the right of both candidates or to the left of both candidates, the farther away this voter is, the stronger the voter’s preference between the two candidates. Under median voting, the strength of this voter’s preference for one candidate over the other is independent of how far the voter is from both candidates. For the intermediate case, the strength of this voter’s preference gets stronger up to a point and then levels off as the voter moves farther away from both candidates. However, in actual elections, voters with opinions that are very far from both candidates are more likely to abstain from voting (or vote for a third-party candidate), which is why eq. 9 may be more realistic.
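The two limiting cases can be checked with a small numerical experiment. This sketch assumes, as the discussion of maxima of eq. 8 suggests, that the outcome is the position that maximizes the utility summed over voters; the function names, the grid search, and the choice of unit constants are illustrative assumptions. The constraint on the support of the opinion distribution is ignored here because it does not affect where the maximum lies. A quadratic penalty should recover the mean, and an absolute-value penalty should recover the median.

```python
import statistics


def quadratic(d: float) -> float:
    """Utility that falls off as the squared distance (constant set to 1)."""
    return -d * d


def absolute(d: float) -> float:
    """Utility that falls off linearly with distance (constant set to 1)."""
    return -abs(d)


def best_outcome(opinions, utility, grid):
    """Grid position y that maximises the sum over voters of utility(x - y)."""
    return max(grid, key=lambda y: sum(utility(x - y) for x in opinions))


if __name__ == "__main__":
    opinions = [0.0, 1.0, 2.0, 3.0, 10.0]      # one outlying voter
    grid = [k / 100 for k in range(1001)]      # candidate positions 0.00 .. 10.00
    print(best_outcome(opinions, quadratic, grid), statistics.mean(opinions))
    print(best_outcome(opinions, absolute, grid), statistics.median(opinions))
```

The outlying voter at 10 pulls the quadratic outcome toward it but leaves the absolute-value outcome unchanged, which matches the description of how preference strength depends on distance in the two cases.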
S1.6 Representation in the utility difference model
In this section, we calculate representation for the utility difference model given by eq. 8. We derive the first part of eq. 10, and we then calculate representation for the examples given in section S1.5. To do so, we must assume there is a single possible election outcome (see section S1.4). Then, from eq. 8,
Note that eq. 21 satisfies for any positive constant (scale invariance), and let where is the size of the electorate, so that . Considering the change that arises from the addition of a single individual with opinion to the population, we define by
Because is differentiable in and has a single maximum in ,
Noting that the denominator is independent of , of order (i.e. independent of ), and negative (otherwise, would be a minimum rather than a maximum),
So, using eq. 4,
If , as it must be for some function if the election is translationally invariant as in eq. 1, then
Eq. 28 provides a direct link between voter preferences and the representation of opinions. (If needed, the constant of proportionality can be determined through eq. 5.) For and , we can quickly derive the results for median and mean voting, respectively. For , which yields an outcome between that of median and mean voting,
resulting in the representation of opinions being concentrated around the election outcome, but not infinitely concentrated as it is for median voting. For that are not concave, there will exist some such that , and representation will be negative for those opinions (see eq. 28).
S1.7 Instability in the utility difference model
Here, we explore the conditions under which instability can arise in the model described by eq. 8, and we elaborate on the concrete model of instability with outcomes given by eq. 12. For the model given by eq. 8, is shown to be well-defined as long as is single-valued, i.e. eq. 8 has a single maximum (section S1.6). Thus, instability can occur only when there are multiple maxima. For a single maximum with , the functional derivative of is not defined, but, as can be shown in a higher-order analysis, there is no instability. In particular, for defined by eq. 23, we can derive
The existence of multiple maxima in implies that is not concave (ignoring the degenerate case in which is constant over some interval). Thus, instability can arise only in the case of non-concave , which is precisely the same condition under which negative representation occurs. That instability can arise only in the presence of negative representation should not surprise us, since it was proven under more general conditions in the main text. For this class of models, we also find that negative representation implies that there exist distributions of opinions for which instability arises, i.e. if is not concave, then there exists an such that eq. 8 has multiple maxima. To see why this is true, consider an opinion distribution such that . Then, if there is a single maximum of eq. 8, it must lie at . In order for to be a maximum, we must have (where the equality follows from the symmetry of and ). But since is twice continuously differentiable and not concave, there exists an such that and . For such an , is not a maximum, thus contradicting our assumption that there was a unique maximum.
To provide an example of how, for non-concave , the election is unstable for certain opinion distributions, we consider the used for the example of negative representation: for some positive constant (eq. 9). We take the distribution of voter opinions to be a sum of two (potentially unequally weighted) normal distributions of equal variance:
Then, from eq. 8, the outcome of the election is given by
Without loss of generality, we can assume that . Defining the normalized election outcome , we solve eq. 31 to get the following condition:
where and . For (i.e. for equally sized subpopulations), , and the election is stable with for and unstable for . In the unstable regime, there are two possible outcomes described by , and an arbitrarily small change in can cause to swing between its positive and negative value. In this regime, the majority of one of the subpopulations will be negatively represented: for , over half of the subpopulation centered at will have opinions with ( in the unstable regime), and, from eq. 10, representation is negative for these opinions. Likewise, for in the unstable regime, over half of the subpopulation centered at will be negatively represented.
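The transition between the stable and unstable regimes can be explored numerically. The sketch below rests on several assumptions that are not confirmed by the text as extracted here: that eq. 8 selects the position maximizing the population-averaged utility, that the utility of eq. 9 is gaussian-shaped with width `sigma_u`, and that the two subpopulations are centred at `-delta` and `+delta` with common standard deviation `sigma_f`. Under those assumptions the averaged utility is, up to a constant factor, itself a mixture of two gaussians with standard deviation `sqrt(sigma_f**2 + sigma_u**2)`, which is what `averaged_utility` evaluates. All function names are hypothetical.

```python
import math


def averaged_utility(y, delta, s, w_left=0.5):
    """Population-averaged utility at position y, up to a constant factor.

    s is the combined width sqrt(sigma_f**2 + sigma_u**2); w_left is the
    weight of the subpopulation centred at -delta.
    """
    left = w_left * math.exp(-((y + delta) ** 2) / (2.0 * s * s))
    right = (1.0 - w_left) * math.exp(-((y - delta) ** 2) / (2.0 * s * s))
    return left + right


def scan(delta, sigma_f, sigma_u, w_left=0.5, span=6.0, steps=6001):
    """Evaluate the averaged utility on a uniform grid over [-span, span]."""
    s = math.hypot(sigma_f, sigma_u)
    ys = [-span + 2.0 * span * k / (steps - 1) for k in range(steps)]
    vals = [averaged_utility(y, delta, s, w_left) for y in ys]
    return ys, vals


def local_maxima(delta, sigma_f, sigma_u, w_left=0.5):
    """Grid positions that are strict local maxima of the averaged utility."""
    ys, vals = scan(delta, sigma_f, sigma_u, w_left)
    return [ys[k] for k in range(1, len(ys) - 1) if vals[k - 1] < vals[k] > vals[k + 1]]


def outcome(delta, sigma_f, sigma_u, w_left=0.5):
    """Grid position of the global maximum (the assumed election outcome)."""
    ys, vals = scan(delta, sigma_f, sigma_u, w_left)
    return ys[vals.index(max(vals))]


if __name__ == "__main__":
    print(local_maxima(0.5, 0.6, 0.8))            # peaks close together: one maximum
    print(local_maxima(2.0, 0.6, 0.8))            # peaks far apart: two maxima
    print(outcome(2.0, 0.6, 0.8, w_left=0.51),    # a small change in the weights
          outcome(2.0, 0.6, 0.8, w_left=0.49))    # flips the outcome between peaks
```

With equal weights, a mixture of two equal-width gaussians has a single peak when the half-separation is at most the combined width and two peaks otherwise. In this sketch that is the mechanism by which a small change in subpopulation sizes swings the outcome from one peak to the other.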
S1.8 Connection with the mean-field Ising model
We map the voting model that gives rise to eq. 12 (described in detail in section S1.7) onto a mean-field Ising model (16). To begin constructing this map, note that the left-hand side of eq. 31 gives the limiting value of as when is drawn from a probability distribution corresponding to the partition function. (If the states of a physical system described by the variables have a ratio of energy to temperature of , the probability density of the system being in any one configuration is given by where the normalization is known as the partition function and is given by .)
For large , having voters with deterministic opinions distributed according to is equivalent to having voters with probabilistic i.i.d. opinions with PDF . We can therefore view the partition function as describing the voting population, with each voter interacting with the election outcome .
Because is a gaussian random variable, we integrate over , which yields, up to a multiplicative constant,
This equation describes interacting probabilistic “spins,” with each spin weighted by the opinion distribution , with an energy penalty proportional to its mean-square distance from all of the other spins. For given by eq. 11, the behavior is exactly that of a mean-field Ising model (with an external magnetic field for ); in general, for bimodal symmetric we expect a phase transition in which the system will spontaneously break the symmetry between the peaks as the peaks move farther apart. In the stable/disordered phase, both of the peaks of are sampled by the “spins;” in the unstable/ordered phase, however, only one of the two peaks is sampled, and therefore the other peak is not represented. Despite the fact that each individual votes independently from everyone else, voters are coupled through their collectively choosing a candidate, which, as shown above, is equivalent to each voter being coupled to every other voter. In the limit of weak interactions, i.e. , we recover mean voting, since in this limit, . (Quadratic utility functions yield mean voting—see section S1.5.) Thus, mean voting is a way of “independently” aggregating opinions.
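For readers less familiar with the physics, the qualitative behavior of a mean-field Ising model can be reproduced in a few lines. The sketch below solves the standard textbook self-consistency condition m = tanh(K m + h) by fixed-point iteration; it is not the paper's eq. 12, and the mapping of the coupling `K` and field `h` onto the election parameters is not attempted here. The function name and defaults are illustrative assumptions.

```python
import math


def magnetization(coupling, field=0.0, start=1.0, iterations=20000):
    """Fixed-point iteration of the mean-field condition m = tanh(coupling * m + field).

    Starting from a positive value selects the positive branch when two
    symmetric solutions exist.
    """
    m = start
    for _ in range(iterations):
        m = math.tanh(coupling * m + field)
    return m


if __name__ == "__main__":
    for coupling in (0.5, 0.9, 1.01, 1.1, 1.5):
        print(coupling, magnetization(coupling))
```

Below a coupling of 1 the only solution is m = 0 (the disordered phase). Above it, two symmetric nonzero solutions appear, and just above the transition their magnitude grows approximately as the square root of 3(K - 1). That is the square-root onset characteristic of this universality class.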
For a general (since , we can always write in this form with ), we note that eq. 8 yields an election outcome equivalent to the limit of as with drawn from
For quadratic , we saw above that we could exactly integrate over the election outcome to yield pairwise quadratic interactions between the variables, but for general , such an integration will yield many interaction terms of higher than quadratic order between these . Although such integration cannot be carried out precisely, we expect this interacting system to undergo a phase transition for bimodal if the expansion of produces sufficiently strong interactions. Thus, the system’s behavior should be similar to the exactly solvable case in which is quadratic.
S1.9 Empirical data
To determine if the stability of U.S. presidential elections has changed over time, we used data from Jordan et al. (23) on the polarization in the party platforms. Jordan et al. used a combination of machine learning and human judgment to determine which of the frequently used words in the party platforms were polarizing and then determined the number of polarizing words (classified by political issue dimensions such as economic, foreign, etc.) in the Republican and Democratic platforms from 1944 to 2012. From these data, we calculated the total number of polarizing words as a percentage of all words in the platforms. We chose the percentage of polarizing words in the party platforms as a measure of political polarization over other measures of ideology, such as NOMINATE scores (24), which measure ideological purity based on agreement with other politicians, because we wanted an external, content-based measure of divergence in opinion rather than a measure of ideology that depends only on the positions that politicians take relative to one another. To construct fig. 4, we plotted by year the fraction of polarizing words in the Democratic platforms and the negative of the fraction of polarizing words in the Republican platforms. To correct for any time-independent bias affecting the number of polarizing words in the party platforms (for instance, which words Jordan et al. designated as polarizing), we subtracted from the data for each party separately the fraction of polarizing words from the year of least polarization for that party (which was 1948 for both parties). Thus, the data shown are the changes in the fraction of polarizing words relative to their baseline value (0.0258 for Democratic platforms and 0.0693 for Republican platforms).
As was noted in the manuscript and explained in section S1.8, our voting model (fig. 3) is equivalent to a mean-field Ising model. Real-world elections are unlikely to follow this model exactly, and even if they did, there is no reason to believe that there would be a simple relationship between polarization (for which is a dimensionless measure) and time. Nonetheless, if U.S. presidential elections underwent a phase transition in the same universality class as the mean-field Ising model, then in the vicinity of the phase transition, polarization would increase in proportion to the square root of the time from the transition, regardless of the precise relationship between time and polarization. In much the same way, magnetization increases near a ferromagnetic phase transition in proportion to $(T_c - T)^{\beta}$, where $T$ is temperature, $T_c$ is the temperature at which the phase transition occurs, and $\beta$ is known as a critical exponent, which depends only on the universality class to which the phase transition belongs (25). Inspired by this universality, we fit the polarization of both parties to the piecewise function
where is the year and is the fraction of polarizing words in that year’s platform relative to the baseline value (see above), and and are free parameters, corresponding to the amplitude of the polarization and the year that it begins, respectively. We found that and minimize the total sum of square errors, yielding values of for the Democratic party and for the Republican party. If the two parties are considered together, .
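A fit of this kind can be sketched as follows. The code assumes the piecewise form implied by the description: zero before an onset year, and an amplitude times the square root of the elapsed time afterwards. It minimizes the sum of squared errors by scanning candidate onset years and solving for the best amplitude in closed form at each one. The function names, the grid scan, and the synthetic data in the usage example are all illustrative assumptions; the platform data and the fitted values from the original analysis are not reproduced here.

```python
import math


def sqrt_onset(year: float, onset: float) -> float:
    """Basis function: sqrt(year - onset) after the onset year, zero before it."""
    return math.sqrt(year - onset) if year > onset else 0.0


def fit_sqrt_onset(years, values, onset_grid):
    """Least-squares fit of values ~ amplitude * sqrt_onset(year, onset).

    Returns (amplitude, onset, sum_of_squared_errors) for the best onset in
    onset_grid; for a fixed onset the optimal amplitude is a linear
    least-squares problem with a closed-form solution.
    """
    best = None
    for onset in onset_grid:
        phi = [sqrt_onset(t, onset) for t in years]
        norm = sum(v * v for v in phi)
        amplitude = sum(p * v for p, v in zip(values, phi)) / norm if norm > 0 else 0.0
        sse = sum((p - amplitude * v) ** 2 for p, v in zip(values, phi))
        if best is None or sse < best[2]:
            best = (amplitude, onset, sse)
    return best


if __name__ == "__main__":
    # Synthetic, noise-free example data (NOT the platform data from the text).
    years = list(range(1944, 2013, 4))
    values = [0.01 * sqrt_onset(t, 1970.0) for t in years]
    onset_grid = [1950.0 + 0.5 * k for k in range(100)]
    print(fit_sqrt_onset(years, values, onset_grid))
```

Scanning the onset on a grid avoids the non-smooth dependence of the error on the onset year, at the cost of a resolution limited by the grid spacing.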
S2 Supplementary Text
S2.1 Multidimensional opinion space
For the sake of simplicity, this paper focuses on systems with a one-dimensional opinion space, but the concepts developed in this paper can naturally be extended to a multidimensional opinion space, where the opinions of the electorate and candidates lie in , as in refs. (9), (10), (11), and (26). This extension will be briefly outlined here. The definition of representation is generalized by replacing eq. 2 with
For a scalar measure, we use
When there exists an such that
for all , this can be used in place of eq. 3 as a representation independent of . In the large-population limit, eq. 19 is then replaced (using Einstein-summation notation) by the path-independent integral
where (which satisfies eq. 39) is a matrix defined by
The differential representation in a direction given by the unit vector is then given by , which yields the same results as eq. 7 of ref. (7) in the limit of a continuum of voters. The trace gives a rotationally invariant scalar measure.
The representation normalization condition corresponding to eq. 5 is where the integral is taken over .
In the multidimensional case, instability also implies a failure in representation. Analogously to eq. 6, instability is characterized by
Generally, instability implies either that , in which case negative representation (defined by ) follows in the same way as the one-dimensional case, or that an infinitesimal change in opinion causes a finite orthogonal change in the outcome of the election. In this case, by considering further infinitesimal changes in opinion parallel to the first change in election outcome, and assuming that the magnitude of the change in the outcome of the election cannot grow without bound, one either gets negative representation directly ( for some )—or , from which negative representation follows as it does in the one-dimensional case.
S2.2 The Owen-Shapley index as a special case
In this section we show that there exist functionals for which our representation measure (eq. 4) reproduces the values of both the deterministic and probabilistic Owen-Shapley voting power indices. Thus, these voting power indices can be thought of as special cases of our measure. We give a brief background on the voting power literature and then consider the case of a one-dimensional opinion space, followed by a generalization to the case of a multidimensional opinion space for which the Owen-Shapley index was primarily designed.
When nothing is known about the preferences of voters, their political power has traditionally been measured by a priori voting power (27), which reflects the probability that a given individual or entity will cast the deciding vote and is usually measured by either the Penrose index (28) or the Shapley-Shubik index (29). More precisely, to calculate a voter’s a priori voting power, consider a random division of the rest of the voters into two camps. Then the probability that the excluded voter will get his way regardless of which camp he joins is his voting power; the Penrose index and the Shapley-Shubik index differ only in the way in which they randomly choose a division. While the Penrose index assumes that each voter randomly chooses one side or the other, the Shapley-Shubik index re-weights the probabilities so that each ordering of voters is equally likely. (There are some fundamental differences in the motivation behind the indices (27), but mathematically, they are rather similar, though neither is without drawbacks: while the assumptions behind the Penrose index are simpler, in general the sum of all voters’ Penrose indices will not equal $1$, while the sum of all voters’ Shapley-Shubik indices will.) These indices provide useful and counterintuitive results when the voters possess differing numbers of votes, as in the European Union. For instance, under the 1958 voting rules for the European Economic Community, Luxembourg, despite having had one vote, had no voting power, since there were no possible divisions of the other five countries such that Luxembourg’s vote would be decisive (30). But in elections in which each voter has one vote, all measures of a priori voting power result in each voter having an equal amount of power. A measure of voting power that takes voter preferences into account is needed to determine how various opinions are differently represented. Many preference-based measures have been proposed (for instance, the spatial Shapley-Shubik index, also known as the Owen-Shapley index (8)), but, as we will see, these measures implicitly assume that people vote in a particular way.
In one dimension, the deterministic Owen-Shapley index (8) allows for only two possible orderings of the voters (left to right or right to left), and the median voter is pivotal in both, thus yielding the same concentration of power that our representation measure (eq. 3) yields in the case of median voting. Benati and Marzetti (31) note that this extreme concentration of power is due to the deterministic nature of the Owen-Shapley model, which assigns zero probability to almost all orderings. They propose a generalized election model in which voters’ opinions have both a deterministic and a random component. In the one-dimensional case, their treatment is equivalent to denoting the probabilistic opinion of voter by
where the are deterministic and the are independent random variables with a continuous probability density function. Denoting the distribution of the over the population by (note that will be a sum of delta functions for a finite population) and choosing an election in which people vote for the candidate closest to their probabilistic opinions (this election takes the form of the Random Utility Model mentioned in section 2.2 of ref. (11)), the Nash equilibrium for the two candidates’ opinions is the median of the distribution , i.e.
The continuity of follows from that of , and we have made the additional assumption that ; otherwise, there is no unique Nash equilibrium.
We then calculate
which yields the representation measure (eq. 4) for opinion :
The rightmost side of eq. 46 is the probability that voter is the median—i.e. pivotal—voter; thus, is equal to the generalized Shapley-Shubik index (SSI) for voter .
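The probability that a given voter is the median voter can be estimated by simulation. The sketch below is an illustration rather than the construction used in the text: it assumes gaussian noise (the text requires only a continuous probability density), an odd number of voters, and a hypothetical function name. Each trial draws a noisy opinion for every voter, sorts them, and records who sits in the middle.

```python
import random


def median_probabilities(base_opinions, noise_sd, trials=20000, seed=0):
    """Monte Carlo estimate of the probability that each voter is the median.

    Each voter's opinion is the deterministic base value plus independent
    gaussian noise (an assumed noise model). The number of voters is assumed
    to be odd so that the median voter is unique.
    """
    rng = random.Random(seed)
    n = len(base_opinions)
    counts = [0] * n
    for _ in range(trials):
        noisy = sorted((a + rng.gauss(0.0, noise_sd), i) for i, a in enumerate(base_opinions))
        counts[noisy[n // 2][1]] += 1
    return [c / trials for c in counts]


if __name__ == "__main__":
    base = [-2.0, -1.0, 0.0, 1.0, 2.0]
    print(median_probabilities(base, 1e-9))   # nearly deterministic opinions
    print(median_probabilities(base, 1.0))    # moderate noise spreads the power
    print(median_probabilities(base, 100.0))  # noise dominates: roughly equal power
```

With negligible noise all of the power sits with the central voter, as in the deterministic index; as the noise grows, the probabilities spread out and approach equal shares.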
The Owen-Shapley index was developed primarily for multidimensional opinion spaces in . Owen and Shapley (8) consider a randomly drawn unit vector and then order individuals by defining if . (The are deterministic but can easily be modified to be partially probabilistic as in ref. (31).) The SSI of is again the probability that is the median of the resulting ordering. To see how this power index is a special case of our multidimensional representation measure (eq. 41), consider the following method of choosing a candidate, given a set of voters with (potentially probabilistic) opinions :
1) Randomly choose an orthonormal basis for .
2) Let be the median of the probability distribution function for , where is randomly drawn from the voter opinions .
3) The election outcome is then given by .
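The three steps above can be sketched for deterministic voter opinions as follows. The construction of the random basis (Gram-Schmidt orthogonalization of gaussian vectors) is one common choice and an assumption here, since the text says only that the basis is chosen randomly; the function names are likewise hypothetical. The median along each basis vector is taken over the voters' projections onto it.

```python
import math
import random
import statistics


def random_orthonormal_basis(dim, rng):
    """Random orthonormal basis built by Gram-Schmidt on gaussian vectors."""
    basis = []
    while len(basis) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for e in basis:
            dot = sum(a * b for a, b in zip(v, e))
            v = [a - dot * b for a, b in zip(v, e)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-12:  # discard the (very unlikely) degenerate draw
            basis.append([a / norm for a in v])
    return basis


def election_outcome(opinions, rng):
    """One random draw of the outcome: the sum over basis vectors of the
    median projection of the voters' opinions times that basis vector."""
    dim = len(opinions[0])
    outcome = [0.0] * dim
    for e in random_orthonormal_basis(dim, rng):
        median = statistics.median(sum(a * b for a, b in zip(x, e)) for x in opinions)
        outcome = [o + median * b for o, b in zip(outcome, e)]
    return outcome


if __name__ == "__main__":
    rng = random.Random(0)
    voters = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [3.0, 3.0], [1.0, 1.0]]
    print(election_outcome(voters, rng))
```

Because the basis is random, repeated calls give different outcomes for the same electorate.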
Note that is now a random variable, and so the right-hand side of eq. 41 must be replaced by its expectation value, i.e. . From eq. 46 (with , , and substituting for , , and ), is equal to the probability that will be the median voter along . Therefore, is equal to the expected number of basis vectors along which will be the median, and so is equivalent to times the SSI. (Note that had we treated the election as deterministic by defining the outcome by and then defined representation by , we would have arrived at a different answer, since the probability distribution over which the expected value is taken depends on .)
The agreement between and the SSI follows from the fact that measures the probability that is the median voter along for this class of voting models. In this sense, the Owen-Shapley index of power implicitly assumes an election in which some sort of median is chosen. This model is appropriate when voters vote deterministically (although the options presented for them to vote on may be random). But such deterministic voting assumes that voters distinguish between very small differences in policy with 100% certainty, and it also assumes that there is no chance that a voter abstains. While these assumptions may hold for assemblies of elected officials (and in particular for the EU, where these measures are most commonly applied), they tend to fail for mass elections, in which a voter may sometimes vote for the candidate farther from her opinion and sometimes may choose not to vote at all.