Sensitivity of collective outcomes identifies pivotal components

09/23/2019
by   Edward D. Lee, et al.

A social system is susceptible to perturbation when its collective properties depend sensitively on a few pivotal components. Using the information geometry of minimal models from statistical physics, we develop an approach to identify pivotal components to which coarse-grained, or aggregate, properties are sensitive. As an example, we introduce our approach on a reduced toy model with a median voter who always votes in the majority. With this example, we construct the Fisher information matrix with respect to the distribution of majority-minority divisions and study features of the matrix that pinpoint the unique role of the median. More generally, these features identify pivotal blocs that precisely determine collective outcomes generated by a complex network of interactions. Applying our approach to data sets from political voting, finance, and Twitter, we find remarkable variety, from systems dominated by a median-like component (e.g., California State Assembly) to those without any single special component (e.g., Alaskan Supreme Court). Other systems (e.g., S&P sector indices) show varying levels of heterogeneity between these extremes. By providing insight into such sensitivity, our information-geometric approach presents a quantitative framework for considering how nominees might change a judicial bench, for measuring notable temporal variation in financial indices, and for analyzing the robustness of institutions to targeted perturbation.


1 Median Voter Model (MVM)

The role of the median derives from the fact that in a majority-rule voting system, the voting outcome depends on a coarse-graining instead of the detailed nature of every individual’s vote. The margin by which the majority wins, as captured in the probability p(k) that k voters of the system are in the majority, can reflect the appeal of the voting outcome or even its legitimacy. These perceptions feed back into the decision process (17). Thus, p(k) serves as an aggregate measure of underlying decision dynamics that we will use to identify pivotal blocs.

To outline our approach, we study the sensitivity of p(k) in the context of a reduced toy model that captures the essence of a median voter. The ideal median voter exists in a majority-rule system where voters’ preferences are unidimensional. By virtue of a unique ranking of preferences, the median is always in the majority (1). We propose a statistical generalization, the Median Voter Model (MVM), with an odd number N of voters. The MVM consists of N − 1 random Ordinary voters and one Median voter who always joins the majority. The binary vote of voter i, s_i = ±1, is equally likely to be +1 or −1, such that only majority-minority divisions are relevant. Thus, the average votes are all the same, but the set of pairwise correlations shown in Figure 1A displays nonzero correlations between M and O, ⟨s_M s_O⟩ > 0, and no correlations between O’s, ⟨s_O s_O′⟩ = 0. Thus, this model consists of a special voter, the Median (M), who after a voting sample has been taken, is perfectly correlated with the majority, whereas Ordinary (O) voters all behave in a statistically uniform and random way.
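To make the construction concrete, here is a minimal simulation sketch of the MVM (our own illustrative code, not the authors'; the function name and sample size are arbitrary). It reproduces the correlation structure described above: a positive Median-Ordinary correlation and essentially zero Ordinary-Ordinary correlations.

```python
import numpy as np

def sample_mvm(n_voters=9, n_samples=100_000, rng=None):
    """Sample votes from the Median Voter Model.

    n_voters is odd: n_voters - 1 Ordinary voters vote +/-1 at random,
    and the Median (index 0) then votes with the majority of the full bench.
    """
    rng = np.random.default_rng(rng)
    votes = np.empty((n_samples, n_voters), dtype=int)
    # Ordinary voters: independent, unbiased +/-1 votes.
    votes[:, 1:] = rng.choice([-1, 1], size=(n_samples, n_voters - 1))
    # Median votes so that it always sits in the majority.  If the Ordinary
    # voters are split evenly, either vote puts the Median in the majority.
    ordinary_sum = votes[:, 1:].sum(axis=1)
    votes[:, 0] = np.where(ordinary_sum != 0,
                           np.sign(ordinary_sum),
                           rng.choice([-1, 1], size=n_samples))
    return votes

votes = sample_mvm()
corr = votes.T @ votes / len(votes)          # pairwise correlations <s_i s_j>
print("Median-Ordinary correlation:", corr[0, 1:].mean().round(3))
print("Ordinary-Ordinary correlation:", corr[1, 2:].mean().round(3))  # ~0
```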

To capture the network of interactions between individuals from which majority-minority coalitions emerge, we take a pairwise maximum entropy (maxent) approach (18). The maxent principle describes a way of building minimal models based on data. We maximize the information entropy while fixing the model to match the pairwise correlations from the data, as defined in Figure 1A. The result is a minimal model parameterized by statistical interactions between voters, or “couplings” (19). For each pairwise correlation in Figure 1A, there is a corresponding coupling J_ij in Figure 1B such that the set of couplings is specified exactly by the pairwise correlation matrix. For the MVM, the couplings only take two possible values because there are only two different kinds of correlations. The couplings for the MVM indicate that all O’s tend to vote with M (agreement between M and O increases the log-probability of the vote as in SI Eq B.6), with a slight tendency for O’s to disagree with each other more than would be expected given their shared correlation with M (disagreement between O and O decreases the log-probability of the vote). In principle, any probabilistic graphical model is a viable alternative for the approach we outline, but the pairwise maxent model has been shown to capture voting statistics better than other models of voting with surprisingly few parameters (20, 21), fits the data well (SI Appendix Section B), and presents a particularly tractable model for calculating information quantities.

Figure 1: Overview of method for identifying pivotal voters in the Median Voter Model. (A) Taking the matrix of pairwise correlations, (B) we solve a pairwise maxent model to learn the probability distribution parameterized by the couplings J_ij. (C) We calculate the FIM for p(k), the probability of k votes in the majority, measuring the sensitivity of p(k) to changes in voter behavior. As we describe in SI Appendix Section C, it is the curvature of the Kullback-Leibler divergence obtained in the perturbative limit ε → 0. The perturbations to the vector of couplings are determined by Eqs 1 and 2. The matrix is segmented into 6x6 blocks for readability. (D) The principal eigenvector of the FIM, reshaped into an “eigenmatrix,” specifies the relative change in the probability that x’s votes are replaced by y’s as described by Eqs 1 and 2 (i.e., a positive value is the rate at which x’s voting record becomes y’s and a negative value the rate at which its disagreements increase). (E) The asymmetry measures the difference in magnitude of perturbations localized to a specific voter versus all its neighbors in turn. If a voter and all its neighbors are similar, the asymmetry is close to zero. Otherwise, it is bounded by a maximum value of one. The principal subspace eigenvalues (for each outlined diagonal block in panel C) give our pivotal measure after normalization as in Eq 3.

To probe how the collective properties captured by the distribution p(k) depend on the voters, we ask how the distribution changes if the voters were slightly different. In this example of majority-minority voting, any change in voting behavior is reflected in the pairwise correlations and preserves the symmetry between the two possible outcomes s_i = +1 and s_i = −1. A natural endpoint for the set of possible correlations as we increase the pairwise correlations is when all voters are perfectly correlated, so we consider perturbations that take us towards this endpoint: with some small probability ε, voter y’s votes are replaced by x’s,

p_ε(s_x = s_y) = (1 − ε) p(s_x = s_y) + ε.    (1)

Eq 1 is a weighted average that interpolates from the current probability of agreement between x and y, p(s_x = s_y), when ε = 0 to perfect agreement when ε = 1. We then account for the changes to y’s correlations with the remaining voters z ≠ x, y:

p_ε(s_y = s_z) = (1 − ε) p(s_y = s_z) + ε p(s_x = s_z).    (2)

Eq 2 interpolates from the current probability of agreement between y and z when ε = 0 to that between x and z when ε = 1. If we replace M with any O voter, taking x = O and y = M, the operation defined in Eq 1 increases the pairwise correlation between them while simultaneously changing M’s correlations with the others to be more like those with O, pushing them to zero. When the statistical model exactly matches the entire distribution of votes, the perturbation described in Eqs 1 and 2 is equivalent to shifting probability from any voting configuration where x and y disagree to the voting configuration where x and y agree, holding all other probabilities constant. With the pairwise maxent model, however, the perturbation is only reflected in the pairwise correlations, moving us from one model to another within the class of pairwise maxent models. In this case, the perturbations can be mapped to changes in the couplings in the limit ε → 0, which we use to determine the entries of the FIM shown in Figure 1C (SI Appendix Section D).
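As an illustration of Eqs 1 and 2, the following sketch (our own minimal implementation, not the authors' code; the matrix of agreement probabilities is an assumed input) applies the perturbation: voter y agrees more with voter x, and y's agreement with every third voter z is pulled towards x's agreement with z.

```python
import numpy as np

def perturb_agreement(p_agree, x, y, eps):
    """Perturb the matrix of pairwise agreement probabilities p(s_i = s_j)
    so that voter y's votes are replaced by voter x's with probability eps
    (Eqs 1 and 2 of the main text)."""
    p = p_agree.copy()
    n = p.shape[0]
    # Eq 1: agreement between x and y interpolates towards perfect agreement.
    p[x, y] = p[y, x] = (1 - eps) * p_agree[x, y] + eps
    # Eq 2: y's agreement with every other voter z interpolates towards
    # x's agreement with z.
    for z in range(n):
        if z not in (x, y):
            p[y, z] = p[z, y] = (1 - eps) * p_agree[y, z] + eps * p_agree[x, z]
    return p

# For +/-1 votes, <s_i s_j> = 2 p(s_i = s_j) - 1, so the same update can be
# applied directly to a correlation matrix with +1 replacing perfect agreement.
```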

The variation in the entries of the FIM clearly indicates the unique role of the median. Each entry of the FIM shows how quickly p(k) changes when two pairs of voters are varied together (y becomes more like x, and y′ becomes more like x′). When at least one index is M, we find values different from when only O’s are involved. This distinction manifests in the principal eigenvector, whose entries represent the relative amount by which pairs should be simultaneously varied for maximal local change to p(k) — as if one could change all the pairwise voting “knobs” at once for maximum effect. To be clear about the pairwise grouping of indices, we reshape the principal eigenvector into an “eigenmatrix” in Figure 1D. Each column of the eigenmatrix corresponds to a directed change where voter y is made more similar to the corresponding row voter x. Since O’s are all the same, the first column connecting M to each O is uniformly valued. In the first row, the entries all correspond to making the neighbors of M more like M, so these are also all uniformly valued given that the O’s are interchangeable. Thus, each column of the eigenmatrix describes perturbations localized to the column voter and each row corresponds to changes across all the neighbors of a particular voter, such that the symmetry between O’s and the unusual role of M manifest in the comparison of a voter’s local perturbations with those across its neighborhood.

This local versus neighborhood asymmetry presents one way of pinpointing an unusual voter by using the difference between the eigenmatrix V and its transpose V^T. We define this per-voter asymmetry in Figure 1E. Given a normalized eigenmatrix, the total asymmetry A over all voters is 0 when the eigenmatrix is perfectly symmetric and is 1 when it is perfectly antisymmetric. A value of A = 1/2 marks the maximum asymmetry possible when all the nonzero elements are of the same sign (SI Appendix Section F). For the MVM, we find that M’s asymmetry is substantially larger than that of any O, clearly distinguishing M from O. The total asymmetry of the MVM provides a point of reference for systems that are more complex than the MVM. For larger N, the MVM asymmetry grows as the role of M more visibly skews the distribution. Thus, both the asymmetry in the roles of voters and the growing importance of a median with system size are reflected in the symmetry of the eigenmatrices.
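The asymmetry itself is straightforward to compute from a normalized eigenmatrix. The sketch below is our own reading of Figure 1E and Eq F.33, with the factor of 1/4 chosen so that a perfectly symmetric matrix gives 0 and a perfectly antisymmetric one gives 1; the per-voter split over rows is an assumption on our part.

```python
import numpy as np

def asymmetry(eigenmatrix):
    """Per-voter and total asymmetry of a FIM eigenmatrix V.

    V is normalized so that sum_ij V_ij^2 = 1.  The total asymmetry
    A = (1/4) sum_ij (V_ij - V_ji)^2 is 0 for a symmetric matrix and 1 for
    an antisymmetric one; the per-voter split follows our reading of Fig 1E.
    """
    V = eigenmatrix / np.linalg.norm(eigenmatrix)
    diff_sq = (V - V.T) ** 2
    per_voter = diff_sq.sum(axis=1) / 4.0
    return per_voter, per_voter.sum()

# Sanity checks on the bounds:
sym = np.array([[0., 1.], [1., 0.]])
antisym = np.array([[0., 1.], [-1., 0.]])
print(asymmetry(sym)[1])      # -> 0.0
print(asymmetry(antisym)[1])  # -> 1.0
```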

To measure the sensitivity of p(k) to each voter, we inspect the subspace eigenvalues λ_i, specifying the sensitivity of p(k) to change in a single voter’s behavior. These values are calculated from the subspace of the FIM describing localized perturbations — the diagonal blocks of the FIM as outlined in Figure 1C. The upper leftmost block corresponds to M and the remaining blocks correspond to each O in turn. For each subspace, we retrieve the principal eigenvalue. To compare the eigenvalues across voters, we calculate the normalized eigenvalue,

λ̂_i = λ_i / Σ_j λ_j.    (3)

Eq 3 defines our measure of how “pivotal” a component, here a voter, is relative to others. For the MVM, the principal subspace eigenvalue for M is far larger than that of any O. This large difference indicates that p(k) is over 10 times more sensitive to variation in M than in O, again reaffirming the special role of the median. It is important to note that voters with strong asymmetry are not necessarily the most pivotal — clearly because eigenvalues and eigenvectors present different information. Still, asymmetry in the eigenmatrix indicates heterogeneity amongst the voters; thus, large asymmetry is necessary, if insufficient, for the pivotal measure to vary across a wide range. Overall, the information geometry of this minimal class of models provides a way of quantifying the role of individual components in collective outcomes, identifying key components with pivotal roles that can emerge given strong heterogeneity in the population.
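A short sketch of this last step (our own code; it assumes the FIM rows and columns are ordered voter by voter into one block of n − 1 perturbation directions per voter, as in Figure 1C): take the principal eigenvalue of each voter's diagonal block and normalize across voters as in Eq 3.

```python
import numpy as np

def pivotal_measure(fim, n_voters):
    """Normalized subspace eigenvalues (Eq 3) from a FIM whose rows/columns
    are grouped into one (n_voters - 1)-sized block per voter, as in Fig 1C."""
    block = n_voters - 1                      # perturbations localized to one voter
    eigvals = []
    for i in range(n_voters):
        sub = fim[i * block:(i + 1) * block, i * block:(i + 1) * block]
        eigvals.append(np.linalg.eigvalsh(sub).max())
    eigvals = np.array(eigvals)
    return eigvals / eigvals.sum()            # Eq 3: lambda_i / sum_j lambda_j
```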

Figure 2: SCOTUS example. (A) Principal eigenmatrix of the FIM. Justices are ordered from most liberal to most conservative according to a standard measure of ideology (22). We indicate the typical divisions between the liberal and conservative blocs with black lines. (B) Normalized voter-subspace eigenvalues (Eq 3) and asymmetry per justice (Figure 1E). (C) Rate of change in the log-probability of dissent for dissenting blocs. (D) Each bloc’s probability of dissenting together according to the pairwise maxent model. (E) Rate of change in the log-probability of dissenters. Error bars represent 95% confidence intervals from repeating the full procedure outlined in Figure 1 for bootstrapped samples of the data.
Figure 3: S&P SPDR example. (A) Principal and (C) secondary eigenmatrices of the FIM. (B, D) Relative sector subspace eigenvalues (Eq 3) and asymmetry by sector (Figure 1E). Error bars represent 95% bootstrapped confidence intervals.

2 US Supreme Court (SCOTUS) and S&P 500

We perform the same analysis on an example from SCOTUS with nine justices voting on cases between the years 1994–2005 (see SI Appendix Section H for details about data sets). We show the principal eigenmatrix in Figure 2, which consists of perturbations primarily increasing similarity across ideological wings, given by the positive values connecting liberals and conservatives (since the recovered eigenvector is arbitrary with respect to sign, we could just as well consider the negative eigenvector, which would reverse the sign but preserve the magnitude of the elements). The principal mode has a small total asymmetry compared to that of the MVM, indicating the absence of a median-like, pivotal voter. This absence is surprising because discussion of the medians A. Kennedy and S. O’Connor is prominent in the context of this court. When we consider the voter-subspace eigenvalues shown in Figure 2, we find the most pivotal justices in ranked order: C. Thomas, S. Breyer, and Chief Justice W. Rehnquist. A change in C.T., given his strongly conservative voting record, would naturally constitute consequential change, but the roles of W.R. and S.B. are more subtle (20, 23, 24). Despite A.K. and S.O.’s prominent role in the narrative of Supreme Court voting, we find that other justices come to the foreground when we consider the sensitivity of the Court to behavioral change.

The principal mode can be projected into the more intuitive space of dissenting coalitions in terms of the rate of change of the probabilities for dissenting blocs (SI Appendix Section E). Though the eigenmatrix in Figure 2 shows increasing similarity between ideological wings, suggesting suppression of partisan 5–4 divides, the frequency of any 5–4 divide actually increases strongly, along with a decrease in lone and pair dissents, as in the bottom of Figure 2. Seven of the nine most common pair dissents found in the data decrease in likelihood. Thus, this shift reflects an increasing tendency for justices to join larger blocs, reflected in the suppression of every Justice’s lone dissents, in a way that breaks the typical partisan divide. To visualize changes in the existing 5–4 conservative-liberal dynamic, we inspect defections from the liberal bloc, or 6–3 votes where a single liberal vote is missing, and likewise defections from the five-member conservative bloc. On the whole, defections from the liberal bloc are less surprising than those from the conservative bloc, consistent with the balance of power favoring conservatives. For the liberal bloc, the most prominent change entails R.G. defecting, leaving D.S., J.S., and S.B., which reflects the central role of R.G. in the liberal coalition. On the other side, increasing the probability of S.O. or A.K. defecting is important, though not as much as the defection of W.R., which reflects his often-understated, unusual statistical role in the Court (20). Consistent with pundits’ understanding is the large surprise associated with C.T.’s defection from the conservative majority, a change that would represent a fundamental shift in the established partisan dynamics. Overall, this individual variation in the context of the partisan 5–4 dynamic reveals a portrait of much deeper subtlety than that suggested by unidimensional partisan intuition (4, 20, 25). Thus, the information geometry of statistical models of social systems can provide detailed insight into specific components or blocs in direct connection to their role in collective modes of the system.

Figure 4: Example systems. (A, B, C) On top, we show the first two eigenmatrices for the AK Supreme Court, K-pop on Twitter, and the CA Assembly. Below each eigenmatrix, we show the pivotal measure and the asymmetry per component (see Figure 1). For the CA Assembly, voters are grouped into nine blocs after rank ordering by the first W-Nominate dimension (see SI Appendix Figure H.11). Large error bars in the pivotal measure indicate that the unique pivotal role of Bloc 8 depends on a few crucial votes in this session. (D) Asymmetry of the principal eigenmatrix of the FIM. Error bars represent 95% bootstrapped confidence intervals.

In Figure 3, we analyze the founding set of State Street Global Advisors SPDR exchange-traded funds (2000–2018), which replicate the S&P sector indices and provide daily price data (binarized to positive or negative daily changes, with days of no change included, in analogy to votes). In contrast with SCOTUS, the collective behavior of each index reflects the aggregation of many individual investors: no stock index is monolithic in the sense of an individual voter. Given this aggregate nature, it is natural to consider the eigenvectors as the most surprising set of unanticipated global changes — although entire sectors might be “perturbed” by government policy like sector-specific regulation or tariffs. From this point of view, fluctuations in the pivotal blocs might reveal notable shifts in economic conditions or collective perceptions thereof (SI Appendix Section G). Taking a look at the model, we find that the principal mode displays large asymmetry across every index, reflecting the diversity of roles played by the various sectors of the economy as captured in price movements. Relatively large subspace eigenvalues highlight XLE (energy) and XLU (utilities), in agreement with their role as drivers of the economy on whose outputs many of the other sectors depend (26, 27). Perhaps unsurprisingly, we also find a “bellwether” XLP (consumer staples) and XLV (healthcare) as notably pivotal, whereas XLF (financial) and XLI (industrials) seem relatively less so. Going beyond the principal mode, we inspect the secondary mode and find that it is remarkably symmetric, with a small asymmetry score, in contrast with the second mode of the SCOTUS example. This secondary symmetry is reminiscent of the MVM, where a prominent asymmetric mode hides a strongly symmetric mode dominated by the symmetry amongst Ordinary voters. Such a symmetry is not found for the SCOTUS example, where at lower modes asymmetry actually increases, signaling notable individual roles in determining collective outcomes. Thus, these examples are a comparison of opposites: for SPDR the apparent asymmetry in components obscures shared structure, whereas for SCOTUS the overarching tendency to consensus overshadows individual roles on the Court.
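For concreteness, the binarization step can be sketched as follows (our own code; the CSV of adjusted closing prices is hypothetical, and grouping days of no change with negative changes is our reading of the convention described above).

```python
import pandas as pd
import numpy as np

# Hypothetical CSV of adjusted daily closing prices, one column per sector ETF.
prices = pd.read_csv("spdr_adjusted_close.csv", index_col=0, parse_dates=True)

# Daily change binarized in analogy to votes: +1 for a positive change,
# -1 otherwise (days with no change are grouped with the negative side here).
daily_change = prices.diff().dropna()
votes = np.where(daily_change > 0, 1, -1)
```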

3 Pivotal components in society

We explore other examples of social systems, including votes from US state high courts (14), the California State Assembly and Senate (15), and communities on Twitter (16). As with the previous two examples, we map behavior in these systems to binary form. To reduce the larger legislative bodies to a comparable number of blocs, we first separate voters into nine nearly equally sized blocs by ranked similarity according to a standard political science measure of ideology, the first W-Nominate dimension (28). The bloc vote is given by the majority vote of the members and is randomly chosen if they are equally divided (see SI Appendix Figure H.11). For Twitter communities, we identify individuals as high-dimensional binary vectors where an element is positive if they used a corresponding keyword, or else negative, such that the pairwise correlations reflect overlap in their use of keywords. Thus, our analysis of the information geometry involves the same procedure outlined above, but for a wider variety of social systems.
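The bloc coarse-graining for the legislative bodies can be sketched as follows (our own minimal implementation; the vote array and bloc assignments are assumed inputs).

```python
import numpy as np

def bloc_votes(member_votes, bloc_assignment, rng=None):
    """Coarse-grain legislator votes (+/-1, shape [n_rollcalls, n_members])
    into bloc votes by majority rule within each bloc, breaking ties at random."""
    rng = np.random.default_rng(rng)
    blocs = np.unique(bloc_assignment)
    out = np.empty((member_votes.shape[0], len(blocs)), dtype=int)
    for b_idx, b in enumerate(blocs):
        total = member_votes[:, bloc_assignment == b].sum(axis=1)
        ties = rng.choice([-1, 1], size=len(total))
        out[:, b_idx] = np.where(total != 0, np.sign(total), ties).astype(int)
    return out
```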

Considering the principal eigenmatrix of the Alaskan (AK) Supreme Court (five justices; 1998–2007), we find a remarkable degree of symmetry between justices and a small value for the total asymmetry. Such symmetry implies that the justices on this court all dissent in a statistically uniform way as described by the set of their pairwise correlations. Though this could be trivially true if all pairwise couplings were the same, this is not the case, a fact that is mirrored in the spread of positive and negative values in the eigenmatrix in Figure 4. Checking the local interaction networks described by the set of couplings from each justice i to every neighbor j (SI Appendix Figure B.2), we find that these sets are similar for every justice i. This symmetry is mirrored in the similarity of the individual subspace eigenvalues shown in Figure 4. Consistent with this symmetry extracted from the voting record, four out of the five justices served as Chief Justice during this period (W. Matthews, D. Fabe, and A. Bryner rotated as Chief Justice during the period 1997–2009 and W. Carpeneti from 2009–2012, following the period of analysis), a regular rotation of roles imposed by the state constitution stipulating that the Chief Justice only serve for three consecutive years at a time. In contrast, we show that the New Jersey (NJ) Supreme Court (2007–2010) has strong asymmetry (SI Appendix Figure B.7). Appointments to the NJ Court follow a tradition of maintaining partisan balance, apparently codifying a median role into the institution, and we find two nearly equal pivotal voters. Despite the seeming alignment between each of these two examples and the institutional norms, AK Supreme Courts are not always more symmetric than their NJ counterparts. The asymmetry is highly variable for previous years, suggesting that codified institutional rules only partially determine the role of pivotal voters (SI Appendix Section H).

We also show the eigenmatrices of the 1999 session of the CA State Assembly (nine blocs; 1999–2000) and a K-pop Twitter community (ten members; 2009–2017). The CA Assembly is an example of strong asymmetry. Here, Bloc 8 is dominantly pivotal, identified by the largest measure of asymmetry and the largest component subspace eigenvalue. While the State Assembly’s majority is held by the Democratic party, Bloc 8 is constituted of Republicans and plays an important role in the collective voting outcomes. Indeed, we find that the mutual information between the vote of Bloc 8 and the majority vote across the blocs is substantially larger than that of a Democratic bloc such as Bloc 1. This measure of correlation indicates that this Republican bloc is, like a median, highly predictive of the majority outcome across all of these blocs and, additionally, is pivotal (examples of blocs that are predictive of the outcome but not pivotal include Bloc 3, and A.K. and S.O. on SCOTUS (20)). For the sessions starting between the years 1993–2017, we find that the Assembly displays stronger signatures of asymmetry on average than the Senate, showing how the rules of the institution might be reflected in the distribution of pivotal blocs.

As for the Twitter community, we find a great deal of heterogeneity amongst the individuals, with a total asymmetry exceeding that of the MVM. In contrast with the MVM, this community contains multiple pivotal members with wide variation in the strength of their subspace eigenvalues. Twitter communities may be on average sensitive to the behavior of a few individuals regardless of identity (16), but this individual-level variation suggests that collective behavior may be much more sensitive to a select few Twitter users even within smaller communities (29). Going beyond the few examples in Figure 4, we find large diversity even within political institutions, highlighting the important role of heterogeneity in social institutions, heterogeneity that is captured in the information geometry of minimal, maxent models.

4 Discussion

An important question in the study of social institutions is whether or not collective decisions are robust to perturbations targeting individual components. Robustness is reciprocal to sensitivity: when a system is highly sensitive to small changes to components, its collective properties are not robust. In neural networks with avalanches of firing activity (30, 31, 32), in bird flocks with propagating velocity fluctuations (33), or in macaque societies with conflict cascades (34), such sensitivity might have an adaptive functional role. In the context of human society, questions of robustness are relevant to the stability of voting coalitions or the susceptibility of a population to disease or disinformation. For example, we might be interested in comparing the impact of different judicial nominees on the dynamics of voting on a judicial bench or the spread of disinformation on social networks. By relying on the formal framework of information geometry to investigate statistical signatures of sensitivity, we present a data-driven and general approach to characterizing robustness. As a result, our approach is not model-specific, relying only on the calculation of how sensitive a model is to changes in observable individual behavior (Eq 3).

In voting systems, median voters are conventionally considered to be power brokers who have outsize influence (1, 22, 7). Building on this idea, we propose a reduced toy model to extract features of the Fisher information matrix that correspond to signatures of a median voter. We show how to identify and interpret signatures of strong sensitivity on individual components in multiple social contexts, generalizing the intuition behind the median to pivotal components on which aggregate properties, measured by majority-minority divisions, depend strongly. Intriguingly, we find hints that institutional differences may contribute to structuring individual roles in collective outcomes both in state courts and state legislatures. Though it is unsurprising that the particular rules of a voting body may structure bloc dynamics, pivotal components provide a principled way of comparing social systems with differing composition, from different eras, and across different institutions in a unified, quantitative framework.

We might think of pivotal components as “knobs” that could drive a system out of its current configuration described by the ensemble of votes. If the subspace eigenvectors are knobs, the pivotal measure is proportional to the spacing of the dials such that for large eigenvalues the smallest turn results in the strongest effect. Since each pivotal component only considers the effects of perturbations localized to a single component, these knobs are independent. If components were accessible simultaneously, however, we would consider the joint space of multiple pivotal components, and the principal subspace eigenvalue must increase beyond (or stay at) the maximum eigenvalue over the set of component subspaces: this reflects the fact that enhancing the breadth of control only increases the range of possible outcomes (35, 36). By considering which knobs are accessible experimentally, our analysis could be extended to measuring signs of statistical control in real systems. For judicial voting, the realizable knobs that change judicial voting behavior may be the submission of amicus curiae briefs, choice of litigating cases, or lobbying (we are careful to point out that the ensemble of votes for political systems already includes such effects, so it is important to distinguish between endogenous and exogenous factors). Those trying to craft a legislative coalition might “perturb” aspects of proposed policy to affect its acceptability to potential supporters (37). In controlled biological systems, localized perturbations to single components could include manipulation of single neurons or the upregulation of specific genes. For example, manipulation of single neurons is possible by electrical stimulation, optogenetic techniques, or chemical stimulation, all ways of enacting the localized perturbations of neural “votes” (30). Analogously, gene expression might be perturbed by switching genes on and off or by adding protein directly to simulate changed expression levels (38). Our work presents the possibility of informing the direction of such external perturbations in the broader context of control.

Our understanding of the interplay between components and social structures at the mesoscopic or macroscopic scales across social and biological examples remains nascent. With this principled, quantitative approach for measuring pivotal components, we might, by comparing systems, better understand how institutional factors shape the emergence of social structure.

We thank Katherine Quinn, Guru Khalsa, Bryan Daniels, Jess Flack, and David Krakauer for helpful conversations. We thank Gavin Hall for help with the Twitter data. EDL acknowledges a dissertation grant from the Dirksen Congressional Research Center and an NSF GRFP under grant no. DGE-1650441. DMK & MJB thank Illinois Tech - Chicago Kent College of Law for support of this project.

Appendix A Median Voter Model (MVM)

In the canonical median voter model (1), we assume that voters are described by single-peaked, unidimensional preference functions and that they vote according to them. This has been proposed as a model for voting in political science and for how supply-demand curves determine market prices. The median position is special because it is the only outcome that can obtain a simple majority against all alternatives. Here, we map this idea to a reduced, statistical model.

The MVM consists of an odd number N of voters with a single special voter, labeled Median, that is guaranteed to vote in the majority. More generally, we can interpolate between a perfect median and a set of purely random voters by setting a probability q that Median votes in the majority and a probability 1 − q that Median votes randomly (either +1 or −1). The remaining voters are always random. This model presents a simple testing ground for exploring the information geometry of a system with a unique, statistically well-defined median voter that can range from random (q = 0) to a perfect median (q = 1).

The probability distribution defined by the MVM cannot be exactly captured by a pairwise maximum entropy (maxent) model. We recall that pairwise maxent models can be derived by maximizing the entropy of the model while constraining the single component and pair component distributions to match the data. From the maxent perspective, the MVM is equivalent to specifying that Median be perfectly correlated with the majority vote, a correlation of the form

⟨ s_M sgn( Σ_i s_i ) ⟩ = 1.    (A.1)

In general, this nonlinear correlation cannot be written as a linear combination of pairwise correlations. Thus, the MVM serves as a testing ground as a reduced statistical model of a median voter whose nontrivial correlations with the majority vote cannot be captured perfectly by a pairwise maxent model.

The MVM can be solved numerically for large N by exploiting the symmetry between the Ordinary voters, allowing for fast enumeration of the entire partition function in linear time. This solution will be discussed elsewhere, but in Figure A.1 we show initial numerical results from such a calculation (39). We find that the eigenvalue of the Median subspace grows linearly with N whereas the eigenvalue for the Ordinary voters grows much more slowly. Along with this divergence, we find that the asymmetry grows monotonically for the Median at the exclusion of Ordinary voters (since it is normalized). Thus, the MVM presents a model where the Median voter quickly becomes the exclusively dominant voter for large systems.

Figure A.1: Voter eigenvalues and asymmetries as a function of system size for the MVM. The subspace eigenvalue for Median grows linearly with system size while its asymmetry asymptotically dominates over Ordinary voters’ (39).

Appendix B Fitting the pairwise maxent model

We model the probability distribution of votes using a pairwise maxent approach (18). We begin by maximizing the information entropy while ensuring that the model matches the pairwise correlations calculated over the data points (19),

⟨s_i s_j⟩ = Σ_s p(s) s_i s_j,    (B.2)
⟨s_i s_j⟩_data = (1/T) Σ_{t=1}^{T} s_i^{(t)} s_j^{(t)},    (B.3)

where the left hand sum is over all possible configurations of the binary vector s and the right hand sum is over all T observations in the data. Going through the usual calculation (40), we derive the pairwise maxent model

p(s) = e^{−E(s)} / Z.    (B.4)

Eq B.4 is normalized by the “partition function”

Z = Σ_s e^{−E(s)}    (B.5)

and contains the energy functional of the form

E(s) = −Σ_{i<j} J_{ij} s_i s_j.    (B.6)

The “couplings” J_{ij} are numerically solved such that the model matches the pairwise correlations. (The keen reader may note that we did not constrain the average of each element of the vector s, so we did not explicitly fix the individual marginal distributions in this abbreviated derivation. Indeed, we assumed that the averages were 0 because we were only interested in majority-minority dynamics, so each vote is equally likely to be either up or down, s_i = ±1. As a result, the “fields” in the complete energy function are 0, as in Eq B.6.) In other words, the maxent distribution when fixing only the pairwise correlations shows symmetry between the two possible orientations −1 and 1. In this sense, the couplings are not fit to the data, but are given exactly by the pairwise correlations in the data. For the small systems that we consider, these couplings can be found exactly by explicit calculation of the pairwise correlations and standard numerical optimization techniques as implemented in the Convenient Interface for Inverse Ising, or ConIII (40). For the MVM, the problem can be simplified because there are only two types of couplings, corresponding to the two types of correlations between Median and Ordinary and between Ordinary voters. All the examples shown in Figs. B.2–B.7 have negligible least-squares errors in the fit to the pairwise correlations. Thus, the relatively small system size ensures that straightforward enumeration techniques can be used to solve directly for the pairwise maxent models that correspond to the data.
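As a concrete illustration of this enumeration strategy, the sketch below fits the couplings of a small system by matching the model's pairwise correlations to the data with a standard root finder. It is a minimal stand-in for what ConIII does rather than a use of its actual interface; all function names here are ours.

```python
import numpy as np
from itertools import combinations, product
from scipy.optimize import root

def enumerate_states(n):
    return np.array(list(product([-1, 1], repeat=n)))

def pair_products(states):
    # One column per pair (i < j): the products s_i * s_j for every state.
    pairs = list(combinations(range(states.shape[1]), 2))
    return np.column_stack([states[:, i] * states[:, j] for i, j in pairs])

def model_correlations(J, pair_prods):
    # p(s) ~ exp(sum_{i<j} J_ij s_i s_j); returns model <s_i s_j> for each pair.
    logw = pair_prods @ J
    w = np.exp(logw - logw.max())
    p = w / w.sum()
    return pair_prods.T @ p

def fit_couplings(data_corr, n):
    """Solve for couplings J so that the model pairwise correlations match
    data_corr (a vector ordered over pairs i < j)."""
    pair_prods = pair_products(enumerate_states(n))
    sol = root(lambda J: model_correlations(J, pair_prods) - data_corr,
               x0=np.zeros(len(data_corr)))
    return sol.x
```

Here data_corr is the vector of pairwise correlations ⟨s_i s_j⟩ computed directly from the binarized votes, ordered over pairs i < j.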

Across all the systems that we study, we find that the pairwise maxent model captures well the higher-order features of the data even when only fit to the pairwise correlations. To characterize this fit, we use a property of maxent models. The entropy of the maxent model always decreases with the inclusion of additional constraints, such that the entropy is largest when all components are treated as independent, S_1 (equal to N bits here, since each vote is symmetric), and is minimized, S_N, when the entire probability distribution of the data is fit exactly, where S_i is the entropy of the model matching all correlations up to and including order i. (For how to estimate the entropy of the data, see Refs (20) or (41). Calculating an unbiased estimate of the entropy of the data can be an issue for sparse samples, but is straightforward for the relatively large number of samples we have.)

Thus, as we increase the number of parameters, we impose higher-order structure, and we monotonically approach the entropy of the data. This suggests as a measure for comparing maxent models the total multi-information captured (20),

I_2 / I_N = (S_1 − S_2) / (S_1 − S_N),    (B.7)

which varies from 0 (no improvement beyond the independent model) to 1 (exact fit to the data). For the examples considered in the main text, the pairwise maxent model serves as an excellent fit, capturing over 94% of the multi-information in all cases. The pairwise maxent model captures over 98% of the multi-information for the MVM. Overall, the pairwise maxent model is a minimal but convincing approximation of the ensemble statistics of the examples we consider (42, 43).
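Given the fitted pairwise model, the fraction in Eq B.7 can be estimated as in the following sketch (our own code; it assumes zero fields so that the independent-model entropy is N bits, and enough samples that the naive plug-in estimate of the data entropy is adequate, as noted above).

```python
import numpy as np
from collections import Counter

def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -(p * np.log2(p)).sum()            # bits

def multi_information_captured(votes, p_pairwise_model):
    """Fraction of multi-information captured by the pairwise model (Eq B.7).

    votes: array of +/-1 configurations, shape [n_samples, n].
    p_pairwise_model: dict mapping each configuration (tuple) to its model probability.
    """
    n_samples, n = votes.shape
    # Independent model: with zero fields and symmetric +/-1 votes, every
    # vote is unbiased, so S_1 = n bits.
    s1 = float(n)
    # Data entropy from the empirical distribution of configurations.
    counts = Counter(map(tuple, votes))
    s_data = entropy(np.array(list(counts.values())) / n_samples)
    # Pairwise model entropy.
    s2 = entropy(np.array(list(p_pairwise_model.values())))
    return (s1 - s2) / (s1 - s_data)
```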

Figure B.2: Pairwise correlations and couplings for AK Supreme Court.
Figure B.3: Pairwise correlations and couplings for US Supreme Court.
Figure B.4: Pairwise correlations and couplings for Twitter K-pop community.
Figure B.5: Pairwise correlations and couplings for SPDR.
Figure B.6: Pairwise correlations and couplings for CA Assembly 1999 session. Composition of blocs is given in Figure H.11.
Figure B.7: Pairwise correlations and couplings for NJ Supreme Court. The initials stand for Barry T. Albin, Helen E. Hoens, Jaynee LaVecchia, Virginia Long, Stuart Rabner, Roberto A. Rivera-Soto, and John E. Wallace Jr.

Appendix C Specifying the Fisher information metric

If we have a statistical model described by a probability distribution p_θ(s) over a set of discrete states s and parameterized by parameters θ, how do we measure how different one model is from another? The Kullback-Leibler (KL) divergence is one such measure that tells us how much information is necessary to reach a distribution p_θ′ if we know p_θ (9, 44),

D_KL(p_θ ‖ p_θ′) = Σ_s p_θ(s) log [ p_θ(s) / p_θ′(s) ].    (C.8)

In the limit where the two distributions are infinitesimally close to one another, the KL divergence becomes a metric. The constant and linear terms go to zero, and the first nonzero term is the curvature of the divergence, the Hessian, which is also known as the Fisher information,

F_{αβ} = ∂² D_KL(p_θ ‖ p_{θ+δθ}) / ∂δθ_α ∂δθ_β |_{δθ=0} = Σ_s p_θ(s) [∂ log p_θ(s)/∂θ_α] [∂ log p_θ(s)/∂θ_β].    (C.9)

Thus, the FI is a description of how quickly the probability distribution changes if we move along various directions in parameter space. Because it is a metric, the eigenvectors of the Hessian correspond to directions in the tangent space of the model manifold, where the eigenvalues describe how quickly the manifold is varying along these modes.

The Fisher information also measures how much information about a parameter is contained in a random sample (9). When the distribution of data is extremely sensitive to a parameter, the information shared is high, whereas when it is fairly insensitive the information shared is low. This is described formally by the Cramér-Rao bound, which sets a lower bound on the variance of an unbiased estimator for the parameters (9). This picture, more formally, has been used as a technique for model reduction, removing degrees of freedom in parameter space to which the system is insensitive (45). Here, we focus on the sensitive degrees of freedom, using them to identify components that are interesting because their behavior is precisely determined by the statistics of the data, or equivalently on whose behavior the collective statistics depend most sensitively.

We propose using as parameters aspects of the system that provide transparent insight into how perturbation of the parameters affects the system. In physical systems, it is natural to consider the couplings J_{ij}, or more generally the Lagrangian multipliers from the maximum entropy formulation, as the parameters by which to control the system because they are experimentally accessible (e.g., a magnetic system can be tuned by an applied field or by changing the temperature, which modulates all couplings by a factor). For a statistical model of a social system, however, the meaning of the terms in the energy functional is opaque and often nontrivial. In other words, we do not know how to access the coupling parameters directly. Certainly, we could be methodical about it and calculate the corresponding changes in the set of pairwise correlations for a perturbation in a coupling, but that would result in changes across all pairwise correlations in varying amounts. This change may be difficult to effect in a social system when opportunities for control are often limited.

This impracticality suggests a different approach, where we instead consider how the measured behavior of the system might be perturbed in an accessible way, since these behaviors are straightforward to measure. This reasoning leads us to consider the observables as parameters. Formally, the observables for a maxent model are the conjugate variables to the Lagrangian multipliers as given by the Legendre transform (46). There is no difference in knowing one or the other: the transformation is a one-to-one mapping. For the pairwise maxent model, the “natural parameters” are the couplings and their conjugates the pairwise correlations,

⟨s_i s_j⟩ = ∂ log Z / ∂ J_{ij},    (C.10)
log Z = S + Σ_{i<j} J_{ij} ⟨s_i s_j⟩.    (C.11)

Eq C.11 states the well-known relation that the “free energy,” −log Z, is the Legendre transform of the Shannon entropy. By working in the space of observables, we do not lose any information — indeed the model started with the observables in the first place — but find a more amenable representation.

In the main text, we take one further step by choosing to consider changes to the observables that are interesting as specified in Eq 1. For example, it is a common thought experiment in discussing Supreme Court voting to imagine how the system would change if the justices were different. The justices could be different in any which way, but we narrow the range of possible perturbations substantially by focusing on relative voting records, restricting ourselves to the range of behavior already observed in a system. It makes intuitive sense to ask how the Court would change if Justice Scalia were to vote more like Justice Thomas because their behaviors are specified by the voting record, but it requires much more work to determine what would happen if Scalia were to vote more like a judge picked from the appellate courts. There is no reason such a counterfactual could not be entertained, but it would require modeling that judge’s votes on the same set of cases that Scalia voted on. We restrict ourselves from considering such open-ended questions, leaving them as potential extensions of our work. Importantly, we choose perturbations that are localized to particular components, interpretable in their mapping to behavioral changes, and applicable across a wide range of systems.

Another advantage of treating observables as parameters is that it offers some independence from the choice of model. As a simple example of the distinction between treating an observable as the parameter or a term in the energy function, consider the biased coin. With probability p the coin flips heads, s = 1, and with probability 1 − p it flips tails, s = −1. If the parameter is the bare observable, the average coin flip ⟨s⟩, that is equivalent to changing p up to a constant factor,

⟨s⟩ = 2p − 1.    (C.12)

Taking the transformation in Eq C.12, we calculate the Fisher information (FI) to find (Figure C.8)

F(⟨s⟩) = 1 / (1 − ⟨s⟩²) = 1 / [4p(1 − p)].    (C.13)

The Fisher information diverges at the boundaries of the parameter space p = 0 and p = 1 because that is where a finite change in p can lead to a diverging information distance. More closely mirroring the perturbation considered in the main text, we could insist that the coin “mimic” a perfectly biased coin such that

p_ε = (1 − ε) p + ε.    (C.14)

Under this scenario, the Jacobian dp_ε/dε = 1 − p captures the fact that the coin’s bias makes no finite jump near p = 1, but changes ever more slowly when it is almost a perfectly biased coin. As a result,

F(ε)|_{ε=0} = (1 − p) / p,    (C.15)

which goes to zero at p = 1.

Now, consider a maxent version of this problem, which is the nonlinear transformation

p = e^{h} / (e^{h} + e^{−h}),    (C.16)

where the field determining the bias is h and

F(h) = 4p(1 − p) = sech²(h).    (C.17)

In contrast with using p as the parameter, Eq C.17 peaks at h = 0 and decays to 0 at the boundaries, and an infinite change in h is necessary to reach p = 0 and p = 1. Of course, we could have chosen any possible model, choosing instead of the maxent transformation in Eq C.16 our favorite nonlinear transformation. Thus, by choosing our favorite model, we would end up effectively specifying the FI. This is generally not an issue if one cares about measuring the relationship between a particular model and the data, but it does become an issue if one cares more about the statistics of the data rather than of the particular model specified.

Figure C.8: Fisher information for the biased coin according to different choices of perturbation: changing p directly (Eq C.13), substitution with a perfectly biased coin (Eq C.15), and changing the “field” h (Eq C.17).
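The three expressions above are easy to check numerically by comparing each against the curvature of the KL divergence computed by finite differences, as in this short sketch (our own verification code, not the authors').

```python
import numpy as np

def kl_bernoulli(p, q):
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def fi_from_kl(p_of_theta, theta, dtheta=1e-4):
    """Estimate FI as the curvature of the KL divergence at theta."""
    p0 = p_of_theta(theta)
    return 2 * kl_bernoulli(p0, p_of_theta(theta + dtheta)) / dtheta**2

p = 0.3
s_mean = 2 * p - 1
# Eq C.13: parameter is the average vote <s> = 2p - 1.
print(fi_from_kl(lambda m: (1 + m) / 2, s_mean), 1 / (4 * p * (1 - p)))
# Eq C.15: parameter is the mimicry rate eps, evaluated at eps = 0.
print(fi_from_kl(lambda e: (1 - e) * p + e, 0.0), (1 - p) / p)
# Eq C.17: parameter is the field h with p = e^h / (e^h + e^-h).
h = 0.5 * np.log(p / (1 - p))
print(fi_from_kl(lambda h_: 1 / (1 + np.exp(-2 * h_)), h), 4 * p * (1 - p))
```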

If we restrict ourselves to perturbing the measured observables, we ensure that the choice of perturbation does not depend on the choice of model. Additionally, when the probability distribution is matched exactly, there is no dependence on the model class. In the special case of the biased coin, the probability distribution is specified exactly by a single parameter. Assuming we can measure it with infinite precision and we have a model that can fit the measured value exactly (e.g., a model that cannot reach the measured bias does not count), the calculation of FI — whether Eq C.13, C.15, or some other choice — is concretely defined. More generally, data resolution is not perfect and so we must infer probabilities for configurations that we have not observed. As an example, the pairwise maxent approach assumes that the pairwise marginal distributions are known exactly, but the higher-order joint probabilities are assumed to conform to the maxent principle. When any such model feature is used in the calculation of the FI, clearly the FI will depend on the assumptions of the model. For the pairwise maxent model, this means that the calculation of the FI on features of the distribution that are constructed explicitly from pairwise marginals matches the data exactly, but in general the distribution of majority-minority divisions does depend on the maxent assumptions. This is not always unfavorable: if the higher-order terms decrease in importance such that they behave as small perturbations on top of the pairwise model, the FI will show weaker dependence on these corrections (43).

Appendix D Calculation of the Fisher information matrix (FIM)

Here, we calculate the FIM for the transformation described in Eqs 1 and 2 and go through some examples to show how to calculate the FIM. We go into some detail with the derivation to make clear how to perform such a calculation for those less familiar with the topic.

In Eqs 1 and 2, we consider how the correlations between component y and all other components x change when component y appears to vote more like component x. To effect this perturbation, we use a parameter ε that leads to a linear change in the couplings as described by the rate of change dJ/dε, where we are taking a total derivative with respect to the change in the pairwise probabilities. To obtain this derivative, we perturb to first order in ε the expression for the pairwise correlations (Eq B.2) to obtain the self-consistent equation

d⟨s_i s_j⟩/dε = Σ_{k<l} ( ⟨s_i s_j s_k s_l⟩ − ⟨s_i s_j⟩⟨s_k s_l⟩ ) dJ_{kl}/dε.    (D.18)

By self-consistent, we are referring to the fact that the new pairwise correlations after perturbation depend on the change in the couplings, so determining dJ/dε requires inverting Eq D.18. Another way to describe Eq D.18 is as the linear combination of the linear response functions of every pairwise correlation to a change in the corresponding coupling, also known as the susceptibility χ. The perturbations in the couplings are related to the linear response of the couplings to a change in the collective statistics,

dJ/dε = χ^{−1} d⟨s s⟩/dε.    (D.19)

The resulting vector of changes in the correlations describes the transformation for every pair given a change localized to the pair xy:

d⟨s_x s_y⟩/dε = 1 − ⟨s_x s_y⟩,    (D.20)
d⟨s_y s_z⟩/dε = ⟨s_x s_z⟩ − ⟨s_y s_z⟩  for z ≠ x, y.    (D.21)

The resulting set of coupling perturbations appears in Figure 1C, representing changes to all pairwise couplings when perturbing the pair of voters x and y.

Eqs D.18–D.21 describe the numerical algorithm for calculating the changes in the statistics of the system under the perturbation described in the main text. Note that they implicitly depend on ε, which must be taken to zero. The remaining calculations are to coarse-grain the full distribution to p(k), the distribution of the number of votes in the majority, and to calculate the FIM on p(k). For pedagogical clarity, we will first show how to calculate the FIM without coarse-graining.

There is a simple, intuitive form for the FIM for maxent models. Under an infinitesimal change in the parameters such that the energy of each voting configuration changes as E(s) → E(s) + δE(s), we can expand

p_ε(s) = e^{−E(s) − δE(s)} / Z_ε,  with δE(s) = −ε Σ_{i<j} (dJ_{ij}/dε) s_i s_j.    (D.22)

Now calculating the Kullback-Leibler divergence to second order,

D_KL(p ‖ p_ε) ≈ (1/2) [ ⟨δE²⟩ − ⟨δE⟩² ].    (D.23)

The FI is the second order term, or the curvature, so we must send the norm of the change in the energy to zero,

F = lim_{ε→0} (2/ε²) D_KL(p ‖ p_ε) = ⟨(dE/dε)²⟩ − ⟨dE/dε⟩²,    (D.24)

where dJ/dε is defined in Eq D.19 and dE(s)/dε = −Σ_{i<j} (dJ_{ij}/dε) s_i s_j is a sum over all couplings. Thus, the FI of the distribution over the full state space has a simple form in terms of the change in energy for maxent models.

Alternatively, the result from Eq D.24 can be expressed as a matrix of correlation functions. In other words, the correlation functions are the linear response functions for perturbations to the natural parameters, here the couplings J_{ij}. As the simplest example, consider an Ising model under perturbation to a particular coupling, J_{ij} → J_{ij} + δJ_{ij}. Using our form Eq D.24 for the FI and δE(s) = −δJ_{ij} s_i s_j,

F = ⟨(δJ_{ij} s_i s_j)²⟩ − ⟨δJ_{ij} s_i s_j⟩².    (D.25)

The perturbations to the couplings do not depend on the state s, so they can be pulled out of the averages to obtain

F = δJ_{ij}² [ ⟨(s_i s_j)²⟩ − ⟨s_i s_j⟩² ]    (D.26)
  = δJ_{ij}² [ 1 − ⟨s_i s_j⟩² ].    (D.27)

The diagonal entries of the FIM are the variance of the pairwise correlation, which is a well-known result. It is straightforward to see that the off-diagonal elements of the FIM are the covariance ⟨s_i s_j s_k s_l⟩ − ⟨s_i s_j⟩⟨s_k s_l⟩. Thus, the FI for maxent models reduces to the covariance of the set of observables chosen as constraints when we are dealing with natural parameters.
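This covariance form is easy to verify numerically. The sketch below (our own check) builds a small pairwise maxent model, computes the covariance matrix of the pair products s_i s_j, and compares one off-diagonal entry against the curvature of the KL divergence under perturbations of the corresponding couplings.

```python
import numpy as np
from itertools import combinations, product

n = 4
rng = np.random.default_rng(0)
states = np.array(list(product([-1, 1], repeat=n)))
pairs = list(combinations(range(n), 2))
pair_prods = np.column_stack([states[:, i] * states[:, j] for i, j in pairs])

def boltzmann(J):
    w = np.exp(pair_prods @ J)
    return w / w.sum()

J = rng.normal(scale=0.3, size=len(pairs))
p = boltzmann(J)

# FIM as covariance of the observables s_i s_j (Eqs D.25-D.27 and the text above).
mean = pair_prods.T @ p
fim_cov = (pair_prods * p[:, None]).T @ pair_prods - np.outer(mean, mean)

# FIM as curvature of the KL divergence under coupling perturbations.
def kl(p, q):
    return np.sum(p * np.log(p / q))

eps = 1e-3
a, b = 0, 1                                    # two coupling directions
da = np.eye(len(pairs))[a] * eps
db = np.eye(len(pairs))[b] * eps
# Off-diagonal entry via the polarization identity for the quadratic form.
fim_ab = (kl(p, boltzmann(J + da + db)) - kl(p, boltzmann(J + da - db))
          - kl(p, boltzmann(J - da + db)) + kl(p, boltzmann(J - da - db))) / (4 * eps**2)
print(fim_ab, fim_cov[a, b])                   # the two estimates should agree closely
```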

As a more general formulation, consider the set of Lagrangian multipliers λ_μ and their corresponding bare observables f_μ(s) (“bare” referring to the fact that we have yet to dress them with brackets by averaging over the ensemble). For the pairwise maxent model, the Lagrangian multipliers are the couplings and the bare observables are the pairwise products s_i s_j. Working through the same calculation as before but with this general formulation of a maxent model, we find for the FI

F = Σ_{μν} (dλ_μ/dε)(dλ_ν/dε) [ ⟨f_μ f_ν⟩ − ⟨f_μ⟩⟨f_ν⟩ ],    (D.28)

where the dλ_μ/dε depend implicitly on ε. As noted earlier, the perturbation in the pairwise agreement probabilities leads to a nontrivial combination of changes to the entire vector of couplings. As a result, the FI in Eq D.28 contains cross terms between all pairwise correlations, and the changes in the Lagrangian multipliers each come with a factor of the Jacobian relating changes in the pairwise marginals to the couplings as described by Eq D.18.

For the analysis in the main text, however, there is an additional step. We do not consider the full state space, but coarse-grain each configuration s to the distribution p(k) of the number of votes in the majority. As a result, we are not calculating the variance in the energies for the pairwise maxent model as described in Eq D.24, but the variance in the logarithm of the sum of all terms in the partition function with k voters in the majority. We label the set of all states with k voters in the majority as V_k to write

p(k) = Σ_{s ∈ V_k} p(s) = (1/Z) Σ_{s ∈ V_k} e^{−E(s)},    (D.29)

E_k ≡ −log Σ_{s ∈ V_k} e^{−E(s)}.    (D.30)

Eq D.30 defines an effective “k majority” energy such that, under perturbation to the pair of components x and y as indicated by the subscript xy,

F_{xy,x′y′} = ⟨ (dE_k/dε_xy)(dE_k/dε_x′y′) ⟩_{p(k)} − ⟨ dE_k/dε_xy ⟩_{p(k)} ⟨ dE_k/dε_x′y′ ⟩_{p(k)}.    (D.31)

Eq D.31 is the form that the limit in Figure 1C takes.

To summarize the algorithm, we first find the total derivative of each coupling with respect to the change in the pairwise marginals as explained in Eq D.19. Then, we calculate the distribution of votes in the majority both for the model without the perturbation and with it, for a range of small values of ε, as in Eq D.23. By comparing these two distributions for increasingly smaller ε, we estimate numerically the FIM, relying on the definition of an “effective” energy as in Eq D.30 as a tool for dealing with issues in numerical precision that may arise when comparing ratios of floating point numbers. These steps generate the FIM as shown in Figure 1C, with which we calculate the eigenvalue spectrum to measure our pivotal voters.
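The sketch below strings these steps together for a single diagonal FIM entry (our own simplified implementation by direct enumeration, so it sidesteps the numerical-precision device of Eq D.30; function names and the choice of ε are ours): the prescribed change in the pairwise correlations from Eqs 1 and 2 is mapped to a change in the couplings through the inverse susceptibility, and the curvature of the KL divergence between the original and perturbed majority distributions is estimated at small ε.

```python
import numpy as np
from itertools import combinations, product

def maxent_quantities(J, n):
    """Probabilities, pair products, and majority-size distribution of a
    pairwise maxent model with couplings J (one entry per pair i < j)."""
    states = np.array(list(product([-1, 1], repeat=n)))
    pairs = list(combinations(range(n), 2))
    pair_prods = np.column_stack([states[:, i] * states[:, j] for i, j in pairs])
    w = np.exp(pair_prods @ J)
    p = w / w.sum()
    k = np.maximum((states == 1).sum(1), (states == -1).sum(1))  # majority size
    p_k = np.array([p[k == kk].sum() for kk in range((n + 1) // 2, n + 1)])
    return p, pair_prods, p_k, pairs

def fim_diagonal_entry(J, n, x, y, eps=1e-3):
    """FIM entry for perturbing pair (x, y): y's votes replaced by x's (Eqs 1, 2)."""
    p, pair_prods, p_k, pairs = maxent_quantities(J, n)
    corr = pair_prods.T @ p
    # Susceptibility chi_{ij,kl} = <s_i s_j s_k s_l> - <s_i s_j><s_k s_l> (Eq D.18).
    chi = (pair_prods * p[:, None]).T @ pair_prods - np.outer(corr, corr)
    # Direction of the correlation change prescribed by Eqs 1 and 2 (Eqs D.20, D.21).
    dcorr = np.zeros(len(pairs))
    for idx, (i, j) in enumerate(pairs):
        if {i, j} == {x, y}:
            dcorr[idx] = 1 - corr[idx]
        elif y in (i, j):
            z = i if j == y else j
            xz = pairs.index((min(x, z), max(x, z)))
            dcorr[idx] = corr[xz] - corr[idx]
    # Map the correlation change to a coupling change (Eq D.19).
    dJ = np.linalg.solve(chi, dcorr)
    # Curvature of the KL divergence between majority distributions.
    _, _, p_k_eps, _ = maxent_quantities(J + eps * dJ, n)
    kl = np.sum(p_k * np.log(p_k / p_k_eps))
    return 2 * kl / eps**2
```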

Appendix E Dissenting coalitions

In Figure 2, we project the eigenvectors onto the probabilities of dissenting coalitions to obtain a detailed picture of how the parameter directions obtained from the FIM affect dissenting coalitions. Such a projection involves taking the sum over all the probabilities of the states with the particular dissenting bloc and calculating the effective energy. Expanding the log-likelihood to first order, we calculate the rate at which this probability changes to be

d log p(bloc)/dε = −dE_bloc/dε + ⟨dE/dε⟩.    (E.32)

The limit ε → 0 refers to an infinitesimal perturbation of the couplings along the first eigenvector of the FIM. Then, the rate of change in the log-likelihood simplifies to comparing the change in the effective energy of the bloc with the average change across all configurations.

Appendix F Measure of asymmetry

Figure F.9: Total asymmetry for the binary system specified by the 2x2 matrix in Eq F.34.

As a way of measuring the heterogeneity between the components of a system, we calculate the asymmetry of the “eigenmatrices” of the Fisher information matrix as defined in Figure 1E. Given that the matrices V are normalized such that Σ_{ij} V_{ij}² = 1, the total asymmetry can be written

A = (1/4) Σ_{ij} (V_{ij} − V_{ji})² = (1 − Σ_{ij} V_{ij} V_{ji}) / 2.    (F.33)

Thus, we might think of the asymmetry as a measure of correlation between the entries in the upper triangle and lower triangle of the eigenmatrix. When they are perfectly correlated such that the matrix is symmetric, A = 0. If the entries are completely uncorrelated with their partners in the transpose, then A = 1/2. When they are anti-correlated, the summation in Eq F.33 can become negative and A approaches its maximum of 1.

As an example, consider the asymmetry for a 2x2 matrix with off-diagonal entries v_{12} and v_{21},

V = [ [0, v_{12}], [v_{21}, 0] ].    (F.34)

The normalization constrains v_{12} and v_{21} to the unit circle,

v_{12}² + v_{21}² = 1.    (F.35)

In Figure F.9A, we have colored the upper and lower halves of this circle (for positive and negative values of v_{21}) by different colors. Now calculating the total asymmetry,

A = (1/2)(v_{12} − v_{21})² = 1/2 − v_{12} v_{21}.    (F.36)

We plot Eq F.36 in Figure F.9B and again color the curves differently depending on the half of the unit circle that we are tracing out. When v_{21} = 0, normalization asserts that v_{12} = ±1 and the total asymmetry A = 1/2. As we increase the magnitude of v_{21}, we can follow along the route where v_{12} v_{21} is negative (positive), which leads to maximization (minimization) of A at |v_{12}| = |v_{21}| = 1/√2. As we keep increasing |v_{21}| to 1, we return to A = 1/2 and have effectively swapped the roles of v_{12} and v_{21} (a rotation of the matrix in Eq F.34 by 90°).

Appendix G Time series analysis of SPDR

Figure G.10: Retrospective time series analysis of the SPDR for the two most pivotal indices, XLE and XLU, and the least pivotal, XLY. We use a moving window of fixed duration and shift. The width of the moving window is delimited by the gray box. We compare the maximum of the normalized eigenvalues of the covariance matrix with the coefficient α, the projection of the windowed time series fluctuation onto the stock index principal subspace eigenvector (Eq G.37). Lines are drawn for readability.

The temporal fluctuations of the sector indices in the SPDR represent potentially useful information about changing economic conditions. As a preliminary demonstration of the type of analysis that may be interesting in this context, we consider a retrospective analysis of how local temporal fluctuations in the market are reflected in the subspace eigenvectors of the FIM. (Note that the subspace eigenvectors are calculated from the entire time series available, whereas realtime analysis would rely only on the statistics available up to the current time. This is why we call this example “retrospective.”)

To do this, we take a long temporal window that allows us to obtain a precise estimate of the distribution of configurations p_t in the window ending at time t. Then, we minimize the KL divergence between p_t and the pairwise maxent model solved on the entire data set, with the change in the couplings constrained to lie along the principal stock index subspace eigenvector v, by adjusting the coefficient α,

α(t) = argmin_α D_KL( p_t ‖ p_{J + α v} ).    (G.37)

The magnitude of this coefficient is a measure of how strongly the fluctuations in the windowed time series are reflected in the direction of parameter space specified by the subspace eigenvector. As we show in Figure G.10, the fluctuations show patterns that diverge at many points from the maximum of the normalized eigenvalue of the windowed covariance matrix, a measure used to determine when economic conditions are changing (13). In particular, we note that periods of time where the best-fit values of α for the various stock indices are correlated or anti-correlated may be useful indicators. Although it remains to relate these patterns to recognized features of the time series, this presents a potentially useful complement to existing tools for analyzing market data. (We do not discuss in detail here the difficulty of estimating information quantities in the limit of small data, an important issue for realtime forecasting of changing economic conditions. Entropy estimation for small samples remains an active research problem (41, 47), and we avoid this issue by taking long windows.)

Appendix H Additional notes on data sets

Figure H.11: Names of the members of the CA Assembly in the 1999 session by voting bloc, as determined by ranking on the first W-Nominate dimension. Though all members were included in the W-Nominate analysis, only members who voted in more than 20% of the recorded votes were included in the coarse-graining and maxent solution.

H.1 US state supreme courts

We obtained the latest data set from the State Supreme Court Data Project (SSCDP) and used their binary coding of justice votes (14).

We show the total asymmetry for all the natural courts on the Alaska and New Jersey Supreme Courts we considered in Figure H.12. We only considered natural courts with at least 100 votes in which the full complement of justices participated. As we mention in the main text, there is variation in the measured value of asymmetry that makes it unclear whether or not there is a relationship between the total asymmetry and the codified institutional rules of voting.

Figure H.12: Total asymmetry for the natural courts in the AK and NJ Supreme Courts. We show the calculated asymmetry with lines spanning the first to last years on record for a full vote (including every sitting justice). There is overlap between natural court years because some of the data are mislabeled and show justices participating in votes after their official date of retirement.

H.2 SCOTUS

We use data from the Supreme Court Database Version 2016 Release 1, taking their binary coding of majority-minority votes (12). This same data set and version has been analyzed previously. See Refs (20, 21).

H.3 SPDR

The SPDR Select Sector indices track the Standard & Poor’s (S&P) and Morgan Stanley Capital International (MSCI) Global Industry Classification Standard (GICS) sectors. As described on Wikipedia, GICS “is an industry taxonomy developed in 1999 by MSCI and S&P for use by the global financial community. The GICS structure consists of 11 sectors, 24 industry groups, 69 industries and 158 sub-industries into which S&P has categorized all major public companies. The system is similar to ICB (Industry Classification Benchmark), a classification structure maintained by FTSE [Financial Times Stock Exchange] Group.”

We focus on these assets and their adjusted price action because (1) they are the most heavily-traded and representative sector assets in the world, so their prices and volumes reflect actual interest in exposure to the sectors, (2) they have been traded daily without exception for over 20 years, and (3) unlike the Dow indices, the S&P indices are not subject to effects of price-weighting such as reverse-split over-weighting. The historical price data is available online on Yahoo! Finance.

H.4 Twitter

We analyze one of the communities from the data considered in Ref (16). In this work, the authors divide the Twitter community into smaller subcommunities using the CNM algorithm (48). We take one example from their K-pop community with 10 individuals.
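For reference, the CNM algorithm (48) is available as greedy modularity maximization in the networkx package; the toy edge list below is a stand-in for illustration only, not the follower graph of Ref (16).

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy graph standing in for a Twitter follower network; edges are illustrative only.
G = nx.Graph([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])
communities = greedy_modularity_communities(G)  # Clauset-Newman-Moore greedy modularity
print([sorted(c) for c in communities])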

H.5 CA Assembly

Session records were obtained from Prof. Jeff Lewis’ scrape of the CA legislature’s public data API (15). For all sessions from 1993 through the 2017 session, we solved the W-Nominate model using the code provided in Ref (28). We then removed any voter who did not participate in more than 20% of the votes, rank-ordered the voters by the first W-Nominate dimension, and divided them as equally as possible into 9 groups as shown for the 1999–2000 session in Figure H.11.
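A minimal sketch of this coarse-graining step is given below, assuming the first-dimension W-Nominate scores and per-member participation fractions are already in hand (e.g., from the wnominate output of Ref (28)); the function name and threshold wiring are illustrative.

import numpy as np

def coarse_grain_blocs(wnom_dim1, participation, n_blocs=9, min_participation=0.2):
    # Drop members who voted in too few of the recorded votes, rank the rest by
    # their first W-Nominate coordinate, and split them into n_blocs groups that
    # are as equal in size as possible. Returns {member index: bloc label}.
    wnom_dim1 = np.asarray(wnom_dim1, dtype=float)
    participation = np.asarray(participation, dtype=float)
    keep = np.flatnonzero(participation > min_participation)
    order = keep[np.argsort(wnom_dim1[keep])]
    blocs = np.array_split(order, n_blocs)
    return {int(member): label for label, members in enumerate(blocs) for member in members}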

For the results of bootstrap sampling to calculate error bars, we found that 3% of the samples showed significant error in the fit correlations because of numerical precision issues. This is generally an issue for systems that are poised near the boundaries of the model manifold, where the couplings become large. For the error bars on the normalized subspace eigenvalues, however, the contribution from these excluded samples is negligible.

References

  • (1) Black D (1948) On the Rationale of Group Decision-making. J. Political Econ. 56(1):23–34.
  • (2) Downs A (1957) An Economic Theory of Democracy. (Harper, New York).
  • (3) Arrow K (2012) Social Choice and Individual Values. (Yale University Press), 3rd edition.
  • (4) Lauderdale BE, Clark TS (2012) The Supreme Court’s Many Median Justices. Am. Polit. Sci. Rev. 106(4):847–866.
  • (5) De Donder P, Le Breton M, Peluso E (2012) Majority Voting in Multidimensional Policy Spaces: Kramer-Shepsle versus Stackelberg. JPET 14(6):879–909.
  • (6) Epstein L, Jacobi T (2010) The Strategic Analysis of Judicial Decisions. Annu. Rev. Law. Soc. Sci. 6(1):341–358.
  • (7) Segal JA, Spaeth HJ (2002) The Supreme Court and the Attitudinal Model Revisited. (Cambridge University Press, New York).
  • (8) Amari Si (2016) Information Geometry and Its Applications, Applied Mathematical Sciences. (Springer, Japan) Vol. 194.
  • (9) Cover TM, Thomas JA (2006) Elements of Information Theory. (John Wiley & Sons, Hoboken), 2nd edition.
  • (10) Transtrum MK, Machta BB, Sethna JP (2011) Geometry of nonlinear least squares with applications to sloppy models and optimization. Phys. Rev. E 83(3):036701.
  • (11) Machta BB, Chachra R, Transtrum MK, Sethna JP (2013) Parameter Space Compression Underlies Emergent Theories and Predictive Models. Science 342(6158):604–607.
  • (12) Spaeth HJ, et al. (2016) Supreme Court Database (http://Supremecourtdatabase.org).
  • (13) Bommarito MJ, Duran A (2018) Spectral analysis of time-dependent market-adjusted return correlation matrix. Physica A 503:273–282.
  • (14) Liburd D, Barbosa S (2009) State Supreme Court Data Project.
  • (15) Lewis J (2019) California Assembly and Senate Roll Call Votes, 1993 to the present (http://amypond.sscnet.ucla.edu/california/).
  • (16) Hall G, Bialek W (2018) The statistical mechanics of Twitter. arXiv:1812.07029 [physics].
  • (17) Urofsky M (2017) Dissent and the Supreme Court: Its Role in the Court’s History and the Nation’s Constitutional Dialogue. (Knopf Doubleday Publishing Group).
  • (18) Shannon CE (1948) A Mathematical Theory of Communication. Bell Syst. Tech. J. 27:379–423, 623–656.
  • (19) Jaynes ET (1957) Information Theory and Statistical Mechanics. Phys. Rev. 106(4):620–630.
  • (20) Lee ED, Broedersz CP, Bialek W (2015) Statistical Mechanics of the US Supreme Court. J. Stat. Phys. 160(2):275–301.
  • (21) Lee ED (2018) Partisan Intuition Belies Strong, Institutional Consensus and Wide Zipf’s Law for Voting Blocs in US Supreme Court. J. Stat. Phys. 173(6):1722–1733.
  • (22) Martin AD, Quinn KM (2002) Dynamic Ideal Point Estimation via Markov Chain Monte Carlo for the U.S. Supreme Court, 1953–1999. Polit. Anal. 10(2):134–153.
  • (23) Sirovich L (2003) A pattern analysis of the second Rehnquist U.S. Supreme Court. Proc. Natl. Acad. Sci. U.S.A. 100(13):7432–7437.
  • (24) Lawson BL, Orrison ME, Uminsky DT (2006) Spectral Analysis of the Supreme Court. Math. Mag. 79(5):340.
  • (25) Giansiracusa N, Ricciardi C (2019) Computational geometry and the U.S. Supreme Court. Math. Soc. Sci. 98:1–9.
  • (26) Arora V, Lieskovsky J (2014) Electricity Use as an Indicator of U.S. Economic Activity, (U.S. Energy Information Administration, Washington, D.C.), Technical report.
  • (27) Kang W, Ratti RA, Yoon KH (2015) The impact of oil price shocks on the stock market return and volatility relationship. J. Int. Financ. Mark. Inst. Money 34:41–54.
  • (28) Poole K, Lewis J, Lo J, Carroll R (2011) Scaling Roll Call Votes with wnominate in R. J. Stat. Softw. 42(14):21.
  • (29) Bakshy E, Hofman JM, Mason WA, Watts DJ (2011) Everyone’s an influencer: Quantifying influence on twitter in Proceedings of the Fourth ACM International Conference on Web Search and Data Mining - WSDM ’11. (ACM Press, Hong Kong, China), p. 65.
  • (30) Schneidman E, Berry MJ, Segev R, Bialek W (2006) Weak pairwise correlations imply strongly correlated network states in a neural population. Nature 440(7087):1007–1012.
  • (31) Ponce-Alvarez A, Jouary A, Privat M, Deco G, Sumbre G (2018) Whole-Brain Neuronal Activity Displays Crackling Noise Dynamics. Neuron 100(6):1446–1459.e6.
  • (32) Friedman N, et al. (2012) Universal Critical Dynamics in High Resolution Neuronal Avalanche Data. Phys. Rev. Lett. 108(20):208102.
  • (33) Bialek W, et al. (2014) Social interactions dominate speed control in poising natural flocks near criticality. Proc. Natl. Acad. Sci. U.S.A. 111(20):7212–7217.
  • (34) Daniels BC, Krakauer DC, Flack JC (2017) Control of finite critical behaviour in a small-scale social system. Nat. Comms. 8:14301.
  • (35) Liu YY, Slotine JJ, Barabási AL (2011) Controllability of complex networks. Nature 473(7346):167–173.
  • (36) Zañudo JGT, Yang G, Albert R (2017) Structure-based control of complex networks with nonlinear dynamics. Proc. Natl. Acad. Sci. U.S.A. 114(28):7234–7239.
  • (37) Moore MA, Katzgraber HG (2014) Dealing with correlated choices: How a spin-glass model can help political parties select their policies. Phys. Rev. E 90(4):042117.
  • (38) Santolini M, Barabási AL (2018) Predicting perturbation patterns from the topology of biological networks. Proc. Natl. Acad. Sci. U.S.A. 115(27):E6375–E6383.
  • (39) Lee ED (2019) To be published.
  • (40) Lee ED, Daniels BC (2019) Convenient Interface to Inverse Ising (ConIII): A Python 3 Package for Solving Ising-Type Maximum Entropy Models. JORS 7(1):3.
  • (41) Bialek W (2012) Biophysics: Searching for Principles. (Princeton University Press, Princeton, NJ).
  • (42) Bialek W, Ranganathan R (2007) Rediscovering the power of pairwise interactions. arXiv:0712.4397 [q-bio].
  • (43) Merchan L, Nemenman I (2016) On the Sufficiency of Pairwise Interactions in Maximum Entropy Models of Networks. J. Stat. Phys. 162(5):1294–1308.
  • (44) Quinn KN (2019) Ph.D. thesis (Cornell University).
  • (45) Transtrum MK, Qiu P (2014) Model Reduction by Manifold Boundaries. Phys. Rev. Lett. 113(9):098701.
  • (46) Zia RKP, Redish EF, McKay SR (2009) Making Sense of the Legendre Transform. Am. J. Phys. 77(7):614–622.
  • (47) Nemenman I, Shafee F, Bialek W (2002) Entropy and Inference, Revisited in Advances in Neural Information Processing Systems 14, eds. Dietterich TG, Becker S, Ghahramani Z. (MIT Press), pp. 471–478.
  • (48) Clauset A, Newman MEJ, Moore C (2004) Finding community structure in very large networks. Phys. Rev. E 70(6):066111.