 # Surprise in Elections

Elections involving a very large voter population often lead to outcomes that surprise many. This is particularly important for elections whose results affect the economy of a sizable population. A better prediction of the true outcome helps reduce the surprise and keeps the voters prepared. This paper starts from the basic observation that individuals in the underlying population build estimates of the distribution of preferences of the whole population based on their local neighborhoods. The outcome of the election leads to a surprise if these local estimates contradict the outcome of the election for some fixed voting rule. To get a quantitative understanding, we propose a simple mathematical model of the setting where the individuals in the population and their connections (through geographical proximity, social networks, etc.) are described by a random graph with connection probabilities that are biased based on the preferences of the individuals. Each individual also has some estimate of the bias in their connections. We show that the election outcome leads to a surprise if the discrepancy between the estimated bias and the true bias in the local connections exceeds a certain threshold, and confirm the phenomenon that surprising outcomes are associated only with closely contested elections. We compare standard voting rules based on their performance on surprise and show that they behave differently for different parts of the population. This also hints at an impossibility: no single voting rule may be the least surprising for all parts of a population. Finally, we experiment with the UK-EU referendum (a.k.a. Brexit) dataset, and the results attest to some of our theoretical predictions.


## 1 Introduction

Recent times have witnessed quite a few elections whose outcomes are widely considered surprises. News reports covered the unprecedented impact of these election results on trade, national economies, and job markets (e.g., Brexit (News, 2016), US presidential elections (Independent, 2016), UK parliamentary election (News, 2017b, a), etc.), particularly because many people and the markets were unprepared for such outcomes. The results not only affected the economy and made the stock markets unpredictable; the social impact was also considerable. It was clear that social connections, whether online or offline, and mass communication media, whether print or electronic, which are important factors in opinion building, have a localized effect that does not give a holistic idea of the outcome of an election. This effect is more prominent in online social media, since communities in social media inevitably group similar people together and it is easy to ignore biases. Having a large number of friends on an online social network may reinforce the belief that the local observation is a more representative sample than it actually is. This raises a natural question:

“Can the surprise/shock in an election be explained by the social network structure or the biases in the perception of the voters?”

In this paper, we address this question by proposing a model of the social network formation and voters’ perception of the winner. We show that the answer cannot be obtained from an analysis that focuses on only the network structure or only the voter perception. For instance, if we consider only network structure, the following example shows that any perception about the connection probability will always leave at least half the population surprised.

###### Example 1 (Limitation of a structure-based conclusion)

Suppose in a population of (even) voters with two candidates (red and blue), are red (meaning they prefer red over blue) and the rest are blue. The voting rule is plurality. Suppose the network structure is such that each voter is connected with every other voter that has the same color as hers, but is connected to exactly voters of the other color. If she perceives the winner just by counting the majority at her own neighborhood, then every voter will ‘think’ that her favorite candidate wins, and no matter how the winning candidate is chosen, half the population will always be surprised at the outcome.

Clearly, the example can be adapted if the voters discount the number of voters of their own color (given the fact that they are more likely to be connected with a similar colored voter) to yield the same conclusion. Moreover, if there are more than two candidates, an extension of the construction above will lead to a surprise of the voters in the classes where the actual winner (in plurality voting over all voters) is not their favorite candidate.
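The construction in Example 1 can be checked on a small instance. The sketch below is only illustrative: the number of cross-color neighbors is exposed as a hypothetical parameter `cross` (assumed here to be smaller than the number of same-color neighbors), the circulant wiring is chosen purely for convenience, and the function name is not from the paper.

```python
def surprised_fraction(n, cross, winner):
    """Fraction of voters whose neighborhood majority differs from `winner`.

    Voters 0..n/2-1 are red, the rest are blue. Each voter is linked to all
    other same-color voters and to exactly `cross` voters of the other color
    (a circulant pattern, an assumption made only for this sketch).
    """
    half = n // 2
    assert n % 2 == 0 and 0 <= cross < half - 1
    color = ["red"] * half + ["blue"] * half
    neighbors = {v: set() for v in range(n)}
    for v in range(n):
        for u in range(n):
            if u != v and color[u] == color[v]:
                neighbors[v].add(u)
    for i in range(half):                  # red voter i
        for j in range(cross):             # gets `cross` blue neighbors
            b = half + (i + j) % half
            neighbors[i].add(b)
            neighbors[b].add(i)
    surprised = 0
    for v in range(n):
        reds = sum(color[u] == "red" for u in neighbors[v])
        blues = len(neighbors[v]) - reds
        perceived = "red" if reds > blues else "blue"
        surprised += perceived != winner
    return surprised / n
```

Whichever color is declared the winner, the voters of the other color see a local majority for their own candidate, so `surprised_fraction(20, 4, "red")` and `surprised_fraction(20, 4, "blue")` both evaluate to one half.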

So, it is clear that a worst-case analysis over the social network structure will always lead to surprise in elections. This is hardly the case in practice: elections with unsurprising outcomes are in fact quite normal. Later in the paper, we discuss how error in voter perception alone also cannot give rise to surprise. Our approach takes both these factors into account simultaneously and provides conditions under which a typical voter is or is not surprised. In fact, there are counterarguments claiming that some of these elections cannot be called ‘surprising’ given a correct model of voter perception (e.g., Economist (2016) for Brexit).

We adopt a Bayesian approach to address the question of surprise that considers the structure generation and voter perception jointly. We assume a random generative model of the voters and the social network, and show that an error in estimating the parameters of the generative process may lead to surprises.

### 1.1 Our Approach and Results

Let us define the voter generation and social network formation process more formally. Consider a set of $m$ candidates and $n$ voters. The class of a voter is identified by a specific linear order over the candidates, hence there are $m!$ classes. Each voter is assigned to a class i.i.d. from a fixed probability distribution over the classes. Once the voters are generated, a social network among the voters is formed according to a stochastic block model. This is a generalization of the Erdős-Rényi random graph model in which the vertices are partitioned into classes and the edge creation probabilities (which can be different) are defined only among the classes: every node of a class connects to every node of another class with the same probability. In our model, an intra-class connection probability $p_{ii}$ is assumed to be larger than an inter-class connection probability $p_{ij}$ (where $i$ and $j$ are indices for classes). For a specific voting rule $r$, e.g., plurality, and a realization of the voters denoted by the set $N$, there is a winner, which we represent using $w_T(N, r)$. Since every voting rule we consider is anonymous, i.e., the winner does not change even if the voter identities are changed, the winner is determined just by the number of voters in each class. Therefore, $N$ in $w_T(N, r)$ can be replaced by $\vec{N}$, the vector of the number of voters in each class. The perceived winner of voter $v$ depends on her estimates of the number of voters in the different classes, denoted by $\hat{N}_v$, and is given by $w_P(\hat{N}_v, r)$. Voter $v$ is surprised when $w_P(\hat{N}_v, r) \neq w_T(\vec{N}, r)$. We call the probability of this event surprise. Voter $v$ estimates the number of voters in class $k$ by taking the ratio of her observed number of neighbors of class $k$ to her estimated connection probability with class $k$. This estimation would neutralize her observation bias if the estimates were perfect.

With this setup, our first result (Theorem 2) shows that for two candidates, if a ratio of the estimated connection probabilities stays within a threshold, a voter is not surprised with high probability (i.e., surprise asymptotically approaches zero as the number of voters grows). However, if the threshold is crossed, the voter is surprised w.h.p. A corollary of this result is that if the original distribution of the voters is strongly biased towards one class (‘overwhelming majority for one candidate’), then, even with erroneous connection probability estimates, a voter is not surprised w.h.p. This result shows that voters’ perception error is not solely responsible for surprise. Together with Example 1, we conclude that social connection and voter perception are intertwined reasons for surprise in elections.

Having observed that surprise is a phenomenon of closely contested elections, we generalize our results to more than two candidates. As a first approach, we present the case with three candidates in §4.1. The method generalizes, under similar assumptions, to similar conclusions with more candidates. Unlike the case with two candidates, for three candidates one can consider different voting rules and compare their performance w.r.t. surprise. We consider three prominent voting rules (all of which are scoring rules). Our next result (Theorem 4) shows that for different classes of voters, different rules perform better in terms of surprise, which hints that there may not be a single surprise-optimal voting rule for all classes of voters. However, we find it interesting that the performance is not proportional to the distribution of the mass in the scoring rules, since in a certain class of voters, both plurality and veto perform better than Borda voting. All voting rules are explained when presented.

Though the theoretical results in §4 use the estimates of the connection probabilities and show that the accuracy of those estimates w.r.t. the true values determines whether a voter may be surprised, we do not explicitly specify how the voters arrive at these estimates. In §5, we consider a real dataset (UK-EU referendum, a.k.a. Brexit) together with a realistic model of network formation and voters’ winner anticipation, which is a realistic instantiation of our theoretical model. We investigate the effect of intra- and inter-class connection probabilities, and the effect of noisy observation of their estimates, on surprise. We find that the conclusions of these experiments resemble some of the theoretical predictions. Due to page limitations, we present the proofs in an online appendix (Authors2018).

## 2 Related Work

Public elections and their outcomes have been one of the cornerstones of research in social choice theory. In the computational social choice and multi-agent systems literature, there have been several notions to measure the ‘goodness’ of elections. For example, margin of victory, defined as the smallest number of voters who can alter the outcome of an election by voting differently (Xia, 2012; Dey and Narahari, 2015), provides a quantitative threshold of surprising outcomes in terms of the voter population. A related literature exists for bribery in elections (e.g., Faliszewski et al., 2006; Elkind et al., 2009; Mattei et al., 2013; Bredereck et al., 2016) and the complexity of manipulative attacks (e.g., Bartholdi et al., 1989; Conitzer et al., 2007; Faliszewski et al., 2014; Parkes and Xia, 2012). Surprise in elections, to the best of our knowledge, has not been formally studied in this literature. There is a relevant body of literature in political economy. Ely et al. (2015) formally define suspense and surprise in a dynamical model and provide a design approach to maximize either of them for a Bayesian audience. The motivation for the dynamical model comes from the examples of mystery novels, political primaries, casinos, game shows, auctions, and sports. Our definition of surprise (the outcome is contrary to a voter’s belief) is closely related in spirit, and is adapted to a single-shot decision. In sports tournaments, it is important to design the schedule so that the games are highly competitive and results are unpredictable (Dagaev and Suzdaltsev, 2015; Olson and Stone, 2014). In fact, information design, where a social planner aims to maximize the unpredictability of a contest, has been investigated in various contexts (see, e.g., a recent survey by Bergemann and Morris (2017)). In election outcomes, however, stability is of prime importance (Pattanaik, 1973; Dummett and Farquharson, 1961; Rubinstein, 1980).

The social connection model in our paper is inspired by the stochastic block model. This model has a long tradition of study in the social sciences and computer science (Karrer and Newman, 2011; Holland et al., 1983; Wasserman and Faust, 1994). Therefore, in this paper, we approach the question of surprise in elections using well-studied models of social connection and surprise, and introduce a voter perception model to present insightful results.

## 3 Model

Let . Let be the set of voters, and be the set of candidates. Every voter has an ordinal preference over the candidates, and we assume that these preference relations are total orders, i.e., transitive, anti-symmetric, and complete. We assume , which is representative of real elections. Since the number of preference orders can be at most , we partition the voters into disjoint classes identified by , with being the indices of the classes. Voters in a given class share the same preference order. Let

denote the vector of the number of voters in each class. With a slight abuse of notation, we will refer to the preference of the voters in

also with the same notation.

Every voter is associated with class with probability independently from other voters, where , and . We assume that the ’s are unknown to the voters. The association is represented by the mapping , which maps the voter identities to the class indices. A random social network is formed with these voters by a stochastic block model which is represented by a symmetric matrix , where denotes the connection probability between the classes of voters and . In this connection model, the probability of connections for every voter in a class with every voter in another class is identified by a single parameter, which may change for a different pair of classes. The resulting graph is denoted by , where is the edge set. Edges are created independently of each other and independently of the voter-to-class association process. We assume a regularity among the connection probabilities, for which we need to define a distance metric. (A valid distance metric is one that is (1) non-negative, (2) symmetric, and (3) obeys the triangle inequality.) The Kendall-Tau (KT) distance between two preference orderings and is the minimum number of adjacent flips of candidates needed to reach one from the other. Clearly, this is a valid distance metric. We call the ’s regular if they are monotone decreasing with increasing KT distance between and , which means that voters with more dissimilar preferences are less likely to be connected. We assume that a voter knows the preferences of her immediate neighbors (on the social network) perfectly, but does not know the preferences of the other voters.
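The KT distance used in the regularity condition can be computed by counting the pairs of candidates that two orderings rank in opposite order, which equals the minimum number of adjacent flips. The following is a minimal sketch; the function name and the representation of an ordering as a sequence with the best candidate first are assumptions made for illustration.

```python
from itertools import combinations

def kendall_tau_distance(order_a, order_b):
    """Number of candidate pairs ranked in opposite order by the two rankings.

    This count equals the minimum number of adjacent flips needed to turn
    one ranking into the other.
    """
    assert sorted(order_a) == sorted(order_b)
    pos_b = {c: i for i, c in enumerate(order_b)}
    return sum(
        1
        for x, y in combinations(order_a, 2)   # x is ranked above y in order_a
        if pos_b[x] > pos_b[y]                 # ...but below y in order_b
    )
```

For three candidates, `kendall_tau_distance("abc", "bac")` is 1 and `kendall_tau_distance("abc", "cba")` is 3, the largest possible value.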

A voter estimates these connection probabilities, which are denoted by for all . We assume that the voters’ estimated ’s are also regular. At this point, we do not assume a model of how the voters reach their estimates. In §5, we consider a specific model of estimates for the experiments, where voters take a weighted average of their own observations and a noisy version of the true global distribution. The next section deals with how the errors in these estimates can affect a voter’s perception of the winner. We will consider only deterministic voting rules.

Voters’ winner perception model: Voter estimates the number of voters in class by dividing the number of her own neighbors in that class on , defined as , with her estimated . Hence voter ’s estimated number of voters in class is,

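$$
\hat{N}^k_v =
\begin{cases}
\dfrac{1}{\hat{p}_{\sigma(v)k}} \, |\mathrm{Nbr}^k_v| & \text{if } k \neq \sigma(v), \\[2ex]
\dfrac{1}{\hat{p}_{\sigma(v)\sigma(v)}} \, |\mathrm{Nbr}^{\sigma(v)}_v| + 1 & \text{otherwise.}
\end{cases}
\tag{1}
$$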

Note that if the $\hat{p}$’s were accurate, then by the strong law of large numbers, this estimate gives the right number of voters in each class asymptotically almost surely.
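The estimator in Equation 1 can be sketched in a few lines. The code below samples the neighborhood of a single voter under a stochastic block model and rescales the observed neighbor counts by the estimated connection probabilities. The function name, the argument layout, and the example parameters are assumptions made for illustration, not part of the paper.

```python
import random

def estimate_class_sizes(classes, v, p_true, p_hat, rng):
    """Sample v's neighborhood under a stochastic block model and apply Eq. (1).

    classes[u] is the class index of voter u; p_true[i][j] is the true
    connection probability between classes i and j; p_hat[i][j] is the
    voter's estimate of it.
    """
    k_classes = len(p_true)
    own = classes[v]
    nbr_count = [0] * k_classes
    for u, cu in enumerate(classes):
        if u != v and rng.random() < p_true[own][cu]:
            nbr_count[cu] += 1                 # u is a neighbor of v in class cu
    estimate = [nbr_count[k] / p_hat[own][k] for k in range(k_classes)]
    estimate[own] += 1                         # the voter counts herself
    return estimate
```

With 12000 voters in one class, 8000 in the other, and `p_hat` equal to `p_true`, the returned estimates should land close to the true class sizes, which is the behavior the strong law of large numbers predicts for accurate estimates.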

The voters now have randomly realized preferences and connections with each other. Also, every voter has an estimate of the number of voters in different classes, and therefore, under a given (anonymous) voting rule , her perceived winner is denoted by , where . The true winner for the same realization is denoted by . A voter is surprised when her perceived winner is different from the true winner, defined formally as follows.

###### Definition 1 (Event of Surprise)

An event of surprise of a voter for a specific realization of the voter preferences and social graph is the event where the voter’s perceived winner is not the true winner, i.e., the event such that,

$$
S^r_v := \left\{ w_P(\hat{N}_v, r) \neq w_T(\vec{N}, r) \right\}.
\tag{2}
$$

We call the probability of this event the surprise of voter $v$ under voting rule $r$, denoted by $\mathrm{surp}^r_v$.

Note that the event of surprise is specific to a voter, but every voter in a given class has the same surprise in this model, while voters in different classes may have different surprises for the same parameters.

Metric to compare voting rules: Let the event of some candidate beating the true winner be defined as . The event of surprise, therefore, can be written as . For the chosen parameters, define the most probable false beating candidate as , with ties broken arbitrarily. Using the union bound and the fact that the probability of a union of events is at least the largest probability among the individual events, we get,

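$$
P(S^r_v) = \mathrm{surp}^r_v \in \left[ \ell^r_v, \, (m-1)\,\ell^r_v \right], \quad \text{where } \ell^r_v = P\!\left( \mathrm{Beat}^r_v\!\left( b^{r*}_v, w_T(\vec{N}, r) \right) \right).
\tag{3}
$$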

It is enough to analyze the event and consider the quantity , which we will call the most probable false beating (MPFB) factor, to compare between different voting rules, since surprise can vary at most by a constant factor of this MPFB factor. In the following sections, we will see that the effect of the number of voters on this factor is in the exponent. Since the number of voters is large, the conclusions on surprise are entirely dictated by the growth or decay of the MPFB factor.

## 4 Theoretical Results

In this section, we first analyze the setting with two candidates to get a better insight. The set of candidates is and the classes are and . WLOG, we assume that and with . For two candidates, all standard voting rules yield the same winner as the plurality rule, and therefore, we will be considering only plurality in the case of two candidates. We first show that candidate emerges as winner in plurality w.h.p.

###### Theorem 1

When voters fall in class and w.p.  and respectively, with , for sufficiently large .

Proof:  Let denote the number of voters in . Hence

$$
X_i = \sum_{v \in N} \mathbb{I}\{v \in P_i\}, \quad i \in \{1, 2\}.
$$

Define,

$$
Z := X_2 - X_1 = \sum_{v \in N} \left[ \mathbb{I}\{v \in P_2\} - \mathbb{I}\{v \in P_1\} \right] =: \sum_{v \in N} Z_v.
$$

Where are i.i.d. RVs taking values w.p.  and w.p. . Clearly, . We see that . Using Hoeffding bound, we get

$$
\Pr(Z - \mathbb{E}Z \geq t) \leq e^{-\frac{t^2}{2n}}.
$$

Pick . Then for , . Hence, for , we get

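$$
\Pr\!\left( w_T(\vec{N}, \mathrm{Plu}) = a_2 \right) \leq \Pr(Z \geq 0) \leq \Pr\!\left( Z \geq \mathbb{E}Z + n^{3/4} \right) \leq e^{-\frac{\sqrt{n}}{2}}.
$$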
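The final bound can be checked numerically against the exact binomial tail. The sketch below assumes, as in Equation 6 later in this section, that a voter falls in the first class with probability 1/2 + ε, so that the expectation of Z is −2εn and the step from Pr(Z ⩾ 0) to the Hoeffding tail with deviation n^{3/4} is valid once n ⩾ 1/(16ε⁴). That threshold is our own derivation for the sketch and the function names are hypothetical.

```python
import math

def prob_minority_wins(n, eps):
    """Exact Pr(X2 >= X1) when each voter is in class 2 w.p. 1/2 - eps."""
    q = 0.5 - eps
    total = 0.0
    for k in range(math.ceil(n / 2), n + 1):      # X2 = k >= n - k = X1
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1)
                   - math.lgamma(n - k + 1)
                   + k * math.log(q) + (n - k) * math.log(1 - q))
        total += math.exp(log_pmf)                # binomial pmf at k
    return total

def hoeffding_bound(n):
    """The bound exp(-sqrt(n) / 2) obtained with deviation t = n ** (3 / 4)."""
    return math.exp(-math.sqrt(n) / 2)
```

For ε = 0.1 the threshold is n ⩾ 625; at n = 1000 the exact tail lies well below `hoeffding_bound(1000)`, as the inequality requires.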

Since candidate turns out to be the true winner w.h.p., we will consider only the conditional probability that is the perceived winner given being the true winner, which will approximately be equal to surprise for large .

###### Theorem 2 (Surprise for two candidates)

When voters fall in class and w.p.  and respectively, with , we have the following.

• For voter in ,

• if , then for large enough ; hence, , i.e., voter is surprised w.h.p.

• if , then for large enough ; hence, , i.e., voter is not surprised w.h.p.

• For voter in ,

• if , then for large enough ; hence, , i.e., voter is surprised w.h.p.

• if , then for large enough ; hence, , i.e., voter is not surprised w.h.p.

Proof:  We prove the result only for the case when , since the other case is symmetric. Define . Let the random graph formed according to the stochastic block model be denoted by . For , let be the set of voters denoting the neighbors of that belong to class . Hence, ’s estimated number of voters in classes and are and respectively. The additional one voter in the estimate of comes from voter counting herself. Hence

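$$
\frac{|X_1|}{\hat{p}_{21}} = \frac{1}{\hat{p}_{21}} \sum_{u \in N} \mathbb{I}\!\left( \{(v,u) \in E\} \cap \{u \in P_1\} \right),
\tag{4}
$$

$$
\frac{|X_2|}{\hat{p}_{22}} = \frac{1}{\hat{p}_{22}} \sum_{u \in N \setminus \{v\}} \mathbb{I}\!\left( \{(v,u) \in E\} \cap \{u \in P_2\} \right).
\tag{5}
$$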

Taking expectations over these quantities, we get,

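$$
\mathbb{E}\!\left( \frac{|X_1|}{\hat{p}_{21}} \right) = \frac{1}{\hat{p}_{21}} \sum_{u \in N} P(u \in P_1) \cdot P\!\left( (v,u) \in E \mid u \in P_1 \right) = n \, \theta \, \frac{p_{21}}{\hat{p}_{21}}, \quad \text{and}
$$

$$
\mathbb{E}\!\left( \frac{|X_2|}{\hat{p}_{22}} \right) = \frac{1}{\hat{p}_{22}} \sum_{u \in N \setminus \{v\}} P(u \in P_2) \cdot P\!\left( (v,u) \in E \mid u \in P_2 \right) = (n-1) \, (1-\theta) \, \frac{p_{22}}{\hat{p}_{22}}.
$$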

Define a new random variable, $Z := \frac{|X_2|}{\hat{p}_{22}} + 1 - \frac{|X_1|}{\hat{p}_{21}}$. Its expectation is

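$$
\begin{aligned}
\mathbb{E}Z &= (n-1)(1-\theta)\frac{p_{22}}{\hat{p}_{22}} + 1 - n\theta\frac{p_{21}}{\hat{p}_{21}} \\
&= (n-1)\left(\tfrac{1}{2}-\epsilon\right)\frac{p_{22}}{\hat{p}_{22}} + 1 - n\left(\tfrac{1}{2}+\epsilon\right)\frac{p_{21}}{\hat{p}_{21}} \\
&= n\left[ \left( \left(\tfrac{1}{2}-\epsilon\right)\frac{p_{22}}{\hat{p}_{22}} - \left(\tfrac{1}{2}+\epsilon\right)\frac{p_{21}}{\hat{p}_{21}} \right) + \frac{1}{n}\left( 1 - \left(\tfrac{1}{2}-\epsilon\right)\frac{p_{22}}{\hat{p}_{22}} \right) \right].
\end{aligned}
\tag{6}
$$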

We first consider the case when .

The first term in the bracket in Equation 6 is negative since , by assumption. Let . Hence the whole expression of Equation 6 is negative for . Hence, is negative for sufficiently large . Note from Equations 5 and 4 that can also be written as the sum over the differences of the indicator functions. We will use Hoeffding’s bound since the random variables in the sum are independent. The maximum of every term in that sum of indicators that represent can be and the minimum can be , hence the maximum difference between each of the summands is . We have,

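$$
\Pr\!\left( w_P(\hat{N}_v, \mathrm{Plu}) = a_2 \mid w_T(\vec{N}, \mathrm{Plu}) = a_1 \right) \leq \Pr(Z - \mathbb{E}Z > t) \leq \exp\!\left( -2 \left( \frac{\hat{p}_{22}\,\hat{p}_{21}}{\hat{p}_{22} + \hat{p}_{21}} \right)^{2} \cdot \frac{t^2}{n} \right).
\tag{7}
$$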

Plugging in , we get that the probability of is at most . Let . The number is guaranteed to exist since , by assumption. Therefore for all , is greater than a negative quantity with probability at most . Since , we have that , .

We now consider the case when . We leverage the calculations we did for the previous case. Because of the assumption , is positive for large (Equation 6). Using Equation 7, we have,

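$$
\Pr\!\left( w_P(\hat{N}_v, \mathrm{Plu}) = a_2 \mid w_T(\vec{N}, \mathrm{Plu}) = a_1 \right) \geq \Pr(|Z - \mathbb{E}Z| \leq t) \geq 1 - 2\exp\!\left( -2 \left( \frac{\hat{p}_{22}\,\hat{p}_{21}}{\hat{p}_{22} + \hat{p}_{21}} \right)^{2} \cdot \frac{t^2}{n} \right).
$$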

This implies that the probability of is at least the quantity on the RHS of the above inequality. Again, plugging in and defining , which is guaranteed to exist by assumption, we get the desired conclusion for all . This completes the proof.
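The role of the sign of the expectation in Equation 6 can be illustrated with a small simulation for a voter in the second class. The sketch below evaluates Equation 6 directly and also draws one sample of Z; all parameter values, function names, and the seed are assumptions chosen for illustration. The simulation counts only the other n − 1 voters, so its mean differs from Equation 6 by a term that does not grow with n.

```python
import random

def expected_margin(n, eps, p21, p22, p21_hat, p22_hat):
    """E[Z] from Eq. (6) for a class-2 voter; class 1 has probability 1/2 + eps."""
    return ((n - 1) * (0.5 - eps) * p22 / p22_hat + 1
            - n * (0.5 + eps) * p21 / p21_hat)

def sample_margin(n, eps, p21, p22, p21_hat, p22_hat, rng):
    """One draw of Z = |X2| / p22_hat + 1 - |X1| / p21_hat for a class-2 voter."""
    x1 = x2 = 0
    for _ in range(n - 1):                      # the other n - 1 voters
        if rng.random() < 0.5 + eps:            # this voter falls in class 1
            x1 += rng.random() < p21            # edge to the observing voter?
        else:                                   # this voter falls in class 2
            x2 += rng.random() < p22
    return x2 / p22_hat + 1 - x1 / p21_hat
```

With accurate estimates (`p21_hat = p21`, `p22_hat = p22`) the margin is negative, so the voter perceives the first candidate as the winner and is not surprised. If she overestimates the cross-class connection probability (for example `p21_hat = 0.4` when `p21 = 0.2`), the margin turns positive and she perceives the second candidate as the winner.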

#### Corollaries.

Theorem 2 captures the determining factors for surprise in plurality voting. A few conclusions are in order.

1. If an agent’s estimated ’s were perfect, then the agent is never surprised w.h.p., since then the ratios will always satisfy the ‘not surprised’ condition of Theorem 2.

2. Surprise may happen when is small, i.e., the winning margin is small. This is because, the surprise-determining thresholds for s in Theorem 2 are very close to the actual ratios s and a small error of the voter in estimating these connection parameters may lead to surprise. However, when the winning margin is large, e.g., is large enough such that and if the ’s are also regular, i.e., , then no agent in will be surprised. This shows that elections with an overwhelming majority can hardly be surprising. Surprise is a phenomenon only of a closely contested election.

### 4.1 Three Candidates

We now consider the problem with three candidates. In this setting, different voting rules give rise to different winners and therefore it is possible to distinguish them w.r.t. the surprise metric. In this section, we will compare three voting rules, namely plurality, Borda, and veto (explained below), based on the factor (Equation 3) because of the reason explained right after the equation in §3. A collection of -dimensional vectors with and for every defines a voting rule (called scoring rule) — a candidate receives a score of from a vote if it is placed at the -th position in that vote, and the score of a candidate is the sum of the scores it receives from all the votes. The winners are the candidates with the maximum score. The score vectors for the plurality, Borda, and veto voting rules are , , and respectively. Scoring rules remain unchanged if we multiply every by any constant and/or add any constant . Hence, we assume without loss of generality that, for candidates, the Borda score vector is and the veto score vector is to ensure that for all the rules.

We chose these three voting rules because (1) they are most frequently used, and (2) the distribution of scores in these rules has wide variety – the whole score concentrated at the top alternative for plurality, (almost) equally distributed for veto, and in between these two extremes for Borda.
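To make the three scoring rules concrete, the sketch below computes the winners of a small profile under each rule. The score vectors are the standard ones for three candidates, rescaled so that each sums to one; this particular normalization, the function name, and the example profile are assumptions made for illustration.

```python
from fractions import Fraction

# Score vectors for three candidates, each rescaled to sum to 1
# (the normalization is an assumption made for this sketch).
SCORE_VECTORS = {
    "plurality": (Fraction(1), Fraction(0), Fraction(0)),
    "borda": (Fraction(2, 3), Fraction(1, 3), Fraction(0)),
    "veto": (Fraction(1, 2), Fraction(1, 2), Fraction(0)),
}

def scoring_winners(class_counts, rule):
    """Return the set of candidates with the maximum total score.

    class_counts maps a preference order (a tuple, best candidate first)
    to the number of voters holding that order.
    """
    scores = {}
    for order, count in class_counts.items():
        for position, candidate in enumerate(order):
            scores[candidate] = (scores.get(candidate, 0)
                                 + count * SCORE_VECTORS[rule][position])
    best = max(scores.values())
    return {c for c, s in scores.items() if s == best}

# A hypothetical profile on which the rules disagree.
profile = {("a", "b", "c"): 4, ("b", "c", "a"): 3, ("c", "b", "a"): 2}
```

On this profile, plurality elects `a` (the most first places), while Borda and veto both elect `b`, which shows how different rules can give rise to different winners for the same voters.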

For two candidates, we have seen that surprise occurs only in closely contested elections. Hence, to compare the voting rules in this section, we consider voters that are uniformly distributed over the six preference classes.

###### Assumption 1 (Uniform Population)

Every voter belongs to exactly one class of preference in with uniform probability.

We also assume that the voters’ estimates of the connection probabilities are consistently higher than their true values as the KT distance increases between the preference class of the voter and the class of her neighbor, i.e., ’s are decreasing in . The motivation is to capture the fact that people often consider their local neighborhood to be representative of the global population, leading to uniform ’s for all . Since the true connection probabilities are regular, i.e., decreasing in , this gives rise to a monotone estimation error.

###### Assumption 2 (Monotone Estimation Error (MEE))

The ratio of the true connection probability to the estimated one decreases with the KT distance, i.e., when when .

In the proof of our main result in this section, we will use a quantitative version of the central limit theorem due to Berry (1941) and Esseen (1942). The following exposition is from Tao (2010).

###### Theorem 3 (Berry-Esseen)

Let be a random variable with mean

, unit variance, and finite third moment. Let

, where ’s are i.i.d. copies of . Then we have , uniformly for all , where , and the implied constant in is absolute and does not depend on the distribution of .

This theorem gives a quantitative guarantee on the deviation of the cumulative distribution function of the random variable

from that of a normal random variable with mean same as and unit variance.
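The rate in the Berry-Esseen theorem can be observed on a simple example. The sketch below uses the standard zero-mean form of the theorem (an assumption, since the statement above may be parameterized differently) with i.i.d. variables uniform on {−1, +1}, for which the distribution of the normalized sum is an exact binomial computation. The function names are hypothetical.

```python
import math

def normal_cdf(x):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def max_cdf_gap(n):
    """sup_a |P(Z_n <= a) - Phi(a)| for Z_n = (X_1 + ... + X_n) / sqrt(n),
    where the X_i are i.i.d. uniform on {-1, +1} (mean 0, variance 1)."""
    gap, cdf = 0.0, 0.0
    for k in range(n + 1):                       # k = number of +1 draws
        z = (2 * k - n) / math.sqrt(n)           # atom of Z_n
        phi = normal_cdf(z)
        gap = max(gap, abs(cdf - phi))           # left limit at the atom
        cdf += math.comb(n, k) / 2 ** n
        gap = max(gap, abs(cdf - phi))           # value at the atom
    return gap
```

Because the CDF of the normalized sum is a step function and the normal CDF is increasing, the supremum is attained at an atom, so checking the left limit and the value at each atom suffices. Multiplying `max_cdf_gap(n)` by the square root of n gives a quantity that stays bounded as n grows, in line with the stated rate.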

With the assumptions mentioned above, we present our main result for three candidates in the following theorem. Informally, this theorem compares the plurality, Borda, and veto voting rules based on the MPFB factor. Since (Equation 3), we conclude that a lower MPFB factor gives a lower surprise.

###### Theorem 4

Consider , and voters are generated from a uniform population. Let be any voter.

1. If ranks the true winner at the first position, then w.h.p.

2. If ranks the true winner at the second position, then w.h.p.

3. If ranks the true winner at the last position, then and w.h.p.

Discussion: This result gives us fine-grained information about the surprise performance of different voting rules in different voter classes. It is also clear that among these standard voting rules there is no single rule that reduces surprise for all sections of voters. We find it interesting that the performance on surprise is not proportional to the distribution of scores in the rules, since in case 3, Borda, which has a non-extreme distribution of scores, performs worse than both of the other rules, which have extreme score distributions.

Brief sketch of the proof: We assume, WLOG, that a specific candidate wins w.h.p. We consider each of the three cases in the theorem separately. For every case, we prove the claim in three stages. First, we consider the two ‘false beating’ events where the true winner is not the perceived winner, for which we consider the difference between the overall scores (as we are considering only scoring rules) of the other two candidates and that of the true winner. Second, to compute the probability that these two expressions are positive (which implies that these are the false beating events), we find the mean and variance of these expressions and normalize the difference expression by the standard deviation so that the Berry-Esseen theorem (Theorem 3) can be invoked. Finally, we take the maximum of the two probabilities of false-beating events to compute the MPFB factor for that voting rule. The claim is proved by comparing these factors.

Proof:  Let . We label the classes as shown in Table 1. Each voter belongs to class w.p. in the uniform population model (Assumption 1). WLOG, assume that the candidate wins the election w.h.p., i.e., the overall score is highest for in every rule, and ties are broken in favor of .

Let be a normalized scoring rule vector with and . Hence, the vector is , , and respectively for , , and . For a voter , let be the random variables denoting the estimated scores for the candidates and perceived by .

For every rule and voter , we are interested in the differences of these estimated scores, i.e., , since a positive value of this expression implies that a false beating event has occurred. The maximum probability of these two events is .

With the voters’ winner perception model, each of these estimated scores of can be written as a sum over the indicator RVs that another voter belongs to a specific preference class and is connected to (with appropriate scaling with if and the other voter is in ). Hence, we can write the difference in the estimated scores as and , where we clearly distinguish the contribution of voter in the differences with the variable . We denote the summation on the RHS in each equality with the shorthand . The expression (resp. ) is the indicator random variable denoting voter ’s contribution to the difference in the score of (resp. ) and if is connected to . We give the exact expressions of when we consider the following cases.

Case 1: or (i.e., when ranks the winner at the second position): We only consider , since the analysis for is symmetric. For , the expression of turns out as follows for .

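$$
\begin{aligned}
X_{u, a_1 - a_2} = {} & (s_1 - s_2)\left( \frac{1}{\hat{p}_{11}} \mathbb{I}\!\left( \{(u,v) \in E\} \cap \{u \in P_1\} \right) - \frac{1}{\hat{p}_{12}} \mathbb{I}\!\left( \{(u,v) \in E\} \cap \{u \in P_3\} \right) \right) \\
& + s_2 \left( \frac{1}{\hat{p}_{12}} \mathbb{I}\!\left( \{(u,v) \in E\} \cap \{u \in P_5\} \right) - \frac{1}{\hat{p}_{12}} \mathbb{I}\!\left( \{(u,v) \in E\} \cap \{u \in P_6\} \right) \right) \\
& + s_1 \left( \frac{1}{\hat{p}_{12}} \mathbb{I}\!\left( \{(u,v) \in E\} \cap \{u \in P_2\} \right) - \frac{1}{\hat{p}_{12}} \mathbb{I}\!\left( \{(u,v) \in E\} \cap \{u \in P_4\} \right) \right).
\end{aligned}
\tag{8}
$$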

Note that these are i.i.d. random variables for , whose mean and variance are as follows.

$$
\mathbb{E}\!\left[ X_{u, a_1 - a_2} \right] = (s_1 - s_2)\left( \frac{p_{11}}{6\,\hat{p}_{11}} - \frac{p_{12}}{6\,\hat{p}_{12}} \right) \geq 0
$$

We get the equality due to Assumption 1 and the inequality due to Assumption 2. We also have

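$$
\mathbb{E}\!\left[ X^2_{u, a_1 - a_2} \right] = (s_1 - s_2)^2 \left( \frac{p_{11}}{6\,\hat{p}_{11}^2} + \frac{p_{12}}{6\,\hat{p}_{12}^2} \right) + (s_1^2 + s_2^2)\,\frac{p_{12}}{3\,\hat{p}_{12}^2}.
\tag{9}
$$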

Hence

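$$
\begin{aligned}
\mathrm{var}\!\left( X_{u, a_1 - a_2} \right) &= \mathbb{E}\!\left[ X^2_{u, a_1 - a_2} \right] - \left( \mathbb{E}\!\left[ X_{u, a_1 - a_2} \right] \right)^2 \\
&= (s_1 - s_2)^2 \left( \frac{p_{11}}{6\,\hat{p}_{11}^2} + \frac{p_{12}}{6\,\hat{p}_{12}^2} - \left( \frac{p_{11}}{6\,\hat{p}_{11}} - \frac{p_{12}}{6\,\hat{p}_{12}} \right)^{2} \right) + \frac{p_{12}}{3\,\hat{p}_{12}^2}\,(s_1^2 + s_2^2).
\end{aligned}
$$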
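The mean, second moment, and variance above can be cross-checked by enumerating the outcomes of Equation 8 exactly. The sketch below assumes, following the way Equation 9 is written, that the edge to a voter in the first class is present with probability p11 and the edge to a voter in any other class with probability p12; the function names and the numerical values used to exercise them are hypothetical.

```python
from fractions import Fraction as F

def moments_from_eq8(s1, s2, p11, p12, p11_hat, p12_hat):
    """Exact mean, second moment, and variance of X_{u, a1 - a2} in Eq. (8).

    u falls in each of the six classes w.p. 1/6 (Assumption 1). The edge
    (u, v) is present w.p. p11 when u is in P1 and w.p. p12 otherwise
    (an assumption matching Eq. (9)); without an edge, X is 0.
    """
    # (edge probability, value of X when the edge is present), classes P1..P6
    outcomes = [
        (p11, (s1 - s2) / p11_hat),    # u in P1
        (p12, s1 / p12_hat),           # u in P2
        (p12, -(s1 - s2) / p12_hat),   # u in P3
        (p12, -s1 / p12_hat),          # u in P4
        (p12, s2 / p12_hat),           # u in P5
        (p12, -s2 / p12_hat),          # u in P6
    ]
    mean = sum(F(1, 6) * p * x for p, x in outcomes)
    second = sum(F(1, 6) * p * x * x for p, x in outcomes)
    return mean, second, second - mean ** 2

def closed_forms(s1, s2, p11, p12, p11_hat, p12_hat):
    """The closed-form mean, Eq. (9), and variance as displayed above."""
    mean = (s1 - s2) * (p11 / (6 * p11_hat) - p12 / (6 * p12_hat))
    second = ((s1 - s2) ** 2 * (p11 / (6 * p11_hat ** 2) + p12 / (6 * p12_hat ** 2))
              + (s1 ** 2 + s2 ** 2) * p12 / (3 * p12_hat ** 2))
    var = ((s1 - s2) ** 2 * (p11 / (6 * p11_hat ** 2) + p12 / (6 * p12_hat ** 2)
                             - (p11 / (6 * p11_hat) - p12 / (6 * p12_hat)) ** 2)
           + p12 / (3 * p12_hat ** 2) * (s1 ** 2 + s2 ** 2))
    return mean, second, var
```

Using exact rational arithmetic, the two functions return identical triples for any inputs, for instance the Borda-like scores `s1 = 2/3`, `s2 = 1/3` with `p11 = 1/2`, `p12 = 1/5`, `p11_hat = 2/5`, `p12_hat = 1/4`. These values satisfy the monotone estimation error assumption, so the mean is non-negative.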