Who is in Your Top Three? Optimizing Learning in Elections with Many Candidates

06/19/2019 ∙ by Nikhil Garg, et al. ∙ Stanford University

Elections and opinion polls often have many candidates, with the aim to either rank the candidates or identify a small set of winners according to voters' preferences. In practice, voters do not provide a full ranking; instead, each voter provides their favorite K candidates, potentially in ranked order. The election organizer must choose K and an aggregation rule. We provide a theoretical framework to make these choices. Each K-Approval or K-partial ranking mechanism (with a corresponding positional scoring rule) induces a learning rate for the speed at which the election correctly recovers the asymptotic outcome. Given the voter choice distribution, the election planner can thus identify the rate optimal mechanism. Earlier work in this area provides coarse order-of-magnitude guarantees which are not sufficient to make such choices. Our framework further resolves questions of when randomizing between multiple mechanisms may improve learning, for arbitrary voter noise models. Finally, we use data from 5 large participatory budgeting elections that we organized across several US cities, along with other ranking data, to demonstrate the utility of our methods. In particular, we find that historically such elections have set K too low and that picking the right mechanism can be the difference between identifying the ultimate winner with only a 80




1 Introduction

Elections and opinion polls with many candidates and multiple winners are common. In participatory budgeting (PB), for example, people directly determine a part of the government’s budget [Alós-Ferrer and Granić 2012, Goel et al. 2016]. These elections often contain many candidate projects (up to 70, cf. Gelauff et al. 2018) and only a few thousand voters, with potentially millions of dollars on the line [Public Agenda 2016]. Similarly, polls may compare tens of candidates and yet only sample hundreds of voters.

Unfortunately, the number of voters required to correctly recover the asymptotic ranking or set of winners often scales, potentially exponentially, with the number of candidates [Caragiannis and Micha 2017]. Thus with many candidates, it is essential to use a voting mechanism that most efficiently elicits information from each voter.

In this work, we analyze positional scoring rules [de Borda 1781, Young 1975], mechanisms in which each position in each voter’s personal ranking maps to a score given to the candidate that occupies that position. We focus on the special cases of such rules implied by K-Approval elicitation, in which each voter is asked to select their favorite K candidates, as they are the most commonly used such mechanisms in practice. Section 3 formalizes our model. Then:

Section 4.

For a given election, we show how the particular scoring rule used affects the rate at which the final outcome (asymptotic in the number of voters) is learned. These rates, based on large deviation bounds, extend and tighten the results of Caragiannis and Micha (2017), and are precise enough to determine, for example, which of two K-Approval mechanisms with different values of K is better in a particular context. We focus on the goals of both learning a ranking over all candidates and identifying a subset of winners.

Section 5.1.

Leveraging these rates, we study when randomization between scoring rules can improve learning, extending previous results to general positional scoring rules, the goal of selecting a set of winners, and arbitrary noise models. In particular, we find that randomizing between scoring rules can never speed up learning, for arbitrary noise models. This contrasts with the case when one is restricted to K-Approval mechanisms.

Section 5.2.

For the Mallows model, we study how the optimal K in K-Approval scales with the noise parameter, the number of candidates, and the number of winners desired. We find that, in contrast to design choices made in practice, one should potentially ask voters to identify their favorite half of candidates, even if the goal is to identify a single winner.

Section 6.

We apply our approach to experimental ballots attached to real participatory budgeting elections across several US cities, as well as other ranking data from a host of domains. We find that the exact mechanism used matters: in one setting, for example, asking voters to identify their single favorite candidate yields a markedly lower chance of identifying the best candidate after 400 voters than asking voters for several of their favorite candidates. Extending our theoretical insights, we find that, historically across elections, K has been set too low for effective learning. We further identify real-world examples in which randomization would have sped up learning.

Our work bridges a gap between coarse theoretical analyses of voting rules and the fine-grained design questions a practitioner wishes to answer. Proofs are in the Appendix.

2 Related work

Our work is part of several strands of research on mechanisms that elicit people’s preferences. Aggregating voter rankings has a long history [de Borda 1781, marquis de Condorcet 1785, Copeland 1951, Kemeny 1959, Young 1988].

Learning properties of voting rules. Most related are works that study the learning properties of voting rules, assuming that a “true” ranking exists. One approach is to specify a noise model under which voter preferences are drawn (e.g., Mallows, Plackett-Luce) and then derive error rates by the number of voters for maximum likelihood or similar estimators under the model [Maystre and Grossglauser 2015, Zhao, Piech, and Xia 2016, Lu and Boutilier 2011, Guiver and Snelson 2009, Procaccia and Shah 2015, de Weerdt, Gerding, and Stein 2016, Chierichetti and Kleinberg 2014].

Caragiannis et al. (2013) ask similar questions to us: under what voter noise models do certain voting rules asymptotically recover the true underlying ranking, and how quickly do they do so? They define a class of voting rules and voter noise models under which a “true” ranking of candidates is eventually recovered. They further show that for a subset of this class (that does not contain positional scoring rules) and under the Mallows model, only a number of voters that is logarithmic in the number of candidates is required, where each voter provides a full ranking. Lee et al. (2014) develop an algorithm that can approximate the Borda rule, given a number of comparisons by each voter that is logarithmic in the number of candidates.

Most similar is the work of Caragiannis and Micha (2017). They show that under the Mallows model, K-Approval with any fixed K takes exponentially many voters (in the number of candidates) to recover the underlying ranking; on the other hand, K-Approval with K chosen uniformly at random for each voter takes only a polynomial number of voters.

These works provide order estimates for the learning rate, asymptotic in the number of candidates; fine-grained differentiation between different rules or K-Approval mechanisms for a given election is not possible. We provide the latter and show that it matters.

Other approaches to comparing mechanisms. Many works take an axiomatic and computational approach, comparing mechanisms that may produce different outcomes even given an asymptotic number of votes [Fishburn and Gehrlein 1976, Fishburn 1978, Staring 1986, Tataru and Merlin 1997, Wiseman 2000, Ratliff 2003, Elkind et al. 2017, Aziz et al. 2015, Caragiannis et al. 2017, Aziz et al. 2017, Lackner and Skowron 2018a, Lackner and Skowron 2018b, Faliszewski and Talmon 2018]. Caragiannis et al. (2019), for example, show how to find a scoring rule that most agrees with a partial ground truth ranking. In contrast, we compare mechanisms’ learning rates under a condition (formalized in Section 3.2) in which they produce the same asymptotic outcome.

Benade et al. (2018) and Gelauff et al. (2018) experimentally compare different mechanisms across several dimensions, including ease of use and consistency with another mechanism; the latter leverages data from a participatory budgeting election at a university.

Large deviation analysis of elicitation mechanisms. Theoretically, we leverage large deviation rates and Chernoff bounds to derive how quickly a given scoring rule learns its outcome; see the work of Dembo and Zeitouni (2010) for an introduction to large deviations. This work is thus conceptually similar to work on elicitation design for rating systems [Garg and Johari 2018, Garg and Johari 2019]. In those works, the authors derive large deviation-based learning rates that depend on the questions that are asked to buyers as they review an item, where the goal is to accurately rank items; they further run an experiment on an online labor platform. In that setting, however, buyers rate a single item, and mechanisms are distinct based on the behavior they induce; in this work, voters see all the candidates and provide a partial ordering, and different designs (e.g., K-Approval with different values of K) constrain the types of orderings voters can provide.

3 Model

We now present our model and a condition under which different positional scoring rules are comparable.

3.1 Model primitives

We begin with the model primitives: candidates and voters, the election goal, and elicitation and aggregation.

Candidates and Voters. There is a set of $M$ candidates $\mathcal{C}$, typically indexed by $i, j \in \{1, \dots, M\}$. There are $N$ voters $n \in \{1, \dots, N\}$. Each voter $n$ has a strict ranking of candidates $\sigma_n$, drawn independently and identically from probability mass function $F$ over strict rankings. Let $i \succ_\sigma j$ denote that $i$ is preferred over $j$ in $\sigma$, and $\sigma(i) = k$ denote that candidate $i$ is in the $k$th position in $\sigma$.

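A special case for $F$ is the Mallows model [Mallows 1957], in which there is a “true” societal preference $\sigma^*$ from which each voter’s ranking is a noisy sample. In particular,

$$F(\sigma) \propto \phi^{d(\sigma, \sigma^*)},$$

where $d(\sigma, \sigma^*)$ is the Kendall’s $\tau$ distance between rankings $\sigma$ and $\sigma^*$, and $\phi \in [0, 1]$ is the noise parameter: the smaller it is, the more concentrated $F$ is around $\sigma^*$.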
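For a handful of candidates, the Mallows probabilities can be tabulated by brute force. The following is a minimal sketch, assuming rankings are written as tuples of candidate labels listed from most to least preferred; the function names `kendall_tau` and `mallows_pmf` are illustrative and not taken from the paper.

```python
from itertools import permutations


def kendall_tau(a, b):
    """Number of candidate pairs that rankings a and b order differently."""
    pos_b = {candidate: k for k, candidate in enumerate(b)}
    return sum(
        1
        for x in range(len(a))
        for y in range(x + 1, len(a))
        if pos_b[a[x]] > pos_b[a[y]]
    )


def mallows_pmf(reference, phi):
    """Map each ranking to a probability proportional to phi ** distance."""
    weights = {
        ranking: phi ** kendall_tau(ranking, reference)
        for ranking in permutations(reference)
    }
    total = sum(weights.values())
    return {ranking: w / total for ranking, w in weights.items()}


# Three candidates, reference ranking 0 > 1 > 2, moderate noise.
pmf = mallows_pmf(reference=(0, 1, 2), phi=0.5)
```

Enumeration costs $M!$ evaluations, so this is only practical for small examples; it is meant to make the definition concrete rather than to scale.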

Election goal. We assume that the goal $G$ is to divide the candidates into disjoint, ordered tiers $T_1, \dots, T_L$, such that $\bigcup_\ell T_\ell = \mathcal{C}$, where candidate $i \in T_\ell$ is deemed societally preferable over $j \in T_{\ell'}$ if $\ell < \ell'$. The size of each tier is fixed before the election. For example, recovering a strict ranking over all candidates corresponds to one tier per candidate, where $|T_\ell| = 1$ for every $\ell$. Alternatively, identifying a set of $W$ winners, without distinguishing amongst the winners, corresponds to two tiers, with $|T_1| = W$.

In the main text and especially the empirics, we will focus on the task of selecting $W$ winners as it is the most common task in practice. However, this general notation allows comparison of the learning properties of different settings; for example, one can ask how much more expensive it is (in terms of the number of voters needed) to identify a strict ranking as opposed to just a set of $W$ winners.

Elicitation and Aggregation. Voters vote using an elicitation mechanism. Their votes are then aggregated using a positional scoring rule, parameterized as $\beta$, a map from each position $k$ in a voter's ranking to a score $\beta(k)$. We consider the following mechanisms:


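K-Ranking. Voter $n$ ranks her favorite K candidates, i.e., reveals the candidates in her top K positions together with their order. Candidate $i$ then receives a score $\beta(\sigma_n(i))$ if ranked, and a common default score otherwise. The Borda count is one example (in Borda, candidates not ranked receive a common score, consistent with assuming they are all tied at a single position).

K-Approval. Voter $n$ selects her favorite K candidates, i.e., reveals the set of candidates in her top K positions. A candidate receives a score of 1 for being selected, and 0 otherwise.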

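$\beta$ encodes both elicitation and aggregation. For example, K-Approval is equivalent to K-Ranking with a score function that gives the same score to every position $k \le K$. Furthermore, note that given K-Ranking data, one can simulate K'-Ranking elicitation for any K' < K by using a $\beta$ that is constant across positions $k > K'$.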

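The scoring rule is a design choice made by the election organizer, and so we will refer to $\beta$ as the election’s design. We restrict ourselves to non-constant, non-increasing scoring rules, i.e., $\beta(k) \ge \beta(k+1)$ for every position $k$, with strict inequality for at least one $k$.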

Outcome. After $N$ voters, candidate $i$’s cumulative score is $s_i^N = \frac{1}{N} \sum_{n=1}^{N} \beta(\sigma_n(i))$. Candidates are ranked in descending order of score, to form ranking $\sigma^N$, with ties broken uniformly at random. We denote the intermediary outcome, corresponding to the goal $G$, as $O_G^N$. For example, for the goal of selecting $W$ winners, $O_G^N$ is simply the top $W$ candidates in $\sigma^N$. When $G$ is clear from context, we will refer to the outcome as $O^N$.

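As the number of voters $N \to \infty$, each candidate $i$'s score converges to its expectation, $\mathbb{E}_{\sigma \sim F}[\beta(\sigma(i))]$, by the law of large numbers; when such expected scores are distinct, i.e., $\mathbb{E}[\beta(\sigma(i))] \neq \mathbb{E}[\beta(\sigma(j))]$ for $i \neq j$, then the ranking by score converges to some asymptotic ranking. However, note that there may exist an asymptotic outcome even without an asymptotic ranking, as long as expected scores and goal $G$ are such that candidates with identical expected scores are sorted into the same tier.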
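The aggregation step described above can be sketched in a few lines. This is an illustrative sketch rather than the authors' implementation: votes are assumed to be full rankings given as tuples of candidate indices listed best-first, ties are broken by candidate index instead of uniformly at random, and the helper names are hypothetical.

```python
def k_approval_rule(K):
    """Scoring rule for K-Approval: score 1 for positions 1..K, else 0."""
    def beta(position):
        return 1.0 if position <= K else 0.0
    return beta


def average_scores(votes, beta, num_candidates):
    """Average positional score of each candidate over all votes."""
    totals = [0.0] * num_candidates
    for vote in votes:
        for position, candidate in enumerate(vote, start=1):
            totals[candidate] += beta(position)
    return [total / len(votes) for total in totals]


def top_winners(scores, W):
    """Top-W candidates by score (deterministic tie-break by index)."""
    ranked = sorted(range(len(scores)), key=lambda c: (-scores[c], c))
    return set(ranked[:W])


votes = [(0, 1, 2), (0, 2, 1), (1, 0, 2)]
scores_1 = average_scores(votes, k_approval_rule(1), num_candidates=3)
scores_2 = average_scores(votes, k_approval_rule(2), num_candidates=3)
winners = top_winners(scores_1, W=1)
```

Swapping `k_approval_rule` for any other non-increasing function of position gives a different positional scoring rule over the same votes.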

3.2 Asymptotic design invariance

The asymptotic outcome of an election may vary with the scoring rule $\beta$. For example, there may be a different winner if voters are asked to identify their favorite two candidates than if they identify their single favorite candidate, if the winner in the latter case is a polarizing candidate. As an axiomatic comparison between outcomes is out of the scope of this paper, we restrict our attention to cases where all “reasonable” choices of $\beta$ asymptotically result in the same outcome (where “reasonable” corresponds to the set of scoring rules defined above).

Definition 1.

A setting with voter distribution $F$ is asymptotically design-invariant for goal $G$ if the outcome is asymptotically the same under any reasonable $\beta$.

Such design invariance only occurs under a fairly strong condition on the voter preference distribution: that the candidates can be separated into tiers (according to goal $G$) such that candidates in higher tiers are strictly more likely to be ranked by a voter in the top $k$ positions, for all $k$ short of the full ballot, than are candidates in lower tiers.

Proposition 1.

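A setting $F$ for goal $G$ is asymptotically design-invariant if and only if there exist candidate tiers $T_1, \dots, T_L$ (corresponding to $G$) s.t. for all $i \in T_\ell$, $j \in T_{\ell'}$ with $\ell < \ell'$, and all positions $k$ short of the last: $\Pr(\sigma(i) \le k) > \Pr(\sigma(j) \le k)$.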

Note that this condition is stronger than stochastic dominance as the inequality is strict for every position $k$.
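The tier condition can be checked directly for a small, explicitly given distribution over rankings. The sketch below assumes the distribution is a dictionary from rankings (tuples of candidate indices, best-first) to probabilities; both example distributions are made up for illustration, and the function names are hypothetical.

```python
def top_k_probability(pmf, candidate, k):
    """Probability that `candidate` appears in a voter's top k positions."""
    return sum(p for ranking, p in pmf.items() if candidate in ranking[:k])


def is_design_invariant(pmf, tiers, eps=1e-12):
    """Check the strict top-k dominance condition between ordered tiers."""
    num_candidates = len(next(iter(pmf)))
    for a, upper in enumerate(tiers):
        for lower in tiers[a + 1:]:
            for i in upper:
                for j in lower:
                    for k in range(1, num_candidates):
                        better = top_k_probability(pmf, i, k)
                        worse = top_k_probability(pmf, j, k)
                        if better <= worse + eps:
                            return False
    return True


# Candidate 0 dominates at every cutoff.
consensus = {(0, 1, 2): 0.5, (1, 0, 2): 0.2, (0, 2, 1): 0.2, (2, 0, 1): 0.1}
# Candidate 0 is polarizing: most often first, but otherwise last.
polarizing = {(0, 1, 2): 0.4, (1, 2, 0): 0.3, (2, 1, 0): 0.3}

single_winner = [[0], [1, 2]]
consensus_ok = is_design_invariant(consensus, single_winner)
polarizing_ok = is_design_invariant(polarizing, single_winner)
```

In the polarizing example, candidate 0 wins under 1-Approval but is in the top two least often, so the single-winner goal is not design-invariant there.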

This proposition connects to Caragiannis et al. (2013) as follows: they prove that many rules (including all positional scoring rules and the Bucklin rule) asymptotically recover the base ranking of a generalization of the Mallows model in which the probability of a ranking $\sigma$ is monotonic in its distance from the base ranking, where the distance function is itself in some general class that contains the Kendall’s $\tau$ distance. Their results directly imply that such noise models, including the standard Mallows model, are asymptotically design-invariant for any goal $G$.

However, for goals where recovering a full ranking is unnecessary, the condition in Proposition 1 is weaker than the assumptions of Caragiannis et al. (2013); there need not even be a single base ranking. For example, when the goal is to select a set of $W$ winners, a distribution corresponding to a mixture of Mallows models – with all possible permutations of the candidates in the top $W$ positions in the base rankings – would still be design-invariant. Constructing a general class of ranking noise models that satisfies this property is an avenue for future work.

Assuming asymptotic design-invariance on voter preferences may seem restrictive. However, absent axioms – that are precise enough for design purposes – to prefer one scoring rule over another, the assumption allows us to proceed in a principled manner. We believe it is unlikely that such precise, satisfactory axioms exist generally. In the Appendix, we provide a simple example (similar to that of Staring (1986)) where two K-Approval mechanisms with different values of K select disjoint sets of winners, and such examples can be adapted more generally to selecting $W$ winners from either of two different K-Approval mechanisms. In participatory budgeting with the goal of identifying 6-10 winning projects out of over twenty projects, it is unclear whether there is a principled reason to prefer one K-Approval mechanism over another. However, such axioms would be an interesting avenue for future work.

Furthermore, in Section 6.2 we show that design invariance is often approximately satisfied in practice, especially for identifying a small set of winners, using data from a wide range of participatory budgeting and other elections.

4 Learning Rates and Optimal Design

Different elicitation and aggregation mechanisms may take different numbers of voters to learn the asymptotic outcome. For example, suppose we want to identify the worst candidate out of 100, where the voters’ rankings are drawn from a Mallows model with a given noise parameter. Then, asking each voter to identify their single favorite candidate will eventually identify the worst candidate, but after many more voters than if we ask each voter to identify their least favorite candidate. We make such learning rates precise in this section. Our results in this section extend those of Caragiannis and Micha (2017) as discussed above, both to arbitrary positional scoring rules and by providing tighter bounds for how a scoring rule affects the convergence rate. These rates are precise enough to design scoring rules, for example by comparing the two Approval mechanisms in the above example.

4.1 Learning rates

We begin by deriving rates for how quickly a given positional scoring rule learns its asymptotic outcome (given it exists), as a function of the voter preference model $F$. In particular, we use large deviation rates to characterize how quickly a scoring rule learns [Dembo and Zeitouni 2010].

Definition 2.

Consider a sequence $(y_N)_{N \ge 1}$, where $y_N \to 0$. Value $r$ is the large deviation rate for $y_N$ if

$$r = -\lim_{N \to \infty} \frac{1}{N} \log y_N.$$

When $r$ exists, $y_N \to 0$ exponentially fast, with exponent $r$ asymptotically, i.e., $y_N$ is $e^{-rN \pm o(N)}$. These rates provide us both upper and lower bounds for the probability of an error or the number of errors in an outcome after $N$ voters, up to polynomial factors. In particular, in the propositions below, we will calculate the large deviation rate of errors in the outcome. We will also then provide (loose) upper bounds for such errors after $N$ voters that hold without any missing polynomial factors, for any $N$. These upper bounds are equivalent to Chernoff bounds.

The particular forms for these rates, derived below for general noise models , may seem complex. However, they are useful both for theoreticians and practitioners. For example, in Section 5.1, we use the structure of such rates to resolve open questions regarding when randomization between mechanisms can help learn the outcome from votes drawn from an arbitrary noise model. In Section 6, we show that learning rates – even when empirically calculated – reflect the true behavior of errors in real elections with a small number of voters; we then use empirically calculated learning rates to draw design insights across elections.

Rates for separating two candidates. We now derive the large deviation learning rates for recovering the true ordering between a pair of candidates $i, j$, given noise model $F$. These rates will directly translate to the learning rate for the overall election, given some goal $G$.

Proposition 2.

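Fix scoring rule $\beta$, voter distribution $F$, and consider candidates $i, j$ such that $\mathbb{E}[\beta(\sigma(i))] > \mathbb{E}[\beta(\sigma(j))]$. Then, the probability of making a mistake in ranking these two candidates after $N$ voters, i.e., the probability that $j$'s score after $N$ voters is at least $i$'s, goes to zero with large deviation rate

$$r_{ij}(\beta) = -\inf_{z \in \mathbb{R}} \log \sum_{k, \ell} \Pr\big(\sigma(i) = k, \sigma(j) = \ell\big) \exp\big(z\,(\beta(\ell) - \beta(k))\big).$$

Further, the following upper bound holds for any $N$: the probability of such a mistake is at most $\exp(-r_{ij}(\beta)\, N)$.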

The proof follows directly from writing a random variable for the event of making a mistake after $N$ voters and then applying known large deviation rates. This simplicity emerges because positional scoring rules are additive across voters.
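A pairwise rate of this kind can be evaluated numerically for any scoring rule. The sketch below assumes the standard Cramér form, i.e., the rate is the negative of the minimum over $z$ of the log moment generating function of the per-voter score difference between $j$ and $i$; it further assumes scores are normalized to $[0, 1]$ and that the minimizer lies in $[0, z_{\max}]$. The input format and the name `pairwise_rate` are illustrative.

```python
import math


def pairwise_rate(joint_positions, beta, z_max=50.0, iters=200):
    """Large deviation rate for mis-ordering candidates i and j.

    joint_positions maps (position of i, position of j), both 1-indexed,
    to the probability of that pair of positions in a voter's ranking.
    beta[k - 1] is the score for position k (assumed to lie in [0, 1]).
    """
    def log_mgf(z):
        return math.log(sum(
            p * math.exp(z * (beta[pos_j - 1] - beta[pos_i - 1]))
            for (pos_i, pos_j), p in joint_positions.items()
        ))

    # log_mgf is convex in z, so ternary search finds its minimum.
    lo, hi = 0.0, z_max
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if log_mgf(m1) <= log_mgf(m2):
            hi = m2
        else:
            lo = m1
    return -log_mgf((lo + hi) / 2)


# Three candidates; i is first w.p. 0.5, j is first w.p. 0.2.
joint = {(1, 2): 0.3, (1, 3): 0.2, (2, 1): 0.1,
         (3, 1): 0.1, (2, 3): 0.2, (3, 2): 0.1}
rate_plurality = pairwise_rate(joint, beta=[1.0, 0.0, 0.0])
```

The resulting `rate_plurality` can be plugged into $e^{-rN}$ to get a Chernoff-style bound on the mistake probability after $N$ voters.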

The proposition establishes that – for a fixed number of candidates and voter noise model – the probability of making a mistake on any single pair of candidates decreases exponentially with the number of voters, at a rate governed by the scoring rule and the candidates’ relative probabilities of appearing at each position of a voter’s preference ranking. The rate is non-negative, and larger values correspond to faster learning of the relative ranking of $i, j$. Note that for notational convenience, we suppress $F$ in the argument for the rate.

For general $\beta$, we cannot find a closed form for the pairwise rate. However, the structure of this rate, in particular that of the argument in the logarithm, will directly let us show that randomization cannot help learning outcomes among positional scoring rules, for arbitrary noise models $F$.

For K-Approval voting, further, the rate simplifies.

Proposition 3.

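Consider $\beta$ consistent with K-Approval voting for some fixed K, and candidates $i, j$ such that $\Pr(\sigma(i) \le K) > \Pr(\sigma(j) \le K)$. Then the large deviation rate in Proposition 2 is

$$r_{ij}(K) = -\log\Big(1 - \big(\sqrt{t_{ij}} - \sqrt{t_{ji}}\big)^2\Big),$$

where $t_{ij} = \Pr(\sigma(i) \le K < \sigma(j))$, i.e., the probability that a voter approves $i$ but not $j$.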

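The proof follows directly from the structure of $\beta$ for K-Approval, which scores a candidate 1 if she is in the voter's top K positions and 0 otherwise. Note that we overload notation and use K directly in the argument of the rate. This rate function is convex in the approval probabilities (the probability that a voter approves $i$ but not $j$, and vice versa); this fact will let us show that randomization, even among K-Approval mechanisms, cannot help learning the relationship of any pair of candidates.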
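The simplification for K-Approval can be traced in a few lines. The following derivation is a sketch that assumes the standard Cramér form of the pairwise rate (the negative infimum over $z$ of the log moment generating function of the per-voter score difference); $X$, $t_{ij}$, and $t_{ji}$ are symbols introduced here, with $t_{ij}$ the probability that a voter approves $i$ but not $j$, and both probabilities assumed positive.

```latex
% Per-voter score difference between j and i under K-Approval:
%   X = +1 with probability t_{ji}  (j approved, i not),
%   X = -1 with probability t_{ij}  (i approved, j not),
%   X =  0 otherwise.
\begin{align*}
\mathbb{E}\left[e^{zX}\right]
  &= 1 - t_{ij} - t_{ji} + t_{ji}\, e^{z} + t_{ij}\, e^{-z}, \\
\frac{d}{dz}\, \mathbb{E}\left[e^{zX}\right] = 0
  &\iff e^{z} = \sqrt{t_{ij} / t_{ji}}, \\
\inf_{z}\, \mathbb{E}\left[e^{zX}\right]
  &= 1 - t_{ij} - t_{ji} + 2\sqrt{t_{ij}\, t_{ji}}
   = 1 - \left(\sqrt{t_{ij}} - \sqrt{t_{ji}}\right)^{2}, \\
-\inf_{z} \log \mathbb{E}\left[e^{zX}\right]
  &= -\log\left(1 - \left(\sqrt{t_{ij}} - \sqrt{t_{ji}}\right)^{2}\right).
\end{align*}
```

The last line depends on the ballots only through the two approval probabilities, which is why K-Approval rates are easy to estimate from data.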

Rates for learning the outcome. In general, the rates at which one learns each pair of candidates immediately translate to rates for learning the entire outcome.

Proposition 4.

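Consider goal $G$ and scoring rule $\beta$ such that the asymptotic outcome exists. Let $Q^N$ be the expected number of errors in the outcome after $N$ voters, i.e., the expected number of pairs of candidates from different tiers that are placed in the wrong order. Then $Q^N$ goes to zero with large deviation rate

$$r(\beta) = \min_{i \in T_\ell,\; j \in T_{\ell'},\; \ell < \ell'} r_{ij}(\beta),$$

where $T_1, \dots, T_L$ are the tiers of $G$ and $r_{ij}(\beta)$ is the pairwise rate of Proposition 2. Further, the following upper bound holds for any $N$:

$$Q^N \le \sum_{i \in T_\ell,\; j \in T_{\ell'},\; \ell < \ell'} \exp\big(-r_{ij}(\beta)\, N\big).$$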

The large deviation rate thus provides a tight characterization for how many voters it takes to (with high confidence) recover the asymptotic outcome of an election. Note that the goal $G$ plays an important role: for selecting $W$ winners, for example, it is not important to learn the exact relationship among candidates within the same tier, which speeds up outcome learning. Design also matters; e.g., even amongst approval voting mechanisms, different values of K will produce substantially different rates. To derive learning rates for K-Approval for any given noise model or using real-world data, one simply needs to calculate the pairwise approval probabilities. We do so numerically for the Mallows model and empirically with real world data in Sections 5.2 and 6, respectively.
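Combining pairwise rates into an outcome rate amounts to finding the slowest pair of candidates that sit in different tiers. The sketch below assumes K-Approval with the closed-form pairwise rate $-\log(1 - (\sqrt{t_{ij}} - \sqrt{t_{ji}})^2)$ and takes the approval probabilities as given; the matrix `t` holds made-up numbers, and the function names are hypothetical.

```python
import math
from itertools import combinations


def approval_pair_rate(t_ij, t_ji):
    """Pairwise rate under K-Approval from the two approval probabilities."""
    gap = (math.sqrt(t_ij) - math.sqrt(t_ji)) ** 2
    return math.inf if gap >= 1.0 else -math.log(1.0 - gap)


def outcome_rate(t, tiers):
    """Minimum pairwise rate over candidate pairs in different tiers.

    t[i][j] is the probability that a voter approves i but not j.
    Returns the rate and the pivotal (slowest) pair.
    """
    rates = {}
    for upper, lower in combinations(tiers, 2):
        for i in upper:
            for j in lower:
                rates[(i, j)] = approval_pair_rate(t[i][j], t[j][i])
    pivotal = min(rates, key=rates.get)
    return rates[pivotal], pivotal


# Illustrative approval probabilities for three candidates.
t = [
    [0.0, 0.5, 0.6],
    [0.2, 0.0, 0.3],
    [0.1, 0.2, 0.0],
]
rate, pivotal_pair = outcome_rate(t, tiers=[[0], [1, 2]])
```

Pairs inside the same tier are never evaluated, which is how a coarser goal such as selecting winners can be learned faster than a full ranking.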

4.2 Optimal design and discussion

Now that we can quantify how quickly a given scoring rule learns its asymptotic outcome, we apply our framework to designing elections, i.e., choosing an optimal scoring rule $\beta$. For the rest of this work, we assume that the setting is asymptotically design-invariant for the goal $G$. Then, the design of an election only affects the rate at which the election converges to the asymptotic outcome, as calculated above. With no other constraints, then, the design challenge is simple: find the rate optimal $\beta$.

Definition 3.

A scoring rule $\beta$ is rate optimal if it maximizes the rate in Proposition 4. K-Approval is Approval rate optimal if it maximizes the rate among K-Approval mechanisms.

Rate optimal designs learn the outcome faster than others in the number of voters, and so are preferable to other designs. What influences how quickly a design learns? The rate is large when the score gap between a higher-tier candidate and a lower-tier candidate is reliably positive, and so the score for a position must be large when higher-tier candidates occupy that position much more often than lower-tier candidates. In other words, a scoring rule must reward a candidate achieving a position in a voter’s ranking that is only achieved by asymptotically high-ranking candidates. For example, if it is common for worse candidates to be ranked second in a given voter’s ranking but not to be ranked first, then a large gap between the scores for the first and second positions would be beneficial.

Note that finding such designs requires knowledge of the voter noise model $F$, which in many settings may not be available before the election. However, next in Sections 5 and 6, we show that there are valuable insights that apply across elections, including how our approach has informed participatory budgeting deployments.

5 Theoretical Design Insights

(a) For selecting a single winner as the number of candidates varies.
(b) For a fixed number of candidates as the number of winners varies.
Figure 1: Approval rate optimal K-Approval mechanism for the Mallows model as the noise parameter, the number of candidates, and the number of winners vary.

The learning rates derived in the previous section provide election design insights, even before our approach is applied to real-world data. In particular, in this section, we first extend the previous literature on the (potential) benefits of randomizing between mechanisms. Then, we study the task of selecting $W$ winners using K-Approval voting.

5.1 When does randomization help?

We now consider the question of whether randomizing between mechanisms in an election may speed up learning. By randomization, we mean: consider a set of scoring rules $\{\beta_b\}$; elicitation and aggregation for a given voter is done according to a scoring rule picked at random from this set, where $\beta_b$ is selected with probability $p_b$.

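Note that the learning rate of such randomized schemes can be calculated as before, by summing across the scoring rules (weighted by their selection probabilities) inside the logarithm of the pairwise rate or – for randomization consisting only of K-Approval votes – directly through the resulting probability that the voter approves $i$ but not $j$. We use $r_{ij}(\{\beta_b, p_b\})$ and $r(\{\beta_b, p_b\})$ to denote the candidate pairwise and overall outcome learning rates, respectively, for the randomized mechanism $\{\beta_b, p_b\}$.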

It is known that in some settings randomization improves learning, asymptotically in the number of candidates. Caragiannis and Micha (2017) provide an example in which randomizing uniformly between all possible K-Approval mechanisms outperforms any static K-Approval elicitation, when the goal is to rank all the candidates. Their insight is that, under the Mallows model and under a fixed K, either the first two candidates will be hard to distinguish from each other, or the last two will, and randomizing between mechanisms balances learning each pair.

We now study randomization for the goal of selecting $W$ winners and for arbitrary positional scoring rules and voter noise models. Our first result is that randomizing between scoring rules does not help, for any voter noise model, in contrast to the case when restricted to approval votes.

Theorem 1.

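Randomization does not improve the outcome learning rate for any asymptotically design-invariant noise model $F$ or goal $G$. For any randomized scoring rule mechanism $\{\beta_b, p_b\}$, where $\sum_b p_b = 1$, for any $F$, $G$, the static scoring rule $\bar{\beta} = \sum_b p_b \beta_b$ has an outcome learning rate at least as large as that of the randomized mechanism.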

The result follows from the fact that the expression inside the logarithm of the pairwise rate is convex in $\beta$, for all $z$. Then, given a randomization over scoring rules, we can increase the rate by decreasing its argument, by instead using the static scoring rule defined by the corresponding convex combination of the scoring rules. Note that such a negative result cannot be obtained via analysis that is asymptotic in the number of candidates; we need learning rates for a given election.

Next, we further refine the result of Caragiannis and Micha (2017), by showing that the “pivotal pair” feature of their example – where different pairs of candidates dominate the learning rate for different mechanisms – is key. In particular, our next result establishes, again for any noise model, that randomization amongst K-Approval mechanisms cannot help separate any given pair of candidates.

Theorem 2.

Randomization amongst K-Approval mechanisms does not improve the learning rate for separating a given pair of candidates for any asymptotically design-invariant noise model $F$ or goal $G$. For any randomized K-Approval mechanism $\{\beta_K, p_K\}$, where $\beta_K$ corresponds to K-Approval, for any $F$ and candidate pair $i, j$, there exists a static $K^*$-Approval mechanism whose pairwise learning rate for $i, j$ is at least that of the randomized mechanism.

The proof relies on the pairwise rate function being convex in the approval probabilities (the probability that a voter approves $i$ but not $j$, and vice versa).
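The convexity argument can be checked numerically on a grid. The sketch below assumes the closed-form K-Approval pairwise rate $-\log(1 - (\sqrt{a} - \sqrt{b})^2)$, where $a$ and $b$ are the two approval probabilities of a candidate pair, and models a randomized mechanism by mixing the approval probabilities of two static mechanisms. The grid and mixing weights are arbitrary illustrative choices.

```python
import math


def approval_pair_rate(a, b):
    """Pairwise K-Approval rate from approval probabilities a and b."""
    return -math.log(1.0 - (math.sqrt(a) - math.sqrt(b)) ** 2)


# Feasible pairs of approval probabilities (they cannot sum above 1).
grid = [x / 10 for x in range(10)]
points = [(a, b) for a in grid for b in grid if a + b <= 1.0]


def mixed_rate(weight, first, second):
    """Rate when `first` is used w.p. weight and `second` otherwise."""
    a = weight * first[0] + (1 - weight) * second[0]
    b = weight * first[1] + (1 - weight) * second[1]
    return approval_pair_rate(a, b)


# Largest amount by which a mixture beats the better static mechanism.
worst_violation = max(
    mixed_rate(w, u, v) - max(approval_pair_rate(*u), approval_pair_rate(*v))
    for u in points
    for v in points
    for w in (0.25, 0.5, 0.75)
)
```

On this grid `worst_violation` never exceeds zero (up to floating point error), matching the claim that for a single pair some static mechanism is always at least as good as the mixture.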

This theorem directly implies that, for the Mallows model, randomization among K-Approval voting cannot speed up learning when the goal is to identify a set of winners, as opposed to when the goal is to rank.

Corollary 1.

Randomization among K-Approval mechanisms does not improve the learning rate for selecting $W$ winners from the Mallows model. For any randomized K-Approval mechanism $\{\beta_K, p_K\}$, where $\beta_K$ is K-Approval, for selecting $W$ winners from the Mallows model, there exists an Approval rate optimal mechanism $K^*$-Approval whose outcome learning rate is at least that of the randomized mechanism.

The proof simply notes that under the Mallows model with this goal, the candidate pair $(W, W+1)$ (when candidates are indexed according to the reference ranking $\sigma^*$) is pivotal regardless of the K-Approval mechanism used. This corollary does not extend to arbitrary noise models, where randomization amongst K-Approval mechanisms may improve the learning rate.

Theorem 3.

Randomization among K-Approval mechanisms may improve the learning rate for the goal of selecting $W$ winners. There exist asymptotically design-invariant settings for the goal of selecting $W$ winners such that a randomized K-Approval mechanism $\{\beta_K, p_K\}$, where $\beta_K$ corresponds to K-Approval, has a strictly larger outcome learning rate than every static K-Approval mechanism.

We prove the result two ways: (1) we construct an example in which one candidate is asymptotically selected, and two other candidates are not. Which of the two resulting candidate pairs is the pivotal pair (determines the overall rate function) depends on the K-Approval mechanism used, and randomizing between two mechanisms improves the overall rate; (2) perhaps more interestingly, we find many examples in our real PB elections and other ranking data in which randomization would have sped up learning for the task of selecting a set of winning candidates (see Section 6.4).
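The pivotal-pair effect can be reproduced with a small hand-built distribution. The example below is not the construction from the Appendix: it is a hypothetical distribution over rankings of candidates 1, 2, 3 in which candidate 1 is strictly most likely to be in the top $k$ for both $k = 1$ and $k = 2$, and it assumes the closed-form K-Approval pairwise rate with mixed approval probabilities for the randomized mechanism.

```python
import math

# Hypothetical distribution over rankings (best-first) of candidates 1, 2, 3.
pmf = {
    (1, 2, 3): 0.05,
    (1, 3, 2): 0.40,
    (2, 1, 3): 0.20,
    (2, 3, 1): 0.20,
    (3, 1, 2): 0.15,
}


def approves_only(i, j, K):
    """Probability that a voter's top K contains i but not j."""
    return sum(p for r, p in pmf.items() if i in r[:K] and j not in r[:K])


def pair_rate(t_ij, t_ji):
    return -math.log(1.0 - (math.sqrt(t_ij) - math.sqrt(t_ji)) ** 2)


def winner_rate(k_weights):
    """Rate for selecting candidate 1 when K is drawn from k_weights."""
    rates = []
    for j in (2, 3):
        t_1j = sum(w * approves_only(1, j, K) for K, w in k_weights.items())
        t_j1 = sum(w * approves_only(j, 1, K) for K, w in k_weights.items())
        rates.append(pair_rate(t_1j, t_j1))
    return min(rates)  # the slowest pair determines the outcome rate


rate_k1 = winner_rate({1: 1.0})           # pivotal pair: (1, 2)
rate_k2 = winner_rate({2: 1.0})           # pivotal pair: (1, 3)
rate_mix = winner_rate({1: 0.5, 2: 0.5})  # uniform randomization
```

Under 1-Approval candidate 2 is hard to separate from candidate 1, under 2-Approval candidate 3 is, and the mixture avoids both bottlenecks, so `rate_mix` exceeds both static rates.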

5.2 K-Approval for selecting winners

One of the most common voting settings is identifying a set of winners using K-Approval, whether in representative democracy elections (typically K = 1) or in polling for such elections (where the goal often is to identify the top few candidates out of many, especially in primary races). Here, we study how to design such elections, i.e., how to choose the K that maximizes the learning rate. For simplicity, we work with the Mallows model, extending the resulting insights to real-world data in the next section.

Recall that in a Mallows model, each voter’s ranking is a noisy sample from a reference ranking $\sigma^*$. With this model, one may believe that setting K equal to the number of winners $W$ is always optimal. For example, when the noise parameter is zero and so each voter’s ranking is exactly $\sigma^*$, $K = W$ is optimal; in fact, any other design fails to correctly identify the set of winners even asymptotically: it would not distinguish among the first K candidates in $\sigma^*$ or among the remaining candidates. However, our next result establishes that the cases with positive noise are different.

Theorem 4.

Under the Mallows model and the goal of selecting $W$ winners, $W$-Approval may not be Approval rate optimal.

We prove the theorem by example. To find this example and to generate the plots discussed next, we use an efficient dynamic program to exactly calculate the joint distributions of the locations of pairs of candidates in a voter’s ranking, given the Mallows noise parameter; we can then directly calculate the pairwise approval probabilities and thus the learning rate for each K-Approval mechanism. This program leverages Mallows repeated insertion probabilities [Lu and Boutilier 2014, Diaconis 1988] and may be of independent interest for numerical analyses of the Mallows model.
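For a small number of candidates, the same quantities can be obtained by brute-force enumeration. The sketch below is not the dynamic program described above: it lists all $M!$ rankings, weights each by the noise parameter raised to its number of inversions relative to the reference ranking $0, 1, \dots, M-1$, and assumes the closed-form K-Approval pairwise rate. Function names are hypothetical.

```python
import math
from itertools import permutations


def mallows_pmf(M, phi):
    """Mallows distribution with reference ranking (0, 1, ..., M - 1)."""
    weights = {}
    for ranking in permutations(range(M)):
        inversions = sum(
            1 for x in range(M) for y in range(x + 1, M)
            if ranking[x] > ranking[y]
        )
        weights[ranking] = phi ** inversions
    total = sum(weights.values())
    return {ranking: w / total for ranking, w in weights.items()}


def approval_outcome_rate(pmf, M, W, K):
    """Rate of K-Approval for selecting the top W reference candidates."""
    rates = []
    for i in range(W):
        for j in range(W, M):
            t_ij = sum(p for r, p in pmf.items()
                       if i in r[:K] and j not in r[:K])
            t_ji = sum(p for r, p in pmf.items()
                       if j in r[:K] and i not in r[:K])
            gap = (math.sqrt(t_ij) - math.sqrt(t_ji)) ** 2
            rates.append(-math.log(1.0 - gap))
    return min(rates)


def best_k(M, phi, W):
    """Approval rate optimal K among 1, ..., M - 1 (phi must be > 0)."""
    pmf = mallows_pmf(M, phi)
    return max(range(1, M),
               key=lambda K: approval_outcome_rate(pmf, M, W, K))


low_noise_choice = best_k(M=5, phi=0.05, W=1)
```

Sweeping `phi`, `M`, and `W` with `best_k` gives a small-scale version of the numerical analysis that follows, limited by the factorial cost of enumeration.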

Numerical analysis. We now numerically analyze, for the Mallows model, how the Approval rate optimal K-Approval mechanism varies with the Mallows noise parameter, the number of candidates, and the number of winners. Recall that the Mallows model is asymptotically design invariant, so different mechanisms only differ in how quickly they learn the asymptotic outcome.

In Figure 1(a), the goal is to select a single winner, and the noise parameter and the number of candidates are varied. With low noise, it is rate optimal to use 1-Approval, i.e., ask each voter to select their favorite candidate, regardless of how many candidates there are. However, with higher noise, as the number of candidates in the election increases, so does the K in the optimal K-Approval mechanism. With enough noise and enough candidates, it is best to ask each voter to select several of their favorite candidates, even if the task is to identify the single best candidate according to the reference ranking.

Similarly, Figure 1(b) shows how the rate optimal K-Approval mechanism changes with the number of winners desired and the noise parameter, fixing the number of candidates. Again with high noise, it is best to ask voters to identify their favorite half of candidates, regardless of how many winners need to be identified. With low noise, however, $W$-Approval is optimal to select $W$ winners.

Overall, the analysis suggests that with higher noise in the voter model, one should tend toward asking voters to rank their favorite half of candidates, regardless of and .

The high-noise setting may seem unrealistic; however, as we will see in the next section, the rate optimal -Approval mechanisms in practice often scale like the high noise settings, consistent with the idea that voting distributions in practice do not look like they are drawn from a low-noise Mallows model. We now turn to such empirical analyses.

6 Empirics and PB deployments

(a) Boston 2016 PB election, selecting 1 winner: Average empirical bootstrapped error, i.e., the fraction of times the asymptotic winner is not selected (solid lines, left axis), compared to the errors over time implied by the (empirically calculated) learning rates, i.e., (dashed lines, right axis). The right axis is a vertically shifted (in log scale) version of the left axis, reflecting that the learning rate errors are asymptotically valid up to polynomial factors. All mechanisms return the same winner when all votes are counted. “Borda” is the Borda count for the 4 candidates ranked, and all others are assumed to be tied at rank 5 for each voter.
(b) Approximate design invariance across elections. For the task of selecting winners, this plot shows the average overlap in the top candidates identified by different mechanisms across all the elections in our dataset, if all voters with complete rankings are counted. For example, of the top 4 candidates identified by -Approval across elections, are also identified as top candidates by -Approval. For each -Approval mechanism, we include all elections where there were at least candidates.
Figure 2: Validating model: comparing learning rates to empirical error, and showing approximate design invariance.

We now apply our insights to practice. We focus on -Approval voting, as opposed to general scoring rules. This section is organized as follows. In Section 6.1, we describe our data sources. We validate our model in Section 6.2; first, we demonstrate that large deviation rates effectively describe how quickly various mechanisms learn; next, we show that in practice voter noise models are approximately design invariant. In Section 6.3, we show that the insights from Section 5.2 regarding optimal approval mechanisms extend to practice. Finally in Section 6.4 we note that we find many examples in practice where randomizing between -Approval mechanisms improves learning.

6.1 Data description

We leverage two data sources (detailed dataset information is in Appendix Table 1). First, we have partnered with dozens of local governments to help run participatory budgeting (PB) elections in the last five years. These elections have used a variety of methods, primarily -Approval; our data in this work comes from 5 elections where -Ranking was used, including 3 recent elections where . This data is particularly useful as PB is among the most common types of elections with many candidates and several winners, with several theoretical analyses [Goel et al.2016, Garg et al.2019, Freeman et al.2019].

Second, we use data available on PrefLib [Mattei and Walsh2013, O’Neill2013, Regenwetter et al.2007, Regenwetter et al.2008, Popov, Popova, and Regenwetter2014], limiting ourselves to 28 elections with at least 5 candidates and 700 voters who provided full rankings. This ranking data spans many domains, from people’s sushi preferences to Glasgow City Council elections. This domain breadth supports the broad applicability of the design insights explored in this section.

We focus on ranking data to be able to simulate counter-factuals for the same election: with -Ranking data, we can simulate what would have occurred with any -Approval elicitation mechanism, for (assuming no behavioral quirks). With approval data, on the other hand, one cannot compare the mechanism to any other for that given election.
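As an illustration of this counterfactual simulation, the sketch below truncates each ranked ballot to its top k entries and tallies approvals. The ballots, candidate labels, function names, and the alphabetical tie-breaking rule are all hypothetical choices made for the example, and, like the text, it assumes voters would approve exactly the candidates they ranked highest.

```python
from collections import Counter

def simulate_k_approval(rankings, k):
    """Tally K-Approval scores by approving each voter's top k ranked candidates."""
    scores = Counter()
    for ranking in rankings:
        if len(ranking) < k:
            raise ValueError("ballot ranks fewer than k candidates")
        scores.update(ranking[:k])
    return scores

def select_winners(scores, candidates, num_winners):
    """Top num_winners candidates by score; ties broken alphabetically (an assumption)."""
    ordered = sorted(candidates, key=lambda c: (-scores[c], c))
    return ordered[:num_winners]

# Hypothetical 3-Ranking ballots over candidates "a".."e" (best first).
ballots = [("a", "b", "c"), ("b", "a", "d"), ("c", "b", "a"), ("b", "e", "a")]
candidates = ["a", "b", "c", "d", "e"]
for k in (1, 2, 3):
    print(k, select_winners(simulate_k_approval(ballots, k), candidates, 2))
```

The same ballots can be replayed for every k up to the ranking depth, which is what makes ranking data suitable for comparing mechanisms within a single election.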

One challenge is that ranking many candidates is onerous, and so voters rank at most 14 candidates in our dataset. For the data we use from PrefLib, full rankings (rankings up to the number of candidates) are available. In the PB elections in our partner cities, typically each voter ranks or selects her favorite candidates.

6.2 Model validation

Our model and design approach has two components that must be validated: (1) that learning rates can effectively be used to compare different mechanisms, and (2) that design invariance (approximately) holds in practice.

Large deviation rates as effective proxies for learning. We now confirm that, for a given election, empirically calculated large deviation learning rates are effective proxies for the rate at which the error in recovering the asymptotic output decreases as the number of voters increases (even though large deviation learning rates are only asymptotically valid in the number of voters). As examples, we first identify three elections and goals for which many of the potential -Approval mechanisms return exactly the same asymptotic outcome. Then, we bootstrap voters from the available data of voters and empirically calculate the errors made in identifying the winning set of candidates. We further calculate the large deviation learning rates for these mechanisms, using implied by the voting data and the formula in Proposition 4. (Footnote 2: Given an empirical , learning rates can be numerically calculated: the is a convex minimization problem.)
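The bootstrap step can be sketched as follows. This is an illustrative version only: it treats the winners computed from all available ballots as the asymptotic outcome, counts a trial as an error whenever the sampled winner set differs at all, and breaks ties alphabetically. Those three choices, and all function names, are assumptions rather than details taken from the paper.

```python
import random
from collections import Counter

def winners(ballots, k, num_winners, candidates):
    """K-Approval winner set from ranked ballots (ties broken alphabetically)."""
    scores = Counter(c for b in ballots for c in b[:k])
    ranked = sorted(candidates, key=lambda c: (-scores[c], c))
    return frozenset(ranked[:num_winners])

def bootstrap_error(ballots, k, num_winners, sample_size, trials=1000, seed=0):
    """Fraction of bootstrap samples whose winners differ from the full-data winners."""
    rng = random.Random(seed)
    candidates = sorted({c for b in ballots for c in b})
    target = winners(ballots, k, num_winners, candidates)
    misses = 0
    for _ in range(trials):
        # Resample voters with replacement from the observed ballots.
        sample = rng.choices(ballots, k=sample_size)
        if winners(sample, k, num_winners, candidates) != target:
            misses += 1
    return misses / trials
```

Calling `bootstrap_error` over a grid of `sample_size` values, for each k of interest, gives error curves of the kind that can be compared against the decay implied by a learning rate.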

Figure 2(a) shows the resulting errors over time for one such election where -Rankings are available. We further plot for each mechanism, i.e., the error over time implied by the learning rate (up to polynomial factors). This plot, along with Appendix Figure 3, yields several insights:

  1. The mechanism matters: when selecting 1 winner from the election in Figure 2(a) after votes, there is chance of not picking the ultimate winner if -Approval is used. With or -Approval, this number is . The winner appears often in a voter’s top two or three positions (but not necessarily first), while the ultimate second place candidate often falls outside the top three. Scoring rules that reward top three placements thus perform well.

  2. The learning rates effectively capture the behavior of the empirical error: both comparatively across mechanisms, as well as the asymptotic rate (slope of the line in log scale). This property enables use of large deviation learning rates as proxies for learning even in elections with a small number of voters.

  3. Ranking candidates rather than selecting candidates is more onerous for voters. However, it does not always provide more information in terms of learning rates, as in the examples in Appendix Figure 3.

Design invariance in practice. Design invariance does not strictly hold in any election in our dataset (as expected, since the condition is strong). However, it approximately holds: similar mechanisms produce the same asymptotic outcome for many tasks. Figure 2(b) shows, for example, the average overlap across elections in the top candidates identified by each mechanism. (Appendix Figure 4 shows the same plot for the top and candidates, as well as the average Kendall’s rank correlation between the full rankings identified by different mechanisms.) Furthermore, we find many elections and goals where most mechanisms return the same asymptotic answer, as in the elections we leverage for the plots showing learning rates are effective proxies. This relative consistency, especially for similar mechanisms, enables us to compare different mechanisms by their learning rates.
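The two agreement measures mentioned here can be computed as in the sketch below, which takes the aggregate rankings produced by two mechanisms as input. The function names are hypothetical, and the Kendall correlation shown is the simple tie-free version; how the paper handles ties is not stated in this section.

```python
from itertools import combinations

def top_overlap(ranking_a, ranking_b, num_winners):
    """Fraction of the top num_winners candidates shared by two aggregate rankings."""
    top_a = set(ranking_a[:num_winners])
    top_b = set(ranking_b[:num_winners])
    return len(top_a & top_b) / num_winners

def kendall_tau(ranking_a, ranking_b):
    """Kendall rank correlation between two rankings of the same candidates (no ties)."""
    pos_b = {c: i for i, c in enumerate(ranking_b)}
    concordant = discordant = 0
    for x, y in combinations(ranking_a, 2):  # x precedes y in ranking_a
        if pos_b[x] < pos_b[y]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)
```

An overlap of 1.0 for a given number of winners means the two mechanisms agree on that goal, which is the situation in which comparing them by learning rate is meaningful.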

6.3 -Approval for selecting winners

In Section 5.2, we showed for the Mallows model how the rate optimal -Approval mechanism changes with the noise parameter , the number of candidates, and the number of winners. We now show this scaling in practice.

For every election in our dataset, we find the Approval rate optimal mechanism (among those we can simulate) for every goal of selecting winners, for . We then run a regression across all the elections for which is rate optimal, versus the number of winners desired and the number of candidates; see Table 2 in the Appendix for the regression table. While there is some variation across elections, the number of candidates and winners proves a reasonable metric across elections for the rate optimal -Approval mechanism (). In Figure 5 in the Appendix, we further plot the line induced from the regression coefficients with the Mallows rate optimal lines, for candidates.

This comparison provides several insights. In particular, it confirms the idea that in practice, one should regularize toward asking voters to choose their favorite half of the candidates. For picking a small subset of winners out of more than candidates, for example, one should ask voters to provide their favorite candidates, with . This suggestion directly counters common practice. In the PB elections that we have helped run, for example, or -Approval is most typical, even though ultimately 6-10 projects may be funded (out of around -).

Furthermore, comparing to the rate optimal mechanisms for the Mallows model with various (within the candidate range for which we have empirical data), we find that empirical data behaves most closely to a Mallows model with noise parameter . (Footnote 3: We are not claiming that empirical data is drawn from a Mallows model; it most certainly is not, with factors such as polarizing projects important in practice.) This coarse comparison provides an approximate expected scaling behavior for elections with many candidates.

6.4 Randomization in practice

We find 16 examples in which randomizing between two -Approval mechanisms leads to faster learning than using either mechanism separately, including 8 examples where such randomization beats the Approval rate optimal mechanism. Table 3 in the Appendix contains details.
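A toy calculation can show why randomization may help. The sketch below makes several assumptions that are not taken from the paper's data: approvals from both mechanisms are summed with equal weight, the pairwise rate has the standard Chernoff form -log(1 - (sqrt(p) - sqrt(q))^2) for a per-voter score difference of +1 with probability p and -1 with probability q, and the overall rate is the minimum over the candidate pairs that must be separated. All numbers and names are hypothetical.

```python
import math

def approval_rate(p, q):
    """Chernoff rate for a per-voter score difference of +1 w.p. p, -1 w.p. q, else 0."""
    return -math.log(1.0 - (math.sqrt(p) - math.sqrt(q)) ** 2)

def mechanism_rate(pairs):
    """Learning rate of a mechanism: the slowest pair determines it."""
    return min(approval_rate(p, q) for p, q in pairs)

def mixed_rate(pairs_1, pairs_2, weight):
    """Rate when each voter gets mechanism 1 w.p. weight, else mechanism 2."""
    mixed = [
        (weight * p1 + (1 - weight) * p2, weight * q1 + (1 - weight) * q2)
        for (p1, q1), (p2, q2) in zip(pairs_1, pairs_2)
    ]
    return mechanism_rate(mixed)

# Hypothetical numbers: each mechanism separates one pair well and the other poorly.
mech_1 = [(0.50, 0.10), (0.30, 0.28)]
mech_2 = [(0.30, 0.28), (0.50, 0.10)]

print(mechanism_rate(mech_1), mechanism_rate(mech_2), mixed_rate(mech_1, mech_2, 0.5))
```

In this constructed example each mechanism alone is bottlenecked by a different hard pair, while the even mixture makes both pairs moderately easy, so its minimum rate is larger than either mechanism's.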


We thank our Participatory Budgeting city partners, especially those in Boston, Durham, and Rochester. This work was supported in part by the Stanford Cyber Initiative, the Office of Naval Research grant N00014-15-1-2786, and National Science Foundation grants 1544548 and 1637397.


  • [Alós-Ferrer and Granić2012] Alós-Ferrer, C., and Granić, D.-G. 2012. Two field experiments on approval voting in Germany. Social Choice and Welfare 39(1):171–205.
  • [Aziz et al.2015] Aziz, H.; Gaspers, S.; Gudmundsson, J.; Mackenzie, S.; Mattei, N.; and Walsh, T. 2015. Computational aspects of multi-winner approval voting. In Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, 107–115. International Foundation for Autonomous Agents and Multiagent Systems.
  • [Aziz et al.2017] Aziz, H.; Brill, M.; Conitzer, V.; Elkind, E.; Freeman, R.; and Walsh, T. 2017. Justified representation in approval-based committee voting. Social Choice and Welfare 48(2):461–485.
  • [Benade et al.2018] Benade, G.; Itzhak, N.; Shah, N.; and Procaccia, A. D. 2018. Efficiency and Usability of Participatory Budgeting Methods.  8.
  • [Boyd and Vandenberghe2004] Boyd, S. P., and Vandenberghe, L. 2004. Convex optimization. Cambridge, UK ; New York: Cambridge University Press.
  • [Caragiannis and Micha2017] Caragiannis, I., and Micha, E. 2017. Learning a Ground Truth Ranking Using Noisy Approval Votes. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 149–155. Melbourne, Australia: International Joint Conferences on Artificial Intelligence Organization.
  • [Caragiannis et al.2017] Caragiannis, I.; Nath, S.; Procaccia, A. D.; and Shah, N. 2017. Subset selection via implicit utilitarian voting. Journal of Artificial Intelligence Research 58:123–152.
  • [Caragiannis et al.2019] Caragiannis, I.; Chatzigeorgiou, X.; Krimpas, G. A.; and Voudouris, A. A. 2019. Optimizing positional scoring rules for rank aggregation. Artificial Intelligence 267:58–77.
  • [Caragiannis, Procaccia, and Shah2013] Caragiannis, I.; Procaccia, A. D.; and Shah, N. 2013. When do noisy votes reveal the truth? In Proceedings of the fourteenth ACM conference on Electronic commerce, 143–160. ACM.
  • [Chierichetti and Kleinberg2014] Chierichetti, F., and Kleinberg, J. 2014. Voting with limited information and many alternatives. SIAM Journal on Computing 43(5):1615–1653.
  • [Copeland1951] Copeland, A. H. 1951. A reasonable social welfare function. Technical report, mimeo, 1951. University of Michigan.
  • [de Borda1781] de Borda, J. C. 1781. Mémoire sur les élections au scrutin.
  • [de Weerdt, Gerding, and Stein2016] de Weerdt, M. M.; Gerding, E. H.; and Stein, S. 2016. Minimising the rank aggregation error. In Proceedings of the 2016 International Conference on Autonomous Agents & Multiagent Systems, 1375–1376. International Foundation for Autonomous Agents and Multiagent Systems.
  • [Dembo and Zeitouni2010] Dembo, A., and Zeitouni, O. 2010. Large Deviations Techniques and Applications, volume 38 of Stochastic Modelling and Applied Probability. Berlin, Heidelberg: Springer Berlin Heidelberg.
  • [Diaconis1988] Diaconis, P. 1988. Group representations in probability and statistics. Lecture notes-monograph series 11:i–192.
  • [Elkind et al.2017] Elkind, E.; Faliszewski, P.; Skowron, P.; and Slinko, A. 2017. Properties of Multiwinner Voting Rules. Social Choice and Welfare 48(3):599–632.
  • [Faliszewski and Talmon2018] Faliszewski, P., and Talmon, N. 2018. A framework for approval-based budgeting methods. arXiv preprint arXiv:1809.04382.
  • [Fishburn and Gehrlein1976] Fishburn, P. C., and Gehrlein, W. V. 1976. Borda’s rule, positional voting, and Condorcet’s simple majority principle. Public Choice 28(1):79–88.
  • [Fishburn1978] Fishburn, P. C. 1978. Axioms for approval voting: Direct proof. Journal of Economic Theory 19(1):180–185.
  • [Freeman et al.2019] Freeman, R.; Pennock, D. M.; Peters, D.; and Vaughan, J. W. 2019. Truthful aggregation of budget proposals. arXiv preprint arXiv:1905.00457.
  • [Garg and Johari2018] Garg, N., and Johari, R. 2018. Designing informative rating systems for online platforms: Evidence from two experiments. arXiv preprint arXiv:1810.13028.
  • [Garg and Johari2019] Garg, N., and Johari, R. 2019. Designing optimal binary rating systems. In Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics.
  • [Garg et al.2019] Garg, N.; Kamble, V.; Goel, A.; Marn, D.; and Munagala, K. 2019. Iterative local voting for collective decision-making in continuous spaces. Journal of Artificial Intelligence Research 64(1):315–355.
  • [Gelauff et al.2018] Gelauff, L.; Sakshuwong, S.; Garg, N.; and Goel, A. 2018. Comparing voting methods for budget decisions on the ASSU ballot. Technical report.
  • [Goel et al.2016] Goel, A.; Krishnaswamy, A. K.; Sakshuwong, S.; and Aitamurto, T. 2016. Knapsack Voting: Voting mechanisms for Participatory Budgeting.
  • [Guiver and Snelson2009] Guiver, J., and Snelson, E. 2009. Bayesian inference for Plackett-Luce ranking models. In Proceedings of the 26th Annual International Conference on Machine Learning, 377–384.
  • [Kemeny1959] Kemeny, J. G. 1959. Mathematics without numbers. Daedalus 88(4):577–591.
  • [Lackner and Skowron2018a] Lackner, M., and Skowron, P. 2018a. Consistent Approval-Based Multi-Winner Rules. In Proceedings of the 2018 ACM Conference on Economics and Computation, EC ’18, 47–48. New York, NY, USA: ACM.
  • [Lackner and Skowron2018b] Lackner, M., and Skowron, P. 2018b. A quantitative analysis of multi-winner rules. arXiv preprint arXiv:1801.01527.
  • [Lee et al.2014] Lee, D. T.; Goel, A.; Aitamurto, T.; and Landemore, H. 2014. Crowdsourcing for participatory democracies: Efficient elicitation of social choice functions. In Second AAAI Conference on Human Computation and Crowdsourcing.
  • [Lu and Boutilier2011] Lu, T., and Boutilier, C. 2011. Learning Mallows Models with Pairwise Preferences. In Proceedings of the 28th International Conference on International Conference on Machine Learning, ICML’11, 145–152. USA: Omnipress.
  • [Lu and Boutilier2014] Lu, T., and Boutilier, C. 2014. Effective sampling and learning for Mallows models with pairwise-preference data. Journal of Machine Learning Research 15:3963–4009.
  • [Mallows1957] Mallows, C. L. 1957. Non-null ranking models. I. Biometrika 44(1/2):114–130.
  • [marquis de Condorcet1785] marquis de Condorcet, M. J. A. 1785. Essai sur l’application de l’analyse a la probabilite des decisions: rendues a la pluralite de voix. De l’Imprimerie royale.
  • [Mattei and Walsh2013] Mattei, N., and Walsh, T. 2013. PrefLib: A library of preference data http://preflib.org. In Proceedings of the 3rd International Conference on Algorithmic Decision Theory (ADT 2013), Lecture Notes in Artificial Intelligence. Springer.
  • [Maystre and Grossglauser2015] Maystre, L., and Grossglauser, M. 2015. Fast and accurate inference of Plackett–Luce models. In Advances in Neural Information Processing Systems, 172–180.
  • [O’Neill2013] O’Neill, J. 2013. Open STV.
  • [Popov, Popova, and Regenwetter2014] Popov, S. V.; Popova, A.; and Regenwetter, M. 2014. Consensus in organizations: Hunting for the social choice conundrum in APA elections. Decision 1(2):123.
  • [Procaccia and Shah2015] Procaccia, A. D., and Shah, N. 2015. Is Approval Voting Optimal Given Approval Votes? In Advances in Neural Information Processing Systems 28, 1801–1809.
  • [Public Agenda2016] Public Agenda. 2016. Public Spending By The People: Participatory Budgeting in the United States and Canada in 2014–15. Technical report, The Yankelovich Center for Public Judgment.
  • [Ratliff2003] Ratliff, T. C. 2003. Some startling inconsistencies when electing committees. Social Choice and Welfare 21(3):433–454.
  • [Regenwetter et al.2007] Regenwetter, M.; Kim, A.; Kantor, A.; and Ho, M.-H. R. 2007. The unexpected empirical consensus among consensus methods. Psychological Science 18(7):629–635.
  • [Regenwetter et al.2008] Regenwetter, M.; Grofman, B.; Popova, A.; Messner, W.; Davis-Stober, C. P.; and Cavagnaro, D. R. 2008. Behavioural social choice: a status report. Philosophical Transactions of the Royal Society B: Biological Sciences 364(1518):833–843.
  • [Staring1986] Staring, M. 1986. Two paradoxes of committee elections. Mathematics Magazine 59(3):158–159.
  • [Tataru and Merlin1997] Tataru, M., and Merlin, V. 1997. On the relationship of the Condorcet winner and positional voting rules. Mathematical Social Sciences 34(1):81–90.
  • [Wiseman2000] Wiseman, J. 2000. Approval voting in subset elections. Economic Theory 15(2):477–483.
  • [Young1975] Young, H. P. 1975. Social choice scoring functions. SIAM Journal on Applied Mathematics 28(4):824–838.
  • [Young1988] Young, H. P. 1988. Condorcet’s theory of voting. American Political Science Review 82(4):1231–1244.
  • [Zhao, Piech, and Xia2016] Zhao, Z.; Piech, P.; and Xia, L. 2016. Learning Mixtures of Plackett-Luce Models. In International Conference on Machine Learning, 2906–2914.

Appendix A Empirics additional information

Name                       Candidates  Votes with complete rankings  K-Ranking available
Participatory Budgeting
Boston, 2016               8           4173                          4
Durham Ward 1, 2019        21          1637                          10
Durham Ward 2, 2019        10          329                           10
Durham Ward 3, 2019        12          694                           10
Rochester, 2019            22          649                           5
PrefLib
Irish01                    12          4259                          12
Irish02                    9           4810                          9
Irish03                    14          3166                          14
ElectorialReformSociety77  12          1312                          12
ElectorialReformSociety13  5           1809                          5
Sushi10                    10          5000                          10
Glasgow05                  10          718                           10
Glasgow17                  9           962                           9
Glasgow10                  9           818                           9
Glasgow18                  9           767                           9
Glasgow20                  9           726                           9
Glasgow14                  8           1071                          8
Glasgow12                  8           1040                          8
Burlington01               6           2603                          6
Burlington02               6           2853                          6
APA03                      5           11539                         5
APA01                      5           10978                         5
APA11                      5           10791                         5
APA05                      5           10655                         5
APA02                      5           10623                         5
APA04                      5           10519                         5
APA09                      5           10211                         5
APA06                      5           10177                         5
APA07                      5           9747                          5
APA12                      5           9091                          5
APA08                      5           8532                          5
APA10                      5           8467                          5
Aspen02                    5           1183                          5
Table 1: List of election data that we use in Section 6. From PrefLib, we use all elections where full rankings are available and there are at least 5 candidates and 700 voters. Throughout, we ignore voters who did not submit full rankings (especially with high -Ranking requested, this might only be a fraction of the total number of actual votes). Additionally, for the PB elections, we limit the data to those who submitted votes online rather than through paper ballots. Sources for the PrefLib datasets are: [Mattei and Walsh2013, O’Neill2013, Regenwetter et al.2007, Regenwetter et al.2008, Popov, Popova, and Regenwetter2014].
(a) Boston 2016, selecting 4 winners.
(b) Durham Ward 1, selecting 4 winners. -Approval for and the Borda rule all have the same asymptotic winners, but we omit several mechanisms from the plot for visualization ease.
Figure 3: Average bootstrapped error (fraction of winning subset not identified) by the number of voters, compared to the errors implied by the (empirically calculated) learning rates. All mechanisms plotted have the same asymptotic winners.
(a) Task of selecting winners.
(b) Task of selecting winners.
(c) Task of ranking all candidates. The values plotted are the average Kendall’s rank correlation between resulting rankings.
Figure 4: More approximate design invariance plots
Dep. Variable:     Best Mechanism     R-squared:           0.273
Model:             OLS                Adj. R-squared:      0.264
Method:            Least Squares      F-statistic:         42.68
Date:              Wed, 12 Jun 2019   Prob (F-statistic):  2.72e-11
Time:              16:17:32           Log-Likelihood:      -531.30
No. Observations:  241                AIC:                 1071.
Df Residuals:      237                BIC:                 1085.
Df Model:          3

                                  coef     std err  z       P>|z|  [0.025  0.975]
Intercept                         -0.1687  0.411    -0.411  0.681  -0.973  0.636
Number Winners                    0.9133   0.126    7.229   0.000  0.666   1.161
Number Candidates                 0.2662   0.057    4.630   0.000  0.154   0.379
Number Winners:Number Candidates  -0.0446  0.008    -5.786  0.000  -0.060  -0.030

Omnibus:        8.524  Durbin-Watson:     1.693
Prob(Omnibus):  0.014  Jarque-Bera (JB):  8.414
Skew:           0.417  Prob(JB):          0.0149
Kurtosis:       2.624  Cond. No.          463.

Table 2: OLS regression on the best K to use in K-Approval, by the number of candidates and desired winners. Standard errors are clustered standard errors, where each cluster is an election in our dataset.
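As a worked example of reading Table 2, the sketch below plugs the reported coefficients into the fitted linear form. The function name is hypothetical. The output is a real-valued prediction that would still need rounding to a valid K, the R-squared of 0.273 indicates substantial variation across elections, and inputs outside the range of the data (at most 22 candidates) are extrapolations.

```python
# Coefficients copied from Table 2.
COEF = {
    "intercept": -0.1687,
    "winners": 0.9133,
    "candidates": 0.2662,
    "interaction": -0.0446,
}

def predicted_best_k(num_winners, num_candidates):
    """Fitted value of the regression for the rate optimal K (not rounded)."""
    return (
        COEF["intercept"]
        + COEF["winners"] * num_winners
        + COEF["candidates"] * num_candidates
        + COEF["interaction"] * num_winners * num_candidates
    )

# Selecting 1 winner from 20 candidates:
# -0.1687 + 0.9133 + 0.2662 * 20 - 0.0446 * 20 = 5.1766
print(round(predicted_best_k(1, 20), 4))
```

For a single winner among 20 candidates the fitted value is about 5, well above the 1-Approval one might choose by default.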

(a) For selecting winner as number of candidates vary.
(b) For candidates as number of winners vary.
Figure 5: -Approval rate optimal mechanism for the Mallows model as , number of candidates, and number of winners vary. This plot contains an empirical line, which is calculated using the coefficients in the regression contained in Table 2.
Election             Number Winners  Mechanism 1  Mechanism 2  Beats Approval rate optimal
Durham Ward 1, 2019  2               3            4            True
Durham Ward 1, 2019  13              8            9            True
Durham Ward 1, 2019  17              6            7            True
Irish03              1               1            3            False
Irish03              10              8            9            False
Irish03              10              8            10           False
Irish01              5               3            4            True
Irish01              5               3            5            False
Irish01              5               3            11           False
Irish01              7               1            5            False
Glasgow05            2               4            5            True
Glasgow05            2               5            6            True
Glasgow10            2               3            4            False
Glasgow10            2               4            6            False
Glasgow10            2               5            6            True
APA08                2               2            3            True
Table 3: Elections and goals where randomizing between two -Approval mechanisms leads to faster learning than using either of the mechanisms separately. For several of these cases, randomization also beats the Approval rate optimal mechanism.

Appendix B Proofs

B.1 Asymptotic design-invariance

See 1


. By the definition of asymptotically design-invariant,

For this , we show by contradiction that : , : Suppose such that . Then, let

Then, . Then, with positive probability,


. Suppose there exists such a . Then, : Suppose :

Where the strict inequality follows as . Then, for all candidates , by the strong law of large numbers w.p. . Thus, w.p. . ∎

Remark 1.

The following example, with candidates leads to a disjoint set of winners with -Approval and approval, respectively

        Voter 1  Voter 2  Voter 3  Voter 4  Voter 5
Rank 1  A        A        D        D        B
Rank 2  B        B        C        C        C
Rank 3  C        C        B        B        D
Rank 4  D        D        A        A        A

With -Approval, candidates are selected. With -Approval, are selected.
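The profile in the table can be tallied directly. The sketch below assumes the goal is to select two winners and compares 1-Approval with 2-Approval; both values are assumptions chosen for illustration, since the specific parameters are not legible in the text above. The function name is hypothetical.

```python
from collections import Counter

# Ballots from the table in Remark 1, one string per voter (best to worst).
ballots = ["ABCD", "ABCD", "DCBA", "DCBA", "BCDA"]

def k_approval_winners(ballots, k, num_winners=2):
    """Winner set and scores when every voter approves their top k candidates."""
    scores = Counter(c for ballot in ballots for c in ballot[:k])
    ranked = sorted("ABCD", key=lambda c: (-scores[c], c))
    return set(ranked[:num_winners]), dict(scores)

for k in (1, 2):
    print(k, k_approval_winners(ballots, k))
```

With one approval per voter, A and D each receive 2 votes and B receives 1. With two approvals, B and C each receive 3 votes while A and D stay at 2, so the two winner sets share no candidate.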

B.2 Learning rates


is the probability that is approved but is not, using -Approval.

For convenience, we overload the rate function :

  • is as defined in Proposition 2, the large deviation rate to learn a pair of candidates given scoring rule , for a fixed that should be clear from context. When a goal is clear from context, is as defined in Proposition 4, the minimum over for candidate pairs that are in different asymptotic tiers.

  • is as defined in Proposition 3, the large deviation rate to learn a pair of candidates using -Approval. is analogous to the previous item when using approval.

  • is the large deviation rate to learn a pair of candidates using approval voting when the probability that is approved but is not is ; and is the probability that is approved but is not.

When which rate function we mean is clear from context, we may drop the argument and just write or .

Remark 2.

when , OR .


is strictly concave in , with maximum at . Thus, holding either or constant and moving the other farther away strictly decreases , and thus strictly increases . ∎
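The monotonicity in Remark 2 can be checked numerically. The sketch below assumes the pairwise approval rate has the standard Chernoff form: if a voter's score difference for the pair is +1 with probability p, -1 with probability q, and 0 otherwise, then minimizing the log moment generating function over z gives -log(1 - (sqrt(p) - sqrt(q))^2). The symbols p and q and the function names are illustrative, and the numeric minimizer is only a cross-check of the closed form.

```python
import math

def approval_rate(p, q):
    """Closed-form rate: -log(1 - p - q + 2 * sqrt(p * q))."""
    return -math.log(1.0 - (math.sqrt(p) - math.sqrt(q)) ** 2)

def numeric_rate(p, q, lo=-20.0, hi=20.0, iters=200):
    """Minimize log E[exp(z X)] over z by ternary search (the objective is convex)."""
    def log_mgf(z):
        return math.log(p * math.exp(z) + q * math.exp(-z) + (1.0 - p - q))
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if log_mgf(m1) < log_mgf(m2):
            hi = m2
        else:
            lo = m1
    return -log_mgf((lo + hi) / 2)

# Holding p fixed and moving q farther from p increases the rate.
for q in (0.3, 0.2, 0.1):
    print(q, approval_rate(0.4, q), numeric_rate(0.4, q))
```

The closed form and the numeric minimum agree, and the printed rates grow as q moves away from p.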

See 2


Define the following random variable for each voter :

Then, when , and by supposition. Let

Then, by basic large deviation bounds (see, e.g., [Dembo and Zeitouni2010]):

And, applying Chernoff bounds, we get the standard relationship to the large deviation rate, giving an upper bound for the probability of error directly, including any polynomial factors out front:

Then, when

See 3


With -approval voting, becomes


The is attained at ( is convex in , and so setting the first derivative to zero finds the ). And so

See 4


By the union bound

Now, using large deviation properties:

By supposition, , and so approaches from below. Then,


Line (1) follows from: , . See, e.g., Lemma 1.2.15 in [Dembo and Zeitouni2010] for a proof of this property.

Thus is the large deviation rate for .
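The property used in Line (1), that a finite sum of exponentially decaying terms decays at the slowest individual rate, can be illustrated numerically. The rates below are hypothetical and the function name is an assumption; the sketch only evaluates -(1/N) log of the sum for growing N.

```python
import math

def effective_rate(rates, n):
    """-(1/n) * log(sum_i exp(-rates[i] * n)), computed stably."""
    r_min = min(rates)
    # Factor out exp(-r_min * n) so the remaining sum is between 1 and len(rates).
    tail = sum(math.exp(-(r - r_min) * n) for r in rates)
    return r_min - math.log(tail) / n

rates = [0.05, 0.2, 0.3]  # hypothetical pairwise rates
for n in (10, 100, 1000, 10000):
    print(n, effective_rate(rates, n))
```

For small N the extra terms lower the effective rate, but as N grows the value approaches the minimum of the individual rates, which is why the slowest pair determines the overall learning rate.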

B.3 Design insights

See 1


From Proposition 2, for a given scoring rule and pair of candidates , the learning rate is

Similarly, if we use scoring rules , each with probability , then,

Now, for a single scoring rule , let

Below, we show that is convex in . Then, by convexity,

and so

The left hand side is equal to the argument inside the for the rate function for randomizing between scoring rules , each with probability