Elections are used extensively to aggregate the preferences of voters. Some elections are centralized, but others are carried out in distributed settings. Consider, for example, a supermarket chain consisting of a large number of stores. Each store collects data on the purchases made in it, and the managers at the chain headquarters might want to aggregate this data to identify, for example, the most popular items being sold. One solution would be to have a central database that collects all data from all stores, and to compute the most popular items on this centralized database. As the number of customers might be huge, however, this might not be practical. Further, as the communication between the stores and the headquarters might be expensive, a more efficient solution would be to perform some computations locally at each store, and to develop a protocol for efficient communication between the stores and the headquarters that lets the managers at the headquarters know, at each point in time, which items are the most popular throughout the chain. As another concrete example, consider a car manufacturer wanting to decide, at each point in time, which car models and colors to manufacture.
A similar situation happens in online shopping websites, where buyers from all around the world make purchases. As the design of modern websites is based on data centers, aggregating the data concerning all buyers involves communicating in a distributed setting. Specifically, in order to identify the current trends, and as communication between data centers might be expensive, it is of interest to develop protocols for those data centers to communicate with a central entity.
Our model also captures scenarios of political polls and political elections. In political elections and in TV polls, there are usually several polling stations spread around the country or the region. Then, in order to compute the results of the election (or the intermediate results during the day when the poll is being held), the voters’ preferences from all those polling stations are aggregated at some central station. For example, in the general political elections held in Brazil in 2014, there were roughly 500,000 polling stations, with an average of 300 voters per station. In this situation, it is beneficial to have a protocol allowing the polling stations to communicate efficiently with a central entity, so that the central entity can maintain a good estimate of the nation-wide (or region-wide) state of affairs.
In this paper, we model such situations as follows. We consider an election whose electorate is distributed into sites. Assuming some common axis of time, at each point in time a new voter arrives and votes, and her vote is assigned to one of those sites. (To avoid confusion, let us mention that, while we indeed speak about “time”, we do not consider any external clocks or, importantly, clocks accessible to the sites or the center. In particular, the voters can be assumed to arrive at fixed intervals whose length is not known to the sites or to the center. For convenience, we refer to voters as females and to candidates as males.) There is some center which is able to communicate directly with each of the sites. With respect to a voting rule $\mathcal{R}$, the goal of the center is to maintain, at any point in time, a candidate which is an $\mathcal{R}$-winner of the whole election (given an election $E$ and a voting rule $\mathcal{R}$, an $\mathcal{R}$-winner of $E$ is a winner of $E$ under $\mathcal{R}$). More specifically, we are interested in designing communication-efficient protocols, where the center is able, upon request at any time, to return a candidate which, with high probability, is an $\mathcal{R}$-winner.
As we are interested in sublinear communication, in addition to allowing mistakes to occur with some low probability, we will also use approximation. We call a candidate an $\varepsilon$-winner with respect to a voting rule $\mathcal{R}$ if, by adding up to an $\varepsilon$-fraction of voters, it can become an $\mathcal{R}$-winner. A more formal description of our model and a discussion of our notion of approximation are given in Section 2. Previous works were concerned with bribery (where we are allowed to change an $\varepsilon$-fraction of the voters) and margin of victory (where we are guaranteed that changing an $\varepsilon$-fraction of the voters leaves the outcome of the election unchanged); see Section 1.1 for additional details on these notions. These notions are appropriate for dealing with noisy data, or for scenarios where some external agent can influence the voters and thus change their votes. Here, however, we are concerned with monitoring an election while minimizing the communication, and the source of our errors is lack of information (rather than noise). Our approximation notion fits our scenario better, as a candidate is an $\varepsilon$-winner if it might become a winner under full information. Furthermore, in monitoring an election we do expect more voters to come; in this respect, an $\varepsilon$-winner is a candidate who might become a winner very shortly. Finally, as we consider an ongoing election, changing previous votes is not an option. However, the information on whether a candidate is an $\varepsilon$-winner is very valuable for making, e.g., real-time election policy decisions.
We concentrate on single-winner voting rules, and consider various voting rules, ranging from approval-based rules and scoring rules, to tournament-based rules and round-based voting rules; while we naturally cannot cover all voting rules available, we choose some of the more popular voting rules as well as aim at choosing representative voting rules. Further, we develop some general techniques for designing protocols for maintaining approximate winners in distributed elections, which might be applicable to other voting rules and settings as well. We show how to apply these techniques for the rules we consider. We discuss the effect of several parameters on the communication complexity of the protocols we design; specifically, the effect that the number of voters, the number of candidates, the required approximation , and the number of sites have on the amount of communication used by our protocols. We complement our communication-efficient protocols with lower bounds.
As a by-product of our lower bounds for maintaining an approximate Plurality winner in distributed elections, we have two contributions which might be useful in other contexts. First, we improve the state-of-the-art lower bound on the Count-tracking problem, which is a central problem in distributed streams; this result is discussed in detail in Remark 3. In short, in the Count-tracking problem, the task is to maintain a value which approximates the number of items in a given distributed stream. In the regime where , we improve the lower bound for Count-tracking from , proved by Huang et al. [HYZ12, Theorem 2.3], to (see Remark 3). Second, we define a novel problem in multiparty communication complexity and show a tight lower bound for it; in this problem, which we call the No Strict Majority problem, we have several players, each possesses its own private binary string, and, by communicating bits, the players should decide whether there is some index for which a majority of the players has in it. We prove a lower bound on the No Strict Majority problem, showing that the naive protocol for this problem is essentially optimal: asymptotically, all the bits have to be transmitted. See Section 5 for further details on our lower bounds and their implications to continuous distributed monitoring and to multiparty communication complexity.
1.1 Related Work
We first review related work on sublinear algorithms in computational social choice, as the current paper fits naturally within this line of research. Then we review papers on compilation complexity and vote elicitation, and mention some connections between our notion of approximation and work on control and bribery in elections (as well as the concept of margin of victory). Finally, we give an overview of the available literature on the continuous distributed monitoring model, which is the computational model we use in the current paper (its formal definition is given in Section 2).
Sublinear social choice. As the amount of data in general, and data concerning preferences in particular, is consistently increasing, the study of identifying election winners using time or space which is sublinear in the number of voters is receiving increasing attention. Specifically, the size of some elections might be too big to process in linear time, thus algorithms with sublinear time and/or space complexity are of interest.
In two papers, Bhattacharyya and Dey [DB15, BD15] study sampling algorithms for winner determination as well as winner determination in the streaming model. In fact, some of our sampling-based protocols are inspired by Bhattacharyya and Dey [DB15]. In their model, they assume that they are given an election in which the margin of victory is at least $\varepsilon n$ (where $n$ is the number of voters); this means that the winner is guaranteed to remain such even if an adversary is allowed to change $\varepsilon n$ votes. Given such elections, they evaluate the number of vote samples needed in order to identify the winner with high probability. In our current paper, we have a different notion of approximation and we do not assume such margins of victory (we formally describe our notion of approximation in Section 2).
There is a mistake in the preliminary version of this work [FT17], which claims that the sampling-based protocols are implied by the work of Bhattacharyya and Dey [DB15, BD15]. This is incorrect, as our notion of approximation is different from theirs. The difference stems from their margin-of-victory assumption: while an approximate winner under our definition always exists, this does not necessarily hold in their model.
In a recent paper, Dey et al. [DTvH17] study winner determination for several multiwinner voting rules aiming at proportional representation. Dey and Narahari [DN15] study sampling algorithms for estimating the margin of victory. These works deal with centralized elections, while the current paper considers distributed elections. Another paper worth mentioning in this context is that of Lee et al. [LGAL14], which argues for the importance of developing fast communication-efficient protocols for computing winners in (centralized) streams; they also provide a simple sampling-based algorithm for approximating Borda winners.
Not strictly considering sublinear social choice, but nonetheless concentrating on “huge elections”, in a recent paper, Csar et al. [CLPS17] study winner determination using the MapReduce framework which may allow processing such elections efficiently by distributing the computation among clusters of machines.
Compilation complexity. In a series of papers, Chevaleyre et al. [CLMRA09, CLMM11] and Xia and Conitzer [XC10] define and study the compilation complexity of various voting rules; in their model, the electorate is partitioned into two parts, and the general concern is the amount of communication which needs to be transmitted between the two parts, in order to determine an election winner. In compilation complexity there are no rounds of communication, as only one message is being passed between the two parts. This stands in contrast to our protocols, which use small amounts of communication due to their use of several rounds of communication between the center and the sites.
Vote elicitation. There is quite an extensive literature which deals with vote elicitation [DN13, CS02, LGAL14, Lee15]; these works provide algorithms for finding approximate winners under various voting rules by eliciting the voters’ preference orders. Conitzer and Sandholm [CS05] study communication complexity for various voting rules, but they are interested in finding exact winners, and do not consider approximations (indeed, their upper bounds are usually quite high, e.g., linear in the number of voters). Further, in their model, each voter acts as a site.
Approximate winners, margin of victory, and election control. In the current paper we do not require our protocols to maintain exact winners, but are satisfied with approximate winners. We formally define our notion of approximation in Section 2; roughly speaking, we consider a candidate to be an approximate winner if it can become a winner when we are allowed to add a small number of additional voters (whose votes we can set as we wish). Our notion of approximation somewhat resembles the vast amount of research done on electoral control and bribery in elections (see, e.g., the survey by Faliszewski and Rothe [FR15]). In electoral control by adding voters, there is usually a set of unregistered voters, and the question is whether it is possible to change the outcome of the election, e.g., to make some predefined, preferred candidate a winner in a new election in which a small number of those unregistered voters are added.
In bribery problems, such as shift bribery and swap bribery [EFS09], an external agent can change the way some voters vote in order to make some predefined, preferred candidate a winner. As observed by Xia [Xia12], the number of such changes that need to be made in order to make a specific candidate a winner (the so-called margin of victory) is a natural notion of how close this candidate is to being a winner. Indeed, in this sense, our approximation notion is related to those notions of control and bribery in elections.
Continuous distributed monitoring. The model of computation which we study in the current paper is called the continuous distributed monitoring model, and is usually studied within theoretical computer science and database systems. There is a fairly recent survey about this model [Cor13], as well as quite an extensive line of work studying various problems in this model, such as sampling-based protocols [CMYZ12, TW11], protocols for approximating moments [CMY11, ABC09], protocols for counting with deletions [LRV12] (interestingly, that paper specifically mentions elections as a motivation, but does not study them explicitly), heuristic protocols for monitoring most-frequent items [BO03], and randomized protocols for counting the number of items in a distributed stream and finding frequent items [HYZ12]. In the current paper we complement this line of work by studying winner determination in this model.
We begin by providing preliminaries regarding elections and voting rules, continue by describing our notion of approximation, and finish by discussing our model concerning continuous monitoring of distributed streams. We use standard notions from computational complexity. For $n \in \mathbb{N}$, we denote the set $\{1, \ldots, n\}$ by $[n]$.
2.1 Elections and Voting Rules
An election consists of a set of candidates and a collection of voters . We consider both approval elections, where voters cast approval ballots, and ordinal elections, where voters cast ordinal ballots.
Specifically, in approval elections, each voter is associated with her set of approved candidates, such that . We say that approves candidate if (and disapproves it otherwise). In ordinal elections each voter is a total order over .
A single-winner voting rule is a function that gets an election and returns a set of co-winners of that election, such that is a winner of the election under if .
Next we define our voting rules of interest. We ignore issues of tie-breaking; specifically, we assume an arbitrary tie-breaking order which works in our favor, such that a candidate is a winner if there is some fixed tie-breaking that makes him a winner.
We begin with approval-based voting rules and scoring rules, continue with tournament-based voting rules, and then discuss round-based voting rules.
2.1.1 Approval-based Rules and Scoring Rules
Plurality, -Approval, and Approval. Under Approval, each voter approves a subset of the candidates (that is, it is held in approval elections), and the score of a candidate is the number of voters approving him. The candidates with the highest score tie as co-winners. -Approval is similar to Approval, but with the restriction that each voter shall approve exactly candidates (that is, ; we assume that ). Plurality is a synonym for $1$-Approval, that is, the rule under which each voter approves exactly one candidate.
Borda. Borda is the archetypical scoring rule. Under Borda, a voter ranking a candidate in position is giving it points, and the candidates with the highest score tie as co-winners.
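The score-based rules above can be sketched in a few lines of Python. This is a minimal illustration rather than part of the protocols: the function names are hypothetical, every election is assumed to be non-empty, and the Borda convention that a candidate in position $i$ (out of $m$) receives $m - i$ points is an assumption, since the exact point values are not legible in the text above.

```python
from collections import Counter


def approval_winners(ballots):
    """Co-winners under Approval; each ballot is a collection of approved candidates."""
    scores = Counter()
    for ballot in ballots:
        scores.update(ballot)  # one point for every approved candidate
    top = max(scores.values())
    return {c for c, s in scores.items() if s == top}


def borda_winners(rankings):
    """Co-winners under Borda; each ranking lists all m candidates, best first."""
    m = len(rankings[0])
    scores = Counter()
    for ranking in rankings:
        for position, c in enumerate(ranking, start=1):
            scores[c] += m - position  # assumed convention: m - i points for position i
    top = max(scores.values())
    return {c for c, s in scores.items() if s == top}
```

Plurality is obtained from `approval_winners` by passing ballots that each contain exactly one candidate.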
2.1.2 Tournament-based Voting Rules
Cup. The Cup voting rule is defined via a balanced binary tree with leaves, such that there is exactly one leaf for each candidate. Starting from the leaves, in a bottom-up fashion, each non-leaf node is associated with the candidate which wins in the pairwise election held with only the two candidates corresponding to the two children of that node. Finally, the candidate which gets assigned to the root of is declared the winner of the election.
Copeland and Condorcet. The Copeland score of a candidate is the number of other candidates for which a majority of voters prefer to . Under Copeland, the candidates with the highest Copeland score tie as co-winners. A Condorcet winner is a candidate with Copeland score . Under Condorcet, a Condorcet winner is selected as a winner if it exists; otherwise, all candidates tie as co-winners.
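As a small illustration of the tournament-based definitions, the following sketch computes Copeland scores and a Condorcet winner from a list of rankings. The helper names are hypothetical, "majority" is read as a strict majority, and a Condorcet winner is taken to be a candidate that beats all $m - 1$ others; both readings are assumptions.

```python
def beats(rankings, a, b):
    """True if a strict majority of voters rank a above b."""
    wins = sum(1 for r in rankings if r.index(a) < r.index(b))
    return 2 * wins > len(rankings)


def copeland_scores(rankings):
    """Map each candidate to the number of other candidates it beats pairwise."""
    candidates = rankings[0]
    return {
        c: sum(beats(rankings, c, d) for d in candidates if d != c)
        for c in candidates
    }


def condorcet_winner(rankings):
    """Return the candidate beating all others, or None if no such candidate exists."""
    scores = copeland_scores(rankings)
    for c, s in scores.items():
        if s == len(scores) - 1:
            return c
    return None
```

When `condorcet_winner` returns `None`, the Condorcet rule as described above declares all candidates co-winners.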
2.1.3 Round-Based Voting Rules
Plurality with run-off. Plurality with run-off proceeds in two rounds. In the first round, it selects two candidates with the highest Plurality scores, where the Plurality score of a candidate is defined as the number of voters ranking him first. In the second round, it considers only those two candidates selected in the first round and selects as a winner the one which is preferred to the other by the larger number of voters.
Bucklin. Bucklin also proceeds in rounds. In round , it computes, for each candidate , the number of voters ranking among their top choices. Then, if there is a candidate with a strict majority of the voters ranking him among their top choices, then such a candidate is selected as a winner; otherwise, a new round begins.
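The two round-based rules can be sketched as follows. The function names are hypothetical and the tie-breaking choices are assumptions made only for this illustration: in the run-off, ties are broken by the candidate order of the first ranking, and for Bucklin the sketch returns all candidates that reach a strict majority in the first successful round.

```python
from collections import Counter


def runoff_winner(rankings):
    """Plurality with run-off over full rankings (best candidate first)."""
    first = Counter(r[0] for r in rankings)
    # Two candidates with the highest Plurality scores; sorted() is stable,
    # so ties follow the candidate order of the first ranking (an assumption).
    a, b = sorted(rankings[0], key=lambda c: -first[c])[:2]
    a_votes = sum(1 for r in rankings if r.index(a) < r.index(b))
    return a if 2 * a_votes >= len(rankings) else b


def bucklin_winners(rankings):
    """Candidates ranked among the top i choices by a strict majority, for the smallest such i."""
    n, m = len(rankings), len(rankings[0])
    for i in range(1, m + 1):
        counts = Counter(c for r in rankings for c in r[:i])
        winners = {c for c, s in counts.items() if 2 * s > n}
        if winners:
            return winners
    return set()  # unreachable for a non-empty election: round m always succeeds
```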
2.2 Our Notion of Approximation
Since we will be interested in designing protocols where the center cannot see the full election, it will not be possible to guarantee that our protocols will find exact winners; therefore, we will be satisfied with protocols which are guaranteed to find approximate winners. There are several possibilities for defining approximate winners of elections; in this paper we consider -winners. Roughly speaking, an -winner is a candidate which is not far from being the winner of the election in the sense that he might become a winner after the arrival of only few additional new voters. A more formal definition follows.
Definition 1 (-winner).
A candidate is an -winner in an election (with voters) under some voting rule if it can become a winner under by adding at most additional voters to . That is, if there exist an election , where and such that .
Indeed, we view the definition of an $\varepsilon$-winner as a definition of approximation: the lower $\varepsilon$ is, the closer an $\varepsilon$-winner is to a real winner. Since we design our protocols to compute $\varepsilon$-winners, the lower $\varepsilon$ is, the closer their guaranteed results are to real winners.
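For Plurality, the definition can be checked directly: since ties are assumed to be broken in the candidate's favor, it suffices to let every added voter vote for the candidate and check whether the gap to the highest score can be closed. The sketch below assumes that the budget is $\lfloor \varepsilon n \rfloor$ added voters, in line with the "$\varepsilon$-fraction" description in the introduction; the function name is hypothetical.

```python
import math
from collections import Counter


def is_eps_winner_plurality(votes, candidate, eps):
    """Check whether `candidate` can reach the top Plurality score with at most eps*n new voters."""
    n = len(votes)
    scores = Counter(votes)            # votes[i] is the candidate voter i votes for
    budget = math.floor(eps * n)       # assumed budget: floor(eps * n) additional voters
    gap = max(scores.values()) - scores[candidate]
    return gap <= budget               # ties are assumed to be broken in the candidate's favor
```

For example, with votes `a, a, a, a, b, b, b, c` and `eps = 0.25`, two voters may be added, so `b` (one vote behind) is a 0.25-winner while `c` (three votes behind) is not.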
Our approximation notion seems particularly relevant to our setting (as compared to, e.g., the notion used by Bhattacharyya and Dey [DB15, BD15]), for the following reasons. First, we do not make a margin-of-victory assumption, namely that some candidate is a clear winner. Second, in distributed vote streams we expect more voters to arrive in the future, thus we are interested in candidates which might become winners in the near future: these are exactly the $\varepsilon$-winners. (As a side note, we mention that in political elections such knowledge might be worth much to these candidates, as it can help them decide when to spend their campaigning funds.)
2.3 Our Model of Computation
In our computational model we have one center and sites. The center and the sites are arranged in a star-shaped network, centered at the center, such that the center has a direct communication link to each site but two sites cannot communicate directly.
We assume some axis of time, , and a stream of voters , such that voter comes at time . Each voter is assigned to exactly one site, such that each site is aware only of the subset of voters which are assigned to it. We stress that the time is not known to either the center or the sites. Such a stream is called a distributed stream. Figure 1 illustrates the model.
We mention that our model of computation might be seen as the model of computation assumed in the study of Continuous Distributed Monitoring, when instantiated for vote streams (and not general, abstract streams). See the Related Work section for more details on this subject.
We are interested in designing communication-efficient protocols, whose goals are to allow the center to declare, at any point in time, a candidate which is, with constant probability (say, ), an -winner (see Section 4 for a discussion on higher probabilities).
A protocol is defined via the messages which the center and the sites send to each other, and can consist of several rounds. The protocol shall be correct not only at the end of the stream (which is usually the case in streaming algorithms), but at any point in time. As is customary for protocols operating on distributed streams, we describe our upper bounds in terms of words of communication, where we assume that a word contains bits.
2.4 Useful Results from Probability Theory
For the sampling based protocols, we will use the following bound.
Theorem 1 (Chernoff Bound).
Let be a sequence of i.i.d. random variables in . Let and let . Then, for any :
Another useful result, which will be the main building block for our sampling-based protocols, is the following.
Lemma 1.
Let be i.i.d. random variables in with mean . Let and let . Then, for it holds that
Proof. Set . Using the Chernoff Bound (i.e., Theorem 1), it follows that:
3 Algorithmic Techniques
The naive protocol, where each site sends to the center a message for every voter which arrives to it, clearly solves our problem, however it uses communication which is linear in the number of voters. For example, for ordinal ballots, it communicates bits, since bits are sufficient for sending a single vote. In this paper we are interested in protocols which use significantly less communication, namely communication which is polylogarithmic in the number of voters.
In this section we provide high level descriptions of three algorithmic techniques which are useful for developing protocols for maintaining approximate winners in distributed vote streams. Accordingly, in Section 4 we demonstrate how to realize and instantiate those algorithmic techniques as concrete protocols for maintaining approximate winners for various specific voting rules.
3.1 Protocols Based on Counting Frequencies
In the Frequency-tracking problem, we are given a distributed stream where, instead of voters, the items of the stream come from a known universe of items. The goal is for the center to maintain, for each item type in the distributed stream, a value which approximates the frequency of that item type. More formally, let us denote the items of the stream by and consider different item types, such that item (for ) is of type (for ) if . Let us denote the frequency of item type by . A protocol solving the Frequency-tracking problem guarantees that with constant probability, simultaneously for every item type , the center can maintain a value such that .
Estimating the frequencies of item types is a fundamental problem in distributed streams (in fact, also in centralized streams). A deterministic protocol for Frequency-tracking using words of communication is known [YZ13], and this bound is known to be tight. Moreover, there is a randomized protocol which uses words of communication [HYZ12]. (Notice that Huang et al. [HYZ12] consider only situations where , thus their bounds read differently; nevertheless, is the communication complexity of their protocol.) Formally, the protocol guarantees that for every and every , after the arrival of voters, . In particular, by setting and applying a union bound, we get that for every , . The communication complexity in this case is .
Many voting rules operate by counting points for candidates, thus, it can be seen as if those voting rules actually count frequencies of, say, approvals of each candidate. It turns out that, indeed, it is sometimes possible to reduce the problem of maintaining an -winner under such voting rules to the problem of maintaining approximate frequencies.
During the description of our results for specific voting rules, in Section 4, we will usually use the randomized version of the Frequency-tracking protocol, the only exception being the hybrid protocol for Runoff, for which we will use the deterministic version.
3.2 Protocols Based on Checkpoints
Protocols based on checkpoints are deterministic in nature, and the general idea behind such protocols is as follows. Assume that the center knows an -winner of the election containing the first voters. Then, the crucial observation is that, until the number of voters reaches , the center can declare as an -winner. This suggests protocols where the center only updates its declared candidate whenever the number of voters multiplies by an -fraction. Such points in time will be called checkpoints. Between two checkpoints, the center will declare the previous estimation as the current -winner. This intuition is formulated in the following lemma, the proof of which appears in Appendix A. (While some of the ideas in the proof might fit naturally in the main text, the proof considers each voting rule studied in this paper separately; it is therefore slightly repetitive and is deferred to the appendix.)
Lemma 2.
Let be some voting rule described in Section 2.1. Let and , where , be two elections. If candidate is an -winner w.r.t , then is an -winner w.r.t .
In order to identify the checkpoints, the center shall be able to count the number of voters arriving so far. Fortunately, there is an efficient deterministic protocol for solving the Count-tracking problem, which uses words [YZ13]; in the Count-tracking problem, the center shall maintain a value such that , where is the actual number of items in the distributed stream.
Now we have all the ingredients for our generic protocol. Specifically, the center will maintain a value using a Count-tracking protocol with precision parameter . Each time exceeds for the first time, for some , the center will initiate a static subprotocol to identify an -winner of the election so far. (In fact, the Count-tracking protocol of [YZ13] only increases its estimate as time goes by.) The center will declare as -winner until the next checkpoint. We argue that is indeed an -winner. Consider a step in time . Then the center’s estimation of the number of voters is at least . In particular, it necessarily had a “checkpoint” at time , for . Thus . By Lemma 2, as was -winner at time , it is also -winner at time .
As the estimation is bounded by , the number of checkpoints is bounded by . Assuming that it is possible to compute an -winner using words, a protocol based on checkpoints would then need words of communication. As will be at least , we would get . (Huang et al. [HYZ12] provide a randomized protocol for Count-tracking which uses bits of communication. As will be greater than , using randomization will not reduce the total asymptotic communication.)
During the description of our results for specific voting rules, in Section 4, we will describe only the static protocol in each protocol based on checkpoints. For simplicity of presentation, we will compute -winner instead of -winner as actually needed.
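To get a feeling for how rarely the static subprotocol runs, the following sketch lists the voter counts at which checkpoints would fire. It rests on two simplifying assumptions that are not taken from the text: the checkpoint thresholds are the powers $(1+\varepsilon)^i$, and the center knows the exact number of voters (in the protocol itself, the count is only an estimate maintained by Count-tracking). The function name is hypothetical.

```python
def checkpoint_times(n, eps):
    """Voter counts at which a checkpoint fires, assuming thresholds (1 + eps)^i and exact counting."""
    times = []
    threshold = 1.0
    for count in range(1, n + 1):
        if count >= threshold:
            times.append(count)        # the static subprotocol would run here
            while threshold <= count:  # advance to the next power of (1 + eps)
                threshold *= 1 + eps
    return times
```

With `n = 1000` and `eps = 0.5` the list starts `1, 2, 3, 4, 6, ...` and has at most about $\log_{1.5} 1000 \approx 17$ entries plus a small constant, so the number of recomputations grows logarithmically in the number of voters. Between two consecutive checkpoints the number of voters stays below $(1+\varepsilon)$ times the count at the earlier checkpoint, which is the situation Lemma 2 addresses.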
3.3 Protocols Based on Sampling
Instead of sending all voters to the center, as the naive protocol does, it is natural to let each site send only some of the voters arriving at it. Specifically, we would like the center to have a uniform sample of the voters. Cormode et al. [CMYZ12] describe a protocol for maintaining a sample of items chosen uniformly at random from a distributed stream; its communication complexity is . Since we are sampling voters, we need to take into account the communication needed to send each of the sampled voters. Specifically, in approval elections (where the voters cast approval ballots), we need bits per voter. Since we count the communication complexity in words, each of which contains bits, we need words per voter. Similarly, in ordinal elections (where the voters cast ordinal ballots), we need () bits per voter, thus words per voter.
But how many samples are needed in order to determine an -winner with high probability? Our main building block will be Lemma 1 (see Section 2), and our general framework will be as follows. For each voting rule, we will use Lemma 1 to argue that, with samples, chosen uniformly with repetitions, we can determine an -winner with high probability. Then, assuming that we need words of communication for each voter, using an efficient sampling protocol [CMYZ12], as discussed above, we will get a communication protocol with complexity . (As we use asymptotic analysis, it will be enough to find an -winner and to adjust the parameters accordingly.)
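The sampling idea can be illustrated for Plurality with a centralized stand-in. The sketch below draws voters uniformly with repetitions from the full list of votes and declares the most frequent candidate in the sample; it does not implement the distributed sampling protocol of [CMYZ12], and the number of samples is left as a free parameter rather than derived from Lemma 1. The function name is hypothetical.

```python
import random
from collections import Counter


def sampled_plurality_winner(votes, num_samples, rng):
    """Declare the Plurality winner of a uniform sample (with repetitions) of the votes."""
    sample = [rng.choice(votes) for _ in range(num_samples)]
    counts = Counter(sample)
    return max(counts, key=counts.get)  # ties broken by first appearance in the sample
```

A call such as `sampled_plurality_winner(votes, 200, random.Random(0))` inspects only 200 ballots regardless of how many voters there are, which is where the communication savings come from.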
4 Communication-efficient Protocols
Our upper bounds are summarized in Table 1. We begin with approval-based rules and scoring rules, continue with tournament-based rules, and then discuss round-based rules. Before we present our specific upper bounds, the following remark, concerning the success probability of our protocols, is in order.
Notice that we state our results for protocols which are correct with some constant probability, say . One can always achieve an arbitrarily high probability , as follows, depending on the general technique used:
For protocols based on counting frequencies, following the discussion in Section 3.1, one can get failure probability by replacing the term with a term in the communication complexity.
Protocols based on checkpoints are deterministic anyhow.
For protocols based on sampling, we mention that, as can be seen from the corresponding proofs, the increase of the required sampling size needed for increasing the success probability is quite small. Specifically, the number of samples will increase: in -Approval to , in (Plurality with) Run Off to , and in all other voting rules to .
4.1 Approval-based Rules and Scoring Rules
Let us begin with Plurality, as arguably the simplest voting rule. In Plurality, a vote in a distributed vote stream is associated with one candidate out of the candidates participating in the election, and the goal is for the center to maintain a candidate such that the highest number of voters vote for , or at least it is at most -far from being such a candidate. Equivalently, a distributed stream for Plurality contains item types (one item type for each candidate). Given an approximate frequency for each type (that is, an approximate number of voters voting for each candidate), the center can safely declare the candidate with the highest approximate frequency.
The next result follows by realizing a straightforward protocol based on counting frequencies, as described in Section 3.1; notice that we use .
Theorem 2.
There is a protocol for Plurality-winner-tracking which uses words.
Proof. We use the efficient protocol for Frequency-tracking [HYZ12] with . This allows the center to maintain, for each candidate , a value which is guaranteed to be at most -far from the real number of voters voting for . The center would declare the candidate for which the approximate frequency is the highest.
Let us denote the real frequency of a candidate by (which equals its Plurality score), and its approximate frequency computed by the Frequency-tracking protocol by . For each , it holds that
where the first and third inequalities follow from the -approximation and the second from our choice of . Therefore, we conclude that is an -winner, as required. ∎
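The argument above can be illustrated with a minimal sketch. The names `plurality_winner`, `true_counts`, `approx_counts`, and `delta` are hypothetical, and `delta` stands in for the per-candidate error guarantee of the frequency-tracking protocol (its actual value is not assumed here). If every approximate count is within `delta` of the true count, the declared candidate trails the true leader by at most `2 * delta`.

```python
def plurality_winner(approx_counts):
    """Return the candidate with the highest approximate frequency.

    Ties are broken by candidate name (an assumption made for determinism).
    """
    return max(sorted(approx_counts), key=lambda c: approx_counts[c])


# Hypothetical true Plurality scores and approximations that are each
# within delta of the truth.
delta = 3
true_counts = {"a": 100, "b": 98, "c": 50}
approx_counts = {"a": 97, "b": 100, "c": 52}

declared = plurality_winner(approx_counts)
# Gap between the true leader and the declared candidate, in true scores.
gap = max(true_counts.values()) - true_counts[declared]
```

Here the declared candidate is not the true leader, but the gap stays within twice the approximation error, which is the chain of inequalities used in the proof.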
We go on to consider -Approval, where each voter specifies candidates which she approves. We provide three protocols, based on counting frequencies, checkpoints, and sampling, respectively. The protocol based on counting frequencies simulates each voter by voters, each approving only one candidate; then, it uses a protocol for Plurality.
There are three protocols for -Approval-winner-tracking, for .
the protocols use
words of communication.
For the first protocol, we reduce -Approval to Plurality, as follows, and as depicted in Figure 2. Each site, upon receiving a voter which approves candidates, instead of considering the voter , creates and considers voters, , such that voter (for ) is set to approve the th approved candidate of . For example, a voter approving would be reduced to three voters, approving , , and , respectively.
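A minimal sketch of this reduction follows; the function names and the example ballots are hypothetical. It shows that the Plurality scores in the reduced stream coincide with the approval scores in the original stream.

```python
from collections import Counter


def reduce_approval_ballot(approved):
    """Split one approval ballot into single-candidate (Plurality) ballots."""
    return list(approved)


def reduced_stream(ballots):
    """Yield the Plurality votes of the reduced election, ballot by ballot."""
    for ballot in ballots:
        yield from reduce_approval_ballot(ballot)


# Hypothetical example: three voters, each approving three candidates.
ballots = [("a", "b", "c"), ("a", "c", "d"), ("b", "c", "d")]

approval_scores = Counter(c for ballot in ballots for c in ballot)
plurality_scores = Counter(reduced_stream(ballots))
```

Because the two score vectors are identical, any frequency-tracking protocol for Plurality can be run on the reduced stream unchanged; only the number of voters and the precision parameter need to be adjusted.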
The reduced election has voters, and will be executed with precision parameter . Consider a candidate which is an -winner in the reduced election; we argue that is an -winner in the original election. Indeed, we can add voters, each approving , while for each other candidate , at most of them approve (as ); thus, the relative score of increases by . As is -winner in the reduced election, this is sufficient. By Theorem 2, the communication used is .
The second protocol is based on checkpoints. We describe the static protocol for computing an -winner. The center initiates communication with all sites, asking each site to send an approximate score for each candidate. That is, each site, for each candidate , sends the number of voters approving , rounded to the closest multiple of . Such rounding is enough, since, summing up the possible errors from all sites, the center would have a value which is at most -far from the real score. Thus, the candidate with the highest approximate score will indeed be an -winner. Each site should communicate bits per candidate. Thus, the total communication is bounded by . The bound follows.
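The static checkpoint step can be sketched as follows. The rounding granularity `g` is a hypothetical integer parameter standing in for the granularity used in the protocol, and the site data is made up for illustration. Each site rounds its local score to the nearest multiple of `g`, so each site contributes an error of at most `g / 2` per candidate.

```python
from collections import Counter


def round_to_multiple(x, g):
    """Round the non-negative integer x to the nearest multiple of g."""
    return g * ((2 * x + g) // (2 * g))


def site_report(local_scores, g):
    """What one site sends: every local score rounded to a multiple of g."""
    return {c: round_to_multiple(s, g) for c, s in local_scores.items()}


def center_aggregate(reports):
    """The center sums the rounded reports of all sites."""
    total = Counter()
    for report in reports:
        total.update(report)
    return total


# Hypothetical local approval scores at three sites.
sites = [{"a": 103, "b": 96}, {"a": 48, "b": 57}, {"a": 71, "b": 64}]
g = 10

estimate = center_aggregate(site_report(s, g) for s in sites)
true_total = center_aggregate(sites)
```

Summed over all sites, the estimate of each candidate's score is off by at most `len(sites) * g / 2`, which is the error accounting used in the paragraph above.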
For the third protocol, we will show that sampled voters, chosen uniformly at random (with repetitions), are enough to determine an -winner with failure probability at most . As we can communicate each voter using bits, the bound follows. Consider such a sample of voters, and, for a candidate , let be an indicator for the event that the -th sampled voter approved . Let , and denote by the actual number of voters that approved in the original election. Set . Using the Chernoff bound (Theorem 1 in Section 2), we have that:
By the union bound, we have that:
where the second inequality follows from Claim 1 below, by setting and noting that lies in the convex hull of the set described there. The center will return a candidate with maximal . Correctness follows by the same arguments as in the frequency-count protocol. ∎
Consider the set of points with integer coordinates. Let contain exactly those points in for which the value of exactly coordinates is , while the value of all their other coordinates is . Let . Then, for any arbitrary point in the convex hull of , it holds that:
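The sampling protocol can be sketched as below. The function name, the `sample_size` argument, and the example election are hypothetical; in particular, the sketch does not fix the sample size required by the analysis, it only shows the mechanics of sampling with repetition and returning the candidate with the maximal sampled count.

```python
import random
from collections import Counter


def sample_winner(ballots, sample_size, seed=0):
    """Sample ballots uniformly with repetition and return the candidate
    with the highest sampled approval count, plus rescaled estimates."""
    rng = random.Random(seed)
    n = len(ballots)
    counts = Counter()
    for _ in range(sample_size):
        # Each sampled ballot adds one to every candidate it approves.
        counts.update(ballots[rng.randrange(n)])
    # Rescale sampled counts to estimates of the full approval scores.
    estimates = {c: counts[c] * n / sample_size for c in counts}
    winner = max(sorted(estimates), key=lambda c: estimates[c])
    return winner, estimates


# Hypothetical election: candidate "a" is approved by every voter.
ballots = [("a", "b")] * 60 + [("a", "c")] * 40
winner, estimates = sample_winner(ballots, sample_size=50)
```

In this toy election "a" appears on every ballot, so it is returned regardless of the random draw; in general the guarantee is only probabilistic, as in the proof above.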
Consider the function and notice that its second derivative is
Hence, is convex in the domain . Set . As a sum of convex functions is also convex, is convex in the domain , which in particular contains the convex hull of . Since is a convex function, its maximum value over the convex hull is achieved at a point of . We conclude that:
For Approval, where the set of approved candidates of each voter can be arbitrary, and is thus upper bounded only by the number of candidates, we proceed similarly to -Approval. Naturally, we have -factors instead of -factors in our bounds. (Specifically, in the first protocol the size of the reduced election is and in the second protocol we sample slightly more voters.) Other than that, in fact, Approval is even a bit easier than -Approval, as by using voters, we can increase the relative score of a candidate by (since we can add voters all of which approve only ).
There are three protocols for Approval-winner-tracking. Respectively, the protocols use , , and words of communication.
We go on to consider ordinal elections. Specifically, next we consider the Borda rule, for which we describe three protocols.
There are three protocols for Borda-winner-tracking. Respectively, the protocols use , , and words of communication.
We start by discussing the impact of adding voters. For an arbitrary candidate , consider two voters where one voter is ranking first and then ranks the other candidates in an arbitrary order, and another voter is ranking first and then ranks the other candidates in reverse order. Adding these two voters causes an increase to the score of by while the score of all other candidates increases by . Thus, by adding voters, we can increase the relative score of by .
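The effect of such a pair of voters can be checked with a short sketch. It assumes the standard Borda scoring in which the candidate in (0-indexed) position `pos` out of `m` receives `m - 1 - pos` points; the function names are hypothetical.

```python
from collections import Counter


def borda_points(ranking):
    """Borda points of one ballot: m - 1 for the top, 0 for the bottom."""
    m = len(ranking)
    return {c: m - 1 - pos for pos, c in enumerate(ranking)}


def boosting_pair(target, others):
    """Two ballots ranking target first, with the rest in opposite orders."""
    forward = [target] + list(others)
    backward = [target] + list(reversed(others))
    return forward, backward


def pair_gain(target, others):
    """Total Borda points each candidate gains from the two ballots."""
    gain = Counter()
    for ranking in boosting_pair(target, others):
        gain.update(borda_points(ranking))
    return gain


gain = pair_gain("a", ["b", "c", "d"])  # m = 4 candidates
```

Under this scoring assumption, the target gains `2 * (m - 1)` points while every other candidate gains `m - 2` points, so each added pair improves the target's score relative to every other candidate by `m`.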
The first protocol reduces Borda to Plurality, similarly to the first protocol stated in Theorem 3, and as depicted in Figure 3. Each site, upon receiving a voter with preference order , instead of considering the voter , creates and considers voters, such that for , it creates voters, each approving . For example, a voter would be transformed into three voters, approving , , , respectively.
In the reduced election we have voters, where is the number of voters in the original election. We use the protocol for Plurality described in Theorem 2 with . Let us denote the real frequency of a candidate in the reduced election by and its computed approximate frequency by . The error is bounded by . Since by adding voters we can increase the relative score of the chosen candidate by , we are done.
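A minimal sketch of the Borda-to-Plurality reduction is given below. It assumes the standard Borda scoring in which the candidate in (0-indexed) position `pos` out of `m` receives `m - 1 - pos` points, so a ranked ballot is replaced by that many single-candidate votes; the function names and example profile are hypothetical.

```python
from collections import Counter


def reduce_borda_ballot(ranking):
    """Replace a ranked ballot by Plurality votes, one per Borda point."""
    m = len(ranking)
    reduced = []
    for pos, c in enumerate(ranking):
        reduced.extend([c] * (m - 1 - pos))
    return reduced


def borda_scores(profile):
    """Borda scores computed directly from the ranked ballots."""
    scores = Counter()
    for ranking in profile:
        m = len(ranking)
        for pos, c in enumerate(ranking):
            scores[c] += m - 1 - pos
    return scores


# Hypothetical profile over three candidates.
profile = [["a", "b", "c"], ["b", "a", "c"], ["a", "c", "b"]]
plurality_in_reduced = Counter(v for r in profile for v in reduce_borda_ballot(r))
```

Under this assumption each original voter becomes `m * (m - 1) / 2` reduced voters, and the Plurality score of every candidate with at least one point equals its Borda score.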
The second protocol is based on checkpoints, thus below we describe the static subprotocol used in each checkpoint. Similarly to the second protocol in Theorem 3, each site sends an approximation of the Borda score of each candidate rounded to the closest multiple of . Hence the subprotocol uses words, while the combined error for the Borda score estimation of each candidate is .
For the third protocol, we will show that sampled voters, chosen uniformly at random (with repetitions), are enough to determine an -winner with failure probability at most . As we can communicate each voter using bits, the bound follows. For a candidate , let , where is the score that candidate gets from the -th sampled voter. Let , and denote by the score of the candidate in the election. Set . Using Lemma 1 we have that
and hence by the union bound it follows that . The center will return a candidate with maximal . The accuracy of the protocol follows from arguments given in the analysis of the frequency-count protocol. ∎
4.2 Tournament-Based Rules
In this section we consider Condorcet winners and the Copeland voting rule. The rules we consider below are built upon the tournament defined over the election by considering head-to-head contests between all pairs of candidates. The first protocol for Copeland proceeds by approximating, for each pair of candidates and , the number of voters preferring to . Having these approximate counts, we will be able to identify an -winner under Copeland. If there is a candidate which is preferred to all other candidates, then the center shall declare as the Condorcet winner.
There are three protocols for Copeland-winner-tracking. Respectively, the protocols use , , and words.
For the first protocol, we reduce each voter, corresponding to a total order over the candidates, to items; specifically, the reduced distributed stream will contain items of item types, where for each pair of candidates and we have a different type, denoted by . The reduction proceeds as follows. Each site, upon receiving a voter which specifies a linear order, instead of considering the voter , creates and considers items, such that if prefers to , then we create an item (notice that this is an ordered tuple). The reduction is depicted in Figure 4. For example, a voter would be transformed into three items, , , and .
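The reduction from a ranked ballot to ordered pair items can be sketched as follows; the function name is hypothetical. Since `itertools.combinations` preserves input order, every emitted tuple lists the preferred candidate first.

```python
from collections import Counter
from itertools import combinations


def ballot_to_items(ranking):
    """Emit one ordered item (x, y) for every pair where x is ranked above y."""
    return list(combinations(ranking, 2))


items = ballot_to_items(["a", "b", "c"])

# Hypothetical profile: item frequencies are the head-to-head counts.
profile = [["a", "b", "c"], ["c", "a", "b"]]
item_counts = Counter(item for r in profile for item in ballot_to_items(r))
```

A ballot over `m` candidates yields `m * (m - 1) / 2` items, and the frequency of the item `(x, y)` in the reduced stream is the number of voters preferring `x` to `y`.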
The reduced distributed stream has items and types of items. For two candidates and , let denote the number of voters preferring to . Now we can use a protocol based on counting frequencies (see Section 3.1), with , to let the center maintain, for each pair of candidates and , a value such that .
Let be the number of candidates such that in the election . We denote by the (real) Copeland score of candidate in election . The center declares as an -winner a candidate with the highest value of . Note that, for every candidate , it holds that ; this is so since the error in the computed frequency is bounded by , while for the declared winner , it holds that there are at least candidates such that .
Next we argue that is indeed an -winner. We add voters which rank on top and then the other candidates in arbitrary order, and another voters which rank on top and then the other candidates in reverse order. Denote the modified election, with these additional voters, by . Then, for every , increased by ; thus . Moreover, the number of wins of any other candidate does not increase. Hence .
The communication complexity follows by the discussion given in Section 3.1; specifically, it is .
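The center's decision step can be sketched as below. The sketch assumes that a candidate's Copeland score is the number of opponents it beats by a strict majority in the head-to-head counts (ties contribute nothing), and it uses exact counts where the protocol would use the approximate ones; all names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pairwise_counts(profile):
    """counts[(x, y)] = number of ballots ranking x above y."""
    counts = Counter()
    for ranking in profile:
        counts.update(combinations(ranking, 2))
    return counts


def copeland_scores(candidates, counts):
    """Number of head-to-head wins of each candidate (strict majority)."""
    return {
        c: sum(1 for d in candidates if d != c and counts[(c, d)] > counts[(d, c)])
        for c in candidates
    }


def copeland_winner(candidates, counts):
    scores = copeland_scores(candidates, counts)
    return max(sorted(candidates), key=lambda c: scores[c])


# Hypothetical profile over three candidates.
profile = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]
candidates = ["a", "b", "c"]
counts = pairwise_counts(profile)
scores = copeland_scores(candidates, counts)
```

Replacing `counts` by the approximate pair frequencies maintained at the center gives the declared candidate; the proof above bounds how far its true Copeland score can be from the best one.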
The second protocol is based on checkpoints, and thus below we describe the static subprotocol used in each checkpoint. For every pair of candidates, and , every site sends the center the number of voters preferring over , rounded to the closest multiple of . In each checkpoint, a candidate achieving an estimated score higher than for the maximal number of times (that is, against the largest number of other candidates) is declared a winner. As the error in each head-to-head contest is upper-bounded by , correctness follows along similar lines to the proof of the frequency-count protocol above. As there are quantities to estimate, each site sends words. The total communication follows.
For the third protocol, we will show that sampled voters, chosen uniformly at random (with repetitions), are enough to determine an -winner with failure probability at most . As we can communicate each voter using bits, the bound follows. For two candidates , let be an indicator for the event that the -th sampled voter prefers over . Let , and denote by the actual number of voters preferring over in the original election. Set . Using Lemma 1 it follows that
By the union bound, with probability at least , for every pair of candidates we have that
Let be the number of candidates such that in the election . The center declares as an -winner a candidate with the highest value of . The accuracy of the protocol follows from arguments given in the analysis of the frequency-count protocol. ∎
We go on to consider the Cup rule, which differs from Copeland in several aspects. The first aspect is that, in order to prove that some estimated candidate is indeed an -winner, it is not enough to add arbitrary voters ranking last, but rather a more subtle construction of voters is needed. The second aspect is that, intuitively, while in Copeland we had to send communication regarding all pairs of candidates, in Cup it is enough to send communication only regarding some pairs of candidates, as given by the binary tree corresponding to the “head-to-head” contests performed for finding the winner under Cup.
There are three protocols for Cup. Respectively, the protocols use , , and words.
Let be an implementation of the binary tree of the Cup election: there are ordered pairs of candidates (corresponding to the head-to-head “contests”), such that the winning candidate in each such pair goes up in the tree. In particular, every election which agrees with the tree on , will have the root of as its Cup-winner. We argue that there is an order over the candidates such that, if , then will appear before in . Indeed, consider a directed graph with the candidates as its vertices and as its edges. is acyclic and thus a topological order of will provide us with the desired order . Later we will use this order as a preference order. We now describe the protocols.
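The topological-order argument can be sketched with Python's standard `graphlib` module (available from Python 3.9). The contest pairs and the function name are hypothetical; each pair is written as `(winner, loser)`, and the resulting order places every winner before the candidate it defeats.

```python
from graphlib import TopologicalSorter


def order_from_contests(contests):
    """Return an order in which each contest winner precedes its loser."""
    predecessors = {}
    for winner, loser in contests:
        # TopologicalSorter expects a mapping: node -> nodes that come before it.
        predecessors.setdefault(loser, set()).add(winner)
    return list(TopologicalSorter(predecessors).static_order())


# Hypothetical Cup tree on four candidates: a beats b, c beats d, a beats c.
contests = [("a", "b"), ("c", "d"), ("a", "c")]
order = order_from_contests(contests)
position = {c: i for i, c in enumerate(order)}
```

Any order returned this way can serve as the preference order of the added voters, since ranking candidates in that order makes every contest of the tree come out the same way.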
Our first protocol is based on counting frequencies, and is similar to the corresponding Copeland protocol. We estimate the frequencies of all head-to-head contests (using the same precision and communication). To return a winner, we simply run a Cup tournament (with the appropriate, given tree), using the estimations instead of the real values . As a result, we have a set of ordered pairs. To prove correctness, it will be enough to show that by adding additional votes it will hold, for every , that . Indeed, following the analysis of the frequency count of Copeland, with high probability for every pair of candidates we have that . Recall the order described at the beginning of the proof, and notice that by adding voters with preference orders as it will hold, for every , that
The second protocol is based on checkpoints; below we describe the static subprotocol carried out in each checkpoint. (The protocol described here is useful if we assume that . If this is not the case, then we can instead use the communication protocol of Copeland.) The subprotocol has rounds, corresponding to the height of the binary tree associated with the Cup protocol. In each round, the center asks each site to provide approximate values of the pairs currently of interest. Supplied with these approximate values, the center then computes the winner of each head-to-head contest, and continues to the nodes further up the tree. At the end, the center declares the winner of the highest node in the tree.
More concretely, for every pair of candidates of interest , , each site sends the center the number of voters preferring over , rounded to the closest multiple of . As the error in each head-to-head contest is upper-bounded by , correctness follows along similar lines to the proof of the frequency-count protocol above. There are rounds, where at round , each site sends values, each requiring bits. Thus, the total number of words in a checkpoint is:
and the total communication follows.
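The round structure of the subprotocol can be sketched as follows. The sketch assumes a balanced bracket whose number of leaves is a power of two, and it takes the head-to-head estimates from a hypothetical dictionary `est` rather than from site reports; all names and numbers are illustrative.

```python
def run_cup(bracket, prefers):
    """Play a Cup tournament round by round on estimated pairwise counts.

    bracket: candidates in leaf order (length assumed to be a power of two).
    prefers(x, y): estimated number of voters preferring x over y.
    Returns the winner and the list of (winner, loser) contests played.
    """
    contests = []
    current = list(bracket)
    while len(current) > 1:
        advancing = []
        # Only adjacent pairs in the current round need to be estimated.
        for x, y in zip(current[0::2], current[1::2]):
            winner, loser = (x, y) if prefers(x, y) >= prefers(y, x) else (y, x)
            contests.append((winner, loser))
            advancing.append(winner)
        current = advancing
    return current[0], contests


# Hypothetical estimates for the pairs that end up being queried.
est = {("a", "b"): 6, ("b", "a"): 4, ("c", "d"): 3, ("d", "c"): 7,
       ("a", "d"): 4, ("d", "a"): 6}
winner, contests = run_cup(["a", "b", "c", "d"], lambda x, y: est[(x, y)])
```

With four candidates only three pairs are ever queried, one per internal node of the tree, which is the source of the savings over estimating all pairs as in Copeland.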
The third protocol is based on sampling and is similar to the Copeland sampling protocol. We use the same communication, and hence we ensure that with high probability, for every pair of candidates it holds that . Correctness now follows by similar lines as in the frequency-count protocol. ∎
Finally, we consider the Condorcet voting rule. In order to declare a candidate as a Condorcet -winner, it is enough to ensure that, by adding voters, every other candidate loses to at least one other candidate in the head-to-head contest (and thus, either can become a Condorcet winner in this way, or there will be no Condorcet winner at all, in which case can be returned). A candidate which is either Copeland or Cup -winner has this property. We conclude that every protocol for Copeland as well as every protocol for Cup is in particular a protocol for Condorcet.
There are three protocols for Condorcet-winner-tracking. Respectively, the protocols use , , and words.
4.3 Round-based Rules
In this section we consider two round-based voting rules; we begin with Plurality with run-off and then continue to Bucklin. For Plurality with run-off we provide three protocols, one of which is a “hybrid” protocol, specifically combining checkpoints and sampling. Intuitively, hybrid protocols fit naturally with round-based voting rules, which, informally speaking, are themselves “hybrids” of voting rules.