# Weighted Voting Via No-Regret Learning

Voting systems typically treat all voters equally. We argue that perhaps they should not: Voters who have supported good choices in the past should be given higher weight than voters who have supported bad ones. To develop a formal framework for desirable weighting schemes, we draw on no-regret learning. Specifically, given a voting rule, we wish to design a weighting scheme such that applying the voting rule, with voters weighted by the scheme, leads to choices that are almost as good as those endorsed by the best voter in hindsight. We derive possibility and impossibility results for the existence of such weighting schemes, depending on whether the voting rule and the weighting scheme are deterministic or randomized, as well as on the social choice axioms satisfied by the voting rule.

## 1 Introduction

In most elections, voters are entitled to equal voting power. This principle underlies the one person, one vote doctrine, and is enshrined in the United States Supreme Court ruling in the Reynolds v. Sims (1964) case.

But there are numerous voting systems in which voters do, in fact, have different weights. Standard examples include the European Council, where (for certain decisions) the weight of each member country is proportional to its population; and corporate voting procedures where stockholders have one vote per share. Some historical voting systems are even more pertinent: Sweden’s 1866 system weighted voters by wealth, giving especially wealthy voters as many as 5000 votes; and a Belgian system, used for a decade at the end of the 19th Century, gave (at least) one vote to each man, (at least) two votes to each educated man, and three votes to men who were both educated and wealthy Congleton (2011).

The last two examples can be seen as (silly, from a modern viewpoint) attempts to weight voters by merit, using wealth and education as measurable proxies thereof. We believe that the basic idea of weighting voters by merit does itself have merit. But we propose to measure a voter’s merit by the quality of his past votes. That is, a voter who has supported good choices in the past should be given higher weight than a voter who has supported bad ones.

This high-level scheme is, arguably, most applicable to repeated aggregation of objective opinions. For example, consider a group of engineers trying to decide which prototype to develop, based on an objective measure of success such as projected market share. If an engineer supported a certain prototype and it turned out to be a success, she should be given higher weight compared to her peers in future decisions; if it is a failure, her weight should be lowered. Similar examples include a group of investors selecting companies to invest in, and a group of decision makers in a movie studio choosing movie scripts to produce. Importantly, the recently launched, not-for-profit website RoboVote.org already provides public access to voting tools for precisely these situations, albeit using methods that always treat all voters equally Procaccia et al. (2016).

Our goal in this paper, therefore, is to augment existing voting methods with weights, in a way that keeps track of voters’ past performance, and guarantees good choices over time. The main conceptual problem we face is the development of a formal framework in which one can reason about desirable weighting schemes; in three words, our solution is no-regret learning.

### 1.1 Our Approach

The most basic no-regret learning model involves a set of experts. In each round $t = 1, \dots, T$, the algorithm chooses an expert at random, with probability proportional to the experts' current weights. Then the loss of each expert $i$ at round $t$ is revealed, and the algorithm incurs the expected loss corresponding to its randomized choice. The overall loss (across rounds) of the algorithm, and of each expert, is defined by summing up the per-round losses. The algorithm’s goal is to incur an overall loss that is comparable to the best expert in hindsight. Specifically, under a no-regret learning algorithm, the average (per-round) difference between the algorithm’s loss and the loss of the best expert goes to $0$ as $T$ goes to infinity.

We depart from the classic setting in several ways, some superficial and some fundamental. Instead of experts, we have a set of voters. In each round, each voter reveals a ranking over a set of alternatives (the alternatives can change across rounds, and even their number may vary), and the loss of each alternative is revealed. In addition, we are given a (possibly randomized) voting rule, which receives weighted rankings as input, and outputs the winning alternative. The voting rule is not part of our design space; it is exogenous and fixed throughout the process. The loss of a voter in round $t$ is given by assigning his ranking all the weight (equivalently, imagining that all voters have that ranking), applying the voting rule, and measuring the loss of the winning alternative (or the expected loss, if the rule is randomized). As in the classic setting, our benchmark is the best voter in hindsight.

At first glance, it may seem that our setting easily reduces to the classic one, by treating voters as experts. But our loss is computed by applying the given voting rule to the entire profile of weighted rankings, and therein lies the rub. To develop some intuition, consider the case of two alternatives $a$ and $b$, and the weighted majority rule, which selects $a$ if the total (normalized) weight of voters who rank $a$ above $b$ is greater than $1/2$, and $b$ otherwise. Suppose that at round $t$, the loss of $a$ is $0$, the loss of $b$ is $1$, and the vote profile and weighting scheme are such that voters ranking $a$ above $b$ have a total weight of $1/2 + \epsilon$ for a small $\epsilon > 0$. Consequently, the rule selects $a$, and our loss at round $t$ is exactly $0$. But if we perturbed the weights slightly, $b$ would be selected, and our loss would jump to $1$. By contrast, in the classic setting the algorithm’s loss is obviously continuous in the weights assigned to experts.
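The discontinuity can be made concrete with a minimal sketch. The function name `weighted_majority_loss` and the numeric values are illustrative assumptions, not part of the paper; weights are taken to be normalized so that the majority threshold is 1/2.

```python
def weighted_majority_loss(weight_for_a, loss_a, loss_b):
    """Loss incurred under weighted majority with two alternatives.

    weight_for_a: normalized total weight of the voters ranking a above b.
    Alternative a wins only if that weight strictly exceeds 1/2.
    """
    return loss_a if weight_for_a > 0.5 else loss_b


# Illustrative values (assumptions): a is the good alternative, b the bad one.
LOSS_A, LOSS_B = 0.0, 1.0
EPS = 1e-6

loss_just_above = weighted_majority_loss(0.5 + EPS, LOSS_A, LOSS_B)
loss_just_below = weighted_majority_loss(0.5 - EPS, LOSS_A, LOSS_B)
print(loss_just_above, loss_just_below)  # 0.0 1.0
```

An arbitrarily small change in the weights moves the loss from one extreme to the other, which is exactly what cannot happen when the loss is a weighted average over experts.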

An obvious question at this point is whether there is a weighting scheme that would allow us to compete with the best voter in hindsight, under the weighted majority rule. Our main research question is much more general:

For which voting rules is there a weighting scheme such that the difference between our average per-round loss and that of the best voter goes to zero as the number of rounds goes to infinity?

Ironically, the very formulation of this technical question gives a first answer to our original conceptual question: A desirable weighting scheme, with respect to a given voting rule, is one that gives no-regret guarantees.

### 1.2 Our Results

Analogously to the learning literature, we consider two settings that differ in the type of feedback we receive in each time step, which we can use to adjust the voters’ weights. In the full information setting, we are informed of the loss of each alternative. This would be the case, for example, if the alternatives are companies to invest in. By contrast, in the partial information setting, we are only privy to the loss of the selected alternative. This type of feedback is appropriate when the alternatives are product prototypes: we cannot know how successful an undeveloped prototype would have been, but obviously we can measure the success of a prototype that was selected for development.

In Section 4, we devise no-regret weighting schemes for both settings, and for any voting rule. Specifically, in the full information setting, we show that for any voting rule there is a weighting scheme with regret $O(\sqrt{T \ln n})$; in the partial information setting, the regret guarantee is $O(\sqrt{T n \ln n})$. While these results make no assumptions on the voting rule, they also impose no restrictions on the weighting scheme. In particular, the foregoing weighting schemes heavily rely on randomization, that is, they are allowed to sample a weight vector from a distribution in each time step.

However, deterministic weighting schemes seem more desirable, as they are easier to interpret and explain: a voter’s weight depends only on past performance, and not on random decisions made by the scheme. In Section 5, therefore, we restrict our attention to deterministic weighting schemes. We find that if the voting rule is itself deterministic, it admits a no-regret weighting scheme if and only if it is constant on unanimous profiles. Because this property is not satisfied by any reasonable rule, the theorem should be interpreted as a strong impossibility result. We next consider randomized voting rules, and find that they give rise to much more subtle results, which depend on the properties of the voting rule in question. Specifically, we show that if the voting rule is a distribution over unilaterals — a property satisfied by randomized positional scoring rules — then it admits a deterministic no-regret weighting scheme. By contrast, if the voting rule satisfies a probabilistic version of the famous Condorcet consistency axiom, then no-regret guarantees are impossible to achieve through a deterministic weighting scheme.

### 1.3 Related Work

Blum and Mansour (2007) provide an excellent overview of basic models and results in no-regret learning; throughout the paper we rely on some important technical results in this space Freund and Schapire (1995); Auer et al. (2002). Conceptually, our work is superficially related to papers on online ranking, where the algorithm chooses a ranking of objects at each stage. These papers differ from each other in how the loss function is defined, and the type of feedback used. For example, in the model of Radlinski et al. (2008), the loss is $0$ if among the top $k$ objects in the ranking there is at least one that is “relevant”, and $1$ otherwise. Chaudhuri and Tewari (2015) assume there is a relevance score for each object, and the loss of a ranking is calculated through one of several common measures; the twist is that the algorithm only observes the relevance of the top-ranked object, which is insufficient to even compute the loss of the ranking that it chose (i.e., it is incomparable to bandit feedback). Our setting is quite different, of course: While voters have rankings, our loss is determined by aggregating these rankings via a voting rule. And instead of outputting a ranking over alternatives, our algorithm can only output weights over voters.

We also draw connections to the computational social choice Brandt et al. (2016) literature throughout the paper Gibbard (1977); Conitzer and Sandholm (2006); Procaccia (2010); Moulin (1983). For now let us just point to a few papers that share some of the features of our problem. Specifically, there is a significant body of work on weighted voting, in the context of manipulation, control, and bribery in elections Conitzer et al. (2007); Zuckerman et al. (2009); Faliszewski et al. (2009, 2015). And there are papers that study repeated (or dynamic) voting Boutilier and Procaccia (2012); Parkes and Procaccia (2013), albeit in settings where the preferences of voters evolve over time.

## 2 Preliminaries

Our work draws on social choice theory and online learning. In this section we present important concepts and results from each of these areas in turn.

### 2.1 Social Choice

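We consider a set $[n] \triangleq \{1, \dots, n\}$ of voters and a set $A$ of $m$ alternatives. A vote $\sigma : A \to [m]$ is a linear ordering (a ranking or permutation) of the alternatives. That is, for any vote $\sigma$ and alternative $a$, $\sigma(a)$ denotes the position of alternative $a$ in vote $\sigma$. For any $a, b \in A$, $\sigma(a) < \sigma(b)$ indicates that alternative $a$ is preferred to $b$ under vote $\sigma$. We also denote this preference by $a \succ_\sigma b$. We denote the set of all possible votes over $A$ by $\mathcal{L}(A)$.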

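A vote profile $\boldsymbol{\sigma} = (\sigma_1, \dots, \sigma_n) \in \mathcal{L}(A)^n$ denotes the votes of the $n$ voters. Furthermore, given a vote profile $\boldsymbol{\sigma}$ and a weight vector $w \in \mathbb{R}^n_{\ge 0}$, we define the anonymous vote profile corresponding to $\boldsymbol{\sigma}$ and $w$, denoted $\pi \in [0,1]^{m!}$, by setting

$$\pi_\sigma \triangleq \frac{1}{\|w\|_1} \sum_{i=1}^{n} w_i \, \mathbb{1}(\sigma_i = \sigma), \qquad \forall \sigma \in \mathcal{L}(A).$$

That is, $\pi$ is an $m!$-dimensional vector such that for each vote $\sigma$, $\pi_\sigma$ is the fraction of the total weight on $\sigma$. When needed, we use $\pi_{\boldsymbol{\sigma}, w}$ to clarify the vote profile and weight vector to which the anonymous vote profile corresponds. Note that $\pi_{\boldsymbol{\sigma}, w}$ only contains the anonymized information about $\boldsymbol{\sigma}$ and $w$, i.e., the anonymous vote profile remains the same even when the identities of the voters change.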
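As a minimal sketch of this definition, the anonymous vote profile can be stored sparsely as a dictionary from rankings to weight fractions. The function name `anonymous_profile` and the example votes and weights are assumptions made for illustration.

```python
from collections import defaultdict


def anonymous_profile(votes, weights):
    """Map each distinct ranking to its fraction of the total weight.

    votes:   list of rankings, one per voter, each listed best alternative first
    weights: list of non-negative voter weights with positive total
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive total")
    profile = defaultdict(float)
    for ranking, weight in zip(votes, weights):
        profile[tuple(ranking)] += weight / total
    return dict(profile)


# Hypothetical three-voter example; voters 1 and 3 cast the same vote.
votes = [("a", "b", "c"), ("b", "a", "c"), ("a", "b", "c")]
weights = [2.0, 1.0, 1.0]
pi = anonymous_profile(votes, weights)
print(pi)  # {('a', 'b', 'c'): 0.75, ('b', 'a', 'c'): 0.25}
```

Swapping the identities of the first and third voters (together with their weights) leaves `pi` unchanged, which is the anonymity property noted above.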

To aggregate the (weighted) votes into a distribution over alternatives, we next introduce the concept of (anonymous) voting rules. Let $\Pi(A)$ be the set of all possible anonymous vote profiles. Similarly, let $\Delta(A)$ denote the set of all possible distributions over $A$. An anonymous voting rule is a function $f : \Pi(A) \to \Delta(A)$ that takes as input an anonymous vote profile $\pi$ and returns a distribution over the alternatives indicated by a vector $f(\pi)$, where $f(\pi)_a$ is the probability that alternative $a$ is the winner under $\pi$. We say that a voting rule $f$ is deterministic if for any $\pi \in \Pi(A)$, $f(\pi)$ has support of size $1$, i.e., there is a unique winner.

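One class of anonymous voting rules uses the positions of the individual alternatives in order to determine the winners. These rules, collectively called positional scoring rules, are defined by a scoring vector $s = (s_1, \dots, s_m)$ such that $s_1 \ge s_2 \ge \dots \ge s_m \ge 0$. Given a vote $\sigma$, the score of alternative $a$ in $\sigma$ is the score of its position in $\sigma$, i.e., $s_{\sigma(a)}$. Given an anonymous vote profile $\pi$, the score of an alternative is its overall score in the rankings of $\pi$, that is,

$$\text{s-score}_\pi(a) \triangleq \sum_{\sigma \in \mathcal{L}(A)} \pi_\sigma \, s_{\sigma(a)}.$$

A deterministic positional scoring rule chooses the alternative with the highest score, i.e., $f(\pi) = e_a$, where $a \in \arg\max_{b \in A} \text{s-score}_\pi(b)$ (tie breaking may be needed) and $e_a$ denotes the distribution with all of its mass on $a$. On the other hand, a randomized positional scoring rule chooses each alternative with probability proportional to its score, i.e., $f(\pi)_a \propto \text{s-score}_\pi(a)$ for all $a \in A$. Examples of positional scoring rules include plurality with $s = (1, 0, \dots, 0)$, veto with $s = (1, \dots, 1, 0)$, and Borda with $s = (m-1, m-2, \dots, 0)$.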
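The following sketch computes positional scores and the resulting randomized rule for a sparse anonymous profile. The function names and the example profile are assumptions for illustration, and the code assumes at least one alternative has a positive score.

```python
def position_scores(profile, scoring_vector):
    """s-score of every alternative under an anonymous vote profile.

    profile: dict mapping rankings (tuples, best first) to weight fractions
    scoring_vector: non-increasing scores, one per position
    """
    scores = {}
    for ranking, fraction in profile.items():
        for position, alternative in enumerate(ranking):
            scores[alternative] = (
                scores.get(alternative, 0.0) + fraction * scoring_vector[position]
            )
    return scores


def randomized_scoring_rule(profile, scoring_vector):
    """Winning probabilities proportional to the positional scores."""
    scores = position_scores(profile, scoring_vector)
    total = sum(scores.values())  # assumed positive
    return {alternative: score / total for alternative, score in scores.items()}


# Hypothetical profile over three alternatives, scored with Borda (2, 1, 0).
profile = {("a", "b", "c"): 0.75, ("b", "a", "c"): 0.25}
borda = (2, 1, 0)
scores = position_scores(profile, borda)
lottery = randomized_scoring_rule(profile, borda)
deterministic_winner = max(sorted(scores), key=scores.get)  # ties: alphabetical
print(scores)   # {'a': 1.75, 'b': 1.25, 'c': 0.0}
print(deterministic_winner)  # a
```

Here the deterministic Borda rule picks `a`, while the randomized rule still gives `b` probability 1.25/3.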

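Another class of anonymous voting rules uses pairwise comparisons between the alternatives to determine the winners. We are especially interested in the Copeland rule, which assigns a score to each alternative based on the number of pairwise majority contests it wins. In an anonymous vote profile $\pi$, we denote by $a >_\pi b$ the event that $a$ beats $b$ in a pairwise competition, i.e., $a$ is preferred to $b$ in rankings in $\pi$ that collectively have more than half the weight. More formally, $a >_\pi b$ if and only if $\sum_{\sigma \in \mathcal{L}(A)} \pi_\sigma \, \mathbb{1}(a \succ_\sigma b) > 1/2$. We also write $a =_\pi b$ if they are tied, i.e., $\sum_{\sigma \in \mathcal{L}(A)} \pi_\sigma \, \mathbb{1}(a \succ_\sigma b) = 1/2$. The Copeland score (some refer to this variant of Copeland as Copeland$^{1/2}$; Faliszewski et al. (2008)) of an alternative is defined by

$$\text{C-score}_\pi(a) \triangleq \big|\{b \in A \mid a >_\pi b\}\big| + \frac{1}{2} \cdot \big|\{b \in A \mid a =_\pi b\}\big|.$$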

The deterministic Copeland rule chooses the alternative that has the highest Copeland score (possibly breaking ties), and the randomized Copeland rule chooses each alternative with probability proportional to its Copeland score.
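A minimal sketch of the Copeland score on a sparse anonymous profile follows. The function name `copeland_scores`, the floating-point tie tolerance, and the example profile are assumptions for illustration.

```python
import math


def copeland_scores(profile, alternatives):
    """Copeland score: 1 per pairwise win, 1/2 per pairwise tie.

    profile: dict mapping rankings (tuples, best first) to weight fractions
    """
    scores = {a: 0.0 for a in alternatives}
    for a in alternatives:
        for b in alternatives:
            if a == b:
                continue
            # Fraction of the weight on rankings that place a above b.
            support = sum(
                fraction
                for ranking, fraction in profile.items()
                if ranking.index(a) < ranking.index(b)
            )
            if math.isclose(support, 0.5):  # tie, up to float tolerance
                scores[a] += 0.5
            elif support > 0.5:
                scores[a] += 1.0
    return scores


# Hypothetical profile in which a and b are tied and both beat c.
profile = {("a", "b", "c"): 0.5, ("b", "a", "c"): 0.5}
scores = copeland_scores(profile, ["a", "b", "c"])
print(scores)  # {'a': 1.5, 'b': 1.5, 'c': 0.0}
```

The deterministic rule would need a tie-break between `a` and `b` here, whereas the randomized rule would select each of them with probability 1/2.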

The deterministic Copeland rule satisfies a classic social choice axiom, which we present next. We say that $a$ is a Condorcet winner in the vote profile $\pi$ if $a >_\pi b$ for all $b \in A \setminus \{a\}$. A voting rule is Condorcet consistent if it selects a Condorcet winner whenever one exists in the given vote profile. Note that the Copeland score of a Condorcet winner is $m - 1$, whereas the Copeland score of any other alternative must be strictly smaller, so a Condorcet winner (if one exists) indeed has maximum Copeland score.

An anonymous deterministic voting rule $f$ is called strategyproof if for any voter $i \in [n]$, any two vote profiles $\boldsymbol{\sigma}$ and $\boldsymbol{\sigma}'$ for which $\sigma_j = \sigma'_j$ for all $j \neq i$, and any weight vector $w$, it holds that either $a = a'$ or $a \succ_{\sigma_i} a'$, where $a$ and $a'$ are the winning alternatives in $\pi_{\boldsymbol{\sigma}, w}$ and $\pi_{\boldsymbol{\sigma}', w}$ respectively. In words, whenever a voter $i$ reports $\sigma'_i$ instead of $\sigma_i$, the outcome does not improve according to the true ranking $\sigma_i$. While strategyproofness is a natural property to be desired in a voting rule, the celebrated Gibbard-Satterthwaite Theorem Gibbard (1973); Satterthwaite (1975) shows that non-dictatorial strategyproof deterministic voting rules do not exist (the theorem also requires a range of size at least $3$). Subsequently, Gibbard (1977) extended this result to randomized voting rules. Before presenting his extension, we introduce some additional definitions.

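Given a loss function over the alternatives denoted by a vector $\ell \in [0,1]^m$, the expected loss of the alternative chosen by the rule $f$ under an anonymous vote profile $\pi$ is

$$L_f(\pi, \ell) \triangleq \mathbb{E}_{a \sim f(\pi)}[\ell_a] = f(\pi) \cdot \ell.$$

The higher the loss, the worse the alternative. We say that the loss function $\ell$ is consistent with vote $\sigma$ if for all $a, b \in A$, $a \succ_\sigma b \Leftrightarrow \ell_a < \ell_b$. An anonymous randomized rule $f$ is strategyproof if for any voter $i \in [n]$, any two vote profiles $\boldsymbol{\sigma}$ and $\boldsymbol{\sigma}'$ for which $\sigma_j = \sigma'_j$ for all $j \neq i$, any weight vector $w$, and any loss function $\ell$ that is consistent with $\sigma_i$, we have $L_f(\pi_{\boldsymbol{\sigma}, w}, \ell) \le L_f(\pi_{\boldsymbol{\sigma}', w}, \ell)$.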

The next proposition is an interpretation of a result of Gibbard (1977) on the structural property shared by all strategyproof randomized voting rules, applied to anonymous voting rules.

###### Proposition 2.1.

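Any strategyproof randomized rule is a distribution over a collection of the following types of rules:

1. Anonymous Unilaterals: $g$ is an anonymous unilateral if there exists a function $h : \mathcal{L}(A) \to A$ for which

   $$g(\pi) = \sum_{\sigma \in \mathcal{L}(A)} \pi_\sigma \, e_{h(\sigma)}.$$
2. Duple: $g$ is a duple rule if $\big|\{a \in A \mid \exists \pi \text{ such that } g(\pi)_a > 0\}\big| \le 2$.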

Examples of strategyproof randomized voting rules include randomized positional scoring rules and the randomized Copeland rule, which were previously studied in this context Conitzer and Sandholm (2006); Procaccia (2010). In particular, a randomized positional scoring rule with score vector $s$ is a distribution with probabilities proportional to $s_1, \dots, s_m$ over unilateral rules $g_1, \dots, g_m$, where each $g_i$ corresponds to the function $h_i(\sigma)$ that returns the alternative ranked at position $i$ of $\sigma$. Similarly, the randomized Copeland rule is a uniform distribution over duples $g_{a,b}$ for any two different $a, b \in A$, where $g_{a,b}(\pi) = e_a$ if $a >_\pi b$, $g_{a,b}(\pi) = e_b$ if $b >_\pi a$, and $g_{a,b}(\pi) = \frac{1}{2} e_a + \frac{1}{2} e_b$ if $a =_\pi b$.
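The decomposition of a randomized positional scoring rule into unilaterals can be checked numerically. The sketch below is illustrative only: the function names and the example profile are assumptions, and positions are 0-indexed in the code.

```python
def unilateral(profile, position):
    """Anonymous unilateral g_i: each ranking votes for the alternative it
    places at the given (0-indexed) position, weighted by its fraction."""
    distribution = {}
    for ranking, fraction in profile.items():
        alternative = ranking[position]
        distribution[alternative] = distribution.get(alternative, 0.0) + fraction
    return distribution


def mixture_of_unilaterals(profile, scoring_vector):
    """Pick position i with probability s_i / sum(s), then apply g_i."""
    total = sum(scoring_vector)
    mixture = {}
    for position, score in enumerate(scoring_vector):
        for alternative, prob in unilateral(profile, position).items():
            mixture[alternative] = (
                mixture.get(alternative, 0.0) + (score / total) * prob
            )
    return mixture


# Hypothetical profile with Borda scores (2, 1, 0).
profile = {("a", "b", "c"): 0.75, ("b", "a", "c"): 0.25}
mixture = mixture_of_unilaterals(profile, (2, 1, 0))
print(mixture)  # a: 1.75/3, b: 1.25/3, c: 0 (up to floating-point rounding)
```

The output matches the distribution proportional to the Borda scores of the same profile (1.75, 1.25, and 0), since the scores of all alternatives sum to the sum of the scoring vector.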

### 2.2 Online Learning

We next describe the general setting of online learning, also known as learning from experts. We consider a game between a learner and an adversary. There is a set of actions (a.k.a. experts) $X$ available to the learner, a set of actions $Y$ available to the adversary, and a loss function $f : X \times Y \to [0,1]$ that is known to both parties. In every time step $t \in [T]$, the learner chooses a distribution, denoted by a vector $p^t \in \Delta(X)$, over the actions in $X$, and the adversary chooses an action $y_t$ from the set $Y$. The learner then receives a loss of $f(x_t, y_t)$ for $x_t \sim p^t$. At this point, the learner receives some feedback regarding the action of the adversary. In the full information setting, the learner observes $y_t$ before proceeding to the next time step. In the partial information setting, the learner only observes the loss $f(x_t, y_t)$.

The regret of the algorithm is defined as the difference between its total expected loss and that of the best fixed action in hindsight. The goal of the learner is to minimize its expected regret, that is, minimize

$$\mathbb{E}[\mathrm{Reg}_T] \triangleq \mathbb{E}\left[\sum_{t=1}^{T} f(x_t, y_t) - \min_{x \in X} \sum_{t=1}^{T} f(x, y_t)\right],$$

where the expectation is taken over the choice of $x_t \sim p^t$, and any other random choices made by the algorithm and the adversary. An online algorithm is called a no-regret algorithm if $\mathbb{E}[\mathrm{Reg}_T] \in o(T)$. In words, the average regret of the learner must go to $0$ as $T \to \infty$. In general, deterministic algorithms, for which each $p^t$ puts all of its mass on a single action, can suffer linear regret, because the adversary can choose a sequence of actions on which the algorithm makes sub-optimal decisions at every round. Therefore, randomization is one of the key aspects of no-regret algorithms.

Many online no-regret algorithms are known for the full information and the partial information settings. In particular, the Hedge algorithm Freund and Schapire (1995) is one of the earliest results in this space for the full information setting. At time $t+1$, Hedge picks each action $x$ with probability $p^{t+1}_x \propto \exp(-\eta F_t(x))$, for $F_t(x) = \sum_{s=1}^{t} f(x, y_s)$ and $\eta = \sqrt{2 \ln |X| / T}$.

###### Proposition 2.2 (Freund and Schapire (1995)).

Hedge has regret $\mathbb{E}[\mathrm{Reg}_T] \le O(\sqrt{T \ln |X|})$.
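The Hedge update can be sketched as follows. This is a minimal illustration under stated assumptions: losses lie in $[0,1]$, the function names are hypothetical, and the two-action loss sequence is made up for the example.

```python
import math


def hedge_distribution(cumulative_losses, eta):
    """Probabilities proportional to exp(-eta * cumulative loss)."""
    baseline = min(cumulative_losses)  # shift for numerical stability
    weights = [math.exp(-eta * (loss - baseline)) for loss in cumulative_losses]
    total = sum(weights)
    return [w / total for w in weights]


def run_hedge(loss_rows, eta):
    """Full-information Hedge; loss_rows[t][x] is the loss of action x at round t.

    Returns the learner's total expected loss and the best action's total loss.
    """
    cumulative = [0.0] * len(loss_rows[0])
    expected_loss = 0.0
    for losses in loss_rows:
        p = hedge_distribution(cumulative, eta)
        expected_loss += sum(prob * loss for prob, loss in zip(p, losses))
        cumulative = [c + loss for c, loss in zip(cumulative, losses)]
    return expected_loss, min(cumulative)


# Hypothetical sequence: action 0 never loses, action 1 always loses.
T = 100
loss_rows = [[0.0, 1.0]] * T
eta = math.sqrt(2 * math.log(2) / T)
expected_loss, best_loss = run_hedge(loss_rows, eta)
regret = expected_loss - best_loss
```

On this sequence the regret stays well below $\sqrt{2 T \ln 2}$, in line with the guarantee above.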

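For the partial information setting, the EXP3 algorithm of Auer et al. (2002) can be thought of as a variant of the Hedge algorithm with importance weighting. In particular, at time $t+1$, EXP3 picks each action $x$ with probability $p^{t+1}_x \propto \exp(-\eta \tilde{F}_t(x))$, for $\eta = \sqrt{2 \ln |X| / (T |X|)}$ and

$$\tilde{F}_t(x) = \sum_{s=1}^{t} \frac{\mathbb{1}(x_s = x) \, f(x, y_s)}{p^s_x}. \tag{1}$$

In other words, EXP3 is similar to Hedge, except that instead of taking into account the total loss of an action, $F_t(x)$, it takes into account an estimate of the loss, $\tilde{F}_t(x)$.

EXP3 has regret $\mathbb{E}[\mathrm{Reg}_T] \le O(\sqrt{T |X| \ln |X|})$.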

## 3 Problem Formulation

In this section, we formulate the question of how one can design a weighting scheme that effectively weights the rankings of voters based on the history of their votes and the performance of the selected alternatives.

We consider a setting where $n$ voters participate in a sequence of elections that are decided by a known voting rule $f$. In each election, voters submit their rankings over a different set of alternatives so as to elect a winner. Given an adversarial sequence of voters’ rankings $\boldsymbol{\sigma}^1, \dots, \boldsymbol{\sigma}^T$ and alternative losses $\ell^1, \dots, \ell^T$ over a span of $T$ elections, the best voter is the one whose rankings lead to the election of the winners with smallest loss overall. We call this voter the best voter in hindsight. When such a voter is known a priori, the weighting scheme would do well to follow the rankings of this voter throughout the sequence of elections. In this case, the overall expected loss of the alternatives chosen under this weighting scheme is

$$\min_{i \in [n]} \sum_{t=1}^{T} L_f(\pi_{\boldsymbol{\sigma}^t, e_i}, \ell^t). \tag{2}$$

However, when the sequence of elections is not known a priori, the best voter is not known either. In this case, the weighting scheme has to take an online approach to weighting the voters’ rankings. That is, at each time step $t \le T$, the weighting scheme chooses a weight vector $w^t$, possibly at random, to weight the rankings of the voters. After the election is held, the weighting scheme receives some feedback regarding the quality of the alternatives in that election, typically in the form of the loss of the elected alternative or that of all alternatives. Using the feedback, the weighting scheme then re-weights the voters’ rankings based on their performance so far. In this case, the total expected loss of the weighting scheme is

$$\sum_{t=1}^{T} L_f(\pi_{\boldsymbol{\sigma}^t, w^t}, \ell^t).$$

The type of the feedback is an important factor in designing a weighting scheme. Analogously to the online learning models described in Section 2.2, we consider two types of feedback, full information and partial information. In the full information case, after a winner is selected at time $t$, the quality of all alternatives and rankings of the voters at that round are revealed to the weighting scheme. Note that this information is sufficient for computing the loss of each voter’s rankings so far. On the other hand, in the partial information setting only the loss of the winner is revealed. More formally, in the full information setting the choice of $w^{t+1}$ can depend on $\boldsymbol{\sigma}^1, \dots, \boldsymbol{\sigma}^t$ and $\ell^1, \dots, \ell^t$, while in the partial information setting it can only depend on $\boldsymbol{\sigma}^1, \dots, \boldsymbol{\sigma}^t$ and $\ell^s_{a_s}$ for $s \le t$, where $a_s$ is the alternative that won the election at time $s$.

Our goal is to design a weighting scheme that weights the rankings of the voters at each time step, and elects winners with overall expected loss that is almost as small as that of the best voter. We refer to the expected difference between these losses as the expected regret. That is,

$$\mathbb{E}[\mathrm{Reg}_T] \triangleq \mathbb{E}\left[\sum_{t=1}^{T} L_f(\pi_{\boldsymbol{\sigma}^t, w^t}, \ell^t) - \min_{i \in [n]} \sum_{t=1}^{T} L_f(\pi_{\boldsymbol{\sigma}^t, e_i}, \ell^t)\right],$$

where the expectation is taken over any additional source of randomness in the adversarial sequence or the algorithm. In particular, we seek a weighting scheme for which the average expected regret goes to zero as the time horizon $T$ goes to infinity, at a rate that is polynomial in the number of voters and alternatives. That is, we wish to achieve $\mathbb{E}[\mathrm{Reg}_T] = \mathrm{poly}(n, m) \cdot o(T)$. This is our version of a no-regret algorithm.

No doubt the reader has noted that the above problem formulation is closely related to the general setting of online learning. Using the language of online learning introduced in Section 2.2, the weight vector $w^t$ corresponds to the learner’s action $x_t$, the vote profile and alternative losses $(\boldsymbol{\sigma}^t, \ell^t)$ correspond to the adversary’s action $y_t$, the expected loss of the weighting scheme $L_f(\pi_{\boldsymbol{\sigma}^t, w^t}, \ell^t)$ corresponds to the loss of the learning algorithm $f(x_t, y_t)$, and the best-in-hindsight voter (or weight vector) refers to the best-in-hindsight action.

## 4 Randomized Weights

In this section, we develop no-regret algorithms for the full information and partial information settings. We essentially require no assumptions on the voting rule, but also impose no restrictions on the weighting scheme. In particular, the weighting scheme may be randomized, that is, the weights can be sampled from a distribution over weight vectors. This allows us to obtain general positive results.

As we just discussed, our setting is closely related to the classic online learning setting. Here, we introduce an algorithm analogous to Hedge that works in the full information setting of Section 3 and achieves a total regret of $O(\sqrt{T \ln n})$.

###### Theorem 4.1.

For any anonymous voting rule $f$ and $n$ voters, Algorithm 1 has regret $O(\sqrt{T \ln n})$ in the full information setting.

###### Proof Sketch.

At a high level, this algorithm only considers weight vectors that correspond to a single voter. At every time step, the algorithm chooses a distribution over such weight vectors and applies the voting rule to one such weight vector that is drawn at random from this distribution. This is equivalent to applying the Hedge algorithm to a set of actions, each of which is a weight vector that corresponds to a single voter. That is,

$$\mathbb{E}\left[\sum_{t=1}^{T} L_f(\pi_{\boldsymbol{\sigma}^t, w^t}, \ell^t)\right] = \mathbb{E}_{i_t \sim p^t}\left[\sum_{t=1}^{T} L_f(\pi_{\boldsymbol{\sigma}^t, e_{i_t}}, \ell^t)\right].$$

The theorem follows by noting that the loss of the benchmark weighting scheme (see Equation (2)) is the smallest loss that one can get from following one such weight vector. That is, by Proposition 2.2, the total expected regret is

$$\mathbb{E}\left[\sum_{t=1}^{T} L_f(\pi_{\boldsymbol{\sigma}^t, w^t}, \ell^t)\right] - \min_{i \in [n]} \sum_{t=1}^{T} L_f(\pi_{\boldsymbol{\sigma}^t, e_i}, \ell^t) \le O(\sqrt{T \ln n}).$$
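The listing of Algorithm 1 is not reproduced in this text, so the following is only a sketch reconstructed from the proof sketch above: Hedge run over the $n$ single-voter weight vectors. The function names, the plurality rule used as the exogenous voting rule, and the two-voter example are all assumptions for illustration.

```python
import math
import random


def plurality(profile):
    """Hypothetical exogenous rule: plurality winner, ties broken alphabetically.
    Takes an anonymous profile (ranking -> fraction), returns a distribution."""
    tally = {}
    for ranking, fraction in profile.items():
        tally[ranking[0]] = tally.get(ranking[0], 0.0) + fraction
    winner = min(tally, key=lambda alt: (-tally[alt], alt))
    return {winner: 1.0}


def voter_loss(rule, ranking, losses):
    """Expected loss when all the weight sits on one voter's ranking."""
    distribution = rule({tuple(ranking): 1.0})
    return sum(prob * losses[alt] for alt, prob in distribution.items())


def full_information_scheme(rule, rounds, eta, rng):
    """rounds: list of (votes, losses); returns chosen voters and voter losses."""
    n = len(rounds[0][0])
    cumulative = [0.0] * n
    chosen = []
    for votes, losses in rounds:
        baseline = min(cumulative)
        weights = [math.exp(-eta * (c - baseline)) for c in cumulative]
        i_t = rng.choices(range(n), weights=weights)[0]
        chosen.append(i_t)  # the weight vector used this round is e_{i_t}
        # Full information: every voter's loss can be computed after the round.
        for i in range(n):
            cumulative[i] += voter_loss(rule, votes[i], losses)
    return chosen, cumulative


# Hypothetical run: voter 0 always backs the good alternative a.
T = 200
rounds = [([("a", "b"), ("b", "a")], {"a": 0.0, "b": 1.0})] * T
eta = math.sqrt(2 * math.log(2) / T)
chosen, cumulative = full_information_scheme(plurality, rounds, eta, random.Random(0))
```

In this run the scheme quickly concentrates on voter 0, whose cumulative loss is zero. Sampling a single voter per round, rather than mixing weights, is what makes the loss linear in the sampling distribution and lets the Hedge analysis apply.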

Next, we introduce an algorithm for the partial information setting. One may wonder whether the above approach, i.e., reducing our problem to online learning and using a standard algorithm, directly extends to the partial information setting (with the EXP3 algorithm). The answer is that it does not. In particular, in the classic setting of online learning with partial information feedback, the algorithm can compute the estimated loss of the action it just played, because it observes that action's loss $f(x_t, y_t)$. In our problem setting, however, the weighting scheme only observes $\boldsymbol{\sigma}^t$ and $\ell^t_{a_t}$ for the specific alternative $a_t$ that was elected at this time. Since the losses of other alternatives remain unknown, the weighting scheme cannot even compute the expected loss of the specific voter $i_t$ it selected at time $t$, i.e., $L_f(\pi_{\boldsymbol{\sigma}^t, e_{i_t}}, \ell^t)$. Therefore, we cannot directly use the EXP3 algorithm by imagining that the voters are actions, as we do not obtain the partial information feedback that the algorithm requires.

Nevertheless, the algorithm we introduce here is inspired by EXP3. Fortunately, certain properties that the performance of EXP3 relies on still hold in our setting. In particular, EXP3 uses $\tilde{F}_t(x)$ to create an unbiased estimator of the true loss of action $x$ over $t$ time steps. As we show, Algorithm 2 also creates an unbiased estimator of the loss of voters in $T$ time steps, using $\tilde{L}^T$.
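The listing of Algorithm 2 is not reproduced in this text. The sketch below is reconstructed from the estimator used in the lemmas and proofs of this section (sample a voter $i_t \sim p^t$, sample the winner $a_t$ from the rule applied to that voter's ranking, and set $\tilde{\ell}^t_{i_t} = \ell^t_{a_t} / p^t_{i_t}$); it should be read as an assumption-laden illustration, with hypothetical function names and a hypothetical plurality rule.

```python
import math
import random


def plurality(profile):
    """Hypothetical exogenous rule: plurality winner, ties broken alphabetically."""
    tally = {}
    for ranking, fraction in profile.items():
        tally[ranking[0]] = tally.get(ranking[0], 0.0) + fraction
    winner = min(tally, key=lambda alt: (-tally[alt], alt))
    return {winner: 1.0}


def partial_information_scheme(rule, rounds, eta, rng):
    """rounds: list of (votes, losses). Only the elected alternative's loss is read."""
    n = len(rounds[0][0])
    estimates = [0.0] * n  # importance-weighted loss estimates, one per voter
    realized_loss = 0.0
    for votes, losses in rounds:
        baseline = min(estimates)
        weights = [math.exp(-eta * (est - baseline)) for est in estimates]
        total = sum(weights)
        p = [w / total for w in weights]
        i_t = rng.choices(range(n), weights=p)[0]
        distribution = rule({tuple(votes[i_t]): 1.0})
        alternatives = list(distribution)
        a_t = rng.choices(
            alternatives, weights=[distribution[a] for a in alternatives]
        )[0]
        observed = losses[a_t]  # the only loss feedback used
        realized_loss += observed
        estimates[i_t] += observed / p[i_t]  # all other voters get estimate 0
    return realized_loss, estimates


# Hypothetical run: voter 0 always backs the good alternative a.
T, n = 200, 2
rounds = [([("a", "b"), ("b", "a")], {"a": 0.0, "b": 1.0})] * T
eta = math.sqrt(2 * math.log(n) / (T * n))
realized_loss, estimates = partial_information_scheme(
    plurality, rounds, eta, random.Random(0)
)
```

Dividing the observed loss by the probability of having picked that voter is the importance-weighting step borrowed from EXP3; it keeps the estimate unbiased even though each round reveals a single number.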

###### Theorem 4.2.

For any anonymous voting rule $f$ and $n$ voters, Algorithm 2 has regret $O(\sqrt{T n \ln n})$ in the partial information setting.

Let us first establish a few crucial properties of Algorithm 2 in preparation for proving Theorem 4.2. In the next lemma, we show that $\tilde{\ell}^t$ creates an unbiased estimator of the expected loss of the weighting scheme. Similarly, we show that for any voter $i^*$, $\tilde{L}^T_{i^*}$ is an unbiased estimator for the loss that the weighting scheme would have received if it followed the rankings of voter $i^*$ throughout the sequence of elections.

###### Lemma 4.3.

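For any $t$ and any $i^* \in [n]$ we have

$$\mathbb{E}_{i_t, a_t}\left[\sum_{i=1}^{n} p^t_i \tilde{\ell}^t_i\right] = \mathbb{E}_{i_t}\left[L_f(\pi_{\boldsymbol{\sigma}^t, e_{i_t}}, \ell^t)\right] \quad \text{and} \quad \mathbb{E}_{i_t, a_t}\left[\tilde{L}^T_{i^*}\right] = \sum_{t=1}^{T} L_f(\pi_{\boldsymbol{\sigma}^t, e_{i^*}}, \ell^t),$$

where $i_t \sim p^t$ and $a_t \sim f(\pi_{\boldsymbol{\sigma}^t, e_{i_t}})$.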

###### Proof.

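For ease of notation, we suppress $t$ when it is clear from the context. First note that $\tilde{\ell}$ is zero in all of its elements, except for $\tilde{\ell}_{i_t} = \ell_{a_t} / p_{i_t}$. So,

$$\sum_{i=1}^{n} p_i \tilde{\ell}_i = p_{i_t} \tilde{\ell}_{i_t} = p_{i_t} \frac{\ell_{a_t}}{p_{i_t}} = \ell_{a_t}.$$

Therefore, we have

$$\mathbb{E}_{i_t, a_t}\left[\sum_{i=1}^{n} p_i \tilde{\ell}_i\right] = \mathbb{E}_{i_t, a_t}\left[\ell_{a_t}\right] = \mathbb{E}_{i_t}\left[L_f(\pi_{\boldsymbol{\sigma}, e_{i_t}}, \ell)\right].$$

For clarity of presentation, let $\tilde{\ell}^{i,a}$ be an alternative representation of $\tilde{\ell}^t$ when $i_t = i$ and $a_t = a$. Note that $\tilde{\ell}^{i,a}_{i^*} \neq 0$ only if $i = i^*$. We have

$$\begin{aligned}
\mathbb{E}_{i_t, a_t}\left[\tilde{L}^T_{i^*}\right]
&= \sum_{t=1}^{T} \mathbb{E}_{i_t, a_t}\left[\tilde{\ell}^{i_t, a_t}_{i^*}\right]
= \sum_{t=1}^{T} \sum_{i=1}^{n} p^t_i \, \mathbb{E}_{a \sim f(\pi_{\boldsymbol{\sigma}^t, e_i})}\left[\tilde{\ell}^{i, a}_{i^*}\right]
= \sum_{t=1}^{T} p^t_{i^*} \, \mathbb{E}_{a \sim f(\pi_{\boldsymbol{\sigma}^t, e_{i^*}})}\left[\frac{\ell^t_a}{p^t_{i^*}}\right] \\
&= \sum_{t=1}^{T} \mathbb{E}_{a \sim f(\pi_{\boldsymbol{\sigma}^t, e_{i^*}})}\left[\ell^t_a\right]
= \sum_{t=1}^{T} L_f(\pi_{\boldsymbol{\sigma}^t, e_{i^*}}, \ell^t).
\end{aligned}$$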

###### Lemma 4.4.

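For any $t$, we have

$$\mathbb{E}_{i_t, a_t}\left[\sum_{i=1}^{n} p^t_i \left(\tilde{\ell}^t_i\right)^2\right] \le n,$$

where $i_t \sim p^t$ and $a_t \sim f(\pi_{\boldsymbol{\sigma}^t, e_{i_t}})$.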

###### Proof.

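For ease of notation, we suppress $t$ when it is clear from the context. Since $\tilde{\ell}$ is zero in all of its elements, except for $\tilde{\ell}_{i_t} = \ell_{a_t} / p_{i_t}$, we have

$$\sum_{i=1}^{n} p_i \left(\tilde{\ell}_i\right)^2 = p_{i_t} \left(\tilde{\ell}_{i_t}\right)^2 = p_{i_t} \left(\frac{\ell_{a_t}}{p_{i_t}}\right)^2 = \frac{(\ell_{a_t})^2}{p_{i_t}}.$$

Therefore,

$$\mathbb{E}_{i_t, a_t}\left[\sum_{i=1}^{n} p_i \left(\tilde{\ell}_i\right)^2\right] = \mathbb{E}_{i_t, a_t}\left[\frac{(\ell_{a_t})^2}{p_{i_t}}\right] = \sum_{i=1}^{n} p_i \, \mathbb{E}_{a \sim f(\pi_{\boldsymbol{\sigma}, e_i})}\left[\frac{(\ell_a)^2}{p_i}\right] = \sum_{i=1}^{n} \mathbb{E}_{a \sim f(\pi_{\boldsymbol{\sigma}, e_i})}\left[(\ell_a)^2\right] \le n.$$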

###### Proof of Theorem 4.2.

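We use a potential function, given by $\Phi_t \triangleq -\frac{1}{\eta} \ln\left(\sum_{i=1}^{n} \exp(-\eta \tilde{L}^{t-1}_i)\right)$. We prove the claim by analyzing the expected increase in this potential function at every time step. Note that

$$\Phi_{t+1} - \Phi_t = -\frac{1}{\eta} \ln\left(\frac{\sum_{i=1}^{n} \exp(-\eta \tilde{L}^{t-1}_i - \eta \tilde{\ell}^t_i)}{\sum_{i=1}^{n} \exp(-\eta \tilde{L}^{t-1}_i)}\right) = -\frac{1}{\eta} \ln\left(\sum_{i=1}^{n} p^t_i \exp(-\eta \tilde{\ell}^t_i)\right). \tag{3}$$

Taking the expected increase in the potential function over the random choices of $i_t$ and $a_t$ for all $t \in [T]$, we have

$$\begin{aligned}
\mathbb{E}\left[\Phi_{T+1} - \Phi_1\right]
&= \sum_{t=1}^{T} \mathbb{E}_{i_t, a_t}\left[\Phi_{t+1} - \Phi_t\right] \\
&\ge \sum_{t=1}^{T} \mathbb{E}_{i_t, a_t}\left[-\frac{1}{\eta} \ln\left(\sum_{i=1}^{n} p^t_i \left(1 - \eta \tilde{\ell}^t_i + \frac{1}{2} (\eta \tilde{\ell}^t_i)^2\right)\right)\right] \\
&= \sum_{t=1}^{T} \mathbb{E}_{i_t, a_t}\left[-\frac{1}{\eta} \ln\left(1 - \eta \left(\sum_{i=1}^{n} p^t_i \tilde{\ell}^t_i - \frac{\eta}{2} \sum_{i=1}^{n} p^t_i (\tilde{\ell}^t_i)^2\right)\right)\right] \\
&\ge \sum_{t=1}^{T} \mathbb{E}_{i_t, a_t}\left[\sum_{i=1}^{n} p^t_i \tilde{\ell}^t_i - \frac{\eta}{2} \sum_{i=1}^{n} p^t_i (\tilde{\ell}^t_i)^2\right] \\
&\ge \mathbb{E}\left[\sum_{t=1}^{T} L_f(\pi_{\boldsymbol{\sigma}^t, e_{i_t}}, \ell^t)\right] - \frac{\eta T n}{2},
\end{aligned} \tag{4}$$

where the second transition follows from Equation (3) because for all $x \ge 0$, $e^{-x} \le 1 - x + \frac{x^2}{2}$, the fourth transition follows from $\ln(1 - x) \le -x$ for all $x < 1$, and the last transition holds by Lemmas 4.3 and 4.4. On the other hand, $\Phi_1 = -\frac{1}{\eta} \ln n$ and for any $i^* \in [n]$,

$$\Phi_{T+1} \le -\frac{1}{\eta} \ln\left(\exp(-\eta \tilde{L}^T_{i^*})\right) = \tilde{L}^T_{i^*}.$$

Therefore,

$$\mathbb{E}\left[\Phi_{T+1} - \Phi_1\right] \le \mathbb{E}\left[\tilde{L}^T_{i^*} + \frac{1}{\eta} \ln n\right] = \mathbb{E}\left[\sum_{t=1}^{T} L_f(\pi_{\boldsymbol{\sigma}^t, e_{i^*}}, \ell^t) + \frac{1}{\eta} \ln n\right]. \tag{5}$$

We can now prove the theorem by using Equations (4) and (5), and the parameter value $\eta = \sqrt{2 \ln n / (T n)}$:

$$\mathbb{E}\left[\sum_{t=1}^{T} L_f(\pi_{\boldsymbol{\sigma}^t, e_{i_t}}, \ell^t) - \min_{i \in [n]} \sum_{t=1}^{T} L_f(\pi_{\boldsymbol{\sigma}^t, e_i}, \ell^t)\right] \le \frac{1}{\eta} \ln n + \frac{\eta T n}{2} \le \sqrt{2 T n \ln n}.$$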
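The choice of step size in the last display can be sanity-checked numerically. The sketch below assumes the bound has the form $\frac{1}{\eta} \ln n + \frac{\eta T n}{2}$, as in the final inequality, and that the step size is the minimizer $\eta = \sqrt{2 \ln n / (T n)}$; the function names and the values of $T$ and $n$ are illustrative.

```python
import math


def regret_bound(eta, T, n):
    """Right-hand side of the final inequality: ln(n)/eta + eta*T*n/2."""
    return math.log(n) / eta + eta * T * n / 2


def tuned_eta(T, n):
    """Step size that balances the two terms of the bound."""
    return math.sqrt(2 * math.log(n) / (T * n))


T, n = 10_000, 5  # hypothetical horizon and number of voters
eta = tuned_eta(T, n)
bound_at_tuned_eta = regret_bound(eta, T, n)
closed_form = math.sqrt(2 * T * n * math.log(n))
```

At the tuned step size the two terms are equal and the bound coincides with $\sqrt{2 T n \ln n}$; any other step size gives a larger value.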

## 5 Deterministic Weights

One of the key aspects of the weighting schemes we used in the previous section is randomization. In such weighting schemes, the weights of the voters not only depend on their performance so far, but also on the algorithm’s coin flips. In practice, voters would most likely prefer weighting schemes that depend only on their past performance, and are therefore easier to interpret.

In this section, we focus on designing weighting schemes that are deterministic in nature. Formally, a deterministic weighting scheme is an algorithm that at time step $t$ deterministically chooses one weight vector $w^t$ based on the history of play, i.e., sequences $\boldsymbol{\sigma}^1, \dots, \boldsymbol{\sigma}^{t-1}$, $\ell^1, \dots, \ell^{t-1}$, and $a_1, \dots, a_{t-1}$. In this section, we seek an answer to the following question: “For which voting rules is there a no-regret deterministic weighting scheme?” In contrast to the results established in the previous section, we find that the properties of the voting rule play an important role here. In the remainder of this section, we show possibility and impossibility results for the existence of such weighting schemes under randomized and deterministic voting rules.

### 5.1 Deterministic Voting Rules

We begin our search for deterministic weighting schemes by considering deterministic voting rules. Note that in this case the winning alternatives are induced deterministically by the weighting scheme, so the weight vector $w^t$ should be deterministically chosen based on the sequences $\boldsymbol{\sigma}^1, \dots, \boldsymbol{\sigma}^{t-1}$ and $\ell^1, \dots, \ell^{t-1}$. We establish an impossibility result: Essentially no deterministic weighting scheme is no-regret for a deterministic voting rule. Specifically, we show that a deterministic no-regret weighting scheme exists for a deterministic voting rule if and only if the voting rule is constant on unanimous profiles.

###### Definition 5.1.

A voting rule $f$ is constant on unanimous profiles if and only if

$$\forall \sigma,\sigma' \in \mathcal{L}(A),\quad f(e_{\sigma}) = f(e_{\sigma'}),$$

where $e_{\sigma}$ denotes the anonymous vote profile that has all of its weight on ranking $\sigma$.
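The definition can be checked by brute force for small sets of alternatives. The sketch below is illustrative only: it assumes an anonymous profile is represented as a dictionary from rankings to weights, and the two rules (`plurality` with a fixed tie-break, and `constant_rule`) are hypothetical examples rather than rules taken from the paper.

```python
from itertools import permutations

def unanimous_profile(ranking):
    """Anonymous profile with all of its weight on a single ranking."""
    return {tuple(ranking): 1.0}

def is_constant_on_unanimous(rule, alternatives):
    """True if `rule` returns the same winner on every unanimous profile."""
    outcomes = {rule(unanimous_profile(r)) for r in permutations(alternatives)}
    return len(outcomes) == 1

def plurality(profile):
    """Weighted plurality; ties go to the smallest alternative label (assumed tie-break)."""
    scores = {}
    for ranking, weight in profile.items():
        scores[ranking[0]] = scores.get(ranking[0], 0.0) + weight
    return min(scores, key=lambda a: (-scores[a], a))

def constant_rule(profile):
    """Hypothetical rule that ignores the votes entirely."""
    return "a"

alternatives = ["a", "b", "c"]
print(is_constant_on_unanimous(plurality, alternatives))
print(is_constant_on_unanimous(constant_rule, alternatives))
```

Plurality elects the top choice of a unanimous profile, so it takes different values on different unanimous profiles, whereas a rule that ignores the votes is trivially constant on them.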

###### Theorem 5.2.

For any deterministic voting rule $f$, a deterministic weighting scheme with regret $o(T)$ exists if and only if $f$ is constant on unanimous profiles. This is true in both the full information and partial information settings.

###### Proof.

We first prove that for any voting rule that is constant on unanimous profiles there exists a deterministic weighting scheme that is no-regret. Consider such a voting rule $f$ and a simple deterministic weighting scheme that uses weight vector $e_1$ for every time step $t$ (so it does not use feedback, whether full or partial, at all). Note that at each time step $t$ and for any voter $i \in [n]$,

$$f(\pi_{\sigma^t,e_1}) = f(e_{\sigma^t_1}) = f(e_{\sigma^t_i}) = f(\pi_{\sigma^t,e_i}),$$

where the second transition holds because $f$ is constant on unanimous profiles. As a result, $\sum_{t=1}^{T} L_f(\pi_{\sigma^t,e_1},\ell^t) = \sum_{t=1}^{T} L_f(\pi_{\sigma^t,e_i},\ell^t)$. In words, the total loss of the weighting scheme is the same as the total loss of any individual voter, so this weighting scheme has zero regret.

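Next, we prove that if $f$ is not constant on unanimous profiles then for any deterministic weighting scheme there is an adversarial sequence of $\sigma^t$ and $\ell^t$ that leads to regret of at least $T/n$, i.e., linear in $T$, even in the full information setting. Take any such voting rule $f$ and let $\tau,\tau' \in \mathcal{L}(A)$ be such that $f(e_{\tau}) \neq f(e_{\tau'})$. At time $t$, the adversary chooses $\sigma^t$ and $\ell^t$ based on the deterministic weight vector $w^t$ as follows: The adversary sets $\sigma^t$ to be such that $\sigma^t_1 = \tau$ and $\sigma^t_j = \tau'$ for all $j \neq 1$. Let alternative $a^t$ be the winner of profile $\pi_{\sigma^t,w^t}$, i.e., $a^t = f(\pi_{\sigma^t,w^t})$. The adversary sets $\ell^t_{a^t} = 1$ and $\ell^t_x = 0$ for all $x \neq a^t$. Therefore, the weighting scheme incurs a loss of $1$ at every step, and its total loss is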

$$\sum_{t=1}^{T} L_f(\pi_{\sigma^t,w^t},\ell^t) = \sum_{t=1}^{T} \ell^t_{a^t} = T.$$

Let us consider the total loss that the ranking of any individual voter incurs. By design, for any $j \neq 1$,

$$f(\pi_{\sigma^t,e_1}) = f(e_{\tau}) \neq f(e_{\tau'}) = f(\pi_{\sigma^t,e_j}).$$

Therefore, for at least one voter $i \in [n]$, $f(\pi_{\sigma^t,e_i}) \neq a^t$. Note that such a voter receives loss of $0$, so the combined loss of all voters is at most $n-1$. Over all time steps, the total combined loss of all voters is at most $(n-1)T$. As a result, the best voter incurs a loss of at most $(n-1)T/n$, i.e., the average loss.

We conclude that the regret of the weighting scheme is

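$$\mathrm{Reg}_T = \sum_{t=1}^{T} L_f(\pi_{\sigma^t,w^t},\ell^t) - \min_{i\in[n]}\sum_{t=1}^{T} L_f(\pi_{\sigma^t,e_i},\ell^t) \ge T - \frac{(n-1)T}{n} = \frac{T}{n}.$$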
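The adversarial construction in this proof can be simulated directly. The sketch below makes several illustrative assumptions that are not part of the paper: the deterministic voting rule is weighted plurality with a fixed tie-break, $\tau$ and $\tau'$ are represented only by their top alternatives `"a"` and `"b"`, and the deterministic weighting scheme is a normalized exponential of past voter losses (`hedge_weights`, a hypothetical helper). Any other deterministic scheme could be substituted.

```python
import math

def plurality(weights, tops):
    """Deterministic weighted plurality over voters' top choices;
    ties go to the smallest alternative label (assumed tie-break)."""
    scores = {}
    for w, top in zip(weights, tops):
        scores[top] = scores.get(top, 0.0) + w
    return min(scores, key=lambda a: (-scores[a], a))

def hedge_weights(cum_losses, eta):
    """Deterministic weights: normalized exponential of past voter losses."""
    raw = [math.exp(-eta * loss) for loss in cum_losses]
    total = sum(raw)
    return [r / total for r in raw]

def run_adversary(n, T, eta=0.1):
    """Play the adversary for T rounds and return the scheme's regret."""
    cum_voter_loss = [0.0] * n
    scheme_loss = 0.0
    for _ in range(T):
        w = hedge_weights(cum_voter_loss, eta)
        # Voter 0 votes tau (top "a"); every other voter votes tau' (top "b").
        tops = ["a"] + ["b"] * (n - 1)
        winner = plurality(w, tops)
        # Loss 1 on the alternative elected under w, loss 0 elsewhere.
        loss = {"a": 0.0, "b": 0.0}
        loss[winner] = 1.0
        scheme_loss += loss[winner]
        for i in range(n):
            # With all weight on voter i, plurality elects voter i's top choice.
            cum_voter_loss[i] += loss[tops[i]]
    return scheme_loss - min(cum_voter_loss)

n, T = 4, 1000
regret = run_adversary(n, T)
print(regret, T / n)
```

Because the adversary reads the weight vector before choosing the losses, the scheme loses 1 in every round regardless of how the weights are updated, which is why the regret stays linear in $T$.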

### 5.2 Randomized Voting Rules

Theorem 5.2 indicates that we need to allow randomness (either in the weighting scheme or in the voting rule) if we wish to have no-regret guarantees. As stated before, we would like to have a deterministic weighting scheme so that the weights of voters are not decided by coin flips. This leaves us with no choice other than having a randomized voting rule. Nonetheless, one might argue in favor of having a deterministic voting rule and a randomized weighting scheme, claiming that it is equivalent because the randomness has simply been shifted from the voting rule to the weights. To that imaginary critic we say that allowing the voting rule to be randomized makes it possible to achieve strategyproofness (see Section 2.1), which cannot be satisfied by a deterministic voting rule.

The next theorem shows that for any voting rule that is a distribution over unilaterals there exist deterministic weighting schemes that are no-regret. Recall that any randomized positional scoring rule can be represented as a distribution over unilaterals, hence the theorem allows us to design a no-regret weighting scheme for any randomized positional scoring rule.

The weighting schemes that we use build on Algorithms 1 and 2 directly. In more detail, we consider deterministic weighting schemes that at time $t$ use weight vector $p^t$ and a randomly drawn candidate $a^t \sim f(\pi_{\sigma^t,p^t})$, where $p^t$ is computed according to Algorithm 1 or 2. The key insight behind these weighting schemes is that, as we will show, if $f$ is a distribution over unilaterals, we have

$$\mathbb{E}_{i\sim p^t}\left[f(\pi_{\sigma^t,e_i})\right] = f(\pi_{\sigma^t,p^t}), \tag{6}$$

where the left-hand side is a vector of expectations. That is, the outcome of the voting rule $f(\pi_{\sigma^t,p^t})$ can be alternatively implemented by applying the voting rule on the ranking of voter $i$ that is drawn at random from the distribution $p^t$. This is exactly what Algorithms 1 and 2 do. Therefore, the deterministic weighting schemes induce the same distribution over alternatives at every time step as their randomized counterparts, and achieve the same regret.
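Equation (6) can be verified numerically on a small instance. The following sketch uses exact rational arithmetic and assumes, purely for illustration, that the $j$-th unilateral maps a ranking to the alternative in its $j$-th position (as a randomized positional scoring rule would); the rankings, the weight vector `p`, and the mixture `q` are made-up values.

```python
from fractions import Fraction

def anonymous_profile(rankings, weights):
    """Anonymous profile: total weight placed on each distinct ranking."""
    profile = {}
    for ranking, w in zip(rankings, weights):
        profile[ranking] = profile.get(ranking, Fraction(0)) + w
    return profile

def unilateral_mixture(profile, q):
    """f(pi) = sum_j q_j * sum_tau pi_tau * e_{h_j(tau)}, with the
    illustrative choice h_j(tau) = alternative in position j of tau."""
    dist = {}
    for ranking, mass in profile.items():
        for j, q_j in enumerate(q):
            a = ranking[j]
            dist[a] = dist.get(a, Fraction(0)) + q_j * mass
    return dist

rankings = [("a", "b", "c"), ("b", "c", "a"), ("b", "a", "c")]
p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]  # weight vector (assumed)
q = [Fraction(2, 3), Fraction(1, 3), Fraction(0)]     # mixture over positions (assumed)

# Left-hand side of (6): draw voter i ~ p and apply f to that voter alone.
lhs = {}
for i, p_i in enumerate(p):
    unit = [Fraction(int(k == i)) for k in range(len(p))]
    single = unilateral_mixture(anonymous_profile(rankings, unit), q)
    for a, prob in single.items():
        lhs[a] = lhs.get(a, Fraction(0)) + p_i * prob

# Right-hand side of (6): apply f once to the p-weighted profile.
rhs = unilateral_mixture(anonymous_profile(rankings, p), q)
print(lhs == rhs)
```

The two sides agree because this form of $f$ is linear in the anonymous profile, which is the property the proof of Theorem 5.3 relies on.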

###### Theorem 5.3.

For any voting rule that is a distribution over unilaterals, there exist deterministic weighting schemes with regret of $O(\sqrt{T\ln n})$ and $O(\sqrt{Tn\ln n})$ in the full-information and partial-information settings, respectively.

###### Proof.

Let $f$ be a distribution over $k$ unilaterals with corresponding probabilities $q_1,\dots,q_k$. Also, let $h_j$ denote the function corresponding to the $j$-th unilateral, for $j \in [k]$. We first prove Equation (6). For ease of exposition we suppress $t$ in the notation when it is clear from the context. Furthermore, let $\pi^i = \pi_{\sigma,e_i}$. It holds that

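$$\mathbb{E}_{i\sim p^t}\left[f(\pi_{\sigma^t,e_i})\right] = \sum_{i=1}^{n} p^t_i f(\pi^i) = \sum_{i=1}^{n} p^t_i \sum_{j=1}^{k} q_j \sum_{\tau\in\mathcal{L}(A)} \pi^i_{\tau}\, e_{h_j(\tau)} = \sum_{i=1}^{n} p^t_i \sum_{j=1}^{k} q_j\, e_{h_j(\sigma_i)},$$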

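where the last equality follows by the fact that $\pi^i_{\sigma_i} = 1$ and $\pi^i_{\tau} = 0$ for any $\tau \neq \sigma_i$. Moreover, let $\pi = \pi_{\sigma,p^t}$, then

$$f(\pi_{\sigma^t,p^t}) = \sum_{j=1}^{k} q_j \sum_{\tau\in\mathcal{L}(A)} \pi_{\tau}\, e_{h_j(\tau)} = \sum_{j=1}^{k} q_j \sum_{\tau\in\mathcal{L}(A)} \sum_{i:\,\sigma_i=\tau} p^t_i\, e_{h_j(\tau)} = \sum_{i=1}^{n} p^t_i \sum_{j=1}^{k} q_j\, e_{h_j(\sigma_i)}.$$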

Now that we have established Equation (6), we use it to conclude that

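$$\sum_{t=1}^{T} L_f(\pi_{\sigma^t,p^t},\ell^t) - \min_{i\in[n]}\sum_{t=1}^{T} L_f(\pi_{\sigma^t,e_i},\ell^t) = \mathbb{E}\left[\sum_{t=1}^{T} L_f(\pi_{\sigma^t,e_{i^t}},\ell^t) - \min_{i\in[n]}\sum_{t=1}^{T} L_f(\pi_{\sigma^t,e_i},\ell^t)\right],$$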

where the expectation is taken over the choice of $i^t \sim p^t$ for all