Voting has proven over history to be an effective way to reach collective decisions despite irreconcilable preferences. However, voting schemes have traditionally been designed to handle a tractable set of alternatives. In particular, mechanisms like the majority judgment [BL11] or the randomized Condorcet method [Hoa17] typically require voters to provide ballots whose size is at least linear in the number of alternatives, and demand a computation time that is polynomial in this number. Such solutions may be prohibitive in modern applications where the number of alternatives is in the thousands or billions, e.g. when electing the best movie of the year, the best paper of a conference or the best text of law to implement. In such contexts, voting becomes inevitably sparse, as voters typically only judge a small fraction of all alternatives.
Sparsity is very challenging because it raises two major issues: preference scaling and Byzantine vulnerability. To illustrate these issues, consider the case of scientific peer reviewing. Different reviewers might adopt very distinct reviewing styles. Some junior reviewers might use only modest judgments, e.g. weak accept/reject, while other reviewers may much more frequently use definitive judgments, e.g. strong accept/reject. Meanwhile, some may be systematically enthusiastic, e.g. only rarely suggest reject, while others may be consistently harsh, and almost always recommend rejection. The resulting acceptance decision of a paper may thus depend more on the reviewing styles of the reviewers assigned to the paper, rather than on the actual quality of the paper. This phenomenon is exacerbated by the fact that the assignment of papers to reviewers is rarely random. In practice, a paper is more likely to be reviewed by someone whose expertise is close to the paper’s focus. Some reviewers may also strongly prefer reviewing top quality papers only, while other reviewers may assign a lot of importance to reviewing and rejecting bad papers. Overall, this means that some papers are much more likely to attract reviewers with a given reviewing style, inducing a systematic unfairness in the peer reviewing system. Basically, robust sparse voting requires protections against such biases.
On the other hand, especially when the number of alternatives far exceeds what all honest voters combined can score, we must expect the existence of alternatives that barely any honest voter scores. Most scores received by such alternatives may then come from malicious voters. In the parlance of distributed computing, we shall call such voters Byzantine. Naive solutions, like computing the median of reported scores, would be extremely vulnerable to Byzantine voters on hardly scored alternatives. Robust sparse voting also requires protections against such attacks.
Besides these aspects, different reviewers may have different levels of expertise to judge papers adequately, or may spend different amounts of attention to evaluate carefully. A robust sparse voting mechanism should also be uncertainty-aware and account for such variations between reviewers. Ideally, such a mechanism should also be able to output a measure of reviewers’ disagreement on the value of a paper, to better reflect reviewers’ ranges of assessment. Finally, some use cases may require privacy guarantees to enable reviewers to speak freely.
We first characterize the problem of robust sparse (cardinal) voting. Essentially, we precisely pose the problem of robustly assigning scores to alternatives, based on a partial reporting of voters’ preferred scores, highlighting that this is an important, challenging and yet neglected question. While formalizing it, we identify two key desirable properties for any robust sparse voting mechanism. Namely, we precisely capture the notions of sparse unanimity, which demands that any unanimous preference be recovered even when the voters express their preferences with very different “reviewing styles”, and -Byzantine resilience, which demands that each voter have only a bounded effect on any alternative’s global score.
We then show that satisfying either of these properties is nontrivial. Our first key contribution is to prove that voting systems based on individually normalized preferences inherently fail to satisfy sparse unanimity. This impossibility theorem highlights the need for collaborative preference normalization; however, such a normalization can be highly vulnerable to Byzantine voters. Designing a Byzantine-resilient collaborative preference normalization is a highly nontrivial challenge that we also highlight. In fact, we stress that naive solutions, based on identifying anchor alternatives to find a common preference scaling, or on identifying groups of mutually comparable voters, fail to be Byzantine resilient. Our second contribution is to address Byzantine resilience through new parameterized robust statistics primitives, which we call the Quadratically Regularized Median (abbreviated QrMed) and the Byzantine-Robustified Mean (abbreviated BrMean). Remarkably, both primitives guarantee that any single voter will only have a bounded effect on their output. Moreover, QrMed fits the intuitively appealing “one person, one unit force” principle, while BrMean guarantees the exact computation of the mean under favorable conditions.
Our third contribution is to leverage BrMean to construct a Byzantine-resilient collaborative preference normalization scheme, which allows us to design a new sparse cardinal voting algorithm we call Mehestan (Mehestan, which literally means “the great place” in Old Persian, is regarded as one of the earliest proto-parliaments; it dates back to 247 BC, at the time of the Parthian Empire, and was located in ancient Iran). Remarkably, we prove that Mehestan guarantees both Byzantine resilience and sparse unanimity. More precisely, we show that any voter can have at most a bounded, parametrizable effect on Mehestan’s output, and that Mehestan recovers any unanimous and comparable scores when sufficiently many voters participate.
Our fourth contribution is to empirically evaluate Mehestan on synthetic data, against alternative sparse voting algorithms. Essentially, we evaluate the ability of Mehestan to recover a unanimous preference under more or less adversarial settings. This allows us to explore Mehestan’s performance when the assumptions of sparse unanimity are violated. In particular, we study Mehestan under incomplete voter comparability, small density (voters score only a few alternatives) and adversarial sparsity (some voters mostly score a biased subset of alternatives). We also test the robustness of Mehestan by studying the correlation between its output on a given set of inputs and its output on a sparsification of the same inputs.
Our paper also makes several insightful remarks on privacy guarantees, uncertainty-awareness and (Byzantine-resilient) polarization measures. Interestingly, because Mehestan is Byzantine-resilient, it is simple to adapt Mehestan to guarantee -differential privacy, by adding to each reported score a Laplacian noise whose scale is proportional to . Furthermore, Mehestan’s underlying primitives can be naturally adapted to include users’ reported uncertainties on their votes, and then to derive -Byzantine resilient measures of uncertainty on Mehestan’s computed scores. In fact, our robust statistics toolbox also allows us to robustly measure how polarizing each alternative is.
To understand how widespread and important the robust sparse voting problem is, it suffices to ponder the billions of ethical dilemmas that today’s large-scale Internet algorithms are asked to address on a daily basis. Namely, whenever a user searches “election frauds”, “climate change” or “vaccines” on YouTube, Facebook or Amazon, an algorithm is faced with a potentially life-or-death dilemma over which content ought to be recommended first to the user. This recommendation basically leverages users’ upvotes, dislikes or 5-star ratings, to then determine what to suggest to billions of other users. Crucially, such voting schemes are inevitably sparse. Indeed, most alternatives have been scored by only a small fraction of all users; and some alternatives might have been scored by barely any authentic user.
In this context, Byzantine resilience for such online sparse voting is critical. Social media have become information battlegrounds [Ata19, BBH21], and their recommendation algorithms have been weaponized by all sorts of private and public actors [Sat21, Yue19], many of which leverage troll farms to fabricate misleading online activities [BH19, NHK19, Woo20], or even simply exploit the vulnerabilities of social media advertisement systems [ELM20]. The extent of such disinformation campaigns is arguably well illustrated by the whopping 15 billion fake accounts that Facebook reportedly removed in only two years [Dol21]. Unfortunately, as revealed by the infamous Facebook Files [inv21], today’s opaque moderation, recommendation and ad-targeting algorithms seem successfully manipulated by disinformation and hate speech campaigns. Instead, if designed properly, a robust sparse voting mechanism might allow genuine users to collaboratively identify top quality content of public utility, and to amplify it at scale [HFJ21]. However, for such a mechanism to work properly, in addition to being Byzantine resilient, it seems critical that it also be at least sparsely unanimous. This is unfortunately far from trivial to guarantee, given that most users mostly consume only a non-random, biased subset of all alternatives, and that they inevitably use very different “reviewing styles”.
The robust sparse voting problem is arguably more general still.
In particular, safe and ethical natural language processing can be framed as a robust sparse voting problem,
where the set of alternatives is now the set of all possible texts that may be generated in a given context.
This is particularly well illustrated by the Delphi project [JHB21], which aimed to assign a moral judgment to any user-generated sentence.
Typically, when prompted on “killing a bear”, Delphi replies “it’s wrong” (https://delphi.allenai.org/?a1=Killing+a+bear).
However, it is important to stress that Delphi’s reply did not emerge out of nothing;
it essentially results from generalizing the moral judgments of the humans that were queried to construct Delphi’s training dataset.
Crucially, any such judgment, or any text completion, must depend on the prompt, also known as the context.
Voting on what a language model should output in a given context would then amount to scoring any possible text completion in this context.
Yet the number of possible text completions is exponential in the number of words needed for text completion.
If we are to consider paragraph-long replies, which is needed for more complex prompts like “how robust is scientific peer reviewing?”, scoring all possible text completions is vastly intractable.
Any user’s scoring will then be inevitably (extremely) sparse; any user will fail to even consider most imaginable text completions. (In today’s models, a user “votes” for a text completion when “completing” the context. Word embedding then allows this vote to be generalized, by considering that the user would positively score similar text completions, where similarity is defined through word embedding. While this generalization might be reliable for few-word completions, it is untrustworthy for paragraph completions. It is also questionable whether treating users’ natural completions as votes on what language models ought to output, without the users’ knowledge and informed consent, is safe and ethical. The design of an actual language model is more challenging still, as it must address all contexts, whose number is exponential in the number of words defining the contexts.) Thus, any safe and ethical language model must address the robust sparse voting problem. While Mehestan is unfit to tackle the extreme sparsity of language model design, our work arguably exposes how challenging addressing it is.
Structure of the paper.
Section 2 formalizes the robust sparse voting problem. Section 3 stresses the difficulty of the problem by proving the impossibility of sparse unanimity for individually scaled preferences. Section 4 introduces new Byzantine-resilient primitives, and proves their desirable properties, like asymptotic correctness. Section 5 introduces Mehestan, and proves that it guarantees Byzantine resilience and sparse unanimity. Section 6 presents the results of our empirical evaluation of Mehestan on a synthetic dataset, under adversarial sparsification. Section 7 extends Mehestan to guarantee additional desirable properties, like privacy, uncertainty awareness and polarization measurement. Section 8 reviews related work in social choice theory, collaborative filtering, recommender systems and Byzantine resilience. The appendices contain complete proofs.
2 The robust sparse voting problem
We consider a set of voters, and a set of alternatives to score. Each voter is assigned a nonnegative voting right , and is asked to provide scores for different alternatives. However, we assume that their vote may be sparse. In other words, they may only report partial vectors , for a subset of all alternatives, which might verify . We denote by the union of the sets , for , and by the tuple of voters’ reported scores. Similarly, denotes the family of voters’ voting rights, with computing the total of allocated voting rights.
Following [VNM53], we assume that each voter ’s cardinal preference is defined up to a positive affine transformation. Namely, we denote , if and only if, the two vectors score the same subset of alternatives, and there exists and such that, for all alternatives , we have . Equivalently, this amounts to saying that the two partial score vectors are perfectly correlated.
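To make this equivalence concrete, the following minimal numerical check (an illustration of ours, not part of the formal development) represents each partial score vector as a dict mapping scored alternatives to scores, and tests for a positive affine map between two such vectors:

```python
def affine_equivalent(s, s_prime, tol=1e-9):
    """Check whether two partial score vectors (dicts: alternative -> score)
    score the same subset of alternatives and differ by a positive affine
    map s'[a] = alpha * s[a] + beta with alpha > 0."""
    if set(s) != set(s_prime):
        return False  # must score the same subset of alternatives
    alts = sorted(s)
    xs = [s[a] for a in alts]
    ys = [s_prime[a] for a in alts]
    # Find two alternatives with distinct scores to pin down alpha and beta.
    pairs = [(i, j) for i in range(len(alts)) for j in range(i + 1, len(alts))
             if abs(xs[i] - xs[j]) > tol]
    if not pairs:
        # s is constant: equivalent iff s' is constant too.
        return max(ys) - min(ys) <= tol
    i, j = pairs[0]
    alpha = (ys[i] - ys[j]) / (xs[i] - xs[j])
    beta = ys[i] - alpha * xs[i]
    return alpha > 0 and all(abs(alpha * x + beta - y) <= tol
                             for x, y in zip(xs, ys))
```

For instance, scores (1, 2, 3) and (10, 20, 30) on the same three alternatives are equivalent, while (1, 2) and (2, 1) are not, since the affine map reversing them has a negative slope.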
Our goal is to aggregate the different voters’ scores into global complete scores , for all alternatives . In other words, we aim to construct a vote algorithm , . The algorithm must be executable for any number of voters and alternatives. We also assume that it should treat any voter with zero voting right the same way as it treats nonvoters, i.e. must imply . In this paper, we focus on the two following desirable vote properties, which we define below.
2.1 Sparse unanimity
The first desirable property we study is what we call sparse unanimity. To state it, for any voter score , denote the (partial) score vector obtained by selecting only the entries from the partial vector . Moreover, let and define the set of pairs of alternatives that both voters and scored, each providing different scores to the two alternatives. Finally, define the set of alternatives which receive votes from voters whose cumulative voting rights are at least . We can now formalize sparse unanimity.
Vote is sparsely unanimous if, for any , there exists such that, whenever voters’ scores are -unanimous and comparable, if at least voting rights have been allocated, then Vote retrieves the unanimous preference for all -voted alternatives. More precisely, for any number of voters, any voting rights and any reported scores , the following three conditions
must imply .
Sparse unanimity essentially demands that, in very favorable conditions, Vote behave correctly. This can be regarded as a minimal desirable property that any sparse voting algorithm ought to satisfy. The surprise, however, is that sparse unanimity is actually nontrivial to guarantee. Indeed, our impossibility theorem essentially shows that (reasonable) coordinate-wise aggregations of voters’ scores with (reasonable) individual-based normalizations fail to be sparsely unanimous.
More interestingly, when combined with Byzantine resilience (defined below), sparse unanimity actually becomes very demanding. Indeed, since Byzantine resilience demands that the influence of any small subset of voters be limited, any Byzantine-resilient sparsely unanimous voting algorithm must guarantee a form of approximate sparse unanimity. Intuitively, any quasi-unanimous preference must then be approximately recovered.
2.2 Byzantine resilience
Our second desirable property under study is what we call Byzantine resilience. To formalize it, for any subset of (Byzantine) voters, denote the tuple of voting rights defined by for , and for . In other words, cancels the voting rights of non-Byzantine voters. Conversely, denote . Clearly, we have . Byzantine resilience then demands that canceling (or activating) the Byzantine voters’ voting rights only has a limited effect on the vote outcome, whose scale is bounded by the Byzantines’ total voting rights. Evidently, since we assume that Vote cannot distinguish Byzantine voters from genuine voters, our definition of Byzantine resilience must treat any subset identically.
Vote guarantees -Byzantine resilience if, for any inputs , a subgroup can affect each output of the vote by at most , i.e.
We say that Vote is Byzantine resilient, if there exists such that Vote is -Byzantine resilient.
The variable can be interpreted as a resilience measure. Intuitively, it protects the vote against Byzantine voters whose cumulative voting right is bounded by . More precisely, the Byzantine voters must have at least voting rights to move an alternative’s score by one unit. Put differently, this amounts to -Lipschitz continuity in voters’ voting rights (with respect to norm).
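The definition lends itself to an empirical property test. The sketch below is our illustration (the function names, the denoted resilience parameter W and the test instance are ours, not the paper’s): it checks, on one instance, whether cancelling a Byzantine subgroup’s voting rights moves each global score by more than the subgroup’s total rights divided by W.

```python
def is_byzantine_resilient(vote, weights, scores, byzantine, W):
    """Empirically check W-Byzantine resilience on one instance: cancelling
    the Byzantine subgroup's voting rights must move every alternative's
    global score by at most (the subgroup's total voting rights) / W.
    `vote` maps (weights, scores) to {alternative: global score};
    `weights` maps voter -> voting right;
    `scores` maps voter -> {alternative: reported score}."""
    budget = sum(weights[v] for v in byzantine) / W
    weights_wo = {v: (0.0 if v in byzantine else r) for v, r in weights.items()}
    out_full, out_wo = vote(weights, scores), vote(weights_wo, scores)
    return all(abs(out_full[a] - out_wo[a]) <= budget + 1e-9 for a in out_full)

def mean_vote(weights, scores):
    """Coordinate-wise weighted average, which fails the check (Section 4.1)."""
    alternatives = {a for s in scores.values() for a in s}
    out = {}
    for a in alternatives:
        voters = [v for v in scores if a in scores[v] and weights[v] > 0]
        total = sum(weights[v] for v in voters)
        out[a] = sum(weights[v] * scores[v][a] for v in voters) / total
    return out
```

On a toy instance where one unit-weight voter reports an extreme score, the weighted mean fails this check for any reasonable W, while a constant vote trivially passes; the point of Section 4 is to construct nontrivial aggregators that pass.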
Now, in practice, some cardinal voting systems ask voters to report scores within a given range of diameter , or rescale voters’ scores so that they lie in . Given this, Byzantine resilience might seem easy to guarantee, e.g. by classical aggregation methods (mean, median, trimmed mean…), for , or by defining and rescaling scores in with . However, both of these approaches actually fail to satisfy our definition, as here depends on . As it turns out, even designing a -Byzantine resilient mean estimator for bounded inputs is nontrivial; our paper introduces, to the best of our knowledge, the first estimators able to achieve this task. Having said this, we stress that our definition becomes a lot more interesting when is a fully parametrizable hyperparameter of the voting algorithm, which can be made to take arbitrarily large values (based on a potentially evolving threat model).
Our Byzantine resilience definition may then be particularly useful for two use cases. On the one hand, it seems critical for very sparse voting, especially when a few alternatives are scored by only a handful of honest voters. Indeed, absent Byzantine resilience, Byzantine voters could coordinate an attack to arbitrarily bias such an alternative’s score, and make it either maximally high or low. This is especially important in applications where alternatives with particularly low scores are censored or rejected, and where alternatives with particularly high scores are widely recommended and celebrated.
On the other hand, Byzantine resilience also seems important when voters provide (or update) their scores through time, and when the global scores of the alternatives are updated online. In this setting, indeed, the stability of global scores over time may be salient. In fact, more generally, our definition may have applications in any system where high volatility is harmful, and where adversaries want to amplify volatility, e.g. to discredit the system. This may particularly be the case for polls: malicious entities may want to give the impression that a certain group’s opinions are erratic, or that the group members disagree a lot more than they actually do.
3 Sparse unanimity requires collaborative preference scaling
In this section, we present our main impossibility theorem, which roughly says that any coordinate-wise vote with individually normalized scores must violate sparse unanimity. Our result highlights a central and nontrivial challenge for sparse voting, even in the absence of disagreeing voters. In spirit, we essentially prove that any scale-resilient sparse voting algorithm must leverage collaborative preference scaling. To formalize our impossibility theorem, we first need to introduce some assumptions on what seem to be reasonable individual-based normalizations and score aggregations.
3.1 Individual-based normalization
Intuitively, any robust sparse voting algorithm must make sure that its output will not be affected by the scaling voters use when reporting their scores. The simplest way to guarantee this is to perform a score normalization on voters’ reported scores. In this section, we define what a score normalization is, and what desirable properties it ought to have. First, we define a normalizer as a function that preserves voters’ preferences, i.e. such that, for any voter , we have . Below, we list other desirable properties.
A normalizer is individual-based if a voter’s normalized scores only depend on the voters’ reported scores, i.e.
A normalizer is scale-invariant if the normalized scores are independent of the preference scaling of the reported scores, i.e.
A normalizer is neutral if it treats all alternatives symmetrically. More precisely, denote the set of permutations of . For any and , we define if the entry of partial vector exists (otherwise is not defined). Similarly, we define . Neutrality then demands that
An individual-based normalizer is stable if the function is Lipschitz continuous on .
As an example of a (single-voter) normalizer, standardization is given by , where is the subset of alternatives scored by the score vector , is the mean of the scores and is their standard deviation, assuming . If , then we may simply set for all scored alternatives . Another popular normalizer is min-max normalization, given by , assuming (otherwise, we return for all scored alternatives ). Applying such single-voter normalizers to all voters clearly yields normalizers and .
Standardization and min-max normalizers are individual-based, scale-invariant, neutral and stable.
It is straightforward that they are individual-based, scale-invariant and neutral normalizers. Moreover, the min-max normalizer is clearly stable. To show that standardization is stable, consider . It follows that is continuously differentiable, and therefore Lipschitz continuous on . ∎
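For concreteness, the two single-voter normalizers can be sketched as follows. This is our illustration; where the paper’s fallback value for constant scores is elided above, we assume the normalizer returns 0 for every scored alternative.

```python
import math

def standardize(scores):
    """Standardization: subtract the mean of the voter's reported scores and
    divide by their standard deviation (assumed all-zero output when the
    standard deviation vanishes)."""
    vals = list(scores.values())
    mu = sum(vals) / len(vals)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in vals) / len(vals))
    if sigma == 0:
        return {a: 0.0 for a in scores}
    return {a: (x - mu) / sigma for a, x in scores.items()}

def min_max(scores):
    """Min-max normalization onto [0, 1] (assumed all-zero output for
    constant scores)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {a: 0.0 for a in scores}
    return {a: (x - lo) / (hi - lo) for a, x in scores.items()}
```

Both functions are invariant under positive affine transformations of the input: normalizing scores (1, 3) or (7, 11) yields the same output, which illustrates scale invariance.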
3.2 Score aggregation
A score aggregation is a function . Below, we identify properties that score aggregations may have.
A score aggregation Agg is coordinate-wise if the score computed for an alternative only depends on the reported scores for this alternative, i.e., for any alternative ,
A score aggregation Agg is anonymous if it treats alternatives symmetrically, i.e.
A score aggregation Agg is neutral if it treats voters symmetrically, i.e.
where the action of on is defined by .
A coordinate-wise score aggregation Agg is max-dominated if each of its coordinates is dominated by the aggregation, i.e.
A coordinate-wise score aggregation Agg is locally Lipschitz continuous if each of its coordinates is locally Lipschitz continuous with respect to the -norm.
A coordinate-wise score aggregation Agg is asymptotically correct if each of its coordinates can recover any score , once the input is a sufficiently large sequence converging to , i.e.
where is the vector whose entries all equal 1.
The mean, the median and QrMed are all coordinate-wise, anonymous, neutral, max-dominated, locally Lipschitz continuous and asymptotically correct score aggregations.
It is straightforward that the mean, the median and QrMed are coordinate-wise, anonymous, and neutral score aggregations. The mean and the median are asymptotically correct by Lemmas 17 and 18 respectively. QrMed is asymptotically correct by Theorem 2. The mean and the median are trivially max-dominated. QrMed is max-dominated since by Lemma 16. The mean is trivially Lipschitz continuous, and thus locally Lipschitz continuous. QrMed is Lipschitz continuous by Proposition 5, and thus locally Lipschitz continuous. Since , the median is also locally Lipschitz continuous. ∎
3.3 The impossibility theorem
We now state our impossibility theorem. We stress that the theorem does not assume any Byzantine voter; in fact, as demanded by sparse unanimity, it assumes that all voters are honest and express the same preference , albeit each voter only scores a (potentially small) subset of all alternatives.
Given any individual-based, scale-invariant, neutral and stable normalizer and any coordinate-wise, anonymous, neutral, max-dominated, locally Lipschitz continuous and asymptotically correct score aggregation Agg, fails to be sparsely unanimous.
Sketch of proof.
Our proof assumes , for all alternatives . We consider voters, each with a unit voting right, with voters reporting the scores of alternatives 1, 2 and . The assumptions on and Agg then imply that most alternatives will receive roughly the same score, especially for large enough. This then implies that the vote outputs scores that are hardly correlated with . In fact, the correlation goes to in the limit . Appendix A provides the full proof. ∎
Theorem 1 suggests that sparse unanimity cannot be achieved with individual score normalization. Instead, robust sparse voting seems to require adapting a voter’s score normalization based on other voters’ scores, i.e. the score normalization must be collaborative. It is critical to note that this may create a vulnerability in practice, as Byzantine voters may leverage their impact on other voters’ scores to scale these scores as best fits their purposes. Typically, whenever a voter prefers to , a disagreeing Byzantine voter may want to make voter ’s preference scale vanish, so that the vote essentially considers that voter is nearly indifferent between and .
In fact, naive solutions for collaborative preference scaling fail to be Byzantine resilient, especially if they aim to identify a small subset of alternatives to serve as anchors, or if they aim to identify subsets of mutually comparable voters. Indeed, Byzantines with negligible (but positive) voting rights may then fabricate scores in order to upset the set of anchor alternatives, or the subsets of mutually comparable voters. In particular, the intervention of a handful of Byzantines with very small voting rights could then spectacularly upset most voters’ preference scaling, which would upset the outcome of the sparse vote. This may then vastly discredit the vote outcome.
To guarantee the Byzantine resilience of robust sparse voting, it seems critical to first solve the Byzantine resilient collaborative preference scaling problem, which itself requires a much more thorough understanding of what our definition of Byzantine resilience entails. Next, we introduce key primitives which will allow us to construct a solution to this problem. Our solution to robust sparse voting will then heavily rely on these primitives.
4 Introducing new Byzantine resilient primitives
In this section, we discuss Byzantine resilience. More precisely, we study (single-coordinate) score aggregation functions , which output a global score , based on the reported scores and on their voting rights . Without loss of generality, here, we assume that each voter does report a score (those that do not report any score may simply be ignored). In particular, a straightforward generalization of -Byzantine resilience leads us to define the Byzantine resilience of aggregation functions as
In the sequel, we first show that classical (robust) statistics operators fail to be Byzantine resilient, as we defined it. We then introduce a new primitive, called the Quadratically Regularized Median, or QrMed, and we prove that QrMed does guarantee Byzantine resilience. We then leverage QrMed to construct another primitive called the Byzantine-Robustified Mean, or BrMean, which we prove to be Byzantine resilient and to equal the Mean under some favorable conditions.
4.1 Weighted averaging is arbitrarily vulnerable
We start with the most naive voting algorithm, weighted averaging, given by . This algorithm is widely used by collaborative filtering algorithms for group recommender systems [FBST18a], and was proved by [PHG00] to be the only universal, unanimous, independent-of-irrelevant-alternatives (a property Mehestan fails to guarantee, which allows it to escape the impossibility theorem of [PHG00]), translation-invariant (i.e. invariant under the addition of a constant, as opposed to invariance under positive affine transformations) sparse voting algorithm. Proposition 3 shows that weighted averaging is extremely vulnerable to Byzantine voters.
Mean is arbitrarily manipulable by any voter with a positive voting right. More precisely, for any with , for any voting right and score for voters , and for any target score , there exists a Byzantine score reporting such that . As an immediate corollary, Mean is not Byzantine resilient.
It suffices to define . ∎
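The attack in the proof is easy to make explicit: given the honest reports, a single Byzantine voter can solve a one-line equation for the report that steers the weighted mean to any target. A minimal sketch (ours):

```python
def weighted_mean(weights, scores):
    """Weighted average of reported scores."""
    return sum(w * x for w, x in zip(weights, scores)) / sum(weights)

def byzantine_report(weights, scores, byz_weight, target):
    """Score a single Byzantine voter with voting right `byz_weight` must
    report so that the weighted mean of all reports equals `target`,
    whatever the honest voters reported."""
    honest_sum = sum(w * x for w, x in zip(weights, scores))
    total = sum(weights) + byz_weight
    return (target * total - honest_sum) / byz_weight
```

Note that the smaller the Byzantine voting right, the more extreme the required report, but any positive voting right suffices.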
4.2 The median is not Byzantine resilient
A popular robust statistics mean estimator is the median, which we denote Med. A median must guarantee that the voting rights strictly to its left and strictly to its right each amount to at most half of all voting rights, i.e. . Note that the median may be ill-defined, as there may be multiple real numbers satisfying this property. In this case, we define to be the one closest to zero, which can easily be proven to be unique.
Med is not Byzantine resilient.
The proposition is trivial in the case , as the median of the single voter’s input is clearly arbitrarily manipulable by this single voter. In the following proof, we show that this issue remains, even when the number of voters is very large.
Fix and any (potentially large) . Consider , with for all . Assume moreover that for , and for . Now define , i.e. the Byzantine voter is the last voter. Then . But by reporting , then . Thus, the intervention of the Byzantine voter modifies the output score by , which is strictly larger than . Thus Med fails to be -Byzantine resilient, for any value of . This proves the proposition. ∎
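The construction of the proof can be replayed numerically. The sketch below (ours) implements the weighted median with the closest-to-zero tie-breaking defined above, searching for simplicity only among reported values, which suffices on this instance:

```python
def weighted_median(weights, scores):
    """Weighted median: among the reported values m such that the voting
    rights strictly below m and strictly above m each amount to at most
    half of the total, return the one closest to zero."""
    total = sum(weights)
    feasible = []
    for m in scores:
        left = sum(w for x, w in zip(scores, weights) if x < m)
        right = sum(w for x, w in zip(scores, weights) if x > m)
        if left <= total / 2 and right <= total / 2:
            feasible.append(m)
    return min(feasible, key=abs)
```

With n voters at 0, n voters at 1 and one last voter, the last voter alone decides whether the median is 0 or 1, however large n is: a single unit-weight voter shifts the output by a full unit.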
Note that the key ideas of the proof can be easily generalized to other robust statistics operators, e.g. to trimmed means, especially if we consider for .
4.3 QrMed: A Byzantine-resilient asymptotically correct aggregation function
To guarantee Byzantine resilience, we introduce a new robust statistics operator, which we call the quadratically regularized weighted median, denoted QrMed. QrMed is parameterized by an additional weight W, and is defined as the minimizer of a quadratically regularized sum of weighted absolute deviations: QrMed_W(w, x) ≜ argmin over m ∈ ℝ of (W/2) m² + Σ_v w_v |m − x_v|.
is well-defined, -Byzantine resilient and asymptotically correct.
Sketch of proof.
The loss defining QrMed is clearly strongly convex, which guarantees that QrMed is well-defined. By analyzing the derivatives of the loss, this also intuitively implies that any pull on the output towards the right must overcome an increasing opposite force, whose scale grows at least proportionally to how far the output is pulled. But given that, because of the absolute value, each (Byzantine) voter only pulls with a force proportional to their voting rights, the impact of their pull is also proportional to their voting rights, which is why Byzantine resilience holds. Asymptotic correctness then follows from the fact that, once the total of allocated voting rights is large enough, the quadratic regularization becomes negligible, and QrMed then approximately computes a median, which satisfies asymptotic correctness. The full proof is given in Appendix B. ∎
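Assuming the objective QrMed_W(w, x) = argmin over m of (W/2) m² + Σ_v w_v |m − x_v| (the displayed equation is elided above, so this exact form is our assumption), the minimizer can be computed by bisecting the nondecreasing subgradient:

```python
def qr_med(W, weights, scores, tol=1e-9):
    """Quadratically regularized weighted median (assumed objective:
    (W/2)*m**2 + sum_v w_v*|m - x_v|).  The objective is strictly convex,
    and its left subgradient is nondecreasing in m, so we bisect for the
    zero crossing."""
    def grad(m):
        # Left subgradient at m (sign at a kink taken as -1).
        return W * m + sum(w if m > x else -w for w, x in zip(weights, scores))
    # The minimizer lies between 0 and the data (regularization shrinks
    # toward 0), so these bounds always bracket it.
    lo = min(min(scores), 0.0) - 1.0
    hi = max(max(scores), 0.0) + 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (lo, mid) if grad(mid) > 0 else (mid, hi)
    return (lo + hi) / 2
```

The sketch illustrates the “one person, one unit force” principle: a single voter with weight w moves the output by at most w/W, while a large unanimous crowd recovers its common score exactly.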
We also state the following property of QrMed, which will be used to prove the Byzantine resilience of Mehestan.
QrMed is -Lipschitz continuous with respect to the -norm.
Sketch of proof.
Let . For simplicity, denote . Denote the following objective function . We show in the proof that the minimizer of the objective (which is exactly the QrMed output) necessarily lies in the claimed interval, by noting that the objective is strictly convex, that its subdifferential at one endpoint includes non-negative values, and that its subdifferential at the other endpoint includes non-positive values. ∎
4.4 BrMean: A Byzantine-resilient mean estimator
We now introduce another key robust statistics primitive, which successfully returns the mean of any sufficiently large set of bounded scores. We call it BrMean, for Byzantine-Robustified Mean. To define it, we first introduce the Clipped Mean ClipMean centered on and of radius by
where clips within the interval . BrMean is then obtained by executing ClipMean, centered on QrMed, with a radius that grows linearly with the total amount of votes:
Crucially for our purposes, BrMean has the following properties.
is -Byzantine resilient. Moreover, if there exists such that and for all , then .
Sketch of proof.
The proof relies on the fact that ClipMean is 1-Lipschitz in the center and in the radius, and is also sufficiently Byzantine resilient for small radii, when the amount of received votes is already large enough. The guarantee then holds once sufficiently many votes have been received, so that the radius of ClipMean could safely grow large enough to contain all voters’ inputs. The detailed proof is provided in Appendix C. ∎
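The two primitives can be sketched together in Python. The radius schedule (`total_weight / W` below) is an illustrative assumption, since the exact growth rate lives in the omitted equation; `qr_med` is the bisection sketch from the previous section, repeated for self-containedness:

```python
import numpy as np

def qr_med(weights, scores, W=1.0, tol=1e-9):
    """Bisection sketch of QrMed (see the previous section)."""
    weights = np.asarray(weights, dtype=float)
    scores = np.asarray(scores, dtype=float)
    g = lambda m: W * m + np.sum(weights * np.sign(m - scores))
    lo = min(scores.min(), 0.0) - 1.0
    hi = max(scores.max(), 0.0) + 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

def clip_mean(weights, scores, center, radius):
    """Weighted mean of scores clipped to [center - radius, center + radius]."""
    weights = np.asarray(weights, dtype=float)
    clipped = np.clip(np.asarray(scores, dtype=float),
                      center - radius, center + radius)
    return float(np.sum(weights * clipped) / np.sum(weights))

def br_mean(weights, scores, W=1.0):
    """Sketch of BrMean: ClipMean centered on QrMed, with a radius growing
    linearly with the total amount of votes (growth rate is illustrative)."""
    center = qr_med(weights, scores, W=W)
    radius = float(np.sum(np.asarray(weights, dtype=float))) / W
    return clip_mean(weights, scores, center, radius)
```

Once sufficiently many votes are received, the radius is large enough that no clipping occurs, and BrMean returns the exact weighted mean of the bounded inputs, without ever knowing the input bounds.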
We stress that, remarkably, BrMean eventually returns the mean of bounded inputs, provided it received sufficiently many inputs, even though BrMean itself is oblivious to the input bounds. This is a critical property that will be at the heart of Mehestan. In particular, since Mean is clearly covariant by (positive) affine transformations, this will allow us to guarantee that, in the favorable conditions defined by sparse unanimity, voters can effectively learn a common preference scaling. Designing a Byzantine resilient estimator with this feature turned out to be the most challenging aspect of our algorithm design and of our proof that Mehestan guarantees sparse unanimity.
An additional property, which will be used later in the proof for Mehestan’s Byzantine resilience, is the Lipschitz continuity of BrMean, as stated by Proposition 6.
BrMean is 2-Lipschitz continuous in the scores, with respect to the -norm. Formally, for any , we have .
In this section, we introduce Mehestan, which leverages BrMean to first solve collaborative robust preference scaling, and then QrMed to perform robust score aggregation.
5.1 The algorithm
Mehestan is presented in Algorithm 1. Essentially, Mehestan proceeds in four steps:
First, Mehestan performs an individual-based preference scaling. In our analysis, we consider min-max normalization.
Second, for each voter , Mehestan leverages all other comparable voters’ score scaling to collaboratively rescale the voter ’s preferences, using BrMean.
Third, for each voter , Mehestan leverages all other comparable voters’ score translations to collaboratively re-translate the voter ’s preferences, using BrMean.
Last, Mehestan performs a vote on the rescaled scores using QrMed.
Below, we detail the four steps.
5.1.1 Individual-based pre-normalization
Mehestan is parametrizable by the normalizer used for individual-based pre-normalization. Concretely, in our theoretical analysis, we will consider Mehestan based on min-max normalization. Thus, in particular, we will have .
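A minimal sketch of min-max pre-normalization, assuming the target interval [0, 1] (the excerpt does not show the paper’s exact interval):

```python
import numpy as np

def min_max_norm(scores):
    """Min-max normalization of one voter's raw scores onto [0, 1]
    (assumed target interval)."""
    scores = np.asarray(scores, dtype=float)
    lo, hi = scores.min(), scores.max()
    if hi == lo:  # degenerate voter: all reported scores are equal
        return np.zeros_like(scores)
    return (scores - lo) / (hi - lo)
```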
5.1.2 Collaborative preference scaling
For any voter , define the set of comparison-based comparable voters. In other words, if and only if there exists a pair of alternatives that and both scored with different scores. In particular, if has scored at least two alternatives differently, then we have . For each voter , we define the comparative scaling of voters and by . Each voter will then essentially vote for voter ’s preferences to be scaled according to voter ’s own scale, through BrMean:
Note that we do not directly take the BrMean of the comparative scaling ratios , because we want the default value to equal 1 (in the absence of any comparable voter). It is also noteworthy that, if voters’ scores are pre-normalized by min-max normalization, then . However, the definition we provided will guarantee the following Byzantine resilience property, even for different preference pre-normalizations.
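To illustrate, here is a hypothetical Python sketch of a comparative scaling between two voters: for each pair of commonly scored alternatives that the second voter scored differently, compare the magnitude of the first voter’s score gap to the second voter’s, and aggregate the ratios with a median. The exact formula is in the omitted equation; the pair-median aggregation is an illustrative stand-in:

```python
import numpy as np

def comparative_scaling(scores_u, scores_v):
    """Hypothetical comparative scaling of voter v as seen by voter u.
    scores_u, scores_v: dicts mapping alternative -> score.
    Returns the median, over pairs (a, b) of commonly scored alternatives
    that v scored differently, of |u's score gap| / |v's score gap|.
    Defaults to 1 when u and v are not comparable."""
    common = sorted(set(scores_u) & set(scores_v))
    ratios = []
    for i, a in enumerate(common):
        for b in common[i + 1:]:
            dv = scores_v[a] - scores_v[b]
            if dv != 0:
                ratios.append(abs(scores_u[a] - scores_u[b]) / abs(dv))
    return float(np.median(ratios)) if ratios else 1.0  # default scaling 1
```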
5.1.3 Collaborative preference translating
Similarly, voters indirectly vote for each other’s translation factors. More precisely, define the set of translation-comparable voters. For each voter , we define . We then compute the translation factor for voter using the primitive BrMean:
5.1.4 Global scores
The global scores that result from Mehestan are then obtained by applying once again the quadratically regularized median to the rescaled voters’ scores:
5.2 Theoretical guarantees
Finally, we can state the main result of our paper.
Given MinMaxNorm as an individual-based pre-normalizer, Mehestan is -Byzantine resilient and sparsely unanimous.
Below, we sketch the proof of this main theorem. We first discuss the Byzantine resilience guarantee, which requires controlling the worst-case impact of Byzantine voters at each step of Mehestan. We then prove the sparse unanimity guarantee. Our proofs are nicely decomposed into lemmas, each of which may be interesting in its own right.
5.3 Mehestan is Byzantine resilient
Let us now sketch the proof of the Byzantine resilience of Mehestan. First, note that the use of BrMean makes it possible to ensure a form of Byzantine resilience on the preference scaling.
For any subset , denote the scaling obtained by involving only the voters . Then, for any alternative , we have .
By Theorem 3, . From this, it follows that, for any alternative , we have . ∎
For any subset , denote the translation obtained by involving only the voters . Then, for any alternative , we have .
Sketch of proof.
One complication is that Byzantine voters affect the comparative translation factors by affecting the scaling factors and that appear in their computations, even when . Fortunately, combining Lemma 1 and Proposition 6 allows us to bound this impact. Combining this with Theorem 3, which controls the direct impact of Byzantine voters through , allows us to conclude. The full proof is given in Appendix D. ∎
Combining our lemmas and Theorem 2 guarantees the Byzantine resilience of Mehestan.
Mehestan is -Byzantine resilient.
5.4 Mehestan is sparsely unanimous
We now sketch our proof of sparse unanimity, whose full proofs are provided in Appendix E. For any , define the scaling bound
which is trivially scale-invariant. Moreover, consider a complete score vector , and denote . We then have the following guarantee.
Suppose -unanimity, comparability and MinMaxNorm pre-normalization. Then, for any , there must exist and such that , with and .
Sketch of proof.
Denoting and the best and worst alternatives scored by voter , as opposed to the best and worst alternatives and according to , we can see that . The bound on is then obtained by looking at the normalized score of . ∎
As a consequence, each re-scaled quorum voter’s preference is actually scaled identically to any other re-scaled quorum voter’s preference.
Suppose -unanimity, comparability, -weight and MinMaxNorm pre-normalization. Then the voters’ re-scaled scores are consistent, in the sense that , for all voters and alternatives that both voters scored.
Sketch of proof.
It is noteworthy that the global multiplicative rescaling is a weighted average of the values , which are known to be at least 1. Thus, under -weight, the collaborative preference normalization expands the scores’ multiplicative scales. Aggregating correctly scaled unanimous preferences then allows us to recover these preferences.
Under -unanimity, comparability, -weight and MinMaxNorm pre-normalization, is recovered for -scored alternatives, i.e. . In particular, Mehestan guarantees sparse unanimity for .
By Lemma 5, under -unanimity, comparability and -weight, we know that there exists and such that for all voters and alternatives . Now suppose that an alternative has received at least votes (measured in voting rights). We know that these votes are all identical and equal to . But then, the optimality condition of shows that we must have . Thus, for all alternatives , we must have , which is an affine transformation of . This concludes the proof. ∎
6 Empirical Evaluation
We present in this section our practical experiments on Mehestan and other voting algorithms.
Throughout the experiments, by default, we test with voters and alternatives, on synthetic data. We first randomly generate a score vector , drawn from a Cauchy distribution. Then, we randomly assign each voter a set of alternatives that they scored, using a Bernoulli distribution of parameter density. Moreover, each voter is assigned the following score vector , where .
Each experiment consists in measuring Pearson’s correlation between the algorithms’ outputs and the unanimous preferences , by varying one parameter while keeping the others fixed to their default value. Each experiment is repeated times with the seeds to for reproducibility. The correlation values are averaged, and the confidence interval is shown in the plots.
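The synthetic data generation and the quality metric can be sketched as follows. The sizes, the affine-transformation ranges, and the density value are illustrative placeholders for the omitted defaults:

```python
import numpy as np

rng = np.random.default_rng(0)  # one of the fixed seeds
n_voters, n_alternatives, density = 50, 200, 0.2  # illustrative defaults

# Ground-truth ("unanimous") score vector, drawn from a Cauchy distribution.
theta = rng.standard_cauchy(n_alternatives)

# Each voter scores each alternative independently with probability `density`,
# and reports an affine transformation of the true scores (hypothetical
# ranges for the per-voter slope and offset).
mask = rng.random((n_voters, n_alternatives)) < density
slopes = rng.uniform(0.5, 2.0, size=n_voters)
offsets = rng.uniform(-1.0, 1.0, size=n_voters)
reported = np.where(mask, slopes[:, None] * theta[None, :] + offsets[:, None],
                    np.nan)  # NaN marks unscored alternatives

def pearson(x, y):
    """Pearson correlation, the quality metric used in the experiments."""
    return float(np.corrcoef(x, y)[0, 1])
```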
We vary the following parameters:
comparability is a measure of voter comparability, with indicating that all voters are comparable and indicating that no intervention was made to make voters more comparable than they are after score generation. By default, in other experiments, we set comparability.
density, as previously mentioned, is a parameter in indicating how sparse the inputs are, with indicating full sparsity and indicating scoring completeness.
n_extreme is the number of voters who have a biased choice of alternatives they were assigned to score. Such voters, which we will refer to as reporting-biased voters, are randomly assigned alternatives with top or bottom scores from . Its default value is .
Additionally, we test the tolerance to biased sparsity. That is, we initially compute the outcome of the vote on the synthetic data with the default parameters (density in particular). Next, we only keep a percentage of the scores (denoted as density in Subfigure 0(d)). Importantly, the scores that are hidden are voters’ top/bottom scores. Then, we report the correlation of the output of the vote with its initial output (not with the unanimous preferences).
Our results are presented in Figure 1. Remarkably, Subfigure 0(a) shows that Mehestan performs well for sparse scores, and essentially recovers the unanimous preference with density only. Conversely, Med without score normalization performs poorly, even at relatively low sparsity, like density. This underlines the importance of tackling score normalization.
Subfigure 0(b) shows that Mehestan performs fairly well even when voters are far from fully comparable, though increased comparability helps. Note also that QrMed outperforms Med on identically normalized preferences. Meanwhile, Subfigure 0(c) shows that having numerous reporting-biased voters makes recovering the unanimous preference significantly harder. It also shows that Mehestan provides a significant improvement over the other algorithms in the presence of such biased voters.
Finally, Subfigure 0(d) shows the result of the experiment testing the vote’s tolerance to biased sparsity, as explained in the previous section. It is desirable for a voting algorithm not to change its outcome by too much when a few scores are hidden. This experiment is actually very challenging since the scores to be hidden are voluntarily those of top/bottom alternatives. Nevertheless, Mehestan achieves correlation with its initial output even after of the data is hidden.
We discuss here how additional desirable properties can be guaranteed, by tweaking Mehestan.
7.1 Differential Privacy
In several applications, guaranteeing the privacy of robust sparse voting may be critical to prevent voter coercion. In this section, we prove that Mehestan can be easily made differentially private. Let us first recall the definition of (voter-level) differential privacy.
Definition 13 (Voter-level differential privacy).
Let . A (randomized) vote Vote is -differentially private if, for any and all subsets , we have
For any parameter , we define the -differentially private Mehestan voting algorithm by simply adding a Laplacian noise to each returned global score, whose scale is proportional to . More formally, for any alternative , we define
where is a random variable drawn from the Laplace distribution with mean and scale .
For all , is -differentially private.
This follows trivially from the -differential privacy of the Laplace mechanism, combined with the Byzantine resilience guarantee of Mehestan, which bounds the sensitivity of each global score to any single voter’s input. ∎
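The differentially private release can be sketched as follows; the per-score sensitivity bound `B` below is an illustrative placeholder for the bound supplied by Mehestan’s Byzantine resilience:

```python
import numpy as np

def dp_release(global_scores, epsilon, B=1.0, rng=None):
    """Epsilon-DP release of global scores via the Laplace mechanism (sketch).
    B is a placeholder for the per-score sensitivity to one voter's input,
    which Mehestan's Byzantine resilience is assumed to bound."""
    if rng is None:
        rng = np.random.default_rng()
    scores = np.asarray(global_scores, dtype=float)
    noise = rng.laplace(loc=0.0, scale=B / epsilon, size=scores.shape)
    return scores + noise
```

Smaller epsilon (stronger privacy) means larger noise scale B / epsilon, so privacy trades off directly against the accuracy of the released global scores.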
7.2 Uncertainty-aware voting
In practice, reported scores are noisy, with potentially different levels of noise. Here, we show how Mehestan can be enhanced to account for uncertainty in the input. Our key solution is to leverage a new operator called the mean-risk distance MrDist. Essentially, MrDist simulates the fact that a voter’s uncertainty-aware vote is its expected vote, when the voter’s score is drawn from our Bayesian prior on their actual score. More formally, given a prior probability distribution of finite expectation and a point , MrDist is defined as:
QrMed can then be easily made uncertainty-aware, by replacing the absolute values as follows:
Crucially, since MrDist is a mean of functions whose subderivatives are always of absolute value at most 1, the subderivatives of MrDist are also always of absolute value at most 1. Intuitively, this means that if falls into a voter