An election rule is a function that takes as input a collection of voters' preferences over a given set of candidates and returns a single candidate, called the winner. A large variety of election rules is known in the literature (we refer the reader to the survey by Zwicker [Zwi15] for an overview); most of them require the voters to provide strict linear orders over the candidates. Yet it is often hard, or even infeasible, for a voter to provide such a preference ranking, especially when the set of candidates is large. Indeed, it is often believed that a voter can rank at most five to nine candidates [Mil56].
In this paper we ask how the quality of decisions made through voting depends on the amount of information available. Specifically, our goal is to assess the quality of outcomes of elections when each voter can be asked to rank at most candidates. We compare two ways of eliciting preferences. In the first approach, which we call randomized, we ask each voter to rank a random subset of candidates. In the second approach, which we call deterministic, we ask each voter to provide the ranking of her top most preferred candidates (the so-called -truncated ballot). For a number of rules (we analyze positional scoring rules and the Minimax method), we investigate how well they can be approximated by algorithms that use one of the two elicitation methods.
Our contribution is the following:
In Section 3.1 we identify a class of positional scoring rules that, for a given , can be well approximated using the randomized approach. consists of a single rule, namely the Borda count; the number of rules in grows exponentially with . We prove approximation guarantees for the rules from . These guarantees are more likely to be accurate when the number of voters is large, and we show analytically how, in the worst case, they depend on the number of voters. In Section 3.2 we provide an analogous analysis for the Minimax rule.
In Section 4 we prove upper bounds on the approximation ratios of an algorithm that uses -truncated ballots; we prove these bounds both for positional scoring rules and for the Minimax rule. In both cases, we show that the algorithm that minimizes the maximal regret of Lu and Boutilier [LB11] (we recall this algorithm in Section 4.1) matches our upper bounds (for Minimax our analysis is tight up to a small constant factor).
We ran computer simulations in order to verify how the approximation ratio depends on the particular distribution of voters' preferences (Section 5). Our experiments confirm that in most cases (with the exception of very unstructured preferences) the randomized approach is superior. We also show that usually only a couple of hundred voters are required to achieve a reasonably good approximation.
Our work contributes to the broad literature on handling incomplete information in voting; for a survey on this topic, we refer the reader to the book chapter by Boutilier and Rosenschein [BR15]. Specifically, our research is closely related to the idea of minimizing the maximal regret [LB11]. Therein, for a partial preference profile , the goal is to select a candidate such that the score of in the worst possible completion of is maximized. In particular, algorithms minimizing the maximal regret yield the best possible approximation ratio. Our paper complements this literature by (1) providing an accurate analysis of these approximation ratios for various methods (which allows one to better judge the suitability of different methods for handling incomplete information), and (2) providing the analysis for two natural methods of preference elicitation (which also allows one to assess which of the two methods is better).
Algorithms for minimizing the maximal regret interpret the missing information in the most pessimistic way: they assume the worst-possible completion of partial preferences. Other approaches include assuming the missing pairwise preferences to be distributed uniformly (e.g., Xia and Conitzer [XC11]) and machine-learning techniques (Doucette [Dou14, Dou15]) to “reconstruct” missing information (assuming that the missing pairwise comparisons are distributed similarly as in observed partial rankings).
In the related literature on distortion, an underlying utility model is assumed, and the goal is to estimate how well various voting rules that have access only to ordinal preferences approximate optimal winners, i.e., candidates that maximize the total utility of the voters. The concept of distortion has recently received a lot of attention in the literature. The definition of distortion has, for example, been adapted to social welfare functions (where the goal is to output a ranking of candidates rather than a single winner) [BPQ19] and to participatory budgeting [BNPS17]. Some works also study distortion assuming a certain structure of the underlying utility model (e.g., that it can be represented as a metric space) [ABE18, AP17, FFG16, GKM17, GAX17].
Finally, we mention that our randomized algorithms are similar to the one proposed by Hansen [Han16]. The main difference is that the rule proposed by Hansen asks each voter to compare a certain number of pairs of candidates, while in our approach we ask each voter to rank a certain fixed-size subset of them. Hansen views his algorithm as a fully-fledged standalone rule (and compares it with other election systems, mostly focusing on assessing the probability of selecting the Condorcet winner), while our primary goal is to investigate how well our rules approximate their original counterparts.
An election is a pair , where and denote the sets of voters and candidates, respectively. Each voter is endowed with a preference ranking over the candidates, which is a total ordering of the candidates and which we denote by . For each candidate by we denote the position of in ’s preference ranking. The position of the most preferred candidate is one, of the second most preferred candidate is two, etc. For example, for a voter with the preference , we have , , and .
For an integer we use to denote the set and we use the Iverson bracket notation—for a logical expression the term means if is true and otherwise.
A voting rule is a function that, for a given election , returns a subset of candidates, which we call tied winning candidates. Below we describe several (classes of) voting rules that we will focus on in this paper.
A positional scoring function is a mapping that assigns to each position a real value: intuitively, is a score that a voter assigns to a candidate that she ranks as her -th most preferred one. For each positional scoring function we define the -score of a candidate as , and the corresponding election rule selects the candidate(s) with the highest -score. Examples of common positional scoring rules include:
- Borda rule:
Based on a linear decreasing positional scoring function, the Borda rule is formally defined by for .
- Plurality rule:
Being equivalent to the -approval rule, the positional scoring function for the Plurality rule assigns a score of one to the first position and zero to all others.
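The two example rules can be illustrated with a minimal Python sketch. It assumes the common convention that, with m candidates, the Borda rule awards m - i points for position i; the helper names (`positional_scores`, `borda`, `plurality`, `winners`) and the three-voter profile are hypothetical and serve only as an illustration.

```python
from typing import Callable, Dict, List, Sequence

Ranking = Sequence[str]  # candidates listed from most to least preferred


def positional_scores(profile: List[Ranking],
                      score_of_position: Callable[[int], float]) -> Dict[str, float]:
    """Sum, over all voters, the score assigned to each candidate's position."""
    scores = {c: 0.0 for c in profile[0]}
    for ranking in profile:
        for position, candidate in enumerate(ranking, start=1):
            scores[candidate] += score_of_position(position)
    return scores


def borda(m: int) -> Callable[[int], float]:
    # Assumed convention: position i earns m - i points (linear and decreasing).
    return lambda position: float(m - position)


def plurality() -> Callable[[int], float]:
    # One point for the first position, zero for all others.
    return lambda position: 1.0 if position == 1 else 0.0


def winners(scores: Dict[str, float]) -> List[str]:
    """Return all candidates tied for the highest score."""
    best = max(scores.values())
    return sorted(c for c, s in scores.items() if s == best)


# Hypothetical profile with three voters and three candidates.
profile = [("a", "b", "c"), ("a", "c", "b"), ("b", "c", "a")]
borda_scores = positional_scores(profile, borda(3))
plurality_scores = positional_scores(profile, plurality())
```

Both rules share the same scoring loop and differ only in the function that maps positions to points, which mirrors the definition of a positional scoring function given above.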
Another important class of voting rules originates from the Condorcet criterion. It says that if there exists a candidate that is preferred to any other candidate by a majority of voters, then the voting rule should select . We focus on one particular rule satisfying the Condorcet criterion (we chose a rule picking the candidates that maximize a certain scoring function so that we could apply to the rule the standard definition of approximation):
- Minimax rule.
For an election and two candidates , we define as the number of voters who prefer to and we set
The rule then selects the candidates with the highest score.
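As a hedged illustration, the sketch below assumes the standard form of the Minimax score: the score of a candidate is the smallest number of voters who prefer that candidate to any single opponent. The function names and the example profile are hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

Ranking = Sequence[str]  # candidates listed from most to least preferred


def pairwise_counts(profile: List[Ranking]) -> Dict[Tuple[str, str], int]:
    """counts[(x, y)] is the number of voters who prefer x to y."""
    candidates = list(profile[0])
    counts = {(x, y): 0 for x, y in permutations(candidates, 2)}
    for ranking in profile:
        position = {c: i for i, c in enumerate(ranking)}
        for x, y in counts:
            if position[x] < position[y]:
                counts[(x, y)] += 1
    return counts


def minimax_scores(profile: List[Ranking]) -> Dict[str, int]:
    # Assumed standard definition: worst pairwise support over all opponents.
    candidates = list(profile[0])
    counts = pairwise_counts(profile)
    return {c: min(counts[(c, d)] for d in candidates if d != c)
            for c in candidates}


# Hypothetical profile with three voters and three candidates.
profile = [("a", "b", "c"), ("a", "c", "b"), ("b", "c", "a")]
scores = minimax_scores(profile)
minimax_winners = sorted(c for c, s in scores.items()
                         if s == max(scores.values()))
```

In this profile candidate a is preferred to each opponent by two of the three voters, so a is both the Condorcet winner and the Minimax winner.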
Since all rules described above select the candidates with the maximal scores (with particular rules differing in how the score should be calculated), a natural definition of approximation applies.
We say that is an -approximation algorithm for a rule if for each election instance it holds that:
where is a function representing the score awards each candidate, is the set of winners returned by , and is the candidate returned by .
Later on, we will consider algorithms that have access only to certain parts of the input instances. In such cases the above definition still applies. For example, let denote the truncated instance obtained from , i.e., a partial election which for each voter contains her preference ranking from , truncated to the top positions. Then we say that is an -approximation algorithm for for -truncated instances, when for each election instance it holds that:
3 Randomized Approach
In this section we explore a randomized approach, where each voter can be asked to rank a random subset of candidates.
3.1 Scoring Rules
We start our analysis by looking at the class of positional scoring rules. For the sake of simplicity we will assume throughout this section that is divisible by .¹ We first present an algorithm that estimates the score of each candidate and picks the candidate with the highest score. The algorithm is parameterized with a natural number and a vector of reals ; for a fixed vector we will call the algorithm -PSF-ALG. This algorithm asks each voter to rank a random set of candidates. We say that a candidate is ranked by a voter if belongs to the set of candidates that was asked to rank. If is the -th most preferred among the candidates ranked by a voter, then receives the score of from the voter. Such scores are summed up for each candidate, normalized by the number of voters who ranked the respective candidate, and the candidate with the highest total score is declared the winner. Pseudocode of the algorithm is given in Algorithm 1.
¹ We will always implicitly assume that is much larger than , and we will use randomized algorithms only. Thus, if does not divide , then in our algorithms we can add a preliminary step that randomly selects a set of voters, and ignores the remaining ones. We mention that other authors also suggested giving multiple randomized ballots to each voter.
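The procedure described above can be sketched in Python as follows. This is a simplified illustration, not the paper's Algorithm 1: it assumes that each voter receives an independent, uniformly random subset whose size equals the length of the vector `alpha`, whereas the original pseudocode may assign subsets differently. The names `psf_alg`, `alpha`, and `rng` are hypothetical.

```python
import random
from typing import Dict, List, Sequence

Ranking = Sequence[str]  # candidates listed from most to least preferred


def psf_alg(profile: List[Ranking], alpha: Sequence[float],
            rng: random.Random) -> Dict[str, float]:
    """Estimate normalized scores from partial rankings of random subsets."""
    candidates = list(profile[0])
    ell = len(alpha)  # number of candidates each voter is asked to rank
    total = {c: 0.0 for c in candidates}
    times_ranked = {c: 0 for c in candidates}
    for ranking in profile:
        # Assumption: independent uniform subset for every voter.
        asked = set(rng.sample(candidates, ell))
        # The voter's ranking restricted to the candidates she was asked about.
        partial = [c for c in ranking if c in asked]
        for i, c in enumerate(partial):
            total[c] += alpha[i]
            times_ranked[c] += 1
    # Normalize by the number of voters who ranked the candidate;
    # candidates ranked by nobody get no estimate.
    return {c: total[c] / times_ranked[c]
            for c in candidates if times_ranked[c] > 0}


# Hypothetical profile; with ell = m every voter ranks all candidates,
# so the estimate is deterministic and equals the Borda score divided by n.
profile = [("a", "b", "c"), ("a", "c", "b"), ("b", "c", "a")]
full = psf_alg(profile, alpha=(2.0, 1.0, 0.0), rng=random.Random(0))
pairs = psf_alg(profile, alpha=(1.0, 0.0), rng=random.Random(1))
estimated_winner = max(full, key=full.get)
```

Passing the random generator explicitly keeps runs reproducible, which is convenient when comparing the estimates across different numbers of voters.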
Below, we will show that for some positional scoring rules, by choosing the vector carefully, we can find good approximations of winning candidates with high probability. First, through Theorem 1 we establish a relation between positional scoring functions and vectors that should be used to assess ; the formula is not intuitive, and we will discuss it later on. In particular, we will explain which positional scoring functions can be well approximated using this approach, that is, we will discuss the structure of the class of positional scoring functions which are covered by the following theorem.
Fix a non-increasing sequence of reals and consider the positional scoring function defined by
For a candidate that is ranked by at least one voter, we denote by the random variable describing the total normalized score that was assigned by -PSF-ALG. Then, the expected value is equal to the -score of , and the probability that the score computed by -PSF-ALG for differs from its expected value by a multiplicative factor of is upper-bounded by , i.e.,
Let us fix a candidate who is ranked by at least one voter. The process of computing the score of according to Algorithm 1 can be equivalently described as follows. We first decide on the number of voters we ask to rank . Second, we pick uniformly at random a set of voters such that all voters in are asked to rank and all voters in are not asked to rank . Finally, we ask each voter from to rank and a randomly selected set of candidates. Let be a random variable describing the number of voters who rank . Further, for each voter , let denote the random variable equal to 1 if is the -th candidate among those ranked by voter and zero otherwise. In particular, is zero when is not asked to rank . Observe that if , then the value of can be expressed as
Further, let be 1 if each voter from ranks and 0 otherwise. Similarly, let be 1 if ranks and 0 otherwise. Let be equal to 1 if is ranked as the -th most preferred candidate among by and 0 otherwise. We next compute the conditional expected value . We first give the formal equalities and give reasoning for the more complicated ones afterwards.
We will now explain some of the equalities in the above sequence. (3) is an effect of regrouping the summands; each summand in the previous line is added for each set of size which includes —there are such sets and is the same as . (5) holds for the following reason: A voter who ranked was asked to rank some set of candidates including . Each possible set has the same probability of being selected, thus this probability is . (6) is true as we will show that . Consider a fixed voter , a fixed candidate , and a set such that (i) , (ii) , and (iii) considers to be her -th most preferred candidate from . Each such set must consist of candidates that are ranked before by and candidates that are ranked after . Thus, there are such sets. We refer to Fig. 1 for an illustration.
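The counting argument can be checked by brute force. The sketch below uses symbols that are assumptions of this illustration: m is the number of candidates, ell the size of the ranked set, pos the position of the fixed candidate in the voter's full ranking, and i the position of that candidate within the ranked set. A qualifying set contains i - 1 candidates ranked before the fixed candidate and ell - i candidates ranked after it, which gives a product of two binomial coefficients.

```python
from itertools import combinations
from math import comb


def count_sets_brute_force(m: int, ell: int, pos: int, i: int) -> int:
    """Enumerate all ell-subsets containing the candidate at position pos
    and count those in which exactly i - 1 members are ranked before it."""
    others = [p for p in range(1, m + 1) if p != pos]
    count = 0
    for rest in combinations(others, ell - 1):
        ranked_before = sum(1 for p in rest if p < pos)
        if ranked_before == i - 1:
            count += 1
    return count


def count_sets_formula(m: int, ell: int, pos: int, i: int) -> int:
    # Choose i - 1 of the pos - 1 candidates ranked before, and
    # ell - i of the m - pos candidates ranked after.
    return comb(pos - 1, i - 1) * comb(m - pos, ell - i)


def formula_matches(max_m: int) -> bool:
    """Compare both counts on every small instance up to max_m candidates."""
    return all(
        count_sets_brute_force(m, ell, pos, i) == count_sets_formula(m, ell, pos, i)
        for m in range(1, max_m + 1)
        for ell in range(1, m + 1)
        for pos in range(1, m + 1)
        for i in range(1, ell + 1)
    )
```

Summing the formula over all i gives the number of ell-subsets that contain the fixed candidate, so dividing by that number yields the probability that the candidate lands in position i of the ranked set.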
Next, we will use Chernoff's inequality to assess the probability that the computed score of a candidate does not differ from its true score by a factor of . We will first assess the conditional probability . Observe that the conditional variables are not independent. For instance, if , then for each . However, they are all negatively correlated, which intuitively means that if a variable becomes 1 (resp., 0), then the other variables are less likely to become 1 (resp., 0). Thus, we can still apply the Chernoff bound [AD11, Theorem 1.16, Corollary 1.10], which states that for any negatively correlated random variables such that and any it holds that
It follows immediately that .
Finally, using the binomial identity we get
This concludes the proof. ∎
Now, let us discuss the form of positional scoring functions used in the statement of Theorem 1. First, observe that for , if we set and we have that . This means that by asking each voter to rank only two candidates, we can correctly (in expectation) assess the Borda scores of the candidates.
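The two-candidate case can be verified with exact arithmetic. The sketch below assumes the vector (1, 0), i.e., the preferred candidate of the compared pair gets one point; the values used in the text above may be scaled differently. It computes the expected score a voter contributes to a candidate, conditioned on that voter ranking the candidate, and the function name is hypothetical.

```python
from fractions import Fraction
from math import comb
from typing import Sequence


def expected_score_given_ranked(m: int, pos: int, alpha: Sequence[int]) -> Fraction:
    """Expected score from one voter for the candidate she ranks at position
    pos (out of m), given that the candidate is in her random ell-subset."""
    ell = len(alpha)
    total_sets = comb(m - 1, ell - 1)  # ell-subsets containing the candidate
    return sum(
        Fraction(alpha[i - 1] * comb(pos - 1, i - 1) * comb(m - pos, ell - i),
                 total_sets)
        for i in range(1, ell + 1)
    )


# With alpha = (1, 0) the expectation is (m - pos) / (m - 1): the Borda
# score m - pos (assumed convention) scaled by the constant 1 / (m - 1).
m = 6
pairwise_expectations = [expected_score_given_ranked(m, pos, (1, 0))
                         for pos in range(1, m + 1)]
```

Because the scaling factor 1 / (m - 1) is the same for every candidate, ordering candidates by these expectations is the same as ordering them by Borda score.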
For a candidate the expected value of the score computed by Algorithm -SEP-ALG for is the Borda score of .
Unfortunately, not every positional scoring function can be efficiently assessed while asking each voter to rank only a few candidates. For example, we can generalize Corollary 2 and show that for any vector of two elements , the algorithm -SEP-ALG can only compute scores that are affine transformations of the Borda scores (thus, for the algorithm can only be used to approximate the Borda rule).
We will now describe the class of all positional scoring functions which can be computed correctly in expectation by our algorithm for any fixed . Since each positional scoring function is based on some -dimensional vector which can be expressed as , where , and so on, these -vectors form a basis of the linear space of positional scoring functions.
Let be the set of all positional scoring functions that can be computed (correctly in expectation) by our algorithm for a fixed . Since it holds for each two -element vectors that we have that is a linear space too.
Thus, is an -dimensional linear subspace of the -dimensional space of all positional scoring functions, and so we can compactly describe it by providing scoring functions forming a basis of . Figure 2 visually illustrates the scoring functions forming a basis for . In other words, for a given value of , we can use Theorem 1 to correctly compute (in expectation) all scoring functions which can be obtained as linear combinations of the scoring functions depicted in Figure 2.
Finally, let us give some intuition regarding the probabilities assessed in Theorem 1. For example, for candidates and voters the Borda score of a winning candidate is at least . Assume that we want to ask each voter to compare only two candidates, and set . When assessing the score of a winning candidate, to get we need about 72 thousand voters. For one million voters, this probability drops below .
Note also that Theorem 1 applies to any candidate, not only to election winners. This makes the result slightly more general, since it also applies to, e.g., social welfare functions, where the goal is to output a ranking of the candidates instead of a single winner.
3.2 Minimax Rule
We will now investigate whether the Minimax rule can be well approximated when each voter is only asked to rank a few candidates. We will use an algorithm similar to Algorithm 1: each voter ranks a subset of candidates and whenever two candidates are ranked by a voter , we use her preference list to estimate . Notably, we scale the values for each two candidates by the number of times they were compared and use these normalized values to compute the Minimax winner. This algorithm is formalized in Algorithm 2.
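A simplified Python sketch of this estimation procedure follows. It is an illustration under stated assumptions, not the paper's Algorithm 2: each voter gets an independent, uniformly random subset of `ell` candidates, and pairs that no voter compared are skipped when taking the minimum, a choice the text above does not specify. The names `minimax_alg`, `ell`, and `rng` are hypothetical.

```python
import random
from itertools import combinations
from typing import Dict, List, Sequence

Ranking = Sequence[str]  # candidates listed from most to least preferred


def minimax_alg(profile: List[Ranking], ell: int,
                rng: random.Random) -> Dict[str, float]:
    """Estimate normalized Minimax scores from rankings of random subsets."""
    candidates = list(profile[0])
    wins = {(x, y): 0 for x in candidates for y in candidates if x != y}
    compared = {frozenset(pair): 0 for pair in combinations(candidates, 2)}
    for ranking in profile:
        # Assumption: independent uniform subset for every voter.
        asked = set(rng.sample(candidates, ell))
        partial = [c for c in ranking if c in asked]
        # combinations preserves order, so the first element is preferred.
        for better, worse in combinations(partial, 2):
            wins[(better, worse)] += 1
            compared[frozenset((better, worse))] += 1
    scores = {}
    for c in candidates:
        # Scale each pairwise count by the number of times the pair was compared.
        ratios = [wins[(c, d)] / compared[frozenset((c, d))]
                  for d in candidates
                  if d != c and compared[frozenset((c, d))] > 0]
        if ratios:  # assumption: candidates with no compared pair get no score
            scores[c] = min(ratios)
    return scores


# Hypothetical profile; with ell = m all pairs are compared by all voters,
# so the estimate equals the true Minimax score divided by n.
profile = [("a", "b", "c"), ("a", "c", "b"), ("b", "c", "a")]
full = minimax_alg(profile, ell=3, rng=random.Random(0))
sampled = minimax_alg(profile, ell=2, rng=random.Random(1))
```

Normalizing per pair rather than per candidate matters here, because with small subsets different pairs are typically compared by different numbers of voters.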
For each candidate the probability that the total normalized score computed by Algorithm 2 for differs from the true Minimax score of by a multiplicative factor of at least is upper-bounded by:
First, let us fix a pair of candidates and let be the random variable describing the value as computed by Algorithm 2. As in the proof of Theorem 1, we can express as a sum of negatively correlated random variables.
Specifically, computing according to Algorithm 2 can be equivalently described as follows: First, we decide on how many voters will be asked to compare and . Let be the random variable describing this number of voters. Second, assuming , we pick uniformly at random a set of voters, and we ask them to compare and . For each voter , let denote the random variable equal to 1 if voter said that she prefers to , and 0 otherwise. In particular, is zero when is not asked to compare and . Observe that if , then , and so the value of can be expressed as
We next compute the conditional expected value :
Next, we will use Chernoff's inequality to upper-bound the probability that the value of random variable does not differ from its expected value by a factor of . We first look at the conditional probability . As in the proof of Theorem 1, we note that the conditional variables are not independent, yet they are all negatively correlated: the fact that one variable becomes 1 (resp., 0) can only decrease the probabilities that some other becomes 1 (resp., 0). Thus, we can still apply the Chernoff bound [AD11, Theorem 1.16, Corollary 1.10], which states that for any negatively correlated random variables such that and any it holds that
Since, for each , we have , and , we get that:
Next, we get that:
Notice that can be described as follows. First, we choose out of voters to rank and . We then ask each of these voters to rank out of the remaining candidates, and we ask all other voters not to rank and . This can be modeled by , since is the total number of possible sets a voter can be asked to rank and, as discussed before, there are sets that contain and . Hence
Again using the binomial identity , we get
Finally, let . The probability that for candidate the score computed by Algorithm 2 differs from its true Minimax score by a multiplicative factor of at least is upper-bounded by
Clearly, we have:
Further, since by definition for each we have , it holds that: