“[…] and that government of the people, by the people, for the people, shall not perish from the earth.”
Abraham Lincoln’s Gettysburg Address culminated with the oft-quoted words above. This single sentence gives a remarkably succinct summary of the role of a country’s populace in a participatory democracy, identifying three distinct facets: (1) The government should be of the people: the members of the government should be drawn from — and by inference representative of — the country’s populace. (2) The government should be by the people: decisions should be made by the populace. (3) The government should be for the people: its objective should be to serve the interests of the populace. In Lincoln’s words, the central question we study here is the following:
If a government by the people is to be for the people, how important is it that it also be of the people?
In quantifying this question, we observe that there is a surprisingly clean mapping of Lincoln’s vision onto central concepts of social choice theory:
Who is the government of? Who are the candidates (people or ideas) to be aggregated?
Who is the government by? What are the social choice rules used for aggregation?
Who is the government for? What objective function is to be optimized?
While the exact social choice rules to be used have been a topic of vigorous debate for several centuries [14, 15, 5, 12], the broad class they are drawn from is generally agreed upon: voters provide an ordinal ranking of (a subset of) the candidates, and these rankings are then aggregated to produce either a single winner or a consensus ranking of all (or some) candidates. Social choice is limited by the severe impossibility results of Arrow  and Gibbard and Satterthwaite [19, 28], establishing that even very simple combinations of desired axioms are in general unachievable. These impossibility results in turn have resulted in a fruitful line of work exploring restrictions on individuals’ preference orders for circumventing the impossibility of social choice.
One of the avenues toward circumventing the impossibility results simultaneously doubles as a framework for addressing the third question: What objective function is to be optimized by the social choice rule? The key modeling assumption is that all candidates (ideas or people) and voters are embedded in a metric space: small distances model high agreement, while large distances correspond to disagreement [8, 17, 9, 24, 23, 7, 27, 6]. The metric induces a preference order over candidates for each voter: she simply ranks candidates by distance from herself. When the metric space is specifically the line, we obtain the well-known and much studied special case of single-peaked preferences [8, 24]. Embedding voters and candidates in a metric space has historically served two purposes: (1) Restricting the metric space — for example, by limiting its dimension — defines a restricted class of ordinal preference profiles, and might help circumvent the classic impossibility results of social choice. (2) The distances naturally provide an objective function: the best alternative is the one that is closest to the voters on average. Even when the metric space is unrestricted, replacing the hard axioms of social choice theory with this objective function can “circumvent” impossibility results through approximation , and permits comparing different social rules by quantifying their worst-case performance.
While distances yield cardinal preferences and a social objective function, it is arguably unrealistic to expect individuals to articulate distances accurately. It is consequently unsurprising that common and well-established voting rules typically restrict voters to providing ordinal information, such as rankings or a single vote. Therefore, we view the metric space as implicit, and a social choice function as optimizing the associated cardinal objective function using only ordinal information.
This viewpoint was recently crisply expressed in a sequence of works originating with Anshelevich et al. [2, 3, 4, 1, 20]. In particular, Anshelevich et al.  examine many of the most widely used election voting rules, guided by the question: “How much worse is the outcome of voting than would be the omniscient choice of the best available candidate?” They showed remarkable separations: while some voting rules guarantee a distortion of no more than a constant factor, others are off by a factor that increases linearly in the number of candidates or — even worse — voters. The simplest, and in some sense canonical, example of such distortion is captured as follows:
A population consists of voters of whom just below half lean solidly left (at position $-1$), while just over half are just to the right of center (at position $\epsilon > 0$). The population conducts an election between a solidly left-wing (position $-1$) and a solidly right-wing (position 1) candidate.
Because the centrist voters express their (slight) preference for the right-wing candidate, he is elected by a small majority. However, the average distance from the population to the right-wing candidate (approximately 1.5) is thrice that to the left-wing candidate (approximately 0.5), meaning that the majority vote led to a loss of a factor of three in the utility.
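The arithmetic of this example can be checked with a minimal sketch. The helper names `social_cost` and `majority_winner`, and the concrete perturbation `eps = 1e-3`, are illustrative assumptions and not part of the original text.

```python
def social_cost(candidate, voters):
    """Average distance from a candidate position to voters given as (position, mass) pairs."""
    return sum(mass * abs(candidate - pos) for pos, mass in voters)

def majority_winner(a, b, voters):
    """Return the candidate position preferred by more than half of the voter mass."""
    votes_for_a = sum(mass for pos, mass in voters if abs(pos - a) < abs(pos - b))
    return a if votes_for_a > 0.5 else b

eps = 1e-3  # assumed small perturbation
# Just below half of the voters at -1, just over half slightly right of center.
voters = [(-1.0, 0.5 - eps), (eps, 0.5 + eps)]
left, right = -1.0, 1.0

winner = majority_winner(left, right, voters)
best = min((left, right), key=lambda c: social_cost(c, voters))
distortion = social_cost(winner, voters) / social_cost(best, voters)
print(winner, best, distortion)
```

As `eps` shrinks, the printed ratio approaches the factor of three described above.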
We follow prior nomenclature in this domain [26, 10, 13, 11, 2] and term this utility loss the distortion. In examining Example 1 more closely, we identify a likely culprit for the high distortion: the right-wing candidate was not representative of the population; he was not of the people. Had we drawn two candidates from the population, the winner would in fact always be the socially optimal choice in this case. If we wanted to create the possibility of recreating the above example, we would need to move some fraction of the population to the right wing. If this fraction were large, then the election of a right-wing candidate would not be nearly as bad according to the objective function; conversely, if it were small, then it would be unlikely that a right-wing candidate would run, so most of the time, the social choice rule would select an optimal candidate. Thus, intuitively, when candidates are drawn from the population, we would expect the distortion in the social cost to be better than when they are not. The goal of this article is to investigate to what extent this intuition holds.
Formally, we assume that the candidates and voters are jointly located in a (finite) metric space; the distance between $i$ and $j$ is denoted by $d(i,j)$. The candidates’ locations are given by a probability distribution $p$, while the voters’ location distribution is denoted by $q$. In order to isolate the driving question and side-step issues of specific voting rules, we focus on the simplest social choice scenario: two candidates $i, i'$ are drawn i.i.d. from $p$, and a simple majority vote determines the winner between them. Voter $j$ votes for the one of $i, i'$ who is closer to $j$. (Footnote 1: Throughout, we will assume when convenient that the metric and distribution are in general position. Specifically, there are no ties in any voter’s preference order, and there are no ties in any election outcome. Ties could in principle be dealt with using suitable tie breaking rules, but the slight gain in generality would not be worth the overhead.) The social cost of candidate $i$ is $c_i = \sum_j q_j \, d(i,j)$. With $w$ denoting the winner of the election and $o$ the socially optimal candidate, the expected distortion of voting is $\mathbb{E}[c_w / c_o]$. Our goal is then to understand whether and by how much the distortion decreases when candidates are of the people (when $p = q$).
We begin our investigation with arguably the simplest metric space, which nonetheless is frequently used to describe the political spectrum of countries: the line. As we saw in Example 1, even for the line, voting between two arbitrary candidates can lead to a distortion of 3. Our first main result (proved in Section 3) is that when two candidates are drawn i.i.d. from the voter distribution, the expected distortion is at most $4 - 2\sqrt{2} \approx 1.1716$, and this bound is tight. The lower-bound example is in fact of the type discussed after Example 1, obtained by moving a suitable population mass from location $\epsilon$ to location 1. The more difficult part of the proof is the upper bound, and in particular, the proof that the worst-case distribution of voters/candidates always has support size 3. The proof proceeds by showing that for larger support sizes, there is always a sequence of alterations that gradually shifts the population to fewer locations, without lowering the distortion.
Next, we turn our attention to general metric spaces. For arbitrary metric spaces, the distortion of voting can be larger. In Section 5, we analyze a simple example: just under half the population is located at one point (point 0), while the rest of the population is spread out evenly over $n$ locations that are at distances just below 1 from each other and at distance 1 from point 0. As $n \to \infty$, we show that the expected distortion converges to $3/2$. The upper bound we establish in Section 5 does not match this lower bound: we show that for every metric and every voter distribution, the expected distortion is bounded by a constant strictly smaller than 2. We conjecture that the bound of $3/2$ is in fact tight; proving or disproving this conjecture is a natural direction for future work, discussed in Section 6.
The significance of our upper bounds on distortion (for the line and for general metric spaces) arises from the contrast to the corresponding bounds when the candidate distribution differs from the voter distribution. In revisiting the improved distortion results we prove, we notice two potential driving factors: (1) The two candidates are independently and identically distributed. (2) The distributions of candidates and voters are the same. One may wonder whether the innocuous-looking assumption of i.i.d. candidates alone could be responsible for the lower distortion, without requiring that candidates be of the people. In Section 4, we rule out this possibility by establishing a (tight) bound of 2 on the distortion of voting when candidates are drawn i.i.d. from an arbitrary distribution, both in general metrics and on the line. The (small, but constant) gap between our upper bound and 2 in general metric spaces, and the significant gap between the distortions of $4 - 2\sqrt{2}$ and 2 on the line, show that government by the people is more efficient when it is also of the people. The exact size of the gap between the two distortions in general metric spaces is a natural open question.
There has been a lot of interest recently in circumventing the impossibility results of voting and social choice by approximation; see, e.g., [26, 25, 13] and  for a recent survey. Of particular interest is the recent direction in which the voters’ objective functions are derived from proximity in a metric space [2, 3, 4, 1, 20, 18]. One of the important issues is providing incentives for truthful revelation of preferences (e.g., ); in this paper, we side-step this issue by considering only elections between two candidates at a time.
Our work is most directly inspired by the recent work of Anshelevich et al. [2, 3], which analyzes the distortion of ordinal voting rules when evaluated for metric preferences. Our work departs from [2, 3] in assuming that the candidates themselves are drawn i.i.d. from underlying distributions, and in particular in analyzing the case when the distribution of the candidates is equal to that of the voters.
Anshelevich and Postl consider a condition of instances that also aims to capture that candidates are in some sense “representative” of the voting population. Specifically, they define a notion of decisiveness as follows: Let $j$ be a voter, and $i, i'$ her two closest candidates, with $d(j,i) \le d(j,i')$. An instance is $\alpha$-decisive (for $\alpha \in [0,1]$) if $d(j,i) \le \alpha \cdot d(j,i')$ for all $j$; in other words, when $\alpha$ is small, every voter has a strongly preferred candidate. Naturally, the decisiveness condition is applicable only in elections in which the number of candidates is large or the space of voters is highly clustered. In our work, by considering candidates drawn from the voter distribution, we avoid such assumptions.
The candidates and voters are embedded in a finite metric space $d$ with points (locations) $1, \dots, n$; we write $d(i,j)$ for the distance between points $i$ and $j$. Depending on the context, we will refer to $i$ as a point, candidate, or voter. The probability for a candidate to be drawn from point $i$ is $p_i$; we write $p = (p_i)_i$. The fraction of voters at $i$ is $q_i$, summarized as $q = (q_i)_i$. For a subset of points $S$, we write $p_S = \sum_{i \in S} p_i$ to denote the total probability mass in $S$, and similarly for $q_S$. The social cost of a candidate $i$ is his average distance to all voters:
$$c_i = \sum_j q_j \, d(i,j).$$
When candidates $i$ and $i'$ are competing, each voter $j$ votes for the candidate that is closer to her (Footnote 2: Recall the discussion of tie breaking in Footnote 1.), i.e., for $\arg\min_{k \in \{i, i'\}} d(j,k)$. The winner is the candidate who gets more votes: $i$ wins iff $q_{\{j \,:\, d(j,i) < d(j,i')\}} > \frac{1}{2}$. For two candidates $i, i'$, let $w(i,i')$ denote the winner as just described, and let $o(i,i')$ be the candidate of lower social cost. The distortion of an election between two candidates $i, i'$ is defined as
$$r(i,i') = \frac{c_{w(i,i')}}{c_{o(i,i')}}.$$
We are interested in the (expected) distortion of the instance $(d, p, q)$, defined as the expected distortion of an election between two candidates drawn i.i.d. from the candidate distribution $p$:
$$r(d,p,q) = \mathbb{E}_{i,i' \sim p}\left[ r(i,i') \right] = \sum_{i,i'} p_i \, p_{i'} \, r(i,i').$$
In particular, our goal is to analyze the worst-case distortion when the candidates are representative and when they are not, that is, we want to find the gap between
$$\sup_{d,p} \, r(d,p,p) \quad \text{and} \quad \sup_{d,p,q} \, r(d,p,q).$$
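The definitions above translate directly into a brute-force computation for small finite instances. The following is a sketch only: the function names, the representation of the metric as a nested list `d[i][j]`, and the three-point line instance are assumptions made for illustration.

```python
from itertools import product

def social_costs(d, q):
    """c[i] = sum_j q[j] * d[i][j]: average distance from point i to the voters."""
    n = len(q)
    return [sum(q[j] * d[i][j] for j in range(n)) for i in range(n)]

def majority_winner(d, q, a, b):
    """Return whichever of a, b is strictly closer to more than half of the voter mass.
    General position is assumed, so exact ties are not treated specially."""
    votes_for_a = sum(q[j] for j in range(len(q)) if d[j][a] < d[j][b])
    return a if votes_for_a > 0.5 else b

def expected_distortion(d, p, q):
    """Expected cost ratio (winner over optimum) for two candidates drawn i.i.d. from p.
    Assumes every point has strictly positive social cost."""
    c = social_costs(d, q)
    total = 0.0
    for a, b in product(range(len(p)), repeat=2):
        if a == b:
            ratio = 1.0
        else:
            w = majority_winner(d, q, a, b)
            ratio = c[w] / min(c[a], c[b])
        total += p[a] * p[b] * ratio
    return total

# Hypothetical line instance with points at -1, eps, and 1.
eps = 1e-3
positions = [-1.0, eps, 1.0]
d = [[abs(x - y) for y in positions] for x in positions]
q = [0.5 - eps, 0.5 + eps, 0.0]   # voters: just under half at -1, the rest near the center
p_outside = [0.5, 0.0, 0.5]       # candidates drawn from a different distribution

print(expected_distortion(d, p_outside, q))  # candidates not of the people
print(expected_distortion(d, q, q))          # candidates drawn from the voters
```

On this instance, drawing candidates from the voter distribution removes the distortion entirely, while the non-representative candidate distribution yields a value just below 2.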
3 Identical Distributions on the Line
We begin with the simplest setting: the underlying metric space is the line, and two candidates are drawn independently from the population of voters (so the candidate and voter distributions coincide). We first show a family of examples (a variant of Example 1) for which the expected distortion gets arbitrarily close to $4 - 2\sqrt{2}$.
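The metric space is the line, denoted by $\mathbb{R}$. There are $\frac{1}{2} - \epsilon$ voters at location $-1$, $\frac{\sqrt{2}-1}{2}$ voters at $1$, and $1 - \frac{\sqrt{2}}{2} + \epsilon$ voters at $\epsilon$. This example is obtained from Example 1 by moving a suitable fraction of voters from location $\epsilon$ to $1$, carefully trading off between two factors: (1) decreasing the pairwise distortion between the candidates at $-1$ and $1$, but (2) increasing the chance of such an election happening.
Because the voters at $\epsilon$ are slightly closer to 1 than to -1, a candidate drawn from $1$ will win against a candidate drawn from $-1$. The costs of the two candidates are
$$c_{-1} = \left(1 - \tfrac{\sqrt{2}}{2} + \epsilon\right)(1 + \epsilon) + \left(\sqrt{2} - 1\right) \;\to\; \tfrac{\sqrt{2}}{2}, \qquad c_{1} = 2\left(\tfrac{1}{2} - \epsilon\right) + \left(1 - \tfrac{\sqrt{2}}{2} + \epsilon\right)(1 - \epsilon) \;\to\; 2 - \tfrac{\sqrt{2}}{2} \qquad (\epsilon \to 0).$$
Because the candidates are drawn independently from the voter distribution, the election between $-1$ and $1$ happens with probability $2 \left(\frac{1}{2} - \epsilon\right) \frac{\sqrt{2}-1}{2}$. In all other cases (when a candidate from $\epsilon$ runs against one from $-1$ or $1$, or both candidates are from the same location), the voters elect the socially better candidate. Therefore, the expected distortion is
$$1 + 2 \left(\tfrac{1}{2} - \epsilon\right) \tfrac{\sqrt{2}-1}{2} \left( \frac{c_{1}}{c_{-1}} - 1 \right) \;\to\; 1 + \tfrac{\sqrt{2}-1}{2} \left( 2\sqrt{2} - 2 \right) = 4 - 2\sqrt{2} \qquad (\epsilon \to 0).$$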
Our first main result is that Example 3 gives the worst distortion on the line.
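For any voter distribution $p$ on the line, with both candidates drawn i.i.d. from $p$, the expected distortion is at most $4 - 2\sqrt{2}$.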
3.1 Characterizing the Structure of Voting on the Line
Given a distribution $p$ on the line with support size $n$, we label the support points as $1, \dots, n$ from left to right. Let $m$ be the index of the median (Footnote 3: Recall that we assume the instance to be in general position, which implies uniqueness of the median.), and let $L = \{1, \dots, m-1\}$ and $R = \{m+1, \dots, n\}$ denote the locations to the left and to the right of the median, respectively. By the definition of the median, the total probability masses satisfy $p_L < \frac{1}{2}$ and $p_R < \frac{1}{2}$.
If two candidates $i, i'$ are drawn, the one closer to the median $m$ wins the election.
Without loss of generality, we assume that and ; that is, lies to the left of the median, or is the median. There are two cases depending on whether is also to the left of .
If , then all voters to the right of the median as well as the median are going to vote for , so gets a fraction of the votes.
If , then all voters in as well as are going to vote for , so gets a fraction of the votes.
In either case, gets more than half of the votes and wins the election. ∎
The next lemma characterizes the social cost ordering on the line. If two candidates $i, i'$ are on the same side of the median $m$ (including one of them being the median), the one closer to $m$ has smaller social cost.
Without loss of generality, assume that , , and . Intuitively, has smaller social cost because more than half of the population need to first get to before they can get to . Formally, we have
3.2 Proof of the Upper Bound of $4 - 2\sqrt{2}$
In this section, we prove Theorem 3, showing that the worst-case distortion on the line is $4 - 2\sqrt{2}$. The high-level idea is that, given any instance with support size larger than 3, we can iteratively reduce its support size to 3 using a series of operations (the three lemmas of this section), while preserving (or increasing) the expected distortion. Once the instance has support size 3, we can optimize the locations and probabilities of these 3 points.
As before, let $m$ be the index of the median, and let $L$ and $R$ denote the points to the left and to the right of the median, respectively. We can assume that both $L$ and $R$ are non-empty; otherwise, the median is the leftmost or rightmost point, and we always elect the socially better candidate.
The proof proceeds by moving probability mass within $L$ or within $R$ to merge points until $|L| = |R| = 1$. None of the operations in this section will change the median $m$, so the election results are still decided by the candidates’ distance to $m$.
When shifting the probability mass, we will not be able to guarantee that no pairwise election sees a decrease in distortion. Instead, we use a more global argument to show that the operation increases the distortion on average. We define to be the expected distortion conditioned on one of the candidates being , and the other candidate being drawn according to , that is,
We will show that so long as , , and remain the same, is a linear function of the average distortion on one side of the median. By Lemmas 3.1 and 3.1, the pairwise distortion can be larger than 1 only if two candidates are on different sides of ; therefore,
The two preceding equations formalize that whenever and stay constant and (or ) does not decrease, also does not decrease. This fact is exploited repeatedly in the proofs of the following lemmas.
Let be the “worst” candidate in . Then, moving all probability mass from indices to does not decrease . (A symmetric claim holds for the worst candidate .)
Since the operation does not change or , it is sufficient to show that does not decrease. By Lemma 3.1, all election results between pairs are preserved. Let and denote the corresponding values of and after the operation. Then, for all ,
After the shift of probability mass, is the largest index. Consider . Using that elections between two candidates on the same side of the median always result in the socially better candidate winning, we bound
Any candidates that used to be at are now at , and used to be the worst candidate in . Hence, for all of the probability mass from locations , the expected distortion also weakly increases. Combining these two cases, we get
Lemma 3.2 can be applied repeatedly unless the two worst candidates and are the leftmost and rightmost points. We next show that in that case, either all the probability mass of or all the probability mass of can be moved to or , respectively.
Let and be the worst candidates in and , respectively. Assume w.l.o.g. that . If and , then moving all probability mass from to does not decrease .
As for the previous lemma, because we are only shifting probability mass within , it is sufficient to show that does not decrease. Because more probability mass moved closer to , we have that , and because probability mass moved away from (to the right), we get that for all .
By Lemma 3.1, loses all of his elections both before and after the move. Moreover, by Lemma 3.1, we get before the move, and after the move. Since for all , we get that . Finally, because used to be the worst candidate in , and after the move of probability mass is the only candidate in , we bound
which concludes the proof. ∎
Let , be the worst candidates in and , respectively. If , and , then the size of can be reduced by 1 without decreasing .
Notice that and . Recall that the only elections in which the winner could be socially inferior are those involving and a candidate . Also, because for all , we obtain that loses all elections. We split the proof into two cases.
If there exists an with , then in particular, . Thus, candidate wins all elections against (as he should) and against (as he should), while losing to (as he should). This implies that .
Consider the effect of moving all probability mass from to the median . First, all election results remain the same. The contribution of the probability mass that used to be at to the distortion does not change. (It was 1 before and is still 1.) Furthermore, decreases while increases for all . Because loses all pairwise elections, the overall distortion can only increase.
If for all , the expected distortion is exactly
Let denote the position of point on the line. Since , we have a point at position in with . Writing and , we get that , and . Hence, we can rewrite
Treating everything except as constant, this expression is of the form for all , where , , and are constants independent of . The derivative of this expression with respect to is ; its sign is always the sign of . If , then increasing to strictly increases the expected distortion; otherwise, can be decreased to without decreasing the expected distortion. In either case, we reduce the size of by 1.
We are now ready to prove Theorem 3.
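Proof of Theorem 3. By the three preceding lemmas, the worst-case instance has support size (at most) 3. Let $x_1 < x_2 < x_3$ be the locations on the line, with probabilities $p_1, p_2, p_3$ and social costs $c_1, c_2, c_3$. By rescaling and mirroring, we may assume without loss of generality that $x_1 = -1$, $x_3 = 1$, and $x_2 > 0$.
If $x_2$ were not the median of the distribution, then the socially better candidate would always win, giving an expected distortion of 1. So in a worst-case distribution, $x_2$ must be the median, and the socially worse candidate must win the election between $x_1$ and $x_3$. Because $x_2 > 0$, $x_3$ is closer to the median, so he wins the election between $x_1$ and $x_3$; therefore, $x_1$ must have lower cost than $x_3$. The expected distortion is
$$1 + 2 p_1 p_3 \left( \frac{c_3}{c_1} - 1 \right) = 1 + 2 p_1 p_3 \left( \frac{2 p_1 + p_2 (1 - x_2)}{2 p_3 + p_2 (1 + x_2)} - 1 \right).$$
This expression is monotonically decreasing in $x_2$ and (for fixed $p_3$) monotonically increasing in $p_1$, so it is maximized when we take the limit $x_2 \to 0$ and $p_1 \to \frac{1}{2}$. In particular, the expected distortion is at most
$$1 + p_3 \left( \frac{3 - 2 p_3}{1 + 2 p_3} - 1 \right),$$
which is maximized at $p_3 = \frac{\sqrt{2}-1}{2}$ (as in Example 3), where it attains a value of $4 - 2\sqrt{2}$.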
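The final optimization step can be sanity-checked numerically. The sketch below assumes the limiting three-point instance with mass $\frac{1}{2}$ at $-1$, mass $\frac{1}{2} - \alpha$ at $0$, and mass $\alpha$ at $1$, so that the two extreme candidates have costs $\frac{1}{2} + \alpha$ and $\frac{3}{2} - \alpha$ and meet with probability $\alpha$. The name `alpha`, the function name, and the grid resolution are illustrative choices.

```python
import math

def limiting_distortion(alpha):
    """Expected distortion of the limiting three-point line instance.

    Assumed instance: mass 1/2 at -1, mass 1/2 - alpha at 0, mass alpha at 1.
    The candidate at 1 beats the candidate at -1 although its cost is higher.
    """
    pairwise = (1.5 - alpha) / (0.5 + alpha)   # cost ratio of the two extreme candidates
    prob_bad_election = 2 * 0.5 * alpha        # one candidate from -1, the other from 1
    return 1 + prob_bad_election * (pairwise - 1)

# Grid search over alpha in (0, 1/2).
grid = [k / 100000 for k in range(1, 50000)]
best_alpha = max(grid, key=limiting_distortion)
best_value = limiting_distortion(best_alpha)

print(best_alpha, (math.sqrt(2) - 1) / 2)
print(best_value, 4 - 2 * math.sqrt(2))
```

The grid maximizer agrees with $\frac{\sqrt{2}-1}{2}$ up to the grid resolution, which illustrates the trade-off between how bad the mismatched election is and how likely it is to occur.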
4 Different Distributions
In this section, we prove a tight bound of 2 on the worst-case distortion of voting, when two candidates are drawn i.i.d. from a distribution $p$ which may be different from the voter distribution $q$. This ratio is tight for both general metric spaces and the line, and the lemmas we prove in this section apply to arbitrary metric spaces.
We begin with an example on the line (a variant of Example 1) which establishes the lower bound of 2. The candidate distribution has probability 1/2 at position $-1$, and the other 1/2 at position 1. The voter distribution has a fraction $\frac{1}{2} - \epsilon$ of the voters at position $-1$, while the remaining voters are just to the right of center at position $\epsilon$. With probability 1/2, we draw two different candidates, and the distortion approaches 3 as $\epsilon \to 0$; otherwise, we draw two candidates from the same location, getting a distortion of 1. Therefore, the expected distortion of the instance approaches $\frac{1}{2} \cdot 3 + \frac{1}{2} \cdot 1 = 2$ as $\epsilon \to 0$.
The challenge is to establish the matching upper bound. In proving the upper bound, some of the techniques we establish will be useful in Section 5.
For all instances $(d, p, q)$, the expected distortion is at most 2.
The overall proof structure is as follows. First, we show in Lemma 4 that if $i$ wins the election against $i'$, then $c_i \le 3 c_{i'}$. That is, while the election winner can be socially worse, he cannot be too much worse. (Footnote 4: Lemma 4 is a special case of the more general result [2, Theorem 4]; we present a self-contained proof here for completeness.) Lemma 4 is the only place where we use the metric structure and the voter distribution. Subsequently, we rewrite the social cost function accordingly, and then treat the costs as completely arbitrary numbers.
Second, in Lemma 4, we prove that if all pairwise elections have distortion at most $\gamma$, then the expected distortion is at most $\frac{\gamma + 1}{2}$. (While in this section, we will only use the lemma with $\gamma = 3$, the version with general $\gamma$ constitutes a key step in Section 5.)
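Let $i$ be the winner of the election between $i$ and $i'$. Then, $c_i \le 3 c_{i'}$.
In the following derivation, we will use that:
Because $i$ beats $i'$, at least half of the voters are at least as close to $i$ as to $i'$.
For any voter $j$ who is at least as close to $i$ as to $i'$, the triangle inequality implies that $d(i,i') \le d(i,j) + d(j,i') \le 2\, d(j,i')$.
Then, we can bound $c_i$ as follows: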
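One way to write out this chain of inequalities is sketched below. It uses the two facts just listed and assumes the notation $c_i = \sum_j q_j \, d(i,j)$ for the social cost; the set $S$ (the voters at least as close to $i$ as to $i'$, with mass $q_S \ge \frac{1}{2}$) is a name introduced here for illustration, and the original derivation may be organized differently.

```latex
\begin{align*}
c_i &= \sum_{j \in S} q_j \, d(i,j) + \sum_{j \notin S} q_j \, d(i,j) \\
    &\le \sum_{j \in S} q_j \, d(i',j) + \sum_{j \notin S} q_j \bigl( d(i,i') + d(i',j) \bigr)
        && \text{($j \in S$ is closer to $i$; triangle inequality otherwise)} \\
    &= c_{i'} + (1 - q_S) \, d(i,i') \\
    &\le c_{i'} + \sum_{j \in S} q_j \, d(i,i')
        && \text{(since $1 - q_S \le \tfrac{1}{2} \le q_S$)} \\
    &\le c_{i'} + 2 \sum_{j \in S} q_j \, d(j,i')
        && \text{(since $d(i,i') \le 2\, d(j,i')$ for $j \in S$)} \\
    &\le 3 \, c_{i'}.
\end{align*}
```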
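For any $\gamma \ge 1$ and any instance, if every pairwise election has distortion at most $\gamma$, then the expected distortion is at most $\frac{\gamma + 1}{2}$.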
Consider an instance and its associated costs . Without loss of generality, assume that . For each candidate , let . Notice that by the assumption that for all , whenever , we have that , resulting in a cost ratio of 1. We can therefore bound the expected distortion (minus 1) as follows:
The upper bound assumes that the worse candidate wins whenever the two candidates’ social costs are within a factor of $\gamma$ of each other. Note that this upper bound makes no more reference to distances or voter distributions. It depends on a distribution over candidates and a cost vector, both of which can be arbitrary, and it assumes that all elections whose candidates’ costs are more than a factor $\gamma$ apart choose the socially better candidate, while all other elections choose the socially worse candidate.
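The abstraction described above (arbitrary costs, arbitrary candidate distribution, and the worse candidate winning exactly when the costs are within a factor $\gamma$) is easy to evaluate directly. The sketch below does so on random inputs and compares the result with $\frac{\gamma+1}{2}$. The function names, the cost range, and the number of trials are assumptions; random sampling only illustrates the bound and does not prove it.

```python
import random

def adversarial_expected_ratio(costs, probs, gamma):
    """Expected cost ratio when the worse candidate wins iff costs are within a factor gamma."""
    total = 0.0
    for ci, pi in zip(costs, probs):
        for cj, pj in zip(costs, probs):
            lo, hi = min(ci, cj), max(ci, cj)
            # Worse candidate wins only if the two costs are within a factor gamma.
            ratio = hi / lo if hi <= gamma * lo else 1.0
            total += pi * pj * ratio
    return total

def random_instance(rng, size):
    """Random positive costs and a random probability vector of the given size."""
    costs = [rng.uniform(1.0, 10.0) for _ in range(size)]
    weights = [rng.random() for _ in range(size)]
    norm = sum(weights)
    return costs, [w / norm for w in weights]

rng = random.Random(0)
gamma = 3.0
worst_seen = 1.0
for _ in range(2000):
    costs, probs = random_instance(rng, rng.randint(2, 6))
    worst_seen = max(worst_seen, adversarial_expected_ratio(costs, probs, gamma))

# Two equally likely candidates whose costs differ by exactly gamma attain the bound.
two_point = adversarial_expected_ratio([1.0, gamma], [0.5, 0.5], gamma)
print(worst_seen, two_point, (gamma + 1) / 2)
```

The two-point instance at the end matches the extremal configuration with support size 2 that the merging argument arrives at.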
We will now argue that is at most . First, we show that the expression is maximized by moving probability mass so that and are at most a factor apart for every and in the support of . Suppose that there exists a pair in the support of with , i.e., with . Consider moving probability mass from to , where a negative value of moves probability mass from to ; call the resulting probability vector . Because our choice of and avoids the bilinear term in (3), is a linear function of . Therefore, the expression is maximized at an extreme, i.e., by moving all the probability mass from one of and to the other.
Once all points in the support of are at most a factor apart in social cost, the expression for in (3) becomes a sum over all pairs of points. Assume that the support of has size , and associated costs . (The inequalities can be assumed to be strict, because two points with the same cost can be merged without affecting the value .) Considering all terms except as constants, is of the form (with ), which is convex in . In particular, it attains its maximum at or . In either case, we can merge the probability mass of point 2 with 1 or 3, reducing the support size by 1 without decreasing . By repeating such merges, we eventually arrive at a distribution with support size and . Finally, we can bound
5 Identical Distributions in General Metric Spaces
In this section, we examine the setting where the underlying metric space is arbitrary, and the candidates are drawn independently from the population of voters. We establish the following main theorem:
The worst-case distortion is at least $3/2$ and is bounded by a constant strictly smaller than 2.
Key to the upper bound portion of this theorem is the following lemma. Assume that . Let be an instance with maximum pairwise distortion (exactly) . Then, . We prove Lemma 5 in Section 5.2. That proof relies on the following structural characterization: if a pair of candidates has distortion for sufficiently small , then the instance must be very structured: nearly half the probability mass must be concentrated very close to the socially optimal candidate, and most of the remaining candidates must be nearly equidistant to the two candidates.
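Proof of Theorem 5. We begin by proving the lower bound, by constructing a family of instances whose distortion converges to $3/2$. We label the points $0, 1, \dots, n$. We place just under half of the probability mass at point 0, and spread the remaining mass evenly over the points $1, \dots, n$. The distances (Footnote 5: To avoid tie breaking issues, consider the distances as perturbed by distinct and very small amounts.) are 1 between point 0 and every other point, and just below 1 between any two distinct points $i, i' \ge 1$. See Figure 4 for an illustration.
This way, all voters/candidates in the set $\{1, \dots, n\}$ prefer each other over the voter/candidate 0. Therefore, even though candidate 0 is socially optimal (with a cost of just over $\frac{1}{2}$), he loses to any other candidate in the election; the other candidates’ costs are close to 1 for large $n$.
With probability just under $\frac{1}{2}$, an election occurs between candidate 0 and some other candidate $i \ge 1$, resulting in distortion close to 2. In the other cases (two candidates from 0, or two candidates $i, i' \ge 1$), the distortion is at least 1. Hence, the overall expected distortion is at least roughly $\frac{1}{2} \cdot 2 + \frac{1}{2} \cdot 1$. As $n \to \infty$ and the perturbations vanish, the distortion approaches $\frac{3}{2}$.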
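To see the convergence concretely, the lower-bound construction can be evaluated in closed form. The sketch below assumes a specific parametrization that is not taken from the original: mass $\frac{1}{2} - \epsilon$ at point 0, mass $\frac{1/2 + \epsilon}{n}$ at each other point, distance 1 to point 0, and distance $1 - \epsilon$ among the other points. The function name is likewise hypothetical.

```python
def lower_bound_distortion(n, eps):
    """Lower bound on the expected distortion of the assumed star-like instance.

    Assumed parametrization: point 0 has mass 1/2 - eps; points 1..n share the rest
    evenly; d(0, i) = 1 and d(i, i') = 1 - eps for distinct i, i' >= 1.
    """
    p0 = 0.5 - eps
    p_other = (0.5 + eps) / n
    cost_zero = (0.5 + eps) * 1.0                               # all other mass is at distance 1
    cost_other = p0 * 1.0 + (n - 1) * p_other * (1.0 - eps)     # own mass contributes distance 0
    # Every voter outside point 0 prefers any i >= 1 to point 0, so point 0 always loses.
    prob_mixed = 2 * p0 * (0.5 + eps)
    # All remaining elections are counted with distortion exactly 1, hence a lower bound.
    return 1.0 + prob_mixed * (cost_other / cost_zero - 1.0)

for n, eps in [(10, 1e-2), (100, 1e-3), (10000, 1e-4)]:
    print(n, eps, lower_bound_distortion(n, eps))
```

The printed values increase toward 1.5 as `n` grows and `eps` shrinks, matching the limit stated above.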
For the upper bound, let and consider the following two cases. If all pairwise elections have distortion at most , then Lemma 4 implies that the overall expected distortion is at most . If some pair of candidates has distortion at least , then Lemma 5 implies that the overall expected distortion is at most . Together, these two cases complete the proof of the theorem.
As mentioned above, the key insight in the proof of Lemma 5 is that when a pair of candidates has , nearly half the probability mass must be concentrated very close to the socially optimal candidate, and most of the remaining candidates must be nearly equidistant to the two candidates. Trading off these four sources of approximation makes the proof of the lemma fairly complex. To illustrate the key ideas more cleanly, we therefore begin by proving the following special case of Lemma 5 with .
Let be an instance. If there exists a pair of candidates with , then .
As before, we let denote the total probability mass in . In addition, throughout this section, is the conditional candidate/voter distribution given that candidate is drawn from ; that is, . We use to denote the average distance from to the set , i.e., .
5.1 Proof of Lemma 5
Assume that and . We assume without loss of generality that . The fact that implies very stringent conditions on the instance: we will begin by showing that half of the probability mass must be at , is socially optimal, and all other locations are at distance666In a sense, this extreme example does rely on tie breaking. Since we are proving an upper bound here, this is not a concern. 1 from and .
Let be the set of voters preferring over , and the set of voters preferring over . Then, we can bound