Ordinal Approximation for Social Choice, Matching, and Facility Location Problems given Candidate Positions

05/08/2018 ∙ by Elliot Anshelevich, et al. ∙ 0

In this work we consider general facility location and social choice problems, in which sets of agents A and facilities F are located in a metric space, and our goal is to assign agents to facilities (as well as choose which facilities to open) in order to optimize the social cost. We form new algorithms to do this in the presence of only ordinal information, i.e., when the true costs or distances from the agents to the facilities are unknown, and only the ordinal preferences of the agents for the facilities are available. The main difference between our work and previous work in this area is that while we assume that only ordinal information about agent preferences is known, we know the exact locations of the possible facilities F. Due to this extra information about the facilities, we are able to form powerful algorithms which have small distortion, i.e., perform almost as well as omniscient algorithms but use only ordinal information about agent preferences. For example, we present natural social choice mechanisms for choosing a single facility to open with distortion of at most 3 for minimizing both the total and the median social cost; this factor is provably the best possible. We analyze many general problems including matching, k-center, and k-median, and present black-box reductions from omniscient approximation algorithms with approximation factor β to ordinal algorithms with approximation factor 1+2β; doing this gives new ordinal algorithms for many important problems, and establishes a toolkit for analyzing such problems in the future.


1 Introduction

Many important problems involve assigning agents to facilities. For example, assigning patients to hospitals, students to universities, people to houses, etc. The target of assignment problems is usually to minimize social cost or maximize social welfare. When we consider the social cost of assignment problems, it is natural to assume the agents prefer facilities that are “closer” to them in some sense, thus the social cost of an agent is often represented by the distance between the agent and the facility it is assigned to. Besides the cost of distances, there are many other cost functions and constraints for different problems; for example, in the capacitated facility assignment problem, each facility has a maximum number of agents it can accommodate.

In this work we consider general facility location problems, in which sets of agents $A$ and facilities $F$ are located in a metric space, and our goal is to assign agents to facilities (as well as choose which facilities to open) so that agents are assigned to facilities which are close to them. For example, $F$ may be possible locations for opening new stores, and the goal may be that all agents have a store near them, or that the sum of agent distances to the stores they are assigned to is small, etc. This setting also captures many social choice problems, in which the facilities correspond to candidates, and the goal would be to choose a single candidate (and assign all agents to this candidate) so that the distances from the agents to the chosen candidate are small. Here the distances correspond to spatial preferences, i.e., the metric space represents the ideological space in which a more preferred candidate is closer to the agent; see [18, 3] for discussion of such spatial preferences in social choice. Our setting also captures matching and many related problems, in which we would open all facilities, but are only able to assign one agent to each facility, thus forming a matching between agents and facilities; facilities here could correspond to houses or items, for example.

If the distances between agents and facilities are known, then we can calculate the optimal solution for these assignment problems. Note that many of the facility location problems are NP-Complete, but at least it is possible to compute optimum assignments of agents to facilities (or the optimum candidates to select for social choice settings) given unlimited computational resources. For many of the settings we mentioned above, however, it is unlikely that we know the exact distances from the agents to the facilities. For social choice these distances would correspond to the cardinal preferences of voters for candidates, for example, “My cost for candidate X winning is exactly 2.35.” It is far more common that only ordinal preferences of the agents for the candidates are known, i.e., “I prefer X to Y”. Similarly, when trying to form a matching, or even in general facility location problems where we survey the agents to find out their preferences, it is much easier to elicit ordinal preferences (“I prefer to be matched with X over Y”) over precise numerical preferences. These observations have recently led to a large body of work using the utilitarian approach, in which we assume that some latent numerical costs or utilities exist, but we only know the ordinal preferences of the agents, not their underlying numerical costs. See for example [3, 4, 11, 23, 21, 29, 16] for the social choice setting, [5, 6, 7, 1] for matching and other graph problems, and [13] for facility location. These works focus on measuring the distortion of various algorithms: a measure of how well an algorithm behaves when using only ordinal information, as compared to the optimum algorithm which has access to the true underlying numerical information. More formally, the distortion [28, 3] of an assignment is defined as the worst-case ratio of its social cost to the social cost of the optimal solution.

As in the work mentioned above, we assume that only ordinal information about the distances between agents and facilities is known. However, although the locations and numerical preferences of the agents are usually difficult to obtain, the locations of facilities are mostly public information. The locations of political candidates in ideological space can be reasonably well estimated based on their voting records and public statements. When forming a survey about new stores to open, we may not know exactly how much the customers would prefer one store over the other since the customer locations may be private, but the locations of the possible stores themselves are public knowledge. The main difference between our work and previous work in this area is that we assume:


While only ordinal information about agent preferences is known, we know the exact locations of the possible facilities $F$.

As we discuss below, this extra information about the locations of the facilities relative to each other allows us to produce much stronger algorithms, and show much nicer bounds on distortion. In fact, in many cases, we do not even need the full information about the locations of the facilities. The main message of this paper is that having a small amount of information about the candidates in social choice settings, or the facilities in facility location, allows us to obtain solutions which are provably close to optimal for a large class of problems even though the only information we have about the agent preferences is ordinal, and thus it is impossible (even given unlimited computational resources) to compute the true optimum solution.

1.1 Our Contributions

We begin by looking at the social choice setting, in which we have agents and candidates in a metric space, and we are given an ordinal ranking of each agent for the candidates. This setting was considered in e.g., [3, 4, 23, 21, 29, 16, 24]. In particular, for the objective of minimizing the total distance from the agents to the chosen candidate, [3] showed that Copeland and similar voting mechanisms always have distortion of at most 5, while no deterministic voting mechanism can achieve a worst-case distortion of less than 3. Finding a deterministic mechanism with distortion less than 5 has been an open problem for several years [23]. In this paper, we show that if we know the exact locations of the candidates in addition to the ordinal ranking of the agents, then there is a simple algorithm which achieves a distortion of 3, and no better bound is possible. In other words, while we do not know the true distances from agents to candidates, we can compute an outcome which is a 3-approximation no matter what the true distances are, as long as they are consistent with the ordinal preferences given to us. Moreover, this approximation is possible even if for each agent we are only given their favorite (i.e., top-choice) candidate: there is no need for the agents to submit a full preference ranking over all the alternatives.

We also study other objective functions in addition to minimizing the total distance from agents to the chosen alternative. We give a natural deterministic voting mechanism which has distortion at most 3 for objectives such as minimizing the median voter cost, the egalitarian objective of minimizing maximum voter cost, and many other objectives. This mechanism achieves all these approximation guarantees simultaneously, and moreover it does not need the exact locations of the candidates: it suffices to be given an ordinal ranking of the distances from each candidate to each other candidate. In other words, this mechanism is especially suitable for the case when candidates are a subset of voters, as our mechanism will obtain the ordinal ranking of each voter for all the candidates, and this is the only information which would be required. Note that [3] proved that no deterministic mechanism can achieve a distortion of better than 5 for the median objective; the reason why we are able to achieve a distortion of 3 here is precisely because we also know how each candidate ranks all the other candidates, in addition to how each voter ranks all the candidates.

We then proceed to our general facility assignment model. We are given a set of agents $A$ and a set of facilities $F$ in a metric space. The distances between facilities are given, but the distances between agents and facilities are unknown; instead we only know ordinal preferences of agents over facilities which are consistent with the true underlying distances. There could be arbitrary constraints on the assignment, such as facility capacities, or constraints enforcing that some agents cannot be (or must be) assigned to the same facility, etc. A valid assignment assigns each agent to a facility without violating the constraints. We consider many different social cost functions to optimize. For a general class of cost functions (essentially ones which are monotone and subadditive), we give a black-box reduction which converts an algorithm for the omniscient version of this problem (i.e., the version where the true distances are known) to an ordinal algorithm with small distortion. Specifically, if we have an omniscient algorithm which always produces an assignment which is a $\beta$-approximation to the optimum, then using it we can create an ordinal algorithm which only knows the ordinal preferences of the agents instead of their true distances to the facilities, but has distortion of at most $1+2\beta$.

| Problem | Omniscient: full distances | Agents' ordinal prefs and facility locations | Only agents' ordinal prefs (lower bounds) |
|---|---|---|---|
| Total (Sum) Social Choice | 1 | 3 | 5 (3) |
| Median Social Choice | 1 | 3 | 5 (5) |
| Min Weight Bipartite Matching | 1 | 3 | (3) |
| Egalitarian Bipartite Matching | 1 | 3 | - (2) |
| Facility Location | 1.488 [27] | 3.976 | () |
| k-center | 2 [25] | 5 | - (-) |
| k-median | 2.675 [12] | 6.35 | - () |
Table 1: Best known distortion of polynomial-time algorithms in different settings. “Omniscient” stands for the setting where all the distances between agents and facilities are known, and the numbers represent the best-known approximation ratios. The second column represents our setting, in which the ordinal preferences of the agents and the numerical distances between facilities are known. The last column represents the pure ordinal setting in which only the agent ordinal preferences are known, but the distances between facilities are unknown; this setting has been previously studied, and we include the known lower bounds on the possible distortion in parentheses, including some which we prove in the Appendix.

Many well-known problems fall into our facility assignment model; Table 1 summarizes some of our results. For example, classic facility location with facility costs, minimum weight bipartite matching, egalitarian bipartite matching, k-center, and k-median are all special cases. In particular, our results show that if we are given unbounded computational resources, then it is always possible to form an assignment with distortion of at most 3 for these problems, and no better bound is possible simply due to the fact that we do not possess all the relevant information to compute the true optimum. This is a large improvement over previously known distortion bounds: for minimum cost ordinal matching the best-known distortion bound is the one obtained using random serial dictatorship (RSD) [13]; by using the knowledge of facility locations we are able to reduce this approximation ratio to 3.

1.2 Discussion and Related Work

Ordinal approximation [2] for the minimum social cost (or maximum social welfare) with underlying utilities/distances between agents and alternatives has been studied in many settings including social choice [28, 11, 3, 4, 23, 21, 14, 29, 16], matchings [9, 22, 5, 6, 13, 17, 7], secretary problems [26], participatory budgeting [8], general graph problems [5, 1] and many other models in recent years. The general assumption of the ordinal setting is that we only have the ordinal preferences of agents over alternatives, and the goal is to form a solution that has close to optimal social cost. There are different models: social choice, matching, facility location, etc.; different objectives: minimizing social cost, maximizing social welfare, total cost objective, median objective, egalitarian objective, etc.; different assumptions on utility or cost functions: unit-sum, unit-range, metric space, etc. In this paper, we study general facility assignment problems in a metric space, and assume that the ordinal preferences of agents over alternatives are given. Unlike previous work on this topic, we also assume the locations of the alternatives are known; we show that this extra information enables us to achieve much better approximation ratios than in the pure ordinal setting for many problems.

The distortion of social choice functions was first introduced in [28], to describe the ratio between the total utility of the optimal candidate and the candidate selected by a mechanism using only ordinal preferences. [3, 29, 23] studied the distortion of social choice functions in a metric space; the assumption that the underlying numerical costs have this metric property allows for much better results than more general costs. In particular, for the objective of minimizing the total distance from the agents to the chosen candidate, the above papers were able to show good distortion bounds for many well-known mechanisms, in particular a bound of 5 for Copeland [3], a bound of for Single Transferable Vote (STV) [29], and many others. In addition, [3] proved that no deterministic mechanism can have worst-case distortion better than 3, and [29] showed that all scoring rules for -candidates have a distortion of at least . Goel et al. [23] showed that Ranked Pairs, and the Schulze rule have a worst-case distortion of at least 5, and the expected worst-case distortion of any (weighted)-tournament rule is at least 3. They also introduced the notion of “fairness” of social choice rules, and discussed the fairness ratio of Copeland, Randomized Dictatorship, and a general class of cost functions. Finding a deterministic mechanism with distortion less than 5 has been an open problem for several years. In this paper, we show that if we know the exact locations of the candidates in addition to the ordinal ranking of the agents, then there is a simple algorithm which achieves a distortion of 3, and no better bound is possible.

While the above work, as well as our paper, only focuses on deterministic algorithms, the distortion of randomized algorithms in social choice has also been considered, see for example [4, 19, 24, 21]. In a slightly different flavor of result, [15, 16] consider the special case where candidates are randomly and independently drawn from the set of voters. While we leave the analysis of randomized algorithms which know the location of the facilities to future work, and consider the worst-case candidate locations, it is worth pointing out that our deterministic algorithm achieves a distortion of 3, which is also the best known distortion bound for any randomized mechanism which only knows the ordinal preferences of the agents. Similarly, another common goal is to form truthful mechanisms with small distortion for matching and social choice, as in [21, 6, 13]; we focus on general mechanisms in this paper in order to understand the limitations of knowing only certain kinds of ordinal information, and leave the goal of forming truthful mechanisms for future work.

For the median objective of social choice problems, [3] showed that Copeland gives a distortion of at most 5, while no deterministic mechanism can achieve a distortion of better than 5. [4] also gave a randomized algorithm that has a distortion of at most 4. In this paper, we are able to improve this bound to a tight worst-case distortion of 3 by a deterministic mechanism, because we also know how each candidate ranks all the other candidates, in addition to how each voter ranks all the candidates.

The distortion of matching in a metric space has received far less attention than social choice questions. [5, 6, 7] analyzed maximum-weight metric matching; the maximization objective makes this problem far easier, and even choosing a uniformly random matching yields a distortion of a small constant. This is very different from our goal of computing a minimum-cost matching, for which no ordinal approximations better than are known. [13] studied facility assignment problems in a metric space; they considered the problem with or without resource augmentation, and the cases without augmentation are exactly the minimum weight bipartite matching problem. [13] showed that the approximation ratio of random serial dictatorship (RSD) is at most , and gave a lower bound of for the approximation ratio of serial dictatorship (SD), and a lower bound of for RSD. Their results are the best known ordinal approximations for this problem. In this paper, we are able to give a tight 3-approximation for the minimum weight matching problem, given the locations of facilities in addition to the agents’ ordinal preferences.

2 Model and Notation: Social Choice

For the social choice problems studied in this paper, we let $A$ be a set of $n$ agents, and let $F$ be a set of alternatives, which we will also refer to sometimes as candidates or facilities. We will typically use $i$ and $j$ to refer to agents and $W, X, Y, Z$ to refer to alternatives. Let $\mathcal{S}$ be the set of total orders on the set of alternatives $F$. Every agent $i \in A$ has a preference ranking $\sigma_i \in \mathcal{S}$; by $X \succ_i Y$ we will mean that $X$ is preferred over $Y$ in ranking $\sigma_i$. Although we assume that each agent has a total order of preference over the alternatives and that this order is known to us, for many of our results it is only necessary that the top choice of each agent is known. We say $X$ is $i$'s top choice if $i$ prefers $X$ to every other alternative in $F$. We call the vector $\sigma = (\sigma_1, \ldots, \sigma_n)$ a preference profile. We say that an alternative $X$ pairwise defeats $Y$ if $|\{i \in A : X \succ_i Y\}| > n/2$. The goal is to choose a single winning alternative.
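The pairwise comparisons above can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function names `prefers` and `majority_graph` and the list-of-rankings representation of a profile are assumptions made here. An edge `(x, y)` is recorded when at least half of the agents rank `x` above `y`, i.e., when `x` pairwise defeats or ties `y`.

```python
from itertools import permutations

def prefers(ranking, x, y):
    """True if x appears before y in a best-first ranking."""
    return ranking.index(x) < ranking.index(y)

def majority_graph(profile, alternatives):
    """Return the set of edges (x, y) such that x pairwise defeats or ties y."""
    n = len(profile)
    edges = set()
    for x, y in permutations(alternatives, 2):
        support = sum(prefers(ranking, x, y) for ranking in profile)
        if 2 * support >= n:  # at least half of the agents rank x above y
            edges.add((x, y))
    return edges

# Hypothetical profile with four agents and three alternatives.
profile = [["X", "Y", "Z"], ["Y", "Z", "X"], ["Z", "X", "Y"], ["X", "Z", "Y"]]
edges = majority_graph(profile, ["X", "Y", "Z"])
```

In this profile three of the four agents rank X above Y, so only the edge from X to Y is present for that pair, while X and Z are tied two against two and therefore get edges in both directions.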

Cardinal Metric Costs.

In this work we take the utilitarian view, and assume that the ordinal preferences are derived from underlying (latent) cardinal agent costs. Formally, we assume that there exists an arbitrary metric $d$ on the set of agents and alternatives. The cost incurred by agent $i$ of alternative $X$ being selected is represented by $d(i, X)$, which is the distance between $i$ and $X$. Such spatial preferences are relatively common and well-motivated, see for example [18, 3] and the references therein. The underlying distances $d$ are unknown, but unlike most previous work we do assume the distances between alternatives are given. For example, when alternatives represent facilities or stores to be opened, it makes sense that their specific locations would be known, while the distances from the customers to the stores may be private. Similarly, when the alternatives represent political candidates, it may be easy to estimate their locations in ideological space (for example based on their voting records and public statements), but the ideology of the voters is much harder to estimate, with mechanism designers only knowing which candidates the voters prefer but not how much they prefer them. The distance between two alternatives $X$ and $Y$ is denoted by $\ell(X, Y)$. We say that $d$ is consistent with $\ell$ if $\forall X, Y \in F$, $d(X, Y) = \ell(X, Y)$.

The metric costs naturally give rise to a preference profile. We say that $d$ is consistent with $\sigma$ if $\forall i \in A$, $\forall X, Y \in F$, if $d(i, X) < d(i, Y)$, then $X \succ_i Y$. It means that the cost of $X$ is less than the cost of $Y$ for agent $i$, so agent $i$ prefers $X$ over $Y$. As described above, we know exactly the distances $\ell$ and the preferences $\sigma$, but do not know the true costs $d$ which give rise to $\sigma$. Let $\mathcal{D}(\sigma, \ell)$ be the set of metrics that are consistent with $\sigma$ and $\ell$; we know that one of the metrics from this possibly infinite space captures the true costs, but do not know which one.
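As a rough illustration of what consistency means, the sketch below checks a candidate table of agent costs against a preference profile and against the known distances between alternatives. The function names, the dictionary layout, and the tolerance are assumptions made for this example. The second function only tests the triangle inequalities that involve one agent and two alternatives, which are necessary conditions for the costs to come from a metric consistent with the alternative distances.

```python
from itertools import combinations

def consistent_with_profile(cost, profile):
    """cost[i][x] is agent i's cost for x; profile[i] is i's best-first ranking."""
    for i, ranking in enumerate(profile):
        for pos, x in enumerate(ranking):
            for y in ranking[pos + 1:]:
                # i ranks x above y, so y must not be strictly closer than x.
                if cost[i][y] < cost[i][x]:
                    return False
    return True

def respects_triangle(cost, ell, tol=1e-9):
    """Necessary triangle-inequality checks against alternative distances ell."""
    for c in cost:
        for x, y in combinations(sorted(c), 2):
            gap = ell[frozenset((x, y))]
            if gap > c[x] + c[y] + tol or abs(c[x] - c[y]) > gap + tol:
                return False
    return True

# Hypothetical instance on a line: X at 0, Y at 2, agents at 0.5 and 2.
ell = {frozenset(("X", "Y")): 2.0}
cost = [{"X": 0.5, "Y": 1.5}, {"X": 2.0, "Y": 0.0}]
profile = [["X", "Y"], ["Y", "X"]]
ok = consistent_with_profile(cost, profile) and respects_triangle(cost, ell)
```

Note that ties in cost are compatible with either ranking, since the definition only constrains the ranking when one cost is strictly smaller.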

Social Cost Distortion

We study several objective functions for social cost in this paper. First, the most common notion of social cost is the sum objective function, defined as $SC(X, d) = \sum_{i \in A} d(i, X)$. We also study the median objective function, $med(X, d) = \mathrm{med}_{i \in A}\, d(i, X)$, as well as the egalitarian objective and many others (see Section 3.2). We use the notion of distortion to quantify the quality of an alternative in the worst case, similar to the notation in [11, 28]. For any alternative $X$, we define the distortion of $X$ as the worst-case ratio between the social cost of $X$ and that of the optimal alternative, where the supremum ranges over the set $\mathcal{D}(\sigma, \ell)$ of metrics consistent with $\sigma$ and $\ell$ (the definition for the median and other objectives is analogous):

$$\mathrm{dist}(X) = \sup_{d \in \mathcal{D}(\sigma, \ell)} \frac{SC(X, d)}{\min_{Y \in F} SC(Y, d)}.$$

In other words, saying that the distortion of $X$ is at most 3 means that, no matter what the true costs are (as long as they are consistent with the $\sigma$ and $\ell$ which we know), it must be that the social cost of $X$ is within a factor of 3 of the true optimum alternative, which is impossible to compute without knowing the true costs. Because of this, a small distortion value means that there is no need to obtain the true agent costs, and the ordinal information (together with information about the alternatives) is enough to form a good solution.

A social choice function $f$ on $A$ and $F$ takes $\sigma$ and $\ell$ as input, and returns the winning alternative. We say the distortion of $f$ is the same as the distortion of the winning alternative chosen by $f$ on $\sigma$ and $\ell$. In other words, the distortion of a social choice mechanism $f$ on a profile $\sigma$ and facility distances $\ell$ is the worst-case ratio between the social cost of $f(\sigma, \ell)$, and the social cost of the true optimal alternative.
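For a single fixed metric, the ratio inside the distortion definition is easy to evaluate. The sketch below does this for the sum objective; the function names and the cost-table layout are assumptions for illustration. Because distortion is a worst case over all consistent metrics, the value computed for any one metric is only a lower bound on the distortion of the chosen alternative.

```python
def total_cost(x, cost):
    """Sum objective: total distance from all agents to alternative x."""
    return sum(c[x] for c in cost)

def ratio_for_metric(winner, cost):
    """Social cost of the winner divided by the optimum, for one given metric."""
    alternatives = list(cost[0])
    optimum = min(total_cost(y, cost) for y in alternatives)
    return total_cost(winner, cost) / optimum  # assumes the optimum cost is nonzero

# Hypothetical instance on a line: X at 0, Y at 4, agents at 1, 3 and 4.
cost = [{"X": 1.0, "Y": 3.0}, {"X": 3.0, "Y": 1.0}, {"X": 4.0, "Y": 0.0}]
ratio = ratio_for_metric("X", cost)
```

Here choosing X costs 8 in total while the optimum Y costs 4, so the ratio for this particular metric is 2.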

3 Distortion of Social Choice Mechanisms

3.1 Distortion of Total Social Cost

In this section, we study the sum objective and provide a deterministic algorithm that gives a distortion of at most 3. According to [3], the lower bound on the distortion for deterministic social choice functions with only ordinal preferences (without knowing $\ell$) is 3. This occurs in the simple example with 2 alternatives which are tied, with approximately half of the agents preferring each one. No matter which one is chosen, the true optimum could be the other one, and its social cost can be as much as 3 times better. Because the example in Theorem 3 from [3] only has two alternatives, knowing $\ell$ does not provide any extra information, and thus that example also provides a lower bound of 3 in our setting, even though we assume the distances between facilities are known in this paper. Therefore, our mechanism achieves the best possible distortion in this setting. Note that if we only have ordinal preferences of the agents without the distances between facilities, then the best known approach so far is Copeland, which gives a distortion of at most 5. Thus our results establish that by knowing the distances between alternatives, it is possible to reduce the distortion from 5 to 3, and no better deterministic mechanism is possible.
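To make the lower bound concrete, here is one worked instance in the spirit of the two-alternative example. The labels $X$, $Y$, the normalization of the distance between them to 2, and the specific agent positions are illustrative assumptions, not values taken from [3].

```latex
% Two alternatives at distance 2; n agents (n even), half rank X first, half rank Y first.
% Suppose the mechanism picks X. One metric consistent with these rankings:
%   agents ranking X first sit midway:  d(i, X) = d(i, Y) = 1
%   agents ranking Y first sit at Y:    d(i, X) = 2,  d(i, Y) = 0
\begin{align*}
SC(X, d) &= \tfrac{n}{2}\cdot 1 + \tfrac{n}{2}\cdot 2 = \tfrac{3n}{2}, \\
SC(Y, d) &= \tfrac{n}{2}\cdot 1 + \tfrac{n}{2}\cdot 0 = \tfrac{n}{2}, \\
\frac{SC(X, d)}{SC(Y, d)} &= 3 .
\end{align*}
```

By symmetry the same ratio arises if the mechanism picks $Y$ instead, so a deterministic choice cannot guarantee a ratio below 3 on this profile.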

Lemma 3.1.

Let $X, Y \in F$ be alternatives. If agent $i$ prefers $X$ over $Y$, then $d(X, Y) \le 2\, d(i, Y)$. [Lemma 5 in [3]]

In the following algorithm, we generate a set of projected agents as follows: Given agents $A$, alternatives $F$, and the preference profile $\sigma$, for each agent $i \in A$ denote alternative $T_i$ as $i$'s top choice. Then we create a new agent $\tilde{i}$ at the location of $T_i$ in the metric space, as shown in Figure 1 (a); consequently, $\forall Y \in F$, $d(\tilde{i}, Y) = d(T_i, Y)$. Denote the set of the new agents as $\tilde{A}$. For any metric $d$ consistent with $\ell$, $d(\tilde{i}, Y) = \ell(T_i, Y)$, so the distances between agents in $\tilde{A}$ and alternatives in $F$ are known to us, unlike the true distances between $A$ and $F$.

Figure 1: (a) For each agent, generate a projected agent at the location of its top choice alternative. (b) A figure demonstrating agent $i$, $i$'s top choice alternative $T_i$, $i$'s projected agent $\tilde{i}$ located at $T_i$, the winner $W$, and the optimal alternative $X$ for the proof of Theorem 3.2.
Input: Agents $A$,
       Alternatives $F$,
       Each agent $i$'s top choice alternative,
       Distances between alternatives, i.e., $\ell(X, Y)$ for all $X, Y \in F$.
Output: The winning alternative $W$.
1. Generate the projected agent set $\tilde{A}$.
2. For each alternative $X \in F$, calculate the total social cost on $\tilde{A}$ of choosing $X$, i.e., $\sum_{\tilde{i} \in \tilde{A}} d(\tilde{i}, X)$.
3. Final output: return the alternative $W$ that has the minimum social cost on $\tilde{A}$.
Algorithm 1 Algorithm for the minimum total social cost.
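The steps of Algorithm 1 can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `min_total_cost_winner`, the nested-dictionary layout of the distances between alternatives, and the tie-breaking rule (first minimizer in iteration order) are choices made here and are not specified by the paper.

```python
from collections import Counter

def min_total_cost_winner(top_choices, ell):
    """Pick the alternative minimizing total distance to the projected agents.

    top_choices: list with each agent's top-choice alternative.
    ell: ell[x][y] is the known distance between alternatives x and y.
    """
    # Each agent is projected onto its top choice, so only the number of
    # agents projected onto each alternative matters.
    weight = Counter(top_choices)

    def projected_cost(x):
        return sum(count * ell[t][x] for t, count in weight.items())

    return min(ell, key=projected_cost)

# Hypothetical instance: alternatives on a line at positions 0, 1 and 3.
pos = {"X": 0, "Y": 1, "Z": 3}
ell = {a: {b: abs(pos[a] - pos[b]) for b in pos} for a in pos}
winner = min_total_cost_winner(["X", "X", "Y", "Z", "Z"], ell)
```

In this instance the projected costs are 7 for X, 6 for Y, and 8 for Z, so Y is returned. Only the top choices and the alternative distances are used, matching the observation that full rankings are not needed for this objective.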
Theorem 3.2.

The distortion of Algorithm 1 for minimum total social cost on the agent set $A$ is at most 3.

Proof.

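Let $W$ denote the winning alternative. $W$ has the minimum social cost on the agent set $\tilde{A}$, so for any alternative $Y \in F$, it must be that

$$\sum_{\tilde{i} \in \tilde{A}} d(\tilde{i}, W) \le \sum_{\tilde{i} \in \tilde{A}} d(\tilde{i}, Y). \tag{1}$$

Let $X$ denote the true optimal alternative for $A$. We want to get $SC(W, d) \le 3\, SC(X, d)$ by upper bounding the cost incurred by $W$ compared to $X$:

$$\sum_{i \in A} d(i, W) \le \sum_{i \in A} \big( d(i, \tilde{i}) + d(\tilde{i}, W) \big) = \sum_{i \in A} d(i, \tilde{i}) + \sum_{\tilde{i} \in \tilde{A}} d(\tilde{i}, W). \tag{2}$$

The inequality is due to the triangle inequality since $d$ is a metric, as shown in Figure 1 (b). For every $i \in A$, we know that $\tilde{i}$ is located at $i$'s top choice alternative, so the distance between $i$ and $\tilde{i}$ must be less than (or equal to) the distance between $i$ and any alternative; thus $d(i, \tilde{i}) \le d(i, X)$. Summing up for all $i \in A$, we get that $\sum_{i \in A} d(i, \tilde{i}) \le \sum_{i \in A} d(i, X)$. For any agent $i$ such that $X$ is not $i$'s top choice, suppose alternative $T_i$ is $i$'s top choice; then $\tilde{i}$ has the same location as $T_i$ and $d(\tilde{i}, X) = d(T_i, X)$. By Lemma 3.1, $d(T_i, X) \le 2\, d(i, X)$, thus $d(\tilde{i}, X) \le 2\, d(i, X)$. For all $i$ such that $X$ is $i$'s top choice, $d(\tilde{i}, X) = 0$, so the inequality $d(\tilde{i}, X) \le 2\, d(i, X)$ holds for all $i \in A$. Together with inequalities (1) and (2),

$$SC(W, d) \le \sum_{i \in A} d(i, \tilde{i}) + \sum_{\tilde{i} \in \tilde{A}} d(\tilde{i}, W) \le \sum_{i \in A} d(i, X) + \sum_{\tilde{i} \in \tilde{A}} d(\tilde{i}, X) \le \sum_{i \in A} d(i, X) + 2 \sum_{i \in A} d(i, X) = 3\, SC(X, d). \qquad ∎$$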

3.2 Distortion of Median Social Cost

In this section, we study the median objective function, and provide a deterministic mechanism that gives a distortion of at most 3. Recall that we define the median social cost of an alternative $X$ as $med(X, d) = \mathrm{med}_{i \in A}\, d(i, X)$. We will refer to this as $med(X)$ when the agent set and the metric $d$ are fixed. If $n$ is even, we define the median to be the $(n/2 + 1)$-th smallest value of the distances. Note that no deterministic mechanism which only knows ordinal preferences can have worst-case distortion better than 5 (Theorem 14 in [3]). With known distances between facilities, we are able to provide a natural social choice function with distortion of 3, which is also provably the best possible distortion in our setting (consider the example in Theorem 3 from [3] again). Moreover, our social choice function only uses ordinal information about the alternatives, and not the full distances $\ell$; in particular, as long as we have ordinal preferences of each alternative for each other alternative (and thus a total order of the distances from each alternative to the others), our mechanism will work properly. Such ordinal information may be easier to obtain than full distances $\ell$; for example, candidates can rank all the other candidates. In particular, given agents with ordinal preferences such that the candidates are a subset of the agents, our mechanism will always form an outcome with small distortion, even if we do not know the distances $\ell$.

Note that using only agents’ top choices over alternatives and the distances between alternatives, as Algorithm 1 does for the total social cost objective, is not enough to give a worst-case distortion of 3 for the median objective. Consider the following example: there are 4 alternatives , the distances between them are: and . Suppose is agents 1, 2’s top choice, is agent 3, 4’s top choice, is agent 5, 6’s top choice, and is agent 7, 8’s top choice. This graph is symmetric, so we choose an arbitrary alternative as the winner. Suppose we choose as the winner, and the distances between agents and facilities are: the distances from agents 1, 2 to are both 100, the distances from agents 1, 2 to are all 102. The distances from agents 5, 6 to are all 1, and the distances from agents 5, 6 to are all 3. The distances from agents 7, 8 to are all 1, and the distances from agents 7, 8 to are all 3. The distances from agents 3, 4 to are both 1, the distances from 3, 4 to are all 3, and the distances from 3, 4 to are both 5. In this example, the median is the distance from closest agent to the winning alternative. is the optimal alternative with , while has a distortion of 5.

We will use the following Lemmas from [3] in the proof of our algorithm:

Lemma 3.3.

For any two alternatives $W$ and $X$, we have $med(W) \le med(X) + d(W, X)$. [Lemma 11 in [3]]

Lemma 3.4.

For any two alternatives $W$ and $X$, if $W$ pairwise defeats (or pairwise ties) $X$, then $d(W, X) \le 2\, med(X)$. [Proved in Theorem 16 in [3]]

Lemma 3.5.

Let $W, X$ be alternatives. If $W$ pairwise defeats (or pairwise ties) $X$, then $med(W) \le 3\, med(X)$. [Proved in Theorem 8 in [3]]

The main easy insight which we use in the formation of our algorithm comes from the following lemma.

Lemma 3.6.

For any three alternatives $W$, $X$, and $Y$, if $Y$ pairwise defeats (or pairwise ties) $X$, and $d(W, X) \le d(Y, X)$, then $med(W) \le 3\, med(X)$.

Proof.

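By Lemma 3.3, $med(W) \le med(X) + d(W, X)$. By Lemma 3.4, $d(Y, X) \le 2\, med(X)$. And we know that $d(W, X) \le d(Y, X)$, thus $med(W) \le med(X) + d(Y, X) \le 3\, med(X)$. ∎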

We use a natural Condorcet-consistent algorithm to approximate the minimum median social cost with the agents’ preference rankings and the ordinal preferences of every alternative over other alternatives. First, create the majority graph $G = (F, E)$, i.e., a graph with alternatives as vertices and an edge $(W, X) \in E$ if $W$ pairwise defeats or pairwise ties $X$. If a Condorcet winner (i.e., an alternative which pairwise defeats all others) exists, then we return it immediately.

Otherwise, we consider each pair of alternatives. By Lemma 3.5, if the edge $(W, X) \in E$, then $med(W) \le 3\, med(X)$. When considering an alternative pair $(W, X)$, if $(W, X) \notin E$ and we know that there exists another alternative $Y$ which meets the conditions of Lemma 3.6, then we add an edge $(W, X)$ to $E$. It is not difficult to see that whenever the edge $(W, X)$ is in our graph, this means that $med(W) \le 3\, med(X)$. As we prove below, at the end of this process there always exists at least one alternative which has edges to all the other alternatives, and thus the distortion obtained from selecting it is at most 3, no matter which alternative is the true optimal one.

Note that from the ordinal preferences of alternatives over each other, we can get a partial order of distances between the alternatives. Denote this partial order as $\preceq$, i.e., we say that $\ell(X, Y) \preceq \ell(X, Z)$ if we know that $X$ prefers $Y$ to $Z$ (we do not have information about strict preference). This is the information we have on hand: we only know the partial order of distances between pairs of alternatives which share an alternative in common. Note, however, that if there exists a cycle in this partial order, i.e., a chain of relations $\ell(X_1, Y_1) \preceq \ell(X_2, Y_2) \preceq \cdots \preceq \ell(X_1, Y_1)$, then this implies that all the distances in the cycle are actually equal, and thus we can also add the reverse relations $\ell(X_2, Y_2) \preceq \ell(X_1, Y_1)$, and so on. Such cycles are easy to detect (e.g., by forming a graph with a node for every alternative pair and then searching for cycles), and thus we can assume that whenever a cycle exists in our partial order, then for every pair of distances $\ell(X, Y)$ and $\ell(X', Y')$ in the cycle, we have both $\ell(X, Y) \preceq \ell(X', Y')$ and $\ell(X', Y') \preceq \ell(X, Y)$.
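The cycle handling described above can be sketched as follows. This is an illustrative implementation under assumptions made here: each alternative's ranking of the others is given nearest first, unordered alternative pairs are represented as `frozenset` objects, and a naive transitive closure is used in place of a proper strongly connected components algorithm, which is adequate only for small instances.

```python
from itertools import combinations

def known_relations(alt_rankings):
    """alt_rankings[x] lists the other alternatives from nearest to farthest.

    Returns pairs (p, q) meaning "distance p is at most distance q",
    where p and q are unordered alternative pairs sharing x.
    """
    rel = set()
    for x, ranking in alt_rankings.items():
        for y, z in combinations(ranking, 2):  # y is listed before z
            rel.add((frozenset((x, y)), frozenset((x, z))))
    return rel

def close_cycles(rel):
    """Add the reverse relation for any two distances lying on a common cycle."""
    nodes = {p for pair in rel for p in pair}
    reach = {p: {b for a, b in rel if a == p} for p in nodes}
    changed = True
    while changed:  # naive transitive closure
        changed = False
        for p in nodes:
            extra = set().union(*(reach[q] for q in reach[p])) - reach[p]
            if extra:
                reach[p] |= extra
                changed = True
    # Mutual reachability means the two distances are equal.
    return rel | {(q, p) for p in nodes for q in reach[p]
                  if p in reach[q] and p != q}

# Hypothetical rankings that force all three distances to be equal.
cyclic = {"X": ["Y", "Z"], "Z": ["X", "Y"], "Y": ["Z", "X"]}
closed = close_cycles(known_relations(cyclic))
```

In the cyclic example the three relations chain into a loop, so the closure contains both directions for every pair of distances. When the rankings contain no such loop, the function returns the relations unchanged.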

Input : Agents $\mathcal{A}$,
Alternatives $\mathcal{F}$,
The majority graph $G = (\mathcal{F}, E)$,
Ordinal preferences of each alternative over other alternatives,
Partial order of distances between alternatives.
Output :  The winning alternative $W$.
If there is a Condorcet winner $W$, return $W$ as the winner.
forall alternative pairs $(W, Y)$ do
       if $(W, Y) \notin E$ or $(Y, W) \notin E$ then
             WLOG, suppose $(Y, W)$ exists, but $(W, Y)$ does not exist.
             if there exists an alternative $Z$, such that we have $d(W, Y) \le d(Y, Z)$ in our partial order information, and $Z$ pairwise defeats (or ties) $Y$ then
                   Add edge $(W, Y)$ to $E$;
                   continue;
             end if
       end if
end forall
There must exist an alternative $W$ such that $\forall Y \ne W$, $(W, Y) \in E$. Return $W$ as the winner.
Algorithm 2 Algorithm for the minimum median social cost.
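A minimal Python sketch of Algorithm 2 follows. It is illustrative only and rests on several assumptions: the function name `algorithm2_sketch` and its argument layout are hypothetical, the majority graph is passed as a set of ordered pairs, and the witness condition is read as "$Z$ pairwise defeats (or ties) $Y$ and $d(W, Y) \le d(Y, Z)$ is known", which is the reading that makes the triangle-inequality arguments in the proofs go through. The set of known distance relations is assumed to be already closed under the cycle rule described earlier.

```python
from itertools import permutations


def algorithm2_sketch(alternatives, majority_edges, leq):
    """Return an alternative with edges to all others, or None.

    alternatives: list of hashable alternative names.
    majority_edges: set of (W, Y) meaning W pairwise defeats or ties Y.
    leq: set of (a, b) meaning "distance a <= distance b" is known, where
         a and b are frozensets naming unordered pairs of alternatives.
    """
    # A Condorcet winner is defeated or tied by nobody.
    for w in alternatives:
        if all((y, w) not in majority_edges for y in alternatives if y != w):
            return w

    edges = set(majority_edges)
    for w, y in permutations(alternatives, 2):
        if (w, y) in edges:
            continue
        # Look for a witness Z: Z defeats or ties Y, and d(W,Y) <= d(Y,Z).
        for z in alternatives:
            if z in (w, y):
                continue
            known = (frozenset((w, y)), frozenset((y, z))) in leq
            if known and (z, y) in majority_edges:
                edges.add((w, y))
                break

    for w in alternatives:
        if all((w, y) in edges for y in alternatives if y != w):
            return w
    return None


# Hypothetical instance with a majority cycle W > Y > Z > W.
majority = {("W", "Y"), ("Y", "Z"), ("Z", "W")}
wy, yz, wz = frozenset("WY"), frozenset("YZ"), frozenset("WZ")
known = {(wy, wz), (wy, yz), (yz, wz)}  # d(W,Y) <= d(Y,Z) <= d(W,Z)
winner = algorithm2_sketch(["W", "Y", "Z"], majority, known)  # "Y"
```

In this instance only the edge `("Y", "W")` is added: `Z` defeats `W`, and `W` is known to be at least as close to `Y` as to `Z`. The witness test deliberately consults the original majority edges rather than the growing edge set, since the lemmas speak of pairwise defeats, not of added edges.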
Lemma 3.7.

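Consider the modified majority graph $G = (\mathcal{F}, E)$ at any point during Algorithm 2. For any edge $(W, Y) \in E$, we have that $\mathrm{med}(W) \le 3 \cdot \mathrm{med}(Y)$.

Proof.

By Lemma 3.5, for any edge $(W, Y)$ in the original majority graph, $\mathrm{med}(W) \le 3 \cdot \mathrm{med}(Y)$.

Now consider an edge $(W, Y)$ added to $E$ when processing the alternative pair $(W, Y)$. It must be the case that there exists an alternative $Z$ such that $d(W, Y) \le d(Y, Z)$ and $Z$ pairwise defeats (or ties) $Y$. By Lemma 3.6, $\mathrm{med}(W) \le 3 \cdot \mathrm{med}(Y)$. ∎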

Lemma 3.8.

At the end of Algorithm 2, there must exist an alternative $W$ such that $\forall Y \ne W$, $(W, Y) \in E$.

Proof.

We prove this lemma by contradiction. Suppose no such alternative exists. Then for each alternative $W$, there is at least one alternative $Y$ such that $(Y, W) \in E$ and $(W, Y) \notin E$. This is because we start with the majority graph, so at least one edge always exists between every pair. We create another directed graph $G' = (\mathcal{F}, E')$, with $E'$ being all the edges $(Y, W) \in E$ such that $(W, Y) \notin E$. Thus any pair of alternatives in $G'$ has at most one direction of edge between them, and by our assumption each alternative has at least one incoming edge in $G'$. Since the in-degree of each node is at least 1 in $G'$, there must be at least one cycle in $G'$. To see this, one can for example take an edge $(Y, W)$ coming into some alternative $W$, then an edge coming into $Y$, and proceed in this way until a cycle is formed. Note that every edge in $G'$ must be in the original majority graph, because if we add an edge when processing a pair of alternatives in our algorithm, that pair then has edges in both directions.

Consider a cycle in $G'$ formed by edges $(W_1, W_2)$, $(W_2, W_3)$, …, $(W_{k-1}, W_k)$, $(W_k, W_1)$. When processing the alternative pair $(W_1, W_2)$ in Algorithm 2, we did not add the edge $(W_2, W_1)$ to $E$, so it must be the case that no alternative $Z$ exists such that $d(W_1, W_2) \le d(W_1, Z)$ and $Z$ pairwise defeats (or ties) $W_1$. But we know that $W_k$ pairwise defeats (or ties) $W_1$, because the edge $(W_k, W_1)$ is in the original majority graph. Then the only possibility is that we do not know whether $d(W_1, W_2) \le d(W_1, W_k)$, i.e., either $d(W_1, W_2)$ and $d(W_1, W_k)$ are incomparable in our partial order, or we only know that $d(W_1, W_k) \le d(W_1, W_2)$. They cannot be incomparable, since we have the ordinal preferences of $W_1$ for $W_2$ and $W_k$; thus our partial order must state that $d(W_1, W_k) \le d(W_1, W_2)$, i.e., $W_1$ prefers $W_k$ to $W_2$. By the same reasoning, we also get that $W_2$ prefers $W_1$ to $W_3$, and more generally that $W_i$ prefers $W_{i-1}$ to $W_{i+1}$ for all $i$, where $W_0 = W_k$ and $W_{k+1} = W_1$ since it is a cycle. This means that in our partial order, we have that $d(W_k, W_1) \le d(W_1, W_2) \le d(W_2, W_3) \le \dots \le d(W_{k-1}, W_k) \le d(W_k, W_1)$. Recall, however, that this means we know $d(W_k, W_1) = d(W_1, W_2) = \dots = d(W_{k-1}, W_k)$, and before running Algorithm 2, we detect cycles in the partial order of alternative distances and add the equality information to the partial order. This means that whenever $d(W_k, W_1) \le d(W_1, W_2)$ exists in our partial order, we also have $d(W_1, W_2) \le d(W_k, W_1)$ in the partial order. But this gives us a contradiction, since having $d(W_1, W_2) \le d(W_k, W_1)$ in the partial order, combined with the fact that $W_k$ pairwise defeats (or ties) $W_1$, would cause us to add the edge $(W_2, W_1)$ in our algorithm, which contradicts the statement that only the edge $(W_1, W_2)$ is in the final graph produced by the algorithm, but not $(W_2, W_1)$. Thus there must exist at least one alternative with edges from it to all the others. ∎

Theorem 3.9.

The distortion of Algorithm 2 for minimum median social cost is at most 3.

Proof.

If there is a Condorcet winner, by Lemma 3.5, the distortion is at most 3.

Otherwise, by Lemma 3.8, the algorithm always returns a winner. Suppose it returns alternative $W$ as the winner. By Lemma 3.7, $W$ has distortion at most 3 no matter which alternative is the optimal solution. ∎

3.2.1 Generalizing Median: Percentile Distortion

Instead of just considering the median objective, we also consider a more general objective: the -percentile social cost. Let denote the value from the set , that fraction of the values lie below . Thus median is a special case when , . It was shown in [3] Theorem 17 that the worst-case distortion when in that setting (only have agent’s ordinal preferences over alternatives) is unbounded, and the same example shows in our setting is also unbounded. However, we are able to give a distortion of 3 for in this paper, while for the setting in [3], the lower bound for distortion when is 5. The reason is that the ordinal preferences between alternatives are also available in our setting. We will show that Algorithm 2 gives a distortion of at most 3 not only for the median objective, but also for the general -percentile objective, because all the lemmas we used to prove the conclusion for the median objective could be generalized to -percentile.
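To make the percentile objective concrete, here is a small sketch of how such a cost could be computed if the agent-to-alternative distances were available (the ordinal mechanisms in this paper never see them). The function name `percentile_cost` and the indexing convention are assumptions made for illustration: values are sorted in ascending order and the element at 0-based index $\lfloor \alpha n \rfloor$ is returned, clamped to the last element.

```python
import math


def percentile_cost(distances, alpha):
    """alpha-percentile of the agents' distances to a single alternative.

    Assumed convention: sort ascending and take the element at 0-based
    index floor(alpha * n), clamped to the last position.
    """
    ordered = sorted(distances)
    index = min(math.floor(alpha * len(ordered)), len(ordered) - 1)
    return ordered[index]


# Hypothetical distances from five agents to one alternative.
costs = [4.0, 1.0, 3.0, 2.0, 5.0]
median_cost = percentile_cost(costs, 0.5)  # the median case
high_cost = percentile_cost(costs, 0.9)    # close to the egalitarian (max) cost
```

Under this convention the median is the special case `alpha = 0.5`, and increasing `alpha` can only increase the returned value, which is the monotonicity that the generalization from the median objective relies on.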

We use the following lemma from [3] in the proof of our algorithm:

Lemma 3.10.

For any two alternatives and , we have . [Lemma 18 in [3]]

We can generalize Lemma 3.6 to the following lemma; the proof is the same as that of Lemma 3.6, with Lemma 3.10 used in place of Lemma 3.3.

Lemma 3.11.

For any three alternatives , , and , if pairwise defeats (or pairwise ties) , and , then .

Theorem 3.12.

The distortion of Algorithm 2 for the objective social cost with is at most 3.

Proof.

Note that Lemma 3.10 is actually a generalization of Lemma 3.3, and Lemma 3.11 is a generalization of Lemma 3.6. Lemma 3.4 and Lemma 3.5 also generalize to the objective, because when , for any alternative , we know . Then Lemma 3.7 also generalizes to the objective, because it only uses Lemma 3.5 and Lemma 3.6 in the proof. And Lemma 3.8 still holds for the same algorithm. Thus all the lemmas and properties of the median objective used in the proof of Theorem 3.9 could be generalized into the objective, so the conclusion still holds for the objective when . ∎

3.2.2 Algorithm 2 and the Total Social Cost

Although Algorithm 2 is designed for the median objective, it also performs quite well for the sum objective. Interestingly, the distortion of this algorithm for the minimum total social cost is at most 5, which is the same as Copeland (the best known deterministic algorithm with no knowledge of candidate preferences). Thus this algorithm simultaneously gives a distortion of 3 for the median objective (and in fact for all $\alpha$-percentile objectives covered by Theorem 3.12) and a distortion of 5 for the sum objective. In settings where we are not sure which objective to optimize, or where we care both about the total social good and about fairness, this social choice mechanism provides the best of both worlds. The lemmas and proofs for this result are similar to those for Theorem 3.9, as follows.

Lemma 3.13.

Let $W, Y \in \mathcal{F}$ be alternatives. If $W$ pairwise defeats (or pairwise ties) $Y$, then $SC(W) \le 3 \cdot SC(Y)$. [Proved in Theorem 7 in [3]]

Lemma 3.14.

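For any three alternatives $W$, $Y$, and $Z$, if $Z$ pairwise defeats (or pairwise ties) $Y$, and $d(W, Y) \le d(Y, Z)$, then $SC(W) \le 5 \cdot SC(Y)$.

Proof.

For all $i \in \mathcal{A}$, we know $d(i, W) \le d(i, Y) + d(W, Y)$ by the triangle inequality. Summing up for all $i$, we get $SC(W) \le SC(Y) + n \cdot d(W, Y)$, where $n$ is the number of agents.

$Z$ pairwise defeats (or pairwise ties) $Y$, so at least half of the agents prefer $Z$ to $Y$; thus the total social cost of $Y$ is at least the sum of the costs of these agents, each of whom is at distance at least $\frac{1}{2} d(Y, Z)$ from $Y$. By Lemma 3.1, we get $SC(Y) \ge \frac{n}{2} \cdot \frac{d(Y, Z)}{2} = \frac{n}{4} \, d(Y, Z)$. Thus,

$$SC(W) \le SC(Y) + n \cdot d(W, Y) \le SC(Y) + n \cdot d(Y, Z) \le 5 \cdot SC(Y). \qquad ∎$$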

Lemma 3.15.

Consider the modified majority graph $G = (\mathcal{F}, E)$ at any point during Algorithm 2. For any edge $(W, Y) \in E$, we have that $SC(W) \le 5 \cdot SC(Y)$.

Proof.

By Lemma 3.13, for any edge $(W, Y)$ in the original majority graph, $SC(W) \le 3 \cdot SC(Y)$.

Now consider an edge $(W, Y)$ added to $E$ when processing the alternative pair $(W, Y)$. It must be the case that there exists an alternative $Z$ such that $d(W, Y) \le d(Y, Z)$ and $Z$ pairwise defeats (or ties) $Y$. By Lemma 3.14, $SC(W) \le 5 \cdot SC(Y)$. ∎

Theorem 3.16.

The distortion of Algorithm 2 for minimum total social cost is at most 5, and this bound is tight.

Proof.

If there is a Condorcet winner, by Lemma 3.13, the distortion is at most 3. Otherwise, suppose the algorithm returns alternative $W$ as the winner; by Lemma 3.15, $W$ has distortion at most 5 no matter which alternative is the optimal solution.

To see that this bound is tight, consider the following example. There are three facilities , , and . There are agents who prefer to to , agents who prefer to to , and 1 agent who prefers to to . We denote these three sets of agents as , and separately. By the preferences of agents, we know that pairwise defeats , pairwise defeats , and pairwise defeats . The distances between facilities are: , , , where is a very small positive number. is located at the same location as , so , = 2, and . The distances between and the alternatives are: , . has a distance of 1 to all alternatives. Run Algorithm 2 on this example, and consider the alternative pair , . Because pairwise defeats and , we add edge to the graph and make the winner. The total social cost of is . While the optimal solution is to choose as the winner, and get a total social cost of . When is very large and is very small, the distortion in this example approaches 5. ∎

4 Model and Notation: Facility Assignment Problems

The mechanism we used for approximation of total social cost in Theorem 3.2 can be applied to far more general problems. In this section, we describe a set of facility assignment problems that fit in this framework. As before, let be a set of agents, and be a set of facilities, with each agent having a preference ranking over the facilities, and .

As in the social choice model, we assume that there exists an arbitrary unknown metric on the set of agents and facilities. The distances between agents and facilities are unknown, but the ordinal preferences and the distances between facilities are given. Let be the set of metrics consistent with and , as defined previously in Section 2.

Unlike for social choice, our goal is now to choose which facilities to open, and which agents should be assigned to which facilities. Formally, we must choose an assignment , where is the facility that is assigned to. Every must be assigned to one (and only one) facility in ; other than that, there could be arbitrary constraints on the assignment. Here are some examples of constraints which fall into our framework: each facility has a capacity , which is the maximum number of agents that can be assigned to ; at least (or at most) facilities should have agents assigned to them; agents and must be (or must not be) assigned to the same facility, etc. The social choice model is a special case of this one with the constraint that exactly one facility must be opened, and all agents must be assigned to it. Note that the constraints are only on the assignment, and independent of the metric space . An assignment is valid if it satisfies all constraints. Let be the set of all valid assignments.

The cost function of assignments.

The cost of an assignment consists of two parts. The first part is the distance cost between agents and facilities. , let denote the distance between and the facility it is assigned to, i.e., . For a given metric and assignment , let denote the vector of distances between each and , i.e., . Let be a cost function that takes a vector of distances as input. For example, this could simply sum up all the distances, take the maximum distance for an egalitarian objective, etc. To be as general as possible, instead of fixing a specific function we consider the set of distance cost functions that are monotone nondecreasing and subadditive. Formally, is monotonically nondecreasing means that for any vectors and such that componentwise, we have that . Any reasonable cost function should satisfy this property if agents desire to be assigned to closer facilities. being subadditive means that for any vectors and , we have that . While not all functions are subadditive, many important ones are, as they represent the concept of “economies of scale”, a common property of realistic costs.
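The two properties required of the distance cost function can be checked numerically on small examples. The sketch below is illustrative only; the helper name `is_subadditive_on` and the three sample cost functions are hypothetical. It tests subadditivity on random nonnegative vectors for the sum and the maximum, and shows that the sum of squared distances violates it.

```python
import random


def is_subadditive_on(cost, u, v):
    """Check cost(u + v) <= cost(u) + cost(v) for one pair of vectors."""
    combined = [a + b for a, b in zip(u, v)]
    return cost(combined) <= cost(u) + cost(v) + 1e-9  # float tolerance


def total(dists):
    return sum(dists)


def egalitarian(dists):
    return max(dists)


def squared(dists):
    return sum(x * x for x in dists)


rng = random.Random(0)
samples = [
    ([rng.random() for _ in range(4)], [rng.random() for _ in range(4)])
    for _ in range(200)
]
sum_ok = all(is_subadditive_on(total, u, v) for u, v in samples)
max_ok = all(is_subadditive_on(egalitarian, u, v) for u, v in samples)
# (1 + 1)^2 = 4 exceeds 1^2 + 1^2 = 2, so squared cost is not subadditive.
squared_fails = not is_subadditive_on(squared, [1.0], [1.0])
```

Passing a random test is of course not a proof, but for the sum and the maximum the property follows directly from linearity and from $\max_i (u_i + v_i) \le \max_i u_i + \max_i v_i$; the squared-distance counterexample shows why the restriction to subadditive functions is a real one.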

The second part of the assignment cost is the facility cost. Let denote the facility cost for assignment . can be an arbitrary function over the assignments, for example, the opening cost of facilities, the penalty (or reward) for assigning certain agents to the same facility, etc. Our framework includes all such functions, and thus is quite general, as we discuss below. The main components needed for our framework to work is that the function does not depend on the distances, only on , and that the function is subadditive.

The total cost of an assignment is the sum of the distance cost and the facility cost, i.e. . We study algorithms to approximate the minimum cost assignment given only agents’ ordinal preferences over facilities, and the distances between facilities, as described above.

Social Cost Distortion

As for social choice, we use the notion of distortion to measure the quality of an assignment in the worst case, similar to the notation in [11, 28]. For any assignment , we define the distortion of as the ratio between the social cost of and the optimal assignment:

A social choice function on and takes and as input, and returns a valid assignment on and . We say the distortion of on and is the same as the distortion of the assignment returned by . In other words, the distortion of an assignment function on a profile and facility distances is the worst-case ratio between the social cost of , and the social cost of the true optimal assignment, to obtain which we would need the true distances .

Approximation ratio of omniscient algorithms

Consider omniscient algorithms which know the true numerical distances between agents and facilities for the facility assignment problems, in other words, the metric . In some sense, the goal of our work is to determine when algorithms with only limited information can compete with such omniscient algorithms. With the full distances information, we can of course obtain the optimal assignment using brute force, while for our algorithms with limited knowledge this is impossible even given unlimited computational resources. Nevertheless, we are also interested in what is possible to achieve if we restrict ourselves to polynomial time. To differentiate traditional approximation algorithms from algorithms with small distortion, suppose that an omniscient approximation algorithm returns assignment . Then we denote the approximation ratio of a valid assignment as:

Thus we say the approximation ratio of an omniscient algorithm is if for any input of the problem, the assignment returned by has .

4.1 Examples of Facility Assignment Problems

In this section we illustrate that our framework is quite general by giving various important examples which fit into our framework. In the section which follows, we prove a general black-box reduction theorem for our framework, and thus immediately obtain mechanisms with small distortion for all these examples simultaneously.

The total social cost problem we discussed in Section 3.1 is a special case of the facility assignment problem such that the constraint is only one facility (alternative) is chosen, and all agents are assigned to it. For any assignment , the facility cost function , and the distance cost function is the sum of distances from the winning alternative to all agents in the metric . is monotone and additive (thus subadditive). Here are some other examples that fit in our framework:

Minimum weight metric bipartite matching. Given a set of agents and a set of facilities such that . is an undirected complete bipartite graph. The facilities and agents lie in a metric space . The weight of each edge is the distance between and , . The goal is to find a minimum weight perfect matching of the bipartite graph given only ordinal information. This setting has been studied before, and the best distortion bound known is [13] given by RSD for the case when only the ordinal preferences are known. Our results show that if we also know the distances between facilities, then even without knowing the distances between agents and facilities, it is possible to create simple mechanisms with distortion at most 3 (we can show that no better bound is possible for this setting). Thus having a bit more information about the facilities immediately improves the distortion bound by a very large amount. We show this result by using our facility assignment framework above: the constraint here is that each facility has a capacity of 1, thus a valid assignment is a perfect matching of the bipartite graph. For any assignment , the facility cost function is , and the distance cost function is the total edge weight in the assignment. is monotone and additive (thus subadditive).

Egalitarian bipartite matching. With the same bipartite graph as in minimum weight matching problems, the only difference is that the goal of egalitarian bipartite matching is to find a perfect matching such that maximum edge weight (instead of the total weight) in the matching is minimized [10].

The egalitarian bipartite matching problem is the same as minimum weight bipartite matching except the distance cost function is the maximum edge weight in the assignment. This function is also monotone and subadditive.

Metric Facility Location. In this problem, one is given a set of agents and a set of facilities such that , . The facilities and agents lie in a metric space . Each facility has an opening cost . Each agent is assigned to a facility; in different versions there may be capacities on the number of agents assigned to a facility, lower bounds on the number of agents assigned to a facility, or various other constraints [20]. The goal is to find a subset of facilities to open, so that the sum of opening costs for facilities in and total distance of the assignment is minimized.

Our framework allows arbitrary constraints on what constitutes a valid assignment, which captures facilities with capacities or lower bounds if needed. For any assignment , the facility cost function is the sum of the opening costs for those facilities that have at least one agent assigned to it. The distance cost function is the total distances in the assignment, which is monotone increasing and additive (thus subadditive).
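As an illustration of how the two cost components combine for facility location, the following sketch evaluates one assignment. All names (`facility_location_cost`, `dist`, `opening_cost`) and the numbers are hypothetical, and the true agent-to-facility distances are used only to evaluate the cost, since an ordinal mechanism would not have access to them.

```python
def facility_location_cost(assignment, dist, opening_cost):
    """Total cost = opening costs of used facilities + assignment distances.

    assignment: dict agent -> facility the agent is assigned to.
    dist: dict agent -> dict facility -> distance.
    opening_cost: dict facility -> cost of opening that facility.
    """
    opened = set(assignment.values())  # facilities with at least one agent
    facility_part = sum(opening_cost[f] for f in opened)
    distance_part = sum(dist[i][f] for i, f in assignment.items())
    return facility_part + distance_part


# Hypothetical instance with three agents and three facilities.
dist = {
    1: {"A": 1, "B": 4, "C": 6},
    2: {"A": 2, "B": 3, "C": 6},
    3: {"A": 5, "B": 1, "C": 6},
}
opening_cost = {"A": 2, "B": 5, "C": 1}
split_cost = facility_location_cost({1: "A", 2: "A", 3: "B"}, dist, opening_cost)
single_cost = facility_location_cost({1: "A", 2: "A", 3: "A"}, dist, opening_cost)
```

Here opening both `A` and `B` costs 11 in total, while sending everyone to `A` costs 10, so the cheaper solution opens fewer facilities even though one agent travels further. The facility part depends only on the assignment, not on the distances, which is the property the framework requires.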

$k$-center problem. The goal in this classic problem is to open a set of $k$ facilities, with each agent assigned to the closest one. The optimal solution is the subset $S \subseteq \mathcal{F}$ of size $k$ which minimizes $\max_{i \in \mathcal{A}} \min_{F \in S} d(i, F)$. To express this in our framework, the constraint is that no more than $k$ facilities have agents assigned to them. For any assignment, the facility cost function is $0$, and the distance cost function is the maximum distance between any agent and the facility it is assigned to.

$k$-median problem. This classic problem is the same as $k$-center, except the goal is to minimize the sum of distances of agents to the facilities instead of the maximum distance.

5 Distortion of Facility Assignment Problems

In this section, we study general facility assignment problems, as described in Section 4, and form mechanisms with small distortion. First, we construct a projected problem such that the distances between agents and facilities are known, so it could be solved by an omniscient algorithm. Then we map the result of the projected problem to the original problem and bound the distortion of the original problem.

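Given agents $\mathcal{A}$ and facilities $\mathcal{F}$, for each agent $i \in \mathcal{A}$ suppose facility $F_i$ is $i$’s top choice in $\mathcal{F}$. We create a new agent $\tilde{i}$ at the location of $F_i$ in the metric space. Consequently, $\forall F \in \mathcal{F}$, $d(\tilde{i}, F) = d(F_i, F)$. Denote the set of the new agents as $\tilde{\mathcal{A}}$.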

The original assignment problem is on agents $\mathcal{A}$ and facilities $\mathcal{F}$, and only ordinal preferences of agents in $\mathcal{A}$ over facilities are given. The projected problem is on agents $\tilde{\mathcal{A}}$ and facilities $\mathcal{F}$, and we know the actual distances between agents in $\tilde{\mathcal{A}}$ and facilities in $\mathcal{F}$, since we know the distances between facilities. The constraints, the distance cost function, and the facility cost function remain the same for both the original and the projected problem; the only difference is in the distances. Our main result is that if we have a $\beta$-approximation to the minimum assignment cost on the projected problem, then we can get, in polynomial time, an assignment that has a distortion of at most $1 + 2\beta$ for the original problem.
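The reduction can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function names (`project`, `brute_force_matching`, `ordinal_assignment`) are hypothetical, and an exact brute-force perfect-matching solver stands in for whatever omniscient algorithm one wishes to plug in.

```python
from itertools import permutations


def project(top_choice, facility_dist):
    """Distances from each projected agent to every facility.

    top_choice: dict agent -> the agent's most preferred facility.
    facility_dist: dict facility -> dict facility -> known distance.
    The projected copy of an agent sits exactly at its top choice.
    """
    return {i: dict(facility_dist[f]) for i, f in top_choice.items()}


def brute_force_matching(agent_dist, facilities):
    """Stand-in omniscient solver: exact minimum-weight perfect matching."""
    agents = sorted(agent_dist)
    best = min(
        permutations(facilities),
        key=lambda p: sum(agent_dist[a][f] for a, f in zip(agents, p)),
    )
    return dict(zip(agents, best))


def ordinal_assignment(top_choice, facility_dist, solver):
    """Solve the projected problem and reuse the assignment unchanged."""
    projected = project(top_choice, facility_dist)
    return solver(projected, list(facility_dist))


# Hypothetical instance: three facilities on a line at positions 0, 1, 5.
position = {"F1": 0, "F2": 1, "F3": 5}
facility_dist = {
    f: {g: abs(position[f] - position[g]) for g in position} for f in position
}
top_choice = {"a": "F1", "b": "F3", "c": "F1"}
result = ordinal_assignment(top_choice, facility_dist, brute_force_matching)
```

Only the agents' top choices and the facility distances are used, never the agent-to-facility distances. In the example, agent `b` keeps its top choice `F3`, while `a` and `c` share `F1` and the nearby `F2`. Because the stand-in solver is exact on the projected problem (so $\beta = 1$), the $1 + 2\beta$ guarantee stated above evaluates to 3 in this case.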

Theorem 5.1.

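Given a valid assignment $x$ for the projected problem on $\tilde{\mathcal{A}}$ and $\mathcal{F}$ with approximation ratio at most $\beta$, the assignment $x$ has distortion of at most $1 + 2\beta$ for the original assignment problem on $\mathcal{A}$ and $\mathcal{F}$.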

Proof.

First, is a valid assignment for the projected problem on and , so must also be a valid assignment for the original problem on and . This is because the constraints are only on the assignment, and are independent of the metric space . For the same reason, the facility cost of equals the facility cost of , .

Now consider the distance cost of . Let denote the optimal assignment for the original problem. , let , , . Similarly, let , .

For any agent and facility , by triangle inequality,

Because is monotonically nondecreasing and subadditive,

Therefore, the cost of our assignment is bounded as follows:

Because is located at ’s top choice facility, and is a facility, we thus know that