Blind, Greedy, and Random: Ordinal Approximation Algorithms for Matching and Clustering

12/17/2015 ∙ by Elliot Anshelevich, et al. ∙ 0

We study Matching and other related problems in a partial information setting where the agents' utilities for being matched to other agents are hidden and the mechanism only has access to ordinal preference information. Our model is motivated by the fact that in many settings, agents cannot express the numerical values of their utility for different outcomes, but are still able to rank the outcomes in their order of preference. Specifically, we study problems where the ground truth exists in the form of a weighted graph, and look to design algorithms that approximate the true optimum matching using only the preference orderings for each agent (induced by the hidden weights) as input. If no restrictions are placed on the weights, then one cannot hope to do better than the simple greedy algorithm, which yields a half-optimal matching. Perhaps surprisingly, we show that by imposing a little structure on the weights, we can improve upon the trivial algorithm significantly: we design a 1.6-approximation algorithm for instances where the hidden weights obey the metric inequality. Using our algorithms for matching as a black box, we also design new approximation algorithms for other closely related problems: these include a 3.2-approximation for the problem of clustering agents into equal-sized partitions, a 4-approximation algorithm for Densest k-subgraph, and a 2.14-approximation algorithm for Max TSP. These results are the first non-trivial ordinal approximation algorithms for such problems, and indicate that we can design robust algorithms even when we are agnostic to the precise agent utilities.

1 Introduction

Consider the Maximum Weighted Matching (MWM) problem, where the input is an undirected complete graph $G = (\mathcal{N}, E)$ and the weight $w(x, y)$ of an edge represents the utility of matching agent $x$ with agent $y$. The objective is to form a matching (collection of disjoint edges) that maximizes the total utility of the agents. The problem of matching agents and/or items is at the heart of a variety of diverse applications and it is no surprise that this problem and its variants have received extensive consideration in the algorithmic literature [26]. Perhaps more importantly, maximum weighted matching is one of the few non-trivial combinatorial optimization problems that can be solved optimally in polynomial time [14]. In comparison, we study the MWM problem in a partial information setting where the lack of precise knowledge regarding agents’ utilities acts as a barrier against computing optimal matchings, efficiently or otherwise.

More generally, in this work, we also look at other graph optimization problems such as clustering in a similar partial information setting, where optimal computation is already ruled out by the NP-Hardness of the problem (even in the full information case). This includes the problem of clustering agents to maximize the total weight of edges inside each cluster (Max $k$-sum), Densest $k$-subgraph, and the max traveling salesman problem. Furthermore, for the majority of this work, we assume that the edge weights obey the triangle inequality, since in many important applications it is natural to expect that the weights have some geometric structure. Such structure occurs, for instance, when the agents are points in a metric space and the weight of an edge is the distance between the two endpoints.

Partial Information - Ordinal Preferences

A crucial question in algorithm and mechanism design is: “How much information about the agent utilities does the algorithm designer possess?” The starting point for the rest of our paper is the observation that in many natural settings, it is unreasonable to expect the mechanism to know the exact weights of the edges in $G$ [7, 30]. For example, when pairing up students for a class project, it may be difficult to precisely quantify the synergy level for every pair of students; ordinal questions such as ‘who is better suited to partner with student $x$: $y$ or $z$?’ may be easier to answer. Such a situation would also arise when the graph represents a social network of agents, as the agents themselves may not be able to express ‘exactly how much each friendship is worth’, but would likely be able to form an ordering of their friends from best to worst. This phenomenon has also been observed in social choice settings, in which it is much easier to obtain ordinal preferences instead of true agent utilities [2, 30].

Motivated by this, we consider a model where for every agent $x \in \mathcal{N}$, we only have access to a preference ordering among the other agents in $\mathcal{N} - \{x\}$ so that if $w(x, y) > w(x, z)$, then $x$ prefers $y$ to $z$. The common approach in Learning Theory while dealing with such ordinal settings is to estimate the ‘true ground state’ based on some probabilistic assumptions on the underlying utilities [29, 31]. In this paper we take a different approach, and instead focus on the more demanding objective of designing robust algorithms, i.e., algorithms that provide good performance guarantees no matter what the underlying weights are.

Despite the large body of literature on computing matchings in settings with preference orderings, there has been much less work on quantifying the quality of these matchings. As is common in much of social choice theory, most papers (implicitly) assume that the underlying utilities cannot be measured or do not even exist, and hence there is no clear way to define the quality of a matching [1, 4, 19]. In such papers, the focus therefore is on computing matchings that satisfy normative properties such as stability or optimize a measure of efficiency that depends only on the preference orders, e.g., average rank. On the other hand, the literature on approximation algorithms usually follows the utilitarian approach [20] of assigning a numerical quality to every solution; the presence of input weights is taken for granted. Our work combines the best of both worlds: we do not assume the availability of numerical information (only its latent existence), and yet our approximation algorithms must compete with algorithms that know the true input weights.

Model and Problem Statements

For all of the problems studied in this paper, we are given as input a set $\mathcal{N}$ of points or agents with $|\mathcal{N}| = N$, and for each $x \in \mathcal{N}$, a strict preference ordering $P_x$ over the agents in $\mathcal{N} - \{x\}$. We assume that the input preference orderings are derived from a set of underlying hidden edge weights ($w(x, y)$ for edge $(x, y)$), which satisfy the triangle inequality, i.e., for $x, y, z \in \mathcal{N}$, $w(x, y) \le w(x, z) + w(z, y)$. These weights are considered to represent the ground truth, which is not known to the algorithm. We say that the preferences $P$ are induced by weights $w$ if for all $x, y, z \in \mathcal{N}$, if $x$ prefers $y$ to $z$, then $w(x, y) \ge w(x, z)$. Our framework captures a number of well-motivated settings (for matching and clustering problems); we highlight two of them below.
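
As a minimal sketch of this model, assume that the agents are points on a line and that the hidden weight of an edge is the distance between its endpoints. The instance and the helper names `induced_preferences` and `is_metric` are illustrative assumptions, not part of the paper. The snippet derives the ordinal input from the hidden weights and checks the triangle inequality.

```python
from itertools import permutations

def induced_preferences(w, nodes):
    """Return {x: other nodes sorted from most to least preferred}.

    Ties (none occur in the instance below) are broken by node label so that
    every ordering is strict.
    """
    return {
        x: sorted((y for y in nodes if y != x), key=lambda y: (-w(x, y), y))
        for x in nodes
    }

def is_metric(w, nodes):
    """Check w(x, y) <= w(x, z) + w(z, y) for all distinct x, y, z."""
    return all(w(x, y) <= w(x, z) + w(z, y) for x, y, z in permutations(nodes, 3))

# Hypothetical instance: four agents at positions 0, 1, 3, 7 on a line.
positions = {0: 0, 1: 1, 2: 3, 3: 7}
nodes = list(positions)
w = lambda x, y: abs(positions[x] - positions[y])  # hidden from the algorithm

prefs = induced_preferences(w, nodes)  # the only input an ordinal algorithm sees
print(prefs)
```

An ordinal algorithm receives only `prefs`; the function `w` plays the role of the ground truth used to evaluate the solution afterwards.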

  1. Forming Diverse Teams Our setting and objectives align with the research on diversity maximization algorithms, a topic that has gained significant traction, particularly with respect to forming diverse teams that capture distinct perspectives [22, 27]. In these problems, each agent corresponds to a point in a metric space: this point represents the agent’s beliefs, skills, or opinions. Given this background, our matching problem essentially reduces to selecting diverse teams (of size two) based on different diversity goals, since points that are far apart ($w(x, y)$ is large) contribute more to the objective. For instance, one can imagine a teacher pairing up her students who possess differing skill sets or opinions for a class project, which is captured by the maximum weighted matching problem. In Section 4, we tackle the problem of forming diverse teams of arbitrary sizes by extending our model to encompass clustering and team formation.

  2. Friendship Networks In structural balance theory [10], the statement that a friend of a friend is my friend is folklore; this phenomenon is also exhibited by many real-life social networks [18]. More generally, we can say that a graph with continuous weights has this property if , for some suitably large . Friendship networks bear a close relationship to our model; in particular every graph that satisfies the friendship property for must have metric weights, and thus falls within our framework.

In this paper our main goal is to form ordinal approximation algorithms for weighted matching problems, which we later extend towards other problems. An algorithm $A$ is said to be ordinal if it only takes preference orderings as input (and not the hidden numerical weights $w$). It is an $\alpha$-approximation algorithm if for all possible weights $w$, and the corresponding induced preferences $P$, we have that $\frac{OPT(w)}{A(P)} \le \alpha$. Here $OPT(w)$ is the total value of the maximum weight solution with respect to $w$, and $A(P)$ is the value of the solution returned by the algorithm for preferences $P$. In other words, such algorithms produce solutions which are always within a factor $\alpha$ of the optimum, without actually knowing what the weights are.

In the rest of the paper, we focus primarily on the Maximum Weighted Matching (MWM) problem where the goal is to compute a matching to maximize the total (unknown) weight of the edges inside. A close variant of the MWM problem that we will also study is the Max $k$-Matching (M$k$-M) problem where the objective is to select a maximum weight matching consisting of at most $k$ edges. In addition, we also provide ordinal approximations for the following problems:

Max $k$-Sum

Given an integer $k$, partition the nodes into $k$ disjoint sets $(S_1, \ldots, S_k)$ of equal size in order to maximize $\sum_{i=1}^{k} \sum_{x, y \in S_i} w(x, y)$. (It is assumed that $N$ is divisible by $k$.) When $k = \frac{N}{2}$, Max $k$-sum reduces to the Maximum Weighted Matching problem.

Densest $k$-subgraph

Given an integer $k$, compute a set $S \subseteq \mathcal{N}$ of size $k$ to maximize the weight of the edges inside $S$.

Max TSP

In the maximum traveling salesman problem, the objective is to compute a tour $T$ (cycle that visits each node in $\mathcal{N}$ exactly once) to maximize $\sum_{(x, y) \in T} w(x, y)$.
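
To make the three objectives concrete, the following sketch evaluates each of them for a given solution. The function names and the four-point instance (agents on a line, weights equal to distances) are assumptions made for illustration only.

```python
from itertools import combinations

def inside_weight(w, group):
    """Total weight of all edges with both endpoints in `group`."""
    return sum(w(x, y) for x, y in combinations(group, 2))

def max_k_sum_value(w, clusters):
    """Max k-sum objective: total weight of the edges inside each cluster."""
    return sum(inside_weight(w, c) for c in clusters)

def densest_subgraph_value(w, subset):
    """Densest k-subgraph objective for a chosen subset of k nodes."""
    return inside_weight(w, subset)

def tour_value(w, tour):
    """Max TSP objective: weight of the cycle that visits `tour` in order."""
    return sum(w(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour)))

# Hypothetical instance: agents at positions 0, 1, 3, 7 on a line.
positions = {0: 0, 1: 1, 2: 3, 3: 7}
w = lambda x, y: abs(positions[x] - positions[y])

print(max_k_sum_value(w, [[0, 3], [1, 2]]))   # two clusters of size two, i.e. a matching
print(densest_subgraph_value(w, [0, 2, 3]))
print(tour_value(w, [0, 2, 1, 3]))
```

With clusters of size two, `max_k_sum_value` is exactly the weight of a matching, which mirrors the reduction to Maximum Weighted Matching noted above.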

Challenges and Techniques

We describe the challenges involved in designing ordinal algorithms through the lens of the Maximum Weighted Matching problem. First, different sets of edge weights may give rise to the same preference ordering and moreover, for each of these weights, the optimum matching can be different. Therefore, unlike for the full information setting, no algorithm (deterministic or randomized) can compute the optimum matching using only ordinal information. More generally, the restriction that only ordinal information is available precludes almost all of the well-known algorithms for computing a matching. So, what kind of algorithms use only preference orderings? One algorithm which can still be implemented is a version of the extremely popular greedy matching algorithm, in which we successively select pairs of agents who choose each other as their top choice. Another trivial algorithm is to choose a matching at random: this certainly does not require any numerical information! It is not difficult to show that both these algorithms actually provide an ordinal 2-approximation for the maximum weight matching. The main result of this paper, however, is that by interleaving these basic greedy and random techniques in non-trivial ways, it is actually possible to do much better, and obtain a 1.6-approximation algorithm. Moreover, these techniques can further be extended and tailored to give ordinal approximation algorithms for much more general problems.
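
The first challenge can be illustrated on four agents. The two weight sets below are hypothetical (they are not taken from the paper); every weight lies between 50 and 100, so the triangle inequality holds automatically. Both induce identical preference orderings, yet their optimum matchings differ.

```python
PAIRS = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

def make_weights(w01, w02, w03, w12, w13, w23):
    """Build a symmetric weight table keyed by unordered pairs."""
    return {frozenset(p): v for p, v in zip(PAIRS, (w01, w02, w03, w12, w13, w23))}

def induced_preferences(weights, n=4):
    """Strict preference list of each node, best partner first."""
    return [
        sorted((y for y in range(n) if y != x), key=lambda y: -weights[frozenset((x, y))])
        for x in range(n)
    ]

def best_perfect_matching(weights):
    """Brute force over the three perfect matchings on nodes 0..3."""
    matchings = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]
    return max(matchings, key=lambda m: sum(weights[frozenset(e)] for e in m))

weights_a = make_weights(100, 60, 55, 51, 52, 50)
weights_b = make_weights(100, 99, 98, 60, 90, 50)

print(induced_preferences(weights_a) == induced_preferences(weights_b))
print(best_perfect_matching(weights_a), best_perfect_matching(weights_b))
```

Since an ordinal algorithm sees the same input in both cases, it must return the same matching (or the same distribution over matchings), and therefore cannot be optimal for both weight sets.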

Our Contributions

Problem Full Info Our Results (Ordinal Bounds)
Deterministic Randomized
Max Weighted Matching  [14]
Max $k$-Matching  [26]
Max $k$-Sum  [16, 21]
Densest $k$-Subgraph  [6, 21]
Max TSP  [25]
Table 1: A comparison of the approximation factors obtained by ordinal approximation algorithms and previous results for the full information metric setting. All of our results for the non-matching problems are obtained by using our algorithms (deterministic and randomized) for matching as a black box to construct solutions for the respective problems.

Our main results are summarized in Table 1. All of the problems that we study have a rich history of algorithms for the full information setting. As seen in the table, our ordinal algorithms provide approximation factors that are close to the best known for the full information versions. In other words, we show that it is possible to find good solutions to such problems even without knowing any of the true weights, using only ordinal preference information instead.

Our central result in this paper is an ordinal 1.6-approximation algorithm for max-weight matching; this is obtained by a careful interleaving of greedy and random matchings. We also present a deterministic 2-approximation algorithm for Max $k$-Matching. Note that Max $k$-Matching for $k = \frac{N}{2}$ is the same as the MWM problem.

We also provide a general way to use matching algorithms as a black box to form ordinal approximation algorithms for other problems: given an ordinal $\alpha$-approximation for max-weight matching, we show how to obtain a $2\alpha$, $2\alpha$, and $\frac{4}{3}\alpha$ approximation for Max $k$-Sum, Densest $k$-Subgraph, and Max-TSP respectively. Plugging in the appropriate values of $\alpha$ for deterministic and randomized algorithms yields the results in Table 1.

In total, our results indicate that for matching and clustering problems with metric preferences, ordinal algorithms perform almost as well as algorithms which know the underlying metric weights.

Techniques: More generally, one of our main contributions is a framework that allows the design of algorithms for problems where the (metric) weights are hidden. Our framework builds on two simple techniques, greedy and random, and establishes an interesting connection between graph density, matchings, and greedy edges. We believe that this framework may be useful for designing ordinal approximation algorithms in the future.

Related Work

Broadly speaking, the cornucopia of algorithms proposed in the matching literature belong to one of two classes: Ordinal algorithms that ignore agent utilities, and focus on (unquantifiable) axiomatic properties such as stability, and Optimization algorithms where the numerical utilities are fully specified. From our perspective, algorithms belonging to the former class, with the exception of Greedy, do not result in good approximations for the hidden optimum, whereas the techniques used in the latter (e.g., [11, 12]) depend heavily on improving cycles and thus, are unsuitable for ordinal settings. A notable exception to the above dichotomy is the class of optimization problems studying ordinal measures of efficiency [1, 9], for example, the average rank of an agent’s partner in the matching. Such settings often involve the definition of ‘new utility functions’ based on given preferences, and thus are fundamentally different from our model where preexisting cardinal utilities give rise to ordinal preferences.

The idea of preference orders induced by metric weights (or a more general utility space) was first considered in the work of Irving et al. [23]. Subsequent work has focused mostly on analyzing the greedy algorithm or on settings where the agent utilities are explicitly known [3, 15]. Most similar to our work is the recent paper by Filos-Ratsikas et al. [17], who prove that for one-sided matchings, no ordinal algorithm can provide an approximation factor better than . In contrast, for two-sided matchings, there is a simple (greedy) 2-approximation algorithm even when the hidden weights do not obey the metric inequality.

As with Matching, all of the problems studied in this paper have received considerable attention in the literature for the full information case with metric weights. In particular, metric Densest Subgraph (also known as Maximum Dispersion or Remote Clique) is quite popular owing to its innumerable applications [6, 5]. The close ties between the optimum solutions for Matching, Max $k$-sum, and Densest $k$-subgraph were first explored by Feo and Khellaf [16], and later by Hassin et al. [21]; our black-box mechanism to transform arbitrary matchings into solutions for other problems can be viewed as a generalization of their results. In addition, we also provide improved algorithms for these problems (see Table 1) that do not depend on matchings; for Max $k$-sum, the bound that we obtain for the ordinal setting is as good as that of the best-known algorithm for the full information setting.

Distortion in Social Choice Our work is similar in motivation to the growing body of research studying settings where the voter preferences are induced by a set of hidden utilities [2, 7, 8, 28, 30]. The voting protocols in these papers are essentially ordinal approximation algorithms, albeit for a very specific problem of selecting the utility-maximizing candidate from a set of alternatives.

Finally, other models of incomplete information have been considered in the Matching literature, most notably Online Algorithms [24] and truthful algorithms (for strategic agents) [13]. Given the strong motivations for preference rankings in settings with agents, it would be interesting to see whether algorithms developed for other partial information models can be extended to our setting.

2 Framework for Ordinal Matching Algorithms

In this section, we present our framework for developing ordinal approximation algorithms and establish tight upper and lower bounds on the performance of algorithms that select matching edges either greedily or uniformly at random. As a simple consequence of this framework, we show that the algorithms that sequentially pick all of the edges greedily or uniformly at random both provide 2-approximations to the maximum weight matching. In the following section, we show how to improve this performance by picking some edges greedily, and some randomly. Finally, we remark that for the sake of convenience and brevity, we will often assume that $N$ is even, and sometimes that it is also divisible by 3. As we discuss in the Appendix, our results still hold if this is not the case, with only minor modifications.

Fundamental Subroutine: Greedy

We begin with Algorithm 1 that describes a simple greedy procedure for outputting a matching: at each stage, the algorithm picks one edge $(x, y)$ such that both $x$ and $y$ prefer this edge to all of the other available edges. We now develop some notation required to analyze this procedure.

  • (Undominated Edges) Given a set $E$ of edges, $(x, y) \in E$ is said to be an undominated edge if for all $(x, a)$ and $(y, b)$ in $E$, $w(x, y) \ge w(x, a)$ and $w(x, y) \ge w(y, b)$.

input : Edge set $E$, preferences $P(\mathcal{N})$, integer $k$
output : Matching $M_G$ with $k$ edges
while $E$ is not empty and $|M_G| < k$ do
       pick an undominated edge $e = (x, y)$ from $E$ and add it to $M_G$;
       remove all edges containing $x$ or $y$ from $E$;
end while
Algorithm 1 Greedy $k$-Matching Algorithm

Given a set $E$, let us use the notation $E_\top$ to denote the set of undominated edges in $E$. Finally, we say that an edge set $E$ is complete if there exists some $S \subseteq \mathcal{N}$ such that $E$ is the complete graph on the nodes in $S$ (minus the self-loops). We make the following two observations regarding undominated edges:

  1. Every edge set $E$ has at least one undominated edge. In particular, any maximum weight edge in $E$ is obviously an undominated edge.

  2. Given an edge set $E$, one can efficiently find at least one undominated edge in $E$ using only the ordinal preference information. A naive algorithm for this is as follows. Consider starting with an arbitrary node $x$. Let $y$ be its first choice out of all the edges in $E$ (i.e., $y$ is $x$’s first choice of all the nodes it has an edge to in $E$). Now consider $y$’s first choice. If it is $x$, then the edge $(x, y)$ must be undominated, as desired. If instead it is some $z \ne x$, then continue this process with $z$. Eventually this process must cycle, giving us a cycle of nodes $x_0, x_1, \ldots, x_{l-1}$ such that $x_{i+1}$ is the top preference of $x_i$, with indices taken modulo $l$. This means that all edges in this cycle have equal weight, even though we do not know what this weight is, since $x_i$ preferring $x_{i+1}$ over $x_{i-1}$ means that $w(x_i, x_{i+1}) \ge w(x_{i-1}, x_i)$. Moreover, the edge weights of all edges in this cycle must be the highest ones incident on the nodes in this cycle, since they are all top preferences of the nodes. Therefore, all edges in this cycle are undominated, as desired.
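
The pointer-following procedure of the second observation, combined with the loop of Algorithm 1, can be sketched as follows. This is an illustrative implementation under the assumption that the edge set is the complete graph on the currently unmatched nodes; the function names and the example preferences (agents at positions 0, 1, 3, 7 on a line) are not from the paper.

```python
def top_choice(prefs, x, available):
    """First node in x's preference list that is still unmatched."""
    return next(y for y in prefs[x] if y in available)

def find_undominated_edge(prefs, available):
    """Follow top-choice pointers until a node repeats. The repeated node lies
    on a cycle of top choices, and every edge of such a cycle is undominated."""
    x = min(available)          # arbitrary but deterministic starting node
    seen = set()
    while x not in seen:
        seen.add(x)
        x = top_choice(prefs, x, available)
    return x, top_choice(prefs, x, available)

def greedy_k_matching(prefs, k):
    """Greedy k-matching on the complete graph over the keys of `prefs`."""
    available = set(prefs)
    matching = []
    while len(available) >= 2 and len(matching) < k:
        x, y = find_undominated_edge(prefs, available)
        matching.append((x, y))
        available -= {x, y}     # drops every edge touching x or y
    return matching

# Preferences induced by distances between positions 0, 1, 3, 7 (hypothetical).
prefs = {0: [3, 2, 1], 1: [3, 2, 0], 2: [3, 0, 1], 3: [0, 1, 2]}
print(greedy_k_matching(prefs, 2))
```

Note that the procedure never reads a weight: it only compares positions in the preference lists, which is what makes it ordinal.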

In general, an edge set may have multiple undominated edges that are not part of a cycle. Our first lemma shows that these different edges are comparable in weight.

Lemma 2.1.

Given a complete edge set $E$, the weight of any undominated edge is at least half as much as the weight of any other edge in $E$, i.e., if $e = (x, y)$ is an undominated edge of $E$, then for any $(a, b) \in E$, we have $w(x, y) \ge \frac{1}{2} w(a, b)$. This is true even if $(a, b)$ is another undominated edge.
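
The statement of Lemma 2.1 can be sanity-checked numerically. The sketch below draws random points in the unit square (an assumption made for illustration; Euclidean distances are metric), lists the undominated edges using the hidden weights, and verifies that each of them weighs at least half of the heaviest edge. The helper names are hypothetical.

```python
import itertools
import math
import random

def undominated_edges(w, nodes):
    """Edges (x, y) at least as heavy as every other edge at x and at y."""
    result = []
    for x, y in itertools.combinations(nodes, 2):
        heaviest_at_x = max(w(x, a) for a in nodes if a != x)
        heaviest_at_y = max(w(y, b) for b in nodes if b != y)
        if w(x, y) >= heaviest_at_x and w(x, y) >= heaviest_at_y:
            result.append((x, y))
    return result

def lemma_holds(num_points, seed):
    """Check 2 * w(e) >= max weight for every undominated edge e."""
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(num_points)]
    nodes = range(num_points)
    w = lambda x, y: math.dist(pts[x], pts[y])   # Euclidean, hence metric
    heaviest = max(w(x, y) for x, y in itertools.combinations(nodes, 2))
    # A tiny tolerance absorbs floating-point rounding.
    return all(2 * w(x, y) >= heaviest - 1e-12 for x, y in undominated_edges(w, nodes))

print(all(lemma_holds(8, seed) for seed in range(20)))
```

Such a check is no substitute for the proof, but it is a quick way to catch a misreading of the definition of an undominated edge.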

Proof.

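Let $e = (x, y)$ be the undominated edge and $(a, b)$ any other edge of $E$. Since $(x, y)$ is an undominated edge, and since $E$ is a complete edge set (so that $(x, a)$ and $(x, b)$ also belong to $E$), this means that $w(x, y) \ge w(x, a)$, and $w(x, y) \ge w(x, b)$. Now, from the triangle inequality, we get

$$w(a, b) \le w(a, x) + w(x, b) \le 2\, w(x, y). \qquad ∎$$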

It is not difficult to see that when $k = \frac{N}{2}$, the output of Algorithm 1 coincides with that of the extremely popular greedy algorithm that picks the maximum weight edge at each iteration, and therefore, our algorithm yields an ordinal 2-approximation for the MWM problem. Our next result shows that the approximation factor holds even for Max $k$-Matching, for any $k$: this is not a trivial result because at any given stage there may be multiple undominated edges and therefore for $k < \frac{N}{2}$, the output of Algorithm 1 no longer necessarily coincides with that of the well-known greedy algorithm. In fact, we show the following much stronger lemma.

Lemma 2.2.

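Given $k = \alpha N$, and $k^* = \alpha^* N$, the performance of the greedy $k$-matching with respect to the optimal $k^*$-matching (i.e., $\frac{OPT(k^*)}{Greedy(k)}$) is given by,

  1. $\frac{OPT(k^*)}{Greedy(k)} \le \max\left(2, \frac{2\alpha^*}{\alpha}\right)$ if $\alpha^* + \alpha \le \frac{1}{2}$

  2. $\frac{OPT(k^*)}{Greedy(k)} \le \max\left(2, \frac{\alpha^* + \frac{1}{2}}{\alpha} - 1\right)$ if $\alpha^* + \alpha > \frac{1}{2}$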

Thus, for example, when $k^* = \frac{N}{2}$ (a perfect matching) and $k = \frac{N}{3}$, we get the factor of 2, i.e., in order to obtain a half-approximation to the optimum perfect matching, it suffices to greedily choose two-thirds as many edges as in the perfect matching.
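
The two-thirds example can be probed empirically on small instances. The sketch below is an illustration under stated assumptions (six random points in the unit square, Euclidean weights, brute-force optimum); it selects $\frac{N}{3} = 2$ edges greedily and compares them with the optimum perfect matching of 3 edges. Picking the heaviest remaining edge is one valid run of Algorithm 1, since a maximum weight edge is always undominated.

```python
import itertools
import math
import random

def optimum_perfect_matching_weight(w, nodes):
    """Brute-force maximum weight perfect matching (tiny, even-sized inputs only)."""
    nodes = list(nodes)
    if not nodes:
        return 0.0
    x, rest = nodes[0], nodes[1:]
    return max(
        w(x, y) + optimum_perfect_matching_weight(w, [z for z in rest if z != y])
        for y in rest
    )

def greedy_weight(w, nodes, k):
    """Weight of k greedily chosen edges (heaviest remaining edge each time)."""
    available, total = set(nodes), 0.0
    for _ in range(k):
        x, y = max(itertools.combinations(sorted(available), 2), key=lambda e: w(*e))
        total += w(x, y)
        available -= {x, y}
    return total

def ratio(seed, n=6):
    """OPT(perfect matching) / Greedy(n/3 edges) on a random instance."""
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    w = lambda x, y: math.dist(pts[x], pts[y])
    return optimum_perfect_matching_weight(w, range(n)) / greedy_weight(w, range(n), n // 3)

print(max(ratio(seed) for seed in range(50)))
```

On every instance the printed ratio should stay at or below 2, in line with the lemma; the experiment only illustrates the bound and does not show that it is tight.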

Proof.

We show the claim via a charging argument where every edge in the optimum $k^*$-matching $M^*$ is charged to one or more edges in the greedy $k$-matching $M$. Specifically, we can imagine that each edge $e \in M$ contains a certain (not necessarily integral) number of slots $s_e$, initialized to zero, that measure the number of edges in $M^*$ charged to $e$. Our proof will proceed in the form of an algorithm: initially $U = M^*$ denotes the set of uncharged edges. In each iteration, we remove some edge from $U$, charge its weight to some edges in $M$ and increase the value of $s_e$ for the corresponding edges so that the following invariant always holds: $\sum_{e^* \in M^* \setminus U} w(e^*) \le \sum_{e \in M} s_e\, w(e)$. Finally, we can bound the performance ratio using the quantity $\max_{e \in M} s_e$.

We describe our charging algorithm in three phases. Before we describe the first phase, consider any edge $e^* = (x, y)$ in $M^*$. The edge must belong to one of the following two types.

  1. (Type I) Some edge(s) containing $x$ or $y$ (or both $x$ and $y$) are present in $M$.

  2. (Type II) No edge in $M$ has $x$ or $y$ as an endpoint.

Suppose that $M^*$ contains $m_1$ Type I edges, and $m_2$ Type II edges. We know that $m_1 + m_2 = k^*$. Also, let $T \subseteq M$ denote the top $\frac{m_1}{2}$ edges in $M$, i.e., the $\frac{m_1}{2}$ edges with the highest weight. In the first charging phase, we cover all the Type I edges using only the edges in $T$, and so that no more than two slots of each edge are required.

Claim 2.3.

(First Phase) There exists a mechanism by which we can charge all Type I edges in $M^*$ to the edges in $T$ so that the invariant holds and for all $e \in T$, $s_e \le 2$.

Proof.

We begin by charging the Type I edges to arbitrary edges in $M$, and then transfer the slots that are outside $T$ to edges in $T$. Consider any Type I edge $e^* = (x, y)$: without loss of generality, suppose that $e$ is the first edge containing either $x$ or $y$ that was added to $M$ by the greedy algorithm. Since the greedy algorithm only adds undominated edges, we can infer that $w(e) \ge w(e^*)$ (or else $e$ would be dominated by $e^*$). Using this idea, we charge the Type I edges as follows.

(Algorithm: Phase I (Charging)) Repeat until $U$ contains no Type I edge: pick a Type I edge $e^* = (x, y)$ from $U$. Suppose that $e$ is the first edge containing either $x$ or $y$ that was added to $M$. Since $w(e^*) \le w(e)$, charge $e^*$ to $e$, i.e., increase $s_e$ by one and remove $e^*$ from $U$.

At the end of the above algorithm, $U$ contains no Type I edge. Moreover, $\sum_{e \in M} s_e = m_1$ since every Type I edge requires only one slot. Finally, for every $e \in M$, $s_e \le 2$. This is because any edge charged to $e = (a, b)$ must contain at least one of $a$ or $b$. Now, without altering the set of uncharged edges $U$, we provide a mechanism to transfer the slots to edges in $T$. The following procedure is based on the observation that for every $e, e'$ such that $e \in M \setminus T$ and $e' \in T$, $w(e') \ge w(e)$.

(Algorithm: Phase I (Slot Transfer)) Repeat until $s_e = 0$ for every edge outside $T$: pick $e \in M \setminus T$ such that $s_e > 0$. Pick any edge $e' \in T$ such that $s_{e'} < 2$. Transfer the edge originally charged to $e$ to $e'$, i.e., decrease $s_e$ by one and increase $s_{e'}$ by one.

Notice that at the end of the above mechanism, $\sum_{e \in T} s_e = m_1$, $s_e \le 2$ for all $e \in T$, and $s_e = 0$ for all $e \in M \setminus T$. ∎

Now, consider any Type II edge $e^* \in M^*$. We make a strong claim: for every $e \in M$, $w(e^*) \le 2 w(e)$. This follows from Lemma 2.1 since at the instant when $e$ was added to $M$, $e$ was an undominated edge in the current edge set and $e^*$ was also present in that edge set. Therefore, each Type II edge can be charged using two (unit) slots from any of the edges in $M$ (or any combination of them). We now describe the second phase of our charging algorithm that charges edges only to edges in $M \setminus T$; recall that there are $k - \frac{m_1}{2}$ such edges.

(Second Phase) Repeat until $s_e = 2$ for all $e \in M \setminus T$ (or) until $U$ is empty: pick any arbitrary edge $e^*$ from $U$ and $e \in M \setminus T$ such that $s_e = 0$. Since $w(e^*) \le 2 w(e)$, charge $e^*$ using two slots of $e$, i.e., increase $s_e$ by two and remove $e^*$ from $U$.

During the second phase, every edge removed from $U$ is charged to exactly (two slots of) one edge in $M \setminus T$. Therefore, the number of edges removed from $U$ during this phase is $\min\left(m_2, k - \frac{m_1}{2}\right)$. Since the number of uncharged edges at the beginning of Phase I was exactly $k^*$, and $m_1$ of them were charged in Phase I, we conclude that the number of uncharged edges at the end of the second phase, i.e., $|U|$, is $\max\left(0, k^* - k - \frac{m_1}{2}\right)$. If $|U| = 0$, we are done, otherwise we can charge the remaining edges in $U$ uniformly to all the edges in $M$ using a fractional number of slots, i.e.,

(Third Phase) Repeat until $U = \emptyset$: pick any arbitrary edge $e^*$ from $U$. Since $w(e^*) \le 2 w(e)$ for all $e \in M$, charge $e^*$ uniformly to all edges in $M$, i.e., increase $s_e$ by $\frac{2}{k}$ for every $e \in M$ and remove $e^*$ from $U$.

Now, in order to complete our analysis, we need to obtain an upper bound for $s_e$ over all edges in $M$. Recall that at the end of phase II, $s_e \le 2$ for all $e \in M$. In the third phase, $s_e$ increased by $\frac{2}{k}$ for every edge in $U$, and the number of edges in $U$ is $k^* - k - \frac{m_1}{2}$. Therefore, at the end of the third phase, we have that for every $e \in M$,

$$s_e \le 2 + \frac{2}{k}\left(k^* - k - \frac{m_1}{2}\right).$$

Since $m_1 = k^* - m_2$, we can simplify the second term above and get

$$s_e \le 2 + \frac{2}{k}\left(\frac{k^* + m_2}{2} - k\right) = \frac{k^* + m_2}{k}. \qquad (1)$$

How large can $m_2$ be? Clearly, $m_2 \le k^*$. But a more careful bound can be obtained using the fact that the Type II edges have no node in common with any of the edges in $M$. But the total number of nodes is $N$, therefore, $2 m_2 + 2k \le N$ or $m_2 \le \frac{N}{2} - k$. This gives us $m_2 \le \min\left(k^*, \frac{N}{2} - k\right)$. Depending on what the minimum is, we get two cases:

  1. Case I: $k^* \le \frac{N}{2} - k$ or equivalently, $k^* + k \le \frac{N}{2}$. Substituting $m_2 \le k^*$ in Equation 1, we get that for all $e \in M$, $s_e \le \frac{2 k^*}{k}$. Replacing $k^*$ by $\alpha^* N$ and $k$ by $\alpha N$, we get that when $\alpha^* + \alpha \le \frac{1}{2}$, $\frac{OPT(k^*)}{Greedy(k)} \le \max\left(2, \frac{2\alpha^*}{\alpha}\right)$.

  2. Case II: $k^* > \frac{N}{2} - k$ or equivalently $\alpha^* + \alpha > \frac{1}{2}$. Substituting $m_2 \le \frac{N}{2} - k$ in Equation 1, we get that $s_e \le \frac{k^* + \frac{N}{2} - k}{k}$ or equivalently $\frac{OPT(k^*)}{Greedy(k)} \le \max\left(2, \frac{\alpha^* + \frac{1}{2}}{\alpha} - 1\right)$. ∎

Plugging in $k^* = k$ in the above lemma immediately gives us the following corollary.

Corollary 2.4.

Algorithm 1 is a deterministic, ordinal 2-approximation algorithm for the Max $k$-Matching problem for all $k$, and therefore a 2-approximation algorithm for the Maximum Weighted Matching problem.

Fundamental Subroutine: Random

An even simpler matching algorithm is to form a matching completely at random; this does not even depend on the input preferences. This is formally described in Algorithm 2. In what follows, we show upper and lower bounds on the performance of Algorithm 2 for different edge sets.

input : Edge set $E$, integer $k$
output : Matching $M_R$ with $k$ edges
while $E$ is not empty and $|M_R| < k$ do
       pick an edge $e = (x, y)$ from $E$ uniformly at random. Add this edge to $M_R$;
       remove all edges containing $x$ or $y$ from $E$;
end while
Algorithm 2 Random $k$-Matching Algorithm
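
A direct transcription of this pseudocode into Python might look as follows. This is a sketch; the function name and the `rng` parameter (any object with a `choice` method, such as a `random.Random` instance) are assumptions made for illustration.

```python
import itertools
import random

def random_k_matching(edges, k, rng=random):
    """Repeatedly pick a uniformly random remaining edge, then drop every
    edge that shares an endpoint with it, until k edges are chosen."""
    remaining = list(edges)
    matching = []
    while remaining and len(matching) < k:
        x, y = rng.choice(remaining)
        matching.append((x, y))
        remaining = [e for e in remaining if x not in e and y not in e]
    return matching

edges = list(itertools.combinations(range(6), 2))   # complete graph on 6 nodes
matching = random_k_matching(edges, 3, random.Random(0))
print(matching)
```

On a complete graph with an even number of nodes, removing a chosen edge leaves a smaller complete graph, so running the loop with $k = \frac{N}{2}$ always ends in a perfect matching.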
Lemma 2.5.

(Lower Bound)

  1. Suppose is a complete graph on the set of nodes with . Then, the expected weight of the random (perfect) matching returned by Algorithm 2 for the input is

  2. Suppose is a complete bipartite graph on the set of nodes with . Then, the weight of the random (perfect) matching returned by Algorithm 2 for the input is

Proof.

We show both parts of the lemma using simple symmetry arguments. For the complete (non-bipartite) graph on $n$ nodes, let $\mathcal{M}$ be the set of all perfect matchings in $E$. Then, we argue that every matching in $\mathcal{M}$ is equally likely to occur. Therefore, the expected weight of the random matching $M_R$ is

$$E[w(M_R)] = \sum_{e \in E} p_e\, w(e), \qquad (3)$$

where $p_e$ is the probability of edge $e$ occurring in the matching. Since the edges are chosen uniformly at random, the probability that a given edge is present in $M_R$ is the same for all edges in $E$. A perfect matching contains $\frac{n}{2}$ of the $\frac{n(n-1)}{2}$ edges, so for all $e \in E$, we have the following bound of $p_e = \frac{1}{n-1} \ge \frac{1}{n}$, which we can substitute in Equation 3 to get the first result.

For the second case, where is the set of edges in a complete bipartite graph, it is not hard to see that once again every edge is present in the final matching with equal probability. Therefore,

Lemma 2.6.

(Upper Bound) Let be a complete subgraph on the set of nodes with , and let be any perfect matching on the larger set . Then, the following is an upper bound on the weight of ,

Proof.

Fix an edge . Then, by the triangle inequality, the following must hold for every node : . Summing this up over all , we get

Once again, repeating the above process over all , and then all we have

Each appears twice in the RHS: once when we consider the edge in containing , and once when we consider the edge with . ∎

We conclude by proving that picking edges uniformly at random yields a 2-approximation for the MWM problem.

Claim 2.7.

Algorithm 2 is an ordinal 2-approximation algorithm for the Maximum Weighted Matching problem.

Proof.

From Lemma 2.5, we know that in expectation, the matching output by the algorithm when the input is has a weight of at least . Substituting in Lemma 2.6 and (max-weight matching) gives us the following upper bound on the weight of , . ∎
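
The symmetry argument behind the random matching can be verified exactly on small complete graphs. The sketch below (the function name is an illustrative assumption) computes, with exact rational arithmetic, the probability that a fixed edge ends up in the matching when edges are drawn uniformly at random as in Algorithm 2 and the process runs until a perfect matching is formed.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import combinations

@lru_cache(maxsize=None)
def inclusion_probability(edge, nodes):
    """Exact probability that `edge` is selected when uniformly random edges
    are drawn from the complete graph on `nodes` (a frozenset) until no
    edge remains."""
    edges = list(combinations(sorted(nodes), 2))
    total = Fraction(0)
    for f in edges:
        if f == edge:
            total += Fraction(1, len(edges))              # edge picked right now
        elif not set(f) & set(edge):
            # A disjoint edge is picked; recurse on the remaining nodes.
            total += Fraction(1, len(edges)) * inclusion_probability(edge, nodes - set(f))
        # Otherwise f shares an endpoint with `edge`, which is then lost.
    return total

print(inclusion_probability((0, 1), frozenset(range(4))))   # 1/3
print(inclusion_probability((0, 1), frozenset(range(6))))   # 1/5
```

For $n = 4$ and $n = 6$ the value is $\frac{1}{n-1}$ for every edge, so the expected weight of the random perfect matching is the total edge weight divided by $n - 1$.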

Lower Bound Example for Ordinal Matchings

Before presenting our algorithms, it is important to understand the limitations of settings with ordinal information. As mentioned in the Introduction, different sets of weights can give rise to the same preference ordering, and therefore, we cannot suitably approximate the optimum solution for every possible weight. We now show that even for very simple instances, there can be no deterministic -approximation algorithm, and no randomized -approximation algorithm.

Claim 2.8.

No deterministic ordinal approximation algorithm can provide an approximation factor better than , and no randomized ordinal approximation algorithm can provide an approximation factor better than for Maximum Weighted Matching. No ordinal algorithm, deterministic or randomized can provide an approximation factor better than for the Max -Matching problem.

Proof.

Consider an instance with nodes having the following preferences: , , , . Since the matching is weakly dominated, it suffices to consider algorithms that randomize between , and , or deterministically chooses one of them.

Now, consider the following two sets of weights, both of which induce the above preferences but whose optima are and respectively: , , and . The best deterministic algorithm always chooses the matching , but for the weights , this is only a -approximation to OPT.

Consider any randomized algorithm that chooses with probability , and with probability . With a little algebra, we can verify that just for , and , the optimum randomized algorithm has , yielding an approximation factor of .

For the Max -Matching problem, our results are tight. For small values of , it is impossible for any ordinal algorithm to provide a better than -approximation factor. To see why, consider an instance with nodes . Every ’s first choice is and vice-versa, the other preferences can be arbitrary. Pick some uniformly at random and set , and all the other weights are equal to . For , it is easy to see that no randomized algorithm can always pick the max-weight edge and therefore, as , we get a lower bound of . ∎

Since Max $k$-sum is a strict generalization of Maximum Weighted Matching, the same lower bounds for Maximum Weighted Matching hold for Max $k$-sum as well.

3 Ordinal Matching Algorithms

Here we present a better ordinal approximation than simply taking the random or greedy matching. The algorithm first performs the greedy subroutine until it matches $\frac{2}{3}$ of the agents. Then it either creates a random matching on the unmatched agents, or it creates a random matching between the unmatched agents and a subset of agents which are already matched. We show that one of these matchings is guaranteed to be close to optimum in weight. Unfortunately, since we have no access to the weights themselves, we cannot simply choose the better of these two matchings, and thus are forced to randomly select one, giving us good performance in expectation. More formally, the algorithm is:

input : 
output : Perfect Matching
Initialize to be the complete graph on , and ;
Let be the output returned by Algorithm 1 for ,;
Let be the set of nodes in not matched in , and let be the complete graph on ;
First Algorithm;
;
Second Algorithm ;
Choose half the edges from uniformly at random and add them to ;
Let be the set of nodes in ;
Let be the edges of the complete bipartite graph ;
Run Algorithm 2 on the set of edges in to obtain a perfect bipartite matching and add the edges returned by the algorithm to ;
Final Output Return with probability and with probability .
Algorithm 3 -Approximation Algorithm for Maximum Weight Matching
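The steps of Algorithm 3 can be sketched in Python as follows. This is only an illustrative sketch under several assumptions that are not spelled out in this section: Algorithm 1 is taken to be the ordinal greedy routine that repeatedly matches two agents who rank each other first among the remaining agents, Algorithm 2 is taken to be a uniformly random perfect bipartite matching, the greedy phase stops after N/3 edges (inferred from the "two-thirds as many edges" remark in the proof), and each of the two matchings is returned with probability one half (inferred from the final averaging step). The function names and the `prefs` dictionary format are hypothetical, and the number of agents is assumed to be divisible by 6 with strict preferences induced by symmetric weights.

```python
import random

def greedy_matching(prefs, num_edges):
    """Assumed form of Algorithm 1: match mutually top-ranked remaining agents."""
    alive, matching = set(prefs), []
    for _ in range(num_edges):
        # Each remaining agent's favourite among the remaining agents.
        top = {x: next(y for y in prefs[x] if y in alive) for x in sorted(alive)}
        # With strict preferences induced by symmetric weights, a mutual pair exists.
        x, y = next((x, y) for x, y in top.items() if top[y] == x)
        matching.append((x, y))
        alive -= {x, y}
    return matching, sorted(alive)

def random_matching(nodes, rng):
    """Uniformly random perfect matching on an even-sized node list."""
    nodes = list(nodes)
    rng.shuffle(nodes)
    return list(zip(nodes[::2], nodes[1::2]))

def random_bipartite_matching(left, right, rng):
    """Assumed form of Algorithm 2: uniformly random perfect bipartite matching."""
    right = list(right)
    rng.shuffle(right)
    return list(zip(left, right))

def algorithm3(prefs, rng):
    n = len(prefs)
    m0, unmatched = greedy_matching(prefs, n // 3)   # greedy on 2/3 of the agents
    # First Algorithm: greedy edges plus a random matching on the rest.
    m1 = m0 + random_matching(unmatched, rng)
    # Second Algorithm: keep a random half of the greedy edges ...
    kept = rng.sample(m0, len(m0) // 2)
    released = [v for e in m0 if e not in kept for v in e]
    # ... and match the released endpoints randomly to the unmatched agents.
    m2 = kept + random_bipartite_matching(released, unmatched, rng)
    # The weights are hidden, so pick one of the two matchings at random.
    return m1 if rng.random() < 0.5 else m2
```

Here `prefs[x]` is agent `x`'s ranking of all other agents, most preferred first. Note that the sketch never reads a weight; it only consults the rankings, which is what makes it ordinal.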
Theorem 3.1.

For every input ranking, Algorithm 3 returns a -approximation to the maximum-weight matching.

Proof.

First, we provide some high-level intuition on why this algorithm results in a significant improvement over the standard half-optimal greedy and randomized approaches. Observe that in order to obtain a half-approximation to , it is sufficient to greedily select edges (substitute , in Lemma 2.2). Choosing all edges greedily would be overkill, and so we choose the remaining edges randomly in the First Algorithm of Algorithm 3. Now, let us denote by , the set of nodes that are matched greedily. The main idea behind the Second Algorithm is that if the first one performs poorly (not much better than half), then all the ‘good edges’ must be going across the cut from to Bottom (). In other words, must be large, and therefore, the randomized algorithm for bipartite graphs should perform well. In summary, since we randomize between the first and second algorithms, we are guaranteed that at least one of them performs well on any given instance.

We now prove the theorem formally. By linearity of expectation, Now, look at the first algorithm, since has two-thirds as many edges as the optimum matching, we get from Lemma 2.2 that . As mentioned in the algorithm, is the set of nodes that are not present in ; since we randomly match the nodes in to other nodes in , the expected weight of the random algorithm (from Lemma 2.5 with ) is Therefore, we get the following lower bound on the weight of ,

Next, look at the second algorithm: half the edges from are added to . constitutes the set of nodes from that are not present in , these nodes are randomly matched to those in . Let denote the matching going ‘across the cut’ from to . Since the set is chosen randomly from the nodes in , the expected weight of the matching from to is given by,

The second equation above comes from Lemma 2.5 Part 2 for , and the last step follows from the observation that is exactly equal to the probability that the edge containing in is not added to , which is one half (since the edge is chosen with probability ). We can now bound the performance of as follows,

Now, let us apply Lemma 2.6 to the set (), with being the matching: we get that or equivalently Substituting this in the above equation for along with the fact that , we get the following lower bound for the performance of in terms of OPT,

Recall that

The final bound comes from adding the two quantities above and multiplying by half. ∎

Matching without the Metric Assumption

We now discuss the general case where the hidden weights do not obey the triangle inequality. From our discussion in Section 2, we infer that Algorithm 1 still yields a -approximation to the MWM problem as its output coincides with that of the classic greedy algorithm. No deterministic algorithm can provide a better approximation; consider the same preference orderings as Claim 2.8 and the following two sets of consistent weights: , other weights are , and , other weights are . The only good choice for case is the matching , which yields a -approximation for case .

We now go one step further and show that even if we are allowed to utilize randomized mechanisms, we still can’t do that much better than an ordinal -approximation factor.

Claim 3.2.

When the hidden weights do not obey the triangle inequality, there exists a set of preferences such that no randomized algorithm can provide an ordinal approximation factor better than for every set of weights consistent with these preferences.

Proof.

The instance consists of nodes , and we describe the preferences in a slightly unconventional but more intuitive manner. Every node is assigned a rank and two or more nodes may have the same rank. Now, the preference ordering for a node is simply a list of the nodes in sorted in the ascending order of their rank and ties in the rank can be broken arbitrarily. For this instance, we set for to . So for instance, ’s preference ordering can be: .

For all of the weights that we consider, , if , . Moreover, for a given , the weight of all edges going out of and are the same. By symmetry, there exists an optimal randomized mechanism for this instance that randomizes among the following six matchings, choosing matching with probability .

  1. .

We now construct sets of weights consistent with the preferences. Suppose that the optimal randomized strategy provides an ordinal approximation factor of for some . Then for every set of weights , it is true that

(4)

We now explicitly construct some weights and derive an upper bound of on , which implies that no randomized strategy can have an approximation factor better than for this instance.

  1. All weights are zero except . For this instance , and only give non-zero utility, so . Applying Equation 4, we get .

  2. , rest are zero. , . The corresponding inequality is .

  3. All weights going out of are one. Among the remaining weights, only . . , giving us the inequality, .

  4. The final instance has all weights coming out of to be one. . The final inequality is .

Adding all of the inequalities above gives us . Since , we get that , which completes the proof. ∎

We hypothesize that as we extend the instance in the above claim, as , we should obtain a lower bound of , which meets our upper bound. For Max -matching, the situation is much bleaker: no algorithm, deterministic or randomized, can provide a reasonable approximation factor if is small. As we did before, consider an instance with nodes . Every ’s first choice is and vice-versa, the other preferences can be arbitrary. Pick some uniformly at random and set , and all the other weights are equal to . For , it is easy to see that every randomized algorithm obtains non-zero utility only with probability , whereas . Therefore, the ordinal approximation factor for any randomized algorithm is and as , the factor becomes unbounded. Moreover, for other values of , the ordinal approximation factor for the same instance is .
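The unboundedness argument for a single-edge matching can be checked with a small exact calculation. The concrete values below are assumptions, since the symbols are not recoverable from the text here: the instance is taken to have m disjoint pairs of mutually top-ranked nodes, the heavy pair has weight 1, every other edge has weight 0, and the algorithm must output exactly one edge. The function name `ordinal_ratio` is hypothetical.

```python
from fractions import Fraction

def ordinal_ratio(pick_probs):
    """Return OPT / E[ALG] for the assumed single-heavy-pair instance.

    pick_probs[i] is the probability that the algorithm outputs pair i
    as its one edge. The heavy pair is uniform over the m = len(pick_probs)
    pairs, so OPT = 1 regardless of which pair is heavy.
    """
    m = len(pick_probs)
    if sum(pick_probs) > 1:
        raise ValueError("probabilities must sum to at most 1")
    # The algorithm earns weight 1 only when its pick is the heavy pair,
    # which happens with probability 1/m for each fixed pick.
    expected_alg = sum(Fraction(1, m) * Fraction(p) for p in pick_probs)
    return Fraction(1) / expected_alg
```

Under these assumptions the ratio equals m divided by the total probability placed on the pairs, so it is at least m for every strategy and grows without bound as m increases. Because the calculation averages over the position of the heavy pair, at least one fixed set of weights must be at least this bad for the algorithm.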

4 Matching as a Black-Box for other Problems

In this section, we highlight the versatility of matchings by showing how matching algorithms can be used as a black-box to obtain good (ordinal) approximation algorithms for other problems. These reductions serve as stand-alone results, as the algorithms for matching are easy to implement as well as extremely common in settings with preference lists. Moreover, future improvements on the ordinal approximation factor for matchings can be directly plugged in to obtain better bounds for these problems.

Informal Statement of Results

  1. Max -Sum: Given an -approximate perfect matching , we can obtain a nice clustering as follows: “simply divide into equal sized sets (with edges in each) and form clusters using the nodes in each of the equal-sized sets.” It turns out that this simple mechanism provides a -approximation to the optimum clustering. Plugging in our -approximation algorithm, we immediately get a -approximation algorithm for Max -sum.

  2. Densest -Subgraph: Suppose we are provided an -approximate matching of size , how good is the set containing the nodes in ? Using Lemma 2.6, we can establish that is at least as dense as , and the density of the optimum solution is at most . Therefore, is a -approximation to the optimum set of size . This easy-to-implement mechanism directly yields a -approximation algorithm for Densest -subgraph.

  3. Max TSP: Given an -approximate perfect matching , any tour containing is a -approximation since the weight of the optimum tour cannot be more than twice that of the optimum matching. However, if we carefully form using only undominated edges, we can show that the resulting solution is a -approximation to the optimum tour. Plugging in , we get an ordinal -approximation algorithm for Max TSP.
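The Densest Subgraph reduction in item 2 can be sketched as follows. This is an illustration under assumptions rather than the paper's exact procedure: the matching is assumed to contain k/2 edges so that its endpoints form a set of k nodes, and it is produced here by an ordinal greedy routine that repeatedly matches two agents who rank each other first among the remaining agents. The names `greedy_matching`, `dense_subset`, and the `prefs` format are hypothetical.

```python
def greedy_matching(prefs, num_edges):
    """Ordinal greedy: repeatedly match two remaining agents who rank each other first."""
    alive, matching = set(prefs), []
    for _ in range(num_edges):
        top = {x: next(y for y in prefs[x] if y in alive) for x in sorted(alive)}
        # Assumes strict preferences induced by symmetric weights,
        # so that a mutually top-ranked pair always exists.
        x, y = next((x, y) for x, y in top.items() if top[y] == x)
        matching.append((x, y))
        alive -= {x, y}
    return matching

def dense_subset(prefs, k):
    """Return the k endpoints of a greedy matching with k/2 edges."""
    if k % 2 or k > len(prefs):
        raise ValueError("k must be even and at most the number of nodes")
    return {v for edge in greedy_matching(prefs, k // 2) for v in edge}
```

For example, with six points on a line at positions 0, 1, 3, 7, 15, 31 and preferences sorted by decreasing distance, `dense_subset(prefs, 4)` selects the endpoints of the two greedily chosen edges.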

Formal Results

Theorem 4.1.
  1. Any -approximation algorithm for the Maximum Weight Perfect Matching problem can be used to obtain a -approximation for the Max -Sum problem.

  2. Any -approximation algorithm for the Maximum Weight -Matching problem can be used to obtain a -approximation for the Densest -Subgraph problem.

  3. Any -approximation algorithm for the Maximum Weight Perfect Matching problem can be used to obtain a -approximation for the Max TSP problem.

Proof.

(Part 1) Suppose that we are provided a perfect matching that is a -approximation to the optimum matching. We use the following simple procedure to cluster the nodes into -clusters:

  1. Initialize empty clusters , and .

  2. While some such that

  3. Pick some edge , add both the end-points of to , remove from .
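The clustering procedure above can be sketched in a few lines of Python. The sketch assumes that the matching is given as a list of node pairs and that the number of edges is divisible by the number of clusters k; the function name `clusters_from_matching` is hypothetical. Which edges land in which cluster is arbitrary, as in the procedure itself.

```python
def clusters_from_matching(matching, k):
    """Split a perfect matching into k equal-sized clusters of nodes.

    Both endpoints of every matching edge end up in the same cluster,
    which is the only property the analysis relies on.
    """
    if k <= 0 or len(matching) % k:
        raise ValueError("number of edges must be divisible by k")
    per_cluster = len(matching) // k
    clusters = []
    for i in range(k):
        block = matching[i * per_cluster:(i + 1) * per_cluster]
        clusters.append([v for edge in block for v in edge])
    return clusters
```

For instance, a matching with four edges and k = 2 yields two clusters of four nodes each.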

The only important property we require is that for every edge , both its end points belong to the same cluster. We now prove that this is a -approximation to the optimum max -sum solution (). Let be the optimum perfect matching, and let . Finally, since the end points of every edge in belong to the same cluster, without loss of generality, let denote the edges of that are present in the cluster . First, we establish a lower bound on the quality of our solution ,

(Lemma 2.6)

Now, we establish an upper bound for in terms of . Suppose that is the maximum weight perfect matching on the set of nodes . Then,

(Corollary LABEL:corr_optlower)

Reconciling the two bounds gives us the desired factor of .

(Part 2) Once again, we use to denote the optimum -matching and to denote the -approximation. Let be the optimum solution to the Densest -subgraph problem for the given value of . Then our algorithm simply returns the solution comprising the endpoints of all the edges in . The proof is quite similar to the proof for Part 1.