In recent years, the field of algorithm design has been marked by a steady shift towards newer paradigms that take into account the behavioral aspects and communication bottlenecks pertaining to self-interested agents. In contrast to traditional algorithms that are assumed to have complete information regarding the inputs, mechanisms that interact with autonomous individuals commonly assume that the input to the algorithm is controlled by the agents themselves. In this context, a natural constraint that governs the process by which the algorithm elicits inputs from these agents is truthfulness: agents cannot improve upon the resulting outcome by misreporting the inputs. Another constraint that has recently gained traction in optimization problems on weighted graphs (where the agents correspond to the nodes) is that of ordinality: here, each agent can only submit a preference list of their neighbors ranked in the order of the edge weights. The need for algorithms that are both truthful and ordinal arises in a number of important settings; however, it is well known that it is impossible to obtain optimum solutions even when the algorithm is required to satisfy only one of these two constraints.
In this work, we study the design of approximation algorithms for popular graph optimization problems including matching, clustering, and team formation with the goal of understanding the combined price of truthfulness and ordinality. To be more specific, we consider the above optimization problems on a weighted graph whose vertices represent the agents, and where the edge weights (that correspond to agent utilities) are private to the agents constituting that edge, and pose the following natural question: “How does a computationally efficient, truthful algorithm that only has access to each agent’s edge weights in the form of preference rankings perform in comparison to an optimal algorithm that has full knowledge of the weighted graph?”.
Truthfulness in an ordinal world
Mechanisms that are either truthful or ordinal have received extensive attention across the spectrum of optimization problems. However, non-trivial algorithms that satisfy both of these considerations exist only for very specific settings [14, 2]. For instance, the price of ordinality (also referred to as distortion) is well understood for a number of applications such as voting [3, 7], matching [16, 5], facility location , and subset selection [5, 9]. The common thread in all of these settings where the (input) information is often held by the users is that it may be impossible or prohibitively expensive for the agents to express their full utilities to the mechanism; the same agents may incur a smaller overhead if they communicate preference lists over the other users or candidates in the system. Our main contention in this paper is that in exactly the same types of settings, it is reasonable to expect strategic agents to lie about their preferences if it improves their resulting utilities. Motivated by this, we study ordinal algorithms that are also truthful. Even though such mechanisms are clearly less powerful than their ‘ordinal but not necessarily truthful’ counterparts, our high-level contribution is that for several well-studied graph maximization problems, one can obtain solutions that are only a constant factor away from the (social welfare of the) optimum, omniscient solution.
Model and Problem Statements
The high-level model in this paper is the same as the one in , with the addition of truthfulness as a constraint. The common setting for all the problems studied in this work is an undirected, complete weighted graph whose nodes are the set of self-interested agents with . We use to denote the weight of the edge in the graph for . All of the optimization problems studied in this work involve selecting a subset of edges from that obey some condition, with the objective of maximizing the weight of the edges chosen.
- Max -Matching
Compute the maximum weight matching consisting of exactly edges. We refer to the case as the Weighted Perfect Matching problem.
- -Sum Clustering
Given an integer , partition the nodes into disjoint sets of equal size in order to maximize . (It is assumed that is divisible by ). When , -sum clustering reduces to the weighted perfect matching problem.
- Densest -subgraph
Given an integer , compute a set of size to maximize the weight of the edges inside .
- Max TSP
In the maximum traveling salesman problem, the objective is to compute a tour (cycle that visits each node in exactly once) to maximize .
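To make the four objectives concrete, the following minimal sketch evaluates each of them on a small instance. The function names and the weight matrix `w` are hypothetical and chosen only for illustration. The sketch also assumes that the clustering objective is the total weight of the edges inside the clusters, and that the tour objective is the total weight of the edges on the cycle.

```python
from itertools import combinations

def matching_weight(w, matching):
    """Total weight of a set of disjoint edges given as (x, y) pairs."""
    return sum(w[x][y] for x, y in matching)

def subgraph_weight(w, nodes):
    """Total weight of the edges with both endpoints in `nodes`."""
    return sum(w[x][y] for x, y in combinations(sorted(nodes), 2))

def clustering_weight(w, clusters):
    """Sum of the intra-cluster edge weights over all clusters (assumed objective)."""
    return sum(subgraph_weight(w, cluster) for cluster in clusters)

def tour_weight(w, tour):
    """Weight of the cycle that visits the nodes in the listed order."""
    n = len(tour)
    return sum(w[tour[i]][tour[(i + 1) % n]] for i in range(n))

# Hypothetical symmetric weights on four agents (the diagonal is unused).
w = [
    [0, 10, 8, 5],
    [10, 0, 5, 5],
    [8, 5, 0, 6],
    [5, 5, 6, 0],
]

print(matching_weight(w, [(0, 1), (2, 3)]))
print(subgraph_weight(w, {0, 1, 2}))
print(clustering_weight(w, [{0, 1}, {2, 3}]))
print(tour_weight(w, [0, 1, 2, 3]))
```

With two clusters of size two, the clustering value coincides with the weight of the corresponding perfect matching, in line with the reduction noted above.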
A crucial but reasonably natural assumption that we make in this work is that the edge weights satisfy the triangle inequality, i.e., for , . For the specific kind of problems that we study, the metric structure occurs in a number of well-motivated environments, such as social networks, where the property captures a specific notion of friendship; Euclidean metrics, where each agent is a point in a metric space which denotes her skills or beliefs; and edit distances, where each agent could be represented by a string over a finite alphabet (e.g., a gene sequence) and the graph weights represent the edit or Levenshtein distances . The reader is asked to refer to  for additional details on these specific applications and a mathematical treatment of friendship in social networks.
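As a small illustration of the metric assumption, the sketch below checks the triangle inequality in its standard form, namely that the weight between two nodes is at most the sum of their weights to any third node. The helper name `is_metric` and both example matrices are hypothetical.

```python
from itertools import permutations

def is_metric(w, tol=1e-12):
    """Return True if w[x][y] <= w[x][z] + w[z][y] for all distinct x, y, z."""
    n = len(w)
    return all(
        w[x][y] <= w[x][z] + w[z][y] + tol
        for x, y, z in permutations(range(n), 3)
    )

# Hypothetical instance that satisfies the triangle inequality.
metric_w = [
    [0, 10, 8, 5],
    [10, 0, 5, 5],
    [8, 5, 0, 6],
    [5, 5, 6, 0],
]

# Hypothetical instance that violates it: 10 > 1 + 1.
non_metric_w = [
    [0, 10, 1],
    [10, 0, 1],
    [1, 1, 0],
]

print(is_metric(metric_w), is_metric(non_metric_w))
```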
Our framework and problem set models a multitude of interesting applications, and not surprisingly, all of the problems described above (with the metric assumption) have been the subject of a dense body of algorithmic work [5, 15, 17, 19]. In many of these applications, it becomes imperative that the algorithm provide good approximation guarantees even in the absence of precise numerical information regarding the graph weights. For instance, one can imagine partitioning a set of wedding guests to form a table assignment (-sum clustering) or selecting a diverse team of agents in order to tackle a complex task (dense subgraph).
In this work, we are interested in the design of algorithms that are both ordinal and truthful. Suppose that for any one of the above problems, we are given an instance described by a weighted graph; then an algorithm
for this problem is said to be ordinal if it has access only to a vector of preference orderings induced by the graph weights. That is, the input to this algorithm consists of a set of preference orderings reported by each of the agents, where the preference list corresponding to agent is a ranking over the agents in such that , if prefers to , then .
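The following sketch shows one way the preference orderings induced by a weighted graph could be computed, which is the only information an ordinal algorithm receives. The function name, the example weights, and the rule of breaking ties by agent index are assumptions made for this illustration.

```python
def induced_preferences(w):
    """For each agent i, rank the other agents by decreasing edge weight.

    Ties are broken by agent index, an arbitrary choice for this sketch.
    """
    n = len(w)
    return [
        sorted((j for j in range(n) if j != i), key=lambda j: (-w[i][j], j))
        for i in range(n)
    ]

# Hypothetical symmetric weights on four agents.
w = [
    [0, 10, 8, 5],
    [10, 0, 5, 5],
    [8, 5, 0, 6],
    [5, 5, 6, 0],
]

prefs = induced_preferences(w)
print(prefs)
```

Note that many different weight matrices induce the same rankings, which is why an ordinal algorithm cannot in general recover the optimum solution.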
The algorithm is truthful if no single agent can improve their utility by submitting a preference ordering different from the ‘true ranking’ induced by the graph weights. Here, the utility of each agent is simply the total weight of the edges incident to which are chosen. These utilities have a natural interpretation with respect to the problems considered in this work. For instance, for matching problems, an agent’s utility corresponds to her affinity or weight to the agent to whom she is matched, and for densest subgraph as well as -sum clustering, the utility is her aggregate weight to the agents in the same team or cluster. Our objective in this paper is to design mechanisms that maximize the overall social welfare, i.e., the sum of the utilities of all the agents. Thus, the goal is to select a maximum-weight set of edges while knowing only ordinal preferences (instead of the true weights ), with even the ordinal preferences possibly being misrepresented by the self-interested agents.
Finally, is said to be an ordinal -approximation algorithm for if for any given instance along with the graph weights, the total objective value of the maximum weight solution with respect to the instance weights is at most a factor times the value of the solution returned by , when the input corresponds to the preference rankings induced by the weights. In other words, such algorithms produce solutions which are always a factor away from optimum, without actually knowing what the weights are. We conclude by pointing out that despite the extensive body of work on all of the problems described previously, hardly any of the proposed mechanisms satisfy either truthfulness or ordinality (see Related Work for exceptions), motivating the need for a new line of algorithmic thinking.
Our main results are summarized in Table 1. All of the non-matching problems that we study are NP-Hard even in the full information setting [15, 23, 18]. Our truthful ordinal algorithms provide constant approximation factors for a variety of problems in this setting, showing that even if only ordinal information is presented to the algorithm, and even if the agents can lie about their preferences, we can still form solutions efficiently with close to optimal utility. Note that as seen in Table 1, in  the authors already gave ordinal approximation algorithms for matching problems: those algorithms were not truthful, however, and achieving non-trivial approximation bounds while always giving players incentive to tell the truth requires significant additional work. For example, even the natural, greedy 2-approximation algorithm for Max -matching from  is not truthful.
|Truthful Ordinal||Non-Truthful Ordinal|
|Weighted Perfect Matching|||
In addition to considering truthful mechanisms, we also develop new approximation algorithms for the setting where the agents are not able to lie, and thus the algorithm knows their true preference ordering. By dropping the truthfulness constraint, we are able to obtain better approximation factors for clustering, densest subgraph, and max TSP. The improved results are enabled by more involved algorithmic techniques that invariably sacrifice truthfulness; they establish a clear separation between the performance of an unconstrained ordinal algorithm and one that is required to be truthful.
Techniques: Our proof techniques involve carefully stitching together greedy, random, and serial dictatorship based solutions. Understandably, and perhaps unavoidably for ordinal settings, the algorithmic paradigms that form the bedrock for our mechanisms are rather simple. However, beating the guarantees obtained by a naive application of these techniques involves a more intricate understanding of the interplay between the various approaches. For instance, our algorithm for the weighted perfect matching problem involves mixing between two simple -approximation algorithms (greedy, random) to achieve a -guarantee: towards this end, we establish new tradeoffs between greedy and random matchings showing that when one is far away from the optimum solution, the other one must provably be close to optimum.
Algorithms proposed in the vast matching literature usually belong to one of two classes: Ordinal algorithms that ignore agent utilities, and focus on (unquantifiable) axiomatic properties such as stability, truthfulness, or other notions of efficiency, and Optimization algorithms where the numerical utilities are fully specified. Algorithms belonging to the former class usually do not result in good approximations for the hidden optimum utilities, while techniques used in the latter tend to heavily rely on the knowledge of the exact edge weights and are not suitable for this setting. A notable exception to the above dichotomy is the class of optimization problems studying ordinal measures of efficiency [1, 11, 6, 20], for example, the average rank of an agent’s partner in the matching. Such settings usually involve the definition of ‘new utility functions’ based on given preferences, and thus are fundamentally different from our model where preexisting cardinal utilities give rise to ordinal preferences.
Broadly speaking, the truthful mechanisms in our work fall under the umbrella of ‘mechanism design without money’ [2, 8, 13, 16, 22], a recent line of work on designing strategyproof mechanisms for settings like ours, where monetary transfers are irrelevant. A majority of the papers in this domain deal with mechanisms that elicit agent utilities, specifically for one-sided matchings, assignments and facility location problems that are somewhat different from the graph problems we are interested in. The notable exceptions are the recent papers on truthful, ordinal mechanisms for one-sided matchings [16, 8] and general allocation problems . While  looks at normalized agent utilities and shows that no ordinal algorithm can provide an approximation factor better than ,  considers minimum cost metric matching under a resource augmentation framework. The main differences between our work and these two papers are (1) we consider two-sided matching instead of one-sided, as well as other clustering problems, as well as non-truthful algorithms with better approximation factors, and (2) we consider maximization objectives in which users attempt to maximize their utility instead of minimize their cost. The latter may seem like a small difference, but it completely changes the nature of these problems, allowing us to create many different truthful mechanisms and achieve constant-factor approximations. Finally,  looks at the problem of allocating goods to buyers in a ‘fair fashion’. In that paper, the focus is on maximizing a popular non-linear objective known as the maximin share, which is incompatible with our objective of social welfare maximization. That said, an interesting direction is to see if our techniques extend to other objectives.
As discussed in the Introduction, this paper improves on several results from . In , the authors focused on the problem of maximum-weight matching for the non-truthful setting, with the main result being an ordinal 1.6-approximation algorithm. In the current paper, we greatly extend the techniques from  so that they may be applied to other problems in addition to matching. Moreover, we introduce several new techniques for this setting in order to create truthful algorithms; such algorithms require a somewhat different approach and make much more sense for many of the settings that we are interested in. Other than , these are the first known truthful algorithms for matching and clustering with metric utilities.
Our work is similar in motivation to the growing body of research studying settings where the voter preferences are induced by a set of hidden utilities [3, 7, 10, 4, 9, 14]. The voting protocols in these papers are essentially ordinal approximation algorithms, albeit for a very specific problem of selecting the utility-maximizing candidate from a set of alternatives.
Truthful Ordinal Mechanisms
As mentioned previously, we are interested in designing incentive-compatible mechanisms that elicit ordinal preference information from the users, i.e., mechanisms where agents are incentivized to truthfully report their preferences in order to maximize their utility. We now formally define the notions of truthfulness pertinent to our setting. Throughout the rest of this paper, we will use to represent a true ordinal preference of agent (i.e., one that is induced by the utilities ), and to represent the preference ordering that agent submits to the mechanisms (which will be equal to if tells the truth).
(Truthful Mechanism) A deterministic mechanism is said to be truthful if for every , all , we have that , where is the utility guaranteed to agent by the mechanism.
(Universally Truthful Mechanisms) A randomized mechanism is said to be universally truthful if it is a probability distribution over truthful deterministic mechanisms.
Informally, in a universally truthful mechanism, a user is incentivized to be truthful even when she knows the exact realization of the random variables involved in determining the mechanism.
(Truthful in Expectation) A randomized mechanism is said to be truthful in expectation if an agent always maximizes her expected utility by truthfully reporting her preference ranking. The expectation is taken over the different outcomes of the mechanism.
All of our algorithms are universally truthful, not just in expectation. The reader is asked to refer to  for a useful discussion on the types of randomized mechanisms, and settings where universally truthful mechanisms are strongly preferred as opposed to the mechanisms that only guarantee truthfulness in expectation.
Approaches for Designing Truthful Matching Mechanisms
As a concrete first step towards designing truthful ordinal mechanisms, we introduce three high-level algorithmic paradigms that will form the backbone of all the results in this work. These paradigms are based on the popular algorithmic notions of Greedy, Serial Dictatorship, and Uniformly Random. For each of these paradigms, we develop approaches towards designing truthful mechanisms for the maximum matching problem. In Sections 3 and 4, we develop more sophisticated truthful mechanisms that build upon the simple paradigms presented here, leading to improved approximation factors.
Greedy via Undominated Edges:
Our first algorithm is the ordinal analogue of the classic greedy matching algorithm, which has been extensively applied across the matching literature. In order to better understand this algorithm, we first define the notion of an undominated edge.
(Undominated Edge) Given a set of edges, is said to be an undominated edge if for all and in , and .
We make two simple observations here regarding undominated edges based on which we define Algorithm 1.
Every edge set has at least one undominated edge. In particular, any maximum weight edge in is obviously an undominated edge.
Given an edge set , one can efficiently find at least one undominated edge using only the ordinal preference information .
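The two observations can be sketched in code as follows. This is a minimal illustration rather than a verbatim transcription of Algorithm 1: it assumes the edge set is the complete graph on the currently unmatched agents, so that an undominated edge is a pair of agents who rank each other first among those agents. All function names and the example rankings are hypothetical.

```python
def top_choice(prefs, agent, available):
    """Most preferred agent of `agent` among the available agents."""
    return next(j for j in prefs[agent] if j in available and j != agent)

def find_undominated_edge(prefs, available):
    """Return a pair of available agents who rank each other first.

    When the rankings are induced by distinct symmetric weights, such a pair
    always exists: the heaviest remaining edge is one.
    """
    for x in sorted(available):
        y = top_choice(prefs, x, available)
        if top_choice(prefs, y, available) == x:
            return (x, y)
    raise ValueError("no mutually top-ranked pair among the available agents")

def greedy_matching(prefs, k):
    """Repeatedly add an undominated edge until k edges have been chosen."""
    available = set(range(len(prefs)))
    matching = []
    while len(matching) < k and len(available) >= 2:
        x, y = find_undominated_edge(prefs, available)
        matching.append((x, y))
        available -= {x, y}
    return matching

# Hypothetical rankings on four agents, each list ordered from most to least preferred.
prefs = [[1, 2, 3], [0, 2, 3], [0, 3, 1], [2, 0, 1]]
print(greedy_matching(prefs, 2))
```

The sketch only reads the rankings and never the weights, which is the sense in which the greedy algorithm is ordinal.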
It is not difficult to see that this algorithm gives a 2-approximation for Max-Weight Perfect Matching, and is truthful for that case. Unfortunately, for Max -Matching with smaller , it is no longer truthful, and thus none of the algorithms that use Greedy as a subroutine (such as the algorithms from ) are truthful.
Algorithm 1 is truthful for the Max -Matching problem only when .
We need to prove that for any given strategy profile adopted by the other players , player maximizes her utility when she is truthful, i.e., if is the true preference ordering of agent and is any set of preference orderings for the other agents, then for any . Our proof will proceed via contradiction and will make use of the following fundamental property: if Algorithm 1 (for some input) matches agent to during some iteration, then both and prefer each other to every other agent that is unmatched during the same round.
We introduce some notation: suppose that denotes the matching output by Algorithm 1 for input , and for every , is the agent to whom is matched under . Let be the edge added to the matching in round of Algorithm 1, and denote the round in which is matched to as round . Assume to the contrary that for input , is matched to an agent she prefers more than . Let the altered matching be referred to as , and let be the agent who is matched with in .
We begin by proving the following claim: For each , we have that . In other words, all the edges which are included into before is matched by Algorithm 1 must appear in both matchings no matter what does. Once we prove this claim, we are done, since is the highest-weight edge from to any node not in , so maximizes its utility by telling the truth and receiving utility equal to the weight of .
To prove the claim above, we proceed by induction. Note that if , then is trivially truthful, since is its top choice in the entire graph. Now suppose that we have shown the claim for edges . Let , and without loss of generality suppose that is matched in our algorithm constructing before . At the time that is matched with , it must be that is the top choice of from all available nodes. But, by the definition of our algorithm, is the top choice of that is not contained in . Since is not contained in due to our inductive hypothesis, this means that prefers over , and since is not matched yet, this means that and will become matched together in . Thus, is in as well. This completes the proof of our claim.
To see why this mechanism is not truthful for smaller , notice that agents which would not be matched in the first steps have incentive to lie and form undominated edges where none exist, all in order to be matched earlier. Assume that the algorithm uses a deterministic tie-breaking rule to choose between multiple undominated edges in each round. While this does not really alter the final output for the perfect matching problem, the tie-breaking rule may lead to certain undominated edges not getting selected for the final matching.
Fix and suppose that when the input preferences are truthful, agents , are not present in the matching returned by Algorithm 1. Moreover, suppose that (1) ’s first preference is , and (2) the deterministic tie-breaking always prefers over other edges (one can design preferences so that agents favoured by the tie-breaking are not selected for truthful inputs).
Clearly has incentive to alter its preferences to identify as its most preferred node and receive a utility of , which is more than its previous utility of zero. ∎
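The manipulation described above can be reproduced on a small instance. The sketch below is illustrative only: the weights, the rankings, and the tie-breaking rule (pick the lexicographically largest undominated pair) are hypothetical choices, and the greedy routine is a simplified stand-in for Algorithm 1 with a single edge to be selected.

```python
def mutual_top_pairs(prefs, available):
    """Pairs (x, y) with x < y whose endpoints rank each other first among `available`."""
    top = {x: next(j for j in prefs[x] if j in available) for x in available}
    return [(x, y) for x, y in top.items() if x < y and top[y] == x]

def greedy_k_matching(prefs, k):
    """Greedy with a deterministic tie-break: the lexicographically largest pair."""
    available = set(range(len(prefs)))
    matching = []
    while len(matching) < k and len(available) >= 2:
        pairs = mutual_top_pairs(prefs, available)
        if not pairs:
            break
        edge = max(pairs)  # hypothetical tie-breaking rule
        matching.append(edge)
        available -= set(edge)
    return matching

def utility(w, matching, agent):
    """Weight of the chosen edge incident to `agent`, or 0 if unmatched."""
    return sum(w[x][y] for x, y in matching if agent in (x, y))

# Hypothetical metric weights; the rankings below are the ones they induce.
w = [
    [0, 10, 8, 5],
    [10, 0, 5, 5],
    [8, 5, 0, 6],
    [5, 5, 6, 0],
]
truthful = [[1, 2, 3], [0, 2, 3], [0, 3, 1], [2, 0, 1]]

# Agent 2 misreports by ranking agent 3 first, creating a second undominated edge.
misreport = [list(p) for p in truthful]
misreport[2] = [3, 0, 1]

honest_utility = utility(w, greedy_k_matching(truthful, 1), 2)
lying_utility = utility(w, greedy_k_matching(misreport, 1), 2)
print(honest_utility, lying_utility)
```

Under truthful reports only the edge between agents 0 and 1 is undominated, so agent 2 is left unmatched. After the misreport, the tie-breaking rule selects the edge between agents 2 and 3 instead, and agent 2 obtains strictly positive utility.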
Can we use a similar approach to design algorithms for the other problems that we are interested in? For -sum clustering and Densest -subgraph, one can follow the approach taken in [17, 5], and use the above matching as an intermediate to compute -approximations for the above problems. For Max TSP, we can directly leverage the above algorithm by maintaining as a (forest of) path(s) instead of a matching in order to obtain a -approximate Hamiltonian tour. Unfortunately, as we show in the Appendix, these approaches do not lead to truthful algorithms at all.
Another popular approach to compute incentive compatible matchings (albeit usually for one-sided matchings [8, 16]) is serial dictatorship, which we formally define below for our two-sided matching setting.
Algorithm 2 is universally truthful for the Max -Matching problem for all .
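A minimal sketch of serial dictatorship for two-sided matching is given below. It assumes that agents are processed in a fixed order that does not depend on the reported preferences, and that each processed agent is matched to her most preferred agent among those still unmatched. The function name, the `order` parameter, and the example rankings are hypothetical.

```python
def serial_dictatorship(prefs, k, order=None):
    """Match each dictator, in the given order, to her favourite available agent."""
    n = len(prefs)
    order = list(range(n)) if order is None else list(order)
    available = set(range(n))
    matching = []
    for x in order:
        if len(matching) == k or len(available) < 2:
            break
        if x not in available:
            continue  # x was already chosen as a partner by an earlier dictator
        y = next(j for j in prefs[x] if j in available and j != x)
        matching.append((x, y))
        available -= {x, y}
    return matching

# Hypothetical rankings on four agents, each list ordered from most to least preferred.
prefs = [[1, 2, 3], [0, 2, 3], [0, 3, 1], [2, 0, 1]]
print(serial_dictatorship(prefs, 2))
print(serial_dictatorship(prefs, 2, order=[3, 2, 1, 0]))
```

The intuition for truthfulness is visible in the code: an agent's report is consulted only when she is the dictator, at which point naming her true favourite is optimal, and her report cannot affect whether or when she is reached.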
Serial dictatorship is among the most prominent of algorithms to feature in this work: our primary approximation algorithms for Max -matching and Max TSP involve randomized versions of serial dictatorship.
Randomness A much simpler approach that is completely oblivious to the input preferences involves selecting a solution uniformly at random. Such an algorithm (described in Algorithm 3) is obviously truthful. Many of the techniques in this paper rely on carefully combining these three types of algorithms in order to produce good approximation factors while retaining truthfulness.
Algorithm 3 is universally truthful for the Max -matching problem for all .
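One simple way to draw a matching uniformly at random, sketched below, is to shuffle the agents and pair up consecutive entries. This is an illustrative sketch and not necessarily the exact procedure of Algorithm 3; the function name and the `rng` parameter are hypothetical.

```python
import random

def random_matching(n, k, rng=None):
    """Return k disjoint edges on agents 0..n-1, chosen without looking at preferences."""
    if 2 * k > n:
        raise ValueError("a matching with k edges needs at least 2k agents")
    rng = rng or random.Random()
    agents = list(range(n))
    rng.shuffle(agents)
    # Pair up the first 2k agents of the shuffled order.
    return [(agents[2 * i], agents[2 * i + 1]) for i in range(k)]

matching = random_matching(6, 2, random.Random(0))
print(matching)
```

Since the routine never reads any preference list, no agent can influence the outcome by misreporting.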
3 Truthful Mechanisms for Matching
Weighted Perfect Matching
So far, we have looked at two simple approaches for designing truthful mechanisms (Greedy and Random) for the weighted perfect matching problem, both of which yield -approximations  to the optimum matching. Can we do any better? In , the authors use a complex interleaving of greedy and random approaches to extract a non-truthful -approximation algorithm. In this paper, we instead present a simpler algorithm and rather surprising result: a simple random combination of Algorithms 1 and 3 results in a -approximation to the optimum matching. The main insight driving this result is the fact that the random and greedy approaches are in some senses complementary to each other, i.e., on instances where the approximation guarantee for the greedy algorithm is close to , the random algorithm performs much better.
The following algorithm is a universally truthful mechanism for the weighted perfect matching problem that obtains a -approximation to the optimum matching.
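The structure of such a random combination can be sketched as follows. The mixing probability is left as a parameter `p_random` because its value is not reproduced in this text; the function names and the simplified greedy and random routines are likewise hypothetical stand-ins for Algorithms 1 and 3.

```python
import random

def greedy_perfect_matching(prefs):
    """Repeatedly match a pair of agents who rank each other first among those left."""
    available, matching = set(range(len(prefs))), []
    while len(available) >= 2:
        top = {x: next(j for j in prefs[x] if j in available) for x in available}
        # Raises ValueError if no mutual pair exists, which cannot happen for
        # rankings induced by distinct symmetric weights.
        x, y = min((x, y) for x, y in top.items() if top[y] == x)
        matching.append((x, y))
        available -= {x, y}
    return matching

def random_perfect_matching(n, rng):
    """Shuffle the agents and pair up consecutive entries."""
    agents = list(range(n))
    rng.shuffle(agents)
    return [(agents[i], agents[i + 1]) for i in range(0, n - 1, 2)]

def mixed_mechanism(prefs, p_random, rng=None):
    """Run the random matching with probability p_random, otherwise greedy."""
    rng = rng or random.Random()
    if rng.random() < p_random:
        return random_perfect_matching(len(prefs), rng)
    return greedy_perfect_matching(prefs)

# Hypothetical rankings on four agents, each list ordered from most to least preferred.
prefs = [[1, 2, 3], [0, 2, 3], [0, 3, 1], [2, 0, 1]]
print(mixed_mechanism(prefs, 0.5, random.Random(0)))
```

Because the coin flip is independent of the reports and each branch is truthful for perfect matching, the combination is a distribution over truthful deterministic mechanisms, which is the definition of universal truthfulness given earlier.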
Our proof mainly involves non-trivial lower bounds on the performance of the random matching which highlights its complementary nature to the greedy matching. As usual, we begin with notation that allows us to divide the greedy matching into several parts for easy analysis.
Dividing Greedy into Two Halves Suppose that is the output of the greedy algorithm for the given instance, and is the random matching for the same instance. We abuse notation and define , and . Recall that comprises the top (max-weight) fifty percent of the edges in . We will sometimes refer to as the top half and the rest of the nodes as the bottom half. Next, define , and let denote . Observe that both and consist of exactly nodes. Finally, suppose that .
Sub-Dividing B We will now go one step further and divide the bottom half into two sub-parts, and , which will aid us in our analysis of the random matching. Define to be the top edges from , i.e., since consists only of edges. Finally, is the final part of the greedy matching, i.e., . As with our previous definitions, and will represent the nodes contained in and respectively.
We begin by highlighting some easy observations in order to get familiar with the various sub-matchings defined above.
consists of nodes and consists of nodes.
No edge in can have a weight larger than .
The first part of the Proposition comes from Lemma B.7. The last part is simply because this is the average of edge weights in .
The rest of the proof involves proving new lower bounds on the weight of the random matching as a function of . Specifically, we will fix the performance of the greedy matching (fix ) and then show that when is small, the random matching’s weight is close to . The remainder of the proof is just basic algebra to bring out the worst-case performance. Let us first formally state our trivial lower bound on the greedy matching.
The weight of the greedy matching is given by:
Before developing the machinery towards our lower bound for the random matching, we will first state our end-goal, which we will prove later. Essentially, our main claim provides an unconditional lower bound for the performance of the random matching as well as a (conditional) bound for small , which will serve as the worst-case.
The weight of the random matching is always at least
Moreover, when , the following is a tighter lower bound for the random matching
Tackling the Random Matching for Different Cases
We will now prove three lemmas that will act as the main bridges to showing Claim 3.4. These lemmas provide insight on the random matching for different cases depending on the relative weights of and . First, define . We begin by studying the case when is smaller than , i.e., the weights of the edges in are somewhat evenly distributed across and . Moreover, since every edge in is larger than every edge in , the following is an easy lower bound on .
For any given instance where , we have that .
consists of edges whereas consists of edges. ∎
Therefore, the above lemma indicates that when , cannot be larger than
Now we give the first of the three lemmas.
Suppose that for a given instance with , with . Then, for , we have that
From Lemma B.13, we get the following generic lower bound for since ,
Moreover, applying Lemma B.5 to , we also get that since there exists a matching () solely on the nodes inside having a weight of . Therefore, it suffices to prove an upper bound on . Recall that consists of exactly nodes, , and . So, directly applying Lemma B.11, we get that,
So, . Putting this inside the generic lower bound for , we complete the proof of this lemma. ∎
We now have a bound for the case when . Next, we provide a universal bound for the other case . Observe that in this case, . We leverage the low weight of to prove the following bound.
Suppose that for a given instance with , with . Then, we have that
Once again, we begin with a generic lower bound on (Corollary B.14) that depends on partitioning the node set into parts (. Notice that .
As with Lemma 3.6, we know that . Now for every edge in , note that the triangle inequality implies that for any node in , going to that node from an endpoint of and coming back to the other endpoint of is larger than the weight of . Summing these up, we get that . Using the fact that gives a slightly simplified version.
So now, it suffices to prove a lower bound on the negative quantities. From Lemma B.9, we get that .
Next, we have to provide an upper bound on in order to complete the proof. We know as per our definitions of , that each edge in the latter is no larger than the smallest edge in the former. Moreover, from Proposition 3.2, we know that is an upper bound on the weight of every edge inside . So, we can directly turn to Lemma B.10 applied specifically to to obtain
where . In conclusion, we have that
We are now ready to complete our (lower) bounds on the negative quantities
Plugging the final inequality into the simplified generic lower bound completes the proof. ∎
A careful inspection of the proof of Lemma 3.7 reveals that our lower bound is a bit loose in two places where we independently replaced with and respectively to provide a worst-case bound. Unfortunately, as a result, the lower bounds for the and cases do not align.
For our purposes however, it is enough to show that the two lower bounds apply when , which we prove below in the third of our lemmas in this subsection.
Suppose that for a given instance with with , we have with . Then, the following lower bound is true
From Lemma B.1, we know that the expression inside the curly braces attains its maximum value for in the given range of . Therefore, substituting , we get
Directly plugging this upper bound into the simplified generic lower bound from Lemma 3.7 is enough to prove the statement in the Lemma. ∎
Final Leg: Proving the Actual Bound
Recall that we pick the random matching with probability and the greedy matching with probability . Suppose we use to denote the weight of the matching returned by our algorithm. Then,
Since is fixed, it is not hard to see that the quantity is minimized at . Substituting , we get
In this case, we need to use a weaker lower bound for .
Using basic calculus, we observe the expression in the final line is a non-decreasing function of in the range and so, its minimum value is attained at . Substituting this value above, we get ∎
We now move on to the more general Max -matching problem, where the objective is to compute a maximum weight matching consisting only of edges. Our previous results do not carry over to this problem. While we know from  that the greedy algorithm is half-optimal, one can easily construct examples where this is not truthful. On the other hand, the random matching algorithm is truthful but its approximation factor can be as large as . Our main result in this section is based on the Random Serial Dictatorship algorithm that in some sense combines the best of greedy and random into a single algorithm. Such algorithms have received attention for other matching problems [8, 16]; ours is the first result showing that these algorithms can approximate the optimum matching up to a small constant factor for metric settings. Specifically, while serial dictatorship is usually easy to analyze, our algorithm greatly exploits the randomness to select good edges in expectation.
Definition: Random Serial Dictatorship is the same algorithm as Serial Dictatorship (Algorithm 2), except the agents from are picked uniformly at random.
Random serial dictatorship is a universally truthful mechanism that provides a -approximation for the Max -matching problem.
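Random Serial Dictatorship can be sketched as below, following the definition above: in each round an unmatched agent is drawn uniformly at random and matched to her most preferred unmatched agent. The function name, the `rng` parameter, and the example rankings are hypothetical.

```python
import random

def random_serial_dictatorship(prefs, k, rng=None):
    """Pick a random unmatched dictator each round and give her the favourite partner."""
    rng = rng or random.Random()
    available = set(range(len(prefs)))
    matching = []
    while len(matching) < k and len(available) >= 2:
        # Sorting only makes the draw reproducible for a seeded generator.
        x = rng.choice(sorted(available))
        y = next(j for j in prefs[x] if j in available and j != x)
        matching.append((x, y))
        available -= {x, y}
    return matching

# Hypothetical rankings on four agents, each list ordered from most to least preferred.
prefs = [[1, 2, 3], [0, 2, 3], [0, 3, 1], [2, 0, 1]]
matching = random_serial_dictatorship(prefs, 2, random.Random(0))
print(matching)
```

For every fixed realization of the random draws, the procedure is a deterministic serial dictatorship, which is why the mechanism is universally truthful and not merely truthful in expectation.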
4 Truthful Mechanisms for Other Problems
In this section we present our truthful, ordinal algorithm for Densest k-subgraph, which requires techniques somewhat different from the ones outlined in Section 2. While “conventional” approaches such as Greedy and Serial Dictatorship do lead to good approximations for this problem, they are not truthful, whereas random approaches are truthful but result in poor worst-case approximation factors. We address this with a semi-oblivious algorithm that combines the best of both worlds and has the following property: if agent is included in the solution, then changing her preference ordering does not affect the mechanism’s output.
Algorithm 4 is a universally truthful mechanism that yields a -approximation for the Densest k-subgraph problem.
To see why this is truthful, note that for any particular choice of the anchor agent , the only case in which ’s preference ordering makes a difference is when is definitely not added to the final team. Therefore, by lying cannot influence her utility in the event that she is actually chosen.
Remark on the size of k. Without loss of generality, we assume that so that does not become empty before . When , there is a trivial algorithm that yields a -approximation to the optimum densest subgraph (see Appendix D). Since we are interested in asymptotic performance bounds, we also assume that is even. For the rest of this proof, given any set , node , will denote the total weight of the edges inside , and .
Notation. We begin by defining some notation pertinent to the analysis. Suppose that our algorithm proceeds in rounds such that in each round, exactly two nodes are added to our set , and at most nodes are removed from . Therefore, consists of nodes after rounds. For ease of notation, we will number the rounds instead of ; thus has nodes at the end of round . Further, define to be the random set of selected nodes after round , i.e., for any instantiation of this random set.
Next, let us examine the inner workings of the algorithm. In any round , the algorithm selects a triplet , where is referred to as the anchor node, is a node selected uniformly at random, and is ’s most preferred agent in (let be the random set of available nodes at the beginning of round ). For the rest of this proof, we will use to denote the random triplet of nodes selected in round . Notice that for a given (ordered) triplet , the algorithm adds to with probability one half, and adds to with the same probability.
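The round structure described above can be sketched in code. Algorithm 4 itself is not reproduced in this section and the symbols in the text were lost, so the following is a reconstruction under stated assumptions rather than the authors' exact procedure: in each round an anchor and a second node are drawn uniformly at random from the available pool, the anchor names its most preferred remaining node, and a fair coin decides whether the pair (anchor, random node) or the pair (random node, anchor's favorite) joins the team. In the latter case the anchor is assumed to be discarded for good. All identifiers (`semi_oblivious_team`, `prefs`, `rng`) are hypothetical.

```python
import random


def semi_oblivious_team(prefs, k, rng=None):
    """Sketch of a semi-oblivious team-selection round structure.

    prefs: dict mapping each agent to a complete ranking of the others.
    k:     target team size (assumed even).
    Assumption: with probability 1/2 the anchor and the random node are
    added; otherwise the random node and the anchor's favorite are added
    and the anchor is permanently removed from the pool.
    """
    rng = rng or random.Random()
    available = set(prefs)  # pool of nodes still eligible
    team = set()            # selected nodes
    while len(team) < k and len(available) >= 3:
        # Ordered pair drawn uniformly at random: (anchor, random node).
        anchor, x = rng.sample(sorted(available), 2)
        # Anchor's most preferred node among the remaining pool.
        y = next(v for v in prefs[anchor] if v in available and v != x)
        if rng.random() < 0.5:
            # Anchor is selected; her ranking plays no role in the outcome.
            team.update((anchor, x))
            available -= {anchor, x}
        else:
            # Anchor's ranking is used, but she is excluded from the team.
            team.update((x, y))
            available -= {anchor, x, y}
    return team
```

Under these assumptions, an agent's ranking is consulted only in rounds where she ends up excluded, so for a fixed sequence of random draws, any agent who is in the returned team could not have changed the output by reporting a different ranking. Exactly two nodes are added per round and at most three are removed from the pool, which matches the counting used in the notation above.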
Finally, we use to denote the weight of the optimum solution to the Densest -subgraph problem when , and to be the expected weight of the solution output by our algorithm for the same cardinality, i.e., . Let represent the expected increase in the weight of the solution output by our algorithm from to , i.e., . We will prove by induction on even that . More specifically, we will show that for each , .
Proof by Induction:
(Base Case: ) .
The base case is quite straightforward. Suppose that is the heaviest edge in . Clearly, . Next, let be any two agents, and let denote ’s most preferred agent in . Then, we claim that .
The above claim can be proved in two cases: first, suppose that is indeed ’s favorite node in . Then, as per Lemma D.3, . In the second case, if ’s most preferred node in and do not coincide, the only possibility is that is ’s most preferred node in , and by the same lemma .
To complete the base case, consider any instantiation of the random triplet, . We have that with probability and otherwise. Therefore, for this instantiation . Taking the expectation over every such triplet, we get the desired base claim.
Inductive Claim: To Prove
Recall that denotes the random set of chosen nodes at the end of round . We know from the induction hypothesis that . Consider some specific instantiation of , call it , and for this instantiation, let denote some random triplet selected by the algorithm in round , i.e., we have a specific instantiation of and for our algorithm. As usual, for this triplet, , is the anchor node, is the random node and is ’s most preferred node in .
Suppose that is the increase in the expected weight of the solution returned by our algorithm during round for this specific instantiation of , i.e., . Our proof will proceed as follows: we establish an upper bound for in terms of , and then take the expectation over all possible instantiations to get the actual bound.
Before starting with the proof of the inductive claim, we define some auxiliary notation that will allow us to process as a sequence of additions in each round, so that we can compare the addition to in round to that of our algorithm in the same round. Fix to be any two nodes in , and let . will act as a proxy to in our proofs. Notice that . Finally, in order to avoid messy notation, assume that is ’s (most) preferred node in . If this is not the case (and this can happen with a small probability), then ’s most preferred node in has to be . We deal with this case separately in Section 4 although the proof is quite similar.
We begin with a nice lower bound for . Suppose that .
(Lower Bound for our Algorithm)
Recall that . Simplifying the expression, we get
Consider the first two terms inside the square brackets. We can divide into and and simplify the two parts as follows,
The right most term in the RHS simply comes from the triangle inequality since for any , . Now for the second part, which also follows from the triangle inequality,
To wrap up the proof, we apply Lemma D.4 to to get . An additional can be extracted from . Adding up the various parts completes the proof of the lemma. ∎
Before showing our upper bound on , we present a simple lemma that allows us to relate the weights of any given node to the members of a set in terms of and the weight of to the members of that set. Recall the definitions of .
Suppose that are as defined previously. Then,
(Part I) The proof proceeds as follows: recall that, since is ’s most preferred node in , for any , . This includes . Moreover, for any , as per Lemma D.3.
(Part II) The proof of the second part is almost the same as the first, except that for any , we have that , once again as a consequence of Lemma D.3.
Upper Bound on to Complete the Inductive Claim
Now we express in terms of for the given instantiation.
Consider and remember that by definition . We can divide up in two ways.