Towards a Unified Theory of Sparsification for Matching Problems

11/05/2018 ∙ by Sepehr Assadi, et al. ∙ University of Pennsylvania

In this paper, we present a construction of a `matching sparsifier', that is, a sparse subgraph of the given graph that preserves large matchings approximately and is robust to modifications of the graph. We use this matching sparsifier to obtain several new algorithmic results for the maximum matching problem:

  • An almost (3/2)-approximation one-way communication protocol for the maximum matching problem, significantly simplifying the (3/2)-approximation protocol of Goel, Kapralov, and Khanna (SODA 2012) and extending it from bipartite graphs to general graphs.

  • An almost (3/2)-approximation algorithm for the stochastic matching problem, improving upon and significantly simplifying the previous 1.999-approximation algorithm of Assadi, Khanna, and Li (EC 2017).

  • An almost (3/2)-approximation algorithm for the fault-tolerant matching problem, which, to our knowledge, is the first non-trivial algorithm for this problem.

Our matching sparsifier is obtained by proving new properties of the edge-degree constrained subgraph (EDCS) of Bernstein and Stein (ICALP 2015; SODA 2016)---designed in the context of maintaining matchings in dynamic graphs---that identifies EDCS as an excellent choice for a matching sparsifier. This leads to surprisingly simple and non-technical proofs of the above results in a unified way. Along the way, we also provide a much simpler proof of the fact that an EDCS is guaranteed to contain a large matching, which may be of independent interest.


1 Introduction

A common tool for dealing with massive graphs is sparsification. Roughly speaking, a sparsifier of a graph $G$ is a subgraph of $G$ that (approximately) preserves certain properties of $G$ while having a smaller number of edges. Such sparsifiers have been studied in great detail for various properties: for example, a spanner [6, 29] or a distance preserver [18, 20] preserves pairwise distances, a cut sparsifier [26, 11, 22] preserves cut information, and a spectral sparsifier [32, 8] preserves spectral properties of the graph. An additional property that we often require of a graph sparsifier is robustness: it should continue to be a good sparsifier even as the graph changes. Some sparsifiers are robust by nature (e.g., cut sparsifiers), but others (e.g., spanners) are not, and for this reason there is an extensive literature on designing sparsifiers that can provide additional robustness guarantees.

In this paper, we study the problem of designing robust sparsifiers for the prominent problem of maximum matching. Multiple notions of sparsification for the matching problem have already been identified in the literature. One example is a subgraph of $G$ that approximately preserves the largest matching inside any given subset of vertices of $G$. This notion is also known as a matching cover or a matching skeleton [23, 27] in the literature and is closely related to the communication and streaming complexity of the matching problem. Another example of a sparsifier is a subgraph that preserves the largest matching on random subsets of edges of $G$, a notion closely related to the stochastic matching problem [15, 5]. An example of a robust sparsifier for matching is a fault-tolerant subgraph, namely a subgraph that continues to preserve large matchings in $G$ even after a fraction of its edges is deleted by an adversary. As far as we know, the fault-tolerant matching problem has not previously been studied, but it is a natural model to consider as it has received considerable attention in the context of spanners and distance preservers (see, e.g., [19, 28, 7, 17, 16]).

Our first contribution is a subgraph that we show is a robust matching sparsifier in all of the senses above. Our result is thus the first to unify these notions of sparsification for the maximum matching problem. In addition to unifying, our construction yields improved results for each individual notion of sparsification and the corresponding problems, namely, the one-way communication complexity of matching, stochastic matching, and fault-tolerant matching problems. Interestingly, our unified approach allows us to also provide much simpler proofs than all previously existing work for these problems. The subgraph we use as our sparsifier comes from a pair of papers by Bernstein and Stein on dynamic matching [13, 14]—they refer to this subgraph as an edge-degree constrained subgraph (EDCS for short). The EDCS was also very recently used in [2] to design sublinear algorithms for matching across several different models for massive graphs. Our applications of the EDCS in the current paper, as well as the new properties we prove for the EDCS, are quite different from those in  [13, 14, 2]. Our first contribution thus takes an existing subgraph, and then provides the first proofs that it satisfies the three notions of sparsification described above.

Our second contribution is a much simpler (and even slightly improved) proof of the main property of an EDCS established in previous work [13, 14], namely that an EDCS contains a large matching of the original graph. Our new proof significantly simplifies the analysis of [14] and allows for simple and self-contained proofs of the results in this paper.

Definition of the EDCS.

Before stating our results, we give a definition of the EDCS from [13, 14], as this is the subgraph we use for all of our results (see Section 2 for more details).

Definition 1 ([13]).

For any graph $G(V, E)$ and integers $\beta \geq \beta^- \geq 1$, an edge-degree constrained subgraph $\mathrm{EDCS}(G, \beta, \beta^-)$ is a subgraph $H := (V, E_H)$ of $G$ with the following two properties:

  1. (P1) For any edge $(u,v) \in E_H$: $\deg_H(u) + \deg_H(v) \leq \beta$.

  2. (P2) For any edge $(u,v) \in E \setminus E_H$: $\deg_H(u) + \deg_H(v) \geq \beta^-$.

It is not hard to show that an EDCS of a graph always exists for any parameters $\beta > \beta^-$ and that it is sparse, i.e., only has $O(n \cdot \beta)$ edges. A key property of the EDCS proven previously [13, 14] (and simplified in our paper) is that for any reasonable setting of the parameters (e.g., $\beta^-$ being sufficiently close to $\beta$), any EDCS of $G$ contains an (almost) $3/2$-approximate matching of $G$.
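To make the existence claim concrete, the following is a minimal sketch of the standard local-fixing procedure (in Python; the function name and graph representation are ours, not from the paper): repeatedly remove an edge of $H$ that violates Property (P1) and add an edge of $G \setminus H$ that violates Property (P2). For $\beta > \beta^-$, a potential-function argument (see Appendix A.1) shows that this process terminates with a valid EDCS.

def edcs(edges, beta, beta_minus):
    """Sketch: compute an EDCS(G, beta, beta_minus) by local fixing.

    edges: iterable of 2-tuples describing G; assumes beta > beta_minus.
    Returns the edge set of a subgraph H satisfying (P1) and (P2).
    """
    edges = {tuple(sorted(e)) for e in edges}
    H, deg = set(), {}

    def d(v):
        return deg.get(v, 0)

    while True:
        # fix a (P1) violation: an edge of H with edge-degree above beta
        bad = next(((u, v) for (u, v) in H if d(u) + d(v) > beta), None)
        if bad is not None:
            H.discard(bad)
            deg[bad[0]] -= 1
            deg[bad[1]] -= 1
            continue
        # fix a (P2) violation: an edge of G \ H with edge-degree below beta_minus
        bad = next(((u, v) for (u, v) in edges - H
                    if d(u) + d(v) < beta_minus), None)
        if bad is not None:
            H.add(bad)
            deg[bad[0]] = d(bad[0]) + 1
            deg[bad[1]] = d(bad[1]) + 1
            continue
        return H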

1.1 Our Results and Techniques

We now give detailed definitions of the notions of sparsification and the corresponding problems addressed in this paper, as well as our results for each one. Our second contribution—a significantly simpler proof that an EDCS contains an almost $3/2$-approximate matching—is left for Section 3.

One-Way Communication Complexity of Matching.

Consider the following two-player communication problem: Alice is given a graph $G_A$ and Bob holds a graph $G_B$, both on the same set of $n$ vertices. The goal is for Alice to send a single message to Bob such that Bob outputs an approximate maximum matching in $G_A \cup G_B$. What is the minimum length of the message, i.e., the one-way communication complexity, for achieving a certain fixed approximation ratio on all graphs? One can show that the message communicated by Alice to Bob is indeed a matching skeleton, namely a data structure (but not necessarily a subgraph) that allows Bob to find a large matching on any given subset of vertices of Alice's input (see [23] for more details).

This problem was first studied by Goel, Kapralov, and Khanna [23] (see also the subsequent paper of Kapralov [25]), owing to its close connection to one-pass streaming algorithms for matching. Goel et al. [23] designed a protocol that achieves a $(3/2)$-approximation in bipartite graphs using only $O(n)$ communication and proved that any better-than-$(3/2)$-approximation protocol requires $n^{1+\Omega(1/\log\log{n})}$ communication even on bipartite graphs (see, e.g., [23, 4] for further details on this lower bound). A follow-up work by Lee and Singla [27] further generalized the algorithm of [23] to general graphs, albeit with a slightly worse approximation ratio than the $3/2$ of [23].

We extend the result of [23] to general graphs with almost no loss in the approximation ratio.


Result 1.

For any constant $\varepsilon > 0$, the protocol in which Alice computes an EDCS of her graph with suitable parameters $\beta = O_{\varepsilon}(1)$ and $\beta^- = (1 - O(\varepsilon)) \cdot \beta$ and sends it to Bob is a $\left(\frac{3}{2} + \varepsilon\right)$-approximation one-way communication protocol for the maximum matching problem that uses $O(n \log{n})$ bits of communication.

We remark that both the previous algorithm of [23] and its extension in [27] are quite involved and rely on a fairly complicated graph decomposition as well as an intricate primal-dual analysis. As such, we believe that the main contribution of Result 1 is in fact in providing a simple and self-contained proof of this result.

Stochastic Matching.

In the stochastic matching problem, we are given a graph $G(V, E)$ and a probability parameter $p \in (0,1)$. A realization of $G$ is a subgraph $G_p(V, E_p)$ obtained by picking each edge of $G$ independently with probability $p$ to include in $E_p$. The goal in this problem is to find a subgraph $H$ of $G$ with max-degree bounded by a function of $p$ (independent of the number of vertices), such that the expected size of a maximum matching in realizations of $H$ is close to the expected size of a maximum matching in realizations of $G$. It is immediate to see that $H$ in this problem is simply a sparsifier of $G$ that preserves large matchings on random subsets of edges.

This problem was first introduced by Blum et al. [15], primarily to model the kidney exchange setting, and has since been studied extensively in the literature [3, 5, 10, 34]. Early algorithms for this problem in [15, 3] (and the later ones for the weighted variant of the problem [10, 34]) all had approximation ratio at least $2$, naturally raising the question of whether $2$ is the best approximation ratio achievable for this problem. Assadi, Khanna, and Li [5] ruled out this perplexing possibility by obtaining a slightly-better-than-$2$-approximation algorithm for this problem, namely an algorithm with approximation ratio close to $2$ (roughly $1.999$, which improves further for small $p$).

We prove that using an EDCS results in a significantly improved algorithm for this problem.


Result 2.

For any constant $\varepsilon > 0$, an EDCS of $G$ with $\beta = O_{\varepsilon}\!\left(\frac{\log{(1/p)}}{p}\right)$ and $\beta^- = (1 - O(\varepsilon)) \cdot \beta$ achieves a $\left(\frac{3}{2} + \varepsilon\right)$-approximation for the stochastic matching problem using a subgraph of maximum degree $O_{\varepsilon}\!\left(\frac{\log{(1/p)}}{p}\right)$.

We remark that our bound on the maximum degree in Result 2 is optimal (up to an $O(\log{(1/p)})$ factor) for any constant-factor approximation algorithm (see [5]). In addition to significantly improving upon the previous best algorithm of [5], our Result 2 is much simpler than that of [5], in terms of both the algorithm and (especially) the analysis.

Remark. Independently and concurrently, Behnezhad et al. [9] also presented an algorithm for stochastic matching using a subgraph whose max-degree depends only on $p$, with an approximation ratio slightly worse than the almost $(3/2)$ of Result 2. They also provided an algorithm with approximation ratio strictly better than half (i.e., better than a $2$-approximation) for weighted stochastic matching (our result does not work for weighted graphs). In terms of techniques, our paper and [9] are entirely disjoint.

Fault-Tolerant Matching.

Let $k \geq 1$ be an integer, $G(V, E)$ be a graph, and $H$ be any subgraph of $G$. We say that $H$ is an $\alpha$-approximation $k$-tolerant subgraph of $G$ iff for any subset $F \subseteq E$ of at most $k$ edges, the maximum matching in $H \setminus F$ is an $\alpha$-approximation to the maximum matching in $G \setminus F$ – that is, $H$ is a robust sparsifier of $G$. This definition is a natural analogue of other fault-tolerant subgraphs, such as fault-tolerant spanners and fault-tolerant distance preservers (see, e.g., [19, 28, 7, 17, 16]), for the maximum matching problem. Despite being such fundamental objects, quite surprisingly fault-tolerant subgraphs have not previously been studied for the matching problem.
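As a purely illustrative way to unpack the quantifiers in this definition, the following brute-force check (in Python, using networkx for maximum matchings; all names are ours) verifies on a small graph whether a candidate subgraph $H$ is an $\alpha$-approximation $k$-tolerant subgraph of $G$. It is exponential in $k$ and only meant for toy examples.

from itertools import combinations
import networkx as nx

def matching_size(edge_set):
    g = nx.Graph()
    g.add_edges_from(edge_set)
    return len(nx.max_weight_matching(g, maxcardinality=True))

def is_k_tolerant(G_edges, H_edges, k, alpha):
    """Check: for every fault set F of at most k edges of G,
    mu(G - F) <= alpha * mu(H - F)."""
    G_edges = {tuple(sorted(e)) for e in G_edges}
    H_edges = {tuple(sorted(e)) for e in H_edges}
    for r in range(k + 1):
        for F in combinations(G_edges, r):
            F = set(F)
            if matching_size(G_edges - F) > alpha * matching_size(H_edges - F):
                return False
    return True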

We complete our discussion of applications of the EDCS as a robust sparsifier by showing that it yields an optimal-size fault-tolerant subgraph for the matching problem.


Result 3.

For any constant $\varepsilon > 0$ and any integer $k \geq 1$, there exists a $\left(\frac{3}{2} + \varepsilon\right)$-approximation $k$-tolerant subgraph of any given graph $G$ with $O_{\varepsilon}(k + n)$ edges in total.

The number of edges used in our fault-tolerant subgraph in Result 3 is clearly optimal (up to constant factors). In Appendix A.2, we show that by modifying the lower bound of [23] in the communication model, we can also prove that the approximation ratio of $3/2$ is optimal for any $k$-tolerant subgraph with a comparable number of edges, hence proving that Result 3 is optimal in a strong sense. We also show that several natural strategies for this problem cannot achieve a better-than-$2$-approximation, hence motivating our more sophisticated approach toward this problem (see Appendix A.3).

The qualitative message of our work is clear: An EDCS is a robust matching sparsifier under all three notions of sparsification described earlier, which leads to simpler and improved algorithms for a wide range of problems involving sparsification for matching problems in a unified way.

Overall Proof Strategy

Recall that our algorithm in all of the results above is simply to compute an EDCS $H$ of the input graph $G$ (or of $G_A$ in the communication problem). The analysis then depends on the specific notion of sparsification at hand, but the same high-level idea applies to all three cases. In each case, we have an original graph $G$, and then a modified graph $\widetilde{G}$ produced by changes to $G$: $\widetilde{G}$ is $G_A \cup G_B$ in the communication model, the realized subgraph $G_p$ in the stochastic matching problem, and the graph $G \setminus F$ after adversarially removing the edge set $F$ in the fault-tolerant matching problem. Let $H$ be the EDCS that our algorithm computes in $G$, and let $\widetilde{H}$ be the graph that results from $H$ due to the modifications made to $G$. If we could show that $\widetilde{H}$ is an EDCS of $\widetilde{G}$ then the proof would be complete, since we know that an EDCS is guaranteed to contain an almost $(3/2)$-approximate matching. Unfortunately, in all three problems that we study it might not be the case that $\widetilde{H}$ is an EDCS of $\widetilde{G}$. Instead, in each case we are able to exhibit subgraphs $\widehat{H}$ of $\widetilde{H}$ and $\widehat{G}$ of $\widetilde{G}$ such that $\widehat{H}$ is an EDCS of $\widehat{G}$, and the sizes of the maximum matchings of $\widehat{G}$ and $\widetilde{G}$ differ by at most a $(1 + O(\varepsilon))$ factor. This guarantees an approximation ratio of almost $3/2$ (precisely what we achieve in all three results above), since the EDCS $\widehat{H}$ preserves the maximum matching in $\widehat{G}$ to within an almost $(3/2)$-approximation and $\widehat{H}$ is a subgraph of $\widetilde{H}$.
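Written as a single chain of inequalities (in the notation above, with $O(\varepsilon)$ absorbing the small losses), the strategy gives

$\mu(\widetilde{H}) \;\geq\; \mu(\widehat{H}) \;\geq\; \left(\frac{2}{3} - O(\varepsilon)\right) \cdot \mu(\widehat{G}) \;\geq\; \left(\frac{2}{3} - O(\varepsilon)\right) \cdot (1 - O(\varepsilon)) \cdot \mu(\widetilde{G}),$

so the maximum matching of the modified sparsifier $\widetilde{H}$ is an almost $(3/2)$-approximation of $\mu(\widetilde{G})$.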

Organization.

The rest of the paper is organized as follows. Section 2 includes notation, simple preliminaries, and existing work on the EDCS. In Section 3, we present a significantly simpler proof of the fact that an EDCS contains an almost $(3/2)$-approximate matching (originally proved in [14]). Sections 4, 5, and 6 prove the sparsification properties of the EDCS in, respectively, the one-way communication complexity of matching (Result 1), the stochastic matching problem (Result 2), and the fault-tolerant matching problem (Result 3). These three sections are designed to be self-contained (besides assuming the background in Section 2) to allow the reader to directly consider the part of most interest. The appendix contains some secondary observations.

2 Preliminaries and Notation

Notation.

For any integer $t \geq 1$, $[t] := \{1, \ldots, t\}$. For a graph $G(V, E)$ and a set of vertices $S \subseteq V$, $N_G(S)$ denotes the neighbors of vertices in $S$ in $G$ and $E_G(S)$ denotes the set of edges incident on $S$. Similarly, for a set of edges $E' \subseteq E$, $V(E')$ denotes the set of vertices incident on these edges. For any vertex $v \in V$, we use $\deg_G(v)$ to denote the degree of $v$ in $G$ (we may drop the subscript in these definitions when the graph is clear from the context). We use $\mu(G)$ to denote the size of the maximum matching in the graph $G$.

Throughout the paper, we use the following two standard variants of the Chernoff bound.

Proposition 2.1 (Chernoff Bound).

Suppose $X_1, \ldots, X_n$ are independent random variables that take values in $[0,1]$. Let $X := \sum_{i=1}^{n} X_i$ and assume $\mathbb{E}[X] \leq \mu$. For any $\delta \in (0,1)$,

$\Pr\Big(\big|X - \mathbb{E}[X]\big| \geq \delta \cdot \mu\Big) \;\leq\; 2\exp\left(-\frac{\delta^{2} \cdot \mu}{3}\right),$

and for any integer $t \geq 2e \cdot \mu$, $\Pr\left(X \geq t\right) \leq 2^{-t}$.

We also need the following basic variant of Lovasz Local Lemma (LLL).

Proposition 2.2 (Lovasz Local Lemma; cf. [21, 1]).

Let $p \in (0,1)$ and $d \geq 1$. Suppose $\mathcal{E}_1, \ldots, \mathcal{E}_n$ are events such that $\Pr(\mathcal{E}_i) \leq p$ for all $i \in [n]$, and each $\mathcal{E}_i$ is mutually independent of all but (at most) $d$ other events $\mathcal{E}_j$. If $e \cdot p \cdot (d+1) \leq 1$, then $\Pr\big(\bigwedge_{i=1}^{n} \overline{\mathcal{E}_i}\big) > 0$.

Hall’s Theorem.

We use the following standard extension of Hall's marriage theorem for characterizing the maximum matching size in bipartite graphs.

Proposition 2.3 (Extended Hall’s marriage theorem; cf. [24]).

Let $G(L, R, E)$ be any bipartite graph with $|L| = |R| = n$. Then, $\mu(G) = n - \max_{A} \big(|A| - |N(A)|\big)$, where $A$ ranges over all subsets of $L$ or of $R$. We refer to a maximizing set $A$ as a witness set.

Proposition 2.3 follows from the Tutte–Berge formula for matching size in general graphs [33, 12], or from a simple extension of the proof of Hall's marriage theorem itself.
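As a toy illustration (ours, not from the paper): in the bipartite graph with $L = \{\ell_1, \ell_2, \ell_3\}$, $R = \{r_1, r_2, r_3\}$, and edges $(\ell_1, r_1), (\ell_2, r_1), (\ell_3, r_1), (\ell_3, r_2)$, the set $A = \{\ell_1, \ell_2, \ell_3\}$ has $N(A) = \{r_1, r_2\}$ and maximizes the deficiency, so $\mu(G) = 3 - (3 - 2) = 2$, achieved for example by the matching $\{(\ell_1, r_1), (\ell_3, r_2)\}$.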

Previously Known Properties of the EDCS

Recall the definition of an EDCS in Definition 1. It is not hard to show that an EDCS always exists as long as $\beta > \beta^-$ (see, e.g., [2]). For completeness, we repeat the proof in Appendix A.1.

Proposition 2.4 (cf. [13, 14, 2]).

Any graph $G$ contains an $\mathrm{EDCS}(G, \beta, \beta^-)$ for any parameters $\beta > \beta^-$, and such a subgraph can be found in polynomial time.

The key property of an EDCS, originally proved in [13, 14], is that it contains an almost $3/2$-approximate matching.

Lemma 2.5 ([13, 14]).

Let $G(V, E)$ be any graph and $\varepsilon \in (0,1)$ be a parameter. For suitable parameters $\lambda$, $\beta$, and $\beta^-$ (as functions of $\varepsilon$, with $\beta^- = (1-\lambda) \cdot \beta$), in any subgraph $H := \mathrm{EDCS}(G, \beta, \beta^-)$, $\mu(G) \leq \left(\frac{3}{2} + \varepsilon\right) \cdot \mu(H)$.

Another particularly useful (technical) property of an EDCS is that it "balances" the degrees of vertices and their neighbors in the EDCS; this property is implicit in [13], but we explicitly state and prove it here as it highlights a main distinction between the EDCS and more standard (and less robust) subgraphs in this context, such as $b$-matchings.

Proposition 2.6.

Let $H := \mathrm{EDCS}(G, \beta, \beta^-)$ and $S \subseteq V$ be any subset of vertices. If the average degree of $S$ in $H$ is $d$, then the average degree of the vertices of $N_H(S)$, counting only edges incident on $S$, is at most $\beta - d$.

Proof.

Let $H'$ be the subgraph of $H$ containing only the edges incident on $S$, and let $m := \sum_{u \in S} \deg_{H'}(u) = d \cdot |S|$ denote the number of these edges; note that $\deg_{H'}(u) = \deg_H(u)$ for all $u \in S$. We are interested in upper bounding the quantity $\frac{1}{|N_H(S)|}\sum_{w \in N_H(S)} \deg_{H'}(w) = \frac{m}{|N_H(S)|}$. Firstly, by Property (P1) of EDCS $H$, we have that

$\sum_{(u,w) \in H'} \big(\deg_H(u) + \deg_H(w)\big) \;\leq\; \beta \cdot m.$

We can lower bound the LHS of this equation as:

$\sum_{(u,w) \in H'} \big(\deg_H(u) + \deg_H(w)\big) \;\geq\; \sum_{u \in S} \deg_{H'}(u)^2 + \sum_{w \in N_H(S)} \deg_{H'}(w)^2 \;\geq\; d \cdot m + \frac{m^2}{|N_H(S)|}$

(as $\deg_{H}(\cdot) \geq \deg_{H'}(\cdot)$, and each sum of squares is minimized when the summands are equal). By plugging this bound into the LHS above, we obtain $\frac{m}{|N_H(S)|} \leq \beta - d$, finalizing the proof.

3 A Simpler Proof of the Key Property of an EDCS

In this section we provide a much simpler proof of the key property that an EDCS contains an almost $3/2$-approximate matching. This lemma was previously used in [13, 14, 2]. Our proof is self-contained to this section, and for general graphs, our new proof even improves the dependence of the parameter $\beta$ on $\varepsilon$, thus allowing for an even sparser EDCS.

The proof contains two steps. We first give a simple and streamlined proof that an EDCS contains an almost $3/2$-approximate matching in bipartite graphs. Our proof in this part is similar to [13], but instead of modeling matchings as flows and using cut-flow duality, we directly work with matchings by using Hall's theorem. The main part of the proof, however, is to extend this result to general graphs. For this, we give a simple reduction that extends the result on bipartite graphs to general graphs by taking advantage of the "robust" nature of the EDCS. This allows us to bypass the complicated arguments in [14] specific to non-bipartite graphs and to obtain the result directly from the one for bipartite graphs (the paper [14] explicitly acknowledges the complexity of the proof and asks for a more "natural" approach).

A Slightly Simpler Proof for Bipartite Graphs

Our new proof should be compared to Lemma 2 in Section 4.1 of the arXiv version of [13].

Lemma 3.1.

Let $G(L, R, E)$ be any bipartite graph and $\varepsilon \in (0,1)$ be a parameter. For suitable parameters $\lambda$, $\beta$, and $\beta^-$ (as functions of $\varepsilon$, with $\beta^- = (1-\lambda) \cdot \beta$), in any subgraph $H := \mathrm{EDCS}(G, \beta, \beta^-)$, $\mu(G) \leq \left(\frac{3}{2} + \varepsilon\right) \cdot \mu(H)$.

Proof.

Fix any $H := \mathrm{EDCS}(G, \beta, \beta^-)$ and let $A$ be one of its witness sets in the extended Hall's marriage theorem of Proposition 2.3, applied to $H$. Without loss of generality, let us assume $A$ is a subset of $L$. Define $\bar{A} := N_H(A)$ and $B := R \setminus \bar{A}$ (see Figure 1). By Proposition 2.3,

$\mu(H) \;=\; n - |A| + |\bar{A}|.$     (1)

On the other hand, since $G$ has a matching of size $\mu(G)$, there must be a matching of size $\tilde{m} \geq \mu(G) - \mu(H)$ between $A$ and $B$ in $G$: a maximum matching of $G$ matches at least $\mu(G) - (n - |A|)$ vertices of $A$, at most $|\bar{A}|$ of which can be matched into $\bar{A}$, so by Eq (1) at least $\mu(G) - (n - |A|) - |\bar{A}| = \mu(G) - \mu(H)$ of them are matched into $B$. Let $\tilde{A} \subseteq A$ and $\tilde{B} \subseteq B$ be the endpoints of this matching (see Figure 1). As there are no edges between $A$ and $B$ in $H$, the edges of this matching are all missing from $H$, and so by Property (P2) of EDCS $H$, for every edge $(u,v)$ of this matching,

$\deg_H(u) + \deg_H(v) \;\geq\; \beta^-.$     (2)

Consequently, as this matching has $\tilde{m}$ edges on $2\tilde{m}$ vertices, the average degree of $\tilde{A} \cup \tilde{B}$ in $H$ is at least $\beta^-/2$. As such, by Proposition 2.6, the average degree of $N_H(\tilde{A} \cup \tilde{B})$, counting only edges incident on $\tilde{A} \cup \tilde{B}$, is at most $\beta - \beta^-/2$. Finally, note that as there are no edges between $A$ and $B$ in $H$, we have $N_H(\tilde{A}) \subseteq \bar{A}$ and $N_H(\tilde{B}) \subseteq L \setminus A$, and hence by Eq (1), $|N_H(\tilde{A} \cup \tilde{B})| \leq |\bar{A}| + (n - |A|) = \mu(H)$. By double counting the number of edges of $H$ incident on $\tilde{A} \cup \tilde{B}$, i.e., $|E_H(\tilde{A} \cup \tilde{B})|$: counting from the side of $\tilde{A} \cup \tilde{B}$, this number is at least $2\tilde{m} \cdot \beta^-/2 = \tilde{m} \cdot \beta^-$ by Eq (2), while counting from the side of $N_H(\tilde{A} \cup \tilde{B})$, it is at most $|N_H(\tilde{A} \cup \tilde{B})| \cdot (\beta - \beta^-/2) \leq \mu(H) \cdot (\beta - \beta^-/2)$.

This implies that,

$\big(\mu(G) - \mu(H)\big) \cdot \beta^- \;\leq\; \tilde{m} \cdot \beta^- \;\leq\; \mu(H) \cdot \left(\beta - \frac{\beta^-}{2}\right).$

Reorganizing the terms above (and recalling that $\beta^- = (1-\lambda) \cdot \beta$ for $\lambda$ sufficiently small compared to $\varepsilon$) finalizes the proof.

(a) and form a Hall’s theorem witness set in the EDCS and .

(b) There is a matching of size between and (i.e., the set ) in .
Figure 1: The partitioning of vertices used in the proof of Lemma 3.1.

A Much Simpler Proof for Non-bipartite Graphs

Our new proof in this part should be compared to Lemma 5.1 on page 699 of [14]: see Appendix B of their paper for the full proof, as well as Section 4 for an additional auxiliary claim needed.

Lemma 3.2.

Let $G(V, E)$ be any graph and $\varepsilon \in (0,1)$ be a parameter. For suitable parameters $\lambda$, $\beta$, and $\beta^-$ (as functions of $\varepsilon$, with $\beta^- = (1-\lambda) \cdot \beta$), in any subgraph $H := \mathrm{EDCS}(G, \beta, \beta^-)$, $\mu(G) \leq \left(\frac{3}{2} + \varepsilon\right) \cdot \mu(H)$.

Proof.

The proof is based on the probabilistic method and the Lovasz Local Lemma. Let $M^\star$ be a maximum matching of size $\mu(G)$ in $G$. Consider the following randomly chosen bipartite subgraph $\widetilde{G}(L, R, \widetilde{E})$ of $G$ with respect to $M^\star$, where $V = L \cup R$:

  • For any edge $(u,v) \in M^\star$, with probability $1/2$, $u$ belongs to $L$ and $v$ belongs to $R$, and with probability $1/2$, the opposite (the choices between different edges of $M^\star$ are independent).

  • For any vertex not matched by $M^\star$, we assign it to $L$ or $R$ uniformly at random (again, the choices are independent across vertices).

  • The set of edges $\widetilde{E}$ consists of all edges in $G$ with one endpoint in $L$ and the other one in $R$.

Define $\widetilde{H} := H \cap \widetilde{G}$. We argue that as $H$ is an EDCS for $G$, $\widetilde{H}$ also remains an EDCS for $\widetilde{G}$ (with appropriately adjusted parameters) with non-zero probability. Formally,

Claim 3.3.

$\widetilde{H}$ is an $\mathrm{EDCS}(\widetilde{G}, \widetilde{\beta}, \widetilde{\beta}^-)$ for appropriately chosen parameters $\widetilde{\beta}$ and $\widetilde{\beta}^-$ (each roughly half of the corresponding original parameter), with probability strictly larger than zero (over the randomness of $\widetilde{G}$).

Before we prove Claim 3.3, we argue why it implies Lemma 3.2. Let $\widetilde{G}$ be chosen such that $\widetilde{H}$ is an EDCS of $\widetilde{G}$ for the parameters in Claim 3.3 (by Claim 3.3, such a choice of $\widetilde{G}$ always exists). By construction of $\widetilde{G}$, $M^\star \subseteq \widetilde{G}$ and hence $\mu(\widetilde{G}) = \mu(G)$. On the other hand, $\widetilde{G}$ is now a bipartite graph and $\widetilde{H}$ is its EDCS with appropriate parameters. We can hence apply Lemma 3.1 and obtain that $\mu(\widetilde{G}) \leq \left(\frac{3}{2} + \varepsilon\right) \cdot \mu(\widetilde{H})$. As $\widetilde{H} \subseteq H$, $\mu(\widetilde{H}) \leq \mu(H)$, and hence $\mu(G) \leq \left(\frac{3}{2} + \varepsilon\right) \cdot \mu(H)$, proving the assertion in the lemma statement. It thus only remains to prove Claim 3.3.
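In symbols, the chain of inequalities used here is

$\mu(G) \;=\; \mu(\widetilde{G}) \;\leq\; \left(\frac{3}{2} + \varepsilon\right) \cdot \mu(\widetilde{H}) \;\leq\; \left(\frac{3}{2} + \varepsilon\right) \cdot \mu(H),$

where the first equality uses $M^\star \subseteq \widetilde{G}$, the middle inequality is Lemma 3.1 applied to the bipartite graph $\widetilde{G}$ and its EDCS $\widetilde{H}$, and the last step uses $\widetilde{H} \subseteq H$.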

Proof of Claim 3.3.

Fix any vertex $v$, and let $N_H(v) = \{u_1, \ldots, u_k\}$, with $k := \deg_H(v)$, be the neighbors of $v$ in $H$. Let us assume $v$ is chosen to be in $L$ in $\widetilde{G}$ (the other case is symmetric). Hence, the degree of $v$ in $\widetilde{H}$ is exactly equal to the number of vertices in $N_H(v)$ that are chosen to be in $R$. As such, by construction of $\widetilde{G}$, $\mathbb{E}\big[\deg_{\widetilde{H}}(v)\big]$ is within $1$ of $\deg_H(v)/2$ (the correction accounts for the vertex matched to $v$ by $M^\star$, if any, which is placed in $R$ deterministically). Moreover, if two vertices in $N_H(v)$ are matched to each other by $M^\star$, then exactly one of them appears as a neighbor of $v$ in $\widetilde{H}$, and otherwise the choices are independent. Hence, by the Chernoff bound (Proposition 2.1),

$\Pr\Big(\big|\deg_{\widetilde{H}}(v) - \mathbb{E}\big[\deg_{\widetilde{H}}(v)\big]\big| \geq \lambda \cdot \beta\Big) \;\leq\; 2\exp\left(-\frac{\lambda^{2} \cdot \beta}{3}\right)$ (as $\deg_H(v) \leq \beta$ by Property (P1) and hence we can take $\mu = \beta$ in Proposition 2.1).

Define $\mathcal{E}_v$ as the event that $\big|\deg_{\widetilde{H}}(v) - \mathbb{E}[\deg_{\widetilde{H}}(v)]\big| \geq \lambda \cdot \beta$. Note that $\mathcal{E}_v$ depends only on the random choices made for the vertices in $N_H(v)$, and hence $\mathcal{E}_v$ can depend on at most $O(\beta^2)$ other events $\mathcal{E}_u$, namely those for vertices $u$ that share a relevant vertex with $v$ (recall that $\deg_H(w) \leq \beta$ for all $w$ by Property (P1) of EDCS $H$). As such, since $\beta$ is sufficiently large as a function of $\lambda$, we can apply the Lovasz Local Lemma (Proposition 2.2) to argue that with probability strictly more than zero, none of the events $\mathcal{E}_v$ happens. In the following, we condition on this event and argue that in this case, $\widetilde{H}$ is an EDCS of $\widetilde{G}$ with appropriate parameters. To do this, we only need to prove that both Property (P1) and Property (P2) hold for the EDCS $\widetilde{H}$ (with the choice of $\widetilde{\beta}$ and $\widetilde{\beta}^-$).

We first prove Property (P1) of EDCS $\widetilde{H}$. Let $(u,v)$ be any edge in $\widetilde{H}$. By the complements of the events $\mathcal{E}_u$ and $\mathcal{E}_v$,

$\deg_{\widetilde{H}}(u) + \deg_{\widetilde{H}}(v) \;\leq\; \frac{\deg_H(u) + \deg_H(v)}{2} + O(\lambda \cdot \beta) \;\leq\; \frac{\beta}{2} + O(\lambda \cdot \beta) \;=\; \widetilde{\beta},$

where the second inequality is by Property (P1) of EDCS $H$, as $(u,v)$ belongs to $H$ as well. We now prove Property (P2) of EDCS $\widetilde{H}$. Let $(u,v)$ be any edge in $\widetilde{G} \setminus \widetilde{H}$. Again, by the complements of $\mathcal{E}_u$ and $\mathcal{E}_v$,

$\deg_{\widetilde{H}}(u) + \deg_{\widetilde{H}}(v) \;\geq\; \frac{\deg_H(u) + \deg_H(v)}{2} - O(\lambda \cdot \beta) \;\geq\; \frac{\beta^-}{2} - O(\lambda \cdot \beta) \;=\; \widetilde{\beta}^-,$

where the second inequality is by Property (P2) of EDCS $H$, as $(u,v) \in G \setminus H$.

 

Lemma 3.2 now follows immediately from Claim 3.3 as argued above.
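To make the random bipartition in the proof of Lemma 3.2 concrete, here is a minimal sketch (in Python; the names are ours) that splits the vertices so that every edge of $M^\star$ crosses the bipartition and keeps only the crossing edges of $G$:

import random

def random_bipartition(vertices, edges, M_star):
    """Random bipartite subgraph of G with respect to a matching M_star."""
    side, matched = {}, set()
    for (u, v) in M_star:                  # matched pairs are split across L and R
        matched.update((u, v))
        if random.random() < 0.5:
            side[u], side[v] = 'L', 'R'
        else:
            side[u], side[v] = 'R', 'L'
    for w in vertices:                     # unmatched vertices pick a side uniformly
        if w not in matched:
            side[w] = random.choice('LR')
    # keep only the edges of G that cross the bipartition
    crossing = [e for e in edges if side[e[0]] != side[e[1]]]
    return side, crossing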

 

4 One-Way Communication Complexity of Matching

In the one-way communication model, Alice and Bob are given graphs $G_A$ and $G_B$, respectively, and the goal is for Alice to send a small message to Bob such that Bob can output a large approximate matching in $G_A \cup G_B$. In this section, we show that if Alice communicates an appropriate EDCS of $G_A$, then Bob is able to output an almost $3/2$-approximate matching.

Theorem 1 (Formalizing Result 1).

There exists a deterministic poly-time one-way communication protocol that, given any $\varepsilon \in (0,1)$, computes a $\left(\frac{3}{2} + \varepsilon\right)$-approximation to the maximum matching using $O_{\varepsilon}(n \log{n})$ bits of communication from Alice to Bob.

Theorem 1 is based on the following protocol:


A one-way communication protocol for maximum matching.

  1. Alice computes $H := \mathrm{EDCS}(G_A, \beta, \beta^-)$ for suitable parameters $\beta = O_{\varepsilon}(1)$ and $\beta^- = (1 - O(\varepsilon)) \cdot \beta$, and sends it to Bob.

  2. Bob computes a maximum matching in $H \cup G_B$ and outputs it as the solution.

By Proposition 2.4, the EDCS computed by Alice always exists and can be found in polynomial time. Moreover, by Property (P1) of EDCS $H$, the total number of edges (and hence the message size in bits) sent by Alice is $O(n \cdot \beta \cdot \log{n}) = O_{\varepsilon}(n \log{n})$. We now prove the correctness of the protocol, which concludes the proof of Theorem 1.
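An end-to-end sketch of the protocol (in Python, using networkx for Bob's maximum matching; the parameter values are placeholders rather than the exact choices behind Theorem 1, and `edcs` refers to the illustrative local-fixing routine sketched after Definition 1):

import networkx as nx

def one_way_matching_protocol(alice_edges, bob_edges, beta=32, beta_minus=28):
    # Alice's single message: an EDCS of her graph, O(n * beta) edges in total.
    message = edcs(alice_edges, beta, beta_minus)
    # Bob combines the message with his own edges and outputs a maximum matching.
    combined = nx.Graph()
    combined.add_edges_from(message)
    combined.add_edges_from(bob_edges)
    return nx.max_weight_matching(combined, maxcardinality=True)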

Lemma 4.1.

$\mu(G_A \cup G_B) \leq \left(\frac{3}{2} + \varepsilon\right) \cdot \mu(H \cup G_B)$.

Proof.

Let $M^\star$ be a maximum matching in $G_A \cup G_B$, and let $M^\star_A$ and $M^\star_B$ be its edges in $G_A$ and $G_B$, respectively. Let $\widehat{G} := G_A \cup M^\star_B$ and note that $\mu(\widehat{G}) = \mu(G_A \cup G_B)$, simply because $M^\star$ belongs to $\widehat{G}$. Define the following subgraph $\widehat{H}$ of $\widehat{G}$ (and hence of $H \cup G_B$): $\widehat{H}$ contains all edges in $H$ and any edge $(u,v) \in M^\star_B$ such that $\deg_H(u) + \deg_H(v) < \beta^-$. In the following, we prove that $\mu(\widehat{G}) \leq \left(\frac{3}{2} + \varepsilon\right) \cdot \mu(\widehat{H})$, which finalizes the proof as $\widehat{H} \subseteq H \cup G_B$.

We show that $\widehat{H}$ is an EDCS of $\widehat{G}$ (with parameters $\beta + 2$ and $\beta^-$) and apply Lemma 3.2 to argue that $\widehat{H}$ contains an (almost) $3/2$-approximate matching of $\widehat{G}$. We prove the EDCS properties of $\widehat{H}$ using the fact that $\deg_{\widehat{H}}(v) \leq \deg_H(v) + 1$ for all $v$, as $\widehat{H}$ is obtained by adding a matching (a subset of $M^\star_B$) to $H$.

  • Property (P1) of EDCS $\widehat{H}$: For an edge $(u,v) \in \widehat{H}$,

    if $(u,v) \in H$ then: $\deg_{\widehat{H}}(u) + \deg_{\widehat{H}}(v) \leq \deg_H(u) + \deg_H(v) + 2 \leq \beta + 2$ (by Property (P1) of EDCS $H$ of $G_A$);
    if $(u,v) \in M^\star_B \setminus H$ then: $\deg_{\widehat{H}}(u) + \deg_{\widehat{H}}(v) \leq \deg_H(u) + \deg_H(v) + 2 < \beta^- + 2 \leq \beta + 2$ (as an edge of $M^\star_B$ is inserted into $\widehat{H}$ iff $\deg_H(u) + \deg_H(v) < \beta^-$).
  • Property (P2) of EDCS $\widehat{H}$: For an edge $(u,v) \in \widehat{G} \setminus \widehat{H}$,

    if $(u,v) \in G_A \setminus H$ then: $\deg_{\widehat{H}}(u) + \deg_{\widehat{H}}(v) \geq \deg_H(u) + \deg_H(v) \geq \beta^-$ (by Property (P2) of EDCS $H$ of $G_A$);
    if $(u,v) \in M^\star_B \setminus \widehat{H}$ then: $\deg_{\widehat{H}}(u) + \deg_{\widehat{H}}(v) \geq \deg_H(u) + \deg_H(v) \geq \beta^-$ (as an edge of $M^\star_B$ is not inserted into $\widehat{H}$ iff $\deg_H(u) + \deg_H(v) \geq \beta^-$).

As such, $\widehat{H}$ is an $\mathrm{EDCS}(\widehat{G}, \beta + 2, \beta^-)$. By Lemma 3.2 and the choice of parameters, we obtain that $\mu(\widehat{G}) \leq \left(\frac{3}{2} + \varepsilon\right) \cdot \mu(\widehat{H})$, finalizing the proof.

5 The Stochastic Matching Problem

Recall that in the stochastic matching problem, the goal is to compute a bounded-degree subgraph $H$ of a given graph $G$, such that $\mathbb{E}[\mu(H_p)]$ is a good approximation of $\mathbb{E}[\mu(G_p)]$, where $G_p$ is a realization of $G$ (i.e., a subgraph where every edge is sampled independently with probability $p$), and $H_p := H \cap G_p$ is the corresponding realization of $H$. In this section, we formalize Result 2 by proving the following theorem.

Theorem 2 (Formalizing Result 2).

There exists a deterministic poly-time algorithm that, given a graph $G(V,E)$ and parameters $p, \varepsilon \in (0,1)$, computes a subgraph $H$ of $G$ with maximum degree $O_{\varepsilon}\!\left(\frac{\log{(1/p)}}{p}\right)$ such that the ratio of the expected size of a maximum matching in realizations of $G$ to that in realizations of $H$ is at most $\frac{3}{2} + \varepsilon$, i.e., $\mathbb{E}\big[\mu(G_p)\big] \leq \left(\frac{3}{2} + \varepsilon\right) \cdot \mathbb{E}\big[\mu(H_p)\big]$.

We note that while in Theorem 2 we state the bound in expectation, the same result also holds with high probability as long as the expected maximum matching size of a realization is $\omega(1)$ (i.e., just barely more than a constant), by concentration of the maximum matching size in edge-sampled subgraphs (see, e.g., [2], Lemma 3.1). The algorithm in Theorem 2 simply computes an EDCS of the input graph as follows:


An algorithm for the stochastic matching problem.

Output the subgraph $H := \mathrm{EDCS}(G, \beta, \beta^-)$ for $\beta := c \cdot \frac{\log{(1/p)}}{p}$ and $\beta^- := (1 - O(\varepsilon)) \cdot \beta$, for a large enough constant $c$ (depending on $\varepsilon$).

By Proposition 2.4, the EDCS in the above algorithm always exists and can be found in polynomial time. Moreover, by Property (P1) of EDCS $H$, the total number of edges in this subgraph is $O(n \cdot \beta)$ and its maximum degree is at most $\beta = O_{\varepsilon}\!\left(\frac{\log{(1/p)}}{p}\right)$. We now prove the bound on the approximation ratio, which concludes the proof of Theorem 2 (by re-parametrizing $\varepsilon$ to be a constant factor smaller).
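The following sketch (in Python; the constant in $\beta$ and the helper names are placeholders, and `edcs` again refers to the illustrative local-fixing routine sketched after Definition 1) shows the algorithm together with a Monte Carlo estimate of the two expected matching sizes compared in Theorem 2:

import math
import random
import networkx as nx

def stochastic_matching_subgraph(edges, p, eps=0.1, c=10):
    """EDCS-based subgraph for the stochastic matching problem (sketch)."""
    beta = max(2, math.ceil(c * math.log(1 / p) / p))   # placeholder constant c
    return edcs(edges, beta, beta_minus=math.floor((1 - eps) * beta))

def expected_matching(edges, p, trials=200):
    """Monte Carlo estimate of E[mu(G_p)] over realizations G_p of the given edges."""
    total = 0
    for _ in range(trials):
        g = nx.Graph()
        g.add_edges_from(e for e in edges if random.random() < p)
        total += len(nx.max_weight_matching(g, maxcardinality=True))
    return total / trials

# With H = stochastic_matching_subgraph(G, p), the ratio
# expected_matching(G, p) / expected_matching(H, p) should be close to at most 3/2.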

Lemma 5.1.

Let $G_p$ denote a realization of $G$ and $H_p := H \cap G_p$ the corresponding realization of $H$; then $\mathbb{E}\big[\mu(G_p)\big] \leq \left(\frac{3}{2} + O(\varepsilon)\right) \cdot \mathbb{E}\big[\mu(H_p)\big]$, where the randomness is taken over the realization of $G$.

Suppose first that $H_p$ were an EDCS of $G_p$; we would be immediately done in this case, as we could apply Lemma 3.2 directly to prove Lemma 5.1. Unfortunately, however, this might not be the case. Instead, we exhibit subgraphs $\widehat{H}$ and $\widehat{G}$ with the following properties:

  1. $\mathbb{E}\big[\mu(\widehat{G})\big] \geq (1 - O(\varepsilon)) \cdot \mathbb{E}\big[\mu(G_p)\big]$, where the expectation is taken over the realization of $G$.

  2. $\widehat{H}$ is an EDCS of $\widehat{G}$, and $\widehat{H}$ is a subgraph of $H_p$.

Showing these properties concludes the proof of Lemma 5.1: for the EDCS $\widehat{H}$ in item (2) above, we have $\widehat{H} \subseteq H_p$, so by Lemma 3.2 we get that $\mu(H_p) \geq \mu(\widehat{H}) \geq \left(\frac{2}{3} - O(\varepsilon)\right) \cdot \mu(\widehat{G})$. Combining this with item (1) then concludes the proof.
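In symbols (with $O(\varepsilon)$ absorbing the losses from both items),

$\mathbb{E}\big[\mu(H_p)\big] \;\geq\; \mathbb{E}\big[\mu(\widehat{H})\big] \;\geq\; \left(\frac{2}{3} - O(\varepsilon)\right) \cdot \mathbb{E}\big[\mu(\widehat{G})\big] \;\geq\; \left(\frac{2}{3} - O(\varepsilon)\right) \cdot \mathbb{E}\big[\mu(G_p)\big],$

which is exactly the guarantee of Lemma 5.1.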

It now remains to exhibit $\widehat{G}$ and $\widehat{H}$ that satisfy the two properties stated above. Note that for any vertex $v$, we have $\mathbb{E}[\deg_{G_p}(v)] = p \cdot \deg_G(v)$ by the definition of a realization (and hence $\mathbb{E}[\deg_{H_p}(v)] = p \cdot \deg_H(v)$). We now want to separate out the vertices whose realized degrees deviate significantly from these expectations.

Definition 2.

Let $V^{G}_{\mathrm{bad}}$ contain all vertices $v$ for which $\deg_{G_p}(v)$ deviates significantly from its expectation $p \cdot \deg_G(v)$. Similarly, let $V^{H}_{\mathrm{bad}}$ contain all vertices $v$ such that $\deg_{H_p}(v)$ deviates significantly from its expectation $p \cdot \deg_H(v)$, OR there exists an edge $(u,v) \in H$ such that $u \in V^{G}_{\mathrm{bad}}$, i.e., if $v$ is a neighbor (in $H$) of $V^{G}_{\mathrm{bad}}$.

Claim 5.2.

Both $\mathbb{E}\big[|V^{G}_{\mathrm{bad}}|\big]$ and $\mathbb{E}\big[|V^{H}_{\mathrm{bad}}|\big]$ are at most $O(\varepsilon) \cdot \mathbb{E}\big[\mu(G_p)\big]$, where the expectation is over the realization of $G$. As a result, we also have $\mathbb{E}\big[|V^{G}_{\mathrm{bad}} \cup V^{H}_{\mathrm{bad}}|\big] \leq O(\varepsilon) \cdot \mathbb{E}\big[\mu(G_p)\big]$.

Before proving this claim, let us consider why it completes the proof of Lemma 5.1.

Figure 2: Illustration of the sets $V^{G}_{\mathrm{bad}}, V^{H}_{\mathrm{bad}}$ and the subgraphs $\widehat{G}$ and $\widehat{H}$ in the proof of Lemma 5.1 on a bipartite graph: (a) the realized graph $G_p$, (b) the subgraph $\widehat{G}$, and (c) the subgraph $\widehat{H}$. Here, (green) solid lines denote the edges of $G_p$ that appear in each subgraph and (red) dashed lines denote the edges excluded from it.
Proof of Lemma 5.1 (assuming Claim 5.2).

To prove Lemma 5.1, it is enough to show the existence of subgraphs $\widehat{G}$ and $\widehat{H}$ that satisfy the properties above. We define $\widehat{G}$ as follows: the vertex set is $V$ and the edge set is the same as that of $G_p$, except that we remove all edges incident to $V^{G}_{\mathrm{bad}}$ and all edges that are incident to $V^{H}_{\mathrm{bad}}$. We define $\widehat{H}$ to be the subgraph of $H_p$ induced by the vertex set $V \setminus V^{H}_{\mathrm{bad}}$, that is, $\widehat{H}$ contains all edges of $H_p$ except those incident to $V^{H}_{\mathrm{bad}}$; see Figure 2.

For item (1), note that $\widehat{G}$ differs from $G_p$ only in edges incident to the vertices in $V^{G}_{\mathrm{bad}} \cup V^{H}_{\mathrm{bad}}$, so $\mu(\widehat{G}) \geq \mu(G_p) - |V^{G}_{\mathrm{bad}} \cup V^{H}_{\mathrm{bad}}|$. It is also clear that $\mathbb{E}[\mu(G_p)] \geq p \cdot \mu(G)$ (as each edge of a maximum matching of $G$ is sampled with probability $p$ in $G_p$). By Claim 5.2,