The maximum matching problem is a classic problem in combinatorial optimization. For polynomial-time computation, efficient algorithms exist both for the unweighted (cardinality) version and the weighted version. However, in other models of computation, the weighted version turns out to be significantly harder, and better algorithms are known in the unweighted case. In fact, in some settings such as online algorithms, the weighted version is provably much harder than the unweighted case. In other models, such as streaming and massively parallel computation (MPC), no such results are known. Instead, the performance gap between the algorithms for unweighted and weighted matchings seems to arise from a lack of techniques. The goal of this paper is to address this by developing new techniques for weighted matchings.
In the (semi-)streaming model the edges of the graph arrive one-by-one and the algorithm is restricted to use memory that is almost linear in the number of vertices. For unweighted graphs, the very basic greedy algorithm guarantees to return a $1/2$-approximate maximum matching. It is a major open problem to improve upon this factor when the order of the stream is adversarial. In the random-edge-arrival setting (where the edges of the stream are presented in a random order), algorithms that are more advanced than the greedy algorithm overcome this barrier [KMM12]. In contrast, for weighted graphs a $(1/2-\varepsilon)$-approximation algorithm was given only recently for adversarial streams [PS17, GW19], and here we give the first algorithm that breaks the natural “greedy” barrier of $1/2$ for random-edge-arrival streams:
There is a $(1/2+c)$-approximation algorithm for finding weighted matchings in the streaming model with random-edge-arrivals, where $c > 0$ is an absolute constant.
As we elaborate below, the result is achieved via a general approach that reduces the task of finding weighted matchings to that of finding (short) unweighted augmenting paths. This allows us to incorporate some of the ideas present in the streaming algorithms for unweighted matchings to achieve our result. Our techniques, perhaps surprisingly, also simplify the previous algorithms for finding unweighted matchings, and give an improved guarantee for general graphs.
The idea to reduce to the problem of finding unweighted augmenting paths is rather versatile, and we use it to obtain a general reduction from weighted matchings to unweighted matchings as our second main result. We give implementations of this reduction in the models of multi-pass streaming and MPC that incur only a constant factor overhead in the complexity. In multi-pass streaming, the algorithm is (as for single-pass) restricted to use memory that is almost linear in the number of vertices and the complexity is measured in terms of the number of passes that the algorithm requires over the data stream. In MPC, parallel computation is modeled by parallel machines with sublinear memory (in the input size) and data can be transferred between machines only between two rounds of computation (see Section 2 for a precise definition). The complexity of an algorithm in the MPC model, also referred to as the round complexity, is then measured as the number of (communication) rounds used.
Both the streaming model and the MPC model, which encompasses many of today’s most successful parallel computing paradigms such as MapReduce and Hadoop, are motivated by the need for devising efficient algorithms for large problem instances. As data and the size of instances keep growing, this becomes ever more relevant and a large body of recent work has been devoted to these models. For the matching problem, McGregor [McG05] gave the first streaming algorithm for approximating unweighted matchings within a factor that runs in a constant number of passes (depending only on ); the dependency on was more recently improved for bipartite graphs [AG13, EKMS12]. McGregor’s techniques for unweighted matchings have been very influential. In particular, his general reduction technique can be used to transform any -approximation unweighted matching algorithm that uses MPC rounds into a approximation unweighted matching algorithm that uses rounds in the MPC model. This together with a sequence of recent papers [ABB19, CŁM18, GGK18], that give constant-factor approximation algorithms for unweighted matchings with improved round complexity, culminated in algorithms that find -approximate maximum unweighted matchings in rounds. However, as McGregor’s techniques apply to only unweighted matchings, it was not known how to achieve an analogous result in the presence of weights. In fact, McGregor raised as an open question whether his result can be generalized to weighted graphs. Our result answers this in the affirmative and gives a reduction that is lossless with respect to the approximation guarantee while only increasing the complexity by a constant factor. Moreover, our reduction is to bipartite graphs. Instantiating this with the aforementioned streaming and MPC algorithms for unweighted matchings yields the following (throughout the paper, we denote by $n$ the number of vertices and by $m$ the number of edges):
There exists an algorithm that in expectation finds a -approximate weighted matching that can be implemented
in rounds, machines per round, and memory per machine, where is the number of rounds used by a -approximation algorithm for bipartite unweighted matching using machines per round, and memory per machine in the MPC model, and
in passes and memory, where is the number of passes used by a -approximation algorithm for bipartite unweighted matching using memory, in the multi-pass streaming model.
Prior to this, the best known results for computing a -approximate weighted matching required super constant many passes over the stream in the streaming model [AG13] and rounds [AG18] in the MPC model. We remark that if we allow for memory per machine in the MPC model, then [AG18] gave an algorithm that uses only a constant number of rounds (depending on ). Achieving a similar result with near linear memory per machine is a major open question in the MPC literature; our results show that it is sufficient to concentrate on unweighted graphs as any progress on such graphs gives analogous progress in the weighted setting. We now give an outline of our approach.
1.1 Outline of Our Approach
Let $M$ be a matching in a graph $G = (V, E)$ with edge-weights $w$. Recall that an alternating path $P$ is a path in $G$ that alternates between edges in $M$ and in $E \setminus M$. If the endpoints of $P$ are unmatched vertices or incident to edges in $M \cap P$, then removing the $M$-edges in $P$ and adding the other edges of $P$ gives a new matching. In other words, $M \Delta P = (M \setminus P) \cup (P \setminus M)$ is a new matching. We say that we updated $M$ using the alternating path $P$, and we further say that $P$ is augmenting if $w(M \Delta P) > w(M)$, where we used the notation $w(F) = \sum_{e \in F} w(e)$ for a subset of edges $F \subseteq E$. Also recall that an alternating cycle $C$ is a cycle that alternates between edges in $M$ and in $E \setminus M$, and $M \Delta C$ is also a matching. We say that $C$ is augmenting if $w(M \Delta C) > w(M)$. Now a well-known structural result regarding approximate matchings is the following:
Fact 1.3. For any $\varepsilon > 0$, if there is no augmenting path or cycle of length at most $O(1/\varepsilon)$, then $M$ is a $(1-\varepsilon)$-approximate matching.
In particular, this says that in order to find a $(1-\varepsilon)$-approximate matching it is sufficient to find augmenting paths or cycles of length $O(1/\varepsilon)$. This is indeed the most common route used to design efficient algorithms for finding approximate matchings: in the streaming model with random-edge-arrivals, [KMM12] finds augmenting paths of length $3$ and the MPC algorithms [ABB19, CŁM18] find augmenting paths of length $O(1/\varepsilon)$. However, those approaches work only for unweighted graphs. The high-level reason is that it is easy to characterize the augmenting paths in the unweighted setting: they simply must start and end in unmatched vertices. Such a simple classification of augmenting paths is not available in the weighted setting and the techniques of those papers do not apply. Nevertheless, we propose a general framework to overcome this obstacle that allows us to tap into the results and techniques developed for unweighted matchings. Informally, we reduce the problem of finding augmenting paths in the weighted setting to the unweighted setting.
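The update of a matching by an alternating path or cycle, and the test for whether it is augmenting, can be sketched in a few lines. This is only an illustrative sketch: the helper names `edge`, `is_matching`, `total_weight` and `update` are hypothetical, and edges are assumed to be stored as unordered vertex pairs with weights in a dictionary `w`.

```python
def edge(u, v):
    """Undirected edge as a hashable, order-free pair (hypothetical helper)."""
    return frozenset((u, v))


def is_matching(edges):
    """True if no two edges in `edges` share a vertex."""
    seen = set()
    for e in edges:
        if seen & e:
            return False
        seen |= e
    return True


def total_weight(edges, w):
    """Sum of the weights of the given edges."""
    return sum(w[e] for e in edges)


def update(M, P, w):
    """Return the symmetric difference of M and P together with its gain.

    P is augmenting (in the weighted sense) exactly when the gain is positive.
    """
    new_M = M ^ P  # remove the M-edges of P, add the other edges of P
    if not is_matching(new_M):
        raise ValueError("P is not a valid alternating path or cycle for M")
    return new_M, total_weight(new_M, w) - total_weight(M, w)
```

With a single matched edge of weight 3 between two unmatched edges of weight 2 each, updating along all three edges has gain 1, while updating along only one outer edge plus the matched edge is a valid update with gain -1: alternating, but not augmenting.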
The high level idea is simple: Consider the example depicted on the left in Fig. 1. The current matching consists of a single edge that is depicted by a solid line. The weights are written next to the edges and so (the edges are dashed). The maximum matching consists of and has weight . Furthermore, there are several alternating paths of length that are also augmenting. However, it is important to note that we cannot simply apply an algorithm for finding unweighted augmenting paths. Such an algorithm may find the alternating path which is augmenting in the unweighted sense but . To overcome this, we apply a filtering technique that we now explain in our simple example: First “guess” lower bounds on the weights of the edges incident to and in an augmenting path. Let and be those lower bounds. We then look for augmenting paths in the unweighted graph that keeps only those unmatched edges incident to and whose weights are above the guessed thresholds. Then to guarantee that an unweighted augmenting path that an algorithm finds is also an augmenting path in the weighted sense, we always set and such that . In the center and right part of Fig. 1 we depict two unweighted graphs obtained for different values of and (in the center with and to the right with ). Note that in both examples any unweighted augmenting path is also augmenting with respect to the weights.
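The filtering step for this single-edge example can be sketched as follows. The sketch rests on assumptions that are not taken from the text: the thresholds are called `tau_u` and `tau_v`, an unmatched edge is kept when its weight is at least the threshold at its matched endpoint, and the thresholds are required to sum to strictly more than the weight of the matched edge. The weights in the usage note are made up and are not those of Fig. 1.

```python
def filter_edges(unmatched, u, v, w_uv, tau_u, tau_v):
    """Keep the unmatched edges that pass the guessed thresholds.

    `unmatched` holds triples (outer_vertex, matched_endpoint, weight), where
    matched_endpoint is u or v. The condition tau_u + tau_v > w_uv (an
    assumption of this sketch) guarantees that any path x - u - v - y built
    from kept edges gains weight: w(x,u) + w(v,y) - w_uv > 0.
    """
    if tau_u + tau_v <= w_uv:
        raise ValueError("need tau_u + tau_v > w(u, v)")
    threshold = {u: tau_u, v: tau_v}
    return [(x, y) for (x, y, w) in unmatched if w >= threshold[y]]
```

For instance, with a matched edge of weight 5 and unmatched edges of weights 1 and 4 at `u` and 1 and 3 at `v`, the thresholds (4, 2) keep only the weight-4 and weight-3 edges, so the only unweighted augmenting path left has weighted gain 4 + 3 - 5 = 2.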
While the implementation of the basic idea is simple in the above case, there are several challenges in general. Perhaps the most obvious one is that, for weighted matchings, may be a perfect matching but still far from optimal. And a perfect matching obviously has no unweighted augmenting paths! On a very high level, we overcome this issue by dropping edges in while making sure to set the guessed lower bounds (the ’s) so as to guarantee that any unweighted augmenting path is also a weighted augmenting path (even when taking the dropped edges into account).
In what follows, we describe in more detail the implementation of the above basic idea. We start with the simpler case, single-pass streaming with random edge arrivals, where we look only for augmenting paths of length . We then describe the technically more involved multi-pass streaming and MPC algorithms that consider long augmenting paths and cycles.
1.1.1 Single-pass Streaming with Random Edge Arrivals
In contrast to unweighted graphs where the basic greedy algorithm gives a $1/2$-approximation, it was only very recently that a $(1/2-\varepsilon)$-approximation streaming algorithm was given for weighted matchings [PS17]. The algorithm of Paz and Schwartzman is based on the local ratio technique, which we now describe. (Footnote 3: The description of the local-ratio technique is adapted from a recent grant proposal submitted to the Swiss National Science Foundation by the last author.) On an input graph $G = (V, E)$ with edge-weights $w$, the following simple local-ratio algorithm is known to return a $1/2$-approximate weighted matching: Initially, let $S = \emptyset$ and $\alpha_v = 0$ for all $v \in V$. For each $e = \{u, v\} \in E$ in an arbitrary order:
if $\alpha_u + \alpha_v < w(e)$, add $e$ to $S$ and increase $\alpha_u$ and $\alpha_v$ by $w(e) - \alpha_u - \alpha_v$.
Finally, obtain a matching $M$ by running the basic greedy algorithm on the edges in $S$ in the reverse order (i.e., by starting with the edge last added to $S$).
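The local-ratio procedure just described can be sketched as below. This is a minimal offline sketch, not the memory-bounded streaming variant of [PS17]: the function name `local_ratio_matching` and the representation of edges as `(u, v, weight)` triples are assumptions made for illustration.

```python
def local_ratio_matching(edges):
    """Local-ratio matching on an iterable of (u, v, weight) triples.

    Returns (matching, stack): the edges chosen by the final greedy pass and
    the list S of edges that were stored, in order of insertion.
    """
    alpha = {}   # vertex potentials, implicitly 0 for unseen vertices
    stack = []   # the set S, kept in insertion order
    for u, v, w in edges:
        slack = w - alpha.get(u, 0) - alpha.get(v, 0)
        if slack > 0:
            stack.append((u, v, w))
            alpha[u] = alpha.get(u, 0) + slack
            alpha[v] = alpha.get(v, 0) + slack

    matched = set()
    matching = []
    for u, v, w in reversed(stack):  # greedy, starting with the last edge added
        if u not in matched and v not in matched:
            matching.append((u, v, w))
            matched.update((u, v))
    return matching, stack
```

On the path a-b-c-d with weights 1, 2, 1 processed left to right, the third edge is not stored because the potential at c already covers its weight, and unwinding the stack returns the single middle edge of weight 2.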
Since the above algorithm returns a -approximate matching irrespective of the order in which the edges are considered (in the for loop), it may appear immediate to use it in the streaming setting. The issue is that, if the edges arrive in an adversarial order, we may add all the edges to . For dense graphs, this would lead to a memory consumption of instead of the wanted memory usage which is (roughly) linear in the output size. The main technical challenge in [PS17] is to limit the number of edges added to ; this is why that algorithm obtains a -approximation, for any , instead of a -approximation.
McGregor and Vorotnikova observed that the technical issue in [PS17] disappears if we assume that edges arrive in a uniformly random order. (Footnote 4: Sofya Vorotnikova presented this result in the workshop “Communication Complexity and Applications, II (17w5147)” at the Banff International Research Station held in March 2017.) Indeed, we can then use basic probabilistic techniques (see, e.g., the “hiring problem” in [CLRS09]) to show that the expected (over the random arrival order) number of edges added to $S$ is $O(n \log n)$. Even better, here we show that, in expectation, the following adaptation still adds only $O(n \log n)$ edges to $S$: update the vertex potentials (the $\alpha_v$’s) only for a small initial fraction of the stream and then, in the remainder of the stream, add all edges $e = \{u, v\}$ for which $\alpha_u + \alpha_v < w(e)$ to $S$ (without updating the vertex potentials). This adaptation allows us to prove the following structural result: In a random-edge-arrival stream, either the local-ratio algorithm already obtains a (close to) $1/2$-approximate matching after seeing a small fraction of the stream, or the set $S$ (in the adaptation that freezes vertex potentials) contains a better than $1/2$-approximation in the end of the stream.
The above allows us to concentrate on the case when we have a (close to) $1/2$-approximate matching after seeing only a small fraction of the stream. We can thus use the remainder of the stream to find enough augmenting paths to improve upon the initial $1/2$-approximation. It is here that our filtering technique is used to reduce the task of finding weighted augmenting paths to unweighted ones. By Fact 1.3, it is sufficient to consider very short augmentations to improve upon an approximation guarantee of $1/2$. Specifically, the considered augmentations are of two types:
Those consisting of a single edge $e = \{u, v\}$ to add satisfying $w(e) > w(M(u)) + w(M(v))$, where $w(M(u))$ denotes the weight of the edge of $M$ incident to vertex $u$ (and $w(M(u)) = 0$ if no such edge exists). (Footnote 5: To make sure that the weight of the matching increases significantly by an augmentation, the strict inequality needs to be satisfied with a slack. We avoid this technicality in the overview.)
Those consisting of two new edges and that form a path or a cycle with at most three edges and , i.e., adding and removing increases the weight of the matching.
For concreteness, consider the graph in Fig. 2. The edges in are solid and dashed edges are yet to arrive in the stream. An example of the first type of augmentations is to add (and remove and ) which results in a gain because . Two examples of the second type of augmentations are the path and the cycle .
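The first type of augmentation (a single new edge that outweighs the matching edges it displaces) can be applied greedily as each edge arrives. The sketch below is an illustration under assumptions: the function name, the `mate` dictionary layout, and the `slack` parameter (standing in for the slack mentioned in the footnote) are all hypothetical.

```python
def try_single_edge_augmentation(mate, u, v, w, slack=0.0):
    """Add edge {u, v} of weight w if it beats the matching edges at u and v.

    `mate` maps every matched vertex to (partner, weight of its matching edge)
    and is modified in place. Returns the gain, or 0.0 if nothing was changed.
    Assumes {u, v} is not already a matching edge.
    """
    w_u = mate[u][1] if u in mate else 0.0
    w_v = mate[v][1] if v in mate else 0.0
    gain = w - w_u - w_v
    if gain <= slack:
        return 0.0
    for x in (u, v):             # drop the matching edges incident to u and v
        if x in mate:
            partner, _ = mate.pop(x)
            mate.pop(partner, None)
    mate[u] = (v, w)
    mate[v] = (u, w)
    return gain
```

With two matching edges a-b and c-d of weight 3 each, an arriving edge b-c of weight 7 is accepted with gain 1 and leaves a and d unmatched, so later edges at those vertices are compared against weight 0.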
The augmentations of the first type are easy to find in a greedy manner. For the second type, we now describe how to use our filtering technique to reduce the problem to that of finding length three unweighted augmenting paths. Let Unw-3-Aug-Paths be a streaming algorithm for finding such unweighted augmenting paths. We first initialize Unw-3-Aug-Paths with a (random) matching obtained by including each edge in
with probability. As we explain shortly, corresponds to the edges from the second type of augmenting paths. Then, at the arrival of an edge , it is forwarded as an unweighted edge to Unw-3-Aug-Paths if
For an example of the forwarded edges for a specific , see the right part of Fig. 2.
Note that the -values are set so that any augmenting path found by Unw-3-Aug-Paths will also improve the matching in the weighted graph. (Footnote 6: We remark that there may be short augmentations that are beneficial in the weighted sense that are never present in the graph forwarded to Unw-3-Aug-Paths regardless of the choice of . An example would be with and . In this case, is not forwarded to Unw-3-Aug-Paths due to the filtering if ; and, in the other choices of , is not a length three unweighted augmenting path. However, as we prove in Section 3, those augmentations are safe to ignore in our goal to beat the approximation guarantee of .) Indeed, suppose that Unw-3-Aug-Paths finds the length three augmenting path where . Let and be the other edges in incident to and (if they exist). Then, by the selection of the -values, we have
as required. Hence, the -values are set so as to guarantee that the augmenting paths will improve the weighted matching if applied.
The reason for the random selection of is to make sure that any such beneficial weighted augmenting path is present as an unweighted augmenting path in the graph given to Unw-3-Aug-Paths with probability at least . This guarantees that there will be (in expectation) many length three unweighted augmenting paths corresponding to weighted augmentations (assuming the initial matching is no better than -approximate).
This completes the high-level description of our single-pass streaming algorithm except for the following omission: all unweighted augmenting paths are equally beneficial while their weighted contributions may differ drastically. This may result in a situation where Unw-3-Aug-Paths returns a constant fraction of the unweighted augmenting paths that have little value in the weighted graph. The solution is simple: we partition into weight classes by geometric grouping, run Unw-3-Aug-Paths for each weight class in parallel, and then select vertex-disjoint augmenting paths in a greedy fashion starting with the augmenting paths in the largest weight class. This ensures that many unweighted augmenting paths also translate into a significant improvement of the weighted matching. The formal and complete description of these techniques is given in Section 3.
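Geometric grouping and the final greedy selection can be sketched as follows. The names `weight_class` and `select_disjoint`, the grouping base `1 + eps`, and the representation of an augmenting path as a list of vertices are assumptions of this sketch rather than details from Section 3.

```python
import math


def weight_class(w, eps):
    """Index i such that (1 + eps)**i <= w < (1 + eps)**(i + 1), for w > 0."""
    return math.floor(math.log(w) / math.log(1.0 + eps))


def select_disjoint(paths_by_class):
    """Greedily pick vertex-disjoint paths, heaviest weight class first.

    `paths_by_class` maps a weight-class index to a list of paths, where each
    path is a list of vertices.
    """
    used = set()
    chosen = []
    for cls in sorted(paths_by_class, reverse=True):
        for path in paths_by_class[cls]:
            if used.isdisjoint(path):
                chosen.append(path)
                used.update(path)
    return chosen
```

Processing classes from heaviest to lightest means a light path is discarded only when it conflicts with a path from a class that is at least as heavy, which is the intuition behind charging the loss to the selected paths.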
1.1.2 Multi-pass streaming and MPC
In our approach for single-pass streaming, it was crucial to have an algorithm (local-ratio with frozen vertex potentials) that allowed us to reduce the problem to that of finding augmenting paths to a matching that is already (close) to -approximate. This is because, in a single-pass streaming setting, we can find a limited amount of augmenting paths leading to a limited improvement over the initial matching.
In multi-pass streaming and MPC, the setting is somewhat different. On the one hand, the above difficulty disappears because we can repeatedly find augmentations. In fact, we can even start with the empty matching. On the other hand, we now aim for the much stronger approximation guarantee of for any fixed . This results in a more complex filtering step as we now need to find augmenting paths and cycles of arbitrary length (depending on ). We remark that the challenge of finding long augmenting cycles is one of the difficulties that appears in the weighted case where previous techniques do not apply [McG05, AG13]. We overcome this and other challenges by giving a general reduction to the unweighted matching problem, which can be informally stated as follows: Let be the current matching and be an optimal matching of maximum weight. If then an -approximation algorithm for the unweighted matching problem on bipartite graphs can be used to find a collection of vertex-disjoint augmentations that in expectation increases the weight of by .
The reduction itself is efficient and can easily be implemented both in the multi-pass streaming and MPC models by incurring only a constant overhead in the complexity. Using the best-known approximation algorithms for the unweighted matching problem on bipartite graphs in these models then yields Theorem 1.2 by repeating the above times after starting with the empty matching .
Suppose the current matching satisfies . Then there must exist a collection of short (each consisting of edges) vertex-disjoint augmenting paths and cycles with total gain . Moreover, each augmentation has gain at least , i.e., proportional to its total weight.
Our goal now is to find a large fraction of these short weighted augmentations. For this, we first reduce the problem to that of finding such augmentations with for some fixed . This is similar to the concept of weight classes mentioned in the previous section and corresponds to the notion of augmentation classes in Section 4. Note that, by standard geometric grouping, we can reduce the number of choices of to be at most logarithmic. We can thus afford to run our algorithm for all choices of in parallel and then greedily select the augmentations starting with those of the highest weight augmentation class.
Now, for each augmentation class (i.e., for each choice of ), we give a reduction from finding weighted augmentations to finding unweighted ones by constructing a set of tailored graphs. This construction resembles some of the ideas used in the construction of [McG05], but they are not the same. The intuition behind our construction is as follows. Suppose that, for a fixed , we aim to find augmenting paths of length in the input graph . Then, as depicted in Fig. 3, we construct a new layered graph consisting of layers of vertices, (each layer is a copy of ), where the edge set of each layer consists of a subset of the edges in the current matching and the edges between layers are subsets of . The construction of is so that if we consider an alternating path in where is an edge in layer and is an edge between layer and , then, assuming they all correspond to distinct edges in , we can augment with to obtain the new matching . Moreover, the augmentation improves the matching, i.e., satisfies , if
To ensure that any alternating path in the unweighted graph satisfies (1) we use our filtering technique. For each layer , we have a parameter that filters the edges in that layer: we keep an edge in layer only if rounded up to the closest multiple of equals . Similarly, we have a parameter for each , and we keep an edge between layer and only if rounded down to the closest multiple of equals . Now by considering only those -values satisfying , we ensure that any augmenting path that is found improves the matching, i.e., (1) holds. Moreover, the rounding of edge-weights in the filtering step still keeps large (by weight) fraction of the augmentations in the original graph as the rounding error, which is less than for each edge, is very small compared to the length and total gain of the structural augmentations that we are looking for. It is thus enough to find the augmentations corresponding to each fixation of and -values. To bound the number of choices, note that we may assume that each -value is such that is a multiple of between and . Hence, as we need to consider augmentations of length only, we have, for a fixed and , that the total number of choices of and -values is a constant. They can thus all be considered in parallel. For each of these choices, we use the approximation algorithm for unweighted matchings to find a -approximate maximum unweighted matching in the corresponding layered graph and take the symmetric difference with the initial matched edges to find the desired unweighted augmentations. These augmentations are then translated back to weighted augmentations in the original graph.
Note that, unlike McGregor’s layered graphs, our layered graphs allow edges (both matched and unmatched) to be repeated in different layers, which is crucial in identifying weighted augmenting cycles. Furthermore, edges in each layer are filtered with respect to a given edge-weight arrangement, that ensures that the augmenting paths in our layered graphs correspond to weighted augmentations with positive gain. These differences result from the different purposes of the two constructions: McGregor’s construction aims to find unweighted augmenting paths efficiently, whereas our purpose is to reduce weighted augmentations to unweighted ones.
While, on a high level, this completes the description of our reduction, there are many interesting technical challenges to overcome. In the remaining part of this overview, we highlight two of these challenges.
Translating augmenting paths in layered graph to the original graph
From our high-level description of the layered graph , there is no guarantee that an augmenting path in it corresponds to an augmentation with a positive gain in the original graph . First, there is no reason that an augmenting path in visits the layers from left-to-right as intended. In the formal definition of layered graphs (see Section 4.3), we take care of this and make sure that any unweighted augmenting path in corresponds to an alternating path of the form , where is an edge in layer and is an edge between layer and . (Footnote 7: To be completely accurate, the edges and may not appear in the alternating path: does not appear if the vertex incident to in the first layer is not incident to a filtered edge in ; the case of is analogous.) Intuitively, such an alternating path can be made an unweighted augmenting path by discarding the matching edges of the first and last layers. However, a second and more challenging issue is that such an alternating path (going from the left to the right layer) may contain repeated edges and thus does not correspond to an augmentation in . An example of this phenomenon is as follows:
Here, we depict the weighted graph on the left and the “incorrect” layered graph to the right with and . The weighted graph has an augmentation that adds and removes and improves the weight of the matching by one. This augmentation is also present in the layered graph. However, an equally good augmentation in that graph from an unweighted perspective corresponds to the alternating path depicted in bold. In the original graph the bold edge set corresponds to the non-simple path . Such a non-simple path clearly does not correspond to an augmentation and, even worse, there is no augmentation with a positive gain in the support of the considered path.
Our main idea to overcome this issue is as follows. We first select a random bipartition and of the vertex set of . Then between two layers and , we keep only those edges that go from an -vertex in layer to an -vertex in layer . We emphasize that the edges going from an L-vertex to an R-vertex between two layers are not kept. For example, if we let and in the considered example then the layered graph (with the same -values) becomes:
In this example, the remaining alternating path that visits all layers (in the formal proof we further refine the layered graph to make sure that these are the only paths that are considered) corresponds to the augmentation in . However, in general, an alternating path may still not correspond to a simple path and an augmentation in since it may contain repetitions. However, the bipartition and the refinement of the layered graph can be seen to introduce an “orientation” of the edges in . This together with standard Eulerian techniques of directed graphs allow us to prove that any alternating path in the layered graph can be decomposed into a collection of alternating even-length cycles and an alternating path in , one of which is also augmenting. Finally, let us remark that the idea to consider a bipartition and of the vertex set of and to allow only those edges that are from an -vertex to an -vertex between consecutive layers has the additional benefit that the layered graph becomes bipartite. This is the reason that our reduction is from weighted matchings in general graphs to unweighted matchings in bipartite graphs.
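The effect of the random bipartition on the edges between consecutive layers can be sketched as follows. This is a simplified illustration: it ignores the weight filtering and the further refinement of the layered graph, the names `random_bipartition` and `between_layer_edges` are hypothetical, and the kept direction (from an R-vertex in one layer to an L-vertex in the next) is inferred from the remark that edges from an L-vertex to an R-vertex are not kept.

```python
import random


def random_bipartition(vertices, seed=None):
    """Assign every vertex independently and uniformly to side 'L' or 'R'."""
    rng = random.Random(seed)
    return {v: rng.choice("LR") for v in vertices}


def between_layer_edges(unmatched_edges, side, num_layers):
    """Copies of unmatched edges kept between consecutive layers.

    A copy ((a, i), (b, i + 1)) is kept only if a is an R-vertex and b is an
    L-vertex, so every kept edge is oriented from R to L across layers.
    """
    kept = []
    for i in range(num_layers - 1):
        for u, v in unmatched_edges:
            for a, b in ((u, v), (v, u)):  # try both orientations of the edge
                if side[a] == "R" and side[b] == "L":
                    kept.append(((a, i), (b, i + 1)))
    return kept
```

An edge whose endpoints fall on the same side is dropped from every layer pair, which is why the bipartition is chosen at random: each fixed augmentation must survive the choice with some positive probability.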
Finding augmenting cycles
In the unweighted setting, matching algorithms do not have to consider cycles because alternating cycles cannot augment an existing matching. In contrast, algorithms for the weighted setting (at least the ones that try to iteratively improve an initial matching) have to somehow deal with augmenting cycles; weighted graphs can have perfect (unweighted) matchings whose weights are not close to the optimal and that can be improved only through augmenting cycles. For example, consider a 4-cycle with edge weights , where the edges of weight form an initial perfect matching of weight , but the optimal matching consists of edges of weight and has a total weight of . The only way to augment the weight here is to consider the whole cycle. The crucial property of our reduction is its ability to transform not only weighted augmenting paths, but also weighted augmenting cycles of the original graph into augmenting paths in the layered graphs.
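A small computation makes the 4-cycle example concrete. The weights 3 (for the two matched edges) and 4 (for the two unmatched edges) are hypothetical values chosen for this sketch, as are the helper names; the point is only that adding either unmatched edge alone loses weight, while swapping along the whole cycle gains.

```python
from itertools import combinations


def e(u, v):
    """Undirected edge as an order-free pair (hypothetical helper)."""
    return frozenset((u, v))


def gain_of_adding(added, matching, w):
    """Gain from adding `added` and removing every matching edge it touches."""
    touched = set().union(*added)
    removed = [m for m in matching if m & touched]
    return sum(w[a] for a in added) - sum(w[m] for m in removed)


# 4-cycle 0-1-2-3-0 with a perfect matching {01, 23}; weights are assumptions.
matching = [e(0, 1), e(2, 3)]
others = [e(1, 2), e(3, 0)]
w = {e(0, 1): 3, e(2, 3): 3, e(1, 2): 4, e(3, 0): 4}

gains = {
    k: [gain_of_adding(c, matching, w) for c in combinations(others, k)]
    for k in (1, 2)
}
```

Here `gains[1]` is `[-2, -2]` (each single unmatched edge displaces both matched edges) and `gains[2]` is `[2]` (the full alternating cycle), so the matching can only be improved through the cycle.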
Before explaining our solution, let us take a closer look at the above -cycle example. Let the edges of the -cycle be where is the current matching. Note that the cycle can be represented as an alternating path in the layered graph using three layers (consisting of the three edges of the matching with repeated once). However, such a representation of the augmenting cycle cannot be captured by our filtering technique due to the constraint which ensures that any alternating path in the layered graph can be translated into a weighted augmentation. The reason being that for to be present in the layered graph we would need for , and for which would contradict the above inequality. This approach is therefore not sufficient to find augmenting cycles and achieve a approximation guarantee. Specifically, the issue is due to the fact that we account for the edge weight of twice in the filtering process, once for and once more for . To overcome this issue, consider the -cycle with more general weights , where taking in place of gives an fractional gain in weight. What we need is to make sure that, even if we account for the same edge (or ) twice, the alternating path we get in the layered graph (“corresponding” to the cycle) is still gainful. For this, we blow-up the cycle length by repeating the same cycle times. I.e., we consider the cycle
Since we have repeated the edges many times, their gains add up so that it can account for the weight of considering one additional time. The considered cycle of length is thus present as a “repeated” alternating path in the layered graph (with the appropriate -values and bipartition) consisting of layers. In general, to make sure that we can find augmenting cycles of length we will consider the layered graph with up to layers.
1.2 Further Related Work
There is a large body of work devoted to (semi-)streaming algorithms for the maximum matching problem. For unweighted graphs, the basic greedy approach yields a $1/2$-approximation, and for weighted graphs [PS17] recently gave a $(1/2-\varepsilon)$-approximation based on the local ratio technique. These are the best known algorithms that take a single pass over an adversarially ordered stream. Better algorithms are known if the stream is randomly ordered or if the algorithm can take multiple passes through the stream. In the random-edge-arrival case, [KMM12] first improved upon the approximation guarantee of $1/2$ in the unweighted case. Our results give better guarantees in that setting and also apply to the weighted setting. When considering multi-pass algorithms, [McG05] gave a -approximation algorithm using passes. Complementing this, [AG13] gave a deterministic -approximation algorithm using passes. As for hardness results, [Kap13] showed that no algorithm can achieve a better approximation guarantee than in the adversarial single pass streaming setting.
The study of algorithms for matchings in models of parallel computation dates back to the eighties. A seminal work of Luby [Lub86] shows how to construct a maximal independent set in PRAM rounds. When this algorithm is applied to the line graph of , it outputs a maximal matching of . Similar results, also in the context of PRAM, were obtained in [ABI86, II86, IS86].
Perfect maximum matchings were also a subject of study in the context of PRAM. In [Lov79] it is shown that the decision variant is in RNC. That implies that there is a PRAM algorithm that in rounds decides whether a graph has a perfect matching or not. [KUW86] were the first to prove that constructing perfect matchings is also in RNC. In [MVV87] the same result was proved, and they also introduced the isolation lemma that had a great impact on many other problems.
In [KSV10, GSZ11] it was shown that it is often possible to simulate one PRAM in MPC rounds with memory per machine, for any constant . This implies that the aforementioned PRAM results lead to MPC round complexity algorithms for computing maximal matchings. [LMSV11] developed an algorithm that computes maximal matchings in the MPC model in rounds when the memory per machine is , for any constant . In the regime of memory per machine, the algorithm given in [LMSV11] requires MPC rounds of computation. Another line of work focused on improving this round complexity. Namely, [CŁM18] and [ABB19, GGK18] show how to compute a constant-factor approximation of maximum unweighted matching in and MPC rounds, respectively, when the memory per machine is . As noted in [CŁM18], any -approximation algorithm for maximum unweighted matchings can be turned into a -approximation algorithm for weighted matchings by using the approach described in Section 4 of [LPP15]. This transformation increases the round complexity by .
In the regime of memory per machine, for any constant , a recent work [BFU18] shows how to find maximal matchings in rounds for graphs of arboricity . Also in this regime, [GU19] and [Ona18] provide algorithms for constructing maximal matchings for general graphs in MPC rounds. The algorithm of [GU19] requires and the algorithm of [Ona18] requires total memory.
We formalize the streaming and the MPC model now.
The (semi-)streaming model
The (semi-)streaming model for graph problems was introduced by Feigenbaum et al. [FKM05]. In this model, the edges of the input graph arrive one-by-one in the stream, and the algorithm is allowed to use memory at any time and may go over the stream one (single-pass) or more (multi-pass) times. Note that memory is needed just to store a valid matching.
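As a concrete instance of a single-pass algorithm in this model, the greedy approach for unweighted graphs can be sketched as follows. This is a minimal illustration; the function name and the representation of the stream as an iterable of vertex pairs are assumptions. The only state kept between edges is the current matching and the set of matched vertices, so the memory is proportional to the number of vertices. The output is a maximal matching, which always contains at least half as many edges as a maximum matching.

```python
from typing import Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def greedy_streaming_matching(stream: Iterable[Edge]) -> List[Edge]:
    """One pass over the stream; keep an edge iff both endpoints are still free."""
    matched: Set[Hashable] = set()
    matching: List[Edge] = []
    for u, v in stream:
        if u != v and u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching
```

On the path a-b-c-d, the arrival order matters: if the middle edge arrives first, greedy keeps only that edge, whereas the maximum matching has two edges.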
The MPC model
The MPC model was introduced in [KSV10] and refined in later work [GSZ11, BKS13, ANOY14]. In this model, the computation is performed in synchronous rounds by machines. Each machine has bits of memory. At the beginning of a round the data, e.g., a graph, is partitioned across the machines with each machine receiving at most bits. During a round, each machine processes the received data locally. After the local computation on all the machines is over, each machine outputs messages of total size at most bits. The output of one round is used to guide the computation in the next round. In this model, each machine can send messages to any other machine, as long as at most bits are sent and received by each machine.
Let be the input graph. A natural assumption is that , i.e., it is possible to partition the entire graph across the machines. We do not assume any structure on how the graph is partitioned across the machines before the computation begins. In our work, we assume that . Furthermore, we consider the regime in which the memory per machine is nearly-linear in the vertex set, i.e., .
In the rest of this work, we show how to construct a -approximate maximum weighted matching. Edges that are in the matching will be appropriately tagged and potentially spread across multiple machines. These tags can be used to deliver all the matching edges to the same machine in MPC rounds.
Computation vs. communication complexity:
In this model, the complexity is measured by the number of rounds needed to execute a given algorithm. Although the computation complexity is de-emphasized in the MPC model, we note that our algorithms run in nearly-linear time.
3 Weighted Matching when Edges Arrive in a Random Order
In this section, we present a -approximation (semi-)streaming algorithm for the maximum weighted matching (MWM) problem in the random-edge-arrival setting, where is an absolute constant, thus proving Theorem 1.1. Our result computes a large weighted matching using unweighted augmentations. In that spirit, we provide the following lemma that gives us the streaming algorithm for unweighted augmentations.
Lemma 3.1.
There exists an unweighted streaming algorithm Unw-3-Aug-Paths with the following properties:
The algorithm is initialized with a matching and a parameter . Afterwards, a set of edges is fed to the algorithm one edge at a time.
Given that contains at least vertex disjoint -augmenting paths, the algorithm returns a set of at least vertex disjoint -augmenting paths. The algorithm uses space .
We mentioned in the introduction that, for an effective weighted-to-unweighted reduction in the streaming model, it is important to start with a “good” approximate matching so that we can augment it using -augmentations afterwards. We demonstrate these ideas on unweighted matchings first (Section 3.1), and show that they lead to an improved approximation ratio for both general and bipartite graphs. Later, in Section 3.2, we study these ideas in the context of weighted matchings.
3.1 Demonstration of Our Technique via Unweighted Matching
We give an algorithm that makes one pass over a uniformly random edge stream of a graph and computes a -approximate maximum unweighted matching. For the special case of triangle-free graphs (which includes bipartite graphs), we give a better analysis to get a -approximation.
We denote the input graph by , and use to indicate a matching of maximum cardinality. Assume that and a maximal matching are given. For , a connected component of that is a path of length is called an -augmenting path (the component is called nonaugmenting otherwise). We say that an edge in is -augmentable if it belongs to a -augmenting path, otherwise we say that it is non--augmentable. Also, for a vertex , let be ’s neighbor set, and for , let denote ’s neighbor set in the edges in the graph .
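The definition of augmentable edges can be made concrete with a small offline check. The sketch below is for illustration only: it assumes that both the maximal matching and a maximum matching are given explicitly as lists of vertex pairs (a streaming algorithm does not know the latter), and the function names are hypothetical. An edge of the maximal matching lies on an augmenting path of length three exactly when both of its endpoints are matched in the maximum matching to vertices that the maximal matching leaves free.

```python
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]


def mates(matching: List[Edge]) -> Dict[Hashable, Hashable]:
    """Map every matched vertex to its partner."""
    mate: Dict[Hashable, Hashable] = {}
    for u, v in matching:
        mate[u] = v
        mate[v] = u
    return mate


def three_augmentable_edges(M: List[Edge], M_star: List[Edge]) -> List[Edge]:
    """Edges (u, v) of M whose component in the union of M and M_star is a path a-u-v-b."""
    mate_M = mates(M)
    mate_opt = mates(M_star)
    result = []
    for u, v in M:
        a = mate_opt.get(u)
        b = mate_opt.get(v)
        if a is None or b is None:
            continue  # an endpoint is free in M_star, so the component is shorter
        if a in mate_M or b in mate_M:
            continue  # the path continues (or M and M_star share this edge)
        result.append((u, v))
    return result
```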
Lemma 3.2 (Lemma 1 in [KMM12]).
Let , be a maximal matching in , and be a maximum unweighted matching in such that . Then the number of -augmentable edges in is at least , and the number of non--augmentable edges in is at most .
For completeness, we give the proof in Appendix A. ∎
The algorithm is as follows. Compute a maximal matching on an initial (which we will set later) fraction of the stream. Then we run three algorithms in parallel on the remaining fraction of the stream. In the first, we store in the variable all the edges whose endpoints are both left unmatched by . In the end, we augment by adding a maximum unweighted matching in . In the second, we continue to grow greedily to get . In the third, to get -augmentations with respect to , we invoke the Unw-3-Aug-Paths algorithm from Lemma 3.1 that accepts a matching and a stream of edges that contains augmenting paths of length with respect to . In this way we obtain a set of vertex disjoint -augmenting paths, which we then use to augment . We return the best of the three algorithms.
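The bookkeeping of the first two branches can be sketched in Python as below. This is a simplified illustration under several assumptions: the name `p` for the initial fraction, the function name, and the returned triple are hypothetical; the stream is materialized as a list only to locate the cut point, which a genuine streaming algorithm would avoid by knowing the stream length; and the third branch (Unw-3-Aug-Paths) and the final maximum-matching computation on the stored edges are not implemented.

```python
from typing import Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def one_pass_branches(stream: Iterable[Edge], p: float):
    """Return (M0, S, M_greedy) for the first two branches of the algorithm."""
    edges = list(stream)  # simplification: used only to find the p-fraction cut
    cut = int(p * len(edges))

    # Phase 1: greedy maximal matching M0 on the initial fraction of the stream.
    matched0: Set[Hashable] = set()
    M0: List[Edge] = []
    for u, v in edges[:cut]:
        if u not in matched0 and v not in matched0:
            M0.append((u, v))
            matched0.update((u, v))

    # Phase 2: the branches read the remaining edges in parallel.
    S: List[Edge] = []              # branch 1: edges among vertices unmatched by M0
    matched1 = set(matched0)        # branch 2: keep growing M0 greedily
    M_greedy = list(M0)
    for u, v in edges[cut:]:
        if u not in matched0 and v not in matched0:
            S.append((u, v))
        if u not in matched1 and v not in matched1:
            M_greedy.append((u, v))
            matched1.update((u, v))
        # Branch 3 would feed (u, v) to Unw-3-Aug-Paths here (omitted).
    return M0, S, M_greedy
```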
It is clear that the second and the third algorithm use space. The following lemma shows that the first algorithm uses space.
With high probability it holds that .
Fix a vertex . Define to be the event that after processing edges from the stream it holds: is unmatched, and at least neighbors of are still unmatched. We will show that , after which the proof follows by union bound over all the vertices. We have
Therefore, as desired.
We divide the analysis of approximation ratio into two cases.
Case 1. :
Each edge of can intersect with at most two edges of , hence contains at least edges of that can be added to to get a matching of size at least .
Case 2. :
If , we are done, so assume that . In the second algorithm, is the maximal matching at the end of the stream. If , we are done; otherwise, by Lemma 3.2, there are at least -augmentable edges in , i.e., there are at least -augmentable edges in ; denote this set of edges by . In expectation, for at least fraction of , both the edges incident to them appear in the latter fraction of the stream. This can be seen by having one indicator random variable per edge in denoting whether the two edges incident on that edge appear in the latter fraction of the stream. Then we condition on the event that , which implies that has two edges, say and , incident on it. Since was added to the greedy matching , both and must appear after . Each of and appears in the latter fraction of the stream with probability under this conditioning. Then, by union bound, with probability at least both and appear in the latter fraction of the stream. Then we apply linearity of expectation over the sum of the indicator random variables.
Now, by Lemma 3.1, using , we recover at least augmenting paths in expectation. Using , after algebraic simplification, we get that the output size is at least , i.e., at least . Letting implies that our algorithm outputs a -approximate maximum unweighted matching, i.e., -approximation for .
For random-order edge-streams, there is a one-pass -space algorithm that computes a -approximation to maximum unweighted matching in expectation.
3.2 An Algorithm for Weighted Matching
Now we discuss the more general weighted case.
Let be a weighted graph with vertices and edges, and assume that the edges in are revealed to the algorithm in a uniformly random order. We further assume that the edge weights are positive integers and the maximum edge weight is . Let be a fixed maximum weighted matching in . For any matching of and a vertex , let denote the edge adjacent to the vertex in the matching . If some vertex is unmatched in , we assume that is connected to some artificial vertex with a zero-weight edge, whenever we use the notation .
Similarly to the algorithm in Section 3.1, we start by computing a -approximate maximum weighted matching within the first fraction of the edges () using the local-ratio technique. We recall this technique next. We consider each incoming edge , and as long as it has a positive weight, we push it into a stack and subtract its weight from each of the remaining edges incident to any of its endpoints and . To implement this approach in the streaming setting, for each vertex , we maintain a vertex potential . The potential tells how much weight should be subtracted from each incoming edge that is incident to . After running the local-ratio algorithm for the first fraction of the edges, computing greedily by popping the edges from the stack gives a -approximate matching for that portion of the stream. This is proved using the local-ratio theorem (see the work of Paz and Schwartzman [PS17]). We also freeze the vertex potentials at this point.
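The local-ratio procedure with vertex potentials described above can be sketched as follows. This is a minimal illustration, not the exact procedure of [PS17]: the function name and the representation of the stream as `(u, v, weight)` triples are assumptions, and the refinements that bound the size of the stack are omitted. The returned dictionary corresponds to the frozen potentials.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

WeightedEdge = Tuple[Hashable, Hashable, int]


def local_ratio_matching(
    stream: Iterable[WeightedEdge],
) -> Tuple[List[WeightedEdge], Dict[Hashable, int]]:
    """Basic local-ratio matching; returns the matching and the vertex potentials."""
    phi: Dict[Hashable, int] = defaultdict(int)  # weight already charged at each vertex
    stack: List[WeightedEdge] = []
    for u, v, w in stream:
        residual = w - phi[u] - phi[v]  # weight left after earlier subtractions
        if residual > 0:
            stack.append((u, v, w))
            phi[u] += residual
            phi[v] += residual

    # Unwind the stack: the most recently pushed edges get priority.
    matched = set()
    matching: List[WeightedEdge] = []
    while stack:
        u, v, w = stack.pop()
        if u not in matched and v not in matched:
            matching.append((u, v, w))
            matched.update((u, v))
    return matching, dict(phi)
```

For example, on the stream `("a","b",1), ("b","c",3), ("c","d",1)` the first two edges are pushed, the third has non-positive residual weight and is discarded, and unwinding the stack selects the edge of weight 3.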
Analogous to the unweighted case, we have three possible scenarios for :
In the best case, and we are done.
The weight , in which case we have only seen at most worth of optimal matching edges so far, and the rest of the stream contains at least weight that can be added on top of .
This corresponds to having a large fraction of unmatched vertices in the unweighted case, where we could afford to store all the edges incident to those vertices and compute a maximum unweighted matching that did not conflict with . In the weighted case, we keep all edges in the second part of the stream that satisfy , where and are the frozen vertex potentials after seeing the first fraction of the edges. Note that we continue to keep the vertex potential frozen. (Think of the unmatched vertices in the unweighted case as vertices with zero potential.) Again using the random-edge-arrival property, we show that the number of such edges that we will have to store is small with high probability. At the end of the stream, we use an (exact) maximum matching on those edges together with the edges in the local-ratio stack from the first fraction of the stream to construct a matching.
The weight of the matching is between and . In the analogous unweighted case, we did two things. We continued to maintain a greedy matching (on unmatched vertices), and we tried to find augmenting paths of length three. For the weighted case we proceed similarly: We continue to compute a constant factor approximate matching for those edges such that , and akin to the unweighted -augmentations, we try to find the weighted -augmentations.
For the latter task, we randomly choose (guess) a set of edges from that we consider as the middle edges of weighted -augmentations. Here, by a weighted -augmentation, we mean a quintuple of edges that increase the weight of the matching when the edges , and are removed from , and the edges and are added to . (Although these are length five augmenting paths, we call them -augmentations because we reduce the problem of finding those to the problem of finding length three unweighted augmenting paths.) We partition the chosen middle edges into weight classes defined in terms of geometrically increasing weights, and for each of the weight classes we find -augmentations using an algorithm that finds unweighted -augmenting paths as a black-box.
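The effect of a single weighted augmentation of this form can be illustrated with a short sketch. The helper names and the representation of edges as frozensets with a weight dictionary are assumptions made for this example; how the candidate quintuples are found is not addressed here. The sketch computes the gain of removing three matching edges and adding two non-matching edges, and applies the exchange only if the gain is positive and the result is still a matching.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Set

Edge = FrozenSet[Hashable]


def edge(u: Hashable, v: Hashable) -> Edge:
    return frozenset((u, v))


def gain(weight: Dict[Edge, int], removed: Set[Edge], added: Set[Edge]) -> int:
    """Change in matching weight if `removed` is swapped for `added`."""
    return sum(weight[e] for e in added) - sum(weight[e] for e in removed)


def apply_if_improving(
    matching: Set[Edge],
    weight: Dict[Edge, int],
    removed: Iterable[Edge],
    added: Iterable[Edge],
) -> Set[Edge]:
    """Return the augmented matching, or a copy of the input if there is no gain."""
    removed, added = set(removed), set(added)
    if not removed <= matching:
        raise ValueError("removed edges must belong to the matching")
    if gain(weight, removed, added) <= 0:
        return set(matching)
    new_matching = (matching - removed) | added
    covered = [v for e in new_matching for v in e]
    if len(covered) != len(set(covered)):
        raise ValueError("result is not a matching")
    return new_matching
```

On the path x-a-u-v-b-y with matching edges of weights 1, 2, 1 and non-matching edges (a,u) and (v,b) of weight 4 each, the exchange gains 8 - 4 = 4 and leaves x and y unmatched.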
Before we proceed to the complete algorithm, we give an algorithm to address the third case described above. In fact, this algorithm is the key contribution of this section: As the title of this paper suggests, this algorithm improves weighted matchings via unweighted augmentations.
3.2.1 Finding Weighted Augmenting Paths
Suppose that we have an initial matching such that . In this section, we describe how to augment using -augmentations to get an increase of weight that results in a matching of weight at least . To achieve this, in a black-box manner we use the algorithm Unw-3-Aug-Paths whose existence is guaranteed by Lemma 3.1.
Let be the set of edges whose weight is in the range , and let be the index such that . Thus (recall that the edge weights are positive integers and the maximum edge weight is , and any edge belongs to exactly one ). We refer to ’s as weight classes.
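Geometric weight classes of this kind can be computed with simple integer arithmetic. The exact range defining each class is not reproduced in the text above, so the sketch below assumes, purely for illustration, that class i contains the integer weights w with 2^(i-1) <= w < 2^i; the function names are likewise hypothetical. Under this assumption every positive integer weight falls in exactly one class, and a maximum weight W yields `W.bit_length()` classes, i.e., logarithmically many.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

WeightedEdge = Tuple[Hashable, Hashable, int]


def weight_class(w: int) -> int:
    """Index i with 2**(i-1) <= w < 2**i (assumed base-2 classes)."""
    if w < 1:
        raise ValueError("edge weights are assumed to be positive integers")
    return w.bit_length()


def split_into_weight_classes(
    edges: Iterable[WeightedEdge],
) -> Dict[int, List[WeightedEdge]]:
    """Group weighted edges by their weight class."""
    classes: Dict[int, List[WeightedEdge]] = defaultdict(list)
    for u, v, w in edges:
        classes[weight_class(w)].append((u, v, w))
    return dict(classes)
```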
As described earlier, we would like to find both weighted -augmentations (i.e., single edges that could replace two incident edges in the current matching and give a significant gain in weight), and weighted -augmentations. We now give the outline of our algorithm, Wgt-Aug-Paths, in Algorithm 1 using object-oriented notation, and we explain its usage and the intuition behind its design below.