 # Stochastic Matching with Few Queries: New Algorithms and Tools

We consider the following stochastic matching problem on both weighted and unweighted graphs: a graph G(V, E) along with a parameter p ∈ (0, 1) is given in the input. Each edge of G is realized independently with probability p. The goal is to select a degree-bounded subgraph H of G (with degree bound dependent only on p) such that the expected size/weight of the maximum realized matching of H is close to that of G. This model of stochastic matching has attracted significant attention over recent years due to its various applications. The most fundamental open question is the best approximation factor achievable by such algorithms, which are referred to in the literature as non-adaptive algorithms. Prior work has identified breaking (near) half-approximation as a barrier for both weighted and unweighted graphs. Our main results are as follows: -- We analyze a simple and clean algorithm and show that for unweighted graphs, it finds an (almost) 4√2 − 5 (≈ 0.6568) approximation by querying O(log(1/p)/p) edges per vertex. This improves over the state-of-the-art 0.5001-approximate algorithm of Assadi et al. [EC'17]. -- We show that the same algorithm achieves a 0.501 approximation for weighted graphs by querying O(log(1/p)/p) edges per vertex. This is the first algorithm to break the 0.5-approximation barrier for weighted graphs. It also improves the per-vertex queries of the state-of-the-art by Yamaguchi and Maehara [SODA'18] and Behnezhad and Reyhani [EC'18]. Our algorithms are fundamentally different from prior works, yet are very simple and natural. For the analysis, we introduce a number of procedures that construct heavy fractional matchings. We consider the new algorithms and our analytical tools to be the main contributions of this paper.


## 1 Introduction

We consider the following stochastic matching problem on both weighted and unweighted graphs. In its most general form, an edge-weighted graph G(V, E) along with a parameter p ∈ (0, 1) is given in the input, and each edge of G is realized independently with probability p. We are unaware of the edge realizations, yet our goal is to find a heavy realized matching. To do this, we can select a degree-bounded (i.e., with degree dependent only on p) subgraph H of G, query all of its edges simultaneously, and report its maximum realized matching. Denoting the expected weight of the maximum realized matching of any subgraph H of G by M(H), the goal is to choose H such that it maximizes M(H)/M(G) — which is also known as the approximation factor.

The restriction on the number of queries per vertex comes from the fact that the querying process is often time consuming and/or expensive in the applications of stochastic matching. Without this restriction, the solution is trivial: one can simply query all the edges of G and report the maximum matching among those that are realized.

The algorithms in this setting are categorized as non-adaptive since they query all the edges simultaneously without any prior knowledge about the realizations. In contrast, adaptive algorithms have multiple rounds of adaptivity, and the queries conducted at each round can depend on the outcome of the prior queries. Non-adaptive algorithms are considered practically more desirable since the queries are not stalled behind each other. In fact, one can see a non-adaptive algorithm as an adaptive algorithm that is restricted to only one round of adaptivity; it is therefore not surprising that non-adaptive algorithms are generally much harder to design and analyze.

While (1 − ε)-approximate adaptive algorithms are known, even for weighted graphs, the literature has identified breaking half approximation to be a barrier for non-adaptive algorithms [BDH15, AKL16, AKL17, YM18, BR18]. Prior to our work, no such algorithm was known for weighted graphs, and even for unweighted graphs, the state-of-the-art non-adaptive algorithm of Assadi et al. [AKL17] achieves only a slightly better approximation factor of 0.5001.

We introduce new algorithms and techniques to bypass these bounds. For unweighted graphs, we achieve a 0.6568 approximation, and we show that the same algorithm bypasses 0.5 approximation for weighted graphs. In both algorithms, we query only O(log(1/p)/p) edges per vertex. These results answer several open questions of the literature that we elaborate on in the forthcoming paragraphs. Apart from the approximation factor, it is not hard to see that any algorithm achieving a constant approximation has to query Ω(1/p) edges per vertex (see e.g., [AKL16]). As such, the number of per-vertex queries conducted by our algorithms is optimal up to a factor of O(log(1/p)).

#### Prior work.

The stochastic matching problem has been intensively studied during the past decade due to its diverse applications from kidney exchange to labor markets and online dating (we overview these applications in Section 1.1). Directly related to the setting that we consider are the papers by Blum et al. [BDH15] (which introduced this variant of stochastic matching), Assadi et al. [AKL16, AKL17], Yamaguchi and Maehara [YM18], and Behnezhad and Reyhani [BR18]. Table 1 gives a brief survey of known results due to these papers as well as a comparison to our results. We give a more detailed description of the main differences below.

Blum et al. introduced the following algorithm:

Algorithm A ([BDH15]): Pick a maximum matching from G and remove all of its edges. Repeat this for R iterations, then query the edges in the union Q of these matchings simultaneously and report the maximum realized matching among them.
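Since every vertex is matched at most once per iteration, the queried set has maximum degree at most R. The following is a minimal Python sketch of this algorithm, with a greedy maximal matching standing in for the maximum-matching subroutine (the function names are ours, for illustration only):

```python
def greedy_matching(edges):
    """Stand-in for a maximum matching: greedily build a maximal matching
    by scanning edges in the given order."""
    matched, matching = set(), []
    for (u, v) in edges:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching

def algorithm_a(edges, R):
    """Pick a matching, remove its edges from the graph, repeat R times;
    return the union Q of the picked matchings (the edges to query)."""
    remaining = list(edges)
    Q = []
    for _ in range(R):
        M = greedy_matching(remaining)
        Q.extend(M)
        remaining = [e for e in remaining if e not in M]
    return Q
```

Each vertex appears in at most one edge of each iteration's matching, so the per-vertex query bound R is immediate from the code.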

It is easy to see that R, the number of iterations of Algorithm A, determines the number of per-vertex queries. This means that it suffices to argue that a small value of R already yields our desired approximation factors. Blum et al. [BDH15] showed that for unweighted graphs, a value of R depending exponentially on 1/p is sufficient to get a (1/2 − ε) approximation. Interestingly, the follow-up results were achieved by the same algorithm (with minor changes) and differed mainly in the analysis. Assadi et al. [AKL16] showed that a value of R polynomial in 1/p and 1/ε suffices to achieve a (1/2 − ε) approximation, improving the exponential dependence on 1/p. (The algorithm of Assadi et al. [AKL16] also incorporates a sparsification step to bound the maximum degree of the queried subgraph.) Yamaguchi and Maehara [YM18] generalized these results to weighted graphs. (The generalization of Blum et al.'s algorithm to weighted graphs is simply to pick maximum weighted matchings in each round/iteration.) They showed that it suffices to set R to a value polynomial in W and 1/p to achieve the same approximation factor of (1/2 − ε), where W denotes the maximum integer edge weight. Behnezhad and Reyhani [BR18] further showed that the same approximation factor of (1/2 − ε) can be achieved for weighted graphs with a value of R independent of W and n. While this removes the dependence on W and n, making the bound a constant, it has a worse dependence on 1/p than that of [YM18].

Observe that the approximation factor of all the algorithms mentioned above is the same. The only exception in the literature is the algorithm of Assadi et al. [AKL17], which achieves a 0.5001 approximation for unweighted graphs. Their algorithm first extracts a large b-matching (whose size depends on the expected size of the realized matching) from the graph and then applies Algorithm A on the remaining graph. They show, interestingly, that the edges chosen by Algorithm A can be used to augment the realized matching among the edges of the b-matching, which leads to bypassing the half-approximation barrier for unweighted graphs.

#### Our contribution.

Despite the theoretical guarantees of the literature for Algorithm A, it has its drawbacks. Blum et al. [BDH14, Theorem 5.2] give examples on which it provably loses a constant factor. It also seems notoriously difficult (if not impossible) to prove anything better than a 1/2 approximation for Algorithm A alone. We consider another algorithm which is also very simple and natural: Algorithm B (formally as Algorithm 1): First draw R realizations of G independently. Then from each of these realizations, pick a maximum (weighted) matching. Finally, query the edges that appear in any of these R matchings simultaneously and report the maximum realized matching among them.
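A minimal Python sketch of Algorithm B under the same simplifications as before (a greedy maximal matching stands in for the maximum weighted matching, and all names are illustrative):

```python
import random

def greedy_matching(edges):
    """Stand-in for a maximum (weighted) matching: greedy maximal matching."""
    matched, matching = set(), []
    for (u, v) in edges:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching

def algorithm_b(edges, p, R, seed=0):
    """Draw R independent realizations (each edge survives w.p. p),
    pick a matching in each, and return the union Q of the picked matchings."""
    rng = random.Random(seed)
    Q = set()
    for _ in range(R):
        realized = [e for e in edges if rng.random() < p]
        Q.update(greedy_matching(realized))
    return Q
```

As in Algorithm A, each vertex is matched at most once per realization, so the maximum degree of Q, and hence the number of per-vertex queries, is at most R.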

Similar to Algorithm A, here R determines the number of per-vertex queries. We analyze Algorithm B for both weighted and unweighted graphs.


###### Result 1 (formally as Theorem 6.2).

For R = O(log(1/p)/p), Algorithm B achieves a 0.501 approximation on weighted graphs.

Result 1 gives the first non-adaptive algorithm that breaks the 0.5-approximation barrier for weighted graphs. The number of per-vertex queries of this result also improves that of the (1/2 − ε) approximations of [YM18] and [BR18].


###### Result 2 (formally as Theorem 5.3).

For R = O(log(1/p)/p), Algorithm B achieves an (almost) 4√2 − 5 ≈ 0.6568 approximation on unweighted graphs.

Result 2 improves over the state-of-the-art 0.5001-approximate algorithm of Assadi et al. [AKL17]. (For the case of unweighted graphs, in an independent work, Assadi and Bernstein [AB] give an (almost) 2/3 approximation, which is slightly better than our factor. Their algorithm, however, is highly tailored to unweighted graphs and gives no guarantee for the weighted case.)

In our analysis, we devise different procedures that, given the query outcomes, construct large fractional matchings over the realized edges. Then, based on the size of this fractional matching, we get that there must also be a large integral realized matching. We give more high-level ideas and intuition about these procedures in Section 3.

### 1.1 Applications

The stochastic matching problem has a wide range of applications from kidney exchange to labor markets and online dating. In all these applications, the goal is to find a large (or heavy) matching and the main bottleneck is determining which edges exist in the graph. We overview some of these applications below.

#### Kidney exchange.

Transplant of a kidney from a living donor is possible if the recipient (patient) happens to be medically compatible with his/her donor. This is not always the case; kidney exchange, however, provides a way to overcome this. In its simplest form with pairwise exchanges, two incompatible donor/patient pairs can exchange kidneys. That is, the donor of the first pair donates a kidney to the patient of the second pair and vice versa. This gives rise to the notion of a compatibility graph, where we have one vertex for each incompatible donor/patient pair and each edge indicates the possibility of an exchange. Therefore, the pairwise exchanges that take place can be expressed as a matching of this graph. There is, however, one crucial problem. The medical records of the patients, such as their blood or tissue types, only rule out a subset of incompatibilities. For the rest, we need more accurate medical tests that are both costly and time consuming.

The stochastic matching setting helps in finding a large matching among the pairs who also pass the extra tests while conducting very few medical tests per pair. There is a rich literature on such algorithmic approaches for kidney exchange particularly in stochastic settings [ALG14, AAGK15, AAGR15, AS09, DPS12, DPS13, DS15, MO14, Ünv10]. We refer interested readers to the paper of [BDH15] for a more detailed discussion about the application of stochastic matching in kidney exchange.

#### Online labor markets.

Online labor markets facilitate working relationships between freelancers and employers. In such platforms, it quite often happens that the users (from either party) have more options than they can consider. We can represent this with a bipartite graph with freelancers on one side and employers on the other. The edges of the compatibility graph, again, determine possible matches. While the initial job descriptions rule out some of the edges, it is only after an interview between an employer and a freelancer that they decide whether to work with each other. Stochastic matching, for such platforms, can be used to recommend interviews. This way, we ensure that with very few interviews, most of the users will find a desired match.

#### Further related work.

Multiple variants of stochastic matching have been considered by prior work. A well-studied setting, first introduced by Chen et al. [CIK09], is the query-commit model. In this model, the queried edges that happen to be realized have to be included in the final matching [Ada11, BGL12, CIK09, CTT12, GN13]. Another related setting is the model of [BGPS13], which allows querying only two edges per vertex. We refer to [BDH15] for a more extensive overview of other models relevant to the one we consider.

## 2 Preliminaries

#### Notation.

For any edge set F ⊆ E, we denote by ν(F) the weight of the maximum weighted matching in F. We may also abuse notation throughout the paper and use ν(F) to refer to the set of edges in the maximum weighted matching of F. When it is clear from the context, we may use maximum matching instead of maximum weighted matching. For any U ⊆ V, we use G[U] to denote the induced subgraph of G over U.

### 2.1 The Model of Stochastic Matching

We are given a graph G = (V, E) with edge weights, along with a fixed parameter p ∈ (0, 1). Each of the edges in E is realized independently from the other edges with probability p. The realized graph G_p includes an edge if and only if it is realized. We are not initially aware of the realized graph G_p. Our goal, however, is to compute a heavy matching of G_p. To do so, we can query each edge in E, and the outcome is whether the edge is realized.

For any subgraph H of G, we denote by M(H) the expected weight of the maximum realized matching in H. The benchmark in the stochastic matching problem is the omniscient optimum M(G), which we also denote by opt. A non-adaptive algorithm in this setting has to pick a degree-bounded (dependent only on p) subgraph H of G such that M(H)/opt, which determines the approximation factor, is maximized. If the algorithm is randomized, which is the case in our paper, it should succeed with high probability. (We note that throughout the paper, for simplicity, we analyze the approximation factor of our algorithms in expectation. However, it is easy to boost the success probability by running several independent instances of the algorithm and combining the resulting candidate solutions.)
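For small instances, the benchmark M(H) can be approximated empirically by Monte Carlo simulation over realizations. A rough sketch, with a greedy weighted matching standing in for the exact maximum weighted matching (so it only lower-bounds M(H)); all names are illustrative:

```python
import random

def greedy_weight(edges, weights):
    """Greedy maximal matching by descending weight; a stand-in
    that lower-bounds the maximum weighted matching."""
    matched, total = set(), 0.0
    for e in sorted(edges, key=lambda e: -weights[e]):
        u, v = e
        if u not in matched and v not in matched:
            matched.update(e)
            total += weights[e]
    return total

def estimate_M(edges, weights, p, trials=2000, seed=0):
    """Monte Carlo estimate of M(H): average realized matching weight
    over `trials` independent realizations of H."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        realized = [e for e in edges if rng.random() < p]
        total += greedy_weight(realized, weights)
    return total / trials
```

For a single edge of weight 1 realized with probability p, this estimate converges to p, matching the definition of M(H).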

### 2.2 Background on the Matching Polytope

Fix a graph G = (V, E). A vector x = (x_e)_{e∈E} is a fractional matching of G if for any e ∈ E we have x_e ≥ 0 and for any v ∈ V we have Σ_{e ∋ v} x_e ≤ 1. An integral matching can be seen as a fractional matching where for any e ∈ E we have x_e ∈ {0, 1}. The matching polytope of G is the convex hull of all integral matchings of G represented as above. Edmonds [Edm65] showed in 1965 that the matching polytope is the solution set of the linear program:

    x_e ≥ 0                ∀ e ∈ E
    Σ_{e ∋ v} x_e ≤ 1      ∀ v ∈ V
    x(U) ≤ ⌊|U|/2⌋         ∀ U ⊆ V with odd |U|

where x(U) denotes Σ_{e=(u,v): u,v∈U} x_e. Note that the first two constraints only ensure that x is a valid fractional matching. Constraints of the third type are known as blossom inequalities. A corollary of Edmonds' theorem is the following:

###### Corollary 2.1.

Let x be a fractional matching of an edge-weighted graph G that satisfies the blossom inequalities, i.e., x(U) ≤ ⌊|U|/2⌋ for every U ⊆ V with odd |U|. Then G has an integral matching M where w(M) ≥ Σ_{e∈E} x_e · w_e.

We can even relax the blossom inequalities and consider only subsets of size at most 1/ε, and still ensure that the weight of no fractional matching exceeds the maximum weight of integral matchings by more than a 1/(1 − ε) factor. This is captured by the following folklore lemma.

###### Lemma 2.2 (folklore).

Let x be a fractional matching of an edge-weighted graph G such that for any U ⊆ V with |U| ≤ 1/ε, it satisfies x(U) ≤ ⌊|U|/2⌋. Then G has an integral matching M where w(M) ≥ (1 − ε) Σ_{e∈E} x_e · w_e.

###### Proof sketch.

Define x′ = (1 − ε)x. Since ⌊|U|/2⌋ ≥ (1 − ε)|U|/2 for any U with |U| > 1/ε, one can easily show that x′ satisfies all blossom inequalities. Therefore, by Corollary 2.1, there must exist an integral matching of weight at least that of x′, which by definition is (1 − ε) Σ_{e∈E} x_e · w_e. ∎
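To make the blossom inequalities concrete, here is a brute-force membership checker for Edmonds' LP on small graphs (illustrative only; it enumerates all odd vertex subsets and is therefore exponential in |V|):

```python
from itertools import combinations

def is_in_matching_polytope(vertices, x, eps=1e-9):
    """Check Edmonds' LP constraints for a candidate fractional matching x,
    given as a dict {(u, v): value}: nonnegativity, vertex constraints,
    and the blossom (odd-set) inequalities."""
    if any(val < -eps for val in x.values()):
        return False
    for v in vertices:
        if sum(val for e, val in x.items() if v in e) > 1 + eps:
            return False
    # Blossom inequalities: x(U) <= floor(|U|/2) for odd |U| >= 3.
    for k in range(3, len(vertices) + 1, 2):
        for U in combinations(vertices, k):
            Uset = set(U)
            inside = sum(val for e, val in x.items() if set(e) <= Uset)
            if inside > k // 2 + eps:
                return False
    return True
```

The classic example is a triangle with x_e = 1/2 on every edge: it satisfies all vertex constraints yet violates the odd-set constraint x(U) ≤ 1 for U the whole triangle, while x_e = 1/3 on every edge is feasible.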

We refer interested readers to Section 25.2 of [Sch03] for a comprehensive overview of the matching polytope.

## 3 Technical Overview

To give an intuition about the true differences between our algorithm (Algorithm B) and the standard non-adaptive algorithm of the literature (Algorithm A), we start by restating the bad example of Blum et al. [BDH14, Theorem 5.2] for Algorithm A and describing how Algorithm B overcomes it. We then proceed to give intuitions on how we analyze the performance of Algorithm B.

#### A comparison of Algorithm A and Algorithm B.

Consider the graph of Figure 1-(a), whose vertex set is partitioned into six equal-size subsets. The edge set of the graph contains complete bipartite graphs between four pairs of these subsets and perfect matchings between the two remaining pairs. Assume also that the realization probability is p = 1/2.

Figure 1: Figure (a) illustrates the input graph. Figure (b) illustrates a potential subset of queried edges by Algorithm A. Figure (c) illustrates the expected structure of queried edges of Algorithm B.

It is not hard to confirm that the expected omniscient optimum matching of this graph is an almost perfect matching. It suffices to add the realized edges of the two perfect matchings to opt (these roughly match half of the vertices of each of their endpoint sets in expectation), and then find large realized matchings between the remaining vertices and those of the complete bipartite components.

Recall that Algorithm A picks an arbitrary maximum matching in each iteration and removes it from the graph. Suppose that the first two picked matchings each consist of one of the two perfect matchings together with perfect matchings inside the complete bipartite subgraphs, and that each of the remaining matchings is the union of perfect matchings inside the complete bipartite subgraphs. The queried edges of Algorithm A are then as illustrated in Figure 1-(b). Since for every vertex of the perfect-matching pairs only two edges are queried and p = 1/2, we expect a 1/4 fraction of these vertices to have no realized queried edge. This means that Algorithm A cannot construct a near-perfect matching.

Since Algorithm B incorporates randomization throughout the process, particularly in choosing the realizations from which it picks its matchings, bad cases such as the one described above cannot happen. In particular, for the graph of Figure 1, each vertex of the perfect-matching pairs is matched, in roughly half of the drawn realizations, to a vertex of the complete bipartite subgraphs; thus we query many distinct edges for each of these vertices, and it is not hard to show that for a constant R depending only on p and ε, Algorithm B achieves a (1 − ε) approximation on this example (see Figure 1-(c)).

#### Roadmap for analyzing Algorithm B.

To convey the main intuitions behind the analysis, we make a few simplifying assumptions. First, assume that the input graph is unweighted. Denote the set of queried edges of Algorithm B by Q, and further denote by Q_p those edges in Q that are realized. Our goal is to show that, in expectation, Q_p contains a matching whose size is a large fraction of opt. To do this, by Lemma 2.2, it suffices to show that there exists a large fractional matching on Q_p that also satisfies the blossom inequalities. Let us further assume that G is bipartite, so that any fractional matching satisfies the blossom inequalities automatically.

Denote by q_e the probability that edge e appears in the omniscient optimum matching. (We assume that, given a realization, the edges that belong to the maximum matching are unique. This can be achieved by using a deterministic matching algorithm.) Recall that in each iteration of Algorithm B, we draw a realization and add its maximum matching to Q. Therefore, q_e also denotes the probability that we sample edge e in each iteration of Algorithm B. One can easily confirm that for any vertex v, we have Σ_{e ∋ v} q_e ≤ 1. Therefore, one can think of the q_e's as a fractional matching with some other nice properties. Denote this fractional matching by q. The reader soon notices the following useful properties of q:

(P1) For any edge e, we have q_e ≤ p.
Proof sketch. Each edge is realized w.p. p (throughout, we use w.p. to abbreviate "with probability") and thus appears in opt w.p. at most p.

(P2) For any set F ⊆ E, the expected matching of F has size at least q(F) = Σ_{e∈F} q_e.
Proof sketch. It suffices, for each realization, to consider the restriction of the omniscient optimum matching to F.

We set a threshold τ, for a sufficiently small constant, and partition the edges into two subsets: crucial edges C = {e : q_e ≥ τ} and non-crucial edges N = {e : q_e < τ}. Figure 2 illustrates the values of q_e over a simple example. In this example, each realized wavy edge appears in opt; thus these edges all have large q_e and are crucial. The edges in between are significantly less likely to be in opt, and for all of them q_e < τ; thus they are all considered non-crucial.

Note that q is merely a function of the graph's structure and is independent of our algorithms. Our goal is to show that within only R = O(log(1/p)/p) iterations, Algorithm B achieves our desired guarantee. To do this, we prove two canonical lemmas.

Crucial edges lemma (formally as Lemma 4.5).   Algorithm B samples almost all crucial edges. Therefore, by (P2), the expected matching among the sampled crucial edges has size at least (1 − ε)q(C), where ε is any desirably small constant (R and ε are interdependent).

For non-crucial edges, the argument above does not work. The reason is that, as illustrated in Figure 2, the number of non-crucial edges connected to a vertex can be much larger than the maximum degree of Q (which determines the number of per-vertex queries); thus, we can sample only a small portion of the non-crucial edges, which means that the q-value of the sampled non-crucial edges can be arbitrarily smaller than q(N). Instead, we take a different approach for non-crucial edges.

Non-crucial edges lemma (formally as Lemma 4.7).   One can construct a fractional matching x over the realized non-crucial edges of Q (i.e., over the edges in Q_p ∩ N) whose expected size is at least (1 − 10ε)q(N). Moreover, for any vertex v, x_v is no more than max{q_v^N, ε}, where q_v^N = Σ_{e ∋ v, e∈N} q_e is what we call the non-crucial budget of vertex v.

The precise proof of the non-crucial edges lemma is out of the scope of this section. However, it relies critically on the fact that q_e is small for non-crucial edges. If, for example, we used the same technique to construct a fractional matching for the crucial edges, we would only end up with a much smaller fractional matching.

The combination of the two lemmas above immediately implies a (nearly) 1/2 approximation. For this, one can easily show that q(C) + q(N) = opt, and thus either q(C) ≥ opt/2 or q(N) ≥ opt/2. In the former case, we can use the crucial edges lemma to argue that we get an almost 1/2 approximation, and in the latter case we can use the non-crucial edges lemma. However, as mentioned before, our goal is to provide a much better approximation guarantee than 1/2. Therefore, we have to show that the realized portions of the crucial and non-crucial edges can be combined to construct a much larger matching. To do this, we devise more involved procedures that construct large fractional matchings over the realized edges of Q by combining both crucial and non-crucial edges. Note that these procedures are merely analytical tools, and our algorithm is still Algorithm B.

For unweighted graphs, the procedure that we use — formalized as Procedure 5 — is roughly as follows: We first use the non-crucial edges lemma to construct a fractional matching x of size roughly q(N) on the non-crucial edges, without "looking" at the realization of crucial edges. Independently, we reveal the realized crucial edges and pick a large realized matching M among them. (For technical reasons, M is not simply the largest realized matching of crucial edges and has to be drawn according to a specific distribution. See Procedure 5 for more details.) Then, in our fractional matching x, we allocate the maximum possible fractional value to the edges in M while ensuring that x remains a valid fractional matching.

In Theorem 5.3, we give an analysis showing that Procedure 5 in expectation constructs a fractional matching of size (almost) (4√2 − 5)·opt. This implies that Algorithm B achieves an (almost) 4√2 − 5 ≈ 0.6568 approximation. We note that in the analysis, the second property of the non-crucial edges lemma, namely that the constructed fractional matching does not violate the non-crucial budget of any vertex, plays an important role.

While we have no upper bound on the best provable approximation factor for Algorithm B, we show that at least for Procedure 5, our analysis is tight. That is, we give an example in Lemma 5.5 for which the fractional matching constructed by Procedure 5 has size no more than (4√2 − 5)·opt.

#### Generalization to weighted graphs.

In generalizing our results to weighted graphs, we follow the same approach of partitioning the edges into crucial and non-crucial subsets. In fact, both the crucial and non-crucial edges lemmas can be adapted seamlessly to weighted graphs, leading to a simple (almost) half approximation as described above. However, we show that a large class of procedures (including Procedure 5) achieves no better than a half approximation for weighted graphs. The authors find this strikingly surprising, and it further highlights the true challenge in beating the half approximation for weighted graphs. As a result, the procedure that we use to bypass the half approximation for weighted graphs (formalized as Procedure 6) is much more intricate and achieves an approximation factor of only 0.501 (see Theorem 6.2).

## 4 The Algorithm

In this section, we introduce a non-adaptive algorithm formalized as Algorithm 1 as well as a number of analytical tools that we use in analyzing it for weighted and unweighted graphs. We note that for the sake of brevity, we did not attempt to optimize the constant factors in the description of Algorithm 1.

The main challenge in analyzing Algorithm 1 comes from the fact that the realizations that are picked may be very different from the actual realization G_p, on which the algorithm has to perform well. Take, for instance, the maximum matching M_1 of the first drawn realization G_1 that we add to Q during the first iteration of Algorithm 1. Since G_1 is drawn from the same distribution as the actual realization G_p, one can argue that the weight of M_1 is, in expectation, as large as opt. However, the problem is that only a p fraction of the edges in M_1 are expected to appear in G_p. This means that the realized matching found in round 1 alone guarantees only an approximation factor of p, which can be arbitrarily small. To achieve our desired approximation factor, we need to argue that the realized edges of Q can be combined with each other to construct a heavy matching. To show this, we introduce a procedure that constructs a large fractional matching over the realized edges of Q, and use it to argue that there must exist a heavy realized matching among the edges in Q.

For simplicity of the analysis, we assume that for any realization G_p of G, the maximum weighted matching of G_p is unique. This can be guaranteed by either using a deterministic algorithm for finding the matching or initially perturbing the edge weights by sufficiently small factors so that the maximum weighted matching becomes unique. Having this, we start with the following definition.

###### Definition 4.1.

For any edge e, we denote by q_e the probability with which e appears in the (unique) maximum weighted matching of the realization G_p. We refer to q_e as the matching probability of edge e. Moreover, for any edge subset F ⊆ E, we denote by q(F) = Σ_{e∈F} q_e the sum of the matching probabilities of the edges in F.

We further use φ(e) to denote q_e · w_e and use φ(F) to denote Σ_{e∈F} φ(e). We call φ(e) (resp. φ(F)) the expected matching weight of e (resp. F).

Now, based on their matching probabilities, we partition the edges into two sets of crucial and non-crucial edges.

###### Definition 4.2 (Crucial and non-crucial edges).

For a threshold τ to be fixed later, we call any edge e with q_e < τ a non-crucial edge and any edge with q_e ≥ τ a crucial edge. We denote by N the set of all non-crucial edges in E and by C the set of all crucial edges in E.
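The matching probabilities q_e are defined over the randomness of the realization alone; on small instances they can be estimated by Monte Carlo simulation, after which the crucial/non-crucial partition is a simple threshold test. A sketch under the usual simplification (a fixed greedy matching stands in for the unique maximum weighted matching; names are ours):

```python
import random

def estimate_q(edges, p, trials=2000, seed=1):
    """Monte Carlo estimate of q_e: the fraction of realizations in which
    edge e appears in the (greedy stand-in) matching."""
    rng = random.Random(seed)
    counts = {e: 0 for e in edges}
    for _ in range(trials):
        realized = [e for e in edges if rng.random() < p]
        matched = set()
        for (u, v) in realized:   # greedy matching in fixed edge order
            if u not in matched and v not in matched:
                counts[(u, v)] += 1
                matched.update((u, v))
    return {e: c / trials for e, c in counts.items()}

def partition_edges(q, tau):
    """Split edges into crucial (q_e >= tau) and non-crucial (q_e < tau)."""
    crucial = {e for e, qe in q.items() if qe >= tau}
    return crucial, set(q) - crucial
```

For a single edge realized with probability p, the estimate of q_e converges to p, consistent with property (P1).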

We start with a couple of simple observations that will help both in gaining more insights on the definitions above and will be useful in our proofs later.

###### Observation 4.3.

We have φ(C) + φ(N) = opt.

###### Proof.

By definition, we know opt = Σ_{e∈E} q_e · w_e = φ(E). Since E = C ∪ N and C ∩ N = ∅, we have φ(C) + φ(N) = φ(E) = opt. ∎

###### Observation 4.4.

An edge e is chosen to be in the set Q by Algorithm 1 with probability exactly 1 − (1 − q_e)^R.

###### Proof.

In each iteration of Algorithm 1, edge e appears in the maximum weighted matching with probability exactly q_e. Since Algorithm 1 is composed of R independent iterations (i.e., the realizations picked at different rounds are independent of each other), the probability that edge e is not picked in any of these rounds is (1 − q_e)^R, and therefore it appears in Q with probability 1 − (1 − q_e)^R. ∎

As demonstrated by Observation 4.4, the crucial edges have a higher chance of appearing in the sample Q. In fact, each crucial edge is sampled in each iteration of Algorithm 1 with probability at least τ, and the number R of iterations of Algorithm 1 is much larger than 1/τ; thus we expect almost every crucial edge to be sampled in Q. We formalize this intuition in the following lemma, whose proof we defer to Appendix A.2.
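The quantitative version of this intuition is the formula of Observation 4.4: an edge with per-iteration matching probability q_e appears in Q with probability 1 − (1 − q_e)^R, which rapidly approaches 1 once R is much larger than 1/q_e. A one-line illustration:

```python
def sample_probability(q, R):
    """Probability that an edge with per-iteration matching probability q
    appears in Q after R independent iterations (Observation 4.4)."""
    return 1 - (1 - q) ** R
```

For example, a crucial edge with q_e = 0.1 already appears in Q with probability above 0.99 after R = 50 iterations, while a non-crucial edge with q_e = 0.001 appears with probability below 0.05.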

###### Lemma 4.5 (crucial edges lemma).

Let Q be the sample obtained by Algorithm 1. Then, we have E[φ(C ∩ Q)] ≥ (1 − ε)φ(C).

###### Observation 4.6.

For any edge set Q, the expected weight of the maximum realized matching of Q satisfies M(Q) ≥ φ(Q).

###### Proof.

Consider the restriction of the omniscient optimum matching to Q, which is clearly a valid realized matching of Q. It suffices to show that its expected weight is at least φ(Q). Note that any edge of Q that appears in the omniscient optimum matching also appears in this restriction; therefore, each edge e ∈ Q appears in it with probability q_e. This means that its expected weight is Σ_{e∈Q} q_e · w_e = φ(Q), as desired. ∎

The combination of Lemma 4.5 and Observation 4.6 implies that Algorithm 1 achieves an expected matching of weight at least (1 − ε)φ(C). This implies that if φ(C) is sufficiently close to opt (which, by Observation 4.3, is equivalent to φ(N) being small), Algorithm 1 obtains a good approximation. However, it might be the case that the expected weight of the crucial edges is very small, or even that φ(C) = 0 with φ(N) being close to opt. To handle this, we need a different argument for non-crucial edges. The challenge is that the matching probability of a non-crucial edge can be arbitrarily small, and may even depend on n. Consider, for example, the complete bipartite graph K_{n,n} with all edge weights equal to 1 (i.e., the graph is unweighted). One can show that the expected matching of G_p is almost perfect with high probability (see e.g., [BDH15]), while the matching probability of every edge in it is roughly 1/n. (Here, for the sake of this example, we assume that the algorithm used to obtain the maximum matching of a realization is not biased towards including any specific edge.) Therefore, since Q is of constant degree, φ(Q) will not be even a constant fraction of opt, and we cannot use Observation 4.6 to argue that M(Q) is large.

To alleviate the above-mentioned problem, we need to be able to get a large matching among the non-crucial edges too. This is the issue that we address next.

#### A lemma for non-crucial edges.

We describe a procedure — formalized as Procedure 4 — to construct a heavy fractional matching on the realized portion of the non-crucial edges of the sample, which also enjoys some other properties of interest. For simplicity of notation, we denote the set of sampled edges by S and its realized subset by S_p.

Procedure 4. Constructs a fractional matching x^N on the non-crucial realized edges of S.   For any edge e, initially set x̃_e^N = 0. Then update it as follows:

1. For any realized sampled non-crucial edge e (i.e., e ∈ S_p ∩ N), set x̃_e^N = f_e/p, where f_e denotes the fraction of iterations of Algorithm 1 in which edge e is part of the picked matching.

2. Initially set the scaling factor s_e of each edge to 1. Then loop over the vertices in an arbitrary order, and for any edge e incident to the current vertex v, update

    s_e ← min{ s_e, max{q_v^N, ε} / x̃_v^N },

where q_v^N = Σ_{e ∋ v, e∈N} q_e denotes the non-crucial weight of vertex v and x̃_v^N = Σ_{e ∋ v} x̃_e^N.

3. Scale down the fractional matching in the following way: for any edge e, set x_e^N = s_e · x̃_e^N.
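The steps above can be sketched in a few lines of Python, under our (assumed) reading of the scaling step: each edge is scaled by the most restrictive of its two endpoint caps. All names are illustrative, not the paper's:

```python
def build_fractional_matching(f, p, qN, eps):
    """Sketch of the procedure for non-crucial edges.
    f[e]:  fraction of iterations whose matching contained the
           realized sampled non-crucial edge e (only such edges appear in f).
    qN[v]: non-crucial weight of vertex v.
    Returns the fractional values x^N on these edges."""
    x_tilde = {e: fe / p for e, fe in f.items()}          # step 1
    s = {e: 1.0 for e in x_tilde}                          # step 2: scaling factors
    vertices = {v for e in x_tilde for v in e}
    for v in vertices:
        xv = sum(val for e, val in x_tilde.items() if v in e)
        if xv > 0:
            cap = max(qN.get(v, 0.0), eps) / xv
            for e in x_tilde:
                if v in e:
                    s[e] = min(s[e], cap)
    return {e: s[e] * x_tilde[e] for e in x_tilde}        # step 3
```

On a single edge with f_e = 0.1, p = 1/2, and budgets qN = {1: 0.1, 2: 0.3}, the tighter endpoint cap halves x̃_e = 0.2 down to 0.1, so neither endpoint exceeds its non-crucial budget.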

The following lemma highlights the properties of the procedure above.

###### Lemma 4.7 (non-crucial edges lemma).

The fractional matching x^N obtained by Procedure 4 has the following properties:

1. For any U ⊆ V with |U| ≤ 1/ε, the matching x^N fills only an ε fraction of its blossom inequality. That is,

    x^N(U) ≤ ε⌊|U|/2⌋   ∀ U ⊆ V : |U| ≤ 1/ε.

2. The non-crucial budgets of the vertices are (almost) preserved. More precisely,

    x_v^N ≤ max{q_v^N, ε}   ∀ v ∈ V.

3. The expected weight of the fractional matching is sufficiently close to that of the non-crucial edges, i.e.,

    E[ Σ_{e∈S_p∩N} x_e^N · w_e ] ≥ (1 − 10ε) φ(N).

#### The intuition behind Procedure 4.

Observe that the fractional matching constructed by Procedure 4 relies critically on $\tilde{q}_e$, the fraction of iterations in which edge $e$ is sampled by the algorithm. Recall that the probability with which Algorithm 1 samples an edge $e$ is precisely equal to $q_e$. Therefore it is not hard to see that $\mathbb{E}[\tilde{q}_e] = q_e$. Similar to $q$, we can see the collection of $\tilde{q}_e$'s on all edges as a fractional matching. In this regard, since $\mathbb{E}[\tilde{q}_e] = q_e$, the two fractional matchings have the same weight in expectation. Despite these similarities, note that by definition, $\tilde{q}$ is non-zero only on the edges sampled by Algorithm 1. This is desirable since we want to construct a large fractional matching only on the sampled edges. However, we further want our fractional matching to be non-zero only on the realized sampled edges. To do this, the final fractional matching that we construct is roughly as follows: $x^N_e$ is $\tilde{q}_e/p$ if $e$ is realized and it is 0 otherwise. Since each edge is realized with probability $p$, we have $\mathbb{E}[x^N_e] \approx \tilde{q}_e$. Note, however, that we have to make sure that $x^N$ is a valid fractional matching; that is, it should not assign a fractional matching of larger than 1 to any vertex (properties 1 and 2 even impose stricter restrictions). To do this, we may have to manually scale down the value of $x^N$ after observing the realization. However, we need to argue that this does not hurt its total size by a significant factor. For this, we use the fact that $\tilde{q}_e$ is very small for most non-crucial edges, due to its value being close to $q_e$, which is small for all non-crucial edges. This, combined with the independence of edge realizations, indicates, e.g., that it is very unlikely that the fractional matching around a vertex exceeds 1 by a significant factor. Note that, unfortunately, the same procedure does not provide a good approximation on the crucial edges. The reason is that for crucial edges, $\tilde{q}_e$ can be as large as a constant, and the probability that the fractional matching around a vertex exceeds 1 would not be negligible.
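The concentration claim can be made concrete with a back-of-the-envelope computation (the numbers below are illustrative, not from the paper): if a vertex has $n$ incident non-crucial edges, each assigned value $1/(np)$ and realized independently with probability $p$, then its total value exceeds $1.1$ only when a Binomial$(n, p)$ variable exceeds its mean by 10 percent, which is extremely unlikely for large $np$:

```python
import math

def binom_tail(n, p, k):
    """Exact Pr[Bin(n, p) >= k], computed via log-pmf for numeric stability."""
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * log_p + (n - i) * log_q)
        total += math.exp(log_pmf)
    return total

# A vertex with n = 20000 incident small edges, each of value 1/(n*p) and
# realized independently with probability p = 0.05: the expected total is
# exactly 1, and exceeding 1.1 requires Bin(n, p) >= 1.1 * n * p.
n, p = 20000, 0.05
tail = binom_tail(n, p, int(1.1 * n * p))
```

Here `tail` comes out well below one percent, in line with the intuition that many tiny independent contributions concentrate sharply around their mean.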

As for the proof of Lemma 4.7, note that the first and the second properties are directly satisfied by Procedure 4. To see this, observe that for any edge $e$, we have $x^N_e \le \epsilon^3$. This means that for any subset $U$ of the vertices, we have

$$x^N(U) \le \epsilon^3 \cdot \binom{|U|}{2} = \epsilon^3 \cdot \frac{|U| \cdot (|U|-1)}{2},$$

which implies for any $U$ with $|U| \le 1/\epsilon$, that

$$x^N(U) \le \epsilon^2 \cdot \frac{|U|}{2} \le \epsilon \left\lfloor |U|/2 \right\rfloor,$$

completing the proof of property 1. Property 2 is also simple to prove. In fact, steps 2 and 3 of Procedure 4 are solely written to satisfy this property. To see this, take a vertex $v$: if $\tilde{x}^N_v \le \max\{q^N_v, \epsilon\}$, the non-crucial budget of $v$ is preserved since the scaling-factors are no more than 1. Otherwise, by the end of step 2 we ensure that for any edge $e$ incident to $v$ we have $s_e \le \max\{q^N_v, \epsilon\}/\tilde{x}^N_v$. Thus, once step 3 is completed, we have

$$x^N_v = \sum_{e \ni v} \tilde{x}^N_e \cdot s_e \le \tilde{x}^N_v \cdot \max\{q^N_v, \epsilon\}/\tilde{x}^N_v = \max\{q^N_v, \epsilon\},$$

which is the desired bound for property 2. It only remains to prove that the fractional matching assigned to the realized sampled non-crucial edges is large as required by property 3. The proof of this part is rather technical and to prevent interruptions to the flow of the paper, we defer it to Appendix A.

#### Implications.

By coupling Lemma 4.5 and Lemma 4.7 we immediately get an analysis that ensures Algorithm 1 obtains an (almost) $1/2$ approximation. To see this, recall by Observation 4.3 that $\varphi(C) + \varphi(N) = \textsc{opt}$; thus, either $\varphi(C) \ge \textsc{opt}/2$ or $\varphi(N) \ge \textsc{opt}/2$. If $\varphi(C) \ge \textsc{opt}/2$, then Lemma 4.5 implies that the expected matching weight of our sample is at least $(1-\epsilon)\,\textsc{opt}/2$. On the other hand, if $\varphi(N) \ge \textsc{opt}/2$, the fractional matching obtained by Lemma 4.7, which also satisfies the blossom inequalities, implies that an integral matching of weight at least $(1-O(\epsilon))\,\textsc{opt}/2$ must exist in the realization.

###### Corollary 4.8.

For any desirably small $\epsilon > 0$, Algorithm 1 provides a $(1/2 - \epsilon)$ approximation for weighted graphs by querying a number of edges per vertex that depends only on $\epsilon$ and $p$.

Note that Corollary 4.8 already improves the number of per-vertex queries of known results for weighted graphs due to [BR18, YM18]. Our goal, however, is to provide a much better guarantee on the approximation factor. Suppose, for example, that $\varphi(C) = \varphi(N) = \textsc{opt}/2$. In this case, to achieve any approximation factor better than $1/2$, we need to argue that the crucial edges and the non-crucial edges can augment each other to obtain a matching that is much heavier than what either achieves individually. This is the issue that we address in the next two sections.

## 5 Beyond Half Approximation – Unweighted Graphs

In this section, we devise a process that constructs a large fractional matching on the realized graph by assigning values to both crucial and non-crucial edges. For non-crucial edges, we follow Procedure 4 in obtaining the fractional matching. For crucial edges, however, we take a different approach in constructing the fractional matching. Before describing the actual procedure, we emphasize the following property of Procedure 4, which is necessary for augmenting it with crucial edges.

###### Observation 5.1.

Procedure 4 does not look at how the crucial edges are realized.

Intuitively, the observation above tells us that the large fractional matching that we obtain on realized non-crucial edges does not adversarially affect the realization of crucial edges, since Procedure 4 is essentially unaware of how the crucial edges are realized. As such, if we are able to construct a large realized fractional matching on the crucial edges that (1) does not violate the crucial budgets of the vertices or the blossom inequalities, and (2) does not "look" at the realization of the non-crucial edges, then we can plug the two fractional matchings together to obtain a valid fractional matching that combines both non-crucial and crucial edges. This is, unfortunately, not possible on the crucial edges; the main obstacle is preserving the per-vertex budgets.

To illustrate the above-mentioned problem, consider a graph with $2n$ vertices and $n$ edges where each vertex is connected to exactly one edge, i.e., the graph is a matching of size $n$. Any of these edges that is realized will be part of the realized matching; thus, for any edge $e$ in this graph we have $q_e = p$, which means they are all crucial edges and we have $q(C) = np$. Note that the crucial budget of each of the vertices is $p$. Therefore, if we want to preserve these crucial budgets on the realized crucial edges, the fractional value that we assign to each realized edge would be at most $p$ (instead of 1), implying that the fractional matching that we get would have a total weight of $np^2$ in expectation, which is only a $p$ fraction of $q(C)$.
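The loss in this example is easy to quantify; a tiny numeric check (with an illustrative $n$ and $p$):

```python
# A graph that is a perfect matching with n edges, each realized w.p. p.
# Every edge is crucial with q_e = p, so q(C) = n * p. If each realized edge
# may only receive a fractional value of at most p (its crucial budget),
# the expected total weight is n * p * p, i.e., only a p fraction of q(C).
n, p = 1000, 0.1
q_C = n * p
capped_weight = n * p * p
assert abs(capped_weight / q_C - p) < 1e-12
```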

Recall that preserving the crucial/non-crucial per-vertex budgets was to ensure that, once we combine the crucial and non-crucial fractional matchings, the total fractional matching connected to each vertex does not exceed 1. To achieve this, a slightly weaker constraint is also sufficient. Consider a vertex $v$ with non-crucial budget $q^N_v$ and crucial budget $q^C_v$. If the non-crucial budget $q^N_v$ is much smaller than 1, we can allow the crucial fractional matching to assign a value of (roughly) up to $1 - q^N_v$ to the edges connected to $v$. This, for instance, resolves the issue of the example in the previous paragraph. Thus, it only remains to argue that one can find a large such fractional matching on realized crucial edges. We formalize the procedure for doing this as Procedure 5.

**Procedure 5.** Constructing a fractional matching $x^C$ for unweighted graphs on the realized crucial edges of $S$. Input: the realized portion $R_C$ of the sampled crucial edges.

For any matching $\mu$, define the appearance-probability of $\mu$ to be the probability that $\mu$ is exactly the portion of the omniscient optimum that appears among the sampled crucial edges, given the realization of the crucial edges. Formally,

$$q(\mu \mid R_C) = \Pr\left[\mu = M(E_p) \cap S_p \cap C \,\middle|\, E_p \cap C = R_C\right].$$

Among all matchings, we draw one according to the appearance-probabilities. Let us denote this matching by $\mu$. For any edge $e = (u, v) \in \mu$, set

$$x^C_e \leftarrow (1-\epsilon)\min\{1 - q^N_v,\ 1 - q^N_u\},$$

and for any other edge we set $x^C_e \leftarrow 0$.
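As a sketch only: Procedure 5 is an analytical construction (the appearance-probabilities come from the analysis, not from anything the algorithm computes), but its sampling-and-capping step can be written out in code with a made-up distribution over candidate matchings; all names below are hypothetical:

```python
import random

def crucial_fractional_matching(appearance_probs, qN, eps=0.1, rng=None):
    """Draw a matching according to its appearance-probability, then give each
    of its edges (u, v) the value (1 - eps) * min(1 - qN[u], 1 - qN[v]).
    `appearance_probs` maps candidate matchings (frozensets of edges) to
    probabilities; `qN` maps vertices to their non-crucial budgets."""
    rng = rng or random.Random(0)
    matchings = list(appearance_probs)
    mu = rng.choices(matchings, weights=[appearance_probs[m] for m in matchings])[0]
    return {(u, v): (1 - eps) * min(1 - qN.get(u, 0.0), 1 - qN.get(v, 0.0))
            for (u, v) in mu}
```

For instance, with a single candidate matching `frozenset({(1, 2)})` of probability 1 and budgets `{1: 0.3, 2: 0.5}`, the edge receives $0.9 \cdot \min(0.7, 0.5) = 0.45$.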

We first show that by combining Procedures 4 and 5 we can obtain a $(4\sqrt{2}-5)$ approximation for unweighted graphs. Define the fractional matching $x$ as follows:

$$x_e := x^N_e \quad \forall e \in N, \qquad x_e := x^C_e \quad \forall e \in C. \tag{1}$$
###### Claim 5.2.

$x$ is a valid fractional matching that satisfies the blossom inequalities for all vertex subsets of size up to $1/\epsilon$.

###### Proof.

Fix any arbitrary subset $U \subseteq V$ of size at most $1/\epsilon$. Lemma 4.7 guarantees that the fractional matching on the non-crucial edges inside $U$ has size at most $\epsilon \lfloor |U|/2 \rfloor$. On the other hand, since $\mu$ is an integral matching, it has at most $\lfloor |U|/2 \rfloor$ edges inside $U$. Since the fractional matching that we assign to each edge of $\mu$ is at most $1-\epsilon$, overall the total size of the fractional matching assigned to the edges in $U$ cannot be more than $\epsilon \lfloor |U|/2 \rfloor + (1-\epsilon)\lfloor |U|/2 \rfloor = \lfloor |U|/2 \rfloor$. ∎

###### Theorem 5.3.

If $G$ is unweighted, the fractional matching constructed by Procedures 4 and 5 has expected size at least $(1-2\epsilon)(4\sqrt{2}-5) \cdot \textsc{opt}$. Therefore, Algorithm 1, in expectation, achieves an approximation factor of at least $(1-2\epsilon)(4\sqrt{2}-5)$.

###### Proof.

Let us denote by $\textsc{alg}$ the size of our fractional matching $x$. We know by definition that $\textsc{alg} = \sum_{e \in N} x^N_e + \sum_{e \in C} x^C_e$. It can be deduced from property 3 of Lemma 4.7 that

$$\mathbb{E}\left[\sum_{e \in N} x^N_e\right] \ge (1-\epsilon)\varphi(N) = (1-\epsilon)q(N), \tag{2}$$

where the latter equality is due to the assumption that the graph is unweighted. Our goal, now, is to show that $\sum_{e \in C} x^C_e$ is also large. Take a crucial edge $e = (u, v)$. We know that Algorithm 1 picks $e$ with probability at least $1-\epsilon$ since $e$ is a crucial edge. Assuming that $e$ is picked by Algorithm 1, $e$ is part of the matching $\mu$ picked by Procedure 5 with probability at least $q_e$. And if $e$ is part of $\mu$, the fractional matching that will be assigned to it is $(1-\epsilon)\min\{1-q^N_v, 1-q^N_u\}$. Thus, for any crucial edge $e = (u, v)$, we have

$$\mathbb{E}[x^C_e] \ge (1-\epsilon) \cdot q_e \cdot (1-\epsilon)\min\{1-q^N_v, 1-q^N_u\} \ge (1-2\epsilon)\, q_e \cdot \min\{1-q^N_v, 1-q^N_u\}.$$

To get rid of the minimization above, we direct the crucial edges towards their endpoint with the higher non-crucial budget. Formally, a crucial edge $e = (u, v)$ is directed towards its endpoint $v$ if $q^N_u < q^N_v$, and in case of a tie (i.e., if $q^N_u = q^N_v$), we break it arbitrarily. For any vertex $v$, we denote its incoming crucial edges by $C^-_v$ and use $q^{C-}_v$ to denote the total matching probability of the edges that are directed towards $v$. With these definitions, we have

$$\mathbb{E}\left[\sum_{e \in C} x^C_e\right] \ge \sum_v (1-2\epsilon)(1-q^N_v)\, q^{C-}_v = (1-2\epsilon)\sum_v \left(q^{C-}_v - q^N_v q^{C-}_v\right) = (1-2\epsilon)\, q(C) - (1-2\epsilon)\sum_v q^N_v q^{C-}_v. \tag{3}$$

Combining (2) and (3) we get

$$\mathbb{E}[\textsc{alg}] = \mathbb{E}\left[\sum_{e \in N} x^N_e\right] + \mathbb{E}\left[\sum_{e \in C} x^C_e\right] \ge (1-\epsilon)q(N) + (1-2\epsilon)q(C) - (1-2\epsilon)\sum_v q^N_v q^{C-}_v \ge (1-2\epsilon)\left(q(N) + q(C) - \sum_v q^N_v q^{C-}_v\right).$$

On the other hand, recall that $\textsc{opt} = q(N) + q(C)$; thus we have

$$\frac{\mathbb{E}[\textsc{alg}]}{\textsc{opt}} \ge \frac{(1-2\epsilon)\left(q(N) + q(C) - \sum_v q^N_v q^{C-}_v\right)}{q(N) + q(C)} \ge (1-2\epsilon)\left(1 - \frac{\sum_v q^N_v q^{C-}_v}{q(N) + q(C)}\right). \tag{4}$$

Note that since each crucial edge is directed towards exactly one of its endpoints, we have $\sum_v q^{C-}_v = q(C)$. On the other hand, we have $\sum_v q^N_v = 2q(N)$, since the matching probability of each non-crucial edge $(u, v)$ contributes to both $q^N_u$ and $q^N_v$. Combining these two observations, we have

$$q(N) + q(C) = \sum_v \left(q^{C-}_v + \frac{q^N_v}{2}\right). \tag{5}$$
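The counting identity above can be sanity-checked on a toy instance; the edges and probabilities below are made up for illustration:

```python
# Edges as (u, v, q_e, is_crucial). Direct each crucial edge towards the
# endpoint with the larger non-crucial budget q^N, then verify
# q(N) + q(C) == sum over v of (q^{C-}_v + q^N_v / 2).
edges = [(1, 2, 0.1, False), (2, 3, 0.2, False), (1, 3, 0.5, True), (3, 4, 0.3, True)]

qN = {v: 0.0 for v in (1, 2, 3, 4)}       # non-crucial budget of each vertex
for u, v, q, crucial in edges:
    if not crucial:
        qN[u] += q
        qN[v] += q

qC_in = {v: 0.0 for v in qN}              # q^{C-}_v: incoming crucial mass
for u, v, q, crucial in edges:
    if crucial:
        head = v if qN[v] >= qN[u] else u
        qC_in[head] += q

lhs = sum(q for _, _, q, c in edges)      # q(N) + q(C)
rhs = sum(qC_in[v] + qN[v] / 2 for v in qN)
assert abs(lhs - rhs) < 1e-9
```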

Combining (4) and (5) we get

$$\frac{\mathbb{E}[\textsc{alg}]}{\textsc{opt}} \ge (1-2\epsilon)\left(1 - \frac{\sum_v q^N_v q^{C-}_v}{\sum_v \left(q^{C-}_v + q^N_v/2\right)}\right). \tag{6}$$

We use the following mathematical lemma to show the desired bound on this ratio.

###### Lemma 5.4.

Given any set of numbers $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$ such that (i) $a_v \in [0, 1]$, $b_v \in [0, 1]$, and $a_v + b_v \le 1$ for any $v \in [n]$, and (ii) $\sum_v (b_v + a_v/2) > 0$, we have

$$\frac{\sum_v a_v b_v}{\sum_v (b_v + a_v/2)} \le 6 - 4\sqrt{2}.$$

For any vertex $v$, we have $a_v = q^N_v$ and $b_v = q^{C-}_v$, and clearly $a_v + b_v \le 1$ since $q$ is a valid fractional matching and the amount of matching incident to each vertex is at most 1; therefore, condition (i) of Lemma 5.4 is satisfied. Furthermore, condition (ii) of Lemma 5.4 also holds so long as $\sum_v (q^{C-}_v + q^N_v/2) > 0$, which is always the case unless the graph is empty. Thus we have

$$\frac{\sum_v q^N_v q^{C-}_v}{\sum_v (q^{C-}_v + q^N_v/2)} \le 6 - 4\sqrt{2}, \qquad \text{therefore,} \qquad 1 - \frac{\sum_v q^N_v q^{C-}_v}{\sum_v (q^{C-}_v + q^N_v/2)} \ge 1 - (6 - 4\sqrt{2}) = 4\sqrt{2} - 5.$$

Replacing this in Inequality (6), we get $\mathbb{E}[\textsc{alg}] \ge (1-2\epsilon)(4\sqrt{2}-5) \cdot \textsc{opt}$, which is the desired bound in Theorem 5.3. ∎
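The heart of Lemma 5.4 is a single-vertex optimization: by the mediant inequality, the ratio of sums is at most the largest per-index ratio $a_v b_v/(b_v + a_v/2)$ subject to $a_v + b_v \le 1$, whose maximum is $6 - 4\sqrt{2} \approx 0.3431$, attained at $a = 2-\sqrt{2}$, $b = \sqrt{2}-1$. A brute-force grid search (illustrative, not part of the paper's proof) confirms the value:

```python
import math

# Maximize a*b / (b + a/2) over a grid of (a, b) pairs with a + b <= 1.
steps = 1000
best = 0.0
for i in range(steps + 1):
    a = i / steps
    for j in range(steps - i + 1):  # j <= steps - i enforces a + b <= 1
        b = j / steps
        denom = b + a / 2
        if denom > 0:
            best = max(best, a * b / denom)

bound = 6 - 4 * math.sqrt(2)
assert best <= bound + 1e-6       # never beats the analytic bound
assert bound - best < 1e-3        # and the grid gets essentially all of it
```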

We next show that our analysis in Theorem 5.3 for the fractional matching constructed via the above-mentioned procedures is tight.

###### Lemma 5.5.

There exists a bipartite unweighted graph $G$ for which the fractional matching constructed via Procedures 4 and 5 provides an approximation factor of at most $4\sqrt{2} - 5 + o(1)$.

###### Proof.

Figure 3: An unweighted bipartite graph for which the fractional matching composed of Procedures 4 and 5 does not provide a better than $4\sqrt{2}-5$ approximation.

For a sufficiently large $L$, construct a graph $G$ (refer to Figure 3 for the illustration of the graph) with four sets of vertices $A$, $B$, $B'$, and $A'$, each of size $L$; i.e., the graph has $4L$ vertices in total. There is a complete bipartite graph between the vertices in $B$ and $B'$. There is also a perfect matching between $A$ and $B$ and a perfect matching between $A'$ and $B'$. Moreover, we set the realization probability of the graph to be $p = \sqrt{2} - 1$. The optimal way of constructing a matching in a realization of $G$ is to first add all the realized edges between $A$ and $B$ or $A'$ and $B'$ to the matching, and then complement it via the realized edges between the unmatched vertices in $B$ and $B'$. Since there is a complete bipartite graph between the unmatched vertices in $B$ and $B'$, one can find a realized matching that is almost perfect; that is, this realized matching matches a $1-o(1)$ fraction of the unmatched vertices in $B$ and $B'$. Thus, overall, we have

$$\mathbb{E}[\textsc{opt}] = \underbrace{p \times 2L}_{\text{matching between } A \text{ and } B \text{ or between } A' \text{ and } B'} + \underbrace{(1-o(1))(1-p)L}_{\text{matching between } B \text{ and } B'} \ge (1 + p - o(1))L \ge (\sqrt{2} - o(1))L.$$