 # Lower bounds for maximal matchings and maximal independent sets

There are distributed graph algorithms for finding maximal matchings and maximal independent sets in O(Δ + log* n) communication rounds; here n is the number of nodes and Δ is the maximum degree. The lower bound by Linial (1992) shows that the dependency on n is optimal: these problems cannot be solved in o(log* n) rounds even if Δ = 2. However, the dependency on Δ is a long-standing open question, and there is currently an exponential gap between the upper and lower bounds. We prove that the upper bounds are tight. We show that maximal matchings and maximal independent sets cannot be found in o(Δ + log log n / log log log n) rounds. Our lower bound holds for deterministic and randomized distributed algorithms in the LOCAL model of distributed computing.


## 1 Introduction

There are four classic problems that have been studied extensively in distributed graph algorithms since the very beginning of the field in the 1980s: maximal independent set (MIS), maximal matching (MM), vertex coloring with Δ + 1 colors, and edge coloring with 2Δ − 1 colors; here Δ is the maximum degree of the graph. All of these problems are trivial to solve with a greedy centralized algorithm, but their distributed computational complexity has remained an open question.

In this work, we resolve the distributed complexity of MIS and MM in the region Δ ≪ log log n. In this region, for the LOCAL model [27, 33] of distributed computing, the fastest known algorithms for these problems are:

• MM is possible in O(Δ + log* n) rounds.

• MIS is possible in O(Δ + log* n) rounds.

Nowadays we know how to find a vertex or edge coloring with O(Δ) colors in o(Δ) + O(log* n) rounds [14, 4]. Hence the current algorithms for both MIS and MM are conceptually very simple: color the vertices or edges with O(Δ) colors, and then construct an independent set or matching by going through all color classes one by one. The second part is responsible for the O(Δ) term in the running time, and previously it was not known whether this term is necessary.
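The sweep over color classes can be illustrated with a small sequential simulation. This is only a sketch under assumptions: the input format (`adj`, `color`) and the function name `mis_from_coloring` are hypothetical, and the loop over classes stands in for the communication rounds of a distributed algorithm.

```python
def mis_from_coloring(adj, color):
    """Sequentially simulate the sweep over colour classes.

    adj:   dict mapping each node to the set of its neighbours (assumed format)
    color: dict mapping each node to its colour in a proper vertex colouring
    """
    in_mis = set()
    # One step per colour class; distributedly, each step costs one round.
    for c in sorted(set(color.values())):
        for v in adj:
            # Nodes of one colour class are pairwise non-adjacent,
            # so they could all decide in parallel.
            if color[v] == c and not (adj[v] & in_mis):
                in_mis.add(v)
    return in_mis


# Example: a 5-cycle with a proper 3-colouring.
adj = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
color = {0: 0, 1: 1, 2: 0, 3: 1, 4: 2}
mis = mis_from_coloring(adj, color)
```

The number of sweep steps equals the number of colors, which is where the linear-in-Δ term comes from when the coloring uses O(Δ) colors.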

#### Prior work.

Already in the 1990s we had a complete understanding of the term log* n:

• MM is not possible in f(Δ) + o(log* n) rounds for any f [26, 27, 30].

• MIS is not possible in f(Δ) + o(log* n) rounds for any f [26, 27, 30].

Here the upper bounds are deterministic and the lower bounds hold also for randomized algorithms.

However, we have had no lower bounds that would exclude the existence of algorithms of complexity o(Δ) + O(log* n) for either of these problems [5, 34]. For regular graphs we have not even been able to exclude the possibility of solving both of these problems in time O(log* n), while for the general case the best lower bound as a function of Δ was Ω(log Δ / log log Δ) [23, 24, 25].

#### Contributions.

We close the gap and prove that the current upper bounds are tight. There is no algorithm of complexity o(Δ) + O(log* n) for MM or MIS. More precisely, our main result is:

There is no randomized distributed algorithm that solves MM or MIS in o(Δ + log log n / log log log n) rounds in the LOCAL model (with high probability).

There is no deterministic distributed algorithm that solves MM or MIS in o(Δ + log n / log log n) rounds in the LOCAL model.

As corollaries, we have a new separation and a new equivalence in the Δ ≪ log log n region:

• MM and MIS are strictly harder than (Δ + 1)-vertex coloring and (2Δ − 1)-edge coloring.

• MM and MIS are exactly as hard as greedy coloring.

Figure 1: Distributed algorithms for maximal matching: upper bounds (blue dots) and lower bounds (orange regions). Filled dots are randomized algorithms and filled regions are lower bounds for randomized algorithms; white dots are deterministic algorithms and white regions are lower bounds for deterministic algorithms. The running time is represented here in the form O(f(Δ) + g(n)), the horizontal axis represents the f(Δ) term, and the vertical axis represents the g(n) term.

#### Plan.

We will present a simpler version of our new linear-in-Δ lower bound in Section 3. There we will look at a restricted setting of distributed computing (deterministic algorithms in the port-numbering model) and use it to explain the key ideas. In Section 4 we will then see how to extend the result to randomized algorithms in the usual LOCAL model of computing.

## 2 Related work

#### MIS and MM.

For MIS and MM, as well as for other classical symmetry-breaking problems, there are three major families of algorithms:

• Deterministic algorithms for a small Δ, with a complexity of the form f(Δ) + O(log* n).

• Deterministic algorithms for a large Δ, with superlogarithmic complexities as a function of n.

• Randomized algorithms for a large Δ, with sublogarithmic complexities as a function of n.

We summarize the state of the art in Table 1 and Figure 1; see e.g. Alon et al., Luby [28, 29], Israeli and Itai, Hanckowiak et al. [20, 19], and Barenboim et al. [6, 8] for more prior work on maximal matchings and maximal independent sets.

Previously, it was not known if any of these algorithms are optimal. In essence, there have been only two lower bound results:

• Linial [26, 27] and Naor show that there is no deterministic or randomized algorithm for MM or MIS that runs in o(log* n) rounds, even if we have Δ = 2.

• Kuhn et al. [23, 24, 25] show that there is no deterministic or randomized algorithm for MM or MIS that runs in o(log Δ / log log Δ + √(log n / log log n)) rounds.

Hence, for example, when we look at the fastest MM algorithms, dependency on in is optimal, and dependency on in is near-optimal. However, could we get the best of the both worlds and solve MM or MIS in e.g.  rounds?

In this work we show that the answer is no. There is no algorithm for either of these problems that runs in time o(Δ + log log n / log log log n), and hence certainly no algorithm that runs in time O(log Δ + log* n). The current upper bounds for the case of a small Δ are optimal. Moreover, our work shows there is not much room for improvement in the dependency on n in the algorithm by Fischer, either.

With this result we resolve Open Problem 11.6 in Barenboim and Elkin, and present a proof for the conjecture of Göös et al.

#### Coloring.

It is interesting to compare MIS and MM with the classical distributed coloring problems: vertex coloring with Δ + 1 colors and edge coloring with 2Δ − 1 colors. As recently as 2014, the fastest algorithms for all of these problems in the “small Δ” region had the same complexity as MIS and MM, O(Δ + log* n) rounds. However, in 2015 the paths diverged: Barenboim and Fraigniaud et al. presented algorithms for graph coloring in o(Δ) + O(log* n) rounds, and hence we now know that coloring is strictly easier than MM or MIS in the small-Δ region.

However, there is a variant of (Δ + 1)-vertex coloring that is closely related to MIS: greedy coloring. Greedy coloring is trivially at least as hard as MIS, as color class 1 in any greedy coloring gives an MIS. On the other hand, greedy coloring is possible in time O(Δ + log* n), as we can turn an O(Δ)-vertex coloring into a greedy coloring in O(Δ) rounds (and this was actually already known to be tight). Now our work shows that greedy coloring is exactly as hard as MIS. In a sense this is counterintuitive: finding just color class 1 of a greedy coloring is already asymptotically as hard as finding the entire greedy coloring.

#### Restricted lower bounds.

While no linear-in-Δ lower bounds for MM or MIS were known previously for the usual LOCAL model of distributed computing, there was a tight bound for a toy model of distributed computing: deterministic algorithms in the edge coloring model. Here we are given a proper edge coloring of the graph with O(Δ) colors, the nodes are anonymous, and the nodes can use the edge colors to refer to their neighbors. In this setting there is a trivial algorithm that finds an MM in O(Δ) rounds: go through color classes one by one and greedily add edges that are not adjacent to anything added so far. It turns out there is a matching lower bound: no algorithm solves this problem in o(Δ) rounds in the same model.

The same technique was later used to study maximal fractional matchings in the LOCAL model of computing. This is a problem that can be solved in O(Δ) rounds (independent of n) in the usual LOCAL model, and there was a matching lower bound that shows that the same problem cannot be solved in o(Δ) rounds (independent of n) in the same model.

While these lower bounds were seen as a promising indicator that there might be a linear-in-Δ lower bound in a more general setting, the previous techniques turned out to be a dead end. In particular, they did not tell us anything nontrivial about the complexity of MM or MIS in the usual LOCAL model. Now we know that an entirely different kind of approach was needed: even though the present work shares some coauthors with [21, 18], the techniques of the present work are entirely unrelated to those.

#### Speedup simulation technique.

The technique that we use in this work is based on speedup simulation. In essence, the idea is that we assume we have an algorithm A that solves a problem Π in T rounds, and then we construct a new algorithm A′ that solves another problem Π′ in T − 1 rounds. A node in algorithm A′ gathers its radius-(T − 1) neighborhood, considers all possible ways of extending it to a radius-T neighborhood, simulates A for each such extension, and then uses the output of A to choose its own output. Now if we can iterate the speedup step k times (without reaching a trivial problem), we know that the original problem requires at least k rounds to solve.

This approach was first used by Linial [26, 27] and Naor to prove that graph coloring in cycles requires Ω(log* n) rounds. It was more recently used to prove lower bounds for sinkless orientations, algorithmic Lovász local lemma, and Δ-coloring [10, 12], as well as to prove lower bounds for weak coloring [9, 3].

In principle, the approach can be used with any locally checkable graph problem, in a mechanical manner. However, if one starts with a natural problem (e.g. MIS, MM, or graph coloring) and applies the speedup simulation in a mechanical manner, the end result is typically a problem that does not have any natural interpretation or simple description, and its description quickly becomes exponentially more complicated with each step. The key technical challenge that the present work overcomes is the construction of a sequence of nontrivial problems such that each of them has a relatively simple description and we can nevertheless apply speedup simulation for any pair of them.

The formalism that we use is closely related to —in essence, we generalize the formalism from graphs to hypergraphs, then represent the hypergraph as a bipartite graph, and we arrive at the formalism that we use in the present work to study maximal matchings in bipartite graphs.

## 3 Deterministic lower bound

Consider the following setting: We have a Δ-regular bipartite graph; the nodes in one part are white and in the other part black. Each node has a port numbering for the incident edges; the endpoints of the edges incident to a node are numbered in some arbitrary order with 1, 2, …, Δ. See Figure 2 for an illustration.

Figure 2: A 3-regular bipartite port-numbered network and a maximal matching.

The graph represents the topology of a communication network: each node is a computer, and each edge is a communication link. Initially each computer only knows its own color (black or white) and the number of ports (Δ); the computers are otherwise identical. Computation proceeds in synchronous communication rounds: in each round, each node can send an arbitrary message to each of its neighbors, then receive a message from each of its neighbors, and update its own state. After some T communication rounds, all nodes have to stop and announce their own part of the solution; here T is the running time of the algorithm.

We are interested in algorithms for finding a maximal matching; eventually each node has to know whether it is matched and in which port. There is a very simple algorithm that solves this in O(Δ) rounds: In iteration i = 1, 2, …, Δ, unmatched white nodes send a proposal to their port number i, and black nodes accept the first proposal that they receive, breaking ties using port numbers. See Figure 3 for an example.

Figure 3: The proposal algorithm finds a maximal matching in O(Δ) rounds; orange arrows are accepted proposals and blue arrows are rejected proposals.
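The proposal algorithm can be simulated centrally as follows. This is a minimal sketch, assuming a hypothetical input format in which `white_ports[w][i]` is the black node behind port i + 1 of white node `w`, and `black_ports[b]` lists the white neighbors of `b` in port order; the function name is likewise an assumption.

```python
def proposal_matching(white_ports, black_ports):
    """Simulate the proposal algorithm on a regular bipartite graph."""
    delta = len(next(iter(white_ports.values())))
    match_w, match_b = {}, {}
    for i in range(delta):  # iteration i uses white port number i + 1
        proposals = {}
        for w, ports in white_ports.items():
            if w not in match_w:  # only unmatched white nodes propose
                proposals.setdefault(ports[i], []).append(w)
        for b, senders in proposals.items():
            if b not in match_b:
                # Ties are broken by the smallest port number at the black node.
                w = min(senders, key=black_ports[b].index)
                match_w[w], match_b[b] = b, w
    return match_w


# Example: the complete bipartite graph on 3 + 3 nodes, identical port orders.
whites, blacks = ["w0", "w1", "w2"], ["b0", "b1", "b2"]
white_ports = {w: list(blacks) for w in whites}
black_ports = {b: list(whites) for b in blacks}
matching = proposal_matching(white_ports, black_ports)
```

Each iteration corresponds to a constant number of communication rounds (propose, then accept), so Δ iterations give O(Δ) rounds in total.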

Hence bipartite maximal matching can be solved in O(Δ) rounds in Δ-regular two-colored graphs, and the running time is independent of the number of nodes. Surprisingly, nobody has been able to tell if this algorithm is optimal, or anywhere close to optimal. There are no algorithms that break the linear-in-Δ barrier (without introducing some dependency on n in the running time), and there are no nontrivial lower bounds: we have not been able to exclude even the possibility of solving maximal matchings in this setting in a constant number of rounds, independently of Δ and n. If we look at the slightly more general setting of graphs of degree at most Δ (instead of Δ-regular graphs), there is a lower bound of Ω(log Δ / log log Δ) [23, 24, 25], but there is still an exponential gap between the upper and the lower bound.

In this section we show that the trivial proposal algorithm is indeed optimal: there is no algorithm that finds a maximal matching in o(Δ) rounds in this setting. We will later extend the result to more interesting models of computing, but for now we will stick to the case of deterministic algorithms in the port-numbering model, as it is sufficient to explain all key ideas.

### 3.1 Lower bound idea

Our plan is to prove a lower bound using a speedup simulation argument [26, 27, 30, 10, 12, 9, 3]. The idea is to define a sequence of graph problems Π_1, Π_2, …, Π_k such that if we have an algorithm A_i that solves Π_i in T_i rounds, we can construct an algorithm A_{i+1} that solves Π_{i+1} strictly faster, in T_{i+1} ≤ T_i − 1 rounds. Put otherwise, we show that solving Π_i takes at least one round more than solving Π_{i+1}. Then if we can additionally show that Π_k is still a nontrivial problem that cannot be solved in zero rounds, we know that the complexity of Π_1 is at least k rounds.

Now we would like to let Π_1 be the maximal matching problem, and identify a suitable sequence of relaxations of the maximal matching problem. A promising candidate might be, e.g., the following problem that we call here a k-matching for brevity.

###### Definition 1 (k-matching).

Given a graph G = (V, E), a set of edges M ⊆ E is a k-matching if

1. every node is incident to at most k edges of M,

2. if a node is not incident to any edge of M, then all of its neighbors are incident to at least one edge of M.

Note that with this definition, a 1-matching is exactly the same thing as a maximal matching. Also it seems that finding a k-matching is easier for larger values of k. For example, we could modify the proposal algorithm so that in each iteration white nodes send k proposals in parallel, and this way find a k-matching in O(Δ/k) rounds. Could we define that Π_i is the problem of finding an i-matching, and try to prove that given an algorithm for finding an i-matching, we can construct a strictly faster algorithm for finding an (i+1)-matching?
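The two conditions of Definition 1 are easy to check mechanically. The following is a small sketch of such a checker; the function name `is_k_matching` and the adjacency-dictionary input format are assumptions made for illustration.

```python
def is_k_matching(adj, edges, k):
    """Check the two conditions of a k-matching.

    adj:   dict mapping each node to the set of its neighbours (assumed format)
    edges: iterable of (u, v) pairs, the candidate edge set
    """
    deg = {v: 0 for v in adj}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    # Condition 1: every node is incident to at most k selected edges.
    if any(d > k for d in deg.values()):
        return False
    # Condition 2: if a node has no selected edge, all its neighbours have one.
    return all(deg[v] > 0 or all(deg[u] > 0 for u in adj[v]) for v in adj)


# Example: the path a - b - c - d.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
```

With k = 1 the checker accepts exactly the maximal matchings: on the example path, the single edge (b, c) passes, while the empty set fails the second condition.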

A direct attack along these lines does not seem to work, but this serves nevertheless as a useful guidance that will point us in the right direction.

### 3.2 Formalism and notation

We will first make the setting as simple and localized as possible. It will be convenient to study graph problems that are of the following form—we call these edge labeling problems:

1. The task is to label the edges of the bipartite graph with symbols from some alphabet Σ.

2. A problem specification is a pair Π = (W, B), where W is the set of feasible labellings of the edges incident to a white node, and B is the set of feasible labellings for the edges incident to a black node.

Here we will assume that feasibility of a solution does not depend on the port numbering. Hence each member of W and B is a multiset that contains Δ elements from alphabet Σ. For example, if we have Σ = {0, 1} and Δ = 3, then W = {{0, 0, 1}, {0, 1, 1}} indicates that a white node is happy if it is incident to exactly 1 or 2 edges with label 1.

However, for brevity we will here represent multisets as words, and write e.g. W = {001, 011}. We emphasize that the order of the elements does not matter here, and we could equally well write e.g. W = {010, 110}. Now that W and B are languages over alphabet Σ, we can conveniently use regular expressions to represent them. When x_1, x_2, …, x_k ∈ Σ are symbols of the alphabet, we use the shorthand notation [x_1 x_2 … x_k] = (x_1 | x_2 | … | x_k). With this notation, we can represent the above example concisely as W = 001 | 011, or W = 0(0|1)1, or even W = 0[01]1.
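The convention that words stand for multisets can be made concrete with a few lines of Python. This is an illustrative sketch: the helper `in_language` is hypothetical, and the binary patterns below are chosen as an example of a language with three-letter words over the alphabet {0, 1}. Python's character class `[01]` happens to coincide with the bracket shorthand for a choice between single symbols.

```python
import re
from itertools import permutations


def in_language(labels, pattern):
    """Return True if some ordering of the multiset `labels` matches `pattern`.

    labels:  a string such as "010", read as a multiset of edge labels
    pattern: a regular expression such as "0[01]1"
    """
    rx = re.compile(pattern)
    # Order does not matter for a multiset, so try every distinct ordering.
    return any(rx.fullmatch("".join(p)) for p in set(permutations(labels)))


# Three ways of writing the same language {001, 011}.
patterns = ["001|011", "0(0|1)1", "0[01]1"]
```

Trying all orderings is exponential in the word length, which is fine for illustration; comparing symbol counts would be the efficient alternative.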

#### Example: encoding maximal matchings.

The most natural way to encode maximal matchings would be to use e.g. labels 0 and 1 on the edges, with 1 to indicate an edge in the matching. However, this is not compatible with the above formalism: we would have to have 0^Δ ∈ W and 0^Δ ∈ B to allow for unmatched nodes, but then we would also permit a trivial all-0 solution. To correctly capture the notion of maximality, we will use three labels, Σ = {M, P, O}, with the following rules:

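$$
\begin{aligned}
W &= \mathsf{M}\mathsf{O}^{\Delta-1} \mid \mathsf{P}^{\Delta}, \\
B &= \mathsf{M}[\mathsf{P}\mathsf{O}]^{\Delta-1} \mid \mathsf{O}^{\Delta}.
\end{aligned}
\tag{1}
$$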

For a matched white node, one edge is labeled with an M (matched) and all other edges are labeled with an O (other). However, for an unmatched white node, all incident edges have to be labeled with a P (pointer); the intuition is that P points to a matched black neighbor. The rules for the black nodes ensure that pointers do not point to unmatched black nodes (a P implies exactly one M), and that black nodes are unmatched only if all white neighbors are matched (all incident edges labeled with Os). See Figure 4 for an illustration.
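To see that this three-label encoding captures maximality, one can label the edges from the white side and then test the black-side rule. The sketch below is an illustration under assumptions: the helper names (`white_ok`, `black_ok`, `encode`, `feasible`) and the input format are hypothetical, and the rules are implemented by counting symbols as described in the paragraph above.

```python
from collections import Counter


def white_ok(labels):
    # Matched: one M and otherwise O. Unmatched: only P.
    c, d = Counter(labels), len(labels)
    return (c["M"] == 1 and c["O"] == d - 1) or c["P"] == d


def black_ok(labels):
    # Matched: one M and otherwise P or O. Unmatched: only O.
    c, d = Counter(labels), len(labels)
    return (c["M"] == 1 and c["P"] + c["O"] == d - 1) or c["O"] == d


def encode(white_adj, matching):
    """Label every edge (w, b) from the point of view of its white endpoint."""
    label = {}
    for w, nbrs in white_adj.items():
        for b in nbrs:
            if matching.get(w) == b:
                label[(w, b)] = "M"
            else:
                label[(w, b)] = "O" if w in matching else "P"
    return label


def feasible(white_adj, matching):
    label = encode(white_adj, matching)
    blacks = {b for nbrs in white_adj.values() for b in nbrs}
    return (all(white_ok([label[(w, b)] for b in nbrs])
                for w, nbrs in white_adj.items())
            and all(black_ok([l for (_, b2), l in label.items() if b2 == b])
                    for b in blacks))


# Example: a 6-cycle, i.e. a 2-regular bipartite graph.
cycle = {"w0": ["b0", "b2"], "w1": ["b0", "b1"], "w2": ["b1", "b2"]}
```

On the example cycle, the maximal matching {w1–b0, w2–b2} is accepted, whereas the non-maximal matching {w1–b0} is rejected at the black node b1, which sees one O and one P.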

#### White and black algorithms.

Let Π = (W, B) be an edge labeling problem. We say that A is a white algorithm that solves Π if in A each white node outputs a labeling of its incident edges, and such a labeling forms a feasible solution to Π. Black nodes produce an empty output.

Conversely, in a black algorithm, each black node outputs the labels of its incident edges, and white nodes produce an empty output. See Figure 4 for illustrations. Note that a black algorithm can be easily turned into a white algorithm if we use one additional communication round, and vice versa.

#### Infinite trees vs. finite regular graphs.

It will be convenient to first present the proof for the case of infinite -regular trees. In essence, we will show that any algorithm that finds a maximal matching in rounds will fail around some node in some infinite -regular tree (for some specific port numbering). Then it is also easy to construct a finite -regular graph such that the radius- neighborhood of in (including the port numbering) is isomorphic to the radius- neighborhood of some node in , and therefore will also fail around in .

### 3.3 Parametrized problem family

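We will now introduce a parametrized family of problems Π_Δ(x, y), where x + y ≤ Δ. The problem is defined so that Π_Δ(0, 0) is equivalent to maximal matchings (1) and the problem becomes easier when we increase x or y. We will use the alphabet Σ = {M, P, O, X}, where M, P, and O have a role similar to maximal matchings and X acts as a wildcard. We define Π_Δ(x, y) = (W_Δ(x, y), B_Δ(x, y)), where

$$
\begin{aligned}
W_\Delta(x,y) &= \bigl(\mathsf{M}\mathsf{O}^{d-1} \mid \mathsf{P}^{d}\bigr)\,\mathsf{O}^{y}\,\mathsf{X}^{x}, \\
B_\Delta(x,y) &= \bigl([\mathsf{M}\mathsf{X}][\mathsf{P}\mathsf{O}\mathsf{X}]^{d-1} \mid [\mathsf{O}\mathsf{X}]^{d}\bigr)\,[\mathsf{P}\mathsf{O}\mathsf{X}]^{y}\,[\mathsf{M}\mathsf{P}\mathsf{O}\mathsf{X}]^{x},
\end{aligned}
\tag{2}
$$

where d = Δ − x − y.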

The following partial order represents the “strength” of the symbols from the perspective of black nodes:

$$
\mathsf{M} < \mathsf{X}, \qquad \mathsf{P} < \mathsf{O} < \mathsf{X} \tag{3}
$$

The interpretation is that from the perspective of B_Δ(x, y), symbol X is feasible wherever M or O is feasible, and O is feasible wherever P is feasible. Furthermore, all relations are strict in the sense that e.g. replacing an X with an M may lead to a word not in B_Δ(x, y).

Here are three examples of problems in family Π_Δ(x, y), with some intuition (from the perspective of a white algorithm):

• Π_Δ(0, 0): Maximal matching. Note that we cannot use symbol X at all, as it does not appear in W_Δ(0, 0).

• Π_Δ(0, 1): Unmatched white nodes will use O instead of P once; note that by (3) this is always feasible for the black node at the other end of the edge and sometimes helpful. Unmatched black nodes can accept P instead of O once.

• Π_Δ(1, 0): All white nodes will use X instead of P or O once; again, this is always feasible and sometimes helpful. All black nodes can accept anything from one port.

In essence, Π_Δ(0, y) resembles a problem in which we can violate maximality, while in Π_Δ(x, 0) we can violate the packing constraints.

### 3.4 Speedup simulation

Figure 5: Speedup simulation, for T = 2 and Δ = 3. The radius-(T − 1) neighborhood of u is U, and the radius-T neighborhood of v_i is V_i = U ∪ D_i. The key observation is that D_i and D_j are disjoint for i ≠ j.

Assume that A is a white algorithm that solves Π_Δ(x, y) in T rounds in trees, for a sufficiently large Δ. Throughout this section, let d = Δ − x − y.

#### Algorithm A1.

We will first construct a black algorithm A1 that runs in time T − 1, as follows:

Each black node u gathers its radius-(T − 1) neighborhood U; see Figure 5. Let the white neighbors of u be v_1, v_2, …, v_Δ. Let V_i be the radius-T neighborhood of v_i, and let D_i = V_i \ U be the part of V_i that u does not see. For each i, go through all possible inputs that one can assign to D_i; here the only unknown part is the port numbering that we have in the region D_i. Then simulate A for each possible input and see how A labels the edge e_i = {u, v_i}. Let S_i be the set of labels that A assigns to edge e_i for some input D_i. Algorithm A1 labels edge e_i with set S_i.

#### Algorithm A2.

Now since the output alphabet of A is Σ = {M, P, O, X}, the new output alphabet of A1 consists of its 15 nonempty subsets. We construct another black algorithm A2 with alphabet {X, MX, OX, MOX, POX, MPOX} that simulates A1 and then maps the output of A1 as follows (see Figure 6 for an example):

$$
\begin{aligned}
\{\mathsf{X}\} &\mapsto \mathsf{X}, \\
\{\mathsf{M}\}, \{\mathsf{M},\mathsf{X}\} &\mapsto \mathsf{MX}, \\
\{\mathsf{O}\}, \{\mathsf{O},\mathsf{X}\} &\mapsto \mathsf{OX}, \\
\{\mathsf{M},\mathsf{O}\}, \{\mathsf{M},\mathsf{O},\mathsf{X}\} &\mapsto \mathsf{MOX}, \\
\{\mathsf{P}\}, \{\mathsf{P},\mathsf{O}\}, \{\mathsf{P},\mathsf{X}\}, \{\mathsf{P},\mathsf{O},\mathsf{X}\} &\mapsto \mathsf{POX}, \\
\{\mathsf{M},\mathsf{P}\}, \{\mathsf{M},\mathsf{P},\mathsf{O}\}, \{\mathsf{M},\mathsf{P},\mathsf{X}\}, \{\mathsf{M},\mathsf{P},\mathsf{O},\mathsf{X}\} &\mapsto \mathsf{MPOX}.
\end{aligned}
$$

Here the intuition is that we first make each set maximal w.r.t. (3): for example, whenever we have a set with a P, we also add an O, and whenever we have a set with an O, we also add an X. This results in only six maximal sets, and then we replace e.g. the maximal set {M, O, X} with the label MOX.
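The closure step can be checked by brute force over all 15 nonempty subsets. The sketch below assumes the reading of the order given in the text (X is at least as strong as M and O, and O is at least as strong as P); the names `STRONGER` and `close` are hypothetical.

```python
from itertools import combinations

SIGMA = "MPOX"
# For each symbol, the symbols that must be added whenever it is present
# (assumed reading of the order: P -> O, O -> X, M -> X).
STRONGER = {"P": {"O"}, "O": {"X"}, "M": {"X"}, "X": set()}


def close(symbols):
    """Make a set of symbols maximal and return it as a label such as 'POX'."""
    s = set(symbols)
    changed = True
    while changed:
        changed = False
        for a in list(s):
            if not STRONGER[a] <= s:
                s |= STRONGER[a]
                changed = True
    # Write the symbols in the fixed order M, P, O, X.
    return "".join(ch for ch in SIGMA if ch in s)


subsets = [set(c) for r in range(1, 5) for c in combinations(SIGMA, r)]
labels = {close(s) for s in subsets}
```

Running this on all subsets yields exactly the six labels X, MX, OX, MOX, POX, and MPOX, in agreement with the mapping above.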

#### Output of A2.

Let us analyze the output of A2 for a black node. Fix a black node u and its neighborhood U. The key property is that regions D_1, D_2, … in Figure 6 do not overlap; hence if there is some input in which D_1 is “bad”, and another input in which D_2 is “bad”, we can also construct an input in which both D_1 and D_2 are simultaneously “bad”. We make the following observations:

1. There can be at most x + 1 edges incident to u with a label in {MX, MOX, MPOX}. If there were x + 2 such edges, say e_1, e_2, …, e_{x+2}, then it means we could fix D_1 such that A outputs M for e_1, and simultaneously fix D_2 such that A outputs M for e_2, etc. But this would violate the property that A solves Π_Δ(x, y), as all words of B_Δ(x, y) contain at most x + 1 copies of M.

2. If there are at least x + y + 1 edges with a label in {POX, MPOX}, then there has to be at least one edge with a label in {X, MX}. Otherwise we could choose D_i so that A outputs P on x + y + 1 edges, and there is no M or X. But all words of B_Δ(x, y) with at least x + y + 1 copies of P contain also at least one M or X.

#### Algorithm A3.

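We construct yet another black algorithm A3 that modifies the output of A2 so that we replace labels only with larger labels according to the following partial order, which represents subset inclusion:

$$
\mathsf{X} \subseteq \mathsf{MX} \subseteq \mathsf{MOX} \subseteq \mathsf{MPOX}, \qquad
\mathsf{X} \subseteq \mathsf{OX} \subseteq \mathsf{MOX}, \qquad
\mathsf{OX} \subseteq \mathsf{POX} \subseteq \mathsf{MPOX}. \tag{4}
$$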

There are two cases:

1. There are at most copies of