
# Node and Edge Averaged Complexities of Local Graph Problems

The node-averaged complexity of a distributed algorithm running on a graph $G=(V,E)$ is the average over the times at which the nodes $V$ of $G$ finish their computation and commit to their outputs. We study the node-averaged complexity for some distributed symmetry breaking problems and provide the following results (among others):

- The randomized node-averaged complexity of computing a maximal independent set (MIS) in $n$-node graphs of maximum degree $\Delta$ is $\Omega(\min\{\log\Delta/\log\log\Delta, \sqrt{\log n/\log\log n}\})$. This bound is obtained by a novel adaptation of the well-known KMW lower bound [JACM'16]. As a side result, we obtain the same lower bound for the worst-case randomized round complexity of computing an MIS in trees. This essentially answers open problem 11.15 in the book of Barenboim and Elkin and resolves the complexity of MIS on trees up to an $O(\sqrt{\log\log n})$ factor. We also show that $(2,2)$-ruling sets, which are a minimal relaxation of MIS, have $O(1)$ randomized node-averaged complexity.
- For maximal matching, we show that while the randomized node-averaged complexity is $\Omega(\min\{\log\Delta/\log\log\Delta, \sqrt{\log n/\log\log n}\})$, the randomized edge-averaged complexity is $O(1)$. Further, we show that the deterministic edge-averaged complexity of maximal matching is $O(\log^2\Delta + \log^* n)$ and the deterministic node-averaged complexity of maximal matching is $O(\log^3\Delta + \log^* n)$.
- Finally, we consider the problem of computing a sinkless orientation of a graph. The deterministic worst-case complexity of the problem is known to be $\Theta(\log n)$, even on bounded-degree graphs. We show that the problem can be solved deterministically with node-averaged complexity $O(\log^* n)$, while keeping the worst-case complexity in $O(\log n)$.


## 1 Introduction

The main focus throughout the past four decades of studying distributed algorithms for graph problems has traditionally been on the worst-case round complexity. That is, the round complexity of an algorithm is defined to be the number of rounds until all nodes in the network terminate. This has proved fruitful in many contexts. For example, it allows one to talk cleanly about the locality of a graph problem: any deterministic $T$-round algorithm for a given problem shows that any node’s output can be determined by a function of the topology induced by the $T$-hop neighborhood of the node [linial1987LOCAL, peleg00]. More recently, starting with an initial exploration by Feuilloley [feuilloley2020long], there has been interest in going beyond this worst-case measure [feuilloley2020long, barenboim2019distributed, chatterjee2020sleeping]. As a primary example, this involves asking “what is the average termination time among the nodes?” There are different arguments for why understanding such node-averaged complexities is valuable in different contexts. It is indicative of the run-time of a typical node [feuilloley2020long] and it also implies sharper bounds on the overall energy spent in a network [chatterjee2020sleeping]. In this paper, we investigate the node and edge averaged complexity of some of the most prominent problems in the literature on distributed graph algorithms.

### 1.1 Our Contributions

#### Maximal Independent Set.

The maximal independent set (MIS) problem is one of the central problems in the area of distributed graph algorithms. Feuilloley [feuilloley2020long] showed that Linial’s $\Omega(\log^* n)$ lower bound for computing an MIS on $n$-node cycles also applies to the node-averaged round complexity, as long as we stick to deterministic algorithms. In contrast, even though Linial’s worst-case lower bound [linial1987LOCAL] also holds for randomized algorithms [Naor91], randomized algorithms can easily break this barrier when switching to node-averaged complexity. Indeed, for any constant-degree graph, one can obtain a randomized MIS algorithm with node-averaged round complexity $O(1)$ in a straightforward way: among many others, Luby’s algorithm [alon86, luby86] removes each node with constant probability in a single phase of the algorithm and thus has a node-averaged complexity of $O(1)$ on constant-degree graphs.

One of the main open questions in the study of distributed node-averaged complexity is whether this $O(1)$ bound is achievable in general graphs; this was for instance recently mentioned explicitly by Chatterjee et al. [chatterjee2020sleeping]. It is however worth noting that this particular question is in some sense much older: Luby’s analysis [luby86] shows that his MIS algorithm removes a constant fraction of the edges per iteration, and it has been open since then whether a distributed MIS algorithm can also remove a constant fraction of the nodes in $O(1)$ rounds.

As one of our main contributions, we refute the possibility of a randomized MIS algorithm with node-averaged complexity $O(1)$. In particular, we give a modification of the KMW lower bound by Kuhn, Moscibroda, and Wattenhofer [lowerbound, kuhn16_jacm] to show that their bound also applies to the node-averaged complexity of computing an MIS. That is, there is a family of graphs for which the node-averaged complexity of any randomized distributed MIS algorithm is lower bounded by $\Omega(\min\{\log\Delta/\log\log\Delta, \sqrt{\log n/\log\log n}\})$. We also comment that a randomized MIS algorithm from prior work by Bar-Yehuda, Censor-Hillel, Ghaffari, and Schwartzman [bar2017distributed] has a node-averaged complexity of $O(\log\Delta/\log\log\Delta)$. Hence, at least for graphs whose maximum degree $\Delta$ is small enough that $\log\Delta/\log\log\Delta = O(\sqrt{\log n/\log\log n})$, the node-averaged complexity of MIS is now completely resolved. For larger $\Delta$, the problem remains open. This is however also true for the worst-case round complexity, and our work essentially closes the gap to where it currently stands for the worst-case complexity of MIS.

As a side result, we also obtain the same lower bound for the worst-case MIS complexity in trees, i.e., any randomized MIS algorithm in trees requires time at least $\Omega(\min\{\log\Delta/\log\log\Delta, \sqrt{\log n/\log\log n}\})$. In part of the parameter range, this improves on a recent randomized lower bound for the same problem [balliu2021hideandseek]. The lower bound also almost matches the best known randomized MIS algorithm in trees, which has a worst-case complexity of $O(\sqrt{\log n})$ [ghaffari2016MIS], and we thus nearly resolve Open Problem 11.15 in the book of Barenboim and Elkin [barenboim15].

#### Maximal Matching.

A basic problem that is closely related to MIS is the problem of computing a maximal matching. We show that for maximal matching, too, the randomized node-averaged complexity has an $\Omega(\min\{\log\Delta/\log\log\Delta, \sqrt{\log n/\log\log n}\})$ lower bound. In this case, the bound follows almost immediately from the KMW lower bound construction that has been used for the approximate maximum matching problem in [kuhn16_jacm]. Note, however, that a more apt comparison with MIS would be to consider the edge-averaged complexity of maximal matching. This is because when computing a matching, the output is on the edges rather than on the nodes.

Recall that for a graph $G$, its line graph $L(G)$ is the graph where we put one vertex for each edge of $G$ and we connect two of those vertices of $L(G)$ if their corresponding edges in $G$ share an endpoint. Any maximal matching of a graph $G$ is simply an MIS of the line graph of $G$. Consequently, the node-averaged complexity of this MIS problem is equal to the edge-averaged complexity of the maximal matching problem. We show that, unlike in the general case, the MIS problem on line graphs has an $O(1)$ node-averaged complexity. Concretely, a close variant of Luby’s classic randomized algorithm [alon86, luby86, IsraelI86] provides a maximal matching algorithm with edge-averaged complexity $O(1)$.
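The correspondence between maximal matchings and MIS on the line graph can be checked on a small example. The following is a minimal sketch; the helper names `line_graph` and `is_mis` are introduced here for illustration and assume a simple graph given as a list of distinct edges.

```python
from itertools import combinations

def line_graph(edges):
    """Adjacency of the line graph: one vertex per edge of the input graph,
    two vertices adjacent iff the corresponding edges share an endpoint."""
    edges = [frozenset(e) for e in edges]
    adj = {e: set() for e in edges}
    for e, f in combinations(edges, 2):
        if e & f:  # the two edges share an endpoint
            adj[e].add(f)
            adj[f].add(e)
    return adj

def is_mis(adj, chosen):
    """Check that `chosen` is a maximal independent set of the graph `adj`."""
    chosen = set(chosen)
    independent = all(adj[x].isdisjoint(chosen) for x in chosen)
    maximal = all(x in chosen or bool(adj[x] & chosen) for x in adj)
    return independent and maximal

# Path a - b - c - d: {ab, cd} and {bc} are maximal matchings, {ab} is not.
path = [("a", "b"), ("b", "c"), ("c", "d")]
L = line_graph(path)
print(is_mis(L, [frozenset(("a", "b")), frozenset(("c", "d"))]))
print(is_mis(L, [frozenset(("a", "b"))]))
```

A set of edges passes `is_mis` on the line graph exactly when it is a maximal matching of the original graph, which is the equivalence the paragraph above relies on.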

We also provide results for the deterministic averaged complexity of maximal matching by giving an algorithm that achieves an $O(\log^2\Delta + \log^* n)$ edge-averaged complexity and an $O(\log^3\Delta + \log^* n)$ node-averaged complexity. The algorithm is obtained by adapting deterministic matching algorithms developed in [fischer2020improved, ahmadi2018distributed]. We note that the current best deterministic worst-case complexity of maximal matching is $O(\log^2\Delta\cdot\log n)$ [fischer2020improved], and any improvement on the edge-averaged complexity would thus most likely also improve the state of the art of the worst-case round complexity.

#### Ruling Set.

Faced with the lower bound on the node-averaged complexity of the MIS problem, it is natural to wonder if any relaxation of the problem admits a better node-averaged complexity. A natural relaxation of MIS that has been studied quite intensively is ruling sets. For positive integers $\alpha$ and $\beta$, an $(\alpha,\beta)$-ruling set is a set $S$ of nodes such that any two nodes in $S$ are at distance at least $\alpha$, and for every node not in $S$, there is a node in $S$ at distance at most $\beta$ [awerbuch89]. An MIS is therefore a $(2,1)$-ruling set. We show that, perhaps surprisingly, even relaxing the MIS problem only slightly to the problem of computing a $(2,2)$-ruling set completely avoids the lower bound. The $(2,2)$-ruling set problem (i.e., the problem of computing an independent set $S$ such that any node not in $S$ has a node in $S$ within distance $2$) admits a randomized algorithm with a node-averaged complexity of $O(1)$. It is plausible that in many applications of maximal independent sets (e.g., if an MIS algorithm is used as a subroutine in a higher-level algorithm), one could also work with the weaker $(2,2)$-ruling sets. Doing this might lead to an algorithm that is considerably faster from a node-averaged perspective.

We also study the node-averaged complexity of deterministic ruling set algorithms. We give algorithms with node-averaged complexity $O(\log^* n)$ to compute $(2, O(\log\Delta))$-ruling sets and $(2, O(\log\log n))$-ruling sets. Contrast this with the worst-case deterministic round complexity measure: The best known deterministic algorithm for computing a -ruling set has a round complexity of  [schneider2013symmetry] and it is known that any deterministic -ruling set algorithm requires at least rounds [balliu2021hideandseek].

#### Sinkless Orientation.

Finally, we investigate the node-averaged complexity of the sinkless orientation problem. While the worst-case time of deterministic algorithms for computing a sinkless orientation is $\Theta(\log n)$ [brandt2016lower, chang16, ghaffari2017orinetation], we show that there is a deterministic distributed sinkless orientation algorithm with node-averaged round complexity $O(\log^* n)$.

### 1.2 Other Related Work

As discussed, the explicit study of node-averaged complexity was initiated by Feuilloley in [feuilloley2020long]. In particular, he proved that the deterministic distributed node-averaged complexity of locally checkable labeling (LCL) problems on $n$-node cycles is asymptotically equal to the worst-case deterministic complexity of the same problems. This implies that the deterministic $\Omega(\log^* n)$-round lower bound for coloring cycles with $O(1)$ colors and for MIS and maximal matching on cycles extends to the node-averaged and to the edge-averaged complexities of those problems. Subsequently, Barenboim and Tzur [barenboim2019distributed] studied the node-averaged complexity of different variants of the distributed vertex coloring problem. In particular, they analyzed the problem as a function of the arboricity of the graph and gave various trade-offs between the achievable number of colors and node-averaged complexity. They are also the first to explicitly observe that the randomized node-averaged complexity of the $(\Delta+1)$-vertex coloring problem is $O(1)$.

One of the practical motivations to look at node-averaged complexity is to optimize the overall energy usage of a distributed system. If we assume that the energy used by a node in a distributed algorithm is proportional to the number of rounds in which the node participates, the node-averaged complexity can be used as a measure for the total energy spent by all nodes (normalized by the total number of nodes). In this context, Chatterjee, Gmyr, and Pandurangan [chatterjee2020sleeping] introduced the notion of node-averaged awake complexity. In their setting, nodes are allowed to only participate in a subset of the rounds of an algorithm’s execution and to sleep in the remaining rounds. The awake complexity of a node is then measured by the number of rounds in which the node is participating in the protocol (i.e., sending and/or receiving messages), and the node-averaged awake complexity denotes the average number of rounds in which nodes are awake during an algorithm. In [chatterjee2020sleeping], it is shown that there is a randomized distributed MIS algorithm with node-averaged awake complexity $O(1)$. The same paper leaves as an open question whether it is also possible to obtain a distributed MIS algorithm with (regular) node-averaged complexity $O(1)$. As discussed above, we prove that this is not the case. The study of distributed node awake complexity for local graph problems was continued in a recent paper by Barenboim and Maimon [BarenboimM21]. This paper however studies the worst-case node awake complexity. They show that every decidable problem can be solved by a distributed algorithm with node awake complexity $O(\log n)$ and that for a natural family of problems, one can obtain a node awake complexity of $O(\log\Delta + \log^* n)$. Notions that are closely related to the notion of node awake complexity of [chatterjee2020sleeping, BarenboimM21] have also been studied in the context of radio networks (mostly from a worst-case complexity point of view), see e.g., [NakanoO00a, jurdzinski2002efficient, jurdzinski2002energy, KardasKP13, BenderKPY18, chang2019exponential, chang2020energy].

While there is not a lot of work that explicitly studies the node or edge-averaged complexity of distributed algorithms, many existing distributed algorithms implicitly provide averaged complexity bounds that are stronger than the respective worst-case complexity bounds. This for example leads to the following node-averaged complexities of the $(\Delta+1)$-coloring problem. The randomized algorithms of [luby1993removing, johansson99] are based on the following idea. The nodes pick random colors from the set of available colors (i.e., nodes need to pick a color that has not yet been assigned to a neighbor) and in both cases, it is shown that in each such coloring round, every uncolored node becomes colored with constant probability. This directly implies that the node-averaged complexity of those algorithms is $O(1)$. Further, in a recent paper, Ghaffari and Kuhn [GK21] give a deterministic distributed algorithm to compute a $(\Delta+1)$-coloring (and more generally a $(\mathit{degree}+1)$-list coloring) in $O(\log^2\Delta\cdot\log n)$ rounds. The core of the algorithm is a method to color a constant fraction of the nodes of a graph in rounds, which directly implies that the deterministic node-averaged complexity of $(\Delta+1)$-coloring (and $(\mathit{degree}+1)$-list coloring) is . Note that an improvement to this bound for $(\mathit{degree}+1)$-list coloring would immediately also improve the best known deterministic worst-case complexity.

In addition, most modern randomized distributed graph algorithms are based on the idea of graph shattering, see, e.g., [barenboim2016locality, harris2016distributed, ghaffari2016MIS, fischer2017sublogarithmic, GS17, chang2018optimal, ghaffari2018derandomizing]. In a first shattering phase, one uses a randomized algorithm that succeeds at each node with probability at least $1 - 1/\mathrm{poly}(\Delta)$ such that (essentially) the graph induced by the unsolved nodes consists of only small components. Those components are then solved in a second post-shattering phase by using the best known deterministic algorithm to complete the given partial solution. Since the shattering phase solves the given problem for all except at most a $1/\mathrm{poly}(\Delta)$ fraction of the nodes, the complexity of the shattering phase is typically an upper bound on the node-averaged complexity. Usually, the round complexity of the shattering phase is expressed as a function of $\Delta$ rather than as a function of $n$, and it is often also much faster than the deterministic post-shattering phase. The randomized sinkless orientation algorithm of [GS17] for example implies that the randomized node-averaged complexity of computing a sinkless orientation is .
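The claim that the shattering phase typically dominates the node-averaged complexity can be made concrete with a short calculation. The symbols below are introduced here for illustration and are not taken from the paper: $T_1$ is the length of the shattering phase, $T_2$ the length of the post-shattering phase, and $\varepsilon$ the expected fraction of nodes left unsolved after shattering. A node that is solved during shattering finishes within $T_1$ rounds, and any other node finishes within $T_1 + T_2$ rounds, so

```latex
\[
  \frac{1}{|V|}\sum_{v \in V} \mathbb{E}[T_v]
  \;\le\; \frac{1}{|V|}\sum_{v \in V} \bigl(T_1 + \Pr[v \text{ unsolved}]\cdot T_2\bigr)
  \;=\; T_1 + \varepsilon\, T_2 .
\]
```

Under these assumptions, the node-averaged complexity is $O(T_1)$ whenever $\varepsilon\, T_2 = O(T_1)$, for example when $\varepsilon$ is polynomially small in $\Delta$ and $T_2$ is at most a comparable polynomial factor larger than $T_1$.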

### 1.3 Organization of the Paper

In Section 2, we formally define the notions of node and edge averaged complexities that we use in this paper. In Section 3, we present our upper bounds on the node and edge averaged complexities of maximal matching, ruling sets, and sinkless orientation. For space reasons, some of the arguments in Section 3 are only sketched and the full proofs are deferred to Appendix B. In Section 4, we then provide our lower bounds, in particular our lower bound on the node-averaged complexity of MIS in general graphs and on the worst-case complexity of MIS in trees. As in Section 3, some of the technical arguments of Section 4 appear in the appendix (Appendix C).

## 2 Model and Definitions

We primarily focus on the LOCAL model [linial1987LOCAL, peleg00], where a network is modeled as an undirected graph $G=(V,E)$ and typically, every node is equipped with a unique $O(\log n)$-bit identifier. Time is divided into synchronous rounds: In every round, every node of $G$ can send an arbitrary message to each neighbor and receive the messages sent to it by the neighbors. In the closely related CONGEST model [peleg00], messages are required to consist of at most $O(\log n)$ bits.

We study graph problems in which upon terminating, each node and/or each edge of a graph $G=(V,E)$ must compute some output. We consider the individual node and edge complexities of a distributed algorithm for a given graph problem. In particular, we are interested in the time required for individual nodes or edges to compute and commit to their outputs. For a node $v\in V$, we say that $v$ has completed its computation as soon as $v$ and all its incident edges have committed to their outputs, and we say that an edge $e=\{u,v\}\in E$ has completed its computation as soon as $e$ and both its nodes $u$ and $v$ have committed to their outputs. For example, in a vertex coloring algorithm, a node $v$ has completed its computation as soon as $v$’s color is fixed and an edge $\{u,v\}$ has completed its computation as soon as the colors of $u$ and $v$ are fixed. In an edge coloring algorithm, a node $v$ has completed its computation as soon as the colors of all incident edges have been determined and an edge $e$ has completed its computation as soon as the color of $e$ is fixed. Given a distributed algorithm $\mathcal{A}$ on $G=(V,E)$ and a node $v\in V$, we define $T_v^G(\mathcal{A})$ to be the number of rounds after which $v$ completes its computation when running the algorithm $\mathcal{A}$. Similarly, for an edge $e\in E$, we define $T_e^G(\mathcal{A})$ to be the number of rounds after which $e$ completes its computation. Note that if $\mathcal{A}$ is a randomized algorithm, then $T_v^G(\mathcal{A})$ and $T_e^G(\mathcal{A})$ are random variables. In the following, for convenience, we generally define $T_v^G(\mathcal{A})$ and $T_e^G(\mathcal{A})$ as random variables. In the case of a deterministic algorithm $\mathcal{A}$, the variables only take on one specific value (with probability $1$). We can now define the node and edge averaged complexities of a distributed algorithm $\mathcal{A}$.

###### Definition 1 (Node and Edge Averaged Complexities).

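We define the following average complexity measures for a distributed algorithm $\mathcal{A}$ on a family of graphs $\mathcal{G}$. We define the node-averaged complexity ($\mathsf{AVG}_V$) and the edge-averaged complexity ($\mathsf{AVG}_E$) as follows.

$$
\begin{aligned}
\mathsf{AVG}_V(\mathcal{A}) &:= \max_{G\in\mathcal{G}} \frac{1}{|V|}\cdot \mathbb{E}\Biggl[\sum_{v\in V(G)} T_v^G(\mathcal{A})\Biggr] = \max_{G\in\mathcal{G}} \frac{1}{|V|}\cdot \sum_{v\in V(G)} \mathbb{E}\bigl[T_v^G(\mathcal{A})\bigr] \\
\mathsf{AVG}_E(\mathcal{A}) &:= \max_{G\in\mathcal{G}} \frac{1}{|E|}\cdot \mathbb{E}\Biggl[\sum_{e\in E(G)} T_e^G(\mathcal{A})\Biggr] = \max_{G\in\mathcal{G}} \frac{1}{|E|}\cdot \sum_{e\in E(G)} \mathbb{E}\bigl[T_e^G(\mathcal{A})\bigr]
\end{aligned}
$$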

The respective complexity of a given graph problem is defined as the corresponding complexity, minimized over all algorithms that solve the given graph problem. We note that there are other complexity notions that are between Definition 1 and the standard worst-case complexity notion. We provide a brief discussion of this in Appendix A.
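For a single graph and a single (deterministic) run, the quantities inside Definition 1 can be computed directly from the rounds at which nodes and edges commit to their own outputs. The sketch below is illustrative only: the function names and the convention that a missing entry means "no output, committed at round 0" are assumptions made here. The maximum over the graph family and, for randomized algorithms, the expectation over runs are not modeled.

```python
def completion_times(nodes, edges, node_commit, edge_commit):
    """Completion times in the sense of the definition above.

    node_commit[v] / edge_commit[(u, v)] give the round at which the node or
    edge commits to its OWN output; missing entries count as round 0.
    A node completes once it and all incident edges have committed; an edge
    completes once it and both of its endpoints have committed.
    """
    t_node = {v: node_commit.get(v, 0) for v in nodes}
    t_edge = {}
    for (u, v) in edges:
        own = edge_commit.get((u, v), 0)
        t_edge[(u, v)] = max(own, node_commit.get(u, 0), node_commit.get(v, 0))
        t_node[u] = max(t_node[u], own)
        t_node[v] = max(t_node[v], own)
    return t_node, t_edge

def average(times):
    """Average completion time over all nodes (or all edges)."""
    return sum(times.values()) / len(times)

# Vertex coloring on the path 0 - 1 - 2 - 3 where node 3 commits late.
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3)]
t_node, t_edge = completion_times(nodes, edges, {0: 1, 1: 1, 2: 1, 3: 9}, {})
print(average(t_node), average(t_edge), max(t_node.values()))
```

In this toy run the worst-case time is 9 rounds, while the node average is 3 and the edge average is 11/3, which illustrates how a few slow nodes affect the averaged measures far less than the worst-case measure.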

#### Computation vs. Termination Time:

After completing the computation, a node or edge might still be involved in communication to help other nodes determine their outputs. We say that a node has terminated once it has completed its computation and it also does not send any further messages. Similarly, an edge has terminated once it has completed its computation and there are no more messages sent over the edge. Instead of defining $T_v^G(\mathcal{A})$ and $T_e^G(\mathcal{A})$ as the number of rounds until $v$ or $e$ finishes its computation, we could also define them as the number of rounds until $v$ or $e$ terminates. In fact, in the literature about averaged complexity of distributed algorithms, both definitions have been used. The initial work by Feuilloley [feuilloley2020long] uses the definition that we use in the present paper. The subsequent work by Barenboim and Tzur [barenboim2019distributed] uses the stronger definition, where the complexity of a node/edge is defined as its termination time.

From a practical point of view, both notions of averaged complexity seem relevant. On the one hand, once a node has computed its output, it can continue with any further computation that is based on this output, even if the node still has to continue communicating to help other nodes. On the other hand, node-averaged termination time might be more natural, especially if we aim to minimize the total energy spent in a distributed system. From a purely combinatorial or graph-theoretic point of view, when using the LOCAL model, the node computation time definition has a particularly clean interpretation: a $T$-round algorithm can always equivalently be seen as an algorithm where every node first collects its complete $T$-hop neighborhood and then computes its output as a function of this information. (In the case of randomized algorithms, we have to assume that nodes choose all private random bits at the beginning before sending the first message.) More generally, if the computation time of a node $v$ is $T_v$, it means that $v$ can compute its output as a function of its $T_v$-hop neighborhood in the graph $G$, and the node-averaged complexity in the LOCAL model is therefore equal to the average radius to which the nodes must know the graph in order to compute their outputs. We note that while for algorithms a termination time bound is stronger than an equal computation time bound, the opposite is true for lower bounds, and the definition we use therefore makes our lower bounds stronger. Further, although we use computation time in our definition, it is not hard to see that all our algorithms also provide the same bounds if average termination time is used instead of average computation time. In all our algorithms, nodes also stop participating in the algorithm at most one round after knowing their outputs.

## 3 Algorithms

### 3.1 MIS and Ruling Set

#### MIS

It is well known that Luby’s randomized MIS algorithm removes a constant fraction of the edges per iteration [luby86]. Hence, if we define the MIS problem as a labeling problem, with binary indicator labels for vertices indicating those that are in the selected MIS, and we declare an edge terminated when the label of at least one of its two endpoint nodes is fixed, then Luby’s algorithm has edge-averaged complexity $O(1)$. (Note however that if we use the edge-averaged complexity as defined in Definition 1 and require both nodes of an edge to be decided, then this is not true. In fact, in this case, we prove a lower bound on the edge-averaged complexity in Theorem 16.) In contrast, the node-averaged complexity of MIS had remained elusive for a number of years, and indeed it was mentioned as an open question throughout the literature whether an $O(1)$ node-averaged complexity is possible, see e.g., [chatterjee2020sleeping]. The best known upper bounds are the trivial $O(\log n)$ bound that follows from Luby’s worst-case round complexity analysis and an $O(\log\Delta/\log\log\Delta)$ bound that follows from the work of Bar-Yehuda, Censor-Hillel, Ghaffari, and Schwartzman [bar2017distributed]. They give a randomized MIS algorithm for which they show that within this time, each node is removed with at least a constant probability (and indeed with a better probability, cf. Theorem 3.1 in [bar2017distributed]).

In Section 4, we show that the node-averaged complexity of MIS cannot be $O(1)$, and indeed the above bound is tight for small $\Delta$. Concretely, we prove that in a certain graph family with maximum degree $\Delta$ and $n$ nodes, the node-averaged complexity of MIS is $\Omega(\min\{\log\Delta/\log\log\Delta, \sqrt{\log n/\log\log n}\})$. That is, asymptotically the same bounds as the celebrated worst-case lower bound of Kuhn, Moscibroda, and Wattenhofer [kuhn16_jacm] also hold for the node-averaged complexity.
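The difference between the two ways of timing an edge (one endpoint decided versus both endpoints decided) can be observed in a small simulation. The sketch below uses the common random-priority variant of Luby's algorithm, which is an assumption made here for simplicity and not necessarily the exact variant analyzed in [luby86]; the function names are hypothetical, and time is measured in iterations rather than communication rounds.

```python
import random

def luby_mis(adj, seed=0):
    """Random-priority variant of Luby's algorithm (illustrative sketch).

    adj maps each node to the set of its neighbors. Returns the MIS and the
    iteration in which each node's output was decided.
    """
    rng = random.Random(seed)
    alive = set(adj)
    mis, decided_at = set(), {}
    iteration = 0
    while alive:
        iteration += 1
        rank = {v: rng.random() for v in alive}
        # A node joins the MIS if its rank beats every still-undecided neighbor.
        winners = {v for v in alive
                   if all(rank[v] < rank[u] for u in adj[v] & alive)}
        decided = set(winners)
        for v in winners:
            decided |= adj[v] & alive  # neighbors of winners are decided as "out"
        mis |= winners
        for v in decided:
            decided_at[v] = iteration
        alive -= decided
    return mis, decided_at

def edge_averages(edges, decided_at):
    """Average edge time when an edge counts as done after ONE endpoint is
    decided, and when BOTH endpoints must be decided (as in Definition 1)."""
    one = sum(min(decided_at[u], decided_at[v]) for u, v in edges) / len(edges)
    both = sum(max(decided_at[u], decided_at[v]) for u, v in edges) / len(edges)
    return one, both

n = 50
adj = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}  # an n-node cycle
edges = [(i, (i + 1) % n) for i in range(n)]
mis, decided_at = luby_mis(adj, seed=7)
print(edge_averages(edges, decided_at))
print(sum(decided_at.values()) / n)  # node average, in iterations
```

On a constant-degree graph such as this cycle both averages stay small. The lower bound discussed above concerns carefully constructed graph families and is not something a simulation on a cycle would reveal.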

#### Ruling Set.

Faced with the above strong lower bound for the node-averaged complexity of MIS, it is natural to ask whether any reasonable relaxation of the problem admits better node-averaged complexity. One of the most standard relaxations of MIS is the ruling set problem. An $(\alpha,\beta)$-ruling set asks for a set $S$ of nodes such that any two nodes in $S$ have distance at least $\alpha$ and any node not in $S$ has a node in $S$ within distance $\beta$. Thus, MIS is equivalent to a $(2,1)$-ruling set. Interestingly, we show that the seemingly minimal relaxation to $(2,2)$-ruling sets drops the node-averaged complexity to $O(1)$.

###### Theorem 2.

There is a randomized distributed algorithm in the CONGEST model that computes a $(2,2)$-ruling set and has node-averaged complexity $O(1)$.

###### Proof.

The algorithm works as follows. Each node independently marks itself with probability . A marked node joins the ruling set if and only if it has no marked higher priority neighbor . A neighbor of is higher priority if or if and . Nodes that are within distance of nodes in are deleted and we recurse on the remaining graph.

To prove the theorem, we show that per iteration, in expectation, a constant fraction of the nodes is deleted. Fix one iteration: We call a node good if , where we define . To show that in expectation a constant fraction of the nodes is deleted, we show that at least of the nodes are good, and moreover that each good node is deleted with a constant probability.

We next show that at least half of the nodes are good. Let be the set of bad nodes (i.e., the set of nodes that are not good). To upper bound , we do the following for each node . We charge the “badness” of to the neighbors of by assigning value to each neighbor of . Because is bad, we have for each neighbor of . Each node therefore gets charged less than by neighboring bad nodes. Because every bad node distributes a total charge of , this means that at most half of the nodes can be bad.

To finish the proof, we show that a good node is deleted with constant probability. Let be the (inclusive) -hop neighborhood of . We need to show that with constant probability at least one node in joins the ruling set . First, note that with constant probability at least one node in is marked. This is simply because is good. Now if is the set of marked nodes in , we let be the set of nodes in which have no higher priority neighbor in . Note that if , then also . Now, assume that and consider some node . Node enters the ruling set unless a higher priority neighbor outside is marked. There are at most such neighbors and because they must have degree , each of them is marked with probability . The probability that none of the higher priority neighbors of is marked is thus at least a constant. ∎
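The mark, join, and delete loop from the proof can be simulated sequentially. The sketch below is not the paper's exact algorithm: the marking probability and the priority rule are not legible in this copy of the text, so the code ASSUMES a marking probability of $1/(\deg(v)+1)$ in the remaining graph and a priority order by (degree, identifier). Both choices, and all function names, are hypothetical. Time is counted in iterations, each of which corresponds to a constant number of communication rounds.

```python
import random

def ruling_set_2_2(adj, seed=0):
    """Sequential simulation of a mark / join / delete loop for (2,2)-ruling sets.

    adj maps each node (comparable ids, e.g. ints) to the set of its neighbors.
    Returns the ruling set and the iteration in which each node was deleted.
    """
    rng = random.Random(seed)
    alive = set(adj)
    ruling, removed_at = set(), {}
    iteration = 0
    while alive:
        iteration += 1
        deg = {v: len(adj[v] & alive) for v in alive}
        # ASSUMPTION: marking probability 1/(deg+1); the source's value is unknown.
        marked = {v for v in alive if rng.random() < 1.0 / (deg[v] + 1)}
        # ASSUMPTION: higher priority = larger (degree, id) pair.
        joined = {v for v in marked
                  if not any(u in marked and (deg[u], u) > (deg[v], v)
                             for u in adj[v] & alive)}
        ruling |= joined
        # Delete every remaining node within distance 2 of a newly joined node.
        dist1 = set(joined)
        for v in joined:
            dist1 |= adj[v] & alive
        dist2 = set(dist1)
        for v in dist1:
            dist2 |= adj[v] & alive
        for v in dist2:
            removed_at[v] = iteration
        alive -= dist2
    return ruling, removed_at

def is_2_2_ruling_set(adj, S):
    """Check independence and that every node has a node of S within distance 2."""
    independent = all(adj[v].isdisjoint(S) for v in S)
    def covered(v):
        return v in S or bool(adj[v] & S) or any(adj[u] & S for u in adj[v])
    return independent and all(covered(v) for v in adj)

n = 40
adj = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
S, removed_at = ruling_set_2_2(adj, seed=1)
print(is_2_2_ruling_set(adj, S), sum(removed_at.values()) / n)
```

Whatever the random choices are, the output is independent (adjacent marked nodes never join together, and neighbors of joined nodes are deleted) and every deleted node is within distance 2 of the set, so the checker succeeds for any seed. Only the running time depends on the assumed probabilities.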

###### Theorem 3.

There is a deterministic distributed algorithm in the CONGEST model that computes a $(2, O(\log\Delta))$-ruling set in any $n$-node graph with maximum degree $\Delta$ and has node-averaged complexity $O(\log^* n)$. The algorithm can also be modified to produce a $(2, O(\log\log n))$-ruling set with the same node-averaged complexity.

###### Proof Sketch.

The proof is deferred to Appendix B. The basic idea is to apply $O(\log\Delta)$ iterations as follows: we use a simple dominating set algorithm that runs in $O(\log^* n)$ rounds and computes a dominating set $D$ whose size is at most a constant fraction of the current nodes. All nodes outside $D$ point to $D$ and terminate, and we continue to the next iteration with only the nodes in $D$. After $O(\log\Delta)$ iterations, at most $n/\Delta$ nodes remain and we can compute an MIS of them in time $O(\Delta + \log^* n)$ using known algorithms [BEK15]. To get a $(2, O(\log\log n))$-ruling set, we stop after only $O(\log\log n)$ iterations, when the number of remaining nodes has dropped to $n/\mathrm{poly}\log n$, and we invoke a $\mathrm{poly}\log n$-round MIS algorithm [rozhonghaffari20] among the remaining nodes. ∎

### 3.2 Maximal Matching

###### Theorem 4.

There is a randomized distributed algorithm in the CONGEST model that computes a maximal matching, which has edge-averaged complexity $O(1)$, and a worst-case round complexity of $O(\log n)$, with high probability.

###### Proof Sketch.

The result readily follows from a variant of Luby’s algorithm [luby86], adapted for maximal matching. The full proof is deferred to Appendix B. The algorithm marks each edge with a suitable probability and then adds a marked edge to the matching if no other incident edge is marked. We then remove the matched vertices and repeat in the remaining graph. We show that per iteration, in expectation a constant fraction of the edges is removed. This part is somewhat analogous to the analysis of Luby’s MIS algorithm [luby86]. In particular, we call a node $v$ with degree $d_v$ good if at least $d_v/3$ of its neighbors have degree at most $d_v$. We show that at least $1/2$ of the edges are incident to good nodes and that each good node is matched with at least a positive constant probability. Hence, at least $1/2$ of the edges each have a constant probability of being removed, which means that in expectation a constant fraction of the edges is removed. ∎

We comment that the above statement is also implicit in the classical maximal matching result of Israeli and Itai [IsraelI86]. (We thank an anonymous PODC’22 reviewer for bringing this to our attention.)
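A sequential simulation of the mark-and-match loop sketched in the proof of Theorem 4 is given below. The marking probability is not legible in this copy of the text, so the code ASSUMES that an edge $\{u,v\}$ is marked with probability $1/(\deg(u)+\deg(v)-1)$, i.e., one over the number of remaining edges touching $u$ or $v$. This choice and the function name are hypothetical. The input is assumed to be a simple graph given as a list of distinct edges, and time is counted in iterations.

```python
import random

def randomized_maximal_matching(edges, seed=0):
    """Mark each remaining edge at random; a marked edge joins the matching if
    no other edge sharing an endpoint is marked. Matched vertices are removed.

    Returns the matching and the iteration in which each edge was removed.
    """
    rng = random.Random(seed)
    alive = [tuple(e) for e in edges]
    matching, removed_at = [], {}
    iteration = 0
    while alive:
        iteration += 1
        deg = {}
        for u, v in alive:
            deg[u] = deg.get(u, 0) + 1
            deg[v] = deg.get(v, 0) + 1
        # ASSUMPTION: marking probability 1/(deg(u)+deg(v)-1).
        marked = [(u, v) for u, v in alive
                  if rng.random() < 1.0 / (deg[u] + deg[v] - 1)]
        hits = {}
        for u, v in marked:
            hits[u] = hits.get(u, 0) + 1
            hits[v] = hits.get(v, 0) + 1
        # Keep a marked edge only if it is the sole marked edge at both endpoints.
        new = [(u, v) for u, v in marked if hits[u] == 1 and hits[v] == 1]
        matching.extend(new)
        matched = {x for e in new for x in e}
        remaining = []
        for e in alive:
            if e[0] in matched or e[1] in matched:
                removed_at[e] = iteration
            else:
                remaining.append(e)
        alive = remaining
    return matching, removed_at

edges = [(i, (i + 1) % 12) for i in range(12)] + [(i, i + 12) for i in range(12)]
M, removed_at = randomized_maximal_matching(edges, seed=3)
print(len(M), sum(removed_at.values()) / len(edges))  # size, edge average
```

The returned edge set is always a maximal matching, since the loop only stops when every edge has a matched endpoint. The printed edge average is measured in iterations of this sketch and says nothing about the constants of the actual analysis.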

###### Theorem 5.

There is a deterministic algorithm that computes a maximal matching and has edge-averaged complexity $O(\log^2\Delta + \log^* n)$, node-averaged complexity $O(\log^3\Delta + \log^* n)$, and worst-case complexity $O(\log^2\Delta\cdot\log n)$.

###### Proof.

Let us describe one iteration of the algorithm, which takes $O(\log^2\Delta + \log^* n)$ rounds and removes a constant fraction of the edges. Let $E$ be the set of all edges in this iteration and consider the fractional matching where we assign to each edge the fractional value . Notice that this is indeed a valid fractional matching in the sense that for each node we have . Let us define for each edge a weight . Then, the aforementioned fractional matching has total weight  Using the deterministic rounding algorithm of Ahmadi, Kuhn, and Oshman [ahmadi2018distributed] for weighted matchings, we can compute an integral matching in $O(\log^2\Delta + \log^* n)$ rounds whose weight is at least a constant factor of the weight of the fractional matching that we start with. That is, we get an integral matching with weight . We add all the edges of this matching to our output maximal matching, and we then remove all the edges incident to matched nodes and continue to the next iteration. Notice that for each edge $e$ that gets added to the matching, we can say it killed the edges that share an endpoint with $e$, and this way each edge is killed at most twice, once by each endpoint. Hence, for any integral matching with weight , removing its matched vertices removes at least edges from the graph. Therefore, with our rounding of the fractional matching that had weight , we have found an integral matching whose addition removes at least edges. Since per iteration we spend $O(\log^2\Delta + \log^* n)$ rounds and remove a constant fraction of the edges, we conclude that the edge-averaged complexity is $O(\log^2\Delta + \log^* n)$.

Note that the $\log^* n$ term is not needed in each repetition and it suffices to have it only in the first repetition. More concretely, in the algorithm of [ahmadi2018distributed], this term no longer depends on $n$ if we are already given a suitable coloring of the nodes [ahmadi2018distributed], and we can compute such a coloring once, before all the repetitions, in $O(\log^* n)$ rounds using Linial’s classic algorithm [linial1987LOCAL]. Hence, after having spent these $O(\log^* n)$ rounds at the start, each repetition takes $O(\log^2\Delta)$ rounds and removes a constant fraction of the edges. This directly also shows that after $O(\log^2\Delta\cdot\log n)$ rounds, all the edges are removed and thus the algorithm has terminated.

We next argue about the node-averaged complexity. If we repeat the algorithm for iterations, for a total round complexity of , then the total number of edges in the graph is reduced by a factor . Hence, the total number of remaining nodes (which must have degree at least one) is decreased by at least a factor . The reason is that if we had nodes before, we had at most edges, and after the reduction of edges by a factor, the number of remaining edges is at most and thus we have at most nodes of degree at least , i.e., at least nodes have degree and are thus removed. This means that in rounds, the number of nodes reduces by a factor and thus the node-averaged complexity is . ∎

### 3.3 Sinkless Orientation

The randomized sinkless orientation algorithm of Ghaffari and Su [GS17] already has node-averaged complexity . (This statement is directly correct for the algorithm that they provide for graphs with a minimum degree of at least . We believe that their extension to graphs with min-degree in can also be adapted to have this node-averaged complexity, basically by replacing their deterministic ruling set subroutine with a randomized one.) We next show a deterministic algorithm that achieves an O(log^* n) node-averaged complexity. Note that the worst-case complexity of this problem has an lower bound even in -regular graphs [brandt2016lower].

###### Theorem 6.

There is a deterministic distributed model algorithm to compute a sinkless orientation of any n-node graph with minimum degree , with node-averaged complexity O(log^* n) and worst-case complexity O(log n).

###### Proof Sketch.

We sketch the high-level idea for the special class of a high-girth graph of constant maximum degree. As a first step, we compute an MIS of for a sufficiently large constant and use this set to cluster the graph. This produces clusters of diameter such that the complete -hop neighborhood of any node is contained in its cluster. Note that this clustering can be computed in rounds. We then build a virtual graph between the cluster centers (i.e., the nodes in ) such that each node is connected to three other cluster centers through disjoint paths, where those paths form the edges of this virtual graph. This is possible if the graph has a minimum degree and the girth is at least for a sufficiently large constant . If we compute a sinkless orientation of this virtual graph, we can obtain a sinkless orientation of by orienting the paths according to the sinkless orientation on the virtual graph and orienting all other edges in towards the nodes that participate in the virtual graph. In this way, we essentially reduce the problem of computing a sinkless orientation on to computing a sinkless orientation on a virtual graph with and where communication between neighbors costs rounds on . All nodes that are not part of the virtual graph are decided after rounds. If the constant is chosen large enough, and we can keep the node-averaged complexity in . A full proof that also works for general graphs appears in Appendix B. ∎
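For reference, the output condition that the algorithm must satisfy can be stated as a small checker. This is a minimal sketch with hypothetical names, not part of the algorithm above; it assumes the common convention that only nodes of degree at least 3 are required to have an outgoing edge, exposed here as the parameter `min_degree`.

```python
def is_sinkless(num_nodes, oriented_edges, min_degree=3):
    """Check that no node of degree >= min_degree is a sink.

    oriented_edges is a list of (tail, head) pairs over nodes 0..num_nodes-1.
    """
    degree = [0] * num_nodes
    out_degree = [0] * num_nodes
    for tail, head in oriented_edges:
        degree[tail] += 1
        degree[head] += 1
        out_degree[tail] += 1
    return all(
        out_degree[v] > 0 for v in range(num_nodes) if degree[v] >= min_degree
    )


# K4 oriented along the cycle 0 -> 1 -> 2 -> 3 -> 0 plus two chords: no sink.
good = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (1, 3)]
# K4 with every edge at node 3 pointing inward: node 3 is a sink.
bad = [(0, 3), (1, 3), (2, 3), (0, 1), (1, 2), (2, 0)]
print(is_sinkless(4, good), is_sinkless(4, bad))
```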

## 4 Lower Bounds

In this section, we prove lower bounds for the MIS problem. More precisely, we show a lower bound on the average complexity of MIS. As a key technical tool, we modify the KMW construction [kuhn16_jacm] and also obtain a lower bound on the worst-case complexity of MIS that already holds on trees. In fact, the original KMW lower bound applies to the problem of finding a good approximation for the minimum vertex cover, and through a chain of reductions, it is shown that this implies a lower bound for MIS on line graphs. We modify the KMW lower bound construction to directly provide a lower bound for MIS on trees that holds even for randomized algorithms, and then we show that this result also implies a lower bound for the average complexity of the MIS problem. The technical aspects of our proof follow the simpler version of the proof of the KMW lower bound shown by Coupette and Lenzen [breezing].

### 4.1 Summary of the KMW Lower Bound

On a high level, the KMW lower bound is obtained by showing that there exists a family of graphs satisfying that:

• There are two sets of nodes and that have the same view up to some distance ;

• Both and are independent sets;

• Every node of has neighbor in , and every node of has neighbors in , for some parameter ;

• is much larger than , and it contains the majority of the nodes of the graph.

In these graphs, since is an independent set that contains the majority of the nodes, if we want to cover all edges, we could just select all nodes except for the ones of , and obtain a very small solution to the vertex cover problem. But since nodes of and have the same view up to distance , then, they either spend more than rounds to understand in which set they are, or they must have the same probability of joining the vertex cover. This probability of being selected should be at least if we want every edge between and to be covered, implying that any algorithm running in at most rounds fails to produce a small solution, because in expectation nodes of get selected.

In more detail, in order to obtain a lower bound for vertex cover approximations, [kuhn16_jacm] follows this approach:

Define Cluster Tree Skeletons. These trees define some properties that each graph in the family should satisfy, and they are parametrized by a value . In general, given a cluster tree , there could be many graphs that satisfy the properties required by . The required properties are the following.

• Each node of a cluster tree corresponds to an independent set of nodes in the graph .

• Each edge in this tree is labeled with two values, and , that dictate that nodes of the sets corresponding to and in the graph must be connected with a -biregular graph.

Show Isomorphism. Cluster trees are defined in a specific way that allows proving the following. Let be a cluster tree, and let be a graph derived from . It is shown that there are two “special” nodes in that correspond to two “special” clusters and of , such that, if has girth at least , then nodes of and have isomorphic views, up to distance .

High-Girth Graphs. It is shown that it is indeed possible to build a graph that satisfies the requirements of and that has girth of at least . On a high level, this is shown by first constructing a low-girth graph , and then taking a high-girth lift of it.

Obtain Lower Bounds. It is then shown that having the same view up to distance implies that, for a randomized algorithm running in graphs where IDs are assigned uniformly at random, nodes of and must have the same probability of joining a vertex cover, implying that many nodes of (which is shown to contain the majority of the nodes of ) must join the solution, while there exists a very small solution, not containing any node of .

For a more detailed summary of the KMW lower bound, we refer the reader to Section 1.1 of [breezing], where Coupette and Lenzen provide a new and easier proof of the KMW lower bound.

### 4.2 Our Plan

In this work, we define our cluster tree skeletons in much the same way as Kuhn, Moscibroda, and Wattenhofer [kuhn16_jacm], but with a small and essential difference. Also, we slightly change the properties that a graph derived from a cluster tree should satisfy. On a high level, in our construction, every node of corresponds to a set of nodes of , which, unlike in the original construction, is not independent anymore, with the only exception being the “special” set of nodes, which remains an independent set. We then show that, in each cluster of containing nodes that are neighbors of nodes of , no large independent set exists, and that our construction is such that at least half of the nodes of must join the independent set in any solution for the MIS problem. In this way, we obtain that any algorithm running in rounds, on the one hand, cannot produce a large independent set in , because it simply does not exist, and on the other hand, in order to guarantee the maximality constraint, it must produce a large independent set in . If nodes of and have the same view up to distance , these two requirements contradict each other, so no -round algorithm can exist.

As mentioned above, the technical aspect of our proof is heavily based on the simplified version of the proof of the KMW lower bound shown by Coupette and Lenzen [breezing]. In fact, we follow the same approach to prove the isomorphism between nodes of and . We then deviate from this proof in order to show that high-girth graphs satisfying the required properties exist: we need to make sure that, in each cluster of containing nodes that are neighbors of nodes of , no large independent set exists, while also making sure that the girth is at least . Actually, we do not achieve exactly this result: we first build a low-girth graph satisfying the independence requirements, and then we make use of known graph-theoretical results [ALM02] to lift it into a graph that also satisfies the independence requirements, and such that each node has a large probability of not seeing any cycle within distance . Hence, the obtained graph may not have a high girth, but we prove that this weaker property is sufficient for our purposes.

### 4.3 Cluster Trees

A cluster tree skeleton (CT_k) is a tree parametrized by an integer k, that we use to compactly describe a family of graphs called cluster tree graphs. We now show how CT_k is defined. We will later present the family of graphs described by a cluster tree skeleton CT_k.

#### The Cluster Tree Skeleton.

Unlike in [breezing], our cluster tree skeletons contain self loops, hence they are trees only when ignoring self loops. In fact, in our definition of , every node except one has a self loop. We denote with a directed edge labeled with some parameter . Given a parameter , the cluster tree skeleton is defined inductively as follows (see Figure 1 for an example).

• Base case: , where , . The node is called internal, while is a leaf. The node is the parent of , while has no parent.

• Inductive step: given , we define as follows. To each internal node of , we connect a leaf by adding the edges and . Moreover, we add the edge . Then, let be a leaf of that is connected to its parent through the edge . We connect to each such node a node for each by adding the edges and . Moreover, we add the edge . Node is now internal in , and it is the parent of all the added leaves .

#### The Graph Family.

Given , the graphs are all the ones satisfying the following:

• For each node , there is a set of nodes in ;

• Let be two nodes of , and let , be two sets of nodes in that represent and , respectively. Also, let be the directed edge from to labeled with a parameter . Then, in , all nodes in must have exactly neighbors in .

• There are no additional edges in .

#### Observations on CTk and Gk.

Note that is defined such that, if two (different) nodes are connected through an edge , then there also exists some edge for some . This implies that, by fixing the size of a single set , the size of all the other sets, and the maximum degree of the graph, are determined by the labels of the edges of . However, the exact way to connect nodes of different sets is not prescribed by the structure of , and hence there is freedom in realizing those connections. We will later show that it is possible to construct such that most of the nodes do not see cycles within distance , while the maximum degree and the total number of nodes is not too large.

We now observe some properties on the structure of .

###### Observation 7 (Structure of CTk).

Each node of is either internal or leaf.

1. Every node has an edge for some . We define , that is, represents the exponent of the self loop of .

2. Each node , except for , has a parent , and has edges , , , for some that satisfies for internal nodes, and for leaves.

3. Let be an internal node, and let . Node has children that can be reached with edges for all .

4. Node has children that can be reached with edges for all .

We denote with , for , the node satisfying . We now label the edges of , with a label that depends on the label of the edge of from which comes.

###### Definition 8 (Labeling of the Edges of Gk).

For every edge of , let , and . Let be the edge of connecting and . We mark the edge with if or . Similarly as in the case of , we may refer to this edge of with . If , we additionally mark each edge with the label .

###### Observation 9 (Number of Neighbors with a Specific Label in Gk).

Every node of that corresponds to an internal node of has exactly outgoing edges marked , for all . Every node of that corresponds to a leaf of gets exactly outgoing edges labeled , for exactly one .

###### Observation 10.

Let be a node of that corresponds to an internal node of . Let be an arbitrary edge incident to . Then corresponds to a node in if and only if .

### 4.4 Isomorphism Assuming No Short Cycles

We now prove that for all graphs , for all pairs of nodes , if their radius- neighborhoods do not contain any cycle, then and have the same radius- view.

###### Theorem 11 (k-hop Indistinguishability of Nodes Corresponding to c0 and c1).

Let be a cluster tree graph, and consider two nodes and that satisfy that and are trees. Then and have the same view up to distance .

The proof is deferred to Appendix C. We follow the same approach as [breezing], that is, we provide an algorithm that defines an isomorphism, and then we prove that the algorithm is correct.

### 4.5 Lifting

We next show how to obtain a graph such that most nodes in see no short cycles and such that the graph induced by any of the clusters (except cluster ) has no large independent set. The high-level idea of this construction is as follows. We start from some base graph and then we show that can be computed as a random lift of . We first introduce the necessary graph-theoretic terminology.

For a graph and a graph , a covering map from to is a graph homomorphism such that for every node , the neighbors of are bijectively mapped to the neighbors of . That is, if has neighbors in , then has (different) neighbors in . For a graph , a graph for which a covering map from to exists is called a lift of . We will refer to the set of preimages of a node as the fiber of in . If is connected, then all fibers have the same cardinality, i.e., for every , there is the same number of nodes for which . Even if is not connected, a lift of is typically constructed such that all fibers have the same cardinality. If all fibers of a lift of have the same cardinality , then is called the order of the lift . Note that if we have a graph , then any lift of is also a graph in .
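The covering-map condition can be made concrete with a small checker. This is a minimal sketch under assumptions of ours: lifted nodes are represented as pairs (v, i), the covering map is the projection (v, i) -> v, the base graph is simple, and all names are hypothetical.

```python
def is_covering_projection(base_edges, lift_nodes, lift_edges):
    """Check that (v, i) -> v maps the neighbors of every lifted node
    bijectively onto the neighbors of v in the base graph."""
    base_adj = {}
    for u, v in base_edges:
        base_adj.setdefault(u, set()).add(v)
        base_adj.setdefault(v, set()).add(u)
    lift_adj = {x: [] for x in lift_nodes}
    for x, y in lift_edges:
        lift_adj[x].append(y)
        lift_adj[y].append(x)
    for (v, _), neighbors in lift_adj.items():
        projected = sorted(w for (w, _) in neighbors)
        # Bijective: same neighbors as in the base graph, each hit exactly once.
        if projected != sorted(base_adj.get(v, set())):
            return False
    return True


# The 6-cycle is a lift of order 2 of the triangle.
triangle = [(0, 1), (1, 2), (2, 0)]
hexagon_nodes = [(v, i) for v in range(3) for i in range(2)]
hexagon_edges = [
    ((0, 0), (1, 0)), ((0, 1), (1, 1)),
    ((1, 0), (2, 0)), ((1, 1), (2, 1)),
    ((2, 0), (0, 1)), ((2, 1), (0, 0)),
]
print(is_covering_projection(triangle, hexagon_nodes, hexagon_edges))
```

Each fiber in this example has cardinality 2, matching the definition of the order of a lift.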

There are different ways to construct lifts with desirable properties for a given base graph. As discussed, our lower bound proof is based on the minimum vertex cover lower bound of [kuhn16_jacm]. In [kuhn16_jacm], the lift of a base graph (for the family of graphs used in [kuhn16_jacm]) is computed in a deterministic way such that has girth at least (i.e., such that any -hop neighborhood is tree-like). We cannot use the same deterministic lift construction because we also need to ensure that the induced graphs of the neighboring clusters of in have no large independent sets. We instead use a random lift construction that was described in [ALM02]. Given a base graph and an integer , we can obtain a random lift of order of as follows. For every , contains nodes and for every edge , the two fibers and are connected by a uniformly random perfect matching (which is chosen independently for different edges). We obtain the following, the proof of which is deferred to Appendix C.
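The random lift construction described above can be sketched as follows. This is an illustrative sketch only: the function name, the (v, i) node representation, and the `seed` parameter are assumptions of ours, and the base graph is assumed to be simple. A uniformly random permutation of the fiber indices is used to obtain a uniformly random perfect matching between two fibers.

```python
import random


def random_lift(base_nodes, base_edges, order, seed=None):
    """Build a random lift of the given order.

    Every base node v gets the fiber (v, 0), ..., (v, order - 1). For every
    base edge {u, v}, the two fibers are joined by an independent, uniformly
    random perfect matching.
    """
    rng = random.Random(seed)
    lift_nodes = [(v, i) for v in base_nodes for i in range(order)]
    lift_edges = []
    for u, v in base_edges:
        perm = list(range(order))
        rng.shuffle(perm)  # uniform permutation = uniform perfect matching
        lift_edges.extend(((u, i), (v, perm[i])) for i in range(order))
    return lift_nodes, lift_edges


def degrees(nodes, edges):
    """Degree of every node in an undirected edge list."""
    deg = {x: 0 for x in nodes}
    for x, y in edges:
        deg[x] += 1
        deg[y] += 1
    return deg


# Lift of order 4 of a triangle: 12 nodes, all of degree 2 as in the base graph.
nodes, edges = random_lift([0, 1, 2], [(0, 1), (1, 2), (0, 2)], order=4, seed=1)
print(len(nodes), len(edges), set(degrees(nodes, edges).values()))
```

Because every base edge contributes exactly one incident lifted edge to each node of the two fibers, the lift preserves all degrees of the base graph regardless of the random choices.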

###### Lemma 12.

Let be a base graph with maximum degree , let be a positive integer and let be a random lift of order of as described above. Then, for every integer and every node , the probability that is contained in a cycle of length at most is upper bounded by . Further, for a set of nodes , let be the set of nodes consisting of the union of all fibers of the nodes in . If is a complete graph of size , then for every integer , with probability at least , where is the independence number of the graph .

### 4.6 (Almost) High-Girth Graphs in Gk

We prove that there are graphs that belong to such that, for each node, the probability that it sees a cycle within distance is small. We first show how to construct a low-girth graph , and then we use Lemma 12 to show that we can use it to construct graphs that satisfy the above.

#### A Base Graph.

We now construct a family of low-girth graphs , parametrized by an even integer , and we prove that these graphs belong to . The graph , as a function of , is defined as follows.

• Let be the hop distance of a node of from . Note that . For each , let , be the exponent of the label of its self loop. We define to be a set of nodes of size nodes, connected as follows. Start from disjoint cliques of size , . Note that is an even integer. For every , add to the edges of a perfect matching between the nodes of and . In this way, we obtain that every node in has exactly neighbors inside , and hence the requirements of the graph family defined in Section 4.3 are satisfied for the edge .

• contains nodes, and they form an independent set.

• Let be a node at distance from , and be a neighbor of at distance from . By construction, they are connected to each other with edges and , for some . Observe that has size and has size . We group the nodes of into groups of size , and the nodes of into groups of size . We obtain that and are both split into groups. Note that is an integer. We take a perfect matching between the groups, and for each matched pair of groups, we connect its nodes as . We obtain that every node in has neighbors in , and every node in has neighbors in , and hence the requirements of the graph family defined in Section 4.3 are satisfied for the edges and .

We summarize our result as follows. The proofs of the following statements appear in Section C.3.

###### Lemma 13.

For all integers and even integers satisfying , there exists a graph , satisfying:

• For all , let . The graph satisfies that .

• The total number of nodes is and the maximum degree is at most .

#### Applying the Lifting.

We apply Lemma 12 to the graph of Lemma 13 to produce an almost high-girth version of it.

###### Lemma 14.

For any integer , and even satisfying there is a graph satisfying that:

• The number of nodes is and the maximum degree is at most .

• For all , for all , the probability that is contained in a cycle of length at most is upper bounded by .

• For all , let , and let . The graph satisfies that for every integer , with probability at least .

We now fix the parameters and to obtain a friendlier version of Lemma 14.

###### Corollary 15.

Let be an integer and and be an even integer satisfying . For infinitely many values of , there is a graph in satisfying the following:

• The number of nodes is and the maximum degree is at most .

• The probability that a node is contained in a cycle of length at most is at most .

• For each satisfying that , for some constant .

### 4.7 Lower Bounds for MIS

In the remainder of this section, we prove the following theorem.

###### Theorem 16.

The randomized node and edge averaged complexities of the MIS problem in general graphs and the randomized worst-case complexity of MIS on trees are both Ω(min{logΔ/loglogΔ,√(log n/loglog n)}).

###### Proof.

Consider the family of graphs described in Corollary 15, for an arbitrary parameter , and . Let be the neighbors of , where is the one reached with the edge . Note that . Let . For each it holds that at most nodes can join the independent set, and these nodes can cover at most nodes of , where the last equality holds because the ratio between the sizes of and is . Hence, in total, nodes in cover at most nodes of . For , we obtain that at most nodes of are covered. Note that nodes of cannot cover other nodes of (since is an independent set), and hence if a node of is not covered by any node of then it must join the independent set. Hence, at least nodes of must join the independent set.

Every node of has probability at least to see, in rounds, a tree-like neighborhood, and in that case, by Theorem 11, they have the same view of those nodes of that also see a tree-like neighborhood within distance , and since those nodes must have probability at most of joining the independent set, then we obtain that, in expectation, a fraction