1 Introduction
Clustering has become an essential tool in dealing with large data sets. The goal of clustering data is to identify disjoint, dense regions such that the space between them is sparse. When working with graphs, this translates to partitioning the vertex set into clusters with relatively few edges between clusters such that the clusters satisfy a particular property. One can for example demand that the clusters have low diameter [Awerbuch85, AKPW95, Bartal96, MPX13], high conductance [GR99, KVV04, ST11, CHKM12, CHZ19, SW19, CS20], or low effective resistance diameter [AALG18]. In this paper, we focus on the low diameter decomposition and its connection to spanners. Low diameter decompositions are formally defined as follows.
Definition 1.
Let be a weighted graph. A probabilistic low diameter decomposition of is a partition of the vertex set into subsets , called clusters, such that

each cluster has strong diameter at most , i.e., for all ;² (²For $S \subseteq V$, we write $G[S]$ for the subgraph of $G$ induced by $S$.)

the probability that an edge is an intercluster edge is at most , i.e., for , the probability that and for is at most .
In an unweighted graph, another typical definition of the low diameter decomposition replaces the second condition with an upper bound on the number of intercluster edges [MPX13]. In this fashion, a probabilistic low diameter decomposition has intercluster edges in expectation.
Originally, low diameter decompositions were developed for distributed models, where they have proven useful by significantly reducing communication in certain situations [Awerbuch85, AGLP89]. Later, they have also proven fruitful in other models; they have been applied in shortest path approximations [Cohen94], cut sparsifiers [LR99], and tree embeddings with low stretch [AKPW95, Bartal96, Bartal98].
The clustering technique used for computing low diameter decompositions has implicitly been used to develop sparse spanners [BS07, MPVX15, EN19] and synchronizers [Awerbuch85, APPS92]. The main idea is to create the clusters, and add some, but not all, of the intercluster edges. In a sense, the intercluster edges are sparsified. We formalize this concept as follows.
Definition 2.
Let be an unweighted graph. A sparsified low diameter decomposition of is a partition of the vertex set into subsets , called clusters, together with a set of edges such that

each cluster has strong diameter at most , i.e., for all ;

for every edge, one of its endpoints has an edge from into the cluster of the other endpoint, i.e., , we have either for some or for some ;³ (³For a vertex, we write for the cluster containing it.)

.
Moreover, we say that a sparsified low diameter decomposition is tree-supported if for each cluster we have a cluster center and a tree of height at most spanning the cluster. All these trees together are called the support forest.
Our main result is a clustering algorithm that produces a sparsified low diameter decomposition.
Theorem 3.
There exists an algorithm such that for each unweighted graph and parameter it outputs a tree-supported sparsified low diameter decomposition, with in expectation. The algorithm runs in rounds in the CONGEST model, and in depth and work in the PRAM model.
An important feature of this result is that the bounds on the strong diameter and number of rounds are not probabilistic; they are independent of the random choices in the algorithm. We show two applications of this theorem: constructing spanners and constructing synchronizers.
Spanners
Given a graph , we say that is a spanner of stretch if for every . It is straightforward that a tree-supported sparsified low diameter decomposition gives a spanner of size and stretch ; for details, we refer to Section 3.1. This gives us the following corollary.
Corollary 4.
There exists an algorithm such that for each unweighted graph and parameter it outputs a spanner of stretch . The expected size of is at most . The algorithm runs in rounds in the CONGEST model, and in depth and work in the PRAM model.
Spanners themselves have been useful in computing approximate shortest paths [ABCP93, Cohen98], distance oracles and labeling schemes [TZ05, Peleg00], and routing [PU89]. A simple greedy algorithm [AGDJS93] gives a spanner of stretch and of size , which is an optimal trade-off under the girth conjecture [FS16]. However, its fastest known implementation in the RAM model takes time [RZ04]. Halperin and Zwick [HZ93] gave a linear-time algorithm to construct spanners with an optimal trade-off for unweighted graphs in the RAM model. However, this algorithm does not adapt well to distributed and parallel models of computation. This problem can be overcome by exploiting the aforementioned relation with sparsified low diameter decompositions. This was (implicitly) done by Baswana and Sen [BS07], who provide an algorithm that computes a spanner of stretch and of size in rounds. The state of the art is by Elkin and Neiman [EN19], which builds on [MPVX15], and is also based on low diameter decompositions. They provide an algorithm that with probability computes a spanner of expected size in rounds. Standard techniques for boosting the failure probability to something inverse polynomial (or ‘with high probability’) require a logarithmic overhead. Alternatively, one can view the algorithm of Elkin and Neiman as an algorithm that outputs a spanner of expected size in rounds, such that with probability we have that .
Corollary 4
improves on the result of Elkin and Neiman by making the bounds on the stretch and the running time independent of the random choices in the algorithm. In particular, the algorithm of Elkin–Neiman involves sampling vertex values from an exponential distribution. The exponential distribution introduces an (as we show) unnecessary amount of randomness; we demonstrate that the geometric distribution suffices. We replace the extra random bits the exponential distribution provides by a tie-breaking rule on the vertex IDs, which we believe contributes to a more intuitive construction.

Synchronizers
The second application of Theorem 3 is in constructing synchronizers in the CONGEST model. A synchronizer gives a procedure to run a synchronous algorithm on an asynchronous network. More precisely, the goal is to run any synchronous round message complexity CONGEST model algorithm on an asynchronous network with minimal time and message overhead. The first results on synchronizers are by Awerbuch [Awerbuch85], called synchronizers , , and . Subsequently, these results were improved by Awerbuch and Peleg [AP90], and Awerbuch et al. [APPS92], both having time and message complexity.
The synchronizer from [Awerbuch85] essentially consists of running a combination of the simple synchronizers and on a sparsified low diameter decomposition. In that case, the bound on the sparsified intercluster edges goes into the bound for the communication overhead of the synchronizer, and the strong diameter bound goes into the bound for the time overhead of the synchronizer. Applying synchronizer to our clustering, we obtain the following result.
Theorem 5.
There exists an algorithm that, given , can run any synchronous round message complexity CONGEST model algorithm on an asynchronous CONGEST network. In expectation, the algorithm uses a total of messages. Provided that each message incurs a delay of at most one time unit, it takes rounds. The initialization phase takes time, using messages.
The running time claimed in this theorem is independent of the random choices in our algorithm, which is a direct result of Theorem 3. The previous sparsified low diameter decompositions (implicit in [EN19]) would provide similar bounds on the running time, but only with constant probability.
Low Diameter Decompositions
Perhaps unsurprisingly, we show that, with the right choice of parameters, our clustering algorithm can also compute unsparsified low diameter decompositions.
Theorem 6.
There exists an algorithm such that for each graph with integer weights and parameter it outputs a low diameter decomposition whose clusters have strong diameter at most . Moreover, each edge is an intercluster edge with probability at most . The algorithm runs in rounds in the CONGEST model, and in depth and work in the PRAM model.
Similar to our spanner algorithm, the bounds on the running time and strong diameter hold independently of the random choices within the algorithm, as opposed to the previous state of the art [MPX13], where they only hold with high probability. For the low diameter decomposition as discussed above, the trade-off between and the diameter bound is essentially optimal [Bartal96].
Technical Overview
Our clustering algorithm follows an approach known as ball growing, related to the probabilistic partitions of [Bartal96, Bartal98]. In a sequential setting, this consists of picking a vertex and repeatedly adding the neighbors of the current vertices to the ball. This stops when a certain bound is reached, such as a bound on the diameter of the ball or on the number of edges between the current ball and the remainder of the graph. The algorithm repeats this procedure on the remainder of the graph until it is empty. Miller, Peng, and Xu [MPX13] showed that this can be parallelized by letting each vertex create its own ball, but after a certain start time delay. In [MPX13], this has been done by sampling the delays from the exponential distribution, which leads to the aforementioned probabilistic diameter guarantee, as the exponential distribution can take arbitrarily large values – albeit with small probability. Furthermore, multiple authors (see e.g. [FG19, MPX13]) argue that one can round the sampled values from the exponential distribution for most of the algorithm and solely use that the fractional parts of the sampled values induce a random permutation of the nodes. In this paper, we show that even fewer random bits are needed: we do not require a random permutation of the nodes. We demonstrate that a tie-breaking rule based on the IDs is enough.
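The sequential ball-growing procedure described above can be sketched as follows. This is a minimal illustration only: the carving criterion (stop once the edges leaving the ball are at most a `beta` fraction of the edges inside it) and the parameter name `beta` are our own choices, not taken from the paper.

```python
def ball_growing(adj, beta):
    """Sequential ball-growing sketch: grow a BFS ball from an arbitrary
    remaining vertex until the number of edges leaving the ball is at most
    a beta fraction of the edges inside it, carve the ball off as a
    cluster, and repeat on the remainder of the graph."""
    remaining = set(adj)
    clusters = []
    while remaining:
        root = next(iter(remaining))
        ball = {root}
        while True:
            # Edges crossing from the ball into the remainder of the graph.
            cut = sum(1 for u in ball for v in adj[u]
                      if v in remaining and v not in ball)
            # Edges with both endpoints inside the ball (each seen twice).
            vol = sum(1 for u in ball for v in adj[u] if v in ball) // 2
            if cut <= beta * max(vol, 1):   # boundary small enough: stop growing
                break
            ball |= {v for u in ball for v in adj[u]
                     if v in remaining and v not in ball}   # next BFS layer
        clusters.append(ball)
        remaining -= ball
    return clusters
```

Since a nonzero cut always yields a nonempty next layer, each ball grows until it satisfies the criterion, so the procedure terminates with a partition of the vertex set.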
We sample with a capped-off geometric distribution, also used in [LS93, APPS92]. As opposed to the standard geometric distribution, the capped-off version can only take a finite number of values. We believe this leads to a more direct proof of the spanner algorithm of [EN19] and of the decomposition algorithm of [MPX13]. Moreover, by making the sparsified low diameter decomposition explicit, the application to synchronizers is almost immediate. In the remainder of this paper, we will not think of the sampled values as start time delays, but as distances to some conceptual source , similar to the view in [MPX13]. The rest of the clustering algorithm then consists of computing a shortest path tree rooted at , which is easily calculated, both in the CONGEST and the PRAM model. The clusters consist of the trees that remain when we disconnect the shortest path tree by removing the root .
As an anonymous reviewer pointed out, in the case of low diameter decompositions, the algorithm of Miller et al. [MPX13] admits an alternative approach. We can exploit the fact that the exponential delays are bounded with high probability. In case the delays exceed the bound, we could return a suboptimal clustering, without any central communication. As this only happens with low probability, it does not impact the expected number of intercluster edges. Note, however, that the spanner construction of Elkin and Neiman [EN19] is not in this high-probability regime, so this straightforward approach would not work there. We additionally believe that, beyond the result itself, our algorithm provides a more streamlined view.
2 The Clustering Algorithm
Let be a graph with integer weights . Let and be parameters, to be chosen according to the application of our algorithm. In the following, we provide an algorithm for computing a clustering, where the strong diameter of these clusters will be . In particular, we will show that each cluster is treesupported by a tree of height . The number of intercluster edges depends on both and , and can be bounded in two ways. The first approach, detailed in Section 3, shows we have a sparsified low diameter decomposition. Here, for each vertex we compute the expected number of edges in the sparsified set of intercluster edges, which gives a bound that does not depend on , but only on and . The second approach, detailed in Section 4, shows we have a probabilistic low diameter decomposition, by computing the probability that any edge is an intercluster edge.
2.1 Construction
First we conceptually add a node to the graph to form the graph . The node will function as an artificial source for a shortest path tree. Each vertex will have a distance to in depending on some random offset. Hereto, each vertex samples a value from the capped geometric distribution GeomCap, which – writing $p$ for the success probability and $K$ for the cap – is defined by
$$\Pr[X = j] = (1-p)^{j} p \quad \text{for } 0 \le j < K, \qquad \Pr[X = K] = (1-p)^{K}.$$
This distribution corresponds to the model where we repeat at most $K$ Bernoulli trials and measure how many trials occur (strictly) before the first success, or whether there is no success in the first $K$ trials. We check that GeomCap is indeed a probability distribution on $\{0, 1, \dots, K\}$:
$$\sum_{j=0}^{K-1} (1-p)^{j} p + (1-p)^{K} = \left(1 - (1-p)^{K}\right) + (1-p)^{K} = 1.$$
As the intuition suggests, GeomCap has a memoryless property as long as the cap is not reached, i.e., $\Pr[X \ge j+1 \mid X \ge j] = 1-p$ for $j < K$. The proof is completely analogous to the proof of the memoryless property of the geometric distribution.
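The sampling model can be made concrete in a few lines. This is an illustrative sketch only; the parameter names `p` (success probability) and `K` (cap) are our own.

```python
import random

def geom_cap(p, K, rng=random):
    """Sample from the capped geometric distribution: count the Bernoulli(p)
    trials that occur strictly before the first success, capped at K."""
    for j in range(K):
        if rng.random() < p:
            return j
    return K  # no success within the first K trials

def geom_cap_pmf(p, K, j):
    """Probability mass function implied by the sampling model above."""
    if j < K:
        return (1 - p) ** j * p
    return (1 - p) ** K  # all remaining mass collapses onto the cap K
```

One can check numerically that the mass function sums to one over $\{0, \dots, K\}$, matching the calculation above.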
For each vertex , we conceptually add an edge to , with weight . We define , which is the minimal length of a path from to over . Now we have that the distance from to equals . We call this the level of , ranging from (closest to ) to (furthest from ). Moreover, we define to be the predecessor of on an arbitrary but fixed shortest path from to . Next, we construct a shortest path tree rooted at . When necessary, we break ties according to IDs: let be such that and for all satisfying . Then we connect to the shortest path tree using the edge . Moreover, we add to the cluster of and write for the corresponding cluster center. Intuitively, the clusters correspond to the connected components that remain when we remove the source from the created shortest path tree. The formal argument for this can be found in the proof of Lemma 7. The computation of this shortest path tree is model-specific; we provide details in Section 2.3.
The algorithm outputs the shortest path tree , and for each , its cluster center and its level. The knowledge of cluster centers immediately gives a clustering, where – by the remark above – each cluster has radius at most . In Section 3, we show how to construct a set of edges from the cluster centers and levels, such that is a spanner.
In the above, we only need an arbitrary ordering of the vertices. If we assume that each vertex has a unique identifier , we can provide an alternative way of constructing the same shortest path tree. We construct a graph where , and compute a shortest path tree rooted at . This embeds the tie-breaking rule in the weights of the added edges, and thus in the distances. For generality – and suitable implementation in distributed models with limited bandwidth – the remainder of this paper relies on the former characterization using the tie-breaking rule.
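As a concrete illustration, the shortest path tree from the virtual source can be sketched as a multi-source Dijkstra computation in which ties are broken towards the center with the smaller ID, by comparing lexicographic (distance, center) keys. This is only a sketch for unweighted graphs; the names `delta` (the sampled offsets) and `center` are ours.

```python
import heapq

def cluster(adj, delta):
    """Multi-source shortest paths from a conceptual source s, where each
    vertex v is attached to s by an edge of length delta[v].  Ties in
    distance are broken towards the cluster whose center has the smaller
    ID, via lexicographic (distance, center) heap keys."""
    dist, center = {}, {}
    heap = [(delta[v], v, v) for v in adj]   # (distance via s, center, vertex)
    heapq.heapify(heap)
    while heap:
        d, c, v = heapq.heappop(heap)
        if v in dist:        # already settled with a key no larger than (d, c)
            continue
        dist[v], center[v] = d, c
        for u in adj[v]:     # unweighted graph: every edge has length 1
            if u not in dist:
                heapq.heappush(heap, (d + 1, c, u))
    return dist, center
```

The clusters are then the sets of vertices sharing a center, matching the view of the clusters as the trees hanging off the removed root.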
2.2 Tree-Support
Next, we will show that the created clusters are tree-supported by a tree of height . We have already chosen cluster centers, and we will show that we can identify trees rooted at these centers that satisfy the tree-support condition.
Lemma 7.
Each cluster is tree-supported by a tree of height at most .
Proof.
Let be a vertex, which is part of the cluster centered at . We show that there is a path from to contained in this cluster, which has length at most . We proceed to show by induction on that there is a path from to contained in their cluster, which has length at most . The base case, , is trivial. Let be the predecessor of on some path from to of length . It suffices to show that is in the same cluster; then the result follows from the induction hypothesis. By definition of , we have that
By the triangle inequality we have . Combining this, we see . As the distance is minimal, by definition we have . Now suppose that is part of some cluster . Then we have and . However, this implies that . Hence by the tie-breaking rule for we have and thus . ∎
As an immediate corollary, we obtain a bound on the strong diameter of the clusters.
Corollary 8.
Each cluster has a strong diameter of .
2.3 Implementation and Running Time
For the RAM model, the implementation is straightforward and can be done in linear time [Thorup99]. The implementation in distributed and parallel models requires a little more attention. For both models, the computational aspect is very similar to prior work [MPX13, EN19].
2.3.1 Distributed Model
The algorithm, as presented, can be implemented efficiently both in the LOCAL and in the CONGEST model. It runs in rounds as follows. In the initialization phase, each vertex samples its value and sets its initial distance to the conceptual vertex to . In the first round of communication, sends the tuple to its neighbors. In each round, updates its distance to according to the received messages. It then broadcasts the tuple of its updated distance and the corresponding to the first vertex on the path from to . Note that at the end of the algorithm, each node knows its own level and cluster center, and the level and cluster center of each of its neighbors.
When the algorithm is applied with (if , we can simply return the connected components of the graph as clusters), we maintain a bound on the message size of , so there are no digit precision considerations for the CONGEST model. Moreover, as each vertex has distance at most to , the algorithm terminates within rounds.⁴ (⁴The ‘’ appears as nodes in the lowest level have distance to the source .)
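The round structure above can be mimicked by a simple synchronous simulation in which every vertex repeatedly adopts the lexicographically smallest (distance, center) offer among its neighbors' broadcasts. This is a rough sketch with our own naming, assuming the sampled offsets `delta` are given; it is not the paper's CONGEST implementation itself.

```python
def congest_rounds(adj, delta):
    """Synchronous round simulation (sketch): every vertex keeps a tentative
    (distance-to-s, center) pair, initially (delta[v], v), and in each round
    adopts the lexicographically smallest neighbour offer plus one hop.
    Runs until no pair changes any more."""
    state = {v: (delta[v], v) for v in adj}
    rounds, changed = 0, True
    while changed:
        changed = False
        msgs = dict(state)   # snapshot of this round's broadcasts
        for v in adj:
            offers = [(msgs[u][0] + 1, msgs[u][1]) for u in adj[v]]
            if offers and min(offers) < state[v]:
                state[v] = min(offers)
                changed = True
        rounds += 1
    return state, rounds
```

Every update strictly decreases a vertex's (distance, center) pair, which is bounded from below, so the simulation terminates; the number of rounds is governed by the maximum distance to the source.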
2.3.2 PRAM Model
The implementation in the PRAM model is slightly different from the CONGEST model. Instead of broadcasts by each vertex in each round, a vertex updates its distance only once: either after one of its neighbors updated its distance, or after time , when it sets its distance to . The total required depth depends on the exact model of parallelism; it is in the CRCW model of parallel computation. To show this, we follow the general lines of [KS97], but we have to be careful: during the shortest path computation, we might need to apply our tie-breaking rule, i.e., finding the minimum among all options. Note that in the PRAM model, we can assume without loss of generality that the IDs are labeled to in the adjacency list representation. Finding the minimum can be done with high probability in depth and work, as we can sort a list of integers between and in depth and work [GMV91]. If we exceed the depth bound, we stop and output the trivial clustering consisting of singletons. This clustering clearly satisfies the diameter bound, and as we only output it with low probability, it has no effect on the expected number of intercluster edges. So we can conclude that the additional sorting overhead for the tie-breaking is a factor . The algorithm has total work , where the contribution of comes from sampling from the geometric distribution. In this paper, this factor vanishes, as we always have such that .
3 Constructing a Sparsified Low Diameter Decomposition
In this section, we show how the clustering algorithm leads to a sparsified low diameter decomposition. The procedure is as follows: given , we set , , and compute a clustering using the algorithm of Section 2. We write for the sparsified set of intercluster edges. Intuitively, for each vertex , we add an edge to for each cluster in which we have a neighbor one level below , or a neighbor on the same level as whose cluster center has a smaller ID than the center of the cluster of .
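The selection rule just described can be written out as follows. This is our own rendering of the rule (one representative edge per eligible neighboring cluster), with hypothetical names `level` and `center` for the clustering output of Section 2.

```python
def sparsified_edges(adj, level, center):
    """For each vertex u, keep one edge into each neighbouring cluster that
    contains a neighbour on level[u] - 1, or a neighbour on level[u] whose
    cluster center has a smaller ID than u's own center."""
    E = set()
    for u in adj:
        chosen = {}          # at most one representative edge per cluster
        for v in adj[u]:
            if center[v] == center[u]:
                continue     # intracluster edge: covered by the support forest
            same = level[v] == level[u] and center[v] < center[u]
            below = level[v] == level[u] - 1
            if (same or below) and center[v] not in chosen:
                chosen[center[v]] = v
        for v in chosen.values():
            E.add((min(u, v), max(u, v)))
    return E
```

Taking one representative per cluster is what makes the bound on the size of the edge set depend on the number of adjacent clusters rather than on the vertex degree.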
Lemma 9.
There exists a set of edges of expected size , such that for every edge, one of its endpoints has an edge from into the cluster of the other endpoint.
Proof.
We define , where consists of the following edges:
First, we show that satisfies the property stated in the lemma; then we consider its size. Let . Without loss of generality, we assume , and in case of equality we assume . We will show that there is an edge to the cluster of . First of all, notice that by the triangle inequality. If , then . Because of the minimality of , we have , and thus by definition of . If , we have . Moreover, we have . So again it follows that by definition of .
Now, we turn to the expected size of . By linearity of expectation, we have . We will show that for each the expected size of is at most . For each , we potentially add an edge to . First, we calculate the probability that at least such vertices contribute an edge. Hereto, we look at the random variables . According to these random variables, we order all vertices: , such that for we satisfy one of the following properties:
;

and .
We calculate , i.e., the probability that . We do this by conditioning on . We observe
So we calculate for . By definition of , this can only hold if either ’s closest neighbors in the clusters centered at and are on the same level (in which case we have , as and ), or the neighbor from the cluster centered at is at a lower level and . Note that the level of the closest neighbor in the cluster of or corresponds to the distance or , respectively. As the allowed distances depend on the of , we split the vertices according to :
If , then we know . If both and are in , we must have . So for every , we are looking at
where the last equality holds as we only rewrote the condition using the order of the ’s. We fill in the definition of and use that the probability of the event we are looking at is independent of :
When , this equals , by the memoryless property of the geometric distribution. To distinguish the cases, we partition the vertices with into two sets:
Now for , we obtain by the same reasoning
As before, when , this equals , by the memoryless property of the geometric distribution. And again, we partition into two sets
If we define and , we can summarize our results as
Next, we split the expected value of depending on and :
We bound the first summand with , independently of . Hereto, we observe that for any nonnegative discrete random variable $X$ we have
$$\mathbb{E}[X] = \sum_{i \ge 1} \Pr[X \ge i].$$
Using this, we obtain
where the last equality holds by definition of . For the second summand, we look at all simultaneously.
where the last equality holds by the law of total probability. We bound this as follows
where the last equality holds by definition of . In total, this gives us . ∎
Theorem 3 (Restated).
There exists an algorithm such that for each unweighted graph and parameter it outputs a tree-supported sparsified low diameter decomposition, with in expectation. The algorithm runs in rounds in the CONGEST model, and in depth and work in the PRAM model.
3.1 Constructing a Spanner
Now, we can construct a spanner from the tree-supported low diameter decomposition in the following manner. Let denote the support forest, and let denote the set as given in Lemma 9. We define the spanner . As any forest has at most edges, the expected size of is at most . Actually, one could also show that in our construction of , but this would not impact the asymptotic size bound. To show that is a spanner, we show it has limited stretch.
Lemma 10.
is a spanner of stretch .
Proof.
We will show that for every edge , there exists a path from to in of length at most . Consequently we have that for every , hence is a spanner of stretch .
Let . By definition of , one of the endpoints has an edge in into the cluster of the other endpoint. Without loss of generality, let there be an edge with in the cluster of . By Corollary 8, there is a path of length at most from to , so in total we have a path of length at most from to to . ∎
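The proof exploits that the stretch of a spanner of an unweighted graph is witnessed on the edges of the original graph: bounding the detour per edge bounds it for all pairs. This per-edge check is easy to carry out numerically; the helper below is our own, not from the paper.

```python
from collections import deque

def bfs_dist(adj, src):
    """Unweighted single-source distances via breadth-first search."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def stretch(adj_g, adj_h):
    """Maximum over edges {u, v} of G of the distance between u and v in
    the subgraph H; this quantity equals the stretch of H as a spanner."""
    worst = 1
    for u in adj_g:
        dh = bfs_dist(adj_h, u)
        for v in adj_g[u]:
            worst = max(worst, dh.get(v, float("inf")))
    return worst
```

For instance, removing one edge of a 4-cycle yields a spanner of stretch 3, since the two endpoints of the removed edge are now connected by the 3-edge detour.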
Corollary 4 (Restated).
There exists an algorithm such that for each unweighted graph and parameter it outputs a spanner of stretch . The expected size of is at most . The algorithm runs in rounds in the CONGEST model, and in depth and work in the PRAM model.
3.2 Constructing a Synchronizer
Suppose we are given a synchronous CONGEST model algorithm, but we want to run it on an asynchronous CONGEST network. That is, the messages sent in the network can now have arbitrary delays and, in an event-driven manner, nodes become active each time they receive a message. For the purpose of analyzing the time complexity of the algorithm, it is often assumed that the delay is at most one unit of time; however, the algorithm should behave correctly under any finite delays. In this situation, a node should start simulating its next (synchronous) round when it has received all the messages from the previous round from its neighbors. The problem is that it cannot tell the difference between the situation where a message from a particular neighbor has not arrived yet and the situation where this neighbor is not sending any message in that round at all. We say that a node is safe if all the messages it has sent have arrived at their destination. In order to determine whether all neighboring nodes are safe, additional messages are sent. The procedure governing these additional messages is called the synchronizer. There are two things to take into account when analyzing synchronizers. First, the time overhead: how much time is needed to send the additional messages for each synchronous round. Second, the message-complexity (or communication) overhead: how many additional messages are sent. For more details on synchronizers see e.g. [Lynch96, KS11].
Let us first consider two simple synchronizers: synchronizer and synchronizer , see [Awerbuch85]. In synchronizer , when a node receives a message from a neighbor, it sends back an ‘acknowledge’ message. When a node has received acknowledge messages for all its sent messages, it marks itself safe and reports this to all its neighbors. The synchronizer uses, for each simulated synchronous round, additional time, and messages.
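For intuition, the bookkeeping of synchronizer α for a single simulated round can be sketched as follows. This is a heavily simplified, fully synchronous rendering (a real synchronizer is event-driven and must cope with arbitrary delays); all names are ours.

```python
def run_round(adj, payload_msgs):
    """One simulated synchronous round under synchronizer alpha (sketch).
    payload_msgs: directed pairs (u, v) of payload messages sent this round.
    Every payload is answered by an acknowledge message; once a node holds
    all its acknowledgements it is safe and reports this to every neighbour;
    a node may start the next round once all neighbours reported safe.
    Returns the number of control messages (acks + safe reports)."""
    pending_acks = {v: 0 for v in adj}
    for u, v in payload_msgs:
        pending_acks[u] += 1       # u waits for an ack from v
    control = 0
    for u, v in payload_msgs:      # instant delivery here: v acks back to u
        control += 1
        pending_acks[u] -= 1
    safe_reports = {v: 0 for v in adj}
    for u in adj:                  # every node is now safe ...
        assert pending_acks[u] == 0
        for v in adj[u]:           # ... and tells each neighbour
            control += 1
            safe_reports[v] += 1
    # each node has heard 'safe' from all neighbours: next round may start
    assert all(safe_reports[v] == len(adj[v]) for v in adj)
    return control
```

The accounting makes the overhead visible: one acknowledgement per payload message plus one safe report per directed edge, i.e. a constant-factor message overhead per round.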
Synchronizer produces a different trade-off between time and message overhead. It uses an initialization phase in which it creates a rooted spanning tree, where the root is declared the leader. Now, after sending the messages of a certain synchronous round, again nodes that receive messages reply with an acknowledge message to each. Nodes that are safe and whose children in the constructed tree are also safe communicate this to their parent in the tree. Eventually the leader will learn that the whole graph is safe and will broadcast this along the spanning tree. Synchronizer uses time and messages per synchronous round.
Now we are ready to consider a slightly more involved example, called synchronizer , see [Awerbuch85]. This synchronizer makes use of clustering, where within each cluster synchronizer is used and between clusters synchronizer is used. In the LOCAL model, the cluster centers can simply select a communication link for each neighboring cluster to communicate individually with the neighboring cluster centers [Awerbuch85]. However, in the CONGEST model, communicating information about neighboring clusters to the cluster center might lead to congestion problems. Using a slightly more careful analysis, the procedure can be adapted to the CONGEST model.
Lemma 11 (Implicit in [Awerbuch85]).
Given a round synchronous CONGEST model algorithm for constructing a sparsified low diameter decomposition, any synchronous round message complexity CONGEST model algorithm can be run on an asynchronous CONGEST network with a total of messages, and, provided that each message incurs a delay of at most one time unit, in time . The initialization phase takes time, using messages.
For a sketch of the algorithm, we refer to Appendix A. If we plug in our clustering, we obtain the following theorem.
Theorem 5 (Restated).
There exists an algorithm that, given , can run any synchronous round message complexity CONGEST model algorithm on an asynchronous CONGEST network. In expectation, the algorithm uses a total of messages. Provided that each message incurs a delay of at most one time unit, it takes rounds. The initialization phase takes time, using messages.
4 Constructing a Low Diameter Decomposition
In this section, we show that for an integer weighted graph the computed clustering is a probabilistic low diameter decomposition. To be precise, if we set and , we obtain a low diameter decomposition. By Corollary 8, we know that each of the clusters has a strong diameter of at most . Now, we show that the probability that an edge is an intercluster edge is at most . We use the general proof structure from [MPX13], but make it more streamlined; we avoid an artificially constructed ‘midpoint’ on the edge . Further, our proof borrows the idea of conditioning on the event from Xu [Xu17], which we adapt to our situation.
Lemma 12.
For , the probability that and belong to different clusters is at most .
Proof.
Suppose is an intercluster edge. Without loss of generality, we assume . By the triangle inequality, we have , hence we have . Using this, we can upper bound the probability that an edge is an intercluster edge by the probability that this inequality holds. Note that we can assume , otherwise the statement is trivially true.
We want to condition on the cluster center that satisfies , and on the cluster center that minimizes . Moreover, we ask that these cluster centers respect the tie-breaking rule, i.e., both have minimal among all centers at equal distance. Finally, we condition on the value of , which we set to . We call this event , which we formally define to hold when the following four conditions are satisfied:

;

for we either have , or we have and ;

;

for we either have , or we have and .
Now, we condition on and use the law of total probability:
For simplicity, we omit the bounds for the sum over , which is a finite sum as we always have . The factor two appears because the event assumes , hence we gain a factor two by symmetry of and . We look at the first probability more closely. We can loosen some of the event’s restrictions, and just maintain , as the event we examine is independent of conditions 2, 3, and 4 of the event . We obtain