
An Improved Random Shift Algorithm for Spanners and Low Diameter Decompositions

11/17/2021
by   Sebastian Forster, et al.

Spanners have been shown to be a powerful tool in graph algorithms. Many spanner constructions use a certain type of clustering at their core, where each cluster has small diameter and there are relatively few spanner edges between clusters. In this paper, we provide a clustering algorithm that, given k≥ 2, can be used to compute a spanner of stretch 2k-1 and expected size O(n^1+1/k) in k rounds in the CONGEST model. This improves upon the state of the art (by Elkin and Neiman [TALG'19]) by making the bounds on both running time and stretch independent of the random choices of the algorithm, whereas they only hold with high probability in previous results. Spanners are used in certain synchronizers; thus, our improvement directly carries over to such synchronizers. Furthermore, for keeping the total number of inter-cluster edges small in low diameter decompositions, our clustering algorithm provides the following guarantees. Given β∈ (0,1], we compute a low diameter decomposition with diameter bound O(log n/β) such that each edge e∈ E is an inter-cluster edge with probability at most β· w(e) in O(log n/β) rounds in the CONGEST model. Again, this improves upon the state of the art (by Miller, Peng, and Xu [SPAA'13]) by making the bounds on both running time and diameter independent of the random choices of the algorithm, whereas they only hold with high probability in previous results.


1 Introduction

Clustering has become an essential tool in dealing with large data sets. The goal of clustering data is to identify disjoint, dense regions such that the space between them is sparse. When working with graphs, this translates to partitioning the vertex set into clusters with relatively few edges between clusters such that the clusters satisfy a particular property. One can for example demand that the clusters have low diameter [Awerbuch85, AKPW95, Bartal96, MPX13], high conductance [GR99, KVV04, ST11, CHKM12, CHZ19, SW19, CS20], or low effective resistance diameter [AALG18]. In this paper, we focus on the low diameter decomposition and its connection to spanners. Low diameter decompositions are formally defined as follows.

Definition 1.

Let be a weighted graph. A probabilistic -low diameter decomposition of is a partition of the vertex set into subsets , called clusters, such that

  • each cluster has strong diameter at most , i.e., for all (for , we write for the graph induced by , i.e., );

  • the probability that an edge is an inter-cluster edge is at most , i.e., for , the probability that and for is at most .

In an unweighted graph, another typical definition of the low diameter decomposition replaces the second condition with an upper bound on the number of inter-cluster edges [MPX13]. In this fashion, a probabilistic low diameter decomposition has inter-cluster edges in expectation.

Originally, low diameter decompositions were developed for distributed models, where they have proven useful by significantly reducing communication in certain situations [Awerbuch85, AGLP89]. Later, they have also been shown to be fruitful in other models; they have been applied in shortest path approximations [Cohen94], cut sparsifiers [LR99], and tree embeddings with low stretch [AKPW95, Bartal96, Bartal98].

The clustering technique used for computing low diameter decompositions has implicitly been used to develop sparse spanners [BS07, MPVX15, EN19] and synchronizers [Awerbuch85, APPS92]. The main idea is to create the clusters, and add some, but not all, of the inter-cluster edges. In a sense, the inter-cluster edges are sparsified. We formalize this concept as follows.

Definition 2.

Let be an unweighted graph. A sparsified -low diameter decomposition of is a partition of the vertex set into subsets , called clusters, together with a set of edges such that

  • each cluster has strong diameter at most , i.e., for all ;

  • for every edge, one of its endpoints has an edge from into the cluster of the other endpoint, i.e., , we have either for some or for some (for , we write for the cluster containing );

  • .

Moreover, we say that a sparsified -low diameter decomposition is tree-supported if for each cluster we have a cluster center and a tree of height at most spanning the cluster. All these trees together are called the support-forest.

Our main result is a clustering algorithm that produces a sparsified low diameter decomposition.

Theorem 3.

There exists an algorithm, such that for each unweighted graph and parameter it outputs a tree-supported sparsified -low diameter decomposition, with in expectation. The algorithm runs in rounds in the CONGEST model, and in depth and work in the PRAM model.

An important feature of this result is that the bounds on the strong diameter and number of rounds are not probabilistic; they are independent of the random choices in the algorithm. We show two applications of this theorem: constructing spanners and constructing synchronizers.

Spanners

Given a graph , we say that is a spanner of stretch , if , for every . It is straightforward that a tree-supported sparsified -low diameter decomposition gives a spanner of size and stretch ; for details, we refer to Section 3.1. This gives us the following corollary.

Corollary 4.

There exists an algorithm, such that for each unweighted graph and parameter it outputs a spanner of stretch . The expected size of is at most . The algorithm runs in rounds in the CONGEST model, and in depth and work in the PRAM model.

Spanners themselves have been useful in computing approximate shortest paths [ABCP93, Cohen98], distance oracles and labeling schemes [TZ05, Peleg00], and routing [PU89]. A simple greedy algorithm [AGDJS93] gives a spanner of stretch and of size , which is an optimal trade-off under the girth conjecture [FS16]. However, its fastest known implementation in the RAM model takes time [RZ04]. Halperin and Zwick [HZ93] gave a linear-time algorithm to construct spanners with an optimal trade-off for unweighted graphs in the RAM model. However, this algorithm does not adapt well to distributed and parallel models of computation. This problem can be overcome by exploiting the aforementioned relation with sparsified low diameter decompositions. This was (implicitly) done by Baswana and Sen [BS07], who provide an algorithm that computes a spanner of stretch and of size in rounds. The state of the art is by Elkin and Neiman [EN19], which builds on [MPVX15] and is also based on low diameter decompositions. They provide an algorithm that with probability computes a -spanner of expected size in rounds. Standard techniques for reducing the failure probability to inverse polynomial (i.e., ‘with high probability’ guarantees) require a logarithmic overhead. Alternatively, one can view the algorithm of Elkin and Neiman as an algorithm that outputs an -spanner of expected size in rounds, such that with probability we have that .

Corollary 4 improves on the result of Elkin and Neiman by making the bounds on the stretch and the running time independent of the random choices in the algorithm. In particular, the algorithm of Elkin and Neiman involves sampling vertex values from an exponential distribution. The exponential distribution introduces an (as we show) unnecessary amount of randomness; we demonstrate that the geometric distribution suffices. We replace the extra random bits the exponential distribution provides by a tie-breaking rule on the vertex IDs, which we believe contributes to a more intuitive construction.

Synchronizers

The second application of Theorem 3 is in constructing synchronizers in the CONGEST model. A synchronizer gives a procedure to run a synchronous algorithm on an asynchronous network. More precisely, the goal is to run any synchronous -round -message complexity CONGEST model algorithm on an asynchronous network with minimal time and message overhead. The first results on synchronizers are by Awerbuch [Awerbuch85], called synchronizers , , and . Subsequently, these results were improved by Awerbuch and Peleg [AP90], and Awerbuch et al. [APPS92], both having time and message complexity.

The synchronizer from [Awerbuch85] essentially consists of running a combination of the simple synchronizers and on a sparsified -low diameter decomposition. In that case, the bound on the sparsified inter-cluster edges goes into the bound for the communication overhead of the synchronizer and the strong diameter bound goes into the bound for the time overhead of the synchronizer. Applying synchronizer to our clustering, we obtain the following result.

Theorem 5.

There exists an algorithm that, given , can run any synchronous -round -message complexity CONGEST model algorithm on an asynchronous CONGEST network. In expectation, the algorithm uses a total of messages. Provided that each message incurs a delay of at most one time unit, it takes rounds. The initialization phase takes time, using messages.

The running time claimed in this theorem is independent of the random choices in our algorithm, which is a direct result of Theorem 3. The previous sparsified low diameter decompositions (implicit in [EN19]) would provide similar bounds on the running time, but only with constant probability.

Low Diameter Decompositions

Perhaps unsurprisingly, we show that, with the right choice of parameters, our clustering algorithm can also compute unsparsified low diameter decompositions.

Theorem 6.

There exists an algorithm, such that for each graph with integer weights and parameter it outputs a low diameter decomposition, whose components are clusters of strong diameter at most . Moreover, each edge is an inter-cluster edge with probability at most . The algorithm runs in rounds in the CONGEST model, and in depth and work in the PRAM model.

Similar to our spanner algorithm, the bounds on the running time and strong diameter hold independently of the random choices within the algorithm, as opposed to the previous state of the art [MPX13], where they only hold with high probability. For the low diameter decomposition as discussed above, the trade-off between and the diameter bound is essentially optimal [Bartal96].

Technical Overview

Our clustering algorithm follows an approach known as ball growing, related to the probabilistic partitions of [Bartal96, Bartal98]. In a sequential setting, this consists of picking a vertex and repeatedly adding the neighbors of the current vertices to the ball. This stops when a certain bound is reached, such as a bound on the diameter of the ball or on the number of edges between the current ball and the remainder of the graph. The algorithm repeats this procedure on the remainder of the graph until it is empty. Miller, Peng, and Xu [MPX13] showed that this can be parallelized by letting each vertex create its own ball, but after a certain start time delay. In [MPX13], this is done by sampling the delays from the exponential distribution, which leads to the aforementioned probabilistic diameter guarantee, as the exponential distribution can take arbitrarily large values, albeit with small probability. Furthermore, multiple authors (see e.g. [FG19, MPX13]) argue that one can round the sampled values from the exponential distribution for most of the algorithm and solely use that the fractional parts of the sampled values induce a random permutation of the nodes. In this paper, we show that even fewer random bits are needed: we do not require a random permutation of the nodes. We demonstrate that a tie-breaking rule based on the IDs is enough.

We sample with a capped-off geometric distribution, also used in [LS93, APPS92]. As opposed to the standard geometric distribution, the capped-off version can only take a finite number of values. We believe this leads to a more direct proof of the spanner algorithm of [EN19] and of the decomposition algorithm of [MPX13]. Moreover, by making the sparsified low diameter decomposition explicit, the application to synchronizers is almost immediate. In the remainder of this paper, we will not think of the sampled values as start time delays, but as distances to some conceptual source , similar to the view in [MPX13]. The rest of the clustering algorithm then consists of computing a shortest path tree rooted at , which is easily calculated both in the CONGEST and the PRAM model. The clusters consist of the trees that remain when we disconnect the shortest path tree by removing the root .

As an anonymous reviewer pointed out, in the case of low diameter decompositions, the algorithm of Miller et al. [MPX13] admits an alternative approach. We can exploit the fact that the exponential delays are bounded with high probability. In case the delays exceed the bound, we could return a sub-optimal clustering, without any central communication. As this only happens with low probability, it does not impact the expected number of inter-cluster edges. Note, however, that the spanner construction of Elkin and Neiman [EN19] is not in this high-probability regime, so this straightforward approach would not work there. We additionally believe that, beyond the result itself, our algorithm provides a more streamlined view.

2 The Clustering Algorithm

Let be a graph with integer weights . Let and be parameters, to be chosen according to the application of our algorithm. In the following, we provide an algorithm for computing a clustering, where the strong diameter of these clusters will be . In particular, we will show that each cluster is tree-supported by a tree of height . The number of inter-cluster edges depends on both and , and can be bounded in two ways. The first approach, detailed in Section 3, shows we have a sparsified low diameter decomposition. Here, for each vertex we compute the expected number of edges in the sparsified set of inter-cluster edges, which gives a bound that does not depend on , but only on and . The second approach, detailed in Section 4, shows we have a probabilistic low diameter decomposition, by computing the probability that any edge is an inter-cluster edge.

2.1 Construction

First, we conceptually add a node to the graph to form the graph . The node will function as an artificial source for a shortest path tree. Each vertex will have a distance to in depending on some random offset. To this end, each vertex samples a value from the capped geometric distribution , defined by

This distribution corresponds to the model where we repeat at most Bernoulli trials and measure how many trials occur (strictly) before the first success, or whether there is no success in the first trials. We check that GeomCap is indeed a probability distribution on :

As the intuition suggests, GeomCap has a memoryless property as long as the cap is not reached, i.e., for . The proof is completely analogous to the proof of the memoryless property of the geometric distribution.
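To make the sampling step concrete, the following minimal Python sketch draws from a capped geometric distribution as just described; the parameter names p (per-trial success probability) and cap (maximal number of trials) are placeholders for the quantities in the displayed formula above.

```python
import random

def sample_geom_cap(p, cap):
    """Run at most `cap` Bernoulli(p) trials and return how many trials
    occur strictly before the first success; if none of the first `cap`
    trials succeeds, return `cap` itself."""
    for failures in range(cap):
        if random.random() < p:   # this trial is a success
            return failures       # number of trials strictly before it
    return cap                    # no success within the first `cap` trials
```

For example, sample_geom_cap(0.5, 4) returns a value in {0, 1, 2, 3, 4}, where 4 occurs exactly when all four trials fail (probability 1/16).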

For each vertex , we conceptually add an edge to , with weight . We define , which is the minimal length of a path from to over . Now we have that the distance from to equals . We call this the level of , ranging from (closest to ) to (furthest from ). Moreover, we define to be the predecessor of on an arbitrary but fixed shortest path from to . Next, we construct a shortest path tree rooted at . When necessary, we do tie-breaking according to IDs: let be such that and for all satisfying . Then we connect to the shortest path tree using the edge . Moreover, we add to the cluster of and write for the corresponding cluster center. Intuitively, the clusters correspond to the connected components that remain when we remove the source from the created shortest path tree. The formal argument for this can be found in the proof of Lemma 7. The computation of this shortest path tree is model-specific; we provide details in Section 2.3.

The algorithm outputs the shortest path tree , and for each , its cluster center and its level. The knowledge of cluster centers immediately gives a clustering, where – by the remark above – each cluster has radius at most . In Section 3, we show how to construct a set of edges from the cluster centers and levels, such that is a spanner.

In the above, we only need an arbitrary ordering of the vertices. If we assume that each vertex has a unique identifier, , we can provide an alternative way of constructing the same shortest path tree. We construct a graph where , and compute a shortest path tree rooted at . This embeds the tie-breaking rule in the weight of the added edges, and thus in the distances. For generality – and suitable implementation in distributed models with limited bandwidth – the remainder of this paper relies on the former characterization using the tie-breaking rule.
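The whole construction can also be phrased as a short sequential sketch: a multi-source Dijkstra computation that simulates the shortest path tree rooted at the conceptual source. This is only an illustration; in particular, the weight of the conceptual edge to each vertex (taken here as the cap minus the sampled offset) and the tie-breaking key (the smaller candidate cluster-center ID) are assumptions standing in for the omitted formulas.

```python
import heapq

def random_shift_clustering(adj, offsets, cap):
    """Sequential sketch of the clustering of Section 2.1.

    adj     : dict vertex -> list of (neighbor, positive integer weight)
    offsets : dict vertex -> value sampled from the capped geometric distribution
    cap     : the cap used when sampling the offsets

    Returns, for every vertex, its cluster center and its distance to the
    conceptual source (which serves as its level)."""
    center, dist = {}, {}
    heap = []
    for v, r in offsets.items():
        # conceptual edge from the source to v; the weight cap - r is an assumption
        heapq.heappush(heap, (cap - r, v, v))   # (distance, candidate center, vertex)
    while heap:
        d, c, v = heapq.heappop(heap)
        if v in dist:
            continue                            # v already settled with a smaller key
        dist[v], center[v] = d, c
        for u, w in adj[v]:
            if u not in dist:
                heapq.heappush(heap, (d + w, c, u))
    return center, dist
```

Grouping the vertices by their center then yields the clusters, i.e., the components that remain after removing the source from the shortest path tree.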

2.2 Tree-Support

Next, we will show that the created clusters are tree-supported by a tree of height . We have already chosen cluster centers, and we will show that we can identify trees rooted at these centers that satisfy the tree-support condition.

Lemma 7.

Each cluster is tree-supported by a tree of height at most .

Proof.

Let be a vertex, which is part of the cluster centered at . We show that there is a path from to contained in this cluster, which has length at most . We proceed to show by induction on that there is a path from to contained in their cluster, which has length at most . The base case, , is trivial. Let be the predecessor of on some path from to of length . It suffices to show that is in the same cluster; then the result follows from the induction hypothesis. By definition of , we have that

By the triangle inequality we have . Combining this, we see . As the distance is minimal, by definition we have . Now suppose that is part of some cluster . Then we have and . However, this implies that . Hence by the tie-breaking for we have and thus . ∎

As an immediate corollary, we obtain a bound on the strong diameter of the clusters.

Corollary 8.

Each cluster has a strong diameter of .

2.3 Implementation and Running Time

For the RAM model, the implementation is straightforward and can be done in linear time [Thorup99]. The implementation in distributed and parallel models requires a little more attention. For both models, the computational aspect is very similar to prior work [MPX13, EN19].

2.3.1 Distributed Model

The algorithm, as presented, can be implemented efficiently both in the LOCAL and in the CONGEST model. It runs in rounds as follows. In the initialization phase, each vertex samples its value and sets its initial distance to the conceptual vertex as . In the first round of communication, sends the tuple to its neighbors. In each round, updates its distance to according to received messages. It then broadcasts the tuple of its updated distance and the corresponding to the first vertex on the path from to . Note that at the end of the algorithm, each node knows its own level and cluster center, and the level and cluster center of each of its neighbors.

When the algorithm is applied with (if , we can simply return the connected components of the graph as clusters), we maintain a bound on the message size of , so there are no digit precision considerations for the CONGEST model. Moreover, as each vertex has distance at most to , the algorithm terminates within rounds. (The ‘’ appears because nodes in the lowest level have distance to the source .)
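The following round-by-round simulation illustrates the distributed computation for the unweighted case; the number of rounds, the initial distances, and the lexicographic tie-breaking are assumptions consistent with the description above, not the paper's exact message format.

```python
def simulate_congest_rounds(neighbors, offsets, cap, num_rounds):
    """Simulate the synchronous rounds on an unweighted graph.

    neighbors : dict vertex -> list of adjacent vertices
    Every vertex keeps a pair (distance to the conceptual source, cluster
    center); in each round it broadcasts this pair and adopts the
    lexicographically smallest offer made by a neighbor."""
    state = {v: (cap - offsets[v], v) for v in neighbors}   # assumed initial distance
    for _ in range(num_rounds):
        sent = dict(state)                                   # messages of this round
        for v in neighbors:
            best = state[v]
            for u in neighbors[v]:
                d_u, c_u = sent[u]
                offer = (d_u + 1, c_u)                       # unweighted edge
                if offer < best:
                    best = offer
            state[v] = best
    return state            # state[v] = (level of v, cluster center of v)
```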

2.3.2 PRAM Model

The implementation in the PRAM model is slightly different from the CONGEST implementation. Instead of broadcasts by each vertex in each round, a vertex updates its distance only once: either after one of its neighbors updated its distance, or after time it sets its distance to . The total required depth depends on the exact model of parallelism; it is in the CRCW model of parallel computation. To show this, we follow the general lines of [KS97], but we have to be careful: during the shortest path computation, we might need to apply our tie-breaking rule, i.e., finding the minimum among all options. Note that in the PRAM model, we can assume without loss of generality that the IDs are labeled to in the adjacency list representation. Finding the minimum can be done with high probability in depth and work, as we can sort a list of integers between and in depth and work [GMV91]. If we exceed the depth bound, we stop and output the trivial clustering consisting of singletons. This clustering clearly satisfies the diameter bound, and as we only output it with low probability, it has no effect on the expected number of inter-cluster edges. So we can conclude that the additional sorting overhead for the tie-breaking is a factor . The algorithm has total work , where the contribution of comes from sampling from the geometric distribution. In this paper, this factor vanishes as we always have such that .

3 Constructing a Sparsified Low Diameter Decomposition

In this section, we show how the clustering algorithm leads to a sparsified low diameter decomposition. The procedure is as follows: given , we set , , and compute a clustering using the algorithm of Section 2. We write for the sparsified set of inter-cluster edges. Intuitively, for each vertex , we add an edge to for each cluster in which we have a neighbor on one level below, or a neighbor on the same level as with the ID of the cluster center smaller than the ID of the center of the cluster of .
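A minimal sketch of this selection rule, assuming the levels and cluster centers computed by the clustering step and integer cluster-center IDs (the helper name and edge representation are ours, not the paper's):

```python
def sparsified_inter_cluster_edges(neighbors, level, center):
    """For every vertex v, keep one edge into each neighboring cluster that
    contains a neighbor of v one level below v, or a neighbor on v's level
    whose cluster center has a smaller ID than v's own cluster center."""
    e_prime = set()
    for v in neighbors:
        chosen = {}                      # neighboring cluster center -> one edge
        for u in neighbors[v]:
            if center[u] == center[v]:
                continue                 # u is in the same cluster as v
            one_level_below = level[u] == level[v] - 1
            same_level_smaller_center = (level[u] == level[v]
                                         and center[u] < center[v])
            if (one_level_below or same_level_smaller_center) \
                    and center[u] not in chosen:
                chosen[center[u]] = (v, u)
        e_prime.update(chosen.values())
    return e_prime
```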

Lemma 9.

There exists a set of edges of expected size , such that for every edge, one of its endpoints has an edge from into the cluster of the other endpoint.

Proof.

We define , where consists of the following edges:

First, we show that satisfies the property stated in the lemma; then we consider its size. Let . Without loss of generality, we assume , and in case of equality we assume . We will show that there is an edge to the cluster of . First of all, notice that by the triangle inequality. If , then . Because of minimality of , we have , and thus by definition of . If , we have . Moreover, we have . So again it follows that by definition of .

Now, we turn to the expected size of . By linearity of expectation, we have . We will show that for each the expected size of is at most . For each , we potentially add an edge to . First, we calculate the probability that at least such vertices contribute an edge. To this end, we look at the random variables . According to these random variables, we order all vertices: , such that for we satisfy one of the following properties

  • ;

  • and .

We calculate , i.e., the probability that . We do this by conditioning on . We observe

So we calculate for . By definition of , this can only hold if either ’s closest neighbors in the clusters centered at and are on the same level (in which case we have , as and ) or the neighbor from the cluster centered at is at a lower level and . Note that the level of the closest neighbor in the cluster of or corresponds to the distance or , respectively. As the allowed distances depend on the of , we split the vertices according to :

If , then we know . If both and are in , we must have . So for every , we are looking at

where the last equality holds as we only rewrote the condition using the order of the ’s. We fill in the definition of and use that the probability of the event we are looking at is independent of :

When , this equals , by the memoryless property of the geometric distribution. To distinguish this, we partition the vertices with into two sets

Now for , we obtain by the same reasoning

As before, when , this equals , by the memoryless property of the geometric distribution. And again, we partition into two sets

If we define and , we can summarize our results as

Next, we split the expected value of depending on and :

We bound the first summand with , independent of . To this end, we observe that for any non-negative discrete random variable we have

Using this, we obtain

where the last equality holds by definition of . For the second summand, we look at all simultaneously.

where the last equality holds by the law of total probability. We bound this as follows

where the last equality holds by definition of . In total, this gives us . ∎

Together Lemma 7 and Lemma 9 imply the following theorem.

Theorem 3 (Restated).

There exists an algorithm, such that for each unweighted graph and parameter it outputs a tree-supported sparsified -low diameter decomposition, with in expectation. The algorithm runs in rounds in the CONGEST model, and in depth and work in the PRAM model.

3.1 Constructing a Spanner

Now, we can construct a spanner from the tree-supported low diameter decomposition in the following manner. Let denote the support forest, and let denote the set as given in Lemma 9. We define the spanner . As any forest has at most edges, the expected size of is at most . Actually, one could also show that in our construction of , but this would not impact the asymptotic size bound. To show that is a spanner, we show it is of limited stretch.
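As an illustration, the spanner is simply the union of the support forest and the sparsified inter-cluster edges; the brute-force BFS check below is not part of the construction, it merely verifies the stretch bound of Lemma 10 on small instances.

```python
from collections import deque

def spanner_edges(support_forest_edges, inter_cluster_edges):
    # The spanner consists of the support forest plus the sparsified edges E'.
    return ({frozenset(e) for e in support_forest_edges}
            | {frozenset(e) for e in inter_cluster_edges})

def has_stretch(neighbors, spanner, bound):
    """Check by BFS that every original edge is stretched to a path of
    length at most `bound` inside the spanner."""
    sp_adj = {v: set() for v in neighbors}
    for e in spanner:
        a, b = tuple(e)
        sp_adj[a].add(b)
        sp_adj[b].add(a)

    def bfs_dist(src, dst):
        seen, queue = {src: 0}, deque([src])
        while queue:
            x = queue.popleft()
            if x == dst:
                return seen[x]
            for y in sp_adj[x]:
                if y not in seen:
                    seen[y] = seen[x] + 1
                    queue.append(y)
        return float("inf")

    return all(bfs_dist(v, u) <= bound
               for v in neighbors for u in neighbors[v])
```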

Lemma 10.

is a spanner of stretch .

Proof.

We will show that for every edge , there exists a path from to in of length at most . Consequently we have that for every , hence is a spanner of stretch .

Let . By definition of , one of the endpoints has an edge in into the cluster of the other endpoint. Without loss of generality, let there be an edge with in the cluster of . By Corollary 8, there is a path of length at most from to , so in total we have a path of length at most from to to . ∎

Now, the following corollary follows from Theorem 3 and Lemma 10.

Corollary 4 (Restated).

There exists an algorithm, such that for each unweighted graph and parameter it outputs a spanner of stretch . The expected size of is at most . The algorithm runs in rounds in the CONGEST model, and in depth and work in the PRAM model.

3.2 Constructing a Synchronizer

Suppose we are given a synchronous CONGEST model algorithm, but we want to run it on an asynchronous CONGEST network. That is, the messages sent in the network can now have arbitrary delays and, in an event-driven manner, nodes become active each time they receive a message. For the purpose of analyzing the time complexity of the algorithm, it is often assumed that the delay is at most one unit of time; however, the algorithm should behave correctly under any finite delays. In this situation, a node should start simulating its next (synchronous) round when it has received all the messages from the previous round from its neighbors. The problem is that it cannot tell the difference between a message from a particular neighbor that has not arrived yet and a neighbor that is not sending any message in that round at all. We say that a node is safe if all the messages it has sent have arrived at their destination. In order to determine whether all neighboring nodes are safe, additional messages are sent. The procedure governing these additional messages is called the synchronizer. There are two things to take into account when analyzing synchronizers. First, the time overhead: how much time is needed to send the additional messages for each synchronous round. Second, the message-complexity (or communication) overhead: how many additional messages are sent. For more details on synchronizers see e.g. [Lynch96, KS11].

Let us first consider two simple synchronizers: synchronizer and synchronizer , see [Awerbuch85]. In synchronizer , when a node receives a message from a neighbor, it sends back an ‘acknowledge’ message. When a node has received acknowledge messages for all its sent messages, it marks itself safe and reports this to all its neighbors. The synchronizer uses, for each simulated synchronous round, additional time and messages.
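A schematic per-node event handler for synchronizer α is sketched below; it only illustrates the acknowledgment logic described above, and the message format, the send primitive, and the per-round bookkeeping are placeholders of ours.

```python
class SyncAlphaNode:
    """One simulated synchronous round of synchronizer alpha: acknowledge
    every received payload, become safe once all own messages are
    acknowledged, and report this to the neighbors."""

    def __init__(self, node_id, neighbors, send):
        self.id = node_id
        self.neighbors = list(neighbors)
        self.send = send              # send(destination, message): assumed primitive
        self.pending_acks = 0
        self.safe_neighbors = set()

    def start_round(self, payloads):
        """payloads: dict neighbor -> message of the current synchronous round."""
        self.safe_neighbors.clear()
        self.pending_acks = len(payloads)
        for dst, payload in payloads.items():
            self.send(dst, ("MSG", self.id, payload))
        if self.pending_acks == 0:    # nothing sent, immediately safe
            self._announce_safe()

    def on_receive(self, message):
        kind, src, _payload = message
        if kind == "MSG":
            self.send(src, ("ACK", self.id, None))
        elif kind == "ACK":
            self.pending_acks -= 1
            if self.pending_acks == 0:
                self._announce_safe()
        elif kind == "SAFE":
            self.safe_neighbors.add(src)

    def _announce_safe(self):
        for u in self.neighbors:
            self.send(u, ("SAFE", self.id, None))

    def can_start_next_round(self):
        # proceed once every neighbor has reported itself safe
        return set(self.neighbors) <= self.safe_neighbors
```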

Synchronizer will produce a different trade-off between time and message overhead. It uses an initialization phase in which it creates a rooted spanning tree, where the root is declared the leader. Now, after sending the messages of a certain synchronous round, nodes that receive messages again reply with an acknowledge message to each. Nodes that are safe, and whose children in the constructed tree are also safe, communicate this to their parent in the tree. Eventually the leader will learn that the whole graph is safe, and will broadcast this along the spanning tree. Synchronizer uses time and messages per synchronous round.

Now we are ready to consider a slightly more involved example, called synchronizer (see [Awerbuch85]). This synchronizer makes use of clustering, where within each cluster synchronizer is used and between clusters synchronizer is used. In the LOCAL model, the cluster centers can simply select a communication link for each neighboring cluster to communicate individually with the neighboring cluster centers [Awerbuch85]. However, in the CONGEST model, communicating information about neighboring clusters to the cluster center might lead to congestion problems. Using a slightly more careful analysis, the procedure can be adapted to the CONGEST model.

Lemma 11 (Implicit in [Awerbuch85]).

Given a -round synchronous CONGEST model algorithm for constructing a sparsified -low diameter decomposition, any synchronous -round -message complexity CONGEST model algorithm can be run on an asynchronous CONGEST network with a total of messages, and, provided that each message incurs a delay of at most one time unit, in time . The initialization phase takes time, using messages.

For a sketch of the algorithm, we refer to Appendix A. If we plug in our clustering, we obtain the following theorem.

Theorem 5 (Restated).

There exists an algorithm that, given , can run any synchronous -round -message complexity CONGEST model algorithm on an asynchronous CONGEST network. In expectation, the algorithm uses a total of messages. Provided that each message incurs a delay of at most one time unit, it takes rounds. The initialization phase takes time, using messages.

4 Constructing a Low Diameter Decomposition

In this section, we show that for an integer weighted graph the computed clustering is a probabilistic low diameter decomposition. To be precise, if we set and , we obtain a -low diameter decomposition. By Corollary 8, we know that each of the clusters has a strong diameter of at most . Now, we show that the probability that an edge is an inter-cluster edge is at most . We use a general proof structure from [MPX13], but make it more streamlined; we avoid an artificially constructed ‘midpoint’ on the edge . Further, our proof borrows the idea of conditioning on the event from Xu [Xu17], which we adapt to our situation.
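Purely as an illustration of how the clustering could be instantiated here, the wrapper below reuses the sketches sample_geom_cap and random_shift_clustering from Section 2. The choice of a cap on the order of log n / β and a per-trial success probability of β is our assumption, picked to mirror the stated bounds, and is not necessarily the paper's exact parameterization.

```python
import math

def low_diameter_decomposition(adj, beta):
    """Illustrative wrapper around the earlier sketches: cluster an
    integer-weighted graph (adj: vertex -> list of (neighbor, weight))."""
    n = len(adj)
    cap = max(1, math.ceil(math.log(max(n, 2)) / beta))      # assumed cap ~ log n / beta
    offsets = {v: sample_geom_cap(beta, cap) for v in adj}    # assumed success probability beta
    center, level = random_shift_clustering(adj, offsets, cap)
    clusters = {}
    for v, c in center.items():
        clusters.setdefault(c, []).append(v)
    return clusters, level
```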

Lemma 12.

For , the probability that and belong to different clusters is at most .

Proof.

Suppose is an inter-cluster edge. Without loss of generality, we assume . By the triangle inequality, we have ; hence we have . Using this, we can upper bound the probability that an edge is an inter-cluster edge by the probability that this inequality holds. Note that we can assume ; otherwise the statement is trivially true.

We want to condition on the cluster center that satisfies , and on the cluster center that minimizes . Moreover, we ask that these clusters respect the tie-breaking rule, i.e., both have minimal among all centers with equal distance. Finally, we condition on the value of , which we set to . We call this event , which we formally define to hold when the following four conditions are satisfied:

  1. ;

  2. for we either have , or we have and ;

  3. ;

  4. for we either have , or we have and .

Now, we condition on and use the law of total probability:

For simplicity, we omit the bounds for the sum over , which is a finite sum as we always have . The factor two appears because the event assumes , hence we gain a factor two by symmetry of and . We look at the first probability more closely. We can loosen some of the event’s restrictions, and just maintain , as the event we examine is independent of conditions 2, 3, and 4 of the event . We obtain