The Sparsest Additive Spanner via Multiple Weighted BFS Trees

11/05/2018, by Keren Censor-Hillel et al. (Technion)

Spanners are fundamental graph structures that sparsify graphs at the cost of a small stretch. In particular, in recent years, many sequential algorithms constructing additive all-pairs spanners were designed, providing very sparse small-stretch subgraphs. Remarkably, it was then shown that the known (+6)-spanner constructions are essentially the sparsest possible, that is, a larger additive stretch cannot guarantee a sparser spanner, which brought the stretch-sparsity trade-off to its limit. Distributed constructions of spanners are also abundant. However, for additive spanners, while there were algorithms constructing (+2)- and (+4)-all-pairs spanners, the sparsest case of (+6)-spanners remained elusive. We remedy this by designing a new sequential algorithm for constructing a (+6)-spanner with the essentially-optimal sparsity of roughly O(n^4/3) edges. We then show a distributed implementation of our algorithm, answering an open problem in [Censor-Hillel et al., DISC 2016]. A main ingredient in our distributed algorithm is an efficient construction of multiple weighted BFS trees. A weighted BFS tree is a BFS tree in a weighted graph that consists of the lightest among all shortest paths from the root to each node. We present a distributed algorithm in the CONGEST model that constructs multiple weighted BFS trees in |S|+D-1 rounds, where S is the set of sources and D is the diameter of the network graph.


1 Introduction

A spanner of a graph G is a spanning subgraph of G that approximately preserves distances. Spanners find many applications in distributed computing [11, 9, 41, 42, 44], and thus their distributed construction is at the center of many research papers. We focus on spanners that approximately preserve distances between all pairs of nodes, and where the stretch is only by an additive factor (purely-additive all-pairs spanners).

Out of the abundant research on distributed constructions of spanners, only two papers discuss the construction of purely additive spanners in the congest model: the construction of (+2)-spanners is discussed in [34], and the construction of (+2)- and (+4)-spanners in [10], along with other types of additive spanners and lower bounds. However, the distributed construction of (+6)-spanners remained elusive, stated explicitly as an open question in [10]. This is especially important since additive factors greater than 6 cannot yield essentially sparser spanners [2].

In this paper, we give a distributed algorithm for constructing a (+6)-spanner with an optimal number of edges, up to sub-polynomial factors; our spanner is even sparser than the (+4)-spanner presented in [10]. Several sequential algorithms building (+6)-spanners have been presented, but none of them seems appropriate for a distributed setting. Thus, to achieve our result we also present a new, simple sequential algorithm for constructing (+6)-spanners, a result that could be of independent interest.

As a key ingredient, we provide a distributed construction of multiple weighted BFS trees. Constructing a breadth-first search (BFS) tree is a central task in many computational settings. In the classic synchronous distributed setting, constructing a BFS tree from a given source is straightforward. Due to its importance, this task has received much attention in additional distributed settings, such as the asynchronous setting (see, e.g., [39] and references therein). Moreover, at the heart of many distributed applications lies a graph structure that represents the edges of multiple BFS trees [29, 33], rooted at the nodes of a given subset S ⊆ V, where G = (V, E) is the underlying communication graph. Such a structure is used in distance computation and estimation [29, 28, 33], routing table construction [33], spanner construction [10, 34, 33], and more.

When the bandwidth is limited, constructing multiple BFS trees efficiently is a non-trivial task. Indeed, distributed constructions of multiple BFS trees in the congest model [39], where in each round of communication every node can send O(log n)-bit messages to each of its neighbors, have been given in [29, 33], which show that it is possible to build BFS trees from a set S of sources in O(|S| + D) rounds, where D is the diameter of the graph G. It is easy to show that this is asymptotically tight.

In some cases, different edges of the graph may have different attributes, which can be represented using edge weights. The existence of edge weights has been extensively studied in various tasks, such as finding or approximating lightest paths [19, 36, 26, 20, 33, 30, 4, 24], finding a minimum spanning tree (MST) in the graph [5, 22, 12], finding a maximum matching [35, 12], and more. However, as far as we are aware, no study addresses the problem of constructing multiple weighted BFS (WBFS) trees, where the goal is not to find the lightest paths from the sources to the nodes, but rather the lightest shortest paths. That is, the path in a WBFS tree from the source s to a node v is the lightest among all shortest paths from s to v in G.

Thus, we provide an algorithm that constructs multiple WBFS trees from a set S of source nodes in the congest model. Our algorithm completes in |S| + D - 1 rounds, which implies that no overhead is needed for incorporating the existence of weights.

1.1 Our contribution

At a high level, our approach for building multiple WBFS trees is to generalize the algorithm of Lenzen et al. [33] in order to handle weights. In [33], the messages are pairs consisting of a source node and a distance, which are prioritized by the distance traversed so far. When incorporating weights into this framework, it makes sense to use triplets instead of pairs, where each triplet also contains the weight of the respective path. However, it may be that a node v needs to send multiple messages that correspond to the same source and the same distance but contain different weights, since congestion over edges may cause the respective messages to arrive at v in different rounds and, in the worst case, in decreasing order of weights. The challenge in generalizing this framework therefore lies in guaranteeing that, despite the need to consider weights, we can carefully choose a total order to prioritize triplets, such that not too many messages need to be sent, allowing us to handle congestion. Our construction and its proof appear in Section 3, giving the following.

Given a weighted graph G = (V, E, ω) and a set of nodes S ⊆ V, there exists an algorithm for the congest model that constructs a WBFS tree rooted at s, for every s ∈ S, in |S| + D - 1 rounds.

The importance of our multiple WBFS trees construction lies in our ability to use it for pinning down the question of constructing (+6)-spanners in the congest model. The construction of additive spanners in the congest model was studied before [34, 10], but the (+6) case remained unresolved, for reasons we describe below. Naturally, the quality of a spanner is measured by its sparsity, which is the motivation for allowing some stretch in the distances to begin with, and different spanners present different trade-offs between stretch and sparsity. The properties of our (+6)-spanner construction algorithm are summarized in the following theorem.

(We use w.h.p. to indicate a probability that is at least 1 - 1/n^c for some constant c of choice.)

There exists an algorithm for the congest model that constructs a (+6)-spanner with roughly O(n^4/3) edges and succeeds w.h.p.

Previous distributed algorithms for spanners similar to ours, i.e., purely additive all-pairs spanners, construct (+2)-spanners [34, 10] and a (+4)-spanner [10]. Hence, our algorithm is currently the best non-trivial spanner construction algorithm in terms of density, sparser even than the previous (+4)-spanner. The option of getting even sparser spanners by allowing more stretch was essentially ruled out [2], while the question of improving the running time remains open for all stretch parameters.

1.2 Other spanner construction algorithms

Previous distributed spanner construction algorithms all build upon known sequential algorithms, and present a distributed implementation of them, or of a slight variant of them [34, 10]. For example, many sequential algorithms start with a clustering phase, where stars around high-degree nodes are added to the spanner one by one. Implementing this directly in the distributed setting would take too long; instead, it is shown that choosing cluster centers at random yields almost as good results, and can be implemented in constant time. Similar methods are used for implementing other parts of the construction. However, the approach of finding a distributed implementation for a sequential algorithm fails for all known (+6)-spanner algorithms, as described next. Thus, we introduce a new sequential algorithm for the problem, and then present its distributed implementation.

There are three known approaches for the design of sequential (+6)-spanner algorithms. The first, presented by Baswana et al. [6], is based on measuring the quality of paths in terms of cost and value, and adding to the spanner only paths which are “affordable”. This approach was later extended by Kavitha [31] to other families of additive spanners. The second approach, presented by Woodruff [45], uses a subroutine that finds almost-shortest paths between pairs of nodes, and obtains a faster algorithm at the expense of a slightly worse sparsity guarantee. The third approach, presented by Knudsen [32], is based on repeatedly going over pairs of nodes, and adding a shortest path between a pair of nodes to the spanner if their current distance in the spanner is too large.

Unfortunately, a direct implementation of the known sequential algorithms in the congest model is highly inefficient. We are not aware of fast distributed algorithms that allow the computation of the cost and value of paths needed for the algorithm of [6]. Similarly, for [45], the almost-shortest paths subroutine seems too costly for the congest model. The algorithm of [32] requires repeated updates of the distances in the spanner between pairs of nodes after every addition of a path to it, which is a sequential process in essence, and thus we do not find it suitable for an efficient distributed implementation.

A different approach for the distributed construction of (+6)-spanners could be to adapt a distributed algorithm with different stretch guarantees. This approach does not seem to work: the distributed algorithms for constructing (+2)-spanners [34] and (+4)-spanners [10] are both very much tailored for achieving the desired stretch, and it is not clear how to change them in order to construct sparser spanners with higher stretch. The (+4)-spanner construction algorithm [10] starts with clustering, and then constructs a (+2)-pairwise spanner between the cluster centers. Replacing the (+2)-pairwise spanner by a (+4)-pairwise spanner would indeed yield a (+6)-all-pairs spanner, as desired. However, even using the sparsest (+4)-pairwise spanners [10, 1], the resulting (+6)-spanner may be denser than our new (+6)-spanner and even than the known (+4)-spanner [10].

Thus, we start by presenting a new sequential algorithm for the construction of -spanners, an algorithm that is more suitable for a distributed implementation, and then discuss its distributed implementation. Our construction starts with a clustering phase, and then adds paths that minimize the number of additional edges that need to be added to the spanner. To implement our construction in the congest model, we assign weights to the edges and use our WBFS algorithm to find shortest paths with as few edges as possible that are not yet in the spanner. Note that although the graph and the spanner we construct for it are both unweighted, the ability of our multiple WBFS algorithm to handle weights is crucial for our solution.
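
To make this step concrete, the following Python sketch shows one natural weight assignment under which a lightest shortest path is exactly a shortest path with the fewest edges missing from the current spanner; the concrete weights (1 for edges already in the spanner, 2 for missing edges) and the function name are our illustrative choices, not necessarily the ones used by the algorithm.

def reweight(graph_edges, spanner_edges):
    """Assign weight 1 to edges already in the spanner and weight 2 to missing edges.

    A path with h edges, m of which are missing from the spanner, gets weight
    h + m.  Among all shortest paths (h is fixed), the lightest path therefore
    minimizes the number m of missing edges, which is what the WBFS trees are
    used to find.
    """
    in_spanner = {frozenset(e) for e in spanner_edges}
    return {frozenset(e): (1 if frozenset(e) in in_spanner else 2)
            for e in graph_edges}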

A (+6)-spanner must contain n^(4/3 - o(1)) edges [2]. The best sequential algorithms [32, 6] construct a (+6)-spanner with O(n^4/3) edges. Our distributed algorithm constructs a slightly denser spanner, which is nevertheless still sparser than the spanner obtained by the fast sequential construction of [45].

1.3 Related work

Algorithms for the congest model that construct multiple (unweighted) BFS trees, rooted at a set of sources S, were suggested in [33] and [29], running in O(|S| + D) rounds. Both algorithms start the construction of all the BFS trees simultaneously, and proceed by transferring messages containing the source of a BFS tree and the distance the message has traversed so far. The algorithms differ in how they order message deliveries when several messages need to be sent over an edge in the same round. We base our multiple WBFS construction on the algorithm of [33], in which messages sent by a node are prioritized by the distance they have traversed so far, with preference to messages that have traversed a smaller distance. The algorithm of [29], which we cannot use for our construction [27], prioritizes messages by the identity of the root, and transmits a message in only one direction of each edge in each round.

Spanners were first introduced in 1989 by [40, 41], and have since been a topic of wide research due to their abundant applications. Prime examples of the need for sparse spanners can be found in synchronizing distributed networks [41], information dissemination [9], compact routing schemes [11, 42, 44], and more.

Distributed constructions of various spanners have been widely studied [34, 33, 43, 10, 6, 7, 13, 14, 15, 17, 18, 21, 16, 38, 23, 37, 25, 8]. Lower bounds were given in [43, 10, 3]. However, obtaining an efficient and sparse (+6)-all-pairs spanner has remained an open question [10].

Several lower bounds for the time complexity of spanner construction in the congest model were presented in [10], but these are applicable only to pairwise spanners with a bounded number of pairs, and not to all-pairs spanners. A lower bound from [43] states that the construction of a spanner with as few edges as the one we build must take some minimal number of rounds. This lower bound does not take the bandwidth restrictions into account at all (it is proven for the local model), and so we believe that a higher lower bound should apply for the congest model, but this is left as an intriguing open question.

2 Preliminaries

All graphs in this work are simple, connected and undirected. A graph can be unweighted, G = (V, E), or weighted, G = (V, E, ω), where ω is an edge-weight function; in the latter case we assume that the weights are positive and polynomially bounded in n = |V|, so that a weight fits in a single message. Given a path P in a weighted graph G, we use ℓ(P) to denote the length of P, which is the number of edges in it, and ω(P) to denote the weight of the path, which is the sum of its edge weights. The distance between two nodes u, v in a graph G, denoted d_G(u, v), is the minimum length of a path in G connecting u and v. The diameter of a graph (weighted or unweighted) is D = max_{u,v ∈ V} d_G(u, v).

We consider the congest model of computation [39], where the nodes of a graph communicate synchronously by exchanging O(log n)-bit messages along the edges. The goal is to solve a problem distributively while minimizing the number of communication rounds.

WBFS trees:

We are interested in a weighted BFS tree, which consists of lightest shortest paths from the root to all the nodes, formally defined as follows.

Given a connected, weighted graph G = (V, E, ω) and a node s ∈ V, a weighted BFS tree (WBFS) for G rooted at s is a spanning tree T of G satisfying the following properties:

  1. For each v ∈ V, the path from s to v in T is a shortest path in G between s and v.

  2. For each v ∈ V, no shortest path from s to v in G is lighter than the path from s to v in T.

We emphasize that this is different from requiring a subgraph containing all lightest paths from the root. One may wonder whether a WBFS tree always exists, but this is easily evident from the following refinement of a (sequential) BFS search, which returns a WBFS tree: go over the nodes in order of non-decreasing distance from the source s, starting with s; each node v chooses as a parent a neighbor u that was already processed, satisfies d_G(s, u) = d_G(s, v) - 1, and minimizes the weight of the path from s to u in the tree plus ω(u, v), and adds the edge (u, v) to the tree. Each node has a single parent, so this is indeed a tree; the node ordering guarantees that this is indeed a BFS tree, assuring (i); and the parent choice guarantees that the paths are lightest among the shortest, assuring (ii).
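
For concreteness, here is a short Python sketch of this sequential refinement, written as a Dijkstra-style scan with the lexicographic key (hop distance, path weight); the function name and the adjacency-dictionary representation are ours, for illustration only.

import heapq

def sequential_wbfs(adj, s):
    """Sequential WBFS construction sketch: Dijkstra-style scan with the
    lexicographic key (hop distance, path weight).

    adj: dict mapping each node to a dict {neighbor: edge weight}.
    Returns (dist, wt, parent): hop distance, weight of the chosen path, and
    parent pointer of every node in the tree rooted at s.
    """
    dist, wt, parent = {s: 0}, {s: 0}, {s: None}
    pq = [(0, 0, s)]          # priority queue of (hops, weight, node)
    done = set()
    while pq:
        d, w, v = heapq.heappop(pq)
        if v in done:
            continue
        done.add(v)
        for u, edge_w in adj[v].items():
            cand = (d + 1, w + edge_w)
            if u not in dist or cand < (dist[u], wt[u]):
                dist[u], wt[u] = cand
                parent[u] = v
                heapq.heappush(pq, (cand[0], cand[1], u))
    return dist, wt, parent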

Spanners:

Given a graph G = (V, E), a subgraph H of G is called an (α, β)-spanner if for every u, v ∈ V it holds that d_H(u, v) ≤ α · d_G(u, v) + β. The parameters α and β are called the stretch parameters.

When α = 1, such a spanner is called a purely additive spanner. In this paper we focus on purely additive (+6)-spanners, i.e., α = 1 and β = 6.

For completeness, we mention that when β = 0, such a spanner is called a multiplicative spanner. In addition, while sometimes the stretch parameters need to be guaranteed only for some subset of all the pairs of nodes of the graph (such as in pairwise spanners), we emphasize that our construction provides the promise of a stretch for all pairs.
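
As a small illustration of the definition (our own code, not part of the construction), the following Python function checks whether a spanning subgraph H is an (α, β)-spanner of an unweighted graph G by comparing BFS distances from every node.

from collections import deque

def bfs_dist(adj, s):
    """Hop distances from s in an unweighted graph given as an adjacency dict."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist

def is_spanner(adj_g, adj_h, alpha, beta):
    """Check that d_H(u, v) <= alpha * d_G(u, v) + beta for all pairs u, v."""
    for s in adj_g:
        dg, dh = bfs_dist(adj_g, s), bfs_dist(adj_h, s)
        if any(dh.get(v, float("inf")) > alpha * d + beta for v, d in dg.items()):
            return False
    return True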

3 Multiple Weighted BFS Trees

In the congest model, the problem of finding a WBFS tree requires each node to know its parent in the WBFS tree, and the unweighted and weighted distances to the source within the tree. This allows a node to send messages to the source node through the lightest among all shortest paths. When there are multiple sources, each node should know the parent leading to each of the sources in S.

We define data structures for representing multiple WBFS trees. Given a node v, the S-proximity-list (or proximity list for short) of v, denoted L_v, is an ascending lexicographically ordered list of triples (s, d_s, w_s), one for each s ∈ S, where d_s and w_s are the length and weight of the path from v to s in the tree T_s. Two different triples are ordered such that (s_1, d_1, w_1) < (s_2, d_2, w_2) if d_1 < d_2, or d_1 = d_2 and w_1 < w_2, or d_1 = d_2, w_1 = w_2 and s_1 precedes s_2, where s_1 and s_2 may be compared by any predefined order on the node identifiers. Note that T_s contains a single path from v to s, so L_v contains a single triple for each source, and the above order is a total order on the triples of L_v.

The S-path-map (or path-map for short) of v is a mapping P_v from each source s ∈ S to the parent of v in T_s, denoted P_v(s). The map is sorted with respect to the order of L_v, such that the first records of P_v belong to the sources closest to v.
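
As an illustration (the names below are ours), the proximity list can be represented as one record per source, kept sorted by the lexicographic key (length, weight, source identifier) described above:

def triple_key(triple):
    """Lexicographic order on triples: first hop distance, then weight, then source id."""
    s, d, w = triple
    return (d, w, s)

def proximity_list(entries):
    """entries: dict mapping each known source s to its (distance, weight) pair at node v.

    Returns the proximity list L_v as a list of (s, d, w) triples in ascending
    lexicographic order; the path map P_v can be kept alongside as a dict s -> parent.
    """
    return sorted(((s, d, w) for s, (d, w) in entries.items()), key=triple_key)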

Algorithm 1, which constructs multiple WBFS trees from a set S in the congest model, is based on carefully extending the distributed Bellman-Ford-based algorithm of Lenzen et al. [33]. The heart of the algorithm is its main loop, and each iteration of it takes a single round in the congest model. We show that |S| + D - 1 iterations of the loop suffice in order to construct the desired WBFS trees.

The algorithm builds the WBFS trees by gradually updating the proximity list and the path map of each node. Each round is composed of two phases: updating the neighbors about changes in the proximity list, and receiving updates from other nodes. The path map is only used by the current node, and therefore changes to it are not sent.

Ideally, each node would update its neighbors regarding all the changes made to its proximity list. However, due to bandwidth restrictions, a node cannot send the entire list in each round. Therefore, at each round each node sends to all of its neighbors the lexicographically smallest triplet in its proximity list that it has not yet sent, while maintaining a record noting which triplets have been sent and which are waiting. Each triplet is only sent once, though a node may send multiple triplets regarding a single source.

A node uses the messages received in the current round in order to update its proximity list and path map for the next round. A triplet (s, d, w) received by a node v from a neighbor u represents the length and weight of some path from s to u in the graph. The node v then considers this path extended by the edge (u, v), compares it to its currently known best path from s to v, and updates the proximity list and path map in case a shorter path has been found, or a lighter path with the same length.

/* Initialization */
L_v ← ∅ ; P_v ← ∅
for each s ∈ S do
      P_v(s) ← ⊥
if v ∈ S then
      insert (v, 0, 0) into L_v
sent_v ← ∅          /* A variable marking sent triplets */
/* Main loop */
for |S| + D - 1 rounds do
      if L_v contains a triplet that is not in sent_v then
            (s, d, w) ← the lexicographically smallest triplet in L_v that is not in sent_v
            add (s, d, w) to sent_v
            send (s, d, w) to all neighbors
      for each (s, d, w) received from a neighbor u do
            d' ← d + 1
            w' ← w + ω(u, v)
            if L_v contains no triplet for s, or its triplet (s, d'', w'') satisfies (d', w') < (d'', w'') lexicographically then
                  remove the old triplet for s from L_v (if it exists)
                  insert (s, d', w') into L_v
                  P_v(s) ← u
Algorithm 1: Weighted distributed Bellman-Ford algorithm for node v
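
To illustrate how the rounds of Algorithm 1 interact, the following Python sketch simulates the algorithm centrally, one synchronous round at a time (our own code and naming, intended only as a sketch of the message schedule: one unsent lexicographically-smallest triplet per node per round, and the lexicographic update rule at the receivers).

from collections import deque

def hop_diameter(adj):
    """Hop diameter via BFS from every node (adequate for small examples)."""
    best = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        best = max(best, max(dist.values()))
    return best

def simulate_wbfs(adj, sources, rounds=None):
    """Round-by-round simulation of the weighted distributed Bellman-Ford algorithm.

    adj: dict mapping each node to a dict {neighbor: edge weight}.
    Each node v keeps L[v]: {source: (hops, weight)}, P[v]: {source: parent},
    and sent[v]: the set of triplets it has already sent.
    """
    sources = set(sources)
    if rounds is None:
        rounds = len(sources) + hop_diameter(adj) - 1   # |S| + D - 1 rounds
    L = {v: ({v: (0, 0)} if v in sources else {}) for v in adj}
    P = {v: {} for v in adj}
    sent = {v: set() for v in adj}

    for _ in range(rounds):
        # Phase 1: each node broadcasts its lexicographically smallest unsent triplet.
        outbox = {}
        for v in adj:
            unsent = [(d, w, s) for s, (d, w) in L[v].items() if (s, d, w) not in sent[v]]
            if unsent:
                d, w, s = min(unsent)
                sent[v].add((s, d, w))
                outbox[v] = (s, d, w)
        # Phase 2: receivers extend each triplet by one edge and keep the smallest entry per source.
        for u, (s, d, w) in outbox.items():
            for v, edge_w in adj[u].items():
                candidate = (d + 1, w + edge_w)
                if s not in L[v] or candidate < L[v][s]:
                    L[v][s] = candidate
                    P[v][s] = u
    return L, P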

To prove correctness, we generalize the proof of [33] to handle weights, and show that our algorithm solves the weighted (S, d, k)-detection problem: each node should learn the sources of S closest to it, but at most k of them and only up to distance d. This is formally defined as follows.

Given a weighted graph G = (V, E, ω), a subset S ⊆ V of source nodes, and a node v ∈ V, let L_v denote the S-proximity-list and let P_v denote the path map of the node v. The weighted (S, d, k)-detection problem requires that each node v learns the first k_v entries of L_v and P_v, where k_v is the minimum between k and the number of sources s ∈ S such that d_G(v, s) ≤ d.
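
As a concrete illustration of this requirement, a centralized reference output for the weighted (S, d, k)-detection problem can be computed by brute force; the sketch below reuses the sequential_wbfs function from the sketch in Section 2 (both the function and its name are our own illustrations).

def detection_reference(adj, sources, d, k):
    """Brute-force reference output for weighted (S, d, k)-detection.

    For every node v, returns the first k_v entries of its proximity list,
    stored as (distance, weight, source) triples, where k_v is the minimum
    between k and the number of sources within hop distance d of v.
    Relies on sequential_wbfs(adj, s) defined earlier.
    """
    triples = {v: [] for v in adj}
    for s in sources:
        dist, wt, _parent = sequential_wbfs(adj, s)
        for v in adj:
            triples[v].append((dist[v], wt[v], s))
    answer = {}
    for v, lst in triples.items():
        lst.sort()                                    # lexicographic: distance, weight, source id
        k_v = min(k, sum(1 for dd, _, _ in lst if dd <= d))
        answer[v] = lst[:k_v]
    return answer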

Given a node v, L_v is a variable in Algorithm 1 holding the proximity list of v, and we denote by L_v^r the state of the list at the beginning of round r of the algorithm, and by L_v^∞ the value of L_v at the end of the algorithm. Our goal is to prove that L_v^∞ equals the true proximity list of v defined above, i.e., to prove that the algorithm obtains the correct values of the proximity list.

We use similar notations for the path map P_v. Since the records of P_v are updated under the same conditions as the records of L_v, the correctness of P_v at the end of the algorithm with respect to L_v immediately follows, and we omit the details.

We start by showing that if there were no bound on the number of rounds, then the values of L_v would eventually converge to the true values of the proximity list.

Given a graph G and a set S ⊆ V, if we let the main loop of Algorithm 1 run forever, then there exists a round r_0 such that no node sends messages or modifies its proximity list after round r_0. Moreover, L_v^∞ is the true proximity list of v, i.e., for every s ∈ S, the triplet for s in L_v^∞ records the length and the weight of the path from v to s in the WBFS tree T_s.

Proof.

In each iteration of the algorithm, each node v sends to all of its neighbors the lexicographically smallest triplet in L_v that has not been sent yet. Each triplet is sent at most once. Therefore, if we show the existence of a round r_0 in which, for every v ∈ V, all the triplets in L_v^{r_0} have already been sent in previous rounds, this implies that no message is sent in round r_0, and hence L_v^{r_0+1} = L_v^{r_0} for every v. This claim can be applied inductively, concluding that for any round r ≥ r_0, it holds that L_v^r = L_v^{r_0}. We prove the existence of the round r_0 by showing that the number of messages that are sent by each node is finite.

Each triple (s, d, w) is inserted into L_v when a message is received from a neighbor u, and such a message implies that a path from s to v through u, of length d and weight w, exists. Thus, the number of messages sent by a node v is upper bounded by the number of paths from the sources to v of bounded length. Furthermore, the algorithm does not insert a triplet into L_v if it has already inserted a lexicographically smaller triplet for the same source. As the graph is finite, the number of such paths is finite, and eventually no node adds further triplets to its lists or sends additional messages.

It remains to show that for all v ∈ V and s ∈ S, the entry of s in L_v^∞ records the length and weight of the path from v to s in T_s.

First, we show that if a triplet (s, d, w) is added to L_v in some round, then there exists a path from s to v of length d and weight w. For a source s, we insert the triplet (s, 0, 0) into L_s at the beginning of the algorithm, so the claim is true at initialization. Assume there exists a round where a triplet is inserted into some list but no corresponding path exists, and let r be the first such round. This implies that there exists a node u that is a neighbor of v, which sends the message (s, d - 1, w - ω(u, v)) to v in round r. This triplet must have been inserted into L_u in some round r' < r, and by the minimality of r there exists a path P from s to u where ℓ(P) = d - 1 and ω(P) = w - ω(u, v). Since u is a neighbor of v, the path P followed by the edge (u, v) is a valid path from s to v, of length d and weight w, contradicting the assumption.

To complete the proof, we claim that the correct triplet is indeed added to L_v at some round of the algorithm, and is not removed. Consider the path from s to v in the WBFS tree T_s. At initialization, the triplet (s, 0, 0) is added to the list of s itself. From then on, at each round there exists some node u on this path whose list contains a triplet for s that is lexicographically no larger than the triplet describing the prefix of the path from s to u, while the next node on the path does not yet hold such a triplet. Since we proved that every triplet in a list is eventually sent, the next node on the path eventually adds such a triplet as well, and hence, eventually, so does v. By the definition of a WBFS tree, all other paths from s to v must be longer or not lighter, and since every triplet corresponds to an existing path, the triplet describing the path in T_s cannot be discarded for a lexicographically smaller triplet. This concludes that for any source s and node v, the entry of s in L_v^∞ is correct. ∎

Lemma 3 shows that without the limit on the number of rounds, the algorithm would compute the right values; however, it does not bound the number of rounds needed for this to occur. Next, we show that |S| + D - 1 rounds suffice. We cannot apply the claims of [33] directly, since the existence of weights restricts the number of viable solutions even further, causing more updates to the proximity list and an increase in the number of messages sent. However, we do use a similar technique: we bound the number of rounds in which the smallest entries of L_v can change.

For an entry x = (s, d, w), let pos_r(v, x) denote the index of the entry in the lexicographically ordered list L_v^r at the beginning of round r, where the first entry has index 0; equivalently, pos_r(v, x) is the number of entries of L_v^r that are lexicographically smaller than x. For completeness, this also defines pos_r(v, x) if x did not yet appear in L_v at the beginning of round r, and we define pos_r(v, x) = ∞ if the triplet was removed from L_v before the beginning of this round. Note that a removed triplet is never returned to the list, since the lexicographic order is transitive.

For a triplet x = (s, d, w), the following holds:

  1. pos_r(v, x) is non-decreasing with r.

  2. If the triplet (s, d, w) is sent from a node u to a node v at round r, resulting in the addition of a new triplet (s, d', w') to L_v at the end of round r, where d' = d + 1 and w' = w + ω(u, v), then pos_{r+1}(v, (s, d', w')) ≥ pos_r(u, (s, d, w)).

Part (i) follows from the fact that the number of triplets below x in the list cannot decrease. To prove part (ii), we show that all the triplets below (s, d, w) in L_u are sent from u to v, and accounted for in L_v, before (s, d, w) is sent and its extension added.

Proof.

Part (i) is a consequence of the method used by our algorithm for managing the list L_v. According to our algorithm, triplets are not removed from L_v when they are sent. The only case in which a triplet is removed from L_v is when a lexicographically smaller triplet for the same source is added to the list instead. When this happens in round r, the number of triplets smaller than any other triplet x does not decrease, since the new triplet is lexicographically smaller than the one it replaces. Hence, for every other triplet x, the number of lexicographically smaller triplets in L_v cannot decrease throughout the algorithm.

We now turn to prove part (ii) of the lemma. By the fact that the triplet (s, d, w) is sent by the node u in round r, we conclude that the triplets preceding it in the list L_u have already been sent by u in earlier rounds, and arrived at the node v. For each such triplet (s_y, d_y, w_y), either d_y < d, or d_y = d and (w_y, s_y) precedes (w, s). Therefore, when it arrived at v, the extended triplet (s_y, d_y + 1, w_y + ω(u, v)) was added to L_v, as it is lexicographically smaller than (s, d + 1, w + ω(u, v)), unless L_v already contained an even smaller triplet for the source s_y. At round r + 1, either this triplet is in L_v or it was replaced by a lexicographically smaller triplet for s_y. Thus, there are at least pos_r(u, (s, d, w)) triplets smaller than (s, d', w') in L_v^{r+1}, and hence pos_{r+1}(v, (s, d', w')) ≥ pos_r(u, (s, d, w)). ∎

Lemma 3 implies that as the algorithm progresses, messages at higher indexes of the proximity list are sent and updated. This can be used to obtain an upper bound on the round in which a triplet at a certain index of the proximity list can be sent or received, as formalized by the next lemma.

In round r of Algorithm 1, a node v can:

  1. send a message (s, d, w) only if r ≤ d + pos_r(v, (s, d, w)) + 1;

  2. add to L_v a triplet (s, d, w) only if r ≤ d + pos_{r+1}(v, (s, d, w)).

Part (i), when put in words, is rather intuitive: while a triplet might need to wait before being sent, the waiting time is bounded from above by the distance the triplet has traversed from its source, plus the number of triplets that were to be sent before it. Part (ii) is complementary to part (i): the time before a triplet is added is, once more, bounded by the distance it has traversed plus the number of lexicographically smaller triplets.

Proof.

We start by showing that, for a given round r, if part (i) holds for all nodes then part (ii) holds as well. Consider a triplet (s, d', w') that is added to L_v as a result of a message (s, d, w) sent from u to v in round r, where d' = d + 1 and w' = w + ω(u, v). Part (i) implies that r ≤ d + pos_r(u, (s, d, w)) + 1, and by part (ii) of the previous lemma we have that pos_r(u, (s, d, w)) ≤ pos_{r+1}(v, (s, d', w')). As d = d' - 1, we conclude

r ≤ d' - 1 + pos_{r+1}(v, (s, d', w')) + 1 = d' + pos_{r+1}(v, (s, d', w')),

which implies part (ii).

Next, we prove by induction that both parts of the lemma hold. In round r = 1, part (i) holds trivially, since by definition d ≥ 0 and pos_r(v, (s, d, w)) ≥ 0. Assume that the lemma holds at round r - 1; we show that it holds at round r. Since part (i) implies part (ii), it is sufficient to show that every message (s, d, w) sent by some node u in round r satisfies r ≤ d + pos_r(u, (s, d, w)) + 1.

Observe that if (s, d, w) is sent by a node u in round r, then the triplet must have been added to L_u in some round r' < r. If r' = r - 1, then according to the induction hypothesis, part (ii) holds for round r - 1 and r - 1 ≤ d + pos_r(u, (s, d, w)), implying r ≤ d + pos_r(u, (s, d, w)) + 1, since all the terms are integers.

Otherwise, r' < r - 1. In this case, in round r - 1 the triplet (s, d, w) already appeared in L_u and was not yet sent. Since (s, d, w) is sent only in round r, a different triplet (s_y, d_y, w_y) with pos_{r-1}(u, (s_y, d_y, w_y)) < pos_{r-1}(u, (s, d, w)) must have been sent in round r - 1. By the induction hypothesis for part (i) in round r - 1, this implies:

r - 1 ≤ d_y + pos_{r-1}(u, (s_y, d_y, w_y)) + 1 < d + pos_{r-1}(u, (s, d, w)) + 1,

where we used d_y ≤ d, which holds since (s_y, d_y, w_y) is lexicographically smaller than (s, d, w). By part (i) of the previous lemma, we have that pos_{r-1}(u, (s, d, w)) ≤ pos_r(u, (s, d, w)), and we conclude:

r - 1 < d + pos_r(u, (s, d, w)) + 1.

This gives r ≤ d + pos_r(u, (s, d, w)) + 1, since all the terms are integers. ∎

Lemma 3 implies that eventually, the lists converge to contain the correct values, and Lemma 3 restricts the number of rounds in which specific list entries may change. From this, we conclude that the algorithm solves the weighted (S, d, k)-detection problem.

Given an instance of the weighted (S, d, k)-detection problem, for every v ∈ V and every round r ≥ d + k of an execution of Algorithm 1, the truncation of L_v^r to the first k_v entries, where k_v is the minimum between k and the number of sources s ∈ S such that d_G(v, s) ≤ d, solves the weighted (S, d, k)-detection problem.

This lemma says that the truncated list is correct at the beginning of the relevant round. To prove it, we use Lemma 3(ii) to show that the values in the truncated list cannot change at round r or later, and Lemma 3 to deduce that they are correct.

Proof.

Assume w.l.o.g. that d ≤ D, as D bounds the distance from v to any source, and that k ≤ |S|, as otherwise v needs to learn about all the sources within distance d.

By Lemma 3, there is a round after which all the entries of L_v are correct; let (s, d_s, w_s) be a triplet in one of the first k_v entries of L_v^∞. Since it is one of the first k_v entries, and there are at least k_v sources within distance d of v, we have pos_∞(v, (s, d_s, w_s)) ≤ k_v - 1 and d_s ≤ d.

Let r_0 be the round in which (s, d_s, w_s) is inserted into the list L_v. By Lemma 3(ii), r_0 ≤ d_s + pos_{r_0+1}(v, (s, d_s, w_s)). By Lemma 3(i), when the triplet is inserted into the list, it is already placed in one of the first k_v entries, i.e., pos_{r_0+1}(v, (s, d_s, w_s)) ≤ k_v - 1. Hence,

r_0 ≤ d_s + k_v - 1 ≤ d + k - 1 < r.

Since this claim holds for any of the first k_v entries, these entries were all correct at the beginning of round r, and in all the succeeding rounds. ∎

The construction of multiple WBFS trees is an instance of the weighted (S, D, |S|)-detection problem. Lemma 3 shows that after |S| + D - 1 rounds of Algorithm 1 on such an instance, all the entries of the list are correct, yielding the main result of this section.

  • Given a weighted graph G = (V, E, ω) and a set of nodes S ⊆ V, there exists an algorithm for the congest model that constructs a WBFS tree rooted at s, for every s ∈ S, in |S| + D - 1 rounds.

4 A (+6)-Spanner Construction

In this section we discuss the distributed construction of (+6)-spanners. First, we present a template for constructing a (+6)-spanner and analyze the stretch and sparsity of the constructed spanner. Then, we provide an implementation of our template in the congest model and analyze its running time.

A cluster around a cluster center x is a subset of the set of neighbors of x in G. A node belonging to a cluster is clustered, while the other nodes are unclustered.

Our algorithm starts by randomly choosing cluster centers, and adding edges connecting them to their neighbors, where each neighbor arbitrarily chooses a single center to connect to. Then, additional edges are added to connect each unclustered node to all its neighbors. Next, shortest paths between clusters are added to the spanner. In order to find these shortest paths in the congest model, we use the WBFS construction algorithm to build WBFS trees from random sources. At the heart of our algorithm stands the path-hitting framework of Woodruff [45]: a shortest path in the graph which has many edges between clustered nodes must go through many clusters. This fact is used in order to show that a path with many missing edges (edges not yet in the spanner) is more likely to have an adjacent source of a WBFS tree, and thus it is well approximated by a path within the spanner.
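
For concreteness, the following Python sketch illustrates the clustering step described above; the code and names are ours, and the sampling probability p is left as a parameter, since its exact value is part of the analysis of the construction.

import random

def clustering_phase(adj, p, rng=random):
    """Clustering sketch: sample centers, connect each clustered node to one
    neighboring center, and add all edges of the nodes left unclustered.

    adj: adjacency dict {v: iterable of neighbors} of the unweighted input graph.
    p:   probability with which each node is picked as a cluster center.
    Returns (spanner_edges, centers, cluster_of), where cluster_of maps every
    clustered node to the center of the cluster it joined.
    """
    centers = {v for v in adj if rng.random() < p}
    spanner_edges, cluster_of = set(), {}
    for v in adj:
        neighboring_centers = [u for u in adj[v] if u in centers]
        if neighboring_centers:
            x = neighboring_centers[0]          # an arbitrary neighboring center
            spanner_edges.add(frozenset((v, x)))
            cluster_of[v] = x
        else:
            for u in adj[v]:                    # v is unclustered: keep all its edges
                spanner_edges.add(frozenset((v, u)))
    return spanner_edges, centers, cluster_of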

Woodruff’s algorithm starts with a similar clustering step. However, in order to add paths between clusters, it uses an involved subroutine that finds light almost-shortest paths between pairs of nodes. This subroutine seems too global to be implemented efficiently in a distributed setting, so in our construction it is replaced by only considering lightest shortest paths, which we do using the WBFS trees defined earlier.

Our algorithm constructs a (+6)-spanner with roughly O(n^4/3) edges, as stated next.

  • There exists an algorithm for the congest model that constructs a (+6)-spanner with roughly O(n^4/3) edges and succeeds w.h.p.

Lemmas 4 and 4 analyze the size and stretch of the algorithm given below. The number of rounds of its distributed implementation is analyzed in Lemma 4, giving Theorem 1.1. We use c to denote a constant that can be chosen according to the desired exponent of n in the failure probability.

Algorithm 

Input: a graph , a constant ;
Output: a subgraph of ;
Initialization: ; ;

Clustering.

Pick each node as a cluster center w.p. , and denote the set of selected nodes by . For each , initialize a cluster .

For each node , choose a neighbor of which is a cluster center, if such a neighbor exists, add the edge to , and add to . If none of the neighbors of is a cluster center, add to all the edges adjacent to . Let .

Path Buying.


While do:

  1. Add each cluster center to w.p. , independently of the other centers

  2. For each pair :

    1. /* is a set of paths */

    2. For each :

      1. Among all the shortest paths from to , let be a path with minimum

      2. If , add to

    3. If , add to one of the shortest among the paths of

The algorithm given above outputs a subgraph of G with roughly O(n^4/3) edges, w.h.p.

Proof.

The algorithm starts with H = (V, ∅) and only adds edges of E, so H is indeed a subgraph of G over the same node set.

In the first part of the clustering phase, each node adds to at most one edge, connecting it to a single cluster center, for a total of edges. Then, the probability that a node of degree at least is left unclustered is at most , which is . A union bound implies that all nodes of degree at least are clustered w.p. , and thus the total number of edges added to by unclustered nodes in the second part of the clustering phase is , w.p. .

We start the analysis of the path buying phase by bounding the size of . A node is added to w.p. , so . A Chernoff bound implies that

Similarly, for each value of , we have , and

where the last equality follows since . A union bound implies that and for all , w.p. at least .

Finally, for each , for each we add at most one path with less than missing edges to . Thus, for each value of we add less than edges to , w.p. at least . Summing over all values of , and adding the number of edges contributed by the clustering phase, we conclude that has at most edges, w.p. at least . ∎

The graph H constructed by the algorithm satisfies d_H(u, v) ≤ d_G(u, v) + 6 for each pair u, v ∈ V, w.h.p.

Figure 1: Illustration of the proof of Lemma 4
Proof.

Consider a shortest path P in G between two nodes (see Figure 1). Let a and b be the first and last clustered nodes on P, respectively. If all the nodes of P are unclustered, then P is fully contained in H and we are done.

Let and be the centers of the clusters containing and , respectively. Let be a shortest path in between and , and denote by the number of edges of . Let be the largest power of such that .

An edge can be in only if it connects two clustered nodes. Hence, , the number of edges in , is smaller than the number of clustered nodes in . On the other hand, cannot contain more than three nodes of the same cluster: the distance between every two nodes in a cluster is at most two, so a shortest path cannot traverse more than three nodes of the same cluster. Thus, the number of clusters intersecting is at least . As , the probability that none of the centers of these clusters is chosen to is at most . For each pair of nodes, a cluster center on a shortest path between them is chosen to , for the appropriate value of , with similar probability. A union bound implies that this claim holds for all pairs in w.p. at least .

Let be a node on in a cluster such that , if such a cluster exists. Denote by the sub-path of from to . As there are edges in , there are also less than edges in . Thus, in step of the path-buying phase for , either the path or some other path between and of length at most is added to . In step , a path from to some node is added to , and this is a shortest path in , so