# Near-Optimal Clustering in the k-machine model

The clustering problem, in its many variants, has numerous applications in operations research and computer science (e.g., in bioinformatics, image processing, and social network analysis). As data sets have grown rapidly, researchers have focused on designing algorithms for clustering problems in models of computation suited for large-scale computation, such as MapReduce, Pregel, and streaming models. The k-machine model (Klauck et al., SODA 2015) is a simple, message-passing model for large-scale distributed graph processing. This paper considers three of the most prominent examples of clustering problems: the uncapacitated facility location problem, the p-median problem, and the p-center problem, and presents O(1)-factor approximation algorithms for these problems running in Õ(n/k) rounds in the k-machine model. These algorithms are optimal up to polylogarithmic factors because this paper also shows Ω̃(n/k) lower bounds for obtaining polynomial-factor approximation algorithms for these problems. These are the first results for clustering problems in the k-machine model. We assume that the metric provided as input for these clustering problems is only implicitly provided, as an edge-weighted graph. In a nutshell, our main technical contribution is to show that constant-factor approximation algorithms for all three clustering problems can be obtained by learning only a small portion of the input metric.


## 1 Introduction

The problem of clustering data has a wide variety of applications in areas such as information retrieval, bioinformatics, image processing, and social network analysis. In general, clustering is a key component of data mining and machine learning algorithms. Informally speaking, the objective of data clustering is to partition data into groups such that data within each group are “close” to each other according to some similarity measure. For example, we might want to partition visitors to an online retail store (e.g., Amazon) into groups of customers who have expressed preferences for similar products. As the sizes of data sets have grown significantly over the last few years, it has become imperative that clustering problems be solved efficiently in models of computation that allow multiple machines to process data in parallel. Distributing input data across multiple machines is important not just for speeding up computation through parallelism, but also because no single machine may have sufficiently large memory to hold a full data set. Motivated by these concerns, recent research has considered problems of designing clustering algorithms [11][12] in systems such as MapReduce [9] and Pregel [24]. Clustering algorithms [28] have also been designed for streaming models of computation [2].

In this paper we present distributed algorithms for three of the most prominent clustering problems: the uncapacitated metric facility location problem, the p-median problem, and the p-center problem. All three problems have been studied for several decades now and are well-known to be NP-hard. On the positive side, all three problems have constant-factor (polynomial-time) approximation algorithms. We consider these problems in the recently proposed k-machine model [21], a synchronous, message-passing model for large-scale distributed computation. This model cleanly abstracts essential features of systems such as Pregel [24] and Giraph (see http://giraph.apache.org/) that have been designed for large-scale graph processing, allowing researchers to prove precise upper and lower bounds. (Researchers at Facebook recently used Apache Giraph to process graphs with a trillion edges [6].) One of the main features of the k-machine model is that the input, consisting of n items, is randomly partitioned across k machines. Of particular interest are settings in which n is much larger than k. Communication occurs via bandwidth-restricted communication links between every pair of machines and thus the underlying communication network is a size-k clique. For all three problems, we present constant-factor approximation algorithms that run in Õ(n/k) rounds in the k-machine model. We also show that these algorithms have optimal round complexity, to within polylogarithmic factors, by providing complementary Ω̃(n/k) lower bounds for polynomial-factor approximation algorithms. (Throughout the paper, the Õ and Ω̃ notations hide polylogarithmic factors.) These are the first results on clustering problems in the k-machine model.

### 1.1 Problem Definitions

The input to the uncapacitated metric facility location problem (in short, FacLoc) is a set V of points, a metric d that assigns distances to point-pairs, and a facility opening cost f_i associated with each point i ∈ V. The problem is to find a subset F ⊆ V of points to open (as “facilities”) so as to minimize the objective function ∑_{i∈F} f_i + ∑_{j∈V} d(j, F), where d(j, F) = min_{i∈F} d(j, i). FacLoc is NP-hard and is in fact hard to approximate with an approximation factor better than 1.463 [15]. There are several well-known constant-factor approximation algorithms for FacLoc including the primal-dual algorithm of Jain and Vazirani [19] and the greedy algorithm of Mettu and Plaxton [25]. The best approximation factor currently achieved by an algorithm for FacLoc is 1.488 [22].

The input to the p-median problem (in short, Median) is a set V of points, a metric d that assigns distances to point-pairs, and a positive integer p. The problem is to find a subset F ⊆ V of exactly p points to open (as “facilities”) so as to minimize the objective function ∑_{j∈V} d(j, F). Median is NP-hard and is in fact hard to approximate with an approximation factor better than 1 + 2/e [18]. A well-known approximation algorithm for the p-median problem is due to Jain and Vazirani [19], who present a 6-approximation algorithm. This approximation factor has been improved by subsequent results; see [4], for example. The input to the p-center problem (in short, Center) is the same as the input to Median, but the objective function that is minimized is max_{j∈V} d(j, F). Like FacLoc and Median, the Center problem is not only NP-hard, it is in fact hard to approximate with an approximation factor strictly better than 2 [13]. There is also an optimal 2-approximation algorithm for this problem [13] obtained via a simple, greedy technique called farthest first traversal.
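The farthest first traversal mentioned above can be sketched in a few lines. This is a minimal sequential illustration, assuming the metric is available as an explicit nested dictionary `d`; the function name, the `start` parameter, and the five-point example are hypothetical and not taken from the paper, which works with an implicit graph metric in a distributed setting.

```python
def farthest_first(d, p, start):
    """Greedy farthest first traversal for the p-center problem.

    d[a][b] is the distance between points a and b (an explicit metric,
    assumed here purely for illustration). Returns the chosen centers and
    the resulting p-center objective value.
    """
    centers = [start]
    # nearest[j] = distance from j to its closest chosen center so far
    nearest = {j: d[j][start] for j in d}
    while len(centers) < p:
        # pick the point that is farthest from all current centers
        nxt = max(nearest, key=nearest.get)
        centers.append(nxt)
        for j in d:
            nearest[j] = min(nearest[j], d[j][nxt])
    return centers, max(nearest.values())


# Hypothetical example: five points on a line.
points = [0, 1, 2, 10, 11]
d = {a: {b: abs(a - b) for b in points} for a in points}
centers, radius = farthest_first(d, 2, start=0)
# centers == [0, 11] and radius == 2; the optimum here is 1
# (e.g., centers 1 and 10), consistent with the factor-2 guarantee.
```

The starting point is arbitrary; the factor-2 guarantee does not depend on it.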

In all three problems, it is assumed that each point is “connected” to the nearest open facility. So an open facility along with the “clients” that are connected to it forms a cluster.
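To make the three objective functions concrete, the following sketch evaluates them for a given set of open facilities. It assumes an explicit metric stored as a nested dictionary and uses hypothetical helper names (`dist_to_set`, `facloc_cost`, `median_cost`, `center_cost`, `clusters`); it only illustrates the definitions and is not part of the paper's algorithms.

```python
def dist_to_set(d, j, F):
    """Distance from point j to its nearest open facility in F."""
    return min(d[j][i] for i in F)


def facloc_cost(d, f, F):
    """FacLoc objective: opening costs plus connection costs."""
    return sum(f[i] for i in F) + sum(dist_to_set(d, j, F) for j in d)


def median_cost(d, F):
    """p-median objective: sum of connection costs."""
    return sum(dist_to_set(d, j, F) for j in d)


def center_cost(d, F):
    """p-center objective: largest connection cost."""
    return max(dist_to_set(d, j, F) for j in d)


def clusters(d, F):
    """Assign every point to its nearest open facility."""
    out = {i: [] for i in F}
    for j in d:
        out[min(F, key=lambda i: d[j][i])].append(j)
    return out


# Hypothetical example: five points on a line, unit-free distances.
points = [0, 1, 2, 10, 11]
d = {a: {b: abs(a - b) for b in points} for a in points}
f = {a: 2 for a in points}   # assumed uniform opening cost
F = [1, 10]                  # candidate set of open facilities
```

With `F = [1, 10]`, the clusters are `{1: [0, 1, 2], 10: [10, 11]}`, the Median cost is 3, the Center cost is 1, and the FacLoc cost is 4 + 3 = 7.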

### 1.2 The k-machine Model and Input-Output Specification

Let n denote |V|. The k-machine model is a message-passing, synchronous model of distributed computation. Time proceeds in rounds and in each round, each of the k machines performs local computation and then sends (possibly distinct) messages to the remaining k − 1 machines. A fundamental constraint of the k-machine model is that each message is required to be small; as is standard, we assume here that each message is of size O(log n) bits. It is assumed that the k machines have unique IDs that are represented by O(log n)-bit strings.

As per the random partition assumption of the k-machine model [21], the points in V are distributed uniformly at random across the k machines. This results in Õ(n/k) points per machine, with high probability (w.h.p.). (We use “with high probability” to refer to probability that is at least 1 − 1/n^c for any constant c ≥ 1.) We use m_j, 1 ≤ j ≤ k, to denote the machines and H(m_j) ⊆ V to denote the subset of points “hosted” by m_j. The natural way to distribute the rest of the input, namely d and f (in the case of FacLoc), is for each machine m_j to be given f_i and d(i, v) for each point i ∈ H(m_j) and each v ∈ V. The distribution of f in this manner is fine, but there is a problem with distributing d in this manner. Since n is extremely large, it is infeasible for m_j to hold the roughly n²/k distances d(i, v) with i ∈ H(m_j). (Recall that n is much larger than k.) In general, this explicit knowledge of the metric space consumes too much memory, even when divided among k machines, to be feasible. So we make what we call the graph-metric assumption: the metric d is specified implicitly by an edge-weighted graph with vertex set V. Let G = (V, E) be the edge-weighted graph with non-negative edge weights representing the metric d. Thus for any i, j ∈ V, d(i, j) is the shortest path distance between points i and j in G.

Klauck et al. [21] consider a number of graph problems in the -machine model and we follow their lead in determining the initial distribution of across machines. For each point , machine knows all the edges in incident on and for each such edge , machine knows the ID of the machine that hosts . Thus, elements are needed at each machine to represent the metric space and if is a sparse graph, this representation can be quite compact.

The graph-metric assumption fundamentally affects the algorithms we design. Since the metric d is provided implicitly, via G, access to the metric is provided through shortest path computations on G. In fact, it turns out that these shortest path computations are the costliest part of our algorithms. One way to view our main technical contribution is this: we show that for all three clustering problems, there are constant-factor approximation algorithms that only require a small (i.e., polylogarithmic) number of calls to a subroutine that solves the Single Source Shortest Path (SSSP) problem.
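Under the graph-metric assumption, one "row" of the metric is obtained by a single SSSP computation. The sketch below shows this sequentially with a textbook Dijkstra implementation; the adjacency-list format, the function name `sssp`, and the example graph are assumptions made for illustration, and the paper's distributed algorithms use an approximate SSSP subroutine rather than this exact sequential one.

```python
import heapq


def sssp(adj, source):
    """Exact single-source shortest paths (Dijkstra).

    adj[u] is a list of (v, w) pairs with non-negative weight w.
    The returned dictionary is the row d(source, .) of the graph metric,
    restricted to vertices reachable from source.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        du, u = heapq.heappop(heap)
        if du > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = du + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Hypothetical undirected example graph.
adj = {
    "a": [("b", 1.0), ("c", 5.0)],
    "b": [("a", 1.0), ("c", 2.0)],
    "c": [("a", 5.0), ("b", 2.0), ("d", 1.0)],
    "d": [("c", 1.0)],
}
row_a = sssp(adj, "a")
```

Here `row_a` is `{"a": 0.0, "b": 1.0, "c": 3.0, "d": 4.0}`: the direct edge a-c of weight 5 is not the metric distance, since the path through b is shorter.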

For all three problems, the output consists of F, the set of open facilities, and connections between clients (i.e., points that have not been opened as facilities) and their nearest open facilities. More precisely, for any machine m_j and any point i ∈ H(m_j):

• If i ∈ F, then m_j knows that i has been opened as a facility and furthermore m_j knows all (x, ID_x)-pairs where x is a client that connects to i and ID_x is the ID of the machine hosting x.

• If i ∉ F, then m_j knows that i is a client and it also knows the (x, ID_x) pair, where x is the open facility that i connects to and ID_x is the ID of the machine hosting x.

### 1.3 Our Results

We first prove lower bounds (in Section 2) for FacLoc, Median, and Center. For each problem, we show that obtaining a poly(n)-factor approximation algorithm in the k-machine model requires at least Ω̃(n/k) rounds. In the subsequent three sections, we present Õ(n/k)-round, constant-factor approximation algorithms for the FacLoc, Median, and Center problems, respectively. Our lower bound results show that our algorithms have optimal round complexity, at least up to polylogarithmic factors.

We bring to bear a wide variety of old and new techniques to derive our upper bound results, including the facility location algorithm of Mettu and Plaxton [25], the fast version of this algorithm due to Thorup [29], the neighborhood-size estimation framework of Cohen [7, 8], the p-median Lagrangian relaxation algorithm of Jain and Vazirani [19], and the recent distributed shortest path algorithms due to Becker et al. [5]. In our view, an important contribution of this paper is to show how all of these techniques can be utilized in the k-machine model.

### 1.4 Related Work

Following Klauck et al. [21], two other papers [27, 26] have studied graph problems in the k-machine model. In [26], the authors present an Õ(n/k²)-round algorithm for graph connectivity, which then serves as the basis for Õ(n/k²)-round algorithms for other graph problems such as minimum spanning tree (MST) and approximate min-cut. The upper bound for MST does not contradict the lower bounds shown for this problem in Klauck et al. [21] because Pandurangan et al. [26] use a more relaxed notion of how the output MST is represented. Specifically, at the end of the algorithm in [26] every MST edge is known to some machine, whereas Klauck et al. [21] use the stricter requirement that every MST edge be known to the machines hosting the two end points of the edge. This phenomenon, in which the round complexity of the problem is quite sensitive to the output representation, may be relevant to our results as well and is further discussed in Section 7.

Earlier in this section, we have mentioned models and systems for large-scale parallel computation such as MapReduce and Pregel. Another model of large-scale parallel computation that seems essentially equivalent to the k-machine model is the Massively Parallel Computation model (MPC), which according to [30] is the “most commonly used theoretical model of computation on synchronous large-scale data processing platforms such as MapReduce and Spark.”

## 2 Lower Bound Results

In this section, we derive lower bounds for achieving poly(n)-factor approximation algorithms in the k-machine model for all three problems considered in this paper. Our lower bounds are inspired by the lower bound result from [21] for the Spanning Tree Computation problem.

To prove the lower bounds we describe a family of lower bound graphs where and are sampled from the same distribution as the one used in [21]. That is, is chosen uniformly at random from , satisfying the constraint that for every , . Let and let for some large enough constant that depends on the approximation factor considered. The graph has vertices . We fix the ID’s of the vertices to be the first natural numbers which means that each machine knows whether a vertex is just by knowing ID(v). For every there are three edges in the graph of the form and the weights of these edges depend on the bit values of and where . In particular, we assign weights to as follows – if and , the weights are , if and , the weights are , and if and , the weights are . There is no weight assignment for the case when because the distribution of places no probability mass on this case.

In the following lemma we show that any protocol that reveals and to a single machine must do so by making it receive large messages from other machines. The proof is the same as the entropy argument made in theorem 2.1 in [21] with the added simplification that the entropy at the end of the protocol is zero. Nevertheless, we prove the lemma for completeness.

###### Lemma 1.

Let be a public-coin -error randomized protocol in the -machine model on an -vertex input graph sampled uniformly at random from . If a machine knows both and at the end of the protocol then it must receive bit messages in expectation from other machines.

###### Proof.

Let be the machine that knows both and at the end of the protocol. Since and are encoded in the edge weights of the graph, if the machine hosts then it knows the string via the edges and similarly it knows if it hosts . But if hosts both and then it knows and before the protocol even begins. This is a bad event so we condition on the event that no machine hosts both and which happens with probability .

Before the first round of communication, it can be shown that the entropy . The machine also hosts some vertices and giving it access to some bits of and . It is easy to see via the Chernoff bound that with very high probability hosts at most ’s and ’s for which means it cannot know more than bits of and by virtue of hosting these vertices whp. The event where hosts more vertices cannot influence the entropy by more than for large enough. Hence, the entropy of

given this initial information (which we denote by a random variable

) is . Note that if hosts either or then will contain information about either or respectively but that does not affect our lower bound on the initial entropy.

Let be the messages received by the machine during the course of the protocol . With probability , knows both and at the end of the protocol and therefore . This means that and that . This is under the assumption that different machines host and and there is no error, therefore the expected number of messages received by must be at least . ∎

###### Lemma 2.

For any , every public-coin -error randomized protocol in the -machine model that computes an -factor approximate solution of FacLoc on an -vertex input graph has an expected round complexity .

###### Proof.

To prove the lemma we consider the family of lower bound graphs with the additional property that the vertices and have facility opening cost and every other vertex has opening cost .

Consider the solution to Facility Location where we open the vertices and and connect all other vertices to the closest open facility. The cost of this solution is whereas any other solution will incur a cost of at least . By our choice of , the solution is optimal and any -approximate solution is forced to have the same form as .

After the facility location algorithm terminates, with probability , the machine hosting will know the ID’s of the ’s that serves in . This allows to figure out because if serves and otherwise. By Lemma 1, receives bit messages in expectation throughout the course of the algorithm. This implies an lower bound on the expected round complexity. ∎

###### Lemma 3.

For any , every public-coin error randomized protocol on a -machine network that computes a -factor approximate solution of pMedian and pCenter on an -vertices input graph has an expected round complexity of .

###### Proof.

We show the lower bound for on graphs that come from the family . An optimal solution in a graph from this family is to open and which gives a solution of cost for pMedian and for pCenter. But, we need to be a bit more careful because the pMedian or pCenter algorithms can choose to open some of the ’s and ’s instead of and with only a constant factor increase in the cost of the solution. More specifically, there are four possible cases where we can open different pairs of vertices to get an -approximate solution – , , , and where and are connected by an edge of weight to and respectively. In all these cases, the opened vertices know both and at the end of the algorithm by virtue of knowing the vertices that it serves in the final solution. This is because the value of is high enough to ensure that the two clusters formed in any -approximate solution are the same as the optimal solution no matter what centers are chosen. Therefore, we can apply lemma 1 to all these cases which gives us that the machine hosting one of these vertices will receive bit messages in expectation during the course of the algorithm. This means that the expected round complexity for both the algorithms is . ∎

## 3 Technical Preliminaries

Since the input metric is only implicitly provided, as an edge-weighted graph, computing shortest path distances to learn parts of the metric space turns out to be a key element of our algorithms. The Single Source Shortest Path (SSSP) problem has been considered in the k-machine model in Klauck et al. [21] and they describe a (1+ε)-approximation algorithm that runs in the k-machine model in Õ(n/√k) rounds. This is too slow for our purpose, since we are looking for an overall running time of Õ(n/k). We instead turn to a recent result of Becker et al. [5] and using this we can easily obtain an Õ(n/k)-round SSSP algorithm. Becker et al. do not work in the k-machine model; their result relevant to us is in the Broadcast Congested Clique model. Informally speaking, the Congested Clique model can be thought of as a special case of the k-machine model with k = n. The Broadcast Congested Clique model imposes the additional restriction on communication that in each round each machine sends the same message (i.e., broadcasts) to the remaining machines.

We now provide a brief description of the Congested Clique models. The Congested Clique model consists of n nodes (i.e., computational entities) connected by a clique communication network. Communication is point-to-point via message passing and each message can be at most O(log n) bits in length. Computation proceeds in synchronous rounds and in each round, each node performs local computations and sends a (possibly different) message to each of the other nodes in the network. For graph problems, the input is assumed to be a spanning subgraph of the underlying clique network and each node is initially aware of the incident edges in the input. The Broadcast Congested Clique model differs from the Congested Clique model only in that in each round, each node is required to send the same message to the remaining nodes. For more details on the Congested Clique models, see [16, 10].

###### Theorem 1.

(Becker et al. [5]) For any , in the Broadcast Congested Clique model, a deterministic -approximation to the SSSP problem in undirected graphs with non-negative edge-weights can be computed in rounds.

It is easy to see that any Broadcast Congested Clique algorithm that runs in rounds can be simulated in the -machine model in rounds. A more general version of this claim is proved in Klauck et al. in the Conversion Theorem (Theorem 4.1 [21]). This leads to the following result about the SSSP problem in the -machine model.

###### Corollary 1.

For any , there is a deterministic -approximation algorithm in the -machine model for solving the SSSP problem in undirected graphs with non-negative edge-weights in rounds.

In addition to SSSP, our clustering algorithms require an efficient solution to a more general problem that we call Multi-Source Shortest Paths (in short, MSSP). The input to MSSP is an edge-weighted graph G = (V, E), with non-negative edge-weights, and a set T ⊆ V of sources. The output is required to be, for each vertex v, the distance d(v, T) (i.e., min{d(v, u) | u ∈ T}) and the vertex in T that realizes this distance. The following lemma uses ideas from Thorup [29] to show that MSSP can be reduced to a single call to SSSP and can be solved in an approximate sense in the k-machine model in Õ(n/k) rounds.

###### Lemma 4.

Given a set of sources known to the machines (i.e., each machine knows ), we can, for any value , compute a -approximation to MSSP in rounds, w.h.p. Specifically, after the algorithm has ended, for each , the machine that hosts knows a pair , such that .

###### Proof.

First, as in [29], we add a dummy source vertex s and connect s to each vertex u ∈ T by a 0-weight edge. The shortest path distance from s to any other vertex v is the same as d(v, T) in the original graph. This dummy source can be hosted by an arbitrary machine and the edge information can be exchanged in Õ(n/k) rounds.

Using Theorem 1, we can compute approximate shortest path distance that satisfies the first property of the lemma, in rounds. By [5] (Section 2.3) we can compute an approximate shortest path tree in addition to approximate distances in the Broadcast Congested Clique in rounds w.h.p. and hence in the -machine model in rounds w.h.p.

Since a tree contains linear (in ) number of edges, all machines can exchange this information in rounds so that every machine knows the computed approximate shortest path tree. Now, each machine can determine locally, for each vertex the vertex which satisfies the properties stated in the lemma. ∎
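The dummy-source reduction in the proof above can be sketched sequentially as follows. This is an exact, single-machine illustration (the lemma itself concerns an approximate, distributed computation); the function name `mssp`, the label used for the dummy vertex, and the example graph are assumptions introduced here. Each vertex also records which source its shortest path starts from, which plays the role of reading the answer off the shortest path tree.

```python
import heapq
from itertools import count


def mssp(adj, sources):
    """Multi-source shortest paths via one SSSP run from a dummy source.

    adj[u] is a list of (v, w) pairs with non-negative weights.
    Returns {v: (d(v, T), nearest source)} for every reachable vertex v.
    """
    dummy = ("dummy-source",)          # hypothetical label, assumed not in adj
    aug = {u: list(nbrs) for u, nbrs in adj.items()}
    aug[dummy] = [(s, 0.0) for s in sources]   # 0-weight edges to all sources

    dist = {dummy: 0.0}
    origin = {}                        # origin[v] = source whose subtree holds v
    tie = count()                      # tie-breaker so heap never compares vertices
    heap = [(0.0, next(tie), dummy)]
    while heap:
        du, _, u = heapq.heappop(heap)
        if du > dist[u]:
            continue                   # stale heap entry
        for v, w in aug[u]:
            nd = du + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                origin[v] = v if u == dummy else origin[u]
                heapq.heappush(heap, (nd, next(tie), v))
    del dist[dummy]
    return {v: (dist[v], origin[v]) for v in dist}


# Hypothetical example: a path a - b - c - d - e with weights 1, 1, 3, 1.
adj = {
    "a": [("b", 1.0)],
    "b": [("a", 1.0), ("c", 1.0)],
    "c": [("b", 1.0), ("d", 3.0)],
    "d": [("c", 3.0), ("e", 1.0)],
    "e": [("d", 1.0)],
}
result = mssp(adj, ["a", "e"])
```

In this example `result["c"]` is `(2.0, "a")` and `result["d"]` is `(1.0, "e")`, while each source maps to itself at distance 0.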

Note that in the solution to MSSP, for each v ∈ T, d(v, T) = 0. For our algorithms, we also need the solution to a variant of MSSP that we call ExclusiveMSSP in which for each v ∈ T, we are required to output d(v, T ∖ {v}) and the vertex in T ∖ {v} that realizes this distance. The following lemma uses ideas from Thorup [29] to show that ExclusiveMSSP can be solved by making O(log n) calls to a subroutine that solves SSSP.

###### Lemma 5.

Given a set of sources known to the machines (i.e., each machine knows ), we can, for any value , compute a -approximation to ExclusiveMSSP in rounds, w.h.p. Specifically, after the algorithm has ended, for each , the machine that hosts knows a pair , such that .

###### Proof.

Breaking ties by machine ID, each vertex in T is assigned a bit vector of size ⌈log |T|⌉. We create 2⌈log |T|⌉ subsets of T by making two sets T_i^0 and T_i^1 for each bit position i. The set T_i^b contains the vertices whose i-th bit value is b. Note that for all pairs of distinct vertices v, w ∈ T, there is at least one set T_i^b such that v ∈ T_i^b and w ∉ T_i^b. Now we run an MSSP algorithm for each T_i^b using Lemma 4. Then, for each vertex v ∈ T, d(v, T ∖ {v}) is the smallest d(v, T_i^b) such that v ∉ T_i^b, and the reported vertex is an arbitrary vertex that realizes this distance. ∎
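The bit-partition argument in this proof can be illustrated with a short sequential sketch. It assumes exact distances and uses a plain multi-source Dijkstra in place of the approximate MSSP of Lemma 4; the names `multi_source_dijkstra` and `exclusive_mssp`, the indexing of sources by sorted order, and the example graph are all hypothetical.

```python
import heapq
from itertools import count


def multi_source_dijkstra(adj, sources):
    """Return {v: (distance to nearest source, that source)}."""
    best = {s: (0.0, s) for s in sources}
    tie = count()                       # avoids comparing vertices in the heap
    heap = [(0.0, next(tie), s, s) for s in sources]
    heapq.heapify(heap)
    while heap:
        du, _, u, src = heapq.heappop(heap)
        if du > best[u][0]:
            continue                    # stale heap entry
        for v, w in adj[u]:
            nd = du + w
            if v not in best or nd < best[v][0]:
                best[v] = (nd, src)
                heapq.heappush(heap, (nd, next(tie), v, src))
    return best


def exclusive_mssp(adj, sources):
    """For each source v, find the nearest *other* source.

    Uses 2 * (number of index bits) MSSP calls: for every bit position b and
    bit value, the sources whose index has that value at b act as sources,
    and only sources with the opposite bit value read off an answer.
    """
    srcs = sorted(sources)
    index = {s: i for i, s in enumerate(srcs)}
    bits = max(1, (len(srcs) - 1).bit_length())
    answer = {s: (float("inf"), None) for s in srcs}
    for b in range(bits):
        for val in (0, 1):
            group = [s for s in srcs if (index[s] >> b) & 1 == val]
            if not group:
                continue
            res = multi_source_dijkstra(adj, group)
            for s in srcs:
                outside = (index[s] >> b) & 1 != val
                if outside and s in res and res[s][0] < answer[s][0]:
                    answer[s] = res[s]
    return answer


# Hypothetical example: a path a - b - c - d - e with weights 1, 1, 3, 1.
adj = {
    "a": [("b", 1.0)],
    "b": [("a", 1.0), ("c", 1.0)],
    "c": [("b", 1.0), ("d", 3.0)],
    "d": [("c", 3.0), ("e", 1.0)],
    "e": [("d", 1.0)],
}
nearest_other = exclusive_mssp(adj, ["a", "c", "e"])
```

Since any two distinct sources differ in at least one index bit, each source sees every other source in some group that excludes itself, so the minimum over those groups is the distance to the nearest other source. In the example, `a` and `c` are each other's nearest source at distance 2, and the nearest other source to `e` is `c` at distance 4.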

## 4 Facility Location in ~O(n/k) rounds

At the heart of our k-machine algorithm for FacLoc is the well-known sequential algorithm of Mettu and Plaxton [25], which computes a 3-approximation for FacLoc. To describe the Mettu-Plaxton algorithm (henceforth, MP algorithm), we need some notation. For each real r ≥ 0 and vertex v, define the “ball” B(v, r) as the set {u ∈ V | d(v, u) ≤ r}. For each vertex v, we define a radius r_v as the solution r to the equation f_v = ∑_{u∈B(v,r)} (r − d(v, u)). Figure 1 illustrates the definition of r_v (note that r_v is well-defined for every vertex v).

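The MP algorithm is the following simple, 2-phase, greedy algorithm:

• Radius Computation Phase: for each vertex v, compute the radius r_v.

• Greedy Phase: consider the vertices in non-decreasing order of radii; starting with an empty set of open facilities, open v if no already open facility lies in the ball B(v, 2r_v).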
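A minimal sequential sketch of the two phases is given below. It assumes an explicit metric as a nested dictionary, strictly positive opening costs, and the standard formulation of the Mettu-Plaxton algorithm (radius defined by ∑_{u∈B(v,r)} (r − d(v,u)) = f_v, and a facility opened when no open facility lies within distance 2r_v). The function names and the example instance are hypothetical, and this is the plain MP algorithm, not the variant used later in the paper.

```python
def mp_radius(dists, f):
    """Solve sum over u in B(v, r) of (r - d(v, u)) = f for r.

    dists are the distances from v to all points (including 0 for v itself).
    With the m nearest points in the ball, r = (f + sum of their distances) / m;
    the first m for which r does not reach the next point gives the answer.
    """
    ds = sorted(dists)
    total = 0.0
    for m in range(1, len(ds) + 1):
        total += ds[m - 1]
        r = (f + total) / m
        if m == len(ds) or r <= ds[m]:
            return r


def mettu_plaxton(d, f):
    """Radius phase followed by the greedy phase; returns (open set, radii)."""
    radius = {v: mp_radius(d[v].values(), f[v]) for v in d}
    opened = []
    for v in sorted(d, key=lambda x: radius[x]):   # non-decreasing radii
        # open v only if no open facility lies within distance 2 * r_v
        if all(d[v][u] > 2 * radius[v] for u in opened):
            opened.append(v)
    return opened, radius


# Hypothetical example: five points on a line, uniform opening cost 2.
points = [0, 1, 2, 10, 11]
d = {a: {b: abs(a - b) for b in points} for a in points}
f = {a: 2 for a in points}
opened, radius = mettu_plaxton(d, f)
```

On this instance point 1 has the smallest radius (4/3) and is opened first; points 0 and 2 are then blocked by it, point 10 is opened, and point 11 is blocked by 10, so `opened` is `[1, 10]`.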

We will work with a slight variant of the MP algorithm, called MP- in [3]. The only difference between the MP algorithm and the MP- algorithm is in the definition of each radius , which is defined for the MP- algorithm, as the value satisfying . (Thus, the MP- algorithm with is just the MP algorithm.)

There are two challenges to implementing the MP- algorithm efficiently in the -machine model (and more generally in a distributed or parallel setting): (i) The calculation of the radius by the machine hosting vertex requires that the machine know distances ; however the distance metric is initially unknown and is too costly to fully calculate, and (ii) the Greedy Phase seems inherently sequential because it considers vertices one-by-one in non-decreasing order of radii; implementing this algorithm as-is would be too slow. In the next three sections, we describe how to overcome these challenges and we end the section with a complete description of our FacLoc algorithm in the -machine model.

### 4.1 Reducing Radius Computation to Neighborhood-Size Computation

To deal with the challenge of computing radii efficiently, without full knowledge of the metric, we use Thorup’s approach [29]. Thorup works in the sequential setting, but like us, he assumes that the distance metric is implicitly specified via an edge-weighted graph. He shows that it is possible to implement the MP algorithm in Õ(m) time on an m-edge graph. In other words, it is possible to implement the MP algorithm without computing the full distance metric (e.g., by solving the All Pairs Shortest Path (APSP) problem). We now show how to translate Thorup’s ideas into the k-machine model. (We note here that Thorup’s ideas for the FacLoc problem have already been used to design algorithms in “Pregel-like” distributed systems [12].)

For some ε > 0, we start by discretizing the range of possible radii values using non-negative integer powers of (1+ε). (Without loss of generality we assume that all numbers in the input, i.e., the facility opening costs and the edge weights, are at least 1; a preprocessing step suffices to normalize the input to satisfy this property.) This guarantees that the minimum radius r_v is at least 1. For any vertex v and for any integer i ≥ 0, let q_i(v) denote |B(v, (1+ε)^i)|, the size of the neighborhood of v within distance (1+ε)^i. Further, let α(v, r) denote the sum ∑_{u∈B(v,r)} (r − d(v, u)). Now note that if r increases from (1+ε)^i to (1+ε)^{i+1}, then α(v, r) increases by at least q_i(v)·((1+ε)^{i+1} − (1+ε)^i). This implies that ∑_{i=0}^{t−1} q_i(v)·((1+ε)^{i+1} − (1+ε)^i) is a lower bound on α(v, (1+ε)^t). This observation suggests that we might be able to use, as an approximation to r_v, the smallest power of (1+ε) for which this lower bound on α exceeds f_v. Denote by r̃_v this approximation of r_v. In other words, r̃_v is determined by the smallest integer t such that ∑_{i=0}^{t−1} q_i(v)·((1+ε)^{i+1} − (1+ε)^i) > f_v. It is not hard to show that r̃_v is a good approximation to r_v in the following sense.

For all , .

###### Proof.

The values and respectively depend on how and relate to .

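Recall that α(v, r) = ∑_{u∈B(v,r)} (r − d(v, u)). The following calculations show that ∑_{i=0}^{t−1} q_i(v)·((1+ε)^{i+1} − (1+ε)^i) can be interpreted as ∑_{u∈B(v,(1+ε)^t)} ((1+ε)^t − d↑(v, u)), where d↑(v, u) is d(v, u) rounded up to the nearest power of (1+ε).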

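$$
\begin{aligned}
\sum_{i=0}^{t-1} q_i(v)\cdot\big((1+\epsilon)^{i+1}-(1+\epsilon)^{i}\big)
&= \big((1+\epsilon)^{t}-1\big)+\sum_{i=1}^{t-1}\Big[\big|B(v,(1+\epsilon)^{i})\setminus B(v,(1+\epsilon)^{i-1})\big|\cdot\big((1+\epsilon)^{t}-(1+\epsilon)^{i}\big)\Big]\\
&= \big((1+\epsilon)^{t}-1\big)+\sum_{j=1}^{t}\;\sum_{u\in B(v,(1+\epsilon)^{j})\setminus B(v,(1+\epsilon)^{j-1})}\big((1+\epsilon)^{t}-(1+\epsilon)^{j}\big)\\
&= \sum_{u\in B(v,(1+\epsilon)^{t})}\big((1+\epsilon)^{t}-d^{\uparrow}(v,u)\big)
\end{aligned}
$$

Therefore, we can say that

$$
(1+\epsilon)\,\alpha\big(v,(1+\epsilon)^{t-1}\big)\;\le\;\sum_{i=0}^{t-1} q_i(v)\cdot\big((1+\epsilon)^{i+1}-(1+\epsilon)^{i}\big)\;\le\;\alpha\big(v,(1+\epsilon)^{t}\big),
$$

which implies

$$
\alpha\big(v,(1+\epsilon)^{t-1}\big)\;\le\;\sum_{i=0}^{t-1} q_i(v)\cdot\big((1+\epsilon)^{i+1}-(1+\epsilon)^{i}\big)\;\le\;\alpha\big(v,(1+\epsilon)^{t}\big).
$$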

Note that by definition of , if then and . Thus, there has to exist a value such that and this is the -value computed by the MP algorithm. Since , the Lemma follows. ∎
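The discretized radius computation can be sketched as follows. The sketch accumulates the lower bound ∑ q_i(v)·((1+ε)^{i+1} − (1+ε)^i) until it exceeds f_v. Which power of (1+ε) is then reported is not recoverable from the text above, so returning (1+ε)^{t−1} is an assumption made here, as are the function name and the example values. Only neighborhood sizes at power-of-(1+ε) radii are used, never individual distances beyond counting.

```python
def approx_radius(dists, f, eps):
    """Approximate the MP radius using only neighborhood sizes q_i(v).

    dists: distances from v to all points (0 for v itself, others >= 1).
    Accumulates sum_{i<t} q_i * ((1+eps)^(i+1) - (1+eps)^i) and stops at the
    smallest t for which the sum exceeds f.
    ASSUMPTION: the reported value is (1+eps)^(t-1).
    """
    base = 1.0 + eps
    total, t = 0.0, 0
    while total <= f:
        q_t = sum(1 for x in dists if x <= base ** t)   # q_t(v) = |B(v, base^t)|
        total += q_t * (base ** (t + 1) - base ** t)
        t += 1
    return base ** (t - 1)


# Hypothetical example: distances from one vertex, opening cost 2, eps = 0.5.
r_tilde = approx_radius([0, 1, 2, 10, 11], f=2, eps=0.5)
```

For this example the running sum is 1.0 after one term and 2.5 after two, so t = 2 and the sketch returns 1.5, which coincides with the exact radius solving (r − 0) + (r − 1) = 2.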

From the definition of r̃_v one can see that in order to compute these values, we only require knowledge of q_i(v) for all i ≥ 0, rather than actual distances d(v, u) for all u ∈ V. The high-level k-machine model algorithm for computing the r̃_v values has two steps: Step 1 computes the neighborhood sizes q_i(v) for every vertex v, and Step 2 computes r̃_v locally from these sizes.

In this radius computation algorithm, Step 2 is just local computation, so we focus on Step 1, which requires the solution to the problem of computing neighborhood sizes. More specifically, we define the problem NbdSizeComputation as follows: given an edge-weighted graph, with non-negative edge weights, compute the size of B(v, d) for each vertex v and positive real d. The output to the problem in the k-machine model is required to be a distributed data structure (distributed among the k machines) such that each machine m_j can answer any query “What is |B(v, d)|?” for any v ∈ H(m_j) and any positive real d, using local computation. Note that a “trivial” way of solving NbdSizeComputation is to solve APSP, but as mentioned earlier this is too costly. In the next subsection we show how to solve a “relaxed” version of this problem in the k-machine model in Õ(n/k) rounds, making only a polylogarithmic number of calls to a k-machine SSSP algorithm.

### 4.2 Neighborhood-Size Estimation in the k-machine Model

To solve NbdSizeComputation efficiently in the k-machine model, we turn to an elegant idea due to Cohen [7, 8]. Motivated by certain counting problems, Cohen [7] presents a “size-estimation framework,” a general randomized method in the sequential setting. Cohen’s algorithm starts by assigning to each vertex v a rank chosen uniformly from [0, 1]. These ranks induce a random permutation of the vertices. To compute the size estimate of a neighborhood, say B(v, d), for a vertex v and real d > 0, Cohen’s algorithm finds the smallest rank of a vertex in B(v, d). It is then shown (in Section 6, [7]) that the expected value of the smallest rank in B(v, d) is 1/(1 + |B(v, d)|). Thus, in expectation, the reciprocal of the smallest rank in B(v, d) is (almost) identical to |B(v, d)|. To obtain a good estimate of |B(v, d)| with high probability, Cohen simply repeats the above-described procedure independently several times and shows the following concentration result on the average estimator.

###### Theorem 2.

(Cohen [7]) Let v be a vertex and d > 0 a real. For 1 ≤ i ≤ ℓ, let R_i denote the smallest rank of a vertex in B(v, d) obtained in the i-th repetition of Cohen’s neighborhood-size estimation procedure. Let R̂ be the average of R_1, R_2, …, R_ℓ. Let μ = 1/(1 + |B(v, d)|). Then, for any 0 < ε < 1,

$$
\Pr\big(|\hat{R}-\mu|\ge\epsilon\mu\big)=\exp\big(-\Omega(\epsilon^{2}\cdot\ell)\big).
$$

This theorem implies that ℓ = O(log n / ε²) repetitions suffice for obtaining (1 ± ε)-factor estimates w.h.p. of the sizes of B(v, d) for all v and all d.
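The averaging estimator can be checked with a small simulation. This sketch assumes the neighborhood is given explicitly as a collection of members, which is precisely what the distributed algorithm avoids; it only illustrates why the reciprocal of the average smallest rank, minus 1, recovers the neighborhood size. The function name, the seed, and the parameter values are arbitrary choices made here.

```python
import random


def estimate_size(members, ell, rng):
    """Estimate len(members) from smallest uniform ranks.

    In each of ell independent repetitions every member draws a rank uniformly
    from [0, 1) and only the smallest rank is recorded. The expected smallest
    rank is 1 / (1 + size), so 1 / average - 1 estimates the size.
    """
    total = 0.0
    for _ in range(ell):
        total += min(rng.random() for _ in members)
    avg = total / ell
    return 1.0 / avg - 1.0


rng = random.Random(7)                  # fixed seed for reproducibility
estimate = estimate_size(range(200), ell=2000, rng=rng)
```

With 2000 repetitions the relative standard deviation of the average is roughly 1/√2000 ≈ 2%, so the estimate should land within a few percent of the true size 200.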

Cohen proposes a modified Dijkstra’s SSSP algorithm to find smallest rank vertices in each neighborhood. Let be the vertices of the graph in non-decreasing order of rank. Initiate Dijkstra’s algorithm, first with source , then with source , and so on. During the search with source , if it is detected that for a vertex , for some , then the current search can be “pruned” at . This is because the vertex has ruled out from being the lowest ranked vertex in any of ’s neighborhoods. In fact, this is true not just for , but for all vertices whose shortest paths to pass through . Even though this algorithm performs SSSP computations, the fact that each search is pruned by the results of previous searches makes the overall running time much less than times the worst case running time of an SSSP computation. In particular, by making critical use of the fact that the random vertex ranks induce a random permutation of the vertices, Cohen is able to show that the algorithm runs in time, on -vertex, -edge graphs, w.h.p.

We do not know how to implement Cohen’s algorithm, as is, efficiently in the k-machine model. In particular, it is not clear how to take advantage of pruning that occurs in later searches while simultaneously taking advantage of the parallelism provided by the machines. A naive implementation of Cohen’s algorithm in the k-machine model is equivalent to different SSSP computations, which is too expensive. Below, in Algorithm NbdSizeEstimates, we show that we can reduce Cohen’s algorithm to a polylogarithmic number of SSSP computations provided we are willing to relax the requirement that we find the smallest rank in each neighborhood.

The goal of Algorithm NbdSizeEstimates is to estimate for all and all . In the rank-selection step, each vertex picks a rank uniformly at random from , which is rounded down to the closest value for some integer ( is suitably chosen in the algorithm). In Steps 5-7, in each iteration , , we consider the set of vertices that have rounded rank equal to and solve an instance of the MSSP problem (see Lemma 4) using the vertices in as sources. We repeat the algorithm times for a suitably chosen constant , so that the neighborhood size estimates satisfy the property provided in Theorem 2 with high probability.

Notice that the algorithm’s behavior is not well-defined if a rank falls in the range . However, since ranks are chosen uniformly at random from , the probability that the rank of a vertex falls in this range is . By a union bound, no rank falls in the interval with probability at least . We condition the correctness proof of this algorithm on this high-probability event.


Running time. There are calls to the subroutine solving MSSP. By Corollary 1, each of these calls takes rounds. Since , the overall round complexity of this algorithm in the k-machine model is .

Answering queries. At the end of each iteration, each machine holds, for each vertex , the sequence of distances, . Over repetitions, machine holds such sequences for each vertex . Note that each distance is associated with the rounded rank . For any vertex and real , let us denote the query “What is the size of ?” by . To answer query , we consider one of the sequences and find the smallest , such that , and return the rounded rank . To get an estimate that has low relative error, we repeat this over the sequences and compute the average $\bar{R}$ of the ranks computed in each iteration. The estimator is obtained by subtracting 1 from the reciprocal of $\bar{R}$.
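The query procedure can be sketched for a single vertex as follows. The data layout is an assumption made for illustration: `sequences` holds one list per repetition, each a list of `(rounded_rank, distance)` pairs sorted by increasing rounded rank, where `distance` is the (approximate) distance from the vertex to the nearest vertex of that rounded rank. The function name `answer_query` is hypothetical.

```python
def answer_query(sequences, d):
    """Estimate the neighborhood size at distance d for one vertex.

    sequences: one list per repetition of (rounded_rank, distance) pairs,
    sorted by increasing rounded_rank (hypothetical layout).
    """
    picked = []
    for seq in sequences:
        for rounded_rank, distance in seq:
            if distance <= d:
                # Smallest rounded rank whose nearest source is within d.
                picked.append(rounded_rank)
                break
        else:
            raise ValueError("no rank class within distance d")
    avg_rank = sum(picked) / len(picked)
    # Estimator: reciprocal of the average rank, minus 1.
    return 1.0 / avg_rank - 1.0

sequences = [
    [(0.125, 7.0), (0.25, 2.0), (0.5, 0.0)],
    [(0.125, 3.0), (0.25, 5.0), (0.5, 1.0)],
]
result = answer_query(sequences, d=3.0)
print(result)
```

Here the first repetition contributes rank 0.25 and the second 0.125, so the average is 0.1875 and the estimate is 1/0.1875 − 1.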

The following lemma shows the correctness of Algorithm NbdSizeEstimates in the sense that even though we might not get an approximately correct answer to , the size is guaranteed to be “sandwiched” between the answers to two queries with nearby distances. This guarantee is sufficient to ensure that the RadiusComputation Algorithm produces approximately correct radii (see Section 4.3).

###### Lemma 7.

Let denote for some vertex and real . For any , w.h.p., Algorithm NbdSizeEstimates satisfies the following properties:

• for the query , the algorithm returns an answer that is at most .

• for the query , the algorithm returns an answer that is at least .

###### Proof.

Fix a particular repetition , , of the algorithm and a ranking of the vertices. Let denote the smallest rank in in repetition . To answer query , the algorithm examines the sequence of approximate distances , finds the smallest such that , and uses as an approximation for . Since there is a vertex such that . Since we compute a -approximate solution to MSSP, the actual distance . Thus the rank of is at least and therefore the rounded-rank of is at least . Since , the rounded-rank of is simply and so we get that .

Over all repetitions, the algorithm computes the average of the sequence . Letting denote the average of over all repetitions, we see that . From Theorem 2, we know that w.h.p. . Combining these two inequalities, we get

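$$\begin{aligned}
\frac{1}{\bar{R}} &\le \left(\frac{1+\epsilon'}{1-\epsilon'}\right)\cdot(s+1) \\
\frac{1}{\bar{R}} - 1 &\le \left(\frac{1+\epsilon'}{1-\epsilon'}\right)\cdot s + \left(\frac{2\epsilon'}{1-\epsilon'}\right) \le \left(\frac{1+3\epsilon'}{1-\epsilon'}\right)\cdot s \le (1+\epsilon)\cdot s.
\end{aligned}$$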

The second last inequality above follows from the fact , since . The last inequality follows from the setting .
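The algebra behind this chain can be spelled out as follows. This is a sketch that assumes $s \ge 1$, which is what the second-last inequality requires. The exact setting of $\epsilon'$ is not reproduced here; the last line only records a condition under which the final inequality holds.

$$\begin{aligned}
\left(\frac{1+\epsilon'}{1-\epsilon'}\right)(s+1) - 1 &= \left(\frac{1+\epsilon'}{1-\epsilon'}\right) s + \frac{(1+\epsilon') - (1-\epsilon')}{1-\epsilon'} = \left(\frac{1+\epsilon'}{1-\epsilon'}\right) s + \frac{2\epsilon'}{1-\epsilon'}, \\
\frac{2\epsilon'}{1-\epsilon'} &\le \frac{2\epsilon'}{1-\epsilon'}\cdot s \quad \text{when } s \ge 1, \text{ which gives } \left(\frac{1+3\epsilon'}{1-\epsilon'}\right) s, \\
\frac{1+3\epsilon'}{1-\epsilon'} &\le 1+\epsilon \iff \epsilon'(4+\epsilon) \le \epsilon.
\end{aligned}$$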

Now we consider query . Again, fix a repetition , , of the algorithm and a ranking of the vertices. Let be a vertex with rank equal to . We get two immediate implications: (i) the rounded-rank of is at most and (ii) . Together these imply that , the approximate rank computed by the algorithm in repetition , is at most . Averaging over all repetitions we get that . Using Theorem 2, we know that w.h.p. . Combining these two inequalities, we get that . This leads to

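$$\frac{1}{\bar{R}} - 1 \ge \left(\frac{1+s}{1+\epsilon'}\right) - 1 \ge \left(\frac{1-\epsilon'}{1+\epsilon'}\right)\cdot s \ge \frac{s}{1+\epsilon}.$$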

The second last inequality follows from the fact that . A little bit of algebra shows that implies that and the last inequality follows from this. ∎

### 4.3 Radius Computation Revisited

Having designed a k-machine algorithm that returns approximate neighborhood-size estimates, we restate the RadiusComputation algorithm below.

We show below that even though the computed neighborhood-sizes are approximate, in the sense of Lemma 7, the radii that are computed by the RadiusComputation algorithm (Version 2) are a close approximation of the actual radii.

For every , .

###### Proof.

By Lemma 7, we have the following bounds on :

$$\frac{1}{1+\epsilon}\, q_{i-1}(v) \le \tilde{q}_i(v) \le (1+\epsilon)\, q_{i+1}(v)$$

Similar bounds will apply for the terms . Adding the respective inequalities for these terms yields the following inequality:

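$$\sum_{i=0}^{t-1}\left((1+\epsilon)^{i} - (1+\epsilon)^{i-1}\right) q_{i-1}(v) \le \sum_{i=0}^{t-1}\left((1+\epsilon)^{i+1} - (1+\epsilon)^{i}\right) \tilde{q}_i(v) \le \sum_{i=0}^{t-1}\left((1+\epsilon)^{i+2} - (1+\epsilon)^{i+1}\right) q_{i+1}(v).$$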

Now we obtain the following bound using similar arguments as in Lemma 6:

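$$\alpha\left(v, (1+\epsilon)^{t-2}\right) \le \sum_{i=0}^{t-1} \tilde{q}_i(v)\cdot\left((1+\epsilon)^{i+1} - (1+\epsilon)^{i}\right) \le \alpha\left(v, (1+\epsilon)^{t+1}\right).$$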

This means that there must exist a value such that . The lemma follows since . ∎

### 4.4 Implementing the Greedy Phase

Referring to the two phases in the MP Algorithm, we have now completed the implementation of the Radius Computation Phase in the k-machine model. Turning to the Greedy Phase, we note that discretizing the radius values results in distinct values. If we can efficiently process each batch of vertices with the same (rounded) radius in the k-machine model, that would yield an efficient k-machine implementation of the Greedy Phase as well. Consider the set of vertices with (rounded) radius . Note that a set is opened as facilities by the Greedy Phase iff satisfies two properties: (i) for any two vertices , and (ii) for any , . Thus the set can be identified by computing a maximal independent set (MIS) in the graph , where