Near-Optimal Clustering in the k-machine Model

10/23/2017
by Sayan Bandyapadhyay, et al.

The clustering problem, in its many variants, has numerous applications in operations research and computer science (e.g., in bioinformatics, image processing, and social network analysis). As data sets have grown rapidly, researchers have focused on designing clustering algorithms in models of computation suited to large-scale computation, such as MapReduce, Pregel, and streaming models. The k-machine model (Klauck et al., SODA 2015) is a simple message-passing model for large-scale distributed graph processing. This paper considers three of the most prominent clustering problems: the uncapacitated facility location problem, the p-median problem, and the p-center problem. It presents O(1)-factor approximation algorithms for these problems that run in Õ(n/k) rounds in the k-machine model. These algorithms are optimal up to polylogarithmic factors, because the paper also shows Ω̃(n/k) lower bounds for obtaining polynomial-factor approximation algorithms for these problems. These are the first results for clustering problems in the k-machine model. We assume that the metric for these clustering problems is provided only implicitly, as an edge-weighted graph. In a nutshell, our main technical contribution is to show that constant-factor approximation algorithms for all three clustering problems can be obtained by learning only a small portion of the input metric.
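To make the three objectives concrete, the following is a minimal single-machine sketch, not the paper's distributed algorithm. It assumes the standard textbook definitions of the objectives and takes the metric to be shortest-path distance in an undirected edge-weighted graph. The names `shortest_paths`, `clustering_costs`, `adj`, and `opening_cost` are hypothetical and chosen only for illustration.

```python
import heapq

INF = float("inf")

def shortest_paths(adj, source):
    """Dijkstra's algorithm: shortest-path distances from `source`.

    `adj` maps each vertex to a list of (neighbor, weight) pairs and is
    assumed to describe an undirected graph with non-negative weights.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, INF):
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, INF):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def clustering_costs(adj, centers, opening_cost):
    """Evaluate a candidate set of centers under the three objectives.

    Only distances from the chosen centers are computed, so the full
    metric is never materialized.
    """
    per_center = [shortest_paths(adj, c) for c in centers]
    # Distance from every vertex to its nearest open center.
    nearest = {v: min(d.get(v, INF) for d in per_center) for v in adj}
    connection = sum(nearest.values())
    return {
        # Facility location: opening costs plus total connection cost.
        "facility_location": sum(opening_cost[c] for c in centers) + connection,
        # p-median: total distance to the nearest center.
        "p_median": connection,
        # p-center: largest distance to the nearest center.
        "p_center": max(nearest.values()),
    }

if __name__ == "__main__":
    # Hypothetical path graph a - b - c - d with edge weights 1, 2, 3.
    adj = {
        "a": [("b", 1.0)],
        "b": [("a", 1.0), ("c", 2.0)],
        "c": [("b", 2.0), ("d", 3.0)],
        "d": [("c", 3.0)],
    }
    print(clustering_costs(adj, ["b"], {"b": 4.0}))
```

The sketch only evaluates a given solution. Choosing good centers, and doing so in Õ(n/k) rounds across k machines, is the subject of the paper itself.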


research
11/15/2018

Large-Scale Distributed Algorithms for Facility Location with Outliers

This paper presents fast, distributed, O(1)-approximation algorithms for...
research
06/05/2023

Near-Optimal Quantum Coreset Construction Algorithms for Clustering

k-Clustering in ℝ^d (e.g., k-median and k-means) is a fundamental machin...
research
12/14/2021

On fully dynamic constant-factor approximation algorithms for clustering problems

Clustering is an important task with applications in many fields of comp...
research
06/08/2023

Faster Approximation Algorithms for Parameterized Graph Clustering and Edge Labeling

Graph clustering is a fundamental task in network analysis where the goa...
research
06/05/2018

A Projection Method for Metric-Constrained Optimization

We outline a new approach for solving optimization problems which enforc...
research
11/20/2021

Faster Deterministic Approximation Algorithms for Correlation Clustering and Cluster Deletion

Correlation clustering is a framework for partitioning datasets based on...
research
12/05/2022

Two 6-approximation Algorithms for the Stochastic Score Classification Problem

We study the arbitrary cost case of the unweighted Stochastic Score Clas...
