Fully Scalable MPC Algorithms for Clustering in High Dimension

07/15/2023
by Artur Czumaj, et al.

We design new algorithms for k-clustering in high-dimensional Euclidean spaces. These algorithms run in the Massively Parallel Computation (MPC) model and are fully scalable, meaning that the local memory of each machine is n^σ for an arbitrarily small fixed σ>0. Importantly, the local memory may be substantially smaller than k. Our algorithms take O(1) rounds and achieve an O(1)-bicriteria approximation for k-Median and for k-Means; namely, they compute (1+ε)k clusters whose cost is within an O(1/ε^2) factor of the optimum. Previous work achieves only a poly(log n)-bicriteria approximation [Bhaskara et al., ICML'18] or handles a special case [Cohen-Addad et al., ICML'22]. Our results rely on an MPC algorithm that computes an O(1)-approximation of facility location in O(1) rounds. A primary technical tool that we develop, which may be of independent interest, is a new MPC primitive for geometric aggregation, namely, computing certain statistics on an approximate neighborhood of every data point; this includes range counting and nearest-neighbor search. Our implementation of this primitive works in high dimension and is based on consistent hashing (also known as sparse partition), a technique that was recently used for streaming algorithms [Czumaj et al., FOCS'22].
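
For readers who want the bicriteria guarantee spelled out, it can be written as follows. This is a sketch in standard clustering notation that the abstract itself does not fix: the symbols P (the input point set), C (the returned centers), cost_z, and OPT_k are assumptions introduced here, and the comparison against the best solution with exactly k centers is the usual reading of "the optimum" rather than something the abstract states explicitly.

```latex
% Assumed notation (not from the abstract): P is the input, C the output centers,
% z = 1 gives k-Median and z = 2 gives k-Means.
\[
  \operatorname{cost}_z(P, C) \;=\; \sum_{p \in P} \min_{c \in C} \lVert p - c \rVert_2^{\,z},
  \qquad
  \mathrm{OPT}_k \;=\; \min_{\substack{C' \subset \mathbb{R}^d \\ |C'| \le k}} \operatorname{cost}_z(P, C').
\]
% Bicriteria guarantee: slightly more centers than k, constant-factor cost for fixed epsilon.
\[
  |C| \;\le\; (1+\varepsilon)\,k
  \qquad\text{and}\qquad
  \operatorname{cost}_z(P, C) \;\le\; O\!\left(1/\varepsilon^{2}\right)\cdot \mathrm{OPT}_k .
\]
```

Under this reading, any fixed ε gives an O(1)-factor cost guarantee while exceeding the center budget by only an ε fraction.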


