Polylogarithmic Sketches for Clustering

04/26/2022
by Moses Charikar, et al.

Given n points in ℓ_p^d, we consider the problem of partitioning the points into k clusters with associated centers. The cost of a clustering is the sum of the p^th powers of the distances of points to their cluster centers. For p ∈ [1,2], we design sketches of size poly(log(nd),k,1/ϵ) such that the cost of the optimal clustering can be estimated to within a factor of 1+ϵ, even though the compressed representation does not contain enough information to recover the cluster centers or the partition into clusters. This leads to a streaming algorithm for estimating the clustering cost with space poly(log(nd),k,1/ϵ). We also obtain a distributed-memory algorithm, where the n points are arbitrarily partitioned among m machines, each of which sends information to a central party that then computes an approximation of the clustering cost. Prior to this work, no such streaming or distributed-memory algorithm was known with sublinear dependence on d for p ∈ [1,2).
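
To make the cost function concrete, the following minimal sketch evaluates the cost of one fixed clustering: each point is charged the p^th power of its ℓ_p distance to the nearest of the given centers. It is not the sketching algorithm from the paper, and it only scores a given set of centers rather than searching for the optimal one. The function name `clustering_cost` and the sample data are assumptions chosen for illustration.

```python
def clustering_cost(points, centers, p):
    """Sum, over all points, of the p-th power of the l_p distance
    to the nearest center (illustrative helper, not from the paper)."""
    total = 0.0
    for x in points:
        # ||x - c||_p^p equals sum_j |x_j - c_j|^p, so no p-th root is taken.
        total += min(
            sum(abs(xj - cj) ** p for xj, cj in zip(x, c))
            for c in centers
        )
    return total


# Hypothetical data: two well-separated pairs of points in the plane.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (11.0, 10.0)]
centers = [(0.5, 0.0), (10.5, 10.0)]

cost_p1 = clustering_cost(points, centers, p=1)  # k-median-style cost
cost_p2 = clustering_cost(points, centers, p=2)  # k-means-style cost
print(cost_p1, cost_p2)
```

The optimal clustering cost discussed in the abstract is the minimum of this quantity over all choices of k centers. The exact evaluation above reads every coordinate of every point, whereas the sketches in the paper estimate that optimum from a much smaller summary.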

research
10/02/2019

Streaming Balanced Clustering

Clustering of data points in metric space is among the most fundamental ...
research
10/31/2022

Improved Learning-augmented Algorithms for k-means and k-medians Clustering

We consider the problem of clustering in the learning-augmented setting,...
research
06/04/2021

Massively Parallel and Dynamic Algorithms for Minimum Size Clustering

In this paper, we study the r-gather problem, a natural formulation of m...
research
09/16/2019

Streaming PTAS for Constrained k-Means

We generalise the results of Bhattacharya et al. (Journal of Computing S...
research
06/29/2021

Near-Optimal Explainable k-Means for All Dimensions

Many clustering algorithms are guided by certain cost functions such as ...
research
06/23/2020

BETULA: Numerically Stable CF-Trees for BIRCH Clustering

BIRCH clustering is a widely known approach for clustering, that has inf...
research
07/04/2020

Cluster Prediction for Opinion Dynamics from Partial Observations

We present a Bayesian approach to predict the clustering of opinions for...
