Learning-Augmented k-means Clustering

10/27/2021
by Jon Ergun, et al.

k-means clustering is a well-studied problem due to its wide applicability. Unfortunately, there exist strong theoretical limits on the performance of any algorithm for the k-means problem on worst-case inputs. To overcome this barrier, we consider a scenario where "advice" is provided to help perform clustering. Specifically, we consider the k-means problem augmented with a predictor that, given any point, returns its cluster label in an approximately optimal clustering, up to some possibly adversarial error. We present an algorithm whose performance improves with the accuracy of the predictor, even though naïvely following an accurate predictor can still lead to a high clustering cost. Thus, if the predictor is sufficiently accurate, we can retrieve a close-to-optimal clustering in nearly optimal runtime, breaking known computational barriers for algorithms that do not have access to such advice. We evaluate our algorithms on real datasets and show significant improvements in the quality of clustering.
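To make the point about naïvely following a predictor concrete, here is a minimal sketch on toy data. It is not the algorithm from the paper: the data, the helper names `kmeans_cost` and `centers_from_labels`, and the use of a coordinate-wise median as a robust stand-in are all assumptions chosen for illustration. The sketch shows how a single wrong label out of eight can inflate the k-means cost when each center is taken as the plain mean of its predicted cluster, and how a more robust center estimate limits the damage.

```python
from statistics import mean, median


def kmeans_cost(points, centers):
    """Sum of squared Euclidean distances from each point to its nearest center."""
    return sum(
        min(sum((p - c) ** 2 for p, c in zip(point, center)) for center in centers)
        for point in points
    )


def centers_from_labels(points, labels, k, estimator=mean):
    """Build one center per predicted label, applying `estimator` coordinate-wise."""
    centers = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            raise ValueError(f"predicted cluster {j} is empty")
        centers.append(tuple(estimator(coord) for coord in zip(*members)))
    return centers


# Hypothetical toy data: two tight, well-separated clusters in the plane.
points = [
    (0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
    (100.0, 100.0), (101.0, 100.0), (100.0, 101.0), (101.0, 101.0),
]
true_labels = [0, 0, 0, 0, 1, 1, 1, 1]
# Predictor output with one error: the point (100, 100) is labeled as cluster 0.
noisy_labels = [0, 0, 0, 0, 0, 1, 1, 1]

# Cost with perfect labels and mean centers.
optimal_cost = kmeans_cost(points, centers_from_labels(points, true_labels, 2))
# Cost when the noisy labels are followed blindly (mean of each predicted cluster).
naive_cost = kmeans_cost(points, centers_from_labels(points, noisy_labels, 2))
# Cost when centers are coordinate-wise medians of each predicted cluster
# (an illustrative robust estimator, not the paper's procedure).
robust_cost = kmeans_cost(
    points, centers_from_labels(points, noisy_labels, 2, estimator=median)
)

print(f"optimal: {optimal_cost:.2f}, naive: {naive_cost:.2f}, robust: {robust_cost:.2f}")
```

In this toy instance the mislabeled far-away point drags the mean of cluster 0 well away from its true members, so the naive cost is orders of magnitude above the cost obtained with perfect labels, while the median-based centers stay within a small constant factor of it. The median is used only because it is simple to state; any estimator that tolerates a bounded fraction of corrupted labels would serve the same illustrative purpose.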

research
10/31/2022

Improved Learning-augmented Algorithms for k-means and k-medians Clustering

We consider the problem of clustering in the learning-augmented setting,...
research
02/28/2020

Explainable k-Means and k-Medians Clustering

Clustering is a popular form of unsupervised learning for geometric data...
research
06/15/2018

Query K-means Clustering and the Double Dixie Cup Problem

We consider the problem of approximate K-means clustering with outliers ...
research
12/04/2017

Clustering Stable Instances of Euclidean k-means

The Euclidean k-means problem is arguably the most widely-studied cluste...
research
05/22/2017

Improved Clustering with Augmented k-means

Identifying a set of homogeneous clusters in a heterogeneous dataset is ...
research
08/04/2020

Out-of-Distribution Generalization with Maximal Invariant Predictor

Out-of-Distribution (OOD) generalization problem is a problem of seeking...
research
09/19/2018

Data-Driven Clustering via Parameterized Lloyd's Families

Algorithms for clustering points in metric spaces is a long-studied area...
