Improved Learning-augmented Algorithms for k-means and k-medians Clustering

10/31/2022
by Thy Nguyen, et al.

We consider the problem of clustering in the learning-augmented setting, where we are given a data set in d-dimensional Euclidean space, together with a label for each data point, supplied by an oracle, indicating which subsets of points should be clustered together. This setting captures situations where we have access to auxiliary information about the data set that is relevant to our clustering objective, for instance the labels output by a neural network. Following prior work, we assume that each predicted cluster contains at most an α fraction of false positives and false negatives, where α ∈ (0, c) for some c < 1, and that in the absence of these errors the labels would attain the optimal clustering cost OPT. For a data set of size m, we propose a deterministic k-means algorithm that produces centers with an improved bound on the clustering cost compared to the previous randomized algorithm, while preserving the O(dm log m) runtime. Furthermore, our algorithm works even when the predictions are not very accurate: our bound holds for α up to 1/2, an improvement over the requirement that α be at most 1/7 in the previous work. For the k-medians problem, we improve upon prior work by achieving a biquadratic improvement in the dependence of the approximation factor on the accuracy parameter α, obtaining a cost of (1+O(α))OPT while requiring essentially just O(md log^3 m/α) runtime.
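To make the setting concrete, the following is a minimal sketch of how centers might be computed from noisy predicted labels. It is an illustration under stated assumptions, not necessarily the algorithm of the paper: for each predicted cluster and each coordinate, it sorts the values, keeps the contiguous window of ⌈(1−α)n⌉ values with the smallest sum of squared deviations, and uses that window's mean as the coordinate of the center. The function names `trimmed_coordinate_center` and `centers_from_predicted_labels` are hypothetical. Sorting dominates the work, so this sketch runs in O(dm log m) time overall, in line with the runtime quoted above.

```python
import math
from typing import Dict, Hashable, List, Sequence


def trimmed_coordinate_center(values: Sequence[float], alpha: float) -> float:
    """Mean of the lowest-cost contiguous window of sorted values.

    Assumption (illustrative only): at most an alpha fraction of the values
    are mislabeled, so a window of ceil((1 - alpha) * n) sorted values with
    the smallest sum of squared deviations is treated as the inliers.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    xs = sorted(values)
    n = len(xs)
    w = max(1, math.ceil((1.0 - alpha) * n))

    # Prefix sums of x and x^2 give each window's cost in O(1).
    pre = [0.0]
    pre_sq = [0.0]
    for x in xs:
        pre.append(pre[-1] + x)
        pre_sq.append(pre_sq[-1] + x * x)

    best_cost = math.inf
    best_mean = xs[0]
    for start in range(n - w + 1):
        total = pre[start + w] - pre[start]
        total_sq = pre_sq[start + w] - pre_sq[start]
        # Sum of squared deviations from the window mean.
        cost = total_sq - total * total / w
        if cost < best_cost:
            best_cost = cost
            best_mean = total / w
    return best_mean


def centers_from_predicted_labels(
    points: Sequence[Sequence[float]],
    labels: Sequence[Hashable],
    alpha: float,
) -> Dict[Hashable, List[float]]:
    """Compute one center per predicted label, coordinate by coordinate."""
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    groups: Dict[Hashable, List[Sequence[float]]] = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)

    centers: Dict[Hashable, List[float]] = {}
    for label, members in groups.items():
        dim = len(members[0])
        centers[label] = [
            trimmed_coordinate_center([p[j] for p in members], alpha)
            for j in range(dim)
        ]
    return centers
```

For example, with `alpha = 0.25` and a predicted cluster containing (0, 0), (1, 1), (2, 2) and one mislabeled point (100, 100), the window of three values excludes the outlier in each coordinate, whereas the plain mean of the predicted cluster would be pulled far toward it.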


