Near-optimal Coresets for Robust Clustering

10/19/2022
by Lingxiao Huang, et al.

We consider robust clustering problems in ℝ^d, specifically k-clustering problems (e.g., k-Median and k-Means) with m outliers. For a given center set C ⊂ ℝ^d, the cost aggregates the distances from C to all but the m furthest data points, rather than to all points as in classical clustering. We focus on the ϵ-coreset for robust clustering, a small proxy of the dataset that preserves the clustering cost within ϵ-relative error for all center sets. Our main result is an ϵ-coreset of size O(m + poly(kϵ⁻¹)) that can be constructed in near-linear time. This significantly improves previous results, which either suffer an exponential dependence on (m + k) [Feldman and Schulman, SODA'12] or have a weaker bi-criteria guarantee [Huang et al., FOCS'18]. Furthermore, we show that this dependence on m is nearly optimal, and the fact that it is isolated from other factors may be crucial for dealing with a large number of outliers. We construct our coresets by adapting to the outlier setting a recent framework [Braverman et al., FOCS'22] that was designed for capacity-constrained clustering. This requires overcoming a new challenge: the participating terms in the cost, particularly the excluded m outlier points, depend on the center set C. We validate our coresets on various datasets and observe a superior size-accuracy tradeoff compared with popular baselines, including uniform sampling and sensitivity sampling. Using our coresets, we also achieve a significant speedup of existing approximation algorithms for robust clustering.
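The robust cost and the ϵ-coreset check described above can be sketched as follows. This is a minimal illustration, not the authors' construction. The function names (robust_cost, uniform_coreset, empirical_error) are hypothetical. The exponent z (z=1 for k-Median, z=2 for k-Means) is an assumed parameterization. For a weighted point set, the sketch also assumes that "excluding m outliers" means discarding a total weight of m from the farthest points. The coreset shown is the uniform-sampling baseline named in the abstract, not the paper's near-linear-time method.

```python
import numpy as np


def robust_cost(points, centers, m, z=2, weights=None):
    """Robust (k, z)-clustering cost with m outliers (hypothetical helper).

    Sums dist(x, C)^z over all points except the m farthest from C.
    z=1 corresponds to k-Median with outliers, z=2 to k-Means with outliers.
    For weighted points (e.g. a coreset), a total weight of m is discarded
    from the farthest points, splitting a weight if needed; this is an
    assumed convention for the sketch.
    """
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    w = np.ones(len(points)) if weights is None else np.asarray(weights, dtype=float)

    # dist(x, C)^z: distance from each point to its nearest center.
    diff = points[:, None, :] - centers[None, :, :]
    d = np.linalg.norm(diff, axis=2).min(axis=1) ** z

    # Walk from the farthest point inward, discarding weight m in total.
    order = np.argsort(-d)
    d, w = d[order], w[order]  # fancy indexing copies, so inputs are untouched
    remaining = float(m)
    for i in range(len(w)):
        if remaining <= 0:
            break
        removed = min(w[i], remaining)
        w[i] -= removed
        remaining -= removed
    return float(np.dot(w, d))


def uniform_coreset(points, size, rng):
    """Uniform-sampling baseline: each sampled point gets weight n / size."""
    points = np.asarray(points, dtype=float)
    idx = rng.choice(len(points), size=size, replace=False)
    return points[idx], np.full(size, len(points) / size)


def empirical_error(points, coreset, weights, m, k, trials, rng, z=2):
    """Max relative cost error over random center sets.

    An epsilon-coreset must hold for ALL center sets, so a few random
    trials only give a lower bound on the true coreset error.
    """
    points = np.asarray(points, dtype=float)
    worst = 0.0
    for _ in range(trials):
        centers = points[rng.choice(len(points), size=k, replace=False)]
        full = robust_cost(points, centers, m, z)
        approx = robust_cost(coreset, centers, m, z, weights)
        worst = max(worst, abs(approx - full) / full)
    return worst


# Usage: 2000 Gaussian points with 20 planted far-away outliers.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
X[:20] += 50.0
S, w = uniform_coreset(X, 200, rng)
err = empirical_error(X, S, w, m=20, k=3, trials=10, rng=rng)
print(f"max relative error over 10 random center sets: {err:.3f}")
```

Because the outliers are chosen relative to each center set C, the same weighted sample must stay accurate even as the excluded points change with C. A uniform sample can under- or over-represent a small group of far-away points, which is one reason the abstract compares against it as a baseline.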
