The Power of Uniform Sampling for Coresets

09/05/2022
by   Vladimir Braverman, et al.
Motivated by practical generalizations of the classic k-median and k-means objectives, such as clustering with size constraints, fair clustering, and Wasserstein barycenter, we introduce a meta-theorem for designing coresets for constrained-clustering problems. The meta-theorem reduces the task of coreset construction to one on a bounded number of ring instances with a much-relaxed additive error. This reduction enables us to construct coresets using uniform sampling, in contrast to the widely used importance sampling, and consequently we can easily handle constrained objectives. Notably and perhaps surprisingly, this simpler sampling scheme can yield coresets whose size is independent of n, the number of input points. Our technique yields smaller coresets, and sometimes the first coresets, for a large number of constrained clustering problems, including capacitated clustering, fair clustering, Euclidean Wasserstein barycenter, clustering in minor-excluded graphs, and polygon clustering under Fréchet and Hausdorff distances. Finally, our technique also yields smaller coresets for 1-median in low-dimensional Euclidean spaces, specifically of size Õ(ε^{-1.5}) in ℝ^2 and Õ(ε^{-1.6}) in ℝ^3.
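To make the idea of uniform sampling on a ring concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the construction from the paper: a "ring" is assumed here to mean a set of points whose distances to a reference center lie in [r, 2r], and the helper names (`make_ring`, `uniform_coreset`, `kmedian_cost`) are hypothetical. The sketch samples m points uniformly, gives each the weight n/m, and compares the weighted k-median cost against the full cost.

```python
import math
import random


def make_ring(n, r, rng):
    """Hypothetical ring instance: n planar points at distance in [r, 2r] from the origin."""
    pts = []
    for _ in range(n):
        rad = rng.uniform(r, 2 * r)
        ang = rng.uniform(0, 2 * math.pi)
        pts.append((rad * math.cos(ang), rad * math.sin(ang)))
    return pts


def uniform_coreset(points, m, rng):
    """Sample m points uniformly without replacement; each sampled point gets weight n/m."""
    n = len(points)
    sample = rng.sample(points, m)
    return [(p, n / m) for p in sample]


def kmedian_cost(weighted_points, centers):
    """Weighted k-median cost: sum of weight times distance to the nearest center."""
    return sum(w * min(math.dist(p, c) for c in centers) for p, w in weighted_points)


rng = random.Random(0)
points = make_ring(10_000, 1.0, rng)
coreset = uniform_coreset(points, 200, rng)

centers = [(0.0, 0.0)]  # the ring's own center, chosen for illustration
full_cost = kmedian_cost([(p, 1.0) for p in points], centers)
approx_cost = kmedian_cost(coreset, centers)
print(f"full cost: {full_cost:.1f}, coreset estimate: {approx_cost:.1f}")
```

Because every point in such a ring is within a factor 2 of the same distance from the center, no single point can dominate the cost, which is the intuition for why uniform weights suffice here where general inputs need importance sampling. The sketch checks only one center set and makes no claim about the error bounds or coreset sizes proved in the paper.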

research
04/14/2020

Coresets for Clustering in Euclidean Spaces: Importance Sampling is Nearly Optimal

Given a collection of n points in ℝ^d, the goal of the (k,z)-clustering ...
research
12/19/2018

Approximation Schemes for Capacitated Clustering in Doubling Metrics

Motivated by applications in redistricting, we consider the uniform capa...
research
03/21/2022

Coresets for Weight-Constrained Anisotropic Assignment and Clustering

The present paper constructs coresets for weight-constrained anisotropic...
research
06/14/2021

Coresets for constrained k-median and k-means clustering in low dimensional Euclidean space

We study (Euclidean) k-median and k-means with constraints in the stream...
research
04/16/2020

Coresets for Clustering in Excluded-minor Graphs and Beyond

Coresets are modern data-reduction tools that are widely used in data an...
research
01/20/2023

Coresets for Clustering with General Assignment Constraints

Designing small-sized coresets, which approximately preserve the costs o...
research
11/08/2021

Approximating Fair Clustering with Cascaded Norm Objectives

We introduce the (p,q)-Fair Clustering problem. In this problem, we are ...
