Towards Optimal Lower Bounds for k-median and k-means Coresets

02/25/2022
by   Vincent Cohen-Addad, et al.

Given a set of points in a metric space, the (k,z)-clustering problem consists of finding a set of k points, called centers, such that the sum over all data points of the distance to the closest center, raised to the power z, is minimized. Special cases include the famous k-median problem (z = 1) and k-means problem (z = 2). The k-median and k-means problems are at the heart of modern data analysis, and massive data applications have given rise to the notion of a coreset: a small (weighted) subset of the input point set that preserves the cost of any solution to the problem up to a multiplicative (1 ± ε) factor, hence reducing the input to the problem from large to small scale. In this paper, we present improved lower bounds for coresets in various metric spaces. In finite metrics consisting of n points and in doubling metrics with doubling constant D, we show that any coreset for (k,z)-clustering must consist of at least Ω(k ε^-2 log n) and Ω(k ε^-2 D) points, respectively. Both bounds match previous upper bounds up to polylog factors. In Euclidean spaces, we show that any coreset for (k,z)-clustering must consist of at least Ω(k ε^-2) points. We complement these lower bounds with a coreset construction consisting of at most Õ(k ε^-2 · min(ε^-z, k)) points.
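
To make the two definitions in the abstract concrete, the following is a minimal Python sketch, not taken from the paper. It assumes points in Euclidean space, and the names `clustering_cost` and `preserves_cost` are hypothetical helpers introduced only for illustration. Note that a true coreset must preserve the cost of every candidate set of k centers; the check below tests a single candidate solution only.

```python
import math
from typing import Optional, Sequence

Point = Sequence[float]


def clustering_cost(
    points: Sequence[Point],
    centers: Sequence[Point],
    z: int = 1,
    weights: Optional[Sequence[float]] = None,
) -> float:
    """(k,z)-clustering cost: weighted sum of dist(p, nearest center)**z.

    z = 1 gives the k-median cost, z = 2 the k-means cost.
    Assumption: Euclidean distance; the paper covers general metrics.
    """
    if weights is None:
        weights = [1.0] * len(points)
    total = 0.0
    for p, w in zip(points, weights):
        nearest = min(math.dist(p, c) for c in centers)
        total += w * nearest ** z
    return total


def preserves_cost(
    points: Sequence[Point],
    coreset: Sequence[Point],
    coreset_weights: Sequence[float],
    centers: Sequence[Point],
    z: int,
    eps: float,
) -> bool:
    """Check the (1 ± eps) guarantee for ONE candidate set of centers.

    A coreset must satisfy this for all candidate solutions, so this
    is only a spot check, not a proof of the coreset property.
    """
    full = clustering_cost(points, centers, z)
    approx = clustering_cost(coreset, centers, z, coreset_weights)
    return abs(approx - full) <= eps * full
```

For instance, merging duplicate input points into one point whose weight equals its multiplicity preserves the cost of every solution exactly, which is the trivial case of the guarantee that a coreset relaxes to a (1 ± ε) factor.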

research
11/15/2022

Improved Coresets for Euclidean k-Means

Given a set of n points in d dimensions, the Euclidean k-means problem (...
research
09/07/2020

Achieving anonymity via weak lower bound constraints for k-median and k-means

We study k-clustering problems with lower bounds, including k-median and...
research
04/13/2021

A New Coreset Framework for Clustering

Given a metric space, the (k,z)-clustering problem consists of finding k...
research
04/14/2020

Coresets for Clustering in Euclidean Spaces: Importance Sampling is Nearly Optimal

Given a collection of n points in ℝ^d, the goal of the (k,z)-clustering ...
research
07/31/2017

Temporal Hierarchical Clustering

We study hierarchical clusterings of metric spaces that change over time...
research
10/01/2018

Topological Stability of Kinetic k-Centers

We study the k-center problem in a kinetic setting: given a set of conti...
research
08/14/2022

Exact Exponential Algorithms for Clustering Problems

In this paper we initiate a systematic study of exact algorithms for wel...
