On Coresets for Clustering in Small Dimensional Euclidean Spaces

02/27/2023
by Lingxiao Huang, et al.

We consider the problem of constructing small coresets for k-Median in Euclidean spaces. Given a large set of data points P⊂ℝ^d, a coreset is a much smaller set S⊂ℝ^d such that, for any set of k centers, the k-Median costs with respect to P and S are close. The existing literature focuses mainly on the high-dimensional case, where there has been great success in obtaining dimension-independent bounds, whereas the case of small d is largely unexplored. Because many applications of Euclidean clustering algorithms are in small dimensions, and because the current literature lacks a systematic study of this regime, this paper investigates coresets for k-Median in small dimensions. For small d, a natural question is whether the existing near-optimal dimension-independent bounds can be significantly improved. We answer this question affirmatively for a range of parameters. We also prove new lower bounds, which are the highest known for small d. In particular, we completely settle the coreset size bound for 1-d k-Median (up to log factors). Interestingly, our results imply a strong separation between 1-d 1-Median and 1-d 2-Median. As far as we know, this is the first such separation between k=1 and k=2 in any dimension.
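
To make the coreset definition above concrete, here is a minimal Python sketch that evaluates the k-Median cost of a point set and compares it with the cost of a candidate coreset for one choice of centers. The function names, the use of per-point weights on S, and the relative-error measure are illustrative assumptions (they follow the usual weighted formulation, which the abstract does not spell out); this is not the paper's construction.

```python
import math

def k_median_cost(points, centers, weights=None):
    """Weighted sum of Euclidean distances from each point to its nearest center."""
    if weights is None:
        weights = [1.0] * len(points)  # assumption: unweighted input has unit weights
    return sum(w * min(math.dist(p, c) for c in centers)
               for p, w in zip(points, weights))

def relative_error(P, S, S_weights, centers):
    """Relative gap between the cost on P and the cost on the weighted set S.

    Assumes the cost on P is nonzero for the given centers.
    """
    cost_P = k_median_cost(P, centers)
    cost_S = k_median_cost(S, centers, S_weights)
    return abs(cost_S - cost_P) / cost_P

# Hypothetical 1-d example: four points summarized by two weighted points.
P = [(0.0,), (1.0,), (2.0,), (3.0,)]
S = [(0.5,), (2.5,)]
S_weights = [2.0, 2.0]

err_far = relative_error(P, S, S_weights, [(-1.0,)])   # center outside the data range
err_near = relative_error(P, S, S_weights, [(0.5,)])   # center inside the first pair
```

In this toy instance the two costs agree exactly for the center at -1 but differ by 20% for the center at 0.5, which illustrates why the guarantee has to hold for every choice of k centers and not just for one.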


