On Optimal Coreset Construction for Euclidean (k,z)-Clustering

11/22/2022
by   Lingxiao Huang, et al.
Constructing small-sized coresets for various clustering problems in Euclidean spaces has attracted significant attention over the past decade. A central problem in the coreset literature is to determine the best possible coreset size for (k,z)-clustering in Euclidean space. While there has been significant progress on the problem, there is still a gap between the state-of-the-art upper and lower bounds. For instance, the best known upper bound for k-means (z=2) is min{O(k^{3/2} ε^{-2}), O(k ε^{-4})} [1,2], while the best known lower bound is Ω(k ε^{-2}) [1]. In this paper, we make significant progress on both upper and lower bounds. For a large range of parameters (i.e., ε, k), we have a complete understanding of the optimal coreset size. In particular, we obtain the following results: (1) We present a new coreset lower bound Ω(k ε^{-z-2}) for Euclidean (k,z)-clustering when ε ≥ Ω(k^{-1/(z+2)}). In view of the prior upper bound Õ_z(k ε^{-z-2}) [1], this bound is optimal. The new lower bound is surprising since Ω(k ε^{-2}) [1] is "conjectured" to be the correct bound in some recent works (see e.g., [1,2]). (2) For the upper bound, we provide efficient coreset construction algorithms for Euclidean (k,z)-clustering with improved coreset sizes. In particular, we provide an Õ_z(k^{(2z+2)/(z+2)} ε^{-2})-sized coreset, with a unified analysis, for (k,z)-clustering for all z ≥ 1 in Euclidean space. [1] Cohen-Addad, Larsen, Saulpic, Schwiegelshohn. STOC'22. [2] Cohen-Addad, Larsen, Saulpic, Schwiegelshohn, Sheikh-Omar. NeurIPS'22.
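To make the stated bounds easier to compare, the following is a minimal Python sketch that evaluates them for concrete values of k, ε, and z. It is only an illustration: all function names are hypothetical, every hidden constant is assumed to be 1, and the polylogarithmic factors hidden by Õ are dropped, so the numbers show growth rates rather than actual coreset sizes.

```python
import math


def prior_kmeans_upper(k: float, eps: float) -> float:
    """Prior k-means (z=2) upper bound min{k^{3/2} eps^{-2}, k eps^{-4}}, constants assumed 1."""
    return min(k ** 1.5 * eps ** -2, k * eps ** -4)


def prior_lower(k: float, eps: float) -> float:
    """Prior lower bound k * eps^{-2}, constant assumed 1."""
    return k * eps ** -2


def in_lower_bound_regime(k: float, eps: float, z: float) -> bool:
    """Regime eps >= k^{-1/(z+2)} of the new lower bound (hidden constant assumed 1)."""
    return eps >= k ** (-1.0 / (z + 2))


def new_lower(k: float, eps: float, z: float) -> float:
    """New lower bound k * eps^{-(z+2)}; only meaningful inside the regime above."""
    return k * eps ** -(z + 2)


def new_upper(k: float, eps: float, z: float) -> float:
    """New upper bound k^{(2z+2)/(z+2)} * eps^{-2}, polylog factors dropped."""
    return k ** ((2 * z + 2) / (z + 2)) * eps ** -2


if __name__ == "__main__":
    k, z = 100, 2
    for eps in (0.5, 0.2, 0.05):
        print(
            f"eps={eps}: regime={in_lower_bound_regime(k, eps, z)}, "
            f"prior_lower={prior_lower(k, eps):.0f}, "
            f"new_lower={new_lower(k, eps, z):.0f}, "
            f"new_upper={new_upper(k, eps, z):.0f}"
        )
```

For z=2 the exponent (2z+2)/(z+2) equals 3/2, so under these simplifications the unified upper bound has the same form as the k^{3/2} ε^{-2} term of the prior k-means bound.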


