On Optimal Coreset Construction for Euclidean (k,z)-Clustering
Constructing small-sized coresets for various clustering problems in Euclidean spaces has attracted significant attention over the past decade. A central problem in the coreset literature is to determine the best possible coreset size for (k,z)-clustering in Euclidean space. While there has been significant progress on this problem, there is still a gap between the state-of-the-art upper and lower bounds. For instance, the best known upper bound for k-means (z=2) is min{O(k^{3/2} ε^{-2}), O(k ε^{-4})} [1,2], while the best known lower bound is Ω(k ε^{-2}) [1]. In this paper, we make significant progress on both upper and lower bounds. For a large range of parameters (i.e., ε, k), we have a complete understanding of the optimal coreset size. In particular, we obtain the following results: (1) We present a new coreset lower bound Ω(k ε^{-z-2}) for Euclidean (k,z)-clustering when ε ≥ Ω(k^{-1/(z+2)}). In view of the prior upper bound Õ_z(k ε^{-z-2}) [1], the bound is optimal. The new lower bound is surprising since Ω(k ε^{-2}) [1] is "conjectured" to be the correct bound in some recent works (see e.g., [1,2]). (2) For the upper bound, we provide efficient coreset construction algorithms for Euclidean (k,z)-clustering with improved coreset sizes. In particular, we provide an Õ_z(k^{(2z+2)/(z+2)} ε^{-2})-sized coreset, with a unified analysis, for (k,z)-clustering for all z ≥ 1 in Euclidean space. [1] Cohen-Addad, Larsen, Saulpic, Schwiegelshohn. STOC'22. [2] Cohen-Addad, Larsen, Saulpic, Schwiegelshohn, Sheikh-Omar. NeurIPS'22.
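To make the stated bounds concrete, the following minimal sketch evaluates them numerically with all constants and polylogarithmic factors dropped. The function names are hypothetical and chosen only for illustration; nothing here comes from the paper beyond the formulas quoted in the abstract. It checks two pieces of arithmetic: for z=2 the exponent (2z+2)/(z+2) equals 3/2, and at the threshold ε = k^{-1/(z+2)} the lower bound k ε^{-z-2} and the new upper bound k^{(2z+2)/(z+2)} ε^{-2} both equal k^2.

```python
from fractions import Fraction


def upper_bound_k_exponent(z):
    """Exponent of k in the upper bound k^{(2z+2)/(z+2)} * eps^{-2}."""
    return Fraction(2 * z + 2, z + 2)


def lower_bound_size(k, eps, z):
    """k * eps^{-(z+2)}; stated for eps >= k^{-1/(z+2)} (constants dropped)."""
    return k * eps ** (-(z + 2))


def new_upper_bound_size(k, eps, z):
    """k^{(2z+2)/(z+2)} * eps^{-2} (constants and polylog factors dropped)."""
    return k ** float(upper_bound_k_exponent(z)) * eps ** (-2)


def threshold_eps(k, z):
    """Smallest eps (up to constants) at which the lower bound is stated."""
    return k ** (-1.0 / (z + 2))


if __name__ == "__main__":
    k, z = 16, 2
    eps = threshold_eps(k, z)
    print("k-exponent for z=2:", upper_bound_k_exponent(z))
    print("lower bound at threshold:", lower_bound_size(k, eps, z))
    print("upper bound at threshold:", new_upper_bound_size(k, eps, z))
```

For ε above the threshold, the k ε^{-z-2} expression is the smaller of the two, which is consistent with the abstract's claim that the lower bound matches the prior upper bound from [1] in that regime.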