Coresets for Clustering with General Assignment Constraints
Designing small coresets, which approximately preserve the costs of solutions for large datasets, has been an important research direction over the past decade. We consider coreset construction for a variety of general constrained clustering problems. We introduce a general class of assignment constraints, including capacity constraints on cluster centers and assignment structure constraints for data points (modeled by a convex body ℬ). We give coresets for constrained clustering problems with such general assignment constraints, significantly generalizing known coreset results for constrained clustering. Notable implications of our general theorem include the first ϵ-coreset for capacitated and fair k-Median with m outliers in Euclidean spaces, of size Õ(m + k^2 ϵ^-4), which generalizes and improves upon the prior bounds of Õ(k^3 ϵ^-6) for capacitated k-Median [Braverman et al., FOCS'22] and Õ(m + k^3 ϵ^-5) for k-Median with m outliers [Huang et al., ICLR'23]. They also include the first ϵ-coreset of size poly(k ϵ^-1) for fault-tolerant clustering in metric spaces with bounded covering exponent.
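To make the notion of "approximately preserving the costs" concrete, the following is a minimal sketch of the standard ϵ-coreset guarantee for plain (unconstrained) k-Median: a weighted subset whose cost stays within a (1 ± ϵ) factor of the full dataset's cost. It is not the construction from the paper and ignores capacity, fairness, outlier, and assignment structure constraints. The function names and the toy data are hypothetical, and the check only samples a finite list of candidate center sets, whereas a true coreset must satisfy the guarantee for every center set.

```python
import math

def kmedian_cost(points, weights, centers):
    """Weighted k-Median cost: each point pays its weight times the
    Euclidean distance to its nearest center."""
    return sum(w * min(math.dist(p, c) for c in centers)
               for p, w in zip(points, weights))

def is_eps_coreset_on(points, coreset, coreset_weights, center_sets, eps):
    """Check |cost(coreset) - cost(points)| <= eps * cost(points) on a
    finite list of candidate center sets (an illustrative spot check,
    not a proof that the guarantee holds for all center sets)."""
    full_weights = [1.0] * len(points)
    for centers in center_sets:
        full = kmedian_cost(points, full_weights, centers)
        approx = kmedian_cost(coreset, coreset_weights, centers)
        if abs(approx - full) > eps * full:
            return False
    return True

# Toy data (hypothetical): duplicated points collapse into weighted ones.
points = [(0.0, 0.0), (0.0, 0.0), (2.0, 0.0), (2.0, 0.0)]
coreset = [(0.0, 0.0), (2.0, 0.0)]
coreset_weights = [2.0, 2.0]
center_sets = [[(1.0, 0.0)], [(0.0, 0.0), (2.0, 0.0)], [(5.0, 5.0)]]

print(is_eps_coreset_on(points, coreset, coreset_weights, center_sets, eps=0.01))
```

In this toy case the weighted subset reproduces the cost exactly because it only merges duplicate points. With constraints such as capacities, the cost is instead a minimum over feasible assignments, which is what makes the constrained setting studied in the paper harder.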