Optimal Fully Dynamic k-Centers Clustering
We present the first algorithm for fully dynamic k-centers clustering in an arbitrary metric space that maintains an optimal (2+ϵ)-approximation in O(k·polylog(n,Δ)) amortized update time. Here, n is an upper bound on the number of active points at any time, and Δ is the aspect ratio of the data. Previously, the best known amortized update time was O(k^2·polylog(n,Δ)), due to Chan, Guerqin, and Sozio. We show that the runtime of our algorithm is optimal up to polylog(n,Δ) factors, even for insertion-only streams, which settles the complexity of fully dynamic k-centers clustering. In particular, we prove that any algorithm for k-clustering tasks in arbitrary metric spaces, including k-means, k-medians, and k-centers, must make at least Ω(nk) distance queries to achieve any non-trivial approximation factor. Despite this lower bound for arbitrary metrics, we show that an update time sublinear in k is possible for metric spaces that admit locality-sensitive hash functions (LSH). Namely, we give a black-box transformation that takes a locality-sensitive hash family for a metric space and produces a faster fully dynamic k-centers algorithm for that space. In particular, for a large class of metrics including Euclidean space, ℓ_p spaces, the Hamming metric, and the Jaccard metric, and for any c > 1, our results yield a c(4+ϵ)-approximate k-centers solution in O(n^(1/c)·polylog(n,Δ)) amortized update time, simultaneously for all k ≥ 1. Previously, the only comparable known result was an O(c log n)-approximation for Euclidean space due to Schmidt and Sohler, with the same amortized update time.
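To make the k-centers objective and the distance-query cost concrete, here is a minimal sketch of the classic static farthest-first traversal (Gonzalez's greedy), which gives a 2-approximation for k-centers. This is not the dynamic algorithm described in the abstract. It is a swapped-in offline baseline, and the function name `gonzalez_k_centers` and its return format are hypothetical choices for illustration. Its main point is the cost: when k ≤ n, it makes exactly n·k distance queries, which matches the order of the Ω(nk) lower bound for arbitrary metrics.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

P = TypeVar("P")


def gonzalez_k_centers(
    points: Sequence[P], k: int, dist: Callable[[P, P], float]
) -> Tuple[List[int], float, int]:
    """Static farthest-first traversal for k-centers (a 2-approximation).

    Hypothetical helper for illustration only, not the paper's dynamic method.
    Returns (indices of chosen centers, clustering radius, distance queries made).
    """
    if k < 1 or not points:
        raise ValueError("need k >= 1 and at least one point")
    n = len(points)
    queries = 0
    centers = [0]  # arbitrary first center: the point at index 0
    # nearest[i] = distance from points[i] to its closest chosen center so far
    nearest: List[float] = []
    for p in points:
        nearest.append(dist(points[0], p))
        queries += 1
    while len(centers) < min(k, n):
        # The next center is the point farthest from all current centers.
        far = max(range(n), key=nearest.__getitem__)
        if nearest[far] == 0:
            break  # every point already coincides with a center
        centers.append(far)
        # One pass of n distance queries to update nearest-center distances.
        for i in range(n):
            d = dist(points[far], points[i])
            queries += 1
            if d < nearest[i]:
                nearest[i] = d
    radius = max(nearest)  # k-centers cost: max distance to the nearest center
    return centers, radius, queries
```

For example, on the 1-D points `[0, 1, 2, 10, 11, 12]` with `k = 2` and `dist = lambda a, b: abs(a - b)`, the sketch picks the points at indices 0 and 5, reports radius 2, and uses 12 = n·k distance queries. The optimal radius here is 1 (centers 1 and 11), so the result is within the factor of 2.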