Nearly-Tight and Oblivious Algorithms for Explainable Clustering

by Buddhima Gamlath, et al.

We study the problem of explainable clustering in the setting first formalized by Moshkovitz, Dasgupta, Rashtchian, and Frost (ICML 2020). A k-clustering is said to be explainable if it is given by a decision tree where each internal node splits data points with a threshold cut in a single dimension (feature), and each of the k leaves corresponds to a cluster. We give an algorithm that outputs an explainable clustering that loses at most a factor of O(log^2 k) compared to an optimal (not necessarily explainable) clustering for the k-medians objective, and a factor of O(k log^2 k) for the k-means objective. This improves over the previous best upper bounds of O(k) and O(k^2), respectively, and nearly matches the previous Ω(log k) lower bound for k-medians and our new Ω(k) lower bound for k-means. The algorithm is remarkably simple. In particular, given an initial not necessarily explainable clustering in ℝ^d, it is oblivious to the data points and runs in time O(dk log^2 k), independent of the number of data points n. Our upper and lower bounds also generalize to objectives given by higher ℓ_p-norms.
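
To make the definition of an explainable clustering concrete, the following is a minimal Python sketch of a threshold tree and of the clustering cost it induces. It is an illustration only, not the algorithm from the paper: the names `Node`, `assign`, and `cost` are hypothetical, and the cost is assumed to be the sum of ||x - c||_p^p over all points (p = 1 for k-medians, p = 2 for k-means).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """One node of a threshold tree (hypothetical illustration type)."""
    # Internal node: compare coordinate `feature` against `threshold`.
    feature: Optional[int] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None    # subtree for x[feature] <= threshold
    right: Optional["Node"] = None   # subtree for x[feature] > threshold
    # Leaf node: index of the cluster this leaf represents.
    cluster: Optional[int] = None


def assign(tree: Node, x: Sequence[float]) -> int:
    """Follow single-feature threshold cuts from the root down to a leaf."""
    node = tree
    while node.cluster is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.cluster


def cost(points, labels, centers, p=1):
    """Sum of ||x - c||_p^p over all points (assumed form of the objective)."""
    return sum(
        sum(abs(a - b) ** p for a, b in zip(x, centers[label]))
        for x, label in zip(points, labels)
    )


# A tree with k = 3 leaves over points in the plane.
tree = Node(
    feature=0, threshold=2.0,
    left=Node(feature=1, threshold=2.0,
              left=Node(cluster=0), right=Node(cluster=1)),
    right=Node(cluster=2),
)
points = [(0, 0), (1, 5), (4, 1)]
centers = [(0, 0), (1, 4), (4, 0)]   # made-up reference centers
labels = [assign(tree, x) for x in points]
```

Each root-to-leaf path is a short list of conditions of the form "feature i is at most t", which is what makes the resulting cluster explainable. The approximation factors in the abstract compare the cost of such a tree-induced clustering against the cost of an optimal unrestricted clustering.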




Almost Tight Approximation Algorithms for Explainable Clustering

Recently, due to an increasing interest for transparency in artificial i...

Near-Optimal Explainable k-Means for All Dimensions

Many clustering algorithms are guided by certain cost functions such as ...

Tight Kernel Query Complexity of Kernel Ridge Regression and Kernel k-means Clustering

We present tight lower bounds on the number of kernel evaluations requir...

Near-optimal Algorithms for Explainable k-Medians and k-Means

We consider the problem of explainable k-medians and k-means introduced ...

Exponential Weights Algorithms for Selective Learning

We study the selective learning problem introduced by Qiao and Valiant (...

Optimal Time Bounds for Approximate Clustering

Clustering is a fundamental problem in unsupervised learning, and has be...

Monte Carlo approximation certificates for k-means clustering

Efficient algorithms for k-means clustering frequently converge to subop...