Streaming PTAS for Constrained k-Means

by   Dishant Goyal, et al.

We generalise the results of Bhattacharya et al. (Journal of Computing Systems, 62(1):93-115, 2018) for the list-k-means problem defined as – for a (unknown) partition X_1, ..., X_k of the dataset X ⊆R^d, find a list of k-center sets (each element in the list is a set of k centers) such that at least one of k-center sets {c_1, ..., c_k} in the list gives an (1+ε)-approximation with respect to the cost function min_permutation π[ ∑_i=1^k∑_x ∈ X_i ||x - c_π(i)||^2 ]. The list-k-means problem is important for the constrained k-means problem since algorithms for the former can be converted to PTAS for various versions of the latter. Following are the consequences of our generalisations: - Streaming algorithm: Our D^2-sampling based algorithm running in a single iteration allows us to design a 2-pass, logspace streaming algorithm for the list-k-means problem. This can be converted to a 4-pass, logspace streaming PTAS for various constrained versions of the k-means problem. To the best of our knowledge, these are the first constant pass, logspace streaming PTASs for constrained versions of the k-means problem. - Faster PTAS under stability: Our generalisation is also useful in k-means clustering scenarios where finding good centers becomes easy once good centers for a few "bad" clusters have been chosen. One such scenario is clustering under stability where the number of such bad clusters is a constant. Using the above idea, we significantly improve the running time of the known algorithm from O(dn^3) (k logn)^poly(1/β, 1/ε) to O (dn^3 k^Õ_βε(1/βε)).


page 1

page 2

page 3

page 4


Streaming PTAS for Binary ℓ_0-Low Rank Approximation

We give a 3-pass, polylog-space streaming PTAS for the constrained binar...

FPT Approximation for Constrained Metric k-Median/Means

The Metric k-median problem over a metric space (𝒳, d) is defined as fol...

Coreset-based Strategies for Robust Center-type Problems

Given a dataset V of points from some metric space, the popular k-center...

Coresets for constrained k-median and k-means clustering in low dimensional Euclidean space

We study (Euclidean) k-median and k-means with constraints in the stream...

Polylogarithmic Sketches for Clustering

Given n points in ℓ_p^d, we consider the problem of partitioning points ...

Speeding Up Constrained k-Means Through 2-Means

For the constrained 2-means problem, we present a O(dn+d(1ϵ)^O(1ϵ) n) ti...

Learning Mixture of Gaussians with Streaming Data

In this paper, we study the problem of learning a mixture of Gaussians w...

Please sign up or login with your details

Forgot password? Click here to reset