Unexpected Effects of Online K-means Clustering

by Michal Moshkovitz, et al.

In this paper we study k-means clustering in the online setting. In the offline setting, the main parameters are the number of centers, k, and the size of the dataset, n, and performance guarantees are given as functions of these parameters. In the online setting, new factors come into play: the order in which the data arrive and whether n is known in advance. One of the main results of this paper is the discovery that these new factors have dramatic effects on the quality of clustering algorithms. For example, for constant k: (1) Ω(n) centers are needed if the order is arbitrary; (2) if the order is random and n is unknown in advance, the number of centers drops to Θ(log n); and (3) if n is known, the number of centers drops to a constant. For different values of these new factors, we show upper and lower bounds that match up to a constant factor, thus achieving optimal bounds.
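The abstract does not describe an algorithm. Purely as a hypothetical illustration of how a logarithmic number of centers can arise when the order is random and n is unknown, the sketch below uses a simple rule assumed for illustration, not the authors' method: the t-th arriving point is irrevocably kept as a center with probability min(1, k/t). The expected number of centers is then the sum over t of min(1, k/t), which equals k + k(H_n − H_k) ≈ k(1 + ln(n/k)), that is, Θ(log n) for constant k. The function names `online_select`, `expected_centers`, and `kmeans_cost` and the synthetic Gaussian data are hypothetical.

```python
import math
import random


def expected_centers(n: int, k: int) -> float:
    """Expected number of centers when point t (1-indexed) is kept w.p. min(1, k/t)."""
    return sum(min(1.0, k / t) for t in range(1, n + 1))


def online_select(stream, k: int, rng: random.Random):
    """Decide on arrival, irrevocably, whether each point becomes a center.

    Hypothetical rule (illustration only): keep the t-th point with
    probability min(1, k/t). It never needs the stream length n.
    The first k points are always kept, so at least k centers exist.
    """
    centers = []
    for t, x in enumerate(stream, start=1):
        if rng.random() < min(1.0, k / t):
            centers.append(x)
    return centers


def kmeans_cost(points, centers) -> float:
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(
        min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
        for p in points
    )


if __name__ == "__main__":
    rng = random.Random(0)
    k = 3
    # Hypothetical data: three well-separated 2-D Gaussian blobs, shuffled
    # to model a random arrival order.
    means = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    data = [(rng.gauss(mx, 1.0), rng.gauss(my, 1.0))
            for mx, my in means for _ in range(1000)]
    rng.shuffle(data)
    centers = online_select(data, k, rng)
    n = len(data)
    print("centers opened:", len(centers))
    print("expected:", round(expected_centers(n, k), 2))
    print("k(1 + ln(n/k)):", round(k * (1 + math.log(n / k)), 2))
    print("k-means cost:", round(kmeans_cost(data, centers), 1))
```

Because the rule ignores n, it fits the unknown-n, random-order case; on its own it says nothing about the approximation quality of the resulting clustering, which is what the paper's bounds address.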


Achieving anonymity via weak lower bound constraints for k-median and k-means

We study k-clustering problems with lower bounds, including k-median and...

No-substitution k-means Clustering with Adversarial Order

We investigate k-means clustering in the online no-substitution setting ...

Consistent k-Clustering for General Metrics

Given a stream of points in a metric space, is it possible to maintain a...

A Constant Approximation Algorithm for Sequential No-Substitution k-Median Clustering under a Random Arrival Order

We study k-median clustering under the sequential no-substitution settin...

Sequential no-Substitution k-Median-Clustering

We study the sample-based k-median clustering objective under a sequenti...

Simple Random Order Contention Resolution for Graphic Matroids with Almost no Prior Information

Random order online contention resolution schemes (ROCRS) are structured...

Strong Consistency for a Class of Adaptive Clustering Procedures

We introduce a class of clustering procedures which includes k-means and...