Unexpected Effects of Online K-means Clustering

08/09/2019
by Michal Moshkovitz, et al.

In this paper we study k-means clustering in the online setting. In the offline setting, the main parameters are the number of centers, k, and the size of the dataset, n, and performance guarantees are given as functions of these parameters. In the online setting, new factors come into play: the order in which the data points arrive and whether n is known in advance. One of the main results of this paper is that these new factors have dramatic effects on the quality of clustering algorithms. For example, for constant k: (1) Ω(n) centers are needed if the order is arbitrary; (2) if the order is random and n is unknown in advance, the number of centers reduces to Θ(log n); and (3) if n is known, the number of centers reduces to a constant. For different values of the new factors, we show upper and lower bounds that match up to a constant factor, thus achieving optimal bounds.
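To make the role of arrival order concrete, the sketch below illustrates an online setting under stated assumptions. It assumes that each arriving point must be accepted as a center or rejected immediately and irrevocably, and it uses a hypothetical threshold rule (`online_threshold_centers` with a hypothetical parameter `tau`): a point becomes a center if its squared distance to every existing center exceeds `tau`. This rule is only an illustration of the setting, not the algorithm from the paper. The same three points produce a different number of centers depending on the order in which they arrive.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """k-means cost: sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def online_threshold_centers(stream: Sequence[Point], tau: float) -> List[Point]:
    """Hypothetical online rule (illustration only, not the paper's method).

    Points arrive one at a time; each is either made a center on arrival or
    never. A point is opened as a center when it is the first point, or when
    its squared distance to every existing center exceeds tau.
    """
    centers: List[Point] = []
    for p in stream:
        if not centers or min(sq_dist(p, c) for c in centers) > tau:
            centers.append(p)
    return centers


if __name__ == "__main__":
    # The same three points on a line, presented in two different orders.
    order_a = [(0.0,), (1.5,), (3.0,)]
    order_b = [(1.5,), (0.0,), (3.0,)]
    for name, order in (("a", order_a), ("b", order_b)):
        centers = online_threshold_centers(order, tau=2.5)
        print(name, centers, kmeans_cost(order, centers))
```

With `tau=2.5`, order `a` opens two centers (at 0.0 and 3.0), while order `b` opens only one (at 1.5), because the middle point arrived first. Even on this toy rule, the arrival order changes how many centers are opened, echoing the abstract's point that ordering is a first-class parameter online.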
