A Nearly Tight Analysis of Greedy k-means++

07/16/2022
by Christoph Grunau, et al.

The famous k-means++ algorithm of Arthur and Vassilvitskii [SODA 2007] is the most popular way of solving the k-means problem in practice. The algorithm is very simple: it samples the first center uniformly at random, and each of the following k-1 centers is then sampled with probability proportional to its squared distance to the closest center chosen so far. Afterward, Lloyd's iterative algorithm is run. The k-means++ algorithm is known to return a Θ(log k)-approximate solution in expectation. In their seminal work, Arthur and Vassilvitskii [SODA 2007] asked about the guarantees for the following greedy variant: in every step, we sample ℓ candidate centers instead of one and then pick the one that minimizes the new cost. This is also how k-means++ is implemented in, for example, the popular scikit-learn library [Pedregosa et al.; JMLR 2011]. We present nearly matching lower and upper bounds for greedy k-means++: we prove that it is an O(ℓ^3 log^3 k)-approximation algorithm. On the other hand, we prove a lower bound of Ω(ℓ^3 log^3 k / log^2(ℓ log k)). Previously, only an Ω(ℓ log k) lower bound was known [Bhattacharya, Eube, Röglin, Schmidt; ESA 2020] and there was no known upper bound.
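
The seeding procedure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code or the scikit-learn implementation: the function names `sq_dist` and `greedy_kmeanspp_seeding` and the parameter `num_candidates` (playing the role of ℓ) are hypothetical, the first center is assumed to be a single uniform sample, and the Lloyd iterations that would follow are omitted.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def greedy_kmeanspp_seeding(points, k, num_candidates, rng=None):
    """Pick k centers; return (centers, k-means cost of those centers).

    num_candidates is the number of candidates sampled per step (ℓ).
    With num_candidates=1 this reduces to plain k-means++ seeding.
    """
    rng = rng or random.Random()

    # Step 1: the first center is sampled uniformly at random.
    centers = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its closest center.
    d2 = [sq_dist(p, centers[0]) for p in points]

    for _ in range(k - 1):
        if sum(d2) == 0:
            # Every point already coincides with a center, so the sampling
            # weights are all zero; fall back to uniform (an assumption).
            candidates = [rng.choice(points) for _ in range(num_candidates)]
        else:
            # Sample candidates proportional to squared distance.
            candidates = rng.choices(points, weights=d2, k=num_candidates)

        # Greedy rule: keep the candidate that minimizes the new cost.
        best_cost, best_d2, best_center = None, None, None
        for c in candidates:
            new_d2 = [min(old, sq_dist(p, c)) for old, p in zip(d2, points)]
            cost = sum(new_d2)
            if best_cost is None or cost < best_cost:
                best_cost, best_d2, best_center = cost, new_d2, c

        centers.append(best_center)
        d2 = best_d2

    return centers, sum(d2)


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (20.0, 0.0)]
    chosen, cost = greedy_kmeanspp_seeding(data, k=3, num_candidates=4,
                                           rng=random.Random(0))
    print(chosen, cost)
```

Each step evaluates every candidate against all points, so one step of this sketch costs time proportional to ℓ times the number of points, compared with a single pass for plain k-means++.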

research
12/02/2019

Noisy, Greedy and Not So Greedy k-means++

The k-means++ algorithm due to Arthur and Vassilvitskii has become the m...
research
02/18/2021

No-Substitution k-means Clustering with Low Center Complexity and Memory

Clustering is a fundamental task in machine learning. Given a dataset X ...
research
07/25/2023

Noisy k-means++ Revisited

The k-means++ algorithm by Arthur and Vassilvitskii [SODA 2007] is a cla...
research
03/05/2020

Simple and sharp analysis of k-means||

We present a truly simple analysis of k-means|| (Bahmani et al., PVLDB 2...
research
06/22/2020

Improved Bounds for Metric Capacitated Covering Problems

In the Metric Capacitated Covering (MCC) problem, given a set of balls ℬ...
research
02/14/2018

Dynamic Fair Division Problem with General Valuations

In this paper, we focus on how to dynamically allocate a divisible resou...
research
01/14/2021

New bounds for k-means and information k-means

In this paper, we derive a new dimension-free non-asymptotic upper bound...
