Fast and Accurate k-means++ via Rejection Sampling

12/22/2020
by   Vincent Cohen-Addad, et al.

k-means++ is a widely used clustering algorithm that is easy to implement and has strong theoretical guarantees and empirical performance. Despite its wide adoption, k-means++ can be slow on large datasets, so a natural question is whether more efficient algorithms with similar guarantees exist. In this paper, we present a near-linear-time algorithm for k-means++ seeding. Our algorithm obtains the same theoretical guarantees as k-means++ and significantly improves on earlier results on fast k-means++ seeding. Moreover, we show empirically that our algorithm is significantly faster than k-means++ and obtains solutions of equivalent quality.
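For context, the standard k-means++ seeding step that the abstract takes as its baseline can be sketched as follows. This is a minimal illustration of the classical D² sampling procedure, not the rejection-sampling algorithm of the paper; the function names `sq_dist` and `kmeanspp_seed` and the fallback for a zero total weight are assumptions made for this example. Each of the k rounds scans all n points, which is the cost that becomes noticeable on large datasets.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seed(points, k, rng=None):
    """Classical k-means++ seeding (D^2 sampling); illustrative sketch only."""
    rng = rng or random.Random()
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # First center: chosen uniformly at random.
    centers = [points[rng.randrange(len(points))]]
    # d2[i] holds the squared distance from points[i] to its nearest center.
    d2 = [sq_dist(p, centers[0]) for p in points]

    while len(centers) < k:
        if sum(d2) == 0:
            # Assumption for this sketch: if every point coincides with a
            # chosen center, fall back to a uniform choice.
            idx = rng.randrange(len(points))
        else:
            # Sample a point with probability proportional to d2.
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centers.append(points[idx])
        # Full pass over the data to refresh nearest-center distances.
        d2 = [min(d, sq_dist(p, centers[-1])) for p, d in zip(points, d2)]

    return centers
```

Calling `kmeanspp_seed(points, k)` on a list of coordinate tuples returns k of the input points as initial centers, which would then typically be refined by Lloyd's iterations.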
