Exact Acceleration of K-Means++ and K-Means||

05/06/2021
by   Edward Raff, et al.

K-Means++ and its distributed variant K-Means|| have become de facto tools for selecting the initial seeds of K-means. While alternatives have been developed, the effectiveness, ease of implementation, and theoretical grounding of the K-means++ and K-means|| methods have made them difficult to "best" from a holistic perspective. By considering the limited opportunities within seed selection to perform pruning, we develop specialized triangle inequality pruning strategies and a dynamic priority queue to show the first acceleration of K-Means++ and K-Means|| that is faster in run-time while being algorithmically equivalent. For both algorithms we are able to reduce distance computations by over 500×. For K-means++ this results in up to a 17× speedup in run-time and a 551× speedup for K-means||. We achieve this with simple, but carefully chosen, modifications to known techniques, which makes it easy to integrate our approach into existing implementations of these algorithms.
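To make the idea of triangle inequality pruning during seed selection concrete, the following is a minimal sketch of a generic, well-known filter applied to plain K-means++ seeding. It is not the paper's algorithm and omits the dynamic priority queue entirely. The function name `kmeanspp_seeds` and all variable names are hypothetical, and the sketch assumes Euclidean distance and at least k distinct points.

```python
import math
import random


def kmeanspp_seeds(points, k, rng=None):
    """Pick k seed indices with K-means++ sampling, skipping distance
    computations that the triangle inequality proves cannot matter.

    Assumes Euclidean distance and at least k distinct points.
    """
    rng = rng or random.Random(0)
    n = len(points)
    first = rng.randrange(n)
    seeds = [first]
    # dist[i]: distance from point i to its nearest seed so far.
    # owner[i]: position of that nearest seed inside `seeds`.
    dist = [math.dist(p, points[first]) for p in points]
    owner = [0] * n
    computed = n  # point-to-seed distances evaluated so far
    while len(seeds) < k:
        # Standard K-means++ draw: probability proportional to squared distance.
        weights = [d * d for d in dist]
        new = rng.choices(range(n), weights=weights)[0]
        # Distances from the new seed to each existing seed (at most k - 1).
        gap = [math.dist(points[new], points[s]) for s in seeds]
        seeds.append(new)
        for i in range(n):
            # Triangle inequality: d(i, new) >= d(new, owner) - d(i, owner).
            # If d(new, owner) >= 2 * d(i, owner), the new seed cannot be closer.
            if gap[owner[i]] >= 2.0 * dist[i]:
                continue
            d = math.dist(points[i], points[new])
            computed += 1
            if d < dist[i]:
                dist[i] = d
                owner[i] = len(seeds) - 1
    return seeds, dist, computed
```

The skipped points keep exactly the nearest-seed distance they would have had without pruning, so the sampling distribution at each step is unchanged. The returned `computed` counter can be compared against `k * len(points)`, the cost of the unpruned loop, to see how much work the filter avoids on a given data set.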


research
02/08/2012

Robust seed selection algorithm for k-means type algorithms

Selection of initial seeds greatly affects the quality of the clusters a...
research
12/21/2019

DBP: Discrimination Based Block-Level Pruning for Deep Model Acceleration

Neural network pruning is one of the most popular methods of acceleratin...
research
12/23/2009

Elkan's k-Means for Graphs

This paper extends k-means algorithms from the Euclidean domain to the d...
research
02/08/2016

Fast K-Means with Accurate Bounds

We propose a novel accelerated exact k-means algorithm, which performs b...
research
05/22/2019

KPynq: A Work-Efficient Triangle-Inequality based K-means on FPGA

K-means is a popular but computation-intensive algorithm for unsupervise...
research
02/10/2021

Early Abandoning and Pruning for Elastic Distances

Elastic distances are key tools for time series analysis. Straightforwar...
research
10/12/2018

Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms

Clustering non-Euclidean data is difficult, and one of the most used alg...
