Recombinator-k-means: Enhancing k-means++ by seeding from pools of previous runs

05/01/2019
by Carlo Baldassi, et al.

We present a heuristic algorithm, called recombinator-k-means, that can substantially improve the results of k-means optimization. Instead of performing simple independent restarts and returning the best result, our scheme performs restarts in batches. The results of each batch serve as a reservoir of candidate initial values (seeds) for the next one, and the popular k-means++ seeding algorithm is used to piece these candidates together into promising new initial configurations. The scheme is general: it only affects the seeding part of the optimization, so it could also be applied to k-medians or k-medoids, for example. It incurs no additional cost and is trivially parallelizable across the restarts of each batch. In some circumstances, it can systematically find better configurations than the best one obtained after 10^4 restarts of a standard scheme. Our implementation is publicly available at https://github.com/carlobaldassi/RecombinatorKMeans.jl.
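
The batch-and-reseed idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation (which is the Julia package linked above), and it makes simplifying assumptions that the abstract does not confirm: the pool is simply every centroid from the previous batch, all pool entries are weighted equally, and a fixed number of batches is run. The names `kmeanspp_seed`, `lloyd`, and `recombinator_sketch`, and the parameters `batch_size` and `n_batches`, are hypothetical.

```python
import numpy as np


def kmeanspp_seed(candidates, k, rng):
    """Pick k rows of `candidates` by D^2 (k-means++) sampling."""
    n = len(candidates)
    first = rng.integers(n)
    centers = [candidates[first]]
    d2 = np.sum((candidates - candidates[first]) ** 2, axis=1)
    for _ in range(1, k):
        total = d2.sum()
        # Fall back to uniform sampling if every candidate is already a center.
        probs = d2 / total if total > 0 else np.full(n, 1.0 / n)
        idx = rng.choice(n, p=probs)
        centers.append(candidates[idx])
        d2 = np.minimum(d2, np.sum((candidates - candidates[idx]) ** 2, axis=1))
    return np.array(centers, dtype=float)


def lloyd(data, centers, max_iter=100):
    """Plain Lloyd iterations; returns (centers, cost)."""
    centers = centers.copy()
    for _ in range(max_iter):
        d2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = data[labels == j]
            if len(members) > 0:  # leave empty clusters where they are
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    d2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, float(d2.min(axis=1).sum())


def recombinator_sketch(data, k, batch_size=5, n_batches=3, seed=0):
    """Batched restarts; each batch seeds from the previous batch's centroids."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    pool = data  # the first batch seeds from the data, as in plain k-means++
    best_centers, best_cost = None, np.inf
    for _ in range(n_batches):
        # Restarts within a batch are independent, so they could run in parallel.
        results = [lloyd(data, kmeanspp_seed(pool, k, rng)) for _ in range(batch_size)]
        for centers, cost in results:
            if cost < best_cost:
                best_centers, best_cost = centers, cost
        # Assumption: the next pool is every centroid found in this batch.
        pool = np.vstack([centers for centers, _ in results])
    return best_centers, best_cost
```

Calling `recombinator_sketch(data, k=3)` on an `(n, d)` array returns the best centroids found and their k-means cost. Only the seeding step differs between batches, which is why the same wrapper could in principle sit around a k-medians or k-medoids optimizer instead of `lloyd`.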


research
02/08/2022

Systematically improving existing k-means initialization algorithms at nearly no cost, by pairwise-nearest-neighbor smoothing

We present a meta-method for initializing (seeding) the k-means clusteri...
research
02/08/2012

Robust seed selection algorithm for k-means type algorithms

Selection of initial seeds greatly affects the quality of the clusters a...
research
12/23/2022

Using MM principles to deal with incomplete data in K-means clustering

Among many clustering algorithms, the K-means clustering algorithm is wi...
research
04/19/2019

Optimal initialization of K-means using Particle Swarm Optimization

This paper proposes the use of an optimization algorithm, namely PSO to ...
research
01/31/2022

Fast Distributed k-Means with a Small Number of Rounds

We propose a new algorithm for k-means clustering in a distributed setti...
research
03/15/2019

Tackling Initial Centroid of K-Means with Distance Part (DP-KMeans)

The initial centroid is a fairly challenging problem in the k-means meth...
