Recombinator-k-means: Enhancing k-means++ by seeding from pools of previous runs
We present a heuristic algorithm, called recombinator-k-means, that can substantially improve the results of k-means optimization. Instead of using simple independent restarts and returning the best result, our scheme performs restarts in batches, using the results of a previous batch as a reservoir of candidates for the new initial starting values (seeds), exploiting the popular k-means++ seeding algorithm to piece them together into new promising initial configurations. Our scheme is general (it only affects the seeding part of the optimization, thus it could be applied even to k-medians or k-medoids, for example), it has no additional costs and it is trivially parallelizable across the restarts of each batch. In some circumstances, it can systematically find better configurations than the best one obtained after 10^4 restarts of a standard scheme. Our implementation is publicly available at https://github.com/carlobaldassi/RecombinatorKMeans.jl.
READ FULL TEXT