
Further heuristics for k-means: The merge-and-split heuristic and the (k,l)-means

by Frank Nielsen, et al.
Association for Computing Machinery

Finding the optimal k-means clustering is NP-hard in general, and many heuristics have been designed to monotonically minimize the k-means objective. We first show how to extend Lloyd's batched relocation heuristic and Hartigan's single-point relocation heuristic to take into account empty-cluster and single-point cluster events, respectively. Those events tend to occur more often as k or d increases, or when performing several restarts. First, we show that those special events are a blessing because they allow us to partially re-seed some cluster centers while further minimizing the k-means objective function. Second, we describe a novel heuristic, merge-and-split k-means, that consists of merging two clusters and splitting the merged cluster again with two new centers, provided that doing so improves the k-means objective. This novel heuristic can improve Hartigan's k-means when it has converged to a local minimum. We show empirically that merge-and-split k-means improves over Hartigan's heuristic, which is the de facto method of choice. Finally, we propose the (k,l)-means objective, which generalizes the k-means objective by associating each data point with its l closest cluster centers, and show how to either directly convert or iteratively relax the (k,l)-means into a k-means in order to reach better local minima.
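
The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract gives no formulas, so the sketch assumes that the (k,l)-means cost sums, for each point, the squared Euclidean distances to its l closest centers (l = 1 gives the usual k-means cost). It also assumes that the split step can be approximated by a few randomly seeded 2-means runs on the merged cluster. All function names (`kl_means_cost`, `two_means`, `merge_and_split_step`) and the `trials` and `iters` parameters are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def cluster_cost(points):
    """k-means cost of one cluster: squared distances to its centroid."""
    if not points:
        return 0.0
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)


def kl_means_cost(points, centers, l=1):
    """Assumed (k,l)-means cost: each point pays for its l closest centers."""
    total = 0.0
    for p in points:
        dists = sorted(sq_dist(p, c) for c in centers)
        total += sum(dists[:l])
    return total


def two_means(points, rng, iters=20):
    """Lloyd-style 2-means on one point set, seeded with two random points."""
    c1, c2 = rng.sample(points, 2)
    a, b = [], []
    for _ in range(iters):
        a = [p for p in points if sq_dist(p, c1) <= sq_dist(p, c2)]
        b = [p for p in points if sq_dist(p, c1) > sq_dist(p, c2)]
        if not a or not b:
            break  # degenerate split; the caller rejects it
        c1, c2 = centroid(a), centroid(b)
    return a, b


def merge_and_split_step(clusters, i, j, rng, trials=50):
    """Merge clusters i and j, re-split them, keep the split only if it
    strictly lowers the summed cost of the two clusters (so the overall
    k-means objective never increases). Returns True if clusters changed."""
    merged = clusters[i] + clusters[j]
    if len(merged) < 2:
        return False
    best_cost = cluster_cost(clusters[i]) + cluster_cost(clusters[j])
    best_split = None
    for _ in range(trials):
        a, b = two_means(merged, rng)
        if not a or not b:
            continue
        cost = cluster_cost(a) + cluster_cost(b)
        if cost < best_cost - 1e-12:
            best_cost, best_split = cost, (a, b)
    if best_split is None:
        return False
    clusters[i], clusters[j] = best_split
    return True
```

A caller would typically loop over pairs of clusters after a relocation heuristic has converged, calling `merge_and_split_step(clusters, i, j, random.Random(0))` until no pair yields an improvement. Because a split is accepted only when it lowers the cost, the objective decreases monotonically, in keeping with the property the abstract emphasizes.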




Structures of Spurious Local Minima in k-means

k-means clustering is a fundamental problem in unsupervised learning. Th...

On Approximability of Clustering Problems Without Candidate Centers

The k-means objective is arguably the most widely-used cost function for...

Distributional Clustering: A distribution-preserving clustering method

One key use of k-means clustering is to identify cluster prototypes whic...

Clustering by connection center evolution

The determination of cluster centers generally depends on the scale that...

Socially Fair k-Means Clustering

We show that the popular k-means clustering algorithm (Lloyd's heuristic...

Robust Clustering Using Tau-Scales

K means is a popular non-parametric clustering procedure introduced by S...

Feedback Clustering for Online Travel Agencies Searches: a Case Study

Understanding choices performed by online customers is a growing need in...