Further heuristics for k-means: The merge-and-split heuristic and the (k,l)-means

06/23/2014
by Frank Nielsen, et al.

Finding the optimal k-means clustering is NP-hard in general, and many heuristics have been designed to monotonically minimize the k-means objective. We first show how to extend Lloyd's batched relocation heuristic and Hartigan's single-point relocation heuristic to take into account empty-cluster and single-point cluster events, respectively. These events tend to occur more often as k or d increases, or when performing several restarts. First, we show that these special events are a blessing because they allow us to partially re-seed some cluster centers while further minimizing the k-means objective function. Second, we describe a novel heuristic, merge-and-split k-means, which consists of merging two clusters and splitting the merged cluster again with two new centers, provided this improves the k-means objective. This novel heuristic can improve Hartigan's k-means when it has converged to a local minimum. We show empirically that merge-and-split k-means improves over Hartigan's heuristic, which is the de facto method of choice. Finally, we propose the (k,l)-means objective, which generalizes the k-means objective by associating the data points to their l closest cluster centers, and show how to either directly convert or iteratively relax the (k,l)-means into a k-means in order to reach better local minima.
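The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: the (k,l)-means cost is taken to be the sum, over all points, of the squared Euclidean distances to the l nearest centers (so l = 1 gives the usual k-means cost); the two new centers in the split step are seeded from two random points of the merged cluster and refined by a few Lloyd iterations; and the names `kl_means_cost`, `cluster_cost`, and `merge_and_split` are hypothetical.

```python
import numpy as np


def kl_means_cost(X, centers, l=1):
    """Assumed (k,l)-means cost: each point pays the squared distance
    to its l nearest centers. With l=1 this is the k-means cost."""
    # Squared distances between every point and every center: shape (n, k).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # After partitioning at index l-1, the first l columns hold the l smallest values.
    nearest = np.partition(d2, l - 1, axis=1)[:, :l]
    return float(nearest.sum())


def cluster_cost(P):
    """Sum of squared distances of the points in P to their centroid."""
    if len(P) == 0:
        return 0.0
    return float(((P - P.mean(axis=0)) ** 2).sum())


def merge_and_split(X, labels, i, j, rng, n_iter=10):
    """Merge clusters i and j, re-split the union with two new centers,
    and keep the new split only if it lowers the k-means cost.

    The seeding (two random points) and the local 2-means refinement
    are assumptions made for this sketch.
    """
    idx = np.flatnonzero((labels == i) | (labels == j))
    P = X[idx]
    if len(P) < 2:
        return labels, False
    old = cluster_cost(X[labels == i]) + cluster_cost(X[labels == j])

    # Seed two new centers from distinct points of the merged cluster.
    c = P[rng.choice(len(P), size=2, replace=False)].astype(float)
    for _ in range(n_iter):
        d2 = ((P[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        side = d2.argmin(axis=1)
        for s in (0, 1):
            if np.any(side == s):
                c[s] = P[side == s].mean(axis=0)

    new = cluster_cost(P[side == 0]) + cluster_cost(P[side == 1])
    if new < old:  # accept only a strict improvement
        out = labels.copy()
        out[idx] = np.where(side == 0, i, j)
        return out, True
    return labels, False
```

Because a proposed split is accepted only when it strictly lowers the cost, repeated calls to `merge_and_split` on different cluster pairs can never increase the k-means objective, which matches the monotone behaviour described above.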


