
An empirical comparison between stochastic and deterministic centroid initialisation for K-Means variations

by Avgoustinos Vouros, et al.

K-Means is one of the most widely used algorithms for data clustering and the usual clustering method for benchmarking. Despite its wide application, it is well known to suffer from a series of disadvantages; in particular, the positions of the initial clustering centres (centroids) can greatly affect the clustering solution. Over the years, many K-Means variations and initialisation techniques have been proposed with different degrees of complexity. In this study we focus on common K-Means variations and deterministic initialisation techniques. We show, first, that more sophisticated initialisation methods reduce or alleviate the need for complex K-Means clustering and, second, that deterministic methods can achieve equivalent or better performance than stochastic methods. These conclusions are obtained through extensive benchmarking using different model data sets from various studies as well as clustering data sets.
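To make the stochastic/deterministic distinction concrete, here is a minimal pure-Python sketch. It runs plain K-Means (Lloyd's algorithm) from two kinds of starting centroids: a random sample of data points, and a simple farthest-first ("maximin") rule seeded at the point nearest the data mean. The maximin rule, the helper names (`lloyd`, `random_init`, `maximin_init`) and the synthetic three-blob data are assumptions chosen for brevity; they are not necessarily among the methods or data sets benchmarked in the study.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))

def lloyd(X, centroids, n_iter=100):
    """Plain K-Means from the given start; returns (centroids, labels, SSE)."""
    centroids = list(centroids)
    for _ in range(n_iter):
        labels = [nearest(p, centroids) for p in X]
        new = []
        for j, c in enumerate(centroids):
            members = [p for p, l in zip(X, labels) if l == j]
            # Keep the old centroid if its cluster became empty.
            new.append(mean(members) if members else c)
        if new == centroids:  # assignments stopped changing
            break
        centroids = new
    labels = [nearest(p, centroids) for p in X]
    sse = sum(sq_dist(p, centroids[l]) for p, l in zip(X, labels))
    return centroids, labels, sse

def random_init(X, k, seed):
    """Stochastic start: k distinct data points chosen at random."""
    return random.Random(seed).sample(X, k)

def maximin_init(X, k):
    """Deterministic start (illustrative): farthest-first traversal."""
    centre = mean(X)
    chosen = [min(X, key=lambda p: sq_dist(p, centre))]
    while len(chosen) < k:
        # Add the point whose nearest chosen centroid is farthest away.
        chosen.append(max(X, key=lambda p: min(sq_dist(p, c) for c in chosen)))
    return chosen

# Hypothetical toy data: three Gaussian blobs, 30 points each.
gen = random.Random(0)
X = [(gen.gauss(cx, 0.5), gen.gauss(cy, 0.5))
     for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(30)]

det_sse = lloyd(X, maximin_init(X, 3))[2]
sto_sse = [lloyd(X, random_init(X, 3, seed))[2] for seed in range(5)]
print("deterministic SSE:", det_sse)
print("stochastic SSEs:  ", sto_sse)
```

The deterministic start returns the same centroids, and therefore the same sum of squared errors (SSE), on every run. The stochastic start depends on the seed, so in practice it is usually repeated several times and the best run is kept.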




Modified Possibilistic Fuzzy C-Means Algorithm for Clustering Incomplete Data Sets

Possibilistic fuzzy c-means (PFCM) algorithm is a reliable algorithm has...

Clustering With Side Information: From a Probabilistic Model to a Deterministic Algorithm

In this paper, we propose a model-based clustering method (TVClust) that...

Deterministic Initialization of the K-Means Algorithm Using Hierarchical Clustering

K-means is undoubtedly the most widely used partitional clustering algor...

Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm

Over the past five decades, k-means has become the clustering algorithm ...

Fast and Accurate k-means++ via Rejection Sampling

k-means++ is a widely used clustering algorithm that is easy to i...

Band Depth based initialization of k-Means for functional data clustering

The k-Means algorithm is one of the most popular choices for clustering ...

An Empirical Evaluation of k-Means Coresets

Coresets are among the most popular paradigms for summarizing data. In p...

Code Repositories


Code for the manuscript: "An empirical comparison between stochastic and deterministic centroid initialisation for K-Means variations"



A gui for running different K-Means clustering techniques on benchmark datasets
