An empirical comparison between stochastic and deterministic centroid initialisation for K-Means variations

by Avgoustinos Vouros, et al.

K-Means is one of the most widely used algorithms for data clustering and the usual clustering method for benchmarking. Despite its wide application, it is well known that it suffers from a series of disadvantages, such as its sensitivity to the positions of the initial clustering centres (centroids), which can greatly affect the clustering solution. Over the years many K-Means variations and initialisation techniques have been proposed with different degrees of complexity. In this study we focus on common K-Means variations and deterministic initialisation techniques, and we show, first, that more sophisticated initialisation methods reduce or alleviate the need for complex K-Means clustering and, second, that deterministic methods can achieve equivalent or better performance than stochastic methods. These conclusions are obtained through extensive benchmarking using different model data sets from various studies as well as clustering data sets.
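To make the stochastic/deterministic distinction concrete, here is a minimal sketch. It assumes pure Python and small in-memory point lists, and it pairs Lloyd's K-Means with two seedings: the stochastic k-means++ scheme and a simple deterministic farthest-first (maximin) scheme started from the data point nearest the mean. The function names (`kmeanspp_init`, `maximin_init`, `lloyd`) are hypothetical, and these two seedings are illustrative choices, not necessarily the initialisation methods or K-Means variations benchmarked in the paper.

```python
import random

Point = tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(data: list[Point], k: int, rng: random.Random) -> list[Point]:
    """Stochastic k-means++ seeding: each new centroid is sampled with
    probability proportional to its squared distance to the nearest
    already-chosen centroid. Different rng seeds give different centroids."""
    centroids = [rng.choice(data)]
    while len(centroids) < k:
        d2 = [min(sq_dist(p, c) for c in centroids) for p in data]
        if sum(d2) == 0:
            # Every point coincides with a chosen centroid; pick uniformly.
            centroids.append(rng.choice(data))
        else:
            centroids.append(rng.choices(data, weights=d2, k=1)[0])
    return centroids


def maximin_init(data: list[Point], k: int) -> list[Point]:
    """Deterministic farthest-first seeding (an illustrative choice): start
    from the point nearest the data mean, then repeatedly add the point whose
    distance to its nearest chosen centroid is largest. min/max keep the first
    point on ties, so the result depends only on the data and its order."""
    dim = len(data[0])
    mean = tuple(sum(p[j] for p in data) / len(data) for j in range(dim))
    centroids = [min(data, key=lambda p: sq_dist(p, mean))]
    while len(centroids) < k:
        far = max(data, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(far)
    return centroids


def lloyd(data: list[Point], centroids: list[Point], max_iter: int = 100):
    """Standard Lloyd iterations; returns (final centroids, SSE)."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters: list[list[Point]] = [[] for _ in centroids]
        for p in data:
            idx = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new = [tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else c
               for cl, c in zip(clusters, centroids)]
        if new == centroids:
            break
        centroids = new
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in data)
    return centroids, sse


if __name__ == "__main__":
    gen = random.Random(0)
    blobs = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
    data = [(gen.gauss(cx, 0.3), gen.gauss(cy, 0.3)) for cx, cy in blobs for _ in range(50)]
    _, det_sse = lloyd(data, maximin_init(data, 3))
    sto_sse = [lloyd(data, kmeanspp_init(data, 3, random.Random(s)))[1] for s in range(5)]
    print("deterministic SSE:", det_sse)
    print("stochastic SSE range:", min(sto_sse), max(sto_sse))
```

The deterministic seeding yields the same centroids on every run, so one run suffices, whereas a stochastic seeding is typically repeated with several seeds and the lowest-SSE solution kept.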


Modified Possibilistic Fuzzy C-Means Algorithm for Clustering Incomplete Data Sets

Possibilistic fuzzy c-means (PFCM) algorithm is a reliable algorithm has...

TELeR: A General Taxonomy of LLM Prompts for Benchmarking Complex Tasks

While LLMs have shown great success in understanding and generating text...

Clustering With Side Information: From a Probabilistic Model to a Deterministic Algorithm

In this paper, we propose a model-based clustering method (TVClust) that...

Deterministic Initialization of the K-Means Algorithm Using Hierarchical Clustering

K-means is undoubtedly the most widely used partitional clustering algor...

Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm

Over the past five decades, k-means has become the clustering algorithm ...

Comparison of Clustering Algorithms for Statistical Features of Vibration Data Sets

Vibration-based condition monitoring systems are receiving increasing at...

CKmeans and FCKmeans : Two Deterministic Initialization Procedures For Kmeans Algorithm Using Crowding Distance

This paper presents two novel deterministic initialization procedures fo...

Code Repositories


Code for the manuscript: "An empirical comparison between stochastic and deterministic centroid initialisation for K-Means variations"



A GUI for running different K-Means clustering techniques on benchmark datasets