An empirical comparison between stochastic and deterministic centroid initialisation for K-Means variations

08/26/2019
by Avgoustinos Vouros, et al.

K-Means is one of the most widely used algorithms for data clustering and the usual clustering method for benchmarking. Despite its wide application, it is well known to suffer from a series of disadvantages, such as sensitivity to the positions of the initial cluster centres (centroids), which can greatly affect the clustering solution. Over the years, many K-Means variations and initialisation techniques have been proposed with different degrees of complexity. In this study we focus on common K-Means variations and deterministic initialisation techniques. We show, first, that more sophisticated initialisation methods reduce or alleviate the need for complex K-Means clustering and, second, that deterministic methods can achieve equivalent or better performance than stochastic methods. These conclusions are obtained through extensive benchmarking using different model data sets from various studies as well as clustering data sets.
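To make the contrast between stochastic and deterministic initialisation concrete, here is a minimal pure-Python sketch. It is not taken from the study: the deterministic rule shown (a maximin, farthest-first selection started from the point nearest the data mean) is only a stand-in for the family of deterministic methods the abstract refers to, and the names `lloyd`, `random_init`, `maximin_init` and the toy data are all hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points (tuples)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def lloyd(points, centroids, max_iter=100):
    """Plain Lloyd iterations; returns (centroids, sum of squared errors)."""
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        updated = [mean(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, sse


def random_init(points, k, seed):
    """Stochastic initialisation: k distinct points chosen uniformly at random."""
    return random.Random(seed).sample(points, k)


def maximin_init(points, k):
    """Deterministic initialisation (illustrative assumption, not the paper's
    method): start at the point nearest the data mean, then repeatedly add
    the point farthest from the centroids chosen so far."""
    centre = mean(points)
    chosen = [min(points, key=lambda p: sq_dist(p, centre))]
    while len(chosen) < k:
        chosen.append(max(points,
                          key=lambda p: min(sq_dist(p, c) for c in chosen)))
    return chosen


# Toy data (assumed for illustration): three 3x3 grids of points.
points = [(cx + dx, cy + dy)
          for cx, cy in [(0, 0), (10, 0), (5, 10)]
          for dx in (-1, 0, 1)
          for dy in (-1, 0, 1)]

# The deterministic start needs a single run and always gives the same result.
det_centroids, det_sse = lloyd(points, maximin_init(points, 3))

# The stochastic start depends on the seed, so it is usually run several times.
random_sses = [lloyd(points, random_init(points, 3, seed))[1]
               for seed in range(10)]

print("deterministic SSE:", det_sse)
print("random-start SSEs:", random_sses)
```

With a stochastic start, the usual practice is to run several restarts and keep the solution with the lowest sum of squared errors; a deterministic start removes that source of variation, which is the trade-off the study benchmarks.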


research
07/09/2020

Modified Possibilistic Fuzzy C-Means Algorithm for Clustering Incomplete Data Sets

Possibilistic fuzzy c-means (PFCM) algorithm is a reliable algorithm has...
research
05/19/2023

TELeR: A General Taxonomy of LLM Prompts for Benchmarking Complex Tasks

While LLMs have shown great success in understanding and generating text...
research
08/25/2015

Clustering With Side Information: From a Probabilistic Model to a Deterministic Algorithm

In this paper, we propose a model-based clustering method (TVClust) that...
research
04/28/2013

Deterministic Initialization of the K-Means Algorithm Using Hierarchical Clustering

K-means is undoubtedly the most widely used partitional clustering algor...
research
09/12/2014

Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm

Over the past five decades, k-means has become the clustering algorithm ...
research
05/11/2023

Comparison of Clustering Algorithms for Statistical Features of Vibration Data Sets

Vibration-based condition monitoring systems are receiving increasing at...
research
04/19/2023

CKmeans and FCKmeans : Two Deterministic Initialization Procedures For Kmeans Algorithm Using Crowding Distance

This paper presents two novel deterministic initialization procedures fo...
