
An empirical comparison between stochastic and deterministic centroid initialisation for K-Means variations

08/26/2019
by Avgoustinos Vouros, et al.

K-Means is one of the most widely used algorithms for data clustering and the usual clustering method for benchmarking. Despite its wide application, it is well known to suffer from a series of disadvantages, such as sensitivity to the positions of the initial cluster centres (centroids), which can greatly affect the clustering solution. Over the years, many K-Means variations and initialisation techniques have been proposed with different degrees of complexity. In this study we focus on common K-Means variations and deterministic initialisation techniques. We show, first, that more sophisticated initialisation methods reduce or alleviate the need for complex K-Means clustering and, second, that deterministic methods can achieve equivalent or better performance than stochastic methods. These conclusions are obtained through extensive benchmarking using different model data sets from various studies as well as clustering data sets.
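
The sensitivity to initial centroids described in the abstract can be illustrated with a minimal NumPy sketch. It is not taken from the paper or its repositories: the synthetic three-blob data, the function names (`lloyd_kmeans`, `random_init`, `maximin_init`), and the choice of a maximin (farthest-first) rule as the deterministic initialiser are all assumptions made purely for illustration. The sketch runs plain Lloyd's K-Means from 20 random initialisations and from one deterministic initialisation, then compares the resulting sums of squared errors (SSE).

```python
import numpy as np


def lloyd_kmeans(X, centroids, n_iter=100):
    """Plain Lloyd's K-Means started from the given initial centroids."""
    centroids = centroids.astype(float).copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centroids = centroids.copy()
        for k in range(len(centroids)):
            members = X[labels == k]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                new_centroids[k] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment and sum of squared errors for the returned centroids.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    sse = d2[np.arange(len(X)), labels].sum()
    return centroids, labels, sse


def random_init(X, k, rng):
    """Stochastic: pick k distinct data points uniformly at random."""
    idx = rng.choice(len(X), size=k, replace=False)
    return X[idx]


def maximin_init(X, k):
    """Deterministic (illustrative choice): start at the point nearest the
    data mean, then repeatedly add the point farthest from those chosen."""
    first = ((X - X.mean(axis=0)) ** 2).sum(axis=1).argmin()
    chosen = [first]
    d2 = ((X - X[first]) ** 2).sum(axis=1)
    for _ in range(k - 1):
        nxt = d2.argmax()
        chosen.append(nxt)
        d2 = np.minimum(d2, ((X - X[nxt]) ** 2).sum(axis=1))
    return X[chosen]


def make_blobs(seed=0):
    """Hypothetical toy data: three Gaussian blobs of 50 points each."""
    rng = np.random.default_rng(seed)
    centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
    return np.vstack([c + rng.normal(scale=1.0, size=(50, 2)) for c in centres])


X = make_blobs()

# 20 stochastic runs, each with a different random seed.
random_sse = [
    lloyd_kmeans(X, random_init(X, 3, np.random.default_rng(s)))[2]
    for s in range(20)
]
# Two deterministic runs: the initialisation uses no randomness.
det_sse = [lloyd_kmeans(X, maximin_init(X, 3))[2] for _ in range(2)]

print("random init SSE: min", min(random_sse), "max", max(random_sse))
print("deterministic init SSE:", det_sse)
```

Because the deterministic initialiser involves no random draws, repeated runs return the same solution, whereas the random runs may or may not agree depending on which points are drawn. A toy example like this says nothing about the benchmark results reported in the study.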



07/09/2020

Modified Possibilistic Fuzzy C-Means Algorithm for Clustering Incomplete Data Sets

Possibilistic fuzzy c-means (PFCM) algorithm is a reliable algorithm has...
08/25/2015

Clustering With Side Information: From a Probabilistic Model to a Deterministic Algorithm

In this paper, we propose a model-based clustering method (TVClust) that...
04/28/2013

Deterministic Initialization of the K-Means Algorithm Using Hierarchical Clustering

K-means is undoubtedly the most widely used partitional clustering algor...
09/12/2014

Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm

Over the past five decades, k-means has become the clustering algorithm ...
12/22/2020

Fast and Accurate k-means++ via Rejection Sampling

k-means++ is a widely used clustering algorithm that is easy to i...
06/02/2021

Band Depth based initialization of k-Means for functional data clustering

The k-Means algorithm is one of the most popular choices for clustering ...
07/03/2022

An Empirical Evaluation of k-Means Coresets

Coresets are among the most popular paradigms for summarizing data. In p...

Code Repositories

Code-KMeans-benchmark

Code for the manuscript: "An empirical comparison between stochastic and deterministic centroid initialisation for K-Means variations"



clustering-workplace

A GUI for running different K-Means clustering techniques on benchmark datasets

