
DISCERN: Diversity-based Selection of Centroids for k-Estimation and Rapid Non-stochastic Clustering

by Ali Hassani, et al.

Clustering is one of the most widely applied unsupervised learning methods, but it has well-known disadvantages. In particular, parameters such as the number of clusters and the neighborhood radius call the "unsupervised" nature of these algorithms into question. The stochastic nature of many of these algorithms is a further weakness. To address these issues, we propose DISCERN, which can serve as an initialization algorithm for K-Means, finding suitable centroids that improve the performance of K-Means. The algorithm can also estimate the number of clusters if needed. It does so deterministically, returning the same results on every run. We ran experiments with the proposed method on multiple datasets, and the results show its superiority in clustering results, computational time, and robustness when compared to randomized K-Means and K-Means++ initialization. We also discuss its superiority in estimating the number of clusters, and we prove that its complexity is lower than that of methods such as the elbow and silhouette methods.
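To make the idea of a non-stochastic, diversity-based initialization concrete, here is a minimal sketch. It is not the DISCERN algorithm itself, whose details the abstract does not give. It is a simplified farthest-first selection that assumes cosine similarity as the diversity measure. The function name `diverse_init` and the selection rule are assumptions made for illustration.

```python
import numpy as np


def diverse_init(X, k):
    """Deterministically pick k rows of X as mutually dissimilar centroids.

    Hypothetical helper: a farthest-first sketch based on cosine
    similarity, NOT the DISCERN algorithm from the paper.
    Assumes X has at least max(k, 2) rows.
    """
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / np.where(norms == 0, 1.0, norms)  # guard against zero rows
    sim = unit @ unit.T  # pairwise cosine similarity

    # Start from the least similar pair of distinct points.
    pair = sim.copy()
    np.fill_diagonal(pair, np.inf)
    i, j = np.unravel_index(np.argmin(pair), pair.shape)
    chosen = [int(i), int(j)]

    # Repeatedly add the point whose highest similarity to any already
    # chosen centroid is lowest; ties resolve to the smallest index,
    # so no randomness is involved.
    while len(chosen) < k:
        max_sim = sim[:, chosen].max(axis=1)
        max_sim[chosen] = np.inf  # never re-pick a chosen point
        chosen.append(int(np.argmin(max_sim)))
    return X[chosen[:k]]
```

Because nothing in the selection is random, repeated calls on the same data return the same centroids. The resulting array could then be passed as a fixed starting point to a K-Means implementation that accepts explicit initial centroids, for example scikit-learn's `KMeans(n_clusters=k, init=centroids, n_init=1)`.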




Unsupervised K-Means Clustering Algorithm

The k-means algorithm is generally the most known and used clustering me...

k2-means for fast and accurate large scale clustering

We propose k^2-means, a new clustering method which efficiently copes wi...

Ensemble Method for Cluster Number Determination and Algorithm Selection in Unsupervised Learning

Unsupervised learning, and more specifically clustering, suffers from th...

Revisiting Agglomerative Clustering

In data clustering, emphasis is often placed in finding groups of points...

A Clustering Method Based on Information Entropy Payload

Existing clustering algorithms such as K-means often need to preset para...

Grapevine: A Wine Prediction Algorithm Using Multi-dimensional Clustering Methods

We present a method for a wine recommendation system that employs multid...

A Novel Adaptive Possibilistic Clustering Algorithm

In this paper a novel possibilistic c-means clustering algorithm, called...

Code Repositories


DISCERN: Diversity-based Selection of Centroids for k-Estimation and Rapid Non-stochastic Clustering
