DISCERN: Diversity-based Selection of Centroids for k-Estimation and Rapid Non-stochastic Clustering

10/14/2019
by Ali Hassani, et al.

Clustering is one of the most widely applied unsupervised learning methods, but it has known disadvantages. In particular, parameters such as the number of clusters and the neighborhood radius call the "unsupervised" nature of these algorithms into question. The stochastic nature of many of these algorithms is a further weakness. To address these issues, we propose DISCERN, which can serve as an initialization algorithm for K-Means by finding suitable centroids that improve the performance of K-Means. The algorithm can also estimate the number of clusters if needed. It does so deterministically, returning the same results on every run. We ran experiments with the proposed method on multiple datasets, and the results show that it is superior to randomized K-Means and K-Means++ initialization in terms of results, computational time, and robustness. In addition, we discuss its advantage in estimating the number of clusters and prove that its complexity is lower than that of methods such as the elbow and silhouette methods.
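To make the idea of a non-stochastic, diversity-based initialization concrete, here is a minimal sketch. It is not the DISCERN procedure from the full text, which the abstract does not spell out. It assumes a simple greedy rule: start from the least similar pair of points under cosine similarity, then repeatedly add the point that is least similar to the centroids already chosen. The function name `diverse_init` and the choice of cosine similarity are assumptions made for illustration.

```python
import numpy as np

def diverse_init(X, k):
    """Deterministically pick k mutually dissimilar rows of X as initial centroids.

    Illustrative greedy rule only; not the authors' exact algorithm.
    Assumes k >= 1 and that no row of X is the zero vector.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise cosine similarity between all rows.
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = unit @ unit.T
    # Seed with the least similar pair; argmin breaks ties by lowest index,
    # so the outcome is identical on every run.
    i, j = np.unravel_index(np.argmin(sim), sim.shape)
    chosen = [int(i), int(j)]
    while len(chosen) < k:
        # Similarity of each point to its most similar chosen centroid.
        closest = sim[:, chosen].max(axis=1)
        closest[chosen] = np.inf  # never re-pick an already chosen point
        chosen.append(int(np.argmin(closest)))
    return X[chosen[:k]]

X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [-1.0, 0.0]])
centers = diverse_init(X, 3)
```

Because there is no random draw, the returned centroids depend only on the data and `k`. They could then be passed to a K-Means implementation that accepts explicit initial centers, for example scikit-learn's `KMeans(n_clusters=3, init=centers, n_init=1)`.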


research
04/25/2020

Unsupervised K-Means Clustering Algorithm

The k-means algorithm is generally the most known and used clustering me...
research
05/30/2016

k2-means for fast and accurate large scale clustering

We propose k^2-means, a new clustering method which efficiently copes wi...
research
12/23/2021

Ensemble Method for Cluster Number Determination and Algorithm Selection in Unsupervised Learning

Unsupervised learning, and more specifically clustering, suffers from th...
research
11/21/2016

Effective Deterministic Initialization for k-Means-Like Methods via Local Density Peaks Searching

The k-means clustering algorithm is popular but has the following main d...
research
09/13/2022

A Clustering Method Based on Information Entropy Payload

Existing clustering algorithms such as K-means often need to preset para...
research
06/29/2018

Grapevine: A Wine Prediction Algorithm Using Multi-dimensional Clustering Methods

We present a method for a wine recommendation system that employs multid...
research
05/16/2020

Revisiting Agglomerative Clustering

In data clustering, emphasis is often placed in finding groups of points...
