K-Splits: Improved K-Means Clustering Algorithm to Automatically Detect the Number of Clusters

10/09/2021
by Seyed Omid Mohammadi, et al.

This paper introduces k-splits, an improved hierarchical algorithm based on k-means that clusters data without prior knowledge of the number of clusters. K-splits starts from a small number of clusters and, where needed, incrementally splits them along the most significant data distribution axis to obtain better fits. Accuracy and speed are the two main advantages of the proposed method. We experiment on six synthetic benchmark datasets and two real-world datasets, MNIST and Fashion-MNIST, to show that our algorithm finds the correct number of clusters with high accuracy under different conditions. We also show that k-splits is faster than similar methods and can even be faster than standard k-means in lower dimensions. Finally, we suggest using k-splits to locate the centroids and then passing them as initial points to the k-means algorithm to fine-tune the results.
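The two ideas in the abstract, splitting a cluster along its most significant axis and then seeding k-means with the resulting centroids, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. It assumes that the "most significant data distribution axis" is the direction of largest variance (estimated here by power iteration), it splits at the mean of the projections, and it omits the paper's criterion for deciding whether a split is needed. All function names (`principal_axis`, `split_along_axis`, `kmeans`) are hypothetical.

```python
import math
import random


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def principal_axis(points, iters=100):
    """Approximate the direction of largest variance by power iteration.

    Assumption: this stands in for the 'most significant data
    distribution axis' mentioned in the abstract.
    """
    c = mean(points)
    centered = [[x - m for x, m in zip(p, c)] for p in points]
    dim = len(c)
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    v = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    for _ in range(iters):
        # w = (X^T X) v, computed without forming the covariance matrix
        proj = [sum(a * b for a, b in zip(row, v)) for row in centered]
        w = [sum(pr * row[d] for pr, row in zip(proj, centered))
             for d in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate case: keep the current direction
            break
        v = [x / norm for x in w]
    return v


def split_along_axis(points):
    """Split one cluster in two by the sign of the centered projection."""
    c = mean(points)
    v = principal_axis(points)
    left, right = [], []
    for p in points:
        side = sum((x - m) * a for x, m, a in zip(p, c, v))
        (left if side < 0 else right).append(p)
    return left, right


def kmeans(points, centroids, iters=50):
    """Lloyd's algorithm started from the given centroids."""
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)),
                    key=lambda i: math.dist(p, centroids[i]))
            groups[j].append(p)
        # An empty group keeps its previous centroid
        new = [mean(g) if g else c for g, c in zip(groups, centroids)]
        if new == centroids:
            break
        centroids = new
    return centroids


# Toy data: two well-separated 2-D blobs (illustrative only)
rng = random.Random(0)
blob_a = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
blob_b = [[rng.gauss(10, 1), rng.gauss(0, 1)] for _ in range(100)]
data = blob_a + blob_b

left, right = split_along_axis(data)       # one split of the initial cluster
seeds = [mean(left), mean(right)]          # centroids found by splitting
centroids = kmeans(data, seeds)            # fine-tune with k-means
```

On this toy data a single split is enough to place one seed near each blob, so the k-means step only has to refine the centroids rather than discover them.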

research
01/31/2019

A Novel Initial Clusters Generation Method for K-means-based Clustering Algorithms for Mixed Datasets

Mixed datasets consist of numeric and categorical attributes. Various K-...
research
11/22/2022

Global k-means++: an effective relaxation of the global k-means clustering algorithm

The k-means algorithm is a very prevalent clustering method because of i...
research
08/27/2022

Geometrical Homogeneous Clustering for Image Data Reduction

In this paper, we present novel variations of an earlier approach called...
research
08/30/2022

k-MS: A novel clustering algorithm based on morphological reconstruction

This work proposes a clusterization algorithm called k-Morphological Set...
research
08/21/2020

ConiVAT: Cluster Tendency Assessment and Clustering with Partial Background Knowledge

The VAT method is a visual technique for determining the potential clust...
research
12/18/2019

s-DRN: Stabilized Developmental Resonance Network

Online incremental clustering of sequentially incoming data without prio...
research
06/13/2013

Non-parametric Power-law Data Clustering

It has always been a great challenge for clustering algorithms to automa...
