PaVa: a novel Path-based Valley-seeking clustering algorithm

06/13/2023
by   Lin Ma, et al.

Clustering methods are being applied to a wider range of scenarios involving more complex datasets, where the shapes of clusters tend to be arbitrary. In this paper, we propose a novel Path-based Valley-seeking clustering algorithm (PaVa) for arbitrarily shaped clusters. The algorithm seeks the valleys among clusters and then extracts the clusters one at a time. It relies on three key techniques. First, path distance (minmax distance) is employed to transform the irregular boundaries among clusters, that is, the density valleys, into perfect spherical shells. Second, a suitable density measurement, k-distance, is employed to adjust the Minimum Spanning Tree, from which a robust minmax distance is calculated. Third, the transformed density valleys are located by determining their centers and radii. The algorithm offers three advantages. First, the clusters are wrapped in spherical shells after the distance transformation, which makes the extraction process efficient even for clusters of arbitrary shape. Second, the adjusted Minimum Spanning Tree makes the minmax distance more robust to different kinds of noise. Last, because clusters are extracted individually, the number of clusters does not need to be specified or decided manually. Results on several commonly used synthetic datasets indicate that the Path-based Valley-seeking algorithm is accurate and efficient. The algorithm is based only on the dissimilarity of objects, so it can be applied to a wide range of fields. Its performance on real-world datasets illustrates its versatility.
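The two building blocks named in the abstract, minmax (path) distance and k-distance, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `pairwise_distances`, `k_distance`, and `minmax_distances` are hypothetical, Euclidean distance is assumed as the dissimilarity, and the k-distance-based adjustment of the Minimum Spanning Tree is not reproduced because the abstract does not specify it. The sketch uses the standard fact that the minmax distance between two points equals the largest edge on the Minimum Spanning Tree path joining them.

```python
import math


def pairwise_distances(points):
    """Euclidean distance matrix for a list of coordinate tuples (assumed metric)."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def k_distance(dist, k):
    """Distance from each point to its k-th nearest neighbor, excluding itself."""
    return [sorted(row[:i] + row[i + 1:])[k - 1] for i, row in enumerate(dist)]


def minmax_distances(dist):
    """Minmax (path) distance matrix, built while growing an MST with Prim's algorithm.

    When vertex v joins the tree through edge (u, v) of weight w, its minmax
    distance to any tree vertex t is max(minmax[u][t], w).
    """
    n = len(dist)
    minmax = [[0.0] * n for _ in range(n)]
    in_tree = [0]                 # vertices already in the tree, starting from 0
    best = list(dist[0])          # cheapest edge linking each vertex to the tree
    parent = [0] * n              # tree endpoint of that cheapest edge
    remaining = set(range(1, n))
    while remaining:
        v = min(remaining, key=lambda t: best[t])
        u, w = parent[v], best[v]
        for t in in_tree:
            d = max(minmax[u][t], w)
            minmax[v][t] = minmax[t][v] = d
        in_tree.append(v)
        remaining.remove(v)
        for t in remaining:       # relax edges from the newly added vertex
            if dist[v][t] < best[t]:
                best[t] = dist[v][t]
                parent[t] = v
    return minmax


if __name__ == "__main__":
    # Two groups on a line, separated by a wide gap (a "valley").
    points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
    dist = pairwise_distances(points)
    mm = minmax_distances(dist)
    print(mm[0][2], mm[0][4])     # within-group vs. across-gap minmax distance
    print(k_distance(dist, 2))
```

In this toy example every pair of points inside a group has a small minmax distance, while every pair that straddles the gap shares the same large value, which is the property that lets a valley be treated as a shell around a cluster.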


research
09/17/2019

Global Optimal Path-Based Clustering Algorithm

Combinatorial optimization problems for clustering are known to be NP-ha...
research
09/24/2020

Clustering Based on Graph of Density Topology

Data clustering with uneven distribution in high level noise is challeng...
research
10/16/2019

FISHDBC: Flexible, Incremental, Scalable, Hierarchical Density-Based Clustering for Arbitrary Data and Distance

FISHDBC is a flexible, incremental, scalable, and hierarchical density-b...
research
08/21/2020

ConiVAT: Cluster Tendency Assessment and Clustering with Partial Background Knowledge

The VAT method is a visual technique for determining the potential clust...
research
01/23/2020

Towards Automatic Clustering Analysis using Traces of Information Gain: The InfoGuide Method

Clustering analysis has become a ubiquitous information retrieval tool i...
research
03/26/2020

A Two-Stage Reconstruction of Microstructures with Arbitrarily Shaped Inclusions

The main goal of our research is to develop an effective method with a w...
research
01/22/2018

An Efficient Density-based Clustering Algorithm for Higher-Dimensional Data

DBSCAN is a typically used clustering algorithm due to its clustering ab...
