A Distance-based Separability Measure for Internal Cluster Validation

06/17/2021
by Shuyue Guan, et al.

Evaluating clustering results is a significant part of cluster analysis. Because typical unsupervised learning provides no true class labels, many internal cluster validity indices (CVIs), which use predicted labels and data, have been created. Without true labels, designing an effective CVI is as difficult as creating a clustering method. More CVIs are also needed because no universal CVI can measure all datasets, and there is no specific method for selecting a proper CVI for clusters without true labels. It is therefore necessary to apply a variety of CVIs when evaluating clustering results. In this paper, we propose a novel internal CVI, the Distance-based Separability Index (DSI), based on a data separability measure. We compared the DSI with eight internal CVIs, ranging from the early Dunn index (1974) to the recent CVDD (2019), and used an external CVI as ground truth, on the clustering results of five clustering algorithms applied to 12 real and 97 synthetic datasets. The results show that the DSI is an effective, unique, and competitive CVI relative to the other compared CVIs. We also summarized the general process for evaluating CVIs and created the rank-difference metric for comparing CVIs' results.
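The abstract does not define the DSI itself, so the sketch below instead illustrates what an internal CVI computes, using the Dunn index that the abstract names as its earliest baseline. It assumes one common formulation (smallest distance between points in different clusters, divided by the largest within-cluster diameter) with Euclidean distance; the function name `dunn_index` and the toy data are illustrative and are not taken from the paper.

```python
import math
from itertools import combinations


def dunn_index(points, labels):
    """Dunn index under one common formulation (an assumption here):
    smallest between-cluster distance / largest cluster diameter.
    Higher values suggest compact, well-separated clusters."""
    # Group points by their predicted cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    # Largest pairwise distance inside any single cluster.
    max_diameter = max(
        (
            math.dist(p, q)
            for members in clusters.values()
            for p, q in combinations(members, 2)
        ),
        default=0.0,
    )

    # Smallest distance between two points from different clusters.
    min_separation = min(
        math.dist(p, q)
        for a, b in combinations(clusters.values(), 2)
        for p in a
        for q in b
    )

    if max_diameter == 0.0:
        return math.inf  # every cluster is a single point or duplicates
    return min_separation / max_diameter


# Toy data: two tight pairs of points far apart from each other.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
good_labels = [0, 0, 1, 1]  # groups the nearby points together
bad_labels = [0, 1, 0, 1]   # splits each nearby pair across clusters

good_score = dunn_index(points, good_labels)
bad_score = dunn_index(points, bad_labels)
```

On this toy data the labeling that groups nearby points scores higher than the one that splits them, using only the data and the predicted labels. That is the property that lets an internal CVI rank clustering results when no true labels exist.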


