An Internal Cluster Validity Index Based on Distance-based Separability Measure

09/02/2020
by Shuyue Guan, et al.

Evaluating clustering results is a significant part of cluster analysis. Because clustering is typically unsupervised, true class labels are usually unavailable. Thus, a number of internal evaluations, which use only the predicted labels and the data, have been created. They are also called internal cluster validity indices (CVIs). Without true labels, designing an effective CVI is not simple because it is similar to creating a clustering method. Having more CVIs is also crucial because there is no universal CVI that can be used to measure all datasets, and no specific method for selecting a proper CVI for clusters without true labels. Therefore, applying more CVIs to evaluate clustering results is necessary. In this paper, we propose a novel CVI, called the Distance-based Separability Index (DSI), based on a data separability measure. We applied the DSI and eight other internal CVIs, ranging from early studies such as Dunn (1974) to recent ones such as CVDD (2019), for comparison. We used an external CVI as ground truth for the clustering results of five clustering algorithms on 12 real and 97 synthetic datasets. Results show that DSI is an effective, unique, and competitive CVI compared with the other CVIs. In addition, we summarized the general process for evaluating CVIs and created a new method, rank difference, to compare the results of CVIs.
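To make the idea of an internal CVI concrete, here is a minimal sketch of the Dunn index, one of the comparison indices named in the abstract. It is not the DSI and not the paper's implementation. It uses the standard textbook definition (smallest between-cluster distance divided by largest within-cluster diameter) with Euclidean distance. The function name `dunn_index` and the toy data are assumptions made for illustration.

```python
import math
from itertools import combinations


def dunn_index(points, labels):
    """Dunn index: min between-cluster distance / max within-cluster diameter.

    Higher values suggest compact, well-separated clusters. Only the data
    and the predicted labels are used, so this is an internal index.
    """
    # Group the points by their predicted cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the Dunn index needs at least two clusters")

    # Largest distance between two points that share a cluster.
    max_diameter = max(
        (math.dist(a, b)
         for members in clusters.values()
         for a, b in combinations(members, 2)),
        default=0.0,
    )
    # Smallest distance between two points in different clusters.
    min_separation = min(
        math.dist(a, b)
        for c1, c2 in combinations(clusters.values(), 2)
        for a in c1
        for b in c2
    )
    if max_diameter == 0.0:
        return math.inf  # every cluster is a single point (or duplicates)
    return min_separation / max_diameter


# Toy data (hypothetical): two tight pairs of points, far apart.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
print(dunn_index(points, [0, 0, 1, 1]))  # natural grouping -> 5.0
print(dunn_index(points, [0, 1, 0, 1]))  # mixed grouping   -> 0.2
```

The natural grouping scores higher than the mixed one without any true labels being consulted, which is the property an internal CVI is meant to provide.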


