J-Score: A Robust Measure of Clustering Accuracy

09/03/2021
by Navid Ahmadinejad, et al.

Background. Clustering analysis discovers hidden structures in a data set by partitioning its observations into disjoint clusters. Robust accuracy measures that evaluate the goodness of clustering results are critical for algorithm development and model diagnosis. Common problems with current clustering accuracy measures include overlooking unmatched clusters, bias towards excessive clusters, unstable baselines, and difficult interpretation. In this study, we present a novel accuracy measure, J-score, that addresses these issues.

Methods. Given a data set with known class labels, J-score quantifies how well the hypothetical clusters produced by clustering analysis recover the true classes. It starts with bidirectional set matching, based on the Jaccard index, to identify the correspondence between true classes and hypothetical clusters. It then computes two weighted sums of Jaccard indices, measuring the reconciliation from classes to clusters and vice versa. The final J-score is the harmonic mean of the two weighted sums.

Results. Via simulation studies, we evaluated the performance of J-score and compared it with existing measures. Our results show that J-score is effective in distinguishing partition structures that differ only by unmatched clusters, rewards correct inference of the number of classes, addresses biases towards excessive clusters, and has a relatively stable baseline. Its simple calculation makes interpretation straightforward. J-score is a valuable tool that complements other accuracy measures. We released an R package, jScore, implementing the algorithm.
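The three steps described under Methods can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the jScore package's implementation: the abstract does not give the matching rule or the weights, so the sketch assumes that each class is matched to the cluster with the highest Jaccard index (and each cluster to its best class), and that each best-match Jaccard index is weighted by the matched set's share of all observations. The function names `jaccard_table` and `j_score` are hypothetical.

```python
from collections import Counter


def jaccard_table(truth, pred):
    """Count class and cluster sizes and compute the Jaccard index of every
    (class, cluster) pair that shares at least one observation."""
    classes = Counter(truth)
    clusters = Counter(pred)
    overlap = Counter(zip(truth, pred))  # |class ∩ cluster| for co-occurring pairs
    jac = {}
    for (c, k), inter in overlap.items():
        union = classes[c] + clusters[k] - inter
        jac[(c, k)] = inter / union
    return classes, clusters, jac


def j_score(truth, pred):
    """Hypothetical J-score sketch: harmonic mean of two size-weighted sums of
    best-match Jaccard indices (classes -> clusters and clusters -> classes).
    ASSUMPTION: best-match pairing and size-proportional weights."""
    if len(truth) != len(pred) or not truth:
        raise ValueError("truth and pred must be non-empty and equally long")
    n = len(truth)
    classes, clusters, jac = jaccard_table(truth, pred)

    # Classes -> clusters: each class keeps its best-matching cluster.
    m_ct = sum(
        size / n * max(jac.get((c, k), 0.0) for k in clusters)
        for c, size in classes.items()
    )
    # Clusters -> classes: each cluster keeps its best-matching class,
    # so an extra, poorly matched cluster lowers this sum.
    m_tc = sum(
        size / n * max(jac.get((c, k), 0.0) for c in classes)
        for k, size in clusters.items()
    )
    # Harmonic mean of the two directional sums (both are positive here,
    # because every observation belongs to some class and some cluster).
    return 2 * m_ct * m_tc / (m_ct + m_tc)


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]
    pred = [0, 0, 1, 2, 2, 2]  # class 0 is split into two clusters
    print(round(j_score(truth, pred), 4))
```

With the example partition above, the script prints `0.8046`: class 1 is recovered perfectly, but splitting class 0 into two clusters pulls down both directional sums. Because the harmonic mean is symmetric, swapping `truth` and `pred` gives the same value under these assumptions.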



10/01/2022

A new nonparametric interpoint distance-based measure for assessment of clustering

A new interpoint distance-based measure is proposed to identify the opti...
03/28/2016

Hierarchy of Groups Evaluation Using Different F-score Variants

The paper presents a cursory examination of clustering, focusing on a ra...
10/13/2018

Measuring Swampiness: Quantifying Chaos in Large Heterogeneous Data Repositories

As scientific data repositories and filesystems grow in size and complex...
10/01/2019

Deep Lifetime Clustering

The goal of lifetime clustering is to develop an inductive model that ma...
12/12/2012

An Information-Theoretic External Cluster-Validity Measure

In this paper we propose a measure of clustering quality or accuracy tha...
06/19/2017

On comparing clusterings: an element-centric framework unifies overlaps and hierarchy

Clustering is one of the most universal approaches for understanding com...
04/26/2023

Automated calibration of consensus weighted distance-based clustering approaches using sharp

In consensus clustering, a clustering algorithm is used in combination w...
