Adjusted Asymmetric Accuracy: A Well-Behaving External Cluster Validity Measure

09/07/2022
by   Marek Gagolewski, et al.

There is not, and never will be, a single best clustering algorithm, but we would still like to be able to pinpoint those which perform well on certain task types and filter out the systematically disappointing ones. Clustering algorithms are traditionally evaluated using either internal or external validity measures. Internal measures quantify different aspects of the obtained partitions, e.g., the average degree of cluster compactness or point separability. Yet, their validity is questionable because the clusterings they promote can sometimes be meaningless. External measures, on the other hand, compare the algorithms' outputs to reference, ground-truth groupings provided by experts. The commonly used classical partition similarity scores, such as the normalised mutual information, Fowlkes-Mallows, or adjusted Rand index, might not possess all the desirable properties, e.g., they do not identify pathological edge cases correctly. Furthermore, they are not easily interpretable: it is hard to say what a score of 0.8 really means. Their behaviour might also vary as the number of true clusters changes. This makes comparing clustering algorithms across many benchmark datasets difficult. To remedy this, we propose and analyse a new measure: an asymmetric version of the optimal set-matching accuracy. It is corrected for chance and for imbalance in cluster sizes.
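The abstract does not state the formula, so the following is only a minimal sketch of one way to read "an asymmetric version of the optimal set-matching accuracy, corrected for chance and cluster-size imbalance". It assumes that (a) the reference and predicted partitions have the same number of clusters k, (b) each true cluster is scored by the fraction of its points captured by its matched predicted cluster, so every true cluster carries equal weight regardless of size, and (c) the chance correction rescales the mean so that 1/k maps to 0. The function names and the brute-force search over matchings are illustrative choices, not the authors' reference implementation.

```python
from itertools import permutations


def confusion_matrix(y_true, y_pred, k):
    """c[i][j] = number of points from true cluster i put in predicted cluster j."""
    c = [[0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        c[t][p] += 1
    return c


def adjusted_asymmetric_accuracy(y_true, y_pred, k):
    """Sketch under the assumptions stated in the lead-in (hypothetical helper).

    Labels must be integers in 0..k-1, with k >= 2 and no empty true cluster.
    """
    c = confusion_matrix(y_true, y_pred, k)
    row_sums = [sum(row) for row in c]
    if k < 2 or min(row_sums) == 0:
        raise ValueError("need k >= 2 and no empty reference cluster")

    # Try every one-to-one matching of true to predicted clusters and keep
    # the best mean per-true-cluster recall. Brute force: fine for small k only.
    best = max(
        sum(c[i][sigma[i]] / row_sums[i] for i in range(k)) / k
        for sigma in permutations(range(k))
    )

    # Rescale so that a mean recall of 1/k scores 0 and a perfect match scores 1.
    return (best - 1 / k) / (1 - 1 / k)


y_true = [0, 0, 0, 1, 1, 2]
relabelled = [2, 2, 2, 0, 0, 1]   # same partition, different label names
one_cluster = [0, 0, 0, 0, 0, 0]  # pathological case: everything lumped together

print(adjusted_asymmetric_accuracy(y_true, relabelled, 3))
print(adjusted_asymmetric_accuracy(y_true, one_cluster, 3))
```

Under these assumptions, a relabelled copy of the reference partition scores 1, while lumping all points into one cluster scores 0, which is the kind of edge-case behaviour the abstract asks of a well-behaved measure. Normalising by true-cluster sizes rather than by the total number of points is what makes the score asymmetric: swapping the roles of the reference and predicted labels can change the result.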


