An Information-Theoretic External Cluster-Validity Measure

12/12/2012
by Byron E Dom, et al.

In this paper we propose a measure of clustering quality or accuracy for situations where a clustering algorithm is to be evaluated by comparing the clusters it produces with "ground truth" consisting of classes assigned to the patterns by manual means or by some other means in whose veracity there is confidence. Such measures are referred to as "external". Our measure also allows clusterings with different numbers of clusters to be compared in a quantitative and principled way. Our evaluation scheme quantitatively measures how useful the cluster labels of the patterns are as predictors of their class labels. When all clusterings to be compared have the same number of clusters, the measure is equivalent to the mutual information between the cluster labels and the class labels. When the numbers of clusters differ, however, it computes the reduction in the number of bits that would be required to encode (compress) the class labels if both the encoder and the decoder have free access to the cluster labels. To achieve this encoding, the estimated conditional probabilities of the class labels given the cluster labels must also be encoded. These estimated probabilities can be seen as a model for the class labels, and their associated code length as a model cost.
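
The equal-cluster-count case described above can be illustrated with a minimal sketch that computes the empirical mutual information, in bits, between cluster labels and class labels. The function names here are hypothetical and not taken from the paper. The sketch deliberately omits the model-cost term for encoding the estimated conditional probabilities, because the abstract does not give its form; it is therefore not the full measure for clusterings with different numbers of clusters.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Empirical entropy H(C) of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def conditional_entropy(class_labels, cluster_labels):
    """Empirical conditional entropy H(C | K) of classes given clusters, in bits."""
    if len(class_labels) != len(cluster_labels):
        raise ValueError("label sequences must have the same length")
    n = len(class_labels)
    joint = Counter(zip(cluster_labels, class_labels))  # counts of (cluster, class) pairs
    cluster_sizes = Counter(cluster_labels)             # counts per cluster
    return -sum(
        (count / n) * log2(count / cluster_sizes[cluster])
        for (cluster, _), count in joint.items()
    )


def mutual_information(class_labels, cluster_labels):
    """I(C; K) = H(C) - H(C | K): bits saved per pattern when the cluster
    labels are known, ignoring any model cost (an assumption of this sketch)."""
    return entropy(class_labels) - conditional_entropy(class_labels, cluster_labels)


if __name__ == "__main__":
    classes = [0, 0, 1, 1]
    print(mutual_information(classes, ["a", "a", "b", "b"]))  # clusters match classes
    print(mutual_information(classes, ["a", "a", "a", "a"]))  # one uninformative cluster
```

In this toy example, clusters that coincide with the classes recover the full one bit of class entropy, while a single all-encompassing cluster recovers none.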

