Clustering Validation with The Area Under Precision-Recall Curves

04/04/2023
by Pablo Andretta Jaskowiak, et al.

Confusion matrices and the metrics derived from them provide a comprehensive framework for evaluating model performance in machine learning. They are well known and extensively employed in supervised learning, particularly in classification. Surprisingly, this framework has not been fully explored in the context of clustering validation. Only recently has the gap been bridged, with the introduction of the Area Under the ROC Curve for Clustering (AUCC), an internal/relative Clustering Validation Index (CVI) that allows for clustering validation in real application scenarios. In this work we explore the Area Under the Precision-Recall Curve (and related metrics) in the context of clustering validation. We show that these metrics are not only appropriate as CVIs, but should also be preferred in the presence of cluster imbalance. We perform a comprehensive evaluation of the proposed and state-of-the-art CVIs on real and simulated data sets. Our observations support a unified validation framework for supervised and unsupervised learning, given that they are consistent with existing guidelines established for the evaluation of supervised learning models.
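
The abstract does not spell out how a precision-recall curve is obtained from a clustering, so the following is only a minimal sketch under stated assumptions. It assumes the pairwise formulation associated with AUCC: every pair of objects is an instance, the pair is "positive" when the partition places both objects in the same cluster, and the pair's score is its negative Euclidean distance. The area under the PR curve is approximated with step-wise average precision. The function name `pairwise_pr_auc` and the toy data are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import dist


def pairwise_pr_auc(points, labels):
    """Average precision over object pairs (illustrative, not the paper's code).

    Assumption: a pair is positive when both objects share a cluster label,
    and pairs are ranked by negative Euclidean distance (closer = higher).
    """
    pairs = []
    for i, j in combinations(range(len(points)), 2):
        score = -dist(points[i], points[j])
        same_cluster = labels[i] == labels[j]
        pairs.append((score, same_cluster))

    n_pos = sum(same for _, same in pairs)
    if n_pos == 0:
        raise ValueError("partition has no same-cluster pairs")

    # Rank pairs from most to least similar.
    pairs.sort(key=lambda p: p[0], reverse=True)

    ap = 0.0
    tp = 0
    k = 0
    while k < len(pairs):
        # Treat tied scores as a single threshold.
        m = k
        tp_group = 0
        while m < len(pairs) and pairs[m][0] == pairs[k][0]:
            tp_group += pairs[m][1]
            m += 1
        tp += tp_group
        precision = tp / m           # m pairs retrieved so far
        recall_gain = tp_group / n_pos
        ap += recall_gain * precision
        k = m
    return ap


# Two well-separated groups of two points each (toy data).
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = pairwise_pr_auc(points, [0, 0, 1, 1])  # partition matches the geometry
bad = pairwise_pr_auc(points, [0, 1, 0, 1])   # partition cuts across the groups
```

Used as a relative index in this sketch, the candidate partition with the higher value would be preferred. Because precision is computed only over retrieved pairs, the score is not inflated by a large number of different-cluster pairs, which is the usual motivation for PR-based measures under imbalance.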

research
09/04/2020

The Area Under the ROC Curve as a Measure of Clustering Quality

The Area Under the Receiver Operating Characteristics (ROC) Curve, r...
research
08/27/2020

reval: a Python package to determine the best number of clusters with stability-based relative clustering validation

Determining the number of clusters that best partitions a dataset can be...
research
03/01/2021

Validation of cluster analysis results on validation data: A systematic framework

Cluster analysis refers to a wide range of data analytic techniques for ...
research
01/20/2015

Regroupement sémantique de définitions en espagnol

This article focuses on the description and evaluation of a new unsuperv...
research
06/18/2012

Unachievable Region in Precision-Recall Space and Its Effect on Empirical Evaluation

Precision-recall (PR) curves and the areas under them are widely used to...
research
07/10/2018

A New Variational Model for Binary Classification in the Supervised Learning Context

We examine the supervised learning problem in its continuous setting and...
research
02/13/2021

HAWKS: Evolving Challenging Benchmark Sets for Cluster Analysis

Comprehensive benchmarking of clustering algorithms is rendered difficul...
