Never mind the metrics – what about the uncertainty? Visualising confusion matrix metric distributions

06/05/2022
by David Lovell, et al.

There are strong incentives to build models that demonstrate outstanding predictive performance on various datasets and benchmarks. We believe these incentives risk a narrow focus on models and on the performance metrics used to evaluate and compare them – resulting in a growing body of literature to evaluate and compare metrics. This paper strives for a more balanced perspective on classifier performance metrics by highlighting their distributions under different models of uncertainty and showing how this uncertainty can easily eclipse differences in the empirical performance of classifiers. We begin by emphasising the fundamentally discrete nature of empirical confusion matrices and show how binary matrices can be meaningfully represented in a three-dimensional compositional lattice, whose cross-sections form the basis of the space of receiver operating characteristic (ROC) curves. We develop equations, animations and interactive visualisations of the contours of performance metrics within (and beyond) this ROC space, showing how some are affected by class imbalance. We provide interactive visualisations that show the discrete posterior predictive probability mass functions of true and false positive rates in ROC space, and how these relate to uncertainty in performance metrics such as Balanced Accuracy (BA) and the Matthews Correlation Coefficient (MCC). Our hope is that these insights and visualisations will raise greater awareness of the substantial uncertainty in performance metric estimates that can arise when classifiers are evaluated on empirical datasets and benchmarks, and that claims about classification model performance will be tempered by this understanding.
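To make the last point concrete, here is a minimal sketch (not the paper's own code) of how uncertainty in confusion-matrix counts propagates into BA and MCC: the true positive and true negative rates are given Beta posteriors under an assumed uniform Beta(1, 1) prior, replicate confusion matrices are drawn from the posterior predictive, and the resulting spread of BA and MCC is summarised. The counts and the prior are illustrative assumptions only.

```python
# Minimal sketch (illustrative assumptions, not the paper's code): propagate
# posterior-predictive uncertainty in a binary confusion matrix into BA and MCC.
import numpy as np

def ba_mcc(tp, fn, fp, tn):
    """Balanced Accuracy and Matthews Correlation Coefficient from counts."""
    tpr = tp / (tp + fn)                      # sensitivity
    tnr = tn / (tn + fp)                      # specificity
    ba = (tpr + tnr) / 2
    denom = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom
    return ba, mcc

rng = np.random.default_rng(0)
tp, fn, fp, tn = 45, 5, 10, 40                # hypothetical empirical counts
n_pos, n_neg = tp + fn, tn + fp

# Beta(1, 1) prior on TPR and TNR, then posterior-predictive counts for a
# replicate test set of the same size as the original.
tpr_post = rng.beta(tp + 1, fn + 1, size=10_000)
tnr_post = rng.beta(tn + 1, fp + 1, size=10_000)
tp_rep = rng.binomial(n_pos, tpr_post)
tn_rep = rng.binomial(n_neg, tnr_post)

ba_rep, mcc_rep = ba_mcc(tp_rep, n_pos - tp_rep, n_neg - tn_rep, tn_rep)

print("observed BA=%.3f MCC=%.3f" % ba_mcc(tp, fn, fp, tn))
print("BA  95%% interval: [%.3f, %.3f]" % tuple(np.percentile(ba_rep, [2.5, 97.5])))
print("MCC 95%% interval: [%.3f, %.3f]" % tuple(np.percentile(mcc_rep, [2.5, 97.5])))
```

With test sets of this size, the posterior-predictive intervals for BA and MCC tend to be wide relative to the metric differences often reported between competing classifiers – the kind of eclipse the abstract describes.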


