What is Your Metric Telling You? Evaluating Classifier Calibration under Context-Specific Definitions of Reliability

05/23/2022
by John Kirchenbauer, et al.

Classifier calibration has received renewed attention from the machine learning community, both for its practical utility in decision making and because modern neural network classifiers are poorly calibrated. Much of this work has pursued classifiers whose highest-probability output (the "predicted class") is calibrated. However, this narrow interpretation of classifier outputs does not adequately capture the variety of practical settings in which classifiers can aid decision making. In this work, we argue that more expressive metrics must be developed that accurately measure calibration error for the specific context in which a classifier will be deployed. To this end, we derive a number of different metrics from a generalization of Expected Calibration Error (ECE), each measuring calibration error under a different definition of reliability. We then provide an extensive empirical evaluation of commonly used neural network architectures and calibration techniques with respect to these metrics. We find that: 1) definitions of ECE that focus solely on the predicted class fail to accurately measure calibration error under a selection of practically useful definitions of reliability, and 2) many common calibration techniques fail to improve calibration performance uniformly across the ECE metrics derived from these diverse definitions of reliability.
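The abstract leaves the generalized metrics to the full paper. As a point of reference, below is a minimal NumPy sketch (the function names, binning scheme, and the choice of variant are ours, not the paper's) contrasting the standard binned top-label ECE, which scores only the predicted class, with a classwise variant from the broader calibration literature that scores every class probability against the event that the class actually occurs:

```python
import numpy as np

def top_label_ece(probs, labels, n_bins=15):
    """Standard binned ECE on the predicted class: bin the max probability
    and average the |accuracy - confidence| gap per bin, weighted by the
    fraction of samples falling in each bin."""
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece

def classwise_ece(probs, labels, n_bins=15):
    """Classwise variant: calibrate P(y = k | x) against the event {y = k}
    for every class k, then average the per-class binned errors."""
    n_classes = probs.shape[1]
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    per_class = []
    for k in range(n_classes):
        conf_k = probs[:, k]
        hit_k = (labels == k).astype(float)
        err = 0.0
        for lo, hi in zip(edges[:-1], edges[1:]):
            mask = (conf_k > lo) & (conf_k <= hi)
            if mask.any():
                err += mask.mean() * abs(hit_k[mask].mean() - conf_k[mask].mean())
        per_class.append(err)
    return float(np.mean(per_class))

# Toy usage: random softmax outputs over 5 classes.
rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 5))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
labels = rng.integers(0, 5, size=1000)
print(top_label_ece(probs, labels), classwise_ece(probs, labels))
```

A model can score well on the top-label metric while the classwise one stays poor, which illustrates why a single predicted-class metric can miss miscalibration elsewhere in the output distribution, in line with finding 1 of the abstract.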


