Stop Measuring Calibration When Humans Disagree

10/28/2022
by Joris Baan, et al.

Calibration is a popular framework to evaluate whether a classifier knows when it does not know, i.e., whether its predictive probabilities are a good indication of how likely a prediction is to be correct. Correctness is commonly estimated against the human majority class. Recently, calibration to the human majority has been measured on tasks where humans inherently disagree about which class applies. We show that measuring calibration to the human majority given inherent disagreements is theoretically problematic, demonstrate this empirically on the ChaosNLI dataset, and derive several instance-level measures of calibration that capture key statistical properties of human judgements: class frequency, ranking, and entropy.
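The abstract does not spell out the paper's exact measures, so the sketch below is only an illustration of what per-instance comparisons along the three named properties (class frequency, ranking, entropy) could look like: the function names, the metric choices (total variation distance, argsort-based rank agreement, Shannon entropy), and the example annotation counts are assumptions, not the authors' definitions.

```python
# Illustrative sketch: compare a model's predictive distribution with the
# distribution of human judgements on a single instance, along the three
# properties mentioned in the abstract. Metric choices are assumptions.
import numpy as np

def human_distribution(label_counts):
    """Normalize per-instance annotation counts into a probability distribution."""
    counts = np.asarray(label_counts, dtype=float)
    return counts / counts.sum()

def entropy(p, eps=1e-12):
    """Shannon entropy (in nats) of a probability vector."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0)
    return float(-(p * np.log(p)).sum())

def instance_level_comparison(model_probs, label_counts):
    """Compare model and human distributions on one instance."""
    p_model = np.asarray(model_probs, dtype=float)
    p_human = human_distribution(label_counts)
    return {
        # Class frequency: distance between predicted probabilities and
        # observed human label frequencies (total variation distance).
        "tvd": 0.5 * float(np.abs(p_model - p_human).sum()),
        # Ranking: does the model order the classes the same way humans do?
        "same_ranking": bool(np.array_equal(np.argsort(-p_model),
                                            np.argsort(-p_human))),
        # Entropy: does the model's uncertainty match the human disagreement?
        "entropy_gap": entropy(p_model) - entropy(p_human),
    }

# Hypothetical ChaosNLI-style instance: 100 annotations over
# (entailment, neutral, contradiction) and a model that confidently
# predicts the majority class despite genuine human disagreement.
print(instance_level_comparison(model_probs=[0.9, 0.07, 0.03],
                                label_counts=[55, 40, 5]))
```

Under this reading, a model calibrated only to the majority class can look well-calibrated while still having a large entropy gap on instances where annotators genuinely disagree, which is the kind of mismatch the instance-level view is meant to expose.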
