
Stop Measuring Calibration When Humans Disagree

10/28/2022
by Joris Baan, et al.

Calibration is a popular framework to evaluate whether a classifier knows when it does not know, i.e., whether its predictive probabilities are a good indication of how likely a prediction is to be correct. Correctness is commonly estimated against the human majority class. Recently, calibration to human majority has been measured on tasks where humans inherently disagree about which class applies. We show that measuring calibration to human majority given inherent disagreements is theoretically problematic, demonstrate this empirically on the ChaosNLI dataset, and derive several instance-level measures of calibration that capture key statistical properties of human judgements: class frequency, ranking, and entropy.
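
The abstract does not spell out the instance-level measures, but the three properties it names (class frequency, ranking, entropy) suggest simple comparisons between a model's predictive distribution and the empirical distribution of human judgements for a single instance. The sketch below is a hypothetical illustration of such comparisons using NumPy/SciPy, not the paper's own definitions; the specific distances, function names, and example probabilities are assumptions.

import numpy as np
from scipy.stats import spearmanr, entropy

def human_distribution(labels, num_classes):
    # Empirical class distribution from a list of human annotations (class indices).
    counts = np.bincount(labels, minlength=num_classes)
    return counts / counts.sum()

def class_frequency_gap(p_model, p_human):
    # Total variation distance between model probabilities and human class frequencies.
    return 0.5 * np.abs(p_model - p_human).sum()

def ranking_agreement(p_model, p_human):
    # Spearman correlation between the class orderings induced by the two distributions.
    rho, _ = spearmanr(p_model, p_human)
    return rho

def entropy_gap(p_model, p_human):
    # Difference in uncertainty: model entropy minus human entropy (in nats).
    return entropy(p_model) - entropy(p_human)

# Hypothetical 3-class NLI instance with 100 human judgements (ChaosNLI collects ~100 per item).
human_labels = np.array([0] * 55 + [1] * 30 + [2] * 15)    # entailment / neutral / contradiction
p_human = human_distribution(human_labels, num_classes=3)  # [0.55, 0.30, 0.15]
p_model = np.array([0.70, 0.20, 0.10])                     # assumed model probabilities

print(class_frequency_gap(p_model, p_human))  # 0.15
print(ranking_agreement(p_model, p_human))    # 1.0 (same class ordering)
print(entropy_gap(p_model, p_human))          # < 0: model is more confident than the humans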

