Revisiting Calibration for Question Answering

05/25/2022
by Chenglei Si, et al.

Model calibration aims to adjust (calibrate) a model's confidence so that it matches the model's expected accuracy. We argue that the traditional evaluation of calibration, expected calibration error (ECE), does not reflect the usefulness of model confidence. For example, after conventional temperature scaling, confidence scores become similar across all predictions, which makes it hard for users to distinguish correct predictions from wrong ones, even though the scaled model achieves low ECE. Building on these observations, we propose a new calibration metric, MacroCE, that better captures whether the model assigns low confidence to wrong predictions and high confidence to correct ones. We examine various conventional calibration methods, including temperature scaling, feature-based classifiers, neural answer reranking, and label smoothing, none of which brings significant gains under our new MacroCE metric. Towards more effective calibration, we propose a new calibration method based on the model's prediction consistency along the training trajectory. This new method, which we call consistency calibration, shows promise for better calibration.
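To make the contrast between the two metrics concrete, here is a minimal sketch in Python. The ECE function uses the standard equal-width binning. The abstract does not give the MacroCE formula, so `macro_calibration_error` is an assumption about what it describes: the mean of (1 - confidence) over correct predictions and the mean of confidence over wrong predictions, averaged with equal weight. All function names and the toy data are hypothetical and are not results from the paper.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: size-weighted mean of |accuracy - mean confidence| per bin."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, is_correct in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, is_correct))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece


def macro_calibration_error(confidences, correct):
    """Assumed MacroCE-style score (not taken from the abstract):
    average the error on correct predictions (1 - confidence) and on
    wrong predictions (confidence), weighting the two groups equally."""
    pos = [1.0 - conf for conf, ok in zip(confidences, correct) if ok]
    neg = [conf for conf, ok in zip(confidences, correct) if not ok]
    error_pos = sum(pos) / len(pos) if pos else 0.0
    error_neg = sum(neg) / len(neg) if neg else 0.0
    return 0.5 * (error_pos + error_neg)


# Toy data: 10 predictions, 7 correct and 3 wrong.
correct = [True] * 7 + [False] * 3

# "Flat" confidences: every prediction gets the overall accuracy,
# mimicking the collapse the abstract attributes to temperature scaling.
flat = [0.7] * 10

# "Sharp" confidences: high for correct answers, low for wrong ones.
sharp = [0.9] * 7 + [0.2] * 3

ece_flat = expected_calibration_error(flat, correct)
ece_sharp = expected_calibration_error(sharp, correct)
macro_flat = macro_calibration_error(flat, correct)
macro_sharp = macro_calibration_error(sharp, correct)

print(f"flat : ECE={ece_flat:.3f}  macro={macro_flat:.3f}")
print(f"sharp: ECE={ece_sharp:.3f}  macro={macro_sharp:.3f}")
```

On this toy data the flat confidences get a near-zero ECE even though they carry no information about which predictions are wrong, while the macro-averaged score prefers the sharp confidences. That is the mismatch the abstract argues ECE fails to expose.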

research
07/13/2022

Sample-dependent Adaptive Temperature Scaling for Improved Calibration

It is now well known that neural networks can be wrong with high confide...
research
08/16/2023

Dual-Branch Temperature Scaling Calibration for Long-Tailed Recognition

The calibration for deep neural networks is currently receiving widespre...
research
03/17/2022

Confidence Calibration for Intent Detection via Hyperspherical Space and Rebalanced Accuracy-Uncertainty Loss

Data-driven methods have achieved notable performance on intent detectio...
research
03/05/2023

Expectation consistency for calibration of neural networks

Despite their incredible performance, it is well reported that deep neur...
research
01/02/2021

Uncertainty-sensitive Activity Recognition: a Reliability Benchmark and the CARING Models

Beyond assigning the correct class, an activity recognition model should...
research
11/21/2022

AdaFocal: Calibration-aware Adaptive Focal Loss

Much recent work has been devoted to the problem of ensuring that a neur...
research
06/07/2023

Proximity-Informed Calibration for Deep Neural Networks

Confidence calibration is central to providing accurate and interpretabl...
