Calibration of Pre-trained Transformers

03/17/2020
by Shrey Desai, et al.

Pre-trained Transformers are now ubiquitous in natural language processing, but despite their high end-task performance, little is known empirically about whether they are calibrated. Specifically, do these models' posterior probabilities provide an accurate empirical measure of how likely the model is to be correct on a given example? We focus on BERT and RoBERTa in this work, and analyze their calibration across three tasks: natural language inference, paraphrase detection, and commonsense reasoning. For each task, we consider in-domain as well as challenging out-of-domain settings, where models face more examples they should be uncertain about. We show that: (1) when used out-of-the-box, pre-trained models are calibrated in-domain, and compared to baselines, their calibration error out-of-domain can be as much as 3.5x lower; (2) temperature scaling is effective at further reducing calibration error in-domain, and using label smoothing to deliberately increase empirical uncertainty helps calibrate posteriors out-of-domain.
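The abstract's key quantities can be made concrete with a short sketch. Below is a minimal NumPy implementation of expected calibration error (ECE) — binning predictions by confidence and averaging the gap between accuracy and confidence per bin — together with temperature scaling, which divides logits by a scalar T > 1 to soften overconfident posteriors. This is an illustrative sketch, not the paper's exact code; the number of bins and binning scheme are assumptions.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; a temperature T > 1 softens them."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def expected_calibration_error(probs, labels, n_bins=10):
    """ECE: bin predictions by confidence, then take the size-weighted
    average of |accuracy - mean confidence| over the bins."""
    confidences = probs.max(axis=-1)
    predictions = probs.argmax(axis=-1)
    accuracies = (predictions == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(accuracies[mask].mean()
                                     - confidences[mask].mean())
    return ece

# Toy example (made-up logits/labels): temperature scaling lowers confidence,
# which can reduce ECE when the raw model is overconfident.
logits = np.array([[4.0, 0.0], [3.0, 1.0], [0.5, 0.0]])
labels = np.array([0, 1, 0])
ece_raw = expected_calibration_error(softmax(logits), labels)
ece_scaled = expected_calibration_error(softmax(logits, temperature=2.0), labels)
```

In practice the temperature is fit on a held-out development set (minimizing negative log-likelihood) and leaves the argmax prediction — and hence accuracy — unchanged.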
