Making Heads and Tails of Models with Marginal Calibration for Sparse Tagsets

09/15/2021
by Michael Kranzlein, et al.

For interpreting the behavior of a probabilistic model, it is useful to measure the model's calibration: the extent to which it produces reliable confidence scores. We address the open problem of calibration for tagging models with sparse tagsets and recommend strategies to measure and reduce calibration error (CE) in such models. We show that several post-hoc recalibration techniques all reduce CE across the marginal distribution for two existing sequence taggers. Moreover, we propose tag frequency grouping (TFG) as a way to measure CE in different frequency bands. Further, recalibrating each group separately promotes a more equitable reduction of CE across the tag frequency spectrum.
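To make the abstract's notion of "calibration error across the marginal distribution" concrete, the sketch below computes a generic binned calibration error over (token, tag) pairs: the probability a tagger assigns to a tag and a 0/1 indicator of whether that tag is the gold label. This is a standard ECE-style estimator written as an illustration, not necessarily the exact estimator used in the paper; the function name and binning scheme are assumptions.

```python
import numpy as np

def binned_calibration_error(confidences, correct, n_bins=10):
    """Generic binned calibration error: the bin-weighted average gap between
    mean confidence and empirical accuracy. Inputs are flat arrays over
    (token, tag) pairs: the probability assigned to a tag and a 0/1 indicator
    of whether that tag is the gold label."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ce = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(confidences[in_bin].mean() - correct[in_bin].mean())
            ce += in_bin.mean() * gap  # weight the gap by the bin's mass
    return ce
```

Along the same lines, tag frequency grouping could be approximated by banding tags by their training-set frequency and fitting a separate post-hoc recalibrator per band (here a Platt-style logistic fit, one common post-hoc choice). Everything below, including the equal-size head/torso/tail split and the helper names, is an assumed sketch rather than the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def frequency_bands(tag_counts, n_groups=3):
    """Split tags into n_groups frequency bands (0 = most frequent)."""
    order = np.argsort(-np.asarray(tag_counts))
    band_of = np.empty(len(tag_counts), dtype=int)
    for band, chunk in enumerate(np.array_split(order, n_groups)):
        band_of[chunk] = band
    return band_of

def fit_bandwise_recalibrators(probs, gold, band_of, eps=1e-12):
    """Fit one Platt-style recalibrator per frequency band on held-out data.

    probs:   (n_tokens, n_tags) predicted marginal tag probabilities
    gold:    (n_tokens,) gold tag indices
    band_of: (n_tags,) frequency band of each tag (e.g. from frequency_bands)
    """
    n_tokens, n_tags = probs.shape
    gold = np.asarray(gold)
    # Flatten to (token, tag) pairs: confidence plus correctness indicator.
    conf = probs.ravel()
    correct = np.zeros(n_tokens * n_tags)
    correct[np.arange(n_tokens) * n_tags + gold] = 1.0
    pair_band = np.tile(band_of, n_tokens)

    # Recalibrate in logit space, one logistic fit per frequency band.
    logits = np.log(np.clip(conf, eps, 1 - eps) / np.clip(1 - conf, eps, 1 - eps))
    recalibrators = {}
    for band in np.unique(band_of):
        mask = pair_band == band
        if correct[mask].min() == correct[mask].max():
            # Degenerate band (all-correct or all-incorrect pairs); skip it.
            continue
        recalibrators[band] = LogisticRegression().fit(
            logits[mask, None], correct[mask]
        )
    return recalibrators
```

At prediction time, each (token, tag) probability would be mapped through the recalibrator for that tag's band, which is how per-group recalibration can equalize calibration error between frequent (head) and rare (tail) tags.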

Related research

Mitigating bias in calibration error estimation (12/15/2020)
Building reliable machine learning systems requires that we correctly un...

Localized Calibration: Metrics and Recalibration (02/22/2021)
Probabilistic classifiers output confidence scores along with their pred...

Calibrating sufficiently (05/15/2021)
When probabilistic classifiers are trained and calibrated, the so-called...

Beyond calibration: estimating the grouping loss of modern neural networks (10/28/2022)
Good decision making requires machine-learning models to provide trustwo...

Honest calibration assessment for binary outcome predictions (03/08/2022)
Probability predictions from binary regressions or machine learning meth...

On the Richness of Calibration (02/08/2023)
Probabilistic predictions can be evaluated through comparisons with obse...

Calibrated Interpretation: Confidence Estimation in Semantic Parsing (11/14/2022)
Task-oriented semantic parsing is increasingly being used in user-facing...
