On the Richness of Calibration

02/08/2023
by Benedikt Höltgen et al.

Probabilistic predictions can be evaluated through comparisons with observed label frequencies, that is, through the lens of calibration. Recent scholarship on algorithmic fairness has started to look at a growing variety of calibration-based objectives under the name of multi-calibration, but has remained fairly restricted. In this paper, we explore and analyse forms of evaluation through calibration by making explicit the choices involved in designing calibration scores. We organise these into three grouping choices and a choice concerning the agglomeration of group errors. This provides a framework for comparing previously proposed calibration scores and helps to formulate novel ones with desirable mathematical properties. In particular, we explore the possibility of grouping datapoints based on their input features rather than on predictions, and formally demonstrate advantages of such approaches. We also characterise the space of suitable agglomeration functions for group errors, generalising previously proposed calibration scores. Complementary to such population-level scores, we explore calibration scores at the individual level and analyse their relationship to choices of grouping. We draw on these insights to introduce and axiomatise fairness deviation measures for population-level scores. We demonstrate that, with appropriate choices of grouping, these novel global fairness scores can provide notions of (sub-)group or individual fairness.
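
To make the two design choices named in the abstract concrete (how datapoints are grouped, and how group errors are agglomerated), here is a minimal Python sketch. It is illustrative only and is not taken from the paper: the function names (`calibration_score`, `bins_by_prediction`, `weighted_mean`, `worst_group`), the use of the absolute gap between mean prediction and label frequency as the group error, and the toy data are all assumptions.

```python
import numpy as np


def calibration_score(preds, labels, groups, agglomerate):
    """Agglomerate per-group calibration errors into one population-level score.

    Assumption: the group error is the absolute gap between the mean
    prediction and the observed label frequency within the group.
    """
    preds = np.asarray(preds, dtype=float)
    labels = np.asarray(labels, dtype=float)
    groups = np.asarray(groups)
    errors, weights = [], []
    for g in np.unique(groups):
        mask = groups == g
        errors.append(abs(preds[mask].mean() - labels[mask].mean()))
        weights.append(mask.mean())  # fraction of datapoints in this group
    return agglomerate(np.array(errors), np.array(weights))


def weighted_mean(errors, weights):
    """Agglomeration by size-weighted average (an ECE-style choice)."""
    return float(np.sum(weights * errors))


def worst_group(errors, weights):
    """Agglomeration by the largest group error."""
    return float(np.max(errors))


def bins_by_prediction(preds, n_bins=2):
    """Grouping choice 1: equal-width bins over the predicted probability."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    return np.digitize(preds, edges)


# Toy data (hypothetical): a constant predictor and one binary input feature.
preds = np.array([0.5, 0.5, 0.5, 0.5])
labels = np.array([1, 1, 0, 0])
feature = np.array([0, 0, 1, 1])  # Grouping choice 2: group by input feature

score_by_pred = calibration_score(preds, labels, bins_by_prediction(preds), weighted_mean)
score_by_feat = calibration_score(preds, labels, feature, weighted_mean)
score_worst = calibration_score(preds, labels, feature, worst_group)
print(score_by_pred, score_by_feat, score_worst)
```

In this toy case the predictor looks perfectly calibrated when datapoints are grouped by prediction, while grouping by the input feature exposes a gap of 0.5 in each group. This is only meant to show why the choice of grouping matters for the score; it does not reproduce the paper's formal results.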


