Statistical Inference for Fairness Auditing

05/05/2023
by John J. Cherian et al.

Before deploying a black-box model in high-stakes problems, it is important to evaluate the model's performance on sensitive subpopulations. For example, in a recidivism prediction task, we may wish to identify demographic groups for which our prediction model has unacceptably high false positive rates or certify that no such groups exist. In this paper, we frame this task, often referred to as "fairness auditing," in terms of multiple hypothesis testing. We show how the bootstrap can be used to simultaneously bound performance disparities over a collection of groups with statistical guarantees. Our methods can be used to flag subpopulations affected by model underperformance, and certify subpopulations for which the model performs adequately. Crucially, our audit is model-agnostic and applicable to nearly any performance metric or group fairness criterion. Our methods also accommodate extremely rich – even infinite – collections of subpopulations. Further, we generalize beyond subpopulations by showing how to assess performance over certain distribution shifts. We test the proposed methods on benchmark datasets in predictive inference and algorithmic fairness and find that our audits can provide interpretable and trustworthy guarantees.
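To make the core idea concrete, here is a minimal sketch of the kind of bootstrap-based simultaneous bound the abstract describes. This is a simplified, hypothetical illustration rather than the authors' implementation: it bootstraps the largest upward deviation of group-wise false positive rate estimates from their point estimates and uses the (1 − alpha) quantile of that maximum to produce upper confidence bounds that hold simultaneously over all groups. The function names, the choice of alpha, and the plain nonparametric resampling scheme are assumptions made for illustration only.

```python
# A minimal sketch (not the paper's implementation) of bootstrapping
# simultaneous upper confidence bounds on group-wise false positive rates.
import numpy as np

def group_fpr(y_true, y_pred, mask):
    """False positive rate restricted to the rows selected by `mask`."""
    neg = (y_true == 0) & mask
    if neg.sum() == 0:
        return np.nan
    return float(((y_pred == 1) & neg).sum() / neg.sum())

def simultaneous_fpr_bounds(y_true, y_pred, groups, alpha=0.05,
                            n_boot=2000, seed=0):
    """Upper confidence bounds on each group's FPR, valid simultaneously.

    Bootstraps the maximum deviation of the group FPR estimates from their
    point estimates and adds the (1 - alpha) quantile of that maximum to
    every group's point estimate.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    groups = np.asarray(groups)
    rng = np.random.default_rng(seed)
    labels = np.unique(groups)
    n = len(y_true)
    point = np.array([group_fpr(y_true, y_pred, groups == g) for g in labels])

    max_dev = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)      # resample rows with replacement
        boot = np.array([group_fpr(y_true[idx], y_pred[idx], groups[idx] == g)
                         for g in labels])
        max_dev[b] = np.nanmax(boot - point)  # worst-case upward deviation
    crit = np.quantile(max_dev, 1 - alpha)
    return dict(zip(labels, point + crit))    # simultaneous upper bounds
```

With bounds like these, a group whose upper bound falls below a chosen tolerance could be certified as adequately served, while an analogous lower-bound construction could flag groups with unacceptably high false positive rates. The paper's actual procedure is more general, handling other performance metrics, richer (even infinite) collections of subpopulations, and certain distribution shifts.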


Related research

05/11/2022 · Is calibration a fairness requirement? An argument from the point of view of moral philosophy and decision theory
02/13/2023 · The Possibility of Fairness: Revisiting the Impossibility Theorem in Practice
09/03/2020 · Fairness in the Eyes of the Data: Certifying Machine-Learning Models
08/21/2022 · Statistical Methods for Assessing Differences in False Non-Match Rates Across Demographic Groups
06/08/2020 · Achieving Equalized Odds by Resampling Sensitive Attributes
10/14/2020 · Exchanging Lessons Between Algorithmic Fairness and Domain Generalization
11/18/2018 · Understanding Learned Models by Identifying Important Features at the Right Resolution
