Provable Detection of Propagating Sampling Bias in Prediction Models

02/13/2023
by Pavan Ravishankar, et al.

With an increased focus on incorporating fairness in machine learning models, it becomes imperative not only to assess and mitigate bias at each stage of the machine learning pipeline but also to understand the downstream impacts of bias across stages. Here we consider a general, but realistic, scenario in which a predictive model is learned from (potentially biased) training data, and model predictions are assessed post-hoc for fairness by some auditing method. We provide a theoretical analysis of how a specific form of data bias, differential sampling bias, propagates from the data stage to the prediction stage. Unlike prior work, we evaluate the downstream impacts of data biases quantitatively rather than qualitatively and prove theoretical guarantees for detection. Under reasonable assumptions, we quantify how the amount of bias in the model predictions varies as a function of the amount of differential sampling bias in the data, and at what point this bias becomes provably detectable by the auditor. Through experiments on two criminal justice datasets – the well-known COMPAS dataset and historical data from NYPD's stop and frisk policy – we demonstrate that the theoretical results hold in practice even when our assumptions are relaxed.
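
The propagation described above can be illustrated with a toy simulation. The sketch below is not the paper's method. It assumes two groups with the same true positive rate `p`, and models differential sampling bias by keeping each negative record of one group with probability `keep`, which is a hypothetical mechanism chosen for illustration. The "model" is the simplest possible predictor, the per-group positive rate observed in the sample, so the gap between the two groups' predicted rates shows how much of the sampling bias reaches the prediction stage. The names `biased_rate`, `simulate_gap`, `p`, and `keep` are all assumptions introduced here.

```python
import random


def biased_rate(p, keep):
    """Closed-form positive rate in the sample when each negative record
    is kept with probability `keep` and every positive record is kept."""
    return p / (p + (1.0 - p) * keep)


def simulate_gap(p=0.3, keep=0.5, n=100_000, seed=0):
    """Draw n records per group and return the difference between the
    per-group predicted positive rates (biased group minus unbiased group).

    Group 0 is sampled without bias (keep probability 1.0).
    Group 1 loses a fraction (1 - keep) of its negative records.
    """
    rng = random.Random(seed)
    rates = []
    for group_keep in (1.0, keep):
        pos = neg = 0
        for _ in range(n):
            if rng.random() < p:
                pos += 1  # positives are always retained
            elif rng.random() < group_keep:
                neg += 1  # negatives are retained with probability group_keep
        rates.append(pos / (pos + neg))
    return rates[1] - rates[0]


if __name__ == "__main__":
    # The gap in predictions grows as the sampling becomes more differential.
    for keep in (1.0, 0.8, 0.5):
        expected = biased_rate(0.3, keep) - 0.3
        print(keep, round(simulate_gap(keep=keep), 4), round(expected, 4))
```

In this toy setting the gap in predicted rates is zero when `keep` is 1.0 and widens as `keep` shrinks. An auditor comparing the two groups' predictions would need the gap to exceed sampling noise before flagging it, which is the intuition behind a detection threshold. The paper's actual assumptions, model class, and auditing method may differ from this sketch.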

research
10/24/2020

Efficiently Mitigating Classification Bias via Transfer Learning

Prediction bias in machine learning models refers to unintended model be...
research
02/24/2022

Attainability and Optimality: The Equalized Odds Fairness Revisited

Fairness of machine learning algorithms has been of increasing interest....
research
01/15/2019

Identifying and Correcting Label Bias in Machine Learning

Datasets often contain biases which unfairly disadvantage certain groups...
research
10/25/2021

DECAF: Generating Fair Synthetic Data Using Causally-Aware Generative Networks

Machine learning models have been criticized for reflecting unfair biase...
research
09/28/2020

Why resampling outperforms reweighting for correcting sampling bias

A data set sampled from a certain population is biased if the subgroups ...
research
04/20/2023

The Dataset Multiplicity Problem: How Unreliable Data Impacts Predictions

We introduce dataset multiplicity, a way to study how inaccuracies, unce...
research
07/31/2021

Bayesian analysis of the prevalence bias: learning and predicting from imbalanced data

Datasets are rarely a realistic approximation of the target population. ...
