Shedding light on underrepresentation and Sampling Bias in machine learning

06/08/2023
by Sami Zhioua, et al.

Accurately measuring discrimination is crucial to faithfully assessing the fairness of trained machine learning (ML) models. Any bias in measuring discrimination leads to either amplification or underestimation of the existing disparity. Several sources of bias exist, and it is assumed that the bias resulting from machine learning is borne equally by different groups (e.g., females vs. males, whites vs. blacks). If, however, bias is borne differently by different groups, it may exacerbate discrimination against specific sub-populations. The term sampling bias is used inconsistently in the literature to describe bias due to the sampling procedure. In this paper, we attempt to disambiguate this term by introducing clearly defined variants of sampling bias, namely sample size bias (SSB) and underrepresentation bias (URB). We also show how discrimination can be decomposed into variance, bias, and noise. Finally, we challenge the commonly accepted mitigation approach of addressing discrimination by collecting more samples of the underrepresented group.
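The decomposition of discrimination into variance, bias, and noise can be illustrated with a toy Monte Carlo sketch. It rests on assumptions of our own, not the paper's definitions: discrimination is taken to be the gap in expected squared loss between group 1 and group 0, the standard squared-loss decomposition (noise + squared bias + variance) is applied separately to each group, and the "model" is just a constant predictor. The functions `decompose` and `discrimination`, the Gaussian targets, and all parameter values are hypothetical. The pooled predictor loosely mimics a model dominated by the majority group, while the group-aware predictor loosely mimics a model whose estimate for the minority group rests on few samples.

```python
import random
import statistics


def decompose(n0, n1, mu=(0.0, 1.0), sigma=1.0, group_aware=False,
              trials=4000, seed=0):
    """Per-group bias-variance-noise decomposition of squared loss (toy sketch).

    Hypothetical setup: group a has targets y ~ N(mu[a], sigma^2). Each trial
    draws a fresh training set with n0 and n1 samples from groups 0 and 1 and
    fits a constant predictor: the pooled mean (group_aware=False) or the
    group's own mean (group_aware=True, which needs n0 >= 1 and n1 >= 1).
    """
    rng = random.Random(seed)
    sizes = (n0, n1)
    preds = ([], [])   # prediction for each group, one per training set
    sq_err = ([], [])  # squared error on a fresh test target, one per trial
    for _ in range(trials):
        train = [[rng.gauss(mu[a], sigma) for _ in range(sizes[a])] for a in (0, 1)]
        pooled = statistics.fmean(train[0] + train[1])
        for a in (0, 1):
            h = statistics.fmean(train[a]) if group_aware else pooled
            y = rng.gauss(mu[a], sigma)  # independent test target for group a
            preds[a].append(h)
            sq_err[a].append((y - h) ** 2)

    terms = []
    for a in (0, 1):
        mean_h = statistics.fmean(preds[a])
        terms.append({
            "loss": statistics.fmean(sq_err[a]),  # direct Monte Carlo estimate
            "noise": sigma ** 2,                  # Var(y | A=a), known here
            "bias2": (mean_h - mu[a]) ** 2,       # (E_D[h] - E[y | A=a])^2
            "variance": statistics.pvariance(preds[a], mu=mean_h),  # Var_D[h]
        })
    return terms


def discrimination(terms):
    """Gap (group 1 minus group 0) in the loss and in each of its components."""
    return {k: terms[1][k] - terms[0][k] for k in terms[0]}


if __name__ == "__main__":
    # 90 training samples from group 0, only 10 from the underrepresented group 1.
    for aware in (False, True):
        gaps = discrimination(decompose(n0=90, n1=10, group_aware=aware))
        print("group_aware =", aware, {k: round(v, 3) for k, v in gaps.items()})
```

With these settings the pooled predictor should show a loss gap of roughly 0.8 that is almost entirely squared bias, because the pooled mean sits close to group 0's mean. The group-aware predictor should show a much smaller gap of roughly 0.09 that is almost entirely variance, because group 1's mean is estimated from only 10 samples. In both cases the directly measured loss gap should approximately equal the sum of the noise, bias, and variance gaps; the exact printed values depend on the Monte Carlo draws.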
