Algorithmic Bias and Data Bias: Understanding the Relation between Distributionally Robust Optimization and Data Curation

07/24/2021
by Agnieszka Słowik, et al.

Machine learning systems based on minimizing average error have been shown to perform inconsistently across notable subsets of the data, a failure that a low average error over the entire dataset does not expose. In consequential social and economic applications, where data represent people, this can lead to discrimination against under-represented gender and ethnic groups. Given the importance of bias mitigation in machine learning, the topic leads to contentious debates on how to ensure fairness in practice (data bias versus algorithmic bias). Distributionally Robust Optimization (DRO) seemingly addresses this problem by minimizing the worst expected risk across subpopulations. We establish theoretical results that clarify the relation between DRO and the optimization of the same loss averaged on an adequately weighted training dataset. The results cover both finite and infinite numbers of training distributions, as well as convex and non-convex loss functions. We show that neither DRO nor curating the training set should be construed as a complete solution for bias mitigation: in the same way that there is no universally robust training set, there is no universal way to set up a DRO problem and ensure a socially acceptable set of results. We then leverage these insights to provide a minimal set of practical recommendations for addressing bias with DRO. Finally, we discuss the ramifications of our results in other related applications of DRO, using an example of adversarial robustness. Our results show that there is merit to both the algorithm-focused and the data-focused side of the bias debate, as long as arguments in favor of these positions are precisely qualified and backed by the relevant mathematics known today.
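As a rough illustration of the relation the abstract describes (a minimal sketch, not code from the paper; the group losses are hypothetical), group-level DRO minimizes the maximum of the per-subpopulation risks, and at any fixed model this worst-case risk coincides with an average loss under adversarially chosen group weights:

```python
import numpy as np

# Hypothetical expected losses of one fixed model on three subpopulations.
group_losses = np.array([0.8, 0.3, 0.5])

# DRO objective at this model: the worst expected risk across subpopulations.
dro_objective = group_losses.max()

# Weighted-average view: the adversarial weights put all mass on the
# worst-off group, so the reweighted average loss equals the DRO objective.
weights = np.zeros_like(group_losses)
weights[group_losses.argmax()] = 1.0
weighted_objective = float(weights @ group_losses)

assert np.isclose(dro_objective, weighted_objective)
```

This equivalence holds pointwise per model; the paper's contribution concerns when the two *optimization problems* (DRO versus training on a suitably reweighted dataset) coincide, which is where convexity and the number of distributions matter.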

