FLEA: Provably Fair Multisource Learning from Unreliable Training Data

06/22/2021
by Eugenia Iofinova, et al.

Fairness-aware learning aims to construct classifiers that not only make accurate predictions but also do not discriminate against specific groups. It is a fast-growing area of machine learning with far-reaching societal impact. However, existing fair learning methods are vulnerable to accidental or malicious artifacts in the training data, which can cause them to unknowingly produce unfair classifiers. In this work, we address the problem of fair learning from unreliable training data in the robust multisource setting, where the available training data comes from multiple sources, a fraction of which might not be representative of the true data distribution. We introduce FLEA, a filtering-based algorithm that allows the learning system to identify and suppress those data sources that would have a negative impact on fairness or accuracy if they were used for training. We demonstrate the effectiveness of our approach through a diverse range of experiments on multiple datasets. Additionally, we prove formally that, given enough data, FLEA protects the learner against unreliable data as long as the fraction of affected data sources is less than half.
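
To make the idea of source filtering concrete, the following is a minimal sketch of a generic majority-based filter. It is not the FLEA algorithm as specified in the paper; the abstract does not give the dissimilarity measure or the selection rule. The sketch assumes each source has already been reduced to a small summary vector (here, hypothetically, the positive-label rate in each of two protected groups), and the function names `dissimilarity` and `filter_sources` are illustrative only.

```python
from statistics import median_low


def dissimilarity(a, b):
    """Hypothetical distance between two source summaries:
    the largest absolute difference across their statistics."""
    return max(abs(x - y) for x, y in zip(a, b))


def filter_sources(summaries):
    """Return the indices of sources that look consistent with the majority.

    Each source is scored by its (lower) median distance to all other
    sources. Sources whose score does not exceed the (lower) median score
    are kept, so at least half of the sources are always retained.
    """
    n = len(summaries)
    scores = []
    for i in range(n):
        dists = [
            dissimilarity(summaries[i], summaries[j])
            for j in range(n)
            if j != i
        ]
        scores.append(median_low(dists))
    cutoff = median_low(scores)
    return [i for i in range(n) if scores[i] <= cutoff]


if __name__ == "__main__":
    # Assumed toy data: (positive rate in group A, positive rate in group B).
    # Sources 3 and 4 are deliberately skewed to mimic unreliable sources.
    summaries = [
        (0.50, 0.48),
        (0.52, 0.50),
        (0.49, 0.47),
        (0.90, 0.10),
        (0.10, 0.85),
    ]
    print(filter_sources(summaries))
```

The median is used here because, when fewer than half of the sources are unreliable, a reliable source still has reliable neighbors near the middle of its sorted distances. A real system would need a dissimilarity measure that captures both accuracy and fairness, which this toy summary does not attempt.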


