Adaptive Sampling to Reduce Disparate Performance

by Jacob Abernethy, et al.

Existing methods for reducing disparate performance of a classifier across different demographic groups assume that one has access to a large data set, thereby focusing on the algorithmic aspect of optimizing overall performance subject to additional constraints. However, poor data collection and imbalanced data sets can severely affect the quality of these methods. In this work, we consider a setting where data collection and optimization are performed simultaneously. In such a scenario, a natural strategy to mitigate the performance difference of the classifier is to provide additional training data drawn from the demographic groups that are worse off. In this paper, we propose to consistently follow this strategy throughout the whole training process and to guide the resulting classifier towards equal performance on the different groups by adaptively sampling each data point from the group that is currently disadvantaged. We provide a rigorous theoretical analysis of our approach in a simplified one-dimensional setting and an extensive experimental evaluation on numerous real-world data sets, including a case study on the data collected during the Flint water crisis.
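The sampling strategy described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic one-dimensional groups, the logistic model, the use of validation error rate to decide which group is "currently disadvantaged", and all names (`make_group`, `error_rate`, `adaptive_sampling_sgd`) and hyperparameters are assumptions made for illustration only.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_group(rng, n, mean_pos, mean_neg):
    # Hypothetical 1-D group: label-1 points cluster around mean_pos,
    # label-0 points around mean_neg (unit variance).
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = rng.gauss(mean_pos if y == 1 else mean_neg, 1.0)
        data.append((x, y))
    return data


def error_rate(w, b, data):
    # Fraction of points misclassified by the threshold-0.5 logistic model.
    wrong = sum(1 for x, y in data if (sigmoid(w * x + b) >= 0.5) != (y == 1))
    return wrong / len(data)


def adaptive_sampling_sgd(pools, val_sets, steps=500, lr=0.05, seed=0):
    # At every step, draw the next training point from the group whose
    # validation error is currently highest (assumed criterion), then take
    # one SGD step on the logistic loss for that point.
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    counts = {g: 0 for g in pools}
    for _ in range(steps):
        worst = max(pools, key=lambda g: error_rate(w, b, val_sets[g]))
        x, y = rng.choice(pools[worst])
        counts[worst] += 1
        p = sigmoid(w * x + b)
        w -= lr * (p - y) * x
        b -= lr * (p - y)
    return w, b, counts


# Usage on two synthetic groups whose class means differ (illustrative values).
data_rng = random.Random(1)
pools = {"A": make_group(data_rng, 300, 2.0, -2.0),
         "B": make_group(data_rng, 300, 1.0, -0.5)}
val_sets = {"A": make_group(data_rng, 100, 2.0, -2.0),
            "B": make_group(data_rng, 100, 1.0, -0.5)}
w, b, counts = adaptive_sampling_sgd(pools, val_sets)
per_group_error = {g: error_rate(w, b, val_sets[g]) for g in val_sets}
print(counts, per_group_error)
```

The `counts` dictionary records how many training points were drawn from each group, which makes it easy to see how the sampling effort was split between them over the course of training.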



Adaptive Sampling Strategies to Construct Equitable Training Datasets

In domains ranging from computer vision to natural language processing, ...

Classification of Imbalanced Credit scoring data sets Based on Ensemble Method with the Weighted-Hybrid-Sampling

In the era of big data, the utilization of credit-scoring models to dete...

Recovering from Biased Data: Can Fairness Constraints Improve Accuracy?

Multiple fairness constraints have been proposed in the literature, moti...

Improving fairness in speaker verification via Group-adapted Fusion Network

Modern speaker verification models use deep neural networks to encode ut...

Training Fair Deep Neural Networks by Balancing Influence

Most fair machine learning methods either highly rely on the sensitive i...

Understanding the Representation and Representativeness of Age in AI Data Sets

A diverse representation of different demographic groups in AI training ...