Adaptive Sampling to Reduce Disparate Performance

by Jacob Abernethy et al.

Existing methods for reducing a classifier's disparate performance across demographic groups assume access to a large data set and therefore focus on the algorithmic problem of optimizing overall performance subject to additional constraints. However, poor data collection and imbalanced data sets can severely degrade the quality of these methods. In this work, we consider a setting where data collection and optimization are performed simultaneously. Here, a natural strategy for mitigating the classifier's performance gap is to provide additional training data drawn from the demographic groups that are worse off. We propose to follow this strategy consistently throughout the entire training process, guiding the resulting classifier towards equal performance on the different groups by adaptively sampling each new data point from the group that is currently disadvantaged. We provide a rigorous theoretical analysis of our approach in a simplified one-dimensional setting and an extensive experimental evaluation on numerous real-world data sets, including a case study on data collected during the Flint water crisis.
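The adaptive-sampling loop described above can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: the two toy 1-D group distributions, the logistic model, and the SGD update are all assumptions made for the sake of a runnable example. The key line is the adaptive choice of which group to sample from at each step.

```python
# Sketch of adaptive sampling: at each training step, draw the next example
# from whichever demographic group the current model performs worse on.
# Group distributions, model, and step size are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def draw(group, n=1):
    """Sample (x, y) pairs from a toy 1-D distribution per group (assumed)."""
    y = rng.integers(0, 2, n)
    if group == 0:                         # group 0: well-separated classes
        x = rng.normal(2.0 * y - 1.0, 0.5, n)
    else:                                  # group 1: noisier, harder classes
        x = rng.normal(1.0 * y - 0.5, 1.0, n)
    return x, y

def loss(w, b, x, y):
    """Mean logistic loss of the linear model sigmoid(w*x + b)."""
    s = 2 * y - 1                          # labels in {-1, +1}
    return np.mean(np.log1p(np.exp(-s * (w * x + b))))

# Held-out sets used only to decide which group is currently disadvantaged.
val = {g: draw(g, 500) for g in (0, 1)}

w, b, lr = 0.0, 0.0, 0.1
for t in range(2000):
    # Adaptive choice: sample from the group with the larger current loss.
    g = max((0, 1), key=lambda g: loss(w, b, *val[g]))
    x, y = draw(g, 1)
    s = 2 * y[0] - 1
    p = 1.0 / (1.0 + np.exp(s * (w * x[0] + b)))   # sigmoid(-s * z)
    w += lr * p * s * x[0]                 # SGD step on the logistic loss
    b += lr * p * s

print({g: round(loss(w, b, *val[g]), 3) for g in (0, 1)})
```

Because the next sample is always drawn from the currently worse-off group, the harder group receives more of the sampling budget and the two held-out losses are pulled towards each other, which is the equalizing behavior the abstract describes.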






