Retiring Adult: New Datasets for Fair Machine Learning

08/10/2021
by Frances Ding, et al.

Although the fairness community has recognized the importance of data, researchers in the area primarily rely on UCI Adult when it comes to tabular data. Derived from a 1994 US Census survey, this dataset has appeared in hundreds of research papers where it served as the basis for the development and comparison of many algorithmic fairness interventions. We reconstruct a superset of the UCI Adult data from available US Census sources and reveal idiosyncrasies of the UCI Adult dataset that limit its external validity. Our primary contribution is a suite of new datasets derived from US Census surveys that extend the existing data ecosystem for research on fair machine learning. We create prediction tasks relating to income, employment, health, transportation, and housing. The data span multiple years and all states of the United States, allowing researchers to study temporal shift and geographic variation. We highlight a broad initial sweep of new empirical insights relating to trade-offs between fairness criteria, performance of algorithmic interventions, and the role of distribution shift based on our new datasets. Our findings inform ongoing debates, challenge some existing narratives, and point to future research directions. Our datasets are available at https://github.com/zykls/folktables.
