
Who Decides if AI is Fair? The Labels Problem in Algorithmic Auditing

by Abhilash Mishra et al.

Labelled "ground truth" datasets are routinely used to evaluate and audit AI algorithms applied in high-stakes settings. However, there are no widely accepted benchmarks for the quality of labels in these datasets. We provide empirical evidence that label quality can significantly distort the results of algorithmic audits in real-world settings. Using data annotators typically hired by AI firms in India, we show that low-fidelity ground truth data can produce spurious differences in the performance of automatic speech recognition (ASR) systems between urban and rural populations. After a rigorous, albeit expensive, label-cleaning process, these disparities between groups disappear. Our findings highlight how trade-offs between label quality and data annotation costs can complicate algorithmic audits in practice. They also underscore the need for consensus-driven, widely accepted benchmarks for label quality.
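The mechanism behind spurious audit disparities can be illustrated with a toy calculation (this is an illustrative sketch, not the paper's method). Assume the audit scores an ASR system against annotator labels, that label errors are symmetric and independent of model errors, and that the numeric rates below are hypothetical: if both groups share the same true error rate but annotators mislabel one group's speech more often, the audited error rates diverge.

```python
# Toy model (assumption, not from the paper): symmetric, independent label
# noise inflates an audited error rate and creates a spurious group gap.

def audited_error(true_err: float, label_noise: float) -> float:
    """Probability that the ASR output disagrees with the (noisy) label.

    A disagreement is scored when exactly one of the model output and the
    annotator's label is wrong, assuming the two errors are independent.
    """
    return true_err * (1 - label_noise) + (1 - true_err) * label_noise

TRUE_ERR = 0.10  # assume the *same* true ASR error rate for both groups

# Hypothetical annotation-noise rates: transcribers make more mistakes
# on rural speech than on urban speech.
urban = audited_error(TRUE_ERR, label_noise=0.02)  # 0.116
rural = audited_error(TRUE_ERR, label_noise=0.15)  # 0.220

print(f"urban audited error: {urban:.3f}")
print(f"rural audited error: {rural:.3f}")

# After a label-cleaning pass drives annotation noise to ~0, the audited
# rates converge to the true rate and the apparent disparity disappears:
print(f"cleaned, both groups: {audited_error(TRUE_ERR, 0.0):.3f}")
```

Under this model the measured disparity (0.220 vs. 0.116) is entirely an artifact of differential label noise, which is why cleaning the labels makes it vanish.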



