Whose Ground Truth? Accounting for Individual and Collective Identities Underlying Dataset Annotation

12/08/2021
by Emily Denton, et al.

Human annotations play a crucial role in machine learning (ML) research and development. However, the ethical considerations around the processes and decisions that go into building ML datasets have not received nearly enough attention. In this paper, we survey an array of literature that provides insights into ethical considerations around crowdsourced dataset annotation. We synthesize these insights and lay out the challenges in this space along two layers: (1) who the annotator is, and how the annotators' lived experiences can impact their annotations, and (2) the relationship between the annotators and the crowdsourcing platforms, and what that relationship affords them. Finally, we put forth a concrete set of recommendations and considerations for dataset developers at various stages of the ML data pipeline: task formulation, selection of annotators, platform and infrastructure choices, dataset analysis and evaluation, and dataset documentation and release.


