CrowdWorkSheets: Accounting for Individual and Collective Identities Underlying Crowdsourced Dataset Annotation

06/09/2022
by Mark Díaz, et al.
Human-annotated data plays a crucial role in machine learning (ML) research and development. However, the ethical considerations around the processes and decisions that go into dataset annotation have not received nearly enough attention. In this paper, we survey an array of literature that provides insights into ethical considerations around crowdsourced dataset annotation. We synthesize these insights and lay out the challenges in this space along two layers: (1) who the annotator is, and how the annotators' lived experiences can impact their annotations, and (2) the relationship between the annotators and the crowdsourcing platforms, and what that relationship affords them. Finally, we introduce a novel framework, CrowdWorkSheets, for dataset developers to facilitate transparent documentation of key decision points at various stages of the data annotation pipeline: task formulation, selection of annotators, platform and infrastructure choices, dataset analysis and evaluation, and dataset release and maintenance.

