Resolving the Human Subjects Status of Machine Learning's Crowdworkers

06/08/2022
by Divyansh Kaushik, et al.

In recent years, machine learning (ML) has come to rely more heavily on crowdworkers, both for building bigger datasets and for addressing research questions requiring human interaction or judgment. Owing to the diverse tasks performed by crowdworkers, and the myriad ways the resulting datasets are used, it can be difficult to determine when these individuals are best thought of as workers, versus as human subjects. These difficulties are compounded by conflicting policies, with some institutions and researchers treating all ML crowdwork as human subjects research, and other institutions holding that ML crowdworkers rarely constitute human subjects. Additionally, few ML papers involving crowdwork mention IRB oversight, raising the prospect that many might not be in compliance with ethical and regulatory requirements. In this paper, we focus on research in natural language processing to investigate the appropriate designation of crowdsourcing studies and the unique challenges that ML research poses for research oversight. Crucially, under the U.S. Common Rule, these judgments hinge on determinations of "aboutness", both whom (or what) the collected data is about and whom (or what) the analysis is about. We highlight two challenges posed by ML: (1) the same set of workers can serve multiple roles and provide many sorts of information; and (2) compared to the life sciences and social sciences, ML research tends to embrace a dynamic workflow, where research questions are seldom stated ex ante and data sharing opens the door for future studies to ask questions about different targets from the original study. In particular, our analysis exposes a potential loophole in the Common Rule, where researchers can elude research ethics oversight by splitting data collection and analysis into distinct studies. We offer several policy recommendations to address these concerns.


