Studying Up Machine Learning Data: Why Talk About Bias When We Mean Power?

09/16/2021
by Milagros Miceli, et al.

Research in machine learning (ML) has primarily argued that models trained on incomplete or biased datasets can lead to discriminatory outputs. In this commentary, we propose moving the research focus beyond bias-oriented framings by adopting a power-aware perspective to "study up" ML datasets. This means accounting for the historical inequities, labor conditions, and epistemological standpoints inscribed in data. We draw on HCI and CSCW work to support our argument, critically analyze previous research, and point to two co-existing lines of work within our community: one bias-oriented, the other power-aware. In doing so, we highlight the need for dialogue and cooperation in three areas: data quality, data work, and data documentation. In the first area, we argue that reducing societal problems to "bias" misses the context-based nature of data. In the second, we highlight the corporate forces and market imperatives that shape the labor of data workers and, in turn, ML datasets. Finally, we propose expanding current transparency-oriented efforts in dataset documentation to reflect the social contexts of data design and production.

