Do Datasets Have Politics? Disciplinary Values in Computer Vision Dataset Development

08/09/2021
by Morgan Klaus Scheuerman, et al.

Data is a crucial component of machine learning. The field is reliant on data to train, validate, and test models. With increased technical capabilities, machine learning research has boomed in both academic and industry settings, and one major focus has been on computer vision. Computer vision is a popular domain of machine learning increasingly pertinent to real-world applications, from facial recognition in policing to object detection for autonomous vehicles. Given computer vision's propensity to shape machine learning research and impact human life, we seek to understand disciplinary practices around dataset documentation: how data is collected, curated, annotated, and packaged into datasets for computer vision researchers and practitioners to use for model tuning and development. Specifically, we examine what dataset documentation communicates about the underlying values of vision data and the larger practices and goals of computer vision as a field. To conduct this study, we collected a corpus of about 500 computer vision datasets, from which we sampled 114 dataset publications across different vision tasks. Through both structured and thematic content analysis, we document a number of values around accepted data practices, what makes desirable data, and the treatment of humans in the dataset construction process. We discuss how computer vision dataset authors value efficiency at the expense of care; universality at the expense of contextuality; impartiality at the expense of positionality; and model work at the expense of data work. Many of the silenced values we identify sit in opposition to social computing practices. We conclude with suggestions on how to better incorporate silenced values into the dataset creation and curation process.
