Understanding Machine Learning Practitioners' Data Documentation Perceptions, Needs, Challenges, and Desiderata

06/06/2022
by Amy Heger et al.

Data is central to the development and evaluation of machine learning (ML) models. However, the use of problematic or inappropriate datasets can result in harms when the resulting models are deployed. To encourage responsible AI practice through more deliberate reflection on datasets and transparency around the processes by which they are created, researchers and practitioners have begun to advocate for increased data documentation and have proposed several data documentation frameworks. However, there is little research on whether these data documentation frameworks meet the needs of ML practitioners, who both create and consume datasets. To address this gap, we set out to understand ML practitioners' data documentation perceptions, needs, challenges, and desiderata, with the goal of deriving design requirements that can inform future data documentation frameworks. We conducted a series of semi-structured interviews with 14 ML practitioners at a single large, international technology company. We had them answer a list of questions taken from datasheets for datasets (Gebru et al., 2021). Our findings show that current approaches to data documentation are largely ad hoc and myopic. Participants expressed needs for data documentation frameworks to be adaptable to their contexts, integrated into their existing tools and workflows, and automated wherever possible. Although data documentation frameworks are often motivated from the perspective of responsible AI, participants did not make the connection between the questions that they were asked to answer and their responsible AI implications. In addition, participants often had difficulty prioritizing the needs of dataset consumers and providing information that someone unfamiliar with their datasets might need to know. Based on these findings, we derive seven design requirements for future data documentation frameworks.


