Considerations for Ethical Speech Recognition Datasets

Speech AI technologies are largely trained on publicly available datasets or on speech gathered through massive web crawling. In both cases, data acquisition focuses on minimizing collection effort, without necessarily taking the protection of data subjects or the needs of users into consideration. This results in models that are not robust when used by people who deviate from the dominant demographics in the training set, discriminating against individuals with different dialects, accents, speaking styles, and disfluencies. In this talk, we use automatic speech recognition as a case study and examine the properties that ethical speech datasets should possess to support responsible AI applications. We showcase diversity issues, inclusion practices, and necessary considerations that can improve trained models while facilitating model explainability and protecting users and data subjects. We argue for the legal privacy protection of data subjects, targeted data sampling that corresponds to the needs of user demographics, appropriate metadata that ensure explainability and accountability in cases of model failure, and a sociotechnical and situated approach to model design. We hope this talk can inspire researchers and practitioners to design and use more human-centric datasets in speech technologies and other domains, in ways that empower and respect users while improving the robustness and utility of machine learning models.

research
05/08/2023

Augmented Datasheets for Speech Datasets and Ethical Decision-Making

Speech datasets are crucial for training Speech Language Technologies (S...
research
02/07/2023

Ethical Considerations for Collecting Human-Centric Image Datasets

Human-centric image datasets are critical to the development of computer...
research
11/20/2020

Training Ethically Responsible AI Researchers: a Case Study

Ethical oversight of AI research is beset by a number of problems. There...
research
08/22/2023

Collect, Measure, Repeat: Reliability Factors for Responsible AI Data Collection

The rapid entry of machine learning approaches in our daily activities a...
research
07/02/2021

Ethics Sheets for AI Tasks

Several high-profile events, such as the use of biased recidivism system...
research
08/21/2019

AI and Accessibility: A Discussion of Ethical Considerations

According to the World Health Organization, more than one billion people...
research
08/10/2021

Modeling and Evaluating Personas with Software Explainability Requirements

This work focuses on the context of software explainability, which is th...