Gender Representation in Open Source Speech Resources

03/18/2020
by Mahault Garnerin, et al.

With the rise of artificial intelligence (AI) and the growing use of deep-learning architectures, the ethics, transparency and fairness of AI systems have become a central concern within the research community. We address transparency and fairness in spoken language systems through a study of gender representation in the speech resources available through the Open Speech and Language Resource platform. We show that finding gender information in open source corpora is not straightforward and that gender balance depends on other corpus characteristics (elicited/non-elicited speech, low/high resource language, speech task targeted). The paper ends with recommendations on metadata and gender information, intended to help researchers ensure better transparency of the speech systems built using such corpora.

research
03/24/2022

Lahjoita puhetta – a large-scale corpus of spoken Finnish with some benchmarks

The Donate Speech campaign has so far succeeded in gathering approximate...
research
10/14/2020

Google Crowdsourced Speech Corpora and Related Open-Source Resources for Low-Resource Languages and Dialects: An Overview

This paper presents an overview of a program designed to address the gro...
research
12/31/2020

Open Korean Corpora: A Practical Report

Korean is often referred to as a low-resource language in the research c...
research
09/09/2022

Overlapped speech and gender detection with WavLM pre-trained features

This article focuses on overlapped speech and gender detection in order ...
research
06/03/2021

A diachronic evaluation of gender asymmetry in euphemism

The use of euphemisms is a known driver of language change. It has been ...
research
04/18/2023

Wizundry: A Cooperative Wizard of Oz Platform for Simulating Future Speech-based Interfaces with Multiple Wizards

Wizard of Oz (WoZ) as a prototyping method has been used to simulate int...
research
02/08/2018

Praaline: Integrating Tools for Speech Corpus Research

This paper presents Praaline, an open-source software system for managin...
