A Study of Gender Impact in Self-supervised Models for Speech-to-Text Systems

04/04/2022
by Marcely Zanon Boito, et al.

Self-supervised models for speech processing have recently emerged as popular building blocks in speech processing pipelines. These models are pre-trained on unlabeled audio data and then used in downstream tasks such as automatic speech recognition (ASR) or speech translation (ST). Since these models are now used in research and industrial systems alike, it is necessary to understand the impact of characteristics such as the gender distribution of their pre-training data. Using French as our investigation language, we train gender-specific wav2vec 2.0 models and compare them against models pre-trained on data with different degrees of gender balance. The comparison is performed by applying these models to two speech-to-text downstream tasks: ASR and ST. Our results show that the type of downstream integration matters. We observe lower overall performance when gender-specific pre-training is used before fine-tuning an end-to-end ASR system. However, when self-supervised models are used as feature extractors, the overall ASR and ST results follow more complex patterns, in which the balanced pre-trained model is not necessarily the best option. Lastly, our crude 'fairness' metric, the relative performance difference measured between female and male test sets, does not vary strongly from balanced to gender-specific pre-trained wav2vec 2.0 models.
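The crude 'fairness' metric described above can be made concrete with a small sketch. The exact formula is an assumption here, not taken from the paper: we take the relative difference of the female score with respect to the male score, (score_female - score_male) / score_male × 100, and use corpus-level word error rate (WER) computed with a plain word-level Levenshtein distance. The helper names `word_edit_distance`, `corpus_wer` and `relative_gender_gap` are hypothetical and do not come from the authors' code.

```python
"""Sketch of a relative female/male performance gap, assuming
gap = (score_female - score_male) / score_male * 100 and WER as the score.
Function names are hypothetical, not the paper's implementation."""

from typing import Sequence, Tuple


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of word substitutions, insertions and deletions."""
    # prev[j] is the distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion of ref[i-1]
                          curr[j - 1] + 1,     # insertion of hyp[j-1]
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)]


def corpus_wer(pairs: Sequence[Tuple[str, str]]) -> float:
    """Corpus-level WER in percent: total edits / total reference words."""
    edits = 0
    ref_words = 0
    for ref, hyp in pairs:
        r, h = ref.split(), hyp.split()
        edits += word_edit_distance(r, h)
        ref_words += len(r)
    if ref_words == 0:
        raise ValueError("empty reference set")
    return 100.0 * edits / ref_words


def relative_gender_gap(score_female: float, score_male: float) -> float:
    """Relative difference (%) of the female score w.r.t. the male score.

    For WER (lower is better), a positive value means more errors on the
    female test set; for BLEU (higher is better) the sign reads the other way.
    """
    if score_male == 0:
        raise ValueError("male score must be non-zero")
    return 100.0 * (score_female - score_male) / score_male


if __name__ == "__main__":
    # Toy (reference, hypothesis) pairs standing in for per-gender test sets.
    female = [("le chat dort", "le chat dort"), ("il fait beau", "il fait chaud")]
    male = [("bonjour a tous", "bonjour a tout"), ("merci beaucoup", "merci")]
    wer_f = corpus_wer(female)  # 1 edit over 6 reference words
    wer_m = corpus_wer(male)    # 2 edits over 5 reference words
    print(round(wer_f, 2), round(wer_m, 2), round(relative_gender_gap(wer_f, wer_m), 2))
    # prints: 16.67 40.0 -58.33
```

Dividing by the male score is an arbitrary reference choice made for this sketch; dividing by the mean of both scores would give a symmetric variant. The same `relative_gender_gap` function applies unchanged to ST BLEU scores.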



Related research

10/15/2021
Don't speak too fast: The impact of data bias on self-supervised speech models
Self-supervised Speech Models (S3Ms) have been proven successful in many...

03/31/2022
How Does Pre-trained Wav2Vec2.0 Perform on Domain Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications
Recent work on self-supervised pre-training focus on leveraging large-sc...

07/10/2021
Layer-wise Analysis of a Self-supervised Speech Representation Model
Recently proposed self-supervised learning approaches have been successf...

09/09/2022
Overlapped speech and gender detection with WavLM pre-trained features
This article focuses on overlapped speech and gender detection in order ...

06/11/2022
Investigation of Ensemble features of Self-Supervised Pretrained Models for Automatic Speech Recognition
Self-supervised learning (SSL) based models have been shown to generate ...

12/03/2022
Unsupervised Fine-Tuning Data Selection for ASR Using Self-Supervised Speech Models
Self-supervised learning (SSL) has been able to leverage unlabeled data ...

05/17/2022
Deploying self-supervised learning in the wild for hybrid automatic speech recognition
Self-supervised learning (SSL) methods have proven to be very successful...
