Improving Fairness and Robustness in End-to-End Speech Recognition through unsupervised clustering

06/06/2023
by   Irina-Elena Veliche, et al.
0

The challenge of fairness arises when Automatic Speech Recognition (ASR) systems do not perform equally well for all sub-groups of the population. In the past few years there have been many improvements in overall speech recognition quality, but without any particular focus on advancing Equality and Equity for all user groups for whom systems do not perform well. ASR fairness is therefore also a robustness issue. Meanwhile, data privacy also takes priority in production systems. In this paper, we present a privacy preserving approach to improve fairness and robustness of end-to-end ASR without using metadata, zip codes, or even speaker or utterance embeddings directly in training. We extract utterance level embeddings using a speaker ID model trained on a public dataset, which we then use in an unsupervised fashion to create acoustic clusters. We use cluster IDs instead of speaker utterance embeddings as extra features during model training, which shows improvements for all demographic groups and in particular for different accents.

READ FULL TEXT
research
08/13/2019

End-to-End Multi-Speaker Speech Recognition using Speaker Embeddings and Transfer Learning

This paper presents our latest investigation on end-to-end automatic spe...
research
07/22/2022

Toward Fairness in Speech Recognition: Discovery and mitigation of performance disparities

As for other forms of AI, speech recognition has recently been examined ...
research
07/25/2023

On-Device Speaker Anonymization of Acoustic Embeddings for ASR based onFlexible Location Gradient Reversal Layer

Smart devices serviced by large-scale AI models necessitates user data t...
research
09/19/2021

Model-Based Approach for Measuring the Fairness in ASR

The issue of fairness arises when the automatic speech recognition (ASR)...
research
08/07/2020

Investigation of Speaker-adaptation methods in Transformer based ASR

End-to-end models are fast replacing conventional hybrid models in autom...
research
05/01/2022

Bilingual End-to-End ASR with Byte-Level Subwords

In this paper, we investigate how the output representation of an end-to...
research
04/29/2021

Improving Fairness in Speaker Recognition

The human voice conveys unique characteristics of an individual, making ...

Please sign up or login with your details

Forgot password? Click here to reset