ASR4REAL: An extended benchmark for speech models

by Morgane Riviere, et al.

Popular ASR benchmarks such as LibriSpeech and Switchboard are limited in the diversity of settings and speakers they represent. We introduce a set of benchmarks matching real-life conditions, aimed at uncovering possible biases and weaknesses in models. We find that although recent models do not appear to exhibit a gender bias, they usually show substantial performance discrepancies across accents, and even larger ones depending on the socio-economic status of the speakers. Finally, all tested models show a marked performance drop when evaluated on conversational speech, and in this setting even a language model trained on a dataset as large as Common Crawl does not have a significant positive effect, which underlines the importance of developing conversational language models.
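The performance discrepancies described above are conventionally measured in word error rate (WER), the standard ASR metric: the word-level edit distance between the reference transcript and the model's hypothesis, normalized by the reference length. A minimal sketch of how WER is typically computed (this is an illustration, not code from the paper):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])  # substitution (or match)
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# One deleted word out of six: WER ≈ 0.167
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

Comparing such scores across speaker subgroups (by accent or socio-economic status) is how the kinds of biases discussed above are surfaced.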

