AfriNames: Most ASR models "butcher" African Names

06/01/2023
by Tobi Olatunji, et al.

Useful conversational agents must accurately capture named entities to minimize errors in downstream tasks, for example, asking a voice assistant to play a track from a certain artist, initiating navigation to a specific location, or documenting a laboratory result for a patient. However, when named entities such as “Ukachukwu” (Igbo), “Lakicia” (Swahili), or “Ingabire” (Rwandan) are spoken, the performance of automatic speech recognition (ASR) models degrades significantly, propagating errors to downstream systems. We model this problem as a distribution shift and demonstrate that such model bias can be mitigated through multilingual pre-training, intelligent data augmentation strategies that increase the representation of African-named entities, and fine-tuning multilingual ASR models on multiple African accents. The resulting fine-tuned models show an 81.5% relative word error rate (WER) improvement over the baseline on samples with African-named entities.
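
To make the reported metric concrete, the following is a minimal sketch of how WER and a relative WER improvement are commonly computed. It assumes the standard definitions (word-level edit distance divided by reference length, and baseline-minus-new over baseline); the abstract does not spell out its exact evaluation procedure, and the function names and sample transcripts here are hypothetical, chosen only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions, insertions, deletions)
    divided by the number of words in the reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER removed by the new model
    (assumed definition: (baseline - new) / baseline)."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical transcripts, not taken from the paper's data.
reference = "call doctor ukachukwu now"
baseline_output = "call doctor you catch you now"
finetuned_output = "call doctor ukachukwu"

baseline_wer = word_error_rate(reference, baseline_output)
finetuned_wer = word_error_rate(reference, finetuned_output)
improvement = relative_wer_improvement(baseline_wer, finetuned_wer)
```

Under this definition, an 81.5% relative improvement means the fine-tuned model's WER is 18.5% of the baseline WER on the same samples. It does not mean an absolute drop of 81.5 percentage points.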


