Making More of Little Data: Improving Low-Resource Automatic Speech Recognition Using Data Augmentation

05/18/2023
by Martijn Bartelds, et al.

The performance of automatic speech recognition (ASR) systems has advanced substantially in recent years, particularly for languages with a large amount of transcribed speech available. For low-resource languages, such as minority languages, regional languages or dialects, ASR performance generally remains much lower. In this study, we investigate whether data augmentation techniques can help improve low-resource ASR performance, focusing on four typologically diverse minority languages or language variants (West Germanic: Gronings, West-Frisian; Malayo-Polynesian: Besemah, Nasal). For all four languages, we examine the use of self-training, where an ASR system trained on the available human-transcribed data is used to generate transcriptions, which are then combined with the original data to train a new ASR system. For Gronings, for which a pre-existing text-to-speech (TTS) system was available, we also examine the use of TTS to generate ASR training data from text-only sources. We find that self-training consistently yields improved performance (a relative WER reduction of up to 20.5% compared to an ASR system trained on 24 minutes of manually transcribed speech). The performance gain from TTS augmentation for Gronings was even stronger (a relative WER reduction of up to 25.5% compared to a system based on 24 minutes of manually transcribed speech). In sum, our results show the benefit of using self-training or (if possible) TTS-generated data as an efficient way to overcome limited data availability and improve ASR performance for resource-scarce languages.
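The self-training procedure and the relative WER reduction metric mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `self_training_round`, `train`, and `relative_wer_reduction` are hypothetical, and `train` stands in for whatever ASR training routine is actually used.

```python
from typing import Callable, List, Tuple

# (audio identifier, transcript); a hypothetical representation for this sketch.
Example = Tuple[str, str]
# A "model" here is simply a function from audio identifier to transcript.
Model = Callable[[str], str]


def self_training_round(
    labeled: List[Example],
    unlabeled_audio: List[str],
    train: Callable[[List[Example]], Model],
) -> Model:
    """One round of self-training: train a teacher, pseudo-label, retrain.

    `train` is an assumed placeholder for an ASR training routine.
    """
    # 1. Train a teacher model on the human-transcribed data only.
    teacher = train(labeled)
    # 2. Use the teacher to transcribe the untranscribed audio.
    pseudo_labeled = [(audio, teacher(audio)) for audio in unlabeled_audio]
    # 3. Train a new model on the original data plus the generated transcripts.
    return train(labeled + pseudo_labeled)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent; positive means the new system is better."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer
```

As a made-up illustration (not a result from the paper), a system that lowers WER from 50.0 to 40.0 achieves `relative_wer_reduction(50.0, 40.0) == 20.0`, i.e. a 20% relative reduction, even though the absolute difference is 10 points.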


