Data Augmentation for Speech Recognition in Maltese: A Low-Resource Perspective

11/15/2021
by   Carlos Mena, et al.
0

Developing speech technologies is a challenge for low-resource languages for which both annotated and raw speech data is sparse. Maltese is one such language. Recent years have seen an increased interest in the computational processing of Maltese, including speech technologies, but resources for the latter remain sparse. In this paper, we consider data augmentation techniques for improving speech recognition for such languages, focusing on Maltese as a test case. We consider three different types of data augmentation: unsupervised training, multilingual training and the use of synthesized speech as training data. The goal is to determine which of these techniques, or combination of them, is the most effective to improve speech recognition for languages where the starting point is a small corpus of approximately 7 hours of transcribed speech. Our results show that combining the three data augmentation techniques studied here lead us to an absolute WER improvement of 15 language model.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/04/2021

Voice Conversion Can Improve ASR in Very Low-Resource Settings

Voice conversion (VC) has been proposed to improve speech recognition sy...
research
12/23/2020

Speech Synthesis as Augmentation for Low-Resource ASR

Speech synthesis might hold the key to low-resource speech recognition. ...
research
10/08/2020

Population Based Training for Data Augmentation and Regularization in Speech Recognition

Varying data augmentation policies and regularization over the course of...
research
06/16/2023

Low-Resource Text-to-Speech Using Specific Data and Noise Augmentation

Many neural text-to-speech architectures can synthesize nearly natural s...
research
12/03/2020

Adapt-and-Adjust: Overcoming the Long-Tail Problem of Multilingual Speech Recognition

One crucial challenge of real-world multilingual speech recognition is t...
research
11/24/2020

Synth2Aug: Cross-domain speaker recognition with TTS synthesized speech

In recent years, Text-To-Speech (TTS) has been used as a data augmentati...

Please sign up or login with your details

Forgot password? Click here to reset