Text-To-Speech Data Augmentation for Low Resource Speech Recognition

04/01/2022
by   Rodolfo Zevallos, et al.
0

Nowadays, the main problem of deep learning techniques used in the development of automatic speech recognition (ASR) models is the lack of transcribed data. The goal of this research is to propose a new data augmentation method to improve ASR models for agglutinative and low-resource languages. This novel data augmentation method generates both synthetic text and synthetic audio. Some experiments were conducted using the corpus of the Quechua language, which is an agglutinative and low-resource language. In this study, a sequence-to-sequence (seq2seq) model was applied to generate synthetic text, in addition to generating synthetic speech using a text-to-speech (TTS) model for Quechua. The results show that the new data augmentation method works well to improve the ASR model for Quechua. In this research, an 8.73 improvement in the word-error-rate (WER) of the ASR model is obtained using a combination of synthetic text and synthetic speech.

READ FULL TEXT
research
07/14/2022

Data Augmentation for Low-Resource Quechua ASR Improvement

Automatic Speech Recognition (ASR) is a key element in new services that...
research
02/25/2021

MixSpeech: Data Augmentation for Low-resource Automatic Speech Recognition

In this paper, we propose MixSpeech, a simple yet effective data augment...
research
07/20/2022

When Is TTS Augmentation Through a Pivot Language Useful?

Developing Automatic Speech Recognition (ASR) for low-resource languages...
research
12/19/2019

Generating Synthetic Audio Data for Attention-Based Speech Recognition Systems

Recent advances in text-to-speech (TTS) led to the development of flexib...
research
10/23/2019

Analyzing ASR pretraining for low-resource speech-to-text translation

Previous work has shown that for low-resource source languages, automati...
research
12/10/2018

Low Resource Multi-modal Data Augmentation for End-to-end ASR

We explore training attention-based encoder-decoder ASR for low-resource...
research
11/19/2021

Semi-supervised transfer learning for language expansion of end-to-end speech recognition models to low-resource languages

In this paper, we propose a three-stage training methodology to improve ...

Please sign up or login with your details

Forgot password? Click here to reset