LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition

08/09/2020
by Jin Xu, et al.

Speech synthesis (text to speech, TTS) and recognition (automatic speech recognition, ASR) are important speech tasks that require a large amount of paired text and speech data for model training. However, there are more than 6,000 languages in the world, and most of them lack speech training data, which poses significant challenges when building TTS and ASR systems for extremely low-resource languages. In this paper, we develop LRSpeech, a TTS and ASR system for the extremely low-resource setting, which can support rare languages at low data cost. LRSpeech consists of three key techniques: 1) pre-training on rich-resource languages and fine-tuning on low-resource languages; 2) dual transformation between TTS and ASR to iteratively boost the accuracy of each other; 3) knowledge distillation to customize the TTS model on a high-quality target-speaker voice and improve the ASR model on multiple voices. We conduct experiments on an experimental language (English) and a truly low-resource language (Lithuanian) to verify the effectiveness of LRSpeech. Experimental results show that LRSpeech 1) achieves high quality for TTS in terms of both intelligibility (more than 98% intelligibility rate) and naturalness (above 3.5 mean opinion score (MOS)) of the synthesized speech, which satisfies the requirements for industrial deployment; 2) achieves promising recognition accuracy for ASR; and 3) last but not least, uses extremely low-resource training data. We also conduct comprehensive analyses of LRSpeech with different amounts of data resources, and provide valuable insights and guidance for industrial deployment. We are currently deploying LRSpeech into a commercialized cloud speech service to support TTS on more rare languages.

research
11/04/2021

Voice Conversion Can Improve ASR in Very Low-Resource Settings

Voice conversion (VC) has been proposed to improve speech recognition sy...
research
02/05/2023

MAC: A unified framework boosting low resource automatic speech recognition

We propose a unified framework for low resource automatic speech recogni...
research
05/18/2023

Making More of Little Data: Improving Low-Resource Automatic Speech Recognition Using Data Augmentation

The performance of automatic speech recognition (ASR) systems has advanc...
research
05/13/2019

Almost Unsupervised Text to Speech and Automatic Speech Recognition

Text to speech (TTS) and automatic speech recognition (ASR) are two dual...
research
02/13/2023

Fast and small footprint Hybrid HMM-HiFiGAN based system for speech synthesis in Indian languages

Hidden-Markov-model (HMM) based text-to-speech (HTS) offers flexibility ...
research
06/01/2017

Using of heterogeneous corpora for training of an ASR system

The paper summarizes the development of the LVCSR system built as a part...
research
10/06/2021

Integrating Categorical Features in End-to-End ASR

All-neural, end-to-end ASR systems gained rapid interest from the speech...
