End-to-end Text-to-speech for Low-resource Languages by Cross-Lingual Transfer Learning

04/13/2019
by Tao Tu, et al.

End-to-end text-to-speech (TTS) has shown great success when trained on large quantities of paired text and speech data. However, laborious data collection remains difficult for at least 95% of the languages in the world, which hinders the development of TTS in different languages. In this paper, we aim to build TTS systems for such low-resource (target) languages, where only very limited paired data are available. We show that such TTS systems can be effectively constructed by transferring knowledge from a high-resource (source) language. Since a model trained on the source language cannot be directly applied to the target language because of the input space mismatch, we propose a method to learn a mapping between source and target linguistic symbols. With this learned mapping, pronunciation information is preserved throughout the transfer procedure. Preliminary experiments show that only around 15 minutes of paired data are needed to obtain a relatively good TTS system. Furthermore, analytic studies demonstrate that the automatically discovered mapping correlates well with phonetic expertise.
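The abstract does not say how the mapping between source and target linguistic symbols is parameterized. As a rough illustration only, one plausible form is a soft mapping: each target symbol gets a row of scores over the source symbols, and its input embedding is the softmax-weighted average of the source symbol embeddings, so a pretrained source model can consume target-language input. The sketch below assumes this form. The function names, the toy sizes, and the random values are all hypothetical and not taken from the paper.

```python
import math
import random


def softmax(row):
    """Turn a list of scores into weights that are positive and sum to 1."""
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def map_target_to_source(logits, source_embeddings):
    """Build one embedding per target symbol as a weighted average of
    source symbol embeddings (assumed soft mapping, not the paper's method).

    logits: one row per target symbol, one score per source symbol.
    source_embeddings: one embedding vector per source symbol.
    """
    dim = len(source_embeddings[0])
    mapped = []
    for row in logits:
        weights = softmax(row)
        mapped.append([
            sum(w * emb[d] for w, emb in zip(weights, source_embeddings))
            for d in range(dim)
        ])
    return mapped


# Hypothetical toy sizes: 5 source symbols, 3 target symbols, 4-dim embeddings.
random.seed(0)
NUM_SOURCE, NUM_TARGET, DIM = 5, 3, 4
source_embeddings = [
    [random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_SOURCE)
]
# In a real system these scores would be learned from the small amount of
# paired target-language data; here they are random placeholders.
logits = [
    [random.gauss(0, 1) for _ in range(NUM_SOURCE)] for _ in range(NUM_TARGET)
]
target_embeddings = map_target_to_source(logits, source_embeddings)
```

Under this assumption, a score row dominated by a single source symbol makes the target symbol behave almost exactly like that source symbol, which is the kind of correspondence one could then compare against phonetic knowledge.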
