Cross-lingual Multispeaker Text-to-Speech under Limited-Data Scenario

05/21/2020
by Zexin Cai, et al.

Modeling voices for multiple speakers and multiple languages in one text-to-speech system has long been a challenge. This paper presents an extension of Tacotron2 that achieves bilingual multispeaker speech synthesis when only limited data are available for each language. We achieve cross-lingual synthesis between English and Mandarin for monolingual speakers, including code-switching cases. The two languages share the same phonemic input representation, while the language attribute and the speaker identity are controlled independently by language tokens and speaker embeddings, respectively. In addition, we investigate the model's cross-lingual synthesis performance with and without a bilingual dataset during training. With the bilingual dataset, the model not only generates high-fidelity speech for every speaker in that speaker's own language, but also generates accented yet fluent and intelligible speech for monolingual speakers in their non-native language. For example, the Mandarin speaker can speak English fluently. Furthermore, the model trained with the bilingual dataset is robust in code-switching text-to-speech, as shown in our results and the provided samples: https://caizexin.github.io/mlms-syn-samples/index.html.
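The abstract describes a factorized input: both languages share one phonemic representation, while a language token and a speaker embedding control language and voice independently. The Python sketch that follows illustrates that factorization under stated assumptions. The phoneme inventory, the class name `ConditionedEncoderInput`, the embedding sizes, and the choice to concatenate a per-phoneme language vector and a per-utterance speaker vector onto each phoneme embedding are all hypothetical, not the paper's actual implementation. Seeded random lookup tables stand in for trained embeddings.

```python
import random
from typing import List

# Hypothetical shared phoneme inventory covering both English and Mandarin
# (an illustrative subset only, not the paper's actual symbol set).
SHARED_PHONEMES = ["<pad>", "HH", "AH", "L", "OW", "n", "i3", "h", "ao3"]
PHONEME_TO_ID = {p: i for i, p in enumerate(SHARED_PHONEMES)}


def make_table(num_rows: int, dim: int, seed: int) -> List[List[float]]:
    """Build a seeded random lookup table standing in for a trained embedding."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_rows)]


class ConditionedEncoderInput:
    """Hypothetical sketch: phoneme embedding + language token + speaker embedding."""

    def __init__(self, num_speakers: int, num_languages: int,
                 phone_dim: int = 8, spk_dim: int = 4, lang_dim: int = 2):
        self.phone_table = make_table(len(SHARED_PHONEMES), phone_dim, seed=0)
        self.language_table = make_table(num_languages, lang_dim, seed=1)
        self.speaker_table = make_table(num_speakers, spk_dim, seed=2)

    def __call__(self, phonemes: List[str], speaker_id: int,
                 language_ids: List[int]) -> List[List[float]]:
        """Return one conditioned vector per phoneme.

        language_ids holds one language token per phoneme, so code-switched
        input can change language mid-utterance while speaker_id stays fixed.
        """
        if len(phonemes) != len(language_ids):
            raise ValueError("need one language token per phoneme")
        spk = self.speaker_table[speaker_id]
        frames = []
        for p, lang in zip(phonemes, language_ids):
            phone_vec = self.phone_table[PHONEME_TO_ID[p]]
            # List "+" concatenates: [phoneme | language | speaker].
            frames.append(phone_vec + self.language_table[lang] + spk)
        return frames


# Usage: code-switched input, English "hello" then Mandarin "ni hao",
# voiced by speaker 1 (language 0 = English, 1 = Mandarin; both hypothetical IDs).
enc = ConditionedEncoderInput(num_speakers=2, num_languages=2)
phones = ["HH", "AH", "L", "OW", "n", "i3", "h", "ao3"]
langs = [0, 0, 0, 0, 1, 1, 1, 1]
frames = enc(phones, speaker_id=1, language_ids=langs)
print(len(frames), len(frames[0]))  # 8 14
```

Because the speaker vector is fixed per utterance while the language token can vary per phoneme, the same voice can switch languages within one sentence, which mirrors the independent control of speaker and language that the abstract describes.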
