Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning

07/09/2019
by   Yu Zhang, et al.

We present a multispeaker, multilingual text-to-speech (TTS) synthesis model based on Tacotron that is able to produce high-quality speech in multiple languages. Moreover, the model can transfer voices across languages, e.g., synthesize fluent Spanish speech using an English speaker's voice, without training on any bilingual or parallel examples. Such transfer works even across distantly related languages, e.g., English and Mandarin. Two ingredients are critical to achieving this result: (1) using a phonemic input representation to encourage sharing of model capacity across languages, and (2) incorporating an adversarial loss term to encourage the model to disentangle its representation of speaker identity (which is perfectly correlated with language in the training data) from the speech content. Further scaling up the model by training on multiple speakers of each language, and incorporating an autoencoding input to help stabilize attention during training, results in a model that can consistently synthesize intelligible speech for training speakers in all languages seen during training, in native or foreign accents.
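The adversarial loss in point (2) is commonly realized with a gradient reversal layer: a speaker classifier is trained on the encoder output, while the encoder receives the classifier's gradient with its sign flipped, pushing it toward speaker-independent representations. The numpy sketch below is only an illustration of that mechanism, not the paper's implementation; the function name `grad_reversal_backward`, the toy dimensions, and the scale `lam` are all assumptions.

```python
import numpy as np

def grad_reversal_backward(grad, lam=1.0):
    """Gradient reversal layer: identity in the forward pass,
    sign-flipped and scaled gradient in the backward pass."""
    return -lam * grad

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
h = rng.normal(size=4)        # toy encoder hidden vector
W = rng.normal(size=(3, 4))   # toy speaker classifier, 3 speakers
speaker = 1                   # true speaker id

# Forward: speaker classifier cross-entropy loss on the encoder output.
p = softmax(W @ h)
loss = -np.log(p[speaker])

# Backward through the classifier: dL/dz = p - onehot(speaker).
dz = p.copy()
dz[speaker] -= 1.0
dh_classifier = W.T @ dz

# The gradient is reversed before it reaches the encoder, so the
# classifier still learns to identify the speaker while the encoder
# is trained to *remove* speaker information from h.
dh_encoder = grad_reversal_backward(dh_classifier, lam=0.5)

assert np.allclose(dh_encoder, -0.5 * dh_classifier)
```

In a full system this reversed gradient would be added to the usual spectrogram reconstruction gradient when updating the text encoder, leaving the rest of the TTS objective unchanged.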


Related research

- 05/19/2023: MParrotTTS: Multilingual Multi-speaker Text to Speech Synthesis in Low Resource Setting
- 02/06/2019: Unsupervised Polyglot Text To Speech
- 11/23/2018: Learning pronunciation from a foreign language in speech synthesis networks
- 07/04/2022: Mix and Match: An Empirical Study on Training Corpus Composition for Polyglot Text-To-Speech (TTS)
- 09/27/2022: Multilingual analysis of intelligibility classification using English, Korean, and Tamil dysarthric speech datasets
- 08/21/2022: Visualising Model Training via Vowel Space for Text-To-Speech Systems
- 01/24/2023: Multilingual Multiaccented Multispeaker TTS with RADTTS
