Combining speakers of multiple languages to improve quality of neural voices

08/17/2021
by   Javier Latorre, et al.
0

In this work, we explore multiple architectures and training procedures for developing a multi-speaker and multi-lingual neural TTS system with the goals of a) improving the quality when the available data in the target language is limited and b) enabling cross-lingual synthesis. We report results from a large experiment using 30 speakers in 8 different languages across 15 different locales. The system is trained on the same amount of data per speaker. Compared to a single-speaker model, when the suggested system is fine tuned to a speaker, it produces significantly better quality in most of the cases while it only uses less than 40% of the speaker's data used to build the single-speaker model. In cross-lingual synthesis, on average, the generated quality is within 80% of native single-speaker models, in terms of Mean Opinion Score.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/14/2021

Improve Cross-lingual Voice Cloning Using Low-quality Code-switched Data

Recently, sequence-to-sequence (seq-to-seq) models have been successfull...
research
10/14/2021

Revisiting IPA-based Cross-lingual Text-to-speech

International Phonetic Alphabet (IPA) has been widely used in cross-ling...
research
05/21/2020

Cross-lingual Multispeaker Text-to-Speech under Limited-Data Scenario

Modeling voices for multiple speakers and multiple languages in one text...
research
01/20/2022

Cross-Lingual Text-to-Speech Using Multi-Task Learning and Speaker Classifier Joint Training

In cross-lingual speech synthesis, the speech in various languages can b...
research
12/13/2016

Performance Improvements of Probabilistic Transcript-adapted ASR with Recurrent Neural Network and Language-specific Constraints

Mismatched transcriptions have been proposed as a mean to acquire probab...
research
10/09/2020

Investigating Cross-Linguistic Adjective Ordering Tendencies with a Latent-Variable Model

Across languages, multiple consecutive adjectives modifying a noun (e.g....
research
03/31/2022

Data-augmented cross-lingual synthesis in a teacher-student framework

Cross-lingual synthesis can be defined as the task of letting a speaker ...

Please sign up or login with your details

Forgot password? Click here to reset