Cross-Lingual Text-to-Speech Using Multi-Task Learning and Speaker Classifier Joint Training

01/20/2022
by J. Yang, et al.

Cross-lingual speech synthesis aims to synthesize speech in multiple languages for a monolingual speaker. Usually only data from monolingual speakers is available for model training, so the speaker similarity between the synthesized cross-lingual speech and the speaker's native-language recordings is relatively low. Building on a multilingual transformer text-to-speech model, this paper studies a multi-task learning framework to improve cross-lingual speaker similarity. To improve speaker similarity further, joint training with a speaker classifier is proposed. Because introducing joint training would otherwise break the transformer's parallel training mechanism, a scheme similar to parallel scheduled sampling is proposed so that the model can still be trained efficiently. In subjective and objective evaluations, multi-task learning combined with speaker classifier joint training consistently improves cross-lingual speaker similarity, both for speakers seen in the training set and for unseen speakers.
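
The abstract does not spell out the training scheme, so the following is only a minimal sketch of generic two-pass parallel scheduled sampling, not the authors' implementation. It assumes a decoder that can process all time steps at once; the names `decode_parallel`, `mix_prob`, and `two_pass_decoder_inputs` are hypothetical, and scalar values stand in for acoustic frames. The first pass is ordinary teacher forcing, and the second pass is fed a per-step mixture of ground-truth and first-pass predictions, so both passes stay fully parallel.

```python
import random


def mix_inputs(gold, predicted, mix_prob, rng):
    """Per step, pick the predicted frame with probability mix_prob, else the gold frame."""
    return [p if rng.random() < mix_prob else g for g, p in zip(gold, predicted)]


def two_pass_decoder_inputs(gold_frames, decode_parallel, mix_prob, seed=0):
    """Build second-pass decoder inputs (hypothetical helper, for illustration only).

    gold_frames:     ground-truth frames, one value per time step
    decode_parallel: assumed function mapping all decoder inputs to all outputs at once
    mix_prob:        probability of replacing a gold input with a predicted one
    """
    rng = random.Random(seed)
    # Pass 1: teacher forcing. The input at step t is the gold frame at step t-1,
    # with a zero "go" frame at the start.
    shifted_gold = [0.0] + gold_frames[:-1]
    first_pass = decode_parallel(shifted_gold)
    # Shift the first-pass predictions the same way so that step t only sees
    # information from steps before t.
    shifted_pred = [0.0] + first_pass[:-1]
    # Pass 2 inputs: a random mixture of gold and predicted frames.
    return mix_inputs(shifted_gold, shifted_pred, mix_prob, rng)


if __name__ == "__main__":
    toy_decoder = lambda frames: [2.0 * x for x in frames]  # stand-in for a real model
    print(two_pass_decoder_inputs([1.0, 2.0, 3.0], toy_decoder, mix_prob=0.5))
```

With `mix_prob=0.0` the second pass reduces to plain teacher forcing; raising it exposes the model to more of its own predictions. In a joint-training setup, the output of the second pass could then be passed to a downstream module such as a speaker classifier, though how the paper wires this together is not stated in the abstract.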


