Improve Cross-lingual Voice Cloning Using Low-quality Code-switched Data

10/14/2021
by Haitong Zhang, et al.

Recently, sequence-to-sequence (seq-to-seq) models have been successfully applied in text-to-speech (TTS) to synthesize speech for single-language text. Synthesizing speech for multiple languages usually requires multi-lingual speech from the target speaker. However, it is both laborious and expensive to collect high-quality multi-lingual TTS data for the target speakers. In this paper, we propose using low-quality code-switched found data from non-target speakers to achieve cross-lingual voice cloning for the target speakers. Experiments show that the proposed method can generate high-quality code-switched speech in the target voices in terms of both naturalness and speaker consistency. More importantly, we find that our method achieves results comparable to the state-of-the-art (SOTA) performance in cross-lingual voice cloning.
