a novel cross-lingual voice cloning approach with a few text-free samples

10/29/2019
by   Xinyong Zhou, et al.
0

In this paper, we present a cross-lingual voice cloning approach. BN features obtained by SI-ASR model are used as a bridge across speakers and language boundaries. The relationships between text and BN features are modeled by the latent prosody model. The acoustic model learns the translation from BN features to acoustic features. The acoustic model is fine-tuned with a few samples of the target speaker to realize voice cloning. This system can generate speech of arbitrary utterance of target language in cross-lingual speakers' voice. We verify that with small amount of audio data, our proposed approach can well handle cross-lingual tasks. And in intra-lingual tasks, our proposed approach also performs better than baseline approach in naturalness and similarity.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/31/2022

Cross-lingual Text-To-Speech with Flow-based Voice Conversion for Improved Pronunciation

This paper presents a method for end-to-end cross-lingual text-to-speech...
research
10/08/2020

Latent linguistic embedding for cross-lingual text-to-speech and voice conversion

As the recently proposed voice cloning system, NAUTILUS, is capable of c...
research
11/12/2021

Deciphering Speech: a Zero-Resource Approach to Cross-Lingual Transfer in ASR

We present a method for cross-lingual training an ASR system using absol...
research
12/28/2020

Building Multi lingual TTS using Cross Lingual Voice Conversion

In this paper we propose a new cross-lingual Voice Conversion (VC) appro...
research
09/15/2023

Cross-lingual Knowledge Distillation via Flow-based Voice Conversion for Robust Polyglot Text-To-Speech

In this work, we introduce a framework for cross-lingual speech synthesi...
research
08/28/2020

Voice Conversion Challenge 2020: Intra-lingual semi-parallel and cross-lingual voice conversion

The voice conversion challenge is a bi-annual scientific event held to c...
research
07/04/2022

GlowVC: Mel-spectrogram space disentangling model for language-independent text-free voice conversion

In this paper, we propose GlowVC: a multilingual multi-speaker flow-base...

Please sign up or login with your details

Forgot password? Click here to reset