Building Bilingual and Code-Switched Voice Conversion with Limited Training Data Using Embedding Consistency Loss

04/22/2021
by   Yaogen Yang, et al.
0

Building cross-lingual voice conversion (VC) systems for multiple speakers and multiple languages has been a challenging task for a long time. This paper describes a parallel non-autoregressive network to achieve bilingual and code-switched voice conversion for multiple speakers when there are only mono-lingual corpora for each language. We achieve cross-lingual VC between Mandarin speech with multiple speakers and English speech with multiple speakers by applying bilingual bottleneck features. To boost voice cloning performance, we use an adversarial speaker classifier with a gradient reversal layer to reduce the source speaker's information from the output of encoder. Furthermore, in order to improve speaker similarity between reference speech and converted speech, we adopt an embedding consistency loss between the synthesized speech and its natural reference speech in our network. Experimental results show that our proposed method can achieve high quality converted speech with mean opinion score (MOS) around 4. The conversion system performs well in terms of speaker similarity for both in-set speaker conversion and out-set-of one-shot conversion.

READ FULL TEXT
research
10/14/2021

Improve Cross-lingual Voice Cloning Using Low-quality Code-switched Data

Recently, sequence-to-sequence (seq-to-seq) models have been successfull...
research
02/03/2021

Towards Natural and Controllable Cross-Lingual Voice Conversion Based on Neural TTS Model and Phonetic Posteriorgram

Cross-lingual voice conversion (VC) is an important and challenging prob...
research
10/16/2020

Towards Natural Bilingual and Code-Switched Speech Synthesis Based on Mix of Monolingual Recordings and Cross-Lingual Voice Conversion

Recent state-of-the-art neural text-to-speech (TTS) synthesis models hav...
research
09/09/2019

DNN-based cross-lingual voice conversion using Bottleneck Features

Cross-lingual voice conversion (CLVC) is a quite challenging task since ...
research
10/08/2020

FastVC: Fast Voice Conversion with non-parallel data

This paper introduces FastVC, an end-to-end model for fast Voice Convers...
research
08/15/2018

Investigation of Using Disentangled and Interpretable Representations for One-shot Cross-lingual Voice Conversion

We study the problem of cross-lingual voice conversion in non-parallel s...
research
02/06/2019

Unsupervised Polyglot Text To Speech

We present a TTS neural network that is able to produce speech in multip...

Please sign up or login with your details

Forgot password? Click here to reset