Disentangled Speech Representation Learning for One-Shot Cross-lingual Voice Conversion Using β-VAE

10/25/2022
by Hui Lu, et al.
We propose an unsupervised learning method that disentangles speech into a content representation and a speaker identity representation. To demonstrate the effectiveness of the disentanglement, we apply the method to the challenging task of one-shot cross-lingual voice conversion. Inspired by β-VAE, we introduce a learning objective that balances the information captured by the content and speaker representations. In addition, inductive biases from the architectural design and the training dataset further encourage the desired disentanglement. Both objective and subjective evaluations show that the proposed method is effective for speech disentanglement and for one-shot cross-lingual voice conversion.
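
As a rough illustration of what a β-VAE-style objective over two latent variables can look like, the sketch below adds separately weighted KL terms for a content latent and a speaker latent to a reconstruction error. This is the generic β-VAE formulation with diagonal Gaussian posteriors and a standard normal prior, not the paper's exact objective; the function names `gaussian_kl` and `two_latent_beta_vae_loss`, the weights `beta_content` and `beta_speaker`, and the shape of the inputs are all assumptions made for the example.

```python
import numpy as np

def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over the last axis."""
    mu = np.asarray(mu, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

def two_latent_beta_vae_loss(recon_error, content_stats, speaker_stats,
                             beta_content=1.0, beta_speaker=1.0):
    """Reconstruction error plus separately weighted KL terms (hypothetical sketch).

    content_stats and speaker_stats are (mu, log_var) pairs for the two latents.
    A larger weight penalizes the information carried by that latent more heavily.
    """
    kl_content = gaussian_kl(*content_stats)
    kl_speaker = gaussian_kl(*speaker_stats)
    return recon_error + beta_content * kl_content + beta_speaker * kl_speaker

# Example: the content posterior deviates from the prior, the speaker posterior does not.
loss = two_latent_beta_vae_loss(
    recon_error=2.0,
    content_stats=([1.0, 1.0], [0.0, 0.0]),
    speaker_stats=([0.0, 0.0], [0.0, 0.0]),
    beta_content=4.0,
    beta_speaker=0.5,
)
```

In this kind of setup, raising the weight on one KL term limits how much information that latent can carry, which pushes the remaining information into the other latent; how the weights are actually chosen in the proposed method is described in the full text, not here.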
