Voice Conversion Challenge 2020: Intra-lingual semi-parallel and cross-lingual voice conversion

by   Yi Zhao, et al.

The voice conversion challenge is a bi-annual scientific event held to compare and understand different voice conversion (VC) systems built on a common dataset. In 2020, we organized the third edition of the challenge and constructed and distributed a new database for two tasks, intra-lingual semi-parallel and cross-lingual VC. After a two-month challenge period, we received 33 submissions, including 3 baselines built on the database. From the results of crowd-sourced listening tests, we observed that VC methods have progressed rapidly thanks to advanced deep learning methods. In particular, speaker similarity scores of several systems turned out to be as high as target speakers in the intra-lingual semi-parallel VC task. However, we confirmed that none of them have achieved human-level naturalness yet for the same task. The cross-lingual conversion task is, as expected, a more difficult task, and the overall naturalness and similarity scores were lower than those for the intra-lingual conversion task. However, we observed encouraging results, and the MOS scores of the best systems were higher than 4.0. We also show a few additional analysis results to aid in understanding cross-lingual VC better.



There are no comments yet.


page 7

page 9

page 11

page 18

page 19


Transfer Learning from Monolingual ASR to Transcription-free Cross-lingual Voice Conversion

Cross-lingual voice conversion (VC) is a task that aims to synthesize ta...

The Academia Sinica Systems of Voice Conversion for VCC2020

This paper describes the Academia Sinica systems for the two tasks of Vo...

Predictions of Subjective Ratings and Spoofing Assessments of Voice Conversion Challenge 2020 Submissions

The Voice Conversion Challenge 2020 is the third edition under its flags...

S3PRL-VC: Open-source Voice Conversion Framework with Self-supervised Speech Representations

This paper introduces S3PRL-VC, an open-source voice conversion (VC) fra...

Latent linguistic embedding for cross-lingual text-to-speech and voice conversion

As the recently proposed voice cloning system, NAUTILUS, is capable of c...

a novel cross-lingual voice cloning approach with a few text-free samples

In this paper, we present a cross-lingual voice cloning approach. BN fea...

Spectrum and Prosody Conversion for Cross-lingual Voice Conversion with CycleGAN

Cross-lingual voice conversion aims to change source speaker's voice to ...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.