A Comparative Study of Self-supervised Speech Representation Based Voice Conversion

07/10/2022
by Wen-Chin Huang, et al.

We present a large-scale comparative study of self-supervised speech representation (S3R)-based voice conversion (VC). In the context of recognition-synthesis VC, S3Rs are attractive because they could replace expensive supervised representations such as phonetic posteriorgrams (PPGs), which are commonly adopted by state-of-the-art VC systems. Using S3PRL-VC, an open-source VC toolkit we previously developed, we provide a series of in-depth objective and subjective analyses on the Voice Conversion Challenge 2020 (VCC2020) dataset under three VC settings: intra-lingual any-to-one (A2O), cross-lingual A2O, and any-to-any (A2A) VC. We investigate S3R-based VC in various aspects, including model type, multilinguality, and supervision. We also study the effect of a post-discretization process with k-means clustering and show how it improves conversion in the A2A setting. Finally, a comparison with state-of-the-art VC systems demonstrates the competitiveness of S3R-based VC and sheds light on possible directions for improvement.
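The post-discretization step mentioned above can be illustrated with a minimal sketch of k-means quantization applied to frame-level features. This is not the authors' implementation. It assumes the S3R features are already extracted as a float array of shape (frames, dim) by some upstream model, which is not shown. It uses plain Lloyd's k-means in NumPy rather than any specific library. The function names (`kmeans_fit`, `assign`, `discretize`), the cluster count, and the synthetic features are hypothetical. The sketch returns both the cluster IDs and the corresponding centroid vectors because the abstract does not say which form the converter consumes.

```python
import numpy as np


def assign(features, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for every frame."""
    d2 = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def kmeans_fit(features, n_clusters, n_iters=50, seed=0):
    """Plain Lloyd's k-means on frame-level features of shape (T, D)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with randomly chosen distinct frames.
    idx = rng.choice(len(features), size=n_clusters, replace=False)
    centroids = features[idx].astype(float)
    for _ in range(n_iters):
        labels = assign(features, centroids)
        for k in range(n_clusters):
            members = features[labels == k]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[k] = members.mean(axis=0)
    return centroids


def discretize(features, centroids):
    """Replace each frame by its cluster ID and by the matching centroid vector."""
    ids = assign(features, centroids)
    return ids, centroids[ids]


# Hypothetical stand-in for S3R features: 200 frames x 16 dims drawn around 4 centres.
rng = np.random.default_rng(1)
centers = rng.normal(scale=10.0, size=(4, 16))
feats = np.concatenate([c + rng.normal(scale=0.1, size=(50, 16)) for c in centers])

cents = kmeans_fit(feats, n_clusters=4)
ids, quantized = discretize(feats, cents)
print(ids.shape, quantized.shape)  # (200,) (200, 16)
```

In a real setup the centroids would presumably be fitted once on training features and then reused to discretize unseen source speech. The idea is that mapping every frame onto a small codebook removes some speaker-dependent detail from the representation.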
