S3PRL-VC: Open-source Voice Conversion Framework with Self-supervised Speech Representations

10/12/2021
by   Wen-Chin Huang, et al.
0

This paper introduces S3PRL-VC, an open-source voice conversion (VC) framework based on the S3PRL toolkit. In the context of recognition-synthesis VC, self-supervised speech representation (S3R) is valuable in its potential to replace the expensive supervised representation adopted by state-of-the-art VC systems. Moreover, we claim that VC is a good probing task for S3R analysis. In this work, we provide a series of in-depth analyses by benchmarking on the two tasks in VCC2020, namely intra-/cross-lingual any-to-one (A2O) VC, as well as an any-to-any (A2A) setting. We also provide comparisons between not only different S3Rs but also top systems in VCC2020 with supervised representations. Systematic objective and subjective evaluation were conducted, and we show that S3R is comparable with VCC2020 top systems in the A2O setting in terms of similarity, and achieves state-of-the-art in S3R-based A2A VC. We believe the extensive analysis, as well as the toolkit itself, contribute to not only the S3R community but also the VC community. The codebase is now open-sourced.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/10/2022

A Comparative Study of Self-supervised Speech Representation Based Voice Conversion

We present a large-scale comparative study of self-supervised speech rep...
research
11/30/2022

EURO: ESPnet Unsupervised ASR Open-source Toolkit

This paper describes the ESPnet Unsupervised ASR Open-source Toolkit (EU...
research
05/16/2023

Adversarial Speaker Disentanglement Using Unannotated External Data for Self-supervised Representation Based Voice Conversion

Nowadays, recognition-synthesis-based methods have been quite popular wi...
research
10/25/2022

Disentangled Speech Representation Learning for One-Shot Cross-lingual Voice Conversion Using β-VAE

We propose an unsupervised learning method to disentangle speech into co...
research
12/20/2021

The PUEVA Inventory: A Toolkit to Evaluate the Personality, Usability and Enjoyability of Voice Agents

The proliferation of voice agents in consumer devices requires new tools...
research
04/11/2018

VoroTop: Voronoi Cell Topology Visualization and Analysis Toolkit

This paper introduces a new open-source software program called VoroTop,...
research
09/14/2023

VoicePAT: An Efficient Open-source Evaluation Toolkit for Voice Privacy Research

Speaker anonymization is the task of modifying a speech recording such t...

Please sign up or login with your details

Forgot password? Click here to reset