StarGANv2-VC: A Diverse, Unsupervised, Non-parallel Framework for Natural-Sounding Voice Conversion

07/21/2021
by Yinghao Aaron Li, et al.

We present an unsupervised, non-parallel, many-to-many voice conversion (VC) method using a generative adversarial network (GAN) called StarGAN v2. Using a combination of adversarial source classifier loss and perceptual loss, our model significantly outperforms previous VC models. Although our model is trained with only 20 English speakers, it generalizes to a variety of voice conversion tasks, such as any-to-many, cross-lingual, and singing conversion. Using a style encoder, our framework can also convert plain reading speech into stylistic speech, such as emotional and falsetto speech. Subjective and objective evaluation experiments on a non-parallel many-to-many voice conversion task revealed that our model produces natural-sounding voices, close to the sound quality of state-of-the-art text-to-speech (TTS) based voice conversion methods, without the need for text labels. Moreover, our model is completely convolutional and, paired with a faster-than-real-time vocoder such as Parallel WaveGAN, can perform real-time voice conversion.

