Voice Conversion Using Cycle-Consistent Variational Autoencoder

09/15/2019
by Keonnyeong Lee, et al.

One of the most critical obstacles in voice conversion is the requirement of parallel training data, which contain utterances with the same linguistic content spoken by different speakers. Collecting such parallel data is a highly expensive process, so many works have attempted to use non-parallel training data for voice conversion. One successful approach uses cycle-consistent adversarial networks (CycleGAN), which rely on a cycle consistency loss. The major drawback of CycleGAN-based methods, however, is that they can handle only one-to-one voice conversion from a source speaker to a target speaker, which makes them difficult to use in general-purpose cases requiring many-to-many voice conversion among multiple speakers. Another group of approaches, based on the variational autoencoder (VAE), can handle many-to-many voice conversion, but their sound quality is much lower than that of CycleGAN-based methods. In this paper, we propose to use a cycle consistency loss for VAE to improve the sound quality of conventional VAE-based methods for many-to-many voice conversion.
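
The idea of adding a cycle consistency term to a VAE objective can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the class `ToyConditionalVAE`, the linear encoder and decoder, the speaker-independent encoder, the L1 distance, and the `cycle_weight` parameter are all hypothetical. The sketch only shows the general pattern: encode a source utterance, decode it with the target speaker label, re-encode the converted features, decode them back with the source label, and penalize the difference from the original input.

```python
import math
import random


def linear(weights, vec):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


class ToyConditionalVAE:
    """Hypothetical linear VAE whose decoder is conditioned on a speaker label."""

    def __init__(self, feat_dim, latent_dim, num_speakers, seed=0):
        self.rng = random.Random(seed)
        self.num_speakers = num_speakers

        def init(rows, cols):
            return [[self.rng.gauss(0.0, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        self.w_mu = init(latent_dim, feat_dim)
        self.w_logvar = init(latent_dim, feat_dim)
        self.w_dec = init(feat_dim, latent_dim + num_speakers)

    def encode(self, x):
        # Assumption: the encoder does not see the speaker label.
        return linear(self.w_mu, x), linear(self.w_logvar, x)

    def sample(self, mu, logvar):
        # Reparameterization: z = mu + sigma * eps, with eps ~ N(0, 1).
        return [m + math.exp(0.5 * lv) * self.rng.gauss(0.0, 1.0)
                for m, lv in zip(mu, logvar)]

    def decode(self, z, speaker):
        return linear(self.w_dec, z + one_hot(speaker, self.num_speakers))


def l1(a, b):
    """Mean absolute difference between two equal-length vectors."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def kl_to_standard_normal(mu, logvar):
    """KL divergence from N(mu, diag(exp(logvar))) to N(0, I)."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, logvar))


def cycle_vae_loss(model, x, src, tgt, cycle_weight=1.0):
    # Usual VAE terms: reconstruct x using its own speaker label.
    mu, logvar = model.encode(x)
    z = model.sample(mu, logvar)
    recon = l1(x, model.decode(z, src))
    kl = kl_to_standard_normal(mu, logvar)

    # Cycle term: convert src -> tgt, then convert the result back tgt -> src.
    converted = model.decode(z, tgt)
    mu_c, logvar_c = model.encode(converted)
    cycled = model.decode(model.sample(mu_c, logvar_c), src)
    cycle = l1(x, cycled)

    total = recon + kl + cycle_weight * cycle
    return total, {"recon": recon, "kl": kl, "cycle": cycle}


if __name__ == "__main__":
    model = ToyConditionalVAE(feat_dim=4, latent_dim=2, num_speakers=3)
    frame = [0.1, -0.2, 0.3, 0.05]  # stand-in for one spectral feature frame
    total, parts = cycle_vae_loss(model, frame, src=0, tgt=2)
    print(total, parts)
```

Because a single decoder receives the speaker label as input, the same model can in principle convert between any pair of speakers, which is the many-to-many property the abstract attributes to VAE-based methods. A real system would use neural networks trained by gradient descent on spectral features; this toy version only evaluates the loss once.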

