CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion

04/09/2019
by   Takuhiro Kaneko, et al.
0

Non-parallel voice conversion (VC) is a technique for learning the mapping from source to target speech without relying on parallel data. This is an important task, but it has been challenging due to the disadvantages of the training conditions. Recently, CycleGAN-VC has provided a breakthrough and performed comparably to a parallel VC method without relying on any extra data, modules, or time alignment procedures. However, there is still a large gap between the real target and converted speech, and bridging this gap remains a challenge. To reduce this gap, we propose CycleGAN-VC2, which is an improved version of CycleGAN-VC incorporating three new techniques: an improved objective (two-step adversarial losses), improved generator (2-1-2D CNN), and improved discriminator (PatchGAN). We evaluated our method on a non-parallel VC task and analyzed the effect of each technique in detail. An objective evaluation showed that these techniques help bring the converted feature sequence closer to the target in terms of both global and local structures, which we assess by using Mel-cepstral distortion and modulation spectra distance, respectively. A subjective evaluation showed that CycleGAN-VC2 outperforms CycleGAN-VC in terms of naturalness and similarity for every speaker pair, including intra-gender and inter-gender pairs.

READ FULL TEXT

page 1

page 2

page 3

page 4

page 5

research
07/29/2019

StarGAN-VC2: Rethinking Conditional Methods for StarGAN-Based Voice Conversion

Non-parallel multi-domain voice conversion (VC) is a technique for learn...
research
11/30/2017

Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks

We propose a parallel-data-free voice conversion (VC) method that can le...
research
10/22/2020

CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectrogram Conversion

Non-parallel voice conversion (VC) is a technique for learning mappings ...
research
06/06/2018

StarGAN-VC: Non-parallel many-to-many voice conversion with star generative adversarial networks

This paper proposes a method that allows for non-parallel many-to-many v...
research
04/05/2021

StarGAN-based Emotional Voice Conversion for Japanese Phrases

This paper shows that StarGAN-VC, a spectral envelope transformation met...
research
11/07/2019

Change your singer: a transfer learning generative adversarial framework for song to song conversion

Have you ever wondered how a song might sound if performed by a differen...
research
02/25/2021

MaskCycleGAN-VC: Learning Non-parallel Voice Conversion with Filling in Frames

Non-parallel voice conversion (VC) is a technique for training voice con...

Please sign up or login with your details

Forgot password? Click here to reset