CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectrogram Conversion

10/22/2020
by   Takuhiro Kaneko, et al.
0

Non-parallel voice conversion (VC) is a technique for learning mappings between source and target speeches without using a parallel corpus. Recently, cycle-consistent adversarial network (CycleGAN)-VC and CycleGAN-VC2 have shown promising results regarding this problem and have been widely used as benchmark methods. However, owing to the ambiguity of the effectiveness of CycleGAN-VC/VC2 for mel-spectrogram conversion, they are typically used for mel-cepstrum conversion even when comparative methods employ mel-spectrogram as a conversion target. To address this, we examined the applicability of CycleGAN-VC/VC2 to mel-spectrogram conversion. Through initial experiments, we discovered that their direct applications compromised the time-frequency structure that should be preserved during conversion. To remedy this, we propose CycleGAN-VC3, an improvement of CycleGAN-VC2 that incorporates time-frequency adaptive normalization (TFAN). Using TFAN, we can adjust the scale and bias of the converted features while reflecting the time-frequency structure of the source mel-spectrogram. We evaluated CycleGAN-VC3 on inter-gender and intra-gender non-parallel VC. A subjective evaluation of naturalness and similarity showed that for every VC pair, CycleGAN-VC3 outperforms or is competitive with the two types of CycleGAN-VC2, one of which was applied to mel-cepstrum and the other to mel-spectrogram. Audio samples are available at http://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/cyclegan-vc3/index.html.

READ FULL TEXT

page 1

page 2

page 3

page 4

page 5

research
02/25/2021

MaskCycleGAN-VC: Learning Non-parallel Voice Conversion with Filling in Frames

Non-parallel voice conversion (VC) is a technique for training voice con...
research
06/10/2023

Vocoder-Free Non-Parallel Conversion of Whispered Speech With Masked Cycle-Consistent Generative Adversarial Networks

Cycle-consistent generative adversarial networks have been widely used i...
research
04/09/2019

CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion

Non-parallel voice conversion (VC) is a technique for learning the mappi...
research
07/29/2019

StarGAN-VC2: Rethinking Conditional Methods for StarGAN-Based Voice Conversion

Non-parallel multi-domain voice conversion (VC) is a technique for learn...
research
04/05/2021

StarGAN-based Emotional Voice Conversion for Japanese Phrases

This paper shows that StarGAN-VC, a spectral envelope transformation met...
research
11/30/2017

Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks

We propose a parallel-data-free voice conversion (VC) method that can le...

Please sign up or login with your details

Forgot password? Click here to reset