Axial Residual Networks for CycleGAN-based Voice Conversion

02/16/2021
by Jaeseong You, et al.

We propose a novel architecture and improved training objectives for non-parallel voice conversion. Our CycleGAN-based model performs a shape-preserving transformation directly on a magnitude spectrogram with high frequency resolution, converting its style (i.e., speaker identity) while preserving the speech content. At no point in the conversion process does the model resort to a compressed intermediate representation (e.g., a mel spectrogram, a low-resolution spectrogram, or a decomposed network feature). To support this computationally expensive procedure, we propose an efficient axial residual block architecture, and to stabilize training we introduce several modifications to the CycleGAN losses. Our experiments show that the proposed model outperforms Scyclone and performs comparably to or better than CycleGAN-VC2, even without a neural vocoder.
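
To make "shape-preserving" and "axial" concrete, the following is a minimal NumPy sketch, not the architecture from the paper. It assumes one common reading of "axial": filtering along a single axis at a time (frequency, then time) and adding the result back to the input through a skip connection. The function names, kernel sizes, ReLU nonlinearity, and the 513 x 128 spectrogram size are all illustrative assumptions; the abstract does not specify any of them.

```python
import numpy as np

def conv1d_same(x, kernel, axis):
    """Convolve a 2-D array with a 1-D kernel along one axis, keeping the shape."""
    # mode="same" returns an output as long as the longer of the two inputs,
    # so each row/column keeps its original length.
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="same"), axis, x
    )

def axial_residual_block(spec, k_freq, k_time):
    """Hypothetical block: filter along frequency, then time, plus a skip connection."""
    h = conv1d_same(spec, k_freq, axis=0)  # mixes information along frequency only
    h = np.maximum(h, 0.0)                 # ReLU (an assumed nonlinearity)
    h = conv1d_same(h, k_time, axis=1)     # mixes information along time only
    return spec + h                        # residual connection; shape is unchanged

rng = np.random.default_rng(0)
# Stand-in magnitude spectrogram: 513 frequency bins x 128 frames (assumed sizes).
spec = np.abs(rng.standard_normal((513, 128)))
k_freq = rng.standard_normal(5) * 0.1      # assumed 5-tap frequency kernel
k_time = rng.standard_normal(3) * 0.1      # assumed 3-tap time kernel

out = axial_residual_block(spec, k_freq, k_time)
print(out.shape == spec.shape)
```

In this sketch the two 1-D kernels hold 5 + 3 weights, whereas a full 5 x 3 two-dimensional kernel would hold 15. That kind of saving is the usual motivation for axis-wise factorization when the input is kept at full frequency resolution.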


