Towards Low-Resource StarGAN Voice Conversion using Weight Adaptive Instance Normalization

10/22/2020
by Mingjie Chen, et al.

Many-to-many voice conversion with non-parallel training data has seen significant progress in recent years, and StarGAN-based models have attracted considerable interest for this task. However, most StarGAN-based methods have been evaluated only in settings where the number of speakers is small and the amount of training data is large. In this work, we aim to improve the data efficiency of the model and to achieve many-to-many non-parallel StarGAN-based voice conversion for a relatively large number of speakers with limited training samples. To improve data efficiency, the proposed model uses a speaker encoder to extract speaker embeddings and applies adaptive instance normalization (AdaIN) to the convolutional weights. Experiments are conducted with 109 speakers under two low-resource conditions, in which the number of training samples per speaker is 20 and 5, respectively. An objective evaluation shows that the proposed model outperforms the baseline methods. Furthermore, a subjective evaluation shows that the proposed model outperforms the baseline method in both naturalness and similarity.
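
The abstract does not spell out how AdaIN is applied to convolutional weights, so the following is only a minimal NumPy sketch of one plausible reading, not the authors' implementation. It assumes that a speaker embedding is mapped to one scale per input channel, that the kernel is multiplied by those scales, and that each output filter is then rescaled to unit L2 norm. The names `weight_adain` and `speaker_scale`, the linear projection `proj`, and all shapes are hypothetical.

```python
import numpy as np


def speaker_scale(embedding, proj):
    """Map a speaker embedding to per-input-channel scales.

    Assumption: a plain linear projection centred on 1.0; the abstract
    does not describe this mapping.
    """
    return embedding @ proj + 1.0


def weight_adain(weight, scale, eps=1e-8):
    """Adapt a 1-D convolution kernel to a speaker.

    weight: (out_channels, in_channels, kernel_size)
    scale:  (in_channels,) speaker-dependent scales
    """
    # Modulate: scale every input channel of the kernel.
    modulated = weight * scale[None, :, None]
    # Normalize: rescale each output filter to unit L2 norm.
    norm = np.sqrt(np.sum(modulated ** 2, axis=(1, 2), keepdims=True) + eps)
    return modulated / norm


rng = np.random.default_rng(0)
weight = rng.standard_normal((8, 4, 3))      # hypothetical kernel shape
embedding = rng.standard_normal(16)          # hypothetical speaker embedding
proj = rng.standard_normal((16, 4)) * 0.1    # hypothetical projection matrix

adapted = weight_adain(weight, speaker_scale(embedding, proj))
filter_norms = np.sqrt((adapted ** 2).sum(axis=(1, 2)))
```

Under these assumptions the speaker information enters through the kernel itself rather than through the statistics of the feature maps, so each target speaker effectively gets its own set of filters.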

research
04/10/2019

One-shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization

Recently, voice conversion (VC) without parallel data has been successfu...
research
04/30/2019

Many-to-Many Voice Conversion with Out-of-Dataset Speaker Support

We present a Cycle-GAN based many-to-many voice conversion method that c...
research
02/15/2020

Many-to-Many Voice Conversion using Conditional Cycle-Consistent Adversarial Networks

Voice conversion (VC) refers to transforming the speaker characteristics...
research
04/12/2017

Trainable Referring Expression Generation using Overspecification Preferences

Referring expression generation (REG) models that use speaker-dependent ...
research
02/16/2021

Axial Residual Networks for CycleGAN-based Voice Conversion

We propose a novel architecture and improved training objectives for non...
research
06/03/2019

Blow: a single-scale hyperconditioned flow for non-parallel raw-audio voice conversion

End-to-end models for raw audio generation are a challenge, specially if...
research
02/25/2021

MaskCycleGAN-VC: Learning Non-parallel Voice Conversion with Filling in Frames

Non-parallel voice conversion (VC) is a technique for training voice con...