StarGAN-VC: Non-parallel many-to-many voice conversion with star generative adversarial networks

06/06/2018
by Hirokazu Kameoka, et al.

This paper proposes a method that allows for non-parallel many-to-many voice conversion (VC) by using a variant of generative adversarial networks (GANs) called StarGAN. Our method, which we term StarGAN-VC, is noteworthy in that it (1) requires no parallel utterances, transcriptions, or time alignment procedures for speech generator training, (2) simultaneously learns many-to-many mappings across different attribute domains using a single generator network, (3) can generate converted speech signals quickly enough to allow for real-time implementations, and (4) requires only several minutes of training examples to generate reasonably realistic-sounding speech. Subjective evaluation experiments on a non-parallel many-to-many speaker identity conversion task revealed that the proposed method obtained higher sound quality and speaker similarity than a state-of-the-art method based on variational autoencoding GANs.
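
To make point (2) concrete, the following is a minimal sketch of how a single generator can be told which domain to convert to: a target-domain code is attached to each input feature frame. The abstract does not specify the conditioning mechanism, so the one-hot concatenation here is an assumption for illustration, and the names `one_hot`, `condition_frames`, and `pairwise_model_count` are hypothetical. No trained network is involved; the sketch only shows the input conditioning and counts how many separate one-to-one converters a pairwise approach would need instead.

```python
from typing import List


def one_hot(index: int, num_domains: int) -> List[float]:
    """Return a one-hot attribute code for a target domain (e.g. a speaker)."""
    if not 0 <= index < num_domains:
        raise ValueError("domain index out of range")
    return [1.0 if i == index else 0.0 for i in range(num_domains)]


def condition_frames(
    frames: List[List[float]], target: int, num_domains: int
) -> List[List[float]]:
    """Append the target-domain code to every acoustic feature frame.

    Assumption: conditioning by concatenation. A real generator would be a
    trained neural network applied to these conditioned frames.
    """
    code = one_hot(target, num_domains)
    return [frame + code for frame in frames]


def pairwise_model_count(num_domains: int) -> int:
    """Directed one-to-one converters needed to cover all ordered domain pairs."""
    return num_domains * (num_domains - 1)


# Two toy 3-dimensional feature frames, converted towards domain 2 of 4.
frames = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
conditioned = condition_frames(frames, target=2, num_domains=4)

print(conditioned[0])           # frame followed by the 4-dim target code
print(pairwise_model_count(4))  # converters a pairwise approach would need
```

With four domains, a pairwise approach would need 12 directed converters, whereas a conditioned generator of this kind is a single model whose behaviour is selected by the target code.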


