Voice Conversion with Denoising Diffusion Probabilistic GAN Models

08/28/2023
by   Xulong Zhang, et al.

Voice conversion (VC) transforms speaking style while preserving linguistic content. Many researchers have applied deep generative models to this task. Generative Adversarial Networks (GANs) can generate high-quality samples quickly, but the generated samples lack diversity. Denoising Diffusion Probabilistic Models (DDPMs) outperform GANs in mode coverage and sample diversity, but they have high computational costs and slower inference than GANs. To make GANs and DDPMs more practical, we propose DiffGAN-VC, a variant of GANs and DDPMs, to achieve non-parallel many-to-many voice conversion. We use large denoising steps and introduce a multimodal conditional GAN to model the denoising distribution, forming a denoising diffusion generative adversarial network. Both objective and subjective evaluations show that DiffGAN-VC achieves high voice quality on non-parallel datasets. Compared with the CycleGAN-VC method, DiffGAN-VC achieves higher speaker similarity, naturalness, and sound quality.


research
06/06/2018

StarGAN-VC: Non-parallel many-to-many voice conversion with star generative adversarial networks

This paper proposes a method that allows for non-parallel many-to-many v...
research
12/15/2021

Tackling the Generative Learning Trilemma with Denoising Diffusion GANs

A wide variety of deep generative models has been developed in the past ...
research
02/22/2021

Investigating Deep Neural Structures and their Interpretability in the Domain of Voice Conversion

Generative Adversarial Networks (GANs) are machine learning networks bas...
research
05/28/2021

DiffSVC: A Diffusion Probabilistic Model for Singing Voice Conversion

Singing voice conversion (SVC) is one promising technique which can enri...
research
10/03/2022

WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration

Denoising diffusion probabilistic models (DDPMs) and generative adversar...
research
05/25/2023

DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion

Diffusion-based generative models have exhibited powerful generative per...
research
02/08/2022

InferGrad: Improving Diffusion Models for Vocoder by Considering Inference in Training

Denoising diffusion probabilistic models (diffusion models for short) re...
