DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs

01/28/2022
by Songxiang Liu, et al.

Denoising diffusion probabilistic models (DDPMs) are expressive generative models that have been used to solve a variety of speech synthesis problems. However, because of their high sampling costs, DDPMs are difficult to use in real-time speech processing applications. In this paper, we introduce DiffGAN-TTS, a novel DDPM-based text-to-speech (TTS) model achieving high-fidelity and efficient speech synthesis. DiffGAN-TTS is based on denoising diffusion generative adversarial networks (GANs), which adopt an adversarially-trained expressive model to approximate the denoising distribution. We show with multi-speaker TTS experiments that DiffGAN-TTS can generate high-fidelity speech samples within only 4 denoising steps. We present an active shallow diffusion mechanism to further speed up inference. A two-stage training scheme is proposed, with a basic TTS acoustic model trained at stage one providing valuable prior information for a DDPM trained at stage two. Our experiments show that DiffGAN-TTS can achieve high synthesis performance with only 1 denoising step.
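
The few-step sampling described in the abstract can be illustrated with a minimal sketch. It assumes the usual denoising diffusion GAN recipe: at each step a generator predicts the clean sample x_0 from the noisy x_t, and x_{t-1} is then drawn from the Gaussian posterior q(x_{t-1} | x_t, x_0). Everything named here is hypothetical and not taken from the paper: the linear beta schedule and its endpoints, the function names, and `toy_generator`, which stands in for the adversarially trained, text- and speaker-conditioned acoustic model. Spectrogram frames are flattened to plain lists of floats to keep the example dependency-free.

```python
import math
import random


def make_schedule(num_steps=4, beta_min=0.1, beta_max=0.9):
    """Linear beta schedule (assumed values) and cumulative products of (1 - beta)."""
    if num_steps == 1:
        betas = [beta_max]
    else:
        step = (beta_max - beta_min) / (num_steps - 1)
        betas = [beta_min + i * step for i in range(num_steps)]
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return betas, alpha_bars


def posterior_sample(x0_pred, x_t, t, betas, alpha_bars, rng):
    """Draw x_{t-1} from the Gaussian posterior q(x_{t-1} | x_t, x_0 = x0_pred)."""
    alpha_bar_prev = alpha_bars[t - 1] if t > 0 else 1.0
    denom = 1.0 - alpha_bars[t]
    coef_x0 = math.sqrt(alpha_bar_prev) * betas[t] / denom
    coef_xt = math.sqrt(1.0 - betas[t]) * (1.0 - alpha_bar_prev) / denom
    # At t == 0 the variance is zero, so the result is just the x_0 prediction.
    std = math.sqrt(betas[t] * (1.0 - alpha_bar_prev) / denom)
    return [coef_x0 * a + coef_xt * b + std * rng.gauss(0.0, 1.0)
            for a, b in zip(x0_pred, x_t)]


def sample(generator, size, num_steps=4, seed=0):
    """Run the reverse process for num_steps steps, starting from Gaussian noise."""
    rng = random.Random(seed)
    betas, alpha_bars = make_schedule(num_steps)
    x_t = [rng.gauss(0.0, 1.0) for _ in range(size)]
    for t in reversed(range(num_steps)):
        x0_pred = generator(x_t, t)  # the generator predicts the clean sample
        x_t = posterior_sample(x0_pred, x_t, t, betas, alpha_bars, rng)
    return x_t


def toy_generator(x_t, t):
    """Hypothetical stand-in for the trained generator: always predicts silence."""
    return [0.0 for _ in x_t]


frame = sample(toy_generator, size=8, num_steps=4)
```

Because each step only needs one generator call, the cost of inference scales with `num_steps`, which is why cutting the process to 4 steps (or to 1) matters for real-time use. The sketch does not model the active shallow diffusion mechanism or the two-stage training scheme.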

research
08/03/2023

Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS

The diffusion model is capable of generating high-quality data through a...
research
11/22/2022

Can denoising diffusion probabilistic models generate realistic astrophysical fields?

Score-based generative models have emerged as alternatives to generative...
research
08/31/2023

LightGrad: Lightweight Diffusion Probabilistic Model for Text-to-Speech

Recent advances in neural text-to-speech (TTS) models bring thousands of...
research
09/18/2023

Speeding Up Speech Synthesis In Diffusion Models By Reducing Data Distribution Recovery Steps Via Content Transfer

Diffusion based vocoders have been criticised for being slow due to the ...
research
06/07/2021

Learning to Efficiently Sample from Diffusion Probabilistic Models

Denoising Diffusion Probabilistic Models (DDPMs) have emerged as a power...
research
07/13/2022

ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech

Denoising diffusion probabilistic models (DDPMs) have recently achieved ...
research
04/03/2021

Diff-TTS: A Denoising Diffusion Model for Text-to-Speech

Although neural text-to-speech (TTS) models have attracted a lot of atte...
