Speeding Up Speech Synthesis In Diffusion Models By Reducing Data Distribution Recovery Steps Via Content Transfer

09/18/2023
by Peter Ochieng, et al.

Diffusion-based vocoders have been criticised for being slow because of the many steps required during sampling. Moreover, the commonly used loss function is designed so that the target is the original input x_0 or the noise term ϵ_0. For early time steps of the reverse process this produces large prediction errors, which can cause speech distortions and lengthen training. We propose a setup in which the targets are the outputs of the individual forward-process time steps, with the goal of reducing the magnitude of prediction errors and shortening training time. We use the layers of a neural network (NN) to perform denoising, training each layer to generate representations similar to the noised outputs of the diffusion forward process. The NN layers progressively denoise the input in the reverse process until the final layer estimates the clean speech. To avoid a 1:1 mapping between NN layers and forward-process steps, we define a skip parameter τ > 1 such that an NN layer is trained to cumulatively remove the noise injected over τ steps of the forward process. This significantly reduces the number of data-distribution recovery steps and, consequently, the time needed to generate speech. Extensive evaluation shows that the proposed technique generates high-fidelity speech in competitive time, outperforming current state-of-the-art tools, and that it generalizes well to unseen speech.
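
Below is a minimal sketch, not the authors' implementation, of the layer-wise training targets described in the abstract: each NN layer is regressed onto the forward-process sample that lies τ steps closer to the clean signal. The step count T, the skip parameter value, the toy fully connected layers, and the mean-squared-error objective are all illustrative assumptions.

```python
# Minimal sketch of layer-wise denoising targets with a skip parameter tau.
# Assumptions (not from the paper): T, tau, the toy MLP layers, and the MSE loss.
import torch
import torch.nn as nn

T = 50                      # total forward diffusion steps (assumed)
tau = 10                    # skip parameter: each layer undoes tau forward steps
num_layers = T // tau       # number of denoising layers in the stack

betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

def q_sample(x0, t, eps):
    # Forward process: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
    return alpha_bar[t].sqrt() * x0 + (1.0 - alpha_bar[t]).sqrt() * eps

# One denoising module per group of tau forward steps (toy fully connected blocks).
layers = nn.ModuleList(
    nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 128))
    for _ in range(num_layers)
)

def training_loss(x0):
    eps = torch.randn_like(x0)
    h = q_sample(x0, T - 1, eps)           # start from the most heavily noised sample
    loss = 0.0
    for i, layer in enumerate(layers):
        h = layer(h)
        t_target = T - 1 - (i + 1) * tau   # forward-process step this layer should match
        target = x0 if t_target < 0 else q_sample(x0, t_target, eps)
        loss = loss + ((h - target) ** 2).mean()
    return loss

# Usage: loss = training_loss(batch_of_speech_frames); loss.backward()
```

Because every layer removes the noise of τ forward steps at once, the depth of the stack, rather than a long iterative sampling loop, determines how many data-distribution recovery steps are needed at generation time.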


