DeepAI AI Chat
Log In Sign Up

DiffWave: A Versatile Diffusion Model for Audio Synthesis

09/21/2020
by   Zhifeng Kong, et al.
9

In this work, we propose DiffWave, a versatile Diffusion probabilistic model for conditional and unconditional Waveform generation. The model is non-autoregressive, and converts the white noise signal into structured waveform through a Markov chain with a constant number of steps at synthesis. It is efficiently trained by optimizing a variant of variational bound on the data likelihood. DiffWave produces high-fidelity audios in Different Waveform generation tasks, including neural vocoding conditioned on mel spectrogram, class-conditional generation, and unconditional generation. We demonstrate that DiffWave matches a strong WaveNet vocoder in terms of speech quality (MOS: 4.44 versus 4.43), while synthesizing orders of magnitude faster. In particular, it significantly outperforms autoregressive and GAN-based waveform models in the challenging unconditional generation task in terms of audio quality and sample diversity from various automatic and human evaluations.

READ FULL TEXT

page 1

page 2

page 3

page 4

09/02/2020

WaveGrad: Estimating Gradients for Waveform Generation

This paper introduces WaveGrad, a conditional model for waveform generat...
06/17/2021

WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis

This paper introduces WaveGrad 2, a non-autoregressive generative model ...
10/26/2022

Full-band General Audio Synthesis with Score-based Diffusion

Recent works have shown the capability of deep generative models to tack...
02/23/2019

GANSynth: Adversarial Neural Audio Synthesis

Efficient audio synthesis is an inherently difficult machine learning ta...
05/16/2023

SoundStorm: Efficient Parallel Audio Generation

We present SoundStorm, a model for efficient, non-autoregressive audio g...
05/30/2022

BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis

Binaural audio plays a significant role in constructing immersive augmen...
06/01/2023

UnDiff: Unsupervised Voice Restoration with Unconditional Diffusion Model

This paper introduces UnDiff, a diffusion probabilistic model capable of...

Code Repositories

diffwave

DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.


view repo

DiffWave-Vocoder

Pytorch Reimplementation of DiffWave Vocoder: a high quality, fast, and small neural vocoder.


view repo

tf-diffwave

Tensorflow implementation of DiffWave: A Versatile Diffusion Model for Audio Synthesis


view repo

jax-variational-diffwave

DiffWave with variaitional diffusion models


view repo