DiffWave
DiffWave is a fast, high-quality neural vocoder and waveform synthesizer.
In this work, we propose DiffWave, a versatile diffusion probabilistic model for conditional and unconditional waveform generation. The model is non-autoregressive and converts white noise into a structured waveform through a Markov chain with a constant number of steps at synthesis. It is trained efficiently by optimizing a variant of the variational bound on the data likelihood. DiffWave produces high-fidelity audio in different waveform generation tasks, including neural vocoding conditioned on mel spectrograms, class-conditional generation, and unconditional generation. We demonstrate that DiffWave matches a strong WaveNet vocoder in speech quality (MOS: 4.44 versus 4.43) while synthesizing orders of magnitude faster. In particular, it significantly outperforms autoregressive and GAN-based waveform models on the challenging unconditional generation task in terms of audio quality and sample diversity under various automatic and human evaluations.
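The synthesis procedure sketched in the abstract (white noise refined through a fixed number of reverse Markov-chain steps) follows the standard denoising-diffusion sampling loop. Below is a minimal PyTorch sketch of such a loop; the `model(x_t, t, mel)` noise-prediction network, its call signature, and the linear beta schedule are illustrative assumptions, not DiffWave's exact implementation.

```python
import torch

def diffusion_sampling(model, mel, num_steps=50, audio_len=16000, device="cpu"):
    """Sketch of DDPM-style reverse sampling for a diffusion vocoder.

    Assumptions: `model(x_t, t, mel)` predicts the noise added at step t,
    and the linear beta schedule below stands in for the trained schedule.
    """
    betas = torch.linspace(1e-4, 0.05, num_steps, device=device)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    # Start from white noise and denoise over a constant number of steps.
    x = torch.randn(1, audio_len, device=device)
    for t in reversed(range(num_steps)):
        eps = model(x, torch.tensor([t], device=device), mel)  # predicted noise
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        x = (x - coef * eps) / torch.sqrt(alphas[t])
        if t > 0:
            # Inject fresh noise at every step except the final one.
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x.clamp(-1.0, 1.0)
```

Because the number of reverse steps is fixed rather than one per audio sample, a loop like this produces the whole waveform in a constant number of network evaluations, which is where the speedup over autoregressive vocoders comes from.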
PyTorch reimplementation of the DiffWave vocoder: a high-quality, fast, and small neural vocoder.
TensorFlow implementation of DiffWave: A Versatile Diffusion Model for Audio Synthesis
DiffWave with variational diffusion models