SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping

03/31/2022
by Yuma Koizumi, et al.

Neural vocoders based on denoising diffusion probabilistic models (DDPMs) have been improved by adapting the diffusion noise distribution to the given acoustic features. In this study, we propose SpecGrad, which adapts the diffusion noise so that its time-varying spectral envelope becomes close to the conditioning log-mel spectrogram. This adaptation by time-varying filtering improves sound quality, especially in the high-frequency bands. The filtering is performed in the time-frequency domain, which keeps the computational cost almost the same as that of conventional DDPM-based neural vocoders. Experimental results showed that SpecGrad generates higher-fidelity speech waveforms than conventional DDPM-based neural vocoders in both analysis-synthesis and speech enhancement scenarios. Audio demos are available at wavegrad.github.io/specgrad/.
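The idea of shaping diffusion noise in the time-frequency domain so that its envelope follows a log-mel spectrogram can be illustrated with a small NumPy sketch. It is not the filter design from the paper. It rests on illustrative assumptions: the log-mel input holds natural-log amplitudes, mel bands are mapped back to linear-frequency bins with the pseudo-inverse of a simple triangular mel filterbank, and the "filter" is a real-valued (zero-phase) per-bin gain applied to the STFT of white Gaussian noise. The helper names (`mel_filterbank`, `stft`, `istft`, `shaped_noise`) and the settings (16 kHz, 512-point FFT, hop of 128) are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Hypothetical triangular mel filterbank, shape (n_mels, n_fft // 2 + 1)."""
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    hz_pts = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, freqs.size))
    for m in range(n_mels):
        lo, center, hi = hz_pts[m], hz_pts[m + 1], hz_pts[m + 2]
        rising = (freqs - lo) / (center - lo)
        falling = (hi - freqs) / (hi - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def stft(x, n_fft, hop, window):
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[t * hop : t * hop + n_fft] * window for t in range(n_frames)])
    return np.fft.rfft(frames, axis=1)  # (n_frames, n_fft // 2 + 1)


def istft(spec, n_fft, hop, window, length):
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    out = np.zeros(length)
    for t, frame in enumerate(frames):
        out[t * hop : t * hop + n_fft] += frame
    # With a periodic Hann window and hop = n_fft // 4, the squared windows
    # overlap-add to the constant sum(window**2) / hop (= 1.5).
    return out / (np.sum(window ** 2) / hop)


def shaped_noise(log_mel, sr=16000, n_fft=512, seed=0):
    """Gaussian noise whose per-frame spectral envelope follows log_mel.

    log_mel: array of shape (n_frames, n_mels), assumed natural-log mel amplitudes.
    """
    hop = n_fft // 4
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann
    n_frames, n_mels = log_mel.shape
    fb = mel_filterbank(sr, n_fft, n_mels)
    # Map mel amplitudes back to linear-frequency bins with the pseudo-inverse,
    # clipping negative values the least-squares solution can produce.
    envelope = np.maximum(np.exp(log_mel) @ np.linalg.pinv(fb).T, 1e-5)
    length = n_fft + (n_frames - 1) * hop
    noise = np.random.default_rng(seed).standard_normal(length)
    spec = stft(noise, n_fft, hop, window)  # white noise in the T-F domain
    # Time-varying, zero-phase gain per frame and bin, then back to a waveform.
    return istft(spec * envelope, n_fft, hop, window, length)
```

For example, `shaped_noise(log_mel)` with a `(50, 40)` array that is bright in the lower 20 mel bands and dark in the upper 20 returns noise whose energy is concentrated below roughly 1.8 kHz. In this sketch the extra work per noise draw is one STFT and one inverse STFT, which is one way to read the abstract's point that time-frequency processing keeps the cost close to that of conventional DDPM-based vocoders.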
