StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation

12/22/2022
by   Jean-Marie Lemercier, et al.

Diffusion models have shown a great ability to bridge the performance gap between predictive and generative approaches for speech enhancement. We have shown that they may even outperform their predictive counterparts for non-additive corruption types or when they are evaluated on mismatched conditions. However, diffusion models suffer from a high computational burden, mainly because they require running a neural network for each reverse diffusion step, whereas predictive approaches require only one pass. As diffusion models are generative approaches, they may also produce vocalizing and breathing artifacts in adverse conditions. In such difficult scenarios, predictive models typically do not produce these artifacts but tend to distort the target speech instead, thereby degrading the speech quality. In this work, we present a stochastic regeneration approach in which an estimate given by a predictive model is provided as a guide for further diffusion. We show that the proposed approach uses the predictive model to remove the vocalizing and breathing artifacts while producing very high quality samples thanks to the diffusion model, even in adverse conditions. We further show that this approach enables the use of lighter sampling schemes with fewer diffusion steps without sacrificing quality, thus reducing the computational burden by an order of magnitude. Source code and audio examples are available online (https://uhh.de/inf-sp-storm).
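
The pipeline described in the abstract (a single predictive pass whose output guides a short reverse diffusion) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a simplified variance-exploding SDE with a geometric noise schedule, and `predictive_model`, `score_model`, `SIGMA_MIN`, `SIGMA_MAX` and `stochastic_regeneration` are hypothetical names: the first two are placeholders standing in for trained networks. The returned counter only makes the cost structure visible: one predictive pass plus one score evaluation per reverse step.

```python
import numpy as np

# Assumed noise schedule for illustration only (not taken from the paper).
SIGMA_MIN, SIGMA_MAX = 0.05, 0.5


def sigma(t):
    """Geometric noise level at diffusion time t in [0, 1]."""
    return SIGMA_MIN * (SIGMA_MAX / SIGMA_MIN) ** t


def predictive_model(y):
    """Hypothetical stand-in for a trained predictive enhancer (one pass).

    Here it is just a 5-tap moving average, not a real denoiser.
    """
    return np.convolve(y, np.ones(5) / 5.0, mode="same")


def score_model(x, guide, t):
    """Hypothetical stand-in for a trained score network.

    Returns the exact score of a Gaussian centred on the guide with
    standard deviation sigma(t); a real model would be learned.
    """
    return -(x - guide) / sigma(t) ** 2


def stochastic_regeneration(y, n_steps=10, seed=0):
    """Predict once, re-noise the estimate, then run a short reverse diffusion."""
    rng = np.random.default_rng(seed)
    guide = predictive_model(y)  # single predictive pass
    n_evals = 1
    # Start the reverse process from a noised copy of the predictive estimate.
    x = guide + sigma(1.0) * rng.standard_normal(y.shape)
    dt = 1.0 / n_steps
    log_ratio = np.log(SIGMA_MAX / SIGMA_MIN)
    # Euler-Maruyama steps from t = 1 down towards t = 0.
    for t in np.linspace(1.0, 0.0, n_steps, endpoint=False):
        g2 = 2.0 * sigma(t) ** 2 * log_ratio  # squared diffusion coefficient
        x = x + g2 * score_model(x, guide, t) * dt  # reverse-time drift
        x = x + np.sqrt(g2 * dt) * rng.standard_normal(y.shape)  # noise
        n_evals += 1  # one score evaluation per reverse step
    return x, n_evals
```

Calling `stochastic_regeneration(y, n_steps=10)` on a one-dimensional NumPy array returns an array of the same shape together with the number of model evaluations used. In this sketch the cost grows linearly with `n_steps`, which is why a scheme that tolerates fewer reverse steps is cheaper.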


research
02/10/2022

Conditional Diffusion Probabilistic Model for Speech Enhancement

Speech enhancement is a critical component of many user-oriented audio a...
research
11/04/2022

Cold Diffusion for Speech Enhancement

Diffusion models have recently shown promising results for difficult enh...
research
11/04/2022

Analysing Diffusion-based Generative Approaches versus Discriminative Approaches for Speech Restoration

Diffusion-based generative models have had a high impact on the computer...
research
09/18/2023

Single and Few-step Diffusion for Generative Speech Enhancement

Diffusion models have shown promising results in speech enhancement, usi...
research
11/08/2022

DiffPhase: Generative Diffusion-based STFT Phase Retrieval

Diffusion probabilistic models have been recently used in a variety of t...
research
10/30/2022

SRTNet: Time Domain Speech Enhancement Via Stochastic Refinement

Diffusion model, as a new generative model which is very popular in imag...
research
06/14/2023

Variance-Preserving-Based Interpolation Diffusion Models for Speech Enhancement

The goal of this study is to implement diffusion models for speech enhan...
