Speech Enhancement with Score-Based Generative Models in the Complex STFT Domain

03/31/2022
by Simon Welker, et al.

Score-based generative models (SGMs) have recently shown impressive results for difficult generative tasks such as the unconditional and conditional generation of natural images and audio signals. In this work, we extend these models to the complex short-time Fourier transform (STFT) domain, proposing a novel training task for speech enhancement using a complex-valued deep neural network. We derive this training task within the formalism of stochastic differential equations, thereby enabling the use of predictor-corrector samplers. We provide alternative formulations inspired by previous publications on using SGMs for speech enhancement, avoiding the need for any prior assumptions on the noise distribution and making the training task purely generative which, as we show, results in improved enhancement performance.

research
07/27/2020

On the Use of Audio Fingerprinting Features for Speech Enhancement with Generative Adversarial Network

The advent of learning-based methods in speech enhancement has revived t...
research
06/02/2023

Audio-Visual Speech Enhancement with Score-Based Generative Models

This paper introduces an audio-visual speech enhancement system that lev...
research
08/11/2022

Speech Enhancement and Dereverberation with Diffusion-based Generative Models

Recently, diffusion-based generative models have been introduced to the ...
research
06/23/2022

Efficient Transformer-based Speech Enhancement Using Long Frames and STFT Magnitudes

The SepFormer architecture shows very good results in speech separation....
research
01/02/2019

End-to-End Model for Speech Enhancement by Consistent Spectrogram Masking

Recently, phase processing is attracting increasing interest in speech en...
research
04/06/2022

FFC-SE: Fast Fourier Convolution for Speech Enhancement

Fast Fourier convolution (FFC) is the recently proposed neural operator ...
research
05/06/2021

Speech Enhancement using Separable Polling Attention and Global Layer Normalization followed with PReLU

Single channel speech enhancement is a challenging task in speech commun...
