Know Your Enemy, Know Yourself: A Unified Two-Stage Framework for Speech Enhancement

by Wenzhe Liu, et al.

Traditional spectral-subtraction-style single-channel speech enhancement (SE) algorithms typically estimate interference components, including noise and/or reverberation, before subtracting them, whereas deep-neural-network-based SE methods usually learn an end-to-end mapping to the target. In this paper, we show that both denoising and dereverberation can be unified into a common problem by introducing a two-stage paradigm, namely interference-component estimation followed by speech recovery. In the first stage, we explicitly extract the magnitude of the interference components, which serves as prior information. In the second stage, guided by this estimated magnitude prior, the target speech can be recovered more accurately. In addition, we propose a transform module to facilitate interaction between the interference-component and desired-speech modalities, and we design a temporal fusion module that models long-term dependencies without ignoring short-term details. We conduct experiments on the WSJ0-SI84 corpus; the results on both denoising and dereverberation tasks show that our approach outperforms previous advanced systems and achieves state-of-the-art performance on many objective metrics.
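To make the two-stage paradigm concrete, here is a minimal sketch on magnitude spectra using classical spectral subtraction rather than the paper's neural modules (the function names, the noise-only-frame assumption, and the frame layout are all illustrative assumptions, not the authors' implementation): stage 1 estimates the interference magnitude, and stage 2 recovers the target speech using that estimate as a prior.

```python
def estimate_interference(noisy_frames, noise_only=5):
    """Stage 1: estimate the interference magnitude per frequency bin.

    Here the estimate is simply the average of the first `noise_only`
    frames, which are assumed to contain no speech. The paper instead
    learns this estimate with a neural network; this is a stand-in.
    """
    n_bins = len(noisy_frames[0])
    est = [0.0] * n_bins
    for frame in noisy_frames[:noise_only]:
        for k, mag in enumerate(frame):
            est[k] += mag / noise_only
    return est


def recover_speech(noisy_frames, interference):
    """Stage 2: recover the target magnitude given the interference prior.

    Classical spectral subtraction: subtract the estimated interference
    magnitude from each noisy frame, flooring at zero so magnitudes
    never go negative.
    """
    return [[max(mag - interference[k], 0.0) for k, mag in enumerate(frame)]
            for frame in noisy_frames]


# Toy usage: 5 noise-only frames followed by one noisy speech frame,
# each frame holding 2 frequency-bin magnitudes.
frames = [[1.0, 2.0]] * 5 + [[4.0, 5.0]]
prior = estimate_interference(frames)        # roughly [1.0, 2.0]
enhanced = recover_speech(frames, prior)     # last frame roughly [3.0, 3.0]
```

The point of the sketch is the decomposition itself: once the interference magnitude is an explicit intermediate output, the recovery stage can condition on it, which is the structure the paper's two neural stages share with this classical baseline.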






