Glance and Gaze: A Collaborative Learning Framework for Single-channel Speech Enhancement

06/22/2021
by Andong Li, et al.

The human ability to attend to both coarse and fine-grained regions has been applied to computer vision tasks. Motivated by this, we propose a collaborative learning framework in the complex domain for monaural noise suppression. The proposed system consists of two principal modules: a spectral feature extraction module (FEM) and stacked glance-gaze modules (GGMs). In the FEM, a UNet-block is introduced after each convolution layer, enabling feature recalibration at multiple scales. In each GGM, we decompose the multi-target optimization of the complex spectrum into two sub-tasks. Specifically, the glance path suppresses noise in the magnitude domain to obtain a coarse estimate, while the gaze path compensates for the lost spectral detail in the complex domain. The two paths work collaboratively and facilitate spectral estimation from complementary perspectives. Moreover, by repeatedly unfolding the GGMs, the intermediate result is iteratively refined across stages, leading to the final estimate of the spectrum. Experiments are conducted on the WSJ0-SI84, DNS-Challenge, and Voicebank+Demand datasets. Results show that the proposed approach achieves state-of-the-art performance over previous advanced systems on the WSJ0-SI84 and DNS-Challenge datasets, and competitive performance on the Voicebank+Demand corpus.
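
The glance/gaze split described in the abstract can be illustrated with a small sketch. The abstract does not say how the two paths are combined, so the code below assumes one plausible reading: the glance path scales each bin's magnitude by a gain and keeps the phase of its input, the gaze path contributes a complex residual, and the two are summed. All function names are hypothetical, and the gains and residuals are plain lists standing in for what the learned networks would predict.

```python
import cmath


def glance(spec, gains):
    """Coarse path (assumed form): scale each bin's magnitude, keep its phase."""
    out = []
    for bin_value, gain in zip(spec, gains):
        magnitude, phase = cmath.polar(bin_value)
        out.append(cmath.rect(gain * magnitude, phase))
    return out


def gaze(residual_real, residual_imag):
    """Detail path (assumed form): a complex residual built from two real parts."""
    return [complex(r, i) for r, i in zip(residual_real, residual_imag)]


def glance_gaze_module(spec, gains, residual_real, residual_imag):
    """One GGM under the additive assumption: coarse estimate plus residual."""
    coarse = glance(spec, gains)
    detail = gaze(residual_real, residual_imag)
    return [c + d for c, d in zip(coarse, detail)]


def refine(noisy_spec, stages):
    """Unfold several GGMs; each stage refines the previous stage's estimate.

    `stages` is a list of (gains, residual_real, residual_imag) tuples that
    stand in for per-stage network outputs (hypothetical placeholders).
    """
    estimate = list(noisy_spec)
    for gains, residual_real, residual_imag in stages:
        estimate = glance_gaze_module(estimate, gains, residual_real, residual_imag)
    return estimate
```

In this sketch a unit gain with a zero residual leaves the spectrum unchanged, which makes the roles of the two paths easy to check in isolation: the gain alone can only rescale magnitudes, while the residual can also change phase.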

research
02/16/2022

DBT-Net: Dual-branch federative magnitude and phase estimation with attention-in-attention transformer for monaural speech enhancement

The decoupling-style concept begins to ignite in the speech enhancement ...
research
10/13/2021

Dual-branch Attention-In-Attention Transformer for single-channel speech enhancement

Curriculum learning begins to thrive in the speech enhancement area, whi...
research
11/11/2021

Uformer: A Unet based dilated complex real dual-path conformer network for simultaneous speech enhancement and dereverberation

Complex spectrum and magnitude are considered as two major features of s...
research
10/26/2022

Parallel Gated Neural Network With Attention Mechanism For Speech Enhancement

Deep learning algorithms are increasingly used for speech enhancement (SE...
research
10/12/2021

Foster Strengths and Circumvent Weaknesses: a Speech Enhancement Framework with Two-branch Collaborative Learning

Recent single-channel speech enhancement methods usually convert wavefor...
research
11/03/2020

Two Heads Are Better Than One: A Two-Stage Approach for Monaural Noise Reduction in the Complex Domain

In low signal-to-noise ratio conditions, it is difficult to effectively ...
research
10/12/2022

Explore Contextual Information for 3D Scene Graph Generation

3D scene graph generation (SGG) has been of high interest in computer vi...
