Deep Interaction between Masking and Mapping Targets for Single-Channel Speech Enhancement

06/09/2021
by   Lu Zhang, et al.
0

The most recent deep neural network (DNN) models exhibit impressive denoising performance in the time-frequency (T-F) magnitude domain. However, the phase is also a critical component of the speech signal that is easily overlooked. In this paper, we propose a multi-branch dilated convolutional network (DCN) to simultaneously enhance the magnitude and phase of noisy speech. A causal and robust monaural speech enhancement system is achieved based on the multi-objective learning framework of the complex spectrum and the ideal ratio mask (IRM) targets. In the process of joint learning, the intermediate estimation of IRM targets is used as a way of generating feature attention factors to realize the information interaction between the two targets. Moreover, the proposed multi-scale dilated convolution enables the DCN model to have a more efficient temporal modeling capability. Experimental results show that compared with other state-of-the-art models, this model achieves better speech quality and intelligibility with less computation.

READ FULL TEXT
research
02/14/2020

Consistency-aware multi-channel speech enhancement using deep neural networks

This paper proposes a deep neural network (DNN)-based multi-channel spee...
research
09/22/2022

CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement

Convolution-augmented transformers (Conformers) are recently proposed in...
research
02/16/2022

DBT-Net: Dual-branch federative magnitude and phase estimation with attention-in-attention transformer for monaural speech enhancement

The decoupling-style concept begins to ignite in the speech enhancement ...
research
05/16/2020

Sparse Mixture of Local Experts for Efficient Speech Enhancement

In this paper, we investigate a deep learning approach for speech denois...
research
10/12/2021

Foster Strengths and Circumvent Weaknesses: a Speech Enhancement Framework with Two-branch Collaborative Learning

Recent single-channel speech enhancement methods usually convert wavefor...
research
11/12/2019

PHASEN: A Phase-and-Harmonics-Aware Speech Enhancement Network

Time-frequency (T-F) domain masking is a mainstream approach for single-...
research
02/05/2021

Real-time Denoising and Dereverberation with Tiny Recurrent U-Net

Modern deep learning-based models have seen outstanding performance impr...

Please sign up or login with your details

Forgot password? Click here to reset