Deep Interaction between Masking and Mapping Targets for Single-Channel Speech Enhancement

by Lu Zhang, et al.

Recent deep neural network (DNN) models exhibit impressive denoising performance in the time-frequency (T-F) magnitude domain. However, phase is also a critical component of the speech signal and is easily overlooked. In this paper, we propose a multi-branch dilated convolutional network (DCN) that simultaneously enhances the magnitude and phase of noisy speech. The result is a causal and robust monaural speech enhancement system built on a multi-objective learning framework with two targets: the complex spectrum and the ideal ratio mask (IRM). During joint learning, the intermediate IRM estimate is used to generate feature attention factors, which realizes information interaction between the two targets. Moreover, the proposed multi-scale dilated convolution gives the DCN model a more efficient temporal modeling capability. Experimental results show that, compared with other state-of-the-art models, this model achieves better speech quality and intelligibility with less computation.
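To make the two training targets concrete, the following is a minimal sketch of how a masking target (IRM) and a mapping target (complex spectrum) could be derived for individual T-F bins. It assumes the commonly used IRM definition, sqrt(|S|^2 / (|S|^2 + |N|^2)), which the abstract does not spell out and which may differ from the paper's exact formulation. The function names `ideal_ratio_mask` and `complex_target`, the `eps` stabilizer, and the toy bin values are illustrative assumptions, not taken from the paper.

```python
import math


def ideal_ratio_mask(clean_bin, noise_bin, eps=1e-12):
    """Assumed (common) IRM for one T-F bin: sqrt(|S|^2 / (|S|^2 + |N|^2)).

    eps is a hypothetical stabilizer that avoids division by zero
    when both the clean and the noise bin are silent.
    """
    s_pow = abs(clean_bin) ** 2
    n_pow = abs(noise_bin) ** 2
    return math.sqrt(s_pow / (s_pow + n_pow + eps))


def complex_target(clean_bin):
    """Mapping target for one T-F bin: real and imaginary parts of the clean spectrum."""
    return (clean_bin.real, clean_bin.imag)


# Toy STFT bins (made-up values, for illustration only).
clean = [3 + 4j, 0j, 1 + 0j]
noise = [0j, 2 + 0j, 0 + 1j]

irm = [ideal_ratio_mask(s, n) for s, n in zip(clean, noise)]
targets = [complex_target(s) for s in clean]

for mask, (re, im) in zip(irm, targets):
    print(f"IRM={mask:.4f}  complex target=({re:.1f}, {im:.1f})")
```

The mask is bounded in [0, 1] and carries only magnitude information, while the complex target also encodes phase. This difference is why a model trained on both targets can, in principle, enhance magnitude and phase together.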





Consistency-aware multi-channel speech enhancement using deep neural networks

This paper proposes a deep neural network (DNN)-based multi-channel spee...

DNN-Based Multi-Frame MVDR Filtering for Single-Microphone Speech Enhancement

Multi-frame approaches for single-microphone speech enhancement, e.g., t...

Foster Strengths and Circumvent Weaknesses: a Speech Enhancement Framework with Two-branch Collaborative Learning

Recent single-channel speech enhancement methods usually convert wavefor...

PHASEN: A Phase-and-Harmonics-Aware Speech Enhancement Network

Time-frequency (T-F) domain masking is a mainstream approach for single-...

Real-time Denoising and Dereverberation with Tiny Recurrent U-Net

Modern deep learning-based models have seen outstanding performance impr...

DF-Conformer: Integrated architecture of Conv-TasNet and Conformer using linear complexity self-attention for speech enhancement

Single-channel speech enhancement (SE) is an important task in speech pr...

Know Your Enemy, Know Yourself: A Unified Two-Stage Framework for Speech Enhancement

Traditional spectral subtraction-type single channel speech enhancement ...