Improving Dual-Microphone Speech Enhancement by Learning Cross-Channel Features with Multi-Head Attention

05/03/2022
by   Xinmeng Xu, et al.
0

Hand-crafted spatial features, such as inter-channel intensity difference (IID) and inter-channel phase difference (IPD), play a fundamental role in recent deep learning based dual-microphone speech enhancement (DMSE) systems. However, learning the mutual relationship between artificially designed spatial and spectral features is hard in the end-to-end DMSE. In this work, a novel architecture for DMSE using a multi-head cross-attention based convolutional recurrent network (MHCA-CRN) is presented. The proposed MHCA-CRN model includes a channel-wise encoding structure for preserving intra-channel features and a multi-head cross-attention mechanism for fully exploiting cross-channel features. In addition, the proposed approach specifically formulates the decoder with an extra SNR estimator to estimate frame-level SNR under a multi-task learning framework, which is expected to avoid speech distortion led by end-to-end DMSE module. Finally, a spectral gain function is adopted to further suppress the unnatural residual noise. Experiment results demonstrated superior performance of the proposed model against several state-of-the-art models.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
03/09/2020

Enhancing End-to-End Multi-channel Speech Separation via Spatial Feature Learning

Hand-crafted spatial features (e.g., inter-channel phase difference, IPD...
research
06/30/2022

Improving Visual Speech Enhancement Network by Learning Audio-visual Affinity with Multi-head Attention

Audio-visual speech enhancement system is regarded as one of promising s...
research
11/08/2021

Inter-channel Conv-TasNet for multichannel speech enhancement

Speech enhancement in multichannel settings has been realized by utilizi...
research
11/04/2020

DESNet: A Multi-channel Network for Simultaneous Speech Dereverberation, Enhancement and Separation

In this paper, we propose a multi-channel network for simultaneous speec...
research
06/16/2021

DCCRN+: Channel-wise Subband DCCRN with SNR Estimation for Speech Enhancement

Deep complex convolution recurrent network (DCCRN), which extends CRN wi...
research
11/01/2018

End-to-end Models with auditory attention in Multi-channel Keyword Spotting

In this paper, we propose an attention-based end-to-end model for multi-...
research
03/28/2022

An attention mechanism based convolutional network for satellite precipitation downscaling over China

Precipitation is a key part of hydrological circulation and is a sensiti...

Please sign up or login with your details

Forgot password? Click here to reset