Log In Sign Up

Monaural Speech Enhancement with Complex Convolutional Block Attention Module and Joint Time Frequency Losses

by   Shengkui Zhao, et al.

Deep complex U-Net structure and convolutional recurrent network (CRN) structure achieve state-of-the-art performance for monaural speech enhancement. Both deep complex U-Net and CRN are encoder and decoder structures with skip connections, which heavily rely on the representation power of the complex-valued convolutional layers. In this paper, we propose a complex convolutional block attention module (CCBAM) to boost the representation power of the complex-valued convolutional layers by constructing more informative features. The CCBAM is a lightweight and general module which can be easily integrated into any complex-valued convolutional layers. We integrate CCBAM with the deep complex U-Net and CRN to enhance their performance for speech enhancement. We further propose a mixed loss function to jointly optimize the complex models in both time-frequency (TF) domain and time domain. By integrating CCBAM and the mixed loss, we form a new end-to-end (E2E) complex speech enhancement framework. Ablation experiments and objective evaluations show the superior performance of the proposed approaches.


page 1

page 2

page 3

page 4


FRCRN: Boosting Feature Representation using Frequency Recurrence for Monaural Speech Enhancement

Convolutional recurrent networks (CRN) integrating a convolutional encod...

Complex-Valued Time-Frequency Self-Attention for Speech Dereverberation

Several speech processing systems have demonstrated considerable perform...

Using recurrences in time and frequency within U-net architecture for speech enhancement

When designing fully-convolutional neural network, there is a trade-off ...

Time-Frequency Attention for Monaural Speech Enhancement

Most studies on speech enhancement generally don't consider the energy d...

Channel-Attention Dense U-Net for Multichannel Speech Enhancement

Supervised deep learning has gained significant attention for speech enh...

DCCRN+: Channel-wise Subband DCCRN with SNR Estimation for Speech Enhancement

Deep complex convolution recurrent network (DCCRN), which extends CRN wi...