D2Former: A Fully Complex Dual-Path Dual-Decoder Conformer Network using Joint Complex Masking and Complex Spectral Mapping for Monaural Speech Enhancement

02/23/2023
by Shengkui Zhao, et al.

Monaural speech enhancement has been widely studied using real-valued networks in the time-frequency (TF) domain. However, because the input and the target are naturally complex-valued in the TF domain, a fully complex network is highly desirable for effectively learning the feature representation and modelling the sequence in the complex domain. Moreover, phase, an important factor for the perceptual quality of speech, has been shown to be learnable together with magnitude from noisy speech using complex masking or complex spectral mapping. Many recent studies focus on either complex masking or complex spectral mapping, ignoring their performance boundaries. To address the above issues, we propose a fully complex dual-path dual-decoder conformer network (D2Former) using joint complex masking and complex spectral mapping for monaural speech enhancement. In D2Former, we extend the conformer network into the complex domain and form a dual-path complex TF self-attention architecture for effectively modelling the complex-valued TF sequence. We further boost the TF feature representation in the encoder and the decoders using a dual-path learning structure that exploits complex dilated convolutions for time dependency and complex feedforward sequential memory networks (CFSMN) for frequency recurrence. In addition, we improve the performance boundaries of complex masking and complex spectral mapping by combining the strengths of the two training targets in a joint-learning framework. As a consequence, D2Former takes full advantage of the complex-valued operations, the dual-path processing, and the joint-training targets. Compared to previous models, D2Former achieves state-of-the-art results on the VoiceBank+Demand benchmark with the smallest model size of 0.87M parameters.
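
To make the two training targets concrete, the following is a minimal sketch, not the D2Former implementation. Complex masking multiplies each noisy TF bin by a predicted complex mask, whereas complex spectral mapping predicts the clean complex spectrum directly. The abstract does not say how the two decoder outputs are combined, so the weighted average in `combine_estimates` is purely an assumption for illustration. All function names and toy values are hypothetical.

```python
# Illustrative sketch only; names and the fusion rule are assumptions,
# not taken from the D2Former paper.

def apply_complex_mask(noisy, mask):
    """Complex masking: element-wise complex product of mask and noisy TF bins."""
    return [[m * y for m, y in zip(mask_row, noisy_row)]
            for mask_row, noisy_row in zip(mask, noisy)]


def combine_estimates(masked, mapped, weight=0.5):
    """Hypothetical fusion: weighted average of the masked and mapped spectra."""
    return [[weight * a + (1.0 - weight) * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(masked, mapped)]


# Toy 2x2 noisy spectrogram (rows: frequency bins, columns: time frames).
noisy = [[1 + 1j, 2 - 1j],
         [0.5j, -1 + 0j]]

# Stand-in for a masking decoder output: one complex gain per TF bin.
mask = [[0.5 + 0j, 1j],
        [1 + 0j, 0j]]

# Stand-in for a mapping decoder output: a direct clean-spectrum estimate.
mapped = [[0.4 + 0.6j, 1 + 2j],
          [0.5j, 0j]]

masked = apply_complex_mask(noisy, mask)
enhanced = combine_estimates(masked, mapped)
```

Because the mask is complex, a single multiplication changes both the magnitude and the phase of each bin; for example, the mask value `1j` rotates the phase of a bin by 90 degrees without changing its magnitude.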

