End-to-End Complex-Valued Multidilated Convolutional Neural Network for Joint Acoustic Echo Cancellation and Noise Suppression

by Karn N. Watcharasupat, et al.

Echo and noise suppression is an integral part of a full-duplex communication system. Many recent acoustic echo cancellation (AEC) systems rely on a separate adaptive filtering module for linear echo suppression and a neural module for residual echo suppression. However, adaptive filtering modules require convergence and remain susceptible to changes in the acoustic environment, and this two-stage framework often introduces unnecessary delay into the AEC system even though neural modules are already capable of both linear and nonlinear echo suppression. In this paper, we exploit the offset-compensating ability of complex time-frequency masks and propose an end-to-end complex-valued neural network architecture. The building block of the proposed model is a pseudocomplex extension of the densely connected multidilated DenseNet (D3Net) building block, resulting in a very small network of only 354K parameters. The architecture uses the multi-resolution nature of the D3Net building blocks to eliminate the need for pooling, allowing the network to extract features using large receptive fields without any loss of output resolution. We also propose a dual-mask technique for joint echo and noise suppression with simultaneous speech enhancement. Evaluation on both synthetic and real test sets demonstrated promising results across multiple energy-based metrics and perceptual proxies.
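The offset-compensating property of a complex time-frequency mask can be illustrated on a single bin. The following is a minimal sketch, not the paper's model: it assumes the offset appears in one time-frequency bin as a gain and a phase rotation, and every name and value in it (apply_mask, clean, gain, phase) is hypothetical and chosen only for illustration.

```python
import cmath


def apply_mask(mask: complex, tf_bin: complex) -> complex:
    """Apply a (possibly complex-valued) mask to one time-frequency bin."""
    return mask * tf_bin


# Hypothetical values for illustration only; none come from the paper.
clean = complex(0.8, -0.3)        # target time-frequency bin
gain = 0.5                        # assumed attenuation of the observed bin
phase = cmath.pi / 3              # assumed phase offset of the observed bin

# Observed bin: the clean bin, attenuated and rotated in phase.
observed = clean * gain * cmath.exp(-1j * phase)

# A complex mask can invert both the gain and the phase rotation.
complex_mask = (1.0 / gain) * cmath.exp(1j * phase)
recovered = apply_mask(complex_mask, observed)

# A real-valued mask can restore the magnitude but leaves the phase offset.
real_mask = 1.0 / gain
magnitude_only = apply_mask(real_mask, observed)

complex_error = abs(recovered - clean)       # close to zero
real_error = abs(magnitude_only - clean)     # stays large

print(f"complex mask error: {complex_error:.3e}")
print(f"real mask error:    {real_error:.3e}")
```

In this toy setting the real-valued mask matches the target magnitude exactly yet still leaves a large error, because the phase rotation is untouched. The complex mask removes both.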




Monaural Speech Enhancement with Complex Convolutional Block Attention Module and Joint Time Frequency Losses

Deep complex U-Net structure and convolutional recurrent network (CRN) s...

Task splitting for DNN-based acoustic echo and noise removal

Neural networks have led to tremendous performance gains for single-task...

Efficient Context Aggregation for End-to-End Speech Enhancement Using a Densely Connected Convolutional and Recurrent Network

In speech enhancement, an end-to-end deep neural network converts a nois...

NN3A: Neural Network supported Acoustic Echo Cancellation, Noise Suppression and Automatic Gain Control for Real-Time Communications

Acoustic echo cancellation (AEC), noise suppression (NS) and automatic g...

Bandwidth-Scalable Fully Mask-Based Deep FCRN Acoustic Echo Cancellation and Postfiltering

Although today's speech communication systems support various bandwidths...

A Multiscale Autoencoder (MSAE) Framework for End-to-End Neural Network Speech Enhancement

Neural network approaches to single-channel speech enhancement have rece...

Nonlinear Residual Echo Suppression Based on Multi-stream Conv-TasNet

Acoustic echo cannot be entirely removed by linear adaptive filters due ...
