DF-Conformer: Integrated architecture of Conv-TasNet and Conformer using linear complexity self-attention for speech enhancement

06/30/2021
by Yuma Koizumi, et al.

Single-channel speech enhancement (SE) is an important task in speech processing. A widely used framework combines an analysis/synthesis filterbank with a mask prediction network, as in the Conv-TasNet architecture. In such systems, denoising performance and computational efficiency depend mainly on the structure of the mask prediction network. In this study, we aim to improve the sequential modeling ability of Conv-TasNet architectures by integrating Conformer layers into a new mask prediction network. To make the model computationally feasible, we extend the Conformer using linear complexity attention and stacked 1-D dilated depthwise convolution layers. We trained the model on 3,396 hours of noisy speech data and show that (i) linear complexity attention avoids high computational complexity, and (ii) our model achieves a higher scale-invariant signal-to-noise ratio than the improved time-dilated convolution network (TDCN++), an extended version of Conv-TasNet.
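
The evaluation metric named in the abstract, the scale-invariant signal-to-noise ratio (SI-SNR), can be illustrated with a short sketch. This is not the authors' evaluation code: it follows the commonly used definition, in which the estimate is projected onto the reference before the energy ratio is taken. The function name `si_snr`, the mean removal step, and the `eps` stabilizer are assumptions made for illustration.

```python
import math


def si_snr(estimate, reference, eps=1e-12):
    """Scale-invariant SNR in dB between two equal-length sample sequences.

    Illustrative sketch of the common definition; `eps` is an assumed
    stabilizer that guards against division by zero and log of zero.
    """
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    if not reference:
        raise ValueError("signals must not be empty")

    n = len(reference)
    # Remove the mean of each signal (a common convention, assumed here).
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]

    # Project the estimate onto the reference to get the target component.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref)
    scale = dot / (ref_energy + eps)
    target = [scale * r for r in ref]

    # Everything not explained by the scaled reference counts as noise.
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(v * v for v in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


if __name__ == "__main__":
    clean = [math.sin(0.1 * i) for i in range(200)]
    noisy = [c + 0.1 * math.cos(0.37 * i) for i, c in enumerate(clean)]
    print(f"SI-SNR of noisy estimate: {si_snr(noisy, clean):.2f} dB")
```

Because the estimate is projected onto the reference, multiplying the estimate by any nonzero gain leaves the score essentially unchanged, which is why the metric suits mask-based enhancement systems whose output level is not fixed.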

research
06/20/2019

Parameter Enhancement for MELP Speech Codec in Noisy Communication Environment

In this paper, we propose a deep learning (DL)-based parameter enhanceme...
research
07/28/2023

PCNN: A Lightweight Parallel Conformer Neural Network for Efficient Monaural Speech Enhancement

Convolutional neural networks (CNN) and Transformer have wildly succeede...
research
09/07/2023

Causal Signal-Based DCCRN with Overlapped-Frame Prediction for Online Speech Enhancement

The aim of speech enhancement is to improve speech signal quality and in...
research
08/04/2023

Efficient Monaural Speech Enhancement using Spectrum Attention Fusion

Speech enhancement is a demanding task in automated speech processing pi...
research
10/30/2021

Cross-attention conformer for context modeling in speech enhancement for ASR

This work introduces cross-attention conformer, an attention-based archi...
research
09/14/2023

Complexity Scaling for Speech Denoising

Computational complexity is critical when deploying deep learning-based ...
research
05/28/2021

Phoneme-Based Ratio Mask Estimation for Reverberant Speech Enhancement in Cochlear Implant Processors

Cochlear implant (CI) users have considerable difficulty in understandin...
