Efficient Monaural Speech Enhancement using Spectrum Attention Fusion

08/04/2023 ∙ by Jinyu Long, et al.
Speech enhancement is a demanding task in automated speech processing pipelines, focusing on separating clean speech from noisy channels. Transformer-based models have recently outperformed RNN and CNN models in speech enhancement, but they are also far more computationally expensive and require much more high-quality training data, which is often difficult to obtain. In this paper, we present an improvement for speech enhancement models that maintains the expressiveness of self-attention while significantly reducing model complexity, which we term Spectrum Attention Fusion. We carefully construct a convolutional module to replace several self-attention layers in a speech Transformer, allowing the model to fuse spectral features more efficiently. Our proposed model achieves comparable or better results than SOTA models with significantly fewer parameters (0.58M) on the Voice Bank + DEMAND dataset.
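The abstract only sketches the architecture, but the core idea of replacing self-attention layers with a lightweight convolutional fusion module can be illustrated in a few lines. The PyTorch sketch below is a hypothetical stand-in, not the authors' published code: the block name SpectrumFusionBlock, the depthwise-separable design, and all layer sizes are assumptions chosen for illustration.

# Minimal sketch of the idea described in the abstract: a cheap
# convolutional block that stands in for a self-attention layer and
# fuses spectral features. All names and sizes here are illustrative
# assumptions, not the authors' exact architecture.
import torch
import torch.nn as nn

class SpectrumFusionBlock(nn.Module):
    """Hypothetical convolutional replacement for a self-attention layer.

    Operates on (batch, channels, time, freq) spectrogram features and
    mixes information along the time and frequency axes with
    depthwise-separable convolutions instead of attention.
    """

    def __init__(self, channels: int, kernel_size: int = 7):
        super().__init__()
        pad = kernel_size // 2
        # Depthwise conv mixes each channel locally over time and frequency.
        self.depthwise = nn.Conv2d(
            channels, channels, kernel_size,
            padding=pad, groups=channels,
        )
        # Pointwise conv fuses information across channels, loosely
        # playing the role of attention's value projection.
        self.pointwise = nn.Conv2d(channels, channels, 1)
        self.norm = nn.BatchNorm2d(channels)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection mirrors the Transformer block it replaces.
        return x + self.act(self.norm(self.pointwise(self.depthwise(x))))

# Usage: fuse features for a batch of 2 spectrograms
# (16 channels, 100 frames, 257 frequency bins).
feats = torch.randn(2, 16, 100, 257)
print(SpectrumFusionBlock(16)(feats).shape)  # torch.Size([2, 16, 100, 257])

Because the depthwise and pointwise convolutions cost O(T·F) rather than the O(T²) of full self-attention over time frames, a block like this is one plausible way to cut parameters and compute while keeping local spectral mixing.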

Related research

07/28/2023 ∙ PCNN: A Lightweight Parallel Conformer Neural Network for Efficient Monaural Speech Enhancement
Convolutional neural networks (CNN) and Transformer have wildly succeede...

02/06/2022 ∙ On Using Transformers for Speech-Separation
Transformers have enabled major improvements in deep learning. They ofte...

05/06/2021 ∙ Speech Enhancement using Separable Polling Attention and Global Layer Normalization followed with PReLU
Single channel speech enhancement is a challenging task in speech commun...

05/15/2023 ∙ Ripple sparse self-attention for monaural speech enhancement
The use of Transformer represents a recent success in speech enhancement...

09/04/2023 ∙ Single-Channel Speech Enhancement with Deep Complex U-Networks and Probabilistic Latent Space Models
In this paper, we propose to extend the deep, complex U-Network architec...

10/13/2019 ∙ T-GSA: Transformer with Gaussian-weighted self-attention for speech enhancement
Transformer neural networks (TNN) demonstrated state-of-art performance ...

06/30/2021 ∙ DF-Conformer: Integrated architecture of Conv-TasNet and Conformer using linear complexity self-attention for speech enhancement
Single-channel speech enhancement (SE) is an important task in speech pr...
