An Efficient Speech Separation Network Based on Recurrent Fusion Dilated Convolution and Channel Attention

06/09/2023
by Junyu Wang, et al.

We present ARFDCN, an efficient speech separation neural network that combines dilated convolutions, multi-scale fusion (MSF), and channel attention to overcome the limited receptive field of convolution-based networks and the high computational cost of transformer-based networks. The proposed network follows an encoder-decoder architecture. It uses dilated convolutions with gradually increasing dilation values to learn local and global features and fuses them at adjacent stages, which lets the model learn rich feature content. Channel attention modules let the model extract channel weights and emphasize the more important features, improving its expressive power and robustness. Experimental results indicate that the model achieves a decent balance between performance and computational efficiency, making it a promising alternative to current mainstream models for practical applications.
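The three ingredients named in the abstract can be illustrated with a small, dependency-free sketch. It is not the ARFDCN implementation: the function names, the kernel size of 3, the dilation schedule 1, 2, 4, 8, the fusion by element-wise summation, and the parameter-free attention gate are all assumptions chosen for illustration. The sketch shows why increasing dilation widens the receptive field at fixed cost, and what fusing adjacent stages and reweighting channels could look like.

```python
import math


def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of stacked stride-1 dilated 1-D convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def dilated_conv1d(x, kernel, dilation):
    """Zero-padded, same-length 1-D dilated convolution (odd kernel length assumed)."""
    half = (len(kernel) - 1) * dilation // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t - half + j * dilation
            if 0 <= idx < len(x):  # samples outside the signal count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def fuse_adjacent(stages):
    """Assumed fusion rule: add each stage to its immediate neighbours element-wise."""
    fused = []
    for i in range(len(stages)):
        neighbours = [stages[j] for j in (i - 1, i, i + 1) if 0 <= j < len(stages)]
        fused.append([sum(vals) for vals in zip(*neighbours)])
    return fused


def channel_attention(channels):
    """Squeeze each channel to its mean, gate it with a sigmoid, rescale the channel.

    A trained module would place learned layers between the squeeze and the
    sigmoid; they are omitted here to keep the sketch parameter-free.
    """
    weights = [1.0 / (1.0 + math.exp(-sum(ch) / len(ch))) for ch in channels]
    scaled = [[w * v for v in ch] for w, ch in zip(weights, channels)]
    return scaled, weights


# Hypothetical dilation schedule and a unit impulse as the input signal.
dilations = [1, 2, 4, 8]
signal = [0.0] * 8 + [1.0] + [0.0] * 8
stages = [dilated_conv1d(signal, [1.0, 1.0, 1.0], d) for d in dilations]
fused = fuse_adjacent(stages)
attended, weights = channel_attention(fused)

print(receptive_field(3, dilations))     # 31 samples with increasing dilation
print(receptive_field(3, [1, 1, 1, 1]))  # 9 samples with four undilated layers
```

With the same four layers and the same number of weights, the dilated stack covers 31 samples where the undilated stack covers 9, which is the receptive-field argument the abstract makes against plain convolutional networks.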


