An efficient encoder-decoder architecture with top-down attention for speech separation

09/30/2022
by Kai Li, et al.

Deep neural networks have shown excellent prospects in speech separation tasks. However, obtaining good results while keeping model complexity low remains challenging in real-world applications. In this paper, we propose TDANet, a bio-inspired, efficient encoder-decoder architecture that mimics the brain's top-down attention and reduces model complexity without sacrificing performance. The top-down attention in TDANet is extracted by the global attention (GA) module and the cascaded local attention (LA) layers. The GA module takes multi-scale acoustic features as input to extract a global attention signal, which then modulates features of different scales through direct top-down connections. The LA layers take features of adjacent layers as input to extract a local attention signal, which modulates the lateral input in a top-down manner. On three benchmark datasets, TDANet consistently achieved separation performance competitive with previous state-of-the-art (SOTA) methods at higher efficiency. Specifically, TDANet requires only 5% of the multiply-accumulate operations (MACs) of Sepformer, one of the previous SOTA models, and only 10% of its CPU inference time. In addition, a large-size version of TDANet obtained SOTA results on the three datasets, with MACs still only 10% of Sepformer's and CPU inference time only 24% of Sepformer's. Our study suggests that top-down attention can be a more efficient strategy for speech separation.
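
The modulation idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes single-channel 1-D features, scales whose lengths divide evenly, average pooling, nearest-neighbour upsampling, and a plain sigmoid gate standing in for the learned GA and LA modules. All function names (`downsample`, `upsample`, `global_attention`, `modulate`, `local_attention`) are hypothetical.

```python
import math


def downsample(x, factor):
    """Average-pool a 1-D sequence by an integer factor."""
    return [sum(x[i:i + factor]) / factor
            for i in range(0, len(x) - factor + 1, factor)]


def upsample(x, factor):
    """Nearest-neighbour upsampling by an integer factor."""
    return [v for v in x for _ in range(factor)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def global_attention(features):
    """Fuse all scales at the coarsest resolution and return a gate in (0, 1).

    `features` is a list of 1-D sequences ordered from finest to coarsest.
    The sigmoid is an assumed stand-in for the learned GA module.
    """
    coarsest_len = len(features[-1])
    pooled = [downsample(f, len(f) // coarsest_len) for f in features]
    fused = [sum(vals) for vals in zip(*pooled)]
    return [sigmoid(v) for v in fused]


def modulate(features, gate):
    """Top-down connection: broadcast the global gate to every scale."""
    out = []
    for f in features:
        g = upsample(gate, len(f) // len(gate))
        out.append([a * b for a, b in zip(f, g)])
    return out


def local_attention(lateral, coarser):
    """Gate a lateral (finer) feature with a signal from the adjacent coarser one."""
    g = upsample([sigmoid(v) for v in coarser], len(lateral) // len(coarser))
    return [a * b for a, b in zip(lateral, g)]


# Three scales of toy features, finest first.
features = [[1.0] * 8, [0.5] * 4, [-1.0] * 2]
gate = global_attention(features)
modulated = modulate(features, gate)
refined = local_attention(modulated[0], modulated[1])
```

Each scale keeps its original length after modulation, and because the gate lies strictly between 0 and 1, the global signal can only attenuate a feature, never amplify it. A learned module would replace the fixed sigmoid with trainable layers.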

