Dual-branch Attention-In-Attention Transformer for single-channel speech enhancement

10/13/2021
by Guochen Yu, et al.

Curriculum learning begins to thrive in the speech enhancement area, as it decouples the original spectrum estimation task into multiple easier sub-tasks to achieve better performance. Motivated by this, we propose a dual-branch attention-in-attention transformer dubbed DB-AIAT to handle both coarse- and fine-grained regions of the spectrum in parallel. From a complementary perspective, a magnitude masking branch is proposed to coarsely estimate the overall magnitude spectrum, while a complex refining branch is elaborately designed to compensate for the missing spectral details and implicitly derive phase information. Within each branch, we propose a novel attention-in-attention transformer-based module to replace conventional RNNs and temporal convolutional networks for temporal sequence modeling. Specifically, the proposed attention-in-attention transformer consists of adaptive temporal-frequency attention transformer blocks and an adaptive hierarchical attention module, aiming to capture long-term temporal-frequency dependencies and further aggregate global hierarchical contextual information. Experimental results on Voice Bank + DEMAND demonstrate that DB-AIAT yields state-of-the-art performance (e.g., 3.31 PESQ, 95.6% STOI) over previous advanced systems with a relatively small model size (2.81M).
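The decoupling described in the abstract can be sketched in a few lines: the magnitude branch applies a gain to the noisy magnitude while reusing the noisy phase, and the complex branch adds a residual that restores spectral detail and implicitly corrects the phase. The mask and residual values below are hypothetical stand-ins for the outputs of the two transformer branches, not the model itself; this is a minimal illustration of how the branch outputs combine, assuming per-bin complex STFT values.

```python
import cmath


def dual_branch_enhance(noisy_spec, mag_mask, complex_residual):
    """Combine the two branches of a dual-branch (DB-AIAT-style) estimator.

    noisy_spec:       complex STFT bins of one noisy frame
    mag_mask:         per-bin gains from the magnitude masking branch
    complex_residual: per-bin complex corrections from the refining branch
    """
    enhanced = []
    for X, m, r in zip(noisy_spec, mag_mask, complex_residual):
        # Branch 1: coarse estimate = masked magnitude with the noisy phase.
        coarse = m * abs(X) * cmath.exp(1j * cmath.phase(X))
        # Branch 2: complex residual compensates for missing spectral
        # details and implicitly adjusts the phase.
        enhanced.append(coarse + r)
    return enhanced
```

For a single bin X = 1+1j with mask 0.5 and residual 0.1-0.2j, the coarse estimate is 0.5+0.5j and the final output 0.6+0.3j, showing how the refining branch shifts both magnitude and phase away from the purely masked estimate.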

Related research

02/16/2022
DBT-Net: Dual-branch federative magnitude and phase estimation with attention-in-attention transformer for monaural speech enhancement
The decoupling-style concept begins to ignite in the speech enhancement ...

07/28/2021
CycleGAN-based Non-parallel Speech Enhancement with an Adaptive Attention-in-attention Mechanism
Non-parallel training is a difficult but essential task for DNN-based sp...

06/22/2021
Glance and Gaze: A Collaborative Learning Framework for Single-channel Speech Enhancement
The capability of the human to pay attention to both coarse and fine-gra...

09/04/2023
Single-Channel Speech Enhancement with Deep Complex U-Networks and Probabilistic Latent Space Models
In this paper, we propose to extend the deep, complex U-Network architec...

03/27/2022
Adaptive Frequency Learning in Two-branch Face Forgery Detection
Face forgery has attracted increasing attention in recent applications o...

02/19/2021
Frequency-Temporal Attention Network for Singing Melody Extraction
Musical audio is generally composed of three physical properties: freque...

06/12/2021
Dynamic Clone Transformer for Efficient Convolutional Networks
Convolutional networks (ConvNets) have shown impressive capability to so...
