U-shaped Transformer with Frequency-Band Aware Attention for Speech Enhancement

12/11/2021
by Yi Li, et al.

State-of-the-art speech enhancement methods have limited accuracy in speech estimation. Recently, in deep learning, the Transformer has shown the potential to exploit long-range dependencies in speech through self-attention, and it has therefore been introduced into speech enhancement to improve the accuracy of estimating speech from a noisy mixture. However, to address the computational cost of full self-attention in the Transformer, axial attention is an option, i.e., splitting one 2D attention into two 1D attentions. Inspired by axial attention, the proposed method computes attention maps along both the time and frequency axes to generate time and frequency sub-attention maps. Moreover, unlike axial attention, the proposed method provides two parallel multi-head attentions, one for the time axis and one for the frequency axis. Furthermore, it has been shown in the literature that, in a noisy mixture, the lower frequency band of speech generally contains more of the desired information than the higher frequency band. Therefore, frequency-band aware attention is proposed, i.e., high frequency-band attention (HFA) and low frequency-band attention (LFA). A U-shaped Transformer is also introduced for the first time in the proposed method to further improve speech estimation accuracy. Extensive evaluations on four public datasets confirm the efficacy of the proposed method.
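The two core ideas of the abstract, attention computed separately along the time and frequency axes and a split between low and high frequency bands, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the identity Q/K/V projections, the summation used to fuse the two axis attentions, the single-head formulation, and the `split_bin` band boundary are all simplifying assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def axis_attention(x, axis):
    # Scaled dot-product self-attention along one axis of a
    # (time, freq, channels) tensor. Identity Q/K/V projections are
    # used here for illustration; a real model learns these weights.
    x = np.moveaxis(x, axis, -2)                  # attend over this axis
    scores = x @ x.swapaxes(-1, -2) / np.sqrt(x.shape[-1])
    out = softmax(scores, axis=-1) @ x
    return np.moveaxis(out, -2, axis)

def time_freq_attention(x):
    # Two parallel axis attentions, as in the abstract; fusing them
    # by summation is an assumption made for this sketch.
    return axis_attention(x, axis=0) + axis_attention(x, axis=1)

def band_aware_attention(x, split_bin):
    # Hypothetical frequency-band aware attention: separate
    # frequency-axis attentions for the low band (LFA) and the
    # high band (HFA), then concatenate along the frequency axis.
    low = axis_attention(x[:, :split_bin], axis=1)
    high = axis_attention(x[:, split_bin:], axis=1)
    return np.concatenate([low, high], axis=1)

T, F, C = 10, 8, 4                                # frames, bins, channels
spec = np.random.randn(T, F, C)                   # stand-in spectrogram
out = time_freq_attention(spec)
print(out.shape)                                  # (10, 8, 4)
banded = band_aware_attention(spec, split_bin=4)
print(banded.shape)                               # (10, 8, 4)
```

Because each 1D attention only mixes entries along its own axis, the cost scales with T·T·F plus T·F·F rather than (T·F)² for a full 2D attention, which is the motivation for the axial split.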

Related research:

- 09/24/2022, Speech Enhancement with Perceptually-motivated Optimization and Dual Transformations: To address the monaural speech enhancement problem, numerous research st...
- 02/11/2023, Local spectral attention for full-band speech enhancement: Attention mechanism has been widely utilized in speech enhancement (SE) ...
- 07/28/2023, PCNN: A Lightweight Parallel Conformer Neural Network for Efficient Monaural Speech Enhancement: Convolutional neural networks (CNN) and Transformer have wildly succeede...
- 12/09/2021, Noise-robust blind reverberation time estimation using noise-aware time-frequency masking: The reverberation time is one of the most important parameters used to c...
- 05/15/2023, Ripple sparse self-attention for monaural speech enhancement: The use of Transformer represents a recent success in speech enhancement...
- 08/17/2017, An instrumental intelligibility metric based on information theory: We propose a new monaural intrusive instrumental intelligibility metric ...
- 04/27/2021, DPT-FSNet: Dual-path Transformer Based Full-band and Sub-band Fusion Network for Speech Enhancement: Recently, dual-path networks have achieved promising performance due to ...
