Efficient Transformer-based Speech Enhancement Using Long Frames and STFT Magnitudes

06/23/2022
by   Danilo de Oliveira, et al.
0

The SepFormer architecture shows very good results in speech separation. Like other learned-encoder models, it uses short frames, as they have been shown to obtain better performance in these cases. This results in a large number of frames at the input, which is problematic; since the SepFormer is transformer-based, its computational complexity drastically increases with longer sequences. In this paper, we employ the SepFormer in a speech enhancement task and show that by replacing the learned-encoder features with a magnitude short-time Fourier transform (STFT) representation, we can use long frames without compromising perceptual enhancement performance. We obtained equivalent quality and intelligibility evaluation scores while reducing the number of operations by a factor of approximately 8 for a 10-second utterance.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/20/2020

Efficient Trainable Front-Ends for Neural Speech Enhancement

Many neural speech enhancement and source separation systems operate in ...
research
05/11/2020

Online Monaural Speech Enhancement Using Delayed Subband LSTM

This paper proposes a delayed subband LSTM network for online monaural (...
research
03/31/2022

Speech Enhancement with Score-Based Generative Models in the Complex STFT Domain

Score-based generative models (SGMs) have recently shown impressive resu...
research
03/30/2022

Phase-Aware Deep Speech Enhancement: It's All About The Frame Length

While phase-aware speech processing has been receiving increasing attent...
research
05/15/2023

Ripple sparse self-attention for monaural speech enhancement

The use of Transformer represents a recent success in speech enhancement...
research
04/15/2022

Improving Frame-Online Neural Speech Enhancement with Overlapped-Frame Prediction

Frame-online speech enhancement systems in the short-time Fourier transf...
research
06/18/2020

Boosting Objective Scores of Speech Enhancement Model through MetricGAN Post-Processing

The Transformer architecture has shown its superior ability than recurre...

Please sign up or login with your details

Forgot password? Click here to reset