On Using Transformers for Speech-Separation

02/06/2022
by Cem Subakan, et al.

Transformers have enabled major improvements in deep learning. They often outperform recurrent and convolutional models on many tasks while taking advantage of parallel processing. Recently, we proposed SepFormer, which uses self-attention and obtains state-of-the-art results on the WSJ0-2Mix and WSJ0-3Mix datasets for speech separation. In this paper, we extend our previous work by providing results on more datasets, including LibriMix, WHAM!, and WHAMR!, which cover noisy and noisy-reverberant conditions. Moreover, we provide denoising and denoising+dereverberation results in the context of speech enhancement, on the WHAM! and WHAMR! datasets respectively. We also investigate incorporating recently proposed efficient self-attention mechanisms into the SepFormer model, and show that these mechanisms reduce memory requirements significantly while performing better than the popular Conv-TasNet model on the WSJ0-2Mix dataset.
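SepFormer is distributed through the SpeechBrain toolkit. The following is a minimal usage sketch, assuming the speechbrain package and the public sepformer-wsj02mix checkpoint on HuggingFace; the import path reflects SpeechBrain 0.5.x, and the file names are placeholders.

# Minimal sketch: separating a two-speaker mixture with a pretrained
# SepFormer checkpoint distributed through SpeechBrain. The checkpoint
# source and the input/output file names here are assumptions.
import torchaudio
from speechbrain.pretrained import SepformerSeparation

model = SepformerSeparation.from_hparams(
    source="speechbrain/sepformer-wsj02mix",         # WSJ0-2Mix checkpoint
    savedir="pretrained_models/sepformer-wsj02mix",
)

# separate_file returns a (batch, time, n_sources) tensor of estimates
est_sources = model.separate_file(path="mixture.wav")

# write each estimated speaker to its own file (8 kHz, matching WSJ0-2Mix)
for i in range(est_sources.shape[2]):
    torchaudio.save(f"speaker{i + 1}.wav",
                    est_sources[:, :, i].detach().cpu(), 8000)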
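The memory savings come from replacing the quadratic n-by-n attention map with a compressed one. As an illustration of the idea (not the paper's exact implementation), the sketch below shows a Linformer-style self-attention layer that projects keys and values down to k landmark positions, so the attention map is n-by-k and memory grows linearly in the sequence length; all class names and sizes are hypothetical.

# Illustrative sketch of one efficient self-attention family: a
# Linformer-style layer with learned sequence-axis projections n -> k.
# Inputs must have exactly seq_len time steps for the projections to apply.
import torch
import torch.nn as nn

class LinformerSelfAttention(nn.Module):
    def __init__(self, dim: int, seq_len: int, k: int = 64):
        super().__init__()
        self.scale = dim ** -0.5
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        # learned projections that compress the sequence axis n -> k
        self.proj_k = nn.Parameter(torch.randn(seq_len, k) / seq_len ** 0.5)
        self.proj_v = nn.Parameter(torch.randn(seq_len, k) / seq_len ** 0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n, dim), with n == seq_len
        q, k_, v = self.to_q(x), self.to_k(x), self.to_v(x)
        k_ = torch.einsum("bnd,nk->bkd", k_, self.proj_k)  # (batch, k, dim)
        v = torch.einsum("bnd,nk->bkd", v, self.proj_v)    # (batch, k, dim)
        attn = torch.softmax(q @ k_.transpose(1, 2) * self.scale, dim=-1)
        return attn @ v                                    # (batch, n, dim)

layer = LinformerSelfAttention(dim=256, seq_len=2000, k=64)
out = layer(torch.randn(1, 2000, 256))
print(out.shape)  # torch.Size([1, 2000, 256])

With k fixed, the attention map costs O(nk) rather than O(n^2) memory, which is what makes long separation inputs affordable.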

