On TasNet for Low-Latency Single-Speaker Speech Enhancement

by   Morten Kolbæk, et al.

In recent years, speech processing algorithms have seen tremendous progress primarily due to the deep learning renaissance. This is especially true for speech separation where the time-domain audio separation network (TasNet) has led to significant improvements. However, for the related task of single-speaker speech enhancement, which is of obvious importance, it is yet unknown, if the TasNet architecture is equally successful. In this paper, we show that TasNet improves state-of-the-art also for speech enhancement, and that the largest gains are achieved for modulated noise sources such as speech. Furthermore, we show that TasNet learns an efficient inner-domain representation, where target and noise signal components are highly separable. This is especially true for noise in terms of interfering speech signals, which might explain why TasNet performs so well on the separation task. Additionally, we show that TasNet performs poorly for large frame hops and conjecture that aliasing might be the main cause of this performance drop. Finally, we show that TasNet consistently outperforms a state-of-the-art single-speaker speech enhancement system.


page 1

page 2

page 3

page 4


Single-Microphone Speech Enhancement and Separation Using Deep Learning

The cocktail party problem comprises the challenging task of understandi...

Universal Speech Enhancement with Score-based Diffusion

Removing background noise from speech audio has been the subject of cons...

High Fidelity Speech Regeneration with Application to Speech Enhancement

Speech enhancement has seen great improvement in recent years mainly thr...

RadioSES: mmWave-Based Audioradio Speech Enhancement and Separation System

Speech enhancement and separation have been a long-standing problem, esp...

Exploring Tradeoffs in Models for Low-latency Speech Enhancement

We explore a variety of neural networks configurations for one- and two-...

Cognitive-driven convolutional beamforming using EEG-based auditory attention decoding

The performance of speech enhancement algorithms in a multi-speaker scen...

Utterance-Wise Meeting Transcription System Using Asynchronous Distributed Microphones

A novel framework for meeting transcription using asynchronous microphon...