TasNet: time-domain audio separation network for real-time, single-channel speech separation

11/01/2017
by Yi Luo, et al.

Robust speech processing in multi-talker environments requires effective speech separation. Recent deep learning systems have made significant progress toward solving this problem, yet it remains challenging, particularly in real-time, short-latency applications. Most methods attempt to construct a mask for each source in a time-frequency representation of the mixture signal, which is not necessarily an optimal representation for speech separation. In addition, time-frequency decomposition introduces inherent problems such as the decoupling of phase and magnitude and the long time window required to achieve sufficient frequency resolution. We propose the Time-domain Audio Separation Network (TasNet) to overcome these limitations. We directly model the signal in the time domain using an encoder-decoder framework and perform the source separation on nonnegative encoder outputs. This method removes the frequency decomposition step and reduces the separation problem to the estimation of source masks on encoder outputs, which are then synthesized by the decoder. Our system outperforms the current state-of-the-art causal speech separation algorithms, reduces the computational cost of speech separation, and significantly reduces the minimum required latency of the output. This makes TasNet suitable for applications where low-power, real-time implementation is desirable, such as in hearable and telecommunication devices.
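
The encode, mask, decode pipeline described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the encoder, the mask estimator, or any sizes, so everything here is an assumption made for illustration. The encoder is a plain linear projection followed by a ReLU (to keep the outputs nonnegative), the trained separation network is replaced by a hypothetical `random_softmax_masks` function, and `frame_len`, `n_basis`, and all weights are arbitrary random values.

```python
import numpy as np

def separate(mixture, enc_basis, dec_basis, mask_fn, frame_len):
    """Toy encode -> mask -> decode pipeline on short time-domain frames."""
    # Split the waveform into non-overlapping frames (a simplifying assumption).
    n_frames = len(mixture) // frame_len
    frames = mixture[: n_frames * frame_len].reshape(n_frames, frame_len)

    # Encoder: project each frame and clip at zero so the outputs are nonnegative.
    weights = np.maximum(frames @ enc_basis, 0.0)   # (n_frames, n_basis)

    # Separation: one mask per source, applied to the encoder outputs.
    masks = mask_fn(weights)                        # (n_sources, n_frames, n_basis)
    source_weights = masks * weights                # broadcasts over sources

    # Decoder: synthesize each source's frames back into a waveform.
    source_frames = source_weights @ dec_basis      # (n_sources, n_frames, frame_len)
    return source_frames.reshape(masks.shape[0], -1), masks, weights

def random_softmax_masks(weights, n_sources=2, seed=0):
    """Hypothetical stand-in for a trained mask estimator.

    Draws random logits and applies a softmax across sources, so the
    masks for each encoder output sum to one.
    """
    rng = np.random.default_rng(seed)
    logits = rng.standard_normal((n_sources,) + weights.shape)
    logits -= logits.max(axis=0, keepdims=True)     # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=0, keepdims=True)

# Illustrative sizes only; none of these values come from the abstract.
rng = np.random.default_rng(0)
frame_len, n_basis = 40, 64
mixture = rng.standard_normal(16000)
enc_basis = rng.standard_normal((frame_len, n_basis))
dec_basis = rng.standard_normal((n_basis, frame_len))

sources, masks, weights = separate(
    mixture, enc_basis, dec_basis, random_softmax_masks, frame_len
)
print(sources.shape)  # one separated waveform per source
```

Because each frame is encoded and decoded independently in this sketch, an output frame can be produced as soon as its input frame has arrived, which is the intuition behind the low-latency claim. In a real system the mask estimator and both basis matrices would be learned jointly from data.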

research
09/20/2018

TasNet: Surpassing Ideal Time-Frequency Masking for Speech Separation

Robust speech processing in multitalker acoustic environments requires a...
research
09/02/2017

A Recurrent Encoder-Decoder Approach with Skip-filtering Connections for Monaural Singing Voice Separation

The objective of deep learning methods based on encoder-decoder architec...
research
12/09/2019

MITAS: A Compressed Time-Domain Audio Separation Network with Parameter Sharing

Deep learning methods have brought substantial advancements in speech se...
research
11/16/2020

Block-Online Guided Source Separation

We propose a block-online algorithm of guided source separation (GSS). G...
research
10/28/2022

UX-NET: Filter-and-Process-based Improved U-Net for Real-time Time-domain Audio Separation

This study presents UX-Net, a time-domain audio separation network (TasN...
research
12/17/2019

A Unified Framework for Speech Separation

Speech separation refers to extracting each individual speech source in ...
research
09/30/2022

An efficient encoder-decoder architecture with top-down attention for speech separation

Deep neural networks have shown excellent prospects in speech separation...
