Utterance-level Permutation Invariant Training with Latency-controlled BLSTM for Single-channel Multi-talker Speech Separation

12/25/2019
by Lu Huang, et al.

Utterance-level permutation invariant training (uPIT) has achieved promising results on the single-channel multi-talker speech separation task. Long short-term memory (LSTM) and bidirectional LSTM (BLSTM) networks are widely used as the separation networks in uPIT, giving uPIT-LSTM and uPIT-BLSTM. uPIT-LSTM has lower latency but worse performance, while uPIT-BLSTM performs better but has higher latency. In this paper, we propose using a latency-controlled BLSTM (LC-BLSTM) during inference to achieve speech separation with both low latency and good performance. To find a better training strategy for the BLSTM-based separation network, we compare chunk-level PIT (cPIT) with uPIT. The experimental results show that uPIT outperforms cPIT when LC-BLSTM is used during inference. We also find that inter-chunk speaker tracing (ST) further improves the separation performance of uPIT-LC-BLSTM. On the WSJ0 two-talker mixed-speech separation task, the absolute signal-to-distortion ratio (SDR) gap between uPIT-BLSTM and uPIT-LC-BLSTM is reduced to within 0.7 dB.
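
The abstract contrasts two training criteria: uPIT and cPIT. The following minimal NumPy sketch illustrates the difference. It assumes a mean-squared-error loss on magnitude spectrograms of shape (speakers, frames, frequency bins). The paper's exact loss, features and chunk sizes are not stated here, and the function names `pit_mse`, `upit_loss` and `cpit_loss` are hypothetical. uPIT fixes one output-to-speaker permutation for the whole utterance. cPIT lets every chunk pick its own permutation.

```python
import itertools

import numpy as np


def pit_mse(est, ref):
    """Return (min MSE, best permutation) over all output-to-speaker assignments.

    est, ref: arrays of shape (S, T, F). One permutation is chosen for the
    whole time span passed in.
    """
    num_spk = est.shape[0]
    best_loss, best_perm = np.inf, None
    for perm in itertools.permutations(range(num_spk)):
        # Pair output s with reference speaker perm[s] and average the squared error.
        loss = np.mean((est - ref[list(perm)]) ** 2)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


def upit_loss(est, ref):
    # Utterance-level PIT: a single permutation shared by all T frames.
    return pit_mse(est, ref)[0]


def cpit_loss(est, ref, chunk_len):
    # Chunk-level PIT: each chunk of `chunk_len` frames picks its own permutation.
    # Chunk losses are weighted by chunk length so the result is a per-frame mean.
    num_frames = est.shape[1]
    losses, weights = [], []
    for start in range(0, num_frames, chunk_len):
        sl = slice(start, min(start + chunk_len, num_frames))
        losses.append(pit_mse(est[:, sl], ref[:, sl])[0])
        weights.append(sl.stop - sl.start)
    return float(np.average(losses, weights=weights))


# Toy example: two speakers whose outputs swap halfway through the utterance.
rng = np.random.default_rng(0)
ref = rng.random((2, 100, 129))
est = ref.copy()
est[:, 50:] = ref[::-1, 50:]  # output channels exchange speakers at frame 50

print(cpit_loss(est, ref, chunk_len=50))  # 0.0
print(upit_loss(est, ref) > 0)            # True
```

In this toy case, cPIT reports zero loss even though the outputs switch speakers mid-utterance, because each chunk is free to choose its own assignment. uPIT penalizes the switch. Because a per-chunk minimum can never exceed the loss of a shared permutation, the cPIT value is always less than or equal to the uPIT value. This illustrates why chunk-wise outputs need a consistent speaker assignment across chunks, the problem that inter-chunk speaker tracing is meant to address. The paper's actual ST procedure may differ from anything implied here.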
