Toward the pre-cocktail party problem with TasTas+

09/07/2020
by Anyan Shi, et al.

Deep neural networks with dual-path bi-directional long short-term memory (BiLSTM) blocks have proven very effective in sequence modeling, especially in speech separation, e.g. DPRNN-TasNet <cit.> and TasTas <cit.>. In this paper, we propose two improvements to TasTas <cit.> for an end-to-end approach to monaural speech separation in pre-cocktail party problems: 1) generating new training data from the original training batch in real time, and 2) training each module in TasTas separately. The new approach, called TasTas+, takes a mixed utterance of five speakers and maps it to five separated utterances, each containing only one speaker's voice. For the objective, we train the network by directly optimizing the utterance-level scale-invariant signal-to-distortion ratio (SI-SDR) in a permutation invariant training (PIT) style. Our experiments on the public WSJ0-5mix data corpus yield an 11.14 dB SDR improvement, which shows that the proposed networks can improve performance on the speaker separation task. We have open-sourced our re-implementation of DPRNN-TasNet at https://github.com/ShiZiqiang/dual-path-RNNs-DPRNNs-based-speech-separation, and TasTas+ is built on this implementation, so we believe the results in this paper can be reproduced with ease.
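
The training objective named in the abstract, utterance-level SI-SDR evaluated in a PIT style, can be illustrated with a small sketch. This is not the authors' implementation: it uses plain Python lists instead of tensors, and the function names `si_sdr` and `pit_si_sdr`, the mean removal, and the `eps` constant are assumptions made for illustration. It scores every assignment of estimates to reference speakers and keeps the best one; in training, the negative of that score would typically serve as the loss.

```python
import itertools
import math


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between two equal-length lists of floats.

    Assumption: both signals are mean-removed first, a common convention.
    """
    n = len(reference)
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]

    # Project the estimate onto the reference to obtain the scaled target.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    scale = dot / ref_energy
    target = [scale * r for r in ref]

    # Everything in the estimate that is not the scaled target counts as distortion.
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(v * v for v in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


def pit_si_sdr(estimates, references):
    """Return (best mean SI-SDR, best permutation) over all speaker assignments.

    best permutation[i] is the index of the reference matched to estimate i.
    """
    num_speakers = len(references)
    # scores[i][j] = SI-SDR of estimate i measured against reference j.
    scores = [[si_sdr(e, r) for r in references] for e in estimates]

    best_score, best_perm = -math.inf, None
    for perm in itertools.permutations(range(num_speakers)):
        mean_score = sum(scores[i][perm[i]] for i in range(num_speakers)) / num_speakers
        if mean_score > best_score:
            best_score, best_perm = mean_score, perm
    return best_score, best_perm
```

With five speakers, as in WSJ0-5mix, the exhaustive search above visits 5! = 120 permutations per utterance, which is still cheap once the 5 × 5 table of pairwise scores has been computed.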

research
01/23/2020

La Furca: Iterative Context-Aware End-to-End Monaural Speech Separation Based on Dual-Path Deep Parallel Inter-Intra Bi-LSTM with Attention

Deep neural network with dual-path bi-directional long short-term memory...
research
08/06/2020

Speech Separation Based on Multi-Stage Elaborated Dual-Path Deep BiLSTM with Auxiliary Identity Loss

Deep neural network with dual-path bi-directional long short-term memory...
research
02/02/2019

FurcaNet: An end-to-end deep gated convolutional, long short-term memory, deep neural networks for single channel speech separation

Deep gated convolutional networks have been proved to be very effective ...
research
02/12/2019

FurcaNeXt: End-to-end monaural speech separation with dynamic gated dilated temporal convolutional networks

Deep dilated temporal convolutional networks (TCN) have been proved to b...
research
08/31/2017

Joint Separation and Denoising of Noisy Multi-talker Speech using Recurrent Neural Networks and Permutation Invariant Training

In this paper we propose to use utterance-level Permutation Invariant Tr...
research
02/23/2021

Dual-Path Modeling for Long Recording Speech Separation in Meetings

The continuous speech separation (CSS) is a task to separate the speech ...
research
05/15/2020

Reverberation Modeling for Source-Filter-based Neural Vocoder

This paper presents a reverberation module for source-filter-based neura...
