La Furca: Iterative Context-Aware End-to-End Monaural Speech Separation Based on Dual-Path Deep Parallel Inter-Intra Bi-LSTM with Attention

01/23/2020
by Ziqiang Shi, et al.

Deep neural networks with dual-path bi-directional long short-term memory (BiLSTM) blocks have proven very effective in sequence modeling, especially in speech separation, e.g. DPRNN-TasNet <cit.>. In this paper, we propose several improvements to dual-path BiLSTM-based networks for end-to-end monaural speech separation: 1) a dual-path network with intra-parallel BiLSTM and inter-parallel BiLSTM components, 2) a global context-aware inter-intra cross-parallel BiLSTM, 3) a local context-aware network with attention BiLSTM, and 4) a multiple spiral iterative refinement dual-path BiLSTM. All of these networks take the mixed utterance of two speakers and map it to two separated utterances, each containing only one speaker's voice. For the objective, we train the network by directly optimizing the utterance-level signal-to-distortion ratio (SDR) in a permutation invariant training (PIT) style. Our experiments on the public WSJ0-2mix corpus yield a 19.86 dB SDR improvement, which shows that the proposed networks improve performance on the speaker separation task. We have open-sourced our re-implementation of DPRNN-TasNet at https://github.com/ShiZiqiang/dual-path-RNNs-DPRNNs-based-speech-separation, and our `La Furca' is built on this implementation of DPRNN-TasNet, so we believe the results in this paper can be readily reproduced.
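
The training objective mentioned in the abstract (utterance-level SDR optimized in a PIT style) can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function names `sdr_db`, `pit_sdr` and `pit_sdr_loss` are hypothetical, and the plain energy-ratio definition of SDR used here is an assumption, since the abstract does not say which SDR variant is optimized. The idea is to score every assignment of estimated utterances to reference speakers and keep the best one, so the network is not penalized for emitting the speakers in a different order.

```python
import itertools
import numpy as np


def sdr_db(reference, estimate, eps=1e-8):
    """Plain SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    Assumption: a simple energy-ratio SDR; the paper may use another variant.
    """
    signal_power = np.sum(reference ** 2)
    distortion_power = np.sum((reference - estimate) ** 2)
    return 10.0 * np.log10((signal_power + eps) / (distortion_power + eps))


def pit_sdr(references, estimates):
    """Return (best mean SDR in dB, best permutation).

    references, estimates: arrays of shape (num_speakers, num_samples).
    best_perm[i] is the index of the estimate assigned to reference i.
    """
    num_speakers = references.shape[0]
    best_score, best_perm = -np.inf, None
    for perm in itertools.permutations(range(num_speakers)):
        # Mean utterance-level SDR under this speaker assignment.
        score = np.mean(
            [sdr_db(references[i], estimates[p]) for i, p in enumerate(perm)]
        )
        if score > best_score:
            best_score, best_perm = score, perm
    return best_score, best_perm


def pit_sdr_loss(references, estimates):
    """Loss to minimize: negative of the best-permutation mean SDR."""
    return -pit_sdr(references, estimates)[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    refs = rng.standard_normal((2, 8000))
    # Estimates come out in swapped speaker order, with a little noise added.
    ests = refs[::-1] + 0.01 * rng.standard_normal((2, 8000))
    score, perm = pit_sdr(refs, ests)
    print(f"best mean SDR: {score:.2f} dB, permutation: {perm}")
```

In an actual training loop the same computation would be written in an automatic-differentiation framework so that the negative SDR can be backpropagated; the NumPy version only shows the scoring logic for the two-speaker case described above.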


research
09/07/2020

Toward the pre-cocktail party problem with TasTas+

Deep neural network with dual-path bi-directional long short-term memory...
research
07/28/2020

Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation

The dominant speech separation models are based on complex recurrent or ...
research
02/12/2019

FurcaNeXt: End-to-end monaural speech separation with dynamic gated dilated temporal convolutional networks

Deep dilated temporal convolutional networks (TCN) have been proved to b...
research
12/14/2022

Multi-Scale Feature Fusion Transformer Network for End-to-End Single Channel Speech Separation

Recently studies on time-domain audio separation networks (TasNets) have...
research
08/06/2020

Speech Separation Based on Multi-Stage Elaborated Dual-Path Deep BiLSTM with Auxiliary Identity Loss

Deep neural network with dual-path bi-directional long short-term memory...
research
02/02/2019

FurcaNet: An end-to-end deep gated convolutional, long short-term memory, deep neural networks for single channel speech separation

Deep gated convolutional networks have been proved to be very effective ...
research
02/23/2021

Dual-Path Modeling for Long Recording Speech Separation in Meetings

The continuous speech separation (CSS) is a task to separate the speech ...
