FurcaNeXt: End-to-end monaural speech separation with dynamic gated dilated temporal convolutional networks

02/12/2019
by   Ziqiang Shi, et al.
0

Deep dilated temporal convolutional networks (TCN) have been proved to be very effective in sequence modeling. In this paper we propose several improvements of TCN for end-to-end approach to monaural speech separation, which consists of 1) multi-scale dynamic weighted gated dilated convolutional pyramids network (FurcaPy), 2) gated TCN with intra-parallel convolutional components (FurcaPa), 3) weight-shared multi-scale gated TCN (FurcaSh), 4) dilated TCN with gated difference-convolutional component (FurcaSu), that all these networks take the mixed utterance of two speakers and maps it to two separated utterances, where each utterance contains only one speaker's voice. For the objective, we propose to train the network by directly optimizing utterance level signal-to-distortion ratio (SDR) in a permutation invariant training (PIT) style. Our experiments on the the public WSJ0-2mix data corpus results in 18.1dB SDR improvement, which shows our proposed networks can leads to performance improvement on the speaker separation task.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/02/2019

FurcaNet: An end-to-end deep gated convolutional, long short-term memory, deep neural networks for single channel speech separation

Deep gated convolutional networks have been proved to be very effective ...
research
01/23/2020

La Furca: Iterative Context-Aware End-to-End Monaural Speech Separation Based on Dual-Path Deep Parallel Inter-Intra Bi-LSTM with Attention

Deep neural network with dual-path bi-directional long short-term memory...
research
09/07/2020

Toward the pre-cocktail party problem with TasTas+

Deep neural network with dual-path bi-directional long short-term memory...
research
08/06/2020

Speech Separation Based on Multi-Stage Elaborated Dual-Path Deep BiLSTM with Auxiliary Identity Loss

Deep neural network with dual-path bi-directional long short-term memory...
research
05/17/2022

Utterance Weighted Multi-Dilation Temporal Convolutional Networks for Monaural Speech Dereverberation

Speech dereverberation is an important stage in many speech technology a...
research
04/16/2020

Speech Paralinguistic Approach for Detecting Dementia Using Gated Convolutional Neural Network

We propose a non-invasive and cost-effective method to automatically det...
research
10/27/2022

Deformable Temporal Convolutional Networks for Monaural Noisy Reverberant Speech Separation

Speech separation models are used for isolating individual speakers in m...

Please sign up or login with your details

Forgot password? Click here to reset