Multi-path RNN for hierarchical modeling of long sequential data and its application to speaker stream separation

06/24/2020
by   Keisuke Kinoshita, et al.
0

Recently, the source separation performance was greatly improved by time-domain audio source separation based on dual-path recurrent neural network (DPRNN). DPRNN is a simple but effective model for a long sequential data. While DPRNN is quite efficient in modeling a sequential data of the length of an utterance, i.e., about 5 to 10 second data, it is harder to apply it to longer sequences such as whole conversations consisting of multiple utterances. It is simply because, in such a case, the number of time steps consumed by its internal module called inter-chunk RNN becomes extremely large. To mitigate this problem, this paper proposes a multi-path RNN (MPRNN), a generalized version of DPRNN, that models the input data in a hierarchical manner. In the MPRNN framework, the input data is represented at several (>3) time-resolutions, each of which is modeled by a specific RNN sub-module. For example, the RNN sub-module that deals with the finest resolution may model temporal relationship only within a phoneme, while the RNN sub-module handling the most coarse resolution may capture only the relationship between utterances such as speaker information. We perform experiments using simulated dialogue-like mixtures and show that MPRNN has greater model capacity, and it outperforms the current state-of-the-art DPRNN framework especially in online processing scenarios.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/14/2019

Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation

Recent studies in deep learning-based speech separation have proven the ...
research
08/24/2018

Multi-scenario deep learning for multi-speaker source separation

Research in deep learning for multi-speaker source separation has receiv...
research
09/01/2020

Analysis of memory in LSTM-RNNs for source separation

Long short-term memory recurrent neural networks (LSTM-RNNs) are conside...
research
08/06/2020

Speech Separation Based on Multi-Stage Elaborated Dual-Path Deep BiLSTM with Auxiliary Identity Loss

Deep neural network with dual-path bi-directional long short-term memory...
research
06/15/2022

On the Design and Training Strategies for RNN-based Online Neural Speech Separation Systems

While the performance of offline neural speech separation systems has be...
research
07/30/2021

Speeding Up Permutation Invariant Training for Source Separation

Permutation invariant training (PIT) is a widely used training criterion...

Please sign up or login with your details

Forgot password? Click here to reset