Analysis of memory in LSTM-RNNs for source separation

09/01/2020
by Jeroen Zegers, et al.

Long short-term memory recurrent neural networks (LSTM-RNNs) are considered state-of-the-art in many speech processing tasks. The recurrence in the network, in principle, allows any input to be remembered for an indefinite time, a feature very useful for sequential data like speech. However, very little is known about which information is actually stored in the LSTM and for how long. We address this problem with a memory reset approach, which lets us evaluate network performance as a function of the allowed memory time span. We apply this approach to the task of multi-speaker source separation, but it can be used for any task using RNNs. We find a strong performance effect of short-term (shorter than 100 milliseconds) linguistic processes. Only speaker characteristics are kept in memory for longer than 400 milliseconds. Furthermore, we confirm that, performance-wise, it is sufficient to implement longer memory in deeper layers. Finally, in a bidirectional model, the backward model contributes slightly more to the separation performance than the forward model.
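The memory reset idea can be sketched as follows. This is a minimal PyTorch illustration, not the authors' implementation; the function name, the toy dimensions, and the 10 ms frame-shift figure in the comment are our assumptions. Zeroing the LSTM's hidden and cell states every `reset_period` frames guarantees that no output can depend on input older than that span, so separation performance can be measured as a function of the allowed memory time span.

```python
# A minimal sketch (assumed, not the paper's code) of capping an LSTM's
# memory span by resetting its states every `reset_period` frames.
import torch
import torch.nn as nn

def lstm_with_memory_reset(lstm: nn.LSTM, x: torch.Tensor, reset_period: int):
    """Run `lstm` over `x` (batch, time, feat), zeroing the hidden and
    cell states every `reset_period` frames so no output depends on
    context older than that span."""
    batch, time, _ = x.shape
    outputs = []
    for t0 in range(0, time, reset_period):
        chunk = x[:, t0:t0 + reset_period, :]
        out, _ = lstm(chunk)  # fresh zero state each chunk = memory reset
        outputs.append(out)
    return torch.cat(outputs, dim=1)

# Example: a 2-layer LSTM whose memory is capped at 25 frames
# (roughly 250 ms at an assumed 10 ms frame shift).
lstm = nn.LSTM(input_size=40, hidden_size=128, num_layers=2, batch_first=True)
x = torch.randn(4, 200, 40)
y = lstm_with_memory_reset(lstm, x, reset_period=25)
print(y.shape)  # torch.Size([4, 200, 128])
```

Note that this naive variant gives frames just after a reset almost no context, so the effective span varies between 0 and `reset_period` frames; staggering resets across several parallel runs would guarantee a minimum span for every frame. Applying different reset periods to different layers of a stacked LSTM is one way to probe the paper's finding that longer memory is only needed in the deeper layers.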

Related research

08/24/2018
Memory Time Span in LSTMs for Multi-Speaker Source Separation
With deep learning approaches becoming state-of-the-art in many speech (...

12/19/2019
CNN-LSTM models for Multi-Speaker Source Separation using Bayesian Hyper Parameter Optimization
In recent years there have been many deep learning approaches towards th...

05/07/2018
MMDenseLSTM: An efficient combination of convolutional and recurrent neural networks for audio source separation
Deep neural networks have become an indispensable technique for audio so...

08/02/2016
RETURNN: The RWTH Extensible Training framework for Universal Recurrent Neural Networks
In this work we release our extensible and easily configurable neural ne...

06/24/2020
Multi-path RNN for hierarchical modeling of long sequential data and its application to speaker stream separation
Recently, the source separation performance was greatly improved by time...

06/13/2017
Modelling prosodic structure using Artificial Neural Networks
The ability to accurately perceive whether a speaker is asking a questio...

10/09/2020
Recurrent babbling: evaluating the acquisition of grammar from limited input data
Recurrent Neural Networks (RNNs) have been shown to capture various aspe...
