Quantifying the vanishing gradient and long distance dependency problem in recursive neural networks and recursive LSTMs

03/01/2016
by Phong Le, et al.

Recursive neural networks (RNNs) and their recently proposed extension, recursive long short-term memory networks (RLSTMs), are models that compute representations for sentences by recursively combining word embeddings according to an externally provided parse tree. Both models thus, unlike recurrent networks, explicitly make use of the hierarchical structure of a sentence. In this paper, we demonstrate that RNNs nevertheless suffer from the vanishing gradient and long distance dependency problems, and that RLSTMs greatly improve over RNNs on both. We present an artificial learning task that allows us to quantify the severity of these problems for both models. We further show that a ratio of gradients (at the root node and a focal leaf node) is highly indicative of the success of backpropagation at optimizing the relevant weights low in the tree. This paper thus provides an explanation for existing, superior results of RLSTMs on tasks such as sentiment analysis, and suggests that the benefits of including hierarchical structure and of including LSTM-style gating are complementary.
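The gradient-ratio diagnostic described above can be illustrated with a minimal sketch: a plain recursive network composes children with h_parent = tanh(W [h_left; h_right] + b), and the product of per-node Jacobians governs how much error signal reaches a leaf deep in the tree. This is not the paper's experimental setup; the dimension, tree depth, and initialization scale below are illustrative assumptions chosen to make the vanishing effect visible.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10        # hidden/embedding dimension (illustrative choice)
depth = 15    # number of compositions between the focal leaf and the root

# Composition weights of a plain recursive NN:
# h_parent = tanh(W @ [h_left; h_right] + b)
W = rng.normal(0.0, 0.1, size=(d, 2 * d))
b = np.zeros(d)

def compose(h_left, h_right):
    """Compose two child vectors; also return d h_parent / d h_left."""
    h = np.tanh(W @ np.concatenate([h_left, h_right]) + b)
    # Jacobian w.r.t. the left child: diag(1 - h^2) @ W[:, :d]
    jac_left = (1.0 - h ** 2)[:, None] * W[:, :d]
    return h, jac_left

# Build a left-spine tree: the focal leaf sits `depth` compositions below
# the root, and we accumulate the Jacobian d h_root / d h_leaf on the way up.
h = rng.normal(size=d)      # focal leaf embedding
jac = np.eye(d)
for _ in range(depth):
    h, jac_left = compose(h, rng.normal(size=d))  # random sibling each level
    jac = jac_left @ jac

# Backpropagate an error signal from the root to the focal leaf and compare
# gradient magnitudes -- the ratio-of-gradients diagnostic from the abstract.
g_root = rng.normal(size=d)
g_leaf = jac.T @ g_root
ratio = np.linalg.norm(g_leaf) / np.linalg.norm(g_root)
print(f"gradient ratio (leaf/root) after {depth} compositions: {ratio:.2e}")
```

With a modest initialization scale, the ratio collapses toward zero as depth grows, which is exactly the regime in which backpropagation struggles to optimize weights low in the tree; LSTM-style gating mitigates this by keeping a more direct gradient path.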


Related research:

- 03/09/2015 — Compositional Distributional Semantics with Long Short Term Memory
- 11/08/2016 — Dependency Sensitive Convolutional Neural Networks for Modeling Sentences and Documents
- 01/18/2018 — Overcoming the vanishing gradient problem in plain recurrent networks
- 05/09/2018 — Long Short-Term Memory as a Dynamically Computed Element-wise Weighted Sum
- 04/20/2015 — Self-Adaptive Hierarchical Sentence Model
- 03/18/2019 — Effects of padding on LSTMs and CNNs
