Overcoming the vanishing gradient problem in plain recurrent networks

01/18/2018
by   Yuhuang Hu, et al.
0

Plain recurrent networks greatly suffer from the vanishing gradient problem while Gated Neural Networks (GNNs) such as Long-short Term Memory (LSTM) and Gated Recurrent Unit (GRU) deliver promising results in many sequence learning tasks through sophisticated network designs. This paper shows how we can address this problem in a plain recurrent network by analyzing the gating mechanisms in GNNs. We propose a novel network called the Recurrent Identity Network (RIN) which allows a plain recurrent network to overcome the vanishing gradient problem while training very deep models without the use of gates. We compare this model with IRNNs and LSTMs on multiple sequence modeling benchmarks. The RINs demonstrate competitive performance and converge faster in all tasks. Notably, small RIN models produce 12 Sequential and Permuted MNIST datasets and reach state-of-the-art performance on the bAbI question answering dataset.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
08/12/2022

Orthogonal Gated Recurrent Unit with Neumann-Cayley Transformation

In recent years, using orthogonal matrices has been shown to be a promis...
research
11/21/2017

Cross Temporal Recurrent Networks for Ranking Question Answer Pairs

Temporal gates play a significant role in modern recurrent-based neural ...
research
05/09/2018

Long Short-Term Memory as a Dynamically Computed Element-wise Weighted Sum

LSTMs were introduced to combat vanishing gradients in simple RNNs by au...
research
03/01/2016

Quantifying the vanishing gradient and long distance dependency problem in recursive neural networks and recursive LSTMs

Recursive neural networks (RNN) and their recently proposed extension re...
research
05/30/2019

A Lightweight Recurrent Network for Sequence Modeling

Recurrent networks have achieved great success on various sequential tas...
research
03/01/2017

The Statistical Recurrent Unit

Sophisticated gated recurrent neural network architectures like LSTMs an...
research
09/12/2017

RRA: Recurrent Residual Attention for Sequence Learning

In this paper, we propose a recurrent neural network (RNN) with residual...

Please sign up or login with your details

Forgot password? Click here to reset