A Simple Way to Initialize Recurrent Networks of Rectified Linear Units

by Quoc V. Le, Navdeep Jaitly, and Geoffrey E. Hinton

Learning long-term dependencies in recurrent networks is difficult due to vanishing and exploding gradients. To overcome this difficulty, researchers have developed sophisticated optimization techniques and network architectures. In this paper, we propose a simpler solution that uses recurrent neural networks composed of rectified linear units. Key to our solution is the use of the identity matrix, or a scaled version of it, to initialize the recurrent weight matrix. We find that our solution is comparable to LSTM on our four benchmarks: two toy problems involving long-range temporal structure, a large language modeling problem, and a benchmark speech recognition problem.
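The core of the abstract's idea can be sketched in a few lines: a vanilla RNN whose hidden units are ReLUs and whose recurrent weight matrix starts as the identity, with zero biases. The dimensions, the input-weight scale, and the function names below are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

# Sketch of the identity-initialization idea: a plain RNN with ReLU
# hidden units, recurrent weights initialized to the identity matrix
# (or a scaled version of it), and zero hidden biases.
hidden_size = 4
input_size = 3

rng = np.random.default_rng(0)
W_hh = np.eye(hidden_size)                               # identity init for recurrent weights
W_xh = rng.normal(0.0, 0.001, (hidden_size, input_size)) # small random input weights (assumed scale)
b_h = np.zeros(hidden_size)                              # zero bias

def irnn_step(h, x):
    """One recurrent step: h_t = ReLU(W_hh @ h_{t-1} + W_xh @ x_t + b)."""
    return np.maximum(0.0, W_hh @ h + W_xh @ x + b_h)

# With identity recurrence, zero bias, and zero input, a non-negative
# hidden state is copied unchanged, so information persists over time.
h = np.array([1.0, 2.0, 0.0, 3.0])
h_next = irnn_step(h, np.zeros(input_size))
```

At initialization this network simply accumulates ReLU-rectified inputs into the hidden state, which is why gradients neither vanish nor explode before training begins.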




Related Papers
Improving performance of recurrent neural network with relu nonlinearity

In recent years significant progress has been made in successfully train...

Learning Longer Memory in Recurrent Neural Networks

Recurrent neural network is a powerful model that learns temporal patter...

Do Neural Networks for Segmentation Understand Insideness?

The insideness problem is an aspect of image segmentation that consists ...

A Lightweight Recurrent Network for Sequence Modeling

Recurrent networks have achieved great success on various sequential tas...

Learning Long Term Dependencies via Fourier Recurrent Units

It is a known fact that training recurrent neural networks for tasks tha...

On orthogonality and learning recurrent networks with long term dependencies

It is well known that it is challenging to train deep neural networks an...

Training RNNs as Fast as CNNs

Common recurrent neural network architectures scale poorly due to the in...
