Unitary Evolution Recurrent Neural Networks

11/20/2015
by Martin Arjovsky, et al.

Recurrent neural networks (RNNs) are notoriously difficult to train. When the eigenvalues of the hidden-to-hidden weight matrix deviate from absolute value 1, optimization becomes difficult due to the well-studied problem of vanishing and exploding gradients, especially when trying to learn long-term dependencies. To circumvent this problem, we propose a new architecture that learns a unitary weight matrix, whose eigenvalues all have absolute value exactly 1. The challenge we address is parametrizing unitary matrices in a way that does not require expensive computations (such as eigendecomposition) after each weight update. We construct an expressive unitary weight matrix by composing several structured matrices that act as building blocks with parameters to be learned. Optimization with this parameterization becomes feasible only when the hidden states are allowed to take values in the complex domain. We demonstrate the potential of this architecture by achieving state-of-the-art results on several hard tasks involving very long-term dependencies.
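To make the "composition of structured building blocks" idea concrete, here is a minimal NumPy sketch (not the authors' code) of one such parameterization in the spirit of the paper: unit-modulus diagonal matrices, complex Householder reflections, a fixed permutation, and the unitary discrete Fourier transform. Each factor is unitary by construction, so their product is too, and weight updates on the underlying angles and reflection vectors never require an eigendecomposition. The function names and the exact ordering of factors below are illustrative assumptions.

```python
import numpy as np

def diag_unitary(theta):
    """Diagonal unitary matrix with entries exp(i * theta_k), each of modulus 1."""
    return np.diag(np.exp(1j * theta))

def householder(v):
    """Complex Householder reflection I - 2 v v* / (v* v); unitary for any v != 0."""
    v = v.reshape(-1, 1)
    return np.eye(len(v)) - 2.0 * (v @ v.conj().T) / (v.conj().T @ v)

def unitary_composition(n, rng):
    """Product of structured unitary factors (illustrative composition:
    D3 R2 F^-1 D2 Pi R1 F D1, with F the unitary DFT and Pi a fixed
    permutation). Only the diagonal angles and reflection vectors are learned."""
    F = np.fft.fft(np.eye(n)) / np.sqrt(n)      # unitary DFT matrix
    Pi = np.eye(n)[rng.permutation(n)]          # fixed permutation matrix
    D1, D2, D3 = (diag_unitary(rng.uniform(-np.pi, np.pi, n)) for _ in range(3))
    R1 = householder(rng.standard_normal(n) + 1j * rng.standard_normal(n))
    R2 = householder(rng.standard_normal(n) + 1j * rng.standard_normal(n))
    return D3 @ R2 @ F.conj().T @ D2 @ Pi @ R1 @ F @ D1

rng = np.random.default_rng(0)
W = unitary_composition(8, rng)
print(np.allclose(W.conj().T @ W, np.eye(8)))   # True: W is unitary
print(np.abs(np.linalg.eigvals(W)).round(6))    # every eigenvalue has modulus 1
```

Because every eigenvalue of W lies on the unit circle, repeated multiplication by W preserves the norm of the (complex) hidden state, which is precisely what keeps gradients from vanishing or exploding through time.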

Related research

01/31/2017 · On orthogonality and learning recurrent networks with long term dependencies
It is well known that it is challenging to train deep neural networks an...

07/29/2017 · Orthogonal Recurrent Neural Networks with Scaled Cayley Transform
Recurrent Neural Networks (RNNs) are designed to handle sequential data ...

10/31/2016 · Full-Capacity Unitary Recurrent Neural Networks
Recurrent neural networks are powerful models for processing sequential ...

04/03/2015 · A Simple Way to Initialize Recurrent Networks of Rectified Linear Units
Learning long term dependencies in recurrent networks is difficult due t...

03/17/2018 · Learning Long Term Dependencies via Fourier Recurrent Units
It is a known fact that training recurrent neural networks for tasks tha...

03/25/2018 · Stabilizing Gradients for Deep Neural Networks via Efficient SVD Parameterization
Vanishing and exploding gradients are two of the main obstacles in train...

07/14/2020 · Shuffling Recurrent Neural Networks
We propose a novel recurrent neural network model, where the hidden stat...
