Stabilizing Gradients for Deep Neural Networks via Efficient SVD Parameterization

03/25/2018
by Jiong Zhang, et al.

Vanishing and exploding gradients are two of the main obstacles in training deep neural networks, especially in capturing long-range dependencies in recurrent neural networks (RNNs). In this paper, we present an efficient parameterization of the transition matrix of an RNN that allows us to stabilize the gradients that arise in its training. Specifically, we parameterize the transition matrix by its singular value decomposition (SVD), which allows us to explicitly track and control its singular values. We attain efficiency by using tools that are common in numerical linear algebra, namely Householder reflectors for representing the orthogonal matrices that arise in the SVD. By explicitly controlling the singular values, our proposed Spectral-RNN method easily solves the exploding gradient problem, and we observe that it empirically mitigates the vanishing gradient issue to a large extent. We note that the SVD parameterization can be used for any rectangular weight matrix, so it extends readily to any deep neural network, such as a multi-layer perceptron. Theoretically, we demonstrate that our parameterization does not lose any expressive power, and we show how it controls the generalization of RNNs for the classification task. Our extensive experimental results also demonstrate that the proposed framework converges faster and generalizes well, especially in capturing long-range dependencies, as shown on the synthetic addition and copy tasks, as well as on the MNIST and Penn Treebank data sets.
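
To make the construction described in the abstract concrete, below is a minimal NumPy sketch of the idea, not the authors' implementation: the orthogonal factors U and V of the transition matrix are built as products of Householder reflectors, and the singular values are constrained to a band [1 - r, 1 + r] around 1, which bounds how much gradients can grow or shrink through the matrix. The function names (`householder`, `svd_weight`), the number of reflectors, and the hard clipping of the singular values are illustrative assumptions; the paper keeps the singular values in range through the parameterization and training procedure itself.

```python
import numpy as np

def householder(v):
    """Householder reflector H = I - 2 * v v^T / (v^T v) for a nonzero vector v."""
    v = np.asarray(v, dtype=float).ravel()
    return np.eye(v.size) - 2.0 * np.outer(v, v) / np.dot(v, v)

def svd_weight(u_vecs, v_vecs, sigma, r=0.1):
    """Build W = U diag(sigma) V^T, with U and V expressed as products of
    Householder reflectors and singular values kept inside [1 - r, 1 + r]."""
    n = len(sigma)
    U = np.eye(n)
    for u in u_vecs:
        U = U @ householder(u)
    V = np.eye(n)
    for v in v_vecs:
        V = V @ householder(v)
    # Hard clamp used here as a stand-in for the paper's constrained parameterization.
    sigma = np.clip(sigma, 1.0 - r, 1.0 + r)
    return U @ np.diag(sigma) @ V.T

# Example: a 4x4 transition matrix with 2 reflectors per orthogonal factor.
rng = np.random.default_rng(0)
n, k = 4, 2
u_vecs = [rng.standard_normal(n) for _ in range(k)]
v_vecs = [rng.standard_normal(n) for _ in range(k)]
sigma = rng.uniform(0.5, 1.5, size=n)
W = svd_weight(u_vecs, v_vecs, sigma, r=0.1)
print(np.linalg.svd(W, compute_uv=False))  # singular values all lie in [0.9, 1.1]
```

Because U and V are exactly orthogonal by construction, the singular values of W are exactly the clamped entries of sigma, which is what makes the scale of the gradients flowing through W easy to control.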

Related research

04/20/2020
Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality Regularization and Singular Value Sparsification
Modern deep neural networks (DNNs) often require high memory consumption...

03/09/2021
UnICORNN: A recurrent model for learning very long time dependencies
The design of recurrent neural networks (RNNs) to accurately process seq...

05/11/2018
State Gradients for RNN Memory Analysis
We present a framework for analyzing what the state in RNNs remembers fr...

01/31/2017
On orthogonality and learning recurrent networks with long term dependencies
It is well known that it is challenging to train deep neural networks an...

11/20/2015
Unitary Evolution Recurrent Neural Networks
Recurrent neural networks (RNNs) are notoriously difficult to train. Whe...

03/17/2018
Learning Long Term Dependencies via Fourier Recurrent Units
It is a known fact that training recurrent neural networks for tasks tha...

10/03/2022
Random orthogonal additive filters: a solution to the vanishing/exploding gradient of deep neural networks
Since the recognition in the early nineties of the vanishing/exploding (...
