MomentumRNN: Integrating Momentum into Recurrent Neural Networks

06/12/2020
by Tan M. Nguyen et al.

Designing deep neural networks is an art that often involves an expensive search over candidate architectures. To overcome this for recurrent neural networks (RNNs), we establish a connection between the hidden state dynamics in an RNN and gradient descent (GD). We then integrate momentum into this framework and propose a new family of RNNs, called MomentumRNNs. We theoretically prove and numerically demonstrate that MomentumRNNs alleviate the vanishing gradient issue in training RNNs. We study the momentum long short-term memory (MomentumLSTM) and verify its advantages in convergence speed and accuracy over its LSTM counterpart across a variety of benchmarks, with little compromise in computational or memory efficiency. We also demonstrate that MomentumRNN is applicable to many types of recurrent cells, including those in the state-of-the-art orthogonal RNNs. Finally, we show that other advanced momentum-based optimization methods, such as Adam and Nesterov accelerated gradients with a restart, can be easily incorporated into the MomentumRNN framework for designing new recurrent cells with even better performance. The code is available at <https://github.com/minhtannguyen/MomentumRNN>.
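To make the idea concrete, below is a minimal sketch (not the authors' released code) of a momentum-augmented recurrent cell in PyTorch. It mirrors heavy-ball momentum in gradient descent by carrying an auxiliary velocity state v_t alongside the hidden state h_t: v_t = mu * v_{t-1} + s * W x_t, followed by h_t = tanh(U h_{t-1} + v_t). The class name, the hyperparameter names `mu` (momentum coefficient) and `s` (step size), and their default values are illustrative assumptions, not the identifiers or settings used in the official repository.

```python
# Hedged sketch of a momentum-augmented recurrent cell.
# Assumes the update form v_t = mu * v_{t-1} + s * (W x_t), h_t = tanh(U h_{t-1} + v_t);
# names and defaults are illustrative, not taken from the official code.
import torch
import torch.nn as nn


class MomentumRNNCellSketch(nn.Module):
    def __init__(self, input_size, hidden_size, mu=0.6, s=0.6):
        super().__init__()
        self.W = nn.Linear(input_size, hidden_size, bias=False)  # input-to-hidden map
        self.U = nn.Linear(hidden_size, hidden_size)             # hidden-to-hidden map (with bias)
        self.mu = mu  # momentum coefficient (hypothetical default)
        self.s = s    # step size scaling the input contribution (hypothetical default)

    def forward(self, x_t, state):
        h_prev, v_prev = state
        v_t = self.mu * v_prev + self.s * self.W(x_t)  # velocity (momentum) update
        h_t = torch.tanh(self.U(h_prev) + v_t)         # hidden-state update
        return h_t, (h_t, v_t)


if __name__ == "__main__":
    # Unroll the cell over a toy sequence of shape (time, batch, features).
    cell = MomentumRNNCellSketch(input_size=8, hidden_size=16)
    x = torch.randn(5, 3, 8)
    h = torch.zeros(3, 16)
    v = torch.zeros(3, 16)
    for t in range(x.size(0)):
        out, (h, v) = cell(x[t], (h, v))
    print(out.shape)  # torch.Size([3, 16])
```

As a design note, the same velocity-carrying pattern can in principle be wrapped around other recurrent cells (e.g., an LSTM), which is how the abstract describes extending the framework to MomentumLSTM and orthogonal RNN variants.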

