MomentumRNN: Integrating Momentum into Recurrent Neural Networks

06/12/2020
by Tan M. Nguyen et al.

Designing deep neural networks is an art that often involves an expensive search over candidate architectures. To overcome this for recurrent neural networks (RNNs), we establish a connection between the hidden state dynamics in an RNN and gradient descent (GD). We then integrate momentum into this framework and propose a new family of RNNs, called MomentumRNNs. We theoretically prove and numerically demonstrate that MomentumRNNs alleviate the vanishing gradient issue in training RNNs. We study the momentum long short-term memory (MomentumLSTM) and verify its advantages in convergence speed and accuracy over its LSTM counterpart across a variety of benchmarks, with little compromise in computational or memory efficiency. We also demonstrate that MomentumRNN is applicable to many types of recurrent cells, including those in the state-of-the-art orthogonal RNNs. Finally, we show that other advanced momentum-based optimization methods, such as Adam and Nesterov accelerated gradients with a restart, can be easily incorporated into the MomentumRNN framework for designing new recurrent cells with even better performance. The code is available at <https://github.com/minhtannguyen/MomentumRNN>.
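To make the idea concrete, below is a minimal sketch (not the authors' released code) of a momentum-augmented recurrent cell in PyTorch. It mirrors heavy-ball momentum in gradient descent by carrying an auxiliary velocity state v_t alongside the hidden state h_t: v_t = mu * v_{t-1} + s * W x_t, followed by h_t = tanh(U h_{t-1} + v_t). The class name, the hyperparameter names `mu` (momentum coefficient) and `s` (step size), and their default values are illustrative assumptions, not the identifiers or settings used in the official repository.

```python
# Hedged sketch of a momentum-augmented recurrent cell.
# Assumes the update form v_t = mu * v_{t-1} + s * (W x_t), h_t = tanh(U h_{t-1} + v_t);
# names and defaults are illustrative, not taken from the official code.
import torch
import torch.nn as nn


class MomentumRNNCellSketch(nn.Module):
    def __init__(self, input_size, hidden_size, mu=0.6, s=0.6):
        super().__init__()
        self.W = nn.Linear(input_size, hidden_size, bias=False)  # input-to-hidden map
        self.U = nn.Linear(hidden_size, hidden_size)             # hidden-to-hidden map (with bias)
        self.mu = mu  # momentum coefficient (hypothetical default)
        self.s = s    # step size scaling the input contribution (hypothetical default)

    def forward(self, x_t, state):
        h_prev, v_prev = state
        v_t = self.mu * v_prev + self.s * self.W(x_t)  # velocity (momentum) update
        h_t = torch.tanh(self.U(h_prev) + v_t)         # hidden-state update
        return h_t, (h_t, v_t)


if __name__ == "__main__":
    # Unroll the cell over a toy sequence of shape (time, batch, features).
    cell = MomentumRNNCellSketch(input_size=8, hidden_size=16)
    x = torch.randn(5, 3, 8)
    h = torch.zeros(3, 16)
    v = torch.zeros(3, 16)
    for t in range(x.size(0)):
        out, (h, v) = cell(x[t], (h, v))
    print(out.shape)  # torch.Size([3, 16])
```

As a design note, the same velocity-carrying pattern can in principle be wrapped around other recurrent cells (e.g., an LSTM), which is how the abstract describes extending the framework to MomentumLSTM and orthogonal RNN variants.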

