How Does Momentum Benefit Deep Neural Networks Architecture Design? A Few Case Studies

10/13/2021
by Bao Wang, et al.

We present and review an algorithmic and theoretical framework for improving neural network architecture design via momentum. As case studies, we consider how momentum can improve the architecture design of recurrent neural networks (RNNs), neural ordinary differential equations (ODEs), and transformers. We show that integrating momentum into neural network architectures has several remarkable theoretical and empirical benefits: 1) integrating momentum into RNNs and neural ODEs can overcome the vanishing gradient issues that arise in training them, enabling effective learning of long-term dependencies; 2) momentum in neural ODEs can reduce the stiffness of the ODE dynamics, which significantly improves computational efficiency in training and testing; and 3) momentum can improve the efficiency and accuracy of transformers.
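To make benefits 1) and 2) more concrete, here is a minimal Python sketch that assumes the heavy-ball form of momentum.

The RNN cell keeps an extra velocity state v that accumulates the input drive, v_t = mu * v_{t-1} + s * U x_t. It then applies a standard update, h_t = tanh(W h_{t-1} + v_t).

The neural ODE part integrates the heavy-ball ODE h'' + gamma * h' = f(h). It does this by rewriting it as the first-order system h' = m, m' = -gamma * m + f(h) and stepping that system with forward Euler.

Several choices here are hypothetical illustrative assumptions rather than the authors' reference implementation: the function names, the tanh activation, the default mu and s, and the Euler solver.

```python
import math


def matvec(A, x):
    # Dense matrix-vector product on nested Python lists.
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def momentum_rnn_step(h_prev, v_prev, x, W, U, mu=0.6, s=0.6):
    """One step of a momentum RNN cell (illustrative sketch).

    Assumed heavy-ball form:
        v_t = mu * v_{t-1} + s * U x_t
        h_t = tanh(W h_{t-1} + v_t)
    mu (momentum) and s (step size) defaults are arbitrary choices.
    """
    v = [mu * vp + s * ux for vp, ux in zip(v_prev, matvec(U, x))]
    pre = [wh + vi for wh, vi in zip(matvec(W, h_prev), v)]
    h = [math.tanh(p) for p in pre]
    return h, v


def hbnode_euler(f, h0, m0, gamma, dt, steps):
    """Forward-Euler integration of h'' + gamma * h' = f(h).

    Uses the first-order system h' = m, m' = -gamma * m + f(h).
    Both updates use the state from the start of each step (explicit Euler).
    """
    h, m = list(h0), list(m0)
    for _ in range(steps):
        fh = f(h)
        h_new = [hi + dt * mi for hi, mi in zip(h, m)]
        m = [mi + dt * (-gamma * mi + fi) for mi, fi in zip(m, fh)]
        h = h_new
    return h, m


if __name__ == "__main__":
    W = [[0.0, 0.0], [0.0, 0.0]]
    U = [[1.0, 0.0], [0.0, 1.0]]
    h, v = momentum_rnn_step([0.0, 0.0], [0.0, 0.0], [1.0, 0.0], W, U, mu=0.5, s=1.0)
    # Second step with zero input: the velocity still carries mu * (earlier drive).
    h, v = momentum_rnn_step(h, v, [0.0, 0.0], W, U, mu=0.5, s=1.0)
    print(h, v)
```

Usage and design note: W is set to zero so the recurrent path contributes nothing. The second call then shows the velocity carrying half of the earlier input forward even though the new input is zero. In a plain RNN cell with the same weights, that contribution would already have vanished.

In `hbnode_euler`, the damping term gamma controls how quickly the velocity m decays. With f = 0, each Euler step scales m by (1 - gamma * dt).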
