Adaptive-saturated RNN: Remember more with less instability

04/24/2023
by Khoi Minh Nguyen-Duy, et al.

Orthogonal parameterization is a compelling solution to the vanishing gradient problem (VGP) in recurrent neural networks (RNNs). With orthogonal parameters and non-saturated activation functions, gradients in such models are constrained to unit norm. On the other hand, although traditional vanilla RNNs are seen to have higher memory capacity, they suffer from the VGP and perform poorly in many applications. This work proposes the Adaptive-Saturated RNN (asRNN), a variant that dynamically adjusts its saturation level between the two approaches. Consequently, asRNN enjoys both the capacity of a vanilla RNN and the training stability of orthogonal RNNs. Our experiments show encouraging results for asRNN on challenging sequence learning benchmarks compared to several strong competitors. The research code is accessible at https://github.com/ndminhkhoi46/asRNN/.
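To make the idea concrete, here is a minimal PyTorch sketch of one way a cell could interpolate between a saturated (vanilla, tanh-like) regime and a near-linear, orthogonally-parameterized regime. The specific saturation function f(z) = tanh(a*z)/a with a learnable per-unit a, and the use of torch.nn.utils.parametrizations.orthogonal, are illustrative assumptions on my part rather than the paper's exact formulation; see the linked repository for the authors' implementation.

```python
import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal


class AdaptiveSaturationCell(nn.Module):
    """Illustrative RNN cell (not the paper's exact asRNN): orthogonal
    recurrence plus a learnable saturation level a. The nonlinearity
    f(z) = tanh(a * z) / a behaves almost linearly as a -> 0
    (orthogonal-RNN-like, stable gradients) and like a standard tanh
    around a = 1 (vanilla-RNN-like, higher saturation)."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.Wx = nn.Linear(input_size, hidden_size, bias=True)
        # Orthogonal parameterization of the recurrent weight matrix.
        self.Wh = orthogonal(nn.Linear(hidden_size, hidden_size, bias=False))
        # log_a keeps the per-unit saturation level a = exp(log_a) positive;
        # initialized at a = 1, i.e. a plain tanh.
        self.log_a = nn.Parameter(torch.zeros(hidden_size))

    def forward(self, x, h):
        a = self.log_a.exp()
        z = self.Wx(x) + self.Wh(h)
        return torch.tanh(a * z) / a


# Usage: roll the cell over a sequence of shape (T, batch, input_size),
# e.g. pixel-by-pixel sequential MNIST.
cell = AdaptiveSaturationCell(input_size=1, hidden_size=128)
xs = torch.randn(784, 32, 1)
h = torch.zeros(32, 128)
for x_t in xs:
    h = cell(x_t, h)
```

Because the saturation level is learned per hidden unit, training can push some units toward the stable near-linear regime and keep others strongly saturating, which is the trade-off between memory capacity and gradient stability that the abstract describes.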


Related research

MomentumRNN: Integrating Momentum into Recurrent Neural Networks (06/12/2020)
Designing deep neural networks is an art that often involves an expensiv...

Deep Independently Recurrent Neural Network (IndRNN) (10/11/2019)
Recurrent neural networks (RNNs) are known to be difficult to train due ...

Dilated Recurrent Neural Networks (10/05/2017)
Learning with recurrent neural networks (RNNs) on long sequences is a no...

Improved memory in recurrent neural networks with sequential non-normal dynamics (05/31/2019)
Training recurrent neural networks (RNNs) is a hard problem due to degen...

Capacity and Trainability in Recurrent Neural Networks (11/29/2016)
Two potential bottlenecks on the expressiveness of recurrent neural netw...

An Adaptive Stochastic Nesterov Accelerated Quasi Newton Method for Training RNNs (09/09/2019)
A common problem in training neural networks is the vanishing and/or exp...

Non-normal Recurrent Neural Network (nnRNN): learning long time dependencies while improving expressivity with transient dynamics (05/28/2019)
A recent strategy to circumvent the exploding and vanishing gradient pro...
