Gating Revisited: Deep Multi-layer RNNs That Can Be Trained

11/25/2019
by Mehmet Ozgur Turkoglu, et al.

We propose a new stackable recurrent cell (STAR) for recurrent neural networks (RNNs) that has significantly fewer parameters than the widely used LSTM and GRU cells while being more robust against vanishing or exploding gradients. Stacking multiple layers of recurrent units has two major drawbacks: i) many recurrent cells (e.g., LSTM cells) are costly in terms of parameters and computational resources, and ii) deep RNNs are prone to vanishing or exploding gradients during training. We investigate the training of multi-layer RNNs and examine the magnitude of the gradients as they propagate through the network in the "vertical" direction. We show that, depending on the structure of the basic recurrent unit, the gradients are systematically attenuated or amplified; based on this analysis we design a new type of gated cell that better preserves gradient magnitude. We validate the design on a large number of sequence-modelling tasks and demonstrate that the proposed STAR cell makes it possible to build and train deeper recurrent architectures, ultimately leading to improved performance while remaining computationally efficient.
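To make the idea concrete, below is a minimal PyTorch sketch of a single-gate recurrent cell in the spirit of the STAR design, stacked "vertically" into a deep RNN. The class name SingleGateCell and the exact update equations are illustrative assumptions inferred from the abstract (one gate that blends the previous hidden state with a transformed input), not the paper's verbatim formulation.

import torch
import torch.nn as nn

class SingleGateCell(nn.Module):
    """Illustrative single-gate recurrent cell (assumed formulation, not the
    paper's exact equations): one gate blends the previous hidden state with
    a transformed input, keeping the parameter count well below LSTM/GRU."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.in_transform = nn.Linear(input_size, hidden_size)        # candidate: z_t = tanh(W_z x_t + b_z)
        self.gate = nn.Linear(input_size + hidden_size, hidden_size)  # gate: k_t = sigmoid(W_k [x_t, h_{t-1}] + b_k)

    def forward(self, x_t: torch.Tensor, h_prev: torch.Tensor) -> torch.Tensor:
        z_t = torch.tanh(self.in_transform(x_t))
        k_t = torch.sigmoid(self.gate(torch.cat([x_t, h_prev], dim=-1)))
        # Convex combination of old state and new candidate, re-squashed so the
        # activations (and hence the gradients flowing through them) stay bounded.
        return torch.tanh((1.0 - k_t) * h_prev + k_t * z_t)

# Usage: stack several cells in depth and unroll them over time.
if __name__ == "__main__":
    batch, steps, d_in, d_hid, depth = 4, 20, 16, 32, 6
    cells = nn.ModuleList(
        [SingleGateCell(d_in if l == 0 else d_hid, d_hid) for l in range(depth)]
    )
    x = torch.randn(batch, steps, d_in)
    h = [torch.zeros(batch, d_hid) for _ in range(depth)]
    for t in range(steps):
        inp = x[:, t]
        for l, cell in enumerate(cells):
            h[l] = cell(inp, h[l])
            inp = h[l]  # output of layer l feeds layer l+1
    print(h[-1].shape)  # torch.Size([4, 32])

In this sketch each layer needs roughly two weight matrices, compared with the eight of a standard LSTM layer, which is where the parameter savings in such a single-gate design come from; the final tanh keeps the blended state in a fixed range, which is the kind of property the abstract refers to when it says the cell better preserves gradient magnitude across stacked layers.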
