Reversible Recurrent Neural Networks

10/25/2018
by Matthew MacKay, et al.

Recurrent neural networks (RNNs) provide state-of-the-art performance in processing sequential data but are memory-intensive to train, which limits the flexibility of the RNN models that can be trained. Reversible RNNs, for which the hidden-to-hidden transition can be reversed, offer a path to reduce the memory requirements of training, as hidden states need not be stored and instead can be recomputed during backpropagation. We first show that perfectly reversible RNNs, which require no storage of the hidden activations, are fundamentally limited because they cannot forget information from their hidden state. We then provide a scheme for storing a small number of bits in order to allow perfect reversal with forgetting. Our method achieves comparable performance to traditional models while reducing the activation memory cost by a factor of 10-15. We extend our technique to attention-based sequence-to-sequence models, where it maintains performance while reducing activation memory cost by a factor of 5-10 in the encoder, and a factor of 10-15 in the decoder.
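The two ideas in the abstract, an exactly invertible hidden-to-hidden transition and forgetting made reversible by saving the few bits it discards, can be sketched in a handful of lines. The snippet below is an illustrative toy rather than the paper's RevGRU/RevLSTM construction: the additive-coupling update and all helper names (rev_step, forget_forward, and so on) are assumptions made for this example.

```python
import numpy as np

# Toy sketch of (1) a reversible hidden-to-hidden transition via additive coupling,
# so past hidden states can be recomputed instead of stored during backpropagation,
# and (2) forgetting in fixed-point arithmetic, where the low-order bits discarded
# by the forget gate are saved so the reversal stays exact. Not the paper's scheme.

def f(h, x, W, U):
    # Candidate update computed from one half of the state and the current input.
    return np.tanh(W @ h + U @ x)

def rev_step(h1, h2, x, W1, U1, W2, U2):
    # Forward transition: each half is updated additively from the other half.
    h1n = h1 + f(h2, x, W1, U1)
    h2n = h2 + f(h1n, x, W2, U2)
    return h1n, h2n

def rev_step_inverse(h1n, h2n, x, W1, U1, W2, U2):
    # Exact reversal: subtract the same updates in the opposite order.
    h2 = h2n - f(h1n, x, W2, U2)
    h1 = h1n - f(h2, x, W1, U1)
    return h1, h2

def forget_forward(h_fix, z_num, B=8):
    # h_fix is an integer fixed-point state; the forget gate is z = z_num / 2**B.
    # Multiplying by z < 1 destroys the low-order bits, so we keep them explicitly.
    prod = h_fix * z_num
    h_new = prod >> B                   # forgetful update (integer part)
    lost_bits = prod & ((1 << B) - 1)   # the information forgetting would lose
    return h_new, lost_bits

def forget_reverse(h_new, lost_bits, z_num, B=8):
    # Reattach the stored bits, then undo the multiplication exactly.
    prod = (h_new << B) + lost_bits
    return prod // z_num

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, dx = 4, 3
    W1, W2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
    U1, U2 = rng.normal(size=(d, dx)), rng.normal(size=(d, dx))
    h1, h2, x = rng.normal(size=d), rng.normal(size=d), rng.normal(size=dx)

    h1n, h2n = rev_step(h1, h2, x, W1, U1, W2, U2)
    r1, r2 = rev_step_inverse(h1n, h2n, x, W1, U1, W2, U2)
    print(np.allclose(h1, r1) and np.allclose(h2, r2))    # True

    h_fix, z_num = 123457, 200          # forget gate z = 200/256, about 0.78
    h_new, lost = forget_forward(h_fix, z_num)
    print(forget_reverse(h_new, lost, z_num) == h_fix)    # True
```

The memory saving comes from the fact that only the input sequence and the small per-step remainders need to be kept; the full hidden state at every step is reconstructed on the backward pass.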

Related research

10/24/2019  Reversible designs for extreme memory cost reduction of CNN training
Training Convolutional Neural Networks (CNN) is a resource intensive tas...

05/22/2018  State-Denoised Recurrent Neural Networks
Recurrent neural networks (RNNs) are difficult to train on sequence proc...

05/27/2021  Efficient and Accurate Gradients for Neural SDEs
Neural SDEs combine many of the best qualities of both RNNs and SDEs, an...

02/18/2020  Assessing the Memory Ability of Recurrent Neural Networks
It is known that Recurrent Neural Networks (RNNs) can remember, in their...

12/02/2019  Long Distance Relationships without Time Travel: Boosting the Performance of a Sparse Predictive Autoencoder in Sequence Modeling
In sequence learning tasks such as language modelling, Recurrent Neural ...

06/15/2023  PaReprop: Fast Parallelized Reversible Backpropagation
The growing size of datasets and deep learning models has made faster an...

05/30/2023  Inverse Approximation Theory for Nonlinear Recurrent Neural Networks
We prove an inverse approximation theorem for the approximation of nonli...
