Backpropagation for long sequences: beyond memory constraints with constant overheads

05/22/2018
by Navjot Kukreja, et al.

Naive backpropagation through time has a memory footprint that grows linearly in the sequence length, because every state of the forward propagation must be stored. This is a problem for large networks. Strategies have been developed to trade memory for additional computation, resulting in sublinear growth of either the memory footprint or the computational overhead. In this work, we present a library that uses asynchronous storing and prefetching to move data to and from slow, cheap storage. The library stores and prefetches states as frequently as possible without delaying the computation, and uses the optimal Revolve backpropagation strategy for the computations in between. The memory footprint of the backpropagation can thus be reduced to any size (e.g. to fit into DRAM), while the computational overhead is constant in the sequence length and depends only on the ratio between compute and transfer times on the given hardware. We show in experiments that, by exploiting asynchronous data transfer, our strategy is always at least as fast as, and usually faster than, the previously studied "optimal" strategies.
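The mechanism can be sketched concretely. The Python snippet below is a minimal illustration, not the paper's library API: a background thread writes checkpoints to disk while the forward pass continues, and during the backward pass the next (earlier) checkpoint is prefetched while the current segment is recomputed, so transfers hide behind computation whenever a segment's compute time exceeds its transfer time. Uniform checkpoint spacing stands in for the Revolve schedule between checkpoints, and a toy linear recurrence stands in for the network so the recovered gradient (0.9^16) can be checked by hand; all names here (AsyncCheckpointStore, forward_step, backward_sweep) are assumptions for illustration.

    import os
    import pickle
    import queue
    import tempfile
    import threading

    class AsyncCheckpointStore:
        """Offloads forward states to slow storage on a background writer
        thread and prefetches them back during the backward pass."""

        def __init__(self, directory):
            self.directory = directory
            self.write_queue = queue.Queue()
            self.writer = threading.Thread(target=self._write_loop, daemon=True)
            self.writer.start()

        def _path(self, step):
            return os.path.join(self.directory, "state_%d.pkl" % step)

        def _write_loop(self):
            while True:
                item = self.write_queue.get()
                if item is None:  # sentinel: no more checkpoints to write
                    return
                step, state = item
                with open(self._path(step), "wb") as f:
                    pickle.dump(state, f)

        def save_async(self, step, state):
            # Enqueue and return immediately: the forward pass is not delayed.
            self.write_queue.put((step, state))

        def finish_writes(self):
            self.write_queue.put(None)
            self.writer.join()

        def prefetch(self, step):
            # Begin reading a checkpoint in the background; the caller joins
            # the returned thread and takes the state out of `box`.
            box = {}

            def _read():
                with open(self._path(step), "rb") as f:
                    box["state"] = pickle.load(f)

            thread = threading.Thread(target=_read)
            thread.start()
            return thread, box

    def forward_step(state, t):
        # Toy linear recurrence standing in for one forward step of the
        # network; its derivative with respect to `state` is 0.9.
        return 0.9 * state + t

    def backward_sweep(store, num_steps, every):
        starts = list(range(0, num_steps, every))  # checkpointed steps
        grad = 1.0
        thread, box = store.prefetch(starts[-1])  # load the last checkpoint
        for i in reversed(range(len(starts))):
            thread.join()
            state = box["state"]
            # Overlap transfer with compute: prefetch the next (earlier)
            # checkpoint while this segment is being recomputed.
            if i > 0:
                thread, box = store.prefetch(starts[i - 1])
            # Recompute the segment's forward states. Revolve would schedule
            # these recomputations optimally; uniform spacing is used here.
            seg_end = starts[i + 1] if i + 1 < len(starts) else num_steps
            states = [state]
            for t in range(starts[i], seg_end - 1):
                states.append(forward_step(states[-1], t))
            # Backpropagate through the segment in reverse. For the toy
            # recurrence every local Jacobian is the constant 0.9.
            for _ in reversed(states):
                grad *= 0.9
        return grad

    if __name__ == "__main__":
        num_steps, every = 16, 4
        with tempfile.TemporaryDirectory() as d:
            store = AsyncCheckpointStore(d)
            state = 0.0
            for t in range(num_steps):
                if t % every == 0:
                    store.save_async(t, state)  # offload without stalling
                state = forward_step(state, t)
            store.finish_writes()
            grad = backward_sweep(store, num_steps, every)
        # d(final state)/d(initial state) = 0.9 ** 16 for the toy recurrence.
        print(grad, 0.9 ** num_steps)

Because save_async and prefetch only enqueue work for background threads, the forward and backward computations never wait on storage as long as a segment takes longer to recompute than a checkpoint takes to transfer, which is the source of the constant overhead described above.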


