Optimal checkpointing for heterogeneous chains: how to train deep neural networks with limited memory

11/27/2019
by Julien Herrmann, et al.

This paper introduces a new activation checkpointing method that significantly decreases memory usage when training deep neural networks with the back-propagation algorithm. Like checkpointing techniques from the Automatic Differentiation literature, it dynamically selects the forward activations that are saved during the training phase, and then automatically recomputes the missing activations from those previously recorded. We propose an original computation model that combines two types of activation savings: storing only the layer inputs, or recording the complete history of operations that produced the outputs (which uses more memory but requires fewer recomputations in the backward phase), and we provide an algorithm that computes the optimal computation sequence for this model. The paper also describes a PyTorch implementation that processes the entire chain: it handles any sequential DNN whose internal layers may be arbitrarily complex, and automatically executes it according to the optimal checkpointing strategy computed for a given memory limit. Through extensive experiments, we show that our implementation consistently outperforms existing checkpointing approaches for a large class of networks, image sizes, and batch sizes.
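For readers unfamiliar with activation checkpointing, the sketch below illustrates the basic store-versus-recompute trade-off that the paper's algorithm optimizes, using PyTorch's built-in torch.utils.checkpoint utilities. This is not the paper's method: the built-in checkpoint_sequential simply splits the chain into equally sized segments, whereas the paper computes an optimal strategy for a given memory limit. The model, batch size, and segment count below are arbitrary illustrative choices.

```python
# Minimal sketch of activation checkpointing in PyTorch. It shows the general
# memory/recomputation trade-off the paper optimizes; it is NOT the paper's
# optimal checkpointing algorithm.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# A sequential DNN: each layer's output is the next layer's input.
model = nn.Sequential(
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 10),
)

x = torch.randn(32, 1024, requires_grad=True)

# Split the chain into 2 segments. Only the input of each segment is stored
# during the forward pass; activations inside a segment are discarded.
# (use_reentrant=False requires a recent PyTorch; omit it on older versions.)
out = checkpoint_sequential(model, 2, x, use_reentrant=False)

loss = out.sum()
loss.backward()  # re-runs each segment's forward to rebuild its activations
```

During the forward pass only the segment boundaries are kept in memory; when the backward pass reaches a segment, its forward computation is re-executed to recover the discarded intermediate activations, trading extra computation for a lower peak memory footprint.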

Related research

10/24/2019 · Reversible designs for extreme memory cost reduction of CNN training
Training Convolutional Neural Networks (CNN) is a resource intensive tas...

09/22/2022 · Nesting Forward Automatic Differentiation for Memory-Efficient Deep Neural Network Training
An activation function is an element-wise mathematical function and play...

01/17/2022 · Efficient DNN Training with Knowledge-Guided Layer Freezing
Training deep neural networks (DNNs) is time-consuming. While most exist...

07/20/2020 · Improving Memory Utilization in Convolutional Neural Network Accelerators
While the accuracy of convolutional neural networks has achieved vast im...

11/22/2021 · Mesa: A Memory-saving Training Framework for Transformers
There has been an explosion of interest in designing high-performance Tr...

05/19/2017 · Espresso: Efficient Forward Propagation for BCNNs
There are many application scenarios for which the computational perfor...

01/16/2018 · Empirical Explorations in Training Networks with Discrete Activations
We present extensive experiments training and testing hidden units in de...
