Reversible designs for extreme memory cost reduction of CNN training

10/24/2019
by Tristan Hascoet, et al.

Training Convolutional Neural Networks (CNNs) is a resource-intensive task that requires specialized hardware for efficient computation. One of the most limiting bottlenecks of CNN training is the memory cost of storing the activation values of hidden layers, which are needed to compute the weight gradients during the backward pass of the backpropagation algorithm. Recently, reversible architectures have been proposed to reduce the memory cost of training large CNNs by reconstructing the input activations of hidden layers from their outputs during the backward pass, circumventing the need to accumulate these activations in memory during the forward pass. In this paper, we push this idea to the extreme and analyze reversible network designs yielding a minimal training memory footprint. We investigate the propagation of numerical errors in long chains of invertible operations and analyze their effect on training. We introduce the notion of pixel-wise memory cost to characterize the memory footprint of model training, and propose a new model architecture able to efficiently train arbitrarily deep neural networks with a minimum memory cost of 352 bytes per input pixel. This new kind of architecture enables training large neural networks on very limited memory, opening the door to neural network training on embedded devices or non-specialized hardware. For instance, we demonstrate training of our model to 93.3% accuracy on the CIFAR10 dataset within 67 minutes on a low-end Nvidia GTX750 GPU with only 1GB of memory.
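To make the mechanism concrete, below is a minimal sketch of a reversible block using RevNet-style additive coupling (an illustrative assumption, not the specific architecture proposed in the paper). Because the block's input can be recomputed exactly from its output, up to floating-point round-off, intermediate activations do not need to be stored during the forward pass; that round-off is the source of the numerical error propagation the abstract analyzes.

```python
# Minimal sketch of a reversible (additive coupling) block in PyTorch.
# Assumption: RevNet-style coupling with two small conv sub-networks F and G;
# this is for illustration only and is not the paper's proposed architecture.
import torch
import torch.nn as nn


class ReversibleBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        half = channels // 2
        # F and G are arbitrary sub-networks operating on half the channels.
        self.f = nn.Sequential(nn.Conv2d(half, half, 3, padding=1), nn.ReLU())
        self.g = nn.Sequential(nn.Conv2d(half, half, 3, padding=1), nn.ReLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split channels, apply the coupling: y1 = x1 + F(x2), y2 = x2 + G(y1).
        x1, x2 = torch.chunk(x, 2, dim=1)
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return torch.cat([y1, y2], dim=1)

    def inverse(self, y: torch.Tensor) -> torch.Tensor:
        # Reconstruct the input from the output during the backward pass,
        # so the input activations never have to be kept in memory.
        y1, y2 = torch.chunk(y, 2, dim=1)
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return torch.cat([x1, x2], dim=1)


# Quick check: reconstruction error is limited only by floating-point precision.
block = ReversibleBlock(16).eval()
x = torch.randn(1, 16, 8, 8)
with torch.no_grad():
    err = (block.inverse(block(x)) - x).abs().max()
    print(f"max reconstruction error: {err:.2e}")  # typically ~1e-7 in float32
```

Stacking many such blocks keeps the activation memory independent of depth, but the per-block reconstruction error accumulates along the chain, which is why the paper studies how these numerical errors propagate through long sequences of invertible operations.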


Related research

05/24/2019  Fully Hyperbolic Convolutional Neural Networks
Convolutional Neural Networks (CNN) have recently seen tremendous succes...

08/16/2023  Towards Zero Memory Footprint Spiking Neural Network Training
Biologically-inspired Spiking Neural Networks (SNNs), processing informa...

10/25/2018  Reversible Recurrent Neural Networks
Recurrent neural networks (RNNs) provide state-of-the-art performance in...

11/27/2019  Optimal checkpointing for heterogeneous chains: how to train deep neural networks with limited memory
This paper introduces a new activation checkpointing method which allows...

06/15/2023  PaReprop: Fast Parallelized Reversible Backpropagation
The growing size of datasets and deep learning models has made faster an...

12/05/2022  MobileTL: On-device Transfer Learning with Inverted Residual Blocks
Transfer learning on edge is challenging due to on-device limited resour...

11/08/2016  A backward pass through a CNN using a generative model of its activations
Neural networks have shown to be a practical way of building a very comp...
