Low-Memory Neural Network Training: A Technical Report

04/24/2019
by Nimit Sharad Sohoni, et al.

Memory is increasingly often the bottleneck when training neural network models. Despite this, techniques to lower the overall memory requirements of training have been less widely studied compared to the extensive literature on reducing the memory requirements of inference. In this paper we study a fundamental question: How much memory is actually needed to train a neural network? To answer this question, we profile the overall memory usage of training on two representative deep learning benchmarks -- the WideResNet model for image classification and the DynamicConv Transformer model for machine translation -- and comprehensively evaluate four standard techniques for reducing the training memory requirements: (1) imposing sparsity on the model, (2) using low precision, (3) microbatching, and (4) gradient checkpointing. We explore how each of these techniques in isolation affects both the peak memory usage of training and the quality of the end model, and we explore the memory, accuracy, and computation tradeoffs incurred when combining these techniques. Using appropriate combinations of these techniques, we show that it is possible to reduce the memory required to train a WideResNet-28-2 on CIFAR-10 by up to 60.7x with a 0.4% drop in accuracy, and to reduce the memory required to train a DynamicConv model on IWSLT'14 German to English translation by up to 8.7x with a BLEU score drop of 0.15.
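To make two of the named techniques concrete, the following is a minimal PyTorch sketch (not the paper's code; the model, batch sizes, and segment count are placeholder assumptions) of how microbatching via gradient accumulation can be combined with gradient checkpointing via torch.utils.checkpoint.checkpoint_sequential:

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Placeholder model purely for illustration (not the paper's WideResNet or DynamicConv).
model = nn.Sequential(
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 10),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

batch_x = torch.randn(256, 1024)              # one full batch (dummy data)
batch_y = torch.randint(0, 10, (256,))
num_microbatches = 8                          # microbatching: 256 -> 8 x 32

optimizer.zero_grad()
for mb_x, mb_y in zip(batch_x.chunk(num_microbatches),
                      batch_y.chunk(num_microbatches)):
    # Gradient checkpointing: split the model into 2 segments and recompute
    # intermediate activations during backward instead of storing them all.
    # (use_reentrant=False requires a recent PyTorch release.)
    logits = checkpoint_sequential(model, 2, mb_x, use_reentrant=False)
    # Scale the loss so the accumulated gradient matches the full-batch gradient.
    loss = criterion(logits, mb_y) / num_microbatches
    loss.backward()                           # gradients accumulate across microbatches
optimizer.step()                              # a single update per full batch

The paper's experiments additionally combine these with sparsity and low precision; the sketch above only illustrates the accumulation and recomputation mechanics.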

