Memory Optimization for Deep Networks

10/27/2020
by Aashaka Shah et al.

Deep learning is slowly, but steadily, hitting a memory bottleneck. While the tensor computation in top-of-the-line GPUs increased by 32x over the last five years, the total available memory only grew by 2.5x. This prevents researchers from exploring larger architectures, as training large networks requires more memory for storing intermediate outputs. In this paper, we present MONeT, an automatic framework that minimizes both the memory footprint and computational overhead of deep networks. MONeT jointly optimizes the checkpointing schedule and the implementation of various operators. MONeT is able to outperform all prior hand-tuned operations as well as automated checkpointing. MONeT reduces the overall memory requirement by 3x for various PyTorch models, with a 9-16% overhead in computation. For the same computation cost, MONeT requires 1.2-1.8x less memory than current state-of-the-art automated checkpointing frameworks. Our code is available at https://github.com/utsaslab/MONeT.
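As a rough illustration of the memory/compute trade-off that MONeT's checkpointing schedule optimizes, the sketch below uses PyTorch's built-in torch.utils.checkpoint rather than MONeT itself; the CheckpointedMLP module and its width, depth, and one-segment-per-layer boundaries are illustrative assumptions, not anything specified in the paper.

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class CheckpointedMLP(nn.Module):
    """Toy network whose per-layer activations are checkpointed."""
    def __init__(self, width=1024, depth=4):
        super().__init__()
        # Split the network into segments; activations inside a segment
        # are dropped after the forward pass and recomputed in the backward.
        self.segments = nn.ModuleList(
            [nn.Sequential(nn.Linear(width, width), nn.ReLU()) for _ in range(depth)]
        )

    def forward(self, x):
        for seg in self.segments:
            # checkpoint() stores only the segment's input, trading extra
            # recomputation during backward for a smaller activation footprint.
            x = checkpoint(seg, x)
        return x

model = CheckpointedMLP()
x = torch.randn(32, 1024, requires_grad=True)
loss = model(x).sum()
loss.backward()  # segments are re-executed here to rebuild activations

Unlike this fixed schedule, MONeT searches over both where to checkpoint and which operator implementations to use, which is how it reaches the 3x memory reduction at a 9-16% computation overhead reported above.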

Related research

10/29/2021 · BitTrain: Sparse Bitmap Compression for Memory-Efficient Training on the Edge
Training on the Edge enables neural networks to learn continuously from ...

02/28/2022 · DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training
A standard hardware bottleneck when training deep neural networks is GPU...

01/12/2018 · High-level python abstractions for optimal checkpointing in inversion problems
Inversion and PDE-constrained optimization problems often rely on solvin...

08/22/2016 · Lets keep it simple, Using simple architectures to outperform deeper and more complex architectures
Major winning Convolutional Neural Networks (CNNs), such as AlexNet, VGG...

09/27/2022 · OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning
Large-scale deep learning models contribute to significant performance i...

02/06/2023 · Colossal-Auto: Unified Automation of Parallelization and Activation Checkpoint for Large-scale Models
In recent years, large-scale models have demonstrated state-of-the-art p...

06/09/2023 · Error Feedback Can Accurately Compress Preconditioners
Leveraging second-order information at the scale of deep networks is one...
