DELTA: Dynamically Optimizing GPU Memory beyond Tensor Recomputation

03/30/2022
by Yu Tang, et al.

The further development of deep neural networks is hampered by the limited GPU memory resource. Therefore, the optimization of GPU memory resources is highly demanded. Swapping and recomputation are commonly applied to make better use of GPU memory in deep learning. However, as an emerging domain, several challenges remain: 1) The efficiency of recomputation is limited for both static and dynamic methods. 2) Swapping requires offloading parameters manually, which incurs a significant time cost. 3) There is currently no dynamic and fine-grained method that combines tensor swapping with tensor recomputation. To remedy the above issues, we propose a novel scheduler manager named DELTA (Dynamic tEnsor offLoad and recompuTAtion). To the best of our knowledge, we are the first to build a practical dynamic runtime scheduler that combines tensor swapping and tensor recomputation without user oversight. In DELTA, we propose a filter algorithm to select the optimal tensors to be released from GPU memory and present a director algorithm to select a proper action for each of these tensors. Furthermore, prefetching and overlapping are deliberately considered to hide the time cost caused by swapping and recomputing tensors. Experimental results show that DELTA not only saves 40% of GPU memory but also achieves convergence results comparable to the baseline with acceptable time delay. Moreover, DELTA attains a 2.04x larger maximum batch size when training ResNet-50 and 2.25x when training ResNet-101 compared with the baseline. Besides, comparisons between the swapping cost and recomputation cost in our experiments demonstrate the importance of making a reasonable dynamic scheduler over tensor swapping and tensor recomputation, which refutes the argument in some related work that swapping should be the first and best choice.
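As a rough illustration of this two-stage idea, the sketch below shows how a "filter" step might pick tensors to release and a "director" step might choose swap versus recompute per tensor by comparing estimated costs. The names, data fields, and heuristics here are assumptions for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch of a filter/director decision for tensor eviction.
# All names and cost heuristics are assumptions, not DELTA's exact algorithm.

from dataclasses import dataclass


@dataclass
class TensorInfo:
    name: str
    size_mb: float          # memory footprint on the GPU
    next_use_ms: float      # estimated time until the tensor is needed again
    swap_out_ms: float      # estimated host<->device transfer time
    recompute_ms: float     # estimated time to regenerate it in backward pass


def filter_candidates(tensors, mem_to_free_mb):
    """Filter step: pick tensors to release until enough memory is freed.
    Assumed heuristic: prefer large tensors that are not needed soon."""
    ranked = sorted(tensors, key=lambda t: t.size_mb * t.next_use_ms, reverse=True)
    chosen, freed = [], 0.0
    for t in ranked:
        if freed >= mem_to_free_mb:
            break
        chosen.append(t)
        freed += t.size_mb
    return chosen


def direct_action(tensor):
    """Director step: choose swap vs. recompute for one released tensor.
    Swap only if the transfer can be overlapped with compute (hidden) or is
    cheaper than recomputing the tensor; otherwise recompute."""
    exposed_swap_cost = max(0.0, tensor.swap_out_ms - tensor.next_use_ms)
    return "swap" if exposed_swap_cost <= tensor.recompute_ms else "recompute"


if __name__ == "__main__":
    activations = [
        TensorInfo("conv1.out", 512.0, next_use_ms=30.0, swap_out_ms=12.0, recompute_ms=4.0),
        TensorInfo("block3.out", 256.0, next_use_ms=5.0, swap_out_ms=8.0, recompute_ms=1.5),
    ]
    for t in filter_candidates(activations, mem_to_free_mb=600.0):
        print(t.name, "->", direct_action(t))
```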

Related research

Mimose: An Input-Aware Checkpointing Planner for Efficient Training on GPU (09/06/2022)
Larger deep learning models usually lead to higher model quality with an...

Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization (10/07/2019)
Modern neural networks are increasingly bottlenecked by the limited capa...

Dynamic Tensor Rematerialization (06/17/2020)
Checkpointing enables training deep learning models under restricted mem...

Cutting Down Training Memory by Re-fowarding (07/31/2018)
Deep Neutral Networks(DNN) require huge GPU memory when training on mode...

DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training (02/28/2022)
A standard hardware bottleneck when training deep neural networks is GPU...

AccUDNN: A GPU Memory Efficient Accelerator for Training Ultra-deep Deep Neural Networks (01/21/2019)
Typically, Ultra-deep neural network(UDNN) tends to yield high-quality m...

SuperNeurons: Dynamic GPU Memory Management for Training Deep Neural Networks (01/13/2018)
Going deeper and wider in neural architectures improves the accuracy, wh...
