Harmony: Overcoming the hurdles of GPU memory capacity to train massive DNN models on commodity servers

02/02/2022
by Youjie Li, et al.

Deep neural networks (DNNs) have grown exponentially in complexity and size over the past decade, leaving the ability to develop and train such models largely to those with access to massive datacenter-based resources. For the long tail of researchers with access to only limited resources (e.g., a single multi-GPU server), one of the main challenges is that GPU memory capacity is small relative to model size. The problem is so acute that the memory requirement of training large DNN models can exceed the aggregate capacity of all available GPUs on commodity servers, and it only worsens with the trend of ever-growing model sizes. Current solutions that virtualize GPU memory (by swapping to/from CPU memory) incur excessive swapping overhead. In this paper, we present a new training framework, Harmony, and advocate rethinking how DNN frameworks schedule computation and move data in order to push the boundaries of training large models efficiently on modest multi-GPU deployments. Across many large DNN models, Harmony reduces swap load by up to two orders of magnitude and achieves a training throughput speedup of up to 7.6x over highly optimized baselines with virtualized memory.
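To make the swapping overhead concrete, below is a minimal sketch of the swap-based GPU memory virtualization the abstract critiques, not Harmony's own mechanism. It uses PyTorch's real saved-tensor hooks API (torch.autograd.graph.saved_tensors_hooks, available since PyTorch 1.10); the hook names pack_to_cpu/unpack_to_gpu and the toy model are hypothetical and assume a CUDA-capable GPU. Each saved activation is staged in pinned CPU memory after the forward pass and copied back to the GPU on demand during the backward pass; every such round trip crosses the PCIe bus, which is the swap load Harmony aims to reduce.

import torch

def pack_to_cpu(t: torch.Tensor):
    # Swap out: stage the saved activation in pinned host memory so the
    # device-to-host copy can overlap with ongoing GPU compute.
    cpu_copy = torch.empty(t.size(), dtype=t.dtype, device="cpu", pin_memory=True)
    cpu_copy.copy_(t, non_blocking=True)
    return (t.device, cpu_copy)

def unpack_to_gpu(packed):
    # Swap in: copy the activation back to the GPU when backward needs it.
    device, cpu_copy = packed
    return cpu_copy.to(device, non_blocking=True)

# Toy model, purely illustrative.
model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 4096),
).cuda()
x = torch.randn(64, 4096, device="cuda")

with torch.autograd.graph.saved_tensors_hooks(pack_to_cpu, unpack_to_gpu):
    loss = model(x).sum()
loss.backward()  # every saved activation makes a CPU-to-GPU round trip here

Because both copies are enqueued on the same CUDA stream as the surrounding compute, the sketch is correct but serializes transfers with little regard for scheduling; this naive coupling of data movement to execution order is exactly what the paper argues frameworks should rethink.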


