A Computational-Graph Partitioning Method for Training Memory-Constrained DNNs

08/19/2020
by Fareed Qararyah, et al.

We propose ParDNN, an automatic, generic, and non-intrusive partitioning strategy for large DNN models that do not fit into a single device's memory. ParDNN decides a placement of the DNN's underlying computational-graph operations across multiple devices so that the devices' memory constraints are met and the training time is minimized. ParDNN is completely independent of the deep learning aspects of a DNN and requires no modification either to the model or to the systems-level implementation of operation kernels. It partitions DNNs with billions of parameters and hundreds of thousands of operations in seconds to a few minutes. Our experiments with TensorFlow on 16 GPUs demonstrate efficient training of 5 very large models while achieving super-linear scaling in both batch size and training throughput. Compared with related work (Mesh-TensorFlow and gradient checkpointing), ParDNN either outperforms them or improves upon them qualitatively.
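To illustrate the kind of output such a partitioning produces, the following is a minimal sketch (not ParDNN's actual implementation) of how a per-operation device assignment can be applied in TensorFlow. The `partition` map and the layer structure are hypothetical stand-ins for the result of a graph-partitioning step; only the use of `tf.device` to pin operations to specific GPUs reflects the standard TensorFlow mechanism.

```python
import tensorflow as tf

# Hypothetical output of a graph-partitioning step: layer index -> device.
# A real partitioner would assign individual graph operations, not whole layers.
partition = {0: "/GPU:0", 1: "/GPU:0", 2: "/GPU:1", 3: "/GPU:1"}

layers = [tf.keras.layers.Dense(4096, activation="relu") for _ in range(4)]

@tf.function
def forward(x):
    # Place each layer's ops (and its variables, on first build) on the
    # device chosen by the partition; activations cross devices as needed.
    for i, layer in enumerate(layers):
        with tf.device(partition[i]):
            x = layer(x)
    return x

x = tf.random.normal([32, 4096])
y = forward(x)
```

In this model-parallel style, each device only holds the parameters and activations of the operations assigned to it, which is what allows a model larger than any single device's memory to be trained.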


