dPRO: A Generic Profiling and Optimization System for Expediting Distributed DNN Training

05/05/2022
by Hanpeng Hu, et al.

Distributed training using multiple devices (e.g., GPUs) has been widely adopted for learning DNN models over large datasets. However, the performance of large-scale distributed training tends to be far from linear speed-up in practice. Given the complexity of distributed systems, it is challenging to identify the root cause(s) of inefficiency and apply effective performance optimizations when unexpectedly low training speed occurs. To date, no software tool exists that diagnoses performance issues and helps expedite distributed DNN training across different deep learning frameworks. This paper proposes dPRO, a toolkit that includes: (1) an efficient profiler that collects runtime traces of distributed DNN training across multiple frameworks, especially fine-grained communication traces, and constructs global data flow graphs including detailed communication operations for accurate replay; (2) an optimizer that effectively identifies performance bottlenecks and explores optimization strategies (covering computation, communication, and memory) for training acceleration. We implement dPRO on multiple deep learning frameworks (TensorFlow, MXNet) and representative communication schemes (AllReduce and Parameter Server). Extensive experiments show that dPRO predicts the performance of distributed training in various settings with less than 5% error in most cases and finds optimization strategies with up to 3.48x speed-up over the baselines.
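
The "accurate replay" mentioned above can be pictured as a critical-path simulation over the merged global data flow graph: given profiled durations for computation and communication ops and their cross-worker dependencies, the predicted iteration time is the length of the longest dependency chain. The sketch below is a hypothetical illustration of that idea, not dPRO's actual API; the op names, the `replay` function, and the assumption of unlimited per-device parallelism (no queueing on GPU streams or network links) are simplifications made here for brevity.

```python
# Hypothetical sketch (not dPRO's actual code): estimate iteration time by
# replaying a merged trace graph under an earliest-start schedule.
import heapq
from collections import defaultdict

def replay(ops, deps):
    """ops:  {op_name: profiled duration in ms}
    deps: {op_name: [upstream op names it must wait for]}
    Returns the critical-path (predicted) iteration time."""
    children = defaultdict(list)
    indeg = {name: 0 for name in ops}
    for op, ups in deps.items():
        for u in ups:
            children[u].append(op)
            indeg[op] += 1
    finish = {}
    # Ops with no upstream dependencies can start at time 0.
    ready = [(0.0, op) for op, d in indeg.items() if d == 0]
    heapq.heapify(ready)
    while ready:
        start, op = heapq.heappop(ready)
        finish[op] = start + ops[op]
        for child in children[op]:
            indeg[child] -= 1
            if indeg[child] == 0:
                # A child starts once all of its upstream ops have finished.
                heapq.heappush(ready, (max(finish[u] for u in deps[child]), child))
    return max(finish.values())

# Toy two-worker example: each worker's backward pass feeds a gradient AllReduce.
ops = {"fw_w0": 3.0, "bw_w0": 5.0, "fw_w1": 3.2, "bw_w1": 5.1, "allreduce_grad": 4.0}
deps = {"bw_w0": ["fw_w0"], "bw_w1": ["fw_w1"], "allreduce_grad": ["bw_w0", "bw_w1"]}
print(replay(ops, deps))  # 12.3 ms predicted iteration time
```

Under this simplified model, evaluating an optimization strategy amounts to editing op durations or dependencies (e.g., fusing tensors into one communication op) and replaying the graph again to compare predicted iteration times.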

Related research

03/06/2020 · Communication Optimization Strategies for Distributed Deep Learning: A Survey
Recent trends in high-performance computing and deep learning lead to a ...

08/16/2020 · Domain-specific Communication Optimization for Distributed DNN Training
Communication overhead poses an important obstacle to distributed DNN tr...

06/05/2020 · Daydream: Accurately Estimating the Efficacy of Optimizations for DNN Training
Modern deep neural network (DNN) training jobs use complex and heterogen...

10/06/2020 · Towards a Scalable and Distributed Infrastructure for Deep Learning Applications
Although recent scaling up approaches to train deep neural networks have...

08/10/2017 · Distributed Training Large-Scale Deep Architectures
Scale of data and scale of computation infrastructures together enable t...

09/26/2022 · Optimizing DNN Compilation for Distributed Training with Joint OP and Tensor Fusion
This paper proposes DisCo, an automatic deep learning compilation module...

08/19/2020 · A Computational-Graph Partitioning Method for Training Memory-Constrained DNNs
We propose ParDNN, an automatic, generic, and non-intrusive partitioning...
