Decoupled Model Schedule for Deep Learning Training

02/16/2023
by Hongzheng Chen, et al.

Recent years have seen an increase in the development of large deep learning (DL) models, making training efficiency crucial. Common practice struggles with the trade-off between usability and performance. On one hand, DL frameworks such as PyTorch use dynamic graphs to facilitate model development, at the price of sub-optimal training performance. On the other hand, practitioners propose various approaches to improving training efficiency by sacrificing some flexibility, ranging from making the graph static for more thorough optimization (e.g., XLA) to customizing optimizations for large-scale distributed training (e.g., DeepSpeed and Megatron-LM). In this paper, we aim to address the tension between usability and training efficiency through separation of concerns. Inspired by DL compilers that decouple the platform-specific optimizations of a tensor-level operator from its arithmetic definition, this paper proposes a schedule language to decouple model execution from definition. Specifically, the schedule works on a PyTorch model and uses a set of schedule primitives to convert the model for common training optimizations such as high-performance kernels, effective 3D parallelism, and efficient activation checkpointing. Compared to existing optimization solutions, we optimize the model as needed through high-level primitives, thus largely preserving programmability and debuggability for users. Our evaluation shows that by scheduling existing hand-crafted optimizations in a systematic way, we improve training throughput by up to 3.35x on a single machine with 8 NVIDIA V100 GPUs, and by up to 1.32x on multiple machines with up to 64 GPUs, compared to the out-of-the-box performance of DeepSpeed and Megatron-LM.
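To make the scheduling idea concrete, below is a minimal sketch of what a schedule-style API over an unmodified PyTorch model could look like. The `Schedule` class and its `replace` and `checkpoint` primitives are hypothetical illustrations of the abstract's description, not the paper's actual interface; only the standard PyTorch calls (`get_submodule`, `torch.utils.checkpoint.checkpoint`) are real APIs.

```python
# Hypothetical sketch of a schedule-style API over a PyTorch model.
# `Schedule`, `replace`, and `checkpoint` are illustrative inventions,
# not the interface proposed in the paper.
import torch.nn as nn
import torch.utils.checkpoint as ckpt


class Schedule:
    """Wraps a model and applies optimizations without editing its definition."""

    def __init__(self, model: nn.Module):
        self.model = model

    def replace(self, name: str, new_module: nn.Module) -> None:
        # Primitive 1: swap a named submodule for an optimized implementation
        # (e.g., a fused high-performance kernel).
        parent_path, _, child = name.rpartition(".")
        parent = self.model.get_submodule(parent_path)  # "" returns the root
        setattr(parent, child, new_module)

    def checkpoint(self, name: str) -> None:
        # Primitive 2: wrap a submodule so its activations are recomputed
        # during the backward pass instead of being stored.
        inner = self.model.get_submodule(name)

        class Checkpointed(nn.Module):
            def __init__(self, mod: nn.Module):
                super().__init__()
                self.mod = mod

            def forward(self, *args):
                return ckpt.checkpoint(self.mod, *args, use_reentrant=False)

        self.replace(name, Checkpointed(inner))


# Usage: the model definition stays plain PyTorch; all tuning lives in the schedule.
model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))
sch = Schedule(model)
sch.replace("1", nn.ReLU())  # e.g., substitute a cheaper activation kernel
sch.checkpoint("0")          # trade recomputation for activation memory
```

The separation of concerns is the point: the model definition remains plain, debuggable PyTorch code, while performance decisions are expressed in the schedule and can be revised, or removed, without touching the model itself.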


