SuperScaler: Supporting Flexible DNN Parallelization via a Unified Abstraction

01/21/2023
by Zhiqi Lin, et al.

With growing model sizes, deep neural networks (DNNs) are increasingly trained over massive GPU accelerators, which demands a proper parallelization plan that transforms a DNN model into fine-grained tasks and then schedules them to GPUs for execution. Due to the large search space, contemporary parallelization plan generators often rely on empirical rules that couple transformation and scheduling, and fall short in exploring more flexible schedules that yield better memory usage and compute efficiency. This tension is exacerbated by emerging models of increasing structural complexity and size. SuperScaler is a system that facilitates the design and generation of highly flexible parallelization plans. It explicitly formulates plan design and generation as three sequential phases: model transformation, space-time scheduling, and data dependency preserving. This principled approach decouples multiple seemingly intertwined factors and enables the composition of highly flexible parallelization plans. As a result, SuperScaler can not only generate empirical parallelization plans, but also construct new plans that achieve up to 3.5X speedup over state-of-the-art solutions like DeepSpeed, Megatron, and Alpa, for emerging DNN models like Swin-Transformer and AlphaFold2 as well as for well-optimized models like GPT-3.
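The three-phase formulation lends itself to a simple illustration. The Python sketch below is a minimal, hypothetical rendering of the decoupled pipeline the abstract describes: split operators into fine-grained tasks, assign each task a space-time coordinate (a device and an execution step), then reconnect tasks with data-dependency edges. All names here (Task, Placement, transform, schedule, add_dependencies) and the round-robin placement policy are illustrative assumptions, not SuperScaler's actual API or search algorithm.

from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    op: str     # operator this task was split from (hypothetical model)
    shard: int  # index of the data/model shard it computes

@dataclass
class Placement:
    device: int  # "space": which GPU runs the task
    step: int    # "time": execution slot on that device

def transform(model_ops, num_shards):
    """Phase 1: model transformation. Split each operator into
    fine-grained tasks, one per shard."""
    return [Task(op, s) for op in model_ops for s in range(num_shards)]

def schedule(tasks, num_devices):
    """Phase 2: space-time scheduling. Assign each task a device and a
    step; round-robin placement stands in for a real cost-guided search."""
    plan, next_step = {}, [0] * num_devices
    for i, t in enumerate(tasks):
        d = i % num_devices
        plan[t] = Placement(device=d, step=next_step[d])
        next_step[d] += 1
    return plan

def add_dependencies(model_ops, tasks):
    """Phase 3: data dependency preserving. Each shard of an operator
    feeds the matching shard of the next operator in program order."""
    by_op = {}
    for t in tasks:
        by_op.setdefault(t.op, []).append(t)
    edges = []
    for a, b in zip(model_ops, model_ops[1:]):
        edges += [(ta, tb) for ta in by_op[a] for tb in by_op[b]
                  if ta.shard == tb.shard]
    return edges

if __name__ == "__main__":
    ops = ["matmul1", "gelu", "matmul2"]        # toy operator sequence
    tasks = transform(ops, num_shards=2)
    plan = schedule(tasks, num_devices=2)
    deps = add_dependencies(ops, tasks)
    for t in tasks:
        p = plan[t]
        print(f"{t.op}[shard {t.shard}] -> GPU{p.device}, step {p.step}")
    print(f"{len(deps)} data-dependency edges preserved")

Because each phase is a separate function over an explicit intermediate representation, a different sharding rule, placement policy, or dependency pattern can be swapped in independently, which is the flexibility the decoupling is meant to buy.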


