Auto-Parallelizing Large Models with Rhino: A Systematic Approach on Production AI Platform

02/16/2023
by Shiwei Zhang, et al.

We present Rhino, a system for accelerating tensor programs with automatic parallelization on an AI platform in a real production environment. It transforms a tensor program written for a single device into an equivalent distributed program that can scale up to thousands of devices with no user configuration. Rhino works on a semantically independent intermediate representation of tensor programs, which facilitates its generalization to previously unseen applications. It also implements a task-oriented controller and a distributed runtime for optimal performance. Rhino explores a complete and systematic parallelization strategy space that comprises all the paradigms commonly employed in deep learning (DL), in addition to strided partitioning and pipeline parallelism on non-linear models. To search efficiently for a near-optimal parallel execution plan, we analyze production clusters and derive general heuristics that speed up the strategy search. On top of these heuristics, two optimization levels offer users flexible trade-offs between search time and strategy quality. Our experiments demonstrate that Rhino can not only rediscover the expert-crafted strategies of classic, research, and production DL models, but also identify novel parallelization strategies that surpass existing systems on novel models.

research
04/16/2020

TensorOpt: Exploring the Tradeoffs in Distributed DNN Training with Auto-Parallelism

A good parallelization strategy can significantly improve the efficiency...
research
05/10/2021

GSPMD: General and Scalable Parallelization for ML Computation Graphs

We present GSPMD, an automatic, compiler-based parallelization system fo...
research
01/05/2021

Equality Saturation for Tensor Graph Superoptimization

One of the major optimizations employed in deep learning frameworks is g...
research
01/18/2019

On-line Application Autotuning Exploiting Ensemble Models

Application autotuning is a promising path investigated in literature to...
research
09/26/2022

Optimizing DNN Compilation for Distributed Training with Joint OP and Tensor Fusion

This paper proposes DisCo, an automatic deep learning compilation module...
research
06/02/2021

Optimization of Heterogeneous Systems with AI Planning Heuristics and Machine Learning: A Performance and Energy Aware Approach

Heterogeneous computing systems provide high performance and energy effi...
research
10/07/2022

Automatic Discovery of Composite SPMD Partitioning Strategies in PartIR

Large neural network models are commonly trained through a combination o...
