DistIR: An Intermediate Representation and Simulator for Efficient Neural Network Distribution

by   Keshav Santhanam, et al.

The rapidly growing size of deep neural network (DNN) models and datasets has given rise to a variety of distribution strategies such as data, tensor-model, pipeline parallelism, and hybrid combinations thereof. Each of these strategies offers its own trade-offs and exhibits optimal performance across different models and hardware topologies. Selecting the best set of strategies for a given setup is challenging because the search space grows combinatorially, and debugging and testing on clusters is expensive. In this work we propose DistIR, an expressive intermediate representation for distributed DNN computation that is tailored for efficient analyses, such as simulation. This enables automatically identifying the top-performing strategies without having to execute on physical hardware. Unlike prior work, DistIR can naturally express many distribution strategies including pipeline parallelism with arbitrary schedules. Our evaluation on MLP training and GPT-2 inference models demonstrates how DistIR and its simulator enable fast grid searches over complex distribution spaces spanning up to 1000+ configurations, reducing optimization time by an order of magnitude for certain regimes.



There are no comments yet.


page 1

page 2

page 3

page 4


BaPipe: Exploration of Balanced Pipeline Parallelism for DNN Training

The size of deep neural networks (DNNs) grows rapidly as the complexity ...

Beyond Data and Model Parallelism for Deep Neural Networks

The computational requirements for training deep neural networks (DNNs) ...

An Oracle for Guiding Large-Scale Model/Hybrid Parallel Training of Convolutional Neural Networks

Deep Neural Network (DNN) frameworks use distributed training to enable ...

Auto-MAP: A DQN Framework for Exploring Distributed Execution Plans for DNN Workloads

The last decade has witnessed growth in the computational requirements f...

Automap: Towards Ergonomic Automated Parallelism for ML Models

The rapid rise in demand for training large neural network architectures...

Learning Intermediate Representations using Graph Neural Networks for NUMA and Prefetchers Optimization

There is a large space of NUMA and hardware prefetcher configurations th...

PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep Learning Clusters

DNN learning jobs are common in today's clusters due to the advances in ...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.