TAP: Accelerating Large-Scale DNN Training Through Tensor Automatic Parallelisation

02/01/2023
by Ziji Shi, et al.

Model parallelism has become necessary to train large neural networks. However, finding a suitable model-parallel schedule for an arbitrary neural network is a non-trivial task because the search space grows explosively. In this work, we present TAP, a model parallelism framework that automatically searches for the best data and tensor parallel schedules. Leveraging the key insight that a neural network can be represented as a directed acyclic graph that contains only a limited set of frequent subgraphs, we design a graph pruning algorithm to fold the search space efficiently. TAP runs at sub-linear complexity with respect to the neural network size. Experiments show that TAP is 20× to 160× faster than the state-of-the-art automatic parallelism framework, and the performance of its discovered schedules is competitive with expert-engineered ones.
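
To illustrate the folding idea, here is a minimal Python sketch, not TAP's actual algorithm. It groups structurally identical blocks of a toy network so that a schedule search would only need to visit one representative per group. The names `block_signature`, `fold_blocks`, and `make_layer`, the block layout, and the assumption that ops inside each block are listed in a consistent order are all hypothetical choices made for this example.

```python
from collections import defaultdict


def block_signature(ops, edges):
    """Return a structural signature for one block.

    ops   -- dict mapping node name -> op type, in a consistent order (assumption)
    edges -- list of (src_name, dst_name) pairs inside the block
    """
    index = {name: i for i, name in enumerate(ops)}
    op_types = tuple(ops.values())
    # Replace node names with local indices so differently named copies match.
    local_edges = tuple(sorted((index[s], index[d]) for s, d in edges))
    return op_types, local_edges


def fold_blocks(blocks):
    """Group block names that share the same structural signature."""
    groups = defaultdict(list)
    for block_name, (ops, edges) in blocks.items():
        groups[block_signature(ops, edges)].append(block_name)
    return list(groups.values())


def make_layer(i):
    """Build a hypothetical repeated layer: attn -> mlp -> norm."""
    prefix = f"layer{i}"
    ops = {
        f"{prefix}/attn": "matmul",
        f"{prefix}/mlp": "matmul",
        f"{prefix}/norm": "layernorm",
    }
    edges = [
        (f"{prefix}/attn", f"{prefix}/mlp"),
        (f"{prefix}/mlp", f"{prefix}/norm"),
    ]
    return ops, edges


# Toy model: one embedding block followed by four identical layers.
blocks = {"embed": ({"embed/lookup": "gather"}, [])}
for i in range(4):
    blocks[f"layer{i}"] = make_layer(i)

groups = fold_blocks(blocks)
print(groups)
```

In this toy model the five blocks fold into two groups, so a search would evaluate two candidates instead of five. Adding more identical layers does not add groups, which is the intuition behind a search cost that grows more slowly than the network size.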

