Automatic Model Parallelism for Deep Neural Networks with Compiler and Hardware Support

06/11/2019
by Sanket Tavarageri, et al.

Deep neural networks (DNNs) have been enormously successful at tasks that were hitherto in the human-only realm, such as image recognition and language translation. Owing to this success, DNNs are being explored for use in ever more sophisticated tasks. One way to scale DNNs to such complex undertakings is to increase their size: deeper and wider networks can model the additional complexity well. Such large models are trained using model parallelism across multiple compute devices, such as multi-GPU and multi-node systems. In this paper, we develop a compiler-driven approach to model parallelism. We model the computation and communication costs of a dataflow graph that embodies the neural network training process, and then partition the graph using heuristics so that communication between compute devices is minimized and the load is well balanced. Hardware scheduling assistants are proposed to help the compiler fine-tune the distribution of work at runtime.
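To make the partitioning idea concrete, here is a minimal Python sketch of cost-model-driven graph partitioning in the spirit of the abstract. The `Node` structure, the `greedy_partition` function, and the particular scoring rule (projected device load plus a communication penalty) are illustrative assumptions, not the paper's actual algorithm.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One operator in the dataflow graph (hypothetical representation)."""
    name: str
    compute_cost: float                          # estimated execution time
    inputs: list = field(default_factory=list)   # (producer_name, comm_cost) pairs

def greedy_partition(nodes, num_devices):
    """Assign nodes (assumed to be in topological order) to devices,
    trading off load balance against cross-device communication."""
    load = [0.0] * num_devices
    placement = {}
    for node in nodes:
        best_dev, best_score = 0, float("inf")
        for dev in range(num_devices):
            # Communication is incurred for every producer placed elsewhere.
            comm = sum(cost for src, cost in node.inputs
                       if placement.get(src) != dev)
            # Greedy objective: projected device load plus comm penalty.
            score = load[dev] + node.compute_cost + comm
            if score < best_score:
                best_dev, best_score = dev, score
        placement[node.name] = best_dev
        load[best_dev] += node.compute_cost
    return placement, load

if __name__ == "__main__":
    # Tiny four-operator training chain with made-up costs.
    graph = [
        Node("conv", 4.0),
        Node("relu", 0.5, inputs=[("conv", 2.0)]),
        Node("fc",   3.0, inputs=[("relu", 1.0)]),
        Node("loss", 0.2, inputs=[("fc", 0.5)]),
    ]
    placement, load = greedy_partition(graph, num_devices=2)
    print(placement)  # {'conv': 0, 'relu': 1, 'fc': 1, 'loss': 1}
    print(load)       # [4.0, 3.7]
```

A real compiler pass would derive the compute and communication costs from operator shapes and interconnect bandwidth, and would likely refine the placement iteratively, but the greedy objective above captures the stated trade-off between load balance and inter-device traffic.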


Related research

07/31/2023
DiviML: A Module-based Heuristic for Mapping Neural Networks onto Heterogeneous Platforms
Datacenters are increasingly becoming heterogeneous, and are starting to...

01/07/2019
HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array
With the rise of artificial intelligence in recent years, Deep Neural Ne...

02/21/2022
Survey on Large Scale Neural Network Training
Modern Deep Neural Networks (DNNs) require significant memory to store w...

06/04/2020
A Linear Algebraic Approach to Model Parallelism in Deep Learning
Training deep neural networks (DNNs) in large-cluster computing environm...

05/30/2021
Maximizing Parallelism in Distributed Training for Huge Neural Networks
The recent Natural Language Processing techniques have been refreshing t...

04/10/2023
RESPECT: Reinforcement Learning based Edge Scheduling on Pipelined Coral Edge TPUs
Deep neural networks (DNNs) have substantial computational and memory re...

09/05/2023
Generalizing Hierarchical Parallelism
Since the days of OpenMP 1.0 computer hardware has become more complex, ...
