Efficient Algorithms for Device Placement of DNN Graph Operators

06/29/2020
by Jakub Tarnawski, et al.

Modern machine learning workloads use large models with complex structures that are very expensive to execute. The devices executing such models are becoming increasingly heterogeneous, as a flourishing variety of domain-specific hardware accelerators is now offered in addition to CPUs. These trends necessitate distributing the workload across multiple devices. Recent work has shown that significant gains can be obtained with model parallelism, i.e., partitioning a neural network's computational graph onto multiple devices. In particular, this form of parallelism assumes a pipeline of devices that is fed a stream of samples and yields high throughput for training and inference of DNNs. However, such settings (large models and multiple heterogeneous devices) require automated algorithms and toolchains that can partition the ML workload across devices. In this paper, we identify and isolate the structured optimization problem at the core of device placement of DNN operators, for both inference and training, especially in modern pipelined settings. We then provide algorithms that solve this problem to optimality. We demonstrate the applicability and efficiency of our approaches on several contemporary DNN computation graphs.
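To make the pipelined placement problem concrete, here is a minimal sketch in Python. It is not the paper's algorithm: it assumes a simplified setting in which the model is a linear chain of operators with known per-operator costs and the k devices are identical, so steady-state pipeline throughput is limited by the slowest stage. Under those assumptions, the hypothetical helper pipeline_partition solves the classic chain-partitioning problem by dynamic programming, splitting the chain into k contiguous stages that minimize the bottleneck stage cost.

```python
from functools import lru_cache

def pipeline_partition(op_costs, k):
    """Split op_costs into k contiguous stages minimizing the max stage cost.

    Returns (bottleneck_cost, list_of_stage_boundaries).
    """
    n = len(op_costs)
    # prefix[j] = total cost of operators 0..j-1, so any contiguous
    # stage cost is a prefix difference.
    prefix = [0.0]
    for c in op_costs:
        prefix.append(prefix[-1] + c)

    @lru_cache(maxsize=None)
    def best(i, stages):
        # Minimum achievable bottleneck cost for operators i..n-1
        # split into `stages` contiguous pipeline stages.
        if stages == 1:
            return prefix[n] - prefix[i]
        result = float("inf")
        # First stage takes operators i..j-1; recurse on the rest.
        for j in range(i + 1, n - stages + 2):
            stage_cost = prefix[j] - prefix[i]
            result = min(result, max(stage_cost, best(j, stages - 1)))
        return result

    bottleneck = best(0, k)

    # Recover stage boundaries consistent with the optimal bottleneck.
    cuts, i = [], 0
    for s in range(k, 1, -1):
        j = i + 1
        while max(prefix[j] - prefix[i], best(j, s - 1)) > bottleneck:
            j += 1
        cuts.append(j)
        i = j
    return bottleneck, cuts

# Example: place 8 operators on a 3-device pipeline.
costs = [4, 2, 7, 1, 3, 6, 2, 5]
print(pipeline_partition(costs, 3))  # -> (13, [1, 5]): stages [4], [2,7,1,3], [6,2,5]
```

The setting studied in the paper is harder: computation graphs are general DAGs rather than chains, and the devices may be heterogeneous, which is what makes automated, optimal placement algorithms necessary.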

