Automatic Task Parallelization of Dataflow Graphs in ML/DL models

08/22/2023
by Srinjoy Das, et al.

Several methods exist today to accelerate Machine Learning (ML) and Deep Learning (DL) model performance for training and inference. However, modern techniques based on various graph and operator parallelism methodologies depend on search-space optimizations that are costly in terms of power and hardware usage. Especially for inference, when the batch size is 1 and execution is on CPUs or on power-constrained edge devices, current techniques can become costly, complicated, or inapplicable. To ameliorate this, we present a Critical-Path-based Linear Clustering approach to exploit inherent parallel paths in ML dataflow graphs. Our task parallelization approach further optimizes the structure of graphs via cloning and prunes them via constant propagation and dead-code elimination. Contrary to other work, we generate readable and executable parallel PyTorch+Python code from input ML models in ONNX format via a new tool that we have built, called Ramiel. This allows us to benefit from other downstream acceleration techniques such as intra-op parallelism and, potentially, pipeline parallelism. Our preliminary results on several ML graphs demonstrate up to 1.9× speedup over serial execution and outperform some of the current mechanisms in both compile time and runtime. Lastly, our methods are lightweight and fast enough to be used effectively on power- and resource-constrained devices, while still enabling downstream optimizations.
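To make the clustering idea concrete, below is a minimal, self-contained Python sketch of critical-path-based linear clustering on a toy dataflow DAG. The graph, the per-operator cost estimates, and all names (topo_order, critical_path, linear_clusters, branch_a, branch_b) are illustrative assumptions for this sketch; they are not the paper's actual Ramiel implementation.

    # A minimal sketch of critical-path-based linear clustering on a toy
    # dataflow DAG. Graph shape, costs, and helper names are assumptions,
    # not the paper's Ramiel code.
    from concurrent.futures import ThreadPoolExecutor

    # node -> successors: a diamond-shaped graph with two parallel branches
    succ = {
        "input":  ["convA", "convB"],
        "convA":  ["reluA"],
        "reluA":  ["concat"],
        "convB":  ["reluB"],
        "reluB":  ["concat"],
        "concat": [],
    }
    # node -> assumed per-operator latency estimate
    cost = {"input": 1, "convA": 5, "reluA": 1, "convB": 4, "reluB": 1, "concat": 2}

    def topo_order(succ):
        seen, order = set(), []
        def visit(n):
            if n not in seen:
                seen.add(n)
                for m in succ[n]:
                    visit(m)
                order.append(n)        # post-order
        for n in succ:
            visit(n)
        return order[::-1]             # reversed post-order = topological order

    def critical_path(succ, cost):
        # Longest-path dynamic program over a topological order of the DAG.
        dist = {n: cost[n] for n in succ}
        pred = {n: None for n in succ}
        for n in topo_order(succ):
            for m in succ[n]:
                if dist[n] + cost[m] > dist[m]:
                    dist[m], pred[m] = dist[n] + cost[m], n
        node, path = max(dist, key=dist.get), []
        while node is not None:        # walk predecessors back from the end
            path.append(node)
            node = pred[node]
        return path[::-1]

    def linear_clusters(succ, cost):
        # Repeatedly peel off the current critical path; each peeled path
        # becomes one linear cluster (a sequential chain of operators).
        remaining, clusters = set(succ), []
        while remaining:
            sub = {n: [m for m in succ[n] if m in remaining] for n in remaining}
            clusters.append(critical_path(sub, {n: cost[n] for n in remaining}))
            remaining -= set(clusters[-1])
        return clusters

    print(linear_clusters(succ, cost))
    # -> [['input', 'convA', 'reluA', 'concat'], ['convB', 'reluB']]

    # Clusters with no mutual dependencies can then run concurrently; the
    # lambdas below stand in for the real PyTorch ops of the generated code.
    branch_a = lambda x: x + 1         # stands in for convA -> reluA
    branch_b = lambda x: x * 2         # stands in for convB -> reluB
    with ThreadPoolExecutor(max_workers=2) as pool:
        fa, fb = pool.submit(branch_a, 3), pool.submit(branch_b, 3)
        print(fa.result() + fb.result())   # stands in for the final concat

Under this reading, each extracted cluster would map to a sequential block in the generated PyTorch+Python code, with independent clusters dispatched on separate threads, as the ThreadPoolExecutor fragment at the end suggests.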

Related research

Model-Parallel Model Selection for Deep Learning Systems (07/14/2021)
As deep learning becomes more expensive, both in terms of time and compu...

IOS: Inter-Operator Scheduler for CNN Acceleration (11/02/2020)
To accelerate CNN inference, existing deep learning frameworks focus on ...

Willump: A Statistically-Aware End-to-end Optimizer for Machine Learning Inference (06/03/2019)
Machine learning (ML) has become increasingly important and performance-...

Robust Machine Learning Systems: Challenges, Current Trends, Perspectives, and the Road Ahead (01/04/2021)
Machine Learning (ML) techniques have been rapidly adopted by smart Cybe...

LAMP: Large Deep Nets with Automated Model Parallelism for Image Segmentation (06/22/2020)
Deep Learning (DL) models are becoming larger, because the increase in m...

GSPMD: General and Scalable Parallelization for ML Computation Graphs (05/10/2021)
We present GSPMD, an automatic, compiler-based parallelization system fo...

Structure-Aware Dynamic Scheduler for Parallel Machine Learning (12/19/2013)
Training large machine learning (ML) models with many variables or param...
