Baechi: Fast Device Placement of Machine Learning Graphs

01/20/2023
by Beomyeol Jeon, et al.

Machine Learning graphs (or models) can be challenging or impossible to train when either devices have limited memory or models are large. Splitting the model graph across multiple devices is one solution, and learning-based approaches to generating such placements remain popular. While these produce model placements that train fast on data (i.e., low step times), learning-based model-parallelism is itself time-consuming, taking many hours or days to create a placement plan of operators on devices. We present the Baechi system, the first to adopt an algorithmic approach to the placement problem for running machine learning training graphs on small clusters of memory-constrained devices. We integrate our implementation of Baechi into two popular open-source learning frameworks: TensorFlow and PyTorch. Our experimental results using GPUs show that: (i) Baechi generates placement plans 654×–206K× faster than state-of-the-art learning-based approaches, and (ii) the step (training) time of Baechi-placed models is comparable to expert placements in PyTorch, and only up to 6.2% worse than expert placements in TensorFlow. We prove mathematically that our two algorithms are within a constant factor of the optimal. Our work shows that, compared to learning-based approaches, algorithmic approaches face different challenges in adapting to Machine Learning systems, but also offer provable bounds and significant performance benefits.
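To make the abstract's "algorithmic approach" concrete, below is a minimal sketch of a topological, memory-capped placement heuristic, the family to which Baechi's m-TOPO algorithm belongs. This is an illustration under stated assumptions, not Baechi's actual implementation: the fill-to-capacity policy, the function and variable names, and the memory estimates are all hypothetical.

# A minimal sketch (assumed, simplified) of a topological, memory-capped
# placement heuristic in the spirit of Baechi's m-TOPO. Not the paper's code.
# Assumptions: each op has a known memory estimate; the graph is a DAG given
# as an edge list; device capacities are in bytes.

from collections import deque

def topo_placement(ops, edges, device_caps):
    """ops: dict op_id -> memory bytes; edges: list of (src, dst) pairs;
    device_caps: list of per-device memory capacities.
    Returns dict op_id -> device index."""
    # Build in-degree counts and adjacency for Kahn's topological sort.
    indeg = {op: 0 for op in ops}
    adj = {op: [] for op in ops}
    for src, dst in edges:
        adj[src].append(dst)
        indeg[dst] += 1

    ready = deque(op for op, d in indeg.items() if d == 0)
    placement, dev, used = {}, 0, 0

    while ready:
        op = ready.popleft()
        mem = ops[op]
        # Fill the current device; advance to the next one when it is full.
        while dev < len(device_caps) and used + mem > device_caps[dev]:
            dev += 1
            used = 0
        if dev == len(device_caps):
            raise MemoryError("model does not fit on the given devices")
        placement[op] = dev
        used += mem
        for nxt in adj[op]:
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                ready.append(nxt)
    return placement

# Example: four chained ops split across two 1 GB devices.
if __name__ == "__main__":
    ops = {"a": 600e6, "b": 300e6, "c": 500e6, "d": 200e6}
    edges = [("a", "b"), ("b", "c"), ("c", "d")]
    print(topo_placement(ops, edges, [1e9, 1e9]))  # {'a': 0, 'b': 0, 'c': 1, 'd': 1}

The two algorithms whose constant-factor bounds the abstract cites (m-ETF and m-SCT in the paper) additionally account for inter-device communication time when scheduling operators; the sketch above ignores communication entirely and only respects memory capacities.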


Related research

07/30/2022 | Celeritas: Fast Optimizer for Large Dataflow Graphs
The rapidly enlarging neural network models are becoming increasingly ch...

01/21/2022 | Accelerate Model Parallel Training by Using Efficient Graph Traversal Order in Device Placement
Modern neural networks require long training to reach decent performance...

06/20/2019 | Placeto: Learning Generalizable Device Placement Algorithms for Distributed Machine Learning
We present Placeto, a reinforcement learning (RL) approach to efficientl...

05/23/2023 | GiPH: Generalizable Placement Learning for Adaptive Heterogeneous Computing
Careful placement of a computational application within a target device ...

06/29/2020 | Efficient Algorithms for Device Placement of DNN Graph Operators
Modern machine learning workloads use large models, with complex structu...

12/23/2022 | RMove: Recommending Move Method Refactoring Opportunities using Structural and Semantic Representations of Code
Incorrect placement of methods within classes is a typical code smell ca...

05/17/2022 | Moving Stuff Around: A study on efficiency of moving documents into memory for Neural IR models
When training neural rankers using Large Language Models, it's expected ...
