The TensorFlow Partitioning and Scheduling Problem: It's the Critical Path!

11/06/2017
by Ruben Mayer, et al.

State-of-the-art data flow systems such as TensorFlow run iterative calculations on large graphs that need to be partitioned across heterogeneous devices such as CPUs, GPUs, and TPUs. However, partitioning cannot be viewed in isolation: each device also has to select the next graph vertex to execute, i.e., make local scheduling decisions. Both problems, partitioning and scheduling, are NP-complete by themselves, yet they have to be solved in combination to minimize the overall execution time of an iteration. In this paper, we propose several heuristic strategies to solve the partitioning and scheduling problem in TensorFlow. We simulate the performance of the proposed strategies in heterogeneous environments with communication-intensive workloads that are common in TensorFlow. Our findings indicate that the best partitioning and scheduling heuristics are those that focus on minimizing the execution time of the critical path in the graph. Those strategies provide a speed-up of up to 4 times compared to strategies that are agnostic to the critical path, such as hash-based partitioning and FIFO scheduling.
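
To make the contrast between critical-path-aware and FIFO scheduling concrete, the following is a minimal sketch and not the authors' simulator or heuristics. It assumes identical devices and zero communication cost, which the paper explicitly does not, and all names (`critical_path_ranks`, `makespan`, the toy graph) are hypothetical. Each vertex gets a rank equal to the longest remaining path to a sink, and a greedy list scheduler picks ready vertices either in arrival order or by highest rank.

```python
import heapq
from collections import defaultdict


def critical_path_ranks(cost, edges):
    """rank[v] = cost[v] + longest rank among v's successors (0 for sinks)."""
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
    rank = {}

    def visit(v):
        if v not in rank:
            rank[v] = cost[v] + max((visit(s) for s in succ[v]), default=0)
        return rank[v]

    for v in cost:
        visit(v)
    return rank


def makespan(cost, edges, num_devices, key):
    """Greedy list scheduling; `key` orders the ready list (lowest first).

    Assumption: identical devices, no communication cost.
    """
    succ = defaultdict(list)
    preds_left = {v: 0 for v in cost}
    for u, v in edges:
        succ[u].append(v)
        preds_left[v] += 1
    ready = [v for v in cost if preds_left[v] == 0]
    running = []  # min-heap of (finish_time, vertex)
    now, free = 0, num_devices
    while ready or running:
        ready.sort(key=key)  # stable sort: ties keep arrival order
        while ready and free > 0:
            v = ready.pop(0)
            heapq.heappush(running, (now + cost[v], v))
            free -= 1
        now, done = heapq.heappop(running)  # advance to next completion
        free += 1
        for s in succ[done]:
            preds_left[s] -= 1
            if preds_left[s] == 0:
                ready.append(s)
    return now


# Toy graph: two short independent vertices and a chain c1 -> c2.
cost = {"s1": 1, "s2": 1, "c1": 3, "c2": 3}
edges = [("c1", "c2")]
ranks = critical_path_ranks(cost, edges)

fifo_time = makespan(cost, edges, 2, key=lambda v: 0)        # arrival order
cp_time = makespan(cost, edges, 2, key=lambda v: -ranks[v])  # highest rank first
print(fifo_time, cp_time)  # 7 6
```

On this toy graph, FIFO starts the two short vertices first and delays the chain, while ranking by remaining path length starts `c1` immediately and reaches the critical-path length of 6. The sketch only illustrates why ignoring the critical path can lengthen an iteration; it says nothing about the magnitude of the speed-ups reported in the paper.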


