Labyrinth: Compiling Imperative Control Flow to Parallel Dataflows

09/18/2018
by Gábor E. Gévay, et al.

Parallel dataflow systems have become a standard technology for large-scale data analytics. Complex data analysis programs in areas such as machine learning and graph analytics often involve control flow, i.e., iterations and branching. Therefore, systems for advanced analytics should include control flow constructs that are efficient and easy to use. A natural approach is to provide imperative control flow constructs similar to those of mainstream programming languages: while-loops, if-statements, and mutable variables, whose values can change between iteration steps. However, current parallel dataflow systems execute programs written using imperative control flow constructs by launching a separate dataflow job after every control flow decision (e.g., for every step of a loop). The performance of this approach is suboptimal, because (a) launching a dataflow job incurs scheduling overhead; and (b) it prevents certain optimizations across iteration steps. In this paper, we introduce Labyrinth, a method to compile programs written using imperative control flow constructs to a single dataflow job, which executes the whole program, including all iteration steps. This way, we achieve both efficiency and ease of use. We also conduct an experimental evaluation, which shows that Labyrinth has orders of magnitude smaller per-iteration-step overhead than launching new dataflow jobs, and also allows for significant optimizations across iteration steps.
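The contrast drawn in the abstract, one dataflow job per control flow decision versus a single job for the whole program, can be illustrated with a toy model. This is a minimal sketch and not Labyrinth's compilation method: `Driver`, `launch_job`, `step`, `per_step_jobs`, and `single_job` are hypothetical names introduced here, and a "job" is simply a counted function call standing in for a scheduled dataflow job.

```python
# Toy model only: "launch_job" stands in for submitting a dataflow job.
# All names here are hypothetical; none is an API of a real system.

class Driver:
    def __init__(self):
        self.jobs_launched = 0

    def launch_job(self, fn, data):
        """Pretend to schedule one dataflow job that applies fn to data."""
        self.jobs_launched += 1
        return fn(data)


def step(values):
    """One iteration step: halve every element (a stand-in for real work)."""
    return [v / 2 for v in values]


def per_step_jobs(driver, values, threshold):
    """Baseline from the abstract: a new job after every loop decision."""
    while max(values) > threshold:  # condition evaluated in the driver
        values = driver.launch_job(step, values)
    return values


def single_job(driver, values, threshold):
    """The whole loop, including its condition, runs inside one job."""
    def whole_program(vs):
        while max(vs) > threshold:
            vs = step(vs)
        return vs
    return driver.launch_job(whole_program, values)
```

With the input `[8.0, 4.0]` and a threshold of `1.0`, both functions return `[1.0, 0.5]`, but `per_step_jobs` launches three jobs while `single_job` launches one. In a real system each launch would carry scheduling overhead, which is the cost labeled (a) in the abstract.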


