Efficient Iterative Programs with Distributed Data Collections

06/13/2023
by   Sarah Chlyah, et al.

Big data programming frameworks have become increasingly important for developing applications in which performance and scalability are critical. In these complex frameworks, optimizing code by hand is hard and time-consuming, which makes automated optimization particularly necessary. A prerequisite for automating optimization is to find suitable abstractions for representing programs; for instance, algebras based on monads or monoids that represent distributed data collections. Currently, however, such algebras do not represent recursive programs in a way that allows them to be analyzed or rewritten. In this paper, we extend a monoid algebra with a fixpoint operator that represents recursion as a first-class citizen, and we show how it enables new optimizations. Experiments with the Spark platform illustrate the performance gains brought by these systematic optimizations.
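To make the idea of a fixpoint over a monoid-structured collection concrete, the following is a minimal Python sketch and not the operator defined in the paper. It assumes collections are finite sets combined by union (a commutative, idempotent monoid whose identity is the empty set) and that the step function is monotone. The names `fixpoint` and `transitive_closure` are hypothetical and chosen only for illustration.

```python
from typing import Callable, FrozenSet, Iterable, Tuple

Edge = Tuple[int, int]
Paths = FrozenSet[Edge]


def fixpoint(seed: Iterable[Edge], step: Callable[[Paths], Paths]) -> Paths:
    """Grow `seed` by repeatedly merging in step(result) with set union.

    Stops when an iteration contributes no new element. Termination
    assumes `step` is monotone and the reachable result is finite.
    """
    result = frozenset(seed)
    while True:
        new = step(result) - result   # elements not seen before
        if not new:                   # nothing new: fixpoint reached
            return result
        result = result | new         # union is the monoid operation


def transitive_closure(edges: Iterable[Edge]) -> Paths:
    """Illustrative recursive query: all pairs connected by a path."""
    base = frozenset(edges)

    def step(paths: Paths) -> Paths:
        # Extend every known path by one edge on the right.
        return frozenset(
            (a, d) for (a, b) in paths for (c, d) in base if b == c
        )

    return fixpoint(base, step)


if __name__ == "__main__":
    print(sorted(transitive_closure({(1, 2), (2, 3), (3, 4)})))
```

Because the recursion is expressed through a single explicit operator rather than an opaque loop, a rewriter could in principle inspect `step` and transform it. This sketch only shows the shape of such a program, not any of the paper's optimizations.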


research
08/05/2021

An Abstract View of Big Data Processing Programs

This paper proposes a model for specifying data flow based parallel data...
research
07/12/2022

Supercharging the APGAS Programming Model with Relocatable Distributed Collections

In this article we present our relocatable distributed collections libra...
research
03/21/2020

Translation of Array-Based Loops to Distributed Data-Parallel Programs

Large volumes of data generated by scientific experiments and simulation...
research
01/18/2022

Lambda the Ultimate SSA: Optimizing Functional Programs in SSA

Static Single Assignment (SSA) is the workhorse of modern optimizing com...
research
07/29/2019

Proposition d'un modèle pour l'optimisation automatique de boucles dans le compilateur Tiramisu : cas d'optimisation de déroulage

Computer architectures become more and more complex. It requires more ef...
research
01/09/2020

Lazy object copy as a platform for population-based probabilistic programming

This work considers dynamic memory management for population-based proba...
research
04/25/2022

Automatic Datapath Optimization using E-Graphs

Manual optimization of Register Transfer Level (RTL) datapath is commonp...
