Translation of Array-Based Loops to Distributed Data-Parallel Programs

03/21/2020
by Leonidas Fegaras, et al.

Large volumes of data generated by scientific experiments and simulations come in the form of arrays, and the programs that analyze these data are frequently expressed as array operations in an imperative, loop-based language. As datasets grow larger, however, distributed Big Data analytics frameworks have become essential tools for large-scale scientific computing. Scientists, who are typically comfortable with numerical analysis tools but unfamiliar with the intricacies of Big Data analytics, must now learn to convert their loop-based programs to distributed data-parallel programs. We present a novel framework for translating programs expressed as array-based loops to distributed data-parallel programs that is more general and efficient than related work. Although our translations are defined over sparse arrays, we extend our framework to handle packed arrays, such as tiled matrices, without sacrificing performance. We report on a prototype implementation on top of Spark and evaluate the performance of our system relative to hand-written programs.
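
To make the kind of conversion described above concrete, the following is a minimal sketch, not the paper's translation scheme. It contrasts a loop-based matrix product with an equivalent formulation over sparse arrays stored as ((row, column), value) pairs, written as a join on the shared index followed by a per-key sum. All function names are hypothetical, and plain Python collections stand in for distributed datasets; in Spark the two steps would roughly correspond to the pair-RDD operations `join` and `reduceByKey`.

```python
from collections import defaultdict


def matmul_loops(a, b):
    """Imperative, loop-based product of two dense matrices (lists of rows)."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for l in range(k):
                c[i][j] += a[i][l] * b[l][j]
    return c


def to_sparse(dense):
    """Coordinate form: ((row, col), value) pairs with zero entries omitted."""
    return [((i, j), v)
            for i, row in enumerate(dense)
            for j, v in enumerate(row)
            if v != 0]


def matmul_data_parallel(a_sparse, b_sparse):
    """Same product expressed as a join followed by a grouped reduction."""
    # Index B by its row so it can be joined with A's column index l.
    b_by_row = defaultdict(list)
    for (l, j), v in b_sparse:
        b_by_row[l].append((j, v))

    # Join step: pair every A[i, l] with every B[l, j] and emit a product
    # keyed by the output position (i, j).
    products = [((i, j), av * bv)
                for (i, l), av in a_sparse
                for (j, bv) in b_by_row.get(l, [])]

    # Reduce step: sum all partial products that share the same key.
    result = defaultdict(int)
    for key, value in products:
        result[key] += value
    return dict(result)


a = [[1, 2], [0, 3]]
b = [[4, 0], [5, 6]]
dense_result = matmul_loops(a, b)
sparse_result = matmul_data_parallel(to_sparse(a), to_sparse(b))
```

Both versions compute the same values; the second contains no loop-carried updates to a shared array, so each step can be evaluated independently over partitions of the data.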
