A Programming Model for Hybrid Workflows: combining Task-based Workflows and Dataflows all-in-one

07/09/2020
by   Cristian Ramon-Cortes, et al.

This paper aims to reduce the effort of learning, deploying, and integrating several frameworks when developing e-Science applications that combine simulations with High-Performance Data Analytics (HPDA). We propose extending task-based management systems to support continuous input and output data, so that task-based workflows and dataflows can be combined (hereafter, Hybrid Workflows) within a single programming model. Developers can therefore build complex Data Science workflows, choosing the approach that best fits each requirement. To illustrate the capabilities of Hybrid Workflows, we have built a Distributed Stream Library and a fully functional prototype that extends COMPSs, a mature, general-purpose, task-based, parallel programming model. The library can be easily integrated with existing task-based frameworks to provide support for dataflows. It also provides a homogeneous, generic, and simple representation of object and file streams in both Java and Python, enabling complex workflows to handle any data type without dealing directly with the streaming back-end.
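The idea of a Hybrid Workflow can be illustrated with a minimal sketch that uses only the Python standard library. It is not the paper's Distributed Stream Library nor the COMPSs API: the `ObjectStream` class, the task functions, and `run_hybrid_workflow` are hypothetical names chosen for illustration, and a thread pool stands in for a task-based runtime. A producer task publishes partial results to a stream while a consumer task processes them as they arrive, and an ordinary task then combines both return values.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

_CLOSED = object()  # sentinel marking the end of the stream


class ObjectStream:
    """Hypothetical in-memory object stream (an assumption, not the paper's API)."""

    def __init__(self):
        self._items = queue.Queue()

    def publish(self, item):
        """Add one element to the stream."""
        self._items.put(item)

    def close(self):
        """Signal that no more elements will be published."""
        self._items.put(_CLOSED)

    def __iter__(self):
        """Yield elements as they arrive until the stream is closed (single consumer)."""
        while True:
            item = self._items.get()
            if item is _CLOSED:
                return
            yield item


def simulation(stream, steps):
    """Producer task: publishes one partial result per simulation step."""
    for step in range(steps):
        stream.publish(step * step)
    stream.close()
    return steps


def analytics(stream):
    """Consumer task: aggregates elements while the producer may still be running."""
    total = 0
    for value in stream:
        total += value
    return total


def summarize(steps, total):
    """Ordinary task: depends on the return values of both previous tasks."""
    return {"steps": steps, "sum_of_squares": total}


def run_hybrid_workflow(steps=5):
    """Run the dataflow part (producer/consumer) and the task-based part together."""
    stream = ObjectStream()
    with ThreadPoolExecutor(max_workers=2) as pool:
        produced = pool.submit(simulation, stream, steps)
        consumed = pool.submit(analytics, stream)
        return summarize(produced.result(), consumed.result())


if __name__ == "__main__":
    print(run_hybrid_workflow(5))
```

In this sketch the stream is passed to the tasks like any other parameter, which mirrors the abstract's goal of hiding the streaming back-end from the workflow code. A distributed implementation would replace the in-memory queue with a real streaming back-end.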


