Partitioning SKA Dataflows for Optimal Graph Execution

05/19/2018
by Chen Wu, et al.

Optimizing data-intensive workflow execution is essential to many modern scientific projects such as the Square Kilometre Array (SKA), which will be the largest radio telescope in the world, collecting terabytes of data per second for the next few decades. At the core of the SKA Science Data Processor is the graph execution engine, which schedules tens of thousands of algorithmic components to ingest and transform millions of parallel data chunks in order to solve a series of large-scale inverse problems within the power budget. To tackle this challenge, we have developed the Data Activated Liu Graph Engine (DALiuGE) to manage data processing pipelines for several SKA pathfinder projects. In this paper, we discuss the DALiuGE graph scheduling sub-system. By extending previous studies on graph scheduling and partitioning, we lay the foundation on which we can develop polynomial-time optimization methods that minimize both workflow execution time and resource footprint while satisfying resource constraints imposed by individual algorithms. We show preliminary results obtained from three radio astronomy data pipelines.
