Enel: Context-Aware Dynamic Scaling of Distributed Dataflow Jobs using Graph Propagation

08/27/2021
by   Dominik Scheinert, et al.
0

Distributed dataflow systems like Spark and Flink enable the use of clusters for scalable data analytics. While runtime prediction models can be used to initially select appropriate cluster resources given target runtimes, the actual runtime performance of dataflow jobs depends on several factors and varies over time. Yet, in many situations, dynamic scaling can be used to meet formulated runtime targets despite significant performance variance. This paper presents Enel, a novel dynamic scaling approach that uses message propagation on an attributed graph to model dataflow jobs and, thus, allows for deriving effective rescaling decisions. For this, Enel incorporates descriptive properties that capture the respective execution context, considers statistics from individual dataflow tasks, and propagates predictions through the job graph to eventually find an optimized new scale-out. Our evaluation of Enel with four iterative Spark jobs shows that our approach is able to identify effective rescaling actions, reacting for instance to node failures, and can be reused across different execution contexts.

READ FULL TEXT
research
07/29/2021

Bellamy: Reusing Performance Models for Distributed Dataflow Jobs Across Contexts

Distributed dataflow systems enable the use of clusters for scalable dat...
research
11/15/2021

Training Data Reduction for Performance Models of Data Analytics Jobs in the Cloud

Distributed dataflow systems like Apache Flink and Apache Spark simplify...
research
11/16/2020

Towards Collaborative Optimization of Cluster Configurations for Distributed Dataflow Jobs

Analyzing large datasets with distributed dataflow systems requires the ...
research
04/15/2023

SerPyTor: A distributed context-aware computational graph execution framework for durable execution

Distributed computation is always a tricky topic to deal with, especiall...
research
04/09/2018

PingAn: An Insurance Scheme for Job Acceleration in Geo-distributed Big Data Analytics System

Geo-distributed data analysis in a cloud-edge system is emerging as a da...
research
12/22/2018

Trevor: Automatic configuration and scaling of stream processing pipelines

Operating a distributed data stream processing workload efficiently at s...
research
03/10/2022

Efficient Runtime Profiling for Black-box Machine Learning Services on Sensor Streams

In highly distributed environments such as cloud, edge and fog computing...

Please sign up or login with your details

Forgot password? Click here to reset