PingAn: An Insurance Scheme for Job Acceleration in Geo-distributed Big Data Analytics System

04/09/2018
by   Tiantian Wang, et al.
0

Geo-distributed data analysis in a cloud-edge system is emerging as a daily demand. Out of saving time in wide area data transfer, some tasks are dispersed to the edge clusters satisfied data locality. However, execution in the edge clusters is less well, due to limited resource, overload interference and cluster-level unreachable troubles, which obstructs the guarantee on the speed and completion of jobs. Synthesizing the impact of cluster heterogeneity and costly inter-cluster data fetch, we expect to make effective copies across clusters for tasks to provide both success and efficiency of the arriving jobs. To this end, we design PingAn, an online insurance algorithm making redundance across-cluster copies for tasks, promising (1+ε)-speed o(1/ε^2+ε)-competitive in sum of the job flowtimes. PingAn shares resource among a part of jobs with an adjustable ε fraction to fit the system load condition and insures for tasks following efficiency-first reliability-aware principle to optimize the effect of copies on jobs' performance. Trace-driven simulations demonstrate that PingAn can reduce the average job flowtimes by at least 14% more than the state-of-the-art speculation mechanisms. We also build PingAn in Spark on Yarn System to verify its practicality and generality. Experiments show that PingAn can reduce the average job completion time by up to 40% comparing to the default Spark execution.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/24/2020

Communication Contention Aware Scheduling of Multiple Deep Learning Training Jobs

Distributed Deep Learning (DDL) has rapidly grown its popularity since i...
research
04/25/2016

Do the Hard Stuff First: Scheduling Dependent Computations in Data-Analytics Clusters

We present a scheduler that improves cluster utilization and job complet...
research
02/01/2018

Towards Reliable (and Efficient) Job Executions in a Practical Geo-distributed Data Analytics System

Geo-distributed data analytics are increasingly common to derive useful ...
research
07/03/2019

CloudCoaster: Transient-aware Bursty Datacenter Workload Scheduling

Today's clusters often have to divide resources among a diverse set of j...
research
05/22/2018

DRESS: Dynamic RESource-reservation Scheme for Congested Data-intensive Computing Platforms

In the past few years, we have envisioned an increasing number of busine...
research
08/23/2019

Griffon: Reasoning about Job Anomalies with Unlabeled Data in Cloud-based Platforms

Microsoft's internal big data analytics platform is comprised of hundred...
research
08/27/2021

Enel: Context-Aware Dynamic Scaling of Distributed Dataflow Jobs using Graph Propagation

Distributed dataflow systems like Spark and Flink enable the use of clus...

Please sign up or login with your details

Forgot password? Click here to reset