Towards Reliable (and Efficient) Job Executions in a Practical Geo-distributed Data Analytics System

02/01/2018
by Xiaoda Zhang, et al.

Geo-distributed data analytics is increasingly used to derive useful information in large organisations. Naively extending existing cluster-scale data analytics systems to the scale of geo-distributed data centers faces unique challenges, including WAN bandwidth limits, regulatory constraints, changeable and unreliable runtime environments, and monetary costs. Our goal in this work is to develop a practical geo-distributed data analytics system that (1) employs an intelligent mechanism that lets jobs efficiently utilize resources across data centers and adjust to the changeable environment; (2) guarantees the reliability of jobs despite possible failures; and (3) is generic and flexible enough to run a wide range of data analytics jobs without requiring any changes to them. To this end, we present a new, general geo-distributed data analytics system, HOUTU, that is composed of multiple autonomous systems, each operating in a sovereign data center. HOUTU maintains a job manager (JM) for a geo-distributed job in each data center, so that these replicated JMs can individually and cooperatively manage resources and assign tasks. Our experiments on the prototype of HOUTU running across four Alibaba Cloud regions show that HOUTU provides job performance nearly as efficient as that of the existing centralized architecture, and guarantees reliable job executions in the face of failures.

research
07/29/2021

Bellamy: Reusing Performance Models for Distributed Dataflow Jobs Across Contexts

Distributed dataflow systems enable the use of clusters for scalable dat...
research
04/09/2018

PingAn: An Insurance Scheme for Job Acceleration in Geo-distributed Big Data Analytics System

Geo-distributed data analysis in a cloud-edge system is emerging as a da...
research
05/07/2020

Boosting Cloud Data Analytics using Multi-Objective Optimization

Data analytics in the cloud has become an integral part of enterprise bu...
research
04/07/2023

Runtime Variation in Big Data Analytics

The dynamic nature of resource allocation and runtime conditions on Clou...
research
07/21/2022

Templating Shuffles

Cloud data centers are rapidly evolving. At the same time, large-scale d...
research
08/23/2019

Griffon: Reasoning about Job Anomalies with Unlabeled Data in Cloud-based Platforms

Microsoft's internal big data analytics platform is comprised of hundred...
research
09/18/2018

Labyrinth: Compiling Imperative Control Flow to Parallel Dataflows

Parallel dataflow systems have become a standard technology for large-sc...
