
On the Optimality of Scheduling Dependent MapReduce Tasks on Heterogeneous Machines

by Vaneet Aggarwal, et al.
Purdue University

MapReduce is the most popular big-data computation framework and has motivated many research topics. A MapReduce job consists of two successive phases, a map phase and a reduce phase, and each phase can be divided into multiple tasks. A reduce task can start only after all the map tasks of its job have finished processing, and a job is complete when all of its map and reduce tasks are complete. Optimally scheduling these tasks on different servers to minimize the weighted completion time is an open problem, and it is the focus of this paper. We give an approximation algorithm with a competitive ratio of 2(1+(m-1)/D)+1, where m is the number of servers and D > 1 is the task-skewness product. We implement the proposed algorithm, DMRS, on the Hadoop framework and compare it with three baseline schedulers. Results show that DMRS can outperform the baseline schedulers by up to 82%.
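To make the quantities in the abstract concrete, the following is a minimal Python sketch, not the paper's DMRS algorithm. It evaluates the stated bound 2(1+(m-1)/D)+1 for given m and D, and computes a weighted completion time for toy jobs in which reduce tasks wait for every map task. The function names, the input layout, and the simplification that every reduce task starts exactly when the last map task finishes are assumptions made for illustration only; D is treated as a given input because the abstract does not define how the task-skewness product is computed.

```python
from fractions import Fraction


def ratio_bound(m, D):
    """Evaluate 2 * (1 + (m - 1) / D) + 1, the ratio stated in the abstract.

    m: number of servers; D: task-skewness product (taken as given here).
    """
    if m < 1:
        raise ValueError("m must be a positive number of servers")
    if D <= 1:
        raise ValueError("the abstract states D > 1")
    return 2 * (1 + Fraction(m - 1) / Fraction(D)) + 1


def job_completion_time(map_finish_times, reduce_durations):
    """Completion time of one job under the map-before-reduce constraint.

    Toy assumption (not from the paper): each reduce task runs on its own
    server and starts exactly when the last map task finishes.
    """
    # No reduce task may start before every map task has finished.
    barrier = max(map_finish_times)
    # The job is complete when its slowest reduce task is complete.
    return barrier + max(reduce_durations, default=0)


def weighted_completion_time(jobs):
    """Sum of weight * completion time over (weight, maps, reduces) tuples."""
    return sum(w * job_completion_time(maps, reduces)
               for w, maps, reduces in jobs)


if __name__ == "__main__":
    print(ratio_bound(10, 3))
    print(weighted_completion_time([(2, [2, 5, 3], [4, 1]), (1, [1], [2])]))
```

Under this formula the bound approaches 3 as D grows large relative to m, and approaches 2m+1 as D approaches 1.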



