On the Optimality of Scheduling Dependent MapReduce Tasks on Heterogeneous Machines

11/27/2017
by Vaneet Aggarwal, et al.
Purdue University

MapReduce is the most popular big-data computation framework and has motivated many research topics. A MapReduce job consists of two successive phases, the map phase and the reduce phase, and each phase can be divided into multiple tasks. A reduce task can start only after all of the job's map tasks have finished processing, and a job is complete once all of its map and reduce tasks are complete. Optimally scheduling these tasks on different servers to minimize the weighted completion time is an open problem, and it is the focus of this paper. We give an approximation algorithm with a competitive ratio of 2(1+(m-1)/D)+1, where m is the number of servers and D > 1 is the task-skewness product. We implement the proposed algorithm, DMRS, on the Hadoop framework and compare it with three baseline schedulers. Results show that DMRS can outperform the baseline schedulers by up to 82%.
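
To make the quantities in the abstract concrete, the following Python sketch evaluates the stated bound 2(1+(m-1)/D)+1 and computes the weighted completion time of a toy schedule in which reduce tasks wait for every map task of the same job. It is not the paper's DMRS algorithm. The function names, the `Job` record, and the simplification that every reduce task starts the moment the last map task ends (ignoring server contention and machine heterogeneity) are assumptions made only for illustration.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import List


def competitive_ratio_bound(m: int, D) -> Fraction:
    """Evaluate 2(1 + (m-1)/D) + 1 for m servers and skewness product D > 1."""
    if m < 1:
        raise ValueError("m must be at least 1")
    if D <= 1:
        raise ValueError("the abstract assumes D > 1")
    return 2 * (1 + Fraction(m - 1) / Fraction(D)) + 1


@dataclass
class Job:
    # Hypothetical record: these fields are illustrative, not from the paper.
    weight: int
    map_finish_times: List[int]   # time at which each map task finishes
    reduce_durations: List[int]   # processing time of each reduce task


def job_completion_time(job: Job) -> int:
    """Reduce tasks may start only after all map tasks of the job finish.

    Simplifying assumption: each reduce task starts exactly when the last
    map task ends, so the job ends after the longest reduce task.
    """
    reduce_start = max(job.map_finish_times)
    return reduce_start + max(job.reduce_durations)


def weighted_completion_time(jobs: List[Job]) -> int:
    """Objective from the abstract: sum of weight times completion time."""
    return sum(job.weight * job_completion_time(job) for job in jobs)


jobs = [
    Job(weight=2, map_finish_times=[3, 5], reduce_durations=[2, 4]),
    Job(weight=1, map_finish_times=[1], reduce_durations=[1]),
]
print(competitive_ratio_bound(m=10, D=3))
print(weighted_completion_time(jobs))
```

Under these assumptions the bound grows linearly in the number of servers m for a fixed D and approaches 3 as D grows large relative to m, which is the behavior the formula in the abstract implies.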
