GB-PANDAS: Throughput and heavy-traffic optimality analysis for affinity scheduling

09/23/2017
by Ali Yekkehkhany, et al.

Dynamic affinity scheduling has been an open problem for nearly three decades. The problem is to dynamically schedule multi-type tasks on multi-skilled servers so that the resulting queueing system is stable throughout the capacity region (throughput optimality) and the mean task delay is minimized at high loads near the boundary of the capacity region (heavy-traffic optimality). Data-intensive analytics frameworks such as MapReduce, Hadoop, and Dryad fit this setting: servers are heterogeneous with respect to task types, so the pair of task type and server determines the processing rate of a task. The load balancing algorithm used in such frameworks is an example of affinity scheduling, and it should be both robust and delay optimal at high loads when hot-spots occur. Fluid model planning, the MaxWeight algorithm, and the generalized cμ-rule are among the first algorithms proposed for affinity scheduling with theoretical guarantees of optimality in different senses, which are discussed in the related work section. None of these algorithms is practical for data center applications because of their unrealistic assumptions. The join-the-shortest-queue-MaxWeight (JSQ-MaxWeight), JSQ-Priority, and weighted-workload algorithms are examples of load balancing policies for systems with two and three levels of data locality with a rack structure. In this work, we propose the Generalized-Balanced-Pandas algorithm (GB-PANDAS) for a system with multiple levels of data locality and prove its throughput optimality. We prove this result under an arbitrary distribution for service times, whereas most previous theoretical work assumes geometrically distributed service times. Extensive simulation results show that the GB-PANDAS algorithm reduces the mean delay and outperforms the JSQ-MaxWeight algorithm by a factor of two.
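
The abstract does not spell out a routing rule, so the following is only a minimal sketch of the general idea of affinity-aware routing, in which the (task type, server) pair determines the processing rate. Each arriving task goes to the server that minimizes its current workload divided by its rate for that task type. The function names, the example rates, and the tie-breaking rule (prefer the faster server) are assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict, List

Queues = Dict[str, int]    # task type -> number of queued tasks (hypothetical layout)
Rates = Dict[str, float]   # task type -> processing rate on this server (assumed known)


def workload(queues: Queues, rates: Rates) -> float:
    """Expected time for one server to drain its queues: sum of count / rate."""
    return sum(count / rates[task_type] for task_type, count in queues.items())


def route(task_type: str, all_queues: List[Queues], all_rates: List[Rates]) -> int:
    """Return the index of the server with the smallest weighted workload.

    The weighted workload seen by an arriving task of `task_type` is the
    server's workload divided by its rate for that type. Ties are broken in
    favor of the faster server (an illustrative assumption).
    """
    def key(m: int):
        rate = all_rates[m][task_type]
        return (workload(all_queues[m], all_rates[m]) / rate, -rate)

    return min(range(len(all_queues)), key=key)


# Two servers with opposite affinities for task types "a" and "b".
rates = [{"a": 1.0, "b": 0.5}, {"a": 0.5, "b": 1.0}]
queues = [{"a": 4, "b": 0}, {"a": 0, "b": 1}]

chosen = route("a", queues, rates)
queues[chosen]["a"] += 1   # enqueue the task at the chosen server
```

In this example server 0 is the faster server for type "a" but is heavily loaded, so the task is sent to server 1 instead. This is the kind of trade-off between locality and load that an affinity scheduler has to make.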

research
∙ 01/13/2019

Blind GB-PANDAS: A Blind Throughput-Optimal Load Balancing Algorithm for Affinity Scheduling

Dynamic affinity load balancing of multi-type tasks on multi-skilled ser...
research
∙ 05/09/2017

Affinity Scheduling and the Applications on Data Center Scheduling with Data Locality

MapReduce framework is the de facto standard in Hadoop. Considering the ...
research
∙ 03/31/2019

The Power of d Choices in Scheduling for Data Centers with Heterogeneous Servers

MapReduce framework is the de facto in big data and its applications whe...
research
∙ 04/14/2020

Comparisons of Algorithms in Big Data Processing

Parallel computing is the fundamental base for MapReduce framework in Ha...
research
∙ 02/20/2020

Asymptotically Optimal Load Balancing in Large-scale Heterogeneous Systems with Multiple Dispatchers

We consider the load balancing problem in large-scale heterogeneous syst...
research
∙ 07/02/2018

On Non-Preemptive VM Scheduling in the Cloud

We study the problem of scheduling VMs (Virtual Machines) in a distribut...
research
∙ 08/11/2021

Transportation Polytope and its Applications in Parallel Server Systems

Parallel server system is a stochastic processing network widely studied...
