Randomized Work Stealing versus Sharing in Large-scale Systems with Non-exponential Job Sizes

10/31/2018
by Benny Van Houdt, et al.

Work sharing and work stealing are two scheduling paradigms for redistributing work in distributed computations. In work sharing, processors attempt to migrate pending jobs to other processors in the hope of reducing response times. In work stealing, on the other hand, underutilized processors attempt to steal jobs from other processors. Both paradigms incur communication overhead, and the question addressed in this paper is which of the two reduces response time more when both use the same amount of communication overhead. Prior work presented explicit bounds, for large-scale systems, on when randomized work sharing outperforms randomized work stealing in the case of Poisson arrivals and exponential job durations, and indicated that work sharing is best when the load is below ϕ⁻¹ ≈ 0.6180, where ϕ is the golden ratio. In this paper we revisit this problem and study the impact of the job size distribution using a mean field model. We present an efficient method to determine the boundary between the regions where sharing or stealing is best for a given job size distribution, as well as bounds that apply to any (phase-type) job size distribution. The main insight is that work stealing benefits significantly from more variable job sizes, and that work sharing may become inferior to work stealing for loads as small as 1/2 + ϵ for any ϵ > 0.
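For reference, the load threshold ϕ⁻¹ quoted above follows from the defining identity of the golden ratio, ϕ² = ϕ + 1. The short derivation below is standard arithmetic added for illustration; it is not taken from the paper:

```latex
\begin{aligned}
\phi &= \frac{1+\sqrt{5}}{2}, \qquad \phi^{2} = \phi + 1 \\
\Rightarrow\quad \phi^{-1} &= \phi - 1 = \frac{\sqrt{5}-1}{2} \approx 0.6180
\end{aligned}
```

Since ϕ⁻¹ − 1/2 ≈ 0.118, the result for more variable job sizes means the sharing/stealing boundary can approach 1/2, about 0.118 below its value under exponential job sizes.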
