The Tiny-Tasks Granularity Trade-Off: Balancing overhead vs. performance in parallel systems

02/23/2022
by Stefan Bora, et al.

Models of parallel processing systems typically assume that there are l workers and that each job is split into an equal number of k = l tasks. Splitting jobs into k > l smaller tasks, i.e., using "tiny tasks", can improve performance and stability because it reduces the variance in the amount of work assigned to each worker. As k increases, however, the overhead of scheduling and managing the tasks begins to outweigh this benefit. We perform extensive experiments on the effects of task granularity on an Apache Spark cluster and, based on these, develop a four-parameter model of task and job overhead that, in simulation, produces sojourn time distributions matching those of the real system. We also present analytical results showing how tiny tasks enlarge the stability region of split-merge systems, together with analytical bounds on the sojourn and waiting time distributions of both split-merge and single-queue fork-join systems with tiny tasks. Finally, we combine the overhead model with the analytical models to obtain an analytical approximation to the sojourn and waiting time distributions of systems with tiny tasks that include overhead. Although these approximations are no longer strict bounds, they match the Spark experimental results very well in both the split-merge and fork-join cases.
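The granularity trade-off can be illustrated with a small Monte Carlo sketch of one job's service time in a split-merge system: a job with a fixed total expected work of l units is split into k tasks, and l workers each pull the next task as soon as they become free. This is not the paper's four-parameter overhead model. It assumes exponentially distributed task sizes and a hypothetical constant overhead per task, and the function names `job_makespan` and `mean_makespan` are illustrative only.

```python
import heapq
import random
import statistics


def job_makespan(k, l, task_mean, overhead, rng):
    """Time for l workers to finish all k tasks of one job.

    Each idle worker pulls the next task; a task costs an exponential
    amount of work (assumed) plus a hypothetical constant overhead.
    """
    free_at = [0.0] * l  # min-heap of times at which each worker becomes free
    for _ in range(k):
        start = heapq.heappop(free_at)  # earliest available worker
        size = rng.expovariate(1.0 / task_mean) + overhead
        heapq.heappush(free_at, start + size)
    return max(free_at)  # the job ends when its last task finishes


def mean_makespan(k, l, overhead, trials=2000, seed=1):
    """Sample mean and standard deviation of the job service time."""
    rng = random.Random(seed)
    task_mean = l / k  # total expected work per job held fixed at l units
    samples = [job_makespan(k, l, task_mean, overhead, rng) for _ in range(trials)]
    return statistics.fmean(samples), statistics.pstdev(samples)


if __name__ == "__main__":
    l = 10
    for overhead in (0.0, 0.05):
        for k in (10, 20, 40, 80, 160, 320, 640):
            mean, std = mean_makespan(k, l, overhead, trials=500)
            print(f"overhead={overhead:<5} k={k:<4} mean={mean:.3f} std={std:.3f}")
```

With zero overhead, increasing k shrinks both the mean and the spread of the job service time toward the ideal value of 1 (l units of work over l workers). With a nonzero per-task overhead, the total overhead per worker grows roughly like k/l times the per-task cost, so beyond some k the service time rises again, which is the trade-off the abstract describes.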

10/20/2016

Non-Asymptotic Delay Bounds for Multi-Server Systems with Synchronization Constraints

Multi-server systems have received increasing attention with important i...
12/16/2016

Optimizing Stochastic Scheduling in Fork-Join Queueing Models: Bounds and Applications

Fork-Join (FJ) queueing models capture the dynamics of system paralleliz...
09/17/2021

Scheduling in Parallel Finite Buffer Systems: Optimal Decisions under Delayed Feedback

Scheduling decisions in parallel queuing systems arise as a fundamental ...
04/05/2020

Achieving Zero Asymptotic Queueing Delay for Parallel Jobs

Zero queueing delay is highly desirable in large-scale computing systems...
05/23/2019

The Supermarket Model with Known and Predicted Service Times

The supermarket model typically refers to a system with a large number o...
12/16/2016

A Generalized Performance Evaluation Framework for Parallel Systems with Output Synchronization

Frameworks, such as MapReduce and Hadoop are abundant nowadays. They see...
