Delay Asymptotics and Bounds for Multi-Task Parallel Jobs

10/01/2017
by Weina Wang, et al.

We study the delay of jobs that consist of multiple parallel tasks, a critical performance metric in a wide range of applications such as data file retrieval in coded storage systems and parallel computing. In this problem, a job is completed only when all of its tasks are completed, so the delay of a job is the maximum of the delays of its tasks. Despite the wide attention this problem has received, a tight analysis remains largely open, because analyzing job delay requires characterizing the complicated correlation among task delays. We first consider an asymptotic regime where the number of servers, n, goes to infinity, and the number of tasks in a job, k^(n), is allowed to increase with n. We establish the asymptotic independence of any k^(n) queues under the condition k^(n) = o(n^(1/4)). This greatly generalizes the asymptotic-independence results in the literature, where asymptotic independence is shown only for a fixed constant number of queues. As a consequence of our independence result, the job delay converges to the maximum of independent task delays. We next consider the non-asymptotic regime. Here we prove that independence yields a stochastic upper bound on job delay for any n and any k^(n) with k^(n) < n. The key component of our proof is a new technique we develop, called "Poisson oversampling". Our approach converts the job delay problem into a corresponding balls-and-bins problem. However, in contrast with typical balls-and-bins problems, where there is a negative correlation among bins, we prove that our variant exhibits positive correlation.
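To make "the maximum of independent task delays" concrete, the following is a minimal Python sketch, not the paper's model or method. It assumes, purely for illustration, that each task delay is exponentially distributed with a common rate (as the sojourn time of an M/M/1 queue would be); the function names and the parameter `rate` are hypothetical. Under that assumption, the maximum of k independent task delays has CDF (1 - e^(-rate t))^k and mean H_k / rate, where H_k is the k-th harmonic number, and a small Monte Carlo run can be compared against the closed form.

```python
import math
import random


def job_delay_cdf(t: float, k: int, rate: float) -> float:
    """P(max of k independent Exp(rate) task delays <= t).

    Assumption (illustrative only): task delays are i.i.d. exponential.
    """
    if t <= 0:
        return 0.0
    return (1.0 - math.exp(-rate * t)) ** k


def mean_job_delay(k: int, rate: float) -> float:
    """E[max of k i.i.d. Exp(rate)] = H_k / rate (H_k: k-th harmonic number)."""
    return sum(1.0 / i for i in range(1, k + 1)) / rate


def simulate_mean_job_delay(k: int, rate: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the mean of the maximum of k independent delays."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # A job finishes only when its slowest task finishes.
        total += max(rng.expovariate(rate) for _ in range(k))
    return total / trials


if __name__ == "__main__":
    k, rate = 8, 0.5  # hypothetical values chosen for the demo
    print("closed-form mean:", mean_job_delay(k, rate))
    print("simulated mean:  ", simulate_mean_job_delay(k, rate, trials=100_000))
    print("P(job delay <= 5):", job_delay_cdf(5.0, k, rate))
```

Per the abstract's non-asymptotic result, the independent-delays quantity serves as a stochastic upper bound on the true job delay; the sketch only evaluates that independent quantity and does not simulate the correlated queueing system itself.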
