Scheduling in Parallel Finite Buffer Systems: Optimal Decisions under Delayed Feedback

by Anam Tahir, et al.

Scheduling decisions in parallel queuing systems arise as a fundamental problem underlying the dimensioning and operation of many computing and communication systems, such as job routing in data center clusters, multipath communication, and Big Data systems. In essence, the scheduler maps each arriving job to one of several, possibly heterogeneous, servers while pursuing an optimization goal such as load balancing, low average delay or a low loss rate. One main difficulty in finding optimal scheduling decisions is that the scheduler only partially observes the impact of its decisions, e.g., through the delayed acknowledgements of served jobs. In this paper, we provide a partially observable (PO) model that captures scheduling decisions in parallel queuing systems when the only information available comes from delayed acknowledgements. We present a simulation model for this PO system that is used to find a near-optimal scheduling policy in real time with a scalable Monte Carlo tree search algorithm. We show numerically that the resulting policy outperforms other limited-information scheduling strategies, such as variants of Join-the-Most-Observations, and performs comparably to full-information strategies such as Join-the-Shortest-Queue, Join-the-Shortest-Queue(d) and Shortest-Expected-Delay. Finally, we show how our approach can optimize real-time parallel processing using network data provided by Kaggle.
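For readers unfamiliar with the full-information baselines named above, the following is a minimal Python sketch of Join-the-Shortest-Queue, JSQ(d) and Shortest-Expected-Delay, plus a random dispatcher for contrast. It is not the paper's method: it does not implement the partially observable model or the Monte Carlo tree search, and every dispatcher here sees the exact queue lengths. The system model (Poisson arrivals, exponential service, each server an M/M/1/B queue with buffer size B, simulated via uniformization) and all function names (`simulate_loss`, `make_sed`, etc.) are hypothetical choices made for this illustration.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical dispatcher signature for this sketch: (queue_lengths, rng) -> server index.
Dispatcher = Callable[[Sequence[int], random.Random], int]


def random_dispatch(q: Sequence[int], rng: random.Random) -> int:
    """Uniform random routing: ignores all state information."""
    return rng.randrange(len(q))


def jsq(q: Sequence[int], rng: random.Random) -> int:
    """Join-the-Shortest-Queue: fewest jobs wins (ties go to the lowest index)."""
    return min(range(len(q)), key=lambda i: q[i])


def make_jsq_d(d: int) -> Dispatcher:
    """JSQ(d): poll d distinct servers uniformly at random, join the shortest one."""
    def policy(q: Sequence[int], rng: random.Random) -> int:
        polled = rng.sample(range(len(q)), d)
        return min(polled, key=lambda i: q[i])
    return policy


def make_sed(mu: Sequence[float]) -> Dispatcher:
    """Shortest-Expected-Delay: minimize (q_i + 1) / mu_i, the expected sojourn
    time of a new job at an exponential FCFS server with rate mu_i."""
    def policy(q: Sequence[int], rng: random.Random) -> int:
        return min(range(len(q)), key=lambda i: (q[i] + 1) / mu[i])
    return policy


def simulate_loss(policy: Dispatcher, mu: Sequence[float], buffer_size: int,
                  arrival_rate: float, n_events: int, seed: int = 0) -> float:
    """Estimate the fraction of arrivals lost at full buffers in parallel
    M/M/1/B queues, using a uniformized (discrete-event) simulation."""
    rng = random.Random(seed)
    q: List[int] = [0] * len(mu)
    total_rate = arrival_rate + sum(mu)  # uniformization constant
    arrivals = dropped = 0
    for _ in range(n_events):
        u = rng.random() * total_rate
        if u < arrival_rate:  # the event is a job arrival
            arrivals += 1
            i = policy(q, rng)
            if q[i] < buffer_size:
                q[i] += 1
            else:
                dropped += 1  # the chosen buffer is full: the job is lost
            continue
        u -= arrival_rate
        for i, rate in enumerate(mu):  # service completion (dummy if queue empty)
            if u < rate:
                if q[i] > 0:
                    q[i] -= 1
                break
            u -= rate
    return dropped / max(arrivals, 1)


if __name__ == "__main__":
    mu = [1.0, 1.0, 1.0]
    policies = [("random", random_dispatch), ("JSQ", jsq),
                ("JSQ(2)", make_jsq_d(2)), ("SED", make_sed(mu))]
    for name, pol in policies:
        loss = simulate_loss(pol, mu, buffer_size=5, arrival_rate=2.7, n_events=100_000)
        print(f"{name:7s} loss rate ~ {loss:.4f}")
```

All three baselines assume the dispatcher knows the current queue lengths exactly. That assumption is what the paper's delayed-acknowledgement setting removes, which is why they serve as full-information reference points rather than as competitors operating under the same constraints.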


Optimal Load Balancing in Bipartite Graphs

Applications in cloud platforms motivate the study of efficient load bal...

Job Dispatching Policies for Queueing Systems with Unknown Service Rates

In multi-server queueing systems where there is no central queue holding...

Analysis of the Symmetric Join the Shortest Orbit Queue

This work introduces the join the shortest queue policy in the retrial s...

The Tiny-Tasks Granularity Trade-Off: Balancing overhead vs. performance in parallel systems

Models of parallel processing systems typically assume that one has l wo...

Optimizing Stochastic Scheduling in Fork-Join Queueing Models: Bounds and Applications

Fork-Join (FJ) queueing models capture the dynamics of system paralleliz...

Fast Distributed Inference Serving for Large Language Models

Large language models (LLMs) power a new generation of interactive AI ap...

Non-Asymptotic Delay Bounds for Multi-Server Systems with Synchronization Constraints

Multi-server systems have received increasing attention with important i...
