Scalable Load Balancing in Networked Systems: Universality Properties and Stochastic Coupling Methods

12/22/2017
by Mark van der Boor, et al.

We present an overview of scalable load balancing algorithms which provide favorable delay performance in large-scale systems, and yet only require minimal implementation overhead. Aimed at a broad audience, the paper starts with an introduction to the basic load balancing scenario, consisting of a single dispatcher where tasks arrive that must immediately be forwarded to one of N single-server queues. A popular class of load balancing algorithms consists of the so-called power-of-d or JSQ(d) policies, where an incoming task is assigned to a server with the shortest queue among d servers selected uniformly at random. This class includes the Join-the-Shortest-Queue (JSQ) policy as a special case (d = N), which has strong stochastic optimality properties and yields a mean waiting time that vanishes as N grows large for any fixed subcritical load. However, a nominal implementation of the JSQ policy involves a prohibitive communication burden in large-scale deployments. In contrast, a random assignment policy (d = 1) does not entail any communication overhead, but the mean waiting time remains constant as N grows large for any fixed positive load. In order to examine the fundamental trade-off between performance and implementation overhead, we consider an asymptotic regime where d(N) depends on N. We investigate what growth rate of d(N) is required to match the performance of the JSQ policy on the fluid and diffusion scales. The results demonstrate that the asymptotics for the JSQ(d(N)) policy are insensitive to the exact growth rate of d(N), as long as the latter is sufficiently fast, implying that the optimality of the JSQ policy can asymptotically be preserved while dramatically reducing the communication overhead. We additionally show how the communication overhead can be reduced even further by the so-called Join-the-Idle-Queue scheme, which leverages memory at the dispatcher.
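To make the dispatching rules concrete, here is a minimal Python sketch, not the authors' code, that compares JSQ(d) for d = 1 (random assignment), small d and d = N (JSQ) with a Join-the-Idle-Queue variant. The model is an assumption for illustration: Poisson arrivals at total rate λN, unit-rate exponential service at each of N FCFS single-server queues, simulated by uniformization. For JIQ it is assumed that a server notifies the dispatcher the moment it becomes idle, and that the dispatcher falls back to a uniformly random server when it remembers no idle server. The function names `jsq_d` and `simulate` are hypothetical, and the reported metric (fraction of tasks that join a non-empty queue, i.e. must wait) is a proxy for the waiting time discussed above.

```python
import random


def jsq_d(queues, d, rng):
    """JSQ(d): sample d distinct servers uniformly at random and return the index
    of one with the fewest tasks (ties go to the first sampled, i.e. at random)."""
    sampled = rng.sample(range(len(queues)), d)
    return min(sampled, key=lambda i: queues[i])


def simulate(n_servers, lam, policy, n_arrivals, d=1, seed=0):
    """Return the fraction of arriving tasks that join a non-empty queue.

    Assumed model: Poisson arrivals at rate lam * N, exp(1) service per server.
    Uniformization with total event rate (lam + 1) * N: each step is an arrival
    w.p. lam / (lam + 1), otherwise a potential completion at a uniformly chosen
    server (a dummy event if that server is idle).
    """
    rng = random.Random(seed)
    queues = [0] * n_servers
    # JIQ memory at the dispatcher: IDs of servers that reported being idle.
    idle = set(range(n_servers))
    arrivals = waited = 0
    while arrivals < n_arrivals:
        if rng.random() < lam / (lam + 1.0):
            # Arrival: must be forwarded immediately to one server.
            if policy == "jsq_d":
                i = jsq_d(queues, d, rng)
            elif policy == "jiq":
                # Use a remembered idle server if any, else pick one at random.
                i = idle.pop() if idle else rng.randrange(n_servers)
            else:
                raise ValueError(f"unknown policy: {policy}")
            if queues[i] > 0:
                waited += 1
            queues[i] += 1
            arrivals += 1
        else:
            # Potential service completion at a uniformly chosen server.
            i = rng.randrange(n_servers)
            if queues[i] > 0:
                queues[i] -= 1
                if queues[i] == 0 and policy == "jiq":
                    idle.add(i)  # server becomes idle and notifies the dispatcher
    return waited / arrivals


if __name__ == "__main__":
    N, lam = 100, 0.8
    for label, kwargs in [
        ("random (d=1)", dict(policy="jsq_d", d=1)),
        ("JSQ(2)", dict(policy="jsq_d", d=2)),
        ("JSQ (d=N)", dict(policy="jsq_d", d=N)),
        ("JIQ", dict(policy="jiq")),
    ]:
        print(label, simulate(N, lam, n_arrivals=30_000, seed=1, **kwargs))
```

Under these assumptions each queue is an M/M/1 queue with load λ when d = 1, so roughly a fraction λ of tasks should have to wait, whereas JSQ and JIQ only make a task wait when no server is idle. Note the cost asymmetry the abstract refers to: `jsq_d` inspects d queue lengths per task, while the JIQ branch only reads the dispatcher's local `idle` set and receives one message per idle period.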
