Power-of-d-Choices with Memory: Fluid Limit and Optimality

by Jonatha Anselmi, et al.
Université de Bordeaux

In multi-server distributed queueing systems, the access of stochastically arriving jobs to resources is often regulated by a dispatcher, also known as load balancer. A fundamental problem consists in designing a load balancing algorithm that minimizes the delays experienced by jobs. During the last two decades, the power-of-d-choices algorithm, based on the idea of dispatching each job to the least loaded of d servers randomly sampled at the arrival of the job itself, has emerged as a breakthrough in the foundations of this area due to its versatility and appealing asymptotic properties. In this paper, we consider the power-of-d-choices algorithm with the addition of a local memory that keeps track of the latest observations collected over time on the sampled servers. Each job is then sent to a server with the lowest observation. We show that this algorithm is asymptotically optimal in the sense that the load balancer can always assign each job to an idle server in the large-server limit. This holds true if and only if the system load λ is less than 1-1/d. If this condition is not satisfied, we show that queue lengths are bounded by j^*+1, where j^* ∈ ℕ is given by the solution of a polynomial equation. This is in contrast with the classic version of the power-of-d-choices algorithm, where queue lengths are unbounded. Our upper bound on the size of the most loaded server, j^*+1, is tight and increases slowly as λ approaches its critical value from below. For instance, when λ = 0.995 and d = 2 (respectively, d = 3), we find that no server will contain more than just 5 (3) jobs in equilibrium. Our results quantify and highlight the importance of using memory as a means to enhance performance in randomized load balancing.






1 Introduction

In multi-server distributed queueing systems, the access of stochastically arriving jobs to resources, or servers, is often regulated by a central dispatcher, also known as load balancer. A fundamental problem consists in designing a load balancing algorithm able to minimize the delays experienced by jobs. In this paper, we are interested in a setting where a traffic of rate λN needs to be distributed across N unit-rate parallel servers, each with its own queue, as indicated in Figure 1.

Figure 1: Architecture of the distributed system for load balancing.

The load balancer may rely on feedback information coming from the servers, which may also be stored in a local memory. This type of model finds applications, for instance, in computer and communication systems, hospitals, and road networks, and it is not surprising that there exists a significant and growing number of references; see, e.g., the recent works Ying et al. [16], Gardner et al. [8], Gamarnik et al. [7], Gupta and Walton [9] and the references therein. Nevertheless, it is often difficult to establish whether an algorithm is better than another because in general the answer strongly depends on the underlying architecture, application or traffic conditions.

During the last two decades, the power-of-d-choices algorithm, introduced in Mitzenmacher [11], Vvedenskaya et al. [15] and referred to as SQ(d), has emerged as a breakthrough in the foundations of this area due to its versatility and its appealing asymptotic properties. It works as follows: upon arrival of each job, d servers are contacted uniformly at random, their state (e.g., queue length or workload) is retrieved, and finally the job is dispatched to a server in the best observed state. The first remarkable property is that in the large-server limit, the stationary proportion of servers with at least i jobs decreases doubly exponentially in i, though it remains strictly positive for all i. This result has been generalized in Bramson et al. [3] to the case where service times are heavy-tailed rather than exponential. It turns out that SQ(d) is heavy-traffic optimal in the sense that it minimizes the workload over all time in the diffusion limit; see Chen and Ye [4] and Maguluri et al. [10]. In Ying et al. [16], it is shown that the number of sampled servers can be dramatically reduced by exploiting the fact that tasks arrive in batches, which is useful to reduce communication overheads. In Mitzenmacher et al. [12], the power-of-d-choices algorithm is studied in the case where the load balancer is endowed with a local memory that stores the index and the state of the least loaded server out of the d sampled each time a job arrives. When a job arrives, the winning server is chosen among the d servers randomly selected upon its arrival and the least loaded server seen by the previous job. The resulting performance is better than the one achieved by SQ(2d). If d is allowed to depend on the system size, SQ(d) has recently been shown to become fluid (or mean-field) optimal, i.e., optimal in the large-system limit; see Mukherjee et al. [13], Dieker and Suk [5]. In this setting, optimality refers to the ability of assigning each incoming job to an idle server.
Our work also aims at achieving fluid optimality, though we will consider d as a constant to keep the communication overhead at a minimum. Towards this purpose, we will show that it is enough to endow the load balancer with a local memory that keeps track of the latest observation collected on each server. This approach is close to Mitzenmacher et al. [12], though different, because in that reference the memory can only store one observation. In fact, one observation (or even a finite number of observations) is not enough to achieve fluid optimality.
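For the classic SQ(d) without memory, the doubly-exponential decay mentioned above has a well-known closed form in the mean-field limit (Mitzenmacher [11], Vvedenskaya et al. [15]): the stationary fraction of servers with at least i jobs is λ^((d^i - 1)/(d - 1)). The short computation below, included purely as an illustration, contrasts this tail with the geometric tail λ^i of purely random routing (d = 1):

```python
# Stationary tail of SQ(d) in the mean-field limit: the fraction of servers
# with at least i jobs is s_i = lam ** ((d**i - 1) / (d - 1)) for d >= 2,
# versus s_i = lam ** i under purely random routing (d = 1).
def sq_d_tail(lam, d, i):
    if d == 1:
        return lam ** i
    return lam ** ((d ** i - 1) / (d - 1))

lam = 0.9
for i in range(1, 6):
    print(i, sq_d_tail(lam, 1, i), sq_d_tail(lam, 2, i))
```

Already at λ = 0.9, the fraction of servers with at least 5 jobs drops from roughly 0.59 under random routing to below 0.04 under SQ(2), while remaining strictly positive for every i; the memory variant studied in this paper removes even this residual tail.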

In Algorithm 1, we provide a pseudocode for the power-of-d-choices algorithm with memory and N servers, referred to below as SQ(d) with memory; in the Conclusions, we will describe some variants and practical optimizations. Upon arrival of a job, the states collected from d randomly chosen servers are stored in the local array Memory. Then, the job is sent to a server chosen randomly among the ones having the lowest recorded state. Finally, the observation of the selected server is incremented by one. Server selections are thus with replacement.

1:procedure SQ(d, N)
2:     Memory[i] = 0 for all i;
3:     for each job arrival do
4:         for i = 1, ..., d do
5:              rnd_server = random(1, N);
6:              Memory[rnd_server] = get_state(rnd_server);
7:         end for
8:         selected_server = random(argmin(Memory));
9:         send_job_to(selected_server);
10:         Memory[selected_server]++;
11:     end for
12:end procedure
Algorithm 1: Power-of-d-choices with memory and N servers.
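The pseudocode can be checked against a small simulation of the embedded jump chain of the Markov model in Section 2.1 (Poisson arrivals of rate λN, unit-rate exponential servers). The sketch below is ours: function and parameter names are not from the paper, and it is a toy sanity check rather than the model used in the analysis:

```python
import random

def simulate_sq_memory(n_servers, d, lam, n_events, seed=0):
    """Toy simulation of Algorithm 1 on the embedded jump chain:
    the next event is an arrival with probability lam*n/(lam*n + busy),
    and otherwise a departure from a uniformly chosen busy server."""
    rng = random.Random(seed)
    queues = [0] * n_servers   # true queue lengths
    memory = [0] * n_servers   # load balancer's last observation of each server
    for _ in range(n_events):
        busy = sum(1 for q in queues if q > 0)
        arrival_rate = lam * n_servers
        if rng.random() < arrival_rate / (arrival_rate + busy):
            for _ in range(d):            # sample d servers with replacement (Lines 4-7)
                s = rng.randrange(n_servers)
                memory[s] = queues[s]
            lowest = min(memory)          # route to a random lowest-observation server (Line 8)
            chosen = rng.choice([i for i, v in enumerate(memory) if v == lowest])
            queues[chosen] += 1
            memory[chosen] += 1           # Line 10 of Algorithm 1
        else:
            # departure from a uniformly chosen busy server
            chosen = rng.choice([i for i, q in enumerate(queues) if q > 0])
            queues[chosen] -= 1
    return queues, memory

queues, memory = simulate_sq_memory(n_servers=200, d=2, lam=0.4, n_events=20000, seed=1)
print(max(queues), max(memory))
```

One property worth noting in this sketch is that Memory[i] never underestimates the true queue length: observations only become stale through departures, so a server recorded as idle really is idle.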

It is intuitive that SQ(d) with memory results in more balanced allocations than SQ(d). This follows by using the coupling argument developed in Theorem 3.5 of Azar et al. [1], which can be adapted to argue that at any point in time the vector of queue lengths achieved with SQ(d) with memory is majorized by the vector of queue lengths achieved with SQ(d). On the other hand, it is not clear how large such an improvement can be. Quantifying it is the goal of the present paper.
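To get a quantitative feel for the improvement, one can simulate the two policies side by side. The toy comparison below is our own sketch with arbitrary parameters (it is not the coupling of Azar et al. [1]): it runs classic SQ(d), which discards its samples after each assignment, against the memory variant of Algorithm 1 on the same embedded jump chain:

```python
import random

def run(n, d, lam, n_events, use_memory, seed):
    """Simulate either classic SQ(d) or SQ(d) with memory (toy sketch)."""
    rng = random.Random(seed)
    queues = [0] * n
    memory = [0] * n
    for _ in range(n_events):
        busy = sum(1 for q in queues if q > 0)
        if rng.random() < (lam * n) / (lam * n + busy):
            sampled = [rng.randrange(n) for _ in range(d)]
            for s in sampled:
                memory[s] = queues[s]
            if use_memory:
                # memory variant: pick among ALL servers with the lowest observation
                lowest = min(memory)
                chosen = rng.choice([i for i, v in enumerate(memory) if v == lowest])
            else:
                # classic SQ(d): pick the least loaded among the d sampled only
                chosen = min(sampled, key=lambda s: queues[s])
            queues[chosen] += 1
            memory[chosen] += 1
        else:
            queues[rng.choice([i for i, q in enumerate(queues) if q > 0])] -= 1
    return queues

lam, d, n = 0.9, 2, 300
classic = run(n, d, lam, 50_000, use_memory=False, seed=7)
with_mem = run(n, d, lam, 50_000, use_memory=True, seed=7)
print(max(classic), max(with_mem))
```

On typical runs the memory variant carries visibly fewer jobs in total and a smaller maximum queue, in line with the majorization argument sketched above.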

We investigate the dynamics of SQ(d) with memory by means of a continuous-time Markov chain that keeps track of the proportion of servers with j jobs and for which the last observation collected by the load balancer is k, for all j and k. To the best of our knowledge, this is the first paper that studies the time-varying dynamics induced by SQ(d) with memory. The transition rates of this chain are non-Lipschitz, and a satisfactory analysis when the number of servers N is finite seems to be out of reach. Our main contributions are as follows:

  1. In Theorem 1, we let N → ∞ and identify the fluid limit of the Markov chain, an absolutely continuous function that is interpreted as a first-order approximation of the original model. The fluid limit is motivated by the fact that real systems are composed of many servers and by the fact that it enables a tractable analysis of the dynamics of SQ(d) with memory. A fluid limit is necessarily a fluid solution, as introduced in Definition 1. The proof of the fluid limit is the main technical part of this work and is given in Section 4. The main difficulty lies in the discontinuous structure of the drift; see Section 2.2 for further details.

  2. We then study fixed points, i.e., fluid solutions that are constant over time. Theorem 2 shows that there exists a unique fixed point. The general structure of this fixed point as a function of λ is quite cumbersome and implies that

    • The queue length of each server never exceeds j^*+1, where j^* will be defined in Section 3 by means of a polynomial equation. This is in contrast with SQ(d), where queue lengths are unbounded; see Mitzenmacher [11]. Figure 2 illustrates the behaviour of the tight upper bound j^*+1 by varying λ and d, and shows that the size of the most loaded server remains very small even when λ is very close to its critical value.

      Figure 2: Plots of the maximum queue length, j^*+1, by varying λ and d.

      In fact, even when λ = 0.995 and d = 2 (respectively, d = 3), no server will contain more than just 5 (3) jobs.

    • The load balancer memory can only contain two possible observations, namely j^* and j^*+1.

    The case of particular interest is when λ < 1-1/d, where j^* = 0 and thus the load balancer memory always contains a strictly positive proportion of zeros. This means that the load balancer can always assign incoming jobs to idle servers, which is clearly the ideal situation for any incoming job. In this sense we say that SQ(d) with memory is asymptotically optimal. When λ ≥ 1-1/d, the load balancer memory will never contain a strictly positive mass of zeros, but it will still be able to assign a fraction of jobs to idle servers, ensuring that the average number of jobs in each queue belongs to the interval [j^*, j^*+1] (Proposition 2).

  3. Finally, we investigate stability properties of the unique fixed point. Theorem 3 establishes that fluid solutions converge to this point regardless of the initial condition, and exponentially fast, provided that λ < 1-1/d. Thus, in this case all fluid solutions are eventually asymptotically optimal, as the load balancer memory will eventually be populated by a strictly positive mass of zeros. The proof of this result, given in Section 5.3, is based on a Lyapunov-type argument that allows us to show that the time evolution of fluid solutions is eventually governed by the unique solution of a simple linear and autonomous ODE system.

2 Performance Models

In order to describe the time-varying effects of SQ(d) with memory on queue lengths, we introduce a stochastic and a deterministic model. The stochastic model is meant to capture the variability of job inter-arrival and service times that is intrinsic to multi-server distributed queueing systems. Due to its intractability, a satisfactory analysis of this model is out of reach. In this respect, the deterministic model is convenient because it does enable analytical tractability. In this section, we also show our first result, which states that the two models are connected to each other: the deterministic model can be interpreted as a first-order approximation of the stochastic one.

In the following, we will refer to a server with j jobs and for which the last observation at the controller is k as a (j,k)-server.

2.1 Markov Model

First, we model the dynamics induced by SQ(d) with memory as a continuous-time Markov chain: arrivals at the load balancer are assumed to follow a Poisson process with rate λN, with λ < 1, and service times are independent, exponentially distributed random variables with unit mean. Servers process jobs according to any work-conserving discipline, and each of them can contain at most a finite number of jobs; a job that is sent to a full server is rejected.

The system state at time t records, for each server n, the number of jobs in queue at server n at time t and the last observation collected from server n by the controller up to time t. At time zero, the controller is assumed to know the initial vector of queue lengths.

Since the system is symmetric with respect to the servers, it is convenient to represent the system state by the vector whose (j,k)-th component denotes the proportion of (j,k)-servers at time t. It is clear that this process is still a Markov chain, with values in some finite set.
The transitions and rates of the Markov chain that are due to job departures are easy to write because they have no impact on the memory: for j ≥ 1, a departure turns a (j,k)-server into a (j-1,k)-server, and it occurs at unit rate at each busy server. On the other hand, the transitions and rates that are due to job arrivals are quite complex to write and are omitted. However, in Section 4.1 we will show how to construct the sample paths of the process.

2.2 Fluid Model

For any , let

The next definition introduces the fluid model for the dynamics of the system.

Definition 1.

A function is said to be a fluid model (or fluid solution) if the following conditions are satisfied:

  1. it is absolutely continuous, and

  2. almost everywhere, for every and ,

where is given by




and .

The discontinuous function above will be referred to as the drift and, to some extent, it may be interpreted as the conditional expected change from the current state of the Markov chain, though this interpretation may only be accurate away from the boundary, where the formulas above become linear and admit a very intuitive explanation.

Let us provide some intuition for the drift expressions in Definition 1, starting with coordinates (0,0). At the moment of each arrival at the load balancer, the states of d servers are sampled, and idle servers that the load balancer has not yet spotted are sampled with a certain probability. Since arrivals occur with rate λN, the rationale behind the first term in (3) is well understood. The dynamics that remain to be specified are the ones related to the effective job assignments, that is, where singularities can happen. In order to build a fluid model 'consistent' with the finite stochastic system, one should take into account the fluctuations that appear when the proportion of (0,0)-servers hits zero. These bring discontinuities into the drift. The quantity defined in (8) will be interpreted as the proportion of time the process spends at zero in the limit where first the number of servers and then time grow large; this will be formalized in Section 4.3.2. Since one of its factors is the probability of sampling only servers containing jobs, the product is then interpreted as the rate at which the process tends to remain at zero. Thus, the corresponding term represents the rate at which jobs are assigned to (0,0)-servers, which become (1,1)-servers as soon as they receive a job. This explains the drift expression in (3). The particular structure of the quantity given in (8) will be the outcome of the stochastic analysis performed in Section 4.

Let us provide some intuition also for the drift expression on coordinates (1,1) (see (5)), as it brings some additional interpretation that also applies to general coordinates. The first term says that departures from (1,1)-servers occur at the corresponding rate, and the second one says that new (1,1)-servers are discovered at the corresponding sampling rate. This can be easily justified as done above for coordinates (0,0). Then, we notice that one term has already been interpreted above, and thus the dynamics that remain to be specified are the ones related to job assignments at (1,1)-servers. According to SQ(d) with memory, if the load balancer knows no (0,0)-server, then it randomizes over the set of (1,1)-servers, and thus within this scenario the proportion of (1,1)-servers should decrease at a rate proportional to the arrival rate; the corresponding term is the rate at which jobs are assigned to (1,1)-servers in this regime. It remains to model the rate at which jobs are assigned to (1,1)-servers in the complementary regime. Since we aim at building a deterministic model 'consistent' with the stochastic one, one should again take into account the fluctuations that appear at the boundary; the remaining term is indeed such a rate, and again it will be the outcome of the stochastic analysis developed in Section 4.

The following proposition will be proven in Section 4.

Proposition 1.

Fluid solutions exist.

2.3 Connecting the Markov and the Fluid Models

Our first result is the following connection between the stochastic and the fluid models.

Theorem 1.

Assume that the initial conditions converge almost surely. Then, with probability one, any limit point of the sequence of stochastic processes satisfies the conditions that define a fluid solution.

In view of this result, proven in Section 4, a fluid solution may be interpreted as an accurate approximation of the time-dependent dynamics of the finite stochastic system when N is sufficiently large.

Given , let us define the functions

We notice that the first function represents the number of jobs in the system at time t scaled by N, and that the second represents the number of jobs, scaled by N, that the load balancer believes are in the system at time t. Since the system is symmetric with respect to the servers, the first function is also interpreted as the average number of jobs at time t in each queue.

It is clear that the believed number of jobs never underestimates the actual one, which is to be expected because (j,k)-servers cannot contain more than k jobs.

The following corollary of Theorem 1 is immediate.

Corollary 1.

Let a fluid solution be given, and assume that it is a limit point of the sequence of stochastic processes with probability one. Then, the corresponding fluid quantities are limit points of their stochastic counterparts, respectively, with probability one.

We complement Theorem 1 and Corollary 1 by presenting some numerical simulations to support the claim that the fluid model provides a remarkably accurate approximation of the sample paths of the stochastic model even when N is finite and relatively small. Figure 3 plots the time-dependent dynamics of both models.

Figure 3: Numerical convergence of the stochastic model (continuous lines) to the fluid model (dashed line).

At time zero, we have chosen initial conditions such that all servers are idle and the load balancer is aware of it. Each curve on these plots is an average over ten simulations. The fluid (stochastic) model is always represented by dashed (continuous) lines. In the picture on the left, we notice that the fluid model already captures in an accurate manner the dynamics, which turn out to be concentrated more and more on just three components, namely (0,0), (0,1) and (1,1). As a matter of fact, the total mass of these components gets closer and closer to 1 when both N and t increase. In the picture on the right, the dynamics are distributed over several components, and for convenience we have plotted the scaled number of jobs in the system together with its fluid-model counterpart. We notice that the stochastic trajectories almost overlap the fluid one already for the largest value of N considered. This size is in agreement with the magnitude of modern distributed computing systems such as web-server farms or data-centers, as they are often composed of (tens of) thousands of servers.

3 Main Results

In this section we focus on fluid solutions and show properties concerning their optimality and stability. First, we are interested in fixed points.

Definition 2.

We say that a fluid solution is a fixed point if it is constant in t.

When fluid solutions are fixed points, we drop the dependency on t.

Let us introduce the intervals

where λ_0 = 1-1/d and λ_j, for j ≥ 1, is the unique root in (0,1) of the polynomial equation

Let also j^* ∈ ℕ be the unique integer such that λ ∈ I_{j^*}, and for simplicity let us assume that λ is in the interior of I_{j^*}.

Theorem 2 (Existence and Uniqueness of Fixed Points).

There exists a unique fixed point. It satisfies


This result, proven in Section 5.1, establishes the existence and uniqueness of a fixed point and says that its mass is concentrated only on coordinates whose observation component is j^* or j^*+1. Thus, our first remark is that queue lengths are bounded by j^*+1. As we show in our proof, an explicit expression for the fixed point seems to be out of reach, though it can be easily computed numerically once λ and d are fixed. In fact, in Section 5.1 we provide an explicit expression for the fixed point when λ < 1-1/d as a function of λ and d, and identify j^* by means of a polynomial equation (see (60)).

A case of particular interest is when λ < 1-1/d, i.e., when j^* = 0, where we have the following remark.

Remark 1 (Asymptotic Optimality).

If λ < 1-1/d, then Theorem 2 implies that the fixed point is supported on the coordinates (0,0), (0,1) and (1,1) only, and vanishes on the remaining coordinates. Thus, provided that dynamics converge to the fixed point, we have shown that a load balancer implementing SQ(d) with memory is always aware of the fact that some servers are idle when λ < 1-1/d and N is sufficiently large, because the proportion of (0,0)-servers is strictly positive. In this scenario, the load balancer can certainly assign each incoming job to one of these idle servers, and the job itself incurs zero delay. This is in fact the ideal situation for any arriving job, and in this sense we say that SQ(d) with memory is asymptotically optimal.

The next proposition provides further insights on the system performance at the fixed point .

Proposition 2.

Let the fixed point be as in Theorem 2. Then,




Proposition 2, proven in Section 5.2, provides simple bounds on the average number of jobs in each queue. It also says that there is a fluid mass that the load balancer will never spot. In other words, the samplings performed by the load balancer at each arrival will correctly reconstruct the true state of the system only up to a small (absolute) error.

In Remark 1, we discussed the asymptotic optimality of SQ(d) with memory postulating some form of stability for fluid solutions when λ < 1-1/d. The next result shows that fluid solutions are indeed globally stable when λ < 1-1/d.

Theorem 3 (Global Stability).

Let a fluid solution be given. If λ < 1-1/d, then


where ‖·‖ denotes the Euclidean norm.

The proof of Theorem 3 is given in Section 5.3 and is based on the following 'Lyapunov-type' argument. When the memory contains no mass of zeros, we first show that a suitable quantity decreases with derivative bounded away from zero. Thus, in finite time the mass of zeros in memory must increase, and when it does we show that the fluid solution is uniquely determined by the unique solution of a simple linear and autonomous ODE system. At this point, (13) follows by standard results of ODE theory. Our proof actually says more than the statement of the theorem itself because it provides insights on the speed of convergence to the fixed point. Given that the general solution of the previous linear ODE has the matrix-exponential form

and the eigenvalues of

will be shown to be strictly negative, the convergence to the fixed point occurs exponentially fast. When λ ≥ 1-1/d, a generalization of this argument is complicated by the involved structure of the drift. However, we conjecture that the fixed point is again globally stable, i.e., that (13) still holds true. This is also confirmed by the numerical simulations shown in Section 2.3.
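The final step of this argument, exponential convergence of a linear ODE whose matrix has strictly negative eigenvalues, is standard and easy to visualize numerically. In the sketch below, A is an arbitrary 2×2 stand-in with negative eigenvalues, not the paper's actual drift matrix:

```python
import numpy as np

# Solutions of y' = A y with all eigenvalues of A strictly negative satisfy
# y(t) = expm(A t) y(0) -> 0 exponentially fast. A is a toy example here.
A = np.array([[-2.0, 1.0],
              [0.0, -1.0]])   # upper-triangular: eigenvalues -2 and -1
y0 = np.array([5.0, -4.0])

def expm(M):
    # matrix exponential via eigendecomposition (M is diagonalizable here)
    w, V = np.linalg.eig(M)
    return (V * np.exp(w)) @ np.linalg.inv(V)

def y(t):
    return np.real(expm(A * t) @ y0)

for t in (0.0, 5.0, 10.0):
    print(t, np.linalg.norm(y(t)))
```

The norm of the solution decays like e^{-t} here, governed by the eigenvalue closest to zero; in the paper's setting the analogous decay rate determines how fast fluid solutions approach the fixed point.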

4 Connection Between the Fluid and the Markov Models

We now prove that the sequence of stochastic processes converges almost surely, as N → ∞, to a fluid solution, for any initial condition. This proves Proposition 1 and Theorem 1.

Our proof is based on three steps. First, we construct the sample paths of the process on each pair of coordinates. This is achieved using a common coupling technique that defines the processes for all N on a single probability space and in terms of a finite number of "fundamental processes". Then, we show that limit trajectories exist and are Lipschitz continuous with probability one. This is done by using standard arguments, e.g., Gamarnik et al. [6], Tsitsiklis and Xu [14], and Bramson [2]. Finally, we prove that any such limit trajectory must be a fluid solution, which is the main difficulty. This last step is based on technical arguments that are specific to the stochastic model under investigation.

4.1 Probability Space and Coupled Construction of Sample Paths

We construct a probability space where the stochastic processes are coupled. All the processes of interest will be functions of the following fundamental processes, all of them independent of each other:

  • , the Poisson processes of job arrivals, with rate , defined on ;

  • , the Poisson processes of potential job departures, with rate 1, defined on ;

  • the selection processes, given by families of random variables that are all independent and uniformly distributed over the interval [0,1]. These will select the servers to sample at each arrival (see Line 5 of Algorithm 1), randomize among the servers having the lowest observations (see Line 8 of Algorithm 1), and select the server that fires a departure. These processes are defined on their own probability space;

  • the process of the initial conditions, where each random variable takes values in the state space, defined on its own probability space.

Each process, for every N, can be constructed on the common probability space by using a standard scaling property of Poisson processes, where equality is meant in distribution. This ensures that the Poisson process with rate λN, which represents the arrival process associated with the N-th system, is coupled with the fundamental Poisson process of job arrivals. The same coupling is also used for the processes of potential job departures.

Now, let the times of the jumps of the fundamental Poisson processes of arrivals and departures be given. In view of the coupling discussed above, we can construct the process on coordinates (0,0) as follows


In the above expression, the term (14a) is related to the action of sampling servers and the term (14b) is related to the action of assigning each job to a server. At the arrival of the n-th job, the proportion of (0,0)-servers increases whenever previously unseen idle servers are sampled, which justifies the term in (14a), and decreases except when this proportion is zero immediately before the arrival and no idle server is sampled at that time, which justifies the term in (14b).

Using the additional uniform random variables, an expression similar to (14) can be written for the remaining coordinates. For simplicity, let us define



We notice that is the number of -servers sampled immediately before time . Furthermore, let also


if , and


if not. We will use these quantities in the scenario where the load balancer memory contains no observation below a given level and no server with fewer jobs is sampled immediately before the arrival, that is, the case where the arriving job cannot be routed to a server in a better state. In this case, according to SQ(d) with memory, the job must be routed to a random server among those with the lowest recorded observation, provided that such a server exists. This randomness is captured by a uniform random variable, and we notice that the relevant quantity is the number of such servers, or equivalently the number of occurrences of the lowest value in the memory of the load balancer, at the arrival of the job and after having performed the associated sampling of the states of d random servers. Under these conditions, the job arriving at that time is routed to one of these servers if and only if the uniform variable falls in the corresponding range.
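The way a single uniform random variable drives a server selection, uniformly in the system size, can be made concrete with a two-line helper (the helper name and the consistency check are ours, not the paper's):

```python
import math
import random

def select_server(u, n):
    # map a uniform sample u in [0, 1) to a server index in {1, ..., n}
    return math.floor(u * n) + 1

rng = random.Random(42)
us = [rng.random() for _ in range(1000)]

# the SAME fundamental sequence (u_1, u_2, ...) drives systems of
# different sizes, so all systems live on one probability space
picks_small = [select_server(u, 10) for u in us]
picks_large = [select_server(u, 100) for u in us]
```

Because one fundamental sequence drives every system size N simultaneously, the selection processes of all systems are defined on a single probability space, which is the essence of the coupled construction described above.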

Provided that , the following formula constructs the process on coordinates


The summations in (18a) and (18b) refer, respectively, to job departures from the two server types involved, the summation in (18c) refers to the case where such servers are sampled (as soon as they are sampled, their recorded state is updated, and thus the corresponding proportion decreases), and the summations in (18d) and (18e) refer to the case where a job is assigned to a server of the first and of the second type, respectively. We notice that a job can be assigned at a given time to such a server only if the memory contains no server with a smaller observation and no server in a better state has been sampled at that time. Summation (18f) covers the boundary case and admits the same interpretation as term (18e).

Similarly, when , we have