# Parallel Load Balancing on Constrained Client-Server Topologies

We study parallel Load Balancing protocols for a client-server distributed model defined as follows. There is a set of n clients and a set of n servers where each client has (at most) a constant number d ≥ 1 of requests that must be assigned to some server. The client set and the server set are connected to each other via a fixed bipartite graph: the requests of client v can only be sent to the servers in its neighborhood N(v). The goal is to assign every client request so as to minimize the maximum load of the servers. In this setting, efficient parallel protocols are available only for dense topologies. In particular, a simple symmetric, non-adaptive protocol achieving constant maximum load has been recently introduced by Becchetti et al [4] for regular dense bipartite graphs. The parallel completion time is O(log n) and the overall work is O(n), w.h.p. Motivated by proximity constraints arising in some client-server systems, we devise a simple variant of Becchetti et al's protocol [4] and we analyse it over almost-regular bipartite graphs where nodes may have neighborhoods of small size. In detail, we prove that, w.h.p., this new version has a cost equivalent to that of Becchetti et al's protocol (in terms of maximum load, completion time, and work complexity, respectively) on every almost-regular bipartite graph with degree Ω(log^2 n). Our analysis significantly departs from that in [4] for the original protocol and requires coping with non-trivial stochastic-dependence issues on the random choices of the algorithmic process, which are due to the worst-case, sparse topology of the underlying graph.

## 1 Introduction

### 1.1 The Framework and our Algorithmic Goal

We study parallel Load-Balancing allocation in client-server distributed systems. We have a client-server bipartite graph $G = (C \cup S, E)$ where: $C$ is the set of clients, each one having a number of requests which is bounded by some constant $d \geq 1$; $S$ is the set of servers; the edge set $E$ represents the client-server assignments which are considered admissible because of proximity constraints (a client can send a request only to the servers in its neighborhood).

The algorithmic goal of the entities is to assign the requests in parallel so as to minimize the maximum server load, where the load of a server is the overall number of requests which have been assigned to it.

To analyze the performance of the proposed protocol for the above distributed task, we adopt the standard synchronous distributed model introduced for parallel balls-into-bins processes by Micah et al in [25]: here, clients and servers are autonomous computing entities that can exchange information (only) over the edges of $G$. Micah et al introduce the class of symmetric, non-adaptive protocols and show several tight bounds on the trade-offs between the maximum load and the complexity (i.e. completion time and work complexity, the latter being the overall number of messages exchanged by the protocol) of the proposed solutions. Informally, a protocol is said to be symmetric if the entities are anonymous, so all the clients (servers) act in the same way and, moreover, all possible request destinations are chosen independently and uniformly at random. The protocol is said to be non-adaptive if each client restricts itself to a fixed number of (possibly random) candidate servers in its neighborhood before communication starts. Symmetric, non-adaptive protocols have the practical merit of being easy to implement and more flexible [25]. Such solutions have interesting applications in Computer Science, such as load balancing in communication networks, request scheduling and hashing [1, 2, 13, 27].

We notice that efficient symmetric, non-adaptive protocols are known (only) for dense regular bipartite graphs and almost-tight lower bounds are known for this important class of parallel protocols [25, 4, 22] (see also Subsection 1.3 for a short description of such results).

The main goal of this paper does not consist of improving previous solutions with respect to specific complexity measures. Rather, still aiming at efficient solutions that achieve bounded maximum load (according to our parameter setting, the maximum load is clearly at least $d$ and we aim at keeping an $O(d)$ bound for it), we focus on symmetric, non-adaptive Load-Balancing protocols that work over restricted, non-dense graph topologies. This natural extension of previous work is inspired by possible network applications where: i) based on previous experiences, a client (a server) may decide to send (accept) the requests only to (from) a fixed subset of trusted servers (clients) and/or ii) clients and servers are placed over a metric space so that only non-random client-server interactions turn out to be feasible because of proximity constraints. Such possible scenarios motivated previous important studies on sequential Load-Balancing algorithms [5, 17, 19]. To the best of our knowledge, efficient solutions for non-dense graphs are in fact available only for the classic sequential model. Here, the client requests are scheduled one at a time so that, for instance, the well-known best-of-$k$-choices strategy [3] can be applied: the loads of the servers are updated at each assignment and each new request is assigned to a server that has the current minimal load out of $k$ servers chosen independently and uniformly at random [5, 17, 19].

As for the parallel distributed model we adopt in this paper, in [4] Becchetti et al propose a symmetric, non-adaptive algorithm, named raes (for Request a link, then Accept if Enough Space), which is based on the well-known threshold criterion [25]. Informally, raes works in rounds, each consisting of two phases. Initially, each client has $d$ balls (the terms ball and request will be used interchangeably). In the first phase of each round, if client $v$ has $d' \leq d$ alive balls (i.e. balls that still have to be accepted by some server), $v$ selects $d'$ servers independently and uniformly at random (with replacement) from $N(v)$. It then submits one alive ball to each selected server. In the second phase of the round, each server accepts all requests received in the first phase of the current round, unless doing so would cause it to exceed the limit of $cd$ accepted balls, where the parameter $c$ is a suitably large constant; if this is the case, the server is said to be saturated and rejects all requests it received in the first phase of the current round. The algorithm completes when no client has further balls to submit.

Observe that servers only give back Boolean answers to the clients' requests and, moreover, if the algorithm terminates, the maximum load of the servers will be at most $cd$. Becchetti et al prove that, over any dense $\Delta$-regular bipartite graph, raes terminates within $O(\log n)$ rounds and the total work is $O(n)$, with high probability (for short, w.h.p.). As usual, we say that an event $\mathcal{E}$ holds with high probability if a constant $\gamma > 0$ exists such that $\Pr(\mathcal{E}) \geq 1 - n^{-\gamma}$. (Not related to our context, the main result in Becchetti et al shows that raes can be used to construct a bounded-degree expander subgraph of $G$, w.h.p.)
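The two-phase round structure described above can be sketched as a small simulation. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the adjacency-list representation `neighbors`, the function name `raes`, the `max_rounds` guard, and the convention of counting one message per request plus one per Boolean answer are all hypothetical choices.

```python
import random

def raes(neighbors, n_servers, d, c, max_rounds=1000, seed=0):
    """Simulate raes; return (rounds, load, work), with rounds=None on timeout."""
    rng = random.Random(seed)
    alive = [d] * len(neighbors)   # alive (not yet accepted) balls per client
    load = [0] * n_servers         # accepted balls per server
    work = 0                       # assumed accounting: request + one-bit answer
    for rnd in range(1, max_rounds + 1):
        # Phase 1: every alive ball picks a uniform random server in N(v).
        inbox = {}
        for v, k in enumerate(alive):
            for _ in range(k):
                inbox.setdefault(rng.choice(neighbors[v]), []).append(v)
        work += 2 * sum(alive)
        # Phase 2: accept the whole batch unless the load would exceed c*d.
        for u, senders in inbox.items():
            if load[u] + len(senders) <= c * d:
                load[u] += len(senders)
                for v in senders:
                    alive[v] -= 1
        if sum(alive) == 0:
            return rnd, load, work
    return None, load, work
```

A saturated server in this sketch rejects only the current batch and may accept a smaller batch in a later round, which matches the informal description of the threshold criterion.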

### 1.2 Our Contribution

We consider a variant of raes, called saer (Stop Accepting if Exceeding Requests), that works like raes with the exception that, whenever a server $u$, in the second phase of a given round, gets an overall load larger than $cd$, then $u$ rejects all requests that arrived in the first phase of the current round and it becomes burned. Once a server gets burned, it will never accept any request in any successive round (see Algorithm 1 in Subsection 2.2).

Similarly to raes, if this new version terminates, then each server will have load at most $cd$ and, hence, the main technical issue is to provide a bound (if any) on the number of rounds required by saer to have every client ball assigned to some server.

We prove that, for any almost-regular bipartite graph of degree $\Delta = \Omega(\log^2 n)$ (recall that $|C| = |S| = n$), it is possible to choose a sufficiently large constant $c$ such that, for any constant request number $d$, the protocol saer terminates within $O(\log n)$ rounds and requires $O(n)$ work, w.h.p.

Informally, by almost-regular bipartite graphs we mean bipartite graphs where the ratio between the minimum degree of the client set and the maximum degree of the server set is bounded by an arbitrary positive constant (see Theorem 1 for a formal definition). Observe that this notion of almost regularity allows a certain variance of the degrees of entities of the same type: just as a (“non-extremal”) example, we may consider a bipartite graph where most of the clients have the minimal degree while a few of them have a larger degree, and most of the servers have the maximal degree while a few of them have a smaller degree.

Algorithm Analysis: An Overview. In the case of dense graphs, the key fact exploited by Becchetti et al’s analysis of the raes algorithm [4] is the following. Since each client has a linear number of servers in its neighborhood, it is possible to fix a sufficiently large constant $c$ such that, at every round, the fraction of non-burned servers in the neighborhood of every client is always at least some positive constant (recall that a server is burned at round $t$ if its load is larger than $cd$). Thanks to a basic counting argument, this fact holds deterministically and independently of the previous load configurations yielded by the process. So, every alive client request has at least constant probability of being accepted at each round: this yields a logarithmic completion time of raes on dense graphs.

In the case of non-dense graphs (i.e. for sublinear node degree), the key property above does not hold deterministically: the fraction of non-burned servers in a fixed neighborhood is a random variable that can even take value $0$ and, very importantly, it depends on the graph topology and on the random choices performed by the nodes during the previous rounds. This scenario makes the analysis considerably harder than that of the dense case. To cope with the above issues, for an arbitrary client $v$, we look at its server neighborhood $N(v)$ and we establish a clean recursive formula that describes the expected decreasing rate of the overall number $r_t(N(v))$ of requests that the neighborhood of $v$ receives at time $t$. This expectation is derived for round $t$ by conditioning on the sequence of the maximum fractions of burned servers in any client’s neighborhood produced by the algorithmic process at rounds $1, \ldots, t-1$. It turns out that, for a sufficiently large $c$, the conditional expected decreasing rate of $r_t(N(v))$ is exponential. Then, using a coupling argument, we derive a concentration bound for this rate that holds as long as the conditional expectation of $r_t(N(v))$ keeps of magnitude $\Omega(\log n)$. To complete our argument, we consider a further (and final) stage of the process (this stage exists only in our analysis and not in the protocol, the latter being symmetric and non-adaptive) that starts when this expectation drops to $O(\log n)$: here, we no longer look at the decreasing rate of $r_t(N(v))$; rather, we show that, w.h.p., the fraction of burned servers in $N(v)$ can increase, along a time window of length $\Theta(\log n)$, by an overall additive term bounded by a small constant. Thanks to this fact, we can then show that the requests that survived the first stage have high chances to be assigned during this last stage if the latter lasts $\Theta(\log n)$ additional rounds.

Remark. We observe that, while the notion of burned server plays a crucial role in our analysis of saer, this notion is stronger than that of saturated servers adopted by the original protocol raes. Hence, our bounds on the termination time and the work complexity of the saer protocol can be easily extended to the original Becchetti et al’s protocol raes.

### 1.3 Previous Work

Load-Balancing algorithms have been the subject of a long and extremely active line of research with important applications in several topics of Computer Science such as hashing, PRAM simulation, scheduling, and load balancing. A well-established and effective way to model such problems is by using the classic balls-into-bins processes. In such processes, there are typically $m$ balls that must be assigned to $n$ bins. In what follows, we use this framework to briefly describe the previous results that are most related to the setting of this work.

Sequential Algorithms on the Complete Bipartite Graph. It is well-known that if $n$ balls are thrown independently and uniformly at random into $n$ bins, the maximum load of a bin is bounded by $O(\log n / \log\log n)$, w.h.p. (see for instance [26]). Azar et al. [3] proved the following breakthrough result. Assume the balls are assigned sequentially, one at a time and, for each ball, $k \geq 2$ bins are chosen independently and uniformly at random, and the ball is assigned to the least full bin (with ties broken arbitrarily). This greedy strategy is also known as “best of $k$ choices”. Then, they prove that the final maximum load is $\log\log n / \log k + O(1)$, w.h.p. A similar result was also derived in a different version of the model by Karp et al in [18]. Berenbrink et al extended the analysis of the Greedy algorithm to the heavily-loaded case [12]. Then, several versions of this sequential algorithm have been studied by considering, for instance, non-uniform choices in the assignment process [14, 28, 29]. Moreover, several works addressed weighted balls [8, 11, 21], while the case of heterogeneous bins was studied in [29]. Recently, balls-into-bins processes have also been analyzed over game theoretic frameworks [7, 20].
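The gap between a single random choice and the greedy multiple-choice strategy is easy to observe empirically. The following is a minimal simulation sketch; the function name `max_load`, the parameter `k` for the number of candidate bins, and the fixed seed are illustrative assumptions rather than anything taken from the cited works.

```python
import random

def max_load(n, k, seed=0):
    """Throw n balls into n bins sequentially; each ball samples k bins
    uniformly at random (with replacement) and goes to the least loaded one."""
    rng = random.Random(seed)
    bins = [0] * n
    for _ in range(n):
        candidates = [rng.randrange(n) for _ in range(k)]
        # min() returns the first least-loaded candidate, i.e. ties are
        # broken arbitrarily by sampling order.
        best = min(candidates, key=lambda b: bins[b])
        bins[best] += 1
    return max(bins)

one_choice = max_load(2000, 1)   # plain uniform allocation
two_choices = max_load(2000, 2)  # greedy with two candidates per ball
```

Comparing `one_choice` and `two_choices` for growing `n` gives a quick feeling for how much the sequential greedy rule gains from knowing the current loads, which is exactly the information the parallel threshold protocols discussed in this paper do not use.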

Sequential Algorithms on Restricted Bipartite Graphs. Sequential algorithms for restricted balls-bins (i.e. client-server) topologies have been considered in [6, 17, 19]: here, each ball comes with its admissible cluster of bins and decides its strategy according to the current loads in its cluster determined by the choices of the previous balls. In this setting, Kenthapadi and Panigrahy [19] analyse the well-known sequential Greedy algorithm [3]: each client $v$, in turn, chooses a pair of servers uniformly at random from $N(v)$ and assigns the ball to the server having the current minimum load. They prove that, if the size of every $N(v)$ is at least , then the Greedy algorithm achieves maximum load , w.h.p. In [17], Godfrey analyzed the sequential Greedy algorithm on the input model where a random cluster of servers $N(v)$ is assigned to each client $v$ before the algorithm starts. In more detail, each client places its ball in a uniform-random server among those in $N(v)$ with the current fewest number of balls. He proves that, if the random subsets are chosen according to any fixed almost-uniform distribution over the server set $S$ and the subsets have size , then the Greedy algorithm achieves optimal maximum load, w.h.p. The overall work is , where . Further bounds are determined when the overall number of balls is smaller than the size of the server set $S$. Berenbrink et al [6] consider the sequential framework adopted in [17] and improve the analysis of the greedy algorithm along different directions. In detail, they consider weaker notions of almost-uniform distributions for the random server clusters assigned to the clients and, moreover, they also consider an input framework formed by deterministic, worst-case server clusters of size . In the case where the overall number of balls is , with any and , they show that a suitable version of the sequential greedy algorithm achieves maximum load 1, w.h.p.

Notice that the Greedy algorithm adopted in [19, 17] requires every server to give information to its clients about its current load: in some applications, this feature of the algorithm might yield critical issues in terms of privacy and security of the involved entities [16, 30]. On the other hand, we notice that the simple threshold approach adopted by both Becchetti et al’s algorithm raes and our version saer can be implemented in a fully decentralized fashion so that the clients cannot get a good approximation of the current load of the servers (see also the remark after Algorithm 1 in Subsection 2.2).

Parallel Algorithms on Restricted Bipartite Graphs. The only rigorous analysis of parallel protocols for restricted client-server topologies we are aware of is that in [4] by Becchetti et al for the raes protocol which has been discussed in the previous part of this introduction.

## 2 The saer Protocol and the Main Theorem

### 2.1 Preliminaries

In the Load-Balancing problem we have a system formed by a client-server bipartite graph $G = (C \cup S, E)$ where: the subset $C$ represents the set of clients, the subset $S$ represents the set of servers, and the edge set $E$ determines, for each client $v$, the subset $N(v)$ of servers the client can make a request to (i.e. it can send a ball; recall that the terms ball and request are used interchangeably). At the beginning, each client has at most $d$ balls, where $d \geq 1$ is an arbitrary constant (w.r.t. $n$) that, in the sequel, we call request number, and the goal is to design a parallel distributed protocol that assigns each ball of every client $v$ to one server in $N(v)$.

According to previous work [25, 23], we study the Load-Balancing problem over the fully-decentralized computational model where bi-directional communications take place only along the edges in $E$, in synchronous rounds. Moreover, clients may only send the ball IDs (it suffices that each client keeps a local labeling of its ball set), while servers may only answer each ball request with one bit: accept/reject. There is no global labeling of the nodes of $G$: each node just keeps a local labeling of its links.

We analyze the cost of the proposed solution with respect to two complexity measures: the completion time which is defined as the number of rounds required by the protocol to successfully assign all the client balls to the servers; the (overall) work which is defined as the overall number of exchanged messages among the nodes of the network during the protocol’s execution.

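For any node (client or server) $v$, we denote its degree in $G$ as $\Delta_v$, i.e. $\Delta_v = |N(v)|$, and we define

$$\Delta_{\min}(C)=\min\{\Delta_v : v\in C\} \quad\text{and}\quad \Delta_{\max}(S)=\max\{\Delta_u : u\in S\}.$$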

### 2.2 A Simple Protocol for Load Balancing

As described in the introduction, the protocol we propose in this paper is a variant of the protocol raes introduced in [4] and it is based on a simple, non-adaptive threshold criterion the servers use to accept or reject the incoming balls. The protocol is organized in rounds and, in turn, each round consists of two phases. For the sake of readability, we consider the case where every client has exactly $d$ balls, where the request number $d$ is an arbitrary fixed constant: the analysis of the general case (at most $d$ balls per client) is in fact similar.
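A minimal simulation sketch of the threshold rule with burned servers, as described informally in the introduction, is given below. It is an illustration under stated assumptions and not a transcription of Algorithm 1: the adjacency-list input `neighbors`, the function name `saer`, the `max_rounds` guard, and the reading of "overall load larger than $cd$" as "accepted balls plus the balls received in the current round exceed $cd$" are hypothetical choices.

```python
import random

def saer(neighbors, n_servers, d, c, max_rounds=1000, seed=0):
    """Simulate saer; return (rounds, load, burned), rounds=None on timeout."""
    rng = random.Random(seed)
    alive = [d] * len(neighbors)    # alive balls per client
    load = [0] * n_servers          # accepted balls per server
    burned = [False] * n_servers    # a burned server never accepts again
    for rnd in range(1, max_rounds + 1):
        # Phase 1: each alive ball is sent to a uniform random neighbor.
        inbox = {}
        for v, k in enumerate(alive):
            for _ in range(k):
                inbox.setdefault(rng.choice(neighbors[v]), []).append(v)
        # Phase 2: threshold test on each server that received something.
        for u, senders in inbox.items():
            if burned[u]:
                continue                      # reject: already burned
            if load[u] + len(senders) > c * d:
                burned[u] = True              # reject the batch and burn
            else:
                load[u] += len(senders)       # accept the whole batch
                for v in senders:
                    alive[v] -= 1
        if sum(alive) == 0:
            return rnd, load, burned
    return None, load, burned
```

Clients in this sketch never read `load` or `burned`: they only learn, ball by ball, whether a request was accepted, which mirrors the one-bit answers of the model in Subsection 2.1.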

Remarks. Some simple facts easily follow from the protocol description above. (i) The protocol completes at round $t$ if and only if every client has successfully placed all its balls within round $t$. If this happens, then the maximum load of the servers is clearly bounded by $cd$. The main technical question is thus to provide bounds in concentration on the completion time of the protocol and on its performed work. This issue will be the subject of the next section.
(ii) As for the decentralized implementation of saer($c, d$), we observe that the knowledge of the parameter $c$ (which, in turn, depends on the degree of the underlying almost-regular bipartite graph; see Theorem 1 in the next subsection) is required only by the servers, while clients need no knowledge of global parameters. Interestingly enough, this fact implies that, for reasons of security and/or privacy, the servers may suitably choose $c$ so that the clients cannot get any good approximation of their current load.

### 2.3 Performance Analysis of saer

Using the definition of client-server bipartite graphs and that of Protocol saer we gave in the previous subsections, we can state our main technical contribution as follows.

###### Theorem 1.

Let $\eta > 0$ and $\rho \geq 1$ be two arbitrary constants and let $d \geq 1$ be an arbitrary integer constant. Let $G = (C \cup S, E)$ be any bipartite graph such that $\Delta_{\min}(C) \geq \eta \log^2 n$ and $\Delta_{\max}(S)/\Delta_{\min}(C) \leq \rho$. Consider the Load-Balancing problem on $G$ with request number $d$. Then, there is a sufficiently large constant $c$ (our analysis will show that the value of $c$ depends only on the constants $\eta$ and $\rho$) such that saer($c, d$) has completion time $O(\log n)$ and its work is $O(n)$, w.h.p.

Since the notion of burned servers adopted in saer (see Definition 3) is stronger than the notion of saturated servers adopted in Becchetti et al’s protocol raes [4] (see Subsection 1.1), it is easy to verify that the number of accepted client requests at every round of the saer process is stochastically dominated by the same random variable in the raes process. This fact implies the following result.

###### Corollary 2.

Under the same hypotheses of Theorem 1, there is a sufficiently large constant $c$ such that raes($c, d$) has completion time $O(\log n)$ and its work is $O(n)$, w.h.p.

A simple counting argument implies that $\Delta_{\max}(S) \geq \Delta_{\min}(C)$ for any bipartite graph with $|C| = |S|$, while Theorem 1 requires the “almost-regularity” hypothesis that the ratio $\Delta_{\max}(S)/\Delta_{\min}(C)$ is bounded by a constant. On the other hand, we emphasize that this condition allows a relatively large variance of the node degrees. For instance, the theorem holds for a topology where the minimum client degree and the maximum server degree are of the same order, while some clients have a larger degree and some servers have a smaller degree.
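The two degree quantities of Subsection 2.1 and their ratio can be computed directly from the clients' adjacency lists. The helper below is a hypothetical sketch: the name `degree_profile` and the adjacency-list representation are assumptions made for illustration.

```python
def degree_profile(neighbors, n_servers):
    """Return (min client degree, max server degree, their ratio).

    neighbors[v] lists the servers adjacent to client v.
    """
    server_degree = [0] * n_servers
    for servers in neighbors:
        for u in set(servers):          # ignore accidental duplicates
            server_degree[u] += 1
    min_client = min(len(set(servers)) for servers in neighbors)
    max_server = max(server_degree)
    return min_client, max_server, max_server / min_client

# Three clients and three servers: the edge count is at least 3 * min_client
# and at most 3 * max_server, so the ratio cannot drop below 1.
profile = degree_profile([[0, 1], [1, 2], [0, 1, 2]], 3)
```

On this toy graph the minimum client degree is 2 and the maximum server degree is 3, so the ratio is 1.5; a family of graphs is almost-regular in the sense used here when this ratio stays bounded by a constant as $n$ grows.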

In the next section, we will prove Theorem 1 in the case of $\Delta$-regular bipartite graphs; then, in Appendix D, we will show how to adapt the analysis to the more general graphs considered in the theorem. We decided to distinguish the two cases above for the sake of readability: the regular case essentially includes all the main technical ideas of our analysis while allowing a much simpler notation.

## 3 Proof of Theorem 1: The Regular Case

We prove here Theorem 1 for an arbitrary $\Delta$-regular bipartite graph $G$, where $\Delta$ is any function in $\Omega(\log^2 n)$. Since the protocol saer makes crucial use of burned servers, in what follows we define this notion and some important random variables of the algorithmic process which are related to it. For each round $t$ and each server $u$, let $r_t(u)$ be the random variable indicating the number of balls that server $u$ receives at time $t$.

###### Definition 3.

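A server $u$ is burned at round $t$ if $\sum_{i=1}^{t} r_i(u) \geq cd$. Moreover, for any client $v$, define $S_t(v)$ as the fraction of burned servers in the neighborhood of $v$ at time $t$, i.e.,

$$S_t(v)=\frac{\left|\left\{u\in S : u\in N(v) \text{ and } \sum_{i=1}^{t} r_i(u)\geq cd\right\}\right|}{\Delta}.$$

We also define $S_t$ as the maximum fraction of burned nodes in any client’s neighborhood at round $t$, i.e., $S_t = \max_{v \in C} S_t(v)$.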
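As a concrete reading of this definition, the following sketch computes the per-client burned fractions and their maximum from the cumulative number of balls each server has received so far. The function name `burned_fractions` and the input layout are hypothetical; the threshold test uses the "at least $cd$" condition of the displayed formula.

```python
def burned_fractions(neighbors, received_total, c, d):
    """Return ([S_t(v) for each client v], S_t).

    neighbors[v]      : list of servers adjacent to client v
    received_total[u] : balls received by server u in rounds 1..t
    """
    threshold = c * d
    per_client = [
        sum(received_total[u] >= threshold for u in servers) / len(servers)
        for servers in neighbors
    ]
    return per_client, max(per_client)

# Two clients, three servers, c * d = 8: servers 0 and 1 are burned.
fractions, worst = burned_fractions([[0, 1], [1, 2]], [8, 9, 1], c=4, d=2)
```

Here the first client sees both of its servers burned (fraction 1.0) and the second sees one out of two (fraction 0.5), so the maximum fraction is 1.0.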

The proof of Theorem 1 relies on the following result.

###### Lemma 4.

Let for an arbitrary constant in and let be an arbitrary constant in . Then, for any and for a sufficiently large , with probability at least , it holds that for all rounds the fraction of burned nodes satisfies

$$S_t \leq \frac{1}{2}. \tag{1}$$

We observe that the bound on the completion time stated in Theorem 1 for the regular case is a simple consequence of the above lemma. Indeed, consider any fixed ball of a client $v$. By choosing the parameter $c$ as indicated by Lemma 4 (the suitable value for $c$ can be fixed by the servers by looking only at the degree $\Delta$; we also remark that our analysis does not optimize several aspects, such as the bound on $c$), (1) implies that the ball is rejected at each round with probability at most $1/2$, so the probability that the ball is not accepted in any of the rounds considered in Lemma 4, conditioning on the bound given in Lemma 4, is polynomially small in $n$. Then, by applying the union bound over all balls and all clients and considering the probability of the conditioning event, we get that saer($c, d$) completes in $O(\log n)$ rounds, w.h.p.
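The union-bound step can be made explicit with illustrative constants. In the sketch below the horizon $t = 3\log_2 n$ is an assumption chosen only to make the arithmetic concrete (it is not the constant used in the paper), and every probability is meant conditionally on the event $S_j \leq 1/2$ for all rounds $j \leq t$.

```latex
% Illustrative constants: the horizon 3\log_2 n is an assumption.
\begin{align*}
\Pr(\text{a fixed ball is still alive after } t \text{ rounds})
  &\le \prod_{j=1}^{t} \frac{1}{2} = 2^{-t}, \\
t = 3\log_2 n \;&\Longrightarrow\; 2^{-t} = n^{-3}, \\
\Pr(\text{some ball is still alive after } t \text{ rounds})
  &\le dn \cdot n^{-3} = \frac{d}{n^{2}} .
\end{align*}
```

Since $d$ is a constant, the last bound is polynomially small, which is the shape of estimate needed for a w.h.p. statement.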

The next subsection is devoted to the proof of Lemma 4.

### 3.1 Proof of Lemma 4

In this subsection, we assume that the graph $G$ is $\Delta$-regular and $\Delta \geq \eta \log^2 n$ for an arbitrary constant $\eta > 0$. We start by defining the random variables that describe the saer process.

###### Definition 5.

For each round $t$ and for each $v \in C$, let $r_t(N(v))$ be the overall number of balls that all the servers in the neighborhood $N(v)$ receive at round $t$; moreover, let $r_t$ be the maximum number of balls that any server neighborhood receives at round $t$. Formally,

$$r_t(N(v))=\sum_{u\in N(v)} r_t(u) \quad\text{and}\quad r_t=\max_{v\in C} r_t(N(v)). \tag{2}$$

Observe that if a server is burned at a given round then it must have received at least $cd$ balls since the start of the process. So, for each $v \in C$ it holds that

$$S_t(v)\leq \frac{1}{cd\Delta}\sum_{i=1}^{t} r_i(N(v)). \tag{3}$$

We also give a name to the expression on the right-hand side of the inequality above, since it will be often used in our analysis.

###### Definition 6.

Let

$$K_t(v)=\frac{1}{cd\Delta}\sum_{i=1}^{t} r_i(N(v)) \quad\text{and}\quad K_t=\frac{1}{cd\Delta}\sum_{i=1}^{t} r_i.$$

Notice that the above definitions and (3) easily imply that

$$S_t\leq K_t \quad\text{and}\quad K_t=K_{t-1}+\frac{1}{cd\Delta}\,r_t, \quad\text{for each } t\geq 1. \tag{4}$$
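For completeness, the two inequalities behind (3) and (4) can be spelled out as follows. This is only a sketch of the counting step, using the definitions of $S_t(v)$, $r_t(N(v))$, $r_t$, $K_t(v)$ and $K_t$ given above and the fact that every burned server has received at least $cd$ balls.

```latex
\begin{align*}
cd\,\Delta\, S_t(v)
  &= cd \cdot \bigl|\{u \in N(v) : u \text{ is burned at round } t\}\bigr|
   \le \sum_{u \in N(v)} \sum_{i=1}^{t} r_i(u)
   = \sum_{i=1}^{t} r_i(N(v)), \\
S_t = \max_{v \in C} S_t(v)
  &\le \max_{v \in C} K_t(v)
   \le \frac{1}{cd\Delta} \sum_{i=1}^{t} \max_{v \in C} r_i(N(v))
   = \frac{1}{cd\Delta} \sum_{i=1}^{t} r_i = K_t .
\end{align*}
```

The second line uses only that the maximum of a sum is at most the sum of the maxima.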

We next write the random variable $r_t(N(v))$ in terms of more “elementary” random variables.

###### Definition 7.

For each client $v \in C$ and each $i \in [d]$, let $a_t^{(i)}(v)$ be the binary random variable indicating whether $v$’s $i$-th ball is still alive at round $t$, i.e., it has not yet been accepted by any server at the beginning of round $t$:

$$a_t^{(i)}(v)=\begin{cases} 1 & \text{if } v\text{'s } i\text{-th ball is still alive at round } t, \\ 0 & \text{otherwise.} \end{cases}$$
###### Definition 8.

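For each client $v \in C$, each $u \in N(v)$ and each $i \in [d]$, let $z_t^{(i)}(v,u)$ be the binary random variable indicating whether the (random) contacted server for $v$’s $i$-th ball at round $t$ is $u$, i.e.,

$$z_t^{(i)}(v,u)=\begin{cases} 1 & \text{if the contacted server for } v\text{'s } i\text{-th ball at round } t \text{ is } u, \\ 0 & \text{otherwise.} \end{cases} \tag{5}$$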

According to the above definitions, for each client $v$, we can write

$$r_t(N(v))=\sum_{u\in N(v)} r_t(u)=\sum_{u\in N(v)}\sum_{w\in N(u)}\sum_{i=1}^{d} a_t^{(i)}(w)\cdot z_t^{(i)}(w,u). \tag{6}$$

We remark that the variable $z_t^{(i)}(v,u)$ is defined at every round $t$, even when the corresponding request of node $v$ has already been accepted in some previous round. The above random variables have the following useful properties.

###### Lemma 9.
1. For each , , and , the random variables and are mutually independent.

2. Let . For each and any choice of positive reals for , it holds

$$\Pr\left(a_t^{(i)}(v)=1 \,\middle|\, S_1\leq s_1,\ldots,S_{t-1}\leq s_{t-1}\right)\leq \prod_{j=0}^{t-1} s_j. \tag{7}$$
3. The random variables $z_t^{(i)}(v,u)$ are negatively associated. (The definition of negative association is given in Definition 15 in Appendix A; this property allows us to apply concentration bounds, see Theorem 16 in the Appendix.)

###### Proof of Lemma 9.

Claim 1 follows from the observation that saer is non-adaptive and symmetric and, hence, at each round, each client $v$ chooses the (random) destination of its $i$-th request regardless of the value of $a_t^{(i)}(v)$, while the latter determines whether the request is really sent or not.
As for Claim 2, notice that $a_t^{(i)}(v) = 1$ iff $v$’s $i$-th request has been rejected at each previous round, and this happens iff the destination of the $i$-th request is a burned server.
Finally, Claim 3 follows from the fact that, for each $v \in C$ and $i \in [d]$, if $z_t^{(i)}(v,u) = 1$ for some $u \in N(v)$ then, for any $u' \in N(v)$ with $u' \neq u$, it holds that $z_t^{(i)}(v,u') = 0$. Moreover, for each fixed round, the destinations chosen for distinct balls are independent. ∎

Step-By-Step Analysis via Induction. We first consider the first round of the process and give the following bound on the maximum number of balls a client neighborhood can receive.

###### Lemma 10 (First round).

For all , w.h.p.

$$r_1\leq 2d\Delta \quad\text{and}\quad K_1\leq \frac{2}{c}. \tag{8}$$
###### Proof of Lemma 10.

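For each $v \in C$ we can write $r_1(N(v))$ as in (6) and, since every ball is alive at round $1$ and each $z_1^{(i)}(w,u)$ is a Bernoulli random variable of parameter $1/\Delta$, $\mathbf{E}[r_1(N(v))] = d\Delta$. Thanks to Claim 3 of Lemma 9, we can apply the Chernoff bound for negatively associated random variables with $\delta = 1$ (Theorem 16 in the Appendix) and get

$$\Pr\left(r_1(N(v))\geq 2d\Delta\right)\leq e^{-\frac{d\Delta}{3}}. \tag{9}$$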

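According to Definition 5 and Definition 6, from (9) and by the union bound, we get that

$$\Pr\left(r_1\leq 2d\Delta\right)\geq 1-ne^{-\frac{d\Delta}{3}} \quad\text{and}\quad \Pr\left(K_1\leq \frac{2}{c}\right)\geq 1-ne^{-\frac{d\Delta}{3}}. \tag{10}$$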

Since $\Delta = \Omega(\log^2 n)$ and $d \geq 1$, the above bounds conclude the proof. ∎
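The expectation used in the proof above follows from (6) by linearity. The computation below is a sketch for the $\Delta$-regular case, assuming (as in the protocol description) that every client starts with exactly $d$ alive balls, so that $a_1^{(i)}(w) = 1$, and that each ball contacts a uniformly random server among the $\Delta$ neighbors of its client.

```latex
\begin{align*}
\mathbf{E}\bigl[r_1(N(v))\bigr]
  &= \sum_{u \in N(v)} \sum_{w \in N(u)} \sum_{i=1}^{d}
     \Pr\bigl(z_1^{(i)}(w,u) = 1\bigr)
   = \underbrace{\Delta}_{|N(v)|} \cdot \underbrace{\Delta}_{|N(u)|}
     \cdot d \cdot \frac{1}{\Delta}
   = d\Delta .
\end{align*}
```

With mean $d\Delta$, the event $r_1(N(v)) \geq 2d\Delta$ in (9) is a deviation of twice the expectation.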

The next result is a key step of the proof of Lemma 4. We look at a fixed round $t$ of the random process and derive, for each client $v$, an upper bound in concentration on the random variable $r_t(N(v))$, assuming some fixed bounds on the variables $K_1, \ldots, K_{t-1}$. This bound shows that, conditional on the bound sequence above, the number of alive balls in $N(v)$ decreases, at each round $t$, by a factor that explicitly depends on the fraction of burned servers at round $t-1$.

###### Lemma 11 (Round t⩾2 by induction).

Let and . For each choice of positive reals with and for all ,

$$\mathbf{E}\left[r_t(N(v)) \,\middle|\, K_1\leq k_1,\ldots,K_{t-1}\leq k_{t-1}\right]\leq \Delta d\prod_{j=0}^{t-1} k_j. \tag{11}$$

Moreover, for any $\mu$ such that $\mu \geq \Delta d\prod_{j=0}^{t-1} k_j$,

$$\Pr\left(r_t(N(v))\geq 2\mu \,\middle|\, K_1\leq k_1,\ldots,K_{t-1}\leq k_{t-1}\right)\leq e^{-\frac{\mu}{3}}. \tag{12}$$
###### Proof of Lemma 11.

By expressing $r_t(N(v))$ as the sum in (6), we can apply the first two claims in Lemma 9 and get

$$\mathbf{E}\left[r_t(N(v)) \,\middle|\, K_1\leq k_1,\ldots,K_{t-1}\leq k_{t-1}\right]\leq d\Delta\prod_{j=0}^{t-1} k_j. \tag{13}$$

In order to get the claimed bound in concentration, we need to apply the Chernoff bound to the sum of random variables of the form $a_t^{(i)}(w)\cdot z_t^{(i)}(w,u)$. To this aim, we know that, for each $w$ and each $u \in N(w)$, $z_t^{(i)}(w,u)$ is a Bernoulli random variable of parameter $1/\Delta$. However, the distributions of the $a_t^{(i)}(w)$ are rather difficult to analyze since there are several correlations among these random variables. To cope with this issue, we exploit Claim 2 of Lemma 9 and construct ad-hoc independent Bernoulli random variables $X_t^{(i)}(w)$, for which:

$$\Pr\left(X_t^{(i)}(w)=1 \,\middle|\, K_1\leq k_1,\ldots,K_{t-1}\leq k_{t-1}\right)=\prod_{j=0}^{t-1} k_j \tag{14}$$

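and such that each $X_t^{(i)}(w)$ stochastically dominates $a_t^{(i)}(w)$. Formally, thanks to (14) and Claim 2 of Lemma 9, we can define a coupling (see for instance [24]) between the variables $a_t^{(i)}(w)$ and $X_t^{(i)}(w)$ such that

$$\Pr\left(\bigcap_{i\in[d],\,w\in C}\left\{a_t^{(i)}(w)\leq X_t^{(i)}(w)\right\} \,\middle|\, K_1\leq k_1,\ldots,K_{t-1}\leq k_{t-1}\right)=1. \tag{15}$$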

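The detailed construction of the above coupling is given in Appendix C. By using the coupling, from (15), we get

$$\begin{aligned}
&\Pr\left(r_t(N(v))\geq 2\mu \,\middle|\, K_1\leq k_1,\ldots,K_{t-1}\leq k_{t-1}\right) \\
&\quad\leq \Pr\left(\sum_{i=1}^{d}\sum_{u\in N(v)}\sum_{w\in N(u)} X_t^{(i)}(w)\cdot z_t^{(i)}(w,u)\geq 2\mu \,\middle|\, K_1\leq k_1,\ldots,K_{t-1}\leq k_{t-1}\right)\leq e^{-\frac{\mu}{3}},
\end{aligned} \tag{16}$$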

where $\mu$ is any positive real that satisfies $\mu \geq \Delta d\prod_{j=0}^{t-1} k_j$. In detail, the first inequality in (16) follows from (15) and the second one follows by applying the Chernoff bound with $\delta = 1$ for negatively associated random variables (see Theorem 16 in the Appendix). Indeed, Claim 3 of Lemma 9 and (14) imply that the random variables

$$\left(X_t^{(i)}(w)\cdot z_t^{(i)}(w,u)\right)_{i\in[d],\,u\in N(v),\,w\in N(u)},$$

conditioning on the event $\{K_1\leq k_1,\ldots,K_{t-1}\leq k_{t-1}\}$, are distributed as Bernoulli random variables of parameter $\frac{1}{\Delta}\prod_{j=0}^{t-1} k_j$ and they are negatively associated (see Definition 15 in the Appendix).

Wrapping up: Process Analysis in Two Time Stages. Lemmas 10 and 11 provide the decreasing rate of $r_t(N(v))$ for each $v \in C$, conditioning on the events “$K_j \leq k_j$” for a generic sequence $k_j$ ($j = 1, \ldots, t-1$).

We now need to derive the specific sequence of $k_j$ that effectively works for our process and that leads to Lemma 4. Moreover, we notice that Lemma 11 (only) allows a sufficiently strong concentration as long as the bound we can use on the expectation of $r_t(N(v))$ keeps of order $\Omega(\log n)$, while we clearly need to get an effective concentration bound until this value reaches zero.

To address the issues above, we split our analysis into two time stages. Roughly speaking, the first stage proceeds as long as the expectation of $r_t(N(v))$ is $\Omega(\log n)$ and we show it is characterized by an exponential decrease of $r_t$ (see Lemma 12 and Lemma 13). In the second stage, our technical goal is instead to show that the fraction of burned nodes in $N(v)$ keeps bounded by some constant smaller than $1$, while neglecting the decreasing rate of the balls received by $N(v)$ (since we can no longer get strong concentration bounds on this random variable). Essentially, our analysis shows that: i) the process starts this second stage when the expectation of $r_t(N(v))$ is $O(\log n)$; ii) during a subsequent window of $O(\log n)$ rounds, the fraction of burned nodes in $N(v)$ keeps bounded by some constant smaller than $1$ and, hence, all the alive requests will be successfully assigned, w.h.p.

As for the first stage, we consider the sequence $\{\gamma_t\}_{t \geq 0}$ defined by the following recurrence

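$$
\begin{cases}
\gamma_0 = 1 \\
\gamma_t = \dfrac{2}{c} \displaystyle\sum_{i=1}^{t} \prod_{j=0}^{i-1} \gamma_j & \text{for } t \geq 1.
\end{cases}
\tag{17}
$$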

In Appendix B, we will prove the following properties.

###### Lemma 12.

For each , let be the sequence defined by the recurrence (17). Then, if we take such that , we have the following facts:

• is increasing;

• for each , ;

• for each , .
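As a numerical illustration of recurrence (17), the following minimal Python sketch tabulates the first terms of the sequence. It assumes that the prefactor in (17) is $2/c$ (the reading under which the $\gamma_t$ can bound a fraction of burned servers). The function name `gamma_sequence` and the sample value $c = 16$ are hypothetical choices made here for illustration, not taken from the paper.

```python
def gamma_sequence(c, steps):
    """Return [gamma_0, ..., gamma_steps] for recurrence (17).

    Assumption: gamma_t = (2 / c) * sum_{i=1..t} prod_{j=0..i-1} gamma_j.
    """
    gammas = [1.0]          # gamma_0 = 1
    prefix_product = 1.0    # prod_{j=0}^{i-1} gamma_j for the next index i
    partial_sum = 0.0       # sum of the prefix products seen so far
    for _ in range(steps):
        partial_sum += prefix_product
        gamma_t = 2.0 / c * partial_sum
        gammas.append(gamma_t)
        prefix_product *= gamma_t
    return gammas


if __name__ == "__main__":
    for t, g in enumerate(gamma_sequence(16, 6)):
        print(t, round(g, 6))
```

For $c = 16$ the terms with $t \geq 1$ grow slowly and settle at roughly $0.14$, which agrees with the kind of monotonicity and boundedness that Lemma 12 asserts; the sketch is only a sanity check and does not replace the proof in Appendix B.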

The next lemma provides some useful concentration bounds on the random variables $r_t$ and $K_t$ for the first stage.

###### Lemma 13 (Stage I: Fast decreasing of $r_t(N(v))$).

For a suitable choice of $c$ and for a sufficiently large $n$, an integer $T$ exists such that, for each $1 \leq t \leq T-1$,

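$$
\Pr\left(r_t \leq 2 d \Delta \prod_{j=0}^{t-1} \gamma_j \;\middle|\; K_1 \leq \gamma_1, \dots, K_{t-1} \leq \gamma_{t-1}\right) \geq 1 - \frac{1}{n^3}
\tag{18}
$$

and

$$
\Pr\left(K_t \leq \gamma_t \mid K_1 \leq \gamma_1, \dots, K_{t-1} \leq \gamma_{t-1}\right) \geq 1 - \frac{1}{n^3},
\tag{19}
$$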

where $\gamma_t$ is defined by the recurrence (17).

###### Proof of Lemma 13.

We consider $\gamma_t$ as in (17) and apply Lemma 11 with $\mu = d\Delta \prod_{j=0}^{t-1} \gamma_j$. We get, for each client $v \in C$,

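$$
\Pr\left(r_t(N(v)) \geq 2 d \Delta \prod_{j=0}^{t-1} \gamma_j \;\middle|\; K_1 \leq \gamma_1, \dots, K_{t-1} \leq \gamma_{t-1}\right) \leq e^{-\frac{1}{3} d \Delta \prod_{j=0}^{t-1} \gamma_j}.
$$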

From (4) and the union bound over all clients $v \in C$, we get

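$$
\begin{aligned}
\Pr\left(K_t \leq \gamma_t \mid K_1 \leq \gamma_1, \dots, K_{t-1} \leq \gamma_{t-1}\right)
&\geq \Pr\left(r_t \leq 2 \Delta d \prod_{j=0}^{t-1} \gamma_j \;\middle|\; K_1 \leq \gamma_1, \dots, K_{t-1} \leq \gamma_{t-1}\right) \\
&\geq 1 - n\, e^{-\frac{1}{3} \Delta d \prod_{j=0}^{t-1} \gamma_j},
\end{aligned}
\tag{20}
$$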

where in the first inequality we also used the definition of $\gamma_t$ given in (17). Lemma 12 ensures that, for a sufficiently large $n$, we can take $T$ as the smallest positive integer for which

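$$
\Delta d \prod_{j=0}^{T-1} \gamma_j \leq 12 \log n
\tag{21}
$$

thus

$$
\Delta d \prod_{j=0}^{t-1} \gamma_j > 12 \log n \quad \text{for all } 1 \leq t \leq T-1.
\tag{22}
$$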

Moreover, again from Lemma 12 and (21), for a suitable choice of $c$ such a $T$ satisfies

$$
T \leq \frac{1}{2} \log \frac{d\Delta}{12 \log n}.
$$

Finally, using (22) in (20), the failure probability in (20) is smaller than $n e^{-4 \log n} = n^{-3}$, and we get (19) for each $1 \leq t \leq T-1$. ∎
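To make the choice of $T$ concrete, the sketch below finds the smallest positive $T$ satisfying (21) and evaluates the union-bound term $n e^{-\mu/3}$ that appears in (20). It assumes natural logarithms and the prefactor $2/c$ in (17); the parameter values are illustrative, and `stage_one_length` and `union_bound_failure` are hypothetical helper names.

```python
import math


def stage_one_length(c, d, delta, n):
    """Smallest positive T with d * delta * prod_{j=0}^{T-1} gamma_j at most 12 * ln(n)."""
    if not c >= 8:
        # Sufficient for this sketch to terminate (it keeps gamma_t at most 1/2);
        # this is NOT a condition taken from the paper.
        raise ValueError("this sketch assumes c >= 8")
    threshold = 12.0 * math.log(n)
    product = 1.0       # prod_{j=0}^{T-1} gamma_j for the current T (gamma_0 = 1)
    partial_sum = 0.0   # running sum of prefix products, as in recurrence (17)
    T = 1
    while d * delta * product > threshold:
        partial_sum += product
        product *= 2.0 / c * partial_sum   # multiply by gamma_T
        T += 1
    return T


def union_bound_failure(n, mu):
    """Value of n * exp(-mu / 3), the failure term in (20)."""
    return n * math.exp(-mu / 3.0)


if __name__ == "__main__":
    n = 10**6
    print(stage_one_length(c=16, d=1, delta=4000, n=n))
    print(union_bound_failure(n, 12.0 * math.log(n)))
```

With $\mu$ exactly at the threshold $12 \ln n$, the failure term equals $n^{-3}$; during Stage I, where $\mu$ is larger than the threshold, it is strictly smaller.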

The next result characterizes the number of burned servers along the second, final stage of our process analysis.

###### Lemma 14 (Stage II: The fraction of burned servers keeps small).

For a suitable choice of $c$ and for a sufficiently large $n$, an integer $T$ exists (it can be the same as in the previous lemma) such that, for each $t \geq T$ in the time window of the second stage,

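$$
\Pr\left(K_t \leq \delta_t \mid K_1 \leq \gamma_1, \dots, K_{T-1} \leq \gamma_{T-1}, K_T \leq \delta_T, \dots, K_{t-1} \leq \delta_{t-1}\right) \geq 1 - \frac{1}{n^3},
\tag{23}
$$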

where $\gamma_t$ is defined in (17) and $\delta_t$ is defined by

$$
\delta_t = \frac{1}{4} + \frac{24\, t \log n}{c\, d\, \Delta}, \quad \text{for } t \geq T.
\tag{24}
$$

###### Proof of Lemma 14.

As in the proof of Lemma 13, let $T$ be the first integer such that

$$
\Delta d \prod_{j=0}^{T-1} \gamma_j \leq 12 \log n.
\tag{25}
$$

Observe first that $\delta_i \leq 1$ for each round $i$ in the range considered, so that $\prod_{i=T}^{t-1} \delta_i \leq 1$. So, for each such $t$, (25) and Lemma 11 imply

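$$
\mathbf{E}\left[r_t(N(v)) \mid K_1 \leq \gamma_1, \dots, K_T \leq \delta_T, \dots, K_{t-1} \leq \delta_{t-1}\right]
\leq d\Delta \prod_{j=0}^{T-1} \gamma_j \prod_{i=T}^{t-1} \delta_i
\leq d\Delta \prod_{j=0}^{T-1} \gamma_j
\leq 12 \log n.
$$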

Hence, we can apply (12) in Lemma 11 with $k_j = \gamma_j$ for $j \leq T-1$, $k_j = \delta_j$ for $j \geq T$, and $\mu = 12 \log n$, obtaining

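$$
\Pr\left(r_t(N(v)) \geq 24 \log n \mid K_1 \leq \gamma_1, \dots, K_{T-1} \leq \gamma_{T-1}, K_T \leq \delta_T, \dots, K_{t-1} \leq \delta_{t-1}\right) \leq \frac{1}{n^4}.
$$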

Finally, using (4), the definition of $\delta_t$ in (24), and the union bound over all the clients $v \in C$, we get (23) for each such $t$. ∎
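The growth of the Stage II bound can be made tangible with a short numerical sketch. The code below evaluates (24) and reports the last round at which $\delta_t$ is still below a target value. Natural logarithms, the target value $1/2$, and the sample parameters are assumptions made for illustration only; `delta_t` and `last_round_below` are hypothetical names.

```python
import math


def delta_t(t, c, d, delta, n):
    """Right-hand side of (24): 1/4 + 24 * t * ln(n) / (c * d * delta)."""
    return 0.25 + 24.0 * t * math.log(n) / (c * d * delta)


def last_round_below(target, c, d, delta, n):
    """Largest integer t with delta_t(t) at most target (delta_t is increasing in t)."""
    if not target >= 0.25:
        raise ValueError("delta_t never goes below 1/4")
    slope = 24.0 * math.log(n) / (c * d * delta)
    return math.floor((target - 0.25) / slope)


if __name__ == "__main__":
    params = dict(c=16, d=1, delta=4000, n=10**6)
    t_max = last_round_below(0.5, **params)
    print(t_max, delta_t(t_max, **params))
```

Because the slope of $\delta_t$ is proportional to $\log n / (c\, d\, \Delta)$, a degree $\Delta$ of order $\log^2 n$ keeps $\delta_t$ below a fixed constant for a number of rounds proportional to $c\, d \log n$.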

Lemmas 13 and 14 imply Lemma 4. Indeed, denoting by $T'$ the last round of the second stage, we get

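$$
\Pr\left(\bigcap_{t=1}^{T-1} \{K_t \leq \gamma_t\} \;\cap\; \bigcap_{t=T}^{T'} \{K_t \leq \delta_t\}\right)
\geq \left(1 - \frac{1}{n^3}\right)^{T'}
\geq 1 - \frac{T'}{n^3}
\geq 1 - \frac{1}{n^2},
\tag{26}
$$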

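where in the first inequality of (26) we used the chain rule together with Lemmas 13 and 14, while the second inequality of (26) follows from the binomial inequality (also known as Bernoulli's inequality), i.e., $(1+x)^m \geq 1 + mx$ for each real $x \geq -1$ and each integer $m \geq 1$.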

In conclusion, we have shown that $K_t \leq \gamma_t$ for all $t \leq T-1$ and that $K_t \leq \delta_t$ for all $t$ such that $T \leq t \leq T'$, with probability at least $1 - 1/n^2$. So, from (24) and Lemma 12, the bound on $K_t$ claimed in Lemma 4 holds for all $t \leq T'$, with probability at least $1 - 1/n^2$.
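The last two inequalities in (26) can be checked exactly with rational arithmetic. The following sketch does so for sample values; it assumes $T' \leq n$, which is what the final step of (26) needs, and the helper name `chain_rule_bounds` is hypothetical.

```python
from fractions import Fraction


def chain_rule_bounds(n, t_prime):
    """Return the three quantities compared in (26), as exact fractions.

    product   = (1 - 1/n^3)^T'
    bernoulli = 1 - T'/n^3
    target    = 1 - 1/n^2
    """
    per_round = 1 - Fraction(1, n**3)
    product = per_round ** t_prime
    bernoulli = 1 - Fraction(t_prime, n**3)
    target = 1 - Fraction(1, n**2)
    return product, bernoulli, target


if __name__ == "__main__":
    product, bernoulli, target = chain_rule_bounds(n=1000, t_prime=60)
    print(product >= bernoulli >= target)
```

Using `Fraction` avoids floating-point rounding, which matters here because all three quantities differ from 1 by very small amounts.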

### 3.2 The Work Complexity of saer

To analyze the overall work performed by saer, we proceed using an approach similar to that in the analysis of Becchetti et al.’s algorithm raes. For each round $t$, each client $v \in C$, and each ball $i \in [d]$, recall the random variable $a_t^{(i)}(v)$ introduced in Definition 7. Then, the random variable $W$ counting the total number of requests performed by the clients (plus the corresponding answers by the servers) to assign the balls can be easily bounded by

$$
W = 2 \cdot \sum_{t=1}^{\infty} \sum_{i=1}^{d} \sum_{v \in C} a_t^{(i)}(v).
\tag{27}
$$

To prove that $W = O(n)$ w.h.p., we show that, for any fixed round $t$ and any $k$, it holds

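$$
\Pr\left(\sum_{i=1}^{d} \sum_{v \in C} a_t^{(i)}(v) > \frac{4}{5} k \;\middle|\; \sum_{i=1}^{d} \sum_{v \in C} a_{t-1}^{(i)}(v) = k\right) \leq e^{-\frac{k}{25 c d}}.
\tag{28}
$$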

To this aim, we use the method of bounded differences (see Theorem 17 in the Appendix). We notice that the random variable $\sum_{i=1}^{d} \sum_{v \in C} a_t^{(i)}(v)$, conditioning on a number $k$ of alive balls at the end of round $t-1$, can be written as a $2cd$-Lipschitz function of $k$ independent random variables. Indeed, let $\{i_1, \dots, i_k\}$ be the set of alive balls at the end of round $t-1$ and, for each $j \in [k]$, let $Y_{i_j}$ be the random variable, taking values in the set of servers, that indicates the server the alive ball $i_j$ tries to connect to at round $t$. The random variables $Y_{i_j}$ with $j \in [k]$ are mutually independent, and we can write, given the number $k$ of alive balls at the end of round $t-1$,

$$
\sum_{i=1}^{d} \sum_{v \in C} a_t^{(i)}(v) = f(Y_{i_1}, \dots, Y_{i_k}).
$$

The function $f$ is $2cd$-Lipschitz because, if we change one of the values $Y_{i_j}$, we are changing the destination of a ball from some server $v_{i_j}$ to some server $v'_{i_j}$. If $v'_{i_j}$ has received fewer than $cd$ requests since the start of the process, the change of the destination of the $j$-th ball from $v_{i_j}$ to $v'_{i_j}$ would not have any impact. On the other hand, in the worst case, at most $cd$ balls that try to settle in $v'_{i_j}$ switch from settled to not settled. A symmetric argument holds for $v_{i_j}$ and so if

$$
Y = (v_{i_1}, \dots, v_{i_j}, \dots, v_{i_k}) \quad \text{and} \quad Y' = (v_{i_1}, \dots, v'_{i_j}, \dots, v_{i_k})
$$

then

$$
|f(Y) - f(Y')| \leq 2cd.
$$

Lemma 4 implies that at each round the fraction of burned nodes in any node’s neighborhood remains bounded by $1/2$ with probability at least $1 - 1/n^2$. Therefore, for each $k$, it holds that

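$$
\mathbf{E}\left[\sum_{i=1}^{d} \sum_{v \in C} a_t^{(i)}(v) \;\middle|\; \sum_{i=1}^{d} \sum_{v \in C} a_{t-1}^{(i)}(v) = k\right] \leq \frac{k}{2} + \frac{1}{n^2}
$$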

and we can apply Theorem 17 to the $2cd$-Lipschitz function $f$, obtaining (28).

From (28) and the chain rule, it follows that, during an initial phase of rounds, the number of alive balls decreases at each round by a factor $4/5$, w.h.p. The work performed in this phase is therefore bounded by a geometric series and is $O(n)$. From Theorem 1, we know that the balls still alive at the end of this phase are assigned within the completion time of the protocol, and the additional work they cause is also $O(n)$. Hence, we get the claimed linear bound for the work complexity of saer.
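To illustrate how the work measure in (27) accumulates, here is a small simulation sketch. The acceptance rule coded below (a server accepts the requests of a round only while the total number of requests it has received since the start is at most $cd$, and is burned forever otherwise) is an assumption reconstructed from the Lipschitz argument above, not a verbatim transcription of saer. The ring-like topology, the parameter values, and all function names are hypothetical.

```python
import random


def ring_neighborhoods(n, degree):
    """Hypothetical regular topology: client v may contact servers v, v+1, ..., v+degree-1 (mod n)."""
    return [[(v + j) % n for j in range(degree)] for v in range(n)]


def simulate_saer(neighbors, n_servers, c, d, max_rounds=1000, seed=0):
    """Simulate a threshold protocol in the spirit of saer and count the work as in (27).

    Assumed rule (not taken verbatim from the paper): a server is burned as soon as
    the requests received since the start exceed c * d; a burned server rejects
    every request, including those of the round in which it becomes burned.
    """
    rng = random.Random(seed)
    threshold = c * d
    received = [0] * n_servers     # requests received since the start
    load = [0] * n_servers         # balls accepted so far
    burned = [False] * n_servers
    alive = [v for v in range(len(neighbors)) for _ in range(d)]  # one entry per alive ball
    work = 0
    rounds = 0
    while alive and max_rounds > rounds:
        rounds += 1
        work += 2 * len(alive)     # one request plus one answer per alive ball
        choices = [rng.choice(neighbors[v]) for v in alive]
        for u in choices:
            received[u] += 1
        for u in set(choices):
            if received[u] > threshold:
                burned[u] = True
        still_alive = []
        for v, u in zip(alive, choices):
            if burned[u]:
                still_alive.append(v)   # rejected: the ball retries next round
            else:
                load[u] += 1            # accepted: the ball is settled
        alive = still_alive
    return {
        "rounds": rounds,
        "work": work,
        "max_load": max(load),
        "assigned": sum(load),
        "unassigned": len(alive),
    }


if __name__ == "__main__":
    n = 200
    result = simulate_saer(ring_neighborhoods(n, 20), n, c=8, d=2, seed=1)
    print(result)
```

Under the assumed rule the maximum load can never exceed $cd$, since a server only accepts while its received count is at most $cd$. For comparison with the analysis: if the number of alive balls shrinks by a factor $4/5$ per round, the work counted this way is at most $2dn \sum_{t \geq 0} (4/5)^t = 10\,dn$.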

## 4 Conclusions and Future Work

We devise a simple parallel load-balancing protocol and we give a probabilistic analysis of its performance. The main novelty of this paper lies in considering client-server bipartite graphs that are much sparser than those considered in previous work. This new setting can model important network scenarios where proximity and/or trust issues force very restricted sets of admissible client-server assignments. From a technical point of view, such sparse topologies yield new probabilistic issues that make our analysis more challenging than in the dense case and rather different from the previous ones.

Several interesting questions are left open by our paper. In particular, we are intrigued by the analysis of our protocol (or simple variants of it) over graphs of smaller degree and/or in the presence of a dynamic framework where, for instance, the client requests arrive online and random topology changes may occur during the protocol execution. As for the latter, we believe that the simple structure of saer can manage such a dynamic scenario well and achieve a metastable regime with good performance.

## References

• [1] J. Aspnes, Y. Azar, A. Fiat, S. Plotkin, and O. Waarts (1997-05) On-line routing of virtual circuits with applications to load balancing and machine scheduling. J. ACM 44 (3), pp. 486–504. ISSN 0004-5411.
• [2] B. Awerbuch, M. T. Hajiaghayi, R. D. Kleinberg, and T. Leighton (2005) Online client-server load balancing without global information. In Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’05, USA, pp. 197–206. ISBN 0898715857.
• [3] Y. Azar, A. Z. Broder, A. R. Karlin, and E. Upfal (1994) Balanced allocations (extended abstract). In Proceedings of the Twenty-Sixth Annual ACM Symposium on Theory of Computing, STOC ’94, New York, NY, USA, pp. 593–602. ISBN 0897916638.
• [4] L. Becchetti, A. Clementi, E. Natale, F. Pasquale, and L. Trevisan (2020) Finding a bounded-degree expander inside a dense one. In Proceedings of the Thirty-First Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’20, USA, pp. 1320–1336.
• [5] P. Berenbrink, A. Brinkmann, T. Friedetzky, and L. Nagel (2010-04) Balls into non-uniform bins. In 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), pp. 1–10. ISSN 1530-2075.
• [6] P. Berenbrink, A. Brinkmann, T. Friedetzky, and L. Nagel (2012-02) Balls into bins with related random choices. J. Parallel Distrib. Comput. 72 (2), pp. 246–253. ISSN 0743-7315.
• [7] P. Berenbrink, T. Friedetzky, L. A. Goldberg, P. Goldberg, Z. Hu, and R. Martin (2006) Distributed selfish load balancing. In Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithm, SODA ’06, USA, pp. 354–363. ISBN 0898716055.
• [8] P. Berenbrink, T. Friedetzky, Z. Hu, and R. Martin (2008) On weighted balls-into-bins games. Theoretical Computer Science 409 (3), pp. 511–520. ISSN 0304-3975.