Join-Idle-Queue with Service Elasticity: Large-Scale Asymptotics of a Non-monotone System

by Debankur Mukherjee, et al.
TU Eindhoven

We consider the model of a token-based joint auto-scaling and load balancing strategy, proposed in a recent paper by Mukherjee, Dhara, Borst, and van Leeuwaarden (SIGMETRICS '17, arXiv:1703.08373), which offers an efficient scalable implementation and yet achieves asymptotically optimal steady-state delay performance and energy consumption as the number of servers N→∞. In the above work, the asymptotic results are obtained under the assumption that the queues have fixed-size finite buffers, and therefore the fundamental question of stability of the proposed scheme with infinite buffers was left open. In this paper, we address this fundamental stability question. The system stability under the usual subcritical load assumption is not automatic. Moreover, stability may not even hold for all N. The key challenge stems from the fact that the process lacks monotonicity, which has been the primary powerful tool for establishing stability in load balancing models. We develop a novel method to prove that the subcritically loaded system is stable for large enough N, and establish convergence of steady-state distributions to the optimal one as N→∞. The method goes beyond state-of-the-art techniques -- it uses an induction-based idea and a "weak monotonicity" property of the model; this technique is of independent interest and may have broader applicability.




1 Introduction

Background and motivation. Load balancing and auto-scaling are two principal pillars in modern-day data centers and cloud networks, and have therefore gained renewed interest in the past two decades. In its basic setup, a large-scale system consists of a pool of a large number of servers and a single dispatcher, where tasks arrive sequentially. Each task has to be instantaneously assigned to some server or discarded. Load balancing algorithms primarily concern the design and analysis of schemes that distribute incoming tasks among the servers as evenly as possible, while using minimal instantaneous queue length information. At the same time, a large proportion of the tasks processed by these data centers come with business-critical performance requirements. This forces service providers to increase their capacity at a tremendous rate to cope with high-demand periods in the presence of a time-varying demand pattern. Consequently, the energy consumption of the servers in these huge data centers has risen dramatically and become a dominant factor in managing data center operations and cloud infrastructure platforms. Auto-scaling provides a popular paradigm for automatically adjusting service capacity in response to demand while meeting performance targets.

Load balancing in large systems. The study of load balancing schemes in large-scale systems has a very rich history, and for decades much research has been conducted on understanding the fundamental trade-off between delay performance and communication overhead per task. The so-called ‘power-of-d’ schemes, where each arrival is assigned to the shortest among d randomly chosen queues, provide dramatic improvements over purely random routing (d = 1), while keeping the communication overhead as low as 2d messages per task. This scheme, along with its many variations, has been studied extensively in [31, 17, 4, 3, 1, 6, 19, 33] and many more. Relatively recently, the join-the-idle-queue (JIQ) scheme was proposed in [15], where an arriving task is assigned to an idle server (if any), or, in case all servers are busy, to some queue chosen uniformly at random. The JIQ scheme has a low-cost token-based implementation that involves at most one message of communication overhead per task. Large-scale asymptotic results in [27, 28] show that under Markovian assumptions, the JIQ policy achieves a zero probability of wait for any fixed subcritical load per server in a regime where the total number of servers grows large. It should be noted that the results in [27, 28] even hold for considerably more general scenarios, viz. decreasing-hazard-rate service time distributions and heterogeneous server pools. Recently, it has further been shown [7] that when the average load per server is below 1/2, the large-scale asymptotic optimality of JIQ is preserved even under completely general service time distributions. Results in [18] indicate that under Markovian assumptions, the JIQ policy has the same diffusion limit as the Join-the-Shortest-Queue (JSQ) strategy, and thus achieves diffusion-level optimality. These results show that the JIQ policy provides asymptotically optimal delay performance in large-scale systems, while only involving minimal communication overhead (at most one message per task on average). We refer to [30] for a recent survey on load balancing schemes.
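As a toy illustration of these policy differences (our own sketch, not from the paper; the simulator, function names, and parameter values below are illustrative assumptions), one can compare the long-run total queue length under purely random routing, power-of-d, and JIQ in a pool of exponential servers:

```python
import random

def mean_total_queue(policy, n=50, lam=0.7, d=2, n_events=30_000, seed=42):
    """Gillespie-style toy simulation of n unit-rate exponential servers fed by
    a Poisson arrival stream of rate lam*n; returns the time-average total
    queue length under the given assignment policy."""
    random.seed(seed)
    q = [0] * n
    t = area = 0.0
    for _ in range(n_events):
        busy = [i for i in range(n) if q[i] > 0]
        total_rate = lam * n + len(busy)          # arrivals + active services
        dt = random.expovariate(total_rate)
        t += dt
        area += dt * sum(q)
        if not busy or random.uniform(0, total_rate) < lam * n:  # arrival
            if policy == "random":                # purely random routing
                i = random.randrange(n)
            elif policy == "power-of-d":          # shortest of d random samples
                i = min(random.sample(range(n), d), key=q.__getitem__)
            else:                                 # JIQ: an idle server if any
                idle = [i for i in range(n) if q[i] == 0]
                i = random.choice(idle) if idle else random.randrange(n)
            q[i] += 1
        else:                                     # departure from a busy server
            q[random.choice(busy)] -= 1
    return area / t
```

Under a load of 0.7 per server this reproduces the qualitative ordering discussed above: random routing is worst, power-of-2 is substantially better, and JIQ leaves tasks with essentially no wait.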

Auto-scaling with a centralized queue. Queue-driven auto-scaling techniques have been widely investigated in the literature [2, 8, 12, 14, 13, 10, 11, 23, 29, 32]. In systems with a centralized queue it is very common to put servers to ‘sleep’ while the demand is low, since servers in sleep mode consume much less energy than active servers. Under Markovian assumptions, the behavior of these mechanisms can be described in terms of various incarnations of M/M/N queues with setup times. Several further recent papers examine on-demand server addition/removal in a somewhat different vein [22, 21]. Generalizations towards non-stationary arrivals and impatience effects have also been considered recently [23]. Unfortunately, data centers and cloud networks with millions of servers are too complex to maintain any centralized queue, and obtaining instantaneous state information for even a small fraction of the servers entails a prohibitive communication burden.

Joint load balancing and auto-scaling in distributed systems. Motivated by all of the above, a token-based joint load balancing and auto-scaling scheme called TABS was proposed in [20], which offers an efficient scalable implementation and yet achieves asymptotically optimal steady-state delay performance and energy consumption as the number of servers N → ∞. In [20], the authors left open a fundamental question: Is the system with a given number of servers stable under the TABS scheme? The analysis in [20] bypasses the issue of stability by assuming that each server in the system has a finite buffer capacity. Thus, it remains an important open challenge to understand the stability of the TABS scheme without the finite-buffer restriction.

Key contributions and our approach. In this paper we address the stability issue for systems under the TABS scheme without the assumption of finite buffers, and examine the asymptotic behavior of the system as N becomes large. Analyzing the stability of the TABS scheme in the infinite-buffer scenario poses a significant challenge, because the stability of the finite-N system, i.e., the system with a finite number of servers, under the usual subcritical load assumption is not automatic. In fact, as we will discuss further in Remark 1 below, even under subcritical load, the system may not be stable for all N. Our first main result is that for any fixed subcritical load, the system is stable for large enough N. Further, using this large-N stability result in combination with mean-field analysis, we establish convergence of the sequence of steady-state distributions as N → ∞.

The key challenge in showing large-N stability for systems under the TABS scheme stems from the fact that the occupancy state process lacks monotonicity. It is well known that monotonicity is a powerful primary tool for establishing stability of load balancing models [27, 28, 31, 3]. In fact, process monotonicity is used extensively not only for stability analysis and not only in the queueing literature; for example, many results on interacting particle systems rely crucially on monotonicity, see e.g. [9]. The lack of monotonicity immediately complicates the situation, as for example in [7, 25]. Specifically, when the service time distribution is general, it is the lack of monotonicity that has left open the stability questions for the power-of-d scheme at sufficiently high subcritical loads [3], and for the JIQ scheme when the load per server is at least 1/2 [7]. We develop a novel method for proving large-N stability for subcritically loaded systems, and using it we establish the convergence of the sequence of steady-state distributions as N → ∞. Our method uses an induction-based idea, and relies on a “weak monotonicity” property of the model, as further detailed below. To the best of our knowledge, this is the first time that both the traditional fluid limit (in the sense of a large starting state) and the mean-field fluid limit (as the number of servers grows large) are used in an intricate manner to obtain large-N stability results.

To establish the large-N stability, we actually prove a stronger statement. We consider an artificial system in which some of the queues are infinite at all times. Then, loosely speaking, we prove that the following holds for all sufficiently large N: If the system with N servers contains k servers with infinite queue lengths, 0 ≤ k ≤ N, then (i) the subsystem consisting of the remaining N − k (i.e., finite) queues is stable, and (ii) when this subsystem is in steady state, the average rate at which tasks join the infinite queues is strictly smaller than the rate at which tasks depart from them. Note that the case k = 0 corresponds to the desired stability result.

The use of backward induction in k facilitates proving the above statement. For a fixed N, first we introduce the notion of a fluid sample path (FSP) for systems where some queues might be infinite. The base case of the backward induction is k = N, and assuming the statement for k + 1, …, N, we show that it holds for k. We use the classical fluid-stability argument (as in [24, 26, 5]) to establish stability of the system when the number of infinite queues is k. As mentioned above, the traditional notion of an FSP needs to be suitably extended here to fit systems where some servers have infinite queue lengths. Loosely speaking, for the fluid stability, the ‘large queues’ behave as ‘infinite queues’, for which the induction statement provides the drift estimates. Also, to calculate the drift of a queue in the fluid limit for fixed k but large enough N, we use mean-field analysis. A more detailed heuristic roadmap of the above proof argument is presented in Subsection 4.1. This technique is of independent interest, and potentially has much broader applicability in proving large-N stability for non-monotone systems, where state-of-the-art results have remained scarce so far.

Organization of the paper. The rest of the paper is organized as follows. In Section 2 we present a detailed model description, state the main results, and provide their ramifications along with discussions of several proof heuristics. The full proofs of the main results are deferred to Section 3. Section 4 introduces an inductive approach to prove the large-N stability result. We present the proof of the large-scale results (as N → ∞) using mean-field analysis in Section 5. Finally, we make a few brief concluding remarks in Section 6.

2 Model description and main result

In this section, first we will describe the system and the TABS scheme in detail, and then state the main results and discuss their ramifications.

Consider a system of N parallel queues with identical servers and a single dispatcher. Tasks with unit-mean exponentially distributed service requirements arrive as a Poisson process of rate λN, with λ < 1. Incoming tasks cannot be queued at the dispatcher, and must immediately and irrevocably be forwarded to one of the servers, where they can be queued. Each server has an infinite buffer capacity. The service discipline at each server is oblivious to the actual service requirements (e.g., FCFS). A turned-off server takes an exponentially distributed time with mean 1/ν (to be henceforth denoted as Exp(ν)) to be turned on (the setup period). We now describe the token-based joint auto-scaling and load balancing scheme called TABS (Token-based Auto Balance Scaling), as introduced in [20].

TABS scheme [20].

  • When a server becomes idle, it sends a ‘green’ message to the dispatcher, waits for an Exp(μ) time (standby period), and turns itself off by sending a ‘red’ message to the dispatcher (the corresponding green message is destroyed).

  • When a task arrives, the dispatcher selects a green message at random if there are any, and assigns the task to the corresponding server (the corresponding green message is replaced by a ‘yellow’ message). Otherwise, the task is assigned to an arbitrary busy server (and is lost if there is none), and if at that arrival epoch there is a red message at the dispatcher, then it selects one at random, and the setup procedure of the corresponding server is initiated, replacing its red message by an ‘orange’ message. The setup procedure takes an Exp(ν) time, after which the server becomes active.

  • Any server which activates due to the latter event sends a green message to the dispatcher (the corresponding orange message is replaced), waits for an Exp(μ) time for a possible assignment of a task, and again turns itself off by sending a red message to the dispatcher.

Figure 1: Illustration of server on-off decision rules in the TABS scheme, along with message colors and state variables as given in [20].

As described in [20], the TABS scheme gives rise to a distributed operation in which servers are in one of four states (busy, idle-on, idle-off, or standby), and advertise their state to the dispatcher via an exchange of tokens. Figure 1 illustrates this token-based exchange protocol. Note that setup procedures are never aborted; they continue even when idle-on servers become available.

Notation. For the system with N servers, let X_i^N(t) denote the queue length of server i at time t, i = 1, …, N, and let Q^N(t) = (Q_1^N(t), Q_2^N(t), …) denote the system occupancy state, where Q_j^N(t) is the number of servers with queue length greater than or equal to j at time t, including the possible task in service, j ≥ 1. Also, let Δ_0^N(t) and Δ_1^N(t) denote the number of idle-off servers and servers in setup mode at time t, respectively. Note that the process (Q^N(t), Δ_0^N(t), Δ_1^N(t)) provides a Markovian state description by virtue of the exchangeability of the servers. It is easy to see that, for any fixed N, this process is an irreducible countable-state Markov chain. Therefore, its positive recurrence, which we refer to as stability, is equivalent to ergodicity and to the existence of a unique stationary distribution. Further, let I^N(t) denote the number of idle-on servers at time t. We will focus upon an asymptotic analysis, where the task arrival rate and the number of servers grow large in proportion. The mean-field fluid-scaled quantities are denoted by the respective small letters, viz. q_j^N(t) = Q_j^N(t)/N, δ_0^N(t) = Δ_0^N(t)/N, and δ_1^N(t) = Δ_1^N(t)/N. Notation for the conventional fluid-scaled occupancy states for a fixed N will be introduced later in Subsection 3.1. For brevity in notation, we will write q^N(t) = (q_1^N(t), q_2^N(t), …) and δ^N(t) = (δ_0^N(t), δ_1^N(t)). Let Q denote the space of all mean-field fluid-scaled occupancy states, so that q^N(t) takes values in Q for all t and N. Endow Q with the product topology and the Borel σ-algebra generated by the open sets of Q. For any complete separable metric space E, denote by D_E[0, ∞) the set of all E-valued càdlàg (right-continuous with left limits) processes. By the symbol ‘→_P’ we denote convergence in probability for real-valued random variables.

We now present our first main result, which states that for any fixed choice of the parameters, a subcritically loaded system under the TABS scheme is stable for large enough N.

Theorem 2.1.

For any fixed λ < 1, μ > 0, and ν > 0, the system with N servers under the TABS scheme is stable (positive recurrent) for all large enough N.

Theorem 2.1 is proved in Section 3.

Remark 1.

It is worthwhile to mention that the ‘large-N’ stability as stated in Theorem 2.1 above is the best one can hope for. In fact, for fixed λ and μ, there are values of N and ν for which the system under the TABS scheme may not be stable. To elaborate further on this point, consider a system with 2 servers A and B, and λ > 1/2. Let server A start with a large queue, while the initial queue length at server B is small. In that case, observe that every time the queue length at server B hits 0, with positive probability it turns idle-off before the next arrival epoch. Once server B is idle-off, the arrival rate into server A becomes 2λ. Thus, before server B turns idle-on again, the expected number of tasks that join server A is at least 2λ/ν, while the expected number of departures is 1/ν. Thus the queue length at server A increases in expectation by (2λ − 1)/ν, which can be very large if ν is small. Further note that once server B becomes busy again, both servers receive an arrival rate λ, and hence it is more likely that server B will empty out again, repeating the above scenario. The situation becomes better as N increases. Indeed for large N, if ‘too many’ servers are idle-off and ‘too many’ tasks do not find an idle queue to join, the system starts producing servers in setup mode fast enough, and as a result, more and more servers start becoming busy. The above heuristic is illustrated in Figure 2 with examples of three scenarios with small, moderate, and large values of N, respectively.

Figure 2: (Top left) Illustration of instability of the TABS scheme for N = 2 via sample paths of the queue length process. (Top right) Sample paths of the maximum and second-maximum queue length processes in an intermediate system for the same parameter choices. (Bottom) The system becomes stable for a large enough system.
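The back-of-the-envelope growth rate in the two-server caricature of Remark 1 (roughly (2λ − 1)/ν extra tasks at server A per off-period of server B) can be checked with a quick Monte Carlo estimate. The snippet below is our own illustration with assumed parameter values λ = 0.8 and ν = 0.05, and it treats server A as always busy during B's setup delay:

```python
import random

def poisson_count(rate, horizon):
    """Number of points of a rate-`rate` Poisson process in [0, horizon]."""
    t, k = random.expovariate(rate), 0
    while t <= horizon:
        k += 1
        t += random.expovariate(rate)
    return k

def mean_drift_per_off_period(lam=0.8, nu=0.05, n_cycles=20_000, seed=3):
    """While server B is off (an Exp(nu) setup delay), server A receives the
    full arrival stream of rate 2*lam and serves at rate 1 (A is assumed to
    stay busy throughout).  Returns the Monte Carlo mean of the net change in
    A's queue, which should be close to (2*lam - 1)/nu."""
    random.seed(seed)
    total = 0
    for _ in range(n_cycles):
        T = random.expovariate(nu)                      # length of B's off period
        total += poisson_count(2 * lam, T) - poisson_count(1.0, T)
    return total / n_cycles
```

With these values the estimate concentrates around (2·0.8 − 1)/0.05 = 12 extra tasks per off-period, illustrating how a small ν makes the excursions of server A's queue large.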

In the next theorem we identify the limit of the sequence of stationary distributions of the occupancy processes as N → ∞. In particular, we establish that under subcritical load, for any fixed λ < 1, μ > 0, and ν > 0, the steady-state occupancy process converges weakly to the unique fixed point. (For the finite-buffer scenario this was proved in [20, Proposition 3.3].) Denote by q^N(∞) and δ^N(∞) the random values of q^N(t) and δ^N(t) in steady state, respectively.

Theorem 2.2.

For any fixed λ < 1, μ > 0, and ν > 0, the sequence of steady states (q^N(∞), δ^N(∞)) converges weakly to the fixed point (q*, δ*) as N → ∞, where

q_1* = λ,  q_j* = 0 for all j ≥ 2,  δ_0* = 1 − λ,  δ_1* = 0.
Note that the fixed point is such that the probability of wait vanishes as N → ∞, and the asymptotic fraction of active servers is the minimum possible, namely λ; in this sense, the fixed point is optimal. Thus, Theorem 2.2 implies that the TABS scheme provides fluid-level optimality for large-scale systems in terms of delay performance and resource utilization, while involving only O(1) communication overhead per task.

3 Proofs of the main results

In Subsection 3.1 we introduce the notion of conventional fluid scaling (when the number of servers is fixed) and fluid sample paths (FSPs), and state Proposition 3.1, which implies Theorem 2.1 as an immediate corollary. Subsection 3.2 contains two key results for a sequence of systems of increasing size, i.e., number of servers N → ∞, and proves Theorem 2.2.

3.1 Conventional fluid limit for a system with fixed N

In this subsection we first introduce a notion of fluid sample path (FSP) for finite-N systems where some of the queue lengths are infinite. We emphasize that this is the conventional fluid limit, in the sense that the number of servers is fixed, but the time and the queue length at each server are scaled by a parameter that goes to infinity.

Loosely speaking, conventional fluid limits are usually defined as follows: For a fixed N, consider a sequence of systems with increasing initial norm (total queue length) R_n → ∞, say. Now scale the queue length process at each server, and time, by R_n. Then any weak limit of this sequence of (space- and time-) scaled processes is called an FSP. Observe that this definition is inherently unsuitable if the system has some servers whose initial queue length is infinite. Thus we now introduce a rigorous notion of FSP, for systems with some of the queues being infinite, that does not require the scaled norm of the initial state to be 1.
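For intuition, the conventional (large-initial-state) scaling can be visualized on a single M/M/1 queue: started from a large level R, with both space and time scaled by R, the queue drains essentially deterministically at rate 1 − λ, so the scaled draining time concentrates around 1/(1 − λ). A toy check (our own illustration with an assumed load λ = 0.5, not from the paper):

```python
import random

def scaled_drain_time(R, lam=0.5, seed=5):
    """Start an M/M/1 queue (arrival rate lam, unit service rate) at level R
    and return the draining time divided by R; for large R this concentrates
    near the fluid value 1/(1 - lam)."""
    random.seed(seed)
    q, t = R, 0.0
    while q > 0:
        t += random.expovariate(lam + 1.0)          # next event (server busy)
        q += 1 if random.random() < lam / (lam + 1.0) else -1
    return t / R
```

For λ = 0.5 the fluid value is 2, and the relative fluctuations shrink like 1/sqrt(R) as the scaling parameter grows.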

Fluid limit of a system with some of the queues being infinite. Consider a system of N servers with indices in {1, …, N}, among which the servers with indices in a set K have infinite queue lengths. Now consider any sequence of such systems, indexed by n, with scaling parameters R_n → ∞, and let

x_i^n(t) = X_i^n(R_n t)/R_n,  i = 1, …, N,    (3.1)

be the corresponding scaled processes. For fixed N, the scaling in (3.1) will henceforth be called the conventional fluid-scaled queue length process. Also, for the n-th system, let A_i^n(t) and D_i^n(t) denote the cumulative number of arrivals to and departures from server i, with a_i^n and d_i^n being the corresponding fluid-scaled processes, i = 1, …, N. We will often omit the superscript n when it is fixed from the context.

Now for any fixed N, suppose the (conventional fluid-scaled) initial states converge, i.e., x^n(0) → x(0), for some fixed x(0) such that x_i(0) < ∞ for i ∉ K and x_i(0) = ∞ for i ∈ K. Then a set of uniformly Lipschitz continuous functions {(x_i(t), a_i(t), d_i(t)), i = 1, …, N} on the time interval [0, T] (where T is possibly infinite), with the convention x_i(t) = ∞ for all t for i ∈ K, is called a fluid sample path (FSP) starting from x(0), if for any subsequence of {n} there exists a further subsequence (which we still denote by {n}) such that, with probability 1, along that subsequence the following convergences hold:

  1. For all i ∉ K, x_i^n → x_i, a_i^n → a_i, and d_i^n → d_i, u.o.c.

  2. For i ∈ K, a_i^n → a_i and d_i^n → d_i, u.o.c.

Note that when the FSP starting from x(0) is unique, the above definition is equivalent to convergence in probability to that FSP. For any FSP, almost all points t (with respect to Lebesgue measure) are regular, i.e., for all i, the functions x_i, a_i, and d_i have proper left and right derivatives with respect to t, and at all such regular points

x_i'(t) = a_i'(t) − d_i'(t),  i ∉ K.

Infinite queues as part of an FSP. The arrival and departure functions a_i and d_i are well-defined for each queue, including the infinite queues. Of course, the derivative x_i'(t) makes no direct sense for an infinite queue (because an infinite queue remains infinite at all times). However, we adopt the convention that v_i(t) := a_i'(t) − d_i'(t) for all queues, including the infinite ones. For an FSP, v_i(t) is sometimes referred to as a “drift” of the (finite or infinite) queue i at time t.

We are now in a position to state the key result that establishes the large-N stability of the TABS scheme.

Proposition 3.1.

The following holds for all sufficiently large N. For each k ∈ {0, 1, …, N}, consider a system where the servers with indices in a fixed set K, |K| = k, have infinite queues, and the remaining queues are finite. Then, for each k, there exists ε = ε(k) > 0, such that the following properties hold (ε and the other constants specified below may also depend on k).

  1. For any x(0) such that x_i(0) < ∞ for i ∉ K and x_i(0) = ∞ for i ∈ K, there exists T = T(x(0)) < ∞ and a unique FSP on the interval [0, T], which has the following properties:

    1. If at a regular point t ∈ [0, T] we have x_i(t) > 0 with i ∉ K, then v_i(t) ≤ −ε for all such i.

    2. For any i ∉ K, if x_i(t) = 0 for some t ∈ [0, T], then x_i(s) = 0 for all s ∈ [t, T].

    3. x_i(T) = 0 for all i ∉ K.

  2. The subsystem with finite queues is stable.

  3. When the subsystem with finite queues is in steady state, the average arrival rate into each of the servers having infinite queue lengths is at most 1 − ε.

  4. For any x(0) such that x_i(0) < ∞ for i ∉ K and x_i(0) = ∞ for i ∈ K, there exists a unique FSP on the entire interval [0, ∞). On [0, T], it is as described in Statement 1. Starting from T, all queues i ∉ K stay at 0, and all infinite queues have drift at most −ε.

Although Part 2 follows from Part 1, and Part 4 is stronger than Part 1, the statement of Proposition 3.1 is arranged as it is to facilitate its proof, as we will see in Section 4 in detail.

Proof of Theorem 2.1.

Note that Theorem 2.1 is a special case of Proposition 3.1 with k = 0. ∎

3.2 Large-scale asymptotics: auxiliary results

In this subsection we state two crucial lemmas that describe asymptotic properties of a sequence of systems as the number of servers N → ∞, given stability. Their proofs involve mean-field fluid scaling and limits.

Lemma 3.2.

There exist constants C > 0 and θ > 0 such that the following holds. Consider any sequence of systems with N servers and k = k(N) infinite queues such that k(N)/N → 0, and assume that each of these systems is stable. Then for all sufficiently large N, the steady-state probability that the fraction of busy servers falls below a fixed positive threshold is at most C e^{−θN}.

Lemma 3.3.

Consider any sequence of systems with N servers and k = k(N) infinite queues, and assume that each of these systems is stable. The following statements hold:

  1. If k(N)/N → 0, then the steady-state fraction of busy servers converges in probability to λ as N → ∞.

  2. If k(N)/N → 0, then the limit of the sequence of stationary occupancy states is the distribution concentrated at the unique equilibrium point (q*, δ*), such that

  q_1* = λ,  q_j* = 0 for all j ≥ 2,  δ_0* = 1 − λ,  δ_1* = 0.
Lemmas 3.2 and 3.3 are proved in Section 5. These results will be used to derive the necessary large-N bounds on the expected arrival rate into each of the servers having infinite queue lengths when the system is in steady state.

Remark 2.

It is also worthwhile to note that Lemmas 3.2 and 3.3 can be thought of as a weak monotonicity property of the TABS scheme, as mentioned earlier. Loosely speaking, weak monotonicity requires that no matter where the system starts, within some fixed time it arrives at a state with a certain fraction of busy servers. The purpose of Lemmas 3.2 and 3.3 is to bound, under the assumption of stability, the expected rate at which tasks arrive to the infinite queues when the subsystem containing the finite queues is in steady state. In this regard:

  1. Lemma 3.3 guarantees high-probability bounds on the total number of busy servers, so that with probability tending to 1 as N → ∞, the fraction of busy servers in the whole system is at least λ − o(1) in steady state.

  2. But note that since the arrival rate is λN, when the system has few busy servers (an event of asymptotically vanishing probability), the arrival rate to the infinite queues can be of order N. Thus we need the exponential bound stated in Lemma 3.2 in order to bound the expected rate of arrivals to the infinite queues.

In Subsection 4.4 we will see that, as a consequence of Lemmas 3.2 and 3.3, for large enough N, under the assumption of stability, the steady-state rate at which tasks join an infinite queue is strictly less than 1, and the drift of the infinite queues, as defined in Subsection 3.1, is strictly negative. This fact will be used in the proof of Proposition 3.1.

Proof of Theorem 2.2.

Note that given the large-N stability property proved in Proposition 3.1 for k = 0, and the convergence of stationary distributions under the assumption of stability in Lemma 3.3, the proof of Theorem 2.2 is immediate. ∎

4 Proof of Proposition 3.1: An inductive approach

Throughout this section we prove Proposition 3.1. The proof consists of several steps, and uses both the conventional fluid limit and the mean-field fluid scaling and limit in an intricate fashion. Below we first provide a roadmap of the whole proof argument.

4.1 Proof idea and the roadmap

The key idea of the proof of Proposition 3.1 is to use backward induction in k, starting from the base case k = N. For k = N, all the queues are infinite. In that case, Parts (1) and (2) are vacuously satisfied, with the convention T = 0. Further observe that the TABS scheme does not differentiate between two large queues (in fact, any two non-empty queues). Thus, when all queues are infinite, since all servers are always busy, each arriving task is assigned uniformly at random, and each server has an arrival rate λ and a departure rate 1. Thus, it is immediate that the drift of each server is λ − 1, and hence strictly negative. This proves (3), and then (4) follows as well.

Now we discuss the ideas for establishing the backward induction step: we assume that Parts (1)–(4) hold for k + 1, …, N for some k < N, and verify that the statements hold for k. Rigorous proofs of Parts (1)–(4) for k are presented in Subsections 4.2–4.5. We begin by providing a roadmap of these four subsections.

Part (1).

Recall that we denote by K the set of indices of the servers having infinite queue lengths, and by {1, …, N} the set of all server indices. Denote by x_(i) the i-th largest component of x (ties are broken arbitrarily). Then for any x with k infinite components, define

  T(x) := Σ_{j=k+1}^{N} (x_(j) − x_(j+1)) / ε(j),  with the convention x_(N+1) := 0,    (4.1)

where ε(j) > 0 is the drift bound furnished by Part (4) of the induction hypothesis for a system with j infinite queues, and with the convention that T(x) = 0 if all components of x are infinite. For k < N, Part (1) is proved with the choice T = T(x(0)) as given by (4.1). Indeed, recall that we are at the backward induction step where there are k infinite queues, and we know from the hypothesis that Parts (1)–(4) hold if there are k + 1 or more infinite queues in the system. Loosely speaking, the idea is that as long as the conventional fluid-scaled queue length at some server i ∉ K is positive, it can be coupled with a system where the queue length at server i is infinite. Thus, as long as there is at least one server i ∉ K with x_i(t) > 0, the system can be ‘treated’ as a system with at least k + 1 infinite queues, in which case Part (4) of the backward induction hypothesis furnishes the drift of each positive component of the FSP (which, in turn, equals the drift of each infinite queue in the corresponding system).

Now to explain the choice of T(x) in (4.1), observe that when all N − k finite components of the FSP are strictly positive, the system behaves as one with N infinite queues, so each positive component has a negative drift of −ε(N), where ε(j) denotes the drift bound furnished by the induction hypothesis for a system with j infinite queues. Thus, x_(N)(0)/ε(N) is the time when at least one component of the FSP hits 0. From this time point onwards, each positive component has a drift of −ε(N − 1), which determines the time when a second component hits 0. Proceeding this way, one sees that by time T(x(0)) all finite positive components of the FSP have hit 0. The above argument is formalized in Subsection 4.2.
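The phase-by-phase draining just described can be mirrored by a tiny computation. In the sketch below, the per-phase rates `eps[m]` are hypothetical stand-ins for the drift bounds supplied by the induction hypothesis when m components are positive or infinite:

```python
def drain_time(finite_queues, k, eps):
    """Time for all finite components of a piecewise-linear path to hit 0
    when, with m components positive-or-infinite, each positive one drains
    at rate eps[m].  `finite_queues` holds the initial finite values and
    `k` is the number of infinite components."""
    x = sorted(finite_queues)          # ascending: smallest drains to 0 first
    t = drained = 0.0
    m = k + len(x)                     # components currently positive or infinite
    for v in x:
        t += (v - drained) / eps[m]    # phase ends when this component hits 0
        drained = v                    # every remaining component has shed v
        m -= 1
    return t

# Two finite queues (1.0 and 3.0) plus one infinite queue: the phase with
# 3 positive-or-infinite components drains at rate 0.5, the phase with 2
# at rate 0.25, so the total is 1.0/0.5 + (3.0 - 1.0)/0.25 = 2 + 8 = 10.
```

Within a phase all positive components drain at a common rate, so their gaps are preserved and the smallest one always hits 0 first, which is exactly why the total splits into the successive increments above.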

Part (1) ⟹ Part (2).

To prove Part (2), we use the fluid-limit technique for proving stochastic stability, as in [24, 26, 5]; see for example [5, Theorem 4.2] or [26, Theorem 7.2] for a rigorous statement. Here we need to show that the sum of the non-infinite queues (of an FSP) drains to 0. This is true because, by Part (1), each positive non-infinite queue has a negative drift. The formal proof is in Subsection 4.3.

Part (2) + Lemmas 3.2 and 3.3 ⟹ Part (3).

Note that in the proofs of Parts (1) and (2) we have only used the backward induction hypothesis, and have not imposed any restriction on the value of N. This is the only part of the proof where we use the large-scale asymptotics, in particular Lemmas 3.2 and 3.3; for that reason, the statement of Proposition 3.1 requires “large enough N”. The idea is to prove Part (3) by contradiction. Suppose Part (3) does not hold for infinitely many values of N. In that case, it can be argued that there exist a subsequence of N and a corresponding sequence of k values, such that when the subsystem consisting of finite queues is in steady state, the average arrival rate into each of the servers having infinite queue lengths is at least 1 along that subsequence. Loosely speaking, in that case Lemmas 3.2 and 3.3 together imply that for large enough N there are ‘enough’ busy servers, so that the rate of arrival to each infinite queue is strictly smaller than 1, which leads to a contradiction. Note that we can apply Lemmas 3.2 and 3.3 here because Part (2) ensures the required stability. The rigorous proof is in Subsection 4.4.

Parts (2), (3) + time-scale separation ⟹ Part (4).

We assume that Parts (1)–(3) hold for k, and we verify Part (4) for k. Observe that it only remains to prove convergence to the FSP on the (scaled) time interval [T, ∞). For this, observe that it is enough to consider the sequence of systems for which x^n(0) → x(0), where x_i(0) = 0 for all i ∉ K. In particular, all that remains to be shown is that the drift of each infinite queue is indeed as claimed in Part (4). Recall the conventional fluid scaling and FSP from Subsection 3.1, and let R_n be the scaling parameter. The proof consists of two main parts:

  1. Let us fix any state of the unscaled process. If the sequence of systems is such that where for all , then due to Part (2), for the subsystem consisting of finite queues, the (scaled) hitting time to the (unscaled) state converges in probability to 0. Also, since this subsystem is positive recurrent (due to Part (2)), starting from a fixed (unscaled) state , its expected (unscaled) return time to the state  is finite. This will allow us to split the (unscaled) time line into i.i.d. renewal cycles of finite expected length. In addition, this also shows that on the scaled time axis the subsystem of finite queues evolves on a faster time scale and achieves ‘instantaneous stationarity’.

  2. From the above observation, the number of arrivals to any specific infinite queue can be written as a sum of arrivals over the above-defined i.i.d. renewal cycles. Using the strong law of large numbers (SLLN), we can then show that, in the limit, the instantaneous rate of arrival to a specific infinite queue is given by the average arrival rate when the subsystem with finite queues is in steady state. Therefore, Part (3) completes the verification of Part (4).

The above argument is rigorously carried out in Subsection 4.5.
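The renewal-reward argument above can be illustrated with a small simulation. This is a toy sketch, not the model of the paper: arrivals to a tagged “infinite” queue occur at a rate modulated by a fast two-state background chain standing in for the finite-queue subsystem, and by the SLLN the time-average arrival rate converges to the steady-state average rate. The function name and all parameter values below are illustrative assumptions.

```python
import random

def modulated_arrival_rate(T, q01=2.0, q10=1.0, rates=(0.3, 0.9), seed=1):
    """Time-average arrival rate over [0, T] to a tagged queue whose arrival
    intensity rates[s] depends on the state s of a two-state background
    chain (transition rates q01 out of state 0, q10 out of state 1)."""
    rng = random.Random(seed)
    t, s, arrivals = 0.0, 0, 0
    while t < T:
        # exponential sojourn of the background chain in state s
        sojourn = min(rng.expovariate(q01 if s == 0 else q10), T - t)
        # count the Poisson(rates[s] * sojourn) arrivals during the sojourn
        tau = rng.expovariate(rates[s])
        while tau <= sojourn:
            arrivals += 1
            tau += rng.expovariate(rates[s])
        t += sojourn
        s = 1 - s
    return arrivals / T

# Stationary distribution of the background chain is
# (pi0, pi1) = (q10, q01) / (q01 + q10) = (1/3, 2/3), so the
# renewal-reward/SLLN prediction is (1/3)*0.3 + (2/3)*0.9 = 0.7.
```

With a long horizon, `modulated_arrival_rate(20000.0)` concentrates around the predicted steady-state average 0.7, mirroring how the arrival rate seen by an infinite queue is governed by the stationary behavior of the finite-queue subsystem.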

4.2 Coupling with infinite queues to verify Part (1)

To prove Part (1), fix any such that and for . Let be the set of server-indices such that . We will first show that when with , it has a negative drift for all , thus proving Part (1.i). Since the ’s are positive, this then also implies Part (1.ii). Now assume . In that case we have . Now consider the sequence of processes along any subsequence . Define the stopping time

and . In the time interval , we will couple this system with a system, labeled , with infinite queues. Let be the queue length, arrival, and departure processes corresponding to the system , and assume that is infinite for . Now couple each arrival to, and each departure from, the -th server in both systems, . Since the scheme does not distinguish among servers with positive queue lengths, observe that up to time both systems evolve according to their own statistical laws. Also, up to time , the queue length processes at the servers in in both systems are identical. Thus, in the (scaled) time interval , and for all , and for all . Therefore, using the induction hypothesis for systems with infinite queues, there exists a subsequence along which, with probability 1,

where for all , and with for all . Consequently, in the time interval , along that subsequence with probability 1,

with for all and for all , where . Observe that the above argument can be extended until the time hits zero. Furthermore, by the same argument, this time is as given in (4.1). This completes the proof of Part (1.iii).

4.3 Conventional fluid-limit stability to verify Part (2)

As mentioned earlier, we will use the fluid-limit technique for proving stochastic stability, as in [24, 26, 5], to prove Part (2). Consider a sequence of initial states with increasing norm, i.e., and for . Then from Part (1.iii) we know that for any sequence there exists a further subsequence along which, with probability 1, the fluid-scaled occupancy process converges to a process that hits 0 in finite time and stays at 0 afterwards. This verifies the fluid-limit stability condition in [5, Theorem 4.2] and [26, Theorem 7.2], and thus completes the verification of Part (2).
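The conventional fluid scaling and the drain-to-zero property invoked here can be illustrated on a toy single-server M/M/1 queue (a hypothetical stand-in, not the model of the paper): started at Q(0) = n with arrival rate 0.5 and service rate 1, the scaled process Q(nt)/n concentrates around the deterministic fluid path max(1 - 0.5 t, 0), which hits 0 in finite time. The function name and parameters are illustrative.

```python
import random

def scaled_queue(n, lam=0.5, mu=1.0, t=1.0, seed=2):
    """Simulate an M/M/1 queue started at Q(0) = n and return Q(n*t)/n.
    The fluid limit predicts max(1 - (mu - lam)*t, 0)."""
    rng = random.Random(seed)
    q, clock, horizon = n, 0.0, n * t
    while clock < horizon:
        rate = lam + (mu if q > 0 else 0.0)  # total event rate
        clock += rng.expovariate(rate)
        if clock >= horizon:
            break
        if rng.random() < lam / rate:
            q += 1          # arrival
        elif q > 0:
            q -= 1          # departure
    return q / n

# Fluid prediction at scaled time t = 1: 1 - 0.5*1 = 0.5.
```

For large n, `scaled_queue(4000)` is close to 0.5, and for scaled times beyond 2 the fluid path has drained to 0 and stays there, which is exactly the stability condition verified for the FSP above.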

4.4 Large-scale asymptotics to verify Part (3)

The verification of the backward induction step for Part (3) uses contradiction. Namely, assuming that the induction step for Part (3) does not hold, we will construct a sequence of systems with increasing , for which we obtain a contradiction using Lemmas 3.2 and 3.3. We note that this is the only part in the proof of Proposition 3.1, where we use the large-scale (i.e., ) asymptotic results.

Observe that we have already argued in Subsection 4.1 that for all , Parts (1) – (4) hold for . Now, if for some , Part (3) does not hold for some while Parts (1)–(4) hold for all , then from the proofs of Parts (1) and (2), note that Parts (1) and (2) hold for as well. Consequently, the subsystem with finite queues is stable. Thus we have the following implication.

Implication 1.

Suppose, for infinitely many , the induction step to prove Part (3) of Proposition 3.1 does not hold for some . Then there exists a subsequence of N (which we still denote by N) diverging to infinity, such that (i) the system with infinite queues is stable, and (ii) the steady-state arrival rate into each infinite queue is at least 1.

We will now show that Implication 1 leads to a contradiction; this will prove Part (3) of Proposition 3.1. Suppose Implication 1 is true. Choose a further subsequence along which converges to . As in the statement of Lemma 3.3, we will consider two regimes depending on whether or not, and arrive at a contradiction in both cases. Since all the infinite queues are exchangeable, we will use to denote a typical infinite queue.

Case 1. First consider the case when . Note that the expected steady-state instantaneous rate of arrival to is given by


Now observe that for large , , since . Further, from Lemma 3.3 we know that as . Consequently, as . Therefore, for large enough ,


which is a contradiction to Part (ii) of Implication 1.

Case 2. In case , first note that the statement in Part (3) is vacuously satisfied if for all large enough . Thus without loss of generality, assume . Fix as in Lemma 3.2. In that case (4.2) becomes

Now, due to Part (2) of Lemma 3.3, we know that

and furthermore, Lemma 3.2 yields



In particular, for large enough , the expected steady-state arrival rate is bounded above by a constant strictly smaller than 1, which again contradicts Part (ii) of Implication 1. This completes the verification of Part (3) of the backward induction hypothesis.

4.5 Time-scale separation to verify Part (4)

Assume Parts (1) – (3) hold for all . Now consider a system containing infinite queues with indices in , and recall the conventional fluid scaling and FSP from Subsection 3.1. Also, in this subsection whenever we refer to the process , the components in should be taken to be infinite.

For the queue length vector , define the norm to be the total number of tasks at the finite queues. Lemmas 4.1 and 4.2 state two hitting-time results that will be used in verifying Part (4).

Lemma 4.1.

For any fixed , there exists and , such that if then

Lemma 4.1 says that if the system starts from an initial state in which the total number of tasks in the finite queues is suitably large, then the time it takes for the expected total number of tasks in the finite queues to fall below a certain fraction of the initial number is proportional to the initial number itself. The proof of Lemma 4.1 is fairly straightforward, but is provided below for completeness.

Proof of Lemma 4.1.

Consider a sequence of initial states with increasing norm, i.e., is such that where as . Then from Part (1) we know that as , on the time interval the process converges in probability to the unique deterministic process satisfying


where . We also know that for any , if for some , then for all . Consequently, since is positive, there exists , such that

Now, since the expected number of arrivals into the -th system up to time , when scaled by , is bounded for any finite , the convergence in probability also implies the convergence in expectation. Thus, for the above choice of ,

Hence, there exists such that for all ,

This completes the proof of Lemma 4.1. ∎

For any , define the set , and the stopping time . For large enough , the next lemma bounds the expected hitting time to the fixed set in terms of the norm of the initial state.

Lemma 4.2.

There exists , such that if then

Proof of Lemma 4.2.

Fix any , and take and as in Lemma 4.1. For , define the sequence of random variables with the convention that . Now consider the discrete-time Markov chain adapted to the filtration , where is the value of the continuous-time Markov process sampled at the times ’s, and is the sigma-field generated by . Further, for , define the stopping time . Then observe that

Also define for , and hence . Then, as a consequence of Dynkin’s lemma [16, Theorem 11.3.1], using [16, Proposition 11.3.2], we have

Choosing completes the proof. ∎
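The shape of Lemma 4.2, an expected hitting time to a fixed set bounded by a constant times the norm of the initial state, can be illustrated on a toy random walk with negative drift (an illustrative stand-in, not the process of the paper; the function name and parameters are assumptions):

```python
import random

def mean_hit_time(x0, p_down=0.6, boundary=5, trials=200, seed=3):
    """Average number of steps for a random walk (down w.p. p_down, up
    otherwise) started at x0 to enter the fixed set {0, ..., boundary}.
    The per-step drift is p_down - (1 - p_down) = 0.2 toward 0, so the
    mean hitting time grows linearly, roughly (x0 - boundary) / 0.2."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        x, steps = x0, 0
        while x > boundary:
            x += -1 if rng.random() < p_down else 1
            steps += 1
        total += steps
    return total / trials
```

Doubling the initial norm roughly doubles the mean hitting time, the linear-in-the-norm bound that Lemma 4.2 provides for the queueing process via Dynkin's lemma.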

Now we have all the ingredients to verify Part (4) of the backward induction hypothesis. Note that we now look at the sequence of conventional fluid-scaled processes starting at (scaled) time . From the verification of Part (1) we already know that for all , . Thus, it only remains to show that starting from time , the drift of each of the infinite queues is at most . Specifically, we will construct a probability space where the required probability 1 convergence holds.

In order to simplify writing, we assume that the system starts at time 0, and thus it is enough to consider a sequence of initial queue length vectors such that

where is the parameter in the conventional fluid scaling. Hence, Lemma 4.2 yields that as . Consequently, the fluid-scaled time to hit the set  vanishes in probability, which is stated formally in the following claim.

Claim 1.

If the sequence of initial states is such that as , then as .
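Claim 1 follows from Lemma 4.2 by Markov's inequality; the following one-line sketch uses assumed notation ($\tau^n_{\mathcal{C}}$ for the hitting time of $\mathcal{C}$, $c$ the constant of Lemma 4.2, $x^n(0)$ the initial state, and the claim's hypothesis read as $\lVert x^n(0)\rVert/n \to 0$):

```latex
\mathbb{P}\left( \tau^n_{\mathcal{C}} / n > \varepsilon \right)
  \;\le\; \frac{\mathbb{E}\left[ \tau^n_{\mathcal{C}} \right]}{\varepsilon n}
  \;\le\; \frac{c \, \lVert x^n(0) \rVert}{\varepsilon n}
  \;\longrightarrow\; 0
  \qquad \text{as } n \to \infty.
```

The first inequality is Markov's inequality, the second is the bound of Lemma 4.2, and the limit uses $\lVert x^n(0)\rVert = o(n)$.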

Now pick any (unscaled) state , and define the stopping time as

Since, due to Part (2) of the backward induction hypothesis, the unscaled process is irreducible and positive recurrent, we have the following claim.

Claim 2.

If the sequence of initial states is such that , then , as .

Up to time , consider the product topology on the sequence space. Then Claims 1 and 2 yield that for a sequence of initial states such that as , there exists a subsequence , along which with probability 1, . Starting from the time , along the above subsequence, we construct the sequence of processes on the same probability space as follows.

(1) Define the space of an infinite sequence of i.i.d. renewal cycles of the unscaled process , with the unscaled state being the renewal state, i.e.,

for are i.i.d. copies, and are also i.i.d. copies of .

(2) Define the process as

Let denote the cumulative number of arrivals up to time to a fixed server with infinite queue length when the system starts from the state . Now, in order to calculate the drift of each of the infinite queues, observe that the cumulative number of arrivals up to time to server in the -th system can be written as

where the ’s are i.i.d. copies of the random variable , is distributed as , and the ’s and are independent of the random variable . Now, since due to Part (2) of the backward induction hypothesis the subsystem consisting of the finite queues is stable, is irreducible and positive recurrent. Thus, we have , and hence, with probability 1,

Thus, using Part (3) of the backward induction hypothesis, the SLLN yields, with probability 1,

for some . Therefore, in the conventional fluid limit, . Also, since the departure rate from each of the servers with infinite queue lengths is always 1, it can be seen that in the conventional fluid limit,