Large-scale parallel server system with multi-component jobs

06/19/2020 ∙ by Seva Shneer, et al. ∙ University of Illinois at Urbana-Champaign Heriot-Watt University 0

A broad class of parallel server systems is considered, for which we prove steady-state asymptotic independence of server workloads, as the number of servers goes to infinity, while the system load remains sub-critical. Arriving jobs consist of multiple components. There are multiple job classes, and each class may be of one of two types, which determines the rule according to which the job components add workloads to the servers. The model is broad enough to include as special cases some popular queueing models with redundancy, such as cancel-on-start and cancel-on-completion redundancy. Our analysis uses mean-field process representation and the corresponding mean-field limits. In essence, our approach relies almost exclusively on three fundamental properties of the model: (a) monotonicity, (b) work conservation, (c) the property that, on average, "new arriving workload prefers to go to servers with lower workload."


1 Introduction

In this paper we consider a broad class of parallel server systems, for which we prove steady-state asymptotic independence of server workloads, as the number of servers goes to infinity, while the system load remains sub-critical. Our model is such that arriving jobs consist of multiple components. There are multiple job classes, and each class may be of one of two types. A job type determines the rule according to which the job components add workloads to the servers. The model is broad enough to include as special cases some popular queueing models with redundancy, such as cancel-on-start and cancel-on-completion redundancy.

More specifically, we consider a service system consisting of identical servers, processing work at unit rate. Jobs of multiple classes arrive as independent Poisson processes of rate . A job of each class consists of components, while the -dimensional exchangeable distribution determines the random component sizes, or workloads, . (I.i.d. component sizes is a special case of exchangeability.) Each class- job uniformly at random selects a subset of servers, . Each job class may be of one of the two types, either water-filling or least-load. A job type determines the way in which the arriving job adds workload to the servers. For the least-load type, the component (random) workloads are added to the least-loaded servers out of the selected by the job. For the water-filling type, we describe the workload placement algorithm via the following illustration. Suppose, , , the component sizes realization is , and the workloads of the selected 4 servers are . Then, adding 10 units of the first component workload in the water-filling fashion brings the selected servers’ workloads to . Before we place the next – second – component’s workload, we exclude one of the servers that already received non-zero workload – it will be one of the servers with workload 11 in this illustration. Then, placing the second component’s workload 5 in water-filling fashion on the remaining 3 selected servers, brings the servers’ workloads to . In general, after each component is placed, the set of selected servers is reduced by excluding one of the servers that received non-zero workload from that component.

We assume that the system is sub-critically loaded, , where is the total expected workload brought by a class job. It is not hard to see that the system is stable for each . Our main results, Theorem 1 and Corollary 2, prove the steady-state asymptotic independence property: for any fixed integer , as , the steady-state workloads of a fixed set of servers (say servers ) become asymptotically independent. This property, in addition to being important in itself, in many cases allows one to obtain asymptotically exact system performance metrics, such as steady-state job delay mean or distribution.

Our model is related to – but not limited to – queueing models with cancel-on-start and cancel-on-completion redundancy [15, 12, 8, 7, 1, 2, 3]. In redundancy models each job places its “replicas” on a selected subset of servers. The replicas may be served by their servers simultaneously. When a certain number of job replicas start [resp., complete] their service, all other replicas are “canceled” and removed from the system. Hence the term cancel-on-start [resp., cancel-on-completion]. We postpone until the next section the detailed discussion of our model, including its relations to the models with redundancy. At this point we note that our least-load job type covers the cancel-on-start redundancy, and our water-filling job type covers the cancel-on-completion redundancy in the special case of i.i.d. exponentially distributed replica sizes. Moreover, our model allows multiple job classes, of different types, within the same system. We also note that, for example, the model in [7] and some of the models in [10] are special cases of ours; the steady-state asymptotic independence was used in those papers as a conjecture; our Corollary 2 proves this conjecture (for those models in particular), thus formally substantiating the asymptotic steady-state performance metrics derived for those models in [7] and [10].

Methodologically, this paper belongs to the line of work establishing the steady-state asymptotic independence in different contexts, e.g. [16, 5, 13, 14]. Our approach is based on analyzing the mean-field (fluid) scaled process and its limit. One part of our analysis, namely establishing asymptotic independence of server workloads over a finite time interval, closely follows the previous work, namely [9, 5]. But the main part of the analysis, namely the transition from the finite-interval asymptotic independence to the steady-state asymptotic independence, is different from that in [5]. (Paper [9] does not have a steady-state asymptotic independence result.) Specifically, we rely on the dynamics – transient behavior – of the mean-field scaled process and its limit; in this sense, our approach is close to that in [13, 14]. (The approach of [5] relies in an essential way on the direct steady-state estimates of the marginal workload distributions, obtained in [4].) At a high level, one may say that our approach relies almost exclusively on three fundamental properties of the model: (a) monotonicity, (b) work conservation, (c) the property that, on average, “new arriving workload prefers to go to servers with lower workload.”

The rest of the paper is organized as follows. A more detailed discussion of our model and results is given in Section 2, which is followed by a brief review of previous work in Section 3. Section 4 gives basic notation used throughout the paper. Section 5 presents our formal model and main results, Theorem 1 and Corollary 2. In Section 6 we define some generalizations of our model and give their basic properties; these generalizations, while they may be of independent interest, serve primarily for the analysis of our original model. Section 7 contains more auxiliary facts used in the analysis. Section 8 gives results on the finite-interval asymptotic independence of the server workloads. In Section 9 we define limits of the mean-field (fluid) scaled processes; we call these limits fluid sample paths (FSP). In Section 10 we study properties of the FSPs, starting specifically from the “empty” initial state. In Section 11 we define and study an FSP fixed point, which is the point to which the FSP trajectory, starting from the empty state, converges. Finally, Section 12 contains the proof of the main result (Theorem 1), which employs the results developed in Sections 6-11.

2 Discussion of the model and main results

The least-load job type is motivated by two scenarios. First, if we consider a system such that the current server workloads can indeed be observed on a subset of servers, and the job consists of components, it directly makes sense to place those components for service on the least-loaded of those servers. (See e.g. the LL(d) policy in [5], which is the special case of our model with a single least-load class with , .) The second scenario arises in systems where the current workloads are not observable, and which use redundancy to improve performance. (See e.g. [12] for a general motivation for redundancy.) For example, suppose a class- job places “replicas” on randomly selected servers, and each server processes its work (replicas of different jobs) in First-Come-First-Serve (FCFS) order. Suppose the job, to be completed, requires only (out of ) replicas to be processed, and as soon as the first replicas of the job start being processed, the remaining replicas of the job are “canceled” and immediately removed from their corresponding servers. This is usually referred to as cancel-on-start redundancy. (See e.g. [3, 2], where a special case is considered.) We will call it -c.o.s. redundancy, where are the parameters of class . Clearly, from the point of view of the server workload evolution (which need not be observable in this case), the described -c.o.s. redundancy is equivalent to simply placing job replicas on the least loaded (out of ) servers, and not placing any workload on the remaining servers. Thus, a job class using -c.o.s. redundancy can be equivalently viewed as a least-load job class in terms of our model, with the job components being the first replicas.

The water-filling job type motivation is also two-fold. First, suppose a job class is of water-filling type, with . So, a class job has one component. Suppose further that this component’s workload can be arbitrarily divided between servers, in the sense that parallel processing of a job is allowed. (For example, the servers may represent different data transmission channels, with a job (its single component) being a file that needs to be transmitted, and a job size being the file size.) Suppose the job can use randomly selected servers. Servers process workload in the FCFS order. The job is completed when all its workload is processed. Then, if the objective is to minimize the job completion time, its workload should be placed on the selected servers in the water-filling fashion. This can be done directly, if the workloads of the selected servers and the job workload are observable, or indirectly, as follows. The job joins the FCFS queues at each of the selected servers. When this job, at any of the selected servers, reaches the top of the queue – i.e., can start using that server – that server starts processing the job, possibly in parallel with other selected servers. The job is completed when the total amount of service it receives from all servers is equal to its size, at which point the job is removed from all queues. From the point of view of the server workload evolution (which need not be observable in this case), the described procedure is equivalent to simply placing the job’s single component on selected servers in the water-filling fashion.

The second motivation for the water-filling job type arises from cancel-on-completion redundancy ([12, 15, 8, 7, 2, 1, 10]). Suppose a class- job places job replicas on randomly selected servers. Each server processes its work (replicas of different jobs) in FCFS order. Suppose the job, to be completed, requires only (out of ) replicas to be processed, and as soon as the first replicas of the job complete their service, the remaining replicas of the job are “canceled” and immediately removed from their corresponding servers. (Hence the name cancel-on-completion.) We will call this -c.o.c. redundancy, where are the parameters of class . Suppose, in addition, that the replica sizes for a class- job are i.i.d. exponential random variables with mean . (This additional assumption, as well as the further assumption that , is used, e.g., in [7].) Under this additional assumption (of i.i.d. exponential replica sizes), it is easy to observe that, from the point of view of the server workload evolution (which need not be observable), the described -c.o.c. redundancy is equivalent to placing on the selected servers a water-filling job with the following parameters: are the same as above, and the component sizes are i.i.d. exponential random variables with mean . Indeed, the job component 1 places (stochastically) exactly the same amounts of additional workload on the servers as the workloads placed by all replicas up to the time of the first replica service completion. Similarly, the job component 2 places (stochastically) exactly the same amounts of additional workload on the servers as the workloads placed by all replicas from the time of the first replica service completion until the time of the second replica service completion. And so on. Thus, a job class using -c.o.c. redundancy (under the additional assumption of i.i.d. exponential replica sizes) can be equivalently viewed as a water-filling job class in terms of our model, with parameters and i.i.d. exponentially distributed components.

We see that our model is very broad. Paper [5] proves (among other results) the steady-state asymptotic independence (our Corollary 2) for the special case of a single, least-load job class with . (See LL(d) model in [5].) The -c.o.c. model with i.i.d. exponential replica sizes, considered in [7], is a special case of our model, with a single, water-filling job class, with and i.i.d. exponential component sizes. One of the models considered in [10] (called LL(d,k,0) there) is a special case of ours, with a single least-load job class. In both [7] and [10] the steady-state asymptotic independence was used as a conjecture; our Corollary 2 proves this conjecture for both models. Furthermore, since our model allows multiple job classes of different types, Corollary 2 establishes the steady-state asymptotic independence, for example, for a system with two job classes – one as in [7] and one as the LL(d,k,0) class in [10].

3 Previous work

The work on the steady-state asymptotic independence in the large-scale regime, with the number of servers and the arrival rate increasing to infinity, while the system load remains sub-critical, includes, e.g., papers [16, 5, 13]. Papers [16, 5] prove this for the celebrated “power-of-d” choices algorithm, where each arriving (single-component) job joins the shortest queue out of randomly selected; [16] does this for the exponentially distributed job sizes, while [5] extends the results to more general job size distributions, namely those with decreasing hazard rate (DHR). Note that a standard power-of-d choices algorithm is not within the framework of our model, because job placement decisions depend on the queue lengths (number of jobs), as opposed to workload-based decisions. However, [5] also considers – and establishes the steady-state asymptotic independence for – the LL(d) model, which is a special case of our model with the single, least-load job class with and . Note that, equivalently, this is the single-class -c.o.s. redundancy model. The main results of [5] in turn rely on the uniform estimates of the marginal stationary distribution of a single server state, obtained in [4]. Paper [13] proves the steady-state asymptotic independence under a pull-based algorithm, also for the model with single-component jobs, having DHR size distributions. (The model in [13] is also not within the framework of the present paper's model.)

For the redundancy models, such as in [15, 12, 8, 7, 1, 2, 3, 10], we are not aware of prior steady-state asymptotic independence results, besides the already mentioned -c.o.s. result in [5]. However, the steady-state asymptotic independence conjecture is often used (e.g. [15, 7, 10]) to obtain estimates of the steady-state performance metrics of large scale systems.

Paper [15] introduces redundancy as a way to reduce job delays. It considers the -c.o.c. redundancy model, with generally distributed replica sizes. (As such, this model is not within the framework of our model.) The paper uses the steady-state asymptotic independence conjecture to estimate the average job delay when the system is large.

Paper [12] introduces and motivates the -c.o.c. redundancy model, and establishes a variety of monotonicity properties of the average job delay with respect to selection set size , under different assumptions on the replica size distribution. Some of the results of [12] are for the -c.o.c. redundancy model with i.i.d. exponential replica sizes, which is a special case of our model, but [12] does not consider the asymptotic regime with .

As already described earlier, paper [7] studies a -c.o.c. redundancy model with i.i.d. exponential replica sizes, and obtains the asymptotically exact expressions for the job delay distribution, based on the steady-state asymptotic independence conjecture. Our results prove this conjecture, thus completing formal substantiation of those asymptotic expressions.

Paper [10] studies general redundancy models – more general than c.o.s. and c.o.c. that we described earlier – and uses the steady-state asymptotic independence conjecture to characterize and compute steady-state performance metrics. Some (not all) of the redundancy schemes in [10] are within our model framework. For example, LL(d,k,0) redundancy in [10] is a special case of our model with a single least-load class. Thus, again, by proving the steady-state asymptotic independence, our results complete formal substantiation of some of the asymptotic results in [10].

Papers [1, 2, 3] derive explicit product-form stationary distributions for different versions of -c.o.c. and -c.o.s. redundancy, assuming i.i.d. exponential replica sizes.

4 Basic notation

We denote by and the sets of real and real non-negative numbers, respectively, and by and the corresponding -dimensional product sets. By we denote the one-point compactification of , where is the point at infinity, with the natural topology. We say that a function is RCLL if it is right-continuous with left-limits. Inequalities applied to vectors [resp. functions] are understood component-wise [resp. for every value of the argument]. The sup-norm of a scalar function is denoted ; the corresponding convergence is denoted by . U.o.c. convergence means uniform on compact sets convergence, and is denoted by . We use notation: , . Abbreviation WLOG means without loss of generality.

For a random process we denote by the random value of in a stationary regime (which will be clear from the context). Symbol signifies convergence of random elements in distribution; means convergence in probability. W.p.1 means with probability one. I.i.d. means independent identically distributed. Indicator of event or condition is denoted by . If are random elements taking values in set , on which a partial order is defined, then the stochastic order means that and can be coupled (constructed on a common probability space) so that w.p.1.

We will use the following non-standard notation. Suppose , , is a sequence of random functions of , and is a deterministic function of . Then, for a fixed ,

(1)

and for a subset of the domain of ,

(2)

Analogously,

5 Formal model and main results

5.1 Model

There are identical servers. The unfinished work of a server at a given time will be referred to as its workload. Each server processes its workload at rate . There is a finite set of job classes . (Set does not depend on .) Jobs of class arrive as a Poisson process of rate . Each job class has three parameters: integers and such that , and the exchangeable probability distribution on . A class- job consists of components, with each component having a (random) size (which is the amount of new workload this component brings); is the joint distribution of random component sizes for a class- job. Exchangeability of means that it is invariant w.r.t. permutations of component indices. We assume that . WLOG, we can and do assume that . We will denote .

Each job class may be of one of the two types, either water-filling or least-load. The corresponding non-intersecting subsets of we denote by and . (Either of them may be empty.) A job type determines the way in which the arriving job adds workload to the servers. We will describe the job types separately.

A least-load job class . When such a job arrives, servers are selected uniformly at random; these servers form the selection set of the job. Then of the selected servers, that are least-loaded (have the smallest workload), are picked; the workload ties are broken in arbitrary fashion. Then, independently of the process history, random component sizes are drawn according to distribution . Then, workload is added to the least-loaded of those servers, is added to the second least-loaded of those servers, and so on.
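The least-load placement rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `place_least_load`, its arguments, and the index-based tie-breaking are hypothetical choices (the model allows ties to be broken arbitrarily), and the selection set and component sizes are passed in rather than drawn at random.

```python
def place_least_load(workloads, selection, sizes):
    """Add component sizes to the least-loaded servers of a selection set.

    workloads: list of current server workloads (modified in place)
    selection: indices of the selected servers (the selection set)
    sizes:     component sizes, with len(sizes) <= len(selection)
    """
    # Rank the selected servers by their workload before the arrival.
    # Ties are broken by server index here (an arbitrary, hypothetical choice).
    ranked = sorted(selection, key=lambda i: (workloads[i], i))
    # First component goes to the least-loaded server, the second component
    # to the second least-loaded server, and so on.
    for server, size in zip(ranked, sizes):
        workloads[server] += size
    return workloads


# Hypothetical numbers: 5 servers, 4 of them selected, 2 components.
state = place_least_load([5.0, 1.0, 3.0, 0.5, 9.0], [0, 1, 2, 4], [2.0, 4.0])
```

In this hypothetical run, servers 1 and 2 (the two least-loaded among the selected ones) receive the two components, and server 3 is untouched because it was not selected.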

A water-filling job class . When such a job arrives, its selection set of servers is selected uniformly at random. Then, independently of the process history, random component sizes are drawn according to distribution . We “take” the first component, and place its -size workload on the servers within the selection set in the “water-filling” fashion. (For example, suppose the selection set consists of servers, , with workloads , and suppose . Then, adding the workload of size to these servers in water-filling fashion will result in the new workloads being . That is, servers 1 and 3 will receive non-zero additional workloads, and , respectively, and will end up with equal workload . Servers 2 and 4 will not receive any of the first component’s workload.) After this, we will have the set of servers (one or more), which received some non-zero workload from the first component. (Servers 1 and 3 in the illustration above.) They all will have equal workload. Let us call them component-1 servers. Then we pick one of the component-1 servers (in arbitrary fashion), and exclude it from further workload placement by this job. Then, we “take” the second component, and place its -size workload on the remaining servers by continuing the water-filling. The servers that receive a non-zero workload from the second component we call component-2 servers. Then we exclude one of the component-2 servers, and so on, until the workload of all components is placed. (Note that we could define an additional, different water-filling type, such that the water-filling continues to use all selected servers, without excluding one of the servers after each component placement. This, however, is just a special case of the type we just defined, with components replaced by the single component of the size .)
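The water-filling rule with exclusion can be illustrated by the following minimal Python sketch. The names `water_level` and `place_water_filling` are hypothetical, the numbers in the usage line are made up for illustration (they are not the values of the example in the text), component sizes are assumed strictly positive, and the excluded server is simply the first receiver in list order, which is one admissible "arbitrary" choice.

```python
def water_level(loads, size):
    """Common level L such that sum(max(0, L - w) for w in loads) == size."""
    ordered = sorted(loads)
    prefix = 0.0
    for j, w in enumerate(ordered, start=1):
        prefix += w
        # Level reached if the workload is spread over the j lowest servers.
        level = (size + prefix) / j
        # Accept it if it does not rise above the next-lowest server.
        if j == len(ordered) or level <= ordered[j]:
            return level


def place_water_filling(workloads, selection, sizes):
    """Place components one by one; after each, exclude one receiving server."""
    active = list(selection)
    for size in sizes:  # sizes assumed > 0 and len(sizes) <= len(selection)
        level = water_level([workloads[i] for i in active], size)
        # Servers strictly below the level receive non-zero workload.
        receivers = [i for i in active if workloads[i] < level]
        for i in receivers:
            workloads[i] = level
        # Exclude one receiver from further placement (arbitrary choice).
        active.remove(receivers[0])
    return workloads


# Hypothetical numbers: 4 selected servers, components of sizes 5 and 4.
state = place_water_filling([3.0, 7.0, 4.0, 9.0], [0, 1, 2, 3], [5.0, 4.0])
```

In this hypothetical run the first component raises servers 0 and 2 to a common level of 6, server 0 is then excluded, and the second component raises servers 1 and 2 to 8.5. The total workload grows by exactly 5 + 4 = 9, which is the work conservation property of the non-truncated system.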

By the model definition, for each class , regardless of its type, the total expected additional workload it brings to the system is equal to .

5.2 Asymptotic regime. Mean-field scaled process

We consider the sequence of systems with , and assume

Further assume that the system is (asymptotically) sub-critically loaded

(3)

Denote the (limiting) total job arrival rate per server by

(4)

WLOG, we can and will assume . (We can achieve this by rescaling time, if necessary.)

To improve paper readability, let us assume that for each . Having converging does not change anything of substance, but clogs the exposition. (However, we do need and will use the fact that our results hold for converging arrival rates.) Similarly, throughout the paper, we will often consider “ servers” for some real , ignoring the fact that may be non-integer; it would be more precise to consider, for example, “-rounded-up servers,” but it would just clog the exposition, rather than creating any difficulties.

From now on, the upper index of a variable/quantity will indicate that it pertains to the system with servers, or -th system. Let denote the workload of server at time in the -th system. (When we say that server at time is empty.) Consider the following mean-field, or fluid, scaled quantities:

(5)

That is, is the fraction of servers with . Then is the system state at time ; note that is the fraction of busy servers (the instantaneous system load).
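As a small illustration of the scaled state, the sketch below computes the fraction of servers whose workload exceeds a given level. The function name `fraction_above` is hypothetical, and the strict inequality is an assumption inferred from the remark that the value at level zero is the fraction of busy servers; display (5) itself is not reproduced here.

```python
def fraction_above(workloads, w):
    """Fraction of servers whose workload strictly exceeds level w."""
    return sum(1 for x in workloads if x > w) / len(workloads)


# Hypothetical snapshot of 4 servers, two of them empty.
snapshot = [0.0, 0.0, 1.5, 3.0]
busy_fraction = fraction_above(snapshot, 0.0)
profile = [fraction_above(snapshot, w) for w in (0.0, 1.0, 1.5, 3.0)]
```

Viewed as a function of the level, the resulting profile is non-increasing, in line with the description of the state space that follows.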

For any , the state space of the process is a subset of a common (for all ) state space , whose elements are non-increasing RCLL functions of , with values . An element defines a probability measure on , with being the measure of for . Denote ; then is the measure of . An element we will call proper, if , i.e. if the corresponding probability measure is concentrated on . We will equip the space with the topology of weak convergence of measures on ; equivalently, if and only if for each where is continuous. We also can and do equip with a metric consistent with the topology. Obviously, is compact.

For any , process is Markov with state space , and with sample paths being RCLL functions of . Moreover, this is a renewal process, with renewals occurring when all servers become empty.

Under the subcriticality assumption (3), i.e. , the stability (positive Harris recurrence) of the process , for any , is not hard to establish. (Positive recurrence in this case simply means that the expected time to return to the empty state is finite.) It can be established, for example, using the fluid limit technique, analogously to the way it is done in [6]. The key property that the fluid limit for our model shares with that in [6] is that if there is a subset of servers, whose fluid workloads are greater than in the rest of the servers, the average per-server rate at which the servers within the subset will receive new workload is at most . (See (12) in Section 6.2.) We do not provide further details of the stability proof.

Given that the process is stable, it has a unique stationary distribution. Let be a random element whose distribution is the stationary distribution of the process; in other words, this is a random system state in stationary regime.
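For intuition only, the workload dynamics can be simulated directly. The sketch below is a rough illustration under simplifying assumptions that are not part of the general model: a single least-load class, i.i.d. exponential component sizes (a special case of exchangeability), and hypothetical parameter names (`lam` for the arrival rate per server, `d` for the selection set size, `k` for the number of components, `mean_size` for the mean component size). Running it from the empty state over a long horizon gives a crude empirical sample of the system state; it is not a substitute for the analysis.

```python
import random


def simulate_least_load(n, lam, d, k, mean_size, horizon, seed=0):
    """Simulate server workloads from the empty state up to time `horizon`."""
    rng = random.Random(seed)
    workloads = [0.0] * n
    t = 0.0
    while True:
        # Jobs arrive as a Poisson process of total rate lam * n.
        gap = rng.expovariate(lam * n)
        if t + gap > horizon:
            # Drain at unit rate until the horizon and stop.
            return [max(0.0, w - (horizon - t)) for w in workloads]
        t += gap
        # Each server processes its workload at unit rate between arrivals.
        workloads = [max(0.0, w - gap) for w in workloads]
        # The job selects d servers uniformly at random (without replacement).
        selection = rng.sample(range(n), d)
        ranked = sorted(selection, key=lambda i: workloads[i])
        # Components are added to the k least-loaded selected servers.
        for server in ranked[:k]:
            workloads[server] += rng.expovariate(1.0 / mean_size)


# Hypothetical parameters with load lam * k * mean_size = 0.5 < 1.
final = simulate_least_load(n=50, lam=0.5, d=3, k=2, mean_size=0.5,
                            horizon=20.0, seed=1)
```

The load in this hypothetical configuration is `lam * k * mean_size = 0.5`, so the sub-criticality assumption holds.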

5.3 Main results

Theorem 1.

There exists a unique proper element , with , such that

(6)

Function is Lipschitz continuous and strictly decreasing (and then everywhere positive).

Corollary 2 (Steady-state asymptotic independence).

For any fixed integer , the following holds. For each , let denote the random value of in the stationary regime. Then

(7)

where random variables are i.i.d., with .

6 More general systems

6.1 Infinite workloads and truncation. Monotonicity properties.

For the purposes of our analysis, it will be convenient to consider two generalizations of our model. (These more general systems may be of independent interest as well.)

First, we generalize our original system defined above, by allowing some of the servers to have infinite workload. Specifically, if server workload is initially infinite, , then, by convention, it remains infinite at all times, , . The same workload placement rules apply even if some server workloads are infinite, with the convention that an infinite workload remains infinite when “more” workload is added to it. Note that if one or more server workloads are initially infinite, this implies that and for all .

The second convenient generalization is a system where the workload of the servers is truncated at some level , where . Such a truncated system is defined exactly as the original one, except that, when an arriving job adds workload to servers, each server’s workload is capped (truncated) at level every time the algorithm would increase it above . The workload lost due to truncation is removed from the system. Case corresponds to the original, non-truncated system, where the arriving workload is never lost. Note that, if , then the stability for any (and any ) is automatic. The process corresponding to the truncated system with parameter we denote by ; if the superscript is absent, this corresponds to , i.e. the process is for the original non-truncated system.
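The truncation rule for a single workload addition can be sketched as follows. The function name `add_truncated` and the argument name `cap` (standing for the truncation level) are hypothetical, and the sketch assumes a finite current workload that does not already exceed the cap.

```python
import math


def add_truncated(current, amount, cap=math.inf):
    """Return (new_workload, lost_workload) for one workload addition.

    Assumes `current` is finite and current <= cap; cap = inf means no
    truncation, i.e. the original system.
    """
    new = min(current + amount, cap)
    lost = current + amount - new  # workload removed from the system
    return new, lost


capped = add_truncated(3.0, 4.0, cap=5.0)
uncapped = add_truncated(3.0, 4.0)
```

With an infinite cap nothing is lost, which matches the remark that this case is the original, non-truncated system.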

Finally, if the process starts specifically from the “empty” initial state (with all servers having zero initial workload), we will add superscript to the process notation: ; therefore, . So, for example, denotes the original non-truncated process, starting from the empty state.

The analysis in this paper relies on the system monotonicity, and related properties. We will need several such properties. They are all related and rather simple.

Lemma 3.

Consider two versions of the process, and , such that , . Then these processes can be coupled so that, w.p.1,

(8)

Furthermore, if the process is modified so that, in addition to the job arrival process (as defined in our model), arbitrary amounts of workload may be added at arbitrary times to arbitrary servers, the property (8) still holds.

Proof. As far as the mean-field scaled processes and are concerned, WLOG, we can assume that, after each job arrival and/or other workload addition(s), the actual servers are relabeled, so that the workloads are non-decreasing. Then, for the two processes it is sufficient to couple in the natural way the arrival processes and the job selection sets, to see that (8) must prevail at all times.

From Lemma 3, we obtain the following

Corollary 4.

For any , the process is monotone in time , namely

Lemma 5.

Consider two versions of the process, and , such that . Suppose that for some fixed , we have for . Then these processes can be coupled so that, w.p.1, for and ,

(9)

Proof. We couple the two processes in the natural way, as in the proof of Lemma 3. The proof then follows by induction on the times of job arrivals in the interval . Indeed, if in the time of the first job arrival, (9) of course holds for all . It is then easy to see that the changes of and for , at time , only depend on those servers with workloads at most , which are the same for both systems; we also observe that if any of those servers changes its workload to a value not exceeding , the change will be exactly the same in both systems. Then (9) holds for . Then, (9) holds until the time of the second job arrival or , whichever is smaller. And so on.

Lemma 5 and Lemma 3 imply the following more general form of Lemma 5.

Lemma 6.

Consider two versions of the process, and , such that . Suppose that for some fixed , we have for . Then these processes can be coupled so that, w.p.1, for and ,

(10)

6.2 Equivalent representation of a system with some workloads being infinite.

Let be fixed. For each , consider the system with initial state such that servers have infinite workload, while the remaining servers’ workloads are finite. Let denote the set of servers with finite workload. Then, for each , the evolution of the subsystem consisting of servers in – let us call it -subsystem – can be equivalently described as follows. The number of servers is . Each job class “breaks down” into multiple classes , , as follows. Let be the probability that exactly servers selected by a class job, will be in . Note that

Then, for a given , class in the -system has the following parameters: arrival rate per server , , , the distribution of the component sizes is the projection of the distribution on the first components. (, and do not depend on .) Clearly, as far as evolution of the -system is concerned, this new description is consistent with the actual behavior. The load of the -system is

Recall that the load of the original system, for any , is .
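Since a job selects its servers uniformly at random as a subset (i.e., without replacement), the probability that exactly a given number of the selected servers fall in the finite-workload set is a hypergeometric probability. The sketch below computes it; the function and argument names are hypothetical (`n` servers in total, `m` of them with finite workload, `d` selected, `j` of the selected ones in the finite set), and the formula used is the standard hypergeometric one rather than a transcription of the paper's display.

```python
from math import comb


def prob_exactly_j_finite(n, m, d, j):
    """P(exactly j of d servers, chosen uniformly without replacement
    from n servers, lie in a fixed subset of m finite-workload servers)."""
    if j < 0 or j > d or j > m or d - j > n - m:
        return 0.0
    return comb(m, j) * comb(n - m, d - j) / comb(n, d)


# Hypothetical numbers: 10 servers, 6 with finite workload, 3 selected.
probs = [prob_exactly_j_finite(10, 6, 3, j) for j in range(4)]
```

The probabilities over all possible counts sum to one, which is a quick sanity check on the decomposition of each job class into sub-classes.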

The following fact is very intuitive – by the nature of the workload placement algorithm, the arriving workload “prefers” servers with finite workloads.

Lemma 7.

For each ,

(11)

Proof. We can write:

Note that, if is the load of the complementary subsystem, consisting of the infinite-workload servers, then and, therefore,

(12)

Consider now a sequence of the above systems, with . Recall that the number of servers in -system is . Note that

Then,

and the -subsystem (limiting) load is

(13)

We see that the sequence of -systems is just like our original sequence of systems, but has different parameters. (Recall that our original model does allow converging arrival rates per server, not just constant.)

7 Some auxiliary facts

Lemma 8.

Let be fixed. Consider a sequence of processes such that, for each , at time , we identify a subset, consisting of servers. As the process evolves, for , we will keep track of those servers – let us call them “tagged.” Denote by , , the (scaled) number of the tagged servers, which are not selected by any new job arrival in the interval . Then, for any fixed ,

(14)

Since, by definition, is non-increasing in , as a corollary of (14), we obtain the following stronger property: for any ,

(15)

Proof. Using coupling, we see that the stochastic lower bound of the process can be obtained by considering the following “worst case” unaffected tagged set scenario: (a) each new job arrival selection set consists of servers and (b) if at least one of the selected servers is within the set of currently unaffected tagged servers, the latter set is reduced by servers. For the worst case unaffected tagged set, is the deterministic mean-field (fluid) limit, solving with ; here is the (scaled, limiting) rate at which arriving jobs select a server within the set and is the number of servers removed upon each such event. Namely, using standard techniques (“large number of servers” fluid limit), cf. [11], it is straightforward to show that, for any ,

which then implies (14).
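The worst-case scenario used in this proof can be explored numerically. The sketch below is an illustration under stated assumptions, not the paper's construction: jobs arrive as a Poisson process of rate `lam * n`, each job selects `d` distinct servers uniformly at random, and whenever at least one selected server is an unaffected tagged server, the unaffected tagged set loses `d` servers. The parameter names (`n`, `eps`, `lam`, `d`) and the comparison curve `eps * exp(-lam * d * d * t)` are hypothetical choices made here; the curve corresponds to assuming a linear fluid equation with depletion rate `lam * d * d`.

```python
import math
import random


def unaffected_tagged_fraction(n, eps, lam, d, t_end, rng):
    """Simulate the worst-case unaffected tagged set; return its scaled size at t_end."""
    tagged = int(eps * n)  # by exchangeability, label unaffected tagged servers 0..tagged-1
    t = 0.0
    while tagged > 0:
        t += rng.expovariate(lam * n)  # next job arrival (assumed Poisson, rate lam * n)
        if t > t_end:
            break
        selected = rng.sample(range(n), d)  # d distinct servers, uniformly at random
        if any(s < tagged for s in selected):
            tagged = max(0, tagged - d)  # worst case: d tagged servers are lost
    return tagged / n


def heuristic_lower_bound(eps, lam, d, t):
    """Solution of the assumed linear fluid equation y' = -lam * d * d * y, y(0) = eps."""
    return eps * math.exp(-lam * d * d * t)


if __name__ == "__main__":
    rng = random.Random(1)
    frac = unaffected_tagged_fraction(n=20000, eps=0.2, lam=1.0, d=2, t_end=0.5, rng=rng)
    print(frac, heuristic_lower_bound(0.2, 1.0, 2, 0.5))
```

For large `n` the simulated fraction stays positive over a finite horizon, which is the qualitative content of (14).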

Lemma 9.

Let and be fixed. Consider a sequence of processes with initial states satisfying the following condition: the (scaled) number of servers with workload exactly equal to , is at least ; namely, . Then,

(16)

(Informally, when is large, then with high probability a positive, bounded away from zero, jump in “moves” left at speed from initial point .)

Proof. Consider the servers with initial workload exactly equal to as tagged servers, and apply Lemma 8.

8 Asymptotic independence over a finite interval

The constructions and the results in this section closely follow those in [9] (proofs of Lemmas 3.1 and 3.2) and [5] (Section 7). We give them here (along with short proofs) in the setting/notation that we need for our model.

Suppose a finite set of fractions (a probability distribution) is fixed, where all and . Also fixed is a set of numbers , . Let the truncation parameter be fixed. In this section, we consider a sequence of our systems, indexed by , with initial states such that servers have workload exactly , . Suppose, initially the server indices are assigned in the order of a server set permutation chosen uniformly at random. This means, in particular, that .

We now formally construct a random process . Lemma 11 below will show that as . So, informally speaking, this is a construction of the evolution of a server workload in a system with “infinite number of servers.”

Suppose we consider a server, labeled to be specific. Let denote its workload at time . Just like for our original system (with a finite number of servers), we will use the terminology of a job selection set, although here the latter will be defined formally, not as a result of an actual selection process. Denote , . Then, by definition, the job arrivals of type selecting server 1 occur according to an independent Poisson process of rate .

We now define the dependence set of server at time . To improve the exposition, we will define the construction of via an example, shown in Figure 1; in this example we also assume that there are two job classes, with and . Ovals indicate job arrivals, with crosses showing the servers they select. The figure shows a fixed time interval . The dependence set may change (increase) only at the times of job arrivals selecting server . (To be specific, let’s adopt the convention that is left-continuous in .) In Figure 1 there are two job arrivals, at and in the interval . Then, for .

The set (which then remains constant for ) is then constructed as follows. The job arrival at is of class 1, so it selects two servers besides server 1. We “add” two servers to the set , and label them 2 and 3. So, now . The servers we call “children” of server 1 added at time . Note that each added child server receives a new distinct index, thus increasing set . Now, for each child of server 1 added at time , i.e. servers 2 and 3, we consider job arrivals in the time interval selecting those servers; the corresponding job arrival processes are independent of and have the same law as the arrival process selecting server 1. Then, for each of those arrivals we add to the corresponding new servers, being their children. In our example, we add servers as children of , and server as a child of . So, now . In our example, servers have no children in their corresponding time interval (shown as horizontal solid line segments). This stops the construction of , which is then .

As we already stated, remains constant in , i.e. until the next arrival selecting server . At time we repeat the procedure of set increase, by adding to it new children servers, then their children, and so on. Once again, as we add new children servers to , we keep giving them new distinct indices. In our example, the job arrival (selecting server 1) at is of class 2, so we add one new child server , consider its children, and so on. As a result, in our example, , and remains constant in . This completes the definition of the dependence set in the interval .

Figure 1: Construction of dependence set .

Now, given a realization of , the random value of is obtained by letting the initial workloads of all servers be i.i.d. with the distribution , , and the component size vectors for the involved job arrivals be independent with the corresponding distributions. As usual, between the times of job arrivals selecting a server, the workload of each server decreases at rate (unless and until it reaches ). This completes the definition of .

Lemma 10.

For any , the (random) cardinality of the dependence set is finite. Moreover, satisfies

(17)

and therefore

Proof. The proof uses the branching process argument. (It is analogous to that used in the proof of Lemmas 3.1 and 3.2 in [9] or in Section 7 of [5].) In a small time interval , the expected number of children of node 1 that will be added is . The dependence set cardinality for each of those new children has the same distribution as that of node 1, and these cardinalities are independent. This leads to ODE (17). We omit further details.
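The branching argument can be checked by simulation. The following is a minimal sketch under illustrative assumptions: `rates[j]` is a hypothetical rate at which class-`j` arrivals select a given server, and `children[j]` is the number of other servers such an arrival selects (2 and 1 in the example of Figure 1). A child added at time `s` is explored over the interval `[0, s]`, as in the construction above. If the mean cardinality satisfies a linear ODE of the form m'(t) = a m(t), m(0) = 1, with a equal to the sum of `rates[j] * children[j]` (the form assumed here for comparison), then the mean is exp(a t).

```python
import math
import random


def sample_dependence_set_size(t, rates, children, rng):
    """Sample the cardinality of the dependence set of one server at time t."""
    total_rate = sum(rates)
    size = 1  # the server itself
    horizons = [t]  # each entry: a server still to be explored over [0, horizon]
    while horizons:
        s = horizons.pop()
        while True:
            s -= rng.expovariate(total_rate)  # previous arrival selecting this server
            if s <= 0.0:
                break
            j = rng.choices(range(len(rates)), weights=rates)[0]  # class of the arrival
            size += children[j]
            horizons.extend([s] * children[j])  # children are explored over [0, s]
    return size


def mean_dependence_set_size(t, rates, children, n_samples, seed=0):
    """Monte Carlo estimate of the expected dependence set cardinality."""
    rng = random.Random(seed)
    total = sum(sample_dependence_set_size(t, rates, children, rng) for _ in range(n_samples))
    return total / n_samples


if __name__ == "__main__":
    rates, children = (0.5, 0.3), (2, 1)  # hypothetical parameters
    a = sum(r * c for r, c in zip(rates, children))
    print(mean_dependence_set_size(1.0, rates, children, 20000), math.exp(a * 1.0))
```

Every sample is finite because only finitely many arrivals occur on a bounded interval, and the empirical mean should track the exponential solution of the assumed ODE.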

Lemma 11.

For any , as ,

(18)

Proof. Note that the workloads are those of the servers , in the system with finite . (Also recall that initially the servers’ indices are assigned in a random order.) Let us define the dependence set for server at time . (The definition is as in [9], where it is given in a different context.) For a server denote by the times at which job arrivals selecting server occur, and for the job arrival at time define by the set of other servers selected by that job. For a time , define . Then

Let us consider one fixed server, specifically server 1, for each . Note that the construction of , for a finite system with servers, is analogous to the formal construction of for the “infinite system.” The main difference is that when we “add children servers” into , we do not necessarily add “new” servers – some of the added children servers may already be in . (So, informally speaking, the set is “larger” than .) Formally, it is easy to see that, for all sufficiently large ,

if in the construction of we replace arrival rates by slightly larger rates , . Then, applying Lemma 10, we have that, for any sufficiently large ,

(19)

where . Then, we can couple the constructions of for each and the construction of in such a way that w.p.1, and w.p.1.

Let us denote:

(20)

Lemma 12.

For the sequence of systems, considered in this section, the following holds for any and :

(21)

Proof. The proof is exactly the same as the proof of Lemmas 3.1 and 3.2 in [9]. It reduces to showing that for any two fixed servers, say 1 and 2, as ,

The proof of the latter in turn relies on the fact that, by (19), is uniformly bounded in . The details can be found in [9].

9 Fluid sample paths

Suppose we are in the setting of Section 8. We will call the function , defined there with values in , a fluid sample path (FSP). Clearly, the initial state of an FSP is: , . (Note that an FSP, by definition, arises as a result of the limiting procedure specified in Section 8. Namely, the initial states of the pre-limit systems are such that exactly a fraction of servers has workload exactly , for some fixed parameters (such that ) and . In this paper we will only need FSPs defined this way.)

By we will denote the special FSP with ; this means that each pre-limit system starts from “empty” initial state, with all initial workloads being . Of course, . This is the FSP “starting from the empty initial state.” As a special case of Lemma 12, we obtain that for any fixed , , ,

(22)

The FSP definition and Lemma 3 imply the following monotonicity property for the FSPs.

Lemma 13.

(i) Consider two FSPs, and , such that and . Then for all . (ii) Consider two FSPs, and , such that, for some and , and . Then, for all .

10 Properties of FSP starting from empty initial state

In this section we study the properties of the FSPs starting from empty initial state. Recall that is the truncation parameter.

Lemma 14.

Function is non-decreasing in , and non-increasing in .

Proof. Follows from (22), along with Lemma 3 and Corollary 4.

Lemma 15.

For any and , function is proper and strictly decreasing in . Consequently, for each .

Proof. Recall the definition (20) of via the construction of (for the special case when all server workloads are initially ). It follows from the construction that is proper. (If this is, of course, automatic.) Moreover, it easily follows from the construction that, for any , , i.e. is strictly decreasing in .

Sometimes, as in the proof of the next lemma, it will be convenient to interpret a given server workload evolution as the movement of a “particle” in , with the workload being the particle location. With this interpretation, between the times of job arrivals that select the server, the particle moves left at the constant speed until/unless it “hits” . At the times when a new job arrival adds to the server workload, the particle “jumps right” by the distance equal to the added workload.
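The particle interpretation can be made concrete with a small sketch. It assumes, for illustration only, that the particle drifts left at unit speed, that the left boundary is at zero, and that the jump sizes are given in advance (in the model they depend on the job class and on the other selected servers, which this sketch ignores). The function name `workload_at` and its arguments are hypothetical.

```python
def workload_at(t, w0, jumps, speed=1.0):
    """Location of the particle (server workload) at time t.

    w0    -- initial workload at time 0
    jumps -- iterable of (time, added_workload) pairs for arrivals selecting this server
    speed -- drift speed to the left (assumed 1.0 here)
    A jump occurring exactly at time t is included in the returned value.
    """
    w, now = w0, 0.0
    for s, added in sorted(jumps):
        if s > t:
            break
        # Drift left until the arrival, stopping at the boundary 0, then jump right.
        w = max(0.0, w - speed * (s - now)) + added
        now = s
    # Drift left over the remaining time, again stopping at 0.
    return max(0.0, w - speed * (t - now))


if __name__ == "__main__":
    jumps = [(1.0, 0.5), (3.0, 1.0)]
    for t in (0.5, 2.0, 3.5, 4.0):
        print(t, workload_at(t, 2.0, jumps))
```

In this example the particle reaches the boundary before the second arrival, so the second jump starts from zero rather than from a negative position.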

Lemma 16.

As a function of , is Lipschitz, uniformly in and .

Proof. Consider time and interval . All particles (server workloads) that are in at time , at time will be in , unless they are selected by new job arrivals in . Recall that new jobs arrive as a Poisson process of (unscaled) rate , for a given , and each job selects at most particles. Let be the (scaled) number of particles that cross point from left to right in the