# Multiple Server SRPT with speed scaling is competitive

Can the popular shortest remaining processing time (SRPT) algorithm achieve a constant competitive ratio on multiple servers when server speeds are adjustable (speed scaling), with respect to the flow time plus energy consumption metric? This question has remained open for some time, while a negative result in the absence of speed scaling is well known. The main result of this paper is that multi-server SRPT can be constant competitive, with a competitive ratio that depends only on the power-usage function of the servers, and not on the number of jobs/servers or the job sizes (unlike when speed scaling is not allowed). When all job sizes are unity, we show that round-robin routing is optimal and can achieve the same competitive ratio as the best known algorithm for the single server problem. We also show that a class of greedy dispatch policies, including policies that route to the least loaded server or the shortest queue, do not admit a constant competitive ratio. Finally, when job arrivals are stochastic, with Poisson arrivals and i.i.d. job sizes, we show that random routing and a simple gated-static speed scaling algorithm achieve a constant competitive ratio.

## I Introduction

How to route and how to schedule jobs are two fundamental problems in multi-processor/multi-server settings, e.g., microprocessors with multiple cores. Microprocessors also have the flexibility of a variable speed of operation, called speed scaling, where operating at speed $s$ consumes power $P(s)$; typically, $P(s) = s^\alpha$ with $\alpha > 1$. Speed scaling is also available in modern queuing systems, where servers can operate at variable service rates with an appropriate cost function $P(\cdot)$.

Increasing the speed of the server reduces the response times (completion minus arrival time) but incurs a larger energy cost. Thus, there is a natural tradeoff between the flow time (defined as the sum of the response times across all jobs) and the total energy cost, and a natural objective is to minimize a linear combination of the flow time and total energy, called flow time plus energy.

In this paper, we consider the online problem of routing, scheduling, and speed scaling in a multi-server setting to minimize the flow time plus energy, where jobs arrive (are released) over time and decisions have to be made causally. On the arrival of a new job, a centralized controller needs to make a causal decision about which jobs to process on which server and at what speed, where preemption and migration are allowed. By migration, we mean that a job can be preempted on one server and restarted on another server later. The model, however, does not allow job splitting, i.e., a job can only be processed on a single server at any time.

For this problem, both stochastic and worst case analyses are of interest. In the stochastic model, the input (job sizes and arrival instants) is assumed to follow a distribution, and performance guarantees in expectation are derived. In the worst case analysis, the input can be generated by an adversary, and the performance metric is the competitive ratio, which is defined as the maximum, over all inputs, of the ratio of the cost of the online algorithm to the cost of the optimal offline algorithm (which knows the entire input sequence ahead of time).

### I-A Prior Work

#### I-A1 Single Server

For a single server, it is known that Shortest Remaining Processing Time (SRPT) is an optimal scheduling policy, and the only decision with speed scaling is the optimal dynamic speed choice. There is a large body of work on speed scaling in the single server setting [1, 2, 3, 4, 5, 6, 7, 8], both in the stochastic and in the worst case settings, where mostly $P(s) = s^\alpha$ is used, under various assumptions, e.g., bounded speed [9], with and without deadlines [10, 11, 12], etc.

In the stochastic model, [6] showed that a simple fixed speed policy (called gated speed) that depends only on the load/utilization and is independent of the current number of unfinished jobs/sizes has a constant multiplicative gap from the ‘unknown’ optimal policy. Further work in this direction can be found in [13, 14], where [14] derived the mean response time under the SRPT algorithm. For the worst case, there are many results [1, 2, 3, 4, 5, 7, 9, 10, 11, 12, 15, 8]. A key result in this space was proved in [15], where an SRPT-based speed scaling algorithm is proved to be $(3+\epsilon)$-competitive for an arbitrary power function $P(\cdot)$. In [8], using essentially the same ideas as in [15], but with a more careful analysis, a slightly modified SRPT-based speed-scaling algorithm is shown to be $(2+\epsilon)$-competitive, also for an arbitrary power function.

In the worst case setting, when considering speed scaling, two classes of problems are studied: (i) unweighted and (ii) weighted, where in (i) the delay that each job experiences is given equal weight in the flow time computation, while in (ii) it is scaled by a weight that can be arbitrary. The weighted setting is fundamentally harder than the unweighted one: it is known that constant-competitive online algorithms are not possible [16], even for a single server, while constant competitive algorithms are known for the unweighted case, even for arbitrary energy functions, e.g., the $(2+\epsilon)$-competitive algorithm proposed in [8]. To circumvent the negative result for the weighted case, the online algorithm is typically allowed a speed augmentation of $(1+\epsilon)$ compared to the optimal offline algorithm, in which case constant competitive ratios are possible, where the constant depends on $\epsilon$.

#### I-A2 Multiple Servers

With multiple servers without speed scaling (when the server speeds are fixed), to minimize just the flow time, a well known negative result from [17] showed that the SRPT algorithm (which always processes the $m$ smallest jobs with $m$ servers), which requires both preemption and job migration, has a competitive ratio that grows as the logarithm of the ratio of the largest and the smallest job size and the logarithm of the ratio of the number of jobs and the number of servers. Moreover, [17] also showed that no online algorithm can do better than SRPT when server speeds are fixed.

With multiple servers, one critical aspect is whether job migration is allowed or not. With job migration, a preempted job can be processed by any of the servers and not necessarily by the server where it was partially processed first. Remarkably, in [18], a non-migratory algorithm that only requires preemption is proposed that achieves the same competitive ratio as SRPT. A more positive result for SRPT is that if it is allowed a speed augmentation of $(2 - 1/m)$ (respectively, $(1+\epsilon)$) over the offline optimal algorithm, then it has a constant competitive ratio of 1 (respectively, a constant depending on $\epsilon$); see [19, 20].

For the worst case design, speed scaling with multiple servers to minimize flow time and energy has been studied in [21, 22, 23, 24, 25]. The homogenous server case, where $P(\cdot)$ is identical for all servers, was studied in [24, 22], while the heterogenous case, where $P(\cdot)$ is allowed to be different for different servers, was addressed in [23, 25].

For the unweighted flow time and energy problem under the homogenous server case, a variant of the round robin algorithm without migration has been shown to have a competitive ratio of [21] with augmentation with bounded server speeds. This result was extended in [22] for the weighted flow time plus energy using a randomized server selection algorithm that also does not use migration.

For the heterogenous server setting with augmentation, [23] and [25] derived, respectively, an algorithm that assigns each job to the server that causes the least increase in the projected future weighted flow, and a variant of processor sharing, that are competitive in unweighted and weighted flow time plus energy. Moreover, if for server , , then the algorithm in [23, 26] has a competitive ratio dependent on without any need for speed augmentation; however, the exact competitive ratio is not provided there.

In the stochastic setting, the flow time plus energy problem with multiple servers is studied under a fluid model [27, 28] or modelled as a Markov decision process [29], and near optimal policies are derived.

Our focus in this work is on the unweighted flow time plus energy under the homogenous server setting, where in the context of the prior work we want to answer the following open questions: (i) For the worst case design, is it possible to achieve a constant competitive ratio with simpler algorithms without any speed augmentation (compared to algorithms of [23, 25], that are hard to implement)? In particular, can SRPT do so, since it is a widely used and simple to implement algorithm? This question is also directly related to the limitation of SRPT without speed scaling as shown in [17], and whether SRPT with and without speed scaling are fundamentally different. (ii) For the stochastic setting, can simple algorithms achieve near optimal performance without the need of fluid limit approximations?

### I-B Our Contributions

Let the number of (homogenous) servers be $m$.

• The SRPT algorithm, with speed chosen as $P^{-1}(n/m)$ if $n \ge m$ and $P^{-1}(1)$ if $n < m$, where $n$ is the number of unfinished jobs, is shown to be $c$-competitive, where

$$c = P(2 - 1/m)\left(2 + \frac{2}{P^{-1}(1)}\max(1, P(\bar{s}))\right).$$

This means the above algorithm is constant-competitive, with a competitive ratio that is independent of the number of servers as well as the workload sequence. (Footnote: While $m$ does appear in the expression for the competitive ratio, note that $2 - 1/m$ is trivially upper bounded by 2.) This result is proved under mild regularity assumptions on the power function $P(\cdot)$, which can be further relaxed using standard arguments [15]. For the special case where $P(s) = s^\alpha$, we derive another bound of $3 + \frac{2}{2-\alpha}$ on the competitive ratio; this bound is tighter than the previous one for smaller values of $\alpha$. Similar to the algorithm proposed in [23], the competitive ratio of our SRPT-based policy also depends on the power function. However, the algorithm proposed here is much simpler, and comes with a lower implementation complexity.

• An important conclusion to draw from this result is that SRPT with speed scaling is fundamentally different as compared to the case when speed scaling is not allowed; in the latter setting, the competitive ratio depends on the number of jobs and their sizes [17]. Thus, allowing for speed scaling, the ever popular SRPT is shown to be robust in the multiple server setting.

• With speed scaling, we also derive some lower bounds for the immediate dispatch case when the job has to be assigned to a server instantaneously on its arrival and cannot be migrated across servers, though preemption within a server is allowed. Under this setting, we show that greedy routing policies, that assign a new job to the currently least loaded server or to the historically least loaded server have a competitive ratio of at least . Moreover, even when immediate dispatch is not necessary (i.e., jobs can wait in a common queue), but job migration across servers is not allowed, we show that the competitive ratio of SRPT is at least .

• For the special case where all jobs have unit size, we show that round robin (RR) routing is optimal, and the best known competitive ratio results on speed scaling to minimize the flow-time plus energy in the single server setting apply in the multiple server setting as well.

• We also consider the stochastic setting, where jobs arrive according to a Poisson process with i.i.d. sizes. This case turns out to be significantly easier than the worst case; we show that with $P(s) = s^\alpha$ ($\alpha > 1$), random routing and a simple gated-static speed scaling algorithm achieve a constant competitive ratio.

## II System Model

Let the input $\sigma$ consist of a sequence of jobs, where job $j$ arrives (is released) at time $a_j$ and has work/size $w_j$. There are $m$ homogenous servers, each with the same power function $P(s)$, where $P(s)$ denotes the power consumed while running at speed $s$. Any job can be processed by any of the servers.

The speed $s$ is the rate at which work is executed by any of the servers: an amount of work $w$ is completed in time $w/s$ by any server that runs at speed $s$ throughout that time. Job $j$ is defined to be complete at time $d_j$ if $w_j$ amount of work has been completed for it, possibly by different servers. We assume that preemption is allowed, i.e., a job can be suspended and later restarted from the point at which it was suspended. Moreover, we also assume that job migration is allowed, i.e., if a job is preempted it can be processed later at a different server than the one from which it was preempted. Thus, a job can be processed by different servers at different intervals, but at any given time it can be processed by only one server, i.e., no job splitting is allowed. The flow time for job $j$ is $f_j = d_j - a_j$ (completion time minus the arrival time) and the overall flow time is $F = \sum_j f_j$. From here on we refer to $F$ as just the flow time. Note that $F = \int n(t)\,dt$, where $n(t)$ is the number of unfinished jobs at time $t$. Thus, flow time can also be interpreted as the cumulative holding cost, where the instantaneous holding cost at time $t$ equals $n(t)$.

Let server $k$ run at speed $s_k(t)$ at time $t$. The energy cost is defined as $\sum_{k=1}^{m} P(s_k(t))$ integrated over time. Choosing larger speeds reduces the flow time but increases the energy cost, and the natural objective function that has been considered extensively in the literature is the sum of flow time and energy cost, which we define as

$$C = \int n(t)\,dt + \int \sum_{k=1}^{m} P(s_k(t))\,dt. \tag{1}$$

(Footnote: It is also natural to take the objective to be a linear combination of flow time and energy, i.e., $\int n(t)\,dt + \beta \int \sum_{k=1}^{m} P(s_k(t))\,dt$, where $\beta > 0$ weighs the energy cost relative to the delay cost. However, since the factor $\beta$ may be absorbed into the power function, we will work with the objective (1) without loss of generality.)
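As a small illustration of objective (1), the following sketch evaluates flow time plus energy for a schedule in which the number of unfinished jobs and the server speeds are piecewise constant. The function name `flow_plus_energy`, the segment encoding, and the choice $P(s) = s^2$ in the example are assumptions made here for illustration; they do not come from the paper.

```python
from typing import Callable, Sequence, Tuple

# One segment = (duration, number of unfinished jobs, speeds of the busy servers).
Segment = Tuple[float, int, Sequence[float]]


def flow_plus_energy(segments: Sequence[Segment],
                     power: Callable[[float], float]) -> float:
    """Evaluate objective (1) when n(t) and s_k(t) are piecewise constant."""
    total = 0.0
    for duration, n, speeds in segments:
        holding_rate = n                               # integrand n(t)
        energy_rate = sum(power(s) for s in speeds)    # integrand sum_k P(s_k(t))
        total += duration * (holding_rate + energy_rate)
    return total


# Hypothetical example: P(s) = s^2, two servers, two unit-size jobs
# served in parallel at speed 1 for one time unit.
example_cost = flow_plus_energy([(1.0, 2, [1.0, 1.0])], power=lambda s: s ** 2)
```

In this example the flow time contributes 2 and the energy contributes 2, so the two terms of (1) are balanced.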

Any online algorithm only has causal information, i.e., it becomes aware of job $j$ only at its arrival time $a_j$. Any online algorithm with multiple servers has to make two causal decisions: routing, which specifies the assignment of jobs to servers, and scheduling, which specifies the job to be processed by each server and at what speed at each time. Let the cost (1) of an online algorithm $A$ on input $\sigma$ be $C_A(\sigma)$. Moreover, let the cost (1) of an offline optimal algorithm that knows the job arrival sequence $\sigma$ (both arrival times $a_j$ and sizes $w_j$) in advance be $C_{\mathrm{OFF}}(\sigma)$. Then the competitive ratio of the online algorithm $A$ for $\sigma$ is defined as

$$c_A(\sigma) = \frac{C_A(\sigma)}{C_{\mathrm{OFF}}(\sigma)}, \tag{2}$$

and the objective function considered in this paper is to find an online algorithm that minimizes the worst case competitive ratio

$$c^\star = \min_{A} \max_{\sigma} c_A(\sigma). \tag{3}$$

We will also consider stochastic input, where both the arrival times and the job sizes are chosen stochastically, in which case our definition of the competitive ratio for $A$ will be

$$c_A = \frac{E[C_A]}{E[C_{\mathrm{OFF}}]}, \tag{4}$$

where the expectation is with respect to the stochastic input; see Section V for the details. Correspondingly, the goal is to come up with an online algorithm that minimizes $c_A$.

In Sections III to IV, we study the worst-case setting, and present the results for the stochastic setting in Section V.

## III Worst Case Competitive Ratio: Upper Bounds

In this section, we present our results on constant competitive policies for scheduling and speed scaling in a multi-server environment. We propose an online policy that performs SRPT scheduling, where the instantaneous speed of each server is a function of the number of outstanding jobs in the system. We prove that this policy is constant competitive for a broad class of power functions. Specifically, the competitive ratio depends only on the power function, but not on the number of jobs, their sizes, or the number of servers.

### III-A SRPT Algorithm

In this section, we consider the SRPT algorithm, and analyze its competitive ratio when the server speeds are chosen as follows. Let $n(t)$ and $n_o(t)$ denote the number of unfinished jobs with the SRPT algorithm and OPT (the offline optimal algorithm), respectively, at time $t$. Moreover, let $A(t)$ and $O(t)$ be the set of active jobs with the SRPT algorithm and OPT, respectively. Recall that the SRPT algorithm maintains a single queue and serves the $\min(m, n(t))$ shortest jobs at any time $t$.

The speed for job $k \in A(t)$ with the SRPT algorithm is chosen as

$$s_k(t) = \begin{cases} P^{-1}\left(\dfrac{n(t)}{m}\right) & \text{if } n(t) \ge m, \\ P^{-1}(1) & \text{otherwise.} \end{cases} \tag{5}$$

The above speed scaling rule can be interpreted as follows. Under (5), $\sum_{k \in A(t)} P(s_k(t)) = n(t)$ (with $n(t) \ge m$, the $m$ active jobs each consume power $n(t)/m$; otherwise the $n(t)$ active jobs each consume power 1), i.e., the instantaneous power consumption is matched to the instantaneous job holding cost.
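The selection and speed rule above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function name `srpt_speeds`, the dictionary encoding of remaining sizes, and the power function $P(s) = s^2$ used in the example are assumptions introduced here.

```python
import heapq
from typing import Callable, Dict


def srpt_speeds(remaining: Dict[str, float], m: int,
                power_inv: Callable[[float], float]) -> Dict[str, float]:
    """Return {job: speed} for the jobs served at this instant.

    remaining maps each unfinished job to its remaining size;
    power_inv is the inverse P^{-1} of the power function.
    """
    n = len(remaining)
    if n == 0:
        return {}
    # SRPT: serve the min(m, n) jobs with the least remaining work.
    served = heapq.nsmallest(m, remaining, key=remaining.get)
    # Speed rule (5): depends only on the number of unfinished jobs.
    speed = power_inv(n / m) if n >= m else power_inv(1.0)
    return {job: speed for job in served}


# Hypothetical example with P(s) = s^2, so P^{-1}(x) = sqrt(x).
def power(s: float) -> float:
    return s ** 2


def power_inv(x: float) -> float:
    return x ** 0.5


jobs = {"a": 3.0, "b": 1.0, "c": 2.0, "d": 5.0, "e": 4.0}
speeds = srpt_speeds(jobs, m=2, power_inv=power_inv)
total_power = sum(power(s) for s in speeds.values())
```

With five unfinished jobs and two servers, the two shortest jobs are served and the total power drawn equals the number of unfinished jobs, which is the matching property noted above.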

Our main result (Theorem 1) is proved under the following assumption on the power function.

###### Assumption 1.

is differentiable, strictly increasing, and strictly convex, such that and

###### Remark 1.

It is possible to relax Assumption 1 to allow for almost arbitrary power functions (including non-convex functions and those associated with a finite maximum speed) by adapting the arguments in [15]; see, for example, [23]. Since these arguments are well understood, we do not repeat them here. The main takeaway in the context of the present paper is that Assumption 1 is not restrictive, and that a $c$-competitive algorithm under Assumption 1 can be extended to obtain a $(c+\epsilon)$-competitive algorithm under an arbitrary power function for any $\epsilon > 0$.

We are now ready to state our main result, which shows that our SRPT algorithm is constant competitive.

###### Theorem 1.

Under Assumption 1, the SRPT algorithm with speed scaling (5) is $c$-competitive, where

$$c = P(2 - 1/m)\left(2 + \frac{2}{P^{-1}(1)}\max(1, P(\bar{s}))\right).$$
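To get a feel for the bound in Theorem 1, the sketch below evaluates the expression for a given power function. The function name `theorem1_ratio` is hypothetical, and the value of $P(\bar{s})$ is passed in explicitly as `p_sbar`, since $\bar{s}$ is fixed by Assumption 1; the choice $P(s) = s^2$ with `p_sbar = 1.0` in the example is an assumed input, not a value taken from the paper.

```python
from typing import Callable


def theorem1_ratio(power: Callable[[float], float],
                   power_inv: Callable[[float], float],
                   m: int, p_sbar: float) -> float:
    """Evaluate c = P(2 - 1/m) * (2 + (2 / P^{-1}(1)) * max(1, P(s_bar)))."""
    migration_factor = power(2.0 - 1.0 / m)   # never exceeds P(2) for increasing P
    inner = 2.0 + (2.0 / power_inv(1.0)) * max(1.0, p_sbar)
    return migration_factor * inner


# Hypothetical example: P(s) = s^2 and an assumed value P(s_bar) = 1.
def power(s: float) -> float:
    return s ** 2


def power_inv(x: float) -> float:
    return x ** 0.5


c_two_servers = theorem1_ratio(power, power_inv, m=2, p_sbar=1.0)
c_cap = power(2.0) * (2.0 + 2.0 / power_inv(1.0))  # value with 2 - 1/m replaced by 2
```

Because $2 - 1/m < 2$ and $P$ is increasing, the evaluated bound stays below `c_cap` for every number of servers, which is the sense in which the ratio does not grow with $m$.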

Taking for the competitive ratio equals To prove Theorem 1, we use a potential function argument, where the potential function is defined as follows. Let $n_o(t,q)$ and $n(t,q)$ denote the number of unfinished jobs under OPT and the algorithm, respectively, with remaining size at least $q$. In particular, $n_o(t,0) = n_o(t)$ and $n(t,0) = n(t)$. Let

$$d(t,q) = \max\left\{0, \frac{n(t,q) - n_o(t,q)}{m}\right\}.$$

Define

$$\Phi_1(t) = c_1 \int_0^\infty f(d(t,q))\,dq,$$

where and , (this means where ), and

$$\Phi_2(t) = c_2 \int_0^\infty \left(n(t,q) - n_o(t,q)\right)dq.$$

Consider the potential function

$$\Phi(t) = \Phi_1(t) + \Phi_2(t). \tag{6}$$
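The quantities entering the potential function can be computed directly from the remaining job sizes under the algorithm and under OPT. The sketch below covers $d(t,q)$ and $\Phi_2$ only; it uses the observation that $\int_0^\infty n(t,q)\,dq$ is the total remaining work, since a job with remaining size $r$ is counted for every $q \le r$. All function names and the example job lists are assumptions for illustration.

```python
from typing import Sequence


def count_at_least(remaining: Sequence[float], q: float) -> int:
    """n(t, q): number of unfinished jobs with remaining size at least q."""
    return sum(1 for r in remaining if r >= q)


def d(alg: Sequence[float], opt: Sequence[float], q: float, m: int) -> float:
    """d(t, q) = max{0, (n(t, q) - n_o(t, q)) / m}."""
    return max(0.0, (count_at_least(alg, q) - count_at_least(opt, q)) / m)


def phi2(alg: Sequence[float], opt: Sequence[float], c2: float) -> float:
    """Phi_2 in closed form: c2 times the difference in total remaining work."""
    return c2 * (sum(alg) - sum(opt))


def phi2_numeric(alg: Sequence[float], opt: Sequence[float], c2: float,
                 steps: int = 3000) -> float:
    """Midpoint-rule evaluation of the defining integral, as a sanity check."""
    q_max = max(list(alg) + list(opt), default=0.0)
    if q_max == 0.0:
        return 0.0
    h = q_max / steps
    area = 0.0
    for i in range(steps):
        q = (i + 0.5) * h
        area += (count_at_least(alg, q) - count_at_least(opt, q)) * h
    return c2 * area


# Hypothetical snapshot: remaining sizes under the algorithm and under OPT.
alg_jobs, opt_jobs = [3.0, 1.0], [2.0]
closed_form = phi2(alg_jobs, opt_jobs, c2=2.0)
numeric = phi2_numeric(alg_jobs, opt_jobs, c2=2.0)
```

The closed form makes it clear that $\Phi_2$ tracks how far the algorithm lags OPT in total remaining work, and that it can be negative when the algorithm is ahead.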

The $\Phi_1$ part of the potential function is a multi-server generalization of the potential function in [15], while the $\Phi_2$ part is novel. Let the speed of job $k$ under OPT at time $t$ be $\tilde{s}_k(t)$. Suppose we can show that for any input sequence $\sigma$,

$$n(t) + \sum_{k \in A(t)} P(s_k(t)) + \frac{d\Phi(t)}{dt} \le c\left(n_o(t) + \sum_{k \in O(t)} P(\tilde{s}_k(t))\right) \tag{7}$$

almost everywhere and that $\Phi(t)$ satisfies the following boundary conditions (proved in Proposition 5; see Appendix A):

1. Before any job arrives and after all jobs are finished, $\Phi(t) = 0$, and

2. $\Phi(t)$ does not have a positive jump discontinuity at any point of non-differentiability.

Then, integrating (7) with respect to $t$, we get that

$$\int \left(n(t) + \sum_{k \in A(t)} P(s_k(t))\right)dt \le \int c\left(n_o(t) + \sum_{k \in O(t)} P(\tilde{s}_k(t))\right)dt,$$

which is equivalent to showing that the cost (1) of the algorithm is at most $c$ times the cost (1) of OPT for any input $\sigma$, as required.

The intuition for the form of the competitive ratio in Theorem 1 is as follows.

###### Lemma 1.

[19] Without speed scaling, where the speed of each server is fixed to be unity for all times, and the objective is to only minimize the flow-time (total delay), that follows SRPT is -approximate with respect to .

###### Remark 2.

For proving Theorem 1 via showing that (7) is true for some $c$, we assume that OPT also uses SRPT with arbitrary speeds at time $t$ that can depend on future job arrivals, since enforcing OPT to use SRPT helps in proving (7). From Lemma 1, it follows that with speed scaling, OPT-SRPT (OPT that is constrained to perform SRPT scheduling) is $P(2-1/m)$-competitive with respect to OPT, since OPT following SRPT can scale the speed up by a factor $(2-1/m)$ at all times, and get exactly the same flow-time as OPT, by paying an extra multiplicative energy cost of $P(2-1/m)$. Therefore, we show via (7) that SRPT with speed scaling as in (5) is constant-competitive with respect to OPT-SRPT, and multiply the resulting constant by $P(2-1/m)$ to get the final competitive ratio with respect to OPT itself.

For smaller values of $\alpha$, the result of Theorem 1 can be further improved for the special case of power-law power functions, as described in the next theorem.

###### Theorem 2.

With $P(s) = s^\alpha$ and for any $1 < \alpha < 2$, the SRPT-based algorithm with speed scaling (5) is $c$-competitive, where

$$c = 3 + \frac{2}{2-\alpha}.$$

The proof of Theorem 2 is similar in spirit to that of Theorem 1, but without assuming that OPT follows SRPT. It also uses the same potential function $\Phi$ (see (6)), and directly tries to bound the increase in $\Phi$ because of processing of the jobs by the algorithm and OPT. The limitation on $\alpha$ appears because, without enforcing that OPT follows SRPT, we cannot apply a technical lemma (Lemma 8) jointly on the change made to $\Phi$ by the algorithm and OPT, but only individually. The improvement in competitive ratio compared to Theorem 1 results from not enforcing OPT to follow SRPT, thereby saving on the penalty of $P(2-1/m)$. The proof of Theorem 2 is provided in Appendix C, while the remainder of this section is devoted to the proof of Theorem 1.

###### Proof of Theorem 1.

In light of Lemma 1 and Remark 2, we assume throughout this proof that OPT performs SRPT scheduling, and additionally include a factor of $P(2-1/m)$ in the competitive ratio. For simplicity, we refer to the OPT-SRPT algorithm as simply OPT throughout this proof.

In the following, we show that (7) is true for a suitable choice of $c_1$ and $c_2$. To show (7), we bound $d\Phi/dt$ via individually bounding $d\Phi_1/dt$ and $d\Phi_2/dt$ in Lemmas 2 and 3 below. Note that it suffices to show that (7) holds at any instant $t$ which is not an arrival or departure instant under the algorithm or OPT. For the remainder of this proof, consider any such time instant $t$. For ease of exposition, we drop the time index $t$ from all quantities, since only a fixed (though generic) time instant is under consideration.

###### Lemma 2.

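For $n \ge m$,

$$\frac{d\Phi_1}{dt} \le c_1 n_o - c_1 n + c_1\left(\frac{m-1}{2}\right) + c_1 \sum_{k \in O} P(\tilde{s}_k),$$

while for $n < m$,

$$\frac{d\Phi_1}{dt} \le c_1 n_o - c_1 \frac{n(n+1)}{2m} + c_1 \sum_{k \in O} P(\tilde{s}_k).$$

###### Lemma 3.

$$\frac{d\Phi_2}{dt} \le -c_2 \min(m, n)\,P^{-1}(1) + c_2 \sum_{k \in O} \max\{P(\bar{s}), P(\tilde{s}_k)\}.$$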

Using Lemmas 2 and 3 (proved in Appendix B), we now prove (7) by considering the following two cases:

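[Case 1: $n \ge m$]

$$\begin{aligned} n + \sum_{k \in A} P(s_k) + \frac{d\Phi}{dt} &\overset{(a)}{\le} n + n + c_1 n_o - c_1 n + c_1\left(\frac{m-1}{2}\right) + c_1 \sum_{k \in O} P(\tilde{s}_k) - c_2 m P^{-1}(1) + c_2 \sum_{k \in O} \max\{P(\bar{s}), P(\tilde{s}_k)\} \\ &\le (c_1 + c_2) \sum_{k \in O} P(\tilde{s}_k) + (c_1 + c_2 P(\bar{s}))\, n_o + n(2 - c_1) + \left[c_1\left(\frac{m-1}{2}\right) - c_2 m P^{-1}(1)\right] \\ &\overset{(b)}{\le} \left(c_1 + c_2 \max(1, P(\bar{s}))\right)\left(n_o + \sum_{k \in O} P(\tilde{s}_k)\right). \end{aligned}$$

Here, (a) follows from Lemmas 2 and 3, and since $\sum_{k \in A} P(s_k) = n$ when $n \ge m$ (see (5)), while (b) follows by setting $c_1 = 2$ and $c_2 = 2/P^{-1}(1)$, which makes both $n(2 - c_1)$ and the bracketed term non-positive.

[Case 2: $n < m$]

$$\begin{aligned} n + \sum_{k \in A} P(s_k) + \frac{d\Phi}{dt} &\overset{(a)}{\le} n + n + c_1 n_o - c_1 \frac{n(n+1)}{2m} + c_1 \sum_{k \in O} P(\tilde{s}_k) - c_2 n P^{-1}(1) + c_2 \sum_{k \in O} \max\{P(\bar{s}), P(\tilde{s}_k)\} \\ &\le \left(c_1 + c_2 \max(1, P(\bar{s}))\right)\left(n_o + \sum_{k \in O} P(\tilde{s}_k)\right) + n\left(2 - c_2 P^{-1}(1)\right) \\ &\overset{(b)}{\le} \left(c_1 + c_2 \max(1, P(\bar{s}))\right)\left(n_o + \sum_{k \in O} P(\tilde{s}_k)\right). \end{aligned}$$

Once again, (a) follows from Lemmas 2 and 3, and since $\sum_{k \in A} P(s_k) = n$ when $n < m$ (see (5)), while (b) follows by setting $c_2 = 2/P^{-1}(1)$.

This proves (7) for

$$c = c_1 + c_2 \max(1, P(\bar{s})) = 2 + \frac{2}{P^{-1}(1)} \max(1, P(\bar{s})).$$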

In the next section, we consider a special case when all jobs have unit size, but their arrival instants are still worst case, for which we can improve the competitive ratio guarantees.

### III-B Equal Sized Jobs

Assume that all jobs have equal size, which is taken to be 1 without loss of generality. There are $m$ servers, and jobs are assigned on arrival to one of the servers for service. OPT refers to the offline optimal policy. We propose the following policy: each job on its arrival is assigned to the servers in a round-robin fashion, and each server chooses its speed as a function of the number of unfinished jobs that have been assigned to it, following the single-server speed choice of [8].

###### Theorem 3.

With unit job sizes, under Assumption 1, is -competitive.

###### Proof.

In Proposition 1, we show that when all jobs are of unit size, follows round robin scheduling. Thus, and see the same set of arrivals on each server. The result follows from [8], which shows that choosing speed for a single server system is a -competitive. ∎

###### Proposition 1.

With unit job sizes, under Assumption 1, OPT performs round robin dispatch across servers.

###### Proof.

Let us assume that OPT can hold arriving jobs in a central queue before dispatch to one of the servers. It suffices to show that even in this expanded space of policies, OPT can be assumed to perform round robin dispatch without loss of optimality (WLO).

1. From the convexity of the power function, it follows that OPT serves each job at a constant speed. Labeling jobs in the order of their arrival, let $s_i$ denote the speed at which job $i$ is served.

2. WLO, we may assume that OPT dispatches jobs for service in a FCFS manner.

Claim 1: WLO, OPT completes jobs in the order of their arrival.

It follows from Claim 1 that OPT can be assumed to perform round robin WLO.

Proof of Claim 1: Let denote the time when job  begins service and let denote the time when the same job completes service. Suppose the claim does not hold, i.e., there exist where such that We now demonstrate an alternative power allocation that is strictly better for .

Note that Let denote the remaining work of job at time Clearly, implies that Fix such that

 sjδ+si(1sj−δ)=r. (8)

Consider the following power allocation:

1. Starting at time job  is served at speed for time units, and at speed for time units

2. Starting at time job  is served at speed for time units, and at speed for time units

From (8), it is not hard to see that under this new power allocation, the departure instants of jobs  and are interchanged, i.e., job  completes at time whereas job  completes at time Moreover, under the above power allocation, the cost of remains unchanged. Indeed, the increase in the delay cost of job  is exactly compensated by the decrease in the delay cost of job  Moreover, the energy cost remains unchanged, and the cost associated with all remaining jobs remains unchanged as well (we simply interchange all subsequent dispatches between the servers serving jobs  and ).

Now, from the convexity of the power function, it follows that we can strictly decrease the energy cost of by running jobs  and at constant speeds from time , such that the completion times remain unchanged.

This gives us a contradiction, and completes the proof of the claim. ∎

## IV Worst Case Competitive Ratio: Lower bounds

In the previous section, we showed that while SRPT scheduling is not constant-competitive in a multi-server environment without speed scaling, it can be made constant-competitive when speed scaling is allowed. However, one issue with implementing SRPT on multiple servers is the need for job migration. In this section, we show that a broad class of greedy non-migratory policies is not constant-competitive.

We begin by stating the following preliminary result.

###### Lemma 4.

On a single server, consider a single burst of $n$ jobs, with sizes $x_1 \ge x_2 \ge \cdots \ge x_n$. The cost incurred by OPT in processing this burst equals $c \sum_{k=1}^{n} x_k\, k^{1-1/\alpha}$, where the constant $c$ depends on $\alpha$.

The proof of Lemma 4 follows by direct computation of the optimal speeds for each job that minimize the flow time plus energy cost (1).
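The direct computation mentioned above can be sketched as follows, under the assumptions that $P(s) = s^\alpha$ with $\alpha > 1$, that the burst is served shortest job first, and that while $k$ jobs remain the server runs at the speed minimizing the cost per unit of work, $(k + s^\alpha)/s$. The function name `burst_cost` and the closed-form constant used in the checks are derived here for illustration and are not quoted from the paper.

```python
from typing import Sequence


def burst_cost(sizes: Sequence[float], alpha: float) -> float:
    """Flow time plus energy for one burst on one server, with P(s) = s^alpha.

    Jobs are served shortest first. While k jobs remain, the cost accrues at
    rate k + s^alpha, so finishing x units of work at speed s costs
    (x / s) * (k + s^alpha); this is minimised at s^alpha = k / (alpha - 1).
    """
    total = 0.0
    k = len(sizes)                        # jobs still in the system
    for x in sorted(sizes):               # shortest remaining job first
        s = (k / (alpha - 1.0)) ** (1.0 / alpha)
        total += (x / s) * (k + s ** alpha)
        k -= 1
    return total


def burst_constant(alpha: float) -> float:
    """Cost per unit of work when a single job is present (derived above)."""
    return alpha * (alpha - 1.0) ** (1.0 / alpha - 1.0)


# Hypothetical examples with alpha = 2.
one_job = burst_cost([3.0], alpha=2.0)          # a lone job of size 3
two_unit_jobs = burst_cost([1.0, 1.0], alpha=2.0)
```

Under these assumptions a job processed while $k$ jobs are present costs `burst_constant(alpha)` times its size times $k^{1-1/\alpha}$, which is the per-job form that the sums in the proofs below are built from.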

### IV-A Greedy algorithms

###### Lemma 5.

Consider the class of policies that routes an incoming job to a server with the least amount of unfinished workload. All policies in this class have a competitive ratio that is

###### Proof.

Consider the following instance: a burst of $m-1$ jobs, each having size $w$, arrives first, followed by another burst of $w$ jobs, each having size 1.

Any workload-based greedy policy would assign the first $m-1$ jobs of size $w$ to different servers, and the $w$ jobs of size 1 to the remaining server. By Lemma 4, the cost incurred by any such algorithm is at least

$$c(m-1)w + c\sum_{k=1}^{w} k^{1-1/\alpha} \ge c(m-1)w + c' w^{2-1/\alpha}.$$

Consider now an algorithm that assigns the first $m-1$ jobs of size $w$ to different servers and then distributes the $w$ jobs of size 1 uniformly among all servers. The algorithm then performs scheduling and speed scaling on each server as per single server OPT. The cost incurred by this algorithm (which upper bounds the cost under OPT) equals (using Lemma 4)

$$c\sum_{k=1}^{w/m} k^{1-1/\alpha} + c(m-1)\left[w + \sum_{k=2}^{w/m+1} k^{1-1/\alpha}\right] \le c'' m\left(\frac{w}{m}\right)^{2-1/\alpha} + c\,m\,w.$$

Now, setting for large enough we see that the competitive ratio of any workload-based greedy policy is

It follows from the proof of Lemma 5 that the competitive ratio of any policy that routes an incoming job to a server that has been assigned the least aggregate workload so far (including completed as well as queued workload) is also
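The instance used in the proof of Lemma 5 can be reproduced numerically. The sketch below routes the two bursts with a least-unfinished-work rule, compares the resulting cost with that of the spread-out assignment, and prices each server with the single-burst computation. Several things here are assumptions for illustration: $P(s) = s^\alpha$, both bursts are treated as arriving at the same instant, $m$ divides $w$, and the scaling $w = m^3$ in the example is a choice made here rather than the setting used in the paper.

```python
from typing import List, Sequence


def burst_cost(sizes: Sequence[float], alpha: float) -> float:
    """Single-server cost of a burst: shortest first, per-phase optimal speed."""
    total, k = 0.0, len(sizes)
    for x in sorted(sizes):
        s = (k / (alpha - 1.0)) ** (1.0 / alpha)   # minimises (k + s^alpha) / s
        total += (x / s) * (k + s ** alpha)
        k -= 1
    return total


def route_least_work(jobs: Sequence[float], m: int) -> List[List[float]]:
    """Greedy dispatch: each job goes to the server with least unfinished work."""
    loads = [0.0] * m
    servers: List[List[float]] = [[] for _ in range(m)]
    for x in jobs:
        k = min(range(m), key=loads.__getitem__)
        servers[k].append(x)
        loads[k] += x
    return servers


def route_spread(m: int, w: int) -> List[List[float]]:
    """Big jobs on m - 1 servers, unit jobs split evenly over all m servers."""
    per_server = w // m                               # assumes m divides w
    servers = [[float(w)] + [1.0] * per_server for _ in range(m - 1)]
    servers.append([1.0] * per_server)
    return servers


def greedy_to_spread_ratio(m: int, w: int, alpha: float) -> float:
    jobs = [float(w)] * (m - 1) + [1.0] * w           # big burst, then unit burst
    greedy = sum(burst_cost(s, alpha) for s in route_least_work(jobs, m))
    spread = sum(burst_cost(s, alpha) for s in route_spread(m, w))
    return greedy / spread


# Hypothetical scaling w = m^3 with alpha = 2.
ratio_small = greedy_to_spread_ratio(m=4, w=4 ** 3, alpha=2.0)
ratio_large = greedy_to_spread_ratio(m=16, w=16 ** 3, alpha=2.0)
```

On this instance the greedy rule piles every unit job onto the one empty server, and the gap to the spread-out assignment widens as $m$ grows, in line with the lemma's conclusion that the ratio is not bounded by a constant.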

###### Lemma 6.

Consider the class of policies that route an incoming job to a server with the least number of queued jobs (join the shortest queue (JSQ)). All policies in this class have a competitive ratio that is

###### Proof.

Consider the following instance: jobs arrive in quick succession, causing any JSQ-based policy to perform round robin routing. Every th arriving job has size while all remaining jobs have size 1.

Thus, under any JSQ-based policy, one server would get jobs of size routed to it, whereas all other servers would get jobs of size 1. Thus, the cost under any such policy is at least .

Consider an algorithm that routes the jobs uniformly across the servers, such that each server gets jobs of size 1, and one job of size Post routing, performs scheduling and speed scaling on each server as per single server . The cost incurred by (which upper bounds the cost of ) is thus .

Now, setting for large enough we see that the competitive ratio of any JSQ-based policy is

It is also clear from the above proof that any policy that performs round robin routing would have a competitive ratio that is

### IV-B SRPT-based algorithms

In this section, we consider the following class of non-migratory SRPT-based policies: Let denote the least remaining processing time among all jobs queued at server  If server  is idle at time , then set Consider now a job of size arriving into the system at time If the set is non-empty, then the job is assigned to a server from this set. Else, the job is assigned to any server, or held in a central queue. Each server may preempt between jobs queued at that server. But jobs once assigned to a certain server must complete service at that server, i.e., migration is not allowed.

###### Lemma 7.

Consider the class of non-migratory SRPT-based policies described above. All policies in this class have a competitive ratio that is

###### Proof.

Consider the following instance: jobs of size 1 arrive at time 0, and jobs of sizes arrive in quick succession right after.

Any non-migratory SRPT-based policy would route the jobs of unit size to different servers, and the next jobs to the remaining server. Thus, the cost incurred is at least .

Consider next a policy that routes the first unit sized jobs to different servers, and distributes the next jobs across all servers. Post routing, performs scheduling and speed scaling on each server as per single server . The cost incurred by (which upper bounds the cost of ) is thus at most . Now, setting for large enough we see that the competitive ratio of any non-migratory SRPT-based policy is

## V Stochastic Input

In this section, we consider a stochastic model for the job arrivals. Jobs arrive according to a Poisson process of rate $\lambda$ and have i.i.d. sizes. Let $X$ denote a generic job size. We assume that $E[X] < \infty$. The load, which is the rate at which work is submitted to the system, is given by $\Lambda = \lambda E[X]$.

The performance metric under consideration is the stationary variant of the flow time plus energy metric considered for the worst-case analysis, i.e.,

 C=E[T]+E[E], (9)

where denotes the steady state response time, and denotes the energy required to serve a job in steady state.333Of course, for this metric to be meaningful, we restrict attention to policies that are regenerative, and thus have a meaningful steady state behavior. We also note that it is straightforward to extend the results of this section to a metric that is a linear combination of and In the present section, we restrict attention to power functions of the form where

In the following, we generalise a result proved in [6] for the single server setting to the multi-server setting. Specifically, we show that a policy that routes each job randomly, and runs each server at a constant speed when active, is constant competitive. Note that the speed chosen depends on the load which needs to be known or learnt. Policies of this type are referred to in [6] as gated static policies.

Specifically, the proposed algorithm is the following: Arriving jobs are routed to a server chosen uniformly at random. Each server performs processor sharing (PS) scheduling at a fixed speed, namely the optimal static speed that minimizes the metric (9) on that (single) server.

We begin our analysis by deriving a lower bound on the performance of any routing and speed scaling policy.

### V-A Lower Bound

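Let $\mathbb{E}[s_i]$ denote the time-averaged speed of server $i$. We have

$$\lambda C \ge \sum_{i=1}^{m} P(\mathbb{E}[s_i]) \ge \sum_{i=1}^{m} P(\Lambda/m) = \frac{\Lambda^{\alpha}}{m^{\alpha-1}}. \tag{10}$$

The first inequality above is an application of Jensen’s inequality, while the second exploits the convexity of the power function, given that $\sum_{i=1}^{m} \mathbb{E}[s_i] \ge \Lambda$ (for stability).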
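The two steps behind (10) can be spelled out as follows. This is a sketch assuming $P(s) = s^{\alpha}$ with $\alpha > 1$, and writing $s_i$ for the instantaneous speed of server $i$ (notation assumed here). The average power drawn by the system equals the job arrival rate times the energy per job, so

$$
\begin{aligned}
\lambda C \;\ge\; \lambda\, \mathbb{E}[E] &= \sum_{i=1}^{m} \mathbb{E}[P(s_i)] \;\ge\; \sum_{i=1}^{m} P(\mathbb{E}[s_i]) && \text{(Jensen, } P \text{ convex)} \\
&\ge\; m\, P\!\left(\frac{1}{m}\sum_{i=1}^{m} \mathbb{E}[s_i]\right) \;\ge\; m\, P\!\left(\frac{\Lambda}{m}\right) = \frac{\Lambda^{\alpha}}{m^{\alpha-1}} && \text{(convexity, then monotonicity of } P\text{)}.
\end{aligned}
$$

The last inequality uses the stability requirement that the total average speed is at least the load $\Lambda$.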

Next, we derive an alternate lower bound on $\lambda C$. Consider the case when only a single job of size $x$ arrives. This job is run at a constant speed that minimizes its response time plus energy consumption, i.e., at the speed $s$ that minimizes $x/s + x s^{\alpha-1}$. This yields the following lower bound on the performance of any algorithm

$$\lambda C \ge \Lambda\, \alpha\, (\alpha-1)^{\frac{1}{\alpha}-1}. \tag{11}$$
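As a sanity check on (11), the single-job optimization can be worked out explicitly. The sketch below assumes $P(s) = s^{\alpha}$ and uses $x$ for the job size and $s$ for the constant speed; the name $\mathrm{cost}(s)$ is introduced here only for illustration.

$$
\begin{aligned}
\mathrm{cost}(s) &= \frac{x}{s} + \frac{x}{s}\, s^{\alpha} = x\left(s^{-1} + s^{\alpha-1}\right), \\
\frac{d}{ds}\,\mathrm{cost}(s) = 0 &\iff (\alpha-1)\, s^{\alpha} = 1 \iff s = (\alpha-1)^{-1/\alpha}, \\
\min_{s}\, \mathrm{cost}(s) &= x\left[(\alpha-1)^{\frac{1}{\alpha}} + (\alpha-1)^{\frac{1}{\alpha}-1}\right] = x\, \alpha\, (\alpha-1)^{\frac{1}{\alpha}-1}.
\end{aligned}
$$

Since every job incurs at least this cost even when it is alone in the system, taking the expectation over the job size and multiplying by $\lambda$ gives $\Lambda\, \alpha\, (\alpha-1)^{\frac{1}{\alpha}-1}$ with $\Lambda = \lambda \mathbb{E}[X]$. For $\alpha = 2$ the bound reduces to $2\Lambda$.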

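Combining (10) and (11) gives us

$$\lambda C \ge \max\left(\Lambda\, \alpha\, (\alpha-1)^{\frac{1}{\alpha}-1},\ \frac{\Lambda^{\alpha}}{m^{\alpha-1}}\right). \tag{12}$$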

Next, we characterize the performance of the proposed policy and bound its competitive ratio.

### V-B Performance under policy S

Under random routing, each server sees a Poisson arrival process with rate $\lambda/m$. Thus, the performance under metric (9) when operating each server at speed $s$ when active with PS scheduling is given by

$$c(s) = \frac{\mathbb{E}[X]}{s - \Lambda/m} + \frac{\mathbb{E}[X]}{s}\, P(s). \tag{13}$$
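For reference, the two terms of (13) follow from standard M/G/1-PS facts. This is a brief sketch in which $\rho$ denotes the utilization of one server (notation assumed here), and it requires $s > \Lambda/m$ for stability.

$$
\begin{aligned}
\rho &= \frac{\lambda}{m} \cdot \frac{\mathbb{E}[X]}{s} = \frac{\Lambda}{m s}, \\
\mathbb{E}[T] &= \frac{\mathbb{E}[X]/s}{1 - \rho} = \frac{\mathbb{E}[X]}{s - \Lambda/m}, \\
\mathbb{E}[E] &= \frac{\rho\, P(s)}{\lambda/m} = \frac{\mathbb{E}[X]}{s}\, P(s).
\end{aligned}
$$

The energy term uses the fact that a gated static server draws power $P(s)$ only while busy, which happens a fraction $\rho$ of the time, and divides that average power by the per-server job arrival rate $\lambda/m$.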

Thus, $s^* = \arg\min_{s > \Lambda/m} c(s)$, and the performance of the algorithm is given by $C_S = c(s^*)$.

###### Theorem 4.

In the stochastic input setting, the competitive ratio of the algorithm $S$ is a constant that depends on $\alpha$ but not on the job size distribution or the number of servers $m$.

###### Proof.

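The proof follows by comparing the performance with the lower bound (12) that holds for any algorithm.

Indeed, for any algorithm $A$,

$$
\begin{aligned}
\frac{C_S}{C_A} &\le \frac{\lambda\, c(s^*(\Lambda))}{\max\left(\Lambda \alpha (\alpha-1)^{\frac{1}{\alpha}-1},\ \frac{\Lambda^{\alpha}}{m^{\alpha-1}}\right)} \\
&\le \frac{\lambda\, c(1+\Lambda/m)}{\max\left(\Lambda \alpha (\alpha-1)^{\frac{1}{\alpha}-1},\ \frac{\Lambda^{\alpha}}{m^{\alpha-1}}\right)} \\
&= \frac{\Lambda + \Lambda (1+\Lambda/m)^{\alpha-1}}{\max\left(\Lambda \alpha (\alpha-1)^{\frac{1}{\alpha}-1},\ \frac{\Lambda^{\alpha}}{m^{\alpha-1}}\right)} \\
&\le \frac{\Lambda + \Lambda (1+\Lambda/m)^{\alpha-1}}{\min\left(1,\ \alpha (\alpha-1)^{\frac{1}{\alpha}-1}\right) \max\left(\Lambda,\ \frac{\Lambda^{\alpha}}{m^{\alpha-1}}\right)} \\
&\le \frac{1 + 2^{\alpha-1}}{\min\left(1,\ \alpha (\alpha-1)^{\frac{1}{\alpha}-1}\right)}.
\end{aligned}
$$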
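The final inequality in the chain above can be verified by a two-case argument. The following is a sketch, writing $\rho = \Lambda/m$ (notation assumed here) and dividing numerator and denominator by $\Lambda$, so that the quantity to bound is $\bigl(1 + (1+\rho)^{\alpha-1}\bigr) / \max\bigl(1, \rho^{\alpha-1}\bigr)$.

$$
\begin{aligned}
\rho \le 1:&\quad \frac{1 + (1+\rho)^{\alpha-1}}{\max(1, \rho^{\alpha-1})} = 1 + (1+\rho)^{\alpha-1} \le 1 + 2^{\alpha-1}, \\
\rho > 1:&\quad \frac{1 + (1+\rho)^{\alpha-1}}{\max(1, \rho^{\alpha-1})} \le \frac{1 + (2\rho)^{\alpha-1}}{\rho^{\alpha-1}} = \rho^{-(\alpha-1)} + 2^{\alpha-1} \le 1 + 2^{\alpha-1}.
\end{aligned}
$$

Both cases rely on $\alpha > 1$, so that $t \mapsto t^{\alpha-1}$ is increasing.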

The above bound can be tightened for the case $\alpha = 2$, since $s^*$ can be computed explicitly in this case.

###### Corollary 1.

For $\alpha = 2$, in the stochastic input setting, the competitive ratio of the algorithm $S$ is at most $2$.

###### Proof.

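For $\alpha = 2$, from (13), we get $s^* = 1 + \Lambda/m$ and thus, the performance under the algorithm $S$ satisfies

$$\lambda C(S) = \Lambda^2/m + 2\Lambda.$$

Now, from (12), $\lambda C(A) \ge \max(2\Lambda,\ \Lambda^2/m)$ under any algorithm $A$, which implies the statement of the corollary. ∎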
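The bounds of this section can also be checked numerically. The following is a minimal sketch, assuming $P(s) = s^{\alpha}$ and a normalized form of (13) in which only $\Lambda$, $m$ and $\alpha$ appear. All function names are hypothetical and chosen for illustration, and the bisection routine is one simple way to find the optimal static speed, not a method taken from the paper.

```python
def optimal_static_speed(rho, alpha, iters=100):
    """Minimiser of 1/(s - rho) + s**(alpha - 1) over s > rho, by bisection.

    rho is the per-server load Lambda/m (assumed > 0) and alpha > 1.
    """
    def dcost(s):
        # Derivative of the normalised cost; it changes sign exactly once.
        return (alpha - 1.0) * s ** (alpha - 2.0) - 1.0 / (s - rho) ** 2

    lo, hi = rho, rho + 1.0
    while dcost(hi) < 0.0:  # grow the bracket until the slope is >= 0
        hi = rho + 2.0 * (hi - rho)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dcost(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def policy_cost(load, m, alpha):
    """lambda * c(s*) from (13), which depends only on Lambda, m and alpha."""
    rho = load / m
    s = optimal_static_speed(rho, alpha)
    return load / (s - rho) + load * s ** (alpha - 1.0)


def lower_bound(load, m, alpha):
    """Right-hand side of (12)."""
    single_job = load * alpha * (alpha - 1.0) ** (1.0 / alpha - 1.0)
    power = load ** alpha / m ** (alpha - 1.0)
    return max(single_job, power)


def theorem_bound(alpha):
    """Competitive ratio bound obtained in the proof of Theorem 4."""
    a = alpha * (alpha - 1.0) ** (1.0 / alpha - 1.0)
    return (1.0 + 2.0 ** (alpha - 1.0)) / min(1.0, a)


if __name__ == "__main__":
    for alpha in (1.5, 2.0, 3.0):
        for load in (0.1, 1.0, 10.0, 100.0):
            for m in (1, 4, 64):
                ratio = policy_cost(load, m, alpha) / lower_bound(load, m, alpha)
                print(alpha, load, m, round(ratio, 4), round(theorem_bound(alpha), 4))
```

The printed ratio is an upper estimate of the competitive ratio, because it compares the policy against the lower bound (12) rather than against the true optimum.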

## VI Concluding Remarks

In this paper, we show that SRPT can be made constant competitive in the multi-server speed scaling environment with respect to the flow time plus energy metric. This presents an interesting contrast to the case when server speeds are constant, where it is known that SRPT has an unbounded competitive ratio with respect to the flow time metric. We also show that the multi-server speed scaling problem is easy in the absence of job size variability; simple round robin dispatch in conjunction with a single-server speed scaling rule is near-optimal. Finally, we show that a broad class of policies based on greedy non-migratory dispatch rules do not admit a constant competitive ratio.

In contrast, in the stochastic setting, we show that random routing, together with a gated static speed setting, is constant competitive. However, the required speed is a function of the load, which needs to be learnt.

While SRPT is a well studied scheduling policy in the multiple server setting, one issue with implementing SRPT in practice is the need for migration. Considering that there is a cost associated with migration of a job across servers in practice, a natural generalization would be to include this cost of migration in the performance metric. How to optimally tradeoff flow time, energy consumption, and migration costs is an interesting open problem for the future. However, it is easy to bound the performance of the SRPT-based speed scaling algorithm proposed in this paper accounting for migration costs. Indeed, in a job sequence consisting of jobs, SRPT performs at most migrations. Thus, assuming a fixed cost of each migration, our SRPT-based algorithm remains constant competitive with respect to the flow time plus energy plus migration cost metric if one assumes a lower bound on the size of each job; in this case, the migration cost is at most a constant factor of the flow time.

Finally, we note that while there is a considerable literature on speed scaling in parallel multi-server environments, we are not aware of any work on speed scaling in tandem queueing systems, and more generally, on a queueing network. Coming up with constant competitive speed scaling algorithms in these settings is an interesting avenue for future work.

## Appendix A Proof of Proposition 5

###### Proposition 5.

as defined in (6) satisfies boundary conditions (1) and (2).

###### Proof.

Note that Condition (1) is satisfied; before any job is released and after all jobs are finished, since and for all . Whenever a new job arrives/is released, and does not change for any , so remains unchanged. Similarly, whenever a job is completed by the algorithm or , or is changed for only a single point of which does not introduce a discontinuity in Thus, Condition (2) is also satisfied. ∎

## Appendix B Proof of Lemmas 2 and 3

To prove Lemmas 2 and 3 we need the following technical lemma from [15].

###### Lemma 8.

[Lemma 3.1 in [15]] For ,

$$\Delta(x)\,(-s_k + \tilde{s}_k) \le \left(-s_k + P^{-1}(x)\right)\Delta(x) + P(\tilde{s}_k) - x.$$

###### Proof of Lemma 2.

Throughout we assume that is following SRPT. Let and denote, respectively, the size of the  shortest job in service under the algorithm and

Case 1: Suppose that is serving jobs, where Define , and . The function satisfies the following properties.

1. as

2. is piecewise constant and left-continuous, with a downward jump of 1 at and an upward jump of 1 at (Footnote 4: This assumes all jobs being served by the algorithm and have distinct remaining sizes. If, for example, jobs under have the same remaining size then would have an upward jump of at)

Consider the change in due to ( for ):

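$$
\begin{aligned}
d\Phi_1 &= c_1 \sum_{i=1}^{r} \left[ f\!\left(\frac{n(q_o(i)) - n_o(q_o(i)) + 1}{m}\right) - f\!\left(\frac{n(q_o(i)) - n_o(q_o(i))}{m}\right) \right] \tilde{s}_i\, dt \\
&\overset{(a)}{=} c_1 \sum_{i=1}^{r} \Delta\!\left(\frac{n(q_o(i)) - n_o(q_o(i)) + 1}{m}\right) \tilde{s}_i\, dt \\
&\overset{(b)}{\le} c_1 \sum_{i=1}^{r} \Delta\!\left(\frac{\tilde{n}(q_o(i)) - \tilde{n}_o(q_o(i)) + 1}{m}\right) \tilde{s}_i\, dt \\
&= c_1 \sum_{i=1}^{r} \Delta\!\left(\frac{g(q_o(i)) + 1}{m}\right) \tilde{s}_i\, dt
\end{aligned} \tag{14}
$$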

In writing we take for   holds since for and

Next, consider the change in due to the algorithm ( for ):

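$$
\begin{aligned}
d\Phi_1 &= c_1 \sum_{i=1}^{m} \left[ f\!\left(\frac{n(q(i)) - 1 - n_o(q(i))}{m}\right) - f\!\left(\frac{n(q(i)) - n_o(q(i))}{m}\right) \right] s_i\, dt \\
&= -c_1 \sum_{i=1}^{m} \Delta\!\left(\frac{n(q(i)) - n_o(q(i))}{m}\right) s_i\, dt \\
&\overset{(a)}{\le} -c_1 \sum_{i=1}^{r} \Delta\!\left(\frac{\tilde{n}(q(i)) - \tilde{n}_o(q(i))}{m}\right) s_i\, dt - c_1 \sum_{i=r+1}^{m} \Delta\!\left(\frac{n(q(i)) - n_o(q(i))}{m}\right) s_i\, dt \\
&\overset{(b)}{\le} -c_1 \sum_{i=1}^{r} \Delta\!\left(\frac{g(q(i))}{m}\right) s_i\, dt - c_1 \sum_{i=r+1}^{m} \Delta\!\left(\frac{n - i + 1 - n_o}{m}\right) s_i\, dt
\end{aligned} \tag{15}
$$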

Here, holds because for and for all   follows since (Footnote 5: if the algorithm has exactly one job with remaining size If multiple jobs have the same remaining size under the algorithm, then we have)

We now combine (14) and (15) to capture the overall change in In doing so, we make the following crucial observation.

Claim 1: For each