1 Introduction
Starting with the classical work [26], the speed scaling problem has been widely studied in the literature. There is a single server or a parallel bank of servers with tuneable speed, and the canonical problem is to find the service speed/rate that minimizes a linear combination of the flow time (total delay) and total energy [1], called flow time plus energy, where flow time is defined as the sum of the response times (departure time minus arrival time) across all jobs.
Many interconnected practical systems such as assembly lines, flow shops and job shops in manufacturing, traffic flow in a network of highways, multihop telecommunications networks, and client-server computer systems, however, are better modelled as a network of queues/servers. Another important example is service systems with server-specific precedence constraints, where jobs have to be processed in a particular order of servers. In such systems, for each job, service is defined to be complete once it has been processed by a subset of servers, together with a permissible order on service from different servers.
The simplest such network is a tandem server setting, where there are servers in series, and each object/job has to be processed by all the servers in a serial order. With tandem servers, we consider the speed scaling problem of minimizing flow time plus energy, when the speed/service rate of each server is tuneable and there is an associated energy cost attached to the chosen speed. The control variables here include scheduling, i.e., which job to run on each server, and speed scaling, i.e., which speed to operate each server at. In the worst case setting, the arrival sequence is arbitrary, and possibly adversarially determined. In this case, the performance metric is the competitive ratio, which is defined as the maximum, over all possible inputs, of the ratio of the cost of the online algorithm to the cost of the optimal offline algorithm that is allowed to know the entire input sequence ahead of time. In the stochastic setting, job arrivals occur according to a stochastic process. Here, the cost of an algorithm is the sum of the steady state averages of response time and energy consumption per job. The competitive ratio of an algorithm is in turn the ratio of its cost to that of the optimal algorithm (that admits the above steady state averages). In both settings, the goal is to design algorithms that have a small competitive ratio.
1.1 Related Work
1.1.1 Arbitrary Input Case
With arbitrary input, where job arrival times and sizes are arbitrary (even chosen by an adversary), for speed scaling with a single server or bank of parallel servers, two classes of problems have been studied: i) unweighted and ii) weighted, where in i) the delay that each job experiences is given equal weight in the flow time computation, while in ii) it is scaled by a weight that can be arbitrary.
The weighted setting is fundamentally different from the unweighted one: in the weighted setting it is known that constant-competitive online algorithms are not possible [6], even for a single server, while constant-competitive algorithms are known for the unweighted case, even for arbitrary energy functions, e.g., the best known competitive algorithm [3]. For more prior references on speed scaling, we refer the reader to [3, 8].
In addition to a single server, the speed scaling problem has also been considered extensively for a parallel bank of servers, where there is a single queue and multiple servers. With multiple servers, on the arrival of a new job, the decision is to choose which jobs to run on the multiple servers, by preempting other jobs if required, and at what speed [19, 14, 16, 21, 15]. The homogeneous server case, i.e., where the power usage function is identical for all servers, was studied in [21, 14], while the heterogeneous case, where the power usage function is allowed to be different for different servers, was addressed in [16, 15, 11].
1.1.2 Stochastic Input Case
Under stochastic input, research on two tandem servers with variable speed was initiated in the classical works [23, 17], which established that the optimal service rate of the first queue is monotonic in the joint queue state, and is of the bang-bang type. These results have also been extended to any number of tandem servers when each server has an exponential service distribution [24]. These types of problems belong to the class of control of Markov decision processes, for which general results have also been derived [13]. Typically, in early works, the objective function did not include an energy cost for increasing the service rate. To reflect the energy cost, [25] considered the same problem as in [23] in the presence of an average power constraint. Analytical results in this area have been limited to structural results, such as the monotonicity results, and that too for special input/service distributions, and no explicit optimal service rates are known. In the stochastic setting with multiple parallel servers, the flow time plus energy problem is studied under a fluid model [5, 22] or modelled as a Markov decision process [12], and near optimal policies have been derived.
1.2 Our work
We consider the speed scaling problem in the tandem network setting, where there are multiple servers () in series. Each external job arrives at server , and is defined to be complete once it has been processed by each of the servers in series. Each server has an identical power (energy) consumption function , i.e., if the server speed is , then power consumed is .
1.2.1 Arbitrary Input
We consider the arbitrary input setting, where jobs can arrive at arbitrary times on server , with arrival times possibly chosen by an adversary. However, we assume that each job has the same size/requirement on any of the servers. Even for a single server setting, initial progress was made for unit sized jobs [1, 20, 7, 10, 2], which was later generalized to arbitrary job sizes. In the sequel, it will become evident that the considered problem is challenging even with unit sized jobs. The motivation to consider the arbitrary input setting is twofold: i) it is the most general, and ii) even if one assumes that the external arrivals to server have a nice distribution, with speed scaling by each of the servers, the internal arrivals (arrivals at server corresponding to departures from server ) need not continue to have the same nice distribution. Under the arbitrary input setting, we consider the unweighted flow time + energy as the objective function, and the problem is to find an online algorithm with minimum competitive ratio.
The proposed algorithm ensures that there is at most one outstanding job with all servers other than server . Let be the number of outstanding jobs with server , and let the total number of servers with outstanding jobs (called active) be excluding the first server. Then the algorithm runs each active server (including server ) at the same speed of . Thus, the total power consumed across all servers is equal to the total number of outstanding jobs plus one, that could be spread across servers. The main result of this paper is as follows.
With unit sized jobs, and identical power consumption function for all servers, the competitive ratio of the proposed algorithm is at most where and . For , and for , , making the competitive ratio at most .
Even though there has been a large number of papers on online speed scaling algorithms with a single server or with multiple parallel servers, as far as we know, there is no work on competitive algorithms for the tandem server case with the objective of flow time plus energy. We would like to point out that there is work on energy efficient routing for networks [4, 9], but without any delay consideration.
With a tandem network, the main technical difficulty in obtaining results for flow time plus energy with the arbitrary input case is that the external arrivals happen at the same time for any algorithm and the optimal offline algorithm into server , but because of dynamic speed scaling, the internal arrivals at intermediate servers (departures from previous server) are not synchronized for any algorithm and the (that has noncausal information about future job arrivals). Thus, a sample path result that is needed in the arbitrary input case is hard to obtain.
We overcome this difficulty by proposing a potential function that has positive jumps (corresponding to movement of jobs between consecutive servers), in contrast to the typical approach of using a potential function that has no positive jumps. Consequently, to derive constant competitive ratio results, we upper bound the sum of the positive jumps and relate that to the cost of the . Moreover, the potential function is only a function of the number of jobs with the in the first server and not in any subsequent servers, since controlling and synchronizing the jobs of the algorithm and the in servers other than the first is challenging. We show in Remark 3 that a simple/natural extension of the popular speed scaling algorithm [8] for a single server does not yield any useful bound on the competitive ratio with tandem servers. Our result is similar in spirit to the results of [16, 11] for parallel servers, where the competitive ratio also depends on . Compared to the prior work on speed scaling with single (parallel) server(s) [8, 16, 11], we make a nontrivial extension (even though our results require unit sized jobs) and provide constant competitive ratio results for tandem servers, a setting that has long escaped analytical tractability.
1.2.2 Stochastic Input
In the stochastic setting, we consider a more general tandem network, with a parallel bank of servers at each stage. The external arrivals to stage are assumed to follow a Poisson process. We consider a simple ’gated’ static speed algorithm and random routing among different servers in each stage, which critically ensures that the job arrivals to subsequent stages are also Poisson [18]. We show that the random routing and gated static speed policy has a constant competitive ratio that depends only on the power functions, and is independent of the workload and the number of servers. To contrast our work with prior work on stochastic control of tandem servers [23, 17], the novelty of our work is that we are able to give concrete (constant factor) competitive ratio guarantees, while in prior work only structural results were known, and those only in the stochastic input setting.
2 System Model
Let the input consist of jobs, where job arrives (is released) at time and has work/size , to be completed on server . There are homogeneous servers in series/tandem, each with the same power function , where denotes the power consumed by any server while running at speed . Typically, with . Each job has to be processed by each of the servers, in sequence, i.e. server can process a job only after it has been completely processed by server and departed from it. Following most of the prior work in this area, we assume that each server has a large enough buffer and that no jobs are ever dropped.
The speed is the rate at which work is executed by any of the servers, and amount of work is completed in time by any server if run at speed throughout time . A job is defined to be complete at time on server if amount of work has been completed for it on server . The flow time for job is defined as ( is the completion time of job on the last () server minus the arrival time) and the overall flow time is . From here on we refer to as just the flow time. Note that , where is the number of unfinished jobs (spread across possibly different servers) at time . We denote the corresponding variables for the optimal offline algorithm by a subscript or superscript .
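The identity noted above, that the overall flow time equals the integral over time of the number of unfinished jobs, can be checked on a small example. The following is a minimal sketch; the function names and the sample arrival/completion times are illustrative assumptions and do not come from the paper.

```python
def flow_time_by_jobs(arrivals, completions):
    """Sum of per-job response times (completion time minus arrival time)."""
    return sum(c - a for a, c in zip(arrivals, completions))


def flow_time_by_integral(arrivals, completions):
    """Integral over time of the number of unfinished jobs in the system."""
    # Each arrival adds one unfinished job, each completion removes one.
    events = sorted([(a, +1) for a in arrivals] + [(c, -1) for c in completions])
    if not events:
        return 0.0
    total, unfinished, previous_time = 0.0, 0, events[0][0]
    for time, change in events:
        # The count is piecewise constant between consecutive events.
        total += unfinished * (time - previous_time)
        unfinished += change
        previous_time = time
    return total


# Hypothetical sample: three jobs with their arrival and final completion times.
sample_arrivals = [0.0, 1.0, 2.0]
sample_completions = [3.0, 4.0, 6.0]
```

Both functions return the same value on any consistent input, since each job contributes one unit to the count of unfinished jobs for exactly the duration of its response time.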
Let server run at speed at time . Then the energy cost for server is defined as , where is a strictly convex, nondecreasing, differentiable function at . The natural example of clearly satisfies all these conditions. Following [8], these special conditions on can be relaxed completely without affecting the results, and more importantly the results hold even when the maximum speed is bounded (see Remark 6). The total energy cost is summed over the flow time.
Choosing larger speeds reduces the flow time, however, increases the energy cost, and the natural objective function that has been considered extensively in the literature is a linear combination of flow time and energy cost, which we define as
(1) 
Note that there is no explicit need for considering the weighted combination of the two costs in (1) since a scalar multiple can be absorbed in the power function itself.
3 Arbitrary Input
Any online algorithm only has causal information, i.e., it becomes aware of job only at time . Using only this causal information, any online algorithm has to decide at what speed each server should be run at each time. Let the cost (1) of an online algorithm be , and the cost for the that knows the job arrival sequence (both and ) in advance be . Then the competitive ratio of the online algorithm for is defined as
(2) 
and the objective function considered in this paper is to find an online algorithm that minimizes the worst case competitive ratio .
A typical approach in the speed scaling literature to upper bound (of ) the competitive ratio is to construct a potential function and show that for any input sequence
(3) 
almost everywhere and that satisfies the following boundary conditions,

Before any job arrives and after all jobs are finished, , and

does not have a positive jump discontinuity at any point of nondifferentiability.
Then, integrating (3) with respect to , we get that
which is equivalent to showing that for any input as required.
Since any online algorithm is only allowed to make causal decisions, at any time , the speed chosen by an online algorithm for any server and the can be different. Because of this, the main challenge when there are tandem servers is that the internal arrivals at server that correspond to departures from server ( other than the last) can happen at different times for the algorithm and the . Thus constructing a potential function and ensuring that the boundary conditions are satisfied presents a unique challenge. With a single server (or a parallel bank of servers) such a problem does not arise, since there arrivals only happen externally, at the same time for both the algorithm and the . Thus, instead of finding a potential function that does not have a positive jump discontinuity, we propose a potential function for which we can control how large the positive jump discontinuities are and compare them with the cost of the . Let the new boundary conditions be,

Before any job arrives and after all jobs are finished, , and

Let increase by amount at the discontinuous point. Let .
Then, integrating (3) with respect to , we get that
(4) 
which is equivalent to showing that for any input as required.
The main novel contribution of this paper is the construction of a potential function for tandem server settings with positive jumps, where we can upper bound , and importantly which is only a function of the number of jobs with the on the first server (which arrive together for the algorithm as well) and not on subsequent servers, since controlling them is far too challenging.
Job sizes: For a single server setting, constant competitive algorithms have been derived independent of the job sizes [8]. Considering arbitrary job sizes in a multiple tandem server setting is more complicated (the technical difficulty is described in Remark 3.1) and we consider the homogeneous job size setting, where all job sizes are identical across all jobs and all servers . Without loss of generality, we in fact let .
We here discuss briefly why it is nontrivial to extend the results for single server setting to tandem server setting. Let . Consider a competitive algorithm for a single server with equal job size, e.g. [8] that chooses speed , where is the number of outstanding jobs. There are two ways to use this in the tandem server setting. Let be the number of jobs on server . Either we can replicate the speed of jobs as seen on server () on server , or use autonomously on both the servers. We argue next that both these choices are not very useful.
Speed Replication: Let job arrive at time and depart at , and during this time let the speed chosen by server to serve job be . Replicating the speed profile on the second server as well does not result in a competitive algorithm for the two-server problem. To see this, consider a time where a job starts its service at server , and suppose that job was alone in server throughout the time it spent in server , i.e. its speed profile [8]. Let the next job arrive at into server . The speed of job in server is , and because of replication of on server for job , job is also being processed at speed . Suppose that at , new jobs arrive in server , because of which the speed of job is increased to . Thus, with the server setting, job will be processed fast and will have to wait behind job in server since job ’s speed is fixed at . Such a problem is avoided in a single server system since at time job has departed the system. Thus, with the two-server system, the cost for job could be more than two times that in a single server system.
Autonomous: Consider an input, where jobs of unit size arrive at time into server . Then choosing , server runs slower compared to server until jobs have been processed by server and are available at server . Thus, jobs start accumulating in server ’s queue, and consequently, each of jobs have to wait behind jobs in server for sufficient time before they are processed by server , entailing a large flow time + energy cost. This argument on its own does not mean that the competitive ratio of this algorithm is poor, since the inherent cost could be large even with the with this input. However, for this input, instead a simple algorithm ( can only do better) that chooses for both avoids any waiting for any job on server and can be shown to have at most twice the flow time + energy cost of the server . Thus autonomous speed choice for two servers is also not expected to provide low (or constant) competitive ratio.
3.1 Speed Scaling Algorithm
We begin this section, by first deriving a lower bound on the cost of the . , where .
Proof.
We enhance the as follows to derive a lower bound on its cost. Instead of requiring that processes jobs in series, each incoming job is copied on all servers and a job is defined to be complete, when it is completed by all servers. Thus, allowing to run jobs in parallel. Essentially this will let run jobs at same speed in each of the servers, and have the same number of outstanding jobs on each server. Thus, for the enhanced , the total cost (flow time + energy) is equal to , where is the number of outstanding jobs on server . Thus, we have . ∎
Next, we will compare the performance of the proposed algorithm and the enhanced . Let and be the number of outstanding jobs on server with the algorithm and the enhanced (which for succinctness we call whenever there is no ambiguity), respectively, at time . For the enhanced we only need to consider the number of jobs on server . At time , let be the number of unfinished jobs with on the first server with remaining size at least , and let be the number of unfinished jobs with the algorithm on server with remaining size at least . Thus, and .
For server , let , while for server ,
where is the total number of outstanding jobs from server till server . Notably in defining there is no contribution from the unlike in . This is key, since there is no way to control the number of jobs that the has in server and their transitions between servers to .
For the algorithm, a server is defined to be active if it has an unfinished job, i.e., The indicator function for if server is active under the algorithm at time and otherwise. Then is the number of active servers with the algorithm at time , other than server .
For let
and . For server , let
(5) 
while for server ,
(6) 
where
Consider the potential function
(7) 
Algorithm: The speed scaling algorithm that we propose, chooses the following speeds. For server ,
(8) 
For active servers, i.e., servers with
(9) 
The nonactive servers have speed .
With this speed scaling choice, all active servers work at the same speed at each time, and since we are assuming that each job has the same size on all servers, this implies that jobs only wait in server if at all, and are always in process at active servers . Moreover, other than server , all servers have at most one outstanding job at any time. The speed choice ensures that the total power used is (or if ) one more than the total number of outstanding jobs in the system.
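The property stated above (all running servers share one speed, and the total power is one more than the number of outstanding jobs) pins down that common speed once a power function is fixed. The following is a minimal sketch under explicit assumptions: the power function is taken to be P(s) = s^alpha, server 1 has at least one outstanding job, and each active downstream server holds exactly one job. The function names and the formula are reconstructed from the stated property and are not copied from (8) and (9).

```python
def common_speed(jobs_at_first_server, active_downstream, alpha=2.0):
    """Speed shared by server 1 and every active downstream server.

    Assumes P(s) = s**alpha and at least one job at server 1; the case of an
    empty first server is deliberately not covered by this sketch.
    """
    if jobs_at_first_server < 1:
        raise ValueError("sketch only covers the case with jobs at server 1")
    if active_downstream < 0:
        raise ValueError("number of active servers cannot be negative")
    running_servers = active_downstream + 1
    # Each active downstream server holds exactly one outstanding job.
    outstanding_jobs = jobs_at_first_server + active_downstream
    # Split a total power budget of (outstanding jobs + 1) equally.
    power_per_server = (outstanding_jobs + 1) / running_servers
    return power_per_server ** (1.0 / alpha)


def total_power(jobs_at_first_server, active_downstream, alpha=2.0):
    """Total power drawn by all running servers at the common speed."""
    speed = common_speed(jobs_at_first_server, active_downstream, alpha)
    return (active_downstream + 1) * speed ** alpha
```

For example, with three jobs at server 1 and two active downstream servers there are five outstanding jobs, so the three running servers together draw a total power of six under this assumed rule.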
Comments about the potential function: The basic building blocks and of our proposed potential function are inspired by the potential function first constructed in [8], however, the nontrivial aspect is the choice of including to define the function. Since changes dynamically, the overall construction and analysis is far more challenging.
The proposed potential function is rather delicate and is really the core idea for solving the problem. We discuss its important properties and reasons why a more natural choice does not work as discussed in Remark 3.1. To begin with, note that the denominator in is fixed to be and not which can change dynamically. This is important since can be arbitrarily large, and a decrease in can have an arbitrarily large increase in . However for which is function of , even when decreases, the increase in can be bounded since (choice made by the proposed algorithm) and . The choice of potential function is also peculiar since is spread over all the servers with a normalization factor of (as defined in ). This is needed since the algorithm prescribes an identical speed of for all the servers, and to get sufficient negative drift from the term, it is necessary that , which is true since . If instead we just keep one term for in without the normalization by in , the speed of each server has to be at least to get sufficient negative drift from the term, however, that makes the total power used , which is order wise too large.
The considered potential function (6) for server is not a natural choice. Instead it should really be
by combining the arguments of and into a single function. This choice avoids the increase in when a job moves from server to unlike (6), since in this case , while cancelling each other off. This, however, makes it hard to control the increase in when decreases, since can be arbitrarily large. The current choice (6) avoids this bottleneck by isolating server from all the other subsequent servers by keeping the terms of server and subsequent servers (6) separate, however, at the cost of incurring positive jumps whenever jobs move from server to which can be bounded.
To eliminate the need for considering different epochs at which the job transition happens between server and server with the algorithm and the which can result in an increase in the potential function, one can consider the following equivalent model. Suppose that on the (external) arrival of a new job to server at time , jobs are created with sizes , and the copy with size is sent to the server at time . To model the tandem server constraint, a precedence constraint can be enforced such that any copy of any job cannot start its processing at server unless it has been processed (served and departed) at the server . The precedence constraint, however, brings in a new feature unlike the single server case, that the servers can idle even when they have outstanding jobs, if those jobs have not been processed by preceding servers, which needs to be handled carefully.
Following [8], a natural choice for the potential function with this alternate model is , and , and consider the potential function , where , and
and , and , (this means where ). Getting the correct negative drift with this potential function, however, requires because of the ’back flow’ (terms of type in which increase the potential function when the algorithm is working on server ) making , and since the competitive ratio is at least for all , the resulting competitive ratio turns out to be .
From here on we work towards proving Theorem 1.2.1. The first step in that direction is to bound the increase in the potential function at discontinuous points, which is done as follows.
Taking for all , the total increase in at points of discontinuity is at most . Proof can be found in Appendix 5. The restriction of equal job sizes is essentially needed to prove Lemma 3.1. Since all server speeds are identical, if job sizes are different, jobs will accumulate in servers other than , making it hard to control the increase in when decreases. Let , if is the active server (in increasing order of server index) among the active servers. The proof of Theorem 1.2.1 is based on the following bounds on the potential function drift. Consider any instant when no arrival/departure (including internal transfers) occurs under the algorithm or For server ,
If , then either of the above two cases arise. Moreover, for any active server at time ,
where when server is active and zero otherwise. Proof can be found in Appendix 6.
Next, we consider the cost of the algorithm at any time , and suppress for simplicity. When and , then (Lemma 3.1), and since , the ‘running’ cost (3) from Lemma 3.1 with is
choosing . If , where each active server other than server has speed , then the running cost
choosing . When , then using Lemma 3.1. Moreover, from Lemma 3.1 with , where , the running cost,
choosing . Thus, in both cases, accounting for the discontinuities from Lemma 3.1 with for all since the first boundary condition is trivially met,
which implies that
(10) 
Now we complete the Proof of Theorem 1.2.1.
Proof.
From (10)
(11) 
Recall that . For any job with size , the minimum cost incurred by on processing it on any one server is , where is the speed. Thus, the optimal satisfies , and the optimal cost is . With jobs arriving each with size which have to be processed by each of the servers, a simple lower bound on the cost of is . This implies from (11) that
(12) 
For , and the minimum cost is , and , thus .
∎
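The per-job, per-server cost used in the proof above can be evaluated numerically. The following is a minimal sketch, assuming the power function P(s) = s^alpha with alpha > 1 and a unit-size job run at a constant speed s on one server, so that its flow time is 1/s and its energy is P(s)/s. The function names are illustrative and not taken from the paper.

```python
def unit_job_cost(speed, alpha):
    """Flow time plus energy for one unit-size job at constant speed."""
    # Flow time 1/speed, plus energy speed**alpha * (1/speed).
    return 1.0 / speed + speed ** (alpha - 1.0)


def best_constant_speed(alpha):
    """Minimizer of unit_job_cost over speed, for alpha > 1."""
    # Setting the derivative -1/s**2 + (alpha - 1) * s**(alpha - 2) to zero
    # gives s**alpha = 1 / (alpha - 1).
    return (alpha - 1.0) ** (-1.0 / alpha)


def min_unit_job_cost(alpha):
    """Smallest cost of a unit job on one server under the assumptions above."""
    return unit_job_cost(best_constant_speed(alpha), alpha)
```

Under these assumptions, alpha = 2 gives an optimal constant speed of 1 and a minimum cost of 2 per job per server.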
4 Stochastic setting
In this section, we move from the worst case setting to the stochastic setting, where the workload is specified by stochastic processes and we evaluate algorithms based on their performance in steady state. We find that the stochastic setting is ‘easier’ than the worst case setting; specifically, we show that a naive routing strategy coupled with a simple ON/OFF (gated static) speed selection is constant competitive. Crucially, the competitive ratio depends only on the power functions, and not on the statistical parameters of the workload or the topology of the queueing system. Moreover, the tandem system we consider in this section is more general than the one considered before: each job needs to be served in tandem layers/phases, where each layer is composed of parallel servers.
Formally, our system model is as follows. The service system is composed of tandem layers of servers, with parallel and identical servers in layer . Jobs arrive to layer according to a Poisson process with rate . The jobs have to be processed sequentially in the layers (by any server in each layer) before exiting the system. Moreover, we assume that each server is equipped with an (infinite capacity) queue, so that once a job completes service in layer it can be immediately dispatched to any server in layer . The service requirement in layer is exponentially distributed with mean .
Job scheduling on each server is assumed to be blind to the service requirements of waiting jobs. The power function for all servers in layer is where . The performance metric is given by
where and denote, respectively, the response time and energy consumption associated with a job in steady state. We note that the performance metric can also be expressed as the sum of the costs incurred in each layer:
Here, denotes the steady state response time in layer and denotes the steady state energy consumption (per job) in layer . (Footnote 1: We implicitly restrict attention to the class of policies that admit these stationary averages.)
The proposed algorithm () is the following. When a job arrives into layer we dispatch it to a server in layer chosen uniformly at random. The speed of each server in layer is set in a gated static fashion as when active (and zero when idle), where is the offered load to layer . Note that the speed selection requires knowledge (via learning if not available) of the offered load into each layer (unlike the dynamic speed scaling algorithm in (8)). This boils down to learning the arrival rate and the mean service requirement, which is feasible in the stochastic workload setting considered here. Under the proposed random routing and gated static speed selection, the system operates as a (feedforward) Jackson network, with each server operating as an M/M/1 queue [18]. Thus, the arrival process for each layer is also Poisson.
Our main result is the following. Let The algorithm is constant competitive, with a competitive ratio that depends only on the power functions, i.e., on . Specifically, the competitive ratio does not depend on the workload parameters , the number of layers or the number of servers in the different layers
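Because each server behaves as an M/M/1 queue under random routing and a gated static speed, the per-job cost of a layer can be written down from standard M/M/1 results. The following is a minimal sketch: it assumes the power function P(s) = s^alpha and evaluates the cost for any candidate static speed supplied by the caller. It does not implement the specific speed rule of the proposed algorithm, and all names and parameters are illustrative.

```python
def layer_cost_per_job(arrival_rate, num_servers, mean_work, speed, alpha=2.0):
    """Steady-state response time plus energy per job in one layer.

    Assumes uniform random routing over num_servers identical servers,
    exponential service requirements with mean mean_work, and a gated static
    speed: `speed` while busy, zero (and zero power) while idle.
    """
    # Random splitting of a Poisson stream gives Poisson arrivals per server.
    per_server_arrival_rate = arrival_rate / num_servers
    # At a fixed speed, service times are exponential with rate speed/mean_work.
    service_rate = speed / mean_work
    if service_rate <= per_server_arrival_rate:
        raise ValueError("speed too low: the queue would be unstable")
    # Mean response time of an M/M/1 queue.
    response_time = 1.0 / (service_rate - per_server_arrival_rate)
    # Power speed**alpha is drawn only during the job's own service time,
    # whose mean is mean_work / speed.
    energy_per_job = speed ** alpha * mean_work / speed
    return response_time + energy_per_job
```

Summing this quantity over the layers gives the overall cost per job, since the metric decomposes layer by layer.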
5 Proof of Lemma 3.1
There are 4 possible ways that can give rise to a discontinuity in

Job arriving at server at time . On arrival of a new job which happens only on server , both , and increase by for all . Hence, there is no change to the in this case.

Transfer of jobs between servers under the algorithm (without departure from server ) at time For each job transitioning from server to for , there is potentially a positive jump in because of either increase in or for . In particular, for jobs, there are at most jumps in for , with each jump of size at most
since . Counting for at most such jumps, the total increase in is . Note that for each jump either remains same or increases by . In the above bounding we have taken the worst case, when remains the same. If increases by , then the same bound follows using second part of Lemma 5. Note that transfer of jobs between servers under the without any departure from server has no effect on .

Job departing from server under algorithm at time We consider two subcases. If then In this case,
Here, () follows from first part of Lemma 5, while () is a consequence of:
On the other hand, if then In this case, it is easy to see that
Thus, the departure of a job from the system under the algorithm can result in an upward jump in of at most Choosing for all , the total increase in .

Completion of jobs by Any job completed by on server changes only for thus, keeping the integral to define unchanged for all .
For where
Moreover, for
Proof.
To prove the first statement, we note that
Here, follows from the monotonicity of The second statement of the lemma is trivial:
∎
6 Proof of Lemma 3.1
Our proofs will require the following technical Lemma from [8]. [Lemma 3.1 in [8]] For , then for any function that is strictly increasing, strictly convex, and differentiable,
Proof of Lemma 3.1.
Proof.
Since the statement of the lemma applies to a fixed (though generic) time we shall omit the reference to throughout this proof for notational simplicity. Let and be the size of the job under process at server with the algorithm, and with the on server , respectively. Recall that the speed of all active servers with the algorithm is , while the speed of server with the is .
The main idea of bounding is similar to [8], specialized for this potential function and the speed choice.
Case 1: If , then we first show that . Note that under this condition, for . Thus, at time , is still at least as much as for . Therefore, does not increase because of processing by . Since processing by the algorithm can only reduce , thus, .
Case 2: If and (since otherwise again ). Because of processing of jobs by the algorithm and the , changes because of reduction in (because of the algorithm) and (because of the ). Then for the algorithm, decreases by for , and the contribution in because of the algorithm is
(13) 
where has been defined after (7).