A Foreground-Background queueing model with speed or capacity modulation

The models studied in the steady state involve two queues which are served either by a single server whose speed depends on the number of jobs present, or by several parallel servers whose number may be controlled dynamically. Job service times have a two-phase Coxian distribution and the second phase is given lower priority than the first. The trade-offs between holding costs and energy consumption costs are examined by means of suitable cost functions. Two different two-dimensional Markov processes are solved exactly. The solutions are used in several numerical experiments. Some counter-intuitive results are observed.

1 Introduction

The topic of controlling the energy consumption of computers has received considerable attention in recent years. Some modern processors are designed so that the frequency at which they work, and hence the speed at which they execute jobs, can be adjusted dynamically depending on the number of jobs present. Several discrete frequency levels are supported, covering a wide range of possible speeds. The idea is that at light loads the processor would work at lower speed, reducing the energy costs, while at high loads it would speed up, reducing the job holding costs. This approach will be referred to as ‘speed modulation’.

A similar energy-saving technique can be applied in traditional multiprocessor systems. Rather than modulating the speed of individual processors, one could control the number of processors that are working: switch one or more of them off during periods of light loads and back on when the load increases. This will be referred to as ‘capacity modulation’.

We are interested in evaluating the trade-offs arising in connection with both speed modulation and capacity modulation, with a view to computing optimal operating policies.

Another factor that influences system performance is the job scheduling strategy. The commonly used First-Come-First-Served policy (FCFS, also called First-In-First-Out, or FIFO) performs well when the service times are not very variable, but it is far from optimal when the coefficient of variation is high. In the latter case, it is well known that policies which favour short jobs over long ones have lower holding costs. In the case of a single processor, it was demonstrated by Schrage [21] that the globally optimal scheduling strategy is Shortest-Remaining-Processing-Time-first (SRPT). However, that is not a practical policy because the exact processing times of incoming jobs are not usually known in advance. Two ‘blind’ policies (i.e., policies that do not require advance knowledge of processing times) which favour short jobs are Processor-Sharing (PS) and Least-Attained-Service (LAS, also known as Foreground-Background, FB). Indeed, it was shown by Yashkov [25] that, among all blind policies and processing time distributions with increasing failure rate, LAS minimizes the average number of jobs in the system.

Unfortunately, neither PS nor LAS is implementable in its pure form, because each requires more than one job to be served in parallel, which implies excessive levels of context switching. Moreover, LAS requires a dynamic number of priority queues to store the jobs that have received the same amount of service.

Our aim is to study speed and capacity modulation in the context of a particular scheduling strategy that gives priority to shorter jobs without needing to keep track of their elapsed times. Two queues are employed, referred to as the foreground queue and the background queue. Jobs consist of one or two sequential phases, the first of which is executed in the foreground queue and the second, at lower priority, in the background queue. In this setting, instead of keeping track of the attained service of each job, we use the passage from the first to the second phase as the event that triggers the downgrading of the job priority. Clearly, the practical assumption is that the system is able to detect the change of the service phase. In fact, jobs consisting of more than one task are common in many areas of computer science.

An example of an application that would lend itself to such a scheduling policy is one where commercial transactions access a database. All transactions start by executing a task involving ‘read’ queries. In many cases that is all they do before terminating. Alternatively, the read phase may be followed by an ‘update’ phase which is more complex and takes longer to execute. Our scheduling strategy would place the read tasks in the foreground queue and the update ones in the background queue. Moreover, depending on whether we are dealing with a single speed-modulated processor or a capacity-modulated multiprocessor system, the service offered would depend on the total number of jobs present.

To the best of our knowledge, such models have not been analysed before. Under appropriate assumptions, we obtain exact solutions for the two-dimensional Markov processes that describe the steady-state behaviour of the two queues. The single and multiprocessor models present distinct challenges in the analysis. In the case of a single processor, the background queue can be served only when the foreground queue is empty, whereas in a multiprocessor system, some processors could be serving the background queue while others are serving the foreground queue.

The solutions obtained enable us to evaluate and minimize a cost function which takes both holding and energy consumption costs into account. Among the numerical experiments that are carried out is one comparing the performance of the two-phase FB policy with that of FCFS and the pure LAS policy. The result shows an apparent (but explainable) violation of the optimality of LAS. Other experiments examine the gains achieved by speed modulation in a single-server system and by capacity modulation in a multiprocessor system. We also explore the possibility of using the two-phase model in order to approximate three-phase models.

1.1 Related literature

The idea of scheduling jobs according to their attained service has been widely investigated in the literature. That area of study was opened up more than half a century ago with two seminal papers by Schrage [20], and Coffman and Kleinrock [4] (see also [15]). Thresholds on the attained service were used to assign priorities to the waiting jobs, and service was given in quanta. The LAS policy emerged as a limiting case when the number of thresholds tends to infinity and the quantum size tends to 0. A good survey of subsequent developments can be found in Nuyens and Wierman [19].

More recently, a large class of scheduling policies, including LAS, FIFO, SRPT and others, was analysed by Scully et al. [22] in the context of a single server without speed modulation. That class is referred to as SOAP – Schedule Ordered by Age-based Priority. Our policy is not in the SOAP class because it assigns priorities according to phase, rather than age, and the phase changes as the job progresses. In fact, we will show that scheduling based on the phase of service can lead to a lower expected response time than that provided by LAS.

Speed scaling policies applied to several scheduling disciplines have also been widely studied, adopting different approaches to the trade-offs between energy saving and performance. For example, Yao et al. [26] analyse systems in which jobs have deadlines and dynamic speed scaling is used. Bansal et al. [3] examine the problem of minimizing the expected response time given a fixed energy budget.

An M/M/1 queue with occupancy-dependent server speed was analysed by George and Harrison [10], with a view to minimizing average service costs (which may be interpreted as energy consumption costs). Performance was not included in the optimization. That study was later generalized by Wierman et al. [24] to M/G/1/PS queues.

Those models do not allow the scheduling discipline to depend on job sizes. Such a dependence was included in Marin et al. [17], where speed scaling was modelled in the context of a variant of the SRPT policy. In all these papers, the speed scaling strategy is energy-proportional, i.e. the power consumed by the processor depends on the speed at which it operates, and that in turn is determined by the number of jobs in the queue. That approach is commonly adopted in the literature (see Andrew and Wierman [1] and Bansal et al. [2]). It is nearly optimal.

Elahi et al. [6] have studied a threshold-based policy with a restricted form of speed modulation. Jobs that have received more than a certain amount of service are assigned a lower priority and speed modulation is applied to them only. Another example of server control involving the LAS policy is described by Lassila and Aalto [16]. In that work, LAS is combined with server sleeping states. The conclusion is that such an approach does not minimize either the linear combination of the expected response time and the energy consumption, or their product.

The literature concerning scheduling policies in systems with multiple servers is not as large, but still quite extensive. Most of the tractable models in that area involve jobs of different types arriving in independent Poisson streams. Harchol-Balter et al. [9] used phase-type distributions to approximate various busy periods and recursively reduce the dimensionality of the model to one. The resulting QBD process is solved by matrix-analytic methods. The M/M/n model with two preemptive priority queues was studied by Mitrani and King [18], by Gail et al. [8] and by Kao and Narayanan [12]. The case of non-preemptive priorities was examined by Gail et al. [7] and by Kao and Wilson [11]. Kella and Yechiali [14] considered the special case of several priority queues with identical average service times for all job types.

The optimal scheduling policy for a heavily loaded M/GI/n queue was established by Scully et al. [23]. It turns out to be a version of the Gittins index policy. However, determining performance measures for that policy is intractable.

We have not encountered in the literature an example exhibiting the features present in our model: multiple servers, non-exponential service times, two priority queues and capacity modulation.

1.2 Structure of the paper

In Section 2, we present the single-server model and in Section 3 we show its exact solution. Section 4 discusses some special cases for which the solution can be obtained in closed form. The multiserver model is described and solved in Section 5 and in the Appendix. Numerical experiments and comparisons with other disciplines are discussed in Section 6. Section 7 concludes the paper.

2 The single-server model

Jobs arrive in a Poisson stream at rate . Their lengths (measured in number of instructions) are i.i.d. random variables with a two-phase Coxian distribution (see [5]). Phase 1 is distributed exponentially with mean . Phase 2 follows with probability , and its length is distributed exponentially with mean . That distribution is illustrated in Figure 1.

Figure 1: Graphical representation of the 2-phase Coxian distribution.
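As a small illustration of this service-time model, the following sketch samples job lengths from a two-phase Coxian distribution and evaluates its mean and squared coefficient of variation. The names `mean1`, `mean2` and `q` are assumptions introduced here for the mean of phase 1, the mean of phase 2 and the probability that phase 2 follows; they are not the paper's notation.

```python
import random


def coxian2_sample(mean1, mean2, q, rng=random):
    """Draw one job length: phase 1 always, phase 2 with probability q."""
    length = rng.expovariate(1.0 / mean1)       # exponential phase 1
    if rng.random() < q:                        # phase 2 follows with prob. q
        length += rng.expovariate(1.0 / mean2)  # exponential phase 2
    return length


def coxian2_moments(mean1, mean2, q):
    """Return (mean, squared coefficient of variation) of the job length."""
    mean = mean1 + q * mean2
    # Var(X1) = mean1^2 and Var(B * X2) = 2*q*mean2^2 - (q*mean2)^2,
    # where B is the Bernoulli(q) indicator that phase 2 is executed.
    variance = mean1 ** 2 + q * (2.0 - q) * mean2 ** 2
    return mean, variance / mean ** 2
```

A short phase 1 followed, with small probability, by a long phase 2 gives a squared coefficient of variation well above 1, which is the regime where favouring short jobs is expected to pay off.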

On arrival, jobs join the foreground queue, where they execute phase 1. Upon completion, a job departs with probability , and joins the background queue with probability , in order to execute phase 2. However, the background queue is served only when the foreground one is empty. If a new job arrives during a background service, the latter is interrupted and the new job starts phase 1 in the foreground queue. This is a version of the Least Attained Service (LAS) policy where context is switched only at moments of arrival or phase completion. The queues and the flow of jobs are illustrated in Figure 2.

Figure 2: Graphical representation of the foreground and background queues and the flow of jobs.

The processor speed (measured in instructions per second) can be controlled and depends on the total number of jobs in the system. There are possible speed levels, . The level is when the total number of jobs in the system is , for , and it is when that number is greater than or equal to . In other words, if the total number of jobs in the system is , the service rates in the foreground and background queues are and , respectively. For all , those rates are and , respectively.

The system state is described by the pair of integers , where is the number of jobs in the foreground queue and is the number of jobs in the background queue. Let be the steady-state probabilities of those states:

(1)

One might guess that the system is stable and steady-state exists when the processor, working at the highest speed level, can cope with the offered load:

(2)

This is indeed the case and will be established analytically.
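The stability condition can be checked numerically with a few lines of code. This is a minimal sketch; the argument names (`lam` for the arrival rate, `q` for the probability of phase 2, `top_fg_rate` and `top_bg_rate` for the phase 1 and phase 2 completion rates at the highest speed level) are assumptions chosen for illustration.

```python
def single_server_load(lam, q, top_fg_rate, top_bg_rate):
    """Offered load when the processor runs at its highest speed level.

    Every job needs phase 1; a fraction q of the jobs also needs phase 2.
    """
    return lam / top_fg_rate + lam * q / top_bg_rate


def single_server_is_stable(lam, q, top_fg_rate, top_bg_rate):
    """True when the top speed can cope with the offered load."""
    return single_server_load(lam, q, top_fg_rate, top_bg_rate) < 1.0
```

For instance, with `lam = 1`, `q = 0.5` and both top rates equal to 5, the load is 0.2 + 0.1 = 0.3, so the system is stable whatever the lower speed levels are.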

The steady-state probabilities satisfy the following set of balance equations.

Case 1. (processor serves the foreground queue at maximum speed):

(3)

where is 1 when the Boolean is true, 0 when false.

Case 2. (processor serves the foreground queue at lower speed):

(4)

Case 3. (processor serves the background queue at maximum speed):

(5)

Case 4. (processor serves the background queue at lower speed):

(6)

Case 5. (processor is idle):

(7)
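Before turning to the exact solution, it can be useful to have a brute-force numerical reference. The sketch below is not the method of this paper: it truncates the state space at a total of `n_max` jobs (arrivals are lost beyond that), builds the transition rates described by the five cases above, and finds the stationary distribution by power iteration on the uniformized chain. All names are assumptions for illustration; `fg_rates[n]` and `bg_rates[n]` are the phase 1 and phase 2 completion rates when `n` jobs are present, and the last entry of each list is used for all larger `n`.

```python
def stationary_distribution(lam, q, fg_rates, bg_rates,
                            n_max=30, tol=1e-12, max_iter=100_000):
    """Approximate steady-state probabilities {(i, j): p} on a truncated space."""

    def level(rates, n):
        # Rate at n jobs; the last entry applies to every larger n.
        return rates[min(n, len(rates) - 1)]

    states = [(i, j) for i in range(n_max + 1) for j in range(n_max + 1 - i)]
    transitions = {}
    for (i, j) in states:
        n = i + j
        out = []
        if n < n_max:                      # arrival joins the foreground queue
            out.append(((i + 1, j), lam))
        if i > 0:                          # foreground has preemptive priority
            r = level(fg_rates, n)
            out.append(((i - 1, j), r * (1.0 - q)))   # job departs
            out.append(((i - 1, j + 1), r * q))       # job moves to background
        elif j > 0:                        # background served only if i == 0
            out.append(((0, j - 1), level(bg_rates, n)))
        transitions[(i, j)] = out

    # Uniformization constant, slightly above the largest total exit rate.
    unif = 1.05 * max(sum(r for _, r in out) for out in transitions.values())
    pi = {s: 1.0 / len(states) for s in states}
    for _ in range(max_iter):
        new = dict.fromkeys(states, 0.0)
        for s, out in transitions.items():
            mass, stay = pi[s], 1.0
            for target, r in out:
                p = r / unif
                new[target] += mass * p
                stay -= p
            new[s] += mass * stay          # self-loop of the uniformized chain
        diff = max(abs(new[s] - pi[s]) for s in states)
        pi = new
        if diff < tol:
            break
    total = sum(pi.values())
    return {s: p / total for s, p in pi.items()}
```

With two speed levels and a light load, e.g. `stationary_distribution(1.0, 0.5, [0.0, 5.0], [0.0, 5.0], n_max=25)`, the truncation error is negligible and the result can be compared with the closed-form special case treated in Section 4.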

From the joint distribution , it is possible to determine the marginal probabilities, , that there is a total of jobs in the system (), and also the corresponding average number, . When the processor runs at speed , it consumes energy at a rate proportional to , where is a constant which depends on the design of the processor; its value is usually between 1 and 3 (e.g., see [24]). To examine the trade-offs between holding costs and energy costs, we define a cost function, , which is a linear combination of the two:

(8)

where and are coefficients reflecting the relative importance given to holding jobs in the system and energy consumption, respectively.

The objective would be to choose the number and values of the speed levels , so as to minimize the cost function .
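The evaluation of such a cost function can be sketched as follows, assuming that the power drawn at speed `s` is `s ** alpha` and that the cost is a weighted sum of the average number of jobs and the average power. The names `total_probs`, `speeds`, `alpha`, `c_hold` and `c_energy` are illustrative assumptions; `speeds[n]` is the speed with `n` jobs present, and its last entry applies to all larger `n`.

```python
def cost(total_probs, speeds, alpha, c_hold, c_energy):
    """Weighted sum of holding and energy costs.

    total_probs[n] is the probability of n jobs in total (truncated tail).
    """
    mean_jobs = sum(n * p for n, p in enumerate(total_probs))
    mean_power = sum(
        p * speeds[min(n, len(speeds) - 1)] ** alpha   # power law s^alpha
        for n, p in enumerate(total_probs)
    )
    return c_hold * mean_jobs + c_energy * mean_power
```

Since the distribution `total_probs` itself depends on the speed levels, a search over candidate speed vectors would have to recompute it for every candidate before calling `cost`.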

3 Exact solution

We start by concentrating on the system states where , i.e. where the phase 1 and phase 2 completion rates are and , respectively. In order to determine the probabilities corresponding to those states, it will be helpful to introduce the following generating functions.

(9)

and

(10)

The steady-state balance equations can now be transformed into relations between these functions, plus some of the unknown probabilities.

Consider first the case . Multiply equation (5) by and sum over all . This leads to

(11)

This can be rewritten as

(12)

Similarly, for , multiply (3) by and sum over all . The resulting relation is

(13)

When , the last term in the right-hand side disappears:

(14)

For all , the corresponding relations do not involve boundary probabilities and the sum ranges from 0 to .

(15)

The next step is to combine all one-dimensional generating functions into a single two-dimensional function. For this purpose, we define

(16)

Multiply the equation for by and sum over all . In order to facilitate the manipulations, add to both sides of (12). The equation that emerges is

(17)

After multiplying both sides by and rearranging terms, this can be rewritten as follows.

(18)

where

(19)
(20)

and

(21)

Thus the two-dimensional generating function is expressed in terms of the one-dimensional function and the probabilities that appear in (21). In order to determine , note that for every value of in the interval (0,1), the coefficient satisfies , and . The polynomial is quadratic in , hence for every such it has exactly two real zeros, and , in the intervals (0,1) and , respectively. After dividing the numerator and denominator in the expression for by , and rearranging terms, that expression can be written as

(22)

The minus sign in front of the square root is taken because is the smaller of the two zeros.

We shall also need the first and higher order derivatives of . These are given by

(23)

and

(24)

In particular, we have

(25)

Since the generating function is finite when and , the right-hand side of (18) must vanish when . This provides an expression for in terms of the probabilities in (21).

(26)

In fact, that expression can be simplified a little. From it follows that

(27)

Substituting this into (20) yields

(28)

This allows us to cancel a factor of from both sides of (26), reducing that expression to

(29)

We are now left with unknown probabilities, including the ones involved in (21) and those not included in the definitions of the generating functions. These unknowns are , for . The balance equations (4), (6) and (7) provide relations among them. Another equations are obtained as follows. Divide both sides of (29) by and let . Remembering that the first term in the expansion of is , we conclude that when ,

(30)

For that limit to hold, the first terms in the Maclaurin series expansion of the numerator in (30) must be 0. In other words, the numerator itself, together with its first derivatives, must vanish at . This provides additional equations for the unknown probabilities. The derivatives of at are given by (23) and (24).

The final equation that we need is provided by the normalizing condition

(31)

To find an expression for , set in (17). We have

(32)
(33)

and

(34)

Since both and are 0 at , so is . In other words,

(35)

The sum in the left-hand side of this equation is the probability, , that the total number of jobs present is . The equation expresses the balance of flow between states with or fewer jobs in the system, and those with or more.

Using (35), expression (34) can be rewritten as

(36)

Substituting (32), (33) and (36) into (18) we obtain

(37)

This could also have been derived directly, as the generating function of the probabilities , , .

To complete the expression of in terms of , we need an expression for . This is provided by (29), after an application of L’Hospital’s rule. The numerator and denominator in the right-hand side of (29) are both 0 at , which means that derivatives must be taken at that point.

Having computed the unknown probabilities, one can evaluate the cost function (8). The average total number of jobs in the system, , is given by

(38)

The derivative of at requires . The latter is again given by (29), after a double application of L’Hospital’s rule.

4 Special cases

The simplest non-trivial special case for this model is . The processor has two possible speeds: when idle and when there is at least one job present. The service rates in the foreground and background queues are and respectively, whenever those queues are served. The only state probability that is not included in the definitions of , (9) and (10), is . Hence, is the full generating function of the joint distribution of the two queues. Also, is the generating function of the state probabilities where the foreground queue is empty.

In this special case, it is intuitively clear that the foreground queue behaves like an M/M/1 queue with offered load . To prove that, set in (18), for :

(39)

Using the balance equation (7) and cancelling the factor , (39) can be rewritten as

(40)

or

(41)

Adding to both sides produces

(42)

This is the generating function for the geometric distribution of an M/M/1 queue. The normalizing condition yields

(43)

As expected, the average number of jobs in the foreground queue, , is

(44)

Clearly, the condition is necessary for the stability of the foreground queue. However, it is not sufficient for the ergodicity of the model because the background queue may still be saturated.

Expression (29) for , together with (7), now has the form

(45)

It is convenient to rewrite this in terms of the function :

(46)

where . It is not difficult to see that satisfies

(47)

The values of and are given by (25).

Substituting (47) and (25) into (46) gives

(48)

which, together with the normalization (43) determines and :

(49)

This result shows that a normalizable positive solution for the probabilities exists when . That inequality, which coincides with (2), is therefore the ergodicity condition for the model.

In order to determine the average total number of jobs in the system, , one could differentiate (37) at . Alternatively, since we already know , it is simpler to find the average number of jobs in the background queue, . Setting in (18), using the balance equation (7) and cancelling the factor , we obtain

(50)

The average is the derivative of the right-hand side at . To find , differentiate (46) and use (47). This yields

(51)

and

(52)

Thus the complete solution of the model is obtained in closed form. In this case, the idling speed of the processor only affects the energy costs, not the holding ones. Therefore, in order to minimize the cost function (8), should be set as low as possible.

Another related special case worth examining is the one where is arbitrary, but for . That is, the processor works only when there are or more jobs present. Once the background queue reaches size , it cannot drop down below that level because it can be served only when the foreground queue is empty and then the processor stops working. Hence, all states , such that , are transient and their long-term probabilities are zero.

The main interest of this model is that it defines the stability region for a given top speed . Indeed, if the system is stable when the lower service rates are zero (in the sense that the recurrent states have normalizable probabilities), then it would be stable when those service rates are greater than zero.

The new generating functions are almost identical to the ones for the case . The role of is now played by . Since for , the balance equation for is similar to (7):

(53)

Equations (40) - (43) are valid, with replaced by . The foreground queue again behaves like an M/M/1 queue with offered load , and its average size, , is given by (44).

The expression for the generating function is similar to (46), except that the factor is now :

(54)

The result (49) becomes

(55)

We conclude that the stability condition (2) holds in any system where the arrival rate is , the probability of phase 2 is and the top service rates are and .

Differentiating (50) at yields the average size of the background queue:

(56)

Intuitively, this model minimizes the energy costs and maximizes the holding costs, given the top speed of the processor.

5 Multiple servers

Instead of a single speed-modulated server, we now consider a system with identical parallel servers. We start by assuming that is fixed, but later we shall allow that number to be controlled dynamically, for purposes of energy conservation. In all other respects, the model is the same as before. The crucial new feature of the resulting two-dimensional Markov process is that, if the number of jobs in the foreground queue is less than , some servers may be serving the background queue while others are serving the foreground one. This changes the analysis substantially, although the general approach based on generating functions remains the same.

The offered load in the foreground queue is ; in the background queue it is . The whole system is stable and steady-state exists when the total offered load is lower than the number of available servers:

(57)

This intuitive condition can also be established analytically.

Denote again by the steady-state probability that there are jobs in the foreground queue and jobs in the background queue. There is no difficulty in computing the marginal distribution and the average number of jobs in the foreground queue, , since that queue behaves like an queue with parameters and . However, in order to find the average number in the background queue, , it is necessary to determine the joint distribution, , of the two queues.
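The foreground marginals mentioned here follow from the standard formulas for a multiserver Markovian queue. The sketch below assumes that the foreground queue is an M/M/n queue with arrival rate `lam` and per-server phase 1 completion rate `mu`, and that phase 2 is completed at per-server rate `nu`; these names, and `n` for the number of servers, are assumptions introduced for illustration.

```python
from math import factorial


def multiserver_is_stable(lam, q, mu, nu, n):
    """Total offered load must be below the number of servers."""
    return lam / mu + lam * q / nu < n


def mmn_empty_probability(lam, mu, n):
    """Probability that an M/M/n queue is empty."""
    a = lam / mu                 # offered load
    rho = a / n                  # utilization per server
    if rho >= 1.0:
        raise ValueError("queue is unstable")
    norm = sum(a ** k / factorial(k) for k in range(n))
    norm += a ** n / (factorial(n) * (1.0 - rho))
    return 1.0 / norm


def mmn_mean_jobs(lam, mu, n):
    """Average number of jobs (waiting and in service) in an M/M/n queue."""
    a = lam / mu
    rho = a / n
    p0 = mmn_empty_probability(lam, mu, n)
    p_wait = a ** n / (factorial(n) * (1.0 - rho)) * p0   # Erlang C formula
    return a + p_wait * rho / (1.0 - rho)
```

For example, with `lam = 1`, `mu = 1` and `n = 2` the queue is empty with probability 1/3 and holds 4/3 jobs on average.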

We shall write separately the balance equations for , when servers are allocated to the foreground queue and servers are available for the background queue, and for , when the service rates at the foreground and background queues are and 0, respectively.

(58)
(59)

In both cases, any probability with a negative index is 0 by definition.
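The transition structure behind these balance equations can be made concrete with a short sketch. It assumes `n` identical servers, preemptive priority for the foreground queue, and per-server completion rates `mu` (phase 1) and `nu` (phase 2); all names are illustrative assumptions rather than the paper's notation.

```python
def multiserver_transitions(i, j, lam, q, mu, nu, n):
    """Outgoing transition rates {next_state: rate} from state (i, j).

    i is the number of jobs in the foreground queue, j in the background.
    """
    moves = {(i + 1, j): lam}                 # arrival to the foreground queue
    busy_fg = min(i, n)                       # servers given to the foreground
    if busy_fg > 0:
        moves[(i - 1, j)] = busy_fg * mu * (1.0 - q)   # job departs
        moves[(i - 1, j + 1)] = busy_fg * mu * q       # job starts phase 2
    busy_bg = min(j, n - busy_fg)             # leftover servers, if any
    if busy_bg > 0:
        moves[(i, j - 1)] = busy_bg * nu      # phase 2 completion
    return moves
```

When `i >= n` the background rate is zero and no transition to `(i, j - 1)` is produced, which corresponds to the second group of balance equations.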

To solve these equations, introduce the generating functions

(60)

Focusing first on the case where the background queue service rate is 0, we multiply the th equation in (59) by and sum over all . This leads to relations similar to (15):

(61)

These can be combined into a single equation for the two-dimensional generating function, , defined as

(62)

Multiplying (61) by and summing over all expresses in terms of and :

(63)

where

Now note that for every value of in the interval (0,1), the coefficient , which is quadratic in , is negative at , positive at and negative at . Therefore, has exactly two real zeros, and , such that . Since is finite on the closed interval , the right-hand side of (63) must vanish at . This implies a relation between and :

(64)

Expressions (63) and (64) can be simplified further by using the properties of the quadratic:

(65)

and

(66)

Substituting (66) into (64) yields

(67)

Then, substituting (65), (64) and (67) into (63), the latter becomes

(68)

The generating function equations for are more complicated because they include terms involving , and depend on probabilities for . After some manipulations we obtain

(69)

where by definition and

The term which appears in the right-hand side of (69) for can be replaced by , according to (66) and (67). Then the equations (69) become a set of simultaneous linear equations for the generating functions , , , . We shall write these in matrix and vector form as follows.

(70)

where is the column vector , is the column vector , and is a tri-diagonal matrix

The diagonal elements of are

and

while the upper diagonal elements are

The solution of (70) is given by

(71)

where is the determinant of and is the determinant of the matrix obtained from by replacing its ’st column with the column vector (the columns are numbered 1 to , rather than 0 to ).

Thus all generating functions are determined in terms of the unknown probabilities that appear in the elements of : , for and . The balance equations (58), for and , offer relations between them. That is fewer than the number of unknowns.

Another equation comes from the normalizing condition, requiring that the sum of , for all and , is 1. This can be expressed as

(72)

The remaining equations needed in order to determine the unknown probabilities are provided by the following result.

Lemma. When the stability condition (57) holds, the determinant has exactly distinct real zeros, , , , , in the open interval (0,1).

The proof of this Lemma is in the Appendix.

Consider one of the generating functions, say . Since it is finite on the interval (0,1), the numerator in the right-hand side of (71), , must vanish at each of the points , , , , yielding equations. Using a different generating function would not provide new information.

One way of converting the normalizing equation (72) into an explicit relation for the unknown probabilities is by invoking the known distribution of the queue. The marginal probability that the foreground queue is empty is given by

(73)

Both and