Maximizing Service Reward for Queues with Deadlines

by Li-on Raviv, et al.

In this paper we consider a real-time queuing system with rewards and deadlines. We assume that packet processing time is known upon arrival, as is the case in communication networks. This assumption allows us to demonstrate that the performance of the well-known Earliest-Deadline-First policy can be improved. We then propose a scheduling policy that provides excellent results for packets with rewards and deadlines. We prove that the policy is optimal under deterministic service time and a binomial reward distribution. In the more general case, we prove that the policy processes the maximal number of packets while collecting a reward higher than the expected reward. We present simulation results that show its high performance in more generic cases compared to the most commonly used scheduling policies.





I Introduction

The growing usage of the Internet as an infrastructure for cloud services and the Internet of Things (IoT) has resulted in an extremely high demand for traffic bandwidth. The growth of the infrastructure has not kept pace with demand. This gap creates queues in the downstream traffic flow, which impacts service quality in terms of latency and packet loss. The Internet provides service for various types of applications that differ in their priority and latency sensitivity. In this paper, we discuss ways to schedule packets based on their sensitivity to latency (deadline) and priority (reward). We present a novel scheduling policy that uses the information about the packets available upon arrival. We show that this scheduling policy outperforms other policies that process the maximal number of jobs.

Barrer in [1] and [2] pointed out that customers in a queue often have a practical restriction, namely the time they are willing to wait until they get service. When it refers to people, this problem is known as the impatient-customer phenomenon.

A similar problem exists when handling deadlines related to job processing or packet data transmission; such systems are known as real-time systems [3, 4]. Real-time systems are divided into soft real-time and hard real-time systems. In a soft real-time system there is a penalty for non-completion, while in a hard real-time system there is simply no reward for non-completion. If the penalty is higher than the reward, a soft real-time system behaves similarly to a hard real-time system with respect to scheduling [5]. Brucker [6] describes many models of queuing systems and cost functions that use weights and due dates; the weights act as rewards and the due dates as step-function deadlines, but the setting is limited to integer values only. A basic overview of queuing theory for delay analysis in data networks is given, e.g., in Bertsekas and Gallager [7] and in [8]. These delay models focus mainly on the changing environment of the arrival process, the service time, and the number of servers.

In the communication world, many Internet applications vary in their sensitivity to the late arrival of packets. The applications are divided into two types: services with guaranteed delivery (TCP-based applications) and non-guaranteed delivery (UDP, RTP, and ICMP-based applications). In both cases there is an expiration time. VoIP, video, and other real-time signaling are highly sensitive to packet delays. These RTP-based applications transfer data without acknowledgment (non-guaranteed delivery). In an IoT environment, sensor information needs to arrive at the processing units on time in order to ensure correct decisions. In these cases the timeout is short and late packets are ignored. Guaranteed-delivery applications like the File Transfer Protocol (FTP) do not suffer as much from packet delay since their deadline is relatively long. Packets that arrive after the long deadline are considered lost, and lost packets are considered late [9]. In addition, in guaranteed-delivery protocols, the round trip of payload delivery and acknowledgment is important to avoid re-transmission; the resulting re-transmissions overload the network. A detailed analysis of Internet applications and the latency problem is given in [10]. Therefore, to maintain high-efficiency networks, late packets must be dropped as soon as it is known that they will miss their deadline.

There are many models of the impatient-customer phenomenon in the literature. One common approach is to view impatient customers as an additional restriction on customer behavior, as presented in Stankovic et al. [11]. Models of human behavior assume no knowledge about the patience of a specific customer. When assessing balking behavior, it is assumed that a customer decides whether to enter the queue by estimating the waiting time (balking). In [12] the customer leaves the queue after waiting (reneging). There are models in which the actual deadline is known upon arrival [13, 14, 15, 16]. Other models assume only statistics regarding the abandonment rate [12, 17, 18, 19, 5]. In some cases the abandonment also includes server failures, which force synchronized abandonment, as described in [20, 21, 22]. The analysis of impatient-customer behavior can be measured in two ways. One way measures the Impatience upon End of Service (IES), as presented in [23]: a client must end its service before its deadline expires in order to be considered "on time". The second measures the Impatience to the Beginning of Service (IBS), as discussed in [24]. If the service time is deterministic or exponential, the results for IES are also applicable to the IBS system without changing the expiration distribution. Another type of prior information provided to the scheduler is the service time. As in the deadline case, there might be statistical knowledge about the service time [17, 18, 19, 5]; in other cases there is no information about the service time [16]. Explicit information about service times may be provided upon arrival, just prior to receiving service, or not be known even at the beginning of the service. In the case of goods and data packets, this information is already known at the time of arrival in the queue, and can be beneficial.

EDF scheduling is the most popular policy for real-time systems [4]. The policy was proved to be optimal in many scenarios using different metrics. It was shown via Jackson's rule that if a queue is feasible then EDF is optimal [4], meaning that it is optimal at the level of queue ordering. For the general case of non-feasible queues, Panwar, Towsley and Wolf [16] showed that if unforced idle times are not allowed, EDF is optimal for the non-preemptive M/D/1 queue when the service time is exactly one time unit; there, EDF is shown to be optimal assuming service times are not known at the beginning of the service. In [15] they extended the proof to multiple queues when only partial information is provided on the deadlines. In [9] Cohen and Katzir show that EDF without preemption is an optimal scheduling algorithm in the sense of packet loss in a VoIP environment, assuming fixed packet size. Other applications of EDF were discussed in [25], which compares EDF and FCFS (First Come First Serve) in networks with both QoS and non-QoS requirements. In that paper, different rewards are attached to a successful packet delivery. This mechanism is used to differentiate between types of packets and allows the scheduler to assign different service levels. The reward mechanism has a financial side, which can involve setting Internet Service Provider (ISP) fees for transferring Internet traffic according to different Service Level Agreements (SLAs). In the late 1990s, the IETF proposed two main architectures to support QoS over the Internet: the IntServ [26] and DiffServ [27] working groups.

A related problem is multi-class scheduling, which assumes mainly statistical information about the jobs. The problem consists of a single server or multiple servers that can process different classes of customers arranged in different queues (one queue per class). Each queue is ordered according to FCFS. It is assumed that each queue k has a holding cost c_k proportional to the waiting time of its customers and a service rate μ_k; the target is to minimize the total cost. It is assumed that the service time is unknown to the scheduler. Atar, Giat and Shimkin expanded the model to include abandonment [18]. They introduced the cμ/θ-rule, which gives service to the class k with the highest index c_k μ_k / θ_k, where θ_k is the abandonment rate; customer patience is modeled by an exponentially distributed random variable with mean 1/θ_k. The cμ/θ-rule minimizes asymptotically (in the fluid-limit sense) the time-average holding cost in the overload case. In [19] it was shown that the problem can be formulated as a discrete-time Markov Decision Process (MDP); the framework of the multi-armed restless bandit problem is then used to solve it, and an optimal index rule was obtained for one or two users with abandonment. The queue-selection problem is solved in [28] by approximating it as a Brownian control problem, under the assumption that the abandonment distribution of each customer class has an increasing failure rate.

Scheduling policies such as FCFS, Static Priority (SP), and EDF focus on the scheduling mechanism, which only defines the job service order. A different approach is to define which jobs do not receive service; these policies are known as dropping policies. The best results are achieved by combining the scheduling and dropping approaches. The Shortest Time to Extinction (STE) policy is an algorithm that orders the queue while implementing a dropping mechanism for jobs that certainly cannot meet their deadline. The studies reported in [16] and [13] discuss the idea of implementing both scheduling and a simple dropping mechanism, which drops jobs that are eligible for service if their deadline expires before receiving service. In this paper we refer to EDF as . Clare and Sastry [29] presented a value-based approach and reviewed several policies and heuristics that use both scheduling and dropping mechanisms.

In order to evaluate the performance of these policies, a quantitative measure, i.e., a metric that reflects network performance, needs to be defined. There are two approaches to defining such a measure. The first is based on measuring rewards with a utility function, whereas the second is based on a cost function. The reward function measures the sum of the rewards of the jobs that received service. The cost function measures the sum of the rewards of the jobs that did not receive service. Peha and Tobagi [14] introduced cost-based scheduling (CBS) and cost-based dropping (CBD) algorithms and analyzed their performance.

In general, traffic models are divided into light, moderate, and heavy traffic. This classification is derived from the ratio between the arrival process and the required service time. Examples of heavy-traffic analysis can be found in [30], [31] and [32]. In general, heavy traffic assumes a load ρ ≥ 1, which forces the queue to explode. In this paper the queue size is stable even if ρ ≥ 1, since it is bounded.

In this paper we propose a novel queuing policy that provides excellent results and prove it is optimal in certain cases.

The queuing model that is presented in the paper assumes:

  • Job parameters are: arrival and service times, deadlines, and rewards.

  • All realizations are known upon arrival. This approach is less suitable for human behavior, but reflects the behavior of goods for delivery and packets over networks.

  • The server is idle infinitely often. This assumption is used in many proofs that focus on the interval between server idle times.

  • The scheduling policy is non-preemptive and forced idle time is not allowed.

We prove that this policy accumulates more reward than previously proposed policies such as EDF and CBS under deterministic service time. We prove that in the case of a dual (two-valued) reward and deterministic service time the policy is optimal. The same proof also shows that the policy is optimal in the case of dual service times and constant reward. The proposed approach can significantly improve the performance of routers due to its low computational demands.

The structure of the paper is as follows: In section II we present the queuing model, its assumptions, and its notation. In section III we describe the EDF policy and the cμ/θ policy. We also provide a simple example of how modifying the EDF policy improves performance when the processing time is provided upon arrival. In section IV we introduce the Maximum Utility with Dropping scheduling policy and analyze its complexity. In section V we analyze the performance of the proposed policy.

In section VI we present the numerical results and conclude the paper in section VII.

II Queuing with rewards and deadlines

In this section we describe the mathematical model of queues with deadlines and rewards. We set up notation and describe the various assumptions required for our analysis.

A queuing system is composed of jobs, a scheduling policy, queues, and servers. The jobs entering the system are described by a renewal process with different attributes. The attributes may be assigned upon arrival or later on. The scheduling policy is responsible for allocating the servers to the jobs and for choosing the job to be processed out of the queue when a server is idle. In the next few sections we define the stochastic model and the system events.

In this system, events occur in continuous time following a discrete stochastic process. The event timing allows for a discrete-time presentation. The renewal process is defined as follows:

  • Let be a timely ordered set of job arrival events.

  • Let be the time of .

  • Let be the initialization event, .

  • Let be the first arrival to the system.

  • .

  • Let be the job that arrived at time .

We use the extended Kendall [33] notation to characterize the input process and the service mechanism. The random processes may have different distribution types: G denotes a general i.i.d. distribution with strictly positive values, M a Markovian process, and D a deterministic distribution. B denotes a binomial i.i.d. distribution over two different values (the first value occurs with probability p and the second with probability 1 − p).
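As an illustration, the two-valued reward distribution can be sampled as in the minimal sketch below; the function name and parameters are ours, not the paper's, and the concrete values are only examples:

```python
import random

def sample_two_valued(v1, v2, p, rng):
    """Sample from a two-valued ("binomial") distribution:
    v1 with probability p, v2 with probability 1 - p."""
    return v1 if rng.random() < p else v2

rng = random.Random(0)
# e.g. rewards that are 10 with probability 0.2 and 1 otherwise
rewards = [sample_two_valued(10, 1, 0.2, rng) for _ in range(1000)]
```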

We define the attributes of job as follows:

  • Let be the inter arrival time of the renewal-reward process , .

  • Let be the processing time required by .

  • Let be the deadline of . The time is measured from the arrival time to the beginning of the service (IBS).

  • Let be the reward of processing .

If the deadline is large enough it becomes redundant. In stable queues, the bound can be taken as the maximal queue size times the average service time, according to Little's law. We use the following assumptions as part of the queuing model:

  1. , and are known upon arrival of .

Assumption A is fundamental in this paper, and the proposed policy's operation depends on it. In many applications assumption A is reasonable, for example in routers, where the service rate is known as well as the packet size. To analyze the performance of the proposed method we need further assumptions, B-H; however, these are only required to simplify the analysis.

When is deterministic we use as the service time.

The server is in the busy state when it is servicing a customer, processing a job, or transmitting a packet. If the server is not busy, it is in the idle state. In this article we use the job-processing terminology. We assume that the server is always available (never in a fault state).

The service mechanism and the scheduling policy follow the assumptions below:

  1. A single server in the model .

  2. The server processes a single job at a time.

  3. Non-preemptive policy - Processing must not be interrupted.

  4. Forced idle times are not allowed - The server processes jobs as long as there are eligible jobs in the queue without delays.

  5. The policy chooses the next job to be processed from the queue.

  6. The policy is deterministic. For the same jobs in the queue, the service order is always the same.

This article focuses on a single-server problem, as discussed in [34], with a renewal reward process and generally distributed service-time models under boundedness assumptions.

The events that occur in the model are:

  1. Job arrival ().

  2. Job processing begun.

  3. Job processing completed.

  4. Job dropping - a policy decision to never process the job. At this point we assume it leaves the queue.

  5. Job deadline expired - In this event the policy must drop the job.

Job arrival (a) and deadline expiration (e) are events that are independent of policy decisions. Beginning the processing of a job (b) and job dropping (d) depend on policy decisions. Job processing completion (c) is derived from the processing start time and the total processing time (which, by our assumptions, is known before the beginning of the processing). A job deadline expiration (e) has no immediate effect on the system; if a job's deadline has expired (e), the policy must drop the job at some point (d). In our model we allow a job drop (d) only upon arrival (a) or when job processing begins (b). The policy can also decide to drop a job before its deadline has expired. Assuming forced idle time is forbidden (D), after a job processing completion (c) either the queue is empty or a job processing beginning (b) takes place.

  • Let be a timely ordered set of all events that occur according to the renewal reward process and policy .

  • and .

  • .

  • Let be the time of .

  • .

  • such that and .

Definition 1.

The following parameters enable the analysis of the service at time .

  • Let be the set of waiting jobs in the queue at time .

  • Let be the queue potential at time . Queue potential is an ordered set of jobs according to that will get service if no new job goes into the queue. By definition the queue potential is feasible.

  • Let be the set of processed jobs by policy up to time .

  • Let be the set of potential processed jobs.

  • Let be the difference in number of jobs in the queue potential between two consecutive system events where and .

If and are finite the queue length is bounded by:


The General Queue Events of the system are:

  1. A job arrives at the system ().

    1. If the server is idle and the queue is empty it gets service immediately and enters the set of processed jobs ().

    2. If the server is busy (), it enters the queue. At this point it is possible to compute the queue potential (). If the new job is added to the queue potential, other jobs may be removed from the queue potential.

  2. The server has completed processing a job ().

    1. If the queue is not empty () then the job that is at the head of the queue is processed and moves from the queue and queue potential to the set of processed jobs.

    2. If the queue is empty (), there is no change in the set of system parameters.

Fig. 1: System job processing flow
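The two general queue events above can be sketched as event handlers. This is a simplified sketch: the dictionary layout, function names, and the heap-based deadline ordering are our assumptions, not the paper's.

```python
import heapq

def on_arrival(job, server_busy, queue):
    """Event (a): a job arrives. Returns True if it is served immediately
    (server idle and queue empty); otherwise it joins the deadline-ordered queue."""
    if not server_busy and not queue:
        return True  # enters the set of processed jobs at once
    heapq.heappush(queue, (job["deadline"], job["id"], job))
    return False

def on_completion(now, queue):
    """Event (c): the server completed a job; pick the next eligible one.
    Jobs whose deadline (to beginning of service) has passed are dropped (e/d)."""
    while queue:
        deadline, _, job = heapq.heappop(queue)
        if deadline >= now:
            return job
        # expired job: deadline event (e) forces a drop (d)
    return None  # queue empty: the server goes idle
```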

We define a job as a tuple .

  • and are realizations of and .

Definition 2.

The cumulative reward function for time and policy is:


Let and then, the reward difference function is:


The objective is to find a policy that maximizes the cumulative reward function. If the rewards are deterministic and identical, the cumulative reward function measures the number of jobs that received service. If each job's reward equals its processing time, the cumulative reward function measures the processing-time utilization.
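In code, the cumulative reward function and its two special cases look as follows (a sketch; the job tuple layout (deadline, service, reward, id) is our assumption):

```python
def cumulative_reward(processed):
    """Sum of the rewards of the jobs that received service up to time t."""
    return sum(reward for (_deadline, _service, reward, _jid) in processed)

processed = [(5, 2, 3, "a"), (9, 1, 4, "b")]

# unit rewards: the function counts the processed jobs
unit = [(d, s, 1, j) for (d, s, _, j) in processed]

# reward equal to service time: the function measures utilized processing time
util = [(d, s, s, j) for (d, s, _, j) in processed]
```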

In Appendix A a queue behavior analysis is provided. As a result of this analysis, it is enough to examine the size of the queue potential and deduce the relationship between the corresponding size of . Note that jobs that currently do not meet their deadline are disregarded here, since they are not part of the queue potential.

III Scheduling Policies

In this section we describe several relevant scheduling policies which treat queues with priorities and deadlines.

III-A Earliest Deadline First

We define EDF as follows. Let be the current time, and assume that the queue is not empty and the server is ready to process a job. Earliest Deadline First policy:

  1. .

  2. If then drop

  3. else provide service to

  4. Return to state 1
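The steps above can be sketched in a few lines (a minimal sketch, assuming the queue is kept as a deadline-sorted list of (deadline, service, reward, id) tuples; these names are ours):

```python
def edf_next(now, queue):
    """EDF with dropping: repeatedly drop the head if its deadline
    (to the beginning of service) has expired, otherwise serve it."""
    while queue:
        deadline, service, reward, jid = queue[0]
        if deadline < now:          # step 2: deadline expired -> drop
            queue.pop(0)
            continue
        queue.pop(0)                # step 3: serve the earliest-deadline job
        return jid, now + service   # served job id and its completion time
    return None, now                # queue empty: the server idles
```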

The EDF was proved to be optimal under different metrics [16, 13, 9, 4]. Because of this optimality the EDF became the standard policy used in queuing models with deadlines.

Panwar, Towsley, and Wolf distinguish between service times being assigned (A-i) at arrivals or (A-ii) at beginning of services. Independently of when the assignment occurs, they assume that (A-iii) service times are not known to the scheduler at the beginning of the service. Then, they show that, under (A-iii), the two assignments (A-i) and (A-ii) are equivalent. In this paper, in contrast, we consider the case where (A-iii) does not hold (see assumption A); then the distinction between (A-i) and (A-ii) becomes important. To illustrate that knowing the service times upon arrival can lead to improved results compared to EDF, we consider the following variation on EDF, which we term MEDF.

Assume that there are two jobs in the queue: the first has its expiration time at 10 seconds and a service time of 30 seconds; the second has its expiration time at 20 seconds and a service time of 5 seconds. According to the EDF policy, the first job is processed first. In this case the second job expires and is lost. If we change the order, both jobs can be executed. Figure 2 depicts this case.

Fig. 2: Example of queue order
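The example can be verified with a few lines of code. This is a sketch: `served_in_order` is a hypothetical helper, and deadlines are measured to the beginning of service, as in the IBS model.

```python
def served_in_order(jobs, now=0):
    """Simulate non-preemptive service of (deadline, service_time) jobs.
    A job is served iff its deadline has not passed when it reaches the server."""
    served = []
    for deadline, service in jobs:
        if now <= deadline:        # deadline to the beginning of service (IBS)
            served.append((deadline, service))
            now += service
    return served

first = (10, 30)    # expires at t = 10 s, needs 30 s of service
second = (20, 5)    # expires at t = 20 s, needs 5 s of service

edf_order = served_in_order([first, second])   # EDF order: the second job is lost
swapped = served_in_order([second, first])     # reversed order: both are served
```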

The MEDF policy is a simple policy based on EDF that also uses knowledge of the job reward and service time. It swaps the order of the first two jobs in the queue when this is beneficial. We introduce it here in order to demonstrate the advantage of using information on the service time.

Assume the current time is . The queue has at least two jobs waiting; the server has just finished processing a job and needs to choose a new job to process.


  1. If then drop

  2. else if then provide service to

  3. else provide service to

  4. Return to state 1
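A sketch of the MEDF selection rule follows. The concrete conditions in steps 2-3 are elided in the text above, so the version below reconstructs them under our own assumptions, for unit rewards:

```python
def medf_pick(now, queue):
    """MEDF head-of-queue selection: queue holds (deadline, service) pairs
    in EDF order. Swap the first two jobs when serving the head would expire
    the second job while the reverse order keeps both feasible."""
    while queue and queue[0][0] < now:   # step 1: drop an expired head
        queue.pop(0)
    if len(queue) >= 2:
        d1, s1 = queue[0]
        d2, s2 = queue[1]
        if now + s1 > d2 and now + s2 <= d1:
            return queue.pop(1)          # beneficial swap: serve the second job
    return queue.pop(0) if queue else None
```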

III-B The cμ/θ policy

Another policy, which is relevant as a benchmark for testing our algorithm, is the cμ/θ scheduling policy [18]. The policy assumes that there are several queues in the system and that it needs to select the queue to be served. We assume that there are several levels of rewards (or groups of rewards), each assigned to its own queue. The scheduling algorithm is composed of two steps:

  1. Upon arrival, the policy inserts the new job into the queue corresponding to its reward, in FCFS order.

  2. When the server becomes idle, the policy chooses the queue to be served according to the cμ/θ index and processes the job at the head of that queue.


Assume new arrival of job .

  1. Insert the at the end of queue .


  1. If then drop

  2. else provide service to

  3. Return to state 1
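The queue-selection step can be sketched as follows (a sketch based on the cμ/θ index described above; variable names are ours):

```python
def cmu_theta_select(queues, c, mu, theta):
    """Pick the class to serve under the c*mu/theta rule: among non-empty
    queues, choose the class k with the highest index c[k] * mu[k] / theta[k].
    c = holding costs, mu = service rates, theta = abandonment rates."""
    candidates = [k for k, q in enumerate(queues) if q]
    if not candidates:
        return None
    return max(candidates, key=lambda k: c[k] * mu[k] / theta[k])
```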

The cμ rule was proven to be optimal in [35], assuming geometric service times and a preemptive scheduler. In [36] Bispo claimed that for convex costs the optimal policy depends on the individual loads; he proposed using first-order differences of the single-stage cost function, which reaches near-optimal performance. In the non-preemptive case, the cμ rule was proved to be optimal for the two-user case in [37]. The proof was done using calculus-of-variations arguments as well as an Infinitesimal Perturbation Analysis (IPA) approach. Down, Koole and Lewis [38] provided an optimal server-assignment policy under reward, linear holding costs, and a class-dependent penalty, which reduces the reward for each customer that abandons the queue.

In our model we assume that the service times, deadlines, and rewards are known upon arrival. EDF assumes that there is no information about the service times, while in the cμ/θ rule the deadline is known only statistically, in terms of the probability of abandonment. This information has advantages in the case of a non-deterministic service time. We presented a simple example showing the advantage of using knowledge of the service time upon arrival in the EDF case, and we now present a version of the cμ/θ rule that exploits knowledge of the deadline information: knowing deadlines upon arrival allows us to modify the queue order to use EDF instead of FCFS, as was proposed in [14]. We mark this new version as . However, as we will show later, our proposed technique outperforms this variation as well.

IV Maximum Utility with Dropping scheduling policy

We next present the Maximum Utility with Dropping (MUD) scheduling policy. The queuing model assumes A-F. We use the notation to present the MUD parameters and to present the EDF parameters.

The MUD policy combines both scheduling and dropping mechanisms. The scheduling algorithm is composed of two steps:

  1. - Upon arrival, the policy inserts the new job into the queue, keeping the EDF order. If the insertion would cause a job to miss its deadline, the policy drops the packet with the minimal throughput ratio () from the queue.

  2. - Upon terminating the service of a job, the policy processes one of the two jobs at the head of the queue. The selected job is the one with the higher throughput ratio, as long as serving it does not cause the second job to miss its deadline, similarly to MEDF.

Let and be functions that characterize the queue potential order of with policy MUD at time .

  • is the index of job in . means that is at the head of the queue.

  • is the time a job waits until it is processed assuming that no new jobs arrive until it starts processing.

Below we describe how the MUD policy handles a job arrival. Let be the new job that arrives at the queue at time .

  1. Wait for arrival of a new job ()

  2. If and the server is idle then process and go to statement 1

  3. Add to the queue according to shortest time to expiry order. If there are already jobs with the same expiration time, order them in a descending order of their rewards

  4. Find the first job () which misses its deadline due to the insertion of into the queue

  5. If then go to statement 1

  6. Find the job () with the minimum reward per service time. If there are several pick the one with the shortest time to expiry.

  7. Drop job from the queue

  8. Go to statement 1

Note that the and values change after adding a new job at time as follows: if then and ; otherwise there is no change. A different presentation of the policy is that it marks the jobs to drop at arrival time and postpones the dropping stage to the time the job reaches the head of the queue.

  1. Wait until and the server is idle. Assume the time is .

  2. If then drop

  3. else if then process

  4. else process

  5. Return to state 1
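Putting the arrival step together, the sketch below implements the MUD insertion and dropping rules described above. It is a simplified sketch: the job layout (deadline, service, reward, id) and the helper names are our assumptions, and the MEDF-like head swap at service time is omitted.

```python
def feasible(queue, now):
    """Check that every job in the deadline-ordered queue can still begin
    service by its deadline, i.e. the queue is a valid queue potential."""
    t = now
    for deadline, service, reward, jid in queue:
        if t > deadline:
            return False
        t += service
    return True

def mud_insert(now, queue, job):
    """MUD arrival step: insert in EDF order (ties broken by descending
    reward); if the queue becomes infeasible, drop the job with the minimal
    reward-per-service ratio (ties broken by shortest time to expiry)."""
    queue.append(job)
    queue.sort(key=lambda j: (j[0], -j[2]))
    if not feasible(queue, now):
        worst = min(queue, key=lambda j: (j[2] / j[1], j[0]))
        queue.remove(worst)
    return queue
```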

Next we analyze the complexity of adding a new job to the queue under the MUD policy. The MUD policy uses its dropping mechanism to maintain the queue in such a way that . According to formula 1 we can bound the queue size. Adding a job to the queue and dropping jobs from the queue require one sequential pass over the queue. Since the ratio is calculated upon arrival, the policy only needs to compare and calculate the . From the above we conclude that the complexity of adding a job to the queue is bounded by the size of the queue potential as defined in 1.

In the FIFO policy the complexity of adding a new job is O(1). In the EDF policy the complexity of adding a new job is , and in the CBS policy it is , with a calculation of the exponent and a multiplication, as described in [14]. The complexity of adding a new job to the queue under MUD is larger than under EDF but more efficient than under the other policies. Using a more complicated data structure, the complexity can be reduced.

V MUD performance analysis

The following section presents the MUD performance analysis. This section analyzes the MUD scheduling policy performance in the case. Then it proves the policy optimality in the case.

V-A Deterministic Time Case

The analysis of the policies' performance assumes that the queue becomes empty infinitely often. The assumption below forces this behavior.

  1. .

Assumption B implies that there are infinitely many times when the queue is empty irrespective of the scheduling policy, i.e., for any two policies and there are times () at which both queues are empty; i.e., .

Definition 3.

If then policies and are equivalent.

Definition 4.

If then is as good as .

Definition 5.

If then is better than .

Definition 6.

If for any policy , is as good as policy then is optimal.

In the following section we analyze MUD performance against the group of policies that transmit the maximal number of packets. We show that the MUD policy performance and the EDF policy performance in a are equal. In this model the service times and the rewards are deterministic (one unit), which is equivalent to the queuing model , and the cumulative reward function is the number of processed jobs. In [16] it was shown that the EDF policy is optimal in terms of the number of jobs processed for a queuing model satisfying . We continue the analysis by showing that in a queuing model satisfying , the MUD policy is as good as the EDF policy or similar policies. We conclude with additional assumptions under which the MUD policy is better than the EDF policy or similar policies.

Lemma 1.

For any queuing model satisfying , .


Since the lemma relates solely to the number of jobs that are processed, the rewards are not used in the proof. The proof establishes the inequality in both directions.

The first step is to prove that . This inequality is derived from the optimality of EDF in a queuing model satisfying , as was proven in [16].

For the other direction, the proof is by induction on . Let ; the induction hypothesis is that and that the two policies provide service simultaneously (synchronized).

By definition, at the queues are empty and the servers are idle. At the first job arrives and is processed immediately by both policies, i.e., they are synchronized. This event is similar to all events in which the queues are empty and the server is idle (a). By definition the service time is deterministic; thus the policies remain synchronized as long as and have the same size. For the induction step, assume that the processing start time is synchronized and . We need to show that for , is kept and the service start time remains synchronized in the following cases:

  • Begin processing a job in one or both policies (c)

  • Arrival of a new job when the server is busy (b)

First we analyze case c. Let . When the processing event begins, . The processing begins simultaneously (induction hypothesis) and, since the processing time is deterministic, both policies end processing simultaneously , thus adhering to the induction hypothesis.

Second we analyze case b. As stated above, b contains two types of events: ba, when the new job is added to the queue potential and the queue potential grows, and bb, when the new job causes one of the jobs in the queue potential to miss its deadline.

The arrival of a new job does not impact the synchronization of the processing. Four combinations of events ba and bb are possible across the two policies:

  1. Similar behavior of the two policies in the event ba.

  2. Similar behavior of the two policies in the event bb.

  3. , MUD event is ba while the EDF event is bb.

  4. , MUD event is bb while the EDF event is ba.

The first two cases preserve the induction hypothesis as required. The third case contradicts the optimality of EDF; in other words, this event cannot occur. We now show that the fourth case cannot occur either.

By definition, , since the two policies have the same feed of jobs and both queues are ordered by expiration time. Let be the job that is dropped by the MUD policy at time . If , then there are more jobs in than in whose expiry time is less than . This contradicts the optimality of EDF (note that the order of the queues is the same). If , since we can add it to without dropping and obtain a larger queue, as is added at time . This again contradicts the optimality of EDF. Hence case 4 cannot occur. The result is that the processing is synchronized and the queues have the same size. ∎
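The equal-throughput claim of Lemma 1 can be checked empirically. The sketch below (our own, not the paper's code) uses simplified dynamics: unit service time, one job served per discrete slot, and jobs given as (arrival, deadline, reward) tuples. Both runs serve the earliest deadline; the MUD-style variant breaks deadline ties in favor of the higher reward, which models the drop rule without changing the multiset of deadlines served.

```python
import random

def simulate(jobs, tie_by_reward):
    """Discrete-time, unit-service simulation. jobs is a list of
    (arrival, deadline, reward); a job is feasible at slot t iff
    arrival <= t < deadline, and one job is served per slot."""
    horizon = max(d for _, d, _ in jobs)
    done, reward = 0, 0.0
    pending = sorted(jobs)          # by arrival time
    queue, i = [], 0
    for t in range(horizon):
        while i < len(pending) and pending[i][0] <= t:
            queue.append(pending[i])
            i += 1
        queue = [j for j in queue if j[1] > t]   # drop expired jobs
        if queue:
            if tie_by_reward:  # earliest deadline, ties broken by higher reward
                pick = min(queue, key=lambda j: (j[1], -j[2]))
            else:              # plain EDF: earliest deadline only
                pick = min(queue, key=lambda j: j[1])
            queue.remove(pick)
            done += 1
            reward += pick[2]
    return done, reward

random.seed(0)
jobs = []
for _ in range(30):
    a = random.randrange(20)
    jobs.append((a, a + random.randrange(1, 4), random.choice([1, 5])))

edf_done, edf_rew = simulate(jobs, tie_by_reward=False)
mud_done, mud_rew = simulate(jobs, tie_by_reward=True)
```

On such traces the two runs complete the same number of jobs, while the reward-aware tie-break collects at least as much reward.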

Lemma 2.

For any queuing model satisfying , the MUD and the EDF policies are equivalent.


Since the reward is deterministic, without loss of generality assume . Therefore, from lemma 7 and lemma 1 we get that ; thus MUD and EDF are equivalent. ∎

Theorem 1.

For any queuing model satisfying , the MUD policy is optimal in terms of the number of jobs to process where the process time is exactly one unit.


Queuing models satisfying are equivalent to queuing systems satisfying if we measure the number of jobs that were processed and . In the case of deterministic service time, knowing the service time upon arrival or only just before service is equivalent, since in both cases it is known to be a constant. In [16] it was shown that the EDF policy is optimal in terms of the number of jobs to process for the discrete-time queue where the process time is exactly one unit. In lemma 2 we showed that MUD is equivalent to EDF; thus MUD is optimal in terms of the number of jobs to process where the process time is exactly one unit and the reward is also one unit. ∎

Let be the group of policies that process the maximal number of jobs assuming ; as a result of [16] and lemma 2, . Let be the group of policies whose expected reward is .

In order to prove lemma 3 we add the following assumptions:

  1. , and

    have discrete probability distribution function.

  2. and are independent of each other.

Assumption C is used to simplify the proofs. In addition, has a finite representation in the communication environment, forcing a discrete distribution; and may have continuous distribution functions as well. Assumption D is reasonable: is related to the server's arrival process, is derived from the original deadline by subtracting the various network delays, and is generated independently by the data source.

Lemma 3.

For any queuing model satisfying and assumptions B-D, MUD policy is as good as EDF.


This proof analyzes the behavior of the different types of events. For the purposes of this proof, let , and let and be the times of the current event and the prior event, respectively. The different cases are:

  1. Arrival of a new job at events a and b where . This contradicts Lemma 1 since one of the becomes longer than the other one.

  2. Arrival of a new job where (ba). In this case a job was added to both queue potentials without reducing the number of jobs in them. The queue potential reward growth is the same in both queues.

  3. Arrival of a new job where (bb). In this case there is a change in the queue potential rewards which can be different between policies.

  4. Processing of a job begins in one or both policies (c) - starting the process moves the job from to , keeping .

  5. A job does not meet its deadline (d) or is dropped (e) - By definition this job did not belong to .

We define explicitly and compare the difference between the two policies. For arrival events a, ba and bb let be the arriving job and be the current time. We use or to describe the rewards of jobs and that are removed or dropped from the queue potentials in the different policies.

Let be:


We are interested in the difference between the potential rewards, and . In the events a, ba, c, d and e, as was presented above. We focus on the behavior of events of type bb which impacts the queue potential rewards.

Let be a sequence of times satisfying bb and let . Let be the empirical mean of the sequence of all events of type bb up to time .

We begin by analyzing the queue potential behavior under EDF. In this event an arbitrary job from misses its deadline and is removed from the queue potential. The EDF policy orders the queue according to expiration time; hence the job that misses its deadline depends only on its deadline. Since the deadlines and the rewards are independent according to assumption D, and the rewards are i.i.d., we get:


This means that .
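The independence step can be illustrated numerically (a hedged sketch of ours, assuming a two-point reward distribution {1, 5} with equal probabilities): when the removed job is selected by deadline alone and deadlines are independent of rewards, the empirical mean reward of removed jobs converges to the unconditional mean reward.

```python
import random

random.seed(1)
rewards = [1, 5]                            # assumed two-point reward distribution
mean_reward = sum(rewards) / len(rewards)   # unconditional mean: 3.0

dropped = []
for _ in range(200_000):
    # a queue snapshot: (deadline, reward) pairs drawn independently
    queue = [(random.random(), random.choice(rewards)) for _ in range(5)]
    victim = min(queue, key=lambda j: j[0])  # EDF-style: earliest deadline misses
    dropped.append(victim[1])

empirical = sum(dropped) / len(dropped)      # close to 3.0
```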

On the other hand the MUD policy drops a job with the minimal reward from as defined in statements 5-8 of the policy. Let be the job that is chosen to be dropped in the MUD case. Let be the number of events in sequence which occur up to event .


If , then the probability of the event that in the sequence is 0; i.e., the number of beneficial exchanges achieved by MUD is negligible. The MUD policy orders the queue by expiration time exactly like EDF; thus both policies behave identically with probability 1. Otherwise, .

The conclusion is that the MUD policy is at least as good as the EDF policy. ∎
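To see the size of the gap, a companion sketch (again ours, with the same assumed two-point reward distribution) computes the mean reward of the job MUD removes, namely the minimum of the queued rewards, which falls well below the unconditional mean of 3:

```python
import random

random.seed(2)
rewards = [1, 5]     # assumed two-point reward distribution, mean 3.0

mud_drops = []
for _ in range(100_000):
    queue = [(random.random(), random.choice(rewards)) for _ in range(5)]
    mud_drops.append(min(queue, key=lambda j: j[1])[1])  # drop minimal reward

mud_mean = sum(mud_drops) / len(mud_drops)
# E[min of 5 fair draws from {1, 5}] = 1*(31/32) + 5*(1/32) = 1.125
```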

Theorem 2.

For any queuing model satisfying and assumptions B-D, the MUD policy is as good as any policy .


By definition, their expected reward is . From Dertouzos' theorem [4] we know that EDF is optimal in the sense that if any policy can build a feasible schedule, then EDF also builds a feasible schedule. Combining this with lemma 3, we deduce that the MUD policy is as good as any policy . ∎

In order to prove that the MUD policy is better than any policy , the following additional assumptions are required:

  1. .

  2. , Let

  3. .

  4. Let be the minimal difference between two different rewards. C and G imply that .

Assumption E guarantees that there is always a positive probability that a feasible job arrives. This is a natural assumption, since otherwise there would be no jobs to process. Assumption F guarantees that jobs wait in the queue with positive probability, irrespective of the scheduling policy. Assumption H always holds when has a discrete distribution with finite support.

Lemma 4.

For any queuing model satisfying and assumptions A - H, there exists an arrival which occurs infinitely often such that .


In order to prove the lemma we generate a sequence of job arrivals, called , that has strictly positive probability. We prove that the last event of the sequence is of type bb regardless of the scheduling policy. The sequence of jobs is defined as follows:

  1. Let where be the first job in the sequence.

  2. Let be a sequence of jobs with constant parameters. Let be defined by:

  3. Let be the last job in the sequence.

The first arrival empties the queue potential with positive probability by assumption B. By assumption E, if , the queue grows upon arrival of . After arrivals there is at least one job waiting in the queue when arrives, meaning that one job must miss its deadline (). The probability of the arrival of is , due to assumption D. The probability of the arrival of is . As a result, the event has positive probability of occurring, since:


The MUD policy drops a job with a smaller reward than , causing the cumulative reward function to increase by at least . For comparison, the EDF policy drops because it misses its deadline.

The arrivals are independent and mutually exclusive events by definition (forcing an empty queue), and . By the second Borel-Cantelli lemma, ; i.e., this event occurs infinitely often. ∎
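The form of the second Borel-Cantelli lemma invoked here is the following (the symbols $A_n$ and $p$ are our notation for the arrival events and their probability lower bound):

```latex
% Second Borel-Cantelli lemma: for independent events A_n with
% Pr(A_n) >= p > 0 for all n,
\sum_{n=1}^{\infty} \Pr(A_n) \;\ge\; \sum_{n=1}^{\infty} p = \infty
\quad\Longrightarrow\quad
\Pr\Bigl(\limsup_{n\to\infty} A_n\Bigr) = 1,
% i.e., with probability 1 infinitely many of the events A_n occur.
```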

From lemma 4 the series of events of type satisfies:

Theorem 3.

For any queuing model satisfying assumptions A - H, MUD policy is better than any policy .


From theorem 2 we know that in all cases except the bb case, and that in case bb. From lemma 4 we can conclude, as regards bb, that . By the strong law of large numbers there exists an such that for all with probability 1. Let . Then


This means that the MUD policy is better than the EDF policy when assumptions A - H hold. Combining this with Dertouzos' theorem [4], we deduce that the MUD policy is better than any policy . ∎

V-B Deterministic Time and Dual Reward Case

In this section we study the optimality of the MUD policy for deterministic service time. To simplify an already complicated notation, we focus on the case of two priority levels. We define a dual reward process by , i.e., .

In the dual reward case we can split the queue into two sub-queues. Let

  • be the set of jobs in the queue potential with maximal reward.

  • be the set of jobs in the queue potential with minimal reward.

Similarly, we split the group of processed jobs into two groups: