I Introduction
Recent advances in information and communication networks and sensor technologies have brought a wide range of new applications to reality. These include wireless sensor networks, in which sensors equipped with communication modules are powered by renewable energy, such as solar energy and geothermal energy, as well as emerging technologies, such as unmanned aerial vehicles (UAVs). In many cases, the performance of the component systems, as well as that of the overall system, can be analyzed with the help of a suitable queueing model.
Unfortunately, most existing queueing models are inadequate because, in many of these new applications, the performance of the servers (e.g., a human supervisor monitoring and controlling multiple UAVs) is not time-invariant and typically depends on the history of past workload or the types of performed tasks. As a result, little is known about the performance and stability of such systems and, hence, new methodologies and theory are needed.
In this study, we first propose a new framework for studying the stability of systems in which the efficiency of servers is time-varying and dependent on their past utilization. Using the proposed framework, we then study the problem of designing a task scheduling policy with a simple structure, which is optimal in that it keeps the task queue stable whenever doing so is possible under some policy.
To this end, we consider a queueing system comprising the following three components:
A first-in first-out unbounded queue registers a new task when it arrives and removes it as soon as work on it is completed. It has an internal state, its queue size, which indicates the number of uncompleted tasks in the queue.
The server performs the work required by each task assigned to it. It has an internal state with two components. The first is the availability state, which indicates whether the server is available to start a new task or is busy. We assume that the server is non-preemptive, which in our context means that the server becomes busy when it starts work on a new task, and it becomes available again only after the task is completed. The second component of the state is termed action-dependent and takes values in a finite set of positive integers, which represent parameters that affect the performance of the server. More specifically, we assume that the action-dependent state determines the probability that, within a given time period, the server can complete a task. Hence, a decrease in performance causes an increase in the expected time needed to service a task. Such an action-dependent state could, for instance, represent the battery charge level of an energy harvesting module that powers the server or the status of arousal or fatigue of a human operator that assists the server or supervises the work.
The scheduler has access to the queue size and the entire state of the server. When the server is available and the queue is not empty, the scheduler decides whether to assign a new task or to allow for a rest period. Our formulation admits non-work-conserving policies, whereby the scheduler may choose to assign rest periods even when the queue is not empty. This allows the server to rest as a way to steer the action-dependent state towards a range that can deliver better long-term performance.
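The interplay among these three components can be sketched as a toy discrete-time simulation. Everything below (the class and function names, the drift probabilities, the linear service rate function, and the threshold rule) is our own illustrative assumption, not notation from the paper.

```python
import random

class Server:
    def __init__(self, n_states=5, p_up=0.6, p_down=0.6):
        self.n = n_states
        self.x = 1            # action-dependent state in {1, ..., n_states}
        self.busy = False     # availability state
        self.p_up, self.p_down = p_up, p_down

    def mu(self, x):
        # hypothetical service rate: efficiency degrades as x grows
        return 0.9 - 0.1 * x

    def step(self, work, rng):
        """Advance one epoch; return True iff a task is completed."""
        if not work:          # rest: action-dependent state drifts down
            if rng.random() < self.p_down:
                self.x = max(1, self.x - 1)
            return False
        self.busy = True      # work: possibly complete; state drifts up
        done = rng.random() < self.mu(self.x)
        if done:
            self.busy = False
        if rng.random() < self.p_up:
            self.x = min(self.n, self.x + 1)
        return done

def simulate(epochs=5000, arrival_rate=0.3, threshold=3, seed=0):
    """Simulate queue, server and a non-work-conserving threshold scheduler."""
    rng = random.Random(seed)
    server, queue = Server(), 0
    for _ in range(epochs):
        if rng.random() < arrival_rate:          # Bernoulli arrival
            queue += 1
        # scheduler: rest when the action-dependent state is too high
        work = server.busy or (queue > 0 and server.x < threshold)
        if server.step(work, rng):
            queue -= 1                           # completed task departs
    return queue
```

Running `simulate()` with different thresholds gives a feel for the trade-off the paper studies: always working drives the server into low-efficiency states, while judicious resting can keep the queue shorter in the long run.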
We adopt a stochastic discrete-time framework in which time is uniformly partitioned into epochs, within which new tasks arrive according to a Bernoulli process. The probability of arrival per epoch is termed the arrival rate.^1 We constrain our analysis to stationary schedulers characterized by policies that are invariant under epoch shifts. We discuss our assumptions and provide a detailed description of our framework in Section II.

^1 Notice that, unlike the nomenclature we adopt here, "arrival rate" is commonly used in the context of Poisson arrival processes. This distinction is explained in detail in Section II-B1.
I-A Main Problems
The following are the main challenges addressed in this article:

Pi) An arrival rate is qualified as stabilizable when there is a scheduler that stabilizes the queue. Given a server, we seek to compute the supremum of the set of stabilizable arrival rates.

Pii) We seek to propose schedulers that have a simple structure and are guaranteed to stabilize the queue for any stabilizable arrival rate.
Notice that, as alluded to above in the scheduler description, we allow non-work-conserving policies. This means that, in addressing Problem Pi), we must allow policies that are a function not only of the queue size, but also of the action-dependent and availability states of the server. The design of good policies is complicated by their intricate dependence on these states, which underscores the importance of addressing Problem Pii).
I-B Preview of Results
The following are our main results and their correspondence with Problems Pi) and Pii).

Ri) In Theorem III.1, we show that the supremum mentioned in Pi) can be achieved by some function whose domain is the finite set of action-dependent states. The fact that such a quantity can be determined by a finite search is not evident because the queue size is unbounded.

Rii) As we state in Theorem III.2, given a server, there is a threshold policy that stabilizes the queue for any stabilizable arrival rate. The threshold policy assigns a new task only when the server is available, the queue is not empty, and the action-dependent state is less than a threshold chosen to be the value (found by a finite search) at which the maximum referred to in Ri) is attained. This is our answer to Problem Pii).
From this discussion we conclude that, insofar as the stability of the system is concerned, we can focus solely on the threshold policies outlined in Rii). We emphasize that, as we discuss in Remark 2 of Section III, Theorem III.2 is valid even when the performance of the server is not monotonic with respect to the action-dependent state.
I-C Related Literature
There exists a large volume of literature on queueing systems with time-varying service rates. In particular, with the rapid growth of wireless networks, there has been much interest in understanding and designing efficient scheduling policies under time-varying channel conditions that affect the probability of successful transmissions [2, 3, 6].
Researchers have considered different formulations for designing sound scheduling policies. For example, some adopted an optimization framework in which the objective is to maximize the aggregate utility of flows/users (e.g., [1, 2, 9, 11]). This approach allows designers to trade off aggregate throughput against fairness among flows/users.
Another line of research, which is more relevant to our study, focused on designing throughput-optimal schedulers that can stabilize the system for any arrival rate (vector) that lies in the stability region (e.g., [3, 14, 19]). However, there is a major difference between these studies and ours. In wireless networks, channel conditions and the probability of successful transmission/decoding vary independently of the decisions chosen by the scheduler. In other words, the scheduler (together with the physical-layer schemes) attempts to cope with or take advantage of time-varying channel conditions, which are beyond the control of resource managers and are not affected by scheduling decisions. In our study, on the other hand, the probability of successfully completing a task within an epoch depends on the past history of scheduling decisions. Consequently, the current scheduling decision affects the future efficiency of the server.
Another area that is closely related to our study is task scheduling for human operators/servers. The performance and management of human operators and servers (e.g., bank tellers, toll collectors, doctors, nurses, emergency dispatchers) has been the subject of many studies in the past, e.g., [5, 7, 18, 20]. Recently, with rapid advances in information and sensor technologies, human supervisory control, which requires processing a large amount of information in a short period and can thus cause information overload, has become an active research area [12, 17].
As human supervisors play a critical role in such systems (e.g., supervisory control and data acquisition (SCADA) systems), there is a resurgence of interest in understanding and modeling the performance of humans under varying settings. Although this is still an active research area, it is well documented that the performance of humans depends on many factors, including arousal and perceived workload [7, 4, 10, 18]. For example, the well-known Yerkes-Dodson law suggests that moderate levels of arousal are beneficial, leading to the inverted-U model [20].
In closely related studies, Savla and Frazzoli [15, 16] investigated the problem of designing a task release control policy. They assumed periodic task arrivals and modeled the dynamics of server utilization, which determines the service time of the server, using a differential equation; the server utilization increases when the server is busy and decreases when it is idle. They showed that, when all tasks bring identical workload, a policy that releases a new task to the server only when its utilization is below a suitably chosen threshold is maximally stabilizing [16, Theorems III.1 and III.2]. In addition, somewhat surprisingly, they proved that when task workloads are modeled using independent and identically distributed (i.i.d.) random variables, the maximum achievable throughput increases compared to the homogeneous-workload case.
Paper Structure: A stochastic discrete-time model is described in Section II, where we also introduce notation and key concepts and propose a Markov decision process (MDP) framework that is amenable to performance analysis and optimization. Our main results are discussed in Section III, while Section IV describes an auxiliary MDP that is central to our analysis. The proofs of our results are presented in Section V, and Section VI provides concluding remarks.
II Stochastic Discrete-Time Framework
In the following subsection, we describe a discrete-time framework that follows from assumptions on when the states of the queue and the server are updated and how actions are decided. In doing so, we also introduce the notation used to represent these discrete-time processes. A probabilistic description that leads to a tractable MDP formulation is deferred to Section II-B.
II-A State Updates and Scheduling Decisions: Timing and Notation
We consider an infinite-horizon problem in which the physical (continuous) time set is [0, ∞), which we partition uniformly into half-open intervals of positive duration δ, where the k-th interval is [kδ, (k+1)δ) for k = 0, 1, 2, ….
Each such interval is called an epoch, and epoch k refers to [kδ, (k+1)δ). Our formulation and results are valid regardless of the epoch duration δ. We reserve t to denote continuous time, and k is the discrete-time index we use to represent epochs.
Each epoch is subdivided into three half-open subintervals, denoted stages 1, 2 and 3 (see Fig. 1). As we explain below, stages 1 and 2 are allocated for basic operations of record keeping, updates and scheduling decisions. Although, in practice, the duration of these stages is a negligible fraction of the epoch, we discuss them here in detail to clarify the causality relationships among states and actions. We also introduce notation used to describe certain key discrete-time processes that are indexed by epoch number.
II-A1 Stage 1
The following updates take place during stage 1 of each epoch:
We assume that at most one new task arrives during each epoch. In addition, the server can work on at most one task at any given time and, within each epoch, the scheduler assigns at most one task to the server. However, these assumptions are not critical to our main findings and can be relaxed to allow more general arrival distributions and batch processing.
The queue size is updated from one epoch to the next as follows:

- If, during an epoch, a new task arrives and no task is completed, then the queue size increases by one.

- If, during an epoch, either (i) a new task arrives and a task is completed or (ii) no new task arrives and no task is completed, then the queue size is unchanged.

- If, during an epoch, no new task arrives and a task is completed, then the queue size decreases by one.
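The three cases above collapse to a single update rule; a minimal sketch (the function name is ours):

```python
def next_queue_size(q: int, arrived: bool, completed: bool) -> int:
    """One-epoch queue update implementing the three cases above."""
    if completed and q == 0:
        raise ValueError("a task can be completed only if one is present")
    return q + int(arrived) - int(completed)
```

For example, an arrival with no completion grows the queue by one, while simultaneous arrival and completion leave it unchanged.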
The availability state of the server, sampled at the start of each epoch, indicates whether the server is busy working on a task or available. Its update mechanism is as follows:

- If the server is available at the start of an epoch, it remains available at the start of the next epoch when either no new task was assigned during the epoch, or a new task was assigned and completed within it. If a new task is assigned during the epoch and is not completed by its end, the server becomes busy.

- If the server is busy at the start of an epoch and completes the task by the end of that epoch, it becomes available. Otherwise, it remains busy.
The action-dependent state, also sampled at the start of each epoch, takes values in a finite set of positive integers. It is non-decreasing while the server is working and non-increasing when the server is idle. In Section II-B, we describe an MDP that specifies probabilistically how the action-dependent state transitions from one epoch to the next, conditioned on whether the server worked or rested during the epoch.
Without loss of generality, we fix the initial values of the queue size, the availability state and the action-dependent state.
The overall state of the server is represented compactly by the pair formed by the availability state and the action-dependent state. In a like manner, we define the overall state of the MDP as the triple obtained by appending the queue size.
From this definition, it follows that when the queue is empty, there is no task for the server to work on and, hence, it cannot be busy.
II-A2 Stage 2
It is during stage 2 of each epoch that the scheduler issues a decision based on the current system state. There are two possible actions that the scheduler can request from the server, namely 'rest' and 'work'. The assumption that the server is non-preemptive and the fact that no new tasks can be assigned when the queue is empty lead to the following set of available actions for each possible state:
(1) 
The action chosen by the adopted scheduling policy at each epoch takes values in the admissible set in (1). As we discuss in Section II-C, we focus on the design of stationary policies that determine the action as a function of the current system state.
II-A3 Stage 3
A task can arrive at any time during each epoch, but we assume that work on a new task can be assigned to the server only at the beginning of stage 3. More specifically, the scheduler acts as follows:

If the server is available and the scheduler selects 'work', then the server starts working on the task at the head of the queue when stage 3 of the epoch begins.

When the server is available, the scheduler can also select 'rest' to signal that no work will be performed by the server during the remainder of the epoch. Once this 'rest' decision is made, a new task can be assigned no earlier than the beginning of stage 3 of the next epoch. Since the scheduler is non-work-conserving, it may decide to assign such 'rest' periods as a way to possibly reduce the action-dependent state and to improve future performance.

If the server is busy, it was still working on a task at the start of the epoch. In this case, because the server is non-preemptive, the scheduler picks 'work' to indicate that work on the current task is ongoing and must continue until it is completed; no new task can be assigned during the epoch.
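The stage-2 and stage-3 rules above induce the admissible action sets of (1); a sketch, where the labels 'W' (work) and 'R' (rest) are our own stand-ins for the paper's symbols:

```python
def admissible_actions(queue_size: int, busy: bool) -> set:
    """Admissible action set, mirroring the constraints described above."""
    if busy:                 # non-preemptive: must keep working
        return {"W"}
    if queue_size == 0:      # nothing to assign
        return {"R"}
    return {"R", "W"}        # may rest even with a nonempty queue
```

The last case is where the non-work-conserving nature of the scheduler shows up: both actions are admissible.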
II-B State Updates and Scheduling Decisions: Probabilistic Model
Based on the formulation outlined in Section II-A, we proceed to describe a discrete-time MDP that models how the states of the server and queue evolve over time for any given scheduling policy.
II-B1 Arrival Process
We assume that tasks arrive during each epoch according to a Bernoulli process. The probability of arrival in each epoch is called the arrival rate and is assumed to lie in the open interval (0, 1). Although we assume Bernoulli arrivals to simplify our analysis and discussion, more general arrival distributions (e.g., Poisson distributions) can be handled with only minor changes, as will become clear.
Notice that, as we discuss in Remark 1 below, our nomenclature for the arrival rate should not be confused with the standard definition of arrival rate for Poisson arrivals. Since our results are valid irrespective of the epoch duration, including when it is arbitrarily small, the remark also gives a sound justification for our adoption of the Bernoulli arrival model by viewing it as a discrete-time approximation of the widely used Poisson arrival model.
Remark 1
It is a well-known fact that, as the epoch duration tends to zero, a continuous-time Poisson process with a given arrival rate is arbitrarily well approximated by a Bernoulli process whose per-epoch arrival probability is the product of the Poisson rate and the epoch duration.
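A quick numerical check of this approximation, under assumed values (Poisson rate 2.0, horizon 1.0, epoch duration 0.001); the function name and parameters are ours:

```python
import random

def bernoulli_count(lam=2.0, horizon=1.0, delta=1e-3, seed=1):
    """Number of epochs with an arrival over [0, horizon), one Bernoulli
    trial per epoch with success probability lam * delta."""
    rng = random.Random(seed)
    p = lam * delta                      # per-epoch arrival probability
    return sum(rng.random() < p for _ in range(int(horizon / delta)))

# The count is Binomial(horizon/delta, lam*delta); its mean lam*horizon
# matches the Poisson mean, and its law converges to Poisson as delta -> 0.
mean_est = sum(bernoulli_count(seed=s) for s in range(200)) / 200
```

With these values the empirical mean should be close to 2.0, the mean of the corresponding Poisson count.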
II-B2 Action-Dependent Server Performance
In our formulation, the efficiency or performance of the server during an epoch is modeled with the help of a service rate function defined on the set of action-dependent states. More specifically, if the server works on a task during an epoch, the probability that it completes the task by the end of the epoch is the value of the service rate function at the current action-dependent state. This holds irrespective of whether the task is newly assigned or inherited as ongoing work from a previous epoch.^2 Thus, the service rate function quantifies the effect of the action-dependent state on the performance of the server. The results presented throughout this article are valid for any choice of service rate function with codomain (0, 1).

^2 This assumption is introduced to simplify the exposition. However, more general scenarios, in which the probability of task completion within an epoch depends on the total service received by the task prior to that epoch, can be handled by extending the state space and explicitly modeling the total service received by the task in service.
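As an illustration, one admissible service rate function is an inverted-U curve in the spirit of the Yerkes-Dodson law mentioned in Section I-C; the shape and all constants below are purely hypothetical:

```python
import math

def mu_inverted_u(x: int, n: int = 8) -> float:
    """A hypothetical inverted-U service rate over states 1..n:
    completion probability peaks at moderate action-dependent states."""
    peak = (n + 1) / 2.0
    width = n / 4.0
    return 0.1 + 0.8 * math.exp(-((x - peak) ** 2) / (2.0 * width ** 2))
```

The values stay strictly inside (0, 1), as required of a per-epoch completion probability, and the theorems quoted later do not require this shape; a monotone or multimodal function would be equally admissible.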
II-B3 Dynamics of the Action-Dependent State
We assume that (i) the next action-dependent state equals either the current state or the current state plus one when the action is 'work', and (ii) it equals either the current state or the current state minus one when the action is 'rest'. This is modeled by the following transition probabilities, specified for every pair of action-dependent states.
(2a)  
(2b)  
where the parameters, which take values in (0, 1), model the likelihood that the action-dependent state will transition to a greater or lesser value, depending on whether the action is 'work' or 'rest', respectively.
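The transitions in (2a)-(2b) describe a birth-death-type update; a sketch under the stated one-step assumption, with hypothetical parameter names `p` and `q` and with the boundary states assumed absorbing in the corresponding direction:

```python
import random

def next_action_state(x: int, action: str, n: int, p: float, q: float,
                      rng: random.Random) -> int:
    """One-step transition of the action-dependent state in {1, ..., n}:
    under 'W' it moves up one level w.p. p (capped at n); under 'R' it
    moves down one level w.p. q (floored at 1)."""
    if action == "W":
        return min(n, x + 1) if rng.random() < p else x
    return max(1, x - 1) if rng.random() < q else x
```

Under sustained 'work' the state can only climb, which is exactly why a non-work-conserving scheduler may want to insert rest periods.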
II-B4 Transition Probabilities for the System State
We consider that, conditioned on the current system state and action, the queue transition is independent of the server-state transition. Under this assumption, the transition probabilities for the system state can be written as follows:
for every pair of system states and every admissible action.
We assume that, within each epoch, the events that (a) a new task arrives during the epoch and (b) a task being serviced during the epoch is completed by its end are independent when conditioned on the current state and action. Hence, the transition probability in (II-B4) is given by the following:
Definition 1
(MDP) The MDP with the scheduler's action as input and the system state as its state, which at this point is completely defined, is the model we study throughout.
Table I summarizes the notation for the MDP.
set of action-dependent states  
server availability ( available, busy)  
server availability at epoch (takes values in )  
server state components  
server state at epoch (takes values in )  
set of natural numbers  
queue size at epoch (takes values in )  
state space formed by  
system state at epoch (takes values in )  
possible actions ( = rest, = work)  
MDP whose state is at epoch  
set of actions available at a given state in  
action chosen at epoch .  
PMF  probability mass function 
II-C Evolution of the System State under a Stationary Policy
We start by defining the class of policies that we consider throughout the paper.
Definition 2
A stationary randomized policy is specified by a mapping that determines the probability that the server is assigned to work on a task or to rest, as a function of the system state.
Definition 3
The set of admissible stationary randomized policies consists of those satisfying the action constraints in (1).
Convention
We adopt the convention that, unless stated otherwise, a positive arrival rate is preselected and fixed throughout the paper. Although the statistical properties of the system state and associated quantities subject to a given policy depend on the arrival rate, we simplify our notation by not labeling them with it.
From (II-B4)-(4), we conclude that, subject to an admissible policy, the system state evolves according to a time-homogeneous Markov chain (MC). Also, provided that it is clear from the context, we refer to this MC as the system. The following is the notion of system stability we adopt in our study.
Definition 4 (System stability)
For a given admissible policy, the system is stable if it satisfies the following properties:

- There exists at least one recurrent communicating class.

- All recurrent communicating classes are positive recurrent.

- The number of transient states is finite.
We find it convenient to define the set of stabilizing policies, namely the admissible randomized policies that stabilize the system for the fixed arrival rate.
Before we proceed, let us point out a useful fact that holds under any stabilizing policy.
Lemma 1
A stable system has a unique positive recurrent communicating class, which is aperiodic. Therefore, there is a unique stationary probability mass function (PMF) for the corresponding MC.
Please see Appendix A for a proof.
Definition 5
Given a fixed arrival rate and a stabilizing policy, the resulting MC has a unique stationary PMF and a unique positive recurrent communicating class, for which we reserve dedicated notation.
III Main Results
Our main results are Theorems III.1 and III.2, where we state that a soon-to-be-defined quantity, which can be computed efficiently, is the least upper bound of all arrival rates for which there exists a stabilizing policy (see Definition 4). The theorems also assert that, for any arrival rate less than this quantity, there is a stabilizing deterministic threshold policy with the following structure:
(5) 
where the threshold lies in the set of action-dependent states, and the threshold policy acts as follows:
(6) 
Notice that, when the server is available and the queue is not empty, the threshold policy assigns a new task only if the action-dependent state is less than the threshold, and it lets the server rest otherwise.
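In code, the threshold rule reads as follows (the labels 'W'/'R' and the argument names are ours, not the paper's notation):

```python
def threshold_policy(queue_size: int, busy: bool, x: int, tau: int) -> str:
    """Deterministic threshold scheduler sketched above.

    Assign a new task ('W') only when the server is available, the queue
    is nonempty, and the action-dependent state x is below the threshold
    tau; a busy (non-preemptive) server must continue working."""
    if busy:
        return "W"
    if queue_size > 0 and x < tau:
        return "W"
    return "R"
```

Note the non-work-conserving behavior: with a nonempty queue but a high action-dependent state, the policy deliberately rests the server.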
In Section IV, we introduce an auxiliary MDP with finite state space, which can be viewed as tracking the server state of the original MDP when the queue size is always positive. First, using the fact that its state space is finite, we demonstrate that, for every threshold, the auxiliary MDP subject to the corresponding threshold policy has a unique stationary PMF. Then, we show that, for any system stabilized by some admissible policy, we can find a threshold policy for the auxiliary MDP that achieves the same long-term departure rate of completed tasks. As a result, the maximum long-term departure rate of completed tasks in the auxiliary MDP over all threshold policies serves as an upper bound on the arrival rates for which we can hope to find a stabilizing policy.
Making use of this observation, we define the following important quantity:
(7) 
From the definition of the stationary PMF, the quantity in (7) can be interpreted as the maximum long-term departure rate of completed tasks under any threshold policy of the form in (5), assuming that the queue is always nonempty.
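To make the finite search behind (7) concrete, the following sketch computes its analogue in a deliberately simplified chain: we ignore the busy/available distinction (a task in service is assumed to complete within an epoch with probability given by the service rate), index states from 0 for convenience, and pick arbitrary drift parameters. All names and values are ours, not the paper's.

```python
def stationary_pmf(P, iters=5000):
    """Stationary distribution of a row-stochastic matrix by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def lambda_star(mu, p=0.5, q=0.5):
    """Finite search over thresholds for the maximum long-term departure
    rate in a simplified chain: state x moves up one level w.p. p when
    working (x < tau) and down one level w.p. q when resting (x >= tau)."""
    n = len(mu)
    best = 0.0
    for tau in range(1, n + 1):
        P = [[0.0] * n for _ in range(n)]
        for x in range(n):
            if x < tau:                       # work: drift up
                P[x][min(x + 1, n - 1)] += p
                P[x][x] += 1.0 - p
            else:                             # rest: drift down
                P[x][max(x - 1, 0)] += q
                P[x][x] += 1.0 - q
        pi = stationary_pmf(P)
        # departures occur only in working states, each w.p. mu[x]
        rate = sum(pi[x] * mu[x] for x in range(n) if x < tau)
        best = max(best, rate)
    return best
```

With a constant service rate, resting never helps and the search returns that constant; with an inverted-U rate, an interior threshold that keeps the state near the peak wins, illustrating why the maximization in (7) is a finite, efficient search.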
The following are the main results of this paper.
Theorem III.1
(Necessity) If, for a given arrival rate, there exists a stabilizing admissible policy, then the arrival rate is no greater than the maximum in (7).
Theorem III.2
(Sufficiency) Let a threshold be chosen as a maximizer of (7). If the arrival rate is strictly less than the maximum in (7), then the corresponding threshold policy stabilizes the system.
Remark 2

The computation of the maximum in (7), along with a maximizing threshold, relies on a finite search that can be carried out efficiently.

The theorems are valid for any choice of service rate function that takes values in (0, 1). In particular, the service rate function could be multimodal, increasing or decreasing.

The search that yields the maximum in (7) and an associated maximizing threshold does not require knowledge of the arrival rate.
We point out two key differences between our study and [15, 16]. The model employed by Savla and Frazzoli assumes that the service time function is convex, which is analogous to our service rate function being unimodal. In addition, their threshold policy is proved to be maximally stabilizing only for identical task workloads. In our study, however, we do not impose any assumption on the service rate function, and the workloads of tasks are modeled using i.i.d. random variables.^3

^3 To be more precise, our assumptions correspond to the case of exponentially distributed workloads. However, as mentioned earlier, this assumption can be relaxed to allow more general workload distributions.
IV An Auxiliary MDP
In this section, we describe an auxiliary MDP whose state takes values in the (finite) server state space and is obtained from the original MDP by artificially removing the queue-length component. We use an overline to denote the auxiliary MDP, its state and any other variables associated with it, in order to distinguish them from those of the server state in the original MDP.
As will become clear, we can view the auxiliary MDP as tracking the server state of an original MDP in which infinitely many tasks are waiting in the queue at the beginning. As a result, there is always a task waiting for service when the server becomes available.
The reason for introducing the auxiliary MDP is twofold: (i) its state space is finite and, hence, it is easier to analyze than the original MDP, and (ii) we can establish a relation between the two MDPs that allows us to prove the main results of the previous section by studying the auxiliary MDP instead, which simplifies the proofs.
Admissible action sets: As the queue size is no longer a component of the state of the auxiliary MDP, we eliminate the dependence of the admissible action sets on it, which was explicitly specified in (1) for the original MDP, while still ensuring that the server is non-preemptive. More specifically, the set of admissible actions at each server state is given by
(8) 
Consequently, for any given realization of the current state, the action is required to take values in the set in (8).
Transition probabilities: We define the transition probabilities that specify , as follows:
where the current and next states range over the server state space, and the action ranges over the admissible set in (8). Subject to these action constraints, the right-hand terms of (IV) are defined, in connection with the original MDP, as follows:
(10) 
(11a)  
(11b) 
A relation between the transition probabilities of the auxiliary and original MDPs: From the definition above and (4), we can deduce the following equality, valid for all server states:
(12)  
which holds for any admissible state-action pair. Notice that the right-hand side (RHS) of (12) does not change when we vary the queue size across the positive integers. From this, in conjunction with (II-B4), (IV) and (10), we have, for all server states,
(13)  
The equality in (13) indicates that the auxiliary MDP also characterizes the transition probabilities of the server state in the original MDP when the current queue size is positive. This is consistent with our earlier viewpoint that the auxiliary MDP can be considered the server state of the original MDP initialized with an infinite queue length. We will explore this relationship in Section V, where we use the auxiliary MDP to prove Theorems III.1 and III.2.
IV-A Stationary Policies and Stationary PMFs of the Auxiliary MDP
Analogously to the original MDP, we only consider stationary randomized policies for the auxiliary MDP, which are defined below.
Definition 6 (Stationary randomized policies for the auxiliary MDP)
We restrict our attention to stationary randomized policies acting on the auxiliary MDP, which are specified by a mapping from each server state to a probability distribution over the actions. The set of all such stationary randomized policies that honor (8) constitutes the admissible set for the auxiliary MDP.
Recurrent communicating classes of the auxiliary MC
The auxiliary MDP subject to an admissible stationary randomized policy is a finite-state time-homogeneous MC. Because the state space is finite, for any such policy, this MC has a positive recurrent communicating class and a stationary distribution [8]. In fact, there are at most two positive recurrent communicating classes, as explained below.
Define a mapping , where
We assume that if the set on the RHS is empty.
Case 1. : First, from the definition of , clearly all states with communicate with each other, but none of these states communicates with any other state with because . Second, because by assumption, all states with communicate with states with . Together with the first observation, this implies that these states with are transient. Therefore, there is only one positive recurrent communicating class given by
(14) 
Case 2. : In this case, it is clear that is an absorbing state and forms a positive recurrent communicating class by itself. Hence, if , as all other states communicate with , the only positive recurrent communicating class is and all other states are transient. On the other hand, if , for the same reason explained in the first case, gives rise to a second positive recurrent communicating class, and all other states with , except for , are transient.
In our study, we often limit our discussion to randomized policies satisfying the condition of the first case. For notational convenience, we define the set of randomized policies satisfying this condition; the reason for this restriction will be explained in the subsequent section.
The following corollary is an immediate consequence of the above observation.
Corollary 1
For any policy in the aforementioned set, the corresponding MC has a unique stationary PMF.
Existence of a policy for the auxiliary MDP with an identical steady-state distribution of the server state: One of the key facts we will make use of in our analysis is that, for any stabilizing policy for the original system, we can find a policy for the auxiliary MDP that achieves the same steady-state distribution of the server state. To this end, we first introduce the following projection map.
Definition 7 (Policy projection map)
We define a mapping , where
with
(15) 
We first present a lemma that proves useful in our analysis.
Lemma 2
For every stabilizing policy, the projected policy satisfies the condition defining the restricted policy set above.
Please see Appendix B for a proof.
An obvious implication of the lemma is that the projected policy belongs to the restricted set for every stabilizing policy and, hence, there exists a unique stationary PMF for the corresponding auxiliary MC.
The following lemma shows that the steady-state distribution of the server state in the original system under a stabilizing policy is identical to that of the auxiliary MC under the projected policy.
Lemma 3
Suppose that the adopted policy is stabilizing. Then, we have
(16) 
A proof is provided in Appendix C.
V Proofs of Main Results
In this section, we begin with a comment on the longterm average departure rate of completed tasks when the system is stable. Then, we provide the proofs of Theorems III.1 and III.2 in Sections VA and VB, respectively.
Remark 3
Recall from our discussion in Section II that, under a stabilizing policy, there exists a unique stationary PMF. Consequently, the average number of completed tasks per epoch converges almost surely as the number of epochs goes to infinity; we call the limit the long-term service rate of the system (for the given arrival rate). Moreover, because the policy is stabilizing, the long-term service rate equals the arrival rate.
V-A A Proof of Theorem III.1
In order to prove Theorem III.1, we make use of a similar notion of long-term service rate for the auxiliary MC, which can be viewed in most cases as the average number of completed tasks per epoch. (Step 1) We first establish that, for every stabilizing policy, we can find a related policy for the auxiliary MDP whose long-term service rate equals that of the original system or, equivalently, the arrival rate. (Step 2) We prove that the quantity in (7) equals the maximum long-term service rate achievable by any policy for the auxiliary MDP. Together, these steps yield the theorem.
Long-term service rate of the auxiliary MC: The long-term service rate associated with the auxiliary MC under a given policy is defined as follows. First, for each policy, consider the set of stationary PMFs of the resulting MC. Clearly, by Corollary 1, for any policy in the restricted set, there exists a unique stationary PMF and this set is a singleton. The long-term service rate is then defined to be
(17) 
Recall that the auxiliary state is the pair comprising the availability state and the action-dependent state.
Step 1: The following lemma shows that the long-term service rate achieved by the projected policy in the auxiliary MC equals that of the original system.
Lemma 4
Suppose that the adopted policy is stabilizing. Then, the long-term service rate of the auxiliary MC under the projected policy equals that of the original system.
First, note
where the first equality follows from Lemma 3, and the next results from rearranging the summations in terms of the server state; the remaining equalities hold by definition. The lemma then follows from Remark 3, since the long-term service rate of the stable system equals the arrival rate.
Since the projected policy belongs to the restricted set, as explained earlier, Lemma 4 implies
(18) 
Step 2: We shall prove the required equality in two steps. First, we establish that there is a stationary deterministic policy that achieves the maximum. Then, we show that, for any stationary deterministic policy, we can find a deterministic threshold policy that achieves the same long-term service rate, thereby completing the proof of Theorem III.1.
Let us define the subset of the admissible policy set consisting only of stationary deterministic policies for the auxiliary MDP; in other words, a policy in this subset selects a single action with probability one at every state. Theorem 9.1.8 in [13, p. 451] tells us that if