In this article, we adopt the discrete-time framework proposed in , in which a scheduler governs when tasks waiting in a first-come-first-served queue are assigned to a server. The server is non-preemptive, and it has an internal state comprising two components: an availability state (busy or available) and an activity state, which accounts for the intensity of the effort put in by the server. The activity state depends on current and previous scheduling decisions, and it is useful for modelling performance-influencing factors, such as the state of charge of the batteries of an energy-harvesting module that powers one or more components of the server. As a rule, the activity state may increase while the server is busy and gradually decrease while the server is available (resting). In our approach, a service rate function ascribes to each possible activity state, out of finitely many, a probability that the server can complete a task in one time-step. According to our model of non-preemptivity, once the server becomes busy working on a task, it becomes available again only when the task is completed. When the server is available, the scheduler decides, based on the activity state and the size of the queue, whether to assign a new task to the server. Although our results remain valid for any service rate function, in many applications it is decreasing, which causes the server performance (understood as task completion probability) to worsen as the activity state increases. A vital trade-off the scheduler faces, in this case, is whether to assign a new task when the server is available (resting) or let it remain available so as to possibly ameliorate the activity state and thereby improve future performance.
I-A Problem Statements and Comparison to
Besides introducing and justifying in detail the formulation adopted here, in  the authors characterize the supremum of all arrival rates for which there is a scheduler that can stabilize the queue. The analysis in  also shows that such a supremum can be computed by a finite search and it identifies simple stabilizing scheduler structures, such as those with a threshold-type configuration.
In this article, we build on the analysis in  to design schedulers that guarantee not only stability but also lessen the rate at which the server is used, which we denote as the utilization rate, understood as the average portion of time in which the server is busy working on a task. More specifically, throughout this article, we will investigate and provide solutions to the following two problems.
Problem 1: Given a server and a stabilizable arrival rate, determine a tractable method to compute the infimum of all utilization rates achievable by a stabilizing scheduling policy. This fundamental limit is important for assessing how effective any given policy is in terms of the utilization rate.
Problem 2: Given a server and a stabilizable arrival rate, determine a tractable method to design policies whose utilization rate is arbitrarily close to the fundamental limit.
I-B Overview of Main Results and Technical Approach
In §III, Theorem 1 states our main result, from which we obtain Corollaries 1 and 2 that constitute our solutions to Problems 1 and 2, respectively. The following are key consequences of these corollaries:
According to Corollary 1, the infimum utilization rate (alluded to in Problem 1) can be computed by solving a finite-dimensional linear program.
If the arrival rate is stabilizable by the server, then Corollary 2 guarantees that for each positive gap there is a stabilizing scheduling policy whose utilization rate exceeds the infimum (characterized by Corollary 1) by at most that gap. Notably, such a scheduling policy can be obtained from a solution of a suitably-specified finite-dimensional linear program.
Our main technical approach builds on that of , inspired by which we find ways to solve Problems 1 and 2 by tackling simplified versions of them, suitably adapted for an appropriately-constructed finite-state controlled Markov chain (denoted as the reduced process).
This article is mathematically more intricate than , which is unsurprising considering that it tackles not only stabilization but also regulation of the utilization rate. Among the new concepts and techniques put forth to prove Theorem 1, the distributional convergence results of §V, and the potential-like method used to establish them, are of singular importance; they are also original and relevant in their own right.
I-C Related literature
As mentioned earlier, to the best of our knowledge, our work is the first to study the problem of minimizing the utilization rate of a server whose performance is time-varying and dependent on an internal state that reflects its activity history. For this reason, there are no other results to which we can compare our findings.
An earlier study that examined a system closely resembling ours is that of Savla and Frazzoli . They studied the problem of designing a maximally stabilizing task release control policy, using a differential system model. Under the assumption that the service time function is convex, they derived bounds on the maximum throughput achievable by any admissible policy for a fixed task workload distribution. In addition, they showed the existence of a maximally stabilizing threshold policy when tasks have identical workloads. Finally, they also demonstrated that the maximum achievable throughput increases when the task workload is not deterministic. However, they did not consider the problem of minimizing the utilization rate in their study.
In addition to the aforementioned study, there are a few research fields that share a key aspect of our problem, which is to design a scheduling policy that optimizes performance with respect to one objective, subject to one or more constraints. For instance, wireless energy transfer has emerged as a potential solution for powering small devices that have low-capacity batteries or cannot be easily recharged, e.g., Internet-of-Things (IoT) devices [3, 18]. Since the devices need to collect sufficient energy before they can transmit, and the transmission rate is a function of transmit power, a transmitter has to decide (i) when to harvest energy and (ii) when to transmit and what transmission rate it should use. The studies reported in [13, 6, 23] examined the problem of maximizing throughput in wireless networks in which communication devices are powered by hybrid access points via wireless energy transfer. In a related study, Shan et al.  studied the problem of minimizing the total transmission delay or completion time of a given set of packets.
Integrated production scheduling and (preventive) maintenance planning in manufacturing, where machines can fail with time-varying rates, shares similar issues with scheduling devices powered by wireless energy transfer [4, 16, 24]. In more traditional approaches, the problems of production scheduling and maintenance scheduling are considered separately, and equipment failures are treated as random events that need to be coped with. When the machine failure probability or rate is time-varying and depends on the age since the last (preventive) maintenance, the overall production efficiency can be improved by jointly considering both problems. For instance, the authors of  formulated the problem using an MDP model with the state consisting of the system's age (since the last preventive maintenance) and the inventory level, and investigated the structural properties of optimal policies.
Another area that shares a similar objective is maximum hands-off control or sparse control [17, 5, 10, 11, 9]. The goal of maximum hands-off control is to design a control signal that maximizes the time during which the control signal is zero and inactive. For instance, the authors of  showed that, under the normality condition, the optimal solution sets of a maximum hands-off control problem and an associated -optimal control problem coincide. Moreover, they proposed a self-triggered feedback control algorithm for infinite-horizon problems, which leads to a control signal with a provable sparsity rate, while achieving practical stability of the system. In another study , Chatterjee et al. provided both necessary conditions and sufficient conditions for the maximum hands-off control problem. Ikeda and Nagahara  considered a linear time-invariant system and showed that, if the system is controllable and the dynamics matrix is nonsingular, the optimal value of the optimal control problem for maximum hands-off control is continuous and convex in the initial condition.
Finally, another research problem, which has garnered much attention in wireless sensor networks and is somewhat related to maximum hands-off control, is duty-cycle scheduling of sensors. A common objective for the problem is to minimize the total energy consumption subject to performance constraints on delivery reliability and delays . The authors of  proposed a reinforcement learning-based control mechanism for inferring the states of neighboring sensors in order to minimize the active periods. In another study, Vigorito et al.  studied the problem of achieving energy-neutral operation (i.e., keeping the battery charge at a sufficient level) while maximizing the awake times. In order to design a good control policy, they formulated the problem as an optimal tracking problem, more precisely a linear quadratic tracking problem, with the aim of keeping the battery level around some target value.
I-D Paper structure
This article has five sections. After the introduction, in §II we describe the technical framework, including the controlled Markov chain that models the server; we also specify a relevant auxiliary reduced process, define key quantities and maps that quantify the utilization rate, characterize key policy sets, specify the notion of stability used throughout the article, and state and prove certain preliminary results. Our main theorem and key results are stated in §III, while §IV and §V present continuity and distributional convergence properties, respectively, that are required in the proof of our main theorem. We defer the most intricate proofs, some of which also require additional auxiliary results, to appendices at the end of the article.
II Technical Framework and Key Definitions
This section starts with a synopsis of the framework put forth thoroughly in . It replicates from  what is strictly necessary to make this article self-contained. In this section, we also introduce the concepts, sets, operators and notation that are required to formalize and solve Problems 1 and 2.
We adopt a discrete-time approach in which each discrete-time instant can be associated with a continuous time epoch, as described in .
II-A Stochastic Discrete-time Framework
As in , we consider that the server is represented by the MDP . The state of the server at time has two components representing the activity state and the availability state, respectively. There are possible activity states. The server is either available or busy at time , as indicated by or , respectively. Consequently, the state-space of the server is represented as:
where and are the sets of possible operational and availability states, respectively.
The MDP represents the overall system comprising the server and the queue length. More specifically, the state of the system is , where is the length of the queue at time , and the state-space of is:
Notice that excludes the case in which the server would be busy while the queue is empty.
The action of the scheduler at time is represented by , which takes values in the set . The scheduler directs the server to work at time when and instructs the server to rest otherwise. The assumption that the server is non-preemptive and the fact that no new tasks can be assigned when the queue is empty, lead to the following set of available actions for each possible state in :
We assume that tasks arrive according to a Bernoulli process . The arrival rate is denoted with and takes values in .
II-A1 Action-Dependent Server Performance
In our formulation, the efficiency or performance of the server during an epoch is modeled with the help of a service rate function . More specifically, if the server works on a task during epoch , the probability that it completes the task by the end of the epoch is . This holds irrespective of whether the task is newly assigned or inherited as ongoing work from a previous epoch. Thus, the service rate function quantifies the effect of the activity state on the performance of the server. The results presented throughout this article are valid for any choice of with codomain .
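As a small illustration of such a function, a decreasing service rate over five activity states might look as follows (the values in `MU` and the state count are hypothetical, chosen only for this sketch; the article's results hold for any service rate function):

```python
import random

# Hypothetical decreasing service rate over five activity states:
# a higher activity state yields a lower one-step completion probability.
MU = [0.9, 0.7, 0.5, 0.3, 0.1]

def task_completed(activity_state, rng=random):
    """Sample whether a working server completes its task by epoch's end."""
    return rng.random() < MU[activity_state]
```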
II-A2 Dynamics of the activity state
We assume that (i) is equal to either or when is and (ii) is either or if is . This is modeled by the following transition probabilities specified for every and in .
where the parameters , which take values in , model the likelihood that the operational state will transition to a greater or lesser value, depending on whether the action is or , respectively.
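A plausible one-step sampler consistent with this description is sketched below; the state count `N_STATES` and the drift probabilities `p_up` and `p_down` are hypothetical placeholders for the article's parameters, whose symbols do not appear above:

```python
import random

N_STATES = 5  # hypothetical number of activity states: 0, ..., N_STATES - 1

def next_activity_state(a, action, p_up=0.6, p_down=0.4, rng=random):
    """One-step activity transition: under 'work' the state may move up one
    level (w.p. p_up); under 'rest' it may move down one level (w.p. p_down);
    otherwise it stays put. The extreme states saturate."""
    if action == "work":
        return min(a + 1, N_STATES - 1) if rng.random() < p_up else a
    return max(a - 1, 0) if rng.random() < p_down else a
```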
II-A3 Transition probabilities for
We consider that is independent of when conditioned on . Under this assumption, the transition probabilities for can be written as follows:
for every , in and in .
We assume that, within each epoch , the events that (a) there is a new task arrival during the epoch and (b) a task being serviced during the epoch is completed by the end of the epoch are independent when conditioned on and . Hence, the transition probability in (II-A3) is given by the following:
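Under this conditional-independence assumption, one epoch of the queue and availability dynamics can be sketched as follows (the activity-state update is omitted for brevity, and all parameter names are illustrative; here the queue length counts the task in service, so the excluded busy-with-empty-queue combination cannot arise):

```python
import random

def step_system(queue, busy, activity, action, lam, mu, rng=random):
    """One epoch of the system: a Bernoulli(lam) arrival and, when the
    server works this epoch, a completion with probability mu[activity]
    are sampled independently, as assumed in the text."""
    working = busy or (action == "work" and queue > 0)
    done = working and rng.random() < mu[activity]
    if done:
        queue -= 1                 # completed task leaves the system
    if rng.random() < lam:
        queue += 1                 # independent Bernoulli arrival
    busy = working and not done    # non-preemptive: busy until completion
    return queue, busy
```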
(MDP ) The MDP with input and state , which at this point is completely defined, is denoted by .
Table I summarizes the notation for MDP .
|set of activity states|
|server availability ( available, busy)|
|server availability at epoch (takes values in )|
|server state components|
|server state at epoch (takes values in )|
|natural number system .|
|queue size at epoch (takes values in )|
|state space formed by|
|system state at epoch (takes values in )|
|possible actions ( = rest, = work)|
|MDP whose state is at epoch|
|set of actions available at a given state in|
|action chosen at epoch .|
|PMF|probability mass function|
II-A4 Stationary Policies, Stability and Stabilizability
We start by defining the class of policies that we consider throughout the paper.
A stationary randomized policy is specified by a mapping that determines the probability that the server is assigned to work on a task or rest, as a function of the system state, according to
The set of stationary randomized policies satisfying (3) is denoted by .
Although the statistical properties of subject to a given policy depend on the parameters specifying , including , we simplify our notation by not representing this dependence, unless noted otherwise. With the exception of , which we do not pre-select, we assume that all the other parameters for are given and fixed throughout the paper.
From (II-A3)-(6), we conclude that, subject to a policy in , evolves according to a time-homogeneous Markov chain (MC), which we denote by . Also, when it is clear from the context, we refer to as the system.
The following is the notion of system stability we adopt throughout this article.
Definition 4 (System stability)
For a given policy in , the system is stable if it satisfies the following properties:
There exists at least one recurrent communicating class.
All recurrent communicating classes are positive recurrent.
The number of transient states is finite.
We find it convenient to define as the set of randomized policies in that stabilize the system for an arrival rate .
Before we proceed, let us point out a useful fact that holds under any stabilizing policy in .
[14, Lemma 1] A stable system has a unique positive recurrent communicating class, which is aperiodic. Therefore, there is a unique stationary probability mass function (PMF) for .
Given an arrival rate and a stabilizing policy in , we denote the unique stationary PMF and positive recurrent communicating class of by and , respectively.
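For a finite chain, such a stationary PMF can be computed numerically by solving the balance equations together with normalization. A minimal sketch (the two-state transition matrix is a toy example, not the article's chain):

```python
import numpy as np

def stationary_pmf(P):
    """Stationary PMF of a transition matrix P (rows sum to 1) with a
    unique recurrent class: solve pi P = pi together with sum(pi) = 1.
    The stacked system is overdetermined but consistent, so least
    squares returns the exact solution."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# Toy two-state chain (numbers hypothetical):
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
pi = stationary_pmf(P)
```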
II-B Utilization Rate: Definition and Infimum
(Utilization rate function) The function that determines the utilization rate in terms of a given stabilizable arrival rate and a stabilizing policy , is defined as:
The utilization rate quantifies the proportion of the time in which the server is working. Notably, the expected utilization rate , computed for with arrival rate and stabilized by , coincides with the probability limit of the utilization rate, as defined for instance in  (with ), when the averaging horizon tends to infinity. Using our notation, the aforesaid probability limit can be stated as follows:
where is when and otherwise.
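To make the time-average interpretation concrete, the following Monte Carlo sketch estimates the long-run fraction of busy epochs for a simplified work-conserving queue with a constant service rate, for which the limit is the classical ratio of arrival rate to service rate (the constant-rate simplification and all numbers are ours, not the article's model):

```python
import random

def estimate_utilization(lam, mu, horizon=200_000, seed=1):
    """Monte Carlo estimate of the utilization rate: the long-run
    fraction of busy epochs in a work-conserving single-server
    discrete-time queue with Bernoulli(lam) arrivals and per-epoch
    completion probability mu. For this simplified model the limit
    equals lam / mu."""
    rng = random.Random(seed)
    queue, busy, busy_epochs = 0, False, 0
    for _ in range(horizon):
        if not busy and queue > 0:   # always assign when possible
            queue -= 1
            busy = True
        if busy:
            busy_epochs += 1
            if rng.random() < mu:    # task completes this epoch
                busy = False
        if rng.random() < lam:       # independent Bernoulli arrival
            queue += 1
    return busy_epochs / horizon
```

For instance, with `lam = 0.2` and `mu = 0.5` the estimate should approach 0.4 as the horizon grows.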
The infimum utilization rate for a given stabilizable arrival rate is defined as:
II-C Auxiliary MDP
We proceed with describing an auxiliary MDP whose state takes values in and is obtained from by artificially removing the queue-length component. We denote this auxiliary MDP by and its state at epoch by in order to emphasize that it takes values in . The action chosen at epoch is denoted by . We use the overline to denote the auxiliary MDP and any other variables associated with it, in order to distinguish them from those of the server state in .
Under certain conditions, which we will identify later on, we can determine important properties of by analysing . Notably, we will use the fact that is finite to compute via a finite-dimensional linear program, and also to simplify the proofs of our main results.
As the queue size is no longer a component of the state of , we eliminate the dependence of admissible action sets on , which was explicitly specified in (3) for MDP , while still ensuring that the server is non-preemptive. More specifically, the set of admissible actions at each element of is given by
Consequently, for any given realization of the current state , is required to take values in .
We define the transition probabilities that specify , as follows:
where and are in , and is in . Subject to these action constraints, the right-hand terms of (II-C) are defined, in connection with , as follows:
II-D Stationary policies and stationary PMFs of
Analogously to the MDP , we only consider stationary randomized policies for , which are defined below.
Definition 8 (Stationary randomized policies for )
We restrict our attention to stationary randomized policies acting on , which are specified by a mapping , as follows:
for every in and in . The set of all stationary randomized policies for which honor (10) is defined to be .
Following the approach in , henceforth we restrict our analysis to the subset of defined as follows:
The main benefit of focusing on policies in , as stated in [14, Corollary 1], is that has a unique stationary PMF for every in . More specifically, the fact that strategies in rule out the case in which is an absorbing state guarantees the uniqueness of the stationary PMF. Furthermore, from [14, Lemmas 2 and 4] we conclude that restricting to any search that seeks to determine bounds or fundamental limits with respect to stabilizing policies incurs no loss of generality.
II-E Service rate of and précis of stabilizability results
We start by defining the service rate of for a given policy in :
The maximal service rate for is defined below.
As stated in [14, Theorems 3.1 and 3.2], any arrival rate lower than is stabilizable. Furthermore, these theorems also assert that any arrival rate above is not stabilizable and that can also be computed by determining which threshold policy , among the finitely many defined in [14, (6)], maximizes .
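The finite search over threshold policies can be sketched as follows: for each threshold, build the reduced process over (activity, availability) pairs, compute its stationary PMF, and evaluate the resulting service rate. All dynamics and numbers below are hypothetical stand-ins for the article's parameters, and the threshold rule "work iff available and activity is at most theta" is one plausible reading of the threshold structure:

```python
import numpy as np

N = 4                        # hypothetical number of activity states
MU = [0.9, 0.6, 0.4, 0.2]    # hypothetical decreasing service rates
P_UP, P_DOWN = 0.7, 0.5      # hypothetical activity drift probabilities

def reduced_chain(theta):
    """Transition matrix of a reduced process over (activity, availability)
    under the threshold policy 'work iff available and activity <= theta'
    (a busy server keeps working, since service is non-preemptive)."""
    idx = {(a, b): a + N * b for a in range(N) for b in (0, 1)}  # b=1: busy
    P = np.zeros((2 * N, 2 * N))
    for (a, b), i in idx.items():
        if b == 1 or a <= theta:                     # server works this epoch
            for done, p_done in ((True, MU[a]), (False, 1.0 - MU[a])):
                nb = 0 if done else 1                # available again if done
                P[i, idx[(min(a + 1, N - 1), nb)]] += p_done * P_UP
                P[i, idx[(a, nb)]] += p_done * (1.0 - P_UP)
        else:                                        # server rests
            P[i, idx[(max(a - 1, 0), 0)]] += P_DOWN
            P[i, idx[(a, 0)]] += 1.0 - P_DOWN
    return P, idx

def service_rate(theta):
    """Long-run task-completion rate under the threshold-theta policy."""
    P, idx = reduced_chain(theta)
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)       # stationary PMF
    return sum(pi[i] * MU[a] for (a, bb), i in idx.items()
               if bb == 1 or a <= theta)

best_theta = max(range(N), key=service_rate)         # finite search
```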
We define the map as follows:
It follows from its definition that yields a policy for that acts as the given in when the queue is not empty and imposes rest otherwise.
We reserve , without a superscript, to denote a design parameter. It acts as a constraint in the definition of the following policy sets.
(Policy sets and ) Given in , we define the following policy sets:
where is defined as:
We also define the following class of policies generated from and through :
The following proposition establishes important stabilization properties for the policies in .
Let the arrival rate in be given. If is in then is stable, irreducible and aperiodic for any in .
Stability of can be established using the same method adopted in  to prove [14, Theorem 3.2], which uses [14, Lemma 8] to establish a contradiction when is assumed to be unstable. That is irreducible follows from the fact that, under any policy in , all states of communicate with . That the probability of transitioning away from is less than one implies that the chain is aperiodic.
An immediate consequence of Proposition 1 is that is a nonempty subset of when . This implies that, as far as stabilizability is concerned, there is no loss of generality in restricting our analysis to policies with the structure in (18). More interestingly, from Theorem 1, which will be stated and proved later on in Section III, we can conclude that restricting our methods for solving Problem 2 to policies of the form (18) also incurs no loss of generality.
The following projection map will be important going forward.
Definition 11 (Policy projection map )
Given in , we define a mapping , where
Notice that although the map depends on , for simplicity of notation, we choose not to denote this dependence explicitly. It is worthwhile to note that the map , for a given less than , allows us to establish the following remark comparing the service rate notions for and .
II-F Utilization rate of and computation via LP
We now proceed to define the utilization rate of for a given in . Subsequently, we will define and propose a linear programming approach to computing the infimum of the utilization rates attainable by any policy for subject to a given service rate.
Given a policy in , the following function determines the utilization rate of :
(Infimum utilization rate and ) The infimum utilization rate of for a given departure rate is defined as:
We also define the following approximate infimum utilization rates:
Notice that the infimum that determines and is well-defined because there is a unique stationary PMF for each policy in .
Notice that since , we conclude that the following holds:
We now proceed to outlining efficient ways to compute , which is relevant because, as Corollary 1 indicates in §III, we can use it to compute when . Hence, below we follow the approach in [1, Chapter 4] to construct approximate versions of that are computable using a finite-dimensional linear program (LP). Subsequently, we will obtain the policies in corresponding to solutions of the LP, as is done in [1, Chapter 4]. The policies obtained in this way will form a set for each in that will be useful later on.
(-LP utilization rate )
Let be a given constant in and be a pre-selected departure rate in . The -LP utilization rate is defined as:
where the minimization is carried out over the following set:
Every solution is subject to the following constraints and is compactly represented as :
and the equality below guarantees that every solution will be consistent with :
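The consistency requirement above is, in essence, the occupation-measure balance equation used in [1, Chapter 4]: the per-state-action measure must reproduce itself through the transition kernel. The following numerical sketch verifies this balance on a toy two-state, two-action chain (all matrices are hypothetical, not the article's reduced process):

```python
import numpy as np

# Toy reduced process: two states, two actions (all numbers hypothetical).
# P[a] is the transition matrix when action a is applied in every state.
P = {0: np.array([[1.0, 0.0],      # action 0 ('rest'): drift toward state 0
                  [0.8, 0.2]]),
     1: np.array([[0.3, 0.7],      # action 1 ('work'): drift toward state 1
                  [0.0, 1.0]])}

def occupation_measure(policy):
    """Occupation measure rho(x, a) = pi(x) * policy[x, a], where pi is the
    stationary PMF of the chain induced by the stationary randomized policy
    (each row of `policy` is a per-state PMF over actions)."""
    P_pi = sum(np.diag(policy[:, a]) @ P[a] for a in (0, 1))
    n = P_pi.shape[0]
    A = np.vstack([P_pi.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi[:, None] * policy

policy = np.array([[0.5, 0.5],
                   [0.5, 0.5]])
rho = occupation_measure(policy)

# LP balance constraint: sum_a rho(y, a) = sum_{x,a} rho(x, a) * P[a](x, y)
lhs = rho.sum(axis=1)
rhs = sum(P[a].T @ rho[:, a] for a in (0, 1))
```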
(Solution set ) For each in and in , we use to represent the set of solutions of the LP specified by (30). We adopt the convention that is empty if and only if the LP is not feasible.
II-G LP-based policy sets
For each solution in we can obtain a corresponding policy in for as follows:
(Policy set ) For each in and in , we define the following set of policies :
Here, we adopt the convention that is empty if and only if is empty.
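A standard way to extract a policy from an occupation-measure solution, in the spirit of [1, Chapter 4], is to take the ratio of the measure at each state-action pair to its total at that state; a sketch follows (the uniform tie-breaking convention at states of zero measure is ours, and the article's own rule for such states may differ):

```python
import numpy as np

def policy_from_occupation(rho, eps=1e-12):
    """Recover a stationary randomized policy from an occupation measure:
    pi(a | x) = rho(x, a) / sum_a' rho(x, a'). At states of zero measure,
    fall back to a uniform action PMF (an arbitrary convention)."""
    totals = rho.sum(axis=1, keepdims=True)
    n_actions = rho.shape[1]
    uniform = np.full_like(rho, 1.0 / n_actions)
    return np.where(totals > eps, rho / np.maximum(totals, eps), uniform)

# Toy occupation measure (hypothetical numbers):
rho = np.array([[0.3, 0.1],
                [0.0, 0.6],
                [0.0, 0.0]])
pol = policy_from_occupation(rho)
```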
The following proposition will justify choices for we will make at a later stage to guarantee that is nonempty for in .
If in is such that is nonempty then is nonempty for any in and in .
We start by invoking [14, Lemma 7] to conclude that is nonempty, and consequently that