1 Introduction.
We consider a canonical scheduling problem in a discrete-time, multiclass, multiserver parallel server queueing system. In particular, we consider a system with several distinct queues and several distinct servers. Each queue corresponds to a different class of arrivals; arrivals to queue $i$ are Bernoulli($\lambda_i$), i.i.d. across time. Service rates are heterogeneous across every pair of queue and server (i.e., a "link"). At each time step, a central scheduler may match at most one queue to each server. Services are also Bernoulli; thus jobs may fail to be served when matched, and in this case the policy is allowed to choose a different server for the same job in subsequent time step(s). Jobs in queue $i$ incur a holding cost $c_i$ per time step spent waiting for service. Letting $Q_i(t)$ denote the queue length of queue $i$ at time $t$, the performance measure of interest is the cumulative expected holding cost incurred up to time $T$:
$\mathbb{E}\left[\sum_{t=1}^{T} \sum_i c_i Q_i(t)\right].$
(All our analysis extends to the case where the objective of interest is a time-discounted cost, i.e., where the $t$'th term is scaled by $\beta^t$, and the discount factor satisfies $0 < \beta < 1$.)
Our emphasis in this paper is on solving this problem when the link service rates are a priori unknown; the scheduler only learns the link service rates by matching queues to servers and observing the outcomes. We use as our benchmark the $c\mu$ rule for scheduling when link service rates are known. The $c\mu$ rule operates as follows: at each time step, each link from a nonempty queue $i$ to server $j$ is given a weight $c_i \mu_{ij}$; all other links are given weight zero. The scheduler then chooses a maximum weight matching on the resulting graph as the schedule for that time step. It is well known that when there is only a single server, this rule delivers the optimal expected holding cost among all feasible scheduling policies. Further, there has been extensive analysis of the performance and optimality properties of this rule even in multiple server settings. (See related work below.)
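The benchmark rule above can be made concrete. Below is a minimal, illustrative sketch (assuming small instances; the function and variable names `cmu_schedule`, `Q`, `c`, `mu` are ours, not the paper's) that enumerates all server-to-queue assignments and returns one maximizing the total $c_i\mu_{ij}$ weight, subject to each server taking at most one queue and each queue receiving at most as many servers as it has jobs:

```python
from itertools import product

def cmu_schedule(Q, c, mu):
    """Max-weight matching for the cmu rule by brute force.

    Q[i]: current length of queue i; c[i]: holding cost; mu[i][j]: success
    probability of serving queue i with server j. Returns a tuple mapping
    each server j to a queue index, or -1 for idle."""
    n, k = len(Q), len(mu[0])
    best, best_assign = -1.0, None
    for assign in product([-1] + list(range(n)), repeat=k):
        # A queue cannot receive more servers than it has jobs.
        if any(sum(1 for a in assign if a == i) > Q[i] for i in range(n)):
            continue
        w = sum(c[a] * mu[a][j] for j, a in enumerate(assign) if a >= 0)
        if w > best:
            best, best_assign = w, assign
    return best_assign
```

For instance, with `Q = [3, 1]`, `c = [2, 1]`, and `mu = [[0.5, 0.9], [0.8, 0.4]]`, both servers are assigned to queue 0, since its weights (1.0 and 1.8) dominate; if queue 0 holds only one job, server 0 is diverted to queue 1.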
When service rates are unknown, we measure the performance of any policy using (expected) regret at time $T$: this is the expected difference between the cumulative cost of the policy and the cumulative cost of the $c\mu$ rule. Our goal is to characterize policies that minimize regret. In typical learning problems such as the stochastic multiarmed bandit (MAB) problem, optimal policies must resolve an exploration-exploitation tradeoff. In particular, in order to minimize regret, the policy must invest effort to learn about unknown actions, some of which may later prove to be suboptimal, and thus incur regret in the process. In such settings, any optimal policy incurs regret that increases without bound as $T \to \infty$; for example, for the standard MAB problem, it is well known that optimal regret scales as $\log T$ [15, 3, 1].
In this paper, we show a striking result: in a wide range of settings, the empirical $c\mu$ rule—i.e., the $c\mu$ rule applied using the current estimates of the mean service rates—is regret optimal, and further, the resulting optimal regret is bounded by a constant independent of $T$. Thus, in such settings there is no tradeoff between exploration and exploitation. The scheduler can simply execute the optimal schedule given its current best estimate of the service rates of the links. In other words, the empirical $c\mu$ rule benefits from free exploration. We make three main contributions: (1) regret analysis of the empirical $c\mu$ rule in the single server setting; (2) stability analysis of the $c\mu$ rule in the multiserver setting; and (3) subsequent regret analysis of the empirical $c\mu$ rule in the multiserver setting. We summarize these contributions below.

Learning in the single-server setting. We begin our analysis by focusing on the single-server setting, where the $c\mu$ rule is known to be optimal on any finite time horizon. This setting admits a particularly elegant analysis, due to the following two observations. First, the empirical $c\mu$ rule is work-conserving, as is the benchmark $c\mu$ rule with known service rates. Second, all work-conserving scheduling policies have the property that they induce the same busy period distribution on the queueing system. Using this observation, we can couple the empirical $c\mu$ rule to the $c\mu$ rule with known service rates, and divide our analysis into epochs defined by busy periods. At the end of any busy period, all queue lengths are identical in both systems: namely, zero. We show that after a sufficiently large number of busy periods have elapsed, with high probability the empirical $c\mu$ rule has sufficient knowledge of each arm that it exactly matches the $c\mu$ rule going forward. Finally, we use the fact that any work-conserving policy induces a queue-length process that is geometrically ergodic to show that the expected regret is bounded by a constant.
An interlude: Stability in the multiserver setting. Next, we turn our attention to regret analysis in the setting of multiple servers. Here, however, we face a challenge: in contrast to the single-server setting, where the $c\mu$ rule is known to be optimal, with multiple servers the $c\mu$ rule may not even be stabilizing, despite the availability of sufficient service capacity. Further, somewhat surprisingly, there are no known general results in the literature on stability of the parallel server $c\mu$ rule. In order to carry out regret analysis, of course, we require such conditions; therefore we develop them for our analysis. These results are of independent interest.
We provide three results on stability. First, we construct a class of examples that demonstrate that the $c\mu$ rule need not be stabilizing. Second, we develop a general condition for stability of the $c\mu$ rule on a particular class of queueing networks, where the rule takes the form of a hierarchical static priority rule. Informally, these are networks where the configuration of service rates and costs is such that a priority structure among the queues can be embedded in a hierarchical graph. In particular, we show for these systems that stability is equivalent to geometric ergodicity of the resulting queue-length process. This condition is not stated directly in terms of model primitives; thus in our third result we provide a stronger sufficient condition for geometric ergodicity of the $c\mu$ rule that can be checked directly on model primitives for a generic scheduling problem. We show a number of network configurations for which this condition holds.

Learning in the multiserver setting. Having determined a sufficient condition for stability, we turn our attention to learning in the multiserver setting. We show that for problem instances where the $c\mu$ rule with known service rates yields a geometrically ergodic queue length process, the empirical $c\mu$ rule yields a difference in queue lengths with the benchmark that decays at least polynomially with time. As in the single server setting, this again results in constant regret, following two insights that parallel our analysis of the single-server setting: first, that the system eventually reaches a state of "free exploration"; and second, that the tails of the busy period can be shown to be sufficiently light.
1.1 Related work.
Many variants of the dynamic stochastic scheduling problem, for both discrete and continuous time queueing networks, have long been studied [21, 17]. Conventionally, the problem has been studied in the Markov decision process framework, where it is assumed that the service rates are known a priori, and the proposed solution is usually an index-type policy that schedules nonempty queues according to a static priority order based on the mean service times and holding costs. The simplest variant of the problem is that of a multiclass single-server system, for which the $c\mu$ rule has been shown to be optimal in different settings [7, 6, 12]. Klimov [14] extended the $c\mu$ rule to multiclass single-server systems with Bernoulli feedback. Van Mieghem [23] studies the case of convex costs for a queue and proves the asymptotic optimality of the generalized $c\mu$ rule in heavy traffic. Ansell et al. [2] develop the Whittle's index rule for a system with convex holding costs. The works in [11, 5] study a simple parallel server model—the N-network, which is a two queue, two server model with one flexible and one dedicated server—and propose policies that achieve asymptotic optimality in heavy traffic. Glazebrook and Mora [9] consider the parallel server system with multiple homogeneous servers and propose an index rule that is optimal as arrival rates approach system capacity. Lott and Teneketzis [16] also study the parallel server system with multiple homogeneous servers and derive sufficient conditions to guarantee the optimality of an index policy. Mandelbaum and Stolyar [18] study the continuous time parallel server system with nonhomogeneous servers and convex costs. They prove the asymptotic optimality of the generalized $c\mu$ rule in heavy traffic. Among the above papers, only [7, 6, 12, 16] consider the holding cost across a finite horizon; the rest take as their objective the infinite horizon discounted and/or average costs. Our work provides results for both the finite horizon discounted cost and finite horizon total cost problems.
Another framework in which the problem can be studied is the stochastic multiarmed bandit problem, where the aim is to minimize the regret in finite time. Traditional work in the space of MAB problems focuses on the exploration-exploitation tradeoff and investigates various exploration strategies to achieve optimal regret [15, 3, 1]. More recently, exploration-free or greedy algorithms have been studied and shown to be effective in a few contexts. [19] studies the linear bandit problem in the Bayesian setting and shows asymptotic optimality of a greedy algorithm with respect to the known prior. For a variant of the linear contextual bandits, Bastani et al. [4] propose to reduce exploration by dynamically deciding to incorporate exploration only when it is clear that the greedy strategy is performing poorly. For a slightly different variant of the linear contextual bandits, Kannan et al. [13] show that perturbing the context randomly and dynamically can give nontrivial regret bounds for the greedy algorithm with some initial training. In a similar vein, our work proposes to reduce exploration through a conditional-exploration strategy. We show that this policy eventually transforms into a purely greedy strategy because the system naturally provides free exploration.
1.2 Organization of the paper.
We describe the queueing model and main objective of this work in Section 2. In Section 3, we present the analysis for the single server system. In Section 4, we show that the stability region for the $c\mu$ rule is a strict subset of the capacity region and give sufficient conditions for geometric ergodicity under the $c\mu$ rule. In Section 5, we extend the analysis presented in Section 3 to parallel server systems and show constant order regret when the system is geometrically ergodic under the $c\mu$ rule. Appendix 6 is devoted to the study of a special class of scheduling rules called hierarchical $c\mu$ rules, for which we exhibit a recursive procedure which verifies geometric ergodicity from the system parameters. The more technical proofs are organized in Appendices 7–11.
2 Problem Setting.
We describe the model, the objective, and the $c\mu$ rule.
2.1 Parallel server system with linear costs.
Consider a discrete-time parallel server system with queues indexed by $i$ and servers indexed by $j$. Jobs arrive to queue $i$ according to a Bernoulli process with rate $\lambda_i$, independent of other events. Denote the joint arrival process by $A(t)$. At any time, a server can be assigned only to a single job and vice versa; however, multiple servers are allowed to be assigned to different jobs in the same queue. For convenience of exposition, we assume that jobs are assigned according to FCFS. At any time, the probability that a job from queue $i$ assigned to server $j$ is successfully served is $\mu_{ij}$, independent of all other events. We denote this joint service distribution by the matrix $\mu$. Jobs that are not successfully served remain in the queue and can be reassigned to any server in subsequent timeslots. The queues have infinite capacity, and $c_i$ denotes the waiting cost per job per timeslot for queue $i$. For this system, a scheduling rule is defined as one that decides, at the beginning of every timeslot, the assignment of servers to queues. It is assumed that

the outcome of an assignment is not known in advance, i.e., in any timeslot, whether or not a scheduled job is served successfully can be observed only at the end of the timeslot;

the waiting cost per job is known for all the queues.
We study the learning variant of the problem, and therefore make the additional assumption that

the arrival rates and success probabilities are unknown.
For $T$ timeslots, the expected total waiting cost in finite time is given by
$J(T) = \mathbb{E}\left[\sum_{t=1}^{T} \sum_i c_i Q_i(t)\right].$ (1)
Here $Q_i(t)$ is the queue length of queue $i$ at the beginning of timeslot $t$, with the evolution dynamics given by the equation
$Q(t+1) = Q(t) + A(t) - S(t),$
where $A(t)$ and $S(t)$ are the arrival vector and allocated service vector respectively. In (1), the instantaneous waiting cost is a linear function of the queue lengths. [Stability] For a Markov policy $\pi$, i.e., a scheduling rule that makes decisions in every timeslot based on the current queue state, the system is said to be stable under $\pi$ if the chain $\{Q(t)\}$ is positive recurrent and $\mathbb{E}_{\nu}\left[\sum_i c_i Q_i\right] < \infty$, where $\nu$ is its invariant distribution.
For a given service rate (success probability) matrix $\mu$ and a Markov policy $\pi$, let the stability region $\Lambda_\pi(\mu)$ be the set of all arrival rate vectors $\lambda$ for which the system is stable under $\pi$. The capacity region $\mathcal{C}(\mu)$ of the parallel server system with service rate matrix $\mu$ is given by $\mathcal{C}(\mu) = \bigcup_\pi \Lambda_\pi(\mu)$. The capacity region can be characterized by the class of static-split scheduling policies:
$\mathcal{C}(\mu) = \left\{\lambda : \lambda_i < \sum_j \sigma_{ji}\,\mu_{ij} \ \forall i, \text{ for some } \sigma \in \mathcal{M}\right\},$
where $\mathcal{M}$ is the set of all right stochastic matrices.
2.2 The $c\mu$ rule.
In this paper, we focus on the $c\mu$ rule with linear costs for the parallel server system. This rule (see Algorithm 1), which is a straightforward generalization of the single server $c\mu$ rule, allocates servers to jobs based on a priority rule determined by the product $c_i\mu_{ij}$ of the waiting cost and success probability.
maximize $\sum_{i,j} c_i \mu_{ij} x_{ij}$
subject to $\sum_i x_{ij} \le 1 \ \forall j$, $\sum_j x_{ij} \le Q_i(t) \ \forall i$, $x_{ij} \in \{0,1\}$
For a single server system, and when the success probabilities for all the links are known a priori, it has been established that the $c\mu$ rule optimizes the expected total waiting cost over a finite time horizon [6]. For a parallel server system, there are no known algorithms that achieve optimal cost as in a single server system. For waiting costs that are strictly convex in queue lengths, Mandelbaum and Stolyar [18] prove that, in heavy traffic, the generalized $c\mu$ rule optimizes the instantaneous waiting cost asymptotically.
In order for the $c\mu$ rule to be unambiguously defined, we impose the assumption that the weights are all distinct:
$c_i\mu_{ij} \neq c_{i'}\mu_{i'j'} \ \text{for all} \ (i,j) \neq (i',j').$ (2)
Our interest lies in designing scheduling algorithms that can mimic the $c\mu$ rule in the absence of channel statistics. We evaluate an algorithm based on a finite time performance measure called regret. Conventionally, in the bandit literature, regret measures the difference in the performance objective between an adaptive algorithm and a genie algorithm that has a priori knowledge of the system parameters. For our problem, the genie algorithm applies the $c\mu$ rule at every step, using the true service matrix $\mu$. Therefore, regret here is defined as the difference between the total waiting costs (given by equation (1)) under the proposed algorithm and the $c\mu$ algorithm. For any given parameter set for which the genie system is stable, we study the asymptotic behavior of regret as the time period $T$ tends to infinity.
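Concretely, the regret defined here is just the difference of the cumulative costs of equation (1) along the two trajectories. A small illustrative helper (the names are ours, for exposition):

```python
def total_waiting_cost(traj, c):
    """Cumulative waiting cost of a queue-length trajectory: the sum over
    slots t and queues i of c[i] * Q[i](t), as in Eq. (1)."""
    return sum(sum(ci * qi for ci, qi in zip(c, Q)) for Q in traj)

def regret(traj_alg, traj_genie, c):
    """Regret: cost of the learning policy minus cost of the genie cmu run."""
    return total_waiting_cost(traj_alg, c) - total_waiting_cost(traj_genie, c)
```

In practice one averages these quantities over sample paths to estimate the expected regret.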
3 Learning the $c\mu$ Rule—Single Server System.
We first consider the single server system in order to highlight a few key aspects of the problem. We later extend our discussion and results to the parallel-server case in Section 5. For the single server system, we propose a natural 'learning' extension of the $c\mu$ algorithm, which we refer to as the empirical $c\mu$ algorithm. This scheduling algorithm applies the $c\mu$ rule using empirical means of past observations as a surrogate for the actual success probabilities. We denote the queue lengths under the $c\mu$ rule and the empirical $c\mu$ rule by $Q^*(t)$ and $\hat{Q}(t)$ respectively, and define the regret of the empirical algorithm as the difference
$\Psi(T) = \hat{J}(T) - J^*(T),$
where $\hat{J}(T)$ and $J^*(T)$ are the respective expected total waiting costs.
We show that the queue-length error for the empirical $c\mu$ algorithm decays geometrically with time. It then follows that the regret scales as a constant with increasing $T$ for any arrival rate in the capacity region. It is interesting to observe that this scaling is achieved simply by using the empirical means in every timeslot, without an explicit explore strategy. Our results show that this scheduling policy delivers free exploration due to some unique properties of the single server system, as we describe further below (see Section 3). For single server systems, the assumption in Eq. (2) takes the form that the weights $c_i\mu_i$ are all distinct; without loss of generality, $c_1\mu_1 > c_2\mu_2 > \cdots$.
For any arrival rate vector in the capacity region, there exist constants $B > 0$ and $\theta \in (0,1)$ such that the expected weighted queue-length difference between the empirical system and the genie system at time $t$ is at most $B\theta^t$,
for any $t$. In particular, there exists a constant independent of $T$ such that the regret is bounded by that constant for all $T$.
Before proving the result, we briefly outline the intuition. The result relies on the following key observation.
Observation 1
The distribution of busy cycles is the same for all work-conserving scheduling policies in a single server system.
This can be confirmed by considering a stochastically equivalent system where, for each queue $i$, jobs arrive with i.i.d. interarrival times distributed as Geometric($\lambda_i$) and i.i.d. service times distributed as Geometric($\mu_i$). In such a system, a scheduling algorithm only decides which part of the work is completed in each time slot, and therefore all work-conserving algorithms give the same busy cycle.
We now see how Observation 1 can be used to prove the theorem. This observation implies that the empirical $c\mu$ and genie $c\mu$ systems have the same queue length (equal to zero) at the end of their common busy cycles. In order for the priority order estimated by the empirical algorithm to agree with the true one, it needs a sufficient number of samples for all the links. Since the number of samples for each queue at the end of a busy cycle is equal to the total work (in terms of service time) that has arrived to the queue, it is sufficient to consider the end of a busy cycle by which the system has seen sufficiently many arrivals to every queue. Thus, every work-conserving policy has the same number of samples for each of the links at the end of a busy cycle. Finally, we exploit the fact that busy periods have geometrically decaying tails to show that, as a consequence, the empirical algorithm makes the same scheduling decision as the $c\mu$ rule after a random time that has finite expectation. This argument is a clear example of free exploration, since there is no need to incorporate an explicit exploration strategy into the scheduling algorithm as long as it is work conserving.
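The free-exploration phenomenon can be observed in simulation. The sketch below is an illustrative toy, not the paper's exact algorithm (in particular, the optimistic initial estimate of 1.0 for unsampled queues is our assumption): it runs a single server under the empirical rule and counts the slots where its decision differs from the genie $c\mu$ decision on the same state.

```python
import random

def simulate_empirical_cmu(lam, mu, c, T, seed=0):
    """Single-server empirical cmu rule: among nonempty queues, serve the one
    maximizing c[i] * muhat[i], where muhat[i] is the empirical success rate
    of queue i (optimistically 1.0 before any sample, so untried queues get
    sampled). Returns the final queue lengths and the number of slots where
    the empirical decision differed from the genie cmu decision."""
    rng = random.Random(seed)
    n = len(lam)
    Q = [0] * n
    succ, tries = [0] * n, [0] * n
    mismatches = 0
    for _ in range(T):
        nonempty = [i for i in range(n) if Q[i] > 0]
        if nonempty:
            muhat = [succ[i] / tries[i] if tries[i] else 1.0 for i in range(n)]
            pick = max(nonempty, key=lambda i: c[i] * muhat[i])
            genie = max(nonempty, key=lambda i: c[i] * mu[i])
            mismatches += pick != genie
            tries[pick] += 1
            if rng.random() < mu[pick]:
                succ[pick] += 1
                Q[pick] -= 1
        for i in range(n):
            if rng.random() < lam[i]:
                Q[i] += 1
    return Q, mismatches
```

On lightly loaded instances, the mismatch count typically stops growing after an initial transient, consistent with the coupling argument above.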
Proof. The crux of the proof lies in characterizing the random time after which the empirical algorithm makes the same scheduling decision as the $c\mu$ rule in all future timeslots. In any timeslot $t$, the empirical algorithm makes the same scheduling decision as the $c\mu$ rule if (i) the two systems are in the same queue state, and (ii) the estimated priority order agrees with the $c\mu$ rule at $t$.
Our argument crucially relies on Observation 1. We start by noting that the queue-length process under any work-conserving algorithm is geometrically ergodic and the busy cycle lengths have geometrically decaying tails. Specifically, there exist constants $C > 0$ and $\theta \in (0,1)$ such that the first hitting time of the zero state, denoted by $\tau$, satisfies
$\mathbb{P}(\tau > t) \le C\theta^t.$ (3)
To formalize the intuition in the paragraph preceding the proof, fix a time $t_1$ (growing with $t$), and let $t_2$ be the end of the busy period that contains $t_1$. Then from Eq. (3), using Markov's inequality, we have
(4)
where the second inequality follows by the definition of $t_1$. Now, let $\hat{\mu}_i(s)$ be the average number of successes in the first $s$ assignments of the server to queue $i$. Consider the following two events:

,

.
Then, conditioned on these two events, the empirical algorithm agrees with the $c\mu$ rule after the end of the corresponding busy period, and therefore its queue length equals that of the genie system from then on. It is easy to show the following using the Chernoff–Hoeffding bound for Bernoulli random variables.
(5) 
for some positive constants.
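The Chernoff–Hoeffding step can be illustrated numerically. The sketch below (with illustrative parameter choices of our own) compares the two-sided Hoeffding bound with a Monte Carlo estimate of the deviation probability of an empirical Bernoulli mean:

```python
import math, random

def hoeffding_bound(s, eps):
    """Two-sided Hoeffding bound on P(|muhat - mu| >= eps) for the mean of
    s i.i.d. Bernoulli samples: 2 * exp(-2 * s * eps^2)."""
    return 2.0 * math.exp(-2.0 * s * eps ** 2)

def empirical_deviation_prob(mu, s, eps, trials=20000, seed=1):
    """Monte Carlo estimate of P(|muhat - mu| >= eps) over `trials` runs."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        muhat = sum(rng.random() < mu for _ in range(s)) / s
        bad += abs(muhat - mu) >= eps
    return bad / trials
```

For example, with 50 samples of a Bernoulli(0.7) link and deviation 0.2, the bound evaluates to $2e^{-4} \approx 0.037$, comfortably above the simulated frequency.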
Note that a constant regret bound also holds for the discounted cost objective, for any discount factor.
4 Stability of the $c\mu$ Rule for Parallel Server Systems.
As for the single server system, we are interested in upper bounds on regret for the parallel server system. In the proof of the single-server result, we crucially used the property that busy cycles are identically distributed across work-conserving policies. Note that, in the single-server case, the stability region of the $c\mu$ rule (or any work-conserving policy) is the entire capacity region, and the busy cycles have exponentially decaying tails for any arrival rate in this region.
In this section, we show for the parallel server system that the $c\mu$ rule (which is based on linear costs) does not necessarily ensure stability for all arrival rates in the capacity region. In particular, it is not throughput optimal for a general parallel server system. In Subsection 4.2, we characterize a subset of the stability region of the $c\mu$ rule for which the busy cycles have exponentially decaying tails.
4.1 Instability of the $c\mu$ rule in the general case.
As defined in Algorithm 1, the $c\mu$ rule allocates each server $j$ to a job in the queue that maximizes $c_i\mu_{ij}$. We show that such a static priority policy, which prioritizes queues irrespective of their queue lengths (other than their being nonempty), can be detrimental to the stability of the system. For example, in a two-queue, two-server system in which one queue has the larger weight $c_i\mu_{ij}$ for both servers, the $c\mu$ rule prioritizes that queue for the allocation of both servers, so the other queue receives service only when the prioritized queue holds fewer than two jobs. It is intuitively clear that such a policy is not stabilizing if the arrival rate of the deprioritized queue is larger than the service rate that this policy can allocate to it. We formalize this in the theorem below, where we characterize a set of arrival rates outside the stability region of the $c\mu$ rule for a class of systems.
For any system with service rates , costs , and arrival rates satisfying
(6) 
and
(7) 
where $\pi$ is the stationary distribution of the Markov chain of the high-priority queue, there exist positive constants depending on the system parameters such that the deprioritized queue grows linearly in time with positive probability. It is easy to construct an example of a system with parameters satisfying Eqs. (6) and (7) and with arrival rates inside the capacity region. This shows that, for such systems, the stability region of the $c\mu$ rule is, in general, a strict subset of the capacity region. Below, we sketch such an example: pick service rates so that one queue dominates the $c\mu$ ordering for both servers, compute the stationary distribution of that queue when served by both servers, and then choose arrival rates so that Eqs. (6) and (7) are satisfied while the rates remain inside the capacity region. It then follows from the theorem above that the system is unstable under the $c\mu$ rule.
The criterion for instability above is rather sharp, as evidenced by the following result.
Any system with service rates and costs satisfying (6) is stable under the $c\mu$ rule if and only if
(8)
where $\pi$ is the stationary distribution of the Markov chain of the high-priority queue. In addition, (8) implies that the queueing process is geometrically ergodic under the $c\mu$ rule. In particular, there exist a Lyapunov function $V$, positive constants, and a finite set satisfying a geometric drift condition with respect to the transition kernel of the chain $\{Q(t)\}$. It is well known that this implies that there exist constants $C > 0$ and $\theta \in (0,1)$ such that $\tau$, the first hitting time of the zero state, satisfies
(9) 
The proofs of the two results above can be found in Appendix 7.
4.2 Sufficient conditions for geometric ergodicity of the system.
We now obtain sufficient conditions for the busy cycles to have exponentially decaying tails in terms of the system parameters. This condition, in particular, implies that the queue-length process is geometrically ergodic.
For any queue state $q$, let $s_i(q)$ denote the total service rate assigned by the $c\mu$ rule to queue $i$ when the queue state is $q$. If
(10) 
for some weight vector in the probability simplex, then we can construct an appropriate Lyapunov function for which the one-step drift given by the algorithm is negative outside a finite set. This enables us to show the following tail probability bound for the busy period of the system.
Let $\tau$ denote the first hitting time of the zero state under the $c\mu$ rule. If Condition (10) is true for some choice of weights, then there exist constants $C > 0$ and $\theta \in (0,1)$ such that, for any $t$,
(11) 
Details of the proof of this lemma are given in Appendix 8.
Below, we explicitly derive the sufficient conditions given by (10) for a couple of examples. Further, for the case of the N-network in Subsection 4.2 (which is a special case of the network in Subsection 4.1), we compare it with the stability region.
Consider the example where queue 1 has priority over queue 2 for all servers. Let $\pi$ be the stationary distribution of the queue length of queue 1 (note that, in every timeslot, the service offered to queue 1 is independent of the current queue length of queue 2). Then, the stability region is given by
We now obtain a subset of the region (10) by choosing specific values of the weights. For this example, (10) is satisfied if
To see this, note that for any such queue state, we have
Therefore,
which shows that the region given by (10) contains this set.
Consider the N-network, i.e., a two-queue, two-server system with one flexible and one dedicated server, and let the first queue have higher priority according to the $c\mu$ rule. This is a special case of the system in Subsection 4.1. Let $\pi$ be the stationary distribution of the queue length of the first queue. A closed form expression for $\pi$ can be found in Appendix 11. Thus, for this system, we can determine the stability region analytically through (8). Moreover, as seen in Subsection 4.1, we have geometric ergodicity in all of the stability region. Below, we compare the region given by (10) with the stability region.
Case 1:
– Server is allocated to Queue when it has only a single job in its queue. In this case, as discussed above, the stability region is given by
whereas Condition (10) is equivalent to
This is the stability region of a system where the server has rates to the first queue and to the second queue.
Case 2:
– Server is allocated to Queue when it has only a single job in its queue. In this case, the stability region is given by
whereas Condition (10) is equivalent to
In this example, while Condition (10) does not cover the entire stability region, the region it covers is “close” to the stability region in some limiting regimes. For example, in Case 1, when , we can show that
Similarly, in Case 2, when , , and , we can show that
5 Learning the $c\mu$ Rule—Parallel Server System.
5.1 The algorithm.
We now propose a learning extension of the $c\mu$ rule for the parallel server system. Recall that the number of samples for a link in any timeslot is the number of times it has been scheduled before that timeslot. For the single server system, a sufficient number of samples can be ensured without explicit exploration due to the stabilizing property of work-conserving policies, all of which have the same busy periods. However, this property does not hold in general for the parallel server system, and thus a straightforward extension of the $c\mu$ rule based on empirical means without explicit exploration may not obtain enough samples to learn the system. The following example shows how a naive extension of the $c\mu$ rule could fail to stabilize a network. Consider a network with service rates, costs, and arrival rates satisfying
Clearly, this network is stable under the $c\mu$ rule. We show that, under the policy that does not explore and schedules according to the empirical estimates of the service rates, the queues grow linearly with positive probability. Let $E$ be the event that the initial samples invert the true priority order of the links. Conditioned on the event $E$ (which has positive probability), the algorithm schedules only the wrongly prioritized links after obtaining the initial samples. Using Hoeffding's inequality, we can derive concentrations for the total number of arrivals to each of the queues and the total service offered by those links, to show that there exist positive constants such that the queues grow at least linearly with time with positive probability.
As a solution to the above problem, we propose an algorithm that dynamically decides to explore if the number of samples falls below a threshold. We refer to this as the conditionally exploring algorithm for parallel server networks, and define it in Algorithm 2 below.
5.1.1 Dynamic explore—conditional greedy.
In each timeslot, the algorithm explores conditionally based on the number of samples, i.e., it uses an $\epsilon$-greedy policy if the minimum number of samples over all links is below some threshold. More specifically, let:

be a collection of assignments such that their union covers the complete bipartite graph;

be the number of samples of link $(i,j)$ at time $t$;

;

;

be the estimated rate matrix at time $t$.
At time $t$, if the minimum sample count over all links is below the threshold, the algorithm decides to explore with probability $\epsilon(t)$; otherwise it follows the $c\mu$ rule using the estimated rate matrix.
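The explore/exploit gate just described can be sketched as follows (the $\log^2 t$ threshold and the default exploration probability are our assumptions for illustration; `nsamples[i][j]` is the sample count of link $(i,j)$):

```python
import math, random

def should_explore(nsamples, t, rng, eps=lambda t: min(1.0, 5.0 / (t + 1))):
    """Dynamic explore gate: if the least-sampled link is below the
    (assumed) threshold log^2(t), explore with probability eps(t);
    otherwise always exploit the empirical cmu rule."""
    threshold = math.log(t + 2) ** 2
    if min(min(row) for row in nsamples) < threshold:
        return rng.random() < eps(t)
    return False
```

When it returns True, the scheduler plays a uniformly chosen covering assignment; otherwise it plays the empirical $c\mu$ schedule. Once every link's count clears the threshold, the gate is permanently closed and the policy is purely greedy.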
5.2 Constant regret for the algorithm.
In this subsection, we prove a regret bound that scales as a constant with increasing $T$ for a subset of the capacity region. This subset is given by the region in which the $c\mu$ algorithm achieves exponentially decaying busy cycles. In the theorem which follows, we show that the queue-length error for the algorithm decays superpolynomially with time if (11) is satisfied. Again, as in the single server system, this translates to constant regret. For any arrival rate vector such that (11) is satisfied, we have
for any $t$. In particular, there exists a constant independent of $T$ that bounds the regret for all $T$. As for the single server system, the main idea in proving this theorem is to characterize the coupling time of the queue lengths of the actual and the genie systems. More specifically, we show that the queue length of the learning system at time $t$ does not exceed that of the genie system with high probability. For this, we first show in Section 9 that the algorithm obtains a sufficient number of samples due to its conditional explore policy, thus enabling it to agree with the $c\mu$ rule in its exploit phase after an initial period. In turn, this ensures exponentially decaying tails for the busy cycles thereafter, according to Subsection 4.2.
This concentration for the busy cycles can be used to further show that the following two ‘events’ occur with polynomially high probability:

That the algorithm does not need to explore in the latter half of the horizon (Section 9). This can be explained as follows: whenever the system hits the zero state, there is a positive probability that only selected queues are nonempty in the subsequent timeslots. Therefore, for any work-conserving algorithm, every link has a positive probability of being scheduled at the beginning of a new busy cycle. If the algorithm stabilizes the system well enough to ensure that it hits the zero state regularly, then it obtains a sufficient number of samples without explicit exploration. We use the busy cycle tail bound in Subsection 4.2 to show that the system hits the zero state often enough to provide sufficiently many samples in the first half of the horizon (the required number depends on the system parameters). This ensures that the algorithm does not need to explore in the latter half of the horizon.

That the system hits the zero state at least once in the latter half of the horizon (Section 9). This can be verified using the busy cycle concentration in Subsection 4.2.
Next, we show (in Section 9) the following monotonicity property for the $c\mu$ rule: if two systems with identical parameters, and initial queue states ordered elementwise, follow the algorithm, then the same ordering of their respective queue states is maintained in all subsequent timeslots.
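The monotonicity property can be checked by simulation under a natural coupling (common arrival and service randomness across the two systems). The sketch below uses a single-server static priority rule for simplicity (an illustrative stand-in for the general rule, with an assumed arrival rate of 0.2 per queue), and verifies that the elementwise ordering of the two queue states is preserved in every slot:

```python
import random

def coupled_priority_run(q1, q2, mu, prio, T, seed=0):
    """Run two single-server systems with q1 <= q2 elementwise, under the
    same static priority order `prio` and common arrival/service randomness.
    Returns True iff q1 <= q2 holds after every slot (monotonicity)."""
    rng = random.Random(seed)
    lam = [0.2] * len(q1)
    ok = True
    for _ in range(T):
        u_arr = [rng.random() for _ in q1]   # shared arrival randomness
        u_srv = rng.random()                 # shared service randomness
        for q in (q1, q2):
            pick = next((i for i in prio if q[i] > 0), None)
            if pick is not None and u_srv < mu[pick]:
                q[pick] -= 1
        for i, u in enumerate(u_arr):
            if u < lam[i]:
                q1[i] += 1
                q2[i] += 1
        ok = ok and all(a <= b for a, b in zip(q1, q2))
    return ok
```

Under this coupling the ordering is in fact preserved deterministically: the larger system always serves a queue of priority at least as high as the smaller system's choice, and whenever their choices differ, the smaller system's count at the larger system's chosen queue is already zero.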
To summarize the argument, we have with polynomially high probability that (i) the algorithm agrees with the $c\mu$ rule while exploiting in the latter half of the horizon (Section 9), (ii) it only exploits in the latter half and does not explore (Section 9), and (iii) the system reaches the zero state (which is smaller than any state that the genie system could be in) at least once in the latter half (Section 9). Thus, the monotonicity property (Section 9 in Appendix 9) shows that the learning system always maintains a queue length not exceeding that of the genie system after it first hits the zero state in the latter half of the horizon. Effectively, at any given time, the queue-length error is positive only with polynomially small probability, which gives us the required decay of the expected queue-length error.
The proofs of the results in this section are given in detail in Appendices 9 and 10.
The degradation of the convergence rate of the queue-length error, from exponential in the single server system to superpolynomial in the parallel server system, can be explained by the addition of explicit exploration in the algorithm for the latter. In this situation, we can only show that the algorithm needs to explore with a probability that vanishes at a polynomial rate. For exponential convergence, however, one would need to establish that the algorithm deviates from the $c\mu$ rule with a probability that vanishes at an exponential rate. Designing algorithms with the best achievable convergence rates is an area of future work.
5.3 Extension to other genie policies.
We now discuss the scope for generalizing the results in this paper to scheduling policies other than the cμ rule. Consider the bipartite graph with queues and servers as the nodes and the links between them as the edges. We define a static priority rule as a scheduling policy that allocates servers to nonempty queues according to a fixed priority order on the links. For example, the cμ rule is a static priority rule in which the links are ordered by descending weight c_i μ_ij. Now, consider genie algorithms based on static priority rules, i.e., algorithms that use the same link priority order in every timeslot to assign servers to nonempty queues. If the cμ rule is replaced by any static priority genie algorithm, the same proof technique given above applies whenever the monotonicity property in Subsections 10.1 and 9 holds for the corresponding static priority rule. This monotonicity property can be proved for any rule with queue priority, i.e., a static priority rule in which the queues have a specified priority order and, for each queue, the links are ordered according to their service rates to that queue. Therefore, the regret bound in Subsection 5.2 also holds for algorithms in which the exploit rule in Algorithm 2 is replaced by a rule with queue priority.
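A static priority rule is straightforward to implement: rank the links once, then in each timeslot greedily build a matching over nonempty queues in that order. The sketch below (with illustrative costs and rates; the function name is ours) ranks links by descending c_i μ_ij, as in the cμ rule described above.

```python
def static_priority_schedule(qlen, priority):
    """priority: list of (queue, server) links, highest priority first.
    Greedily returns a matching {server: queue} over nonempty queues."""
    assigned_q, assigned_s, schedule = set(), set(), {}
    for i, j in priority:
        if qlen[i] > 0 and i not in assigned_q and j not in assigned_s:
            schedule[j] = i
            assigned_q.add(i)
            assigned_s.add(j)
    return schedule

c = [3.0, 1.0]                     # holding costs (illustrative)
mu = [[0.9, 0.5], [0.8, 0.6]]      # mu[i][j]: service rate of link (i, j)
links = [(i, j) for i in range(2) for j in range(2)]
priority = sorted(links, key=lambda e: -c[e[0]] * mu[e[0]][e[1]])
print(static_priority_schedule([4, 2], priority))  # -> {0: 0, 1: 1}
print(static_priority_schedule([0, 2], priority))  # -> {0: 1}
```

Note how emptying queue 0 changes the assignment of server 0: the priority order is static, but the realized matching adapts to which queues are nonempty.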
Moreover, Subsection 4.2 holds for any static priority algorithm, whereas the region of arrival rates given by Condition (10) depends on the priority rule in a general parallel-server system. In Appendix 6, we show that exponential tail bounds for busy cycles hold within the entire stability region for a special class of policies that we refer to as hierarchical rules. Thus, for a hierarchical rule that satisfies the monotonicity property, we can establish the same regret bound for Algorithm 2 (with the cμ rule replaced by the hierarchical rule) within the entire stability region.
6 Stability of hierarchical rules in parallel-server networks.
In this section we extend the results of Subsection 4.1, and exhibit a special class of rules for which geometric ergodicity holds in the entire stability region.
Consider a queueing network with N classes of customers and K servers. The queues are labeled 1, …, N and the servers 1, …, K; set 𝒩 = {1, …, N} and 𝒦 = {1, …, K}. Each queue can be served by a subset of the servers, and each server can serve a subset of the queues. For each i ∈ 𝒩, let S(i) ⊂ 𝒦 be the subset of servers that can serve queue i, and for each j ∈ 𝒦, let Q(j) ⊂ 𝒩 be the subset of queues that can be served by server j. For i ∈ 𝒩 and j ∈ 𝒦, if queue i can be served by server j, we write (i, j) for the corresponding edge in the bipartite graph formed by the nodes in 𝒩 and 𝒦; otherwise there is no such edge. Let ℰ be the collection of all these edges, and let 𝒢 be the bipartite graph formed by the nodes (vertices) 𝒩 ∪ 𝒦 and the edges ℰ. We assume that 𝒢 is connected.
A static priority rule can be identified with a permutation of the edges of the graph 𝒢, i.e., a one-to-one map σ : ℰ → {1, …, |ℰ|} defined by the priority rule, with σ(e) < σ(e′) if edge e has higher priority than edge e′. Definition (Hierarchical Rule). For a static priority rule σ and any i, i′ ∈ 𝒩 with i ≠ i′, we say that i ≺ i′ if σ((i, j)) < σ((i′, j)) for all j ∈ S(i) ∩ S(i′). A static priority rule is hierarchical if ≺ defines a partial order on 𝒩, and for any i, i′ ∈ 𝒩 with S(i) ∩ S(i′) ≠ ∅, either i ≺ i′ or i′ ≺ i.
It is easy to see that if 𝒢 is a tree, then every static priority rule is hierarchical.
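This definition can be checked mechanically. The sketch below is one hypothetical formalization (function name, graph labels, and priority ranks are ours): it verifies that queues sharing a server are comparable, and that the comparison relation is acyclic and hence extends to a partial order.

```python
def is_hierarchical(edges, sigma):
    """edges: set of (queue, server) links; sigma[e]: rank of link e
    (smaller rank = higher priority)."""
    queues = {i for i, _ in edges}
    serv = {i: {j for q, j in edges if q == i} for i in queues}

    def prec(a, b):  # a precedes b: higher priority at every shared server
        shared = serv[a] & serv[b]
        return bool(shared) and all(sigma[(a, j)] < sigma[(b, j)] for j in shared)

    # (i) any two queues sharing a server must be comparable
    for a in queues:
        for b in queues:
            if a < b and serv[a] & serv[b] and not (prec(a, b) or prec(b, a)):
                return False
    # (ii) the relation must be acyclic, so that it extends to a partial order
    succ = {a: [b for b in queues if a != b and prec(a, b)] for a in queues}
    visiting, done = set(), set()

    def has_cycle(a):
        visiting.add(a)
        for b in succ[a]:
            if b in visiting or (b not in done and has_cycle(b)):
                return True
        visiting.discard(a)
        done.add(a)
        return False

    return not any(has_cycle(a) for a in queues if a not in done)

# A tree-shaped network: every static priority rule is hierarchical.
tree = {(1, 'A'), (2, 'A'), (2, 'B'), (3, 'B')}
print(is_hierarchical(tree, {(1, 'A'): 0, (2, 'A'): 1, (2, 'B'): 2, (3, 'B'): 3}))  # True

# Two queues sharing both servers with crossed priorities: not hierarchical.
crossed = {(1, 'A'), (1, 'B'), (2, 'A'), (2, 'B')}
print(is_hierarchical(crossed, {(1, 'A'): 0, (2, 'A'): 1, (2, 'B'): 2, (1, 'B'): 3}))  # False
```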
In the rest of this section, we study only hierarchical rules.
6.1 Hierarchical decomposition.
Consider a queueing network with graph 𝒢, arrival rates λ and service rates μ, under a hierarchical static priority rule σ. Denote by 𝒩₁ the minimal elements of 𝒩 under ≺. The dependence on the arrival rates is suppressed in this notation, since at each step of the decomposition the arrival rates match the original ones, while the service rates are modified.
Consider the subgraph 𝒢₁ with queue nodes 𝒩₁ and the server nodes attached to them. Since 𝒩₁ consists of minimal elements, no two queues in 𝒩₁ share a server, and each queue in 𝒩₁ has the highest priority at each of its servers. Hence each queue i ∈ 𝒩₁ forms a Markov process on its own, which is geometrically ergodic, since its arrival rate is smaller than its aggregate service rate. Let πᵢ, i ∈ 𝒩₁, denote the stationary distribution of queue i.
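As a quick numerical illustration of this first layer, the stationary distribution of a single discrete-time queue with Bernoulli arrivals and services can be computed on a truncated state space. The parameters are illustrative, and the slot convention q′ = (q − s)⁺ + a is our assumption; for this birth–death chain, detailed balance gives a geometric tail with ratio λ(1 − μ)/(μ(1 − λ)).

```python
import numpy as np

lam, mu, N = 0.3, 0.6, 60  # illustrative rates; truncate at queue length N
P = np.zeros((N + 1, N + 1))
for q in range(N + 1):
    for a, pa in ((0, 1 - lam), (1, lam)):    # Bernoulli(lam) arrival
        for s, ps in ((0, 1 - mu), (1, mu)):  # Bernoulli(mu) service
            nxt = min(max(q - s, 0) + a, N)   # q' = (q - s)^+ + a, truncated
            P[q, nxt] += pa * ps

pi = np.ones(N + 1) / (N + 1)
for _ in range(5000):                         # power iteration
    pi = pi @ P
assert np.allclose(pi @ P, pi, atol=1e-12)    # pi is stationary
# Away from the boundary, consecutive probabilities decay by a constant
# factor, i.e. the tail is geometric (consistent with geometric ergodicity).
ratio = pi[11:20] / pi[10:19]
print("empirical tail ratio:", ratio.mean(),
      "theory:", lam * (1 - mu) / (mu * (1 - lam)))
```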
Next, we remove the nodes 𝒩₁ and the associated edges from 𝒢, and denote the resulting graph, which might not be connected, by 𝒢₂. We let 𝒩₂ denote the minimal elements of the remaining queues under ≺. Removing these nodes and the associated edges from 𝒢₂, we obtain a graph 𝒢₃, and so on by induction. We let L denote the largest integer such that 𝒩_L is nonempty.
For k = 1, …, L, let Q^(k) denote the queueing process restricted to the queues in 𝒩₁ ∪ ⋯ ∪ 𝒩_k. It is clear that this is Markov. Provided that it is positive recurrent, we let π^(k) denote its invariant probability measure.
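The layer-peeling procedure of this subsection can be sketched as follows (the graph, labels, and priority ranks are hypothetical): repeatedly extract the queues that are minimal under the priority relation, then remove them and iterate.

```python
def decomposition_layers(edges, sigma):
    """Layers N_1, N_2, ... of the hierarchical decomposition: repeatedly
    peel off the queues that are minimal under the priority relation."""
    queues = {i for i, _ in edges}
    serv = {i: {j for q, j in edges if q == i} for i in queues}

    def prec(a, b):  # a has priority over b at every shared server
        shared = serv[a] & serv[b]
        return bool(shared) and all(sigma[(a, j)] < sigma[(b, j)] for j in shared)

    layers, remaining = [], set(queues)
    while remaining:
        minimal = {a for a in remaining
                   if not any(prec(b, a) for b in remaining if b != a)}
        if not minimal:  # a priority cycle: the rule is not hierarchical
            raise ValueError("not a hierarchical rule")
        layers.append(sorted(minimal))
        remaining -= minimal
    return layers

# 'W'-like example (hypothetical labels/priorities): queue 2 shares a server
# with each of queues 1 and 3, and has lower priority at both.
edges = {(1, 'A'), (2, 'A'), (2, 'B'), (3, 'B')}
sigma = {(1, 'A'): 0, (3, 'B'): 1, (2, 'A'): 2, (2, 'B'): 3}
print(decomposition_layers(edges, sigma))  # -> [[1, 3], [2]]
```

In this example queues 1 and 3 form the first layer (they share no server), and queue 2, whose service depends on the states of both, forms the second.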
6.2 The structure of the transition kernels.
Let i ∈ 𝒩_{k+1} for some k. It is clear that the transition kernel of queue i depends on the state z of the higher-priority queues in Q^(k), and thus takes the form P(q, · | z). Due to the hierarchical rule, a server may not be available to queue i if the higher-priority queues have sufficient size. It is evident then that the transition kernel of queue i has the following structure. There exists a finite partition A₁, …, A_m of the state space of Q^(k) and associated transition kernels P₁, …, P_m, with each P_l corresponding to a queue with arrival rate λᵢ and served by a subset of the servers S(i), such that
(12)  P(q, · | z) = ∑_{l=1}^{m} 1{z ∈ A_l} P_l(q, ·)
We illustrate this via the following example. Consider the ‘W’ network in Figure 1.
It is clear that
(13) 
where we use the notation P_{λ,μ} to denote the transition kernel of a single-queue, single-server system with arrival rate λ and service rate μ. Continuing, we also have
(14) 
with the corresponding partition sets and kernels. Here the kernel of the form P_{λ,0} corresponds to a transient process, with arrivals but no service.
Next we discuss the ergodic properties of the ‘W’ network in Figure 1. Suppose that the arrival rates lie in the capacity region. Then of course the highest-priority queue is a geometrically ergodic Markov chain with its own stationary distribution. It follows by Eq. 13 and the proofs in Subsection 4.1 that if
(15) 
then the chain is geometrically ergodic, and if the opposite inequality holds in Eq. 15, then it is transient. Continuing, assume that Eq. 15 holds, and let the stationary distribution of this chain be denoted accordingly. Applying the same reasoning to Eq. 14, it follows that if
(16) 
then the remaining chain is geometrically ergodic, otherwise it is not. Thus, combining the above discussion with Subsection 4.1, it is clear that the queueing process is geometrically ergodic if and only if Eqs. 15 and 16 hold.
6.3 The averaged kernel.
Recall the notation introduced in Subsection 6.2. Suppose that the queueing process Q^(k) is geometrically ergodic, and as introduced earlier, let π^(k) denote its invariant probability measure. We define the averaged kernel of Eq. 12 by averaging over this invariant measure, i.e., P̄(q, ·) = ∑_l π^(k)(A_l) P_l(q, ·), where A_l and P_l are the partition sets and kernels in Eq. 12.
Recall that each kernel P_l corresponds to a single-queue system with arrival rate λ, and the service rates of a subset S_l of the original server nodes (S_l might be empty). For each l we define
(17) 
It is clear that