Stability and Learning in Strategic Queuing Systems

Jason Gaitonde, et al.
Cornell University

Bounding the price of anarchy, which quantifies the damage to social welfare due to selfish behavior of the participants, has been an important area of research. In this paper, we study this phenomenon in the context of a game modeling queuing systems: routers compete for servers, where packets that do not get service are resent in future rounds, resulting in a system where the number of packets at each round depends on the success of the routers in the previous rounds. We model this as an (infinitely) repeated game, where the system holds a state (the number of packets held by each queue) that arises from the results of the previous round. We assume that routers satisfy the no-regret condition, i.e., they use learning strategies to identify the server where their packets get the best service. Classical work on repeated games makes the strong assumption that subsequent rounds of the repeated game are independent (beyond the influence of past history on learning). The carryover effect caused by packets remaining in the system makes learning in our context a highly dependent random process. We analyze this random process and find that if the capacity of the servers is high enough to allow a centralized and knowledgeable scheduler to get all packets served even with double the packet arrival rate, and queues use no-regret learning algorithms, then the expected number of packets in the queues will remain bounded throughout time, assuming older packets have priority. This paper is the first to study the effect of selfish learning in a queuing system where the learners compete for resources but rounds are not all independent: the number of packets to be routed at each round depends on the success of the routers in the previous rounds.




1 Introduction

In this paper, we consider how to guarantee the efficiency of stochastic queuing systems when routers use simple learning strategies to choose the servers they use, and repeatedly resend their packets until the packet gets served. We show conditions that guarantee the stability of such systems despite the competition of queues and the carryover effects between rounds caused by resending packets.

Understanding how to design complex systems that remain efficient even when used by selfish agents is an important goal of algorithmic game theory. The

price of anarchy [11] measures this inefficiency by comparing the welfare of the Nash equilibrium of the game to the socially optimal solution without considering incentives. This notion has led to a long line of literature bounding this loss in various games. Our results are analogous in spirit to those of [17], which shows that in the context of routing in networks with delay, the cost of any Nash equilibrium outcome is no more expensive than the centrally designed optimum that carries twice as much flow.

We model the behavior of queues as learners, assuming that their choices of where to send packets satisfy the no-regret guarantee. This guarantee can be ensured by running any of a large set of learning algorithms [19]. Studying learning behavior in games has a long history, dating back to the early work of Robinson [16], see also [9]. Work in the last two decades has extended the Nash equilibrium quality analysis (price of anarchy) to learning outcomes [3, 18, 21]. If all players employ a no-regret learning strategy, then the play converges to a form of correlated equilibrium of the game [10] (players correlating their play by each of them using the history of play to decide their next action), and the price of anarchy analysis extends also to the correlated play. A serious limitation of the model of repeated games considered in these works is the assumption that the games played at different times are independent, in the sense that the outcome of the game at one round has no direct effect on later rounds except through the learning of the agents. While this can be a good approximation in some games, there are many applications where this clearly fails. In the context of routing games modeling the morning rush-hour traffic, it is indeed the case that no matter how bad Monday morning traffic was, the traffic jam clears up before Tuesday morning. In contrast, when modeling traffic on a smaller time scale, if one car or packet experiences delay, it remains in the system longer, or may be re-sent later, and hence affects later time periods. In ad-auctions, the remaining budget of the player has a similar carryover effect, as winning the auction in one round decreases the player's remaining budget in future rounds.

Here, we will consider a queuing system with queues sending packets to servers as a simplified model of a network of queues. In [12], the authors study the performance of a learning algorithm in the same queuing system finding the best server(s) with respect to queue-regret, which measures the expected difference in queue sizes to that of a genie strategy that knows the optimal server. Their primary goal is to study this more refined notion for the queuing setting for this classical stochastic bandit problem, which exhibits more complicated behavior than standard no-regret bounds that grow at least logarithmically with time. Our paper studies a decentralized, multi-queue version of the same system, where each queue uses their own learning algorithm to clear their packets while selfishly competing with each other for service. Our primary focus is on establishing precisely when standard no-regret algorithms can ensure that the competitive queuing system remains stable even in this game-theoretic setting, a concern that does not arise in the learning problem with centralized scheduling.

Our Results. Our main result concerns a multi-agent version of the queuing system of [12], where the queues each use their own no-regret learning algorithm to find and compete for the best servers. We show that if the service rates of the servers are high enough to allow a centralized scheduler to get all packets served even with double the arrival rate, and when older packets have priority over younger packets, then the expected length of all queues will remain bounded over all time, assuming the learning algorithms used satisfy the no-regret assumption. Studying the outcome of learning in such systems with carryover effect requires us to study these interactions not just as a repeated game, but as a highly dependent random process.

In this model, a server can serve at most one packet at any time, and packets remaining in the system are queued at the input side. At each time step, a server with service rate $\mu_j$ will select one of the packets sent to it (if any), serve it with probability $\mu_j$, and return all unserviced packets to their queues. In other queuing systems, the servers may also have a bounded-size queue and would only send back (or drop) packets when they no longer fit in the queue; our simpler model without server queues makes the trade-offs we want to study cleaner. A packet sent to a server is either served or returned, offering instantaneous feedback to the learning algorithms of the queues, in contrast to the more informative, but delayed, feedback available in real systems.

An important feature of the model is how conflicts are resolved when multiple queues send to the same server in a time period. We show that if the servers select a packet uniformly at random among the arriving packets, then unless the success rates of the servers are prohibitively larger than the arrival rates of the queues (by a factor that grows with the number of queues), learning does not necessarily ensure that all packets will get served in a timely manner: in systems with many queues with low arrival rate, when these queues are selfishly aiming to get good service, the number of unserved packets at queues with high arrival rate may grow linearly over time. Our main result is to show that if packets also carry a timestamp, and servers choose to serve the oldest arriving packet, then this linear blowup cannot arise: if the system has enough capacity to serve all packets when they are centrally coordinated, even with just double the arrival rate, then no-regret learning of the queues guarantees that all packets get served and queue lengths stay bounded in expectation. We also show that this bound of 2 on the required service rate is tight, in that with less than a factor of 2 higher service rate, no-regret learning does not necessarily guarantee the timely service of all packets.

Our Techniques. The carryover effect between rounds caused by packets left in the system forces us to study these interactions not just as a repeated game, but as a highly dependent random process. Moreover, the randomness arises intrinsically from both the randomized strategic behaviors of the queues and the inherent randomness in the queuing system. To establish the result, we combine game-theoretic properties implied by the no-regret assumption with techniques from random processes to establish the high-probability results.

In analyzing the behavior of the queuing system, we have to deal with highly dependent processes. If a queue receives too many packets during a previous period, this has a major effect not only on the outcomes for this queue, but for every other queue it may be competing with. To make the study of these random processes more manageable, we use the principle of deferred decisions: rather than considering the state of the queue sizes, each with possibly many packets, we keep track only of the timestamp of the oldest packet in each queue and defer seeing when the next packet arrived until after this one is served. In doing so, the timestamps of the next packets to be cleared and the service successes of the servers are all independent of the current time period, and hence we can use standard concentration bounds.

To prove the bounds on the queue sizes, we use a potential function based on the oldest timestamp in each queue. The main idea of the proof is to argue that when this function has a high enough value, then it must have negative drift. To conclude that the queues remain bounded, we use a powerful theorem of Pemantle and Rosenthal [15] showing that a sufficiently regular stochastic process with negative drift must have moments uniformly bounded over time. Once we obtain this property for our random process, we then use standard probabilistic techniques to obtain a weaker, but perhaps more interesting, asymptotic control on the almost sure growth of the queues in these queuing systems. We hope that the kinds of qualitative features we establish and the methods of proof for these results will be of interest in the future study of repeated strategic interactions that similarly relax the independence assumption on the games played at each round.

Further Related Work. As already explained above, the model we study combines features of learning in games with classical queuing systems; both of these areas have large bodies of literature. The classical focus of work on scheduling in queuing systems is to identify policies that achieve optimal throughput (see for example the textbook of [20]). Closest to our model from this literature is the work of [12], which characterizes the queue-regret of learning algorithms that only seek to identify the best servers, but does not consider competition between selfish learners. They characterize queue-regret for the case of a single queue aiming to find the best server, and extend the result to the case of multiple queues scheduled by a single coordinated scheduling algorithm, assuming there is a perfect matching between queues and optimal servers that can serve them. In contrast, we assume that each queue separately learns to selfishly make sure its own packets are served at a high enough rate, offering a game-theoretic model of scheduling packets in a queuing system, and we do not make the matching assumption on queues and servers. Compared to classical price of anarchy bounds in repeated games [3, 18, 21], we no longer make the assumption that games at different rounds are independent. Studying this model requires us to combine ideas from the price of anarchy analysis of games with an understanding of the behavior of stochastic systems.

Our work is one of the first examples of studying the effect of learning in games with carryover effects between rounds. Studying such systems requires understanding a highly dependent random process. Among the large body of literature on such processes, closest to our work is the adversarial queuing system of [5], which also uses the Pemantle and Rosenthal [15] theorem to establish bounded queue sizes in expectation. Another important repeated game setting with such a carryover effect is the repeated ad-auction game with limited budgets. The papers [4, 7, 6] consider such games and offer results on convergence to equilibrium, as well as an understanding of equilibria in first-price auction settings under a particular behavioral model of the agents. Analyzing such systems for the more commonly used second-price auction system is an important open problem.

2 Preliminaries


In general, random variables will be denoted by capital letters (e.g. $X$), while vectors will generally be bolded (e.g. $\mathbf{x}$). If a random variable $X$ has some distribution $\mathcal{D}$, we write $X \sim \mathcal{D}$. We use the notation $\mathrm{Geom}(p)$ to denote a geometric distribution with parameter $p$, $\mathrm{Ber}(p)$ for a Bernoulli distribution that is $1$ with probability $p$ and $0$ otherwise, and $\mathrm{Bin}(n,p)$ for a binomial distribution with parameters $n$ and $p$.

We say an event occurs almost surely if it has probability $1$. We use standard $O(\cdot)$ and $\Omega(\cdot)$ notation, where $\tilde{O}(\cdot)$ indicates logarithmic factors are hidden; we will sometimes write $f = o(g)$ if $f/g \to 0$. We will also consider the following norms: for a positive vector $\mathbf{w}$, we define the two weighted norms $\|\mathbf{x}\|_{1,\mathbf{w}} = \sum_i w_i |x_i|$ and $\|\mathbf{x}\|_{2,\mathbf{w}} = \big(\sum_i w_i x_i^2\big)^{1/2}$. It is easily seen that for any $\mathbf{x}$, these two norms agree up to constant factors (where the constants depend on $\mathbf{w}$) via Cauchy-Schwarz; see Lemma 6.1.
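The weighted-norm comparison via Cauchy-Schwarz can be checked numerically. A minimal sketch (the function names `norm1w` and `norm2w` below, for the weighted 1- and 2-norms, are ours, not the paper's):

```python
import math
import random

def norm1w(w, x):
    # weighted 1-norm: sum_i w_i * |x_i|
    return sum(wi * abs(xi) for wi, xi in zip(w, x))

def norm2w(w, x):
    # weighted 2-norm: (sum_i w_i * x_i^2)^(1/2)
    return math.sqrt(sum(wi * xi * xi for wi, xi in zip(w, x)))

rng = random.Random(0)
w = [0.6, 0.3, 0.1]
bound = math.sqrt(sum(w))  # the Cauchy-Schwarz constant depends only on w
for _ in range(1000):
    x = [rng.uniform(-10, 10) for _ in w]
    assert norm1w(w, x) <= bound * norm2w(w, x) + 1e-9
print("Cauchy-Schwarz comparison holds")
```

The reverse comparison, a weighted 2-norm bound in terms of the 1-norm, holds with a constant depending on the smallest weight, which is why the constants in the text depend on the weight vector.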

Standard Queuing Model. We consider the following discrete-time queuing system illustrated by the figure below, which is a decentralized, competitive version of the model considered by Krishnasamy et al. [12]: there is a system of $n$ queues and $m$ servers. During each discrete time step $t$, the following occurs:

  1. Each queue $i$ receives a new packet with a fixed, time-independent probability $\lambda_i$. We model this via an independent $\mathrm{Ber}(\lambda_i)$ random variable. This packet has a timestamp that indicates that it was generated in the current time period. We label queues such that $\lambda_1 \ge \cdots \ge \lambda_n$, writing $\boldsymbol{\lambda}$ for the vector of arrival rates.

  2. Each queue that currently has an uncompleted packet chooses one server to send their oldest unprocessed packet (in terms of timestamp) to.

  3. Each server that receives a packet does the following: first, it only considers the packet it receives with the oldest timestamp (breaking ties arbitrarily). It then processes this packet with a fixed, time-independent probability $\mu_j$. We again label servers so that $\mu_1 \ge \cdots \ge \mu_m$, writing $\boldsymbol{\mu}$ for the vector of service rates.

  4. All unprocessed packets, possibly including the packets that were selected if the corresponding server failed to process them, are then sent back to their respective queues still uncompleted. Queues receive bandit feedback on whether their packet cleared at their chosen server.

Figure 1: Here, three queues compete for two servers. Unserviced packets in each round return to their queue.
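The round structure above can be sketched in a short simulation (our own illustrative code; the routing `choices` would, in the full model, come from each queue's learning algorithm rather than being fixed):

```python
import random

def simulate_round(queues, lam, mu, choices, t, rng):
    """One round: Bernoulli arrivals, routing of oldest packets, oldest-first service.

    queues[i]: sorted list of timestamps of queue i's unprocessed packets
    lam[i], mu[j]: arrival / service probabilities
    choices[i]: server chosen by queue i this round
    """
    # step 1: each queue receives a packet with probability lam[i], stamped t
    for i, q in enumerate(queues):
        if rng.random() < lam[i]:
            q.append(t)
    # steps 2-3: each nonempty queue sends its oldest packet to its chosen
    # server; each server keeps only the oldest-timestamped packet it receives
    chosen = {}  # server -> index of the queue whose packet it will attempt
    for i, q in enumerate(queues):
        if q:
            j = choices[i]
            if j not in chosen or q[0] < queues[chosen[j]][0]:
                chosen[j] = i
    # service succeeds with probability mu[j]; step 4: failures stay queued
    for j, i in chosen.items():
        if rng.random() < mu[j]:
            queues[i].pop(0)  # oldest packet cleared

rng = random.Random(1)
queues = [[], [], []]
for t in range(1000):
    # illustrative fixed routing: every queue tries the faster server 0
    simulate_round(queues, [0.1, 0.1, 0.1], [0.9, 0.5], [0, 0, 0], t, rng)
print([len(q) for q in queues])
```

With total arrival rate well below the best server's rate, as here, the queue sizes stay small; the interesting regimes studied in the paper are those where coordination across servers is actually needed.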

We write $Q_i(t)$ for the number of unprocessed packets of queue $i$ at the beginning of time $t$ (before sampling new packets) and $\mathbf{Q}(t)$ for the vector of queue sizes at time $t$. Define $Q(t) = \sum_{i=1}^{n} Q_i(t)$ as the total number of unprocessed packets in the system at time $t$. Formally, if $C_i(t)$ is the indicator of the event that queue $i$ clears a packet at time $t$ and $A_i(t)$ is again the indicator that queue $i$ received a new packet at time $t$, then we have the recurrence, as random variables, $Q_i(0) = 0$ and

$$Q_i(t+1) = Q_i(t) + A_i(t) - C_i(t),$$

where we note that $C_i(t)$ is necessarily $0$ if $Q_i(t) + A_i(t) = 0$ (i.e. queue $i$ had no packets and did not receive a new one in the round, so does not send a packet this time period). This ensures each $Q_i(t)$ is integral and nonnegative. We call the above random process the standard model. We will be interested in the stability of this system in the following sense:

Definition 2.1.

The above system is strongly stable under some given dynamics if, for any fixed $t \ge 0$, the random process satisfies $\mathbb{E}[Q(t)] \le C$ for some absolute constant $C$ that does not depend on $t$.

We say that it is almost surely stable if, for any $\varepsilon > 0$, almost surely

$$\lim_{t \to \infty} \frac{Q(t)}{t^{\varepsilon}} = 0.$$

That is, the growth of $Q(t)$ is almost surely subpolynomial.

It is not immediately obvious what the relationship is between these stability properties: it turns out that strong stability implies almost sure stability, which we state in Lemma 3.1 and prove in the Appendix.

Our main goal is to understand the stability properties of these queuing systems when queues select servers as no-regret learners. To get a baseline measure for when this may be possible, we must first understand when a queuing system is stable under centralized coordination: it turns out that an obvious necessary condition on $\boldsymbol{\lambda}$ and $\boldsymbol{\mu}$ is also sufficient.

Theorem 2.1.

Suppose $\boldsymbol{\lambda}$ and $\boldsymbol{\mu}$ have been preprocessed so that a maximal, equal prefix of $1$'s is deleted from both, and both are nonempty and not both identically zero afterwards. Then the above queuing system is strongly stable for some centralized (coordinated) scheduling policy if and only if, for all $1 \le k \le n$,

$$\sum_{i=1}^{k} \lambda_i < \sum_{j=1}^{k} \mu_j. \qquad (2)$$
When (2) holds, we say that the queuing system is (centrally) feasible.

An instructive example to keep in mind is a single-queue, single-server system with arrival rate $\lambda$ and service rate $\mu$. Of course, there is no learning or competition in such a process. If $\lambda < \mu$, it is well known that $Q(t)$ follows a biased random walk on the nonnegative integers towards $0$, and moreover is geometrically ergodic: namely, this random process mixes to a stationary distribution with geometrically decreasing tail probabilities. This in particular implies strong stability. On the other hand, if $\lambda \ge \mu$, say $\lambda = \mu$, then it is well known that the corresponding unbiased random walk satisfies $\mathbb{E}[Q(t)] = \Theta(\sqrt{t})$. Therefore, there is a sharp threshold for strong stability.
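The sharp threshold is easy to see in simulation (a sketch with our own parameter choices):

```python
import random

def final_queue_size(lam, mu, steps, seed):
    """Single queue, single server: a reflected random walk on the
    nonnegative integers, up with prob lam, down with prob mu (when nonempty)."""
    rng = random.Random(seed)
    q = 0
    for _ in range(steps):
        if rng.random() < lam:
            q += 1            # new packet arrives
        if q > 0 and rng.random() < mu:
            q -= 1            # oldest packet served
    return q

seeds = range(10)
subcritical = sum(final_queue_size(0.3, 0.7, 50_000, s) for s in seeds) / 10
critical = sum(final_queue_size(0.5, 0.5, 50_000, s) for s in seeds) / 10
print(subcritical, critical)  # subcritical stays O(1); critical grows like sqrt(t)
```

Averaged over seeds, the subcritical walk hovers near zero while the critical walk wanders to a height on the order of the square root of the horizon, matching the dichotomy above.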

We give the full proof of the Theorem in the Appendix and only sketch it here: for necessity, when (2) strictly fails, it will be easy to see the queuing system is not stable, as the expected total number of packets grows linearly. A slight modification using standard submartingale arguments is needed if instead only equality holds somewhere in (2). For sufficiency, we leverage the well-known connection between majorization of vectors and doubly stochastic matrices. By decomposing these matrices into a convex combination of permutation matrices, one obtains a randomized, coordinated matching schedule between queues and servers that ensures at every time step that the probability of clearing a packet strictly exceeds the probability of receiving a new packet. Each queue size thus follows a biased random walk towards $0$, which will ensure stability.
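The decomposition step in the sufficiency sketch can be made concrete. Below is a brute-force Birkhoff-von Neumann decomposition of a doubly stochastic matrix into permutation matrices (our own illustrative code, feasible only for small systems); a centralized scheduler can then sample a random queue-server matching from the resulting convex combination each round:

```python
from itertools import permutations

def birkhoff_decomposition(M, tol=1e-12):
    """Greedily write doubly stochastic M as a convex combination of
    permutation matrices (brute force over permutations; small n only)."""
    n = len(M)
    M = [row[:] for row in M]
    parts = []
    while True:
        best = None
        for p in permutations(range(n)):
            w = min(M[i][p[i]] for i in range(n))  # bottleneck weight of p
            if w > tol and (best is None or w > best[0]):
                best = (w, p)
        if best is None:
            return parts
        w, p = best
        parts.append((w, p))
        for i in range(n):
            M[i][p[i]] -= w  # peel off w copies of permutation p

parts = birkhoff_decomposition([[0.7, 0.3], [0.3, 0.7]])
print(parts)  # two permutations, with weights summing to 1
```

Each step zeroes at least one matrix entry, so the greedy loop terminates; real implementations use bipartite matching rather than enumeration.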

The Need for Packet Priorities. We will be interested in proving statements of the following form:

Given a queuing system that is centrally feasible even when $\boldsymbol{\lambda}$ is scaled up by some explicit constant independent of the parameters of the system (in the sense of Theorem 2.1), a random process where queues are decentralized and strategic under certain conditions remains stable.

Morally, such a result says that although decentralized, strategic queues cannot coordinate and instead compete for service, if they choose servers according to some reasonable learning algorithm, then only a small multiplicative factor of slack is needed to keep the queuing system stable. In other words, decentralizing a feasible queuing system and introducing a constant factor of slack will result in a stable system even when queues selfishly compete to clear their own packets.

To see the necessity of timestamps, consider instead a simpler model where there are no timestamps and priorities; instead, each server uniformly randomly picks which packet to process among those sent to it in each step. It is easy to see that if a queuing system is feasible even when $\boldsymbol{\lambda}$ is scaled up by a factor of $n$, then it will remain a stable queuing system with reasonably strategic queues. Indeed, by this feasibility assumption, $n\lambda_1 < \mu_1$, so that $\lambda_i < \mu_1/n$ for every queue $i$. Therefore, if every queue just always sends to the largest server whenever it has a packet, it wins the uniform tie-break with probability at least $1/n$, and hence succeeds in clearing a packet with probability at least $\mu_1/n > \lambda_i$; it is not too difficult to prove that this results in a strongly stable process by comparing to a random walk biased towards the origin.

It is natural to ask if a better factor is attainable in this alternate model, perhaps even a constant. It turns out that in general, a factor polynomial in $n$ is required:

Theorem 2.2.

In this alternate model, for large enough $n$, there exists a centrally feasible queuing system with $n$ queues and servers with the following property: the system remains feasible even if $\boldsymbol{\lambda}$ is scaled up by a factor polynomial in $n$, and it is possible for all queues to be in a Nash equilibrium at each time step (by this we mean that, conditioned on the (randomized) strategies of all other queues in a given time step, each queue sends to a server with highest probability of success), and in particular to satisfy no-regret properties as in Assumption 3.2, yet the system is not strongly stable.

While we make little effort to optimize the exponent, this shows that in this model, no sub-polynomial factor is possible in general. The basic reason why this can occur is that low-arrival-rate queues can saturate the high-success-rate servers, making it impossible for high-arrival-rate queues to clear fast enough to offset their higher arrivals. In the timestamp model, we will be able to establish constant-factor results. The key idea is that the priority system, while more difficult to analyze, results in older queues gaining an advantage over younger queues, causing the younger queues to prefer lower-quality servers, so that this situation is impossible. That is, this model implicitly forces fast-growing queues to get better service, so long as queues are sufficiently adaptive to take advantage of it.

3 No-Regret in Queuing Systems

Our main result shows that, if the queuing system has enough slack and all queues satisfy an appropriate high-probability no-regret guarantee, then the queuing system is strongly stable. To this end, we make the following feasibility assumption, which asserts that the queuing system with service rates scaled down by $2$ would remain feasible:

Assumption 3.1 (Feasibility).

There exists $\eta > 0$ such that, for all $1 \le k \le n$,

$$\sum_{i=1}^{k} \lambda_i \le \frac{1-\eta}{2} \sum_{j=1}^{k} \mu_j.$$

We will usually use $\eta$ to denote the maximum value for which this inequality holds.

This assumption stipulates that if the service rates were halved, then the queuing system would still be centrally feasible. The parameter $\eta$ controls the quality of learning required for our results. To establish stability results, we will use the following definition of the regret of a queue:

Definition 3.1.

Fix some window of length $T$, and for convenience reindex time so that the beginning of this window is at $t = 1$. Let $Z_{ij}(t)$ be the indicator variable that queue $i$ would have succeeded in clearing a packet at server $j$ at time $t$ (had she sent there), and let $J_i(t)$ be the identity of the server that queue $i$ chooses at time $t$. Note that if queue $i$ has no packets at time $t$, then $Z_{iJ_i(t)}(t) = 0$. Then the regret of queue $i$ on this window, denoted $R_i(T)$, is defined as

$$R_i(T) = \max_{j \in [m]} \sum_{t=1}^{T} Z_{ij}(t) - \sum_{t=1}^{T} Z_{iJ_i(t)}(t).$$
That is, the regret of queue $i$ on some fixed window of length $T$ is defined to be the (random) difference between the number of packets queue $i$ cleared in these periods and the backward-looking number of packets she would have cleared had she simply always sent to the best single server, where the comparison is in hindsight to the best single server on the realized sample path, not to a counterfactual sample path where the queue always chose that server. Note that all these random variables are with respect to the same sample path; the indicator variables above will depend on all previous randomizations and choices by the queues, as these implicitly determine the priorities of the queues. We make the following assumption on the regret of queuing strategies:

Assumption 3.2 (Queues satisfy high-probability no-regret).

All queues select servers using a strategy or algorithm satisfying the following no-regret guarantee: given a fixed $\delta > 0$ and a fixed window of length $T$, the regret of the queue on this given window of consecutive time steps is $O(\sqrt{T})$ with probability at least $1 - \delta$, over only their own randomizations during this window. Here, the $O(\cdot)$ hides constant factors depending on $\delta$ and the number of servers, but not on $T$.

Moreover, we require that the choices of the queue depend only on their past bandit feedback and their past history of ages, but not on their history of queue sizes.

For instance, this assumption holds for EXP3.P.1, with the regret scaling like $\sqrt{T}$ up to logarithmic factors [1]. Note that this high-probability guarantee is possible in our setting even in the priority model, where the random variables of success at each server from the perspective of each queue at each time step depend on all previous actions (via timestamps and priorities), as well as on the actions of the other queues in the current time period; see for instance the discussion in Section 9 of Auer et al. [1]. This property is standard and necessary in applying learning algorithms to multi-player games. Using the freezing technique of [13] for EXP3.P.1, one can ensure that such a guarantee holds simultaneously for each window of this length, and not only for a fixed window, so the players need not be aware of which window of size $T$ is relevant for our analysis. This is true as freezing guarantees that the probabilities associated with all arms remain high enough throughout the algorithm, which allows us to adapt the classical no-regret analysis starting at any time step for the window of the next $T$ time steps.
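To illustrate the kind of algorithm involved, here is a minimal EXP3-style sketch with explicit exploration (this is plain EXP3, not the EXP3.P.1 variant with confidence bounds used in [1]; the `reward` callback is a hypothetical stand-in for bandit feedback from a chosen server):

```python
import math
import random

def exp3(n_arms, horizon, reward, gamma=0.1, seed=0):
    """EXP3 with mixing: exponential weights plus gamma-uniform exploration.

    reward(arm, t) must return bandit feedback in [0, 1] for the pulled arm.
    """
    rng = random.Random(seed)
    w = [1.0] * n_arms
    earned = 0.0
    for t in range(horizon):
        s = sum(w)
        probs = [(1 - gamma) * wi / s + gamma / n_arms for wi in w]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        x = reward(arm, t)
        earned += x
        # importance-weighted estimate keeps the update unbiased
        w[arm] *= math.exp(gamma * (x / probs[arm]) / n_arms)
        m = max(w)  # renormalize to avoid floating-point overflow
        w = [wi / m for wi in w]
    return earned

# toy instance: server 0 always serves, server 1 never does
earned = exp3(2, 5000, lambda arm, t: 1.0 if arm == 0 else 0.0)
print(earned)  # most rounds clear, up to the gamma exploration overhead
```

The exploration floor of `gamma / n_arms` on every arm is the feature that the freezing-style analyses rely on: probabilities never collapse to zero, so a no-regret argument can be restarted at any time step.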

Dual Model via Deferred Decisions. To prove our main learning result, we will use the principle of deferred decisions to give an alternate description of the standard system given in Section 2. Suppose that in the standard model, each queue chooses which server to send to at time $t$ depending only on past feedback and their current oldest timestamp, but not on their current queue size. In this case, we can equivalently characterize the evolution of this system by keeping only the oldest timestamp of a packet at each queue. To do this, instead of randomly generating new packets at each time step according to a Bernoulli process, each queue only maintains the timestamp of their current oldest unprocessed packet. Once this packet is successfully cleared, the new current oldest unprocessed packet has timestamp generated by sampling a geometric random variable with parameter $\lambda_i$ and adding it to the timestamp of the just-completed packet. If this number exceeds the current time step $t$, this corresponds to having processed all packets that arrived before the current time step, and receiving the next packet in the future.

We will call this random process the dual process. Because the gap between successes in repeated independent Bernoulli trials is a geometric random variable, the standard and dual processes can be completely coupled, as described below. Concretely, when the queues use strategies with the above property, the dual process can be described using the following notation:

  1. Time progresses in discrete steps .

  2. At each time $t$, $\tau_i(t)$ is the timestamp of the oldest unprocessed packet of queue $i$ at time $t$.

  3. $w_i(t)$, the difference between the current time step $t$ and the timestamp of queue $i$'s current oldest packet, is the age of the current oldest packet of queue $i$ in relation to the current time step. That is, $w_i(t)$ measures how old the current oldest unprocessed packet for queue $i$ is. We say $w_i(t)$ is the age of queue $i$ at time $t$. (Note that it is possible that $w_i(t) < 0$. The interpretation of this is that the queue has cleared all of her packets at time $t$ and will receive her next one $-w_i(t)$ steps in the future from the perspective of time $t$.)

  4. Queue $i$ can send a packet in this time step if the age of her oldest packet is nonnegative. If queue $i$ successfully clears her packet, her oldest timestamp increases by an independent $\mathrm{Geom}(\lambda_i)$ random variable, independent of all past events; otherwise it does not change.

In general, we will write $\mathbf{w}(t)$ for the vector of current ages of oldest packets. To see the equivalence, consider any standard queuing system with Bernoulli random variables for packet generation. Then, to get a coupled dual system for the same system, use for each queue $i$ a sequence $G_i(1), G_i(2), \ldots$ with the interpretation that $G_i(k)$ is the size of the $k$th gap between successes in the Bernoulli arrival process. When queue $i$ clears her $k$th packet, her new oldest timestamp increases by $G_i(k+1)$ as described above. As such gaps between timestamps in the standard model have $\mathrm{Geom}(\lambda_i)$ distributions, the dual system gives the ages of each queue in the standard system at all times, and gives an explicit coupling.
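The coupling can be made explicit in code: sampling Bernoulli arrivals one uniform draw per time step, and sampling the geometric gaps by consuming that same stream of uniforms, produces identical arrival timestamps (a self-contained sketch; the function names are ours):

```python
import random

def arrivals_bernoulli(lam, horizon, seed):
    """Timestamps of arrivals from independent Bernoulli(lam) trials."""
    rng = random.Random(seed)
    return [t for t in range(1, horizon + 1) if rng.random() < lam]

def arrivals_geometric(lam, horizon, seed):
    """Same timestamps via Geom(lam) gaps between successive arrivals.

    Each geometric gap is sampled by counting uniforms until one falls
    below lam, so both samplers consume the same uniform stream: this is
    exactly the coupling between the standard and dual descriptions.
    """
    rng = random.Random(seed)
    times, t = [], 0
    while t <= horizon:
        gap = 1
        while rng.random() >= lam:  # trials until the first success
            gap += 1
        t += gap
        if t <= horizon:
            times.append(t)
    return times

for seed in range(5):
    assert arrivals_bernoulli(0.3, 200, seed) == arrivals_geometric(0.3, 200, seed)
print("coupled arrival processes agree")
```

The dual sampler only draws a new gap when the previous arrival has been reached, which is the deferred-decision property exploited in the analysis.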

The key feature is that, under the assumption that queues choose servers at time $t$ based on at most the current ages and past feedback, not on the queue sizes, all choices by the queues are the same conditioned on just the current timestamps and past feedback as conditioned on all the past information in the standard model (which includes arrivals received after the current oldest packet). That is, if $\mathcal{F}_t$ denotes the information available to the queues in the standard model at time $t$, and $\mathcal{G}_t$ that in the dual model, then all choices by the queues at time $t$ are the same conditioned on either history. The point of doing so is that the next geometric gap of queue $i$ is independent of the history until queue $i$ clears her current oldest packet (namely, the timestamp of queue $i$'s next packet is not known until the time queue $i$ clears her current one).

In the dual system, we define stability in the same way as before:

Definition 3.2.

The dual system is said to be strongly stable if, for any fixed $t \ge 0$, the expected age of each queue at time $t$ is at most a fixed constant $C$ depending only on the system parameters, not on $t$.

The dual system is almost surely stable if, for any $\varepsilon > 0$, the age of every queue is almost surely $o(t^{\varepsilon})$.

Because, heuristically, a queue whose oldest packet has age $w$ holds roughly $\lambda_i w$ packets, it is intuitive that our notions of strong stability are equivalent whenever both systems correspond to the same random process. Indeed, this is the case. Moreover, strong stability implies almost sure stability. The basic idea is to use Markov's inequality and the Borel-Cantelli lemma along an appropriately chosen subsequence of times. We defer this equivalence and implication to the Appendix:

Lemma 3.1.

If the standard and dual models characterize the same queuing dynamics, then strong stability in the standard system is equivalent to strong stability in the dual system.

Moreover, if this holds, then strong stability in either system implies almost sure stability.

As these are completely coupled processes with the same stability properties, it is natural to wonder what we have gained from focusing on the ages of queues rather than their overall sizes. We discuss this further in Remark 3.1.

Stability of No-Regret Queuing Systems

Our main result is the following theorem, which asserts that if all queues are no-regret with high probability over sufficiently large windows, then the queuing system is strongly stable.

Theorem 3.1.

Suppose that Assumption 3.1 holds for the dual queuing system with parameter . Set the following parameters: and for . Let be large enough so that the following holds (note that this is possible as for any fixed , as well as by the exponential decay of the bounds in (27) and (28) in ):


and, over steps of our process, the sum of the geometric variables governing subsequent packet arrivals and the sum of the Bernoulli server successes concentrate around their expectations with an error probability of at most , with the above values of , and . (See the required inequalities at (27) and (28).)

Then, if each queue satisfies Assumption 3.2 on each consecutive time interval of length with probability at least , then the random process under these dynamics is strongly stable.
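Theorem 3.1 can be illustrated numerically. The following minimal sketch is our own illustrative construction, not the paper's: the parameters, the multiplicative-weights learner, and the simplified success-only feedback are all hypothetical choices. It runs two queues against two servers, gives older packets priority at each server, and keeps total capacity well above double the total arrival rate:

```python
import random

# Hypothetical instance: queues run a simple multiplicative-weights learner
# over servers, older packets win ties at a server, and total service rate
# (0.9 + 0.9) exceeds twice the total arrival rate (0.2 + 0.2).  Empirically
# the maximum queue size stays bounded, as Theorem 3.1 predicts for
# no-regret learners under such a capacity condition.
def simulate(lams=(0.2, 0.2), mus=(0.9, 0.9), T=20_000, eta=0.05, seed=1):
    rng = random.Random(seed)
    n, m = len(lams), len(mus)
    queues = [[] for _ in range(n)]          # arrival timestamps, oldest first
    weights = [[1.0] * m for _ in range(n)]  # per-queue weights over servers
    max_size = 0
    for t in range(T):
        for i in range(n):
            if rng.random() < lams[i]:
                queues[i].append(t)
        # each nonempty queue samples a server proportionally to its weights
        chosen = {}
        for i in range(n):
            if queues[i]:
                r, j = rng.random() * sum(weights[i]), 0
                while r > weights[i][j]:
                    r -= weights[i][j]
                    j += 1
                chosen.setdefault(j, []).append(i)
        # a successful server serves whichever bidder holds the oldest packet
        for j, bidders in chosen.items():
            if rng.random() < mus[j]:
                winner = min(bidders, key=lambda i: queues[i][0])
                queues[winner].pop(0)
                weights[winner][j] *= (1.0 + eta)  # reward the successful arm
        max_size = max(max_size, max(len(q) for q in queues))
    return max_size

max_size = simulate()
```

The simplified update (rewarding only realized successes) is one of many learners satisfying a no-regret-style guarantee; the theorem itself is agnostic to which algorithm the queues use.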

The technical tool we use to establish the stability of our system in Theorem 3.1 is the following result of Pemantle and Rosenthal:

Theorem 3.2 (Theorem 1 in [15]).

Let be a sequence of nonnegative random variables with the property that

  1. There exist constants such that if , then

    where the σ-algebra includes the history until period and .

  2. There exists and a constant such that for any history,

Then, for any , there exists an absolute constant not depending on such that for all .
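A quick numerical sanity check of the Pemantle–Rosenthal criterion: the process below (all constants are illustrative choices) drifts downward whenever it exceeds a threshold `J` and has uniformly bounded increments, so all moments of its increments are bounded, and its running mean indeed stays bounded rather than growing with time:

```python
import random

# A nonnegative process with negative drift above the threshold J = 10:
# above J the increment is uniform on [-1, 0] (mean -0.5); below J it is
# uniform on [-0.5, 1] (mean +0.25).  Increments are bounded by 1, so the
# bounded-moment condition holds trivially.  We track the largest running
# mean seen, which should stay bounded uniformly in t.
def drift_process(T=100_000, J=10.0, seed=2):
    rng = random.Random(seed)
    x, running_max_mean, total = 0.0, 0.0, 0.0
    for t in range(1, T + 1):
        if x > J:
            step = rng.uniform(-1.0, 0.0)   # expected drift -0.5 above J
        else:
            step = rng.uniform(-0.5, 1.0)   # may drift upward below J
        x = max(0.0, x + step)
        total += x
        running_max_mean = max(running_max_mean, total / t)
    return running_max_mean

bound = drift_process()
```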

To apply this theorem, we must define an appropriate potential function of queue ages that satisfies the negative drift and bounded moments conditions. We define, for , the following potential functions that will feature prominently in the proof:


This potential function will be useful for the analysis because it isolates the contribution of clearing old packets at each age simultaneously. We now turn to the proof of Theorem 3.1.

Proof. To apply Theorem 3.2, we define the stochastic process by

That is, is a “snapshot” of the potential function, evaluated on every steps. The filtration is given by , where is the corresponding information of the dual system at time available to the queues.

Summary of the Main Ideas: Before we go through the detailed proof, we offer an outline of the main ideas. To establish the negative drift, we focus on the window between two consecutive snapshots. In this window, we use the no-regret condition, as well as concentration bounds on the behavior of queues and servers. The main idea of the proof is to consider all queues that have remained old throughout the period. A server either clears many such old packets, or many times during this period no old packet is sent to it. In the second case, we can use the no-regret condition for any queue that still has very old packets, as they would have priority at the server; these bounds in tandem imply that many old packets must have cleared. To aid the analysis, we also lower bound the total decrease in ages from clearing packets on this window before accounting for the extra steps of aging, accounting for the latter only at the end; this allows us to consider the clearing process and the aging from the passage of time separately. Finally, when concentration or the no-regret condition fails, we can trivially upper bound what this contributes to the expected drift, and this is subsumed by the low probability that such a failure occurs in the overall expectation.

To establish the bounded moment condition, it is important to consider the dual process, as Theorem 3.2 also requires that the change cannot be too large for any history. See Remark 3.1 for more details.

Organizing Randomness. Let us first set up how we model the actual queuing process on each consecutive window of steps between and for the probabilistic analysis. In the spirit of “organizing randomness,” at step of this process (step of the actual queuing process), sample up front an independent geometric ensemble with

as well as an independent Bernoulli ensemble with

The interpretation is that the are random indicators of whether the th server is able to clear a packet, regardless of whether a packet is sent there, at the th step of this block of steps. The have the interpretation that, when queue clears her th packet in this window, her age decreases by (without accounting for the aging from the passage of time). Crucially, as queue clears packets in this window of steps, her age decreases by the sum of a prefix of (before accounting for aging as time passes). Observe that this independence arises precisely because the geometric ensemble of timestamp differences is independent of the filtration of the dual system, which conditions only on past feedback and the realized past sequence of oldest timestamps.
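The pre-sampling device can be made concrete as follows; the ensemble sizes, rates, and helper names here are hypothetical choices for illustration:

```python
import random

# Sketch of "organizing randomness": before a window of N steps, pre-sample
# (i) a geometric ensemble of timestamp gaps for each queue and (ii) a
# Bernoulli success indicator for each server at each step.  When a queue
# clears its k-th packet in the window, its age drops by the sum of a
# prefix of its gap ensemble.  All names and parameters are illustrative.
def sample_geometric(rng, p):
    # number of trials until first success; mean 1/p
    k = 1
    while rng.random() >= p:
        k += 1
    return k

def presample(num_queues=2, num_servers=2, N=1000,
              lams=(0.2, 0.2), mus=(0.9, 0.9), seed=3):
    rng = random.Random(seed)
    # gaps[i][k]: timestamp difference between consecutive packets of queue i
    gaps = [[sample_geometric(rng, lams[i]) for _ in range(N)]
            for i in range(num_queues)]
    # succ[j][s]: whether server j would clear a packet at step s of the
    # window if one is sent there, regardless of which queue sends it
    succ = [[rng.random() < mus[j] for _ in range(N)]
            for j in range(num_servers)]
    return gaps, succ

gaps, succ = presample()
# clearing 5 packets lowers queue 0's age by the sum of the first 5 gaps
age_drop = sum(gaps[0][:5])
```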

Now, observe that by our choices of parameters and , with probability at least , all of the following “good” events hold on this window:


This simply follows from our assumption that was chosen large enough so that the first two lines hold with probability at least (via Corollary 6.2 and Lemma 6.6 in the Appendix), the fact that the no-regret bound holds with probability at least for each queue, and a union bound. Notice that (7) asserts that every prefix of each of the geometric ensembles is additively not too far from its expectation, relative to .
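The flavor of the prefix-concentration event in (7) can be checked empirically; the parameters and the generous constant in the slack below are illustrative choices, not the bounds from (27) and (28):

```python
import random, math

# Empirical check: every prefix sum of N i.i.d. geometric variables with
# mean 1/p stays within an additive O(sqrt(N log N))-type window of its
# expectation k/p.  The constant 10 in the slack is deliberately generous.
def max_prefix_deviation(N=5000, p=0.25, seed=4):
    rng = random.Random(seed)
    total, worst = 0.0, 0.0
    for k in range(1, N + 1):
        g = 1
        while rng.random() >= p:     # geometric: trials until first success
            g += 1
        total += g
        worst = max(worst, abs(total - k / p))
    return worst

worst = max_prefix_deviation()
slack = 10 * math.sqrt(5000 * math.log(5000)) / 0.25
```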

Threshold Value for . We will show that under the threshold assumption that


the drift condition holds. We will later use the following simple claim:

Claim 3.1.

Under this assumption, both of the following statements hold:

  1. There exists some such that .

  2. .

Proof. (10) immediately implies by definition of and that

From Lemma 6.1, this implies that

from which both parts follow, the first from averaging. ∎

Using no-regret to bound the number of old packets cleared. Continuing with the proof, we first analyze what happens on the “good” event of (7,8,9). Let be the age of the oldest unprocessed packet of queue at the end of this window of steps, measured with respect to the beginning of the window and without accounting for the steps of aging. If queue cleared all of the packets she received before the beginning of this window, then we say . Let be the set of queues that, at the end of the steps, still have packets that are at least -old with respect to the beginning of the window. Let be an indicator variable for the event that some packet that was at least -old with respect to the beginning of the considered interval was sent to server at the th step of this window. Since queues in evidently have packets that are at least -old throughout this interval, priority and the regret bound (9) imply that the number of packets cleared by any such queue is at least, for any server ,


This is simply because a queue that is always at least -old throughout the interval would succeed on any server that is successful on a time step (as indicated by ) where no -old packet was sent there.

Let be the number of packets that were at least -old with respect to the beginning of the interval that were cleared in the interval and the number of such packets cleared by queue . Then we clearly have


As every packet processed by queues in contributes to , by instantiating (11) for each queue in with each of the top servers and summing, we also obtain


Multiplying (12) by and summing with the previous equation, we obtain


where the last inequality uses (8).

Bounding the expected drift in assuming the “good” event. Observe that, from the construction of , when queue manages to process a packet that is at least -old, decreases either by for some if the new age remains above , or the term vanishes, in which case may decrease by less. Crucially, this latter possibility can happen at most once. Again, write for the number of packets that queue clears during this interval that are at least -old. Then, as , the decrease in from , denoted , is

(by (7))
(by definition of and )
(by (4) and definition of )

Summing over all , the decrease in before considering aging is at least

where is the th largest of the .

Effect of Aging. We now account for the increase due to aging by over the course of this interval. The increase in from this is upper bounded by

Note that this is exact only for those that are nonzero, while it is an upper bound for those that are zero. Combining these potential changes, we see that the potential decrease is at least

as and using Assumption 3.1, together with the fact that the inner product of two nonnegative sequences is maximal when both are sorted in the same order (see Lemma 6.9).
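The sorting fact invoked here is the rearrangement inequality, which is easy to verify exhaustively on a small example:

```python
from itertools import permutations

# Rearrangement inequality (the content of the sorting fact used above):
# over all pairings of two nonnegative sequences, the sum of products
# sum_i a_i * b_sigma(i) is maximized when both are in the same sorted
# order.  The sequences here are arbitrary small examples.
a = [5, 1, 3, 2]
b = [4, 9, 2, 7]
best = max(sum(x * y for x, y in zip(a, perm)) for perm in permutations(b))
same_order = sum(x * y for x, y in zip(sorted(a), sorted(b)))
```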

Relating and . We now need the following claim that relates the and :

Claim 3.2.

If there exists an such that under the good event assumptions, then

Proof. By (7), we must have

even if queue clears a packet every step in the window. Observe that if , then

and so

We also have

In particular, if there exists some such that , then

It follows that if this holds, then

as claimed. ∎

By Claim 3.1, the precondition of Claim 3.2 holds for our threshold value, so the decrease in on this good event is at least

where the inequality is again by Claim 3.1. Translating this into the decrease in , Fact 6.2 implies that the contribution to the expected decrease on this event, which occurs with probability at least , is at least


Considering “bad” events. We now analyze the bad event where any of these assumptions fails: in the worst case, all queues clear no packets, and so each increases by over the next steps. The increase in is thus at most

where the last inequality is again by Claim 3.1. Translating to square roots again, on this bad event, which occurs with probability at most , the contribution of the increase to the expected change in is at most


by Fact 6.1. Summing (17) and (18), it follows that decreases in expectation by at least

where the last inequality is Lemma 6.1. This proves that the drift condition holds for this stochastic process with the threshold given above.

Bounded th Moments: The last thing to check in order to apply Theorem 3.2 is that the increments have conditionally bounded th moments for each even integer , which yields boundedness of our sequence in for all . But this is relatively straightforward: by the triangle inequality, it is easy to see that, as random variables, the change in is at most

Then the change in is again at most

as random variables. We treat two different cases separately:

  1. Suppose there does not exist such that . Then the change in is at most

    From Fact 6.1, this means the change in is upper bounded as random variables by

    Raising this to the power, expanding, and taking expectations, this term is at most for some constant depending only on and by Lemma 6.7.

  2. Suppose there does exist such that . We claim this implies that for all

    First, note that for any , implies


    as can be confirmed from basic algebra. As by feasibility (as ), our assumption implies , and so

    To prove the claim, we split into more cases: if , the claim holds using the last inequality in the denominator. Otherwise, we must have , in which case by (19),

    Thus, in this case, we have

By Fact 6.1, this is an upper bound, as random variables, on the change in ; taking powers, expanding, and taking expectations, we get an upper bound of by Lemma 6.7, for some constant depending only on .

Therefore, Theorem 3.2 applies to the random process , and we conclude that for each , there exists some absolute constant such that for all

In particular, this means that for each ,

To extend this to all not necessarily of this form, it is clear that deterministically,

up to additive and multiplicative constants as in Lemma 6.1, from which we can conclude

for some other constant for each . Note that for