# Network Utility Maximization in Adversarial Environments

Stochastic models have been dominant in network optimization theory for over two decades due to their analytical tractability. However, these models fail to capture non-stationary or even adversarial network dynamics, which are increasingly important for modeling the behavior of networks under malicious attacks or for characterizing short-term transient behavior. In this paper, we consider the network utility maximization problem in adversarial network settings. In particular, we focus on the tradeoffs between total queue length and utility regret, which measures the difference in network utility between a causal policy and an "oracle" that knows the future within a finite time horizon. Two adversarial network models are developed to characterize the adversary's behavior. We provide lower bounds on the tradeoff between utility regret and queue length under these adversarial models, and analyze the performance of two control policies (i.e., the Drift-plus-Penalty algorithm and the Tracking Algorithm).


## I Introduction

Stochastic network models have been dominant in network optimization theory for over two decades, due to their analytical tractability. For example, it is often assumed in wireless networks that the variation of traffic patterns and the evolution of channel capacity follow some stationary stochastic process, such as an i.i.d. model or an ergodic Markov model. Many important network control policies (e.g., MaxWeight [1] and the Drift-plus-Penalty policy [2]) have been derived to optimize network performance under such stochastic network dynamics.

In this paper, we investigate efficient network control policies that can maximize network utility within a finite time horizon while keeping the total queue length small in an adversarial environment. In particular, we focus on the following optimization problem:

$$\max_{\alpha_t \in \mathcal{D}_{\omega_t}} \; \sum_{t=0}^{T-1} U(\alpha_t, \omega_t) \quad \text{s.t.} \quad \sum_{t=0}^{T-1} a_i(t) \le \sum_{t=0}^{T-1} \tilde{b}_i(t), \ \forall i \qquad (1)$$

where $U(\alpha_t, \omega_t)$ is the network utility gained in slot $t$ under the control action $\alpha_t$ (constrained to some action space $\mathcal{D}_{\omega_t}$) and the network event $\omega_t$ (which includes information about exogenous arrivals, link capacities, etc.). The sequence of network events $\{\omega_t\}$ follows an arbitrary (possibly adversarial) process. The objective is to maximize the total network utility gained within a finite time horizon $T$, subject to the constraint that, for each queue $i$, the total arrivals do not exceed the total departures during the time horizon.
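For intuition, the "oracle" benchmark in (1) can be computed by exhaustive search when the horizon and action space are tiny. The sketch below is a hypothetical single-queue toy: the action is the admitted traffic per slot, the network event is the service offered in that slot, and the utility is total admitted traffic; for simplicity it compares total arrivals against total offered service rather than actual departures.

```python
from itertools import product

def oracle_utility(service, actions):
    """Exhaustive-search 'oracle' for a tiny horizon: maximize total
    admitted traffic subject to total arrivals <= total service."""
    best = 0
    for plan in product(actions, repeat=len(service)):
        if sum(plan) <= sum(service):    # constraint in (1), simplified
            best = max(best, sum(plan))  # throughput utility
    return best

# Hypothetical instance: per-slot service capacities and admissible rates.
best_utility = oracle_utility(service=[2, 0, 1], actions=[0, 1, 2])  # -> 3
```

With total service 3 over the horizon, the best feasible plan (e.g., admitting 2, then 1, then 0) attains total utility 3; a causal policy cannot know the future capacities when choosing its actions.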

### I-A Main Results

We develop general adversarial network models and propose a new finite-time performance metric, referred to as utility regret (the formal definition is given in Section II-B):

$$R^{\pi}_T = \sum_{t=0}^{T-1} U(\alpha^*_t, \omega_t) - \sum_{t=0}^{T-1} U(\alpha^{\pi}_t, \omega_t),$$

where $\{\alpha^{\pi}_t\}$ is the sequence of control actions taken by a policy $\pi$, and $\{\alpha^*_t\}$ is the optimal sequence of actions for solving (1), generated by an "oracle" that knows the future. Note that a control policy may trivially maximize the network utility by simply ignoring the constraint in (1) (e.g., admitting all the exogenous traffic) such that the utility regret becomes zero or even negative.¹ However, such behavior may significantly violate the constraint in (1) and lead to large queue lengths. Therefore, there is a tradeoff between the utility regret and the queue length achieved by a control policy, which is similar to the well-known utility-delay tradeoff in traditional stochastic network optimization [4]. In this paper, we investigate this tradeoff in an adversarial environment. The main results are as follows.

¹ Negative utility regret may occur since any optimal solution is required to satisfy the constraint in (1), while an arbitrary policy may violate this constraint.

We prove that it is impossible to simultaneously achieve both “low” utility regret and “low” queue length if the adversary is unconstrained. In particular, there exist some adversarial network dynamics such that either the utility regret or the total queue length grows at least linearly with the time horizon under any causal control policy. This impossibility result motivates us to study constrained adversarial dynamics.

We develop two adversarial network models where the network dynamics are constrained to some "admissible" set. In particular, we first consider the $W$-constrained adversary model, where under the optimal policy the total arrivals do not exceed the total services within any window of $W$ slots. We then propose a more general adversary model called the $V_T$-constrained adversary, where the total queue length generated by the "oracle" during its sample path is upper bounded by $V_T$. By varying the value of $V_T$, the proposed $V_T$-constrained model covers a wide range of adversarial settings: from a strictly constrained adversary to a fully unconstrained adversary.

We develop lower bounds on the tradeoffs between utility regret and queue length under both the $W$-constrained and the $V_T$-constrained adversary models. It is shown that no causal policy can simultaneously achieve both sublinear utility regret and sublinear queue length if $W$ or $V_T$ grows linearly with $T$. We also analyze the tradeoffs achieved by two control algorithms, the Drift-plus-Penalty algorithm [2] and the Tracking Algorithm [5, 6], under the two adversarial models. In particular, both algorithms simultaneously achieve sublinear utility regret and sublinear queue length whenever $W$ or $V_T$ grows sublinearly with $T$, yet the theoretical regret bound of the Tracking Algorithm is better than that of the Drift-plus-Penalty algorithm. The Tracking Algorithm also asymptotically attains the optimal tradeoff under the $W$-constrained adversary model.

### I-B Related Work

However, the AQT and the LB models assume that only packet injections are adversarial, while the underlying network topology and link states remain fixed. Such a static network model does not capture many adversarial environments, such as wireless networks under jamming attacks where the adversary can control the channel states. Andrews and Zhang [5, 6] extended the AQT model to single-hop dynamic wireless networks, where both packet injections and link states are controlled by an adversary, and proved the stability of the MaxWeight algorithm in this context. Jung et al. [11, 12] further extended the results of [5, 6] to multi-hop dynamic networks. Our window-based $W$-constrained model is inspired by and similar to the adversarial models used in [5, 6, 11, 12].

While the above-mentioned works focused on network stability, our work is most related to the universal network utility maximization problem by Neely [13], where network utility needs to be maximized subject to stability constraints under adversarial network dynamics. Algorithm (time-average) performance is measured with respect to a so-called "$W$-slot look-ahead policy". Such a policy has perfect knowledge about network dynamics over the next $W$ slots, but it is required that under this policy the total arrivals to each queue do not exceed the total amount of service offered to that queue during every window of $W$ slots. As a result, it is similar to our $W$-constrained model where stringent window constraints have to be enforced.

Our paper extends previous work in a number of fundamental ways. First, we develop lower bounds on the tradeoffs between utility regret and queue length under both the $W$-constrained and the $V_T$-constrained adversary models. As far as we know, none of the existing works (e.g., [11, 12, 5, 6, 13]) provide lower bounds under any kind of adversarial network model. Second, we provide analysis under the new $V_T$-constrained adversary model, which generalizes the adversarial network dynamics models used by existing works. To the best of our knowledge, existing works (e.g., [11, 12, 5, 6, 13]) all use the $W$-constrained adversary model or similar window-based variants due to its analytical tractability. In this paper, we propose a new $V_T$-constrained adversary model which removes the window constraints. Due to the lack of window-based structure, the analysis carried out in existing works cannot be applied to the $V_T$-constrained model. We develop new analytical results under the $V_T$-constrained model by converting it to a $W$-constrained model with a carefully selected window size $W$.

### I-C Organization of this Paper

The rest of this paper is organized as follows. We first introduce the system model and relevant performance metrics in Section II. We study the $W$-constrained and $V_T$-constrained adversary models in Sections III and IV, respectively. Finally, simulation results and conclusions are given in Sections V and VI, respectively.

## II System Model

Consider a network with $N$ queues (the set of all queues is denoted by $\mathcal{N}$). Time is slotted with a finite horizon $\mathcal{T} = \{0, 1, \cdots, T-1\}$. Let $\omega_t$ denote the network event that occurs in slot $t$, which indicates the current network parameters, such as a vector of conditions for each link, a vector of exogenous arrivals to each node, or other relevant information about the current network links and exogenous arrivals. The set of all possible network events is denoted by $\Omega$.

At the beginning of each time slot $t$, the network operator observes the current network event $\omega_t$ and chooses a control action $\alpha_t$ from some action space $\mathcal{D}_{\omega_t}$ that may depend on $\omega_t$. The network event and the control action together produce the arrival vector $a(t) = (a_i(\alpha_t, \omega_t))_{i \in \mathcal{N}}$ and the service vector $b(t) = (b_i(\alpha_t, \omega_t))_{i \in \mathcal{N}}$. Note that $a_i(t)$ includes both the admitted exogenous arrivals from outside the network to queue $i$ and the endogenous arrivals from other queues (i.e., packets routed from other queues to queue $i$). Thus, the above network model accounts for both single-hop and multi-hop networks, and the control action may correspond to, for example, joint admission control, routing, rate allocation and scheduling decisions in a multi-hop network. Let $Q(t) = (Q_i(t))_{i \in \mathcal{N}}$ be the queue length vector at the beginning of slot $t$ (before the arrivals in that slot). The queueing dynamics are

$$Q_i(t+1) = \left[ Q_i(t) + a_i(t) - b_i(t) \right]^+, \quad \forall i \in \mathcal{N}, \; t \in \mathcal{T},$$

where $[x]^+ \triangleq \max\{x, 0\}$.
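As a minimal illustration, the one-slot queue update can be written directly from the dynamics above (the numeric values are hypothetical):

```python
def queue_update(Q, a, b):
    """One-slot update: Q_i(t+1) = [Q_i(t) + a_i(t) - b_i(t)]^+."""
    return [max(q + ai - bi, 0.0) for q, ai, bi in zip(Q, a, b)]

# Hypothetical two-queue slot: queue 0 backlogs, queue 1 drains.
Q_next = queue_update([0.0, 5.0], a=[3.0, 1.0], b=[1.0, 4.0])  # -> [2.0, 2.0]
```

The projection onto nonnegative values is what makes the departures in a slot potentially smaller than the offered service.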

We assume that the sequence of network events is generated according to an arbitrary process (possibly non-stationary or even adversarial), except for the following boundedness assumption: under any network event and any control action, the arrivals and the service rates in each slot are bounded by constants that are independent of the time horizon $T$, i.e., for any $i \in \mathcal{N}$ and any $t \in \mathcal{T}$,

$$0 \le a_i(\alpha_t, \omega_t) \le A, \qquad 0 \le b_i(\alpha_t, \omega_t) \le B.$$

For simplicity, we assume $A = B$ such that both arrivals and services are upper bounded by $A$ in each slot.

A policy $\pi$ generates a sequence of control actions $\{\alpha^{\pi}_0, \cdots, \alpha^{\pi}_{T-1}\}$ within the time horizon. In each slot $t$, the queue length vector, the arrival vector and the service rate vector under policy $\pi$ are denoted by $Q^{\pi}(t)$, $a^{\pi}(t)$ and $b^{\pi}(t)$, respectively. A causal policy generates the current control action based only on the knowledge up until the current slot $t$. In contrast, a non-causal policy may generate the current control action based on knowledge of the future.

Let $U(\alpha_t, \omega_t)$ be the network utility gained in slot $t$ if action $\alpha_t$ is taken under network event $\omega_t$. We assume that under any control action and any network event, the network utility is bounded:

$$U_{\min} \le U(\alpha_t, \omega_t) \le U_{\max}, \quad \forall \omega_t \in \Omega, \; \alpha_t \in \mathcal{D}_{\omega_t}.$$

A commonly-used form of the network utility function is $U(\alpha_t, \omega_t) = \sum_i U_i(a_i(t))$, where $a_i(t)$ is the amount of admitted exogenous traffic to queue $i$ in slot $t$. Typical examples include $U_i(x) = x$ (total throughput), $U_i(x) = \log x$ (proportional fairness), etc. In wireless networks with power control, another widely-used network utility function is $U(\alpha_t, \omega_t) = -\sum_i p_i(t)$, where $p_i(t)$ is the power allocated to queue $i$ in slot $t$. This utility function aims to minimize the total power consumption.
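The utility forms mentioned above can be sketched as follows (a hypothetical illustration; the `eps` guard against `log(0)` is an implementation detail, not part of the model):

```python
import math

def throughput_utility(a):
    """Total throughput: U = sum_i a_i."""
    return sum(a)

def proportional_fair_utility(a, eps=1e-12):
    """Proportional fairness: U = sum_i log(a_i)."""
    return sum(math.log(ai + eps) for ai in a)

def negative_power_utility(p):
    """Power minimization expressed as a utility: U = -sum_i p_i."""
    return -sum(p)
```

The logarithm's diminishing returns are what push a proportionally fair allocation away from starving any single queue, whereas the linear throughput utility is indifferent to how the admitted traffic is split.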

In this paper, we consider the following network utility maximization problem, referred to as NUM.

NUM:

$$\begin{aligned} \max_{\alpha_t \in \mathcal{D}_{\omega_t}} \quad & \sum_{t=0}^{T-1} U(\alpha_t, \omega_t) & (2) \\ \text{s.t.} \quad & \sum_{t=0}^{T-1} a_i(t) \le \sum_{t=0}^{T-1} \tilde{b}_i(t), \ \forall i \in \mathcal{N}, & (3) \end{aligned}$$

where $\tilde{b}_i(t)$ denotes the actual packet departures from queue $i$ in slot $t$. The objective (2) is to maximize the total network utility gained over the time horizon. The constraint (3) requires that the total arrivals to each queue do not exceed the total departures from that queue during the time horizon. Note that the above optimization problem is a natural analogue of the traditional stochastic network optimization problem [4], where the time-average utility is maximized subject to certain network stability constraints. Indeed, if we consider a stochastic network with an infinite time horizon, then the objective (2) is equivalent to maximizing the time-average network utility, and the constraint (3) requires that the time-average arrival rate to each queue not exceed the time-average service rate, which is equivalent to rate stability.²

² A network is rate-stable under a control policy if $Q_i(T)/T \to 0$ for every queue $i$ as $T \to \infty$.

### II-A Asymptotic Notations

Let $f$ and $g$ be two functions defined on some subset of the real numbers. Then $f(T) = O(g(T))$ if $\limsup_{T \to \infty} f(T)/g(T) < \infty$. Similarly, $f(T) = \Omega(g(T))$ if $\liminf_{T \to \infty} f(T)/g(T) > 0$. Also, $f(T) = \Theta(g(T))$ if $f(T) = O(g(T))$ and $f(T) = \Omega(g(T))$. In addition, $f(T) = o(g(T))$ if $\lim_{T \to \infty} f(T)/g(T) = 0$, and if $f(T) = o(T)$ we say that $f$ is sublinear in $T$.

### II-B Performance Metrics

Our objective is to find a causal control policy that can maximize the network utility while keeping the total queue length small. Note that a network with adversarial dynamics may not have any steady state or well-defined time averages. Hence, it is crucial to understand the transient behavior of the network, and the traditional equilibrium-based performance metrics may not be appropriate in an adversarial setting. As a result, we introduce the notion of utility regret to measure the finite-time performance achieved by a control policy.

###### Definition 1 (Utility Regret).

Given the time horizon $T$, the utility regret achieved by a policy $\pi$ under a sequence of network events $\{\omega_0, \cdots, \omega_{T-1}\}$ is defined to be

$$R^{\pi}_T(\{\omega_0, \cdots, \omega_{T-1}\}) = \sum_{t=0}^{T-1} U(\alpha^*_t, \omega_t) - \sum_{t=0}^{T-1} U(\alpha^{\pi}_t, \omega_t), \qquad (4)$$

where $\{\alpha^*_t\}$ is an optimal solution to NUM generated by an "oracle" that knows the entire sequence of network events in advance.

In this setup, a policy $\pi$ is chosen first, and then the adversary selects the sequence of network events that maximizes the regret. Intuitively, the notion of utility regret captures the worst-case utility difference between a causal policy and an ideal $T$-slot lookahead non-causal policy.

Note that any optimal solution to NUM is a utility-maximizing policy subject to the constraint (3) that it clears all the backlogs within the time horizon. A causal control policy may trivially maximize the network utility by simply ignoring the stability constraint (3) (e.g., admitting all the exogenous traffic) such that the utility regret becomes zero or even negative. However, such behavior may significantly violate the stability constraint (3) and lead to a large total queue length. As a result, there is a tradeoff between the utility regret and the total queue length achieved by a causal control policy.

A desirable first-order characteristic of a "good" policy $\pi$ is that it simultaneously achieves sublinear utility regret and sublinear queue length w.r.t. the time horizon $T$, i.e., $R^{\pi}_T = o(T)$ and $\sum_i Q^{\pi}_i(T) = o(T)$. Sublinear utility regret guarantees that $R^{\pi}_T / T \to 0$ as the time horizon $T \to \infty$, meaning that the time-average utility gained under policy $\pi$ asymptotically approaches that under the optimal non-causal policy. In other words, the long-term time-average utility is optimal. Sublinear queue length ensures $\sum_i Q^{\pi}_i(T)/T \to 0$ as $T \to \infty$, which is equivalent to rate stability. Note that simultaneously achieving sublinear utility regret and sublinear queue length is equivalent to maximizing the long-term time-average utility subject to rate stability, which is the goal of traditional stochastic network optimization [4].

Note that simultaneously achieving sublinear utility regret and sublinear queue length is just a coarse-grained requirement for a “good” tradeoff between utility regret and queue length. In an adversarial setting with no steady state, the fine-grained growth rates of utility regret and queue length are equally important and should also be well balanced. A better tradeoff in terms of their growth rates implies that the policy has a better learning ability and can adapt to the adversarial environment faster.

Unfortunately, the following theorem shows that in general no causal policy can simultaneously achieve both sublinear utility regret and sublinear queue length.

###### Theorem 1.

For any causal policy $\pi$, there exists a sequence of network events $\{\omega_0, \cdots, \omega_{T-1}\}$ such that either the utility regret $R^{\pi}_T = \Omega(T)$ or the total queue length $\sum_i Q^{\pi}_i(T) = \Omega(T)$.

###### Proof.

We prove this theorem by considering a specific one-hop network with 2 users and constructing a sequence of adversarial network dynamics such that either the utility regret or the total queue length grows at least linearly with the time horizon $T$. More specifically, the time horizon is split into two parts. In the first part, the adversary generates some regular network events, lets the policy run, and observes the queue lengths of the two users. In the remaining slots, the adversary sets the capacity to zero for the user with the longer queue and creates sufficient capacity for the other user, such that the performance of the causal policy is significantly degraded while the "oracle" can still perform very well. See Appendix A for details. ∎

Theorem 1 shows that it is impossible to achieve sublinear utility regret while maintaining sublinear queue length if the adversary has unconstrained power in determining the network dynamics. As a result, in the following two sections, we develop two adversary models where the sequence of network events (i.e., the network dynamics) that the adversary can select is constrained to some "admissible" set. In Section III, we consider the $W$-constrained adversary model, which is an extension of the widely-known yet very stringent model used in Adversarial Queueing Theory. In Section IV, we develop a more relaxed adversary model called the $V_T$-constrained adversary. Lower bounds on the tradeoffs between utility regret and queue length, as well as the performance of some commonly-used algorithms, are analyzed under the two adversary models.

## III The W-Constrained Adversary Model

In this section, we investigate the $W$-constrained adversary model, which is an extension of the classical Adversarial Queueing Theory (AQT) model [8]. It places stringent constraints on the set of admissible network dynamics that the adversary can select, yet is analytically tractable, which facilitates our subsequent investigation of a more relaxed adversary model in Section IV. We first give the definition of $W$-constrained network dynamics.

###### Definition 2 (W-Constrained Dynamics).

Given a window size $W$, a sequence of network events $\{\omega_0, \cdots, \omega_{T-1}\}$ is $W$-constrained if

$$\sum_{\tau=t}^{t+W-1} a^*_i(\tau) \le \sum_{\tau=t}^{t+W-1} b^*_i(\tau), \quad \forall i \in \mathcal{N}, \; t \in \mathcal{T}, \qquad (5)$$

where $a^*_i(\tau) = a_i(\alpha^*_\tau, \omega_\tau)$, $b^*_i(\tau) = b_i(\alpha^*_\tau, \omega_\tau)$, and $\{\alpha^*_\tau\}$ is the optimal solution to NUM under the above sequence of network events.

Note that if there exist multiple optimal solutions to NUM, then constraint (5) is only required to be satisfied by at least one of them. Any network satisfying the above is called a $W$-constrained network. In other words, under the optimal (possibly non-causal) policy, the total amount of arrivals to each queue does not exceed the total amount of service offered to that queue during any window of $W$ slots.

The $W$-constrained adversary can only select the sequence of network events from the set of all $W$-constrained sequences.

In the following, we first provide a lower bound on the tradeoffs between utility regret and queue length under the $W$-constrained adversary model (Section III-A), and then analyze the tradeoffs achieved by several common control policies (Section III-B). Note that throughout this section we mainly focus on the dependence of the utility regret and queue length on $W$ and $T$, treating the number of users $N$ as a constant.

### III-A Lower Bound on the Tradeoffs

The following theorem provides a lower bound on the tradeoffs between utility regret and queue length under the $W$-constrained adversary model.

###### Theorem 2.

For any causal policy $\pi$, there exists a sequence of network events $\{\omega_0, \cdots, \omega_{T-1}\}$ such that

$$R^{\pi}_T(\{\omega_0, \cdots, \omega_{T-1}\}) + c \sum_i Q^{\pi}_i(T) \ge c' W,$$

where $c$ and $c'$ are positive constants.

###### Proof.

We prove this theorem by constructing a sequence of network events such that the lower bound is attained. The construction is similar to the one used in the proof of Theorem 1; the difference is that the constructed sequence of network events is $W$-constrained here. See Appendix B for the detailed proof. ∎

Note that if the window size $W$ is comparable with the time horizon $T$, i.e., $W = \Theta(T)$, the above theorem shows that no causal policy can simultaneously achieve sublinear utility regret and sublinear queue length under the $W$-constrained adversary model. On the other hand, if $W = o(T)$, there may exist some causal policy that attains sublinear utility regret and sublinear queue length simultaneously, which we investigate in the next section. In particular, we show that the above lower bound can be asymptotically attained by a causal policy.

### III-B Algorithm Performance in W-Constrained Networks

In this section, we analyze the tradeoffs between utility regret and queue length achieved by two network control algorithms under the $W$-constrained adversary model. The first is the well-known Drift-plus-Penalty algorithm [2], which was proved to achieve good utility-delay tradeoffs in stochastic networks. The second is a generalized version of the Tracking Algorithm [5, 6], which was originally proposed for Adversarial Queueing Theory. In particular, we show that the Tracking Algorithm attains the tradeoff lower bound in Theorem 2.

#### III-B1 Drift-plus-Penalty Algorithm

In each slot $t$, the Drift-plus-Penalty algorithm observes the current network event $\omega_t$ and the queue length vector $Q(t)$, and chooses the following control action $\alpha^{DP}_t$:

$$\alpha^{DP}_t = \arg\max_{\alpha_t \in \mathcal{D}_{\omega_t}} \; \sum_i Q_i(t) \big( b_i(\alpha_t, \omega_t) - a_i(\alpha_t, \omega_t) \big) + V U(\alpha_t, \omega_t), \qquad (6)$$

where $V \ge 0$ is a parameter controlling the tradeoff between utility regret and queue length. Note that $\sum_i Q_i(t)(b_i(\alpha_t, \omega_t) - a_i(\alpha_t, \omega_t))$ corresponds to the drift part while $V U(\alpha_t, \omega_t)$ is the penalty part.

The control action in the Drift-plus-Penalty algorithm can often be decomposed into several separate decisions. For example, in one-hop networks without routing, $\alpha_t$ determines the admitted exogenous arrival vector $a(t)$ and the service vector $b(t)$ in slot $t$. Suppose that the utility function has the form $U(\alpha_t, \omega_t) = \sum_i U_i(a_i(t))$. Then the Drift-plus-Penalty algorithm decomposes into the solutions of two sub-problems.

• (Admission Control) Choose

$$a(t) = \arg\max_{a} \sum_i \big( V U_i(a_i) - Q_i(t) a_i \big).$$

• (Resource Allocation and Scheduling) Choose

$$b(t) = \arg\max_{b} \sum_i Q_i(t) b_i.$$

The first part is usually a convex optimization problem while the second part corresponds to the MaxWeight policy [1].
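A minimal sketch of this decomposition over a small discrete set of admissible rates and schedules (the grids and numeric values below are hypothetical, and a linear utility stands in for a general concave $U_i$):

```python
def dpp_admission(Q, V, U_i, grid):
    """Admission control: per queue i, pick a_i maximizing
    V*U_i(a) - Q_i(t)*a over a small grid of admissible rates."""
    return [max(grid, key=lambda a: V * U_i(a) - q * a) for q in Q]

def maxweight_service(Q, schedules):
    """MaxWeight scheduling: pick the service vector b maximizing
    sum_i Q_i(t)*b_i over the feasible schedules."""
    return max(schedules, key=lambda b: sum(q * bi for q, bi in zip(Q, b)))

# Hypothetical two-queue slot with linear utility U_i(a) = a.
Q = [10.0, 1.0]
a_t = dpp_admission(Q, V=5.0, U_i=lambda x: x, grid=[0.0, 1.0, 2.0])
b_t = maxweight_service(Q, schedules=[(1.0, 0.0), (0.0, 1.0)])
```

With linear utility, the admission rule reduces to admitting traffic only where $V > Q_i(t)$ (here, queue 1 only), while MaxWeight serves the longer queue (here, queue 0), illustrating how long backlogs suppress admission and attract service.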

The following theorem gives the performance of the Drift-plus-Penalty algorithm in $W$-constrained networks.

###### Theorem 3.

In any $W$-constrained network, the Drift-plus-Penalty algorithm with an appropriately chosen parameter $V$ achieves sublinear utility regret and sublinear total queue length whenever $W = o(T)$.

###### Proof.

The proof is based on Lyapunov drift analysis. However, instead of considering the one-slot drift as in the traditional stochastic analysis, we derive upper bounds on the $W$-slot drift-plus-penalty term and make sample-path arguments. See Appendix C for details. ∎

There are several important observations about Theorem 3. First, if the parameter $V$ is set appropriately, then sublinear utility regret and sublinear queue length can be simultaneously achieved by the Drift-plus-Penalty algorithm in $W$-constrained networks as long as $W = o(T)$; tuning $V$ trades a smaller utility-regret bound against a larger queue-length bound, and vice versa.

Noticing that sublinear utility regret and sublinear queue length cannot be achieved simultaneously by any causal policy if $W = \Theta(T)$ (Theorem 2), we have the following corollary.

###### Corollary 1.

Under the $W$-constrained adversary model, sublinear utility regret and sublinear queue length are simultaneously achievable if and only if $W = o(T)$.

Second, the performance of the Drift-plus-Penalty algorithm can be much worse than the lower bound in Theorem 2. For example, the lower bound permits utility regret and total queue length that both grow only linearly in $W$, a tradeoff that is not achievable by the Drift-plus-Penalty algorithm. In the next section, we present an algorithm that has better performance and attains the lower bound.

#### III-B2 Tracking Algorithm

The tradeoff bounds achieved by the Drift-plus-Penalty algorithm are relatively loose compared to the lower bound in Theorem 2. In this section, we develop the Tracking Algorithm, which has better performance and attains the lower bound in Theorem 2.

The original Tracking Algorithm was proposed in [5, 6] to solve a scheduling problem under the Adversarial Queueing Theory model. However, it only works for a very specific network model: (i) the network has to be single-hop, where the arrival vector is independent of the control action; and (ii) the control action has to satisfy primary interference constraints, i.e., at most one link incident on each node can be activated in each slot. Next, we extend the original Tracking Algorithm to accommodate the general network model considered in this paper.

Let $\Omega$ be the set of all possible network events that could happen in each slot. In order for the Tracking Algorithm to work, the cardinality of $\Omega$ has to be finite (otherwise it can be discretized into a finite set, as in [5]). For example, in a single-hop network, suppose each network event corresponds to a pair $(\lambda(t), s(t))$, where $\lambda(t)$ is a vector of exogenous packet arrivals in slot $t$ and $s(t)$ is a vector of link states in slot $t$. If each arrival $\lambda_i(t)$ is an integer bounded by a constant and each link only has a finite number of states, then $|\Omega| < \infty$.

The Tracking Algorithm is given in Algorithm 1. It maintains an action queue for each type of network event $\omega \in \Omega$. The action queue stores the optimal actions that the Tracking Algorithm should have taken when network event $\omega$ occurred. Note that the sequence of optimal control actions cannot be calculated online, but it can be calculated every $W$ slots due to the window structure (5). In the Tracking Algorithm, the sequence of optimal actions during each window is added to the action queues in batch at the end of the window (steps 8-9). Here, the optimal actions during a window correspond to any optimal solution to (7) (which is also a part of the optimal solution to NUM). In each slot $t$, the Tracking Algorithm first observes the current network event $\omega_t$. If the corresponding action queue is not empty (i.e., there are some actions that should have been taken but have not been taken yet), the algorithm sets the control action to the first action in that queue, and the action is removed from the queue (steps 3-5). If the action queue is empty, the algorithm may take any feasible action; in our analysis, we assume that no action is taken when the action queue is empty.

$$\begin{aligned} \max \quad & \sum_{\tau=t-W+1}^{t} U(\alpha_\tau, \omega_\tau) \\ \text{s.t.} \quad & \sum_{\tau=t-W+1}^{t} a_i(\alpha_\tau, \omega_\tau) \le \sum_{\tau=t-W+1}^{t} b_i(\alpha_\tau, \omega_\tau), \ \forall i, \\ & \alpha_\tau \in \mathcal{D}_{\omega_\tau}, \ \forall \tau. \end{aligned} \qquad (7)$$
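The bookkeeping in Algorithm 1 can be sketched as follows. This is a simplified illustration: solving (7) for the window's optimal actions is abstracted away, and the event/action labels are hypothetical.

```python
from collections import defaultdict, deque

class TrackingSketch:
    """Per-event-type action queues, refilled in batch at window ends."""

    def __init__(self):
        self.action_queues = defaultdict(deque)

    def act(self, event):
        """Steps 3-5: replay a stored optimal action for this event type."""
        q = self.action_queues[event]
        return q.popleft() if q else None  # None: take no action

    def end_of_window(self, window_events, window_opt_actions):
        """Steps 8-9: enqueue the window's optimal actions by event type."""
        for ev, act in zip(window_events, window_opt_actions):
            self.action_queues[ev].append(act)

# Hypothetical usage with a window of W = 2 slots.
ta = TrackingSketch()
first = ta.act("e1")                          # nothing stored yet -> None
ta.end_of_window(["e1", "e2"], ["a1", "a2"])  # actions from solving (7)
replayed = ta.act("e1")                       # -> "a1"
```

Keying the stored actions by event type is what lets the algorithm replay an oracle action whenever the same event recurs, even though the events arrive in an adversarial order.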

The following theorem gives the tradeoff between utility regret and queue length achieved by the Tracking Algorithm under the $W$-constrained adversary model.

###### Theorem 4.

In any $W$-constrained network, the Tracking Algorithm achieves $O(W)$ utility regret and $O(W)$ total queue length.

###### Proof.

Since the Tracking Algorithm updates the optimal actions every $W$ slots and replays these actions whenever possible, the number of unfulfilled actions in the action queues at any time is at most $O(W)$. Thus, the performance gap between the Tracking Algorithm and the optimal policy is also $O(W)$. See Appendix D for details. ∎

There are several important observations about Theorem 4. First, under the $W$-constrained adversary model, sublinear utility regret and sublinear queue length can be simultaneously achieved by the Tracking Algorithm as long as $W = o(T)$. Moreover, the tradeoff achieved by the Tracking Algorithm is better than that of the Drift-plus-Penalty algorithm in terms of the dependence on $W$ and $T$: the Tracking Algorithm achieves $O(W)$ utility regret and $O(W)$ total queue length, a tradeoff that is not attainable by the Drift-plus-Penalty algorithm.

Second, the Tracking Algorithm asymptotically achieves the lower bound in Theorem 2, and thus asymptotically attains the optimal tradeoff between utility regret and queue length w.r.t. $W$ and $T$.

Third, the Tracking Algorithm needs to maintain an action queue for each type of network event, while the size of the network event space $|\Omega|$ may be exponential in the number of users $N$. As a result, the Tracking Algorithm may not be a practical algorithm. The purpose of presenting the Tracking Algorithm is to demonstrate that the lower bound in Theorem 2 can be asymptotically achieved by a causal policy. Note that Andrews and Zhang [5] proposed a method to remove the exponential dependence on $N$, at the expense of a much more involved algorithm.

Finally, the Tracking Algorithm described in Algorithm 1 only achieves one point on the tradeoff curve, since it only tracks the optimal solution to NUM. One approach to enable tunable tradeoffs is to relax the optimization problem (7). For example, the first constraint in (7) can be modified to

$$\sum_{\tau=t-W+1}^{t} a_i(\alpha_\tau, \omega_\tau) \le \sum_{\tau=t-W+1}^{t} b_i(\alpha_\tau, \omega_\tau) + V,$$

for some parameter $V \ge 0$. Clearly, by tuning the value of $V$, the optimal solution to the relaxed version of (7) achieves different tradeoffs; by tracking this solution, the Tracking Algorithm achieves tunable tradeoffs. The analysis of the tunable Tracking Algorithm is similar to the proof of Theorem 4 but requires more specific assumptions on the utility function, and is omitted due to space constraints.

Note that the above Tracking Algorithm requires the window size $W$ as a parameter. We discuss how to properly select the value of $W$ in Section IV-C2.

## IV The V_T-Constrained Adversary Model

The aforementioned $W$-constrained model is relatively restrictive, as the stringent constraints (5) have to be satisfied for every window of $W$ slots. In this section, we consider a more general adversary model where the window constraints (5) are relaxed.

The new adversary model is parameterized by the inherent variation in the sequence of network events, which is measured as follows. Given a sequence of network events $\{\omega_0, \cdots, \omega_{T-1}\}$ and a (possibly non-causal) policy $\pi$, we define

$$V^{\pi}(\{\omega_0, \cdots, \omega_{T-1}\}) = \max_{t \le T} \sum_i Q^{\pi}_i(t).$$

The function $V^{\pi}$ measures the peak total queue length achieved by policy $\pi$ along its sample path. We further define $V^*(\{\omega_0, \cdots, \omega_{T-1}\})$ to be the peak queue length along the sample path of the optimal solution to NUM under the sequence of network events $\{\omega_0, \cdots, \omega_{T-1}\}$. If there are multiple optimal solutions to NUM, the one with the smallest value of $V^*$ is considered. Note that $V^*$ depends only on the sequence of network events and measures its inherent variation.
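As a simple illustration, the variation measure is just the peak of the total backlog along a queue-length trajectory (the trajectory below is hypothetical):

```python
def peak_queue_length(trajectory):
    """V^pi: maximum over slots of the total queue length sum_i Q_i(t)."""
    return max(sum(Q) for Q in trajectory)

# Hypothetical two-queue trajectory over four slots.
traj = [[0, 0], [2, 1], [1, 4], [0, 0]]
peak = peak_queue_length(traj)  # -> 5
```

A sequence of events is "tame" in this sense exactly when even the oracle cannot avoid building up backlog beyond the budget at some point along its sample path.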

Now we define the notion of $V_T$-constrained network dynamics, where the value of $V^*$ is constrained by some budget $V_T$.

###### Definition 3 ($V_T$-Constrained Dynamics).

Given some $V_T \ge 0$, a sequence of network events $\{\omega_0, \cdots, \omega_{T-1}\}$ is $V_T$-constrained if

$$V^*(\{\omega_0, \cdots, \omega_{T-1}\}) \le V_T.$$

Any network satisfying the above is called a $V_T$-constrained network. A $V_T$-constrained adversary can only select sequences of network events from the set of all $V_T$-constrained sequences.

Note that we restrict the range of $V_T$ to $[0, NAT]$, since the peak queue length within $T$ slots is at most $NAT$; any larger value of $V_T$ has the same effect as $V_T = NAT$. Note also that the larger $V_T$ is, the more variation the network can have. By varying the value of $V_T$ from $0$ to $NAT$, the $V_T$-constrained adversary model covers a wide range of adversarial settings: from a strictly constrained adversary ($V_T = 0$, i.e., the arrivals do not exceed the services for each queue in every slot) to a completely unconstrained adversary ($V_T = \Theta(T)$).

In the following, we first provide a lower bound on the tradeoff between utility regret and queue length under the $V_T$-constrained adversary model in Section IV-A, and then analyze the performance of the Drift-plus-Penalty algorithm and the Tracking Algorithm in Section IV-B.

### IV-A Lower Bound on the Tradeoffs

The following theorem provides a lower bound on the tradeoff between utility regret and queue length under the $V_T$-constrained adversary model.

###### Theorem 5.

For any causal policy $\pi$, there exists a sequence of network events $\{\omega_0,\cdots,\omega_{T-1}\}$ such that

$$R_T^{\pi}(\{\omega_0,\cdots,\omega_{T-1}\}) + c\sum_i Q_i^{\pi}(T) \ge c' V_T,$$

where $c' > 0$ is some constant.

###### Proof.

The proof is the same as that of Theorem 2, except that the window size $W$ is replaced with $V_T$; it is therefore omitted for brevity. ∎

Theorem 5 shows that if $V_T = \Theta(T)$, then no causal policy can simultaneously achieve sublinear utility regret and sublinear queue length under the $V_T$-constrained adversary model. On the other hand, if $V_T = o(T)$, there might exist a causal policy that attains sublinear utility regret and sublinear queue length simultaneously; we investigate this in Section IV-B.

### IV-B Algorithm Performance in $V_T$-Constrained Networks

In this section, we analyze the tradeoffs between utility regret and queue length achieved by two algorithms in $V_T$-constrained networks: the Drift-plus-Penalty algorithm and the Tracking Algorithm. In particular, we show that both algorithms simultaneously achieve sublinear utility regret and sublinear queue length if $V_T = o(T)$.

#### IV-B1 Drift-plus-Penalty Algorithm

The Drift-plus-Penalty algorithm discussed in Section III-B can be applied directly in the $V_T$-constrained setting. The following theorem gives the tradeoff between utility regret and queue length achieved by the Drift-plus-Penalty algorithm under the $V_T$-constrained adversary model.
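For concreteness, one slot of a drift-plus-penalty decision can be sketched as below. This is a generic illustration of the technique, not the paper's exact algorithm: we assume a finite per-slot action set standing in for $\mathcal{D}_{\omega_t}$, and the action tuples and queue model are hypothetical.

```python
# One-slot Drift-plus-Penalty decision over a finite action set: pick the
# action maximizing  V * U(alpha) - sum_i Q_i * (a_i(alpha) - b_i(alpha)),
# i.e., utility weighted by V minus the queue-weighted drift.
def dpp_step(actions, queues, v_param):
    """actions: list of (utility, admit_vector, serve_vector) tuples."""
    def score(act):
        util, admit, serve = act
        drift = sum(q * (a - b) for q, a, b in zip(queues, admit, serve))
        return v_param * util - drift
    return max(actions, key=score)

# Two candidate actions for a 2-link slot (illustrative numbers):
actions = [
    (4.0, (2, 2), (2, 0)),  # higher utility but builds backlog on link 2
    (2.0, (2, 0), (2, 0)),  # lower utility, no backlog growth
]
print(dpp_step(actions, queues=[0, 10], v_param=1.0))    # small V: backs off
print(dpp_step(actions, queues=[0, 10], v_param=100.0))  # large V: chases utility
```

The parameter $V$ trades utility against queue growth, which is exactly the tradeoff quantified in Theorem 6 below.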

###### Theorem 6.

In any $V_T$-constrained network, the Drift-plus-Penalty algorithm with parameter $V$ achieves utility regret and total queue length that are both sublinear in $T$ whenever $V_T = o(T)$; the precise orders depend on $V_T$ and the choice of $V$.

###### Proof.

We first divide the time horizon into frames of equal length. Then we apply the analysis used in the $W$-constrained adversary model and derive bounds on the per-frame drift-plus-penalty term, which further lead to upper bounds on utility regret and queue length. The frame length is carefully chosen to optimize these bounds. See Appendix E for details. ∎

There are several observations about Theorem 6. First, the Drift-plus-Penalty algorithm achieves sublinear utility regret and sublinear queue length under the $V_T$-constrained adversary model whenever $V_T = o(T)$, provided the parameter $V$ is set appropriately as a function of $V_T$. Recall that sublinear utility regret and sublinear queue length cannot be simultaneously achieved by any causal policy if $V_T = \Theta(T)$ (Theorem 5). We have the following corollary.

###### Corollary 2.

Under the $V_T$-constrained adversary model, sublinear utility regret and sublinear queue length are simultaneously achievable if and only if $V_T = o(T)$.

Second, the Drift-plus-Penalty algorithm does not attain the lower bound in Theorem 5: one of the tradeoff points implied by the lower bound, with utility regret and total queue length both of the same order as $V_T$, is not achievable by the Drift-plus-Penalty algorithm. In fact, although the Drift-plus-Penalty algorithm can achieve sublinear utility regret and sublinear queue length, the tradeoff bound in Theorem 6 is relatively loose. In the next section, we show that the Tracking Algorithm achieves a better tradeoff bound than the Drift-plus-Penalty algorithm.

#### IV-B2 Tracking Algorithm

The Tracking Algorithm introduced under the $W$-constrained adversary model requires that the window constraints (5) be satisfied for some window size $W$. However, there may be no window structure under the $V_T$-constrained adversary model, so the Tracking Algorithm cannot be applied directly in $V_T$-constrained networks. We slightly modify the Tracking Algorithm in two respects. First, the window size $W$ is set as a function of $T$ and $V_T$ under the $V_T$-constrained adversary model. Second, in step 8 of the original Tracking Algorithm, the optimization problem (7) is modified to be

$$\max \; \sum_{\tau=t-W+1}^{t} U(\alpha_\tau,\omega_\tau) \quad \text{s.t.} \quad \sum_{\tau=t-W+1}^{t} a_i(\alpha_\tau,\omega_\tau) \le \sum_{\tau=t-W+1}^{t} b_i(\alpha_\tau,\omega_\tau) + V_T, \;\forall i; \qquad \alpha_\tau \in \mathcal{D}_{\omega_\tau}, \;\forall \tau. \qquad (8)$$

In particular, the first constraint in (7) is relaxed by allowing bursts of up to $V_T$. Note that by the definition of $V_T$-constrained networks, the optimal solution to NUM is also a feasible solution to (8). Under the above setting, the utility regret and the total queue length achieved by the Tracking Algorithm in $V_T$-constrained networks are given by the following theorem (as discussed in Section III-B2, the set of possible network events must be finite in order for the Tracking Algorithm to work).
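The relaxed window problem (8) can be illustrated with a brute-force solver over a finite per-slot action set. This is purely a sketch: enumeration is exponential in the window size and is used here only to make the burst allowance $V_T$ concrete; the action tuples are hypothetical stand-ins for $U$, $a_i$, $b_i$, and $\mathcal{D}_{\omega_\tau}$.

```python
from itertools import product

# Brute-force version of (8): over a window of W slots, maximize total
# utility subject to cumulative admissions <= cumulative services + V_T,
# per queue.
def solve_window(per_slot_actions, v_budget):
    """per_slot_actions[tau]: list of (utility, admit_vec, serve_vec)."""
    best, best_plan = float("-inf"), None
    for plan in product(*per_slot_actions):  # one action per slot
        n = len(plan[0][1])
        feasible = all(
            sum(act[1][i] for act in plan)
            <= sum(act[2][i] for act in plan) + v_budget
            for i in range(n)
        )
        if feasible:
            util = sum(act[0] for act in plan)
            if util > best:
                best, best_plan = util, plan
    return best, best_plan

# Single-queue, 2-slot window: admitting in both slots is a burst of 2.
slots = [
    [(1.0, (1,), (0,)), (0.0, (0,), (0,))],
    [(1.0, (1,), (0,)), (0.5, (0,), (1,))],
]
print(solve_window(slots, v_budget=0)[0])  # 1.5: burst disallowed
print(solve_window(slots, v_budget=2)[0])  # 2.0: burst of 2 allowed
```

With $V_T = 0$ the constraint reduces to the original window constraint in (7); a positive budget admits exactly the bursty optimal solutions that Definition 3 permits.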

###### Theorem 7.

Under the $V_T$-constrained adversary model, the modified Tracking Algorithm achieves utility regret and total queue length that are both sublinear in $T$ whenever $V_T = o(T)$, with orders (depending on $V_T$) that improve on those of Theorem 6.

###### Proof.

The proof is similar to the analysis under the $W$-constrained adversary model, except that an additional $V_T$ term appears in the first constraint of (8). See Appendix F for details. ∎

There are several important observations about Theorem 7. First, the Tracking Algorithm simultaneously achieves sublinear utility regret and sublinear queue length whenever $V_T = o(T)$. Second, in $V_T$-constrained networks the Tracking Algorithm outperforms the Drift-plus-Penalty algorithm: with an appropriate choice of the window size, it attains a tradeoff point between utility regret and queue length that the Drift-plus-Penalty algorithm cannot achieve. Finally, the Tracking Algorithm still does not attain the tradeoff lower bound in Theorem 5; finding a causal policy that closes this gap remains an open problem.

Note that the above Tracking Algorithm requires $V_T$ as a parameter. We discuss how to properly select the value of $V_T$ in Section IV-C2.

### IV-C Discussions

#### IV-C1 Relationship between Adversary Models

The $V_T$-constrained adversary model generalizes the $W$-constrained adversary model: any sequence of network events that is $W$-constrained must also be $V_T$-constrained with $V_T = O(W)$, due to the window structure (note that the peak queue length under the optimal policy is then at most $O(W)$). The analysis under the $V_T$-constrained adversary model therefore also gives a more general condition for sublinear utility regret and sublinear queue length.

#### IV-C2 Choosing Parameters for the Tracking Algorithm

Note that the Tracking Algorithm requires $V_T$ as a parameter. Unfortunately, in practice it is impossible to know the precise value of $V_T$ for a given network in advance. To alleviate this issue, we can search for the correct value of $V_T$. Recall that $V_T$ ranges from $0$ to $\Theta(T)$. One may then perform a binary search for the correct value of $V_T$ by running the Tracking Algorithm with different trial values over multiple episodes within the time horizon (e.g., the horizon of $T$ slots can be divided into episodes of equal length). Similar techniques can be applied when the Tracking Algorithm is used in $W$-constrained networks, where the value of $W$ is required as an input parameter.
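The episode-based search just described can be sketched as a standard binary search. The feedback predicate `episode_ok` is a hypothetical stand-in for a run of the Tracking Algorithm over one episode that reports whether the trial budget was adequate (e.g., no constraint violation or queue blow-up was observed); its exact form is an assumption of this sketch.

```python
# Binary search for the smallest adequate V_T in [lo, hi], driven by
# per-episode feedback from the algorithm under test.
def search_vt(lo, hi, episode_ok, tol=1.0):
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if episode_ok(mid):
            hi = mid   # trial budget sufficed; try a smaller one
        else:
            lo = mid   # trial budget too small; try a larger one
    return hi

# Toy feedback: suppose any trial budget >= 37 suffices.
print(search_vt(0.0, 128.0, lambda v: v >= 37.0))  # → 37.0
```

Only $O(\log(T/\mathrm{tol}))$ episodes are consumed by the search, so the overhead is negligible relative to the horizon.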

## V Simulations

In this section, we empirically validate the theoretical bounds derived in this paper and compare the performance of the Drift-plus-Penalty algorithm and the Tracking Algorithm.

Figure 1 illustrates the growth of the total queue length and the utility regret with the time horizon $T$ under the Drift-plus-Penalty algorithm (with different values of the parameter $V$) and the Tracking Algorithm. First, when $W = o(T)$, the Drift-plus-Penalty algorithm can simultaneously achieve sublinear utility regret and sublinear queue length if the parameter $V$ is set appropriately. Note that setting $V$ to a very large value still achieves sublinear utility regret and sublinear queue length, even though the theoretical bound on queue length (see Theorem 3) is at least linear in $T$ in that case, which shows that the performance upper bound is not tight in this scenario. The Tracking Algorithm also simultaneously achieves sublinear utility regret and sublinear queue length when $W = o(T)$. However, when $W = \Theta(T)$, both algorithms fail to achieve desirable performance: either the utility regret or the queue length grows linearly with $T$. In fact, the lower bound in Theorem 2 shows that no causal policy can achieve both sublinear utility regret and sublinear queue length if $W = \Theta(T)$.

Figure 2 shows the tradeoffs between utility regret and queue length under the Drift-plus-Penalty algorithm and the Tracking Algorithm, where we fix the time horizon $T$ and the window size $W$. Note that for the Drift-plus-Penalty algorithm we plot a tradeoff curve (since it achieves different tradeoffs as the parameter $V$ is tuned), while only a single tradeoff point is plotted for the Tracking Algorithm. It is observed that the Tracking Algorithm achieves a better tradeoff point, one that is not achievable by the Drift-plus-Penalty algorithm. In addition, the figure validates the theoretical lower bound for any causal policy (Theorem 2) and the theoretical performance upper bounds for both algorithms (Theorems 3 and 4).

## VI Conclusions

In this paper, we focus on optimizing network utility within a finite time horizon under adversarial network models. We show that no causal policy can simultaneously achieve sublinear utility regret and sublinear queue length if the network dynamics are unconstrained, and we investigate two constrained adversary models: the restrictive $W$-constrained adversary model and the more relaxed $V_T$-constrained adversary model. Lower bounds on the tradeoffs between utility regret and queue length are derived under both adversary models, and the performance of two control policies, the Drift-plus-Penalty algorithm and the Tracking Algorithm, is analyzed. It is shown that the Tracking Algorithm asymptotically attains the optimal tradeoffs under the $W$-constrained adversary model and achieves a better tradeoff bound than the Drift-plus-Penalty algorithm under the $V_T$-constrained adversary model.

## References

• [1] L. Tassiulas and A. Ephremides, “Stability properties of constrained queueing systems and scheduling policies for maximum throughput in multihop radio networks,” IEEE Transactions on Automatic Control, vol. 37, no. 12, pp. 1936–1948, 1992.
• [2] M. J. Neely, E. Modiano, and C.-P. Li, “Fairness and optimal stochastic control for heterogeneous networks,” IEEE/ACM Transactions on Networking (TON), vol. 16, no. 2, pp. 396–409, 2008.
• [3] Y. Zou, J. Zhu, X. Wang, and L. Hanzo, “A survey on wireless security: Technical challenges, recent advances, and future trends,” Proceedings of the IEEE, vol. 104, no. 9, pp. 1727–1765, 2016.
• [4] M. J. Neely, “Stochastic network optimization with application to communication and queueing systems,” Synthesis Lectures on Communication Networks, vol. 3, no. 1, pp. 1–211, 2010.
• [5] M. Andrews and L. Zhang, “Scheduling over a time-varying userdependent channel with applications to high speed wireless data,” in The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002. Proceedings., 2002, pp. 293–302.
• [6] M. Andrews and L. Zhang, “Scheduling over nonstationary wireless channels with finite rate sets,” IEEE/ACM Transactions on Networking, vol. 14, no. 5, pp. 1067-1077, Oct 2006.
• [7] R. L. Cruz, “A calculus for network delay. i. network elements in isolation,” IEEE Transactions on information theory, vol. 37, no. 1, pp. 114–131, 1991.
• [8] A. Borodin, J. Kleinberg, P. Raghavan, M. Sudan, and D. P. Williamson, “Adversarial queuing theory,” Journal of the ACM (JACM), vol. 48, no. 1, pp. 13–38, 2001.
• [9] M. Andrews, B. Awerbuch, A. Fernández, T. Leighton, Z. Liu, and J. Kleinberg, “Universal-stability results and performance bounds for greedy contention-resolution protocols,” Journal of the ACM (JACM), vol. 48, no. 1, pp. 39–69, 2001.
• [10] V. Cholvi and J. Echagüe, “Stability of FIFO networks under adversarial models: State of the art,” Computer Networks, vol. 51, no. 15, pp. 4460–4474, 2007.
• [11] M. Andrews, K. Jung, and A. Stolyar, “Stability of the max-weight routing and scheduling protocol in dynamic networks and at critical loads,” in Proceedings of the Thirty-ninth Annual ACM Symposium on Theory of Computing, ser. STOC ’07. ACM, 2007, pp. 145–154.
• [12] S. Lim, K. Jung, and M. Andrews, “Stability of the max-weight protocol in adversarial wireless networks,” IEEE/ACM Trans. Netw., vol. 22, no. 6, pp. 1859–1872, Dec. 2014.
• [13] M. J. Neely, “Universal scheduling for networks with arbitrary traffic, channels, and mobility,” in Proceedings of the 49th IEEE Conference on Decision and Control (CDC). IEEE, 2010, pp. 1822–1829.
• [14] S. Shalev-Shwartz, “Online learning and online convex optimization,” Foundations and Trends in Machine Learning, vol. 4, no. 2, pp. 107–194, 2012.

## Appendix A Proof of Theorem 1

We prove this theorem by constructing a sequence of network events such that either the utility regret or the total queue length grows at least linearly with the time horizon $T$. Consider a single-hop network with 2 links. In each slot $t$, the central controller observes the current network event $\omega_t = (A(t), S(t))$, where $A(t)$ is the exogenous arrival vector and $S(t)$ is the channel rate vector for the two links in slot $t$. Then the controller makes an admission control decision and a scheduling decision. The constraint on the admission control action is $0 \le a_i(t) \le A_i(t)$ for each link $i$, and the constraint on the scheduling decision is that at most one of the two links can be served in each slot. The network utility is a function of the admitted traffic vector $a(t)$, i.e., $U(a(t)) = \sum_i U_i(a_i(t))$, where each $U_i$ is concave and strictly increasing. In particular, any subderivative of $U_i$ over the admissible range is lower bounded by some constant $c > 0$. Typical examples of such utility functions are the identity $U_i(x) = x$ (total throughput) and logarithmic utilities (proportional fairness).

Without loss of generality, assume that the time horizon $T$ is an even number. The exogenous arrivals and channel rates in the first $T/2$ slots are

$$A_1(t) = A_2(t) = 2, \quad S_1(t) = S_2(t) = 2, \quad \forall t = 0,\cdots,\tfrac{T}{2}-1.$$

For any causal policy $\pi$, let $B_1^{\pi}$ and $B_2^{\pi}$ be the numbers of packets cleared over links 1 and 2 during the first $T/2$ slots, respectively. Also let $A_1^{\pi}$ and $A_2^{\pi}$ be the numbers of packets admitted to links 1 and 2 during the first $T/2$ slots, respectively. Then the queue lengths after the first $T/2$ slots are

$$Q_i^{\pi}(T/2) = A_i^{\pi} - B_i^{\pi}, \quad i = 1,2.$$

Under the scheduling constraint, the total number of packets that can be cleared in the first $T/2$ slots is at most $2 \cdot T/2 = T$. Then we have $B_1^{\pi} + B_2^{\pi} \le T$, which implies that $\min\{B_1^{\pi}, B_2^{\pi}\} \le T/2$. Define $i^{*} = \arg\min_i B_i^{\pi}$. In the remaining $T/2$ slots, the adversary can set

$$A_{i^{*}}(t) = 0, \quad S_{i^{*}}(t) = 0, \quad t = T/2,\cdots,T-1.$$

For the other link (its index is denoted by $i'$), the adversary can set

$$A_{i'}(t) = 0, \quad S_{i'}(t) = 2, \quad t = T/2,\cdots,T-1.$$

Since there is no capacity to clear any packet over link $i^{*}$ in the remaining slots, we have

$$Q_{i^{*}}^{\pi}(T) = Q_{i^{*}}^{\pi}(T/2) = A_{i^{*}}^{\pi} - B_{i^{*}}^{\pi}. \qquad (9)$$

Note that the optimal non-causal policy can admit all the exogenous traffic while ending the horizon with zero total queue length by serving link $i^{*}$ in the first $T/2$ slots and serving link $i'$ in the remaining $T/2$ slots. As a result, the utility regret is

$$R_T^{\pi}(\{\omega_0,\cdots,\omega_{T-1}\}) = \sum_{t=0}^{T-1}\big[U(a^{*}(t)) - U(a^{\pi}(t))\big] = \sum_{t=0}^{T-1}\sum_i \big[U_i(a_i^{*}(t)) - U_i(a_i^{\pi}(t))\big] \ge c\sum_{t=0}^{T-1}\sum_i \big(a_i^{*}(t) - a_i^{\pi}(t)\big) = c\,(2T - A_1^{\pi} - A_2^{\pi}), \qquad (10)$$

where the inequality is due to the concavity of the utility function and the fact that the subderivatives of the utility function are lower bounded by $c$. The last equality holds because the total traffic admitted by the optimal policy is $2T$, while the total traffic admitted by the causal policy $\pi$ is $A_1^{\pi} + A_2^{\pi}$. It then follows that

$$R_T^{\pi}(\{\omega_0,\cdots,\omega_{T-1}\}) + c\sum_i Q_i^{\pi}(T) \ge R_T^{\pi}(\{\omega_0,\cdots,\omega_{T-1}\}) + c\,Q_{i^{*}}^{\pi}(T) \ge c\,(2T - A_1^{\pi} - A_2^{\pi} + A_{i^{*}}^{\pi} - B_{i^{*}}^{\pi}) = c\,(2T - A_{i'}^{\pi} - B_{i^{*}}^{\pi}) \ge c\,(2T - T - T/2) = c\,T/2,$$

where the second inequality is due to (9) and (10), and the last inequality holds because the total admitted traffic over link $i'$ is at most $T$ and the amount of traffic cleared over link $i^{*}$ is at most $T/2$ by the definition of $i^{*}$. Therefore, it is impossible for any causal policy to simultaneously achieve both sublinear utility regret and sublinear queue length, since otherwise the left-hand side above would be $o(T)$, contradicting the $cT/2$ lower bound.
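The adversarial construction above can be simulated directly. The sketch below assumes $U_i(x) = x$ (so the subgradient bound is $c = 1$) and pits the instance against one concrete causal policy of our choosing ("admit everything, always serve link 0"); the adversary then starves whichever link was served less, and the bound regret $+\; c\sum_i Q_i(T) \ge cT/2$ must hold.

```python
# Simulate the two-link lower-bound instance against a fixed causal policy.
def run_instance(T):
    assert T % 2 == 0
    q = [0, 0]
    cleared = [0, 0]
    # First T/2 slots: arrivals (2,2), rates (2,2); the policy admits all
    # traffic and always serves link 0.
    for _ in range(T // 2):
        q = [q[0] + 2, q[1] + 2]
        served = min(q[0], 2)
        q[0] -= served
        cleared[0] += served
    # Adversary starves the link the policy served less (i*), and gives the
    # other link rate 2 with no new arrivals.
    i_star = 0 if cleared[0] <= cleared[1] else 1
    other = 1 - i_star
    for _ in range(T // 2):
        q[other] = max(q[other] - 2, 0)
    # With U_i(x) = x, this policy admits exactly what the oracle admits,
    # so its utility regret is 0; the damage shows up entirely in the queue.
    regret = 0
    return regret + sum(q)

T = 1000
print(run_instance(T) >= T // 2)  # True: the c*T/2 lower bound holds
```

Here the policy pays nothing in regret but is left with $\Theta(T)$ backlog on the starved link, matching the tradeoff the proof establishes.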

Remark: Note that the above construction requires knowledge of the time horizon $T$. We can eliminate the dependence on the time horizon by using the standard doubling trick (see Section 2.3.1 in [14]).
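The doubling trick mentioned in the remark can be sketched as follows. The function `run_phase` is a hypothetical stand-in for restarting a horizon-dependent algorithm with the phase length as its horizon; the $\sqrt{L}$ per-phase regret used in the example is an illustrative assumption.

```python
import math

# Doubling trick: run the algorithm on phases of geometrically growing
# length 1, 2, 4, ..., so no phase needs the true horizon T in advance.
def doubling(total_slots, run_phase):
    t, k, regret = 0, 0, 0.0
    while t < total_slots:
        phase_len = min(2 ** k, total_slots - t)
        regret += run_phase(phase_len)  # restart with horizon = phase_len
        t += phase_len
        k += 1
    return regret

# If each phase of length L incurs sqrt(L) regret, the geometric sum keeps
# the total within a constant factor of sqrt(T).
T = 1 << 20
total = doubling(T, lambda L: math.sqrt(L))
print(total <= 4 * math.sqrt(T))  # True: geometric-sum bound
```

The same rate is thus preserved up to a constant factor, which is why the lower-bound construction loses nothing by assuming $T$ is known.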

## Appendix B Proof of Theorem 2

We prove this theorem by constructing a sequence of network events for which the lower bound is attained. Consider the same network setting as in the proof of Theorem 1. Without loss of generality, assume that the window size $W$ is an even number. The exogenous arrivals and channel rates in the first $W/2$ slots are

$$A_1(t) = A_2(t) = 2, \quad S_1(t) = S_2(t) = 2, \quad \forall t = 0,\cdots,\tfrac{W}{2}-1.$$

For any causal policy $\pi$, let $B_1^{\pi}$ and $B_2^{\pi}$ be the numbers of packets cleared over links 1 and 2 during the first $W/2$ slots, respectively. Also let $A_1^{\pi}$ and $A_2^{\pi}$ be the numbers of packets admitted to links 1 and 2 during the first $W/2$ slots, respectively. Then the queue lengths after the first $W/2$ slots are

$$Q_i^{\pi}(W/2) = A_i^{\pi} - B_i^{\pi}, \quad i = 1,2.$$

Under the scheduling constraint, the total number of packets that can be cleared in the first $W/2$ slots is at most $2 \cdot W/2 = W$. Then we have $B_1^{\pi} + B_2^{\pi} \le W$, which implies that $\min\{B_1^{\pi}, B_2^{\pi}\} \le W/2$. Define $i^{*} = \arg\min_i B_i^{\pi}$. In the remaining $T - W/2$ slots, the adversary can set

$$A_{i^{*}}(t) = 0, \quad S_{i^{*}}(t) = 0, \quad t = W/2,\cdots,T-1.$$

For the other link (its index is denoted by $i'$), the adversary can set

$$A_{i'}(t) = 0, \quad S_{i'}(t) = 2, \quad t = W/2,\cdots,T-1.$$

Since there is no capacity to clear any packet over link $i^{*}$ in the remaining slots, we have

$$Q_{i^{*}}^{\pi}(T) = Q_{i^{*}}^{\pi}(W/2) = A_{i^{*}}^{\pi} - B_{i^{*}}^{\pi}. \qquad (11)$$

Note that the optimal non-causal policy can admit all the exogenous traffic while satisfying the window constraints (5) by serving link $i^{*}$ in the first $W/2$ slots and serving link $i'$ in the remaining