# On Distributed Online Convex Optimization with Sublinear Dynamic Regret and Fit

In this work, we consider a distributed online convex optimization problem with time-varying (potentially adversarial) constraints. A set of nodes jointly aims to minimize a global objective function, which is the sum of local convex functions. The objective and constraint functions are revealed locally to the nodes at each time step, after an action is taken. Consequently, the constraints cannot be satisfied instantaneously, and we reformulate the problem so that they are satisfied in the long term. To this end, we propose a distributed primal-dual mirror descent based approach, in which the primal and dual updates are carried out locally at all the nodes, followed by sharing and mixing of the primal variables via communication with the immediate neighbors. To quantify the performance of the proposed algorithm, we utilize the challenging, but more realistic, metrics of dynamic regret and fit. Dynamic regret measures the cumulative loss incurred by the algorithm relative to the best dynamic strategy, while fit measures the long-term cumulative constraint violation. Without assuming the restrictive Slater's condition, we show that the proposed algorithm achieves sublinear dynamic regret and fit under mild, commonly used assumptions.


## I Introduction

Many problems of practical interest, including network resource allocation [1], target tracking [2], network routing [3], online regression [4], and spam filtering [5], can be framed in the Online Convex Optimization (OCO) framework. The OCO framework, first introduced in [3], aims to minimize a time-varying convex objective function which is revealed to the observer sequentially. For a detailed review of OCO, please see [5, 4]. In this work, we consider a constrained OCO problem with time-varying (potentially adversarial) constraints.

Recently, distributed OCO frameworks have gained popularity as they distribute the computational and memory resources across multiple nodes rather than having a central node perform all the operations [6, 2, 7, 8, 9]. We consider the constrained OCO problem in a distributed framework, where the convex objective is assumed to be decomposed and distributed across a set of multiple communicating agents. Each agent takes its own action with the goal of minimizing the dynamically varying global function while satisfying its individual constraints. Next, we discuss the related work along with the performance metrics we use to evaluate the performance of the proposed algorithm.

### I-A Related Work

Regret: The performance in OCO problems is quantified in terms of how well the agent does, over time, as compared to an offline system. In other words, how much the agent “regrets” not having had, from the beginning, the information which was revealed to it post hoc. Since regret is cumulative over time, an algorithm whose regret grows sublinearly in time asymptotically achieves zero average loss. It is naturally desirable to compare against an offline system whose action(s) are “optimal” in some sense.

Static Regret: The initial work on OCO, starting with [3, 4, 5], focused almost exclusively on static regret, which uses the optimal static solution, in hindsight, as the benchmark. In other words, the fictitious offline adversary, w.r.t. which the online system measures its regret, chooses the best fixed strategy, assuming access to the entire information which is revealed to the online system over the time horizon $T$:

 $$\mathrm{Reg}^s_T \triangleq \sum_{t=1}^{T} f_t(\mathbf{x}_t) - \min_{\mathbf{x}\in\mathcal{X}} \sum_{t=1}^{T} f_t(\mathbf{x}).$$

Under standard regularity conditions, a tight upper bound of $O(\sqrt{T})$ has been shown on the static regret for general OCO problems [3, 10]. However, for applications such as online parameter estimation or tracking moving targets, where the quantity of interest itself evolves over time, comparison with a static benchmark is not sufficient.

This deficiency led to the development of dynamic regret [11, 12]. Rather than comparing the performance relative to a fixed optimal strategy, a more demanding benchmark is used. More precisely, at each time instant, the fictitious adversary utilizes one-step look-ahead information to adopt the optimal strategy at the current time instant:

 $$\mathrm{Reg}^d_T \triangleq \sum_{t=1}^{T} f_t(\mathbf{x}_t) - \sum_{t=1}^{T} \min_{\mathbf{x}\in\mathcal{X}} f_t(\mathbf{x}).$$
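To make the gap between the two benchmarks concrete, the following toy computation (ours, not from the paper) compares static and dynamic regret for scalar quadratic losses $f_t(x) = (x - \theta_t)^2$; the drift sequence `theta` and the naive fixed learner are illustrative assumptions.

```python
import numpy as np

# Toy comparison of the two regret notions for f_t(x) = (x - theta_t)^2.
rng = np.random.default_rng(0)
theta = rng.uniform(-1.0, 1.0, size=50)   # drifting per-step minimizers theta_t
actions = np.zeros_like(theta)            # a naive learner that always plays 0

losses = (actions - theta) ** 2

# Static benchmark: the best fixed x in hindsight minimizes sum_t (x - theta_t)^2,
# attained at the mean of theta.
static_opt = np.sum((theta.mean() - theta) ** 2)
# Dynamic benchmark: the per-step minimizer is x_t^* = theta_t, with zero loss.
dynamic_opt = 0.0

reg_static = losses.sum() - static_opt
reg_dynamic = losses.sum() - dynamic_opt

# The dynamic benchmark is more demanding, so dynamic regret always dominates.
assert reg_dynamic >= reg_static >= 0.0
```

Since $\sum_t \min_{\mathbf{x}} f_t(\mathbf{x}) \leq \min_{\mathbf{x}} \sum_t f_t(\mathbf{x})$, dynamic regret upper bounds static regret for any action sequence.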

In this work, we adopt dynamic regret as the performance metric. It must, however, be noted that, in the worst case, it is impossible to achieve sublinear dynamic regret [3]. For such problems, the growth of the dynamic regret is captured by a regularity measure, $C^*_T$, which quantifies the variation of the minimizer sequence over time (see (36) in Theorem V.5).

Constraints: The conventional approaches for OCO are based on projection-based gradient descent-like algorithms. However, with functional inequality constraints (as opposed to simple convex feasible-set constraints), the projection step itself is computationally intensive. This led to the development of primal-dual algorithms for OCO [13, 14, 15]. Instead of attempting to satisfy the constraints at each time instant, the constraints are satisfied in the long run. In other words, the cumulative accumulation of instantaneous constraint violations (often simply called fit) is shown to be sublinear in $T$. This formulation allows constraint violations at some instants to be “taken care of” by strictly feasible actions at other times.¹ (¹Some more recent works [15] have considered a more stringent constraint violation metric, which clips each instantaneous violation and hence does not allow such cancellations.)
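The cancellation effect can be illustrated numerically. The per-step constraint slacks below are hypothetical values chosen by us; the snippet contrasts clipping the cumulative sum (the fit-style metric) with clipping every term (the stricter style of [15]).

```python
import numpy as np

# Hypothetical per-step constraint values: g_t(x_t) > 0 means a violation at t.
g = np.array([0.5, -0.3, 0.4, -0.8, 0.1])

# Long-term fit clips only the *cumulative* sum, so strictly feasible steps
# (negative entries) can cancel earlier violations.
fit = max(np.sum(g), 0.0)            # [sum_t g_t(x_t)]_+
# The stricter metric clips each term, so violations never cancel.
strict = np.sum(np.maximum(g, 0.0))  # sum_t [g_t(x_t)]_+

# Clipping the sum is always the weaker requirement.
assert fit <= strict
```

Here the cumulative sum is negative, so the fit is zero even though three individual steps violated the constraint.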

Initially, the constraints were assumed to be static across time [13, 14]. However, subsequent literature [16, 1] demonstrated that the analysis for primal-dual methods can be generalized to handle even time-varying inequality constraints. Minor variations of primal-dual methods, which replace the dual update step with virtual-queue (modified Lagrange multiplier) updates, have also been proposed to handle time-varying [17] and stochastic constraints [18].

Distributed OCO Problems: So far, we have only discussed centralized problems. Suppose the OCO system has a network of agents, and local cost (and possibly constraint) functions are revealed to each agent over time. The global objective is to minimize the total cost function while also satisfying all the constraints, where each agent can communicate only with the agents in its immediate neighborhood. This distributed OCO problem is more challenging, and much less studied in the literature, than the centralized problem.

Distributed OCO problems with static set constraints have been widely studied in recent years [6, 2, 7, 8, 9]. Here again, the literature on distributed OCO with dynamic regret is much sparser than that for static regret. The authors in [2] have proposed a dynamic mirror descent based algorithm, where primal update steps are alternated with local consensus steps. The authors in [8] have proposed a distributed primal-dual algorithm for the OCO problem with coupled inequality constraints, where the constraint functions are static over time. This has been generalized to time-varying coupled constraints in [9], where the authors have shown sublinearity of regret and fit, both w.r.t. dynamic and static benchmarks. However, to the best of our knowledge, the distributed OCO problem with a dynamic benchmark, even with static non-coupled inequality constraints, has so far not been considered in the literature.

### I-B Our Contributions

In this work, we consider a distributed online convex optimization problem, where both the cost functions and the time-varying inequality constraints are revealed locally to the individual nodes. We propose a primal-dual mirror-descent based algorithm, which alternates between the local primal and dual update steps and the consensus steps to mix the local primal variables with the immediate neighbors. Importantly, we show that the proposed algorithm achieves sublinear dynamic regret and fit.

### I-C Paper Organization and Notations

The paper is organized as follows: the problem formulation is discussed in Section II, along with the definitions of the performance metrics. In Section III, we provide some background results and the assumptions required for providing theoretical guarantees. We propose our primal-dual mirror descent based algorithm in Section IV, followed by the theoretical results in Section V. Finally, we conclude the paper in Section VI.

Notations: Vectors are denoted by lowercase bold letters, e.g., $\mathbf{x}$, while matrices are denoted by uppercase bold letters, e.g., $\mathbf{W}$. The set of positive integers is represented by $\mathbb{N}_+$. We use $\mathbb{R}^m_+$ to denote the $m$-dimensional non-negative orthant. For $n \in \mathbb{N}_+$, the set $\{1, 2, \dots, n\}$ is denoted by $[n]$. We denote by $\|\cdot\|$ the Euclidean norm for vectors, and the induced 2-norm for matrices. $\mathbf{0}$ denotes a zero vector, where the dimension is clear from the context. $[\cdot]_+$ denotes the projection onto $\mathbb{R}^m_+$.

## II Problem Formulation

We consider a network of $n$ agents. At each time instant $t$, each agent $i$ takes an action $\mathbf{x}_{i,t} \in \mathcal{X}$, where the set $\mathcal{X} \subset \mathbb{R}^d$ is fixed across time and across all the nodes. Then, a set of local loss functions $f_{i,t} : \mathcal{X} \to \mathbb{R}$ with $i \in [n]$ are revealed to the individual nodes, leading to the individual loss $f_{i,t}(\mathbf{x}_{i,t})$ at node $i$. Additionally, another set of local functions $g_{i,t} : \mathcal{X} \to \mathbb{R}^m$ with $i \in [n]$ are revealed, corresponding to the local constraints $g_{i,t}(\mathbf{x}) \leq \mathbf{0}_m$. The network objective is to minimize the global average $f_t$ of the local cost functions, while also satisfying all the local constraints:

 $$\min_{\mathbf{x}_t \in \mathcal{X}} \; f_t(\mathbf{x}_t) \triangleq \frac{1}{n}\sum_{i=1}^{n} f_{i,t}(\mathbf{x}_t) \quad \text{subject to} \quad g_{i,t}(\mathbf{x}_t) \leq \mathbf{0}_m, \; \forall\, i \in [n]. \tag{1}$$

Since the objective is to minimize the global function $f_t$, the nodes need to communicate among themselves. We next define the metrics used to measure the performance of the proposed approach.

### II-A Performance Metrics: Dynamic Regret and Fit

We use the recently defined notion of dynamic regret [12, 11] to measure the performance relative to a time-varying benchmark.

 $$\mathrm{Reg}^d_T \triangleq \frac{1}{n}\sum_{i=1}^{n}\sum_{t=1}^{T} f_t(\mathbf{x}_{i,t}) - \sum_{t=1}^{T} f_t(\mathbf{x}^*_t), \tag{2}$$

where $\mathbf{x}_{i,t}$ is the local action of agent $i$ at time $t$, while $\mathbf{x}^*_t$ is a solution of the following problem:

 $$\mathbf{x}^*_t \in \operatorname*{argmin}_{\mathbf{x}\in\mathcal{X}} \left\{ f_t(\mathbf{x}) \;\middle|\; g_{i,t}(\mathbf{x}) \leq \mathbf{0}_m, \; \forall\, i \in [n] \right\}. \tag{3}$$

As pointed out earlier, it is impossible to satisfy the time-varying constraints instantaneously, since they are revealed only after the actions are taken. As a surrogate, to ensure that the local constraints are satisfied in the long run, we use a distributed extension of fit as the performance metric. Fit has been used in the context of both time-invariant [13] and time-varying constraints [1, 6] for single-node problems. Our definition is motivated by the one given in [7] for continuous-time problems. It measures the average accumulation of constraint violations over time:

 $$\mathrm{Fit}^d_T \triangleq \frac{1}{n}\sum_{i=1}^{n}\frac{1}{n}\sum_{j=1}^{n}\left\|\Big[\sum_{t=1}^{T} g_{i,t}(\mathbf{x}_{j,t})\Big]_+\right\|. \tag{4}$$

Here, $[\sum_{t=1}^{T} g_{i,t}(\mathbf{x}_{j,t})]_+$ is the cumulative constraint violation at agent $i$, if it adopts the actions of agent $j$. Note that requiring the fit to be small is different from requiring the constraint to be met at every time instant $t$.

Next, we discuss the assumptions and some background required for the analysis of the proposed OCO framework. Note that the following assumptions are standard for decentralized OCO problems [2, 9].

## III Background and Assumptions

### III-A Network

We assume the agents are connected via an undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$. $\mathcal{V}$ denotes the set of nodes of the graph, each of which represents an agent. $\mathcal{E}$ is the set of edges between the nodes; $(i,j) \in \mathcal{E}$ implies that nodes $i$ and $j$ are connected in the graph. The set of edges has an associated weight matrix $\mathbf{W} \in \mathbb{R}^{n \times n}$, such that

 $$[\mathbf{W}]_{ij} > 0 \;\text{ if } (i,j) \in \mathcal{E}, \qquad [\mathbf{W}]_{ij} = 0 \;\text{ otherwise}. \tag{5}$$

The set of neighbors of node $i$ is, therefore, defined as $\mathcal{N}_i \triangleq \{ j : (i,j) \in \mathcal{E} \}$. Note that $i \in \mathcal{N}_i$.

Assumption A: The network is connected, and the weight matrix $\mathbf{W}$ is symmetric and doubly stochastic, i.e.,

 $$\sum_{i=1}^{n} [\mathbf{W}]_{ij} = \sum_{j=1}^{n} [\mathbf{W}]_{ij} = 1. \tag{6}$$
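A standard way to construct a matrix satisfying Assumption A on an undirected graph is via Metropolis weights; this construction is not from the paper, and the 4-node path graph below is a hypothetical example.

```python
import numpy as np

# Metropolis weights: symmetric, doubly stochastic, positive on self-loops.
edges = [(0, 1), (1, 2), (2, 3)]   # hypothetical 4-node path graph
n = 4

deg = np.zeros(n)
for i, j in edges:
    deg[i] += 1
    deg[j] += 1

W = np.zeros((n, n))
for i, j in edges:
    # Off-diagonal weight depends on the larger of the two node degrees.
    W[i, j] = W[j, i] = 1.0 / (1.0 + max(deg[i], deg[j]))
# Self-weights absorb the remaining mass so each row sums to one.
np.fill_diagonal(W, 1.0 - W.sum(axis=1))

assert np.allclose(W, W.T)              # symmetric
assert np.allclose(W.sum(axis=0), 1.0)  # column sums, as in (6)
assert np.allclose(W.sum(axis=1), 1.0)  # row sums, as in (6)
```

Because each agent only needs the degrees of its immediate neighbors, these weights can be computed locally, which fits the distributed setting.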

Next, we discuss the properties of the local cost functions and constraints.

### III-B Local Objective Functions and Constraints

Assumption B: We assume the following conditions on the set $\mathcal{X}$ and the objective and constraint functions.

1. The set $\mathcal{X}$ is convex and compact. Therefore, there exists a positive constant $d(\mathcal{X})$ such that

 $$\|\mathbf{x} - \mathbf{y}\| \leq d(\mathcal{X}), \quad \forall\, \mathbf{x}, \mathbf{y} \in \mathcal{X}. \tag{7}$$
2. The local node functions $f_{i,t}$ and $g_{i,t}$ are Lipschitz continuous on $\mathcal{X}$, i.e., there exists $L > 0$ such that

 $$\|f_{i,t}(\mathbf{x}) - f_{i,t}(\mathbf{y})\| \leq L\|\mathbf{x} - \mathbf{y}\|, \qquad \|g_{i,t}(\mathbf{x}) - g_{i,t}(\mathbf{y})\| \leq L\|\mathbf{x} - \mathbf{y}\|, \tag{8}$$

for any $\mathbf{x}, \mathbf{y} \in \mathcal{X}$, $i \in [n]$, and $t$.

3. The functions $f_{i,t}$ and $g_{i,t}$ are convex and uniformly bounded on the set $\mathcal{X}$, i.e., there exists a constant $F > 0$ such that

 $$\|f_{i,t}(\mathbf{x})\| \leq F, \quad \|g_{i,t}(\mathbf{x})\| \leq F, \quad \forall\, \mathbf{x} \in \mathcal{X}, \; i \in [n], \; t. \tag{9}$$

4. The gradients $\nabla f_{i,t}$ and $\nabla g_{i,t}$ exist and are uniformly bounded on $\mathcal{X}$, i.e., there exists a constant $G > 0$ such that

 $$\|\nabla f_{i,t}(\mathbf{x})\| \leq G, \quad \|\nabla g_{i,t}(\mathbf{x})\| \leq G, \quad \forall\, \mathbf{x} \in \mathcal{X}, \; i \in [n], \; t. \tag{10}$$

Next, we briefly discuss the Bregman Divergence measure, which is crucial to the proposed mirror descent based approach.

### III-C Bregman Divergence

Suppose we are given a $\mu$-strongly convex function $\mathcal{R} : \mathcal{X} \to \mathbb{R}$, i.e., $\mathcal{R}(\mathbf{x}) \geq \mathcal{R}(\mathbf{y}) + \langle \mathbf{x} - \mathbf{y}, \nabla \mathcal{R}(\mathbf{y}) \rangle + \frac{\mu}{2}\|\mathbf{x} - \mathbf{y}\|^2$, $\forall\, \mathbf{x}, \mathbf{y} \in \mathcal{X}$. The Bregman divergence w.r.t. $\mathcal{R}$ is defined as

 $$D_{\mathcal{R}}(\mathbf{x}, \mathbf{y}) \triangleq \mathcal{R}(\mathbf{x}) - \mathcal{R}(\mathbf{y}) - \langle \mathbf{x} - \mathbf{y}, \nabla \mathcal{R}(\mathbf{y}) \rangle. \tag{11}$$

Since $\mathcal{R}$ is $\mu$-strongly convex, for any $\mathbf{x}, \mathbf{y} \in \mathcal{X}$,

 $$D_{\mathcal{R}}(\mathbf{x}, \mathbf{y}) \geq \frac{\mu}{2}\|\mathbf{x} - \mathbf{y}\|^2. \tag{12}$$

We assume the following conditions on $D_{\mathcal{R}}$.

Assumption C:

1. Separate convexity property [19]: Given $\mathbf{x}, \mathbf{y}_1, \dots, \mathbf{y}_m \in \mathcal{X}$ and scalars $\alpha_1, \dots, \alpha_m$ on the $m$-dimensional probability simplex, the Bregman divergence satisfies

 $$D_{\mathcal{R}}\Big(\mathbf{x}, \sum_{i=1}^{m} \alpha_i \mathbf{y}_i\Big) \leq \sum_{i=1}^{m} \alpha_i D_{\mathcal{R}}(\mathbf{x}, \mathbf{y}_i). \tag{13}$$
2. The Bregman divergence satisfies the following Lipschitz continuity condition [20]:

 $$|D_{\mathcal{R}}(\mathbf{x}, \mathbf{y}) - D_{\mathcal{R}}(\mathbf{z}, \mathbf{y})| \leq K\|\mathbf{x} - \mathbf{z}\|, \tag{14}$$

for any $\mathbf{x}, \mathbf{y}, \mathbf{z} \in \mathcal{X}$. This condition is satisfied if $\mathcal{R}$ is Lipschitz continuous on $\mathcal{X}$. Consequently,

 $$D_{\mathcal{R}}(\mathbf{x}, \mathbf{y}) \leq K d(\mathcal{X}), \quad \forall\, \mathbf{x}, \mathbf{y} \in \mathcal{X}, \tag{15}$$

where $d(\mathcal{X})$ is defined in (7).
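Two standard instances make the definition concrete; the snippet below (ours, for illustration) evaluates (11) for the squared Euclidean norm and for negative entropy, and numerically checks the strong-convexity lower bound (12).

```python
import numpy as np

def bregman(R, gradR, x, y):
    """Bregman divergence D_R(x, y) as in (11)."""
    return R(x) - R(y) - np.dot(x - y, gradR(y))

# Example 1: R(x) = 0.5*||x||^2 is 1-strongly convex, and D_R(x,y) = 0.5*||x-y||^2.
R_sq = lambda v: 0.5 * np.dot(v, v)
grad_sq = lambda v: v
x, y = np.array([1.0, 2.0]), np.array([0.5, -1.0])
d_sq = bregman(R_sq, grad_sq, x, y)
assert np.isclose(d_sq, 0.5 * np.linalg.norm(x - y) ** 2)

# Example 2: negative entropy on the probability simplex; its Bregman
# divergence is the KL divergence, which satisfies the lower bound (12)
# with mu = 1 (via Pinsker's inequality and ||.||_2 <= ||.||_1).
R_ent = lambda p: np.sum(p * np.log(p))
grad_ent = lambda p: np.log(p) + 1.0
p, q = np.array([0.2, 0.3, 0.5]), np.array([0.4, 0.4, 0.2])
kl = bregman(R_ent, grad_ent, p, q)
assert np.isclose(kl, np.sum(p * np.log(p / q)))   # D_R reduces to KL(p||q)
assert kl >= 0.5 * np.linalg.norm(p - q) ** 2      # lower bound (12)
```

Choosing $\mathcal{R}$ as the squared Euclidean norm recovers ordinary projected gradient descent as a special case of mirror descent.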

We next state a result on the Bregman divergence from [9], which is crucial to our analysis.

###### Lemma III.1.

Let $\mathcal{R}$ be a $\mu$-strongly convex function on $\mathcal{X}$. Also, assume $\mathcal{X}$ is a closed, convex set in $\mathbb{R}^d$, and $h : \mathcal{X} \to \mathbb{R}$ is a convex function such that $\nabla h(\mathbf{x})$ exists $\forall\, \mathbf{x} \in \mathcal{X}$. Then, given $\mathbf{z} \in \mathcal{X}$, the regularized Bregman projection

 $$\mathbf{y} = \operatorname*{argmin}_{\mathbf{x}\in\mathcal{X}} \left\{ h(\mathbf{x}) + D_{\mathcal{R}}(\mathbf{x}, \mathbf{z}) \right\} \tag{16}$$

satisfies the following inequality:

 $$h(\mathbf{y}) - h(\mathbf{x}) \leq D_{\mathcal{R}}(\mathbf{x}, \mathbf{z}) - D_{\mathcal{R}}(\mathbf{x}, \mathbf{y}) - D_{\mathcal{R}}(\mathbf{y}, \mathbf{z}), \quad \forall\, \mathbf{x} \in \mathcal{X}. \tag{17}$$

### III-D Projection

For a set $\mathcal{A} \subseteq \mathbb{R}^d$, the projection operator is defined as

 $$P_{\mathcal{A}}(\mathbf{y}) \triangleq \operatorname*{argmin}_{\mathbf{x}\in\mathcal{A}} \|\mathbf{x} - \mathbf{y}\|^2, \tag{18}$$

for any $\mathbf{y} \in \mathbb{R}^d$. For closed and convex $\mathcal{A}$, the projection always exists and is unique. If $\mathcal{A} = \mathbb{R}^d_+$, the projection is denoted by $[\cdot]_+$, and it satisfies

 $$\left\|[\mathbf{x}]_+ - [\mathbf{y}]_+\right\| \leq \|\mathbf{x} - \mathbf{y}\|, \quad \forall\, \mathbf{x}, \mathbf{y} \in \mathbb{R}^d. \tag{19}$$
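For the non-negative orthant, the projection (18) is simply componentwise clipping, and the non-expansiveness property (19) is easy to verify numerically (our illustration, not from the paper):

```python
import numpy as np

# Projection onto the non-negative orthant: componentwise clipping at zero.
proj = lambda v: np.maximum(v, 0.0)

rng = np.random.default_rng(1)
x, y = rng.normal(size=5), rng.normal(size=5)

# Non-expansiveness (19): projecting can only shrink distances.
assert np.linalg.norm(proj(x) - proj(y)) <= np.linalg.norm(x - y)
```

This cheap projection is exactly what makes the dual update of the algorithm inexpensive, in contrast to projecting the primal variable onto a general constraint set.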

## IV Distributed Primal-Dual Mirror Descent based Algorithm

We next discuss the proposed distributed primal-dual mirror descent based algorithm for online convex optimization with time-varying constraints. The pseudo-code is outlined in Algorithm 1, which runs in parallel at all the nodes. At the end of time $t$, $\mathbf{x}_{i,t}$ is the action (primal variable) at node $i$. Following this, the local functions $f_{i,t}$ and $g_{i,t}$ are revealed to the agent. The corresponding function values and gradients are utilized to carry out the updates in the next time step $t+1$. First, each agent performs the primal update locally (Step 8). This is followed by the dual update (Step 11). Note that the projection $[\cdot]_+$ ensures that the dual variable lies in the non-negative orthant $\mathbb{R}^m_+$. At the end of each time step, an average consensus step is taken across the nodes, where the updated local primal variables are received from the neighbors to compute the action $\mathbf{x}_{i,t+1}$.

###### Remark 1.

Note that the primal and dual update steps employ different step sizes, $\alpha_t$ and $\gamma_t$, respectively. This idea originated in [14] and provides flexibility in the trade-off between the bounds on dynamic regret and fit.
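The structure of one round can be sketched in code. This is a simplified, hypothetical rendering with the Euclidean Bregman divergence (so the mirror step reduces to projected gradient); the box constraint $\mathcal{X} = [\mathrm{lo}, \mathrm{hi}]^d$, the step numbering, and all parameter values are our illustrative assumptions, not the paper's exact Algorithm 1.

```python
import numpy as np

def local_round(x, q, grad_f, grad_g, g_val, W, alpha, beta, gamma,
                lo=-1.0, hi=1.0):
    """One synchronous round: local primal/dual updates, then consensus.

    x: (n, d) primal variables; q: (n, m) dual variables.
    grad_f: (n, d); grad_g: (n, m, d); g_val: (n, m) revealed at time t.
    """
    n = x.shape[0]
    y = np.empty_like(x)
    q = q.copy()  # avoid mutating the caller's dual variables
    for i in range(n):
        # Primal mirror-descent step on the local Lagrangian ("Step 8"):
        # with D_R(x, y) = 0.5*||x - y||^2 this is a projected gradient step.
        direction = grad_f[i] + grad_g[i].T @ q[i]
        y[i] = np.clip(x[i] - alpha * direction, lo, hi)
        # Regularized dual ascent, projected onto R^m_+ ("Step 11"); the
        # -beta*q[i] penalty keeps the multipliers bounded, as in (20).
        q[i] = np.maximum(q[i] + gamma * (g_val[i] - beta * q[i]), 0.0)
    # Consensus: mix the updated primal variables with immediate neighbors.
    x_next = W @ y
    return x_next, q
```

A usage sketch: with a doubly stochastic `W`, repeatedly calling `local_round` with the freshly revealed gradients drives the local iterates toward consensus while the dual variables track the accumulated constraint violations.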

In the next section, we bound the dynamic regret and fit which result from Algorithm 1, and show them to be sublinear in the time horizon $T$.

## V Dynamic Regret and Fit Bounds

First, we discuss some intermediate results required to show the sublinearity of the dynamic regret and fit. We have omitted the proofs due to space limitations. Our analysis closely follows [2] and [9].

### V-A Some Intermediate Results

###### Lemma V.1.

Suppose Assumption B holds. Then, $\forall\, t$ and $i \in [n]$, the iterates generated by Algorithm 1 satisfy

 $$\|\mathbf{q}_{i,t}\| \leq \frac{F}{\beta_t}, \tag{20}$$

 $$\frac{\Delta_{t+1}}{2\gamma_{t+1}} \leq \frac{nB^2\gamma_{t+1}}{2} + \sum_{i=1}^{n} \mathbf{q}_{i,t}^T [\nabla g_{i,t}(\mathbf{x}_{i,t})](\mathbf{y}_{i,t+1} - \mathbf{x}_{i,t}) + \Big(\frac{G^2\alpha_{t+1}}{\mu} + \frac{\beta_{t+1}}{2}\Big)\sum_{i=1}^{n} \|\mathbf{q}_i\|^2 + \sum_{i=1}^{n} (\mathbf{q}_{i,t} - \mathbf{q}_i)^T g_{i,t}(\mathbf{x}_{i,t}) + \frac{\mu}{4\alpha_{t+1}}\sum_{i=1}^{n} \|\mathbf{y}_{i,t+1} - \mathbf{x}_{i,t}\|^2, \tag{21}$$

where $B$ is a constant,

 $$\Delta_{t+1} \triangleq \sum_{i=1}^{n} \left[ \|\mathbf{q}_{i,t+1} - \mathbf{q}_i\|^2 - (1 - \gamma_{t+1}\beta_{t+1})\|\mathbf{q}_{i,t} - \mathbf{q}_i\|^2 \right],$$

and $\mathbf{q}_1, \dots, \mathbf{q}_n$ are arbitrary vectors in $\mathbb{R}^m_+$.

###### Remark 2.

The penalty term in the dual update (Step 11, Algorithm 1) helps in upper bounding the local dual variables, as in (20). This idea was initially used in [13], and it removes the requirement of Slater's condition. $\Delta_{t+1}$ measures the regularized drift of the local dual variables. See [11] and [9] for similar results in centralized and distributed contexts, respectively.

Next, we sum the left hand side of (21) over $t = 1, \dots, T$ to get

 $$\sum_{t=1}^{T} \frac{\Delta_{t+1}}{2\gamma_{t+1}} = \frac{1}{2}\sum_{t=1}^{T}\Big(\frac{1}{\gamma_t} - \frac{1}{\gamma_{t+1}} + \beta_{t+1}\Big)\sum_{i=1}^{n}\|\mathbf{q}_{i,t} - \mathbf{q}_i\|^2 - \frac{1}{2}\sum_{i=1}^{n}\Big[\frac{1}{\gamma_1}\|\mathbf{q}_{i,1} - \mathbf{q}_i\|^2 - \frac{1}{\gamma_{T+1}}\|\mathbf{q}_{i,T+1} - \mathbf{q}_i\|^2\Big]. \tag{22}$$

Recall that $\mathbf{q}_{i,1} = \mathbf{0}_m$, $\forall\, i \in [n]$. We combine (21) and (22), and define $g_c$ such that

 $$g_c(\mathbf{q}_1, \dots, \mathbf{q}_n) \triangleq \sum_{i=1}^{n} \mathbf{q}_i^T\Big(\sum_{t=1}^{T} g_{i,t}(\mathbf{x}_{i,t})\Big) - \Big[\frac{1}{2\gamma_1} + \sum_{t=1}^{T}\Big(\frac{G^2\alpha_{t+1}}{\mu} + \frac{\beta_{t+1}}{2}\Big)\Big]\sum_{i=1}^{n}\|\mathbf{q}_i\|^2 \leq \frac{nB^2}{2}\sum_{t=1}^{T}\gamma_{t+1} + \sum_{t=1}^{T}\sum_{i=1}^{n} \mathbf{q}_{i,t}^T[\nabla g_{i,t}(\mathbf{x}_{i,t})](\mathbf{y}_{i,t+1} - \mathbf{x}_{i,t}) + \sum_{t=1}^{T}\sum_{i=1}^{n} \mathbf{q}_{i,t}^T g_{i,t}(\mathbf{x}_{i,t}) + \sum_{t=1}^{T}\frac{\mu}{4\alpha_{t+1}}\sum_{i=1}^{n}\|\mathbf{y}_{i,t+1} - \mathbf{x}_{i,t}\|^2 - \frac{1}{2}\sum_{t=1}^{T}\Big(\frac{1}{\gamma_t} - \frac{1}{\gamma_{t+1}} + \beta_{t+1}\Big)\sum_{i=1}^{n}\|\mathbf{q}_{i,t} - \mathbf{q}_i\|^2. \tag{23}$$

The function $g_c$ will be used later in Lemma V.4 to upper bound both the dynamic regret and the fit, by appropriately choosing the vectors $\mathbf{q}_1, \dots, \mathbf{q}_n$.

Before looking at the primal updates, we first consider one of the constituent terms in (2).

 $$f_t(\mathbf{x}_{i,t}) - f_t(\mathbf{x}^*_t) = f_t(\mathbf{x}_{i,t}) - f_t(\bar{\mathbf{x}}_t) + f_t(\bar{\mathbf{x}}_t) - f_t(\mathbf{x}^*_t) \leq L\|\mathbf{x}_{i,t} - \bar{\mathbf{x}}_t\| + f_t(\bar{\mathbf{x}}_t) - f_t(\mathbf{x}^*_t) \tag{24}$$
 $$= \frac{1}{n}\sum_{j=1}^{n}\left\{ f_{j,t}(\bar{\mathbf{x}}_t) - f_{j,t}(\mathbf{x}^*_t) + f_{j,t}(\mathbf{x}_{j,t}) - f_{j,t}(\mathbf{x}_{j,t}) \right\} + L\|\mathbf{x}_{i,t} - \bar{\mathbf{x}}_t\| \leq \frac{1}{n}\sum_{j=1}^{n}\left\{ f_{j,t}(\mathbf{x}_{j,t}) - f_{j,t}(\mathbf{x}^*_t) \right\} + L\|\mathbf{x}_{i,t} - \bar{\mathbf{x}}_t\| + \frac{L}{n}\sum_{j=1}^{n}\|\mathbf{x}_{j,t} - \bar{\mathbf{x}}_t\|. \tag{25}$$

We use Assumption (B2) to obtain both (24) and (25). Now, from the definition of dynamic regret (2), we get

 $$\mathrm{Reg}^d_T \leq \frac{1}{n}\sum_{i=1}^{n}\sum_{t=1}^{T}\frac{1}{n}\sum_{j=1}^{n}\left\{ f_{j,t}(\mathbf{x}_{j,t}) - f_{j,t}(\mathbf{x}^*_t) \right\} + \frac{2L}{n}\sum_{i=1}^{n}\sum_{t=1}^{T}\|\mathbf{x}_{i,t} - \bar{\mathbf{x}}_t\|. \tag{26}$$

Next, we upper bound both terms in (26). The first term is bounded in the following lemma.

###### Lemma V.2.

Suppose Assumptions A-C hold, and let $\{\mathbf{x}_{i,t}\}$ be the sequence generated by Algorithm 1. Then,

 $$\sum_{t=1}^{T}\sum_{i=1}^{n}\left[ f_{i,t}(\mathbf{x}_{i,t}) - f_{i,t}(\mathbf{x}^*_t) \right] \leq \frac{nG^2}{\mu}\sum_{t=1}^{T}\alpha_{t+1} - \sum_{t=1}^{T}\sum_{i=1}^{n}\frac{\mu}{4\alpha_{t+1}}\|\mathbf{y}_{i,t+1} - \mathbf{x}_{i,t}\|^2 - \sum_{t=1}^{T}\sum_{i=1}^{n}\mathbf{q}_{i,t}^T\left[ g_{i,t}(\mathbf{x}_{i,t}) + \nabla g_{i,t}(\mathbf{x}_{i,t})(\mathbf{y}_{i,t+1} - \mathbf{x}_{i,t}) \right] + \sum_{i=1}^{n}\Big[\frac{1}{\alpha_2}D_{\mathcal{R}}(\mathbf{x}^*_1, \mathbf{x}_{i,1}) - \frac{1}{\alpha_{T+2}}D_{\mathcal{R}}(\mathbf{x}^*_{T+1}, \mathbf{x}_{i,T+1})\Big] + \frac{nK}{\alpha_{T+2}}\sum_{t=1}^{T}\|\mathbf{x}^*_{t+1} - \mathbf{x}^*_t\| + \frac{nK d(\mathcal{X})}{\alpha_{T+2}}. \tag{27}$$

Next, we upper bound the second term in (26). This is the consensus error of the primal variables.

###### Lemma V.3.

(Network Error): Suppose Assumptions A-C hold. Then, the local estimates generated by Algorithm 1 satisfy

 $$\|\mathbf{x}_{i,t} - \bar{\mathbf{x}}_t\| \leq \sum_{\tau=0}^{t-1} \sqrt{n}\,\sigma_2(\mathbf{W})^{t-\tau}\,\frac{G\alpha_{\tau+1}}{\mu}\Big(1 + \frac{F}{\beta_{\tau+1}}\Big), \tag{28}$$

$\forall\, i \in [n]$ and $t$, where $\bar{\mathbf{x}}_t \triangleq \frac{1}{n}\sum_{j=1}^{n}\mathbf{x}_{j,t}$, and $\sigma_2(\mathbf{W})$ is the second largest eigenvalue of $\mathbf{W}$ in magnitude.

###### Remark 3.

The network error bound in (28) is independent of the node index $i$. The dependence on $\sigma_2(\mathbf{W})$ captures the speed at which the iterates mix: the smaller the value of $\sigma_2(\mathbf{W})$, the faster the network error diminishes. Moreover, the choice of the primal step sizes $\alpha_t$ and the dual regularization parameters $\beta_t$ plays a crucial role in bounding the network error. As we shall see in Theorem V.5, carefully choosing these leads to sublinear regret and fit.

Next, we combine (23) and Lemma V.2, resulting in two intermediate bounds which will subsequently be needed to bound the dynamic regret and the fit, respectively.

###### Lemma V.4.

Suppose Assumptions A-C hold. Then, the sequences generated by Algorithm 1 satisfy

 $$\sum_{t=1}^{T}\sum_{i=1}^{n}\left( f_{i,t}(\mathbf{x}_{i,t}) - f_{i,t}(\mathbf{x}^*_t) \right) \leq \frac{nB^2}{2}\sum_{t=1}^{T}\gamma_{t+1} + \frac{nG^2}{\mu}\sum_{t=1}^{T}\alpha_{t+1} + \sum_{i=1}^{n}\Big[\frac{1}{\alpha_2}D_{\mathcal{R}}(\mathbf{x}^*_1, \mathbf{x}_{i,1}) - \frac{1}{\alpha_{T+2}}D_{\mathcal{R}}(\mathbf{x}^*_{T+1}, \mathbf{x}_{i,T+1})\Big] + \frac{nK}{\alpha_{T+2}}\sum_{t=1}^{T}\|\mathbf{x}^*_{t+1} - \mathbf{x}^*_t\| + \frac{nK d(\mathcal{X})}{\alpha_{T+2}} - \frac{1}{2}\sum_{t=1}^{T}\Big(\frac{1}{\gamma_t} - \frac{1}{\gamma_{t+1}} + \beta_{t+1}\Big)\sum_{i=1}^{n}\|\mathbf{q}_{i,t}\|^2, \tag{29}$$

and

 $$\sum_{i=1}^{n}\left\|\Big[\sum_{t=1}^{T} g_{i,t}(\mathbf{x}_{i,t})\Big]_+\right\|^2 \leq 4\Big[\frac{1}{2\gamma_1} + \sum_{t=1}^{T}\Big(\frac{G^2\alpha_{t+1}}{\mu} + \frac{\beta_{t+1}}{2}\Big)\Big]\Big\{ 2nFT + \frac{nB^2}{2}\sum_{t=1}^{T}\gamma_{t+1} + \frac{nG^2}{\mu}\sum_{t=1}^{T}\alpha_{t+1} + \sum_{i=1}^{n}\Big[\frac{1}{\alpha_2}D_{\mathcal{R}}(\mathbf{x}^*_1, \mathbf{x}_{i,1}) - \frac{1}{\alpha_{T+2}}D_{\mathcal{R}}(\mathbf{x}^*_{T+1}, \mathbf{x}_{i,T+1})\Big] + \frac{nK}{\alpha_{T+2}}\sum_{t=1}^{T}\|\mathbf{x}^*_{t+1} - \mathbf{x}^*_t\| + \frac{nK d(\mathcal{X})}{\alpha_{T+2}} - \frac{1}{2}\sum_{t=1}^{T}\Big(\frac{1}{\gamma_t} - \frac{1}{\gamma_{t+1}} + \beta_{t+1}\Big)\sum_{i=1}^{n}\|\mathbf{q}_{i,t} - \bar{\mathbf{q}}_i\|^2 \Big\}. \tag{30}$$
###### Remark 4.

(29) follows by adding (23) and (27), and substituting $\mathbf{q}_i = \mathbf{0}_m$, $\forall\, i \in [n]$. Similarly, (30) is obtained by adding (23) and (27), and substituting

 $$\bar{\mathbf{q}}_i = \frac{\Big[\sum_{t=1}^{T} g_{i,t}(\mathbf{x}_{i,t})\Big]_+}{2\Big[\frac{1}{2\gamma_1} + \sum_{t=1}^{T}\big(\frac{G^2\alpha_{t+1}}{\mu} + \frac{\beta_{t+1}}{2}\big)\Big]}, \quad \forall\, i \in [n]. \tag{31}$$

Before presenting our final result, we need the following upper bound on the fit:

 $$\frac{1}{n}\sum_{i=1}^{n}\frac{1}{n}\sum_{j=1}^{n}\left\|\Big[\sum_{t=1}^{T} g_{i,t}(\mathbf{x}_{j,t})\Big]_+\right\|^2 \leq 2\Big[2L\sum_{t=1}^{T}\|\mathbf{x}_{i,t} - \bar{\mathbf{x}}_t\|\Big]^2 + \frac{2}{n}\sum_{i=1}^{n}\left\|\Big[\sum_{t=1}^{T} g_{i,t}(\mathbf{x}_{i,t})\Big]_+\right\|^2. \tag{32}$$

This follows from the Lipschitz continuity of the constraint functions (Assumption (B2)). Since we have bounded both terms in (32) (the first term in Lemma V.3, and the second term in Lemma V.4), we are now ready to present our final result on the sublinearity of both the dynamic regret and the fit.

### V-B Dynamic Regret and Fit Bounds

###### Theorem V.5.

Suppose Assumptions A-C hold, and let $\{\mathbf{x}_{i,t}\}$ be the sequence of local estimates generated by Algorithm 1. We choose the step sizes

 $$\alpha_t = \frac{1}{t^a}, \quad \beta_t = \frac{1}{t^b}, \quad \gamma_t = \frac{1}{t^{1-b}}, \quad \forall\, t \in \mathbb{N}_+, \tag{33}$$

where $a \in (0,1)$ and $b \in (0,a)$. Then, for any $T \in \mathbb{N}_+$,

 $$\mathrm{Reg}^d_T \leq R_1 T^{\max\{a,\,1-a+b\}} + 2KT^a C^*_T, \tag{34}$$
 $$\frac{1}{n}\sum_{i=1}^{n}\frac{1}{n}\sum_{j=1}^{n}\left\|\Big[\sum_{t=1}^{T} g_{i,t}(\mathbf{x}_{j,t})\Big]_+\right\|^2 \leq D_1 T^{2-b} + D_2 T^{1+a-b} C^*_T + D_3 T^{2+2b-2a}. \tag{35}$$

Here, $R_1$, $D_1$, $D_2$, and $D_3$ are constants independent of $T$, and

 $$C^*_T \triangleq \sum_{t=1}^{T}\|\mathbf{x}^*_{t+1} - \mathbf{x}^*_t\| \tag{36}$$

is the accumulated dynamic variation of the comparator sequence $\{\mathbf{x}^*_t\}$.

###### Remark 5.

The dynamic regret is sublinear as long as the cumulative variation $C^*_T$ of the dynamic comparator sequence is sublinear. This is the standard requirement for sublinearity of dynamic regret [11, 2, 9].

###### Remark 6.

A similar argument as above holds for (35). As long as $C^*_T$ is sublinear, we have

 $$\frac{1}{n}\sum_{i=1}^{n}\frac{1}{n}\sum_{j=1}^{n}\left\|\Big[\sum_{t=1}^{T} g_{i,t}(\mathbf{x}_{j,t})\Big]_+\right\|^2 = o(T^2). \tag{37}$$

Note that (35) involves $\|\cdot\|^2$, while the fit (4) is defined in terms of $\|\cdot\|$. However, for large enough $T$, each of the constituent terms in (37) is $o(T^2)$. Consequently, $\big\|[\sum_{t=1}^{T} g_{i,t}(\mathbf{x}_{j,t})]_+\big\| = o(T)$, $\forall\, i, j \in [n]$. Therefore, we get a sublinear fit:

 $$\frac{1}{n}\sum_{i=1}^{n}\frac{1}{n}\sum_{j=1}^{n}\left\|\Big[\sum_{t=1}^{T} g_{i,t}(\mathbf{x}_{j,t})\Big]_+\right\| = o(T). \tag{38}$$

## VI Conclusion

In this work, we considered a distributed OCO problem with time-varying (potentially adversarial) constraints. We proposed a distributed primal-dual mirror descent based approach, in which the primal and dual updates are carried out locally at all the nodes. We utilized the challenging, but more realistic, metrics of dynamic regret and fit. Without assuming the restrictive Slater's condition, we established sublinear dynamic regret and fit under mild, commonly used assumptions. To the best of our knowledge, this is the first work to consider the distributed OCO problem with non-coupled, local, time-varying constraints and achieve sublinear dynamic regret and fit.

## References

• [1] T. Chen, Q. Ling, and G. B. Giannakis, “An online convex optimization approach to proactive network resource allocation,” IEEE Transactions on Signal Processing, vol. 65, no. 24, pp. 6350–6364, 2017.
• [2] S. Shahrampour and A. Jadbabaie, “Distributed online optimization in dynamic environments using mirror descent,” IEEE Transactions on Automatic Control, vol. 63, no. 3, pp. 714–725, 2017.
• [3] M. Zinkevich, “Online convex programming and generalized infinitesimal gradient ascent,” in Proceedings of the 20th International Conference on Machine Learning (ICML-03), 2003, pp. 928–936.
• [4] S. Shalev-Shwartz et al., “Online learning and online convex optimization,” Foundations and Trends® in Machine Learning, vol. 4, no. 2, pp. 107–194, 2012.
• [5] E. Hazan et al., “Introduction to online convex optimization,” Foundations and Trends® in Optimization, vol. 2, no. 3-4, pp. 157–325, 2016.
• [6] A. Koppel, F. Y. Jakubiec, and A. Ribeiro, “A saddle point algorithm for networked online convex optimization,” IEEE Transactions on Signal Processing, vol. 63, no. 19, pp. 5149–5164, 2015.
• [7] S. Paternain, S. Lee, M. M. Zavlanos, and A. Ribeiro, “Distributed constrained online learning,” arXiv preprint arXiv:1903.06310, 2019.
• [8] X. Li, X. Yi, and L. Xie, “Distributed online optimization for multi-agent networks with coupled inequality constraints,” arXiv preprint arXiv:1805.05573, 2018.
• [9] X. Yi, X. Li, L. Xie, and K. H. Johansson, “Distributed online convex optimization with time-varying coupled inequality constraints,” arXiv preprint arXiv:1903.04277, 2019.
• [10] E. Hazan, A. Agarwal, and S. Kale, “Logarithmic regret algorithms for online convex optimization,” Machine Learning, vol. 69, no. 2-3, pp. 169–192, 2007.
• [11] E. C. Hall and R. M. Willett, “Online convex optimization in dynamic environments,” IEEE Journal of Selected Topics in Signal Processing, vol. 9, no. 4, pp. 647–662, 2015.
• [12] O. Besbes, Y. Gur, and A. Zeevi, “Non-stationary stochastic optimization,” Operations Research, vol. 63, no. 5, pp. 1227–1244, 2015.
• [13] M. Mahdavi, R. Jin, and T. Yang, “Trading regret for efficiency: online convex optimization with long term constraints,” Journal of Machine Learning Research, vol. 13, no. Sep, pp. 2503–2528, 2012.
• [14] R. Jenatton, J. Huang, and C. Archambeau, “Adaptive algorithms for online convex optimization with long-term constraints,” in International Conference on Machine Learning, 2016, pp. 402–411.
• [15] J. Yuan and A. Lamperski, “Online convex optimization for cumulative constraints,” in Advances in Neural Information Processing Systems, 2018, pp. 6137–6146.
• [16] W. Sun, D. Dey, and A. Kapoor, “Safety-aware algorithms for adversarial contextual bandit,” in Proceedings of the 34th International Conference on Machine Learning, vol. 70.   JMLR.org, 2017, pp. 3280–3288.
• [17] X. Cao, J. Zhang, and H. V. Poor, “A virtual-queue-based algorithm for constrained online convex optimization with applications to data center resource allocation,” IEEE Journal of Selected Topics in Signal Processing, vol. 12, no. 4, pp. 703–716, 2018.
• [18] H. Yu, M. Neely, and X. Wei, “Online convex optimization with stochastic constraints,” in Advances in Neural Information Processing Systems, 2017, pp. 1428–1438.
• [19] H. H. Bauschke and J. M. Borwein, “Joint and separate convexity of the bregman distance,” in Studies in Computational Mathematics.   Elsevier, 2001, vol. 8, pp. 23–36.
• [20] A. Jadbabaie, A. Rakhlin, S. Shahrampour, and K. Sridharan, “Online optimization: Competing with dynamic comparators,” in Artificial Intelligence and Statistics, 2015, pp. 398–406.