I. Introduction
Many problems of practical interest, including network resource allocation [1], target tracking [2], network routing [3], online regression [4], and spam filtering [5], can be framed in the Online Convex Optimization (OCO) framework. The OCO framework, first introduced in [3], aims to minimize a time-varying convex objective function that is revealed to the observer sequentially. For a detailed review of OCO, see [5, 4]. In this work, we consider a constrained OCO problem with time-varying (potentially adversarial) constraints.
Recently, distributed OCO frameworks have gained popularity, as they spread the computational and memory load across multiple nodes rather than having a central node perform all the operations [6, 2, 7, 8, 9]. We consider the constrained OCO problem in a distributed framework, where the convex objective is decomposed and distributed across a set of communicating agents. Each agent takes its own action with the goal of minimizing the dynamically varying global function while satisfying its individual constraints. Next, we discuss the related work, along with the performance metrics we use to evaluate the proposed algorithm.
I-A. Related Work
Regret: The performance in OCO problems is quantified by how well the agent does compared to an offline system over time; in other words, how much the agent “regrets” not having had, from the start, the information that was revealed to it post hoc. Since regret is cumulative over time, an algorithm whose regret grows sublinearly in time asymptotically achieves zero average loss. It is natural to compare against an offline system whose action(s) are “optimal” in some sense.
Static Regret: The initial work on OCO, starting with [3, 4, 5], focused almost exclusively on static regret, which uses the optimal static solution in hindsight as the benchmark. In other words, the fictitious offline adversary, relative to which the online system measures its regret, chooses the best fixed strategy, assuming access to the entire information that is revealed to the online system over the time horizon $T$.
Under standard regularity conditions, for general OCO problems, a tight upper bound of $\mathcal{O}(\sqrt{T})$ has been shown for static regret [3, 10]. However, for applications such as online parameter estimation or tracking moving targets, where the quantity of interest itself evolves over time, comparison with a static benchmark is not sufficient.
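To make the contrast concrete, consider the following toy sketch (hypothetical quadratic losses, not from this paper): with drifting losses $f_t(x) = (x - \theta_t)^2$, the static benchmark plays the single best fixed action in hindsight, while the dynamic benchmark tracks each per-step minimizer.

```python
import numpy as np

# Static vs. dynamic benchmark for drifting losses f_t(x) = (x - theta_t)^2.
theta = np.array([0.0, 1.0, 0.0, 1.0])  # alternating per-step minimizers

# Static benchmark: the best fixed action in hindsight is the mean of theta_t.
x_static = theta.mean()
static_benchmark_loss = float(np.sum((x_static - theta) ** 2))  # = 1.0

# Dynamic benchmark: play theta_t at each step, incurring zero loss.
dynamic_benchmark_loss = 0.0
```

Regret measured against the dynamic benchmark is therefore at least as large here as static regret: when the minimizers move, the static comparator itself accumulates loss.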
This deficiency led to the development of dynamic regret [11, 12]. Rather than comparing the performance relative to a fixed optimal strategy, a more demanding benchmark is used: at each time instant, the fictitious adversary utilizes one-step look-ahead information to adopt the optimal strategy for that instant.
In this work, we adopt the notion of dynamic regret as the performance metric. It must, however, be noted that, in the worst case, it is impossible to achieve sublinear dynamic regret [3]. For such problems, the growth of dynamic regret is captured by a regularity measure that quantifies the variation of the minimizer sequence over time (see Theorem V.5).
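As a concrete illustration (a hypothetical one-dimensional tracking example, not this paper's algorithm), the dynamic regret and the path-length regularity measure of the minimizer sequence can be computed as follows:

```python
import numpy as np

# Hypothetical 1-D tracking example: online gradient descent against drifting
# quadratic losses f_t(x) = (x - theta_t)^2, minimized at theta_t.
rng = np.random.default_rng(0)
T = 200
theta = np.cumsum(0.05 * rng.standard_normal(T))  # slowly drifting minimizers

x, step, actions = 0.0, 0.3, []
for th in theta:
    actions.append(x)
    x -= step * 2.0 * (x - th)        # gradient step on the revealed loss
actions = np.array(actions)

# Dynamic regret: compare against the per-step minimizer theta_t (loss 0).
dynamic_regret = float(np.sum((actions - theta) ** 2))
# Regularity measure: accumulated variation (path length) of the minimizers.
path_length = float(np.sum(np.abs(np.diff(theta))))
```

When the drift (and hence the path length) is small, a simple tracking scheme keeps the dynamic regret small; a rapidly moving minimizer sequence makes sublinear dynamic regret unattainable, consistent with the worst-case impossibility noted above.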
Constraints: Conventional approaches for OCO are based on projection-based, gradient-descent-like algorithms. However, with functional inequality constraints (as opposed to simple convex-set constraints), the projection step itself is computationally intensive. This led to the development of primal-dual algorithms for OCO [13, 14, 15]. Instead of attempting to satisfy the constraints at each time instant, the constraints are satisfied in the long run: the cumulative accumulation of instantaneous constraint violations (often simply called fit) is shown to be sublinear in $T$. This formulation allows constraint violations at some instants to be “taken care of” by strictly feasible actions at other times.¹
¹Some more recent works [15] have considered a more stringent constraint violation metric.
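The difference between the long-run fit and the stricter metric of [15] can be sketched numerically; the scalar constraint values below are hypothetical.

```python
import numpy as np

# Hypothetical instantaneous constraint values g_t(x_t) over four rounds.
g_vals = np.array([0.5, -0.7, 0.3, -0.4])

# Fit: violations may be cancelled by strictly feasible rounds.
fit = max(float(g_vals.sum()), 0.0)                      # [sum_t g_t(x_t)]_+

# Stricter metric: clip each round's violation before accumulating.
strict_violation = float(np.maximum(g_vals, 0.0).sum())  # sum_t [g_t(x_t)]_+
```

Here the fit is zero although two rounds violated the constraint, which is exactly the “taken care of” behavior described above; the stricter metric still registers those violations.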
Initially, the constraints were assumed to be static across time [13, 14]. However, subsequent literature [16, 1] demonstrated that the analysis for primal-dual methods can be generalized to handle even time-varying inequality constraints. Minor variations of primal-dual methods, which replace the dual update step with virtual-queue (modified Lagrange multiplier) updates, have also been proposed to handle time-varying [17] and stochastic constraints [18].
Distributed OCO Problems: So far we have only discussed centralized problems. Suppose the OCO system has a network of agents, and local cost (and possibly constraint) functions are revealed to each agent over time. The global objective is to minimize the total cost function while also satisfying all the constraints. Moreover, each agent can communicate only with the agents in its immediate neighborhood. This distributed OCO problem is more challenging, and much less studied in the literature, than the centralized problem.
Distributed OCO problems with static set constraints have been widely studied in recent years [6, 2, 7, 8, 9]. Here too, the literature on distributed OCO with dynamic regret is much sparser than that for static regret. The authors in [2] have proposed a dynamic mirror descent based algorithm, where primal update steps are alternated with local consensus steps. The authors in [8] have proposed a distributed primal-dual algorithm for the OCO problem with coupled inequality constraints, where the constraint functions are static over time. This has been generalized to time-varying coupled constraints in [9], where the authors have shown sublinearity of regret and fit, both w.r.t. dynamic and static benchmarks. However, to the best of our knowledge, the distributed OCO problem with a dynamic benchmark, even with static non-coupled inequality constraints, has so far not been considered in the literature.
I-B. Our Contributions
In this work, we consider a distributed online convex optimization problem, where both the cost functions and the time-varying inequality constraints are revealed locally to the individual nodes. We propose a primal-dual mirror-descent based algorithm, which alternates between the local primal and dual update steps and the consensus steps that mix the local primal variables with those of the immediate neighbors. Importantly, we show that the proposed algorithm achieves sublinear dynamic regret and fit.
I-C. Paper Organization and Notations
The paper is organized as follows: the problem formulation is discussed in Section II, along with the definitions of the performance metrics. In Section III, we provide some background results and the assumptions required for providing theoretical guarantees. We propose our primaldual mirror descent based algorithm in Section IV, followed by the theoretical results in Section V. Finally, we conclude the paper in Section VI.
Notations: Vectors are denoted by lowercase bold letters, e.g., $\mathbf{x}$, while matrices are denoted by uppercase bold letters, e.g., $\mathbf{W}$. The set of positive integers is denoted by $\mathbb{N}$. We use $\mathbb{R}^{d}_{+}$ to denote the $d$-dimensional nonnegative orthant. For $N \in \mathbb{N}$, the set $\{1, \ldots, N\}$ is denoted by $[N]$. We denote by $\|\cdot\|$ the Euclidean norm for vectors and the induced 2-norm for matrices. $\mathbf{0}$ denotes a zero vector, where the dimension is clear from the context. $[\cdot]_{+}$ denotes the projection onto $\mathbb{R}^{d}_{+}$.
II. Problem Formulation
We consider a network of $N$ agents. At each time instant $t$, each agent $i \in [N]$ takes an action $\mathbf{x}_{t,i} \in \mathcal{X}$, where the set $\mathcal{X} \subset \mathbb{R}^{d}$ is fixed across time and across all the nodes. Then, a set of local loss functions $f_{t,i} : \mathcal{X} \to \mathbb{R}$, with $i \in [N]$, is revealed to the individual nodes, leading to the individual loss $f_{t,i}(\mathbf{x}_{t,i})$ at node $i$. Additionally, another set of local functions $\mathbf{g}_{t,i} : \mathcal{X} \to \mathbb{R}^{m}$, with $i \in [N]$, is revealed, corresponding to the local constraints $\mathbf{g}_{t,i}(\mathbf{x}) \preceq \mathbf{0}$. The network objective is to minimize the global average of the local cost functions, $f_t(\mathbf{x}) := \frac{1}{N}\sum_{i=1}^{N} f_{t,i}(\mathbf{x})$, while also satisfying all the local constraint functions:
$$\min_{\{\mathbf{x}_t \in \mathcal{X}\}} \; \sum_{t=1}^{T} f_t(\mathbf{x}_t) \quad \text{s.t.} \quad \mathbf{g}_{t,i}(\mathbf{x}_t) \preceq \mathbf{0}, \;\; \forall \, i \in [N], \; t \in [T]. \tag{1}$$
Since the objective is to minimize the global function, the nodes need to communicate among themselves. We next define the metrics used to measure the performance of the proposed approach.
II-A. Performance Metrics: Dynamic Regret and Fit
We use the recently introduced notion of dynamic regret [12, 11] to measure the performance relative to a time-varying benchmark:
$$\mathbf{Reg}_j(T) := \sum_{t=1}^{T} f_t(\mathbf{x}_{t,j}) - \sum_{t=1}^{T} f_t(\mathbf{x}_t^{\star}), \tag{2}$$
where $\mathbf{x}_{t,j}$ is the local action of agent $j$ at time $t$, while $\mathbf{x}_t^{\star}$ is the solution of the following problem:
$$\mathbf{x}_t^{\star} \in \operatorname*{arg\,min}_{\mathbf{x} \in \mathcal{X}} \; f_t(\mathbf{x}) \quad \text{s.t.} \quad \mathbf{g}_{t,i}(\mathbf{x}) \preceq \mathbf{0}, \;\; \forall \, i \in [N]. \tag{3}$$
As pointed out earlier, it is impossible to satisfy the time-varying constraints instantaneously, since they are revealed post hoc. As a surrogate, to ensure the local constraints are satisfied in the long run, we use the distributed extension of fit as the performance metric. Fit has been used in the context of both time-invariant [13] and time-varying constraints [1, 6] for single-node problems. Our definition is motivated by the one given in [7] for continuous-time problems. It measures the average accumulation of constraint violations over time:
$$\mathbf{Fit}_j(T) := \sum_{i=1}^{N} \left\| \left[ \sum_{t=1}^{T} \mathbf{g}_{t,i}(\mathbf{x}_{t,j}) \right]_{+} \right\|. \tag{4}$$
Here, $\big[ \sum_{t=1}^{T} \mathbf{g}_{t,i}(\mathbf{x}_{t,j}) \big]_{+}$ is the cumulative constraint violation at agent $i$ if it adopts the actions of agent $j$. Note that sublinear fit is a weaker requirement than satisfying the constraints at every time instant $t$.
III. Background and Assumptions
III-A. Network
We assume the agents are connected via an undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$. $\mathcal{V}$ denotes the set of nodes of the graph, each of which represents an agent. $\mathcal{E}$ is the set of edges between the nodes; $(i, j) \in \mathcal{E}$ implies that nodes $i$ and $j$ are connected in the graph. The set of edges has an associated weight matrix $\mathbf{W} \in \mathbb{R}^{N \times N}$, such that
$$[\mathbf{W}]_{ij} > 0 \;\; \text{if} \;\; (i,j) \in \mathcal{E} \;\; \text{or} \;\; i = j, \qquad [\mathbf{W}]_{ij} = 0 \;\; \text{otherwise.} \tag{5}$$
The set of neighbors of node $i$ is, therefore, defined as $\mathcal{N}_i := \{ j \in \mathcal{V} : [\mathbf{W}]_{ij} > 0 \}$. Note that $i \in \mathcal{N}_i$.
Assumption A: The network $\mathcal{G}$ is connected. The weight matrix $\mathbf{W}$ is symmetric and doubly stochastic, i.e.,
$$\mathbf{W}\mathbf{1} = \mathbf{1}, \qquad \mathbf{1}^{\top}\mathbf{W} = \mathbf{1}^{\top}. \tag{6}$$
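One standard construction satisfying Assumption A (a common choice in the consensus literature, not necessarily the one used here) is the Metropolis weight rule, sketched below for a hypothetical 4-node ring:

```python
import numpy as np

# Metropolis weights: w_ij = 1 / (1 + max(deg_i, deg_j)) for each edge,
# zero for non-edges, and the diagonal absorbs the remaining mass.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]  # hypothetical 4-node ring
N = 4
deg = np.zeros(N, dtype=int)
for i, j in edges:
    deg[i] += 1
    deg[j] += 1

W = np.zeros((N, N))
for i, j in edges:
    W[i, j] = W[j, i] = 1.0 / (1 + max(deg[i], deg[j]))
np.fill_diagonal(W, 1.0 - W.sum(axis=1))
```

The resulting matrix is symmetric and doubly stochastic by construction, and each node only needs to know its neighbors' degrees to compute its own row.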
Next, we discuss the properties of the local cost functions and constraints.
III-B. Local Objective Functions and Constraints
Assumption B: We assume the following conditions on the set $\mathcal{X}$ and the objective and constraint functions.
(B1) The set $\mathcal{X}$ is convex and compact. Therefore, there exists a positive constant $R$ such that
$$\|\mathbf{x} - \mathbf{y}\| \le R, \quad \forall \, \mathbf{x}, \mathbf{y} \in \mathcal{X}. \tag{7}$$
(B2) The local node functions $f_{t,i}$ and $\mathbf{g}_{t,i}$ are Lipschitz continuous on $\mathcal{X}$, i.e., there exists a constant $L > 0$ such that
$$|f_{t,i}(\mathbf{x}) - f_{t,i}(\mathbf{y})| \le L \|\mathbf{x} - \mathbf{y}\|, \qquad \|\mathbf{g}_{t,i}(\mathbf{x}) - \mathbf{g}_{t,i}(\mathbf{y})\| \le L \|\mathbf{x} - \mathbf{y}\|, \tag{8}$$
for any $\mathbf{x}, \mathbf{y} \in \mathcal{X}$.
(B3) The functions $f_{t,i}$ and $\mathbf{g}_{t,i}$ are convex and uniformly bounded on the set $\mathcal{X}$, i.e., there exists a constant $F > 0$ such that
$$|f_{t,i}(\mathbf{x})| \le F, \qquad \|\mathbf{g}_{t,i}(\mathbf{x})\| \le F, \quad \forall \, \mathbf{x} \in \mathcal{X}. \tag{9}$$
(B4) The gradients $\nabla f_{t,i}$ and $\nabla \mathbf{g}_{t,i}$ exist and are uniformly bounded on $\mathcal{X}$, i.e., there exists a constant $G > 0$ such that
$$\|\nabla f_{t,i}(\mathbf{x})\| \le G, \qquad \|\nabla \mathbf{g}_{t,i}(\mathbf{x})\| \le G, \quad \forall \, \mathbf{x} \in \mathcal{X}. \tag{10}$$
Next, we briefly discuss the Bregman divergence, which is crucial to the proposed mirror-descent based approach.
III-C. Bregman Divergence
Suppose we are given a $\sigma$-strongly convex function $\phi : \mathcal{X} \to \mathbb{R}$. The Bregman divergence with respect to $\phi$ is defined as
$$\mathcal{D}_{\phi}(\mathbf{x}, \mathbf{y}) := \phi(\mathbf{x}) - \phi(\mathbf{y}) - \langle \nabla \phi(\mathbf{y}), \mathbf{x} - \mathbf{y} \rangle. \tag{11}$$
Since $\phi$ is $\sigma$-strongly convex, for any $\mathbf{x}, \mathbf{y} \in \mathcal{X}$,
$$\mathcal{D}_{\phi}(\mathbf{x}, \mathbf{y}) \ge \frac{\sigma}{2} \|\mathbf{x} - \mathbf{y}\|^{2}. \tag{12}$$
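Two standard generating functions make (11) concrete: the squared Euclidean norm recovers half the squared distance, and the negative entropy recovers the KL divergence. The following sketch (illustrative choices only; the paper keeps the generating function generic) evaluates both and numerically checks the strong-convexity lower bound (12) for the Euclidean case:

```python
import numpy as np

# Bregman divergence D(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>.
def bregman(phi, grad_phi, x, y):
    return phi(x) - phi(y) - grad_phi(y) @ (x - y)

# Squared Euclidean norm: D(x, y) = 0.5 * ||x - y||^2.
phi_sq = lambda v: 0.5 * v @ v
grad_sq = lambda v: v

# Negative entropy on the simplex: D(x, y) = KL(x || y).
phi_ent = lambda v: float(np.sum(v * np.log(v)))
grad_ent = lambda v: np.log(v) + 1.0

x = np.array([0.2, 0.3, 0.5])
y = np.array([0.4, 0.4, 0.2])
d_sq = bregman(phi_sq, grad_sq, x, y)
d_kl = bregman(phi_ent, grad_ent, x, y)

# Strong-convexity lower bound (sigma = 1 for the Euclidean case).
assert d_sq >= 0.5 * float(np.sum((x - y) ** 2)) - 1e-12
```
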
We assume the following condition on $\mathcal{D}_{\phi}$.
Assumption C (Separate convexity [19]): Given $\mathbf{x}, \mathbf{y}_1, \ldots, \mathbf{y}_K \in \mathcal{X}$ and scalars $\alpha_1, \ldots, \alpha_K$ on the $K$-dimensional probability simplex, the Bregman divergence satisfies
$$\mathcal{D}_{\phi}\!\left( \mathbf{x}, \sum_{k=1}^{K} \alpha_k \mathbf{y}_k \right) \le \sum_{k=1}^{K} \alpha_k \, \mathcal{D}_{\phi}(\mathbf{x}, \mathbf{y}_k). \tag{13}$$
We next state a result on the Bregman divergence from [9], which is crucial to our analysis.
Lemma III.1.
Let $\phi$ be a $\sigma$-strongly convex function on $\mathcal{X}$, where $\mathcal{X}$ is a closed, convex set in $\mathbb{R}^{d}$, and let $h : \mathcal{X} \to \mathbb{R}$ be a convex function. Assume that $\nabla \phi(\mathbf{x})$ exists for all $\mathbf{x} \in \mathcal{X}$. Then, given $\mathbf{z} \in \mathcal{X}$, the regularized Bregman projection
$$\mathbf{x}^{+} := \operatorname*{arg\,min}_{\mathbf{x} \in \mathcal{X}} \left\{ h(\mathbf{x}) + \mathcal{D}_{\phi}(\mathbf{x}, \mathbf{z}) \right\} \tag{16}$$
satisfies the following inequality for all $\mathbf{y} \in \mathcal{X}$:
$$h(\mathbf{x}^{+}) - h(\mathbf{y}) \le \mathcal{D}_{\phi}(\mathbf{y}, \mathbf{z}) - \mathcal{D}_{\phi}(\mathbf{y}, \mathbf{x}^{+}) - \mathcal{D}_{\phi}(\mathbf{x}^{+}, \mathbf{z}). \tag{17}$$
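As a concrete special case (illustrative only, not the paper's generality): when the generating function is the negative entropy, the feasible set is the probability simplex, and the convex term in (16) is linear, the regularized Bregman projection has the closed-form exponentiated-gradient (entropic mirror descent) update:

```python
import numpy as np

def entropic_step(y, grad, eta):
    """Minimize eta * <grad, x> + KL(x || y) over the probability simplex."""
    z = y * np.exp(-eta * grad)
    return z / z.sum()

y = np.array([0.25, 0.25, 0.5])      # current iterate on the simplex
grad = np.array([1.0, -1.0, 0.0])    # a revealed (sub)gradient
x_plus = entropic_step(y, grad, eta=0.5)
```

The update stays on the simplex automatically, so no explicit projection is needed; avoiding expensive projections is precisely the computational appeal of mirror descent.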
III-D. Projection
For a set $\mathcal{S} \subseteq \mathbb{R}^{d}$, the projection operator is defined as
$$\mathcal{P}_{\mathcal{S}}(\mathbf{x}) := \operatorname*{arg\,min}_{\mathbf{y} \in \mathcal{S}} \|\mathbf{y} - \mathbf{x}\|. \tag{18}$$
For closed and convex $\mathcal{S}$, the projection always exists and is unique. If $\mathcal{S} = \mathbb{R}^{d}_{+}$, the projection is denoted by $[\mathbf{x}]_{+}$ and satisfies
$$[\mathbf{x}]_{+} = \max\{\mathbf{x}, \mathbf{0}\} \quad \text{(componentwise).} \tag{19}$$
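For the nonnegative orthant, the projection is a componentwise clamp; the short sketch below also checks the standard variational characterization of a Euclidean projection at a hypothetical point:

```python
import numpy as np

def proj_nonneg(x):
    # Projection onto the nonnegative orthant: clamp negatives to zero.
    return np.maximum(x, 0.0)

x = np.array([1.5, -2.0, 0.0])
p = proj_nonneg(x)

# Optimality condition: <x - p, y - p> <= 0 for every y in the orthant.
y = np.array([0.3, 4.0, 1.2])  # an arbitrary nonnegative point
inner = float((x - p) @ (y - p))
```
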
IV. Distributed Primal-Dual Mirror Descent Based Algorithm
We next discuss the proposed distributed primal-dual mirror descent based algorithm for online convex optimization with time-varying constraints. The pseudocode is outlined in Algorithm 1. The algorithm runs in parallel at all the nodes. At the end of time $t$, each node holds its action (primal variable). Following this, the local objective and constraint functions are revealed to the agent. The corresponding function values and gradients are utilized to carry out the updates in the next time step. First, each agent performs the primal update locally (Step 8). This is followed by the dual update (Step 11). Note that the projection $[\cdot]_{+}$ ensures that the dual variable lies in the nonnegative orthant. At the end of each time step, an average consensus step is taken across the nodes, where the updated local primal variables are received from the neighbors, to compute the next action.
Remark 1.
Note that the primal and dual update steps employ different step sizes. This idea originated in [14] and provides flexibility in the trade-off between the bounds on dynamic regret and fit.
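To convey the structure of one round, the following schematic sketch alternates a local primal step on the Lagrangian, a regularized dual step projected onto the orthant, and a consensus mixing step. It instantiates the Bregman divergence as the Euclidean distance (so the mirror step reduces to projected gradient descent), uses a box feasible set, and picks illustrative step sizes and a dual regularization parameter; none of these choices or constants are taken from Algorithm 1.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, T = 4, 2, 100
W = np.full((N, N), 1.0 / N)        # doubly stochastic mixing matrix
X_radius = 5.0                       # box feasible set [-5, 5]^d

x = np.zeros((N, d))                 # primal variables, one row per agent
lam = np.zeros(N)                    # dual variables (one scalar constraint)
alpha, eta, delta = 0.1, 0.1, 0.01   # primal step, dual step, dual regularizer

targets = 0.5 * rng.standard_normal((T, d))
for t in range(T):
    grad_f = x - targets[t]          # f_{t,i}(x) = 0.5 ||x - target_t||^2
    g = x[:, 0] - 1.0                # constraint: first coordinate <= 1
    grad_g = np.zeros((N, d))
    grad_g[:, 0] = 1.0
    # Local primal step on the Lagrangian, clipped to the feasible box.
    x_half = x - alpha * (grad_f + lam[:, None] * grad_g)
    x_half = np.clip(x_half, -X_radius, X_radius)
    # Local regularized dual step, projected onto the nonnegative orthant.
    lam = np.maximum((1.0 - delta) * lam + eta * g, 0.0)
    # Consensus: mix the updated primal iterates with the neighbors'.
    x = W @ x_half
```

With the fully mixed weight matrix above, the agents reach exact consensus after a single mixing step; a sparse weight matrix would only yield approximate consensus, which is what the network-error analysis in Section V quantifies.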
In the next section, we bound the dynamic regret and fit resulting from Algorithm 1, and show them to be sublinear in the time horizon $T$.
V. Dynamic Regret and Fit Bounds
First, we discuss some intermediate results required to show the sublinearity of dynamic regret and fit. We have omitted the proofs due to space limitations. Our analysis closely follows the work in [2] and [9].
V-A. Some Intermediate Results
Lemma V.1.
Suppose Assumption B holds. Then, for all $t \in [T]$ and $i \in [N]$, the iterates generated by Algorithm 1 satisfy
(20)  
(21) 
where the comparator sequences are arbitrary vectors in $\mathcal{X}$.
Remark 2.
The penalty term in the dual update (Step 11, Algorithm 1) helps in upper bounding the local dual variables. This idea was initially used in [13] and removes the need for Slater's condition. The resulting term measures the regularized drift of the local dual variables. See [11] and [9] for similar results in the centralized and distributed contexts, respectively.
Next, we sum the left-hand side of (21) over $t \in [T]$ to get
(22) 
We combine (21) and (22), and define a function such that
(23) 
The function will be used later in Lemma V.4 to upper bound both the dynamic regret and the fit, by appropriate choices of its arguments.
Before looking at the primal updates, we first consider one of the constituent terms in (2).
(24)  
(25) 
We use Assumption (B2) to obtain both (24) and (25). Now, from the definition of dynamic regret in (2), we get
(26) 
Next, we upper bound both terms in (26), beginning with the first term in the following lemma.
Lemma V.2.
Suppose Assumptions A–C hold, and let the sequence of iterates be generated by Algorithm 1. Then,
(27) 
Next, we upper bound the second term in (26), which is the consensus error of the primal variables.
Lemma V.3.
(Network Error): Suppose Assumptions A–C hold. Then, the local estimates generated by Algorithm 1 satisfy
(28) 
where $\sigma_2(\mathbf{W})$ denotes the second largest eigenvalue of $\mathbf{W}$ in magnitude.
Remark 3.
The network error bound in (28) is independent of the node index. The dependence on $\sigma_2(\mathbf{W})$ captures the speed at which the iterates mix: the smaller the value of $\sigma_2(\mathbf{W})$, the faster the network error diminishes. Moreover, the choice of the primal update step sizes and the dual update regularization parameters plays a crucial role in bounding the network error. As we shall see in Theorem V.5, carefully choosing these leads to sublinear regret and fit.
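The mixing quantity can be computed directly. The sketch below (a hypothetical lazy-walk weight matrix on a 4-node ring, not a matrix from this paper) evaluates the second largest eigenvalue magnitude of the weight matrix and checks the geometric decay of disagreement under repeated mixing:

```python
import numpy as np

def mixing_rate(W):
    # Second largest eigenvalue magnitude of a symmetric weight matrix.
    eigs = np.sort(np.abs(np.linalg.eigvalsh(W)))
    return float(eigs[-2])

# Lazy-walk weights on a 4-node ring (hypothetical example).
W = np.array([
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
])
s2 = mixing_rate(W)

# Disagreement contracts at least geometrically at rate s2 under mixing.
x = np.array([1.0, 0.0, 0.0, 0.0])
for _ in range(10):
    x = W @ x
disagreement = float(np.max(np.abs(x - x.mean())))
```

Since the matrix is doubly stochastic, the average is preserved at every mixing step while the deviation from it shrinks by at least the mixing-rate factor per step.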
Next, we combine (23) and Lemma V.2, resulting in two intermediate bounds, which will subsequently be needed to bound the dynamic regret and fit, respectively.
Lemma V.4.
Remark 4.
Before presenting our final result, we need the following upper bound on the fit.
(32) 
This follows from the Lipschitz continuity of the constraint functions (Assumption (B2)). Since we have bounded both terms in (32) (the first term in Lemma V.3, and the second term in Lemma V.4), we are now ready to present our final result on the sublinearity of both the dynamic regret and the fit.
V-B. Dynamic Regret and Fit Bounds
Theorem V.5.
Suppose Assumptions A–C hold, and let the sequence of local estimates be generated by Algorithm 1. We choose the step sizes
(33) 
Then, for any $T$,
(34)  
(35) 
Here, the constants are independent of $T$, and
$$C_T := \sum_{t=1}^{T-1} \left\| \mathbf{x}_{t+1}^{\star} - \mathbf{x}_{t}^{\star} \right\| \tag{36}$$
is the accumulated dynamic variation of the comparator sequence $\{\mathbf{x}_t^{\star}\}$.
Remark 5.
VI. Conclusion
In this work, we considered a distributed OCO problem with time-varying (potentially adversarial) constraints. We proposed a distributed primal-dual mirror descent based approach, in which the primal and dual updates are carried out locally at all the nodes. We utilized the challenging, but more realistic, metrics of dynamic regret and fit. Without assuming the more restrictive Slater's condition, we achieved sublinear regret and fit under mild, commonly used assumptions. To the best of our knowledge, this is the first work to consider the distributed OCO problem with non-coupled local time-varying constraints and achieve sublinear dynamic regret and fit.
References
 [1] T. Chen, Q. Ling, and G. B. Giannakis, “An online convex optimization approach to proactive network resource allocation,” IEEE Transactions on Signal Processing, vol. 65, no. 24, pp. 6350–6364, 2017.
 [2] S. Shahrampour and A. Jadbabaie, “Distributed online optimization in dynamic environments using mirror descent,” IEEE Transactions on Automatic Control, vol. 63, no. 3, pp. 714–725, 2017.

 [3] M. Zinkevich, “Online convex programming and generalized infinitesimal gradient ascent,” in Proceedings of the 20th International Conference on Machine Learning (ICML-03), 2003, pp. 928–936.
 [4] S. Shalev-Shwartz et al., “Online learning and online convex optimization,” Foundations and Trends® in Machine Learning, vol. 4, no. 2, pp. 107–194, 2012.
 [5] E. Hazan et al., “Introduction to online convex optimization,” Foundations and Trends® in Optimization, vol. 2, no. 3-4, pp. 157–325, 2016.
 [6] A. Koppel, F. Y. Jakubiec, and A. Ribeiro, “A saddle point algorithm for networked online convex optimization,” IEEE Transactions on Signal Processing, vol. 63, no. 19, pp. 5149–5164, 2015.
 [7] S. Paternain, S. Lee, M. M. Zavlanos, and A. Ribeiro, “Distributed constrained online learning,” arXiv preprint arXiv:1903.06310, 2019.
 [8] X. Li, X. Yi, and L. Xie, “Distributed online optimization for multiagent networks with coupled inequality constraints,” arXiv preprint arXiv:1805.05573, 2018.
 [9] X. Yi, X. Li, L. Xie, and K. H. Johansson, “Distributed online convex optimization with timevarying coupled inequality constraints,” arXiv preprint arXiv:1903.04277, 2019.
 [10] E. Hazan, A. Agarwal, and S. Kale, “Logarithmic regret algorithms for online convex optimization,” Machine Learning, vol. 69, no. 2-3, pp. 169–192, 2007.
 [11] E. C. Hall and R. M. Willett, “Online convex optimization in dynamic environments,” IEEE Journal of Selected Topics in Signal Processing, vol. 9, no. 4, pp. 647–662, 2015.
 [12] O. Besbes, Y. Gur, and A. Zeevi, “Non-stationary stochastic optimization,” Operations Research, vol. 63, no. 5, pp. 1227–1244, 2015.
 [13] M. Mahdavi, R. Jin, and T. Yang, “Trading regret for efficiency: online convex optimization with long term constraints,” Journal of Machine Learning Research, vol. 13, no. Sep, pp. 2503–2528, 2012.
 [14] R. Jenatton, J. Huang, and C. Archambeau, “Adaptive algorithms for online convex optimization with longterm constraints,” in International Conference on Machine Learning, 2016, pp. 402–411.
 [15] J. Yuan and A. Lamperski, “Online convex optimization for cumulative constraints,” in Advances in Neural Information Processing Systems, 2018, pp. 6137–6146.
 [16] W. Sun, D. Dey, and A. Kapoor, “Safety-aware algorithms for adversarial contextual bandit,” in Proceedings of the 34th International Conference on Machine Learning - Volume 70. JMLR.org, 2017, pp. 3280–3288.
 [17] X. Cao, J. Zhang, and H. V. Poor, “A virtualqueuebased algorithm for constrained online convex optimization with applications to data center resource allocation,” IEEE Journal of Selected Topics in Signal Processing, vol. 12, no. 4, pp. 703–716, 2018.
 [18] H. Yu, M. Neely, and X. Wei, “Online convex optimization with stochastic constraints,” in Advances in Neural Information Processing Systems, 2017, pp. 1428–1438.
 [19] H. H. Bauschke and J. M. Borwein, “Joint and separate convexity of the bregman distance,” in Studies in Computational Mathematics. Elsevier, 2001, vol. 8, pp. 23–36.
 [20] A. Jadbabaie, A. Rakhlin, S. Shahrampour, and K. Sridharan, “Online optimization: Competing with dynamic comparators,” in Artificial Intelligence and Statistics, 2015, pp. 398–406.