Many problems of practical interest, including network resource allocation, target tracking, network routing, online regression, and spam filtering, can be framed in the Online Convex Optimization (OCO) framework. The OCO framework, first introduced in , aims to minimize a time-varying convex objective function that is revealed to the observer in a sequential manner. For a detailed review of OCO, see [4, 5]. In this work, we consider a constrained OCO problem with time-varying (potentially adversarial) constraints.
Recently, distributed OCO frameworks have gained popularity as they distribute the computational and memory resources across multiple nodes rather than having a central node perform all the operations [6, 2, 7, 8, 9]. We consider the constrained OCO problem in a distributed framework, where the convex objective is assumed to be decomposed and distributed across a set of communicating agents. Each agent takes its own action with the goal of minimizing the dynamically varying global function while satisfying its individual constraints. Next, we discuss related work, along with the metrics used to evaluate the proposed algorithm.
I-A Related Work
Regret: The performance in OCO problems is quantified in terms of how well the agent does over time, as compared to an offline system. In other words, how much the agent “regrets” not having had, to begin with, the information that was revealed to it post-hoc. Since regret is cumulative over time, an algorithm that achieves a sub-linear increase in regret asymptotically achieves zero average regret. It is naturally desirable to compare against an offline system whose action(s) are “optimal” in some sense.
Static Regret: The initial work on OCO, starting with [3, 4, 5], focused almost exclusively on static regret, which uses an optimal static solution, in hindsight, as the benchmark. In other words, the fictitious offline adversary w.r.t. which the online system measures its regret chooses the best fixed strategy, assuming access to the entire information that is revealed to the online system over the time horizon.
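In standard notation (symbols ours, not reproduced from this paper's own equations), with f_t the loss revealed at round t, x_t the learner's action, X the feasible set, and T the horizon, the static regret is

```latex
\mathrm{Reg}_T^{s} \;=\; \sum_{t=1}^{T} f_t(\mathbf{x}_t) \;-\; \min_{\mathbf{x} \in \mathcal{X}} \sum_{t=1}^{T} f_t(\mathbf{x}).
```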
However, for applications such as online parameter estimation or tracking moving targets, where the quantity of interest itself evolves over time, comparison with a static benchmark is not sufficient.
This deficiency led to the development of dynamic regret [11, 12]. Rather than comparing the performance relative to a fixed optimal strategy, a more demanding benchmark is used: at each time instant, the fictitious adversary utilizes one-step look-ahead information to adopt the strategy that is optimal at that instant.
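In the same standard notation (symbols ours), with x_t^* the per-round minimizer, the dynamic regret against this one-step look-ahead benchmark reads

```latex
\mathrm{Reg}_T^{d} \;=\; \sum_{t=1}^{T} f_t(\mathbf{x}_t) \;-\; \sum_{t=1}^{T} f_t(\mathbf{x}_t^{\star}),
\qquad
\mathbf{x}_t^{\star} \in \operatorname*{arg\,min}_{\mathbf{x} \in \mathcal{X}} f_t(\mathbf{x}).
```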
In this work, we adopt the notion of dynamic regret as the performance metric. It must, however, be noted that, in the worst case, it is impossible to achieve sublinear dynamic regret. For such problems, the growth of dynamic regret is captured by a regularity measure that quantifies the variation of the minimizer sequence over time (see Theorem V.5).
Constraints: The conventional approaches for OCO are based on projection-based, gradient-descent-like algorithms. However, when working with functional inequality constraints (as opposed to simple convex feasible-set constraints), the projection step itself is computationally intensive. This led to the development of primal-dual algorithms for OCO [13, 14, 15]. Instead of attempting to satisfy the constraints at each time instant, the constraints are satisfied in the long run. In other words, the cumulative accumulation of instantaneous constraint violations (often simply called fit) is shown to be sublinear in the time horizon. This formulation allows constraint violations at some instants to be “taken care of” by strictly feasible actions at other times. (Some more recent works have considered a more stringent constraint violation metric.)
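As a concrete illustration (one common form in the long-term-constraints literature; the symbols are ours), with g_t the constraint function and [·]_+ the projection onto the non-negative orthant, the fit and the more stringent metric can be written as

```latex
\mathrm{Fit}_T \;=\; \Bigl\|\Bigl[\sum_{t=1}^{T} \mathbf{g}_t(\mathbf{x}_t)\Bigr]_{+}\Bigr\|,
\qquad
\mathrm{Fit}_T^{\mathrm{strict}} \;=\; \sum_{t=1}^{T} \bigl\|\bigl[\mathbf{g}_t(\mathbf{x}_t)\bigr]_{+}\bigr\|,
```

where the first allows violations to cancel across time while the second does not.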
Initially, the constraints were assumed to be static across time [13, 14]. However, subsequent literature [16, 1] demonstrated that the analysis for primal-dual methods can be generalized to handle even time-varying inequality constraints. Minor variations of primal-dual methods, which replace the dual update step with virtual-queue (modified Lagrange multiplier) updates, have also been proposed to handle time-varying and stochastic constraints.
Distributed OCO Problems: So far, we have only discussed centralized problems. Suppose the OCO system is a network of agents, and local cost (and possibly constraint) functions are revealed to each agent over time. The global objective is to minimize the total cost function while satisfying all the constraints, and each agent can only communicate with the agents in its immediate neighborhood. This distributed OCO problem is more challenging and much less studied in the literature than the centralized problem.
Distributed OCO problems with static set constraints have been widely studied in recent years [6, 2, 7, 8, 9]. Here too, the literature on distributed OCO with dynamic regret is much sparser than that for static regret. The authors in  have proposed a dynamic mirror descent based algorithm, where primal update steps are alternated with local consensus steps. The authors in  have proposed a distributed primal-dual algorithm for the OCO problem with coupled inequality constraints that are static over time. This has been generalized to time-varying coupled constraints in , where the authors have shown sublinearity of regret and fit, w.r.t. both dynamic and static benchmarks. However, to the best of our knowledge, the distributed OCO problem with a dynamic benchmark, even with static non-coupled inequality constraints, has so far not been considered in the literature.
I-B Our Contributions
In this work, we consider a distributed online convex optimization problem, where both the cost functions and the time-varying inequality constraints are revealed locally to the individual nodes. We propose a primal-dual mirror-descent based algorithm, which alternates between the local primal and dual update steps and the consensus steps to mix the local primal variables with the immediate neighbors. Importantly, we show that the proposed algorithm achieves sublinear dynamic regret and fit.
I-C Paper Organization and Notations
The paper is organized as follows: the problem formulation is discussed in Section II, along with the definitions of the performance metrics. In Section III, we provide some background results and the assumptions required for providing theoretical guarantees. We propose our primal-dual mirror descent based algorithm in Section IV, followed by the theoretical results in Section V. Finally, we conclude the paper in Section VI.
Notations: Vectors are denoted by lowercase bold letters, while matrices are denoted by uppercase bold letters. We write the set of positive integers, the non-negative orthant, and the index set of the first few natural numbers in the standard way. Throughout, the vector norm is the Euclidean norm and the matrix norm is the induced 2-norm. A zero vector is denoted with its dimension clear from the context, and projection onto the non-negative orthant is denoted by the associated projection operator.
II Problem Formulation
We consider a network of agents. At each time instant, each agent takes an action from a feasible set that is fixed across time and across all the nodes. Then, a set of local loss functions is revealed to the individual nodes, leading to an individual loss at each node. Additionally, another set of local functions is revealed, corresponding to the local constraints. The network objective is to minimize the global average of the local cost functions, while also satisfying all the local constraint functions.
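Concretely, writing f_t^i and g_t^i for the local cost and constraint functions of agent i at time t (symbols ours), the network problem at each instant has the form

```latex
\min_{\mathbf{x} \in \mathcal{X}} \;\; \frac{1}{N}\sum_{i=1}^{N} f_t^{i}(\mathbf{x})
\qquad \text{s.t.} \qquad \mathbf{g}_t^{i}(\mathbf{x}) \preceq \mathbf{0}, \quad i = 1, \dots, N.
```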
Since the objective is to minimize the global function , the nodes need to communicate among themselves. We next define the metrics used to measure the performance of the proposed approach.
II-A Performance Metrics - Dynamic Regret and Fit
where the first quantity is the local action of each agent at each time instant, while the benchmark is the solution of the following problem.
As pointed out earlier, it is impossible to satisfy the time-varying constraints instantaneously, since they are revealed post-hoc. As a surrogate, to ensure the local constraints are satisfied in the long run, we use the distributed extension of fit as the performance metric. Fit has been used in the context of both time-invariant, as well as time-varying constraints [1, 6], for single-node problems. Our definition is motivated by the one given in  for continuous-time problems. It measures the average accumulation of constraint violations over time.
Here, the quantity above is the constraint violation at one agent if it adopts the actions of another agent. Note that this is different from requiring the constraints to be met at every time instant.
III Background and Assumptions
We assume the agents are connected via an undirected graph, whose node set represents the agents and whose edge set encodes the communication links: an edge between two nodes implies that the corresponding agents can communicate. The set of edges has an associated weight matrix, such that
the set of neighbors of a node is, therefore, defined as the set of nodes with which it shares an edge, with each node included in its own neighborhood.
Assumption A: The network is connected. The weight matrix is symmetric and doubly stochastic, such that
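A symmetric, doubly stochastic weight matrix of the kind Assumption A requires can be built, for instance, with the Metropolis-Hastings rule. This is a common construction, sketched here for illustration; the paper itself does not prescribe how the weights are chosen.

```python
import numpy as np

def metropolis_weights(adj):
    """Build a symmetric, doubly stochastic weight matrix W from a 0/1
    adjacency matrix using the Metropolis-Hastings rule:
    W_ij = 1 / (1 + max(d_i, d_j)) for each edge (i, j),
    W_ii = 1 - sum of the off-diagonal weights in row i."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)            # node degrees
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and adj[i, j]:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()   # self-weight keeps the row sum at 1
    return W

# 4-node path graph: 0 - 1 - 2 - 3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
W = metropolis_weights(A)
```

Each node places weight only on itself and its immediate neighbors, and the rows and columns of the result each sum to one.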
Next, we discuss the properties of the local cost functions and constraints.
III-B Local Objective Functions and Constraints
Assumption B: We assume the following conditions on the feasible set and on the objective and constraint functions.
The set is convex and compact. Therefore, there exists a positive constant such that
The local node functions are Lipschitz continuous on , i.e.,
for any .
The functions are convex and uniformly bounded on the set, i.e., there exists a constant such that
The gradients of the local functions exist and are uniformly bounded on the set, i.e., there exists a constant such that
Next, we briefly discuss the Bregman Divergence measure, which is crucial to the proposed mirror descent based approach.
III-C Bregman Divergence
Suppose we are given a σ-strongly convex function φ, i.e., φ(x) ≥ φ(y) + ⟨∇φ(y), x − y⟩ + (σ/2)‖x − y‖² for all x, y. The Bregman divergence w.r.t. φ is defined as
D_φ(x, y) = φ(x) − φ(y) − ⟨∇φ(y), x − y⟩.
Since φ is σ-strongly convex, for any x, y,
D_φ(x, y) ≥ (σ/2)‖x − y‖².
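As a quick sanity check of the definition (a sketch in our own notation, not code from the paper), choosing φ(x) = ½‖x‖², which is 1-strongly convex, recovers half the squared Euclidean distance and attains the strong-convexity lower bound with equality:

```python
import numpy as np

def bregman(phi, grad_phi, x, y):
    """Bregman divergence D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>."""
    return phi(x) - phi(y) - np.dot(grad_phi(y), x - y)

# phi(x) = 0.5 * ||x||^2 is 1-strongly convex; its Bregman divergence
# equals 0.5 * ||x - y||^2.
phi = lambda v: 0.5 * np.dot(v, v)
grad_phi = lambda v: v

x = np.array([1.0, 2.0])
y = np.array([0.0, 0.0])
d = bregman(phi, grad_phi, x, y)   # 0.5 * ||x - y||^2 = 2.5
```

Other choices of φ, such as the negative entropy, yield non-Euclidean divergences, which is what gives mirror descent its flexibility.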
We assume the following conditions on the Bregman divergence.
We next state a result on the Bregman divergence from  that is crucial to our analysis.
Let φ be a σ-strongly convex function. Also, assume the underlying set is closed and convex, and that the regularizing function is convex with a well-defined gradient on this set. Then, the regularized Bregman projection
satisfies the following inequality.
For a given set, the projection operator maps a point to its closest point in the set, in the Euclidean norm. For a closed and convex set, the projection always exists and is unique. For the non-negative orthant, the projection reduces to the componentwise maximum with zero.
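For concreteness, here is a minimal sketch (with our own helper names) of two such projections: onto a Euclidean ball, a typical choice of compact convex set, and onto the non-negative orthant used for the dual variables:

```python
import numpy as np

def project_ball(y, radius=1.0):
    """Euclidean projection onto the closed ball {x : ||x|| <= radius}:
    points outside the ball are scaled back to the boundary."""
    nrm = np.linalg.norm(y)
    return y if nrm <= radius else (radius / nrm) * y

def project_orthant(y):
    """Projection onto the non-negative orthant: componentwise max with 0."""
    return np.maximum(y, 0.0)
```

Both maps are non-expansive, which is what makes projected updates well behaved.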
IV Distributed Primal-Dual Mirror Descent based Algorithm
We next discuss the proposed distributed primal-dual mirror descent based algorithm for online convex optimization with time-varying constraints. The pseudo-code is outlined in Algorithm 1, which runs in parallel at all the nodes. At the end of each time step, each node holds its current action (primal variable). Following this, the local functions are revealed to the agent, and the corresponding function values and gradients are utilized to carry out the updates in the next time step. First, each agent performs the primal update locally (Step 8). This is followed by the dual update (Step 11); note that the projection ensures that the dual variable lies in the non-negative orthant. At the end of each time step, an average consensus step is taken across the nodes, where the updated primal variables are received from the neighbors to compute the next action.
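To make the structure of one round concrete, the following is a hypothetical sketch of a single agent's primal and dual updates. It uses the Euclidean Bregman divergence, so the mirror step reduces to a projected gradient step; all names, the ball constraint, and the exact update forms are our illustrative assumptions, not the paper's Algorithm 1.

```python
import numpy as np

def local_step(x, lam, grad_f, g_val, grad_g, alpha, eta, theta, radius=1.0):
    """One local round at a single agent: a primal descent step on the
    local Lagrangian, followed by a regularized dual ascent step.
    (Illustrative sketch only; not the paper's exact updates.)"""
    # Primal: gradient of f_t^i + <lam, g_t^i> at the current action.
    grad_L = grad_f + grad_g.T @ lam
    x_new = x - alpha * grad_L
    nrm = np.linalg.norm(x_new)
    if nrm > radius:                      # projection onto the feasible ball
        x_new = (radius / nrm) * x_new
    # Dual: ascend on the constraint value; the -theta*lam regularization
    # keeps the multipliers bounded without invoking Slater's condition.
    lam_new = np.maximum(lam + eta * (g_val - theta * lam), 0.0)
    return x_new, lam_new
```

A full round would follow this with the consensus step, replacing each agent's primal variable by a weighted average of its neighbors' variables.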
Note that the primal and dual update steps employ different step sizes. This idea originated in  and provides flexibility in the trade-off between the bounds on dynamic regret and fit.
In the next section, we bound the dynamic regret and fit resulting from Algorithm 1, and show them to be sublinear in the time horizon.
V Dynamic Regret and Fit Bounds
First, we discuss some intermediate results required to show the sublinearity of dynamic regret and fit. The proofs are omitted due to space limitations. Our analysis closely follows the work in  and .
V-a Some Intermediate Results
Suppose Assumption B holds. Then the iterates generated by Algorithm 1 satisfy the following, where the comparator vectors are arbitrary elements of the corresponding sets.
The penalty term in the dual update (Step 11, Algorithm 1) helps in upper bounding the local dual variables. This idea was initially used in  and removes the need for Slater's condition. The quantity above measures the regularized drift of the local dual variables; see  and  for similar results in centralized and distributed contexts, respectively.
Next, we sum the left-hand side of (21) over time to get
The function will be used later in Lemma V.4 to upper bound both the dynamic regret and fit, by appropriately choosing .
Before looking at the primal updates, we first consider one of the constituent terms in (2).
Next, we upper bound both terms in (26). First, we bound the first term in the following lemma.
Suppose Assumptions A-C hold, and let the sequence be generated by Algorithm 1. Then,
Next, we upper bound the second term in (26). This is the consensus error of the primal variables.
The network error bound in (28) is independent of the node index. The dependence on the mixing parameter captures the speed with which the iterates mix: the smaller its value, the faster the network error diminishes. Moreover, the choice of the primal update step sizes and the dual update regularization parameters plays a crucial role in bounding the network error. As we shall see in Theorem V.5, carefully choosing these leads to sublinear regret and fit.
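The role of the mixing parameter can be seen numerically. In the toy example below (our own construction, not from the paper), disagreement across a 3-node network decays geometrically under repeated multiplication by a symmetric, doubly stochastic weight matrix, at a rate governed by its second-largest singular value:

```python
import numpy as np

# Disagreement across nodes shrinks geometrically with the number of
# consensus rounds, at a rate governed by the second-largest singular
# value of the (symmetric, doubly stochastic) weight matrix W.
W = np.array([[0.50, 0.50, 0.00],
              [0.50, 0.25, 0.25],
              [0.00, 0.25, 0.75]])

x = np.array([1.0, 0.0, 0.0])   # initial local primal values
avg = x.mean()                  # consensus preserves the network average

for _ in range(50):
    x = W @ x                   # one mixing round

disagreement = np.max(np.abs(x - avg))   # shrinks like sigma_2(W)**50
```

After 50 rounds, every node holds (essentially) the network average, which is exactly the behavior the consensus step of the algorithm relies on.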
Suppose Assumptions A-C hold. Then, the sequences generated by Algorithm 1 satisfy
Before presenting our final result, we need the following upper bound on the fit.
This follows from the Lipschitz continuity of the constraint functions (Assumption (B2)). Since we have bounded both terms in (32) (the first term in Lemma V.3, and the second term in Lemma V.4), we are now ready to present our final result on the sublinearity of both dynamic regret and fit.
V-B Dynamic Regret and Fit Bounds
Suppose Assumptions A-C hold, and let the sequence of local estimates be generated by Algorithm 1, with the step sizes chosen as
where the exponents are fixed design parameters. Then, for any time horizon,
Here, the leading constants are independent of the time horizon, and
the quantity above is the accumulated dynamic variation of the comparator sequence.
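A common form of this regularity measure (symbols ours), with x_t^* the per-round comparator, is the path length

```latex
V_T \;=\; \sum_{t=2}^{T} \bigl\| \mathbf{x}_t^{\star} - \mathbf{x}_{t-1}^{\star} \bigr\|,
```

and the dynamic regret bound then grows with V_T: a slowly drifting comparator sequence yields sublinear regret, while an arbitrarily varying one need not.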
In this work, we considered a distributed OCO problem with time-varying (potentially adversarial) constraints. We proposed a distributed primal-dual mirror descent based approach, in which the primal and dual updates are carried out locally at all the nodes. We utilized the challenging, but more realistic, metrics of dynamic regret and fit. Without assuming the more restrictive Slater's condition, we achieved sublinear regret and fit under mild, commonly used assumptions. To the best of our knowledge, this is the first work to consider the distributed OCO problem with non-coupled local time-varying constraints and to achieve sublinear dynamic regret and fit.
-  T. Chen, Q. Ling, and G. B. Giannakis, “An online convex optimization approach to proactive network resource allocation,” IEEE Transactions on Signal Processing, vol. 65, no. 24, pp. 6350–6364, 2017.
-  S. Shahrampour and A. Jadbabaie, “Distributed online optimization in dynamic environments using mirror descent,” IEEE Transactions on Automatic Control, vol. 63, no. 3, pp. 714–725, 2017.
-  M. Zinkevich, “Online convex programming and generalized infinitesimal gradient ascent,” in Proceedings of the 20th International Conference on Machine Learning (ICML-03), 2003, pp. 928–936.
-  S. Shalev-Shwartz et al., “Online learning and online convex optimization,” Foundations and Trends® in Machine Learning, vol. 4, no. 2, pp. 107–194, 2012.
-  E. Hazan et al., “Introduction to online convex optimization,” Foundations and Trends® in Optimization, vol. 2, no. 3-4, pp. 157–325, 2016.
-  A. Koppel, F. Y. Jakubiec, and A. Ribeiro, “A saddle point algorithm for networked online convex optimization,” IEEE Transactions on Signal Processing, vol. 63, no. 19, pp. 5149–5164, 2015.
-  S. Paternain, S. Lee, M. M. Zavlanos, and A. Ribeiro, “Distributed constrained online learning,” arXiv preprint arXiv:1903.06310, 2019.
-  X. Li, X. Yi, and L. Xie, “Distributed online optimization for multi-agent networks with coupled inequality constraints,” arXiv preprint arXiv:1805.05573, 2018.
-  X. Yi, X. Li, L. Xie, and K. H. Johansson, “Distributed online convex optimization with time-varying coupled inequality constraints,” arXiv preprint arXiv:1903.04277, 2019.
-  E. Hazan, A. Agarwal, and S. Kale, “Logarithmic regret algorithms for online convex optimization,” Machine Learning, vol. 69, no. 2-3, pp. 169–192, 2007.
-  E. C. Hall and R. M. Willett, “Online convex optimization in dynamic environments,” IEEE Journal of Selected Topics in Signal Processing, vol. 9, no. 4, pp. 647–662, 2015.
-  O. Besbes, Y. Gur, and A. Zeevi, “Non-stationary stochastic optimization,” Operations Research, vol. 63, no. 5, pp. 1227–1244, 2015.
-  M. Mahdavi, R. Jin, and T. Yang, “Trading regret for efficiency: online convex optimization with long term constraints,” Journal of Machine Learning Research, vol. 13, no. Sep, pp. 2503–2528, 2012.
-  R. Jenatton, J. Huang, and C. Archambeau, “Adaptive algorithms for online convex optimization with long-term constraints,” in International Conference on Machine Learning, 2016, pp. 402–411.
-  J. Yuan and A. Lamperski, “Online convex optimization for cumulative constraints,” in Advances in Neural Information Processing Systems, 2018, pp. 6137–6146.
-  W. Sun, D. Dey, and A. Kapoor, “Safety-aware algorithms for adversarial contextual bandit,” in Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR.org, 2017, pp. 3280–3288.
-  X. Cao, J. Zhang, and H. V. Poor, “A virtual-queue-based algorithm for constrained online convex optimization with applications to data center resource allocation,” IEEE Journal of Selected Topics in Signal Processing, vol. 12, no. 4, pp. 703–716, 2018.
-  H. Yu, M. Neely, and X. Wei, “Online convex optimization with stochastic constraints,” in Advances in Neural Information Processing Systems, 2017, pp. 1428–1438.
-  H. H. Bauschke and J. M. Borwein, “Joint and separate convexity of the Bregman distance,” in Studies in Computational Mathematics. Elsevier, 2001, vol. 8, pp. 23–36.
-  A. Jadbabaie, A. Rakhlin, S. Shahrampour, and K. Sridharan, “Online optimization: Competing with dynamic comparators,” in Artificial Intelligence and Statistics, 2015, pp. 398–406.