Random coordinate descent algorithm for open multi-agent systems with complete topology and homogeneous agents

We study the convergence in expectation of the Random Coordinate Descent algorithm (RCD) for solving optimal resource allocation problems in open multi-agent systems, i.e., multi-agent systems that are subject to arrivals and departures of agents. Assuming all local functions are strongly convex and smooth, and their minimizers lie in a given ball, we analyse the evolution of the distance to the minimizer in expectation when the system is occasionally subject to replacements in addition to the usual iterations of the RCD algorithm. We focus on complete graphs where all agents interact with each other with the same probability, and provide conditions to guarantee convergence in open systems. Finally, we discuss the tightness of our results.


I Introduction

We consider the optimal resource allocation problem stated as follows, where a budget must be distributed among agents according to some weight distribution while minimizing the total cost built upon local costs (the weights are thus scalar):

$$\min_{x \in \mathbb{R}^n} f(x) = \sum_{i=1}^n f_i(x_i) \quad \text{subject to} \quad \sum_{i=1}^n a_i x_i = b. \tag{1}$$

Such problems arise in different fields of research, including power systems [yi2016initialization], actuator networks [teixeira2013distributed], and games [liang2017distributed]. Some of the first approaches introduced to solve (1) rely on distributed algorithms based on the well-known gradient descent [xiao2006optimal]. Algorithms of this type, however, require computing the full gradient of the network, so that the computational complexity can be too high for large systems.

To reduce the computational complexity of gradient-based algorithms, Nesterov introduced in [nesterov2012efficiency] the coordinate descent algorithm where optimization steps are performed along only one direction at each iteration. Several extensions of this algorithm have been developed, including a block coordinate update [richtarik2014iteration], where more than one direction is optimized at each iteration. In such algorithms, the sequence of coordinates in which updates are performed plays an important role, and it is well-known that randomized choices can guarantee convergence. Hence, [necoara2013random] proposed a random coordinate descent (RCD) algorithm, where at each iteration only a pair of local gradients must be evaluated, and where that pair is randomly selected, guaranteeing convergence at the same time as reducing computational complexity.

In some applications of (1), agents are able to join and leave the system at a time-scale similar to that of the process. Consider for instance the integration of distributed energy resources [dominguez2012decentralized], where some devices (agents) supplying a total amount of resource (budget) can sometimes be unavailable because of a fault or where local objective-functions might be time-varying (e.g., due to environmental conditions for photovoltaic systems). When the size of the system increases, the probability for such perturbations to happen at the scale of the whole system increases as well, giving rise to optimization problems in open multi-agent systems. In that case, arrivals and departures have a significant effect on the course of algorithms and even the most basic algorithms fail to guarantee convergence due to the continuous change of the set of agents. In particular, arrivals and departures of agents result in variations of the cost functions during the process, and hence of the location of the minimizer as well, which prevent convergence.

Motivated by the possible changes of the functions in (1), we analyze the performance of the RCD algorithm introduced in [necoara2013random] in a system subject to possible replacements of cost functions at each iteration. We extend the results of [necoara2013random] by analysing the convergence rate in expectation of the distance to the minimizer in open systems, under the assumption that each iteration is either an RCD update or a replacement. In this work, we focus on complete graphs such that each pair of agents updates its state at some iteration with the same probability, and we assume that the local objective functions are smooth and strongly convex. We then analyze the tightness of our results by considering the particular case of quadratic cost functions, and relying on the PESTO toolbox [PESTO], which allows deriving exact empirical bounds for convex problems.

I-A State of the art

In recent years, traditional algorithms have been applied and analysed in open multi-agent systems, such as gossiping in [de2020open, hendrickx2017open, OMAS:ARXIV_ITAC:FPL:2020], dynamic consensus in [franceschelli2020stability, dashti2019dynamic], and stochastic interactions in [vizuete2020influence, varma2018open]. Optimization in open systems is also getting attention, such as in [hsieh2021optimization], or in [OpenDo:OpenDGDStability], where the authors studied the stability of the decentralized gradient descent algorithm when the agents try to reach agreement and can be replaced at each iteration.

An alternative line of work on time-varying objective functions, called online optimization [DO:online-varyingFunctions, shahrampour2017distributed], aims at building at each time $t$ an estimate $x_t$ in a way that keeps the regret function, commonly defined as

$$\mathrm{Reg}_T := \sum_{t=1}^{T} \Big( f_t(x_t) - \min_x f_t(x) \Big), \tag{2}$$

as small as possible. Nevertheless, our work is essentially different because the objective of our algorithms is to be at all times as close as possible to the instantaneous minimizer of (1).

II Problem statement

For two vectors $x, y \in \mathbb{R}^n$, we denote by $\langle x, y \rangle = x^\top y$ the standard Euclidean inner product, and the Euclidean norm by $\|x\|$. We also denote the vector of size $n$ constituted of only ones by $\mathbf{1}_n$ and the identity matrix of dimension $n$ by $I_n$. Let $B(x, r)$ denote the ball of radius $r$ centered at $x$.

II-A Resource allocation problem

We consider the resource allocation problem (1) where we restrict our attention to 1-dimensional local cost functions $f_i : \mathbb{R} \to \mathbb{R}$ for all $i$, and make the following classical assumption.

Assumption 1

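Each function $f_i$ is continuously differentiable, $\alpha$-strongly convex (i.e., $f_i(x) - \frac{\alpha}{2} x^2$ is convex) and $\beta$-smooth (i.e., $|f_i'(x) - f_i'(y)| \le \beta |x - y|$, $\forall x, y \in \mathbb{R}$).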

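We let $\mathcal{F}_{\alpha,\beta}$ denote the set containing the functions satisfying Assumption 1, and let $\kappa = \beta/\alpha$ denote the condition number of those functions. Notice that $f$ also satisfies Assumption 1, so that $f \in \mathcal{F}_{\alpha,\beta}$. This implies that the solution to (1), denoted $x^*$, is unique. Moreover, $\sum_{i=1}^n a_i x_i^* = b$ and $f_i'(x_i^*) = \lambda^* a_i$ for all $i$, for some scalar $\lambda^*$.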

In open systems, the functions $f_i$ can be replaced during the process, so that the global minimizer changes as well. To ensure that the local cost functions are consistent with each other, and to prevent arbitrary changes of functions, and thus of $x^*$, we follow the approach in [OpenDo:OpenDGDStability] and restrict the location of the local minimizers without loss of generality.

Assumption 2

The minimizer of each function $f_i$, denoted $x_i^*$, satisfies $x_i^* \in B(0, 1)$ and $f_i(x_i^*) = 0$.

The following assumption restricts our attention to the particular case where a given budget must be allocated among agents with the same priority, or where the budget is provided by a group of homogeneous agents.

Assumption 3

There holds $a = \mathbf{1}_n$, and we denote the feasible set of (1) in that case by

$$S_b := \{ x \in \mathbb{R}^n \mid \langle \mathbf{1}_n, x \rangle = b \}. \tag{3}$$

II-B Random Coordinate Descent algorithm

To problem (1), we associate a network constituted of $n$ agents such that each agent $i$ has access to a local function $f_i$ and a local variable $x_i$. The agents can exchange information according to an undirected and connected graph $G = (V, E)$ where $V = \{1, \ldots, n\}$.

The Random Coordinate Descent (RCD) algorithm introduced in [necoara2013random] involves the update of the states of only a pair of neighbouring agents at each iteration, so that the computational cost per iteration is low. At a given iteration and for some feasible estimate $x \in S_b$, a pair of agents $(i,j) \in E$ is randomly selected with probability $p_{ij}$ to update its states as

$$x_i^+ = x_i + d_i, \qquad x_j^+ = x_j + d_j,$$

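where $d_i$ and $d_j$ are determined by solving

$$(d_i, d_j) = \arg\min_{s_i, s_j \,:\, a_i s_i + a_j s_j = 0} \; f_i'(x_i)\, s_i + f_j'(x_j)\, s_j + \frac{\beta}{2} \left( s_i^2 + s_j^2 \right). \tag{4}$$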

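This choice follows the observation that for any pair $(i,j)$, the function $g(x_i, x_j) := f_i(x_i) + f_j(x_j)$ is $\beta$-smooth, and thus satisfies, for all $z, w \in \mathbb{R}^2$,

$$g(z) \le g(w) + \langle \nabla g(w), z - w \rangle + \frac{\beta}{2} \| z - w \|^2. \tag{5}$$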

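Solving (4) thus amounts to minimizing the right-hand side of (5) while ensuring that the next estimate is still feasible. Following the approach in [necoara2013random], the problem is solved by

$$d_i = -\frac{a_j \left( a_j f_i'(x_i) - a_i f_j'(x_j) \right)}{\beta \left( a_i^2 + a_j^2 \right)}, \qquad d_j = -\frac{a_i \left( a_i f_j'(x_j) - a_j f_i'(x_i) \right)}{\beta \left( a_i^2 + a_j^2 \right)}. \tag{6}$$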

Under Assumption 3, one gets the following iteration rule

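$$x^+ = x - \frac{1}{\beta} Q^{ij} \nabla f(x), \tag{7}$$

where $Q^{ij}$ is an $n \times n$ matrix filled with zeroes except for the four following entries

$$[Q^{ij}]_{i,i} = [Q^{ij}]_{j,j} = \tfrac{1}{2}; \qquad [Q^{ij}]_{i,j} = [Q^{ij}]_{j,i} = -\tfrac{1}{2}.$$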
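As an illustration of the iteration rule (7), the following minimal sketch applies one update to a chosen pair of agents. The quadratic local costs, the function name `rcd_step`, and the numerical values are assumptions made for this example only; they are not taken from the analysis.

```python
def rcd_step(x, grads, i, j, beta):
    """Apply update (7) to the pair (i, j); only x[i] and x[j] change."""
    # Q^{ij} grad f(x) equals +(g_i - g_j)/2 at entry i and -(g_i - g_j)/2 at entry j.
    delta = (grads[i](x[i]) - grads[j](x[j])) / (2.0 * beta)
    x_new = list(x)
    x_new[i] -= delta
    x_new[j] += delta
    return x_new

# Hypothetical quadratic costs f_i(x) = c_i / 2 * (x - m_i)^2 with alpha <= c_i <= beta.
curvatures = [1.0, 2.0, 1.5, 2.0]
minimizers = [0.5, -0.3, 0.0, 0.9]
grads = [lambda v, c=c, m=m: c * (v - m) for c, m in zip(curvatures, minimizers)]
beta = max(curvatures)

x0 = [1.0, 1.0, 1.0, 1.0]            # feasible point for the budget b = 4
x1 = rcd_step(x0, grads, 0, 1, beta)
print(x1, sum(x1))
```

Because the two increments are opposite, the sum of the entries is unchanged, so the new estimate stays in the feasible set $S_b$.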

In this preliminary work, we restrict ourselves to fully connected networks as in [de2020open, hendrickx2017open]. All-to-all communications are thus possible, and each edge has the same probability of being selected at an iteration of the RCD algorithm.

Assumption 4

The graph $G$ is fully connected, and for all $(i,j) \in E$ there holds $p_{ij} = p = \frac{2}{n(n-1)}$.

Hence, under Assumption 4, there holds

$$\sum_{(i,j) \in E} p_{ij} Q^{ij} = \frac{p}{2} L, \tag{8}$$

where $L$ is the Laplacian matrix of $G$, given by

$$L = n I_n - \mathbf{1}_n \mathbf{1}_n^\top. \tag{9}$$
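Identity (8) can be checked numerically for a small complete graph. The sketch below builds each matrix $Q^{ij}$ explicitly and compares the weighted sum with $\frac{p}{2} L$; the helper names are hypothetical and the uniform probability $p = 2/(n(n-1))$ is the one of the complete-graph setting.

```python
from itertools import combinations

def q_matrix(n, i, j):
    """Matrix Q^{ij}: zero everywhere except the four entries indexed by i and j."""
    q = [[0.0] * n for _ in range(n)]
    q[i][i] = q[j][j] = 0.5
    q[i][j] = q[j][i] = -0.5
    return q

def weighted_sum_of_q(n):
    """Left-hand side of (8) with uniform edge probability p."""
    p = 2.0 / (n * (n - 1))
    total = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        q = q_matrix(n, i, j)
        for r in range(n):
            for c in range(n):
                total[r][c] += p * q[r][c]
    return total

def half_p_laplacian(n):
    """Right-hand side of (8): (p / 2) * (n * I - 1 1^T)."""
    p = 2.0 / (n * (n - 1))
    return [[(p / 2.0) * ((n if r == c else 0) - 1) for c in range(n)] for r in range(n)]

n = 5
lhs, rhs = weighted_sum_of_q(n), half_p_laplacian(n)
max_gap = max(abs(lhs[r][c] - rhs[r][c]) for r in range(n) for c in range(n))
print(max_gap)
```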

II-C Function replacement

In this analysis, we consider that the system is open. In particular, any agent can be replaced during the process, in which case it receives a new local objective function satisfying Assumptions 1 and 2 and maintains its label and estimate, so that feasibility is preserved. Let $f_i^k$ denote the local objective function held by the agent labelled $i$ at iteration $k$; then (1) can be reformulated in our setting as

$$\min_{x \in S_b} f^k(x) := \sum_{i=1}^n f_i^k(x_i). \tag{10}$$

The solution of (10) thus changes with replacements, and we denote it by $x^{*,k} := \arg\min_{x \in S_b} f^k(x)$. Let $x^k$ be the estimate of $x^{*,k}$ at iteration $k$; we define the following error metric:

$$C_k := \| x^k - x^{*,k} \|^2. \tag{11}$$

Our goal is to derive a convergence rate for criterion (11) in expectation, given by $\mathbb{E}[C_k]$, where $\{x^k\}$ is a sequence generated by the Random Coordinate Descent algorithm (7) applied in a system subject to possible replacements of agents.

III Convergence of RCD in closed system

In this section we analyze the convergence rate in expectation of the RCD algorithm for criterion (11) in a closed system for our setting. In that case, the minimizer $x^{*,k}$ does not depend on $k$, since the local objective functions remain the same during the process. Therefore, we refer to that minimizer as $x^*$ in this section.

A related result was presented in [necoara2013random], where such a convergence rate in expectation was derived for the objective value $f(x) - f(x^*)$. Proposition 1 is thus an extension of that result to our metric, and will serve as an intermediate result for working on open systems. Interestingly, one can show that while $f(x)$ is always decreasing, the metric $\|x - x^*\|^2$ can increase for certain choices of edges. We consider the following iteration rule, which is a generalization of (7) to general positive step-sizes $h$:

$$x^+ = x - h Q^{ij} \nabla f(x). \tag{12}$$

Proposition 1

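Let $f \in \mathcal{F}_{\alpha,\beta}$ and let $x^* = \arg\min_{x \in S_b} f(x)$. Under Assumptions 1 to 4, for any positive scalar $h \le \frac{1}{\beta}$, and for any initial point $x \in S_b$, the update rule (12) applied on the randomly selected pair of agents satisfies

$$\mathbb{E}\big[ \| x^+ - x^* \|^2 \big] \le \left( 1 - \frac{h \alpha}{n-1} \right) \| x - x^* \|^2. \tag{13}$$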

Starting from the update rule (12), there holds

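$$\mathbb{E}\big[ \| x^+ - x^* \|^2 \big] = \sum_{(i,j) \in E} p \, \| x - h Q^{ij} \nabla f(x) - x^* \|^2.$$

Let $H(x) := \mathbb{E}\big[ \| x^+ - x^* \|^2 \big]$. Since $\sum_{(i,j) \in E} p = 1$, it follows that

$$H(x) = \| x - x^* \|^2 + h^2 \sum_{(i,j) \in E} p \, \| Q^{ij} \nabla f(x) \|^2 - 2h \sum_{(i,j) \in E} p \, \langle Q^{ij} \nabla f(x), x - x^* \rangle.$$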

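From (8), there holds $\sum_{(i,j) \in E} Q^{ij} = \frac{1}{2} L$. Moreover, one has $(Q^{ij})^\top = Q^{ij}$ and $(Q^{ij})^2 = Q^{ij}$, and it follows

$$\sum_{(i,j) \in E} p \, \| Q^{ij} \nabla f(x) \|^2 = p \, \nabla f(x)^\top \Big( \sum_{(i,j) \in E} (Q^{ij})^2 \Big) \nabla f(x) = \frac{p}{2} \nabla f(x)^\top L \nabla f(x);$$

$$\sum_{(i,j) \in E} p \, \langle Q^{ij} \nabla f(x), x - x^* \rangle = p \, \nabla f(x)^\top \Big( \sum_{(i,j) \in E} Q^{ij} \Big) (x - x^*) = \frac{p}{2} \nabla f(x)^\top L (x - x^*).$$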

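Hence, using $p = \frac{2}{n(n-1)}$ from Assumption 4, there holds

$$H(x) = \| x - x^* \|^2 + \frac{1}{n-1} \left( \frac{h^2}{n} \nabla f(x)^\top L \nabla f(x) - \frac{2h}{n} \nabla f(x)^\top L (x - x^*) \right).$$

The optimality conditions of our problem imply $\nabla f(x^*) = \lambda^* \mathbf{1}_n$ for some $\lambda^* \in \mathbb{R}$, and from (9) we have

$$L \nabla f(x^*) = n \lambda^* \mathbf{1}_n - \mathbf{1}_n \mathbf{1}_n^\top \mathbf{1}_n \lambda^* = 0.$$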

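Hence, since $L \nabla f(x^*) = 0$, since $L$ is symmetric, and since the largest eigenvalue of $L$ is $n$, there holds

$$\frac{h^2}{n} \nabla f(x)^\top L \nabla f(x) = \frac{h^2}{n} \big( \nabla f(x) - \nabla f(x^*) \big)^\top L \big( \nabla f(x) - \nabla f(x^*) \big) \le h^2 \| \nabla f(x) - \nabla f(x^*) \|^2.$$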

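Moreover, one has $\mathbf{1}_n^\top (x - x^*) = 0$ since $x, x^* \in S_b$, and using (9) yields

$$\frac{h}{n} \nabla f(x)^\top L (x - x^*) = h \Big\langle \nabla f(x) - \frac{1}{n} \mathbf{1}_n^\top \nabla f(x) \, \mathbf{1}_n, \; x - x^* \Big\rangle = h \langle \nabla f(x), x - x^* \rangle.$$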

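Furthermore, since $\langle \nabla f(x^*), x - x^* \rangle = \lambda^* \mathbf{1}_n^\top (x - x^*) = 0$, and since $f$ is $\alpha$-strongly convex and $\beta$-smooth, it follows that

$$\langle \nabla f(x), x - x^* \rangle = \langle \nabla f(x) - \nabla f(x^*), x - x^* \rangle \ge \frac{\beta^{-1} \| \nabla f(x) - \nabla f(x^*) \|^2}{1 + \kappa^{-1}} + \frac{\alpha \| x - x^* \|^2}{1 + \kappa^{-1}},$$

where we recall that $\kappa = \beta / \alpha$ is the condition number of $f$. Re-injecting those expressions into that of $H(x)$ yields

$$H(x) \le \| x - x^* \|^2 - \frac{1}{n-1} \left( \frac{2 h \alpha}{1 + \kappa^{-1}} \| x - x^* \|^2 \right) + \frac{1}{n-1} \left( \Big( h^2 - \frac{2 h \beta^{-1}}{1 + \kappa^{-1}} \Big) \| \nabla f(x) - \nabla f(x^*) \|^2 \right).$$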

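Observe that for $h \le \frac{2 \beta^{-1}}{1 + \kappa^{-1}}$, and in particular for any $h \le \frac{1}{\beta}$ since $\kappa \ge 1$, we have

$$h^2 - \frac{2 h \beta^{-1}}{1 + \kappa^{-1}} \le 0,$$

so that for any such $h$ there holds

$$H(x) \le \left( 1 - \frac{2}{1 + \kappa^{-1}} \, \frac{\alpha h}{n-1} \right) \| x - x^* \|^2.$$

Finally, since $\frac{2}{1 + \kappa^{-1}} \ge 1$, there holds

$$\mathbb{E}\big[ \| x^+ - x^* \|^2 \big] = H(x) \le \left( 1 - \frac{h \alpha}{n-1} \right) \| x - x^* \|^2,$$

which concludes the proof.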

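Observe that for $h = \frac{1}{\beta}$, the iteration rule (12) corresponds to that of the RCD algorithm given in (7), which yields the following convergence rate in a closed system:

$$\mathbb{E}\big[ \| x^+ - x^* \|^2 \big] \le \left( 1 - \frac{1}{(n-1)\kappa} \right) \| x - x^* \|^2. \tag{14}$$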

This also corresponds to the contraction rate observed in an open system upon one iteration where no replacement takes place. Observe moreover that this rate is linear, and similar to that of a gradient descent algorithm [nesterov2018lectures, MAL-050].
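The closed-system rate (14) can be checked exactly on a small instance by averaging over all pairs instead of sampling. The sketch below assumes quadratic local costs $f_i(x) = \frac{c_i}{2}(x - m_i)^2$ (an illustrative choice, with hypothetical names and values), for which the constrained minimizer has a closed form.

```python
from itertools import combinations

def constrained_minimizer(c, m, b):
    """Minimizer of sum_i c_i/2 (x_i - m_i)^2 subject to sum_i x_i = b."""
    # Optimality: c_i (x_i - m_i) = lam for every i, with lam fixed by the budget.
    inv = [1.0 / ci for ci in c]
    lam = (b - sum(m)) / sum(inv)
    return [mi + lam * wi for mi, wi in zip(m, inv)]

def expected_error_after_step(x, c, m, b, h):
    """Exact E[||x+ - x*||^2] for rule (12), averaging over all pairs."""
    n = len(x)
    p = 2.0 / (n * (n - 1))
    x_star = constrained_minimizer(c, m, b)
    total = 0.0
    for i, j in combinations(range(n), 2):
        delta = h * (c[i] * (x[i] - m[i]) - c[j] * (x[j] - m[j])) / 2.0
        y = list(x)
        y[i] -= delta
        y[j] += delta
        total += p * sum((yi - si) ** 2 for yi, si in zip(y, x_star))
    return total

c = [1.0, 2.0, 1.5, 2.0]             # curvatures, so alpha = 1 and beta = 2
m = [0.5, -0.3, 0.0, 0.9]            # local minimizers in [-1, 1]
b = 4.0
x = [1.0, 1.0, 1.0, 1.0]             # feasible starting point
alpha, beta, n = min(c), max(c), len(x)

x_star = constrained_minimizer(c, m, b)
error = sum((xi - si) ** 2 for xi, si in zip(x, x_star))
lhs = expected_error_after_step(x, c, m, b, h=1.0 / beta)
rhs = (1.0 - alpha / (beta * (n - 1))) * error   # right-hand side of (14)
print(lhs, rhs)
```

On this instance the expected error after one step is below the bound, as Proposition 1 predicts; the gap between the two values gives a rough idea of how conservative the rate is for a particular point.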

IV Convergence of RCD in open system

We now consider that the system is open and suffers from occasional replacements of agents so that the local objective functions change. In particular, when a replacement occurs, then the replaced agent is uniformly randomly selected, and receives a new objective function satisfying Assumptions 1 and 2 while maintaining its estimate.

Let $U_{ij}$ denote the event that an RCD iteration as defined in (7) happens on the pair of agents $(i,j) \in E$, and let $R_i$ denote the event of a replacement of agent $i \in V$ as described above. Then we define the set of all possible events as

$$\Xi = \Big( \bigcup_{(i,j) \in E} U_{ij} \Big) \cup \Big( \bigcup_{i \in V} R_i \Big). \tag{15}$$

We consider that at each iteration one event takes place, so that we can define the history of the process up to iteration $k$ as follows:

$$\omega_k = \{ (1, \xi_1), \ldots, (k, \xi_k) \}, \tag{16}$$

with $\xi_j \in \Xi$ for all $j \le k$. We will work under the following assumption of statistical independence:

Assumption 5

The events $\xi_k$ constituting any sequence of events $\omega_k$ are independent of each other and of the state of the system, so that at any iteration $k$, the event $\xi_k$ is an RCD update with probability $p_U$, and a replacement with probability $p_R = 1 - p_U$.

The assumption above guarantees that the replacements and RCD updates happening in the system are independent processes, and allows analyzing the behavior of the RCD algorithm by decoupling the impact of these. In the remainder of this section, we will analyze the convergence rate of the algorithm by analyzing separately the effect of updates of the algorithm and of replacements on the error metric (11).

Observe that the probabilities $p_U$ and $p_R$ act at the whole system level, and can equivalently be replaced by the corresponding probabilities acting on every single agent and edge of the system. In particular, it follows from Assumption 5 that

$$p_e = \frac{2}{n(n-1)} p_U, \qquad p_a = \frac{1}{n} p_R, \tag{17}$$

where $p_e$ and $p_a$ respectively stand for the probabilities that any given edge gets activated at an RCD update, and that any given agent (whichever it is) is replaced at some iteration.
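The event model above can be sketched as a short simulation: at each iteration, an RCD update on a uniformly chosen pair happens with probability $p_U$, and otherwise a uniformly chosen agent is replaced while keeping its estimate. The quadratic local costs, the way new functions are drawn, and all names below are assumptions made for illustration.

```python
import random

def simulate_open_rcd(n, b, alpha, beta, p_update, steps, seed=0):
    """Run RCD with replacements on hypothetical quadratic costs c_i/2 (x - m_i)^2."""
    rng = random.Random(seed)
    c = [rng.uniform(alpha, beta) for _ in range(n)]   # curvatures in [alpha, beta]
    m = [rng.uniform(-1.0, 1.0) for _ in range(n)]     # local minimizers in [-1, 1]
    x = [b / n] * n                                    # feasible initial estimate
    events = []
    for _ in range(steps):
        if rng.random() < p_update:
            # RCD update (7) on a pair drawn uniformly among all pairs.
            i, j = rng.sample(range(n), 2)
            delta = (c[i] * (x[i] - m[i]) - c[j] * (x[j] - m[j])) / (2.0 * beta)
            x[i] -= delta
            x[j] += delta
            events.append("U")
        else:
            # Replacement: the agent gets a new function but keeps x[i].
            i = rng.randrange(n)
            c[i] = rng.uniform(alpha, beta)
            m[i] = rng.uniform(-1.0, 1.0)
            events.append("R")
    return x, events

x_final, events = simulate_open_rcd(n=6, b=3.0, alpha=1.0, beta=4.0,
                                    p_update=0.9, steps=500)
print(sum(x_final), events.count("R"))
```

Since neither type of event changes the sum of the estimates, the iterate remains feasible throughout, which is what allows the error metric (11) to be tracked across replacements.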

IV-A Impact of replacements on the error

In this section, we analyze how much the minimizer of Problem (10) is impacted by replacements, and to what extent the error is affected by them. Observe that the way we model replacements justifies analyzing the effect of a single change, as at most one replacement can occur at a given iteration.

We first provide in the next proposition the region in which that minimizer can be located in our setting.

Proposition 2

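Let $\kappa$ denote the condition number of $f$, and let $R_{b,\kappa} := \sqrt{n} \left( 1 + \left( 1 + \frac{|b|}{n} \right) \sqrt{\kappa} \right)$. If $f_i$ satisfies Assumptions 1 and 2 for all $i$, then:

$$\arg\min_{x \in S_b} f(x) \in B(0, R_{b,\kappa}). \tag{18}$$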

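Let $x \in S_b$ be such that $\|x\| > R_{b,\kappa}$, and let $\bar{x}^*$ denote the minimizer of $f$ without the constraint. Observe that from Assumption 2, there holds $f(\bar{x}^*) = 0$ since it amounts to evaluating every local function at its minimal value. Moreover, we have $\|\bar{x}^*\| \le \sqrt{n}$ so that $\|x - \bar{x}^*\| \ge \|x\| - \|\bar{x}^*\| > \sqrt{n\kappa} \left( 1 + \frac{|b|}{n} \right)$, and it follows that $\|x - \bar{x}^*\|^2 > n \kappa \left( 1 + \frac{|b|}{n} \right)^2$. Hence, since $f$ is $\alpha$-strongly convex, there holds

$$f(x) \ge \frac{\alpha}{2} \| x - \bar{x}^* \|^2 > \frac{\alpha}{2} \left( 1 + \frac{|b|}{n} \right)^2 \kappa n = \frac{\beta n}{2} \left( 1 + \frac{|b|}{n} \right)^2.$$

Now let $x_b := \frac{b}{n} \mathbf{1}_n$. Since $f$ is $\beta$-smooth, and since $|\bar{x}_i^*| \le 1$ for all $i$ from Assumption 2, there holds

$$f(x_b) \le \frac{\beta}{2} \| x_b - \bar{x}^* \|^2 \le \frac{\beta n}{2} \left( 1 + \frac{|b|}{n} \right)^2.$$

Hence, since $x_b \in S_b$, there holds

$$f(x) > \frac{\beta n}{2} \left( 1 + \frac{|b|}{n} \right)^2 \ge f(x_b) \ge f(x^*),$$

and we conclude that $x$ cannot be the minimizer of (10).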

We can now analyze the impact of a function change on the location of the minimizer. Without loss of generality, we assume that the function that gets replaced is $f_n$, and for the functions $f_n^{(1)}$ and $f_n^{(2)}$ satisfying Assumptions 1 and 2, we define

$$x^{(1)} := \arg\min_{x \in S_b} \Big( \sum_{i=1}^{n-1} f_i(x_i) + f_n^{(1)}(x_n) \Big); \qquad x^{(2)} := \arg\min_{x \in S_b} \Big( \sum_{i=1}^{n-1} f_i(x_i) + f_n^{(2)}(x_n) \Big). \tag{19}$$

We provide in the next proposition an upper bound on $\| x^{(2)} - x^{(1)} \|^2$, built upon Proposition 2.

Proposition 3

Consider $x^{(1)}$ and $x^{(2)}$ from (19), then

$$\| x^{(2)} - x^{(1)} \|^2 \le 4 n \kappa \left( 1 + \frac{1}{\sqrt{\kappa}} + \frac{|b|}{n} \right)^2. \tag{20}$$

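From Proposition 2, the minimizer of Problem (10) satisfies

$$\| x^{*,k} \|^2 \le n \left( 1 + \left( 1 + \frac{|b|}{n} \right) \sqrt{\kappa} \right)^2.$$

Hence, the conclusion follows from

$$\| x^{(2)} - x^{(1)} \|^2 \le 2 \left( \| x^{(2)} \|^2 + \| x^{(1)} \|^2 \right),$$

as both $x^{(1)}$ and $x^{(2)}$ are such minimizers.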
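The two bounds above can be probed numerically. The sketch below draws random quadratic costs $f_i(x) = \frac{c_i}{2}(x - m_i)^2$ with $c_i \in [\alpha, \beta]$ and $m_i \in [-1, 1]$ (an assumed family that satisfies Assumptions 1 and 2), replaces the last function, and checks the norm bound on the minimizer together with inequality (20). The helper names are hypothetical.

```python
import math
import random

def constrained_minimizer(c, m, b):
    """Minimizer of sum_i c_i/2 (x_i - m_i)^2 subject to sum_i x_i = b."""
    inv = [1.0 / ci for ci in c]
    lam = (b - sum(m)) / sum(inv)          # multiplier fixed by the budget constraint
    return [mi + lam * wi for mi, wi in zip(m, inv)]

def check_bounds(n, b, alpha, beta, trials=200, seed=1):
    """Return True if both bounds hold on every randomly drawn instance."""
    rng = random.Random(seed)
    kappa = beta / alpha
    # Norm bound on the minimizer and bound (20) on the displacement.
    radius = math.sqrt(n) * (1.0 + (1.0 + abs(b) / n) * math.sqrt(kappa))
    shift_bound = 4.0 * n * kappa * (1.0 + 1.0 / math.sqrt(kappa) + abs(b) / n) ** 2
    for _ in range(trials):
        c = [rng.uniform(alpha, beta) for _ in range(n)]
        m = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        x1 = constrained_minimizer(c, m, b)
        # Replace the function of the last agent only.
        c2 = c[:-1] + [rng.uniform(alpha, beta)]
        m2 = m[:-1] + [rng.uniform(-1.0, 1.0)]
        x2 = constrained_minimizer(c2, m2, b)
        norm1 = math.sqrt(sum(v * v for v in x1))
        shift = sum((u - v) ** 2 for u, v in zip(x2, x1))
        if norm1 > radius or shift > shift_bound:
            return False
    return True

print(check_bounds(n=6, b=3.0, alpha=1.0, beta=4.0))
```

Such a check only covers one family of functions, so it cannot establish tightness; comparing `shift` with `shift_bound` on these instances is nevertheless a quick way to see how much slack a single replacement leaves.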

The bound obtained in Proposition 3 builds on the possibility for all agents to be replaced at once in a single iteration. As a consequence, the results we derive using it are valid for that more general setting. This also means that this result is a source of conservatism in the particular setting where only one agent can get replaced at a time, and it is expected that a tighter bound can be obtained in that case, especially regarding its dependence on $n$. This possibility is discussed in detail in Section IV-C, through the study of a specific case, and based on the PESTO toolbox for performance estimation [PESTO]. However, the analysis in general remains open and shall be the object of future work.

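We can now evaluate the effect of replacements on the expected error $\mathbb{E}[C_k]$.

Proposition 4

Let $R$ denote the event of a replacement happening in the system. Then there holds

$$\mathbb{E}[C_{k+1} \mid R] \le 2 \, \mathbb{E}[C_k] + 8 n \kappa \left( 1 + \frac{1}{\sqrt{\kappa}} + \frac{|b|}{n} \right)^2. \tag{21}$$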

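Let us fix some event sequence $\omega_{k-1}$. Using Assumption 5, there holds

$$\mathbb{E}[C_{k+1} \mid R, \omega_{k-1}] = \sum_{i=1}^n p_i \, \mathbb{E}[C_{k+1} \mid R_i, \omega_{k-1}],$$

where $p_i$ is the probability that agent $i$ is the replaced agent at the occurrence of a replacement.

Let $x^{*,k}$ denote the minimizer of (10) before the replacement, so that $C_k = \| x^k - x^{*,k} \|^2$. In the event $R_i$, the estimates satisfy $x^{k+1} = x^k$, and there holds

$$C_{k+1} = \| x^{k+1} - x^{*,k+1} \|^2 \le \left( \| x^k - x^{*,k} \| + \| x^{*,k} - x^{*,k+1} \| \right)^2 \le 2 \left( C_k + \| x^{*,k} - x^{*,k+1} \|^2 \right),$$

where we have used the fact that $(u + v)^2 \le 2 (u^2 + v^2)$ for $u, v \in \mathbb{R}$ to obtain the last inequality. It then follows from Proposition 3 that

$$\| x^{*,k} - x^{*,k+1} \|^2 \le 4 n \kappa \left( 1 + \frac{1}{\sqrt{\kappa}} + \frac{|b|}{n} \right)^2,$$

so that

$$\mathbb{E}[C_{k+1} \mid R_i, \omega_{k-1}] \le 2 C_k + 8 n \kappa \left( 1 + \frac{1}{\sqrt{\kappa}} + \frac{|b|}{n} \right)^2.$$

The conclusion then follows from $p_i = \frac{1}{n}$ for all $i$ by definition, so that $\sum_{i=1}^n p_i = 1$, and from taking the expectation over $\omega_{k-1}$.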

IV-B Convergence rate

We now analyze the convergence in expectation of the RCD algorithm when the system is subject to replacements. Relying on the definition of the replacement process, our approach allows decoupling the effects of the algorithm and of replacements by considering that either a replacement or an update of the algorithm happens at each iteration. Therefore, our results strongly depend on the analysis of the effect of replacement events obtained in the previous section. Moreover, our methodology can be extended to different algorithms than the Random Coordinate Descent, as the impact of function changes is independent of the algorithm.

We provide in the following theorem a convergence rate in expectation for our error metric (11) in a system subject to replacements.

Theorem 1

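Under Assumptions 1 to 5, the iteration rule (7) applied on a system subject to replacements generates a sequence of estimates $\{x^k\}$ satisfying for all $k$

$$\mathbb{E}[C_{k+1}] \le \left( 2 - p_U \left( 1 + \frac{1}{(n-1)\kappa} \right) \right) \mathbb{E}[C_k] + \Gamma, \tag{22}$$

with

$$\Gamma = 8 (1 - p_U) \left( 1 + \frac{1}{\sqrt{\kappa}} + \frac{|b|}{n} \right)^2 n \kappa. \tag{23}$$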

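Let $U$ and $R$ respectively denote the occurrence of an RCD update and of a replacement. There holds

$$\mathbb{E}[C_{k+1}] = p_U \, \mathbb{E}[C_{k+1} \mid U] + p_R \, \mathbb{E}[C_{k+1} \mid R],$$

where we recall that $p_U$ stands for the probability that an event is an RCD iteration, and $p_R$ for the complementary probability that an event is a replacement, so that $p_U + p_R = 1$.

The first term corresponds to the convergence rate in expectation of an RCD iteration in a closed system with a step-size of $h = \frac{1}{\beta}$. Hence, from Proposition 1, there holds

$$\mathbb{E}[C_{k+1} \mid U] \le \left( 1 - \frac{1}{(n-1)\kappa} \right) \mathbb{E}[C_k].$$

Similarly, the second term is obtained from Proposition 4 and there holds

$$\mathbb{E}[C_{k+1} \mid R] \le 2 \, \mathbb{E}[C_k] + 8 n \kappa \left( 1 + \frac{1}{\sqrt{\kappa}} + \frac{|b|}{n} \right)^2.$$

Combining those expressions, and using the fact that $p_R = 1 - p_U$, concludes the proof.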

The convergence rate obtained in Theorem 1 allows upper bounding the performance of the RCD algorithm under replacement events.

First observe that convergence is guaranteed as long as the probability $p_U$ for an event to be an RCD update satisfies

$$p_U > \frac{\kappa (n-1)}{\kappa (n-1) + 1}, \tag{24}$$

which corresponds to the worst-case contraction rate guaranteeing contraction in expectation at each iteration.

Let us denote by $\rho_R := \frac{p_R}{p_U}$ the expected number of replacements happening between two RCD updates in the whole system. Then one can reformulate (22) in terms of $\rho_R$ using the fact that $p_U = \frac{1}{1 + \rho_R}$, and it follows that convergence is guaranteed as long as

$$\rho_R < \frac{1}{(n-1)\kappa}, \tag{25}$$

namely as long as on average at most one replacement happens every $(n-1)\kappa$ RCD updates. This is equivalently formulated in terms of $p_a$ and $p_e$, which we recall respectively denote the probability that at an event a particular agent is replaced and a particular pair of agents performs an RCD update (see (17)), and it follows that $\frac{p_a}{p_e} < \frac{1}{2\kappa}$.
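The equivalence between conditions (24) and (25) can be verified in a few lines. The derivation below assumes the reading $\rho_R = p_R / p_U$, so that $p_U = 1/(1+\rho_R)$, and uses the shorthand $c := 1/((n-1)\kappa)$, which is introduced here for convenience and is not part of the original notation; the last line relies on (17).

```latex
\begin{align*}
2 - p_U (1 + c) < 1
  &\iff p_U > \frac{1}{1 + c} = \frac{\kappa (n-1)}{\kappa (n-1) + 1}, \\
p_U = \frac{1}{1 + \rho_R}
  &\implies \Big( p_U > \frac{1}{1 + c} \iff \rho_R < c = \frac{1}{(n-1)\kappa} \Big), \\
\frac{p_a}{p_e} = \frac{p_R / n}{2 p_U / (n (n-1))} = \frac{(n-1)\, \rho_R}{2}
  &\implies \Big( \rho_R < c \iff \frac{p_a}{p_e} < \frac{1}{2 \kappa} \Big).
\end{align*}
```

The first line states that the factor multiplying $\mathbb{E}[C_k]$ in (22) is smaller than one, which is the contraction requirement behind (24).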

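Observe moreover that the recurrence equation (22) can be solved, yielding

$$\mathbb{E}[C_k] - \gamma \le \left( \frac{1 + 2 \rho_R - \frac{1}{(n-1)\kappa}}{1 + \rho_R} \right)^k \left( \mathbb{E}[C_0] - \gamma \right), \tag{26}$$

where

$$\gamma = 8 n \kappa \left( 1 + \frac{1}{\sqrt{\kappa}} + \frac{|b|}{n} \right)^2 \frac{\rho_R}{\frac{1}{(n-1)\kappa} - \rho_R}, \tag{27}$$

so that provided convergence occurs, there holds

$$\lim_{k \to \infty} \mathbb{E}[C_k] \le \gamma.$$

Here the contraction factor in (26) is that of (22) with $p_U = \frac{1}{1 + \rho_R}$, and $\gamma$ is the fixed point of the recurrence. Observe that conservatism is induced by the term in $\rho_R$ in the numerator of the contraction rate of (26). It exhibits how replacements can get in the way of convergence. In particular, as $\rho_R$ increases towards $\frac{1}{(n-1)\kappa}$, $\gamma$ is expected to grow unbounded. Conversely, with $\rho_R$ decreasing, it is expected that $\gamma$ decreases, until $\gamma \to 0$ as $\rho_R \to 0$ (i.e., in total absence of replacements, or equivalently as $p_U \to 1$). In that case, one retrieves the contraction rate of Proposition 1, and

$$\mathbb{E}[C_k] \le \left( 1 - \frac{1}{(n-1)\kappa} \right)^k \mathbb{E}[C_0].$$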
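To see how the recurrence (22) behaves, one can iterate it as an equality and compare the limit with its fixed point. The sketch below uses the factor and offset of (22) and (23); the numerical values and function names are illustrative assumptions, and $\rho_R$ is taken as $p_R / p_U$.

```python
import math

def theorem1_constants(n, kappa, b, p_update):
    """Contraction factor of (22) and offset Gamma of (23)."""
    rate = 2.0 - p_update * (1.0 + 1.0 / ((n - 1) * kappa))
    offset = (8.0 * (1.0 - p_update) * n * kappa
              * (1.0 + 1.0 / math.sqrt(kappa) + abs(b) / n) ** 2)
    return rate, offset

def iterate_bound(c0, rate, offset, steps):
    """Iterate e_{k+1} = rate * e_k + offset starting from e_0 = c0."""
    value = c0
    for _ in range(steps):
        value = rate * value + offset
    return value

n, kappa, b, p_update = 5, 2.0, 5.0, 0.95
rate, offset = theorem1_constants(n, kappa, b, p_update)

limit = offset / (1.0 - rate)                      # fixed point of the recurrence
final = iterate_bound(100.0, rate, offset, 5000)

# Same fixed point written with rho_R = p_R / p_U (assumed definition).
rho = (1.0 - p_update) / p_update
gamma = (8.0 * n * kappa * (1.0 + 1.0 / math.sqrt(kappa) + abs(b) / n) ** 2
         * rho / (1.0 / ((n - 1) * kappa) - rho))
print(rate, limit, gamma)
```

With these values the factor is below one, so the iteration settles at the fixed point; moving `p_update` towards the threshold in (24) makes `limit` grow quickly, which illustrates the sensitivity of the asymptotic bound to the replacement rate.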

IV-C Tightness Analysis

A critical part determining the tightness of our result is the analysis of the impact of a function change from Proposition 3, whose bound currently grows linearly with $n$ (for bounded $|b|/n$). That result is most likely conservative because it includes the possibility for all the functions to be replaced at once, whereas only replacements of single functions are allowed by our model. In this section, we show with two different approaches why we expect that this result can be improved into a bound that does not scale with $n$.

We consider the particular case where every local objective function is quadratic, as defined in the following assumption.

Assumption 6

For all $i$, there holds $f_i(x) = \frac{c_i}{2} (x - x_i^*)^2$, for some $c_i \in [\alpha, \beta]$, and for some $x_i^* \in [-1, 1]$.

Under Assumption 6, it is possible to obtain an alternative bound to that of Proposition 3, which yields the following theorem, proved in Appendix A.

Theorem 2

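Under Assumptions 1 to 6, the iteration rule (7) applied on a system subject to replacements generates a sequence of estimates $\{x^k\}$ satisfying for all $k$

$$\mathbb{E}[C_{k+1}] \le \left( 2 - p_U \left( 1 + \frac{1}{(n-1)\kappa} \right) \right) \mathbb{E}[C_k] + \Gamma', \tag{28}$$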

with

 Γ′=(1−pU)8(κ3+κn−2κn+(|b|+n)2(κ−1)2κ2(κ2n2+n−1)n4).

The difference between Theorems 1 and 2 lies in the terms and , which are respectively in and . That difference illustrates the possible improvement achievable for our bound with respect to at the cost of its tightness in . Fig. 1 presents the results of the computations of based on 10000 realizations of the process and the upper bound given by (28) for a network constituted of agents, with , and . The figure seems to confirm the tightness of the convergence rate derived for quadratic functions provided that is not too large.

PESTO analysis

The possibility to improve our bound is also illustrated by an analysis performed using the PESTO toolbox [PESTO], which allows deriving numerically exact bounds for questions related to convex functions. Using PESTO, we obtain an upper bound on a generalization of $\| x^{(2)} - x^{(1)} \|^2$, where $x^{(1)}$ and $x^{(2)}$ are defined in (19), for multi-dimensional functions $f_i : \mathbb{R}^d \to \mathbb{R}$. Details on the way the analysis with PESTO was performed are presented in Appendix B.

The results of the PESTO analysis are presented in Fig. 2, and suggest a sublinear increase of the bound with for some fixed , and with . Additional numerical exploration of that result suggests a possible asymptotic independence of the bound with respect to for fixed values of and with (similar results were observed for other values of ), and we conjecture the following bound, also illustrated in Fig. 2:

 ∥x(2)−x(1)∥2≤(κ+1)2−c1κ3n+κ+c2, (29)

for some . The above conjecture would yield an equivalent result as that of Theorem 2 with

 Γ′=2(1−pU)((κ+1)2−c1κ3n+κ+c2).

Interestingly, whereas grows in for most values of and , some choices yield a linear growth of in (e.g., if , as shown in Fig. 2). Moreover, does not grow with anymore, consistently with the improvement that we expect to achieve for future work.

V Conclusion

In this work we analyzed the random coordinate descent algorithm for a complete graph in an open multi-agent systems scenario when agents can be replaced during the iterations. We analyzed the behavior of the minimizer under replacement events, and derived an upper bound for the error in expectation and conditions for its stability.

As future work, we would like to improve the bounds for general classes of functions following the discussion on tightness in Section IV-C, especially since tighter bounds were obtained for particular settings and can be conjectured empirically. Possible extensions include considering agents interacting through networks with different graph topologies, and generalizing the constraint to general weights $a$. It would also be interesting to consider the case where the states of the agents in the network are $d$-dimensional and where more than one edge can be updated at each iteration.