Improved Convergence Rates for Distributed Resource Allocation

06/16/2017 ∙ by Angelia Nedić, et al.

In this paper, we develop a class of decentralized algorithms for solving a convex resource allocation problem in a network of n agents, where the agent objectives are decoupled while the resource constraints are coupled. The agents communicate over a connected undirected graph, and they want to collaboratively determine a solution to the overall network problem, while each agent only communicates with its neighbors. We first study the connection between the decentralized resource allocation problem and the decentralized consensus optimization problem. Then, using a class of algorithms for solving consensus optimization problems, we propose a novel class of decentralized schemes for solving resource allocation problems in a distributed manner. Specifically, we first propose an algorithm for solving the resource allocation problem with an o(1/k) convergence rate guarantee when the agents' objective functions are generally convex (possibly nondifferentiable) and per-agent local convex constraints are allowed. We then propose a gradient-based algorithm for solving the resource allocation problem when per-agent local constraints are absent, and show that such a scheme achieves a geometric rate when the objective functions are strongly convex and have Lipschitz continuous gradients. We also provide a scalability/network-dependency analysis. Based on these two algorithms, we further propose a gradient-projection-based algorithm which can handle smooth objectives and simple constraints more efficiently. Numerical experiments demonstrate the viability and performance of all the proposed algorithms.


I Introduction

This paper deals with a decentralized resource allocation problem, which is defined over a connected network of agents, as follows:

\min_{x_1,\dots,x_n} \ \sum_{i=1}^{n} f_i(x_i)    (1a)
\text{subject to} \ \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} d_i,    (1b)
x_i \in X_i, \quad i = 1,\dots,n,    (1c)

For each agent i, the vector x_i is its local decision variable. The objective function f_i is convex and the constraint set X_i is a nonempty, closed, and convex set, both of which are privately known by agent i only. The equality constraints (1b) couple the agents' decisions, where d_i is a given resource demand vector for agent i.
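
To make formulation (1) concrete, the following minimal sketch sets up a hypothetical instance with scalar decisions, quadratic local objectives, and box constraints (the economic dispatch structure discussed in the literature review below), and solves it with a centralized dual-bisection baseline. All problem data, the reading of the coupling constraint as \sum_i x_i = \sum_i d_i, and the baseline solver itself are illustrative assumptions, not part of the algorithms proposed in this paper.

```python
# A minimal, hypothetical instance of problem (1): each agent i has a scalar
# decision x_i, a quadratic cost f_i(x_i) = 0.5*a_i*(x_i - c_i)^2, a box
# constraint X_i = [lo_i, up_i], and a local demand d_i.  The coupling
# constraint is read here as sum_i x_i = sum_i d_i.  All data are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n = 6
a = rng.uniform(1.0, 3.0, n)      # curvature of each local cost
c = rng.uniform(-1.0, 1.0, n)     # unconstrained minimizer of each local cost
d = rng.uniform(0.0, 1.0, n)      # local resource demands
lo, up = -2.0 * np.ones(n), 2.0 * np.ones(n)
D = d.sum()                       # total demand; assume sum(lo) <= D <= sum(up)

def x_of_lam(lam):
    """Minimize each Lagrangian term over its box: clip(c_i - lam/a_i, lo_i, up_i)."""
    return np.clip(c - lam / a, lo, up)

# Centralized baseline: bisection on the multiplier of the coupling constraint.
# sum_i x_i(lam) is nonincreasing in lam, so bisection converges.
lam_lo, lam_hi = -50.0, 50.0
for _ in range(100):
    lam = 0.5 * (lam_lo + lam_hi)
    if x_of_lam(lam).sum() > D:
        lam_lo = lam
    else:
        lam_hi = lam

x_star = x_of_lam(lam)
print("coupling residual:", x_star.sum() - D)   # ~0 at optimality
print("allocation:", np.round(x_star, 4))
```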

I-A Literature review

A particular problem that falls under the preceding resource allocation formulation is the economic dispatch problem, in which each local objective is a quadratic function and every constraint set is a box, when the direct-current power flow model is used [1]. Problems sharing similar forms have received extensive attention due to the emergence of smart city concepts. For example, references [2] and [3] both consider economic dispatch in a smart grid with the extra consideration of a random wind power injection, and the algorithms proposed in both references are accompanied by discussions of basic convergence properties. Some earlier theoretical papers that focus on decentralized algorithm design for solving the "unconstrained version" of (1) (i.e., without the per-agent constraints (1c)) are available in the literature [4, 5]. Reference [4] considers a class of algorithms that randomly pick pairs of neighbors to perform updates. Under a convexity assumption, a sublinear rate on the objective optimality residual in expectation is derived over fixed graphs; under a strong convexity assumption, a geometric rate is obtained, also on the expectation of the objective optimality residual. Here, k denotes the number of iterations the concerned algorithm has performed, and L denotes the gradient Lipschitz constant of the objective function. The condition number κ of the objective is a scalar (no less than 1) defined as the ratio of the gradient Lipschitz constant L and the strong convexity constant μ. The rates in [4] also involve the second smallest eigenvalue of a certain graph-dependent matrix; with a uniform assignment of probabilities, this eigenvalue scales unfavorably with the network size (though it is possible to considerably improve on this if the probabilities are chosen in a centralized way depending on the graph). Reference [5] gives an algorithm that is shown to exhibit a guaranteed decay rate of the squared gradient consensus violation over time-varying graph sequences; the rate depends on a constant that measures how long it takes for a time-varying graph sequence to become jointly connected. Reference [6] proposes a "consensus plus innovations" method for solving problem (1), and the convergence of the method is established for quadratic objectives under a diminishing step size selection. Based on the alternating direction method of multipliers (ADMM), reference [7] provides a class of algorithms that can handle problem (1) with convergence guarantees. In particular, under the assumption that the objective functions are convex, the convergence properties are established; when the per-agent constraints (1c) are absent, under the assumptions that the objective functions are strongly convex and have Lipschitz continuous gradients, a linear (geometric) convergence rate is shown. By using the ADMM, if a center (in a star-shaped network) is allowed to carry part of the computational tasks, a more general problem formulation beyond (1) can be handled; such a formulation and its distributed algorithms have been found to be useful in Internet services over hybrid edge-cloud networks [8]. Reference [9] studies the special case without the per-agent constraints (1c), and considers solving the problem over time-varying networks. Under strong convexity and gradient Lipschitz continuity of the objective functions, the algorithm in reference [9] is proved to have a geometric convergence rate. In other words, for the algorithm in [9] to reach an ε-accuracy, the number of iterations needs to grow at a certain order; the resulting scalability in the number of agents is the best scalability result (with the size of the network) that currently exists in the literature. Reference [10] proposes a dual-based algorithm with a diminishing step size for solving (1), for which a convergence rate is also derived; however, such algorithms with vanishing step sizes cannot be extended to handle problems with time-varying objectives and usually exhibit slow performance. A very recent work [11] proposes a class of algorithms to handle the resource sharing problem under conic constraints. The algorithms are built on a modified Lagrangian function and an ADMM-like scheme for seeking a saddle point of the Lagrangian function, and they are shown to converge at a guaranteed rate when the agents' objective functions are convex. Though the problem formulation in [11] treats (1) as a special case, the regularity assumptions used there to obtain the convergence rates do not apply to problem (1). (In reference [11], the constraint (1b) is replaced by a more general conic constraint; however, the cone is required to have a nonempty interior, which does not hold for the linear constraints in (1b).) Based on consensus and push-sum approaches [12], a recent reference [13] proposes a distributed algorithm for solving problem (1) over time-varying directed networks and provides convergence guarantees. Aside from the above algorithms, which are all discrete-time methods, there are also continuous-time algorithms, such as the one in reference [14], where convergence under a general convexity assumption is ensured.
Table I summarizes the most relevant references with the convergence rates and the scalability results for distributed algorithms for solving problem (1), and it situates the results of this paper with respect to the existing work. A very recent work [15] proposes algorithms for decentralized consensus optimization with smooth and strongly convex objectives. By applying Nesterov's acceleration to the dual problem of consensus optimization, the algorithms in [15] attain the optimal geometric convergence rate of first-order algorithms, and a corresponding decentralized resource allocation algorithm can be shown to scale with the number of agents. Nevertheless, to enjoy this rate/scalability improvement, one needs to know both the strong convexity constant and the gradient Lipschitz constant. In contrast, the algorithms we study in this paper only require knowledge of a single parameter. Furthermore, the algorithms and analysis in [15] are specified for smooth strongly convex objectives. It is unclear how one can modify such schemes in order to solve convex problems, nonsmooth problems, or problems with constraints/projections.

Reference | uncon. strongly convex | unconstrained | constrained | scalability
[4] | geometric | | |
[5] | | | |
[7] | geometric | | |
[9] | geometric | | |
[10] | | | |
[11] | geometric | | |
this paper | geometric | | |
TABLE I: The convergence rates and scalability results for distributed resource allocation algorithms for problem (1), which is convex in all instances. The scalar L is the gradient-Lipschitz constant, while κ = L/μ is the condition number, where μ is the strong convexity constant of the objective. The rates are given in terms of the number of iterations, while the "scalability" column shows how the algorithm's geometric rate depends on the number of agents, n, and the condition number, κ. By saying "unconstrained" in the table, we mean that the per-agent constraints (1c) are absent. The quantity used in reference [4] is the second smallest eigenvalue of a certain graph-dependent matrix (see Subsection I-A of this paper for more details).

I-B Our contributions

In this paper, we design an algorithm for solving problem (1) from an unconventional point of view. We consider synchronous updates and connected undirected communication networks with time-varying topologies. For general convex functions (without requirements of strong convexity and smoothness), our basic method has an o(1/k) convergence rate (a nonnegative sequence {a_k} is said to converge to 0 at an o(1/k) rate if lim_{k→∞} k·a_k = 0, whereas it has an O(1/k) rate if sup_k k·a_k < ∞), which is slightly better than the sub-linear convergence rates achieved in the literature. When the objective functions are strongly convex and smooth and the per-agent constraints (1c) are absent, we show geometric convergence of the method. Furthermore, we find that the algorithm's scaling with the number of agents is better than the best scaling that has currently been achieved in the literature (see reference [9]). For the case when the objective functions are smooth and the per-agent constraints (1c) are absent, we also provide a gradient-based algorithm that achieves a guaranteed convergence rate under a convexity assumption, and a geometric rate under a strong convexity assumption on the objective functions. Finally, based on these two methods, we provide a combined optimization strategy which finds an optimal solution of problem (1) by using a gradient-projection step at each iteration.

II Resource Allocation and Its Connection to Consensus Optimization

II-A Notation and basic assumptions

Some of the notation may not be standard, but it enables us to present our algorithm and analysis in a compact form. Throughout the paper, we let agent i hold a local variable x_i, a function f_i, and a constraint set X_i of problem (1). We define

Our basic assumption is that problem (1) is convex, which is formalized as follows.

Assumption 1.

(Functional properties) For each agent i, the function f_i is convex, while the set X_i is nonempty, closed, and convex.

We define I_{X_i} as the indicator function of the set X_i, namely, I_{X_i}(x) = 0 if x ∈ X_i and I_{X_i}(x) = +∞ otherwise. We also define a composite function for agent i, as follows: g_i = f_i + I_{X_i}.

Under Assumption 1, the functions f_i are proper, closed, and convex, and so are the functions g_i, since the domain of f_i is the whole space. Furthermore, under Assumption 1, the subdifferential sets satisfy

∂g_i(x) = ∂f_i(x) + ∂I_{X_i}(x)    (2)

(see Theorem 23.8 of [16]). Moreover, since ∂I_{X_i}(x) coincides with the normal cone of X_i at x, we have that

In addition to the network objective defined in (1a), we introduce two more network-wide aggregate functions,

(3)

where

(4)

Similarly, we define a matrix by using the vectors , .

Letting be a subgradient of at , we construct a matrix of subgradients , as follows:

(5)

and, similarly, the matrices and are defined using subgradients of and at , respectively. We drop the tilde in the notation when the function under consideration is differentiable (i.e., a subgradient becomes a gradient). Each row of , , , , and corresponds to the information available to agent only.

We use 1 to denote a vector with all entries equal to 1, where the size of the vector is to be understood from the context. We say that a matrix is consensual if its rows are identical, i.e., its i-th and j-th rows coincide for all agents i and j. For a given matrix A, ‖A‖_F stands for its Frobenius norm, while ‖A‖_2 stands for its spectral norm (largest singular value). The largest eigenvalue of a symmetric positive semidefinite matrix M is denoted by λ_max(M), while its smallest non-zero eigenvalue is denoted by λ̃_min(M). The largest singular value of a matrix B is denoted by σ_max(B). For any matrix B, null(B) is the null space of B, while span(B) is the linear span of all the columns of B. Given two matrices A and B of the same size, their standard inner product is represented as ⟨A, B⟩ = trace(A^T B). Given a positive (semi)definite square matrix G, we define the G-weighted (semi-)norm ‖X‖_G = √⟨X, GX⟩.
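
As a quick numerical illustration of this notation, the snippet below evaluates the Frobenius norm, the spectral norm, and the weighted (semi-)norm for arbitrary matrices; reading the weighted norm as ‖X‖_G = √⟨X, GX⟩ and the sample matrices themselves are assumptions made only for illustration.

```python
# Quick numerical check of the matrix norms used in this subsection:
# Frobenius norm, spectral norm (largest singular value), and the
# G-weighted (semi-)norm ||X||_G = sqrt(<X, G X>) for a positive
# semidefinite G.  The matrices here are arbitrary illustrative data.
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 3))
G = np.diag(rng.uniform(0.5, 2.0, 5))        # a positive definite weight matrix

fro = np.linalg.norm(X, 'fro')               # Frobenius norm
spec = np.linalg.norm(X, 2)                  # spectral norm = largest singular value
g_norm = np.sqrt(np.trace(X.T @ G @ X))      # <X, GX> = trace(X^T G X)

print(fro, spec, g_norm)
print(spec <= fro)                           # spectral norm never exceeds Frobenius
```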

To model the underlying communication network of the agents, we use a simple (no self-loop) undirected graph, whose vertex set is the set of agents and whose edge set E collects the communication links. We say that a matrix is compatible with the graph when the following property holds: for every pair of distinct agents i and j, the (i, j)-th entry of the matrix is zero if neither (i, j) nor (j, i) is an element of E. We use N_i to denote the set of neighbors of agent i in the graph, i.e.,

Let the (standard) graph Laplacian matrix be defined as the degree matrix minus the adjacency matrix, where the degree matrix is the diagonal matrix whose i-th diagonal entry is the number of edges incident to node i, and the adjacency matrix has an entry equal to 1 when (i, j) ∈ E and 0 otherwise. A few facts about the Laplacian are that it is compatible with the graph, symmetric, and positive semidefinite.

In our algorithm, we will use a matrix Ł whose behavior is "close to or the same as" that of the graph Laplacian, in the sense of the following assumption.

Assumption 2 (Graph connectivity and null/span property).

The graph is connected and the matrix Ł is compatible with the graph. Furthermore, Ł = R^T R for some full row-rank matrix R, and the null space of R is spanned by the all-ones vector.

Note that Assumption 2 equivalently says that the null space of Ł is spanned by the all-ones vector, and thus Ł1 = 0. The matrix Ł can be chosen in several different ways:

  • Since the graph Laplacian satisfies Assumption 2, we can choose Ł to be the graph Laplacian. In this case, each agent needs to know only the number of its neighbors (its degree), and Ł can be constructed without any communication among the agents.

  • We can let Ł be a certain matrix that the network needs to configure; a preprocessing procedure to retrieve such a matrix is possible [17].

  • We can also choose Ł = I − W, where W is a symmetric doubly stochastic matrix that is compatible with the graph and whose second largest eigenvalue is strictly less than 1. This matrix can be constructed in the network through a few rounds of local interactions among the agents, since local strategies for determining W exist, such as the Metropolis-Hastings rule, which requires only one round of local interactions [18, 19].

We will discuss specific choices of Ł in some of our results, either to simplify the analysis or to point out interesting consequences.
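
As an illustration, the sketch below builds two of the choices of Ł discussed above for a small hypothetical graph (the graph Laplacian, and I − W with Metropolis-Hastings weights) and numerically checks the connectivity/null-space property behind Assumption 2; the particular graph and the form of the check are assumptions made only for illustration.

```python
# Two choices of the matrix Ł for a small illustrative connected graph, and a
# numerical check that each is symmetric PSD with null space spanned by the
# all-ones vector (the property required by Assumption 2).
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]   # a connected 4-node graph
n = 4

# Adjacency matrix, degrees, and the standard graph Laplacian (degree - adjacency).
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
deg = A.sum(axis=1)
L = np.diag(deg) - A

# Metropolis-Hastings weights: symmetric, doubly stochastic, and buildable with
# one round of neighbor-degree exchange.
W = np.zeros((n, n))
for i, j in edges:
    W[i, j] = W[j, i] = 1.0 / (1.0 + max(deg[i], deg[j]))
np.fill_diagonal(W, 1.0 - W.sum(axis=1))

for name, Lmat in [("graph Laplacian", L), ("I - W", np.eye(n) - W)]:
    vals = np.linalg.eigvalsh(Lmat)
    # Connected graph: exactly one (near-)zero eigenvalue, eigenvector = all-ones.
    print(name, "eigenvalues:", np.round(vals, 4),
          "| annihilates the all-ones vector:", np.allclose(Lmat @ np.ones(n), 0))
```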

II-B The resource allocation and consensus optimization problems

In this subsection, we investigate the first-order optimality conditions for problem (1) and for consensus optimization. With the notation introduced in the preceding section, the resource allocation problem (1) can be compactly given by

(6)

where is a vector of appropriate dimension whose entries are all equal to . By using the Lagrangian function, we can write down the optimality conditions for problem (6) in a special form, as given in the following lemma.

Lemma 1 (First-order optimality condition for (6)).

Let Assumptions 1 and 2 hold, and let be a given scalar. Then, is an optimal solution of (6) if and only if there exists a matrix such that the pair satisfies the following relations:

(7a)
(7b)

where is the matrix defined in Assumption 2.

Proof.

The Lagrangian function of problem (6) is

where contains all the multipliers of the constraints . Since the problem is convex, the necessary and sufficient optimality condition for the primal-dual pair is

where is to be understood as a collection of all matrices whose every row is given by some subgradient of at . Noting that the condition is equivalent to the requirement that there exists an such that , the optimality condition for the primal-dual pair can be written as:

(9a)
(9b)

By Assumption 2, the full row-rank matrix has the set as its null space, implying that if and only if for some scalar . Applying this result to each column of (9b), we obtain its equivalent relation (7b). Relation (9a) states that the vector is orthogonal to every column of the matrix . The same is true if we scale by a factor , where is the given scalar in the lemma. Since , it implies that if and only if every column of matrix lies in the set . Let be a vector such that is equal to the -th column of , for . Take these column vectors as the columns of a matrix , for which after multiplying by , we have , thus showing that relation (9a) is equivalent to relation (7a).  

It turns out that the optimality conditions for the resource allocation problem, as given in Lemma 1, have an interesting connection with the optimality conditions for the consensus optimization problem. In order to expose this relation, we next discuss the consensus optimization problem, which is given as follows:

(10)

The local objective of each agent in (10) is the same as that in (6). Unlike in the resource allocation problem, instead of the coupling resource constraints, here we have the consensus constraints, i.e., x_1 = x_2 = ⋯ = x_n.

The first-order optimality condition of (10) is stated in the following lemma.

Lemma 2 (First-order optimality condition for (10)).

Let Assumptions 1 and 2 hold, and let be a given scalar. Then, is an optimal solution of (10) if and only if there exists a matrix such that the pair satisfies the following relations:

(11a)
(11b)

where is the matrix defined in Assumption 2.

The proof of this lemma is essentially the same as that of Lemma 3.1 in reference [20]; the only difference is that the two arguments rely on different matrix decompositions.

II-C The mirror relationship

It is known that the Lagrangian dual problem of the resource allocation problem is a consensus optimization problem (see the discussion around equations (4)–(6) in [7]). As pointed out in reference [7], a distributed optimization method that can solve the consensus optimization problem may also be used for the resource allocation problem, by solving the dual of the resource allocation problem. Here, we expose further special relations between these two problems, which lead to a class of resource allocation algorithms that follow the design of a class of decentralized consensus optimization algorithms. Owing to these special relations, one can derive a decentralized resource allocation algorithm without explicitly invoking the Lagrangian dual relationship between the two problems.
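
As a sketch of this duality, suppose the coupling constraint is read as \sum_i x_i = \sum_i d_i and each local constraint is folded into the objective through g_i = f_i + I_{X_i} (our reading of (1) and of the composite functions introduced above); g_i^* denotes the convex conjugate of g_i. Dualizing only the coupling constraint gives

```latex
% a sketch under the assumptions stated above; g_i = f_i + I_{X_i}, g_i^* its conjugate
\mathcal{L}(x_1,\dots,x_n,\,y)
  \;=\; \sum_{i=1}^{n} g_i(x_i) \;+\; y^{\top}\!\sum_{i=1}^{n}\,(x_i - d_i),
\qquad
q(y) \;=\; \inf_{x_1,\dots,x_n}\mathcal{L}
  \;=\; \sum_{i=1}^{n}\Big(-\,g_i^{*}(-y) \;-\; y^{\top} d_i\Big)
  \;=:\; \sum_{i=1}^{n} q_i(y),
```

so the dual problem \max_y \sum_i q_i(y) involves a single shared multiplier y. Solving it in a decentralized way, with each agent holding a local copy y_i and the network enforcing y_1 = \dots = y_n, is exactly a consensus optimization problem of the form (10).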

The optimality conditions of the resource allocation problem (6) and the consensus optimization problem (10) are summed up in the following box:

Opt. Cond. of Resource Allocation, (7) Opt. Cond. of Consensus Optimization, (11)
TABLE II: Summary of optimality conditions

These conditions share the same structure, i.e.,

where the two relations involve some general maps. The only difference between relations (7) and (11) is which quantity must lie in the span space and which must lie in the null space. In the resource allocation problem, we need the subgradient-related quantity to be consensual while the rows of the decision-related quantity sum to the total demand. In the consensus optimization problem, we need the decision-related quantity to be consensual while the rows of the subgradient-related quantity sum to zero. If, in (11), we swap the roles of these two quantities, we recover (7). Hypothetically, in an iterative consensus optimization algorithm with iteration index k, if we substitute the image of the "subgradient map" by that of some other map and substitute the image of the identity map by that of yet another map, there is a chance that, as k → ∞, the first still goes into the span space and the second still goes into the null space.

Furthermore, to analyze a consensus convex optimization algorithm, the most crucial relation we need is the monotone inequality of the subdifferential, which translates into an analogous inequality when we substitute the key quantities by the images of those general maps discussed above. Clearly, this inequality still holds under such substitutions as long as the objective function is convex. It is therefore possible that these substitutions do not affect the validity of some of the existing analyses for certain algorithms. For example, consider the subgradient form of the proximal method with a given step size; this method can be proven to have a convergence guarantee if the objective is convex. Its counterpart after the substitution can be resolved as

and updated according to the following rules:

It can be shown that the sequence generated by such an iterative algorithm after the substitution converges, corresponding to the convergence property of the proximal method.
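
For concreteness, here is a minimal sketch of the classical (pre-substitution) proximal point method mentioned above, written for an illustrative strongly convex quadratic objective; the problem data, the step size, and the choice of monitoring the gradient residual are assumptions made only for illustration, and this is not the mirrored variant developed in this paper.

```python
# Classical proximal point method, x^{k+1} = argmin_x f(x) + (1/(2*alpha))*||x - x^k||^2,
# for an illustrative strongly convex quadratic f(x) = 0.5*x'Qx - b'x, so the proximal
# step has the closed form x^{k+1} = (Q + I/alpha)^{-1} (b + x^k/alpha).
# All data and the step size alpha are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((5, 5))
Q = M @ M.T + np.eye(5)          # positive definite
b = rng.standard_normal(5)
alpha = 0.5

x = np.zeros(5)
for k in range(30):
    x = np.linalg.solve(Q + np.eye(5) / alpha, b + x / alpha)
    # By the prox optimality condition, this gradient equals (x_prev - x)/alpha,
    # and it vanishes as the iterates converge.
    grad = Q @ x - b
    if k % 10 == 9:
        print(k + 1, np.linalg.norm(grad))
```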

These important observations motivate the class of algorithms that we propose for solving resource allocation problems on the basis of existing consensus optimization algorithms. The consensus optimization algorithms that will be exploited in this paper are simple and efficient, and they have recently been further accelerated in a manner akin to Nesterov's fast methods [21], as well as enhanced to work over asynchronous [22] and directed communication networks [23, 24]. Our algorithm design philosophy also suggests the possibility of enhancing the resource allocation algorithms proposed in this paper by borrowing the techniques developed for advancing consensus optimization algorithms.
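
To ground the discussion, the following sketch runs one member of this family, the EXTRA recursion (the smooth, unconstrained special case underlying P-EXTRA of [20], as we read it), on a small decentralized least-squares consensus problem; the ring graph, the mixing matrix W = I − L/(deg_max + 1), the data, and the conservative step-size rule are all illustrative assumptions rather than the resource allocation algorithms proposed here.

```python
# EXTRA for the consensus problem min_x sum_i 0.5*||A_i x - b_i||^2.
# Each row of X holds one agent's local copy of the decision variable.
import numpy as np

rng = np.random.default_rng(3)
n, p = 4, 3
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]              # a 4-node ring
A_loc = [rng.standard_normal((6, p)) for _ in range(n)]
b_loc = [rng.standard_normal(6) for _ in range(n)]

# A simple symmetric doubly stochastic mixing matrix: W = I - L/(deg_max + 1).
adj = np.zeros((n, n))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0
deg = adj.sum(axis=1)
lap = np.diag(deg) - adj
W = np.eye(n) - lap / (deg.max() + 1.0)
W_tilde = 0.5 * (np.eye(n) + W)

def grad(X):
    """Stacked local gradients; row i is the gradient of f_i at X[i]."""
    return np.vstack([A_loc[i].T @ (A_loc[i] @ X[i] - b_loc[i]) for i in range(n)])

# Conservative step size based on the largest local gradient-Lipschitz constant.
Lmax = max(np.linalg.norm(A.T @ A, 2) for A in A_loc)
alpha = np.linalg.eigvalsh(W_tilde).min() / Lmax

X_prev = np.zeros((n, p))
X = W @ X_prev - alpha * grad(X_prev)                 # first EXTRA step
for _ in range(1000):
    X_next = (np.eye(n) + W) @ X - W_tilde @ X_prev - alpha * (grad(X) - grad(X_prev))
    X_prev, X = X, X_next

x_star = np.linalg.lstsq(np.vstack(A_loc), np.concatenate(b_loc), rcond=None)[0]
print("consensus violation:", np.linalg.norm(X - X.mean(axis=0)))
print("distance to centralized solution:", np.linalg.norm(X.mean(axis=0) - x_star))
```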

In Section III, we describe our resource allocation algorithms and provide their convergence analysis. Finally, we will illustrate some numerical experiments in Section IV and conclude the paper with remarks in Section V.

III The Algorithms and Their Convergence Analysis

Before we introduce our algorithms and conduct their analyses, let us introduce the solution set of the resource allocation problem (1), denoted by X^*. We make the following assumption for problem (1), which we use throughout the paper.

Assumption 3.

The solution set X^* of the resource allocation problem (1) is nonempty. Furthermore, a Slater condition is satisfied, i.e., there is a point which satisfies the linear constraints in (1) and lies in the relative interior of the constraint set X_1 × ⋯ × X_n.

The set X^* is nonempty, for example, when the constraint set of the resource allocation problem (1) is compact, or when the objective function satisfies some growth condition. Under the convexity conditions in Assumption 1, the optimal set X^* is closed and convex. Under Assumptions 1 and 3, strong duality holds between problem (1) and its Lagrangian dual problem, and the dual optimal set is nonempty (see Proposition 6.4.2 of [25]).

III-A The basic algorithm: Mirror-P-EXTRA

This algorithm solves the original problem (1), i.e., the resource allocation problem with local constraints. The basic algorithm operates as follows (see Algorithm 1). Each agent uses its own local parameter, which can be viewed as its stepsize.

Algorithm 1: Mirror-P-EXTRA

Each agent chooses its own parameter and the same parameter ;
Each agent initializes with , , and ;
Each agent, for k = 0, 1, 2, …, do
  ;
  ;
  ;
end

The algorithm is motivated by the P-EXTRA algorithm from reference [20] for consensus optimization problems. The reason we refer to Algorithm 1 as Mirror-P-EXTRA will become clear from the following lemma. In the lemma and later on, we will use the diagonal matrix whose i-th diagonal entry is agent i's local parameter,

Lemma 3.

Let Assumptions 1 and 2 be satisfied, and let . Then, the sequence generated by Algorithm 1 satisfies for ,

(12a)
(12b)
(12c)

where the parameter is chosen the same as in Algorithm 1, and where the matrix of subgradients is the same as that used in Algorithm 1.

Proof.

By using the notation given in Subsection II-A, the updates of Algorithm 1 can be represented compactly as initializing with arbitrary , , and , and then performing for ,

(13a)
(13b)
(13c)

From (13b) and (13c), and also taking the initialization into account (by the initialization, we can choose the subgradient appropriately and use relation (2)), we have a relation that holds for all k. Thus, substituting it into (13a), we obtain that the given conditions are equivalent to:

(14a)
(14b)

Now, in relation (14b), we write and use (cf. (14a)), to obtain the following equivalent relations:

(15a)
(15b)

These relations are enough to generate the sequence . By Assumption 2, we have that , so by introducing the notation , relation (15a) reduces to . We note that a sequence generated by is the same as the sequence generated by (15a) with . Using these relations and reorganizing (15), we obtain

(16a)
(16b)
(16c)

which generates the same sequence as Algorithm 1 does. Finally, relation (16b) is equivalent to , which when substituted into (16a) gives relation (12a). Relations (16b) and (16c) coincide with (12b) and (12c), respectively.  

If we choose a particular special case of the parameters in Corollary 1 of reference [20], we obtain the recursive relation of P-EXTRA in the following form (for brevity, we have slightly abused the notation: in P-EXTRA, the matrix playing this role is a particular matrix square root, while in the proposed algorithms here it is an arbitrary matrix satisfying the decomposition in Assumption 2; in both cases, however, the null spaces are the same, and so are the span spaces):

(17)

To fulfill the optimality conditions for the resource allocation problem, we would need the subgradient matrix to be consensual and the rows of the decision matrix to sum to the total demand. In view of the insight from Lemmas 1 and 2, the only thing we need to do is to swap the roles of these two quantities, as discussed in Section II-C. By doing so, we obtain

(18)

which is very similar to relations (12a) and (12b) in Lemma 3. The only difference is in one term of (18), which we have "replaced" by a corresponding term to obtain (12a). The key role of this term in (17) (and of its counterpart in (18)) is to stabilize the iterative process and neutralize those terms that are not implementable in one decentralized step in P-EXTRA. Hypothetically, with a "large enough" positive (semi)definite matrix, any such term in (17) (or in (18)) will serve the purpose of stabilizing the iterative process. Here, we redesign this term in a more flexible way, so that the recursive relations are resolvable and implementable in a decentralized manner, while featuring independent per-agent parameters.

Our analysis of Algorithm 1 will use the alternative description of the algorithm, as given in Lemma 3. To simplify our presentation, let us define the following quantities:

(19)

where is the parameter of Algorithm 1. Using this particular in Lemma 1, with each solution we can identify such that Lemma 1 holds, i.e., the optimality conditions in (7) are satisfied. This particular and the matrix constitute the matrix .

Next, we will show that converges to a solution .

Theorem 1 (Convergence of Mirror-P-EXTRA).

Let Assumptions 1–3 hold. Let the parameters and be such that . Then, and the sequences and generated by Algorithm 1 satisfy the following relations:

(20)

where , and , for , are defined by (19). Furthermore, the sequence converges to a point in the optimal solution set .

Proof.

By the convexity of , we have that for any arbitrary ,

(21)

By relation (12a) of Lemma 3 we have

By Lemma 1, where is the chosen parameter in the algorithm, from (7a) we have

The preceding two relations imply that

which when substituted into (21) yields