This paper deals with a decentralized resource allocation problem, which is defined over a connected network of agents, as follows:
For each agent, the vector appearing in the formulation is its local decision variable. The agent's objective function is convex, and its constraint set is nonempty, closed, and convex; both are known privately to that agent only. The equality constraints couple the agents' decisions, with each agent assigned a given resource demand vector.
I-A Literature review
A particular problem that falls under the preceding resource allocation formulation is the economic dispatch problem, in which each local objective is a quadratic function and every constraint set is a box when the direct current power flow model is used. Problems sharing similar forms have received extensive attention due to the emergence of smart city concepts. For example, references  and  both consider the economic dispatch problem in a smart grid with the extra consideration of a random wind power injection. The algorithms proposed in both references are accompanied by discussions of basic convergence properties. Some earlier theoretical papers that focus on decentralized algorithm design for solving the “unconstrained version” of (1) are available in the literature [4, 5]. Reference  considers a class of algorithms that randomly pick pairs of neighbors to perform updates. Under a convexity assumption, a rate on the objective optimality residual in expectation is derived over fixed graphs; under a strong convexity assumption, a geometric rate is obtained, also on the expectation of the objective optimality residual. Here,  is the number of iterations the concerned algorithm has performed, and  is the gradient Lipschitz constant for the objective function. The condition number of the function is a scalar (no less than 1) defined as the ratio of the gradient Lipschitz constant to the strong convexity constant of the function. This quantity scales at the order of  (though it is possible to considerably improve on this if the probabilities are chosen in a centralized way depending on the graph). Reference  gives an algorithm that is shown to have a rate for the decay of the squared gradient consensus violation over time-varying graph sequences; here,  is a constant that measures how long it takes for a time-varying graph sequence to become jointly connected.
Reference  proposes a “consensus plus innovations” method for solving problem (1), and the convergence of the method is established for quadratic objectives under a diminishing step size selection. Based on the alternating direction method of multipliers (ADMM), reference  provides a class of algorithms that can handle problem (1) with convergence guarantees. In particular, under the assumption that the objective functions are convex, convergence is established; when the per-agent constraints (1c) are absent, under the assumptions that the objective functions are strongly convex and have Lipschitz continuous gradients, a linear (geometric) convergence rate is shown. By using the ADMM, if a center (in a star-shaped network) is allowed to carry part of the computational tasks, a more general problem formulation beyond (1) can be handled. Such a formulation and its distributed algorithms have been found useful in Internet services over hybrid edge-cloud networks . Reference  studies the special case when , and considers solving the problem over time-varying networks. Under the strong convexity and gradient Lipschitz continuity of the objective function, the algorithm in reference  is proven to have a geometric convergence rate. In other words, for the algorithm in  to reach an -accuracy, the number of iterations needs to be of the order . This translates to an  scalability in the number of agents, which is the best scalability result (with respect to the size of the network) that currently exists in the literature. Reference  proposes a dual-based algorithm with a diminishing step size for solving (1), for which a convergence rate is derived. However, such algorithms with vanishing step sizes cannot be extended to handle problems with time-varying objectives and usually exhibit slow performance. A very recent work  proposes a class of algorithms to handle the resource sharing problem under conic constraints.
The algorithms are built on a modified Lagrangian function and an ADMM-like scheme for seeking a saddle point of that Lagrangian function, and they have been shown to achieve a sublinear rate for convex agents’ objective functions. Though the problem formulation in  treats (1) as a special case, the regularity assumptions that have been used to obtain the convergence rates do not apply to problem (1). (Footnote: In reference , the constraint (1b) is replaced by a more general conic constraint; however, it is required that the interior of the cone be nonempty, which does not apply to the linear constraints in (1b).) Based on consensus and push-sum approaches , a recent reference  proposes a distributed algorithm for solving problem (1) over time-varying directed networks and provides convergence guarantees. Aside from the above algorithms, which are all discrete-time methods, there are some continuous-time algorithms, such as the one in reference , where convergence under a general convexity assumption is ensured. Table I summarizes the most relevant references with the convergence rates and scalability results for distributed algorithms solving problem (1), and it positions the results of this paper with respect to the existing work. A very recent work  proposes algorithms for decentralized consensus optimization with smooth and strongly convex objectives. By applying Nesterov’s acceleration to the dual problem of consensus optimization, the algorithms in  attain the optimal geometric convergence rate of first-order algorithms, and a decentralized resource allocation algorithm built on them can scale in the order of  with the number of agents. Nevertheless, to enjoy this rate/scalability improvement, one needs to know both the strong convexity constant and the gradient Lipschitz constant. In contrast, the algorithms we study in this paper only require knowledge of the parameter . Furthermore, the algorithms and analysis in  are specific to smooth, strongly convex objectives.
It is unclear how one can modify such schemes in order to solve convex problems, nonsmooth problems, or problems with constraints/projections.
|Reference|uncon. strongly convex|unconstrained|constrained|scalability|
I-B Our contributions
In this paper, we design an algorithm for solving problem (1) from an unconventional point of view. We consider synchronous updates and connected undirected communication networks with time-varying topologies. For general convex functions (without requirements of strong convexity and smoothness), our basic method has an o(1/k) convergence rate (footnote: a nonnegative sequence {a_k} is said to converge to 0 at an o(1/k) rate if lim_{k→∞} k a_k = 0; in contrast, it is said to have an O(1/k) rate if sup_k k a_k < ∞), which is slightly better than the sub-linear O(1/k) convergence rates achieved in the literature. When the objective functions are strongly convex and smooth, and the per-agent constraint sets are absent, we show geometric convergence of the method. Furthermore, we find that the algorithm scales in the order of  with the number of agents, which is better than the best scaling that has currently been achieved in the literature (see reference , where the scaling is ). For the case when the objective function is smooth and the per-agent constraint sets are absent, we also provide a gradient-based algorithm that achieves a sublinear rate under a convexity assumption, and a geometric rate under a strong convexity assumption on the objective functions. Finally, based on these two methods, we provide a combined optimization strategy that finds an optimal solution of problem (1) by using a gradient-projection step at each iteration.
II Resource Allocation and Its Connection to Consensus Optimization
II-A Notation and basic assumptions
Some of the notation may not be standard, but it enables us to present our algorithm and analysis in a compact form. Throughout the paper, we let each agent hold a local variable, a function, and a constraint set of problem (1). We define
Our basic assumption is that problem (1) is convex, which is formalized as follows.
Assumption 1 (Functional properties). For any agent, the function is convex, while the set is nonempty, closed, and convex.
We define as the indicator function of the set , namely,
We also define a composite function for agent , as follows:
(see Theorem 23.8 of ). Moreover, since coincides with the normal cone of at , we have that
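Since much of the notation above is elided, a small numerical illustration (with names of our own choosing, not the paper's) may help: the indicator function of a box constraint set, and the standard fact that the proximal operator of an indicator function is the Euclidean projection onto the set, whose optimality condition involves exactly the normal cone mentioned above.

```python
# Hedged sketch: indicator function of a box set X = [lo, hi]^p and its
# proximal operator (the Euclidean projection onto X). All names here
# are illustrative.
import numpy as np

lo, hi = -1.0, 1.0  # a box constraint set

def indicator(x):
    """0 on the set, +infinity outside it."""
    return 0.0 if np.all((lo <= x) & (x <= hi)) else np.inf

def prox_indicator(v):
    """Prox of an indicator function = projection onto the set."""
    return np.clip(v, lo, hi)

v = np.array([2.0, -3.0, 0.5])
p = prox_indicator(v)
# v - p is a normal-cone direction of X at p: (v - p) @ (y - p) <= 0
# for every feasible y, which is the projection optimality condition.
```

Adding such an indicator to a convex local objective yields the composite function above, and the set's normal cone enters its subdifferential, which is the fact being quoted.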
In addition to the network objective defined in (1a), we introduce two more network-wide aggregate functions,
Similarly, we define a matrix by using the vectors , .
Letting be a subgradient of at , we construct a matrix of subgradients , as follows:
and, similarly, the matrices and are defined using subgradients of and at , respectively. We drop the tilde in the notation when the function under consideration is differentiable (i.e., a subgradient becomes a gradient). Each row of , , , , and corresponds to the information available to agent only.
We use to denote a vector with all entries equal to 1, where the size of the vector is to be understood from the context. We say that a matrix is consensual if its rows are identical, i.e., for all agents and . For a given matrix , stands for its Frobenius norm, while
stands for its spectral norm (largest singular value). The largest eigenvalue of a symmetric positive semidefinite matrix is denoted by , while its smallest non-zero eigenvalue is denoted by . The largest singular value of a matrix is denoted by . For any matrix , the set is the null space of , while the set is the linear span of all the columns of . Given a matrix and a matrix , their standard inner product is represented as . Given a positive (semi)definite square matrix , we define the -weighted (semi-)norm .
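For concreteness, the matrix quantities just defined can be computed and cross-checked numerically; the following sketch uses example data of our own:

```python
# Hedged numerical illustration of the norm/eigenvalue notation above.
import numpy as np

B = np.array([[2.0, 1.0],
              [1.0, 2.0],
              [0.0, 1.0]])

fro = np.linalg.norm(B, 'fro')      # Frobenius norm
spec = np.linalg.norm(B, 2)         # spectral norm = largest singular value

S = B.T @ B                         # symmetric positive semidefinite
eigs = np.linalg.eigvalsh(S)
lam_max = eigs[-1]                  # largest eigenvalue
lam_min_nz = min(e for e in eigs if e > 1e-10)  # smallest non-zero eigenvalue

# standard inner product of two matrices, and a G-weighted (semi-)norm
X = np.array([[1.0], [2.0]])
Y = np.array([[3.0], [4.0]])
inner = np.trace(X.T @ Y)
G = np.diag([2.0, 1.0])             # positive definite weight matrix
g_norm = np.sqrt(np.trace(X.T @ G @ X))
```

Two sanity checks tie the definitions together: the spectral norm of a matrix equals the square root of the largest eigenvalue of its Gram matrix, and the squared Frobenius norm equals the sum of those eigenvalues.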
To model the underlying communication network for the agents, we use a simple (no self-loop) undirected graph, where is the vertex set and is the edge set. We say that a matrix is compatible with the graph when the following property holds: for every pair of distinct agents, the corresponding entry of the matrix is zero if neither the pair nor its reverse is an element of the edge set. We use to denote the set of neighbors of agent in the graph, i.e.,
Let denote the (standard) Laplacian matrix associated with the graph, i.e., , where is the diagonal matrix whose -th diagonal entry is the number of edges incident to node , while is the graph adjacency matrix (whose entry is when the corresponding pair is an edge and otherwise). A few facts about the Laplacian are that it is compatible with the graph, symmetric, and positive semidefinite.
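These facts about the standard Laplacian are easy to verify numerically; the sketch below (a 4-cycle graph of our own choosing) builds L = D - A and checks symmetry, positive semidefiniteness, and the null-space property that matters for connected graphs.

```python
# Hedged sketch: standard graph Laplacian L = D - A of a small
# undirected graph, checking the facts quoted in the text.
import numpy as np

n = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]   # simple graph, no self-loops

A = np.zeros((n, n))                        # adjacency matrix
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
D = np.diag(A.sum(axis=1))                  # diagonal degree matrix
L = D - A                                   # graph Laplacian

assert np.allclose(L, L.T)                        # symmetric
assert np.min(np.linalg.eigvalsh(L)) >= -1e-12    # positive semidefinite
assert np.allclose(L @ np.ones(n), 0.0)           # ones vector in null space
assert np.linalg.matrix_rank(L) == n - 1          # graph is connected
```

The final rank check reflects that, for a connected graph, the null space of the Laplacian is exactly the span of the all-ones vector.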
In our algorithm, we will use a matrix whose behavior is “close to, or the same as,” that of , in the sense of the following assumption.
Assumption 2 (Graph connectivity and null/span property).
The graph is connected and a matrix Ł is compatible with the graph . Furthermore, for some full row-rank matrix and .
Note that Assumption 2 equivalently says that and thus . The matrix Ł can be chosen in several different ways:
Since the graph Laplacian satisfies Assumption 2, we can choose . In this case, each agent needs to know the number of its neighbors (its degree) and Ł can be constructed without any communication among the agents.
We can let . The network needs to configure this matrix, but a preprocessing step to retrieve it is possible .
We can also choose where
is a symmetric doubly stochastic matrix that is compatible with the graph and is strictly less than . This matrix can be constructed in the network through a few rounds of local interactions between the agents, since local strategies for determining it exist, such as the Metropolis-Hastings rule, which requires only one round of local interactions [18, 19].
We will discuss specific choices of Ł in some of our results, either to simplify the analysis or to point out interesting consequences.
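As an illustration of the last choice above, the Metropolis-Hastings weights can be sketched as follows (graph and variable names are ours). Each agent needs only its own and its neighbors' degrees, i.e., one round of local interactions, and the result is a symmetric doubly stochastic matrix compatible with the graph from which Ł can then be built.

```python
# Hedged sketch of the Metropolis-Hastings weight rule, producing a
# symmetric doubly stochastic matrix W compatible with the graph.
import numpy as np

n = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
neighbors = {i: set() for i in range(n)}
for i, j in edges:
    neighbors[i].add(j)
    neighbors[j].add(i)
deg = [len(neighbors[i]) for i in range(n)]

W = np.zeros((n, n))
for i in range(n):
    for j in neighbors[i]:
        W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))  # needs local degrees only
    W[i, i] = 1.0 - W[i].sum()                       # make each row sum to one
```

By construction W is symmetric (the off-diagonal rule is symmetric in its two arguments) and each row sums to one, hence W is doubly stochastic.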
II-B The resource allocation and consensus optimization problems
In this subsection, we investigate the first-order optimality conditions for problem (1) and for consensus optimization. With the notation introduced in the preceding section, the resource allocation problem (1) can be compactly given by
where is a vector of appropriate dimension whose entries are all equal to . By using the Lagrangian function, we can write down the optimality conditions for problem (6) in a special form, as given in the following lemma.
Lemma 1 (First-order optimality condition for (6)).
The Lagrangian function of problem (6) is
where contains all the multipliers of the constraints . Since the problem is convex, the necessary and sufficient optimality condition for the primal-dual pair is
where is to be understood as the collection of all matrices each of whose rows is given by some subgradient of at . Noting that the condition is equivalent to the requirement that there exists an such that , the optimality condition for the primal-dual pair can be written as:
By Assumption 2, the full row-rank matrix has the set as its null space, implying that if and only if for some scalar . Applying this result to each column of (9b), we obtain its equivalent relation (7b). Relation (9a) states that the vector is orthogonal to every column of the matrix . The same is true if we scale by a factor , where is the given scalar in the lemma. Since , it follows that if and only if every column of the matrix lies in the set . Let be a vector such that is equal to the -th column of , for . Taking these column vectors as the columns of a matrix and multiplying by , we have , thus showing that relation (9a) is equivalent to relation (7a).
It turns out that the optimality conditions for the resource allocation problem, as given in Lemma 1, have an interesting connection with the optimality conditions for the consensus optimization problem. In order to expose this relation, we next discuss the consensus optimization problem, which is given as follows:
The first-order optimality condition of (10) is stated in the following lemma.
Lemma 2 (First-order optimality condition for (10)).
II-C The mirror relationship
It is known that the Lagrangian dual problem of the resource allocation problem is a consensus optimization problem (see the discussion around equations (4)–(6) in ). As pointed out in reference , a distributed optimization method that can solve the consensus optimization problem may also be used for the resource allocation problem, by solving the dual of the resource allocation problem. Here, we expose further special relations between these two problems, which lead to a class of resource allocation algorithms designed after a class of decentralized consensus optimization algorithms. Due to these special relations, one can derive a decentralized resource allocation algorithm without investigating the Lagrangian dual relationship between the two problems.
|Opt. Cond. of Resource Allocation, (7)|Opt. Cond. of Consensus Optimization, (11)|
These conditions share the same structure, i.e.,
where and are some general maps. The only difference between relations (7) and (11) is which quantity is required to lie in the span space and which in the null space. In the resource allocation problem, we need to be consensual while the rows of sum to . In the consensus optimization problem, we need to be consensual while the rows of sum to . If, in (11), we simultaneously replace by and by , we recover (7). Hypothetically, in an iterative consensus optimization algorithm with iteration index , if we substitute the image of the “subgradient map” by that of some other map and substitute the image of the identity map by that of some other map , there is a chance that, in the limit, the first quantity still goes into the span space while the second still goes into the null space.
Furthermore, to analyze a consensus convex optimization algorithm, the most crucial relation we need is the monotone inequality, namely , which translates into verifying when we substitute the key quantities by the images of the general maps discussed above. Clearly, this inequality still holds when we let and and assume is convex. It is therefore possible that such substitutions do not affect the validity of some existing analyses for certain algorithms. For example, the subgradient form of the proximal method is , where is a step size. This method can be proven to satisfy if the objective is convex. Its counterpart after the substitution is , which can be resolved as
and updating according to the following rules:
It can be shown that the sequence generated by this iterative algorithm after the substitution converges to , corresponding to the fact that converges to in the proximal method.
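The proximal-method behavior invoked above can be checked on a one-dimensional example entirely of our own choosing: for the convex, nonsmooth function f(x) = |x|, the proximal step is the soft-thresholding operator, and the proximal point iterates converge to the minimizer.

```python
# Hedged illustration: proximal point iteration x_{k+1} = prox_{alpha*f}(x_k)
# for f(x) = |x|. Each step can be read in the implicit subgradient form
# x_{k+1} = x_k - alpha * g_{k+1}, with g_{k+1} a subgradient of f at x_{k+1}.
def soft_threshold(v, t):
    """Prox of t*|.| evaluated at v."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

alpha = 0.3   # step size
x = 5.0
trajectory = [x]
for _ in range(100):
    x = soft_threshold(x, alpha)   # one proximal (implicit subgradient) step
    trajectory.append(x)
```

The iterates shrink monotonically toward the minimizer 0, mirroring the convergence fact quoted in the text for the proximal method.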
These important observations motivate the class of algorithms that we propose for solving resource allocation problems based on some existing consensus optimization algorithms. The consensus optimization algorithms exploited in this paper are simple and efficient, and they have recently been further accelerated akin to Nesterov’s fast methods , as well as enhanced to work over asynchronous  and directed communication networks [23, 24]. Our algorithm design philosophy also suggests possibilities for enhancing the resource allocation algorithms proposed in this paper by using the techniques developed for advancing consensus optimization algorithms.
III The Algorithms and Their Convergence Analysis
Before we introduce our algorithms and conduct their analyses, let us define the solution set of the resource allocation problem (1), denoted by . We make the following assumption for problem (1), which we use throughout the paper.
The set is nonempty, for example, when the constraint set of the resource allocation problem (1) is compact or when the objective function satisfies a suitable growth condition. Under the convexity conditions in Assumption 1, the optimal set is closed and convex. Under Assumptions 1 and 3, strong duality holds between problem (1) and its Lagrangian dual problem, and the dual optimal set is nonempty (see Proposition 6.4.2 of ).
III-A The basic algorithm: Mirror-P-EXTRA
This algorithm solves the original problem (1), i.e., the resource allocation problem with local constraints. The basic algorithm operates as follows (Algorithm 1). Each agent uses its own local parameter, which can be viewed as a stepsize.
Algorithm 1: Mirror-P-EXTRA
|Each agent chooses its own parameter and the same parameter ;|
|Each agent initializes with , , and ;|
|Each agent for do|
The algorithm is motivated by the P-EXTRA algorithm from reference  for a consensus optimization problem. The reason we refer to Algorithm 1 as Mirror-P-EXTRA will be clear from the following lemma. In the lemma and later on, we will use to denote the diagonal matrix that has as its -th entry,
By using the notation given in Subsection II-A, the updates of Algorithm 1 can be represented compactly as initializing with arbitrary , , and , and then performing for ,
From (13b) and (13c), also considering the initialization (footnote: since , we can choose the subgradient and use the relation ; see (2)), we have for . Thus, by substituting in (13a), we obtain that the given conditions are equivalent to:
These relations are enough to generate the sequence . By Assumption 2, we have , so by introducing the notation , relation (15a) reduces to . We note that the sequence generated by is the same as the sequence generated by (15a) with . Using these relations and reorganizing (15), we obtain
which generates the same sequence as Algorithm 1 does. Finally, relation (16b) is equivalent to , which when substituted into (16a) gives relation (12a). Relations (16b) and (16c) coincide with (12b) and (12c), respectively.
If we choose the special case and in Corollary 1 of reference , we will have the recursive relation of P-EXTRA in the following form (footnote: for brevity, we have slightly abused the notation ; in P-EXTRA, is the square root of , while in the proposed algorithms here, is an arbitrary matrix that satisfies . However, in both cases, the null spaces of are the same, and so are the span spaces):
To fulfill the optimality conditions for the resource allocation problem, we would need to be consensual and the rows of to sum up to . In view of the insight from Lemmas 1 and 2, the only thing we need to do is to replace by and by , as discussed in Section II-C. By doing so, we obtain
which is very similar to relations (12a) and (12b) in Lemma 3. The only difference is the term of (18), which we have “replaced” by to obtain (12a). The key function of the corresponding term in (17) is to stabilize the iterative process and to neutralize those terms of P-EXTRA that are not implementable in one decentralized step. Hypothetically, with a “large enough” positive (semi)definite matrix , any such term in (17) (or in (18)) will serve the purpose of stabilizing the iterative process. Here, we redesign this term into a more flexible one, , so that the recursive relations are resolvable and implementable in a decentralized manner, while featuring per-agent independent parameters.
Our analysis of Algorithm 1 will use the alternative description of the algorithm, as given in Lemma 3. To simplify our presentation, let us define the following quantities:
where is the parameter of Algorithm 1. Using this particular in Lemma 1, with each solution we can identify such that Lemma 1 holds, i.e., the optimality conditions in (7) are satisfied. This particular and the matrix constitute the matrix .
Next, we will show that converges to a solution .