A System Theoretical Perspective to Gradient-Tracking Algorithms for Distributed Quadratic Optimization

11/15/2019 · Michelangelo Bin et al. · University of Bologna

In this paper we consider a recently developed distributed optimization algorithm based on gradient tracking. We propose a system theory framework to analyze its structural properties on a preliminary, quadratic optimization set-up. Specifically, we focus on a scenario in which agents in a static network want to cooperatively minimize the sum of quadratic cost functions. We show that the gradient tracking distributed algorithm for the investigated program can be viewed as a sparse closed-loop linear system in which the dynamic state-feedback controller includes consensus matrices and optimization (stepsize) parameters. The closed-loop system turns out to be not completely reachable, and asymptotic stability can be shown only when restricted to a proper invariant set. Convergence to the global minimum, in turn, can be obtained only by means of a proper initialization. The proposed system interpretation of the distributed algorithm also provides additional insights on other structural properties and possible design choices, which are discussed in the last part of the paper as a starting point for future developments.


I Introduction

Many optimization algorithms are iterative procedures and can thus be framed as discrete-time dynamical systems. Usual approaches to prove the convergence of these schemes, even though often based on descent, Lyapunov-like arguments, do not explicitly and deeply explore this system theoretical perspective. The great potential of system theory becomes more evident when noticing that several algorithms encode a feedback structure in their update laws. In this paper we propose a system theoretical interpretation of a state-of-the-art distributed optimization algorithm often known as gradient tracking, see, e.g., [1, 2, 3, 4, 5, 6, 7, 8].

In this framework agents (systems) in a network cooperate to minimize the sum of local functions that depend on a common decision variable. Agents exchange information with neighbors in a given (sparse) communication graph and cannot rely on any centralized coordinating unit. We consider a simplified set-up in which the optimization problem is quadratic and the communication occurs according to a fixed and undirected graph.

Distributed optimization has received significant interest from the control community in the last decades. Early references on this topic are [9, 10], where the (sub)gradient method has been successfully combined with consensus averaging to design a distributed method. Recently, this approach has been enhanced by introducing a tracking technique based on dynamic average consensus, originally proposed in [11, 12]. The tracking mechanism allows agents to obtain a local estimate of the gradient of the entire sum of functions, which is then used as a descent direction in the consensus-based update of the local solution estimate, see, e.g., [1, 2, 3, 4, 5, 6, 7, 8].

First approaches providing a system theoretical perspective on distributed optimization algorithms are [13, 14]. A framework based on integral quadratic constraints from robust control theory is proposed in [15] to analyze and design (centralized) iterative optimization algorithms. In [16], the authors propose loop-shaping interpretations for several existing optimization methods based on basic control elements such as PID and lag compensators. The convergence of distributed optimization algorithms by means of proper semidefinite programs is, instead, discussed in [17]. A passivity-based approach is proposed in [18] to analyze a distributed algorithm with communication delays.

The contributions of this paper are as follows. We approach the design of distributed optimization algorithms as a control problem, by showing how system theoretical tools can be used to provide new insights on existing algorithms and new perspectives for future extensions. We develop the discussion for a simplified quadratic, unconstrained optimization problem, which allows us to rely on powerful tools from linear regulation theory. In particular, we cast the optimization algorithm design as a linear control problem aiming to steer the state trajectories toward the optimal solution, and we provide necessary and sufficient conditions for its solvability. We show that a class of gradient tracking distributed algorithms fits in the proposed framework, which, in turn, provides new insights in terms of structural properties of the controlled system. Specifically, the resulting algorithm, seen as a sparse dynamical system, turns out to be not completely reachable, and this reflects in the need for a proper initialization and for a stabilizing action in the "closed-loop" dynamics. The proposed system theoretical perspective suggests that robustness arguments, customary in control theory, can be used to extend these features also to optimization algorithms.

The paper is organized as follows. In Section II we introduce the distributed optimization set-up and recall the gradient tracking algorithm. In Section III we describe the system theoretical framework used to solve a quadratic distributed optimization problem, which is then used in Section IV to analyze the gradient tracking algorithm. Section V draws some conclusions.

Notation

We deal with discrete-time dynamical systems of the form $x^+ = f(x)$, where $x^+$ denotes the updated state. For the sake of readability we omit the time dependency whenever it is clear from the context, and we write $x^+$ in place of $x(t+1)$. Given a square matrix $A$, we denote by $\sigma(A)$ its spectrum. A square matrix is said to be Schur if all its eigenvalues lie inside the open unit disc. Given a square matrix $A$, a set $\mathcal{V}$ is said to be $A$-invariant if $Av \in \mathcal{V}$ for all $v \in \mathcal{V}$. We denote by $I_n$ the $n \times n$ identity matrix and by $0_{n \times m}$ the $n \times m$ matrix of zeros. The column vector of $N$ ones is denoted by $\mathbf{1}_N$. Moreover, we define $\mathbf{1}_{N,n} := \mathbf{1}_N \otimes I_n$, where $\otimes$ is the Kronecker product. We omit the dimension of these objects whenever it is clear from the context. For $v_1 \in \mathbb{R}^{n_1}$ and $v_2 \in \mathbb{R}^{n_2}$, we denote by $\mathrm{col}(v_1, v_2) \in \mathbb{R}^{n_1 + n_2}$ their column concatenation. For matrices $A_1, \dots, A_N$, we let $\mathrm{diag}(A_1, \dots, A_N)$ be the block-diagonal matrix having $A_1, \dots, A_N$ as diagonal blocks.

II The Distributed Optimization Framework

In this section we introduce the distributed optimization set-up and recall the state-of-the-art gradient tracking algorithm that we aim to investigate in this paper.

II-A Distributed Optimization Set-up

We consider the following optimization problem

(1) $\min_{x \in \mathbb{R}^n} \; \sum_{i=1}^N f_i(x)$

where, for each $i \in \{1, \dots, N\}$, $f_i : \mathbb{R}^n \to \mathbb{R}$ is of the form

(2) $f_i(x) = \tfrac{1}{2}\, x^\top Q_i x + (R_i d_i)^\top x$

with $Q_i \in \mathbb{R}^{n \times n}$ symmetric and positive definite, $R_i \in \mathbb{R}^{n \times q}$, and where $d_i \in \mathbb{R}^q$ is an offset parameter whose role will be clarified later. In particular, problem (1) admits a unique solution given by

(3) $x^\star = -\Big( \sum_{i=1}^N Q_i \Big)^{-1} \sum_{i=1}^N R_i d_i.$
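As a quick numerical illustration of (3) (a minimal sketch of ours, not part of the original development, with hypothetical problem data):

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 4, 2  # 4 agents, 2-dimensional decision variable

# Hypothetical data: symmetric positive-definite Q_i, affine terms r_i = R_i d_i
Qs = [(lambda M: M @ M.T + n * np.eye(n))(rng.standard_normal((n, n)))
      for _ in range(N)]
rs = [rng.standard_normal(n) for _ in range(N)]

# Unique minimizer of sum_i (0.5 x'Q_i x + r_i' x), cf. (3)
x_star = -np.linalg.solve(sum(Qs), sum(rs))

# Sanity check: the gradient of the total cost vanishes at x_star
assert np.allclose(sum(Q @ x_star + r for Q, r in zip(Qs, rs)), 0)
```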

We focus on iterative procedures to solve (1) that are distributed. In particular, we assume to have a network of $N$ agents, each one having access only to partial information about the problem and exchanging information with a subset of the other agents. Distributed optimization algorithms are local update laws that fulfill the network constraints and allow agents to eventually converge to the optimal solution $x^\star$. Formally, we model the network by means of a connected and undirected graph $\mathcal{G} = (\{1, \dots, N\}, \mathcal{E})$, where $\mathcal{E}$ is the set of edges. If $(i, j) \in \mathcal{E}$, then nodes $i$ and $j$ can exchange information (and, in fact, $(j, i) \in \mathcal{E}$). We denote by $\mathcal{N}_i$ the set of neighbors of node $i$ in $\mathcal{G}$. We assume that $\mathcal{N}_i$ contains $i$ itself. As usually done in consensus-based approaches, we consider a weight matrix $A = [a_{ij}]$ matching the graph $\mathcal{G}$, i.e., its $(i,j)$-th entry $a_{ij} > 0$ for $(i,j) \in \mathcal{E}$, while $a_{ij} = 0$ otherwise. Moreover, $A$ is row stochastic if $A \mathbf{1}_N = \mathbf{1}_N$, while it is column stochastic if $\mathbf{1}_N^\top A = \mathbf{1}_N^\top$. It can be proved that the spectrum of a row (or column) stochastic matrix matching a connected graph lies in the closed unit disc and the largest (in modulus) eigenvalue is $1$ and is simple.
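For concreteness, a minimal sketch (ours; Metropolis-Hastings weights are one common recipe among many, not prescribed by the paper) of a symmetric, doubly stochastic matrix matching an undirected path graph, together with a check of the spectral property just recalled:

```python
import numpy as np

# Metropolis-Hastings weights on an undirected path graph with N = 4 nodes.
N = 4
edges = [(0, 1), (1, 2), (2, 3)]
deg = np.zeros(N, dtype=int)
for i, j in edges:
    deg[i] += 1
    deg[j] += 1

A = np.zeros((N, N))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0 / (1 + max(deg[i], deg[j]))
A += np.diag(1.0 - A.sum(axis=1))   # self-weights: each row sums to one

assert np.allclose(A.sum(axis=0), 1) and np.allclose(A.sum(axis=1), 1)
print(np.sort(np.abs(np.linalg.eigvals(A))))  # inside [0, 1]; simple 1 on top
```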

In this paper we assume that each agent $i$ maintains a local quantity $x_i \in \mathbb{R}^n$ representing its guess of the optimal solution $x^\star$, and it has only access to gradients of the local cost function computed at $x_i$, i.e., to the quantity

(4) $\nabla f_i(x_i) = Q_i x_i + R_i d_i,$

where $i \in \{1, \dots, N\}$. In these terms, the distributed optimization problem associated to (1) can be cast as follows.

Problem II.1

Find an update law for $x_i$, depending only on the locally available information given by the quantities $\nabla f_j(x_j)$ for all $j \in \mathcal{N}_i$, such that

$\lim_{t \to \infty} x_i(t) = x^\star$

for each $i \in \{1, \dots, N\}$.

Problem II.1 could clearly be solved in a distributed way through a consensus algorithm exploiting equation (3) directly. In this paper, however, we focus on gradient-based distributed optimization algorithms to solve Problem II.1. It is worth mentioning that in some applications agents may not know $Q_i$, $R_i$, and $d_i$, but just the local measurement $\nabla f_i(x_i)$. Notice that, in view of (4), each matrix $Q_i$ is directly linked to the Lipschitz constant of the corresponding local gradient $\nabla f_i$, while the affine term $R_i d_i$ represents partial information on $d_i$ that, even if accessible via $\nabla f_i$, is not assumed to be known a priori. In this regard, $d_i$ is a parameter condensing information which is not known to the agents.

II-B The Gradient Tracking Algorithm

In this subsection, we recall the gradient tracking algorithm in its most basic form. For convenience, we first recall the (centralized) gradient method applied to a generic instance of (1). In the (steepest descent) gradient method, a solution estimate $x \in \mathbb{R}^n$ is iteratively updated according to¹

$x^+ = x - \gamma \sum_{i=1}^N \nabla f_i(x),$

where $\gamma > 0$ is a constant, positive parameter that is usually called stepsize. Convergence results for the class of gradient methods can be found, e.g., in [19].

¹As discussed in the Notation paragraph, we omit the time dependence when not strictly necessary.
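A minimal sketch of this centralized iteration on a hypothetical quadratic (for a quadratic cost, any stepsize $0 < \gamma < 2/\lambda_{\max}(Q)$ converges):

```python
import numpy as np

def gradient_method(Q, r, gamma, iters=200):
    """Steepest-descent iteration x^+ = x - gamma (Q x + r) for a quadratic."""
    x = np.zeros(Q.shape[0])
    for _ in range(iters):
        x = x - gamma * (Q @ x + r)
    return x

Q = np.array([[3.0, 1.0],
              [1.0, 2.0]])
r = np.array([1.0, -1.0])
x = gradient_method(Q, r, gamma=0.3)          # 0.3 < 2 / lambda_max(Q)
print(np.allclose(x, -np.linalg.solve(Q, r), atol=1e-6))  # True
```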

The gradient tracking distributed algorithm mimics the centralized update by exploiting a twofold consensus-based mechanism to: (i) enforce an agreement among the agents' estimates $x_i$ and (ii) dynamically track the gradient of the whole cost function through an auxiliary variable $s_i \in \mathbb{R}^n$, called tracker. Formally, it reads

(5a) $x_i^+ = \sum_{j \in \mathcal{N}_i} a_{ij} x_j - \gamma s_i$
(5b) $s_i^+ = \sum_{j \in \mathcal{N}_i} b_{ij} s_j + \nabla f_i(x_i^+) - \nabla f_i(x_i)$

where $a_{ij}$ and $b_{ij}$ are entries of a row stochastic matrix $A$ and of a column stochastic matrix $B$, respectively, while $\gamma > 0$ is a (constant) stepsize.

Several versions of the gradient tracking algorithm have been analyzed for generic, nonlinear, and possibly constrained versions of problem (1), see, e.g., [1, 2, 3, 4, 5, 6, 7, 8]. For example, in [7] it is shown that, under strong convexity of the local cost functions $f_i$ and Lipschitz continuity of their gradients, the sequence generated by algorithm (5), with $x_i(0)$ arbitrary, $s_i(0) = \nabla f_i(x_i(0))$, and for a sufficiently small stepsize $\gamma$, converges to the optimal solution $x^\star$ of (1).
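The following self-contained sketch (ours, with hypothetical weights and costs) implements (5) and checks the convergence just recalled:

```python
import numpy as np

def gradient_tracking(A, B, Qs, rs, gamma, iters=5000):
    """Sketch of (5): consensus on the estimates plus a gradient tracker,
    initialized as s_i(0) = grad f_i(x_i(0))."""
    N, n = len(Qs), Qs[0].shape[0]
    grad = lambda X: np.stack([Qs[i] @ X[i] + rs[i] for i in range(N)])
    X = np.zeros((N, n))                      # x_i(0) arbitrary
    S = grad(X)                               # mandatory tracker initialization
    for _ in range(iters):
        X_new = A @ X - gamma * S             # (5a)
        S = B @ S + grad(X_new) - grad(X)     # (5b)
        X = X_new
    return X

# Hypothetical data: complete graph with uniform (doubly stochastic) weights
N, n = 4, 2
A = B = np.full((N, N), 1.0 / N)
rng = np.random.default_rng(1)
Qs = [np.diag(rng.uniform(1, 2, n)) for _ in range(N)]
rs = [rng.standard_normal(n) for _ in range(N)]
x_star = -np.linalg.solve(sum(Qs), sum(rs))

X = gradient_tracking(A, B, Qs, rs, gamma=0.01)
print(np.allclose(X, x_star, atol=1e-6))      # every x_i approaches x*
```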

Remark II.2

An interesting property of the trackers $s_i$ is that, by summing the update (5b) over $i$, we can exploit the column stochasticity of the weights $b_{ij}$ to obtain

(6) $\sum_{i=1}^N s_i(t) = \sum_{i=1}^N \nabla f_i(x_i(t)).$

Specifically, condition (6) holds at $t = 0$ thanks to the initialization $s_i(0) = \nabla f_i(x_i(0))$, and the mismatch $\sum_i \big( s_i(t) - \nabla f_i(x_i(t)) \big)$ is preserved along the iterations. Moreover, (6) also holds for any consensual asymptotic value of the trackers. By assuming $x_i(t) \to \bar{x}$ and $s_i(t) \to \bar{s}$, for all $i$, it can be shown that the asymptotic value of the tracker is the average of the asymptotic gradients plus the initialization mismatch (recall that the weights $a_{ij}$ in (5a) sum up to one, i.e., $A$ is row stochastic). Thus, we have

$\bar{s} = \frac{1}{N} \sum_{i=1}^N \nabla f_i(\bar{x}) + \frac{1}{N} \sum_{i=1}^N \big( s_i(0) - \nabla f_i(x_i(0)) \big).$

This, in turn, shows that if the initialization of each $s_i$ is arbitrary, so that the last term is not zero, there is no chance that a consensual asymptotic value $\bar{x}$ is stationary (hence optimal) for problem (1).
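A quick numerical check of the conservation property (6), on a hypothetical 3-agent scalar example of ours:

```python
import numpy as np

# With column-stochastic weights and s_i(0) = grad f_i(x_i(0)), the tracker
# sum equals the gradient sum at every iteration.
rng = np.random.default_rng(2)
N = 3
B = np.array([[0.5, 0.25, 0.0],
              [0.5, 0.50, 0.5],
              [0.0, 0.25, 0.5]])   # columns sum to one
A = B.T                            # rows sum to one
q = rng.uniform(1, 2, N)           # f_i(x) = 0.5 q_i x^2 + r_i x
r = rng.standard_normal(N)
grad = lambda x: q * x + r

x = rng.standard_normal(N)
s = grad(x)                        # correct initialization
gamma = 0.05
for _ in range(100):
    x_new = A @ x - gamma * s                  # (5a)
    s = B @ s + grad(x_new) - grad(x)          # (5b)
    x = x_new
    assert np.isclose(s.sum(), grad(x).sum())  # (6) holds at every step
```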

The distributed algorithm described by (5) does not enjoy the usual "state-space" structure of dynamical systems, since the updated tracker $s_i^+$ depends on the updated estimate $x_i^+$. Thus, we consider the change of variable $z_i := s_i - \nabla f_i(x_i)$, so that algorithm (5) can be equivalently rewritten as

(7a) $x_i^+ = \sum_{j \in \mathcal{N}_i} a_{ij} x_j - \gamma \big( z_i + \nabla f_i(x_i) \big)$
(7b) $z_i^+ = \sum_{j \in \mathcal{N}_i} b_{ij} \big( z_j + \nabla f_j(x_j) \big) - \nabla f_i(x_i)$

In these new coordinates, the correct initialization becomes $z_i(0) = 0$, for all $i \in \{1, \dots, N\}$. Also, this reformulation does not alter the distributed nature of the algorithm.

Let $x := \mathrm{col}(x_1, \dots, x_N)$ and $z := \mathrm{col}(z_1, \dots, z_N)$, and compactly rewrite (7) as

(8) $x^+ = \mathcal{A} x - \gamma \big( z + G(x) \big)$
    $z^+ = \mathcal{B} \big( z + G(x) \big) - G(x)$

in which $\mathcal{A} := A \otimes I_n$, $\mathcal{B} := B \otimes I_n$, and $G(x)$ denotes the column vector stacking the local gradients, i.e., $G(x) := \mathrm{col}\big( \nabla f_1(x_1), \dots, \nabla f_N(x_N) \big)$. Notice that, for the considered quadratic scenario, the output map $G$ is (as expected) affine in $x$, namely $G(x) = Q x + R d$ with $Q := \mathrm{diag}(Q_1, \dots, Q_N)$, $R := \mathrm{diag}(R_1, \dots, R_N)$, and $d := \mathrm{col}(d_1, \dots, d_N)$, and this structure will be exploited later.
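A short sketch (ours) of the stacked form (8), verifying that one stacked iteration of (7) matches the agent-wise update:

```python
import numpy as np

# One stacked iteration of (7) versus the agent-wise update, to check the
# matrices bA = A (x) I_n and Q = diag(Q_1, ..., Q_N) in (8).
N, n = 3, 2
rng = np.random.default_rng(3)
A = np.full((N, N), 1.0 / N)                 # complete graph, doubly stochastic
bA = bB = np.kron(A, np.eye(n))
Qs = [np.diag(rng.uniform(1, 2, n)) for _ in range(N)]
rs = [rng.standard_normal(n) for _ in range(N)]
Q = np.zeros((N * n, N * n))
for i, Qi in enumerate(Qs):
    Q[i * n:(i + 1) * n, i * n:(i + 1) * n] = Qi
r = np.concatenate(rs)
G = lambda x: Q @ x + r                      # stacked gradient map, affine in x

gamma = 0.1
x, z = rng.standard_normal(N * n), np.zeros(N * n)
x_new = bA @ x - gamma * (z + G(x))          # stacked (7a)
z_new = bB @ (z + G(x)) - G(x)               # stacked (7b)

# agent 0 of (7a), computed directly, must match the first block of x_new
x0_new = sum(A[0, j] * x[j * n:(j + 1) * n] for j in range(N)) \
         - gamma * (z[:n] + Qs[0] @ x[:n] + rs[0])
assert np.allclose(x_new[:n], x0_new)
```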

III A System Theoretical Approach to Quadratic Distributed Optimization

In this section, we approach the design of a distributed algorithm solving optimization problem (1) from a system theoretical perspective. Specifically, we approach Problem II.1 as a generic set-point control problem, by giving necessary and sufficient conditions for its solvability. For the sake of presentation, in this part we intentionally avoid dealing explicitly with network constraints. We will discuss constructive choices of the different degrees of freedom that are consistent with the network constraints in Section IV.

III-A The Underlying Control Problem

The local estimates $x_i$ can be seen as the states of $N$ controlled plants

(9) $x_i^+ = u_i, \qquad i \in \{1, \dots, N\},$

with control inputs $u_i \in \mathbb{R}^n$. The control goal consists of finding a suitable control input $u_i$ such that each controlled plant asymptotically converges to the optimal solution $x^\star$ of problem (1). We further point out that in this regulation setting the target equilibrium $x^\star$ is not available for feedback.

By letting $x := \mathrm{col}(x_1, \dots, x_N)$ and $u := \mathrm{col}(u_1, \dots, u_N)$, the overall controlled plant, obtained by stacking the local dynamics (9) and the local measurements (4), reads as

(10) $x^+ = u$
     $y = Q x + R d$

where $Q = \mathrm{diag}(Q_1, \dots, Q_N)$ and $R = \mathrm{diag}(R_1, \dots, R_N)$, with $Q_i$ and $R_i$ introduced in (4). Therefore, Problem II.1 can be recast as follows.

Problem III.1

Find a (dynamic) controller of the form

(11) $w^+ = F_1 w + F_2 x + F_3 y$
     $u = K_1 w + K_2 x + K_3 y$

with state $w \in \mathbb{R}^\nu$, $\nu \in \mathbb{N}$, and a non-empty set of initial conditions $X_0 \subseteq \mathbb{R}^{Nn + \nu}$ such that, for each $d \in \mathbb{R}^{Nq}$, all the trajectories of the "closed-loop system"

(12) $x^+ = K_1 w + K_2 x + K_3 y, \qquad w^+ = F_1 w + F_2 x + F_3 y, \qquad y = Q x + R d,$

with $(x(0), w(0)) \in X_0$ are bounded and satisfy

$\lim_{t \to \infty} x_i(t) = x^\star$

for each $i \in \{1, \dots, N\}$.

Restricting the regulator (11) to be linear is motivated by the fact that, except for the affine term $R d$ appearing in the output $y$, the controlled system (10) is linear. Thus, Problem III.1 results in a linear set-point control problem that can be solved by a linear regulator. In the same way, linearity implies that we can assume, without loss of generality, that the set of initial conditions $X_0$ is an affine subspace of $\mathbb{R}^{Nn + \nu}$ whose bias is parametrized by $d$, i.e.,

(13) $X_0 = \{ M d \} + \mathcal{V} = \big\{ \xi \in \mathbb{R}^{Nn + \nu} : \xi = M d + v, \ v \in \mathcal{V} \big\}$

for some linear subspace $\mathcal{V}$ of $\mathbb{R}^{Nn + \nu}$ of dimension $r$, and for some matrix $M$ satisfying $\mathrm{im}\, M \subseteq \mathcal{V}^\perp$.

The closed-loop system (12) can be compactly written as

(14) $\xi^+ = F \xi + G d$

with $\xi := \mathrm{col}(x, w)$ and

$F = \begin{bmatrix} K_2 + K_3 Q & K_1 \\ F_2 + F_3 Q & F_1 \end{bmatrix}, \qquad G = \begin{bmatrix} K_3 R \\ F_3 R \end{bmatrix}.$

The gradient tracking algorithm (8) exhibits the same closed-loop structure as (14), in which the gradients act as an output feedback action. In the following, we provide necessary and sufficient conditions for the existence of a controller of the form (11) and a set $X_0$ of the form (13) solving Problem III.1.

III-B Necessary and Sufficient Conditions

Let $n_{cl} := Nn + \nu$ and $\xi := \mathrm{col}(x, w)$. Consider an $r$-dimensional vector subspace $\mathcal{V}$ of $\mathbb{R}^{n_{cl}}$ and let $T$ be an orthonormal matrix of the form $T = [\, T_1 \ \ T_2 \,]$, with $T_1 \in \mathbb{R}^{n_{cl} \times r}$ and $T_2 \in \mathbb{R}^{n_{cl} \times (n_{cl} - r)}$ satisfying

(15) $\mathrm{im}\, T_1 = \mathcal{V}, \qquad \mathrm{im}\, T_2 = \mathcal{V}^\perp.$

Then, it is easy to see that $\mathcal{V}$ is $F$-invariant if and only if

$T^\top F T = \begin{bmatrix} F_{11} & F_{12} \\ 0 & F_{22} \end{bmatrix}$

for some $F_{11} \in \mathbb{R}^{r \times r}$, $F_{12} \in \mathbb{R}^{r \times (n_{cl} - r)}$, and $F_{22} \in \mathbb{R}^{(n_{cl} - r) \times (n_{cl} - r)}$. The matrices $F_{11}$ and $F_{22}$ represent the restriction of $F$ to $\mathcal{V}$ and $\mathcal{V}^\perp$, respectively. These matrices lead to the following definition.

Definition III.2

The subspace $\mathcal{V}$ is said to be:

  • internally stable if $F_{11}$ is Schur;

  • externally anti-stable if $F_{22}$ has no eigenvalue inside the open unit disc.
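The following toy sketch (ours) illustrates Definition III.2: for a $3 \times 3$ matrix $F$ leaving $\mathcal{V} = \mathrm{span}\{e_1, e_2\}$ invariant, the restriction $F_{11}$ is Schur (internal stability) while $F_{22}$ has its eigenvalue on the unit circle (external anti-stability). Here $T$ is simply the identity, for readability:

```python
import numpy as np

# F leaves V = span{e1, e2} invariant: columns 1-2 of F have no component
# along e3.
F = np.array([[0.5, 1.0,  0.7],
              [0.0, 0.3, -0.2],
              [0.0, 0.0,  1.0]])
T1, T2 = np.eye(3)[:, :2], np.eye(3)[:, 2:]
T = np.hstack([T1, T2])
FT = T.T @ F @ T
F11, F21, F22 = FT[:2, :2], FT[2:, :2], FT[2:, 2:]

assert np.allclose(F21, 0)                  # V is F-invariant
print(np.abs(np.linalg.eigvals(F11)))       # {0.5, 0.3}: V internally stable
print(np.abs(np.linalg.eigvals(F22)))       # {1.0}: V externally anti-stable
```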

The forthcoming proposition is the main result of the section and it states necessary and sufficient conditions for the existence of a controller of the form (11) and an initialization set of the form (13) solving Problem III.1. For simplicity, although not necessary, we restrict the focus to initialization sets with the following additional property.

Definition III.3

A set $X_0$ of the form (13) is said to be an admissible initialization set if $\mathcal{V}$ is $F$-invariant and externally anti-stable.

Proposition III.4

Consider a controller of the form (11) resulting in the closed-loop system (14). Let $X_0$ be an admissible initialization set of the form (13), for some $r$-dimensional $F$-invariant subspace $\mathcal{V}$ of $\mathbb{R}^{n_{cl}}$ and some $M$ satisfying $\mathrm{im}\, M \subseteq \mathcal{V}^\perp$. Moreover, let $T = [\, T_1 \ \ T_2 \,]$ be an orthonormal matrix satisfying (15). Then, Problem III.1 is solved from $X_0$ by a controller of the form (11) if and only if

  1. the set $\mathcal{V}$ is internally stable;

  2. there exists $X \in \mathbb{R}^{n_{cl} \times Nq}$ satisfying

    (16a) $F X + G = X$
    (16b) $[\, I_{Nn} \ \ 0 \,]\, X = \mathbf{1}_{N,n} \Pi^\star$
    (16c) $\mathrm{im}(X - M) \subseteq \mathcal{V}$

where $\Pi^\star := -\big( \sum_{i=1}^N Q_i \big)^{-1} \mathbf{1}_{N,n}^\top R$ is the matrix mapping the offset $d$ into the optimal solution, i.e., $x^\star = \Pi^\star d$.

Regarding the claim of Proposition III.4, we observe that equation (16a) expresses the existence, for every $d$, of an equilibrium of the closed-loop system (14) given by

(17) $\bar{\xi} = X d.$

Equation (16b), instead, forces such an equilibrium to be an optimal solution of problem (1), namely $\bar{x} = \mathbf{1}_{N,n} x^\star$. Finally, equation (16c) and the internal stability of $\mathcal{V}$ express the fact that, if the closed-loop system (14) is initialized with $\xi(0) \in X_0$, then the equilibrium point (17) attracts all the closed-loop trajectories.

IV Gradient Tracking Revisited

In this section, we establish a bridge between the gradient tracking distributed algorithm described in Section II-B and the system theoretical framework discussed in Section III. The design of a distributed optimization algorithm solving problem (1) can be equivalently recast as the problem of finding a regulator of the form (11) which satisfies Problem III.1 and is sparse, in the sense that each control input $u_i$ depends only on the neighboring information $x_j$ and $y_j = \nabla f_j(x_j)$, $j \in \mathcal{N}_i$. Specifically, we show that the matrices $F_1$, $F_2$, $F_3$, $K_1$, $K_2$, $K_3$ in the controller (11) can be properly chosen to implement a class of gradient tracking algorithms that, among others, includes (8). To this end, we progressively fix the available degrees of freedom in the controller (11) with the aim of satisfying the conditions given in Proposition III.4.

IV-A Gradient Tracking as a Control System

First we set the controller dimension equal to the plant dimension, i.e., $\nu = Nn$. Moreover, in the controller (11) we let

(18) $F_1 = \mathcal{B}, \qquad F_2 = 0, \qquad F_3 = \mathcal{B} - I, \qquad K_2 = \mathcal{A},$

where $A$ satisfies $A \mathbf{1}_N = \mathbf{1}_N$ and $B$ satisfies $\mathbf{1}_N^\top B = \mathbf{1}_N^\top$, while $K_1$ and $K_3$ are still free. We notice that all the matrices in (18) are sparse, resulting in a controller that can be implemented in a fully distributed way.

The choice (18) results in a closed-loop system (14) with

(19) $F = \begin{bmatrix} \mathcal{A} + K_3 Q & K_1 \\ (\mathcal{B} - I) Q & \mathcal{B} \end{bmatrix}, \qquad G = \begin{bmatrix} K_3 R \\ (\mathcal{B} - I) R \end{bmatrix}.$

In particular, with $K_1 = K_3 = -\gamma I$, the controller state $w$ plays the role of $z$ and we recover exactly the gradient tracking algorithm (8).

In the following, we investigate conditions on the choice of $K_1$ and $K_3$ such that an admissible initialization set $X_0$ and the controller (11) satisfy the assumptions of Proposition III.4. As a first result we claim the following.

Lemma IV.1

Consider the closed-loop system (14) in the setting described above. Then,

  1. there exists an $r$-dimensional subspace $\mathcal{V}$ of $\mathbb{R}^{2Nn}$, with $r = 2Nn - n$, that is $F$-invariant and externally anti-stable for all possible choices of $K_1$ and $K_3$;

  2. there always exist $K_1$ and $K_3$ such that $\mathcal{V}$ is also internally stable.

We first notice that $F$ in (19) can be decomposed in two terms as

(20) $F = F_0 + \begin{bmatrix} I \\ 0 \end{bmatrix} \begin{bmatrix} K_3 Q & K_1 \end{bmatrix}, \qquad F_0 := \begin{bmatrix} \mathcal{A} & 0 \\ (\mathcal{B} - I) Q & \mathcal{B} \end{bmatrix}.$

As a consequence, $F$ can be thought of as being obtained by stabilizing the auxiliary system

$\xi^+ = F_0 \xi + \begin{bmatrix} I \\ 0 \end{bmatrix} v$

by means of the state-feedback control law $v = [\, K_3 Q \ \ K_1 \,]\, \xi$. Being $F_0$ block triangular, it holds that $\sigma(F_0) = \sigma(\mathcal{A}) \cup \sigma(\mathcal{B})$. Hence, $F_0$ has an eigenvalue equal to $1$ with algebraic multiplicity $2n$, while all the other eigenvalues lie inside the open unit disc. It can be shown that a basis for the left-eigenspace of $F_0$ associated to the eigenvalue $1$ is given by the rows of $W_1$ and $W_2$ defined as

(21) $W_1 := [\, u_A^\top \otimes I_n \quad 0 \,], \qquad W_2 := [\, 0 \quad \mathbf{1}_N^\top \otimes I_n \,],$

where $u_A$ satisfies $u_A^\top A = u_A^\top$. We further observe that, within this left-eigenspace, the left-kernel of the input matrix $\mathrm{col}(I, 0)$ is spanned only by the rows of $W_2$. Therefore, the PBH test ensures that the non-reachable subspace of the pair in (20) is an $n$-dimensional subspace of the eigenspace associated to the eigenvalue $1$ and, on the other hand, that the reachable subspace $\mathcal{R}$ has dimension $2Nn - n$. Therefore, point (i) follows by taking $\mathcal{V}$ equal to the reachable subspace $\mathcal{R}$, and by noticing that its $F$-invariance and external anti-stability properties cannot be changed via feedback, i.e., by any choice of $K_1$ and $K_3$.
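This reachability obstruction is easy to observe numerically. In the sketch below (ours, using the reconstructed closed-loop matrix of (19) with $K_1 = K_3 = -\gamma I$ and hypothetical data), the eigenvalue $1$ persists for every stepsize, while the remaining eigenvalues can be placed inside the unit disc only for $\gamma$ small enough:

```python
import numpy as np

# Closed-loop matrix of (19) with K_1 = K_3 = -gamma*I: the eigenvalue 1 is
# the unreachable mode detected by the PBH test and survives every gamma.
N, n = 4, 1
A = np.full((N, N), 1.0 / N)             # doubly stochastic, complete graph
rng = np.random.default_rng(4)
Q = np.diag(rng.uniform(1, 3, N * n))    # diag(Q_1, ..., Q_N)
I = np.eye(N * n)

for gamma in (0.02, 0.1, 0.5):
    F = np.block([[A - gamma * Q, -gamma * I],
                  [(A - I) @ Q,   A]])
    eigs = np.linalg.eigvals(F)
    pinned = np.sum(np.isclose(eigs, 1.0))            # always n of them
    rest = np.max(np.abs(eigs[~np.isclose(eigs, 1.0)]))
    print(f"gamma={gamma}: eigs at 1: {pinned}, largest other |eig|={rest:.3f}")
```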

To show point (ii), we resort to the reachability Kalman decomposition. Consider an orthonormal transformation matrix $T = [\, T_1 \ \ T_2 \,]$ with

(22) $\mathrm{im}\, T_1 = \mathcal{R}, \qquad \mathrm{im}\, T_2 = \mathcal{R}^\perp.$

Then, it holds $T^{-1} = T^\top$, and $T$ transforms $F_0$ and the input matrix $\mathrm{col}(I, 0)$ into the form

$T^\top F_0 T = \begin{bmatrix} F^0_{11} & F^0_{12} \\ 0 & F^0_{22} \end{bmatrix}, \qquad T^\top \begin{bmatrix} I \\ 0 \end{bmatrix} = \begin{bmatrix} B_1 \\ 0 \end{bmatrix}$

for some $F^0_{11}$, $F^0_{12}$, $F^0_{22}$, and $B_1$. Furthermore, by construction the pair

(23) $\big( F^0_{11}, \ B_1 \big)$

is completely reachable and, $Q$ being nonsingular, there always exist gain matrices $K_1$ and $K_3$ such that the matrix $F^0_{11} + B_1 [\, K_3 Q \ \ K_1 \,]\, T_1$ is Schur. Thus, point (ii) follows since the latter condition implies that $\mathcal{V} = \mathcal{R}$ is internally stable.

In the rest of the section, we denote by $\mathcal{R}$ the subspace produced by Lemma IV.1. The following result gives a sufficient condition on the choice of $K_1$ and $K_3$ such that equations (16) in Proposition III.4 admit a solution.

Lemma IV.2

Consider the closed-loop system (14) in the setting described above. Pick $K_1$ and $K_3$ such that

(24) $K_1 = K_3 =: -K, \qquad K \ \text{nonsingular}.$

Then, there exist $X$ and $M$, satisfying $\mathrm{im}\, M \subseteq \mathcal{R}^\perp$, such that equations (16) hold.

Let $X = \mathrm{col}(X_x, X_w)$, with $X_x \in \mathbb{R}^{Nn \times Nq}$ and $X_w \in \mathbb{R}^{Nn \times Nq}$. Then, in view of (19), $X$ solves equations (16a) and (16b) if and only if $X_x = \mathbf{1}_{N,n} \Pi^\star$ and

(25) $(I - \mathcal{A}) X_x = K_3 (Q X_x + R) + K_1 X_w, \qquad (I - \mathcal{B}) X_w = (\mathcal{B} - I)(Q X_x + R).$

By (24) and since $(I - \mathcal{A}) \mathbf{1}_{N,n} = 0$, we can rewrite (25) as

(26) $K \big( X_w + Q X_x + R \big) = 0,$

whose unique solution makes the second equation in (25) hold identically. Therefore, $X_x = \mathbf{1}_{N,n} \Pi^\star$ and $X_w = -(Q \mathbf{1}_{N,n} \Pi^\star + R)$ solve (16a)-(16b).

Finally, as for the existence of $M$ satisfying (16c), we observe that $T_2$ is full rank and $\mathrm{im}\, T_2 = \mathcal{R}^\perp$. Hence, given $X$, equation (16c) is satisfied, e.g., with $M = T_2 T_2^\top X$, which fulfills $\mathrm{im}\, M \subseteq \mathcal{R}^\perp$.

Remark IV.3

While Lemma IV.1 is linked only to a stability requirement on the closed-loop system, the choice (24) of Lemma IV.2 represents a constraint ensuring the existence of an equilibrium which is an optimal solution of the optimization problem (1). In fact, in order to obtain internal stability of $\mathcal{R}$ we could, for example, choose $K_1 = 0$ and $K_3 = -\gamma I$ with $\gamma > 0$ sufficiently small that $F_{11}$ is Schur (such a $\gamma$ always exists). However, such a choice does not satisfy (24) and, hence, the resulting algorithm would not ensure the existence of an optimal equilibrium for the closed-loop system.

In the following proposition we merge the previous results to give sufficient conditions on the choice of $K_1$ and $K_3$ so that the trajectories of (14) initialized in $X_0$ converge to a solution of Problem III.1.

Proposition IV.4

Consider the closed-loop system (14) in the setting described above. Let $K_1$ and $K_3$ satisfy (24) and be such that $F$ in (19) has all its eigenvalues, but the $n$ structural ones at $1$, inside the open unit disc. Then, Problem III.1 is solved from $X_0 = \{ M d \} + \mathcal{R}$, in the sense that all the trajectories of the closed-loop system (14) originating in $X_0$ are bounded and $\lim_{t \to \infty} x_i(t) = x^\star$, $i \in \{1, \dots, N\}$.

In view of Lemma IV.1, there exists an $F$-invariant and externally anti-stable subspace $\mathcal{R}$. Moreover, it is also internally stable whenever $K_1$ and $K_3$ are such that $F$ has all its eigenvalues, but the $n$ ones at $1$, inside the open unit disc. In view of Lemma IV.2, if (24) holds, then there exist $X$ and $M$ such that the steady-state conditions (16) hold. Hence, the claim follows by Proposition III.4.

Finally, we notice that the choice of $K_1$ and $K_3$ in Proposition IV.4 might not satisfy the network constraints. In the following, we discuss how the usual practice in distributed optimization of selecting a common stepsize $\gamma$ for all the agents is consistent with Proposition IV.4, provided that $\gamma$ is taken sufficiently small. In our framework, this is achieved by setting $K_1 = K_3 = -\gamma I$. In this way, we complete the result of the section by showing that $K_1$ and $K_3$ fulfilling both the assumptions of Proposition IV.4 and the network constraints always exist. The feasibility of this choice follows as a particular case of the following result.

Proposition IV.5

Consider the closed-loop system (14) in the setting described above and let $K_1 = K_3 = -\Gamma$, with $\Gamma \in \mathbb{R}^{Nn \times Nn}$ diagonal and positive definite. Then, there exists $\bar{\gamma} > 0$ such that, if all the eigenvalues of $\Gamma$ lie in the interval $(0, \bar{\gamma})$, Problem III.1 is solved from $X_0$, in the sense that all the trajectories of the closed-loop system (14) originating in $X_0$ are bounded and $\lim_{t \to \infty} x_i(t) = x^\star$, $i \in \{1, \dots, N\}$.
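The threshold $\bar{\gamma}$ can be located numerically in a given instance. A sketch (ours, same hypothetical setting as the previous one): discard the structural eigenvalue at $1$ and track the spectral radius of the remaining modes as $\gamma$ grows:

```python
import numpy as np

N = 4
A = np.full((N, N), 1.0 / N)
rng = np.random.default_rng(5)
Q = np.diag(rng.uniform(1, 3, N))
I = np.eye(N)

def restricted_radius(gamma):
    F = np.block([[A - gamma * Q, -gamma * I],
                  [(A - I) @ Q,   A]])
    eigs = np.linalg.eigvals(F)
    # drop the structural eigenvalue at 1 (exact by construction)
    return np.max(np.abs(eigs[~np.isclose(eigs, 1.0)]))

for gamma in (0.05, 0.1, 0.2, 0.4, 0.8):
    print(f"{gamma:.2f}  {restricted_radius(gamma):.3f}")  # < 1 iff stable
```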

IV-B Remarks About the Proposed Approach

We start by pointing out some aspects related to the initialization. We observe that, if $K$ is nonsingular, then the choice of $X_w$ satisfying (26) is unique and it is given by $X_w = -(Q X_x + R)$. Thus, recalling the definition of $x^\star$ in (3), it holds $(\mathbf{1}_N^\top \otimes I_n) X_w d = -\sum_{i=1}^N \nabla f_i(x^\star) = 0$ for every $d$, i.e., $\mathrm{im}\, X \subseteq \mathcal{R}$. Moreover, we have shown that we can set the matrix $M = T_2 T_2^\top X$. Thus, equation (16c) leads to $M = 0$. This means that the admissible initialization set coincides with $\mathcal{R}$, i.e., the distributed algorithm works only if initialized in the reachable subspace, which means that $w(0)$ has to be chosen so that $(\mathbf{1}_N^\top \otimes I_n) w(0) = 0$ (e.g., $w_i(0) = z_i(0) = 0$, equivalently $s_i(0) = \nabla f_i(x_i(0))$). This, in turn, is consistent with Remark II.2. However, we point out that $M = 0$ is necessary only if $K$ is nonsingular, as otherwise different choices of $X$, and hence of $M$, might be possible.
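The role of the initialization is easy to visualize numerically. In the sketch below (ours, hypothetical scalar costs on a complete graph), a wrong tracker initialization, i.e., $z(0) \neq 0$ or equivalently $s_i(0) \neq \nabla f_i(x_i(0))$, still yields bounded, convergent trajectories for small $\gamma$, but the limit is a consensual point different from $x^\star$:

```python
import numpy as np

N = 4
A = np.full((N, N), 1.0 / N)
rng = np.random.default_rng(6)
q = rng.uniform(1, 3, N)            # f_i(x) = 0.5 q_i x^2 + r_i x
r = rng.standard_normal(N)
x_star = -r.sum() / q.sum()
grad = lambda x: q * x + r

def run(z0, gamma=0.05, iters=4000):
    x, z = np.zeros(N), z0.copy()
    for _ in range(iters):
        x_new = A @ x - gamma * (z + grad(x))   # (7a)
        z = A @ (z + grad(x)) - grad(x)         # (7b), with B = A
        x = x_new
    return x

print(np.abs(run(np.zeros(N)) - x_star).max())  # ~0: correct initialization
print(np.abs(run(np.ones(N)) - x_star).max())   # clearly nonzero: biased limit
```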

We underline that the only parameters of the problem (1)-(2) that need to be known for the design of the gains $K_1$ and $K_3$ (fulfilling the stability requirement of Proposition IV.4) are the matrices $Q_i$. Nevertheless, due to the continuity of the eigenvalues of the closed-loop matrix $F$ with respect to variations in its entries, we also observe that whenever internal stability of $\mathcal{R}$ is ensured for a "nominal" value of $Q$, it also holds for all the actual values of $Q$ in a sufficiently small open neighborhood of the nominal one.

More generally, well-known results in the context of (hybrid) dynamical systems (see, e.g., [20, Proposition 6.34]) show that any algorithm of the form (11) fulfilling the conditions of Proposition III.4 is "robust" with respect to parameter perturbations and measurement noise. That is, for sufficiently small perturbations and noise, boundedness of the closed-loop trajectories is preserved, and the asymptotic error from the optimal solution is related to the noise bound.
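A sketch (ours, same hypothetical structure as the earlier ones) of the eigenvalue-continuity argument: gains tuned for a nominal $Q$ remain stabilizing under small perturbations of $Q$:

```python
import numpy as np

N = 4
A = np.full((N, N), 1.0 / N)
q_nom = np.array([1.0, 1.5, 2.0, 2.5])

def restricted_radius(q, gamma=0.1):
    Q, I = np.diag(q), np.eye(N)
    F = np.block([[A - gamma * Q, -gamma * I],
                  [(A - I) @ Q,   A]])
    eigs = np.linalg.eigvals(F)
    return np.max(np.abs(eigs[~np.isclose(eigs, 1.0)]))  # drop the fixed 1

print(restricted_radius(q_nom))                 # < 1: nominal design stable
for eps in (0.05, 0.1, 0.2):
    print(eps, restricted_radius(q_nom + eps))  # still < 1 for small eps
```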

Finally, we underline how well-known arguments on homogeneous approximations of nonlinear systems (see, e.g., [20, Theorem 9.11]) can be used to show that the global result of Proposition III.4 implies a local one when sufficiently regular nonlinearities come into play. This, in turn, makes it possible to extend the presented results "locally" to optimization problems of the form (1)-(2) with nonlinear, strongly convex cost functions having smooth gradients.

V Conclusions

In this paper we proposed a system theoretical approach to analyze a class of gradient tracking algorithms for distributed quadratic optimization. We formulated the design of a distributed algorithm as the design of a (linear) dynamic regulator solving a set-point control problem. We highlighted structural properties of the designed regulator and we showed that they are fulfilled by the gradient tracking algorithm. Moreover, we showed how the lack of reachability of the closed-loop system imposes conditions on the initialization of distributed algorithms with this structure. The proposed system theoretical perspective suggests that robustness arguments, customary in control theory, can be used to draw similar conclusions on the optimization algorithms. Finally, these results pave the way to more general technical tools for the analysis of nonlinear distributed optimization problems.

References

  • [1] P. Di Lorenzo and G. Scutari, “Next: In-network nonconvex optimization,” IEEE Trans. on Signal and Information Processing over Networks, vol. 2, no. 2, pp. 120–136, 2016.
  • [2] D. Varagnolo, F. Zanella, A. Cenedese, G. Pillonetto, and L. Schenato, “Newton-Raphson consensus for distributed convex optimization,” IEEE Trans. on Autom. Control, vol. 61, no. 4, pp. 994–1009, 2016.
  • [3] A. Nedić, A. Olshevsky, and W. Shi, “Achieving geometric convergence for distributed optimization over time-varying graphs,” SIAM Journal on Optimization, vol. 27, no. 4, pp. 2597–2633, 2017.
  • [4] G. Qu and N. Li, “Harnessing smoothness to accelerate distributed optimization,” IEEE Trans. on Control of Network Systems, vol. 5, no. 3, pp. 1245–1260, 2018.
  • [5] J. Xu, S. Zhu, Y. C. Soh, and L. Xie, “Convergence of asynchronous distributed gradient methods over stochastic networks,” IEEE Trans. on Autom. Control, vol. 63, no. 2, pp. 434–448, 2018.
  • [6] C. Xi, R. Xin, and U. A. Khan, “ADD-OPT: Accelerated distributed directed optimization,” IEEE Trans. on Autom. Control, vol. 63, no. 5, pp. 1329–1339, 2018.
  • [7] R. Xin and U. A. Khan, “A linear algorithm for optimization over directed graphs with geometric convergence,” IEEE Control Systems Letters, vol. 2, no. 3, pp. 315–320, 2018.
  • [8] G. Scutari and Y. Sun, “Distributed nonconvex constrained optimization over time-varying digraphs,” Mathematical Programming, vol. 176, no. 1-2, pp. 497–544, 2019.
  • [9] A. Nedić and A. Ozdaglar, “Distributed subgradient methods for multi-agent optimization,” IEEE Trans. on Autom. Control, vol. 54, no. 1, pp. 48–61, 2009.
  • [10] A. Nedić, A. Ozdaglar, and P. A. Parrilo, “Constrained consensus and optimization in multi-agent networks,” IEEE Trans. on Autom. Control, vol. 55, no. 4, pp. 922–938, 2010.
  • [11] M. Zhu and S. Martínez, “Discrete-time dynamic average consensus,” Automatica, vol. 46, no. 2, pp. 322–329, 2010.
  • [12] S. S. Kia, B. Van Scoy, J. Cortés, R. A. Freeman, K. M. Lynch, and S. Martínez, “Tutorial on dynamic average consensus: the problem, its applications, and the algorithms,” preprint arXiv:1803.04628, 2018.
  • [13] J. Wang and N. Elia, “Control approach to distributed optimization,” in IEEE Allerton Conf. on Communication, Control, and Computing, 2010, pp. 557–561.
  • [14] ——, “A control perspective for centralized and distributed convex optimization,” in IEEE Conf. on Decision and Control and European Control Conf. (CDC)-(ECC), 2011, pp. 3800–3805.
  • [15] L. Lessard, B. Recht, and A. Packard, “Analysis and design of optimization algorithms via integral quadratic constraints,” SIAM Journal on Optimization, vol. 26, no. 1, pp. 57–95, 2016.
  • [16] B. Hu and L. Lessard, “Control interpretations for first-order optimization methods,” in IEEE American Control Conf. (ACC), 2017, pp. 3114–3119.
  • [17] A. Sundararajan, B. Hu, and L. Lessard, “Robust convergence analysis of distributed optimization algorithms,” in IEEE Allerton Conf. on Communication, Control, and Computing, 2017, pp. 1206–1212.
  • [18] T. Hatanaka, N. Chopra, T. Ishizaki, and N. Li, “Passivity-based distributed optimization with communication delays using PI consensus algorithm,” IEEE Trans. on Autom. Control, vol. 63, no. 12, pp. 4421–4428, 2018.
  • [19] D. P. Bertsekas, Nonlinear Programming.   Athena Scientific, 1999.
  • [20] R. Goebel, R. G. Sanfelice, and A. R. Teel, Hybrid Dynamical Systems. Modeling, Stability, and Robustness.   Princeton University Press, 2012.