Distributed and time-varying primal-dual dynamics via contraction analysis

03/27/2020, by Pedro Cisneros-Velarde et al., The Regents of the University of California

In this paper, we provide a holistic contraction analysis of the primal-dual dynamics associated to linear equality-constrained optimization problems, which are aimed at computing saddle points of the associated Lagrangian. We first analyze the well-known standard version of the problem: we establish convergence results for convex objective functions and further characterize the convergence rate under strong convexity. Then, we consider a popular implementation of a distributed optimization problem and, using weaker notions of contraction theory, we establish the global exponential convergence of its associated distributed primal-dual dynamics. Moreover, based on this analysis, we propose a new distributed solver for the least-squares problem with global exponential convergence guarantees. Finally, we consider time-varying versions of the centralized and distributed implementations of primal-dual dynamics and exploit their contractive nature to establish asymptotic bounds on their tracking error. To support our convergence analyses, we introduce novel results on contraction theory and use them specifically in cases where the analyzed systems are weakly contractive (i.e., have zero contraction rate) and/or converge to specific points belonging to a subspace of equilibria.


1 Introduction

Problem statement and motivation

Primal-dual dynamics are dynamical systems used for solving constrained optimization problems. These dynamics have a long history of development, whose origins can be traced back many decades [23, 4], and they have experienced renewed interest over the last decade [16]. Primal-dual dynamics are relevant in applications where convergence to a solution of a constrained optimization problem is required, and their scalability and simplicity have made them popular. They have been widely adopted in a variety of engineering applications, such as resource allocation problems in power networks [39], frequency control in micro-grids [30], solvers for linear equations [44], etc. In this paper, we study optimization problems with linear equality constraints. In general, primal-dual dynamics seek a saddle point of the Lagrangian function associated to the constrained problem, which is convex in the primal variable (i.e., the optimization solution or optimizer) and concave in the dual variable (i.e., the Lagrange multiplier of the constraint); that is, the equilibria of the dynamics characterize these saddle points.

Moreover, it is possible to modify the Lagrangian to enhance different properties; for example, the so-called augmented Lagrangian [38, 11] enhances convexity properties with respect to the primal variable without altering the saddle points, and has been interpreted as adding a PI controller to the standard primal-dual dynamics [43]. For a general treatment of the asymptotic stability of the saddle points of a Lagrangian function under primal-dual dynamics, we refer to the works [18, 9] and references therein. However, despite their long history of study and application, there is ongoing research on primal-dual dynamics associated to linear equality constraints that studies further dynamic properties, such as exponential convergence under different convexity assumptions on the objective function [36, 8] and contractivity properties [32].

Parallel to the interest in primal-dual dynamics, there has been long-standing interest in the distributed implementation of solvers for optimization problems whose objective function can be expressed as a finite sum of functions. In these problems, a group of agents aims to minimize the objective function by having each agent locally minimize an assigned function and communicate its state with neighboring agents over some underlying communication graph. We refer to the recent survey [45] and references therein. A critical question is, for example, whether the convergence rate to a solution of an optimization problem remains exponential when using a distributed implementation, provided that it is exponential when using a non-distributed or centralized solver.

We can conclude that, for any dynamics that solve an optimization problem, whether centralized or distributed, all the aforementioned works stress the importance and characterization of strong convergence properties, which we summarize as follows: 1) global convergence of the dynamics or, otherwise, convergence of the primal variable to the optimum from any initial condition; and 2) convergence at an exponential rate. These strong convergence properties are valuable in practice since they lead to a faster optimization solver and introduce robustness into the system: if the dynamics are momentarily perturbed, the optimum is still guaranteed to be reached. As pointed out in [11, 36], these properties also help to assess the convergence rate of discrete-time implementations of the primal-dual dynamics, e.g., they may lead to discrete-time algorithms with geometric convergence rates. Motivated by these needs, our paper establishes these strong convergence properties for both centralized and distributed systems using contraction theory, in contrast to the prevalent Lyapunov or invariance analysis found in the literature. Moreover, due to their exponential incremental stability, contractive systems enjoy certain additional robustness properties, as specified in [28].

There has also been recent growing interest in time-varying optimization, i.e., constrained optimization problems where the objective function and/or the constraints are time dependent, motivated by applications in system identification, signal detection, robotics, network traffic management, etc. [15, 41]. In these problems, one employs a dynamical system that tracks the time-varying optimal solution in real time up to some bounded error (naturally, the tracking may also be established on the dual variables). Although different dynamics have been proposed to solve both time-varying centralized [15] and distributed implementations [41, 37], to the best of our knowledge, the primal-dual dynamics have not been characterized in such application contexts. The appeal of primal-dual algorithms lies in their simplicity of implementation: they do not require more complex information structures, such as the inverse of the Hessian at all times, as required in [15] and [37] for the centralized and distributed cases, respectively. However, this simplicity may come with the trade-off of a non-zero asymptotic tracking error bound. For example, asymptotic errors are present in gradient flows that seek a minimum of a function [35]. Therefore, there is a need to further understand the performance of time-varying primal-dual dynamics, and for this, our paper takes advantage of the contractive nature of their time-invariant counterparts.

Literature review

In the previous paragraphs, we saw that the recent works [36, 32, 8] study convergence properties of different primal-dual dynamics under different assumptions on the objective function of the associated optimization problem. In the context of distributed optimization, solvers based on primal-dual dynamics are fairly recent, e.g., see [43, 11, 45]. On the other hand, in the context of distributed algorithms, the problem of solving a system of linear equations in a distributed fashion is of current interest, as seen in the recent survey [44] and references therein. In particular, there is interest in the distributed least-squares problem, in which a group of agents in a network converges to the least-squares solution of an over-determined system of linear equations. To the best of our knowledge, solvers for the distributed least-squares problem (in continuous time) with global exponential convergence guarantees are still missing from the literature.

We also remark that in this paper (and in the references cited above), we are only concerned with the study of continuous-time primal-dual dynamics. However, there also exists a complementary literature on the analysis of their discrete-time counterparts; e.g., the recent work [40] includes some contraction studies, and the works [17, 20] consider a different approach to study the linear convergence rate of distributed algorithms.

Finally, this paper is related to contraction theory, a mathematical tool for analyzing the incremental stability of nonlinear systems [28, 42]. An introduction and survey on contraction theory can be found in [2]. Different variants of contraction theory exist; in particular, partial contraction [34, 14], used for studying convergence to linear subspaces, has proven useful in the synchronization analysis of diffusively-coupled network systems [34, 12, 3]. However, its study in applications related to distributed algorithms is still missing, and our paper provides such a contribution.

Contributions

In this paper we consider the primal-dual (PD) dynamics associated to optimization problems with an arbitrary objective function and linear equality constraints. We use tools from contraction theory to perform a holistic study of these dynamics in a variety of implementations and applications. In particular:

(i) We introduce new theoretical results on contraction theory. Among these results, we show how the notions of weak and partial contraction can imply exponential convergence of a system to an equilibrium point that belongs to a subspace of a continuum of equilibria.

(ii) We use the theory of weakly contractive systems to provide, to the best of our knowledge, a new result on the convergence of the PD dynamics when the objective function is convex.

(iii) We provide a new proof that shows contraction for the PD dynamics when the objective function is strongly convex, using an integral contractivity condition. Compared to the work [32], which also shows contractivity, our new proof method provides an explicit closed-form expression for the system's contraction rate. Our exponential convergence rate is different from the one obtained by [36] via Lyapunov analysis, and we remark that their rate and ours are not directly comparable unless we make further assumptions on the numerical relationships among the various parameters associated to the optimization problem's objective function and/or constraints. As a corollary, we also use contraction theory to analyze the PD dynamics resulting from the associated augmented Lagrangian and provide sufficient conditions that ensure global exponential convergence. In contrast to the recent work [11], which also proves exponential convergence using Lyapunov analysis, we provide an explicit estimate of the convergence rate and explicit sufficient conditions for the convergence of our particular problem.

(iv) We consider a popular implementation of a distributed optimization problem (as seen, for example, in the survey [45]) and analyze its associated PD dynamics with our newly introduced results on contraction theory to: 1) prove convergence when the objective function of the optimization problem is convex on some of its domain, and 2) prove global exponential convergence when the function is strongly convex, providing a closed-form convergence rate. These two contributions, to the best of our knowledge, are new in the literature. We remark that there exist other distributed solvers for optimization problems that exhibit exponential convergence, e.g., [22, 25], but none of these works use contraction theory.

(v) We propose a new solver for the distributed least-squares problem based on PD dynamics, and use our new contraction theory results to prove its convergence. Compared to the recent work [26], our new model exhibits global convergence; and compared to the solvers proposed in [46, 27], our new model exhibits convergence at an exponential rate and has a simpler structure.

(vi) Finally, we characterize the performance of the PD dynamics associated to time-varying linear equality constrained optimization problems and show that the tracking error to the time-varying solutions is asymptotically bounded and we further characterize it in terms of the problem parameters. This characterization is possible from the contraction results pointed out in contributions (iii) and (iv). We study two problems: 1) the linear equality constrained problem when both the objective function and constraints are time-varying; and 2) the distributed computation of an unconstrained problem over a fixed network where each agent’s objective function is time-varying at possibly different rates (this problem is cast as a linear equality constrained problem). To the best of our knowledge, these two results are novel in the literature, and their importance is the characterization of the performance of the popular PD dynamics in rather general time-varying settings.

Paper organization

Section 2 introduces notation and preliminary concepts. Section 3 introduces new results on contraction theory. Section 4 analyzes contractive properties of the PD dynamics associated to linear equality constrained problems. Section 5 analyzes a distributed implementation of PD dynamics and the distributed least-squares problem. Section 6 analyzes different PD dynamics for time-varying optimization problems. Section 7 concludes the paper.

2 Preliminaries and notation

2.1 Notation, definitions and useful results

Consider a real matrix A; let σ_min(A) denote its minimum singular value and σ_max(A) its maximum one. If A has only real eigenvalues, let λ_max(A) denote its maximum eigenvalue. A matrix P is an orthogonal projection if it is symmetric and P² = P. Let ‖·‖ denote any norm, ‖·‖₂ denote the ℓ₂-norm, and ‖·‖₂,W denote a weighted ℓ₂-norm with non-singular matrix W, i.e., ‖x‖₂,W = ‖Wx‖₂ for any appropriate vector x. When the argument of a norm is a matrix, we refer to its respective induced matrix norm.

Let Iₙ be the n×n identity matrix, and 1ₙ and 0ₙ be the all-ones and all-zeros column vectors with n entries, respectively. Let diag(A₁, …, A_k) represent a block-diagonal matrix whose diagonal blocks are the matrices A₁, …, A_k. Let ℝ≥0 be the set of non-negative real numbers. Given , we let .

Definition 2.1 (Lipschitz smoothness and strong convexity).

Consider a differentiable function f : ℝⁿ → ℝ. We say that the function f is

  (i) Lipschitz smooth with constant ℓ > 0 if ‖∇f(x) − ∇f(y)‖₂ ≤ ℓ‖x − y‖₂ for any x, y ∈ ℝⁿ;

  (ii) strongly convex with constant s > 0 if (∇f(x) − ∇f(y))ᵀ(x − y) ≥ s‖x − y‖₂² for any x, y ∈ ℝⁿ.

Assuming f is twice differentiable, conditions (i) and (ii) are equivalent to ∇²f(x) ⪯ ℓIₙ and ∇²f(x) ⪰ sIₙ for any x ∈ ℝⁿ, respectively.
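To make these constants concrete, here is a small numerical sketch (our own illustration, not from the paper): for a quadratic function f(x) = ½xᵀQx with symmetric positive-definite Q, the Hessian is constant and equal to Q, so the strong-convexity constant s and the Lipschitz-smoothness constant ℓ are simply the smallest and largest eigenvalues of Q.

```python
import numpy as np

# Sketch (our illustration): for f(x) = 0.5 * x^T Q x with symmetric
# positive-definite Q, the Hessian equals Q everywhere, so the
# strong-convexity and Lipschitz-smoothness constants are the extreme
# eigenvalues of Q.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
Q = M @ M.T + 4.0 * np.eye(4)        # symmetric positive definite

eigs = np.linalg.eigvalsh(Q)         # eigenvalues in ascending order
s, ell = eigs[0], eigs[-1]           # s * I <= Hessian(f) <= ell * I

# Check both gradient inequalities of Definition 2.1 on a random pair.
grad = lambda x: Q @ x
x, y = rng.standard_normal(4), rng.standard_normal(4)
assert (grad(x) - grad(y)) @ (x - y) >= s * np.linalg.norm(x - y) ** 2 - 1e-9
assert np.linalg.norm(grad(x) - grad(y)) <= ell * np.linalg.norm(x - y) + 1e-9
```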

Proposition 2.1.

Consider the matrix [[−S, −Bᵀ], [B, 0]], where S is a symmetric positive-definite real matrix and B is a full-row rank matrix. Then, this matrix is Hurwitz, i.e., all its eigenvalues have negative real part.

Proof.

See the Appendix. ∎
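The proposition is easy to check numerically. The following sketch assumes the matrix has the saddle structure [[−S, −Bᵀ], [B, 0]] with S symmetric positive definite and B full row rank (the structure that appears as the Jacobian of the primal-dual dynamics later in the paper); the sampled instance is illustrative only.

```python
import numpy as np

# Numerical check of the (assumed) saddle-matrix structure:
#   M = [[-S, -B^T],
#        [ B,  0  ]]
# with S symmetric positive definite and B full row rank should be Hurwitz.
rng = np.random.default_rng(1)
n, m = 5, 2
G = rng.standard_normal((n, n))
S = G @ G.T + np.eye(n)                  # symmetric positive definite
B = rng.standard_normal((m, n))          # full row rank (almost surely)

M = np.block([[-S, -B.T],
              [B, np.zeros((m, m))]])
max_real_part = np.linalg.eigvals(M).real.max()
```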

2.2 Review of basic concepts on contraction theory

Assume we have the dynamical system

(1)   ẋ(t) = f(t, x(t)),

with x(t) ∈ ℝⁿ. For any x₀ ∈ ℝⁿ and every t₀, the trajectory of the system (1) starting from x₀ at time t₀ is denoted by t ↦ x(t). The dynamical system (1) has exponential incremental stability if, for any two initial conditions x₀, y₀ ∈ ℝⁿ and any t₀, the two trajectories x(t) and y(t) satisfy ‖x(t) − y(t)‖ ≤ e^{−c(t−t₀)}‖x₀ − y₀‖ for some constant c. We say the system is contractive with respect to the norm ‖·‖ when c > 0, and we say it is weakly contractive with respect to the norm ‖·‖ when c = 0. Moreover, if the system is contractive, then there must exist a unique equilibrium point, and this equilibrium point is globally exponentially stable. Note, however, that if a system converges exponentially to a fixed point, it does not necessarily follow that the system is contractive (for an example in this paper, see Remark 5.2). A central concept for studying contractivity is the notion of matrix measure. Given a norm ‖·‖ on ℝⁿ, the associated matrix measure μ on the space of n×n matrices is defined by μ(A) := lim_{h→0⁺} (‖Iₙ + hA‖ − 1)/h. For a given matrix A, the matrix measure associated to the ℓ₂-norm is μ₂(A) = λ_max((A + Aᵀ)/2) [2]. For the weighted ℓ₂-norm with non-singular matrix W, the associated matrix measure is μ₂,W(A) = μ₂(WAW⁻¹) [2]. We refer to the reference [13] for further properties of matrix measures.

Now, assume the Jacobian of the system (1) satisfies μ(Df(t, x)) ≤ −c for any t and x, with μ being the matrix measure induced by the norm ‖·‖ and c being some constant. Then, a well-known result of contraction theory is that this system has contraction rate c with respect to ‖·‖ [24, 33]. Next, assume that the system (1) has a flow-invariant linear subspace S, with V being a full-row rank matrix with orthonormal rows. Then we say that the system is partially contractive with respect to the norm ‖·‖ and subspace S if there exists c > 0 such that, for any and , the two trajectories of the system and satisfy . When c = 0, we say the system is partially weakly contractive with respect to S. As a consequence, if the system is partially contractive, then any of its trajectories approaches S at an exponential rate (more formally, this result relies on an orthogonal projection onto S; see the proof of Theorem 3.1). On the other hand, if the system is partially weakly contractive, then any of its trajectories has a non-increasing distance from S. We remark that the concept of partially contractive systems was previously introduced in [34] under an alternative formulation for weighted ℓ₂-norms. Another concept relevant to contractivity is the so-called QUAD condition for dynamical systems [12]. The following result, whose proof follows from [12] and [33], shows the connection between the QUAD condition and exponential incremental stability.
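The closed-form matrix measures mentioned above are easy to compute. The following sketch implements the standard formulas for the ℓ₂, ℓ∞, and weighted-ℓ₂ measures (the function names are ours, not the paper's; the ℓ∞ formula is the standard one from the contraction literature):

```python
import numpy as np

# Standard matrix-measure formulas (function names are ours):
#   mu_2(A)     = lambda_max((A + A^T) / 2)               (l2-norm)
#   mu_inf(A)   = max_i ( a_ii + sum_{j != i} |a_ij| )    (l-infinity norm)
#   mu_{2,W}(A) = mu_2(W A W^{-1})                        (weighted l2-norm)
def mu_2(A):
    return np.linalg.eigvalsh((A + A.T) / 2.0)[-1]

def mu_inf(A):
    off_diag = np.abs(A) - np.diag(np.abs(np.diag(A)))
    return np.max(np.diag(A) + off_diag.sum(axis=1))

def mu_2_weighted(A, W):
    return mu_2(W @ A @ np.linalg.inv(W))

A = np.array([[-2.0, 1.0],
              [0.5, -3.0]])
rate_2, rate_inf = mu_2(A), mu_inf(A)   # both negative for this example
```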

Lemma 2.1 (Integral contractivity condition).

Consider a continuously-differentiable dynamical system on with Jacobian , and pick a symmetric positive-definite matrix and a positive scalar . Then, the following statements are equivalent:

  1. for all ,

  2. satisfies the integral contractivity condition, i.e., for every and ,

3 Theoretical contraction results

We present two theoretical results about the contractivity of dynamical systems. First, we extend the result for weakly contractive systems in [29, Lemma 6] to arbitrary norms and for the time-varying case.

Lemma 3.1 (Convergence of weakly contractive systems).

Consider the dynamical system (1), where is continuously differentiable with respect to and weakly contractive with respect to some norm , and let be an equilibrium point for the system, i.e., , for every . Then is locally asymptotically stable if and only if it is globally asymptotically stable.

Proof.

See the Appendix. ∎

The second result characterizes the partial contractivity of dynamical systems and extends the results in [34].

Theorem 3.1 (Results on partially contractive systems).

Consider the system (1) with a flow-invariant subspace with being a full-row rank matrix with orthonormal rows. Assume for any , some constant and some matrix measure .

  (i) If , then the system (1) is partially contractive with respect to and every trajectory of the system exponentially converges to the subspace with rate .

  (ii) If and for any , then the system (1) is partially weakly contractive with respect to and every trajectory of the system converges to the subspace .

Moreover, assume that one of the conditions in parts (i) and (ii) holds and is a set of equilibrium points. If the system is weakly contractive, then

  (iii) every trajectory of the system converges to an equilibrium point, and if , then it does so with exponential rate .

Finally, if the matrix that defines does not have orthonormal rows, we have that

  (iv) if for any and some constant , then the system is partially contractive with respect to and every trajectory of the system converges exponentially to the subspace .

Remark 3.2.

Statement (i) of Theorem 3.1 was formally proved in the work [34], and statement (iv) was mentioned in the same work without a proof. The rest of the results, to the best of our knowledge, are novel.

Proof of Theorem 3.1.

We first prove statements (i) and (ii). It is easy to check that is an orthogonal projection matrix. Now, observe that , and so projects onto . Now, observe that is also an orthogonal projection matrix whose image is , i.e., projects onto . Observe that and . Then, for any we have the decomposition with and such that and .

Using these results, we can express the given system as . Now, we set , and observe that converges to if and only if converges to . Then, using this change of coordinates, we obtain the system:

(2)

We claim that is a fixed point for the system (2). To see this, if , then the system (2) becomes ; and since and is a flow-invariant linear subspace, we have that , and thus, for all , we get . (If a linear subspace is flow-invariant for , then for any .) Therefore, if we can prove that the system (2) is contractive, then should converge to exponentially fast. Now, the Jacobian of the system (2) is , which becomes , and therefore, if for any , some constant and some matrix measure , then any trajectory of the system exponentially converges to the subspace with rate . This finishes the proof of statement (i).

To prove (ii), assume that for any ; i.e., that the system (2) is weakly contractive. Now, if we assume that for any , then by Coppel's inequality [10], the fixed point is locally exponentially stable. Now, we can use Lemma 3.1 to establish the convergence of to . This finishes the proof of (ii).

Now, we prove statement (iii). Let be a trajectory of the dynamical system. For every , is the orthogonal projection of onto the subspace and it is an equilibrium point. Since the dynamical system is weakly contractive, we have , for all . This implies that, for every and every , the point remains inside the closed ball . Therefore, for every , the point is inside the set defined by . It is easy to see that, for , we have . This implies that the family is a nested family of closed subsets of . Moreover, by parts (i) and (ii), we have that as , which in turn results in , with convergence rate for the case because of . Thus, by the Cantor Intersection Theorem [31, Lemma 48.3], there exists such that . We first show that . Note that , for every . This implies that . This in turn means that and converges to , with convergence rate for the case . On the other hand, by part (i), the trajectory converges to the subspace . Therefore, and is an equilibrium point of the dynamical system. This completes the proof for statement (iii).

Now, we prove statement (iv). Since does not have orthonormal rows and its rows form a basis for , an application of the Gram-Schmidt process can produce a non-singular matrix such that with the rows of being an orthonormal basis for . Then, by Sylvester's law of inertia [19], we have that and have the same inertia. This immediately implies that for any and , we have that for some constant if and only if for some constant . Finally, this last result and statement (i) imply statement (iv). ∎
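The Gram-Schmidt step in the proof of statement (iv) can be carried out with a QR factorization. The sketch below (our own illustration) orthonormalizes the rows of a full-row-rank matrix V and recovers the non-singular change of basis T:

```python
import numpy as np

# Sketch of the Gram-Schmidt step in statement (iv): given a full-row-rank
# V whose rows span the subspace, the QR factorization V^T = Q R yields
# V_bar = T V with orthonormal rows, where T = inv(R^T) is non-singular.
rng = np.random.default_rng(2)
V = rng.standard_normal((2, 5))          # full row rank (almost surely)

Q, R = np.linalg.qr(V.T)                 # Q: orthonormal columns, R: invertible
V_bar = Q.T                              # orthonormal rows, same row span as V
T = np.linalg.inv(R.T)                   # the non-singular change of basis

ortho_err = np.linalg.norm(V_bar @ V_bar.T - np.eye(2))
factor_err = np.linalg.norm(T @ V - V_bar)
```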

4 Linearly constrained optimization problems

We consider the constrained optimization problem:

(3)   min_x f(x)   subject to   Ax = b,

with the following standing assumptions: x ∈ ℝⁿ, A ∈ ℝ^{m×n}, b ∈ ℝᵐ, A is full row rank, and f : ℝⁿ → ℝ is convex and twice differentiable.

Associated to the optimization problem (3) is the Lagrangian function defined by

(4)   L(x, λ) = f(x) + λᵀ(Ax − b),

and the primal-dual dynamics defined by

(5)   ẋ = −∇f(x) − Aᵀλ,   λ̇ = Ax − b.
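As a quick illustration of dynamics of this type, the following sketch simulates, by forward-Euler integration, a primal-dual flow of the standard form ẋ = −∇f(x) − Aᵀλ, λ̇ = Ax − b for the strongly convex quadratic f(x) = ½‖x‖₂². The specific flow, data, and step size are our assumptions for illustration, not the paper's.

```python
import numpy as np

# Forward-Euler simulation of the (assumed) primal-dual flow
#   xdot = -grad f(x) - A^T lam,    lamdot = A x - b,
# for f(x) = 0.5 * ||x||^2, so grad f(x) = x. All values are illustrative.
A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, -1.0]])   # full row rank
b = np.array([1.0, 2.0])

x, lam = np.zeros(4), np.zeros(2)
dt = 0.01
for _ in range(20_000):
    x_dot = -x - A.T @ lam
    lam_dot = A @ x - b
    x, lam = x + dt * x_dot, lam + dt * lam_dot

# For this f, the KKT point is the minimum-norm solution of A x = b.
x_star = A.T @ np.linalg.solve(A @ A.T, b)
constraint_err = np.linalg.norm(A @ x - b)
opt_err = np.linalg.norm(x - x_star)
```

Note that the equilibrium of the flow is exactly the KKT pair of the constrained problem, which is what Theorem 4.1 below exploits.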

To study the primal-dual dynamics, we introduce two possible sets of assumptions:

  (i) the primal-dual dynamics (5) have an equilibrium and ;

  (ii) the function is strongly convex with constant and Lipschitz smooth with constant , and, for , we define

(6)
Theorem 4.1 (Contraction analysis of primal-dual dynamics).

Consider the constrained optimization problem (3), its standing assumptions, and its associated primal-dual dynamics (5).

  (i) The primal-dual dynamics (5) are weakly contractive with respect to and, if Assumption (i) holds, then is globally asymptotically stable.

  (ii) Under Assumption (ii),

    (a) the primal-dual dynamics are contractive with respect to with contraction rate

      (7)

    (b) there exists a unique globally exponentially stable equilibrium point for the primal-dual dynamics, and is the unique solution to the constrained optimization problem (3).

Proof.

First, observe that is a fixed point for the system, and that

Now, observe that

for any , because of the convexity of , i.e., . Now, using Proposition 2.1, we have that is Hurwitz since . Then, we use Lemma 3.1 to prove statement (i).

Now, we prove statement (ii). Define which is a positive definite matrix when

(8)

We plan to use the integral contractivity condition to show that the system (5) is contractive with respect to norm . Then, we need to show

for any and , and some constant . After completing squares and using the strong convexity of we obtain

Moreover,

where we used the Lipschitz smoothness of , and and . Set for some . Then, to ensure that , we need to ensure

(9)
(10)

Now, to ensure inequality (9) holds, using the inequalities (8), it is easy to see that it suffices to ensure that

(11)

Now, using inequalities (8) and (11), we obtain: and so, to ensure inequality (10) holds, it suffices that

(12)

Now, observe that the parameter needs to satisfy inequalities (8) and (12). However, the inequality for lets us conclude that , from which it immediately follows that it suffices for to satisfy only inequality (12). Finally, we conclude that the contraction rate is given by the product of the left-hand sides of the inequalities (11) and (12), which proves statement (ii)a.

Now, since the dynamics are contractive, there must exist a globally exponentially stable equilibrium point . Observe that such equilibrium point satisfies the KKT conditions for the optimization problem (3), which are necessary and sufficient conditions of optimality in this case [6], thus proving statement (ii)b. ∎

Remark 4.2.

We make the following remarks

  1. Theorem 4.1 is a fundamental building block for the rest of the results in this paper; therefore, it was necessary to provide a comprehensive proof, based on the integral contractivity condition, that yields an explicit estimate of the contraction rate (as opposed to the different proof in [32]).

  2. Theorem 4.1(i) relies on two fundamental properties of the primal-dual dynamics (5): (a) it is weakly contractive, and (b) . If condition (b) does not hold, it is still possible to use LaSalle’s invariance principle to show the convergence to a solution which has constant distance from any saddle point of the Lagrangian [18]; however, oscillations may appear in the dynamics and convergence to the saddle points is not guaranteed [16, 18].

Theorem 4.1(i) shows that the primal-dual dynamics associated to the optimization problem (3) are always weakly contractive. However, as pointed out by statement (ii) of Remark 4.2, without any additional assumption, convergence of the trajectories to a saddle point of the Lagrangian is not guaranteed. In particular, this could happen when is convex but not strictly convex. For this reason, we analyze a modification of the Lagrangian (4), known as the augmented Lagrangian [38, 11], whose contractivity analysis we defer to the appendix:

(13)   L_ρ(x, λ) = f(x) + λᵀ(Ax − b) + (ρ/2)‖Ax − b‖₂²,

with gain ρ > 0. Its associated augmented primal-dual dynamics become

(14)   ẋ = −∇f(x) − Aᵀλ − ρAᵀ(Ax − b),   λ̇ = Ax − b.
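A minimal simulation sketch, assuming the augmented flow has the standard form ẋ = −∇f(x) − Aᵀλ − ρAᵀ(Ax − b), λ̇ = Ax − b: here the chosen f is convex but not strongly convex, and the augmentation term supplies the missing curvature along the constrained direction. All numerical values are illustrative assumptions.

```python
import numpy as np

# Augmented primal-dual flow (assumed form), forward-Euler integration.
# f(x) = 0.5 * x1^2 is convex but NOT strongly convex on R^2; the term
# rho * A^T A fills in the missing curvature along the x2 direction.
A = np.array([[0.0, 1.0]])
b = np.array([1.0])
rho = 1.0
grad_f = lambda x: np.array([x[0], 0.0])

x, lam = np.array([2.0, -1.0]), np.array([0.5])
dt = 0.01
for _ in range(20_000):
    x_dot = -grad_f(x) - A.T @ lam - rho * A.T @ (A @ x - b)
    lam_dot = A @ x - b
    x, lam = x + dt * x_dot, lam + dt * lam_dot

# Expected saddle point of min 0.5*x1^2 s.t. x2 = 1:  x* = (0, 1), lam* = 0.
err = np.linalg.norm(x - np.array([0.0, 1.0])) + abs(lam[0])
```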

Note that these new dynamics have the same equilibria as the original dynamics in equation (5). To study these dynamics, we introduce two possible sets of assumptions:

  (iii) the primal-dual dynamics (5) have an equilibrium and ;

  (iv) for any and is Lipschitz smooth with constant , and, for , we define

(15)
Corollary 4.3 (Contraction analysis of the augmented primal-dual dynamics).

Consider the constrained optimization problem (3), its standing assumptions, and its associated augmented primal-dual dynamics (14) with .

  (i) Under Assumption (iii), the augmented primal-dual dynamics are weakly contractive with respect to and is globally asymptotically stable.

  (ii) Under Assumption (iv),

    (a) the augmented primal-dual dynamics are contractive with respect to with contraction rate

      (16)

    (b) there exists a unique globally exponentially stable equilibrium point for the augmented primal-dual dynamics, and is the unique solution to the constrained optimization problem (3).

Proof.

First, note that implies that , and since the Jacobian of the system is , we can follow the same proof as for statement (i) in Theorem 4.1 to finish the proof of statement (i).

Now, observe that for any implies that is Lipschitz smooth with constant and strongly convex with constant . Then statement (ii) follows immediately from the proof of Theorem 4.1. ∎

5 Distributed algorithms

5.1 Distributed optimization as a linear equality constrained optimization

We study a popular distributed implementation for solving an unconstrained optimization problem, as seen, for example, in the recent survey [45]. Assume we want to minimize an objective function f that can be expressed as a sum of n functions, i.e., f = Σᵢ fᵢ, with each fᵢ being a convex function:

(17)   min_x Σ_{i=1}^{n} fᵢ(x).

Let G be an undirected connected graph with n nodes, which represents the interaction graph among n distinct agents represented by the nodes in the graph. Let N(i) be the neighborhood of node i and L be the corresponding Laplacian matrix of G. Let xᵢ be the state associated to agent i, and let x = (x₁, …, xₙ)ᵀ. Then, we transform problem (17) into the following linear equality constrained optimization problem:

(18)   min_x Σ_{i=1}^{n} fᵢ(xᵢ)   subject to   Lx = 0ₙ.

The associated distributed primal-dual dynamics are

(19)   ẋ = −∇F(x) − Lv,   v̇ = Lx,   with F(x) := Σ_{i=1}^{n} fᵢ(xᵢ),

with v being the dual variable of the constraint in problem (18) (we used the fact that Lᵀ = L for an undirected graph G). The system (19) is distributed because, for each agent i, the update of its variables depends only on its own information (i.e., xᵢ and vᵢ) and on that of its neighbors (i.e., xⱼ and vⱼ for j ∈ N(i)).
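A minimal simulation sketch of dynamics of this type, assuming the standard form ẋ = −∇F(x) − Lv, v̇ = Lx with F(x) = Σᵢ fᵢ(xᵢ): with scalar quadratic local costs, all agents should agree on the minimizer of the sum, i.e., the average of the local data. The graph, local costs, and step size below are illustrative assumptions.

```python
import numpy as np

# Distributed primal-dual flow (assumed form), forward-Euler integration.
# Local costs f_i(x_i) = 0.5 * (x_i - a_i)^2, so grad F(x) = x - a; the
# minimizer of sum_i f_i is the average of the a_i.
a = np.array([1.0, 3.0, 5.0, 7.0])       # local data held by 4 agents
# Laplacian of the path graph 1-2-3-4 (undirected, connected):
L = np.array([[1.0, -1.0, 0.0, 0.0],
              [-1.0, 2.0, -1.0, 0.0],
              [0.0, -1.0, 2.0, -1.0],
              [0.0, 0.0, -1.0, 1.0]])

x, v = np.zeros(4), np.zeros(4)
dt = 0.01
for _ in range(20_000):
    x_dot = -(x - a) - L @ v             # each agent uses only neighbor info
    v_dot = L @ x
    x, v = x + dt * x_dot, v + dt * v_dot

consensus_err = np.max(np.abs(x - a.mean()))
```

Note how each agent's update touches only its own row of L, i.e., only its neighbors' states, which is exactly the distributed structure described above.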

To study this system, we introduce two possible sets of assumptions:

  (v) the optimization problem (17) has a solution and for any ;

  (vi) the optimization problem (17) has a solution and the function is strongly convex with constant and Lipschitz smooth with constant for any , with and .

Theorem 5.1 (Contraction analysis of distributed primal-dual dynamics).

Consider the distributed primal-dual dynamics (19).

  (i) The distributed primal-dual dynamics are weakly contractive with respect to , and

  (ii) under Assumption (v), for any , and , for some such that .

  (iii) Under Assumption (vi), the convergence results in statement (ii) hold and, for , the convergence of has exponential rate

    (20)

    where and are the smallest non-zero and the largest eigenvalues of , respectively.

Proof.

Since the graph is connected and undirected, the eigenvalue zero of is unique, with associated eigenvector , . Then, if and only if , ; and thus the objective function of the problem (18) is of the form . Thus, there exists a one-to-one mapping between the solutions of the problems (17) and (18).

Now, set and . Succinctly, the dynamics of the system are

(21)

Consider the equilibrium equations of (21), and let be a (candidate) fixed point of the system. From the second equation in (21), with (see the discussion above). Now, from the first equation in (21), we get and, left-multiplying by , we obtain ; and since , we obtain . This equation is exactly the necessary and sufficient condition of optimality [6] for the problem (17). Then, is an optimal solution to (17). Moreover, is simply some Lagrange multiplier for the second constraint in (18).

Now, let us define . Let