A preconditioned Forward-Backward method for partially separable SemiDefinite Programs

11/17/2019 ∙ by Filippo Fabiani, et al. ∙ Delft University of Technology

We present semi-decentralized and distributed algorithms, designed via a preconditioned forward-backward operator splitting, for solving large-scale, decomposable semidefinite programs (SDPs). We exploit a chordal aggregate sparsity pattern assumption on the original SDP to obtain a set of mutually coupled SDPs defined on positive semidefinite (PSD) cones of reduced dimensions. We show that the proposed algorithms converge to a solution of the original SDP via iterations of reasonable computational cost. Finally, we compare the performance of the two proposed algorithms with that of others available in the literature.


I Introduction

Semidefinite programs (SDPs), i.e., convex optimization problems over the cone of psd matrices, find applications in many engineering areas, since semidefinite constraints and linear matrix inequalities characterize problems related to sensor networks [1, 2], smart grids [3], robustness analysis of interconnected, uncertain systems [4], machine learning and operations research [5]. Additionally, SDPs offer a convex relaxation framework that is applicable to several nonconvex optimization problems, e.g., optimal power flow [6, 7].

Despite these appealing features, the computational complexity of SDPs represents an important issue, since they scale badly with both the dimension of the semidefinite constraints and the number of optimization variables. This dramatically limits, or even precludes [4, 8], the possibility of solving large-scale problems in a centralized fashion. Moreover, certain applications impose intrinsic privacy limitations that rule out centralized computation altogether. It is therefore fundamental to develop scalable and parallelizable algorithms.

Existing works that propose distributed approaches rely on a specific sparsity structure of the SDP. Specifically, when the graph associated with the aggregate sparsity pattern of the data matrices enjoys the chordality property, the original SDP can be decomposed into a set of mutually coupled SDPs defined on cones of smaller dimensions. However, the available distributed, first-order algorithms are either ad hoc for the specific problem considered, e.g., [3, 9, 2, 1], or are variants of the alternating direction method of multipliers (ADMM) [10, 11], a special instance of the Douglas-Rachford operator splitting [12, §25.2]. Along the same line, an operator-splitting method has been proposed in [13], whose convergence requires solving a quadratic program at every iteration. The authors in [14] introduced a proximal bundle method encompassing a message-passing strategy, which in turn has been recently exploited in [15], together with an interior-point method, to overcome the typical issues of second-order methods shown, e.g., in [16, 17].

Differently from the aforementioned literature, in this paper we provide a unifying operator-theoretic perspective on the design of scalable and distributed algorithms for decomposable, large-scale SDPs, paving the way for a systematic development of efficient methods that rely on monotone operator theory as a general-purpose tool. Specifically, we discuss a semi-decentralized and a fully distributed version of a preconditioned forward-backward (pFB) method, recently proposed in [18] to solve generalized Nash equilibrium problems. Unlike the standard FB method, a preconditioning matrix makes it possible to circumvent intrinsic technical issues, e.g., the impossibility of explicitly computing the resolvent of an operator, by determining the step sizes and the exploration direction of the forward iteration. We exploit the chordal structure of the original problem to obtain a multi-agent SDP with coupling constraints. In this context, the proposed algorithms guarantee global convergence to an optimal solution and only require local evaluations of projections onto psd cones of reduced dimensions, which represent the major computational burden.

After briefly reviewing some notions of graph theory (§II), under the assumption of a chordal sparsity pattern we describe how to decompose an SDP into mutually coupled SDPs (§III). Finally, in §IV, we apply an operator-splitting approach to design the pFB-based algorithms, and in §V we compare the two methods with algorithms available in the literature on illustrative examples.

Basic notation

$\mathbb{R}^n$ denotes the set of $n$-dimensional real vectors, $\mathbb{S}^n$ is the space of $n \times n$ symmetric matrices and $\mathbb{S}^n_{\succeq 0}$ is the cone of psd matrices. $\mathbf{1}$ ($\mathbf{0}$) denotes a matrix/vector with all elements equal to $1$ ($0$). For vectors $x$ and $y$, we denote by $\mathrm{col}(x, y)$ their vertical concatenation. $\langle \cdot, \cdot \rangle$ denotes the inner product in the appropriate space, i.e., $\langle x, y \rangle := x^\top y$ for $x, y \in \mathbb{R}^n$, and $\langle X, Y \rangle := \mathrm{tr}(X^\top Y)$ for $X, Y \in \mathbb{S}^n$. Given a matrix $A$, its $(i,j)$ entry is denoted by $[A]_{ij}$, while $A^\top$ denotes its transpose. $A \otimes B$ represents the Kronecker product between the matrices $A$ and $B$. Given a discrete set $\mathcal{S}$, $|\mathcal{S}|$ denotes its cardinality. The operator $\mathrm{vec}(\cdot)$ maps a matrix to a vector that stacks its columns; $\mathrm{mat}(\cdot)$ performs the inverse operation.

Operator-theoretic definitions

$\mathrm{Id}$ denotes the identity operator. The mapping $\iota_{\mathcal{S}}$ represents the indicator function for a set $\mathcal{S}$, i.e., $\iota_{\mathcal{S}}(x) = 0$ if $x \in \mathcal{S}$, $\infty$ otherwise. The mapping $\mathrm{proj}_{\mathcal{S}}(\cdot)$ denotes the projection onto the set $\mathcal{S}$, w.r.t. the Euclidean norm when the argument is a vector and w.r.t. the Frobenius norm when it is a matrix. The set-valued mapping $\mathrm{N}_{\mathcal{S}}(\cdot)$ denotes the normal cone operator for the set $\mathcal{S}$, i.e., $\mathrm{N}_{\mathcal{S}}(x) := \{ v \mid \sup_{z \in \mathcal{S}} v^\top (z - x) \leq 0 \}$ if $x \in \mathcal{S}$, $\varnothing$ otherwise. $\mathrm{zer}(\mathcal{T}) := \{ x \mid 0 \in \mathcal{T}(x) \}$ denotes the set of zeros of a set-valued mapping $\mathcal{T}$.

II Mathematical problem setup

We first recall some notions of graph theory and chordality [19, 20], necessary to introduce the partially separable structure addressed in the remainder. Then, we review a key result to decompose a convex cone, i.e., the cone of psd matrices, into a set of smaller but coupled convex cones.

II-A Graph theoretic concepts

Let $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ be a graph defined by a set of vertices $\mathcal{V}$ and a set of edges $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$. The graph is undirected when $(i,j) \in \mathcal{E}$ if and only if $(j,i) \in \mathcal{E}$, while it is complete if every two vertices share exactly one edge. The set of neighboring vertices of $i \in \mathcal{V}$ is denoted by $\mathcal{N}_i := \{ j \in \mathcal{V} \mid (i,j) \in \mathcal{E} \}$. A path of $\mathcal{G}$ is a sequence of distinct vertices such that any two consecutive vertices in the sequence correspond to an edge of the graph. Let $W$ be the weighted adjacency matrix of $\mathcal{G}$, with $[W]_{ij} > 0$ if $(i,j) \in \mathcal{E}$ and $[W]_{ij} = 0$ otherwise, and let us assume $W = W^\top$. The weighted degree matrix is $D := \mathrm{diag}(d_1, \dots, d_{|\mathcal{V}|})$, with $d_i := \sum_{j \in \mathcal{N}_i} [W]_{ij}$, and the weighted Laplacian of $\mathcal{G}$ is $L := D - W$.

A clique $\mathcal{C} \subseteq \mathcal{V}$ is defined as the set of vertices inducing a complete subgraph of $\mathcal{G}$, i.e., such that $(i,j) \in \mathcal{E}$ for any distinct vertices $i, j \in \mathcal{C}$, which is not a subset of any other clique. Given a path, a chord is an edge connecting two nonconsecutive vertices of the path, while a cycle is a path whose first and last vertices coincide. An undirected graph is chordal if every cycle of length greater than three has at least one chord. Every nonchordal graph can be extended to a chordal graph by adding a (minimum) number of edges to the original graph (minimum fill-in problem, NP-complete [19]). Typical examples of chordal graphs are complete graphs, trees and forests, and undirected graphs with no cycles of length greater than three.
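As a quick illustration of chordality and cliques (using a hypothetical toy graph, not one from the paper), the following Python sketch relies on the networkx package to check chordality, enumerate the maximal cliques, and compute a chordal extension:

```python
# Illustrative sketch (hypothetical graph, not from the paper): chordality,
# maximal cliques, and chordal extension via networkx.
import networkx as nx

# A 4-cycle is the smallest non-chordal graph: a cycle of length 4 with no chord.
G = nx.cycle_graph(4)
print(nx.is_chordal(G))                 # False

# Adding one chord makes every cycle of length greater than three contain a chord.
G.add_edge(0, 2)
print(nx.is_chordal(G))                 # True
print(list(nx.find_cliques(G)))         # maximal cliques, e.g. [[0, 1, 2], [0, 2, 3]]

# For a non-chordal graph, a chordal extension adds fill-in edges.
H, _ = nx.complete_to_chordal_graph(nx.cycle_graph(5))
print(nx.is_chordal(H))                 # True
```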

II-B Separable cones and chordal decomposition

Given a sparse matrix $Z \in \mathbb{S}^n$, the associated sparsity pattern can be represented as an undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where each clique $\mathcal{C}_j$ defines a maximally dense principal submatrix of $Z$, i.e., $Z_j := E_{\mathcal{C}_j} Z E_{\mathcal{C}_j}^\top \in \mathbb{S}^{|\mathcal{C}_j|}$, by means of an entry selection matrix $E_{\mathcal{C}_j} \in \mathbb{R}^{|\mathcal{C}_j| \times n}$ defined as

$[E_{\mathcal{C}_j}]_{ik} := 1$ if $\mathcal{C}_j(i) = k$, $0$ otherwise,

where $\mathcal{C}_j(i)$ is the $i$-th element in $\mathcal{C}_j$, sorted in natural ordering. Conversely, $E_{\mathcal{C}_j}^\top Z_j E_{\mathcal{C}_j}$ returns a matrix in the original space $\mathbb{S}^n$. Note that, given two cliques $\mathcal{C}_j$ and $\mathcal{C}_l$ associated with the same graph, $Z_j$ and $Z_l$ have overlapping elements if $\mathcal{C}_j \cap \mathcal{C}_l \neq \varnothing$.
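The entry selection mechanism can be sketched in a few lines of Python; the matrix, the clique and all variable names below are illustrative placeholders, not data from the paper:

```python
# Sketch of an entry selection matrix E_C for a clique C (illustrative example).
import numpy as np

n = 5
Z = np.arange(n * n, dtype=float).reshape(n, n)
Z = (Z + Z.T) / 2                       # a symmetric 5x5 matrix

C = [1, 3, 4]                           # clique, vertices in natural ordering
E = np.zeros((len(C), n))
for i, k in enumerate(C):
    E[i, k] = 1.0                       # [E]_{ik} = 1 iff k is the i-th element of C

Z_sub = E @ Z @ E.T                     # maximally dense principal submatrix Z_C
assert np.allclose(Z_sub, Z[np.ix_(C, C)])   # same as direct index selection

Z_back = E.T @ Z_sub @ E                # lifts Z_C back to the original n x n space
```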

Given the edge set $\mathcal{E}$, we define the space of symmetric matrices with sparsity pattern $\mathcal{E}$ as $\mathbb{S}^n(\mathcal{E}) := \{ Z \in \mathbb{S}^n \mid [Z]_{ij} = [Z]_{ji} = 0 \text{ if } i \neq j \text{ and } (i,j) \notin \mathcal{E} \}$.
Next, we recall a well-known result that links a matrix belonging to the cone generated by the projection of the psd cone onto the space of matrices with sparsity pattern $\mathcal{E}$, i.e., $\mathbb{S}^n_{\succeq 0}(\mathcal{E}) := \mathrm{proj}_{\mathbb{S}^n(\mathcal{E})}(\mathbb{S}^n_{\succeq 0})$, and its maximal principal submatrices.

Lemma 1

[21, Th. 7] Let $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ be a chordal graph and let $\{\mathcal{C}_1, \dots, \mathcal{C}_q\}$ be the set of its cliques. Then, $Z \in \mathbb{S}^n_{\succeq 0}(\mathcal{E})$ if and only if $Z_j = E_{\mathcal{C}_j} Z E_{\mathcal{C}_j}^\top \succeq 0$ for all $j \in \{1, \dots, q\}$.
Informally speaking, a matrix with chordal sparsity pattern is positive semidefinite (more precisely, admits a psd completion) if and only if its maximally dense principal submatrices $Z_j$, $j \in \{1, \dots, q\}$, are positive semidefinite, i.e., the psd property can be checked on matrices of smaller dimensions. To recap the introduced concepts, let us show an example on a simple graph.
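In practice, Lemma 1 suggests checking the psd property clique by clique. A minimal Python sketch, on assumed toy data rather than the example of Fig. 1, could read:

```python
# Clique-wise psd check suggested by Lemma 1 (illustrative data).
import numpy as np

def psd(M, tol=1e-9):
    """True if the symmetric matrix M is positive semidefinite."""
    return np.min(np.linalg.eigvalsh(M)) >= -tol

# A 4x4 matrix with chordal sparsity pattern and cliques {0, 1, 2} and {0, 2, 3}
# (hypothetical example): the (1, 3) entry lies outside the pattern and is zero.
Z = np.array([[2.0, 1.0, 0.5, 0.3],
              [1.0, 2.0, 0.4, 0.0],
              [0.5, 0.4, 2.0, 0.2],
              [0.3, 0.0, 0.2, 1.0]])
cliques = [[0, 1, 2], [0, 2, 3]]

# By Lemma 1, Z admits a psd completion iff every clique submatrix is psd.
print(all(psd(Z[np.ix_(c, c)]) for c in cliques))
```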

Example 1
Fig. 1: (a) Sparsity pattern of a given matrix $Z$. (b) Graph $\mathcal{G}$ associated with the sparsity pattern of $Z$.

The sparsity pattern of a given matrix $Z$ in Fig. 1(a) can be summarized by the graph with seven vertices depicted in Fig. 1(b). The graph happens to be chordal, since every cycle of length strictly greater than three has a chord. Moreover, it is characterized by four cliques. For instance, by selecting one of them, say $\mathcal{C}_j$, the associated entry selection matrix extracts $Z_j = E_{\mathcal{C}_j} Z E_{\mathcal{C}_j}^\top$, the maximally dense principal submatrix of $Z$ related to $\mathcal{C}_j$. It follows from Lemma 1 that $Z$ is psd with sparsity pattern $\mathcal{E}$ if and only if $Z_j \succeq 0$ for every clique. Finally, note that some of the cliques overlap, and therefore the corresponding submatrices are coupled via the shared entries.

III Decomposition in sparse SDPs

We consider an SDP optimization problem in standard primal form [22]:

$\min_{X \in \mathbb{S}^n} \; \langle C, X \rangle \quad \text{s.t.} \quad \langle A_i, X \rangle = b_i, \;\; i \in \mathcal{I} := \{1, \dots, m\}, \quad X \succeq 0,$   (1)

where $C \in \mathbb{S}^n$, $b \in \mathbb{R}^m$ and $A_i \in \mathbb{S}^n$ for all $i \in \mathcal{I}$, and whose optimal solutions belong to the set

$\mathcal{X}^\star := \operatorname{argmin}_{X \succeq 0} \left\{ \langle C, X \rangle \mid \langle A_i, X \rangle = b_i, \; i \in \mathcal{I} \right\},$   (2)

which is assumed to be nonempty. Moreover, we make the following assumption that characterizes the data in (1).

Standing Assumption 1

The aggregate sparsity pattern of (1) is identified by the graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, i.e., $C \in \mathbb{S}^n(\mathcal{E})$ and $A_i \in \mathbb{S}^n(\mathcal{E})$ for all $i \in \mathcal{I}$.

Such a sparsity pattern also reflects on the standard dual form of (1), with dual variables $y \in \mathbb{R}^m$ and $S \in \mathbb{S}^n$, i.e.,

$\max_{y \in \mathbb{R}^m, \, S \in \mathbb{S}^n} \; b^\top y \quad \text{s.t.} \quad \textstyle\sum_{i \in \mathcal{I}} y_i A_i + S = C, \quad S \succeq 0.$   (3)

Clearly, in view of the equality constraint in (3), the slack matrix $S$ is necessarily sparse at any dual feasible point, with sparsity pattern $\mathcal{E}$. On the other hand, the primal variable $X$ is typically dense, but one can observe that both the cost function and the equality constraints in (1) depend only on the entries $[X]_{ij}$ with $(i,j) \in \mathcal{E}$. The remaining entries are arbitrary and only need to guarantee the psd property of $X$ (a matrix completion problem). Thus, according to the domain-space decomposition in [16], the conic constraint in (1) can be equivalently rewritten as $X \in \mathbb{S}^n_{\succeq 0}(\mathcal{E})$. We then assume the following.

Standing Assumption 2

The graph is chordal, undirected and connected.

Thus, in view of Lemma 1, we follow the so-called clique-tree conversion introduced in [16, 23] to decompose the original SDP in (1). Specifically, let $\mathcal{C}_1, \dots, \mathcal{C}_q$ be the cliques of $\mathcal{G}$ and, for each $i \in \mathcal{I}$, let $\mathcal{J}_i$ be the set containing the indices of all the cliques whose entries are involved in the $i$-th equality constraint. The SDP in (1) then translates into

(4)

where the local variables $X_k$ are the dense clique submatrices of $X$, while the coefficient matrices $C_k$ and $A_{i,k}$ are chosen so that the cost function and the equality constraints of (1) are recovered exactly, for all $i \in \mathcal{I}$. Remarkably, their choice is not unique. Some examples on how to choose such matrices are given in [13], while for the remainder we postulate the following assumption.

Standing Assumption 3

For each $k \in \{1, \dots, q\}$, the coefficient matrices $A_{i,k}$, $i \in \mathcal{I}$, are linearly independent.

The SDP in (4) is fully decomposed in terms of cost function and it is defined on psd conic constraints of smaller dimensions. However, it shows couplings among the dense submatrices in the two sets of equality constraints. Specifically, while the first set is directly inherited from (1), the second set refers to the consistency constraints, which ensure that the entries of the dense submatrices $X_j$ and $X_l$ agree whenever they share entries of the original $X$ in (1). Note that all these constraints are affine in the local variables, since they involve matrix multiplications only. Thus, the decomposed structure in (4), which is quite general for a sparse SDP under Standing Assumption 2, enables and motivates us to develop semi-decentralized/distributed algorithms that compute a solution by treating each clique as a single decision agent that controls its local variable $X_k$.
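To make the role of the consistency constraints concrete, the following sketch enumerates, for a hypothetical set of cliques, the entries on which overlapping submatrices must agree; it is only an illustration of the coupling structure, not the exact construction used in (4):

```python
# Overlaps between cliques: entries that consistency constraints must tie together.
from itertools import combinations

cliques = [[0, 1, 2], [2, 3, 4], [4, 5, 6]]   # hypothetical cliques of a chordal graph

for (j, Cj), (l, Cl) in combinations(enumerate(cliques), 2):
    shared = sorted(set(Cj) & set(Cl))
    if not shared:
        continue
    # Every entry of the original matrix indexed by shared x shared appears in
    # both X_j and X_l, so the two local submatrices must agree on it.
    pairs = [(r, c) for r in shared for c in shared]
    print(f"cliques {j} and {l} overlap on vertices {shared}: "
          f"{len(pairs)} coupled entries {pairs}")
```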

III-A Vector notation for the decomposed SDP

For our purposes, i.e., to formalize our multi-agent optimization problem and, consequently, to develop solution algorithms, it is convenient to adopt a vector notation for the matrix variables in (4). Specifically, let us define $x_k := \mathrm{vec}(X_k)$ for every clique, together with the vectorizations of the corresponding coefficient matrices. The SDP in (4) can then be rewritten as a multi-agent SDP and recast in vector form as follows:

(5)

where, for each agent $k$, the local cost vector and the local constraint matrix collect the vectorized data introduced above, the latter being of full row rank in view of Standing Assumption 3. Moreover, the remaining stacked vectors and matrices gather the data of the coupling equality and consistency constraints. The local constraint set of agent $k$ is the set of vectorized psd matrices of dimension $|\mathcal{C}_k|$; it follows directly that $x_k$ belongs to this set if and only if $X_k \succeq 0$.
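The vectorization step hinges on the identity $\langle A, X \rangle = \mathrm{vec}(A)^\top \mathrm{vec}(X)$, which turns the matrix-valued constraints of (4) into the vector form (5). A quick numerical sanity check (with random placeholder matrices) is the following:

```python
# vec(.) turns matrix inner products into vector inner products (sanity check).
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n)); A = (A + A.T) / 2
X = rng.standard_normal((n, n)); X = (X + X.T) / 2

vec = lambda M: M.flatten(order="F")      # stacks the columns, as in the notation section

lhs = np.trace(A.T @ X)                   # <A, X> in S^n
rhs = vec(A) @ vec(X)                     # inner product of the vectorized matrices
assert np.isclose(lhs, rhs)

# mat(.) recovers the matrix from its vectorization.
mat = lambda v: v.reshape((n, n), order="F")
assert np.allclose(mat(vec(X)), X)
```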

IV Preconditioned Forward-Backward Operator Splitting

In this section, we look at the multi-agent SDP in (4) or, equivalently, in (5), from an operator-splitting perspective. By leveraging this view, we then provide two iterative algorithms based on a preconditioned variant of the FB operator splitting [12, §25.3].

Let us introduce the stacked vector collecting all the local decision variables. The Lagrangian function associated with (5) reads as follows

where the additional variables are the Lagrange multipliers associated with the coupling equality and consistency constraints. Moreover, for each agent we define a consistency constraint matrix, obtained by stacking and rearranging the corresponding selection matrices together with some (possibly rectangular) zero matrices of suitable dimensions. Thus, to compute a solution to the SDP in (5), the KKT stationarity conditions require that

(6)

To cast the inclusion problem (6) in a compact form, we first introduce the gradient mapping of the cost, which is constant in this case since the cost in (5) is linear. Then, we define the following set-valued mapping:

(7)

Thus, solving (6) is equivalent to finding a zero of the mapping in (7), which can be written as the sum of the two following operators:

(8)
(9)

We note that the set of zeros of the mapping in (7) corresponds to the set of zeros of the same mapping premultiplied by $\Phi^{-1}$, where $\Phi \succ 0$ is a “preconditioning” matrix to be designed freely. In simple words, the FB operator splitting states that these zeros are the fixed points of the operator $(\mathrm{Id} + \Phi^{-1}\mathcal{B})^{-1} \circ (\mathrm{Id} - \Phi^{-1}\mathcal{A})$, where $\mathcal{A}$ denotes the single-valued and $\mathcal{B}$ the set-valued operator among (8)–(9). Consequently, by stacking all primal and dual variables into a single vector, the FB algorithm corresponds to the fixed-point iteration [12, (1.67)]

(10)
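To illustrate the mechanism behind the fixed-point iteration (10) on a toy instance: when the set-valued operator is a normal cone and the preconditioning matrix is a multiple of the identity, the backward (resolvent) step reduces to a projection and (10) becomes a projected forward step. The sketch below assumes such a simplified setting and is not the operator pair (8)–(9) of the paper:

```python
# Toy forward-backward iteration x+ = proj_C(x - gamma * (Q x + q)) for
# 0 in Qx + q + N_C(x), with C the nonnegative orthant (illustrative only).
import numpy as np

rng = np.random.default_rng(1)
n = 5
M = rng.standard_normal((n, n))
Q = M @ M.T + np.eye(n)                    # strongly monotone (positive definite)
q = rng.standard_normal(n)

gamma = 0.9 / np.linalg.norm(Q, 2)         # step size below 1 / Lipschitz constant
proj_C = lambda x: np.maximum(x, 0.0)      # resolvent of N_C is the projection onto C

x = np.zeros(n)
for _ in range(2000):
    x = proj_C(x - gamma * (Q @ x + q))    # forward step, then backward (projection) step

# At a fixed point, x solves the inclusion, i.e., x = proj_C(x - gamma * (Q x + q)).
assert np.allclose(x, proj_C(x - gamma * (Q @ x + q)), atol=1e-6)
```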

IV-A Semi-decentralized forward-backward algorithm

From (10), we obtain the following chain of inclusions:

and by explicitly including the operators in (8)–(9) we have

(11)

Thus, by following the guidelines provided in [24, §IV], it is convenient to design the preconditioning matrix $\Phi$ as

(12)

where the entries on the main diagonal (the local step sizes) are chosen large enough to guarantee $\Phi \succ 0$. Essentially, the off-diagonal terms of $\Phi$ in (12) allow one to cancel the skew-symmetric matrices in (11), while the step sizes constitute the main diagonal.

Algorithm 1 Semi-decentralized pFB
Initialization: local variables $x_k$ and multipliers of the consistency and coupling constraints.
Iteration $k$:
  • each agent updates its local variable via a projection onto its local psd cone;
  • the central coordinator updates the multipliers of the consistency and coupling constraints.

The steps in Algorithm 1 summarize the proposed semi-decentralized pFB. Remarkably, such a procedure is suitable for a parallel implementation. Specifically, each agent first updates its decision variable by means of a projection onto a local psd cone of reduced dimension, which can be efficiently computed via an eigenvalue decomposition. Then, the agents communicate their decisions to a central coordinator, which is in charge of updating both the Lagrange multipliers associated with the consistency constraints and the dual variable associated with the original coupling constraints. Note that the central coordinator performs almost inexpensive computations. Since privacy requirements may however rule out the direct involvement of a central coordinator, we next propose a fully distributed pFB algorithm to solve the SDP in (5).
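The projection onto a psd cone mentioned above admits a standard implementation via an eigenvalue decomposition: clip the negative eigenvalues to zero and rebuild the matrix. A generic numpy sketch (not the authors' code) is the following:

```python
# Projection onto the psd cone (w.r.t. the Frobenius norm) via eigendecomposition.
import numpy as np

def proj_psd(X):
    """Return the psd matrix closest to the symmetric matrix X in Frobenius norm."""
    X = (X + X.T) / 2                       # symmetrize against numerical noise
    w, V = np.linalg.eigh(X)                # eigendecomposition of a symmetric matrix
    return (V * np.clip(w, 0.0, None)) @ V.T

rng = np.random.default_rng(2)
Y = rng.standard_normal((6, 6)); Y = (Y + Y.T) / 2
P = proj_psd(Y)
assert np.min(np.linalg.eigvalsh(P)) >= -1e-10   # the result is psd
```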

IV-B Distributed forward-backward algorithm

As in [18], to design a distributed algorithm based on the pFB splitting, we endow each agent with a local copy of the dual variables. Moreover, we augment the state of each agent with two local auxiliary variables, with the aim of enforcing local consensus on the copies of the multipliers.

Let us now preliminarily introduce some useful aggregate quantities: the weighted Laplacian matrix of the graph over which the agents exchange information, and the stacked vectors and matrices collecting the local copies of the variables and multipliers. Then, by augmenting the state and by adding the consensus constraints on the dual variables, the operators in (8)–(9) extend as follows:

(13)
(14)

Thus, we mimic the steps followed in the semi-decentralized case to design a suitable preconditioning matrix so that the resulting pFB mapping can be iterated. Also in this case, the zeros of the sum of the extended operators in (13)–(14) correspond to solutions of the SDP in (5). Specifically, by resorting to the associated inclusion and substituting the operators in (13)–(14), the preconditioning matrix can be conveniently chosen as

(15)

Also in this case, the entries on the main diagonal (the local step sizes) are chosen large enough to make the preconditioning matrix positive definite, while the off-diagonal terms of (15) cancel the skew-symmetric terms in (14). The distributed pFB procedure is summarized in Algorithm 2, where the steps to be performed at every iteration by each single agent are emphasized. Specifically, the algorithm alternates communication and computation tasks. Once an agent receives the local copies of the dual variables from its neighbors, it updates its augmented state, i.e., its decision variable together with the local auxiliary variables (S1). Then, the agents transmit the updated augmented state to their neighbors, which compute a local update of the copies of the multipliers (S2). Finally, we emphasize that each computation is based on either local or communicated quantities only, and does not require a central coordinator.

Algorithm 2 Distributed pFB
Initialization: for each agent, local variable $x_k$, local copies of the dual variables and local auxiliary variables.
Iteration $k$, for each agent:
  • (S1) receive the local copies of the dual variables from the neighbors and update the augmented state (local variable and auxiliary variables);
  • (S2) receive the updated augmented states from the neighbors and update the local copies of the multipliers.
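The consensus-enforcing role of the weighted Laplacian can be appreciated through a basic fact: for a connected undirected graph, the Laplacian annihilates a stacked vector of local copies exactly when all copies coincide. The small check below uses an arbitrary graph, not the actual communication graph among the agents:

```python
# For a connected graph, L x = 0 iff all entries of x are equal (consensus).
import numpy as np
import networkx as nx

G = nx.path_graph(5)                         # an arbitrary connected graph
L = nx.laplacian_matrix(G).toarray().astype(float)

consensus = 3.7 * np.ones(5)                 # identical local copies
disagreeing = np.array([1.0, 2.0, 0.0, 5.0, -1.0])

assert np.allclose(L @ consensus, 0.0)       # consensus vectors lie in the kernel of L
assert not np.allclose(L @ disagreeing, 0.0) # disagreement is detected by L

# The kernel of L is one-dimensional (spanned by the all-ones vector) iff G is connected.
assert np.linalg.matrix_rank(L) == 5 - 1
```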

V Numerical simulations

Fig. 2: Banded sparsity pattern.

We first compare the performance of the presented algorithms on randomly generated SDPs. The simulations are run in the Matlab environment on a laptop with an Intel Core i5 2.9 GHz CPU and 8 GB of RAM. Specifically, we focus on an SDP characterized by the banded sparsity pattern illustrated in Fig. 2. Here, each agent controls a diagonal block, while sharing a subset of variables with the next agent (a parameter specifies the number of overlapping elements). For simplicity, we assume that all the blocks have the same dimension and overlap, so that all the associated psd cones have the same (reduced) dimension. As in [25], we first generate random symmetric matrices $C$ and $A_i$, $i \in \mathcal{I}$, with banded sparsity and entries drawn uniformly at random. Then, we generate a strictly feasible matrix $X$, with a parameter chosen large enough to guarantee $X \succ 0$. Finally, the vector of equality constraints is computed as $b_i = \langle A_i, X \rangle$ for all $i \in \mathcal{I}$.
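A possible sketch of the data-generation procedure just described is reported below; the uniform range, the band width, the problem sizes and the feasibility shift are assumptions made for illustration, not the exact values used in the experiments:

```python
# Random banded SDP data: C, A_i with banded sparsity, a strictly feasible X, b_i = <A_i, X>.
import numpy as np

rng = np.random.default_rng(3)
n, m, bandwidth = 12, 4, 2                   # assumed sizes, not the paper's values

def random_banded_symmetric(n, bandwidth, rng):
    """Symmetric matrix with entries (assumed uniform on [-1, 1]) inside a band."""
    M = rng.uniform(-1.0, 1.0, size=(n, n))
    M = (M + M.T) / 2
    band = np.abs(np.subtract.outer(np.arange(n), np.arange(n))) <= bandwidth
    return M * band

C = random_banded_symmetric(n, bandwidth, rng)
A = [random_banded_symmetric(n, bandwidth, rng) for _ in range(m)]

W = random_banded_symmetric(n, bandwidth, rng)
alpha = 1.0 + abs(np.min(np.linalg.eigvalsh(W)))   # assumed shift, large enough for X > 0
X = W + alpha * np.eye(n)                          # strictly feasible primal point

b = np.array([np.trace(Ai @ X) for Ai in A])       # b_i = <A_i, X>, so X satisfies them
assert np.min(np.linalg.eigvalsh(X)) > 0
```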

Fig. 3: Coupling equality constraints.
Fig. 4: Consistency constraints.
Fig. 5: Average CPU time per iteration. From left to right: varying the number of diagonal blocks (i.e., agents); varying the dimension of each block; varying the number of coupling constraints of the original SDP.
Fig. 6: Comparison of the convergence behavior of the consistency constraints between Algorithm 2 and the ADMM algorithm in [10].
Fig. 7: Comparison of the integral value of the computational time between Algorithm 2 and the ADMM algorithm in [10].
Fig. 8: Comparison of the convergence behavior of the equality constraints between Algorithm 2 and the ADMM algorithm in [2].
Fig. 9: Comparison of the integral value of the computational time between Algorithm 2 and the ADMM algorithm in [2].

By considering a given problem instance, Figures 3 and 4 show the convergence of both the coupling and the consistency constraints when applying Algorithms 1 and 2, while Fig. 5 illustrates the average CPU time per iteration as a function of the number of diagonal blocks (i.e., agents), the dimension of each block and the number of coupling constraints, respectively. The (averaged) CPU time to compute the projections onto the local psd cones is also reported. For each plot, only one parameter at a time changes, while the others are kept fixed. While the two algorithms exhibit the same behavior in terms of convergence, Algorithm 1 takes less time on average to perform each iteration.

Finally, Figures 6–9 compare the behavior of the distributed pFB in Algorithm 2 and the ADMM algorithms proposed in [10] and [2] on two different problem instances. The ADMM algorithms converge in fewer iterations (Figs. 6 and 8), at the price of a much higher computational cost at each iteration, as shown in Figs. 7 and 9. Specifically, since the iterations of the considered algorithms are meant to be performed in parallel, the value reported at each iteration in Figs. 7 and 9 accounts for the slowest agent, i.e., the maximum over the agents of the time taken to perform the steps of the algorithm at that iteration. However, we emphasize that the distributed ADMM in [10] is less general than the pFB in Algorithm 2, since it is tailored to separable SDPs with local equality constraints only (which motivates the setup of the first comparison), while the algorithm in [2] requires solving an SDP for each agent at every iteration.

VI Conclusion and outlook

Monotone operator splitting theory promises to be a general-purpose tool to design scalable and distributed algorithms for decomposable, large-scale SDPs. The operator-theoretic perspective given in this paper paves the way for a systematic development of efficient methods with reasonable computational burden and convergence guarantees. Accelerated strategies, as well as additional operator splitting methods, are left as future work.

References

    • [1] P. Biswas, T.-C. Lian, T.-C. Wang, and Y. Ye, “Semidefinite programming based algorithms for sensor network localization,” ACM Transactions on Sensor Networks, vol. 2, no. 2, pp. 188–220, 2006.
    • [2] A. Simonetto and G. Leus, “Distributed maximum likelihood sensor network localization,” IEEE Transactions on Signal Processing, vol. 62, no. 6, pp. 1424–1437, 2014.
    • [3] E. Dall’Anese, H. Zhu, and G. B. Giannakis, “Distributed optimal power flow for smart microgrids,” IEEE Transactions on Smart Grid, vol. 4, no. 3, pp. 1464–1475, 2013.
    • [4] M. S. Andersen, S. K. Pakazad, A. Hansson, and A. Rantzer, “Robust stability analysis of sparsely interconnected uncertain systems,” IEEE Transactions on Automatic Control, vol. 59, no. 8, pp. 2151–2156, 2014.
    • [5] S. Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan, Linear matrix inequalities in system and control theory.   Siam, 1994, vol. 15.
    • [6] R. Madani, S. Sojoudi, and J. Lavaei, “Convex relaxation for optimal power flow problem: Mesh networks,” IEEE Transactions on Power Systems, vol. 30, no. 1, pp. 199–211, 2014.
    • [7] S. Magnússon, P. C. Weeraddana, and C. Fischione, “A distributed approach for the optimal power-flow problem based on ADMM and sequential convex approximations,” IEEE Transactions on Control of Network Systems, vol. 2, no. 3, pp. 238–253, 2015.
    • [8] S. Kim, M. Kojima, and H. Waki, “Exploiting sparsity in SDP relaxation for sensor network localization,” SIAM Journal on Optimization, vol. 20, no. 1, pp. 192–215, 2009.
    • [9] Y. Weng, Q. Li, R. Negi, and M. Ilić, “Distributed algorithm for SDP state estimation,” in 2013 IEEE PES Innovative Smart Grid Technologies Conference.   IEEE, 2013, pp. 1–6.
    • [10] A. Kalbat and J. Lavaei, “A fast distributed algorithm for decomposable semidefinite programs,” in 2015 54th IEEE Conference on Decision and Control (CDC).   IEEE, 2015, pp. 1742–1749.
    • [11] R. Madani, A. Kalbat, and J. Lavaei, “A low-complexity parallelizable numerical algorithm for sparse semidefinite programming,” IEEE Transactions on Control of Network Systems, vol. 5, no. 4, pp. 1898–1909, 2017.
    • [12] H. H. Bauschke and P. L. Combettes, Convex analysis and monotone operator theory in Hilbert spaces.   Springer, 2011, vol. 408.
    • [13] Y. Sun, M. S. Andersen, and L. Vandenberghe, “Decomposition in conic optimization with partially separable structure,” SIAM Journal on Optimization, vol. 24, no. 2, pp. 873–897, 2014.
    • [14] M. V. Nayakkankuppam, “Solving large-scale semidefinite programs in parallel,” Mathematical programming, vol. 109, no. 2-3, pp. 477–504, 2007.
    • [15] S. K. Pakazad, A. Hansson, M. S. Andersen, and A. Rantzer, “Distributed semidefinite programming with application to large-scale system analysis,” IEEE Transactions on Automatic Control, vol. 63, no. 4, pp. 1045–1058, 2017.
    • [16] M. Fukuda, M. Kojima, K. Murota, and K. Nakata, “Exploiting sparsity in semidefinite programming via matrix completion i: General framework,” SIAM Journal on Optimization, vol. 11, no. 3, pp. 647–674, 2001.
    • [17] S. Gros, “A Newton algorithm for distributed semi-definite programs using the primal-dual interior-point method,” in 53rd IEEE Conference on Decision and Control.   IEEE, 2014, pp. 3222–3227.
    • [18] P. Yi and L. Pavel, “An operator splitting approach for distributed generalized Nash equilibria computation,” Automatica, vol. 102, pp. 111–121, 2019.
    • [19] J. R. S. Blair and B. Peyton, “An introduction to chordal graphs and clique trees,” in Graph theory and sparse matrix computation.   Springer, 1993, pp. 1–29.
    • [20] M. Mesbahi and M. Egerstedt, Graph theoretic methods in multiagent networks.   Princeton University Press, 2010, vol. 33.
    • [21] R. Grone, C. R. Johnson, E. M. Sá, and H. Wolkowicz, “Positive definite completions of partial Hermitian matrices,” Linear algebra and its applications, vol. 58, pp. 109–124, 1984.
    • [22] L. Vandenberghe and M. S. Andersen, “Chordal graphs and semidefinite optimization,” Foundations and Trends® in Optimization, vol. 1, no. 4, pp. 241–433, 2015.
    • [23] S. Kim, M. Kojima, M. Mevissen, and M. Yamashita, “Exploiting sparsity in linear and nonlinear matrix inequalities via positive semidefinite matrix completion,” Mathematical programming, vol. 129, no. 1, pp. 33–68, 2011.
    • [24] G. Belgioioso and S. Grammatico, “Projected-gradient algorithms for generalized equilibrium seeking in aggregative games are preconditioned forward-backward methods,” in 2018 European Control Conference (ECC).   IEEE, 2018, pp. 2188–2193.
    • [25] Y. Zheng, G. Fantuzzi, A. Papachristodoulou, P. Goulart, and A. Wynn, “Chordal decomposition in operator-splitting methods for sparse semidefinite programs,” Mathematical Programming, pp. 1–44, 2019.