I Introduction
Semidefinite programs (SDPs), i.e., convex optimization problems over the cone of psd matrices, find applications in many engineering areas, since semidefinite constraints and linear matrix inequalities characterize problems related to sensor networks [1, 2], smart grids [3], robustness analysis of interconnected, uncertain systems [4], machine learning and operations research [5]. Additionally, SDPs offer a convex relaxation framework that is applicable to several nonconvex optimization problems, e.g., optimal power flow [6, 7]. Despite these appealing features, the computational complexity of SDPs represents an important issue, since they scale badly with both the dimension of the semidefinite constraints and the number of optimization variables. This dramatically limits, or even precludes [4, 8], the possibility of solving large-scale problems in a centralized fashion. Moreover, certain applications impose intrinsic privacy limitations that rule out centralized computation. It is therefore fundamental to develop scalable and parallelizable algorithms.
Existing works that propose distributed approaches rely on a specific sparsity structure of the SDP. Specifically, when the graph associated with the aggregate sparsity pattern of the data matrices enjoys the chordality property, the original SDP can be decomposed into a set of mutually coupled SDPs defined on cones of smaller dimensions. However, the available distributed, first-order algorithms are either ad hoc for the specific problem considered, e.g., [3, 9, 2, 1], or are variants of the alternating direction method of multipliers (ADMM) [10, 11], a special instance of the Douglas-Rachford operator splitting [12, §25.2]. Along the same line, an operator-splitting method has been proposed in [13], whose convergence requires solving a quadratic program at every iteration. The authors in [14] introduced a proximal bundle method encompassing a message-passing strategy, which in turn has been recently exploited in [15], together with an interior-point method, to overcome the typical issues of second-order methods shown, e.g., in [16, 17].
Differently from the aforementioned literature, in this paper we provide a unifying operator-theoretic perspective on the design of scalable and distributed algorithms for decomposable, large-scale SDPs, paving the way for a systematic development of efficient methods that rely on monotone operator theory as a general-purpose tool. Specifically, we discuss a semi-decentralized and a fully distributed version of a preconditioned forward-backward (pFB) method, recently proposed in [18] to solve generalized Nash equilibrium problems. Unlike the standard FB method, a preconditioning matrix allows one to circumvent intrinsic technical issues, e.g., the impossibility of explicitly computing the resolvent of an operator, by determining the step sizes and the exploration direction of the forward iteration. We exploit the chordal structure of the original problem to obtain a multi-agent SDP with coupling constraints. In this context, the proposed algorithms guarantee global convergence to an optimal solution by requiring local evaluations of projections onto psd cones of reduced dimensions, which constitute the major computational burden.
After briefly reviewing some notions of graph theory (§II), under the assumption of a chordal sparsity pattern we describe how to decompose an SDP into mutually coupled SDPs (§III). Then, in §IV, we apply an operator-splitting approach for the design of the pFB-based algorithms and, in §V, we compare the two methods with available algorithms from the literature on illustrative examples.
Basic notation
$\mathbb{R}^n$ denotes the set of $n$-dimensional real vectors, $\mathbb{S}^n$ is the space of $n \times n$ symmetric matrices and $\mathbb{S}^n_{\succeq 0}$ is the cone of psd matrices. $\mathbf{1}$ ($\mathbf{0}$) denotes a matrix/vector with all elements equal to $1$ ($0$). For vectors $x$ and $y$, we denote $\operatorname{col}(x, y) := [x^\top, y^\top]^\top$. The symbol $\langle \cdot, \cdot \rangle$ denotes the inner product in the appropriate space, i.e., $\langle x, y \rangle := x^\top y$ for $x, y \in \mathbb{R}^n$, and $\langle X, Y \rangle := \operatorname{tr}(XY)$ for $X, Y \in \mathbb{S}^n$. Given a matrix $A$, its $(i, j)$ entry is denoted by $[A]_{ij}$, while $A^\top$ denotes its transpose. $A \otimes B$ represents the Kronecker product between the matrices $A$ and $B$. Given a discrete set $\mathcal{S}$, $|\mathcal{S}|$ denotes its cardinality. The operator $\operatorname{vec}(\cdot)$ maps a matrix to a vector that stacks its columns; $\operatorname{mat}(\cdot) := \operatorname{vec}^{-1}(\cdot)$ performs the inverse.

Operator-theoretic definitions
$\operatorname{Id}$ denotes the identity operator. The mapping $\iota_S$ represents the indicator function for a set $S$, i.e., $\iota_S(x) = 0$ if $x \in S$, $\infty$ otherwise. The mapping $\operatorname{proj}_S$ denotes the projection onto the set $S$, w.r.t. the Euclidean norm when the argument is a vector and the Frobenius norm when it is a matrix. The set-valued mapping $N_S$ denotes the normal cone operator for the set $S$, i.e., $N_S(x) = \{v \mid \sup_{z \in S} \langle v, z - x \rangle \leq 0\}$ if $x \in S$, $\emptyset$ otherwise. $\operatorname{zer}(A) := \{x \mid 0 \in A(x)\}$ denotes the set of zeros of a set-valued mapping $A$.
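For matrices, the projection onto the psd cone w.r.t. the Frobenius norm, used repeatedly in the sequel, admits a classical closed form: eigendecompose and clip the negative eigenvalues to zero. A minimal NumPy sketch (illustrative, not part of the original text):

```python
import numpy as np

def proj_psd(S):
    """Projection of a symmetric matrix onto the psd cone (Frobenius norm):
    eigendecompose and zero out the negative eigenvalues."""
    w, V = np.linalg.eigh(S)
    return (V * np.maximum(w, 0.0)) @ V.T  # V diag(max(w, 0)) V^T

S = np.array([[1.0, 0.0],
              [0.0, -2.0]])
P = proj_psd(S)
# P equals diag(1, 0): the negative eigenvalue is clipped to zero
```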
II Mathematical problem setup
We first recall some notions of graph theory and chordality [19, 20], necessary to introduce the partially separable structure addressed in the remainder. Then, we review a key result to decompose a convex cone, i.e., the cone of psd matrices, into a set of smaller but coupled convex cones.
II-A Graph-theoretic concepts
Let $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ be a graph defined by a set of vertices $\mathcal{V} = \{1, \ldots, n\}$ and a set of edges $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$. The graph is undirected when $(i, j) \in \mathcal{E}$ if and only if $(j, i) \in \mathcal{E}$, while it is complete if every two vertices share exactly one edge. The set of neighboring vertices of $i$ is denoted by $\mathcal{N}_i := \{j \mid (i, j) \in \mathcal{E}\}$. A path of $\mathcal{G}$ is a sequence of distinct vertices such that any consecutive vertices in the sequence correspond to an edge of the graph. Let $A = [a_{ij}]$ be the adjacency matrix of $\mathcal{G}$, with $a_{ij} > 0$ if $(i, j) \in \mathcal{E}$, $a_{ij} = 0$ otherwise, and let us assume $a_{ij} = a_{ji}$. The weighted degree matrix is $D := \operatorname{diag}(d_1, \ldots, d_n)$, $d_i := \sum_j a_{ij}$, and the weighted Laplacian of $\mathcal{G}$ is $L := D - A$.
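These graph matrices can be assembled directly from the adjacency matrix; the sketch below, with a hypothetical 3-vertex path graph and unit weights, builds the weighted degree matrix and the Laplacian and illustrates the property that the Laplacian annihilates the all-ones vector, which is what later enforces consensus:

```python
import numpy as np

def weighted_laplacian(A):
    """Weighted Laplacian L = D - A, with D the weighted degree matrix."""
    assert np.allclose(A, A.T), "adjacency matrix of an undirected graph"
    D = np.diag(A.sum(axis=1))  # weighted degrees d_i = sum_j a_ij
    return D - A

# Illustrative path graph 1-2-3 with (assumed) unit edge weights
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
L = weighted_laplacian(A)
# L is symmetric, positive semidefinite, and satisfies L @ ones = 0
```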
A clique $\mathcal{C} \subseteq \mathcal{V}$ is defined as a set of vertices inducing a complete subgraph, i.e., such that $(i, j) \in \mathcal{E}$ for any distinct vertices $i, j \in \mathcal{C}$, and it is not a subset of any other clique. Given a path $(v_1, \ldots, v_k)$, a chord is an edge $(v_i, v_j)$ with $j > i + 1$, while a cycle is a path with $v_1 = v_k$. An undirected graph is chordal if every cycle of length greater than three has at least one chord. Every nonchordal graph can be extended to a chordal graph by adding a (minimum) number of edges to the original graph (the minimum fill-in problem, which is NP-complete [19]). Typical examples of chordal graphs are complete graphs, trees and forests, and undirected graphs with no cycles of length greater than three.
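To make the chordality notion operational, the following pure-Python sketch (illustrative, not from the paper) tests chordality via the classical maximum cardinality search (MCS): a graph is chordal if and only if the ordering produced by MCS is a perfect elimination ordering [19]:

```python
def is_chordal(n, edges):
    """Chordality test: MCS ordering + perfect-elimination-ordering check."""
    adj = {v: set() for v in range(n)}
    for i, j in edges:
        adj[i].add(j); adj[j].add(i)
    # Maximum cardinality search: repeatedly pick the unnumbered vertex
    # with the most numbered neighbors.
    weight = {v: 0 for v in range(n)}
    order, numbered = [], set()
    for _ in range(n):
        v = max((u for u in range(n) if u not in numbered),
                key=lambda u: weight[u])
        order.append(v); numbered.add(v)
        for u in adj[v] - numbered:
            weight[u] += 1
    order.reverse()  # elimination order: reverse of the MCS visit order
    pos = {v: k for k, v in enumerate(order)}
    # PEO check: the later neighbors of each vertex must form a clique;
    # it suffices that the earliest later neighbor is adjacent to the rest.
    for v in order:
        later = {u for u in adj[v] if pos[u] > pos[v]}
        if later:
            u = min(later, key=pos.get)
            if not (later - {u}) <= adj[u]:
                return False
    return True

# A 4-cycle is not chordal; adding one chord makes it chordal.
# is_chordal(4, [(0,1),(1,2),(2,3),(3,0)]) evaluates to False
# is_chordal(4, [(0,1),(1,2),(2,3),(3,0),(0,2)]) evaluates to True
```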
II-B Separable cones and chordal decomposition
Given a sparse matrix $X \in \mathbb{S}^n$, the associated sparsity pattern can be represented as an undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where each clique $\mathcal{C}_j$ defines a maximally dense principal submatrix of $X$, i.e., $X_j := E_{\mathcal{C}_j} X E_{\mathcal{C}_j}^\top$, by means of an entry selection matrix $E_{\mathcal{C}_j} \in \mathbb{R}^{|\mathcal{C}_j| \times n}$ defined as $[E_{\mathcal{C}_j}]_{ik} := 1$ if $\mathcal{C}_j(i) = k$, $0$ otherwise, where $\mathcal{C}_j(i)$ is the $i$-th element in $\mathcal{C}_j$, sorted in natural ordering. Conversely, $E_{\mathcal{C}_j}^\top X_j E_{\mathcal{C}_j}$ returns a matrix in the original space $\mathbb{S}^n$. Note that, given cliques $\mathcal{C}_j$ and $\mathcal{C}_k$ associated with the same graph, $X_j$ and $X_k$ have overlapping elements if $\mathcal{C}_j \cap \mathcal{C}_k \neq \emptyset$.
By considering $\mathcal{E}$, we define the space of symmetric matrices with sparsity pattern $\mathcal{E}$ as $\mathbb{S}^n(\mathcal{E}) := \{X \in \mathbb{S}^n \mid [X]_{ij} = 0 \text{ if } i \neq j \text{ and } (i, j) \notin \mathcal{E}\}$.
Next, we recall a well-known result that links a matrix belonging to the cone generated by the projection of the psd cone onto the space of matrices with sparsity pattern $\mathcal{E}$, i.e., $\mathbb{S}^n_{\succeq 0}(\mathcal{E}) := \operatorname{proj}_{\mathbb{S}^n(\mathcal{E})}(\mathbb{S}^n_{\succeq 0})$, with its maximal principal submatrices.
Lemma 1 ([21])

Let $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ be a chordal graph with set of cliques $\{\mathcal{C}_1, \ldots, \mathcal{C}_q\}$. Then, $X \in \mathbb{S}^n_{\succeq 0}(\mathcal{E})$ if and only if $E_{\mathcal{C}_j} X E_{\mathcal{C}_j}^\top \in \mathbb{S}^{|\mathcal{C}_j|}_{\succeq 0}$ for all $j \in \{1, \ldots, q\}$.
Informally speaking, a matrix $X$ with sparsity pattern $\mathcal{E}$ belongs to $\mathbb{S}^n_{\succeq 0}(\mathcal{E})$ if and only if its maximal dense principal submatrices $X_j$, $j \in \{1, \ldots, q\}$, are positive semidefinite, i.e., the psd property of $X$ can be checked on matrices of smaller dimensions. To recap the introduced concepts, let us show an example on a simple graph.
Example 1
The sparsity pattern of a given matrix in Fig. 1(a) can be summarized in the graph with seven vertices depicted in Fig. 1(b). The graph happens to be chordal, since every cycle of length strictly greater than three has a chord, e.g., is a chord for the cycle . Moreover, it is characterized by four cliques, namely , , and . For instance, let us consider . Then,
where is the maximally dense principal submatrix of related to . It follows from Lemma 1 that is psd with sparsity pattern if and only if , . Finally, note that , and therefore the matrices and shall be coupled via the elements , and . The same considerations apply to and .
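Since Fig. 1 is not reproduced here, the following NumPy sketch uses a hypothetical matrix and hypothetical overlapping cliques; it builds the entry selection matrices of §II-B and verifies the easy direction of Lemma 1, namely that the maximal dense principal submatrices of a psd matrix are themselves psd:

```python
import numpy as np

def selection_matrix(clique, n):
    """E_C with [E_C]_{ik} = 1 if the i-th (sorted) element of C equals k."""
    C = sorted(clique)
    E = np.zeros((len(C), n))
    for i, k in enumerate(C):
        E[i, k] = 1.0
    return E

rng = np.random.default_rng(0)
n = 5
B = rng.standard_normal((n, n))
X = B @ B.T  # a psd matrix, for illustration

# hypothetical overlapping cliques of a chordal sparsity pattern
cliques = [{0, 1, 2}, {1, 2, 3}, {3, 4}]
for C in cliques:
    E = selection_matrix(C, n)
    X_C = E @ X @ E.T  # principal submatrix indexed by the clique C
    assert np.linalg.eigvalsh(X_C).min() >= -1e-9  # psd submatrix
```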
III Decomposition in sparse SDPs
We consider an SDP in standard primal form [22]:
(1) $\min_{X \in \mathbb{S}^n_{\succeq 0}} \ \langle C, X \rangle \quad \text{s.t.} \quad \langle A_i, X \rangle = b_i, \ \forall i \in \{1, \ldots, m\}$
where $C \in \mathbb{S}^n$, and $A_i \in \mathbb{S}^n$, $b_i \in \mathbb{R}$ for all $i \in \{1, \ldots, m\}$, whose optimal solutions belong to the set
(2) $\mathcal{X}^\star := \operatorname{argmin} \left\{ \langle C, X \rangle \mid X \in \mathbb{S}^n_{\succeq 0}, \ \langle A_i, X \rangle = b_i, \ \forall i \in \{1, \ldots, m\} \right\}$
which is assumed to be nonempty. Moreover, we make the following assumption that characterizes the data in (1).
Standing Assumption 1
The aggregate sparsity pattern of (1) is identified by the graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, i.e., $C \in \mathbb{S}^n(\mathcal{E})$ and $A_i \in \mathbb{S}^n(\mathcal{E})$, for all $i \in \{1, \ldots, m\}$.
Such a sparsity pattern also reflects on the standard dual form of (1), with dual variables $y \in \mathbb{R}^m$ and $Z \in \mathbb{S}^n_{\succeq 0}$, i.e.,
(3) $\max_{y \in \mathbb{R}^m, \, Z \in \mathbb{S}^n_{\succeq 0}} \ b^\top y \quad \text{s.t.} \quad Z + \sum_{i=1}^{m} y_i A_i = C$
where $b := \operatorname{col}(b_1, \ldots, b_m)$. Clearly, in view of the equality constraint in (3), the slack matrix $Z$ is necessarily sparse at any dual feasible point, with sparsity pattern $\mathcal{E}$. On the other hand, the primal variable $X$ is typically dense, but one can observe that both the cost function and the equality constraints in (1) depend only on the entries $[X]_{ij}$ such that $i = j$ or $(i, j) \in \mathcal{E}$. The remaining entries are arbitrary and shall guarantee the psd property of $X$ (a matrix completion problem). Thus, according to the domain-space decomposition in [16], the conic constraint in (1) can be equivalently rewritten as $X \in \mathbb{S}^n_{\succeq 0}(\mathcal{E})$. We then assume the following.
Standing Assumption 2
The graph $\mathcal{G}$ is chordal, undirected and connected.
Thus, in view of Lemma 1, we follow the so-called clique-tree conversion introduced in [16, 23] to decompose the original SDP in (1). Specifically, let $\mathcal{C}_j$, $j \in \mathcal{Q} := \{1, \ldots, q\}$, be the cliques of $\mathcal{G}$, and let $\mathcal{Q}_i \subseteq \mathcal{Q}$ be the set containing the indices of all the cliques involved in the $i$-th equality constraint, $i \in \{1, \ldots, m\}$. The SDP in (1) translates into
(4) $\begin{aligned} \min_{X_1, \ldots, X_q} \ & \textstyle\sum_{j \in \mathcal{Q}} \langle C_j, X_j \rangle \\ \text{s.t.} \ & \textstyle\sum_{j \in \mathcal{Q}_i} \langle A_{i,j}, X_j \rangle = b_i, \ \forall i \in \{1, \ldots, m\}, \\ & E_{\mathcal{C}_j \cap \mathcal{C}_k} \left( E_{\mathcal{C}_j}^\top X_j E_{\mathcal{C}_j} - E_{\mathcal{C}_k}^\top X_k E_{\mathcal{C}_k} \right) E_{\mathcal{C}_j \cap \mathcal{C}_k}^\top = 0, \ \forall j, k : \mathcal{C}_j \cap \mathcal{C}_k \neq \emptyset, \\ & X_j \in \mathbb{S}^{|\mathcal{C}_j|}_{\succeq 0}, \ \forall j \in \mathcal{Q} \end{aligned}$
where $X_j := E_{\mathcal{C}_j} X E_{\mathcal{C}_j}^\top$, while the coefficient matrices $C_j$ and $A_{i,j}$ are chosen so that $\sum_{j \in \mathcal{Q}} E_{\mathcal{C}_j}^\top C_j E_{\mathcal{C}_j} = C$ and $\sum_{j \in \mathcal{Q}_i} E_{\mathcal{C}_j}^\top A_{i,j} E_{\mathcal{C}_j} = A_i$ for all $i \in \{1, \ldots, m\}$. Remarkably, their choice is not unique. Some examples of how to choose such matrices are in [13], while for the remainder we postulate the following assumption.
Standing Assumption 3
For all $j \in \mathcal{Q}$, $A_{1,j}, \ldots, A_{m,j}$ are linearly independent matrices.
The SDP in (4) is fully decomposed in terms of the cost function, and it is defined on psd conic constraints of smaller dimensions. However, it exhibits couplings among the dense submatrices in the two sets of equality constraints. Specifically, while the first set is directly inherited from (1), the second set contains the consistency constraints, which ensure that the entries of the dense submatrices $X_j$ and $X_k$ agree whenever they share entries of the original $X$ in (1). Note that all these constraints are affine in the variables $X_j$, since they involve matrix multiplications only. Thus, the decomposed structure in (4), which is quite general for a sparse SDP under Standing Assumption 2, enables and motivates us to develop semi-decentralized/distributed algorithms that compute a solution by treating each clique as a single decision agent that controls its local variable $X_j$.
III-A Vector notation for the decomposed SDP
For our purposes, i.e., to formalize our multi-agent optimization problem and, consequently, to develop solution algorithms, it is convenient to adopt a vector notation for the matrix variables in (4). Specifically, let us now define $x_j := \operatorname{vec}(X_j)$ and $c_j := \operatorname{vec}(C_j)$, for all $j \in \mathcal{Q}$, and $a_{i,j} := \operatorname{vec}(A_{i,j})$ for all $i \in \{1, \ldots, m\}$ and all $j \in \mathcal{Q}$. The SDP in (4) can then be recast as a multi-agent SDP in vector form as follows:
(5) 
where, for all $j \in \mathcal{Q}$, $A_j := [a_{1,j}, \ldots, a_{m,j}]^\top$, which is of full-row rank in view of Standing Assumption 3. Moreover, the remaining data and the consistency matrices are obtained by stacking the corresponding vectorized quantities. The local constraint set is defined as the vectorized psd cone, so that $x_j$ belongs to it if and only if $X_j \in \mathbb{S}^{|\mathcal{C}_j|}_{\succeq 0}$.
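The vectorization used throughout can be sketched in NumPy (column-stacking order, so that the trace inner product becomes the Euclidean one); this is an illustrative sketch, not the paper's implementation:

```python
import numpy as np

def vec(X):
    """Stack the columns of X into a single vector."""
    return X.reshape(-1, order="F")

def mat(x):
    """Inverse of vec for square matrices."""
    n = int(round(np.sqrt(x.size)))
    return x.reshape(n, n, order="F")

X = np.array([[1., 2.], [2., 5.]])
Y = np.array([[0., 1.], [1., 3.]])
# <X, Y> = tr(XY) equals the Euclidean inner product of vec(X) and vec(Y)
assert np.isclose(np.trace(X @ Y), vec(X) @ vec(Y))
assert np.allclose(mat(vec(X)), X)
```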
IV Preconditioned forward-backward operator splitting
In this section, we look at the multi-agent SDP in (4) or, equivalently, in (5), from an operator-splitting perspective. Leveraging this view, we then provide two iterative algorithms based on a preconditioned variant of the forward-backward (FB) operator splitting [12, §25.3].
The Lagrangian function associated with (5) reads as follows
where , , and are the Lagrange multipliers associated with the coupling constraints. Moreover, let us define , , and , . Then, for each agent , we define the consistency constraint matrix by stacking and rearranging all and , together with some (possibly rectangular) matrices of zeros of suitable dimensions. Thus, to compute a solution to the SDP in (5), the KKT stationarity conditions require that
(6) 
where . To cast the inclusion problem (6) in a compact form, we first introduce the gradient mapping, which is constant in this case, as . Then, we define the set-valued mapping , as follows:
(7) 
Thus, solving (6) is equivalent to finding such that , i.e., finding a zero of in (7), which can be written as the sum of the following two operators:
(8)  
(9) 
We note that the set of zeros of corresponds to the set of zeros of the operator , where is a “preconditioning” matrix to be designed freely. In simple words, the FB operator splitting states that the zeros of , or, equivalently, , are the fixed points of the operator . Consequently, by introducing , the FB algorithm corresponds to the fixed-point iteration [12, (1.67)]
(10) 
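To make the fixed-point iteration (10) concrete in the simplest possible setting (identity preconditioning, a smooth quadratic cost and a box constraint, all hypothetical toy data), the FB update reduces to a projected-gradient step, since the resolvent of a normal cone operator is a projection:

```python
import numpy as np

# Forward-backward on 0 in grad_f(x) + N_C(x):
# forward step x - gamma * grad_f(x), backward step resolvent of N_C = proj_C.
def fb_projected_gradient(grad, proj, x0, gamma, iters=200):
    x = x0
    for _ in range(iters):
        x = proj(x - gamma * grad(x))  # fixed-point iteration as in (10)
    return x

# toy problem: min 0.5*||x - p||^2  s.t.  x in the box [0, 1]^2
p = np.array([2.0, -0.5])
grad = lambda x: x - p                 # gradient, Lipschitz constant L = 1
proj = lambda x: np.clip(x, 0.0, 1.0)  # projection onto the box
x_star = fb_projected_gradient(grad, proj, np.zeros(2), gamma=0.5)
# converges to the projection of p onto the box, i.e., (1, 0)
```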
IV-A Semi-decentralized forward-backward algorithm
From (10), we obtain the following chain of inclusions:
and by explicitly including the operators in (8)–(9) we have
(11) 
Thus, by following the guidelines provided in [24, §IV], it is convenient to design the preconditioning matrix as
(12) 
where the entries on the main diagonal are chosen large enough to guarantee positive definiteness of the preconditioning matrix. Essentially, the off-diagonal terms of the matrix in (12) allow one to cancel the skew-symmetric matrices in (11), while the step sizes constitute the main diagonal. The steps in Algorithm 1 summarize the proposed semi-decentralized pFB. Remarkably, such a procedure is suitable for a parallel implementation. Specifically, each agent first updates its decision variables by means of a projection onto a local psd cone of reduced dimension, i.e.,
where the projection onto the local psd cone can be efficiently computed via an eigenvalue decomposition. Then, the agents communicate their decisions to a central coordinator, which is in charge of updating both the Lagrange multipliers associated with the consistency constraints and the dual variable associated with the original coupling constraints. Note that the central coordinator performs almost inexpensive computations. For privacy reasons, we next investigate how to avoid the direct involvement of a central coordinator, by proposing a fully distributed pFB algorithm to solve the SDP in (5).

IV-B Distributed forward-backward algorithm
As in [18], to design a distributed algorithm based on the pFB splitting, we endow each agent with a local copy of the dual variables. Moreover, we augment the state of each agent with two local auxiliary variables, with the aim of enforcing local consensus on the copies of the multipliers.
Let us now preliminarily introduce some useful quantities, such as , , , , and . Here, is the weighted Laplacian matrix associated with . Furthermore, , , while and are defined similarly. Then, by augmenting the state and by adding the consensus constraints on the dual variables, the operators in (8)–(9) extend as follows:
(13)  
(14) 
Thus, by defining , we mimic the steps of the semi-decentralized case to design a suitable preconditioning matrix so that the mapping can be iterated. Also in this case, we have that if and only if . Specifically, by resorting to the inclusion , after substituting the operators and in (13)–(14), the matrix can be conveniently chosen as
(15) 
Also in this case, the entries on the main diagonal are chosen large enough to obtain a positive definite preconditioning matrix. Here, the off-diagonal terms of the matrix in (15) cancel the skew-symmetric terms in (14) and the step sizes are located on the main diagonal. The distributed pFB procedure is summarized in Algorithm 2, where the steps to be performed at every iteration by each single agent are emphasized. Specifically, the algorithm alternates communication and computation tasks. Once the agent receives the local copies of the dual variables from its neighbors, it updates the augmented state, i.e., its decision variable together with the two auxiliary variables (S1). Then, the agents transmit the updated version of the augmented state to their neighbors to compute a local update of the copies of the multipliers (S2). Finally, we emphasize that each computation relies on local or communicated quantities only, and does not require a central coordinator.
V Numerical simulations
We first compare the performance of the presented algorithms on randomly generated SDPs. The simulations are run in MATLAB on a laptop with an Intel Core i5 2.9 GHz CPU and 8 GB of RAM. Specifically, we focus on an SDP characterized by the banded sparsity pattern illustrated in Fig. 2. Here, each agent controls a diagonal block, while sharing a subset of variables with the adjacent agent (a parameter specifies the number of overlapping elements). For simplicity, we assume blocks of equal dimension for all agents, and therefore the associated psd cones all have the same dimension. As in [25], we first generate random symmetric matrices with banded sparsity and entries drawn uniformly from a bounded interval. Then, we generate a strictly feasible matrix by adding to a random banded symmetric matrix a multiple of the identity large enough to guarantee positive definiteness. Finally, the vector of equality constraints is computed by evaluating the constraint functions at this feasible matrix.
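The data-generation procedure just described can be sketched as follows; the dimensions, the bandwidth and the interval $[-1, 1]$ are hypothetical placeholders for the elided values:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, band = 8, 3, 2  # hypothetical cone dimension, #constraints, bandwidth

def random_banded_symmetric(n, band):
    """Symmetric matrix with banded sparsity, entries uniform in [-1, 1]."""
    M = rng.uniform(-1.0, 1.0, size=(n, n))
    M = 0.5 * (M + M.T)
    mask = np.abs(np.subtract.outer(np.arange(n), np.arange(n))) <= band
    return M * mask

C = random_banded_symmetric(n, band)
A = [random_banded_symmetric(n, band) for _ in range(m)]

# strictly feasible X0 = M0 + alpha*I, with alpha large enough for X0 > 0
M0 = random_banded_symmetric(n, band)
alpha = 1.0 + abs(np.linalg.eigvalsh(M0).min())
X0 = M0 + alpha * np.eye(n)

# equality constraint right-hand sides b_i = <A_i, X0>
b = np.array([np.trace(Ai @ X0) for Ai in A])
```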
Considering one problem instance, Figs. 3–4 show the convergence of both the coupling and the consistency constraints when applying Algorithms 1 and 2, while Fig. 5 illustrates the average CPU time per iteration as a function of the problem parameters. The (averaged) CPU time to compute the projection onto the local psd cones is also reported. For each plot, only one parameter at a time changes, while the others are fixed. While the two algorithms exhibit the same behavior in terms of convergence, Algorithm 1 takes less time on average to perform each iteration.
Finally, Figs. 6–9 compare the behavior of the distributed pFB in Algorithm 2 and the ADMM algorithms proposed in [10] and [2] on two different problem instances. The ADMM algorithms converge in fewer iterations (Figs. 6 and 8), at the price of a much higher computational cost per iteration, as shown in Figs. 7 and 9. Specifically, since the iterations of the considered algorithms shall be performed in parallel, the value reported at each iteration in Figs. 7 and 9 is the maximum, over the agents, of the computational time taken to perform the steps of a certain algorithm at that iteration. However, we emphasize that the distributed ADMM in [10] is less general than the pFB in Algorithm 2, since it is tailored to separable SDPs with local equality constraints only (which justifies its use in the first numerical simulation only), while the algorithm in [2] requires solving an SDP for each agent at every iteration.
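The per-iteration metric just described, i.e., the maximum over the agents of the local computation times, can be computed as in this small sketch (the timings are hypothetical):

```python
import numpy as np

# T[j, k]: hypothetical time taken by agent j at iteration k to complete its
# local updates; since agents run in parallel, the per-iteration cost of the
# algorithm is the slowest agent's time.
T = np.array([[0.10, 0.12, 0.09],
              [0.08, 0.15, 0.11]])
per_iteration_time = T.max(axis=0)  # max over agents, for each iteration
# per_iteration_time equals [0.10, 0.15, 0.11]
```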
VI Conclusion and outlook
Monotone operator splitting theory promises to be a general-purpose tool for designing scalable and distributed algorithms for decomposable, large-scale SDPs. The operator-theoretic perspective given in this paper paves the way for a systematic development of efficient methods with reasonable computational burden and convergence guarantees. Accelerated strategies, as well as additional operator-splitting methods, are left as future work.
References
 [1] P. Biswas, T.-C. Lian, T.-C. Wang, and Y. Ye, “Semidefinite programming based algorithms for sensor network localization,” ACM Transactions on Sensor Networks, vol. 2, no. 2, pp. 188–220, 2006.
 [2] A. Simonetto and G. Leus, “Distributed maximum likelihood sensor network localization,” IEEE Transactions on Signal Processing, vol. 62, no. 6, pp. 1424–1437, 2014.
 [3] E. Dall’Anese, H. Zhu, and G. B. Giannakis, “Distributed optimal power flow for smart microgrids,” IEEE Transactions on Smart Grid, vol. 4, no. 3, pp. 1464–1475, 2013.
 [4] M. S. Andersen, S. K. Pakazad, A. Hansson, and A. Rantzer, “Robust stability analysis of sparsely interconnected uncertain systems,” IEEE Transactions on Automatic Control, vol. 59, no. 8, pp. 2151–2156, 2014.
 [5] S. Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan, Linear matrix inequalities in system and control theory. Siam, 1994, vol. 15.
 [6] R. Madani, S. Sojoudi, and J. Lavaei, “Convex relaxation for optimal power flow problem: Mesh networks,” IEEE Transactions on Power Systems, vol. 30, no. 1, pp. 199–211, 2014.
 [7] S. Magnússon, P. C. Weeraddana, and C. Fischione, “A distributed approach for the optimal powerflow problem based on ADMM and sequential convex approximations,” IEEE Transactions on Control of Network Systems, vol. 2, no. 3, pp. 238–253, 2015.
 [8] S. Kim, M. Kojima, and H. Waki, “Exploiting sparsity in SDP relaxation for sensor network localization,” SIAM Journal on Optimization, vol. 20, no. 1, pp. 192–215, 2009.
 [9] Y. Weng, Q. Li, R. Negi, and M. Ilić, “Distributed algorithm for SDP state estimation,” in 2013 IEEE PES Innovative Smart Grid Technologies Conference. IEEE, 2013, pp. 1–6.
 [10] A. Kalbat and J. Lavaei, “A fast distributed algorithm for decomposable semidefinite programs,” in 2015 54th IEEE Conference on Decision and Control (CDC). IEEE, 2015, pp. 1742–1749.
 [11] R. Madani, A. Kalbat, and J. Lavaei, “A low-complexity parallelizable numerical algorithm for sparse semidefinite programming,” IEEE Transactions on Control of Network Systems, vol. 5, no. 4, pp. 1898–1909, 2017.
 [12] H. H. Bauschke and P. L. Combettes, Convex analysis and monotone operator theory in Hilbert spaces. Springer, 2011, vol. 408.
 [13] Y. Sun, M. S. Andersen, and L. Vandenberghe, “Decomposition in conic optimization with partially separable structure,” SIAM Journal on Optimization, vol. 24, no. 2, pp. 873–897, 2014.
 [14] M. V. Nayakkankuppam, “Solving large-scale semidefinite programs in parallel,” Mathematical Programming, vol. 109, no. 2-3, pp. 477–504, 2007.
 [15] S. K. Pakazad, A. Hansson, M. S. Andersen, and A. Rantzer, “Distributed semidefinite programming with application to largescale system analysis,” IEEE Transactions on Automatic Control, vol. 63, no. 4, pp. 1045–1058, 2017.
 [16] M. Fukuda, M. Kojima, K. Murota, and K. Nakata, “Exploiting sparsity in semidefinite programming via matrix completion I: General framework,” SIAM Journal on Optimization, vol. 11, no. 3, pp. 647–674, 2001.
 [17] S. Gros, “A Newton algorithm for distributed semidefinite programs using the primaldual interiorpoint method,” in 53rd IEEE Conference on Decision and Control. IEEE, 2014, pp. 3222–3227.
 [18] P. Yi and L. Pavel, “An operator splitting approach for distributed generalized Nash equilibria computation,” Automatica, vol. 102, pp. 111–121, 2019.
 [19] J. R. S. Blair and B. Peyton, “An introduction to chordal graphs and clique trees,” in Graph theory and sparse matrix computation. Springer, 1993, pp. 1–29.
 [20] M. Mesbahi and M. Egerstedt, Graph theoretic methods in multiagent networks. Princeton University Press, 2010, vol. 33.
 [21] R. Grone, C. R. Johnson, E. M. Sá, and H. Wolkowicz, “Positive definite completions of partial Hermitian matrices,” Linear algebra and its applications, vol. 58, pp. 109–124, 1984.
 [22] L. Vandenberghe and M. S. Andersen, “Chordal graphs and semidefinite optimization,” Foundations and Trends® in Optimization, vol. 1, no. 4, pp. 241–433, 2015.
 [23] S. Kim, M. Kojima, M. Mevissen, and M. Yamashita, “Exploiting sparsity in linear and nonlinear matrix inequalities via positive semidefinite matrix completion,” Mathematical Programming, vol. 129, no. 1, pp. 33–68, 2011.
 [24] G. Belgioioso and S. Grammatico, “Projected-gradient algorithms for generalized equilibrium seeking in aggregative games are preconditioned forward-backward methods,” in 2018 European Control Conference (ECC). IEEE, 2018, pp. 2188–2193.
 [25] Y. Zheng, G. Fantuzzi, A. Papachristodoulou, P. Goulart, and A. Wynn, “Chordal decomposition in operator-splitting methods for sparse semidefinite programs,” Mathematical Programming, pp. 1–44, 2019.