1 Introduction
We consider solving semidefinite problems of the following form:
$$
\text{(P)} \qquad
\begin{array}{ll}
\underset{X \in \mathbb{S}^n}{\text{minimize}} & \langle C, X \rangle \\
\text{subject to} & \mathcal{A}X = b, \quad X \succeq 0,
\end{array}
$$
where $C \in \mathbb{S}^n$, $\mathcal{A} \colon \mathbb{S}^n \to \mathbb{R}^m$ is linear, and $b \in \mathbb{R}^m$. Denote the solution set as $\mathcal{X}^\star$. To accomplish the task of solving (P), we consider the dual problem:
$$
\text{(D)} \qquad
\begin{array}{ll}
\underset{y \in \mathbb{R}^m}{\text{maximize}} & b^\top y \\
\text{subject to} & C - \mathcal{A}^* y \succeq 0,
\end{array}
$$
whose solution set is denoted as $\mathcal{Y}^\star$. Then for all sufficiently large $\alpha > 0$, e.g., $\alpha$ larger than the trace of any solution (ding2019optimal, Lemma 6.1), we can reformulate this as
$$
\min_{y \in \mathbb{R}^m} \; f(y) := \alpha \max\{\lambda_{\max}(\mathcal{A}^* y - C),\; 0\} - b^\top y. \tag{1}
$$
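To make the reformulation concrete, the following is a minimal numpy sketch of evaluating (1). The function name and the dense representation of $\mathcal{A}$ as a list of symmetric matrices `A_list` are our illustrative choices, not the paper's.

```python
import numpy as np

def penalized_dual_objective(y, C, A_list, b, alpha):
    """Evaluate f(y) = alpha * max(lambda_max(A^* y - C), 0) - b^T y, as in (1).

    A_list holds symmetric matrices A_i with (A X)_i = <A_i, X>,
    so the adjoint is A^* y = sum_i y_i * A_i.
    """
    S = sum(yi * Ai for yi, Ai in zip(y, A_list)) - C  # A^* y - C
    lam_max = np.linalg.eigvalsh(S)[-1]                # eigenvalues in ascending order
    return alpha * max(lam_max, 0.0) - b @ y
```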
We propose applying the bundle method to solve this problem, which generates a sequence of dual solutions $y_t$. While the bundle method runs on the dual problem, a primal solution can be constructed through a series of rank-one updates corresponding to subgradients of $f$. However, maintaining such a primal solution greatly increases memory costs. Fortunately, the primal problem enjoys a low-rank solution in many applications, e.g., matrix completion srebro2005rank and phase retrieval candes2013phaselift. Also, without specifying the detailed structure of the problem, there always exists a solution to (P) with rank $r$ satisfying $r(r+1)/2 \le m$ pataki1998rank.
To utilize the existence of such a low-rank solution, we employ the matrix sketching methods introduced in tropp2017practical. The main idea is the following: the sketching method forms a linear sketch of the column and row spaces of the primal decision variable $\bar X_t$, and then uses the sketched column and row spaces to recover the primal decision variable. The recovered decision variable approximates the original well if $\bar X_t$ is (approximately) low rank. Notably, we need not store the entire decision variable at each iteration, but only the sketch. Hence the memory requirements of the algorithm are substantially reduced.
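The memory savings are easy to see in a small numpy illustration; all sizes below are arbitrary choices of ours, not the paper's. Storing the two sketches costs $2kn$ floats rather than the $n^2$ needed for the full matrix.

```python
import numpy as np

n, r, k = 1000, 5, 10                 # illustrative sizes only
G = np.random.randn(n, r)
X = G @ G.T                           # a rank-r PSD matrix: n^2 floats to store
Omega = np.random.randn(n, k)         # right test matrix
Psi = np.random.randn(k, n)           # left test matrix
Y, W = X @ Omega, Psi @ X             # column- and row-space sketches
print(Y.size + W.size, "vs", X.size)  # 20000 floats vs 1000000
```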
Our Contributions.
Our proposed sketching bundle method produces a sequence of dual solutions $y_t$ and a sequence of primal solutions $\bar X_t$, which are sketched by low-rank matrices $\hat X_t$. This is done without ever needing to write down the full solutions $\bar X_t$, which can be a substantial boon to computational efficiency.
In particular, we consider problems satisfying the following pair of standard assumptions: (i) strong duality holds, meaning that there is a solution pair $(X^\star, y^\star) \in \mathcal{X}^\star \times \mathcal{Y}^\star$ satisfying
$$\langle C, X^\star \rangle = b^\top y^\star,$$
and (ii) strict complementarity holds, meaning there is a solution pair $(X^\star, y^\star) \in \mathcal{X}^\star \times \mathcal{Y}^\star$ satisfying
$$\operatorname{rank}(X^\star) + \operatorname{rank}(C - \mathcal{A}^* y^\star) = n.$$
Under these conditions, we show $y_t$ and $\bar X_t$ converge in terms of primal and dual feasibility and optimality. In particular, all three of these quantities converge to zero at a common rate, made precise below.
Theorem 1.1 (Primal-Dual Convergence).
Suppose the sets $\mathcal{X}^\star, \mathcal{Y}^\star$ are both compact, and that strong duality and a strict complementarity condition hold. For any $\varepsilon > 0$, Algorithm 1 with properly chosen parameters produces a solution pair $\bar X_T$ and $y_T$, within a number of iterations depending on $\varepsilon$, that satisfies
approximate primal feasibility: $\|\mathcal{A}\bar X_T - b\| \le \varepsilon$,
approximate dual feasibility: $\lambda_{\max}(\mathcal{A}^* y_T - C) \le \varepsilon$,
approximate primal-dual optimality: $|\langle C, \bar X_T \rangle - b^\top y_T| \le \varepsilon$.
Moreover, we show that, assuming all of the minimizers in $\mathcal{X}^\star$ are low rank, the sketched primal solutions $\hat X_T$ converge to the set of minimizers at the following rate.
Theorem 1.2 (Sketched Solution Convergence).
Suppose the sets $\mathcal{X}^\star, \mathcal{Y}^\star$ are both compact, strong duality and a strict complementarity condition hold, and all solutions $X^\star \in \mathcal{X}^\star$ have rank at most $r$. For any $\varepsilon > 0$, Algorithm 1 with properly chosen parameters produces a sketched primal solution $\hat X_T$, within a number of iterations depending on $\varepsilon$, that satisfies
$$\operatorname{dist}\big(\hat X_T, \mathcal{X}^\star\big) \le \varepsilon.$$
2 Defining The Sketching Bundle Method
Our proposed proximal bundle method relies on an approximation of $f$ given by
$$\hat f_t(y) = \max\big\{\, \bar a_t + \langle \bar s_t, y \rangle,\;\; f(y_t) + \langle g_t, y - y_t \rangle \,\big\},$$
where the affine function $\bar a_t + \langle \bar s_t, y \rangle$ is a convex combination of the lower bounds from previous subgradients. Notice that a subgradient $g_t$ of $f$ at $y_t$ can be computed as
$$g_t = \alpha\, \mathcal{A}(v_t v_t^\top) - b \quad \text{if } \lambda_{\max}(\mathcal{A}^* y_t - C) > 0, \tag{2}$$
$$g_t = -b \quad \text{otherwise}, \tag{3}$$
where $v_t$ is a unit eigenvector associated with $\lambda_{\max}(\mathcal{A}^* y_t - C)$.
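A direct numpy sketch of (2)-(3), under the same dense representation of $\mathcal{A}$ assumed above; a large-scale implementation would instead use an iterative eigensolver for just the top eigenpair.

```python
import numpy as np

def dual_subgradient(y, C, A_list, b, alpha):
    """Return (g, v): a subgradient of f at y following (2)-(3), and the top
    unit eigenvector v of A^* y - C (None when the penalty term is inactive)."""
    S = sum(yi * Ai for yi, Ai in zip(y, A_list)) - C
    lam, V = np.linalg.eigh(S)        # eigenvalues in ascending order
    if lam[-1] > 0:
        v = V[:, -1]
        A_vvT = np.array([v @ (Ai @ v) for Ai in A_list])  # A(v v^T)
        return alpha * A_vvT - b, v
    return -np.asarray(b, dtype=float), None
```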
Each iteration then computes a proximal step on this piecewise linear function, given by
$$z_{t+1} = \operatorname*{argmin}_{y \in \mathbb{R}^m}\; \hat f_t(y) + \frac{\rho}{2}\|y - x_t\|^2 \tag{4}$$
for some $\rho > 0$. The optimality condition of this subproblem ensures that $\rho(x_t - z_{t+1}) = \bar s_{t+1} \in \partial \hat f_t(z_{t+1})$ for some convex combination weight $\theta_t \in [0,1]$. Our aggregate lower bound is updated to match this certifying subgradient. Then the following is an exact solution for the subproblem (4):
$$\bar s_{t+1} = \theta_t \bar s_t + (1 - \theta_t)\, g_t, \tag{5}$$
$$z_{t+1} = x_t - \frac{1}{\rho}\, \bar s_{t+1}. \tag{6}$$
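Since the model is the maximum of only two affine functions, the subproblem (4) reduces to clipping the stationary point of a scalar concave quadratic to $[0,1]$, which yields (5)-(6) in closed form. The following numpy sketch (names and argument conventions are ours) makes this explicit.

```python
import numpy as np

def two_cut_prox(x, rho, s_bar, a_bar, g, c):
    """Solve (4) when f_hat(y) = max(a_bar + s_bar @ y, c + g @ y).

    By Lagrangian duality the minimizer is z = x - s_next / rho with
    s_next = theta * s_bar + (1 - theta) * g; theta in [0, 1] maximizes a
    concave quadratic, so we clip its unconstrained stationary point.
    """
    A = a_bar + s_bar @ x            # aggregate cut evaluated at x
    B = c + g @ x                    # newest cut evaluated at x
    d = s_bar - g
    nd = d @ d
    theta = 0.0 if nd == 0.0 else float(np.clip((rho * (A - B) - g @ d) / nd, 0.0, 1.0))
    s_next = theta * s_bar + (1.0 - theta) * g   # certifying subgradient, eq. (5)
    z = x - s_next / rho                         # proximal step, eq. (6)
    return z, s_next, theta
```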
If the decrease in value of $f$ from $x_t$ to $z_{t+1}$ is at least a $\beta \in (0,1)$ fraction of the decrease in value of the model $\hat f_t$ from $x_t$ to $z_{t+1}$, then the bundle method sets $x_{t+1} = z_{t+1}$ (called a descent step). Otherwise the method sets $x_{t+1} = x_t$ (called a null step).
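In code, the descent test is a one-liner; the threshold `beta` is the method's fixed fraction, with any value in $(0,1)$ admissible.

```python
def is_descent_step(f_x, f_z, model_z, beta):
    """Accept z when the actual decrease f(x) - f(z) is at least a beta
    fraction of the model decrease f(x) - f_hat(z)."""
    return f_x - f_z >= beta * (f_x - model_z)
```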
Extracting Primal Solutions Directly.
A solution to the primal problem (P) can be extracted from the sequence of subgradients (as these describe the dual of the dual problem). Set our initial primal solution to be $\bar X_0 = \alpha v_0 v_0^\top$. When each iteration updates the model $\hat f_t$, we could update our primal variable similarly:
$$\bar X_{t+1} = \theta_t \bar X_t + (1 - \theta_t)\, \alpha\, v_t v_t^\top. \tag{7}$$
As stated in Theorem 1.1, these primal solutions converge to optimality and feasibility. Alas, this approach requires us to compute and store the full $n \times n$ matrix $\bar X_t$ at each iteration. Assuming every solution is low rank, an ideal method would only require memory proportional to $n$ times that rank. The following section shows matrix sketching can accomplish this.
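For contrast with the sketched updates below, here is the dense update (7) as a numpy one-liner; it touches all $n^2$ entries, which is exactly the cost we want to avoid.

```python
import numpy as np

def dense_primal_update(X_bar, v, alpha, theta):
    """Eq. (7): X_bar <- theta * X_bar + (1 - theta) * alpha * v v^T.
    Forms the full n x n matrix -- the expense that sketching removes."""
    return theta * X_bar + (1.0 - theta) * alpha * np.outer(v, v)
```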
Extracting Primal Solutions Via Matrix Sketching.
Here we describe how the matrix sketching method of tropp2017practical can be used to store an approximation of our primal solution $\bar X_t$. First, we draw two matrices
$$\Omega \in \mathbb{R}^{n \times k}, \qquad \Psi \in \mathbb{R}^{k \times n}$$
with independent standard normal entries. Here $k$ is chosen by the user: it represents either an estimate of the true rank of the primal solution or the user's computational budget for dealing with large matrices.
We use $\Omega$ and $\Psi$ to capture the column space and the row space of $\bar X_t$:
$$Y_t = \bar X_t \Omega, \qquad W_t = \Psi \bar X_t. \tag{8}$$
Hence we initially have $Y_0 = \bar X_0 \Omega$ and $W_0 = \Psi \bar X_0$. Notice Algorithm 1 does not observe the matrix $\bar X_t$ directly. Rather, it observes a stream of rank-one updates of the form (7). In this setting, $Y_{t+1}$ and $W_{t+1}$ can be directly computed as
$$Y_{t+1} = \theta_t Y_t + (1 - \theta_t)\, \alpha\, v_t (v_t^\top \Omega), \tag{9}$$
$$W_{t+1} = \theta_t W_t + (1 - \theta_t)\, \alpha\, (\Psi v_t) v_t^\top. \tag{10}$$
This observation allows us to form the sketches $Y_t$ and $W_t$ from the stream of updates.
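A numpy sketch of (9)-(10); each update costs $O(kn)$ time and memory, independent of the iteration count.

```python
import numpy as np

def sketch_update(Y, W, Omega, Psi, v, alpha, theta):
    """Eqs. (9)-(10): propagate the rank-one primal update (7) through the
    sketches without ever forming the n x n matrix."""
    Y_next = theta * Y + (1.0 - theta) * alpha * np.outer(v, v @ Omega)
    W_next = theta * W + (1.0 - theta) * alpha * np.outer(Psi @ v, v)
    return Y_next, W_next
```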
We then reconstruct and get the reconstructed matrix $\hat X_t$ by
$$\hat X_t = \big[\, Q (\Psi Q)^{\dagger} W_t \,\big]_r, \tag{11}$$
where $Y_t = QR$ is the QR factorization of $Y_t$ and $[\cdot]_r$ returns the best rank-$r$ approximation in Frobenius norm. Specifically, the best rank-$r$ approximation of a matrix $M$ is $U_r \Sigma_r V_r^\top$, where $U_r$ and $V_r$ contain the left and right singular vectors corresponding to the $r$ largest singular values of $M$, and $\Sigma_r$ is a diagonal matrix holding those $r$ largest singular values. In an actual implementation, we may only produce the factors defining $\hat X_t$ at the end, instead of reconstructing $\hat X_t$ in every iteration. Hence the expensive primal update (7) can be replaced by the much more efficient operations (9) and (10), and a low-rank approximation of $\bar X_t$ can be computed by (11) when needed.
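A numpy sketch of the reconstruction (11), using a QR factorization and a small pseudoinverse solve before the rank-$r$ truncation.

```python
import numpy as np

def reconstruct(Y, W, Psi, r):
    """Eq. (11): X_hat = [ Q (Psi Q)^+ W ]_r, where Y = QR."""
    Q, _ = np.linalg.qr(Y)                        # orthonormal basis of range(Y)
    B = np.linalg.pinv(Psi @ Q) @ W               # small k x k pseudoinverse solve
    U, s, Vt = np.linalg.svd(Q @ B, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]         # best rank-r truncation
```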
We remark that the reconstructed matrix $\hat X_t$ is not necessarily positive semidefinite. However, this suffices for the purpose of finding a matrix close to $\mathcal{X}^\star$. A more sophisticated procedure is available for producing a positive semidefinite approximation of $\bar X_t$ (tropp2017practical, Section 7.3).
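For completeness, a naive alternative (not the procedure of tropp2017practical, Section 7.3) is to symmetrize and clip negative eigenvalues, which yields the nearest positive semidefinite matrix in Frobenius norm:

```python
import numpy as np

def psd_projection(X_hat):
    """Symmetrize, then zero out negative eigenvalues: the nearest PSD
    matrix to X_hat in Frobenius norm."""
    S = (X_hat + X_hat.T) / 2.0
    lam, V = np.linalg.eigh(S)
    return (V * np.maximum(lam, 0.0)) @ V.T
```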
3 Numerical Experiments
In this section, we demonstrate that Algorithm 1 equipped with the sketching procedure does solve problem instances in four important problem classes: (1) generalized eigenvalue (boumal2018deterministic, Section 5.1), (2) $\mathbb{Z}_2$ synchronization bandeira2018random, (3) Max-Cut goemans1995improved, and (4) matrix completion srebro2005rank. For all experiments, the algorithm's parameters are fixed using quantities computed via MOSEK mosek2010mosek or known a priori for the problem. We present the convergence results in Figure 1 for the following problem instances (results for other problem instances are similar within each problem class). Let $\mathcal{G}^n$ denote the distribution of symmetric matrices in $\mathbb{S}^n$ whose upper triangular part (including the diagonal) consists of independent standard Gaussians.
- Generalized eigenvalue (GE): minimize $\langle C, X \rangle$ subject to $\langle B, X \rangle = 1$ and $X \succeq 0$, where $C \sim \mathcal{G}^n$ and $B$ is drawn independently of $C$.
- $\mathbb{Z}_2$ synchronization (Z2): the cost matrix combines the all-ones matrix $J$ with Gaussian noise drawn from $\mathcal{G}^n$; the constraints are $\operatorname{diag}(X) = \mathbf{1}$ and $X \succeq 0$, where $\mathbf{1}$ is the all-ones vector.
- Max-Cut (MCut): the cost matrix is built from the Laplacian matrix $L$ of the G1 graph Gset with $800$ vertices, and $\mathcal{A}$ and $b$ are the same as for $\mathbb{Z}_2$ synchronization (a data-construction sketch follows this list).
- Matrix completion (MComp): a random low-rank matrix $M$ is generated. The index set $\mathcal{I}$ is generated in a way that each $(i,j)$ is in $\mathcal{I}$ with a fixed probability, independently of everything else. The linear constraints are $X_{ij} = M_{ij}$ for each $(i,j) \in \mathcal{I}$, so $(\mathcal{A}X)_{ij} = X_{ij}$ and $b_{ij} = M_{ij}$ for each $(i,j) \in \mathcal{I}$.
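As an illustration of how an instance above maps to the form (P), the following sketch builds the Max-Cut data from an adjacency matrix; the function and its dense output format are our illustrative choices.

```python
import numpy as np

def maxcut_sdp_data(adj):
    """Cast Max-Cut as (P): maximizing <L, X>/4 over diag(X) = 1, X >= 0
    becomes minimizing <C, X> with C = -L/4, (A X)_i = X_ii, and b = 1."""
    n = adj.shape[0]
    L = np.diag(adj.sum(axis=1)) - adj        # graph Laplacian
    C = -L / 4.0
    A_list = [np.diag(e) for e in np.eye(n)]  # E_ii so that <E_ii, X> = X_ii
    b = np.ones(n)
    return C, A_list, b
```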
As can be seen from the experiments, the dual and primal objectives converge fairly fast, except for the matrix completion problem. The infeasibility, measured by $\|\mathcal{A}\bar X_t - b\|$, and the distance to the solution set become small for most of the problems, except GE. In general, we note the convergence is quicker when the problems have rank-one optimal solutions (GE and Z2) compared to problems with higher-rank optimal solutions (MCut and MComp).
[Figure 1: Convergence of primal and dual objectives, infeasibility, and distance to the solution set for the GE, Z2, MCut, and MComp instances.]
References
- [1] The University of Florida Sparse Matrix Collection: Gset group.
- [2] Afonso S Bandeira. Random laplacian matrices and convex relaxations. Foundations of Computational Mathematics, 18(2):345–379, 2018.
- [3] Nicolas Boumal, Vladislav Voroninski, and Afonso S Bandeira. Deterministic guarantees for burer-monteiro factorizations of smooth semidefinite programs. Communications on Pure and Applied Mathematics, 2018.
- [4] Emmanuel J Candes, Thomas Strohmer, and Vladislav Voroninski. Phaselift: Exact and stable signal recovery from magnitude measurements via convex programming. Communications on Pure and Applied Mathematics, 66(8):1241–1274, 2013.
- [5] Lijun Ding, Alp Yurtsever, Volkan Cevher, Joel A Tropp, and Madeleine Udell. An optimal-storage approach to semidefinite programming using approximate complementarity. arXiv preprint arXiv:1902.03373, 2019.
- [6] Yu Du and Andrzej Ruszczyński. Rate of convergence of the bundle method. J. Optim. Theory Appl., 173(3):908–922, June 2017.
- [7] Michel X Goemans and David P Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM (JACM), 42(6):1115–1145, 1995.
- [8] APS Mosek. The MOSEK optimization software. Online at http://www.mosek.com, 54(2-1):5, 2010.
- [9] Gábor Pataki. On the rank of extreme matrices in semidefinite programs and the multiplicity of optimal eigenvalues. Mathematics of operations research, 23(2):339–358, 1998.
- [10] Andrzej Ruszczyński. Nonlinear Optimization, volume 13. Princeton University Press, 2006.
- [11] Nathan Srebro and Adi Shraibman. Rank, trace-norm and max-norm. In International Conference on Computational Learning Theory, pages 545–560. Springer, 2005.
- [12] Jos F Sturm. Error bounds for linear matrix inequalities. SIAM Journal on Optimization, 10(4):1228–1248, 2000.
- [13] Joel A Tropp, Alp Yurtsever, Madeleine Udell, and Volkan Cevher. Practical sketching algorithms for low-rank matrix approximation. SIAM Journal on Matrix Analysis and Applications, 38(4):1454–1485, 2017.
- [14] Joel A Tropp, Alp Yurtsever, Madeleine Udell, and Volkan Cevher. Randomized single-view algorithms for low-rank matrix approximation. 2017.
Appendix A Proofs of Convergence Guarantees
A.1 Auxiliary lemmas
Lemma A.1.
Lemma A.2 (Compact sublevel set).
If a convex lower semicontinuous function has a compact nonempty solution set, then all of its sublevel sets are compact.
Proof.
Suppose for contradiction that the closed sublevel set $\{y : f(y) \le \gamma\}$ is unbounded for some $\gamma$. Then there is a unit recession direction $d$ such that $y + \lambda d$ remains in the sublevel set for every $y$ in it and every $\lambda \ge 0$. Since the nonempty sublevel sets of a convex function all share the same recession directions, $d$ is also a recession direction of the solution set. This in particular violates the fact that the solution set is bounded, and the proof is completed. ∎
Lemma A.3 (Quadratic Growth).
[12, Section 4] If the solution sets $\mathcal{X}^\star$ and $\mathcal{Y}^\star$ are compact and strict complementarity holds, then for any fixed compact region, there are constants such that for all $X \succeq 0$ in that region with sufficiently small infeasibility and suboptimality, and all $y$ in that region with sufficiently small dual infeasibility and suboptimality, the squared distances $\operatorname{dist}^2(X, \mathcal{X}^\star)$ and $\operatorname{dist}^2(y, \mathcal{Y}^\star)$ are bounded by constant multiples of those infeasibility and suboptimality measures.
Proof.
Using the proof of Lemma A.5, we find that the dual feasibility violation is controlled by the objective gap; thus the relevant iterates remain in a fixed sublevel set. The result in [12, Section 4] requires the set of nearly optimal, nearly feasible primal matrices, and the corresponding set of dual vectors, to be compact. Using [10, Theorem 7.21], an exactly penalized version of the problem has the same solution set as the primal SDP (P) for some sufficiently large penalty parameter. Thus the compactness of both sets is ensured by Lemma A.2, and the proof is completed. ∎
A.2 Proof of Theorem 1.1
We set the bundle method's parameters $\alpha$, $\rho$, and $\beta$ as described in Theorem 1.1. We recall the inner product for matrices is the trace inner product, and for vectors it is the dot product.
In the following three lemmas, we prove bounds on primal feasibility, dual feasibility, and optimality in terms of the dual objective gap $f(x_t) - \min f$. From this, we can conclude these quantities converge at the claimed rate, since Du and Ruszczyński [6] recently showed the bundle method converges at a corresponding rate.
Lemma A.4 (Primal Feasibility).
At every descent step $t$, we have approximate primal feasibility:
$$\|\mathcal{A}\bar X_{t+1} - b\| \le \sqrt{\tfrac{\rho}{\beta}\big(f(x_t) - f(x_{t+1})\big)}.$$
Proof.
Noting that $\bar X_{t+1}$ is built out of a convex combination of the rank-one matrices $\alpha v_i v_i^\top$, it is immediate that it is always a positive semidefinite matrix.
The definition of $\bar X_{t+1}$ immediately gives the following alternative characterization of the aggregate subgradient:
$$\bar s_{t+1} = \mathcal{A}\bar X_{t+1} - b.$$
Since we constructed $\bar s_{t+1}$ to correspond to the first-order optimality condition of the subproblem (4), we have
$$\bar s_{t+1} = \rho(x_t - z_{t+1}).$$
Hence $\|\mathcal{A}\bar X_{t+1} - b\| = \rho\|x_t - z_{t+1}\|$. The distance traveled during any descent step can be bounded by the objective value gap as
$$\rho\|x_t - z_{t+1}\|^2 \le f(x_t) - \hat f_t(z_{t+1}) \le \frac{1}{\beta}\big(f(x_t) - f(x_{t+1})\big),$$
where the first inequality uses the fact that $z_{t+1}$ minimizes the $\rho$-strongly convex subproblem (4) and the second inequality uses the definition of a descent step. Combining this with our feasibility bound shows
$$\|\mathcal{A}\bar X_{t+1} - b\| \le \sqrt{\tfrac{\rho}{\beta}\big(f(x_t) - f(x_{t+1})\big)}.$$
Then our choice of $\rho$ completes the proof. ∎
Lemma A.5 (Dual Feasibility).
At every descent step $t$, we have approximate dual feasibility:
$$\max\{\lambda_{\max}(\mathcal{A}^* x_t - C),\, 0\} \le \frac{f(x_t) - \min f}{\alpha - \operatorname{tr}(X^\star)}.$$
Proof.
Standard strong duality and exact penalization arguments show for any $y$,
$$f(y) - \min f \ge \big(\alpha - \operatorname{tr}(X^\star)\big)\max\{\lambda_{\max}(\mathcal{A}^* y - C),\, 0\}.$$
Recalling our assumption that $\alpha$ exceeds the trace of any primal solution yields the claimed feasibility bound. ∎
Lemma A.6 (Primal-Dual Optimality).
At every descent step $t$, the approximate primal-dual optimality gap $\langle C, \bar X_{t+1} \rangle - b^\top x_t$ is bounded above and below by quantities proportional to the dual objective gap and the feasibility bounds of Lemmas A.4 and A.5.
Proof.
The standard duality analysis shows the primal-dual objective gap equals
$$\langle C, \bar X_{t+1} \rangle - b^\top x_t = -\big\langle \bar X_{t+1},\, \mathcal{A}^* x_t - C \big\rangle + \big\langle \mathcal{A}\bar X_{t+1} - b,\, x_t \big\rangle.$$
Notice that the second term here is bounded above and below as
$$\big|\langle \mathcal{A}\bar X_{t+1} - b,\, x_t \rangle\big| \le \|\mathcal{A}\bar X_{t+1} - b\|\,\|x_t\|$$
by Lemma A.4. Hence we only need to show that the first term also approaches zero (that is, we approach holding complementary slackness).
An upper bound on this inner product follows from Lemma A.5 as
$$\big\langle \bar X_{t+1},\, \mathcal{A}^* x_t - C \big\rangle \le \operatorname{tr}(\bar X_{t+1})\, \max\{\lambda_{\max}(\mathcal{A}^* x_t - C),\, 0\} \le \alpha \max\{\lambda_{\max}(\mathcal{A}^* x_t - C),\, 0\}.$$
Hence the claimed lower bound on the primal-dual gap follows.
A lower bound on this inner product follows from the fact that the aggregate affine minorant $\langle \bar X_{t+1}, \mathcal{A}^* y - C \rangle - b^\top y$ lower bounds $f$, so
$$\big\langle \bar X_{t+1},\, \mathcal{A}^* x_t - C \big\rangle - b^\top x_t \ge \hat f_t(z_{t+1}) \ge f(x_t) - \frac{1}{\beta}\big(f(x_t) - f(x_{t+1})\big),$$
where the last inequality follows from the definition of a descent step. Hence the claimed upper bound on the primal-dual gap follows.
∎