DeepAI
Log In Sign Up

Time-Varying Semidefinite Programming: Path Following a Burer–Monteiro Factorization

10/15/2022
by   Antonio Bellon, et al.
0

We present an online algorithm for time-varying semidefinite programs (TV-SDPs), based on the tracking of the solution trajectory of a low-rank matrix factorization, also known as the Burer–Monteiro factorization, in a path-following procedure. There, a predictor-corrector algorithm solves a sequence of linearized systems. This requires the introduction of a horizontal space constraint to ensure the local injectivity of the low-rank factorization. The method produces a sequence of approximate solutions for the original TV-SDP problem, for which we show that they stay close to the optimal solution path if properly initialized. Numerical experiments for a time-varying Max-Cut SDP relaxation demonstrate the computational advantages of the proposed method for tracking TV-SDPs in terms of runtime compared to off-the-shelf interior point methods.

READ FULL TEXT VIEW PDF

page 1

page 2

page 3

page 4

08/12/2018

Time-Varying Semidefinite Programs

We study time-varying semidefinite programs (TV-SDPs), which are semidef...
03/01/2018

Smoothed analysis for low-rank solutions to semidefinite programs in quadratic penalty form

Semidefinite programs (SDP) are important in learning and combinatorial ...
09/10/2018

Pursuit of Low-Rank Models of Time-Varying Matrices Robust to Sparse and Measurement Noise

In tracking of time-varying low-rank models of time-varying matrices, we...
02/25/2020

On the regularity and conditioning of low rank semidefinite programs

Low rank matrix recovery problems appear widely in statistics, combinato...
06/01/2022

Accelerated first-order methods for a class of semidefinite programs

This paper introduces a new storage-optimal first-order method (FOM), Ce...
11/28/2022

Discovering Dynamic Patterns from Spatiotemporal Data with Time-Varying Low-Rank Autoregression

The problem of broad practical interest in spatiotemporal data analysis,...

1 Introduction

Semidefinite programs (SDPs) constitute an important class of convex constrained optimization problems that is ubiquitous in statistics, signal processing, control systems, and other areas. In several applications, the data of the problem vary over time, so that this can be modeled as a time-varying semidefinite program (TV-SDP). In this paper we consider TV-SDPs of the form

(SDP)
  s.t.

where is a time parameter varying on a bounded interval.

Here denotes the space of real symmetric matrices, is a linear operator defined by for some , , and . Throughout the paper denotes the Frobenius inner product and the constraint requires to be positive semidefinite (PSD). In this time-varying setting, one looks for a solution curve in such that is an optimal solution for (SDP) at each time point .

Time-dependent problems leading to TV-SDPs occur in various applications, such as optimal power flow problems in power systems [33]

, state estimation problems in quantum systems

[1], modeling of energy economic problems [19], job-shop scheduling problems [7], as well as problems arising in signal processing, queueing theory [36] or aircraft engineering [43]. TV-SDPs can be seen as a generalization of continuous linear programming problems, which were first studied by Bellman [11] in relation to so-called bottleneck problems in multistage linear production economic processes. Since then, a large body of literature has been devoted to studying continuous linear programs with and without additional assumptions. However, the generalization of this idea to other classes of optimization problems has only recently been considered. In [46], Wang et al. study separated continuous conic programs, and finally Ahmadi and Khadir [3] consider time-varying SDPs. In contrast to our setting, they study the data to vary polynomially with time and also restrict themselves to polynomial solutions.

A naive approach to solve the time-varying problem (SDP) is to consider, at a sequence of times , the instances of the problem (SDP) for and solve them one after another. The best solvers for SDPs are interior point methods [29, 44, 6, 22, 30], which can solve them in a time that is polynomial in the input size. However, these solvers do not scale particularly well, and thus this brute-force approach may fail in applications where the volume and velocity of the data are large. Furthermore, such a straightforward method would not make use of the local information collected by solving the previous instances of the problem. Even if one considers warm starts [24, 21, 18, 42], the reduction in run time is likely to be marginal. For instance, [42, sections 5.5 and 5.6] reports a 30–60% reduction of the runtime on a collection of time-varying instances of their own choice.

Instead, in this work, we would like to utilize the idea of so-called path-following predictor-corrector algorithms as developed in [25, 5]. In classical predictor-corrector methods, a predictor step for approximating the directional derivative of the solution with respect to a small change in the time parameter is applied, together with a correction step that moves from the current approximate solution closer to the next solution at the new time point. The latter is based on a Newton step for solving the first-order optimality KKT conditions.

A limiting factor in solving both stationary and time-dependent SDPs is computational complexity when is large. A common solution to this obstacle is the Burer–Monteiro approach, as presented in the seminal work [15, 16]. In this approach, a low-rank factorization of the solution is assumed with and potentially much smaller than . In the optimization literature, the Burer–Monteiro method has been very well studied as a non-convex optimization problem, e.g. in terms of algorithms [31], quality of the optimal value [9, 39], and (global) recovery guarantees [14, 13, 17].

In a time-varying setting, the Burer–Monteiro factorization leads to

(BM)
   s.t.

which for every fixed is a quadratically constrained quadratic problem. A solution then is a curve in , which, depending on , is a space of much smaller dimension than . However, this comes at the price that the problem (BM) is now non-convex. Moreover, theoretically it may happen that local optimization methods converge to a critical point that is not globally optimal [45], although in practice the method usually shows very good performance [15, 31, 40].

The aim of this work is to combine the Burer–Monteiro factorization with path-following predictor-corrector methods and to develop a practical algorithm for approximating the solution of (BM), and consequently of (SDP), over time. As we explain in section 3, to apply such methods, we need to address the issue that the solutions of (BM) are never isolated, due to the non-uniqueness of the Burer–Monteiro factorization caused by orthogonal invariance. In this paper, we apply a well-known technique to handle this problem by restricting the solutions to a so-called horizontal space at every time step. From a geometric perspective, such an approach exploits the fact that equivalent factorizations can be identified as the same element in the corresponding quotient manifold with respect to the orthogonal group action [34].

The paper is structured as follows. In section 2 we review important foundations from the SDP literature and state the main assumptions we make on the TV-SDP problem (SDP). Section 3 presents the underlying quotient geometry of positive semidefinite rank- matrices from a linear algebra perspective, focusing in particular on the notion of horizontal space and the injectivity radius of the map . We then describe in section 4 our path-following predictor-corrector algorithm, which is based on iteratively solving the linearized KKT system for (BM) over time. A main result is the rigorous error analysis for this algorithm presented in subsection 4.3. In section 5

, we showcase numerical results that test our method on a time-varying variant of the well-known Goemans–Williamson SDP relaxation for the Max-Cut problem in combinatorial optimization and graph theory. We conclude in section 

6 with a brief discussion of our results.

2 Preliminaries and key assumptions

Naturally, the rigorous formulation of path-following algorithms requires regularity assumptions on the solution curve. In our context, this will require both assumptions on the original TV-SDP problem (SDP) as well as on its reformulation (BM). In particular for the latter, the correct choice of the dimension is crucial. In what follows, we present and discuss these assumptions in detail.

First, we briefly review some standard notions and properties for primal-dual SDP pairs; see [4, 47]. Consider the conic dual problem of (SDP):

(D-SDP)
  s.t.

where is the linear operator adjoint to . For convenience, we often drop the explicit dependence on and refer to a solution of (D-SDP) simply as . While reviewing the basic properties of SDPs, we assume the time parameter to be fixed and hence omit the subindex .

The KKT conditions for the pair of primal-dual convex problems (SDP)-(D-SDP) read

(1)

These are sufficient conditions for the optimality of the pair .

Definition 2.1 (Strict feasibility).

We say that strict feasibility holds for an instance of primal SDP if there exists a positive definite matrix that satisfies

. Similarly, strict feasibility holds for the dual if there exist a vector

satisfying .

It is well-known that under strict feasibility the KKT conditions are also necessary for optimality. Note that, in general, a pair of optimal solutions satisfies the inclusions and , where and denote the image and kernel, respectively.

Definition 2.2 (Strict complementarity).

A primal-dual optimal point is said to be strictly complementary if (or, equivalently, ). A primal-dual pair of an instance of SDP satisfies strict complementarity if there exists a strictly complementary primal-dual optimal point .

Definition 2.3 (Non-degeneracy).

A primal feasible point is primal non-degenerate if

(2)

with being the tangent space to the manifold of fixed rank- symmetric matrices at , where . Let be a rank-revealing decomposition, then

A dual feasible point is dual non-degenerate if

where is the tangent space at to the manifold of fixed rank- symmetric matrices with .

Primal-dual strict feasibility implies the existence of both a primal and a dual optimal solution with a zero duality gap. In addition, primal (dual) non-degeneracy implies dual (primal) uniqueness of the solutions. Under strict complementarity, the converse is also true, that is, primal (dual) uniqueness of the primal dual optimal solutions implies dual (primal) non-degeneracy of these solutions. Moreover, primal-dual non-degeneracy and strict complementarity hold generically. We refer to [4] for details.

With regard to the time-varying case, these facts can be generalized as follows.

Theorem 2.4.

(Bellon et al., [12, Theorem 2.19]) Let (P,D) be a primal-dual pair of TV-SDPs parametrized over a time interval such that primal-dual strict feasibility holds for any and assume that the data are continuously differentiable functions of . Let be a fixed value of the time parameter and suppose that is a non-degenerate optimal and strictly complementary point for (P,D). Then there exists and a continuously differentiable unique mapping defined on such that is a unique and strictly complementary primal-dual optimal point to (P,D) for all . In particular, the ranks of and are constant for all .

Based on these facts, for the initial problem (SDP) we make the following assumptions.

  1. [label=(A0)]

  2. The linear operator is surjective in any .

  3. (SDP) has a primal non-degenerate solution and
     (D-SDP) has a dual non-degenerate solution at any .

  4. The solution pair is strictly complementary for any .

  5. Data are continuously differentiable functions of .

Assumption 1 is standard for SDPs and in linearly constrained optimization in general, while assumptions 23 rule out many “pathological” cases [12]. In particular, assumption 2 implies that the solution pair is unique. By Theorem 2.4 and assumptions 23, and 4 have the following consequences:

  1. [label=(C0)]

  2. (SDP) has a unique and smooth solution curve , .

  3. The curve is of constant rank .

For setting up the factorized version (BM) of (SDP), it is necessary to choose the dimension of the factor matrix in (BM), ideally equal to of 2. In what follows, we assume that we know the constant rank . Given access to an initial solution at time , it is possible to compute , so this assumption is without further loss of generality.

It is worth noting that the rank cannot be arbitrary. Based on a known result of Barvinok and Pataki [10], for any SDP defined by linearly independent constraints, there always exists a solution of rank such that . Since we assume that is the unique solution to (SDP) with constant rank we conclude that

We point out that recently the Barvinok–Pataki bound has been slightly improved [28].

3 Quotient geometry of PSD rank- matrices

We now investigate the factorized formulation (BM) in more detail. As already mentioned, in contrast to the original problem (SDP), this is a nonlinear problem (specifically, a quadratically constrained quadratic problem) which is non-convex. Moreover, the property of uniqueness of a solution, which is guaranteed by 1 for the original problem (SDP), is lost in (BM), because its representation via the map

is not unique. In fact, this map is invariant under the orthogonal group action

on , where

with denoting the identity matrix, is the orthogonal group. Hence both the objective function and the constraints in (BM) are invariant under the same action. As a consequence, the solutions of (BM) are never isolated [31]. This poses a technical obstacle to the use of path-following algorithms, as the path needs to be, at least locally, uniquely defined.

On the other hand, by assuming that the correct rank of a unique solution for (SDP) has been chosen for the factorization, any solution for (BM) must satisfy (and in particular be of full column rank). From this it easily follows that the above action of the orthogonal group is indeed the only source of non-uniqueness. This corresponds to the well-known fact that the set of positive definite fixed rank- symmetric matrices, which we denote by , is a smooth manifold that can be identified with the quotient manifold , where is the open set of matrices with full column rank.

In the following, we describe how the non-uniqueness can be removed by introducing the so-called horizontal space, which is a standard concept in optimization on quotient manifolds [2]. For positive semidefinite fixed-rank matrices, this has been worked out in detail in [34]. Additional material, including the complex Hermitian case, can be found in [8]. However, in order to arrive at practical formulas that are useful for our path-following algorithm later on, we will not further refer to the concept of a quotient manifold but directly focus on the injectivity of the map on a suitable linear subspace, which we describe in the following section.

3.1 Horizontal space and unique factorizations

Given , we denote the corresponding orbit under the orthogonal group as

The orbit is an embedded submanifold of of dimension with two connected components, according to . Its tangent space at , which we denote by , is easily derived by noting that the tangent space to the orthogonal group

at the identity matrix equals the space of real skew-symmetric matrices

(see, e.g., [2, Example 3.5.3]). Therefore,

Since the map is constant on , its derivative

vanishes on , that is .

The horizontal space at , denoted by , is the orthogonal complement of with respect to the Frobenius inner product. One verifies that

since holds for all skew-symmetric if and only if is symmetric. We point out that sometimes any subspace complementary to is called a horizontal space, but we will stick to the above choice, as it is the most common and has certain theoretical and practical advantages. In particular, since , we have that is just a linear space.

The purpose of the horizontal space is to provide a unique way of representing a neighborhood of in through with . Clearly,

Moreover, the following holds.

Proposition 3.1.

The restriction of to is injective. In particular, it holds that

where

is the smallest singular value of

. This bound is sharp if . For one has the sharp estimate

As a consequence, in either case, .

Proof.

For we have by standard properties of the trace. Taking yields

To derive the second equality we used for . Clearly, and if we also have that . This proves the asserted lower bounds. To show that they are sharp, let be a (normalized) singular vector tuple such that . If , then for any such that one verifies that the matrix is in and achieves equality. When , achieves it. ∎

Since maps to , which is of the same dimension as , the above proposition implies that is a bijection between and . This already shows that the restriction of to the linear space is a local diffeomorphism between a neighborhood of in and a neighborhood of in . The subsequent more quantitative statement matches the Theorem 6.3 in [34] on the injectivity radius of the quotient manifold . For convenience we will provide a self-contained proof that is more algebraic and does not require the concept of quotient manifolds.

Proposition 3.2.

Let . Then the restriction of to is injective and maps diffeomorphically to a (relatively) open neighborhood of in .

It is interesting to note that is the largest possible ball in on which the result can hold, since the rank-one matrices comprised of singular pairs of all belong to and is rank-deficient. Another important observation is that does not depend on the particular choice of within the orbit .

Proof.

Consider . Let

be a singular value decomposition of

with and having orthonormal columns. We assume . Then by we denote a matrix with orthonormal columns and . In the case , the terms involving in the following calculation are simply not present. We write

Since , we have

Then a direct calculation yields

and analogously for . Since the four terms in the above sum are mutually orthogonal in the Frobenius inner product, the equality particularly implies

as well as

(3)

The first of these equations can be written as

By Proposition 3.1 (with , and ),

whereas

Since , this shows that we must have , which then by (3) also implies , since is invertible.

Hence, we have proven that is an injective map from to . To validate that it is a diffeomorphism onto its image we show that it is locally a diffeomorphism, for which again it suffices to confirm that is injective on for every (since and have the same dimension). It follows from Proposition 3.1 (with replaced by , which has full column rank) that the null space of equals . We claim that , which proves the injectivity of on . Indeed, let be an element in the intersection, i.e., for some skew-symmetric and . Inserting the first relation into the second, and using , yields the homogenuous Lyapunov equation

(4)

The symmetric matrix

in (4) is positive definite, since (here denotes the

-th eigenvalue of the corresponding matrix). But in this case (

4) implies , that is, . ∎

Finally, it is also possible to provide a lower bound on the radius of the largest ball around such that its intersection with is in the image (so that an inverse map is defined).

Proposition 3.3.

Any satisfying is in the image , that is, there exists a unique such that .

Observe that one could take

(5)

as a slightly cleaner sufficient condition in the proposition.

Proof.

Let with and assume a polar decomposition of , where , is positive semidefinite and is orthogonal. Let . Then

(6)

satisfies , and since is symmetric, we have . We need to show , that is, . Proposition 3.2 then implies that is unique in . Let be the orthogonal projector onto the column span of and . With that, we have the decomposition

(7)

We estimate both terms separately. Since is symmetric and positive semidefinite, the first term satisfies

(8)

A simple consideration using a singular value decomposition of and reveals that

for some and with orthonormal columns. Consequently, by von Neumann’s trace inequality (see, e.g., [27, Theorem 7.4.1.1]), we have

Inserting this in (3.1) yields

We remark that we could have concluded this inequality from [8, Theorem 2.7] where it is also stated. It actually holds for any for which is symmetric and positive semidefinite using the same argument (in particular for replaced with the initial ). Let now be a singular value decomposition of with the smallest positive singular value. Then for some positive semidefinite and it follows from well-known results, cf. [41], that111For completeness we provide the proof. The matrix is the unique solution to the matrix equation . Indeed, the linear operator on is symmetric in the Frobenius inner product and has positive eigenvalues

(the eigenvectors are rank-one matrices

with the eigenvectors of ). Hence .

Noting that and we conclude the first part with

(9)

The second term in (7) can be estimated as follows:

(10)

where we used the Cauchy-Schwarz inequality and the fact that has rank at most .

As a result, combining (7) with (9) and (10), we obtain

(11)

The right side is strictly smaller than when

which proves the assertion. ∎

Remark 3.4.

From definition (6) of , since is given by the polar decomposition , it follows that

see, e.g., [27, section 7.4.5]. The quantity in fact defines a Riemannian distance between the orbits and in the corresponding quotient manifold; see [34, Proposition 5.1].

3.2 A time interval for the factorized problem

We now return to the factorized problem formulation (BM). Let be an optimal solution of (BM) at some fixed time point  (so that and ). Based on the above propositions we are able to state a result on the allowed time interval for which the factorized problem (BM) is guaranteed to admit unique solutions on the horizontal space corresponding to the original problem (SDP). For this, exploiting the smoothness of the curve , we first define

(12)

a uniform bounds on the time derivative, as well as

(13)

on the smallest eigenvalue of , are available for . Notice that the existence of such bounds is without any further loss of generality: the existence of follows from 1, which guarantees that is a smooth curve, while the existence of is guaranteed by 2, since has a constant rank.

Theorem 3.5.

Let be a solution of (BM) as above. Then for there is a unique and smooth solution curve for the problem (BM) restricted to in the time interval .

Proof.

It suffices to show that for in the asserted time interval the solutions of (SDP) lie in the image . By Proposition 3.3, this is the case if , where is the smallest eigenvalue of . Since

and , the condition is sufficient. Then Proposition 3.2 provides the smooth solution curve for problem (BM). ∎

The results of this section motivate the definition of a version of (BM) restricted to , which we provide in the next section.

4 Path following the trajectory of solutions

In this section, we present a path-following procedure for computing a sequence of approximate solutions at different time points that tracks a trajectory of solutions to the Burer–Monteiro reformulation (BM). From this sequence we are then able to reconstruct a corresponding sequence of approximate solutions tracking the trajectory of solutions for the full space TV-SDP problem (SDP). The path-following method is based on iteratively solving the linearized KKT system. Given an iterate on the path, we explained in the previous section how to eliminate the problem of non-uniqueness of the path in a small time interval by considering problem (BM) restricted to the horizontal space . We now need to ensure that this also addresses the question of the invertibility of the linearized KKT system. We show in Theorem 4.2 that this is indeed guaranteed under standard regularity assumptions on the original problem (SDP). This is a remarkable fact of somewhat independent interest.

4.1 Linearized KKT conditions and second-order sufficiency

Given an optimal solution at time , we aim to find a solution at time . By the results of the previous section, the next solution can be expressed in a unique way as

where is in the horizontal space , provided that is small enough.

We define the following maps:

(14)

By definition, if and only if . For symmetry reasons we use the equivalent condition (which reflects the fact that is actually a linear space).

To find the new iterate we hence consider the problem

(BM)
   s.t.

This is a quadratically constrained quadratic problem whose Lagrangian is

(15)

with multipliers and . The KKT conditions of problem (BM) are

(16)

Hence, equation (16) reads explicitly as