Multiplicative updates for symmetric-cone factorizations

08/02/2021 · by Yong Sheng Soh, et al. · National University of Singapore

Given a matrix X ∈ ℝ^{m×n}_+ with non-negative entries, the cone factorization problem over a cone 𝒦 ⊆ ℝ^k concerns computing {a_1, …, a_m} ⊆ 𝒦 and {b_1, …, b_n} ⊆ 𝒦^* belonging to its dual so that X_ij = ⟨a_i, b_j⟩ for all i ∈ [m], j ∈ [n]. Cone factorizations are fundamental to mathematical optimization as they allow us to express convex bodies as feasible regions of linear conic programs. In this paper, we introduce and analyze the symmetric-cone multiplicative update (SCMU) algorithm for computing cone factorizations when 𝒦 is symmetric; i.e., it is self-dual and homogeneous. Symmetric cones are of central interest in mathematical optimization as they provide a common language for studying linear optimization over the nonnegative orthant (linear programs), over the second-order cone (second-order cone programs), and over the cone of positive semidefinite matrices (semidefinite programs). The SCMU algorithm is multiplicative in the sense that the iterates are updated by applying a meticulously chosen automorphism of the cone computed using a generalization of the geometric mean to symmetric cones. Using an extension of Lieb's concavity theorem and von Neumann's trace inequality to symmetric cones, we show that the squared loss objective is non-increasing along the trajectories of the SCMU algorithm. Specialized to the nonnegative orthant, the SCMU algorithm corresponds to the seminal algorithm by Lee and Seung for computing Nonnegative Matrix Factorizations.


1 Introduction

A fundamental problem in mathematical optimization is to maximize a linear function over a convex constraint set. An important consideration for the development of tractable numerical algorithms for such problems is that the constraint set admits a compact description. A powerful paradigm for obtaining compact descriptions of convex sets is based on lifts over structured convex cones. More precisely, given a convex set C, an extended formulation or lift of C over a (full-dimensional closed) convex cone 𝒦 is a description of C as the projection of an affine slice of the cone 𝒦; i.e., C = π(𝒦 ∩ L), where π is a linear projection and L an affine subspace.

Given a 𝒦-lift, the problem of maximizing linear functions over C reduces to one of solving a linear conic program over the cone 𝒦. Linear conic programs (LCPs) capture many important classes of optimization problems – for instance, LCPs where the cone is the k-dimensional nonnegative orthant ℝ^k_+ correspond to Linear Programs (LPs), LCPs where 𝒦 is the cone of positive semidefinite matrices correspond to Semidefinite Programs (SDPs), and LCPs where 𝒦 is the k-dimensional second-order cone correspond to Second-Order Cone Programs (SOCPs). We refer to extended formulations over these cones as LP-, SDP-, and SOCP-lifts respectively.

Lifted descriptions are a powerful tool for optimization because there are many examples of convex sets whose descriptions appear (at least on the surface) to be complex – such as by being specified by a number of inequalities that is exponential in the ambient dimension – but do in fact admit compact lifted descriptions. Examples of such constraint sets include the unit ball of the ℓ₁-norm, the spanning tree polytopes [20, 27], and the permutahedra [10].

The choice of cone also matters – the smallest known LP-lift for the stable set polytope of a perfect graph with n vertices has size n^{O(log n)} [28], whereas there exists an SDP-lift of size n+1 using the theta body [21], and in fact it is the latter representation that gives rise to the only known polynomial-time algorithm for finding the largest stable set in a perfect graph. Moreover, the dimension of the cone is fundamentally linked to the computational complexity required to solve the associated linear conic program. As such, for an extended formulation to be practically useful for optimization, it is important to seek extended formulations involving cones of low dimensionality.

Despite the usefulness of extended formulations, it is not immediately clear how one systematically searches for such descriptions. To this end, Yannakakis shows that LP-lifts of polytopes are fundamentally linked to the existence of a structured matrix factorization of a slack matrix [28], while Gouveia, Parrilo, and Thomas [11] extend this connection to lifts using more general cones.

We explain this connection more precisely: Given an entrywise non-negative matrix X ∈ ℝ^{m×n}_+ and a (full-dimensional convex) cone 𝒦 lying in an inner product space (V, ⟨·,·⟩), the cone factorization problem concerns finding two collections of vectors {a_1, …, a_m} in the cone 𝒦 and {b_1, …, b_n} in the dual cone 𝒦^* where

X_ij = ⟨a_i, b_j⟩ for all i ∈ [m], j ∈ [n].   (1)

In the case where 𝒦 = ℝ^k_+, the cone factorization problem reduces to the Non-negative Matrix Factorization (NMF) problem, in which we seek collections of non-negative vectors {a_1, …, a_m} ⊆ ℝ^k_+ and {b_1, …, b_n} ⊆ ℝ^k_+ such that X_ij = a_iᵀb_j for all i ∈ [m] and j ∈ [n]. The smallest k for which X has a k-dimensional NMF is the nonnegative rank of the matrix X. NMFs were initially studied within the field of linear algebra (see, e.g., [4]), and later gained prominence as a dimensionality reduction tool providing interpretable parts-based representations of non-negative data [18].
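To make the definition concrete, here is a minimal NumPy sketch (our own illustration with hypothetical sizes, not code from the paper) that builds a matrix admitting a k-dimensional NMF by construction:

```python
import numpy as np

# Illustration: a nonnegative matrix X with a k-dimensional NMF by construction.
rng = np.random.default_rng(0)
m, n, k = 5, 4, 3
A = rng.random((m, k))          # rows a_i in the nonnegative orthant
B = rng.random((n, k))          # rows b_j in the (self-dual) orthant
X = A @ B.T                     # X_ij = <a_i, b_j> is entrywise nonnegative

assert np.all(X >= 0)
# The nonnegative rank of X is at most k by construction.
print(X.shape, X.min() >= 0)
```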

In the context of LP-lifts, Yannakakis [28] showed that the existence of an ℝ^k_+-lift for a polytope P is equivalent to the existence of a k-dimensional NMF of its slack matrix, a nonnegative rectangular matrix whose rows are indexed by the bounding hyperplanes {x : ⟨h_i, x⟩ ≤ g_i} of P, whose columns are indexed by the extreme points v_j of P, and whose (i, j)-entry is given by g_i − ⟨h_i, v_j⟩. Subsequently, Gouveia, Parrilo, and Thomas [11] showed that a polytope admits an extended formulation over a (full-dimensional) closed convex cone 𝒦 if its slack matrix admits an exact cone factorization over 𝒦. In fact, this relationship holds for convex sets beyond polyhedral ones, although it is then necessary to extend the notion of a slack matrix to that of a slack operator indexed over the infinite set of extreme points and/or facets.
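As a small worked example (ours, not from the paper), the slack matrix of the unit square [0,1]² can be computed directly from its facets and vertices:

```python
import numpy as np

# Facets written as <h_i, x> <= g_i; vertices v_j; slack S_ij = g_i - <h_i, v_j>.
H = np.array([[-1, 0], [0, -1], [1, 0], [0, 1]])   # x>=0, y>=0, x<=1, y<=1
g = np.array([0, 0, 1, 1])
V = np.array([[0, 0], [1, 0], [1, 1], [0, 1]])      # extreme points

S = g[:, None] - H @ V.T                            # 4x4 slack matrix
print(S)    # entrywise nonnegative; its nonnegative rank governs LP-lifts
```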

Cone factorizations also have important applications beyond optimization. One such prominent example is in quantum information science, where factorizations over the PSD cone are relevant to the quantum correlation generation problem [13] and one-way quantum communication complexity [9]. Specifically, given a non-negative matrix X ∈ ℝ^{m×n}_+, the positive semidefinite matrix factorization (PSDMF) problem concerns computing two families of k×k PSD matrices {A_1, …, A_m} and {B_1, …, B_n} such that X_ij = tr(A_iB_j) for all i ∈ [m], j ∈ [n]. The smallest k for which X admits a k-dimensional PSDMF is known as the PSD-rank of X [7].
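A quick numerical illustration of the PSDMF definition (our own sketch, with hypothetical sizes):

```python
import numpy as np

# Build a nonnegative X with a PSD factorization X_ij = tr(A_i B_j) by construction.
rng = np.random.default_rng(1)
m, n, k = 4, 3, 2

def rand_psd(k):
    G = rng.standard_normal((k, k))
    return G @ G.T                      # PSD by construction

As = [rand_psd(k) for _ in range(m)]
Bs = [rand_psd(k) for _ in range(n)]
X = np.array([[np.trace(Ai @ Bj) for Bj in Bs] for Ai in As])

assert np.all(X >= -1e-12)              # trace pairing of PSD matrices is >= 0
print(X)                                # X has PSD-rank at most k by construction
```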

Computing cone factorizations.

The task of computing an exact cone factorization is in general intractable. Concretely, in the specific setting where 𝒦 is the non-negative orthant, Vavasis showed that even computing the non-negative rank is NP-hard [26]. On the positive side, Arora et al. propose an algorithm for computing k-dimensional NMFs whose running time is polynomial in the dimensions of the matrix, provided the rank parameter k is held constant [1].

Despite the hardness of NMF, a wide range of numerical algorithms for computing (approximate) NMFs have been developed and deployed across many data-analytic applications. Most, if not all, algorithmic approaches are based on the principle of alternating minimization. One of the most prominent approaches for computing NMFs is the Multiplicative Update (MU) algorithm proposed by Lee and Seung [19]. The update scheme is based on the majorization-minimization framework [17], and it operates by performing pointwise scaling by carefully selected non-negative weights. In fact, it is the simplicity of the MU scheme that drives its popularity. Other alternative methods include variants of projected gradient methods and coordinate descent.

There is also a wide range of methods for computing PSDMFs, which operate in a similar vein by alternating minimization. In particular, Vandaele et al. propose algorithmic approaches based on the projected gradient method and coordinate descent [24], while the authors of [14, 15, 16] develop algorithms by drawing connections between PSDMFs and the affine rank minimization and phase retrieval problems from the signal processing literature. Recently, Soh and Varvitsiotis introduced the Matrix Multiplicative Update (MMU) method, which is the analogue of the MU scheme for computing PSDMFs, in which updates are performed by congruence scaling with appropriately chosen PSD matrices [22]. In particular, the MMU scheme retains the simplicity that the MU scheme for NMF offers.

Contributions.

In this work, we introduce the symmetric-cone multiplicative update (SCMU) algorithm for computing cone factorizations 1 in the case where the cone 𝒦 is symmetric; that is, it is self-dual and homogeneous (i.e., the automorphism group of 𝒦 acts transitively on its interior). The SCMU algorithm corresponds to Lee-Seung's multiplicative update algorithm [19] for NMF when 𝒦 is the non-negative orthant, and to the Matrix Multiplicative Update algorithm in [22] for PSDMF when 𝒦 is the cone of PSD matrices.

The SCMU scheme is based on the principles of the majorization-minimization framework, and is multiplicative in the sense that iterates are updated by applying an appropriately chosen automorphism of the interior of the cone 𝒦. As a result, the SCMU algorithm ensures that iterates remain in the interior of the cone 𝒦, provided it is initialized in the interior. In terms of performance guarantees for the SCMU algorithm, we prove that the Euclidean squared loss is non-increasing along its trajectories, and moreover we show that fixed points of our scheme correspond to first-order stationary points. Additionally, if the starting factorization lies in a direct sum of simple symmetric cones, the direct sum structure is preserved throughout the execution of the SCMU algorithm. As such, the SCMU algorithm also specifies a method for computing hybrid cone factorizations.

In terms of applications, when applied to computing cone factorizations of slack matrices, the SCMU algorithm provides a practical way to compute conic lifts of convex sets – for instance, it can be used to compute approximate SOCP lifts when 𝒦 is chosen to be a (product of) second-order cone(s). In particular, we give explicit examples where we apply the SCMU algorithm to compute SOCP lifts of regular polygons.

Paper Organization.

In Section 2, we provide necessary background material concerning Euclidean Jordan algebras and symmetric cones. In Section 3, we describe our approach and derive the multiplicative update algorithm for symmetric-cone factorizations. In Section 4, we show that the Euclidean squared loss is non-increasing along the algorithm's trajectories and that fixed points satisfy the first-order optimality conditions. In Section 5, we conclude with numerical experiments focusing on lifts over the second-order cone.

2 Euclidean Jordan Algebras and Symmetric Cones

In this section, we provide brief background on symmetric cones to describe our algorithm and our analysis. Our discussion requires the formal language of Euclidean Jordan algebras (EJAs) from which symmetric cones arise. For further details and omitted proofs, we refer the reader to [5, 25].

Jordan Algebras. Let 𝒥 be a finite-dimensional vector space endowed with a bilinear product ∘ : 𝒥 × 𝒥 → 𝒥. We say that the pair (𝒥, ∘) forms a Jordan algebra if the following properties hold for all x, y ∈ 𝒥:

x ∘ y = y ∘ x (commutativity), and x² ∘ (x ∘ y) = x ∘ (x² ∘ y) (the Jordan identity).

Here, we use the shorthand notation x² = x ∘ x. In the remainder of this paper, we assume that the Jordan algebra has an identity element; that is, there exists e ∈ 𝒥 such that x ∘ e = x for all x ∈ 𝒥.

Euclidean Jordan Algebras. A Jordan algebra (𝒥, ∘) over ℝ which is equipped with an associative inner product ⟨·,·⟩ (i.e., ⟨x ∘ y, z⟩ = ⟨y, x ∘ z⟩ for all x, y, z ∈ 𝒥) is called Euclidean.

Spectral decomposition and powers. Our algorithm requires the computation of square-roots as well as inverses of Jordan Algebra elements. To explain how these are defined, we require a spectral decomposition theorem for EJAs. In what follows, we describe a version of this theorem based on Jordan frames (see Type-II spectral decomposition [5, Theorem III.1.2]).

More concretely, an idempotent is an element c ∈ 𝒥 satisfying c² = c. We say that an idempotent is primitive if it is nonzero and cannot be written as a sum of two nonzero idempotents. We say that a collection {c_1, …, c_r} of primitive idempotents forms a Jordan frame if they satisfy c_i ∘ c_j = 0 if and only if i ≠ j, and c_1 + ⋯ + c_r = e. It follows that distinct elements of a Jordan frame are orthogonal with respect to the inner product ⟨·,·⟩.

For every x ∈ 𝒥 there exists a Jordan frame {c_1, …, c_r} and real numbers λ_1, …, λ_r with x = λ_1c_1 + ⋯ + λ_rc_r. Here, the value r is called the rank of 𝒥. The scalars λ_i are called the eigenvalues of x and – up to re-ordering and accounting for multiplicities – are uniquely specified.

Given x ∈ 𝒥 with spectral decomposition x = Σᵢ λᵢcᵢ, we define its p-th power as x^p = Σᵢ λᵢ^p cᵢ whenever λᵢ^p exists for all i. We say that an element is invertible if all its eigenvalues are nonzero.
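For the second-order-cone EJA (summarized in Table 1 below), the spectral decomposition is fully explicit, and powers can be computed directly. The following sketch (our own, assuming the convention x = (x₀, x̄) with identity e = (1, 0, …, 0)) verifies that the Jordan-algebraic square root and inverse behave as expected:

```python
import numpy as np

# Spectral decomposition in the second-order-cone EJA on R^n:
# eigenvalues x0 ± ||xbar||, Jordan frame (1/2)(1, ±xbar/||xbar||).
def soc_spectral(x):
    x0, xbar = x[0], x[1:]
    nrm = np.linalg.norm(xbar)
    w = xbar / nrm if nrm > 0 else np.zeros_like(xbar)
    c_plus = 0.5 * np.concatenate(([1.0], w))
    c_minus = 0.5 * np.concatenate(([1.0], -w))
    return (x0 + nrm, x0 - nrm), (c_plus, c_minus)

def soc_power(x, p):
    (lp, lm), (cp, cm) = soc_spectral(x)
    return lp**p * cp + lm**p * cm      # defined whenever lp**p, lm**p exist

def jp(a, b):                           # Jordan product (a·b, a0*bbar + b0*abar)
    return np.concatenate(([a @ b], a[0] * b[1:] + b[0] * a[1:]))

x = np.array([2.0, 0.5, -0.3])          # interior point: 2 > ||(0.5, -0.3)||
r = soc_power(x, 0.5)                   # Jordan-algebraic square root
print(np.allclose(jp(r, r), x))                          # r∘r = x
print(np.allclose(jp(soc_power(x, -1.0), x), np.eye(3)[0]))  # x^{-1}∘x = e
```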

The canonical trace inner product. In our description of an EJA so far, we have only assumed the existence of an associative inner product ⟨·,·⟩. There is in fact a canonical choice: Given the spectral decomposition x = Σᵢ λᵢcᵢ, define the trace tr(x) to be the sum of the eigenvalues λᵢ. Then, the bilinear mapping (x, y) ↦ tr(x ∘ y) is positive definite, symmetric, and also satisfies the associativity property (see, for instance, [5, Proposition II.4.3]). Furthermore, in the case where the EJA is simple (i.e., it cannot be expressed as the direct sum of smaller EJAs), any associative inner product is a positive scalar multiple of the canonical one ([5, Proposition III.4.1]). In the remainder of this paper we use the notation ⟨·,·⟩ to denote the canonical EJA inner product.

The Lyapunov transformation and the quadratic representation. Consider an EJA (𝒥, ∘). As the algebra product is bilinear, given any x ∈ 𝒥, there is a linear map L_x satisfying L_x y = x ∘ y for all y ∈ 𝒥. We note that it is fairly common in the literature to express the defining properties of an EJA in terms of the Lyapunov operator L_x. For instance, the Jordan identity is equivalent to the requirement that the operator L_x commutes with L_{x²}, while the associativity of the inner product is equivalent to the requirement ⟨L_x y, z⟩ = ⟨y, L_x z⟩, i.e., the operator L_x is symmetric with respect to the trace inner product.

The quadratic representation of x ∈ 𝒥 is defined by P_x = 2L_x² − L_{x²}. The term quadratic alludes to the fact that the map x ↦ P_x is homogeneous of degree two, i.e., P_{λx} = λ²P_x. The quadratic representation satisfies the following properties:

(2)
(3)
(4)
(5)
(6)
(7)
(8)
(9)
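As a concrete sanity check of the definition P_x = 2L_x² − L_{x²}: in the EJA of real symmetric matrices with Jordan product U ∘ X = (UX + XU)/2 (treated in detail in Section 3.1 below), the quadratic representation evaluates to the congruence map X ↦ UXU. The following sketch (ours) verifies this numerically:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
U = rng.standard_normal((n, n)); U = (U + U.T) / 2
X = rng.standard_normal((n, n)); X = (X + X.T) / 2

L = lambda A, Y: (A @ Y + Y @ A) / 2            # Lyapunov operator L_A
P_U_X = 2 * L(U, L(U, X)) - L(U @ U, X)         # (2 L_U^2 - L_{U^2}) applied to X

print(np.allclose(P_U_X, U @ X @ U))            # P_U(X) = U X U
```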

Cone of squares and symmetric cones. Given an inner product space (V, ⟨·,·⟩), we say that a cone 𝒦 ⊆ V is symmetric if:

  1. 𝒦 is self-dual; i.e., 𝒦 = 𝒦^* := {y ∈ V : ⟨x, y⟩ ≥ 0 for all x ∈ 𝒦}, and

  2. 𝒦 is homogeneous; i.e., given x, y ∈ int 𝒦, there is an invertible linear map A such that A(𝒦) = 𝒦 and A(x) = y (i.e., the automorphism group of 𝒦 acts transitively on int 𝒦).

Given an EJA (𝒥, ∘), the cone of squares is defined as the set

𝒦_𝒥 = {x ∘ x : x ∈ 𝒥}.

The set 𝒦_𝒥 is self-dual with respect to the inner product ⟨x, y⟩ = tr(x ∘ y), and hence closed and convex.

The key connection between symmetric cones and EJAs is that all symmetric cones arise as cones of squares of some EJA; see e.g., [5, Theorem III.3.1]. In particular, the homogeneity property of the cone of squares follows from the existence of a scaling point; i.e., for x, y ∈ int 𝒦 there exists w ∈ int 𝒦 such that

P_w(x) = y.   (10)

As both x, y ∈ int 𝒦, it follows by 9 that the associated quadratic representations are positive definite. Consequently, their composition is also positive definite and thus invertible. By 7, P_w defines an automorphism of int 𝒦.

Classification of EJAs and Symmetric Cones. There is in fact a complete classification of all EJAs. Let ℂ, ℍ, 𝕆 denote the complex numbers, quaternions, and octonions respectively. Then every finite-dimensional EJA is isomorphic to a direct sum of these simple EJAs:

  1. Symmetric n×n matrices over ℝ, or Hermitian n×n matrices over ℂ or ℍ, with product X ∘ Y = (XY + YX)/2.

  2. The space ℝ × ℝ^{n−1} with product (x₀, x̄) ∘ (y₀, ȳ) = (x₀y₀ + ⟨x̄, ȳ⟩, x₀ȳ + y₀x̄).

  3. The space of 3-by-3 Hermitian matrices over 𝕆 with product X ∘ Y = (XY + YX)/2.

Subsequently, since all symmetric cones arise as cones of squares of some EJA, it follows that any symmetric cone corresponds to a direct sum of:

  1. Symmetric PSD matrices over ℝ, or Hermitian PSD matrices over ℂ or ℍ;

  2. Second-order cones, i.e., SOC_n = {(x₀, x̄) ∈ ℝ × ℝ^{n−1} : ‖x̄‖₂ ≤ x₀};

  3. 3-by-3 PSD matrices over the octonions 𝕆.

In Table 1, we summarize the basic information for the EJA corresponding to the cone of positive semidefinite matrices and the second order cone.

The Geometric Mean. Given a symmetric cone 𝒦, the (metric) geometric mean a # b of two elements a, b ∈ int 𝒦 is defined as the scaling point that takes a⁻¹ to b [23], i.e.,

a # b = P_{a^{1/2}}(P_{a^{−1/2}} b)^{1/2}.   (11)

As established in 10, the geometric mean satisfies:

P_{a#b}(a⁻¹) = b.   (12)

In fact, it turns out that the geometric mean is the unique element of int 𝒦 satisfying 12. Indeed, suppose w ∈ int 𝒦 satisfies P_w(a⁻¹) = b. Using that x² = P_x(e) and the fundamental identity P_{P_x(y)} = P_x P_y P_x, we then have

(P_{a^{−1/2}} w)² = P_{a^{−1/2}} P_w P_{a^{−1/2}}(e) = P_{a^{−1/2}} P_w(a⁻¹) = P_{a^{−1/2}}(b).   (13)

As P_{a^{−1/2}} w and P_{a^{−1/2}}(a # b) are both elements of int 𝒦 (recall 7), the equality in 13 implies that P_{a^{−1/2}} w = (P_{a^{−1/2}} b)^{1/2} = P_{a^{−1/2}}(a # b), and thus w = a # b.
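Specialized to positive definite matrices (see Section 3.1), the geometric mean has a familiar closed form, and property 12 can be checked numerically. A small sketch using SciPy (our own illustration):

```python
import numpy as np
from scipy.linalg import sqrtm

# For positive definite A, B: A # B = A^{1/2} (A^{-1/2} B A^{-1/2})^{1/2} A^{1/2},
# and the scaling-point property 12 reads (A#B) A^{-1} (A#B) = B.
rng = np.random.default_rng(3)
n = 4
mk = lambda: (lambda G: G @ G.T + n * np.eye(n))(rng.standard_normal((n, n)))
A, B = mk(), mk()

Ah = np.real(sqrtm(A)); Ahi = np.linalg.inv(Ah)
G = Ah @ np.real(sqrtm(Ahi @ B @ Ahi)) @ Ah     # geometric mean A # B

print(np.allclose(G @ np.linalg.inv(A) @ G, B))  # P_{A#B}(A^{-1}) = B
```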

Finally, the geometric mean satisfies the following useful properties (e.g. see [23]):

(14)
Cone of squares:        PSD matrices 𝕊₊ⁿ                 | Second-order cone SOC_n
Ambient dimension:      n(n+1)/2                          | n
Identity:               I_n                               | e = (1, 0, …, 0)
Trace:                  tr(X) (sum of eigenvalues)        | 2x₀
Rank:                   n                                 | 2
Eigendecomposition:     spectral decomposition X = Σᵢ λᵢuᵢuᵢᵀ | x = (x₀ + ‖x̄‖)c₊ + (x₀ − ‖x̄‖)c₋, c± = ½(1, ±x̄/‖x̄‖)
Table 1: Summary of key properties of the EJAs associated to the cone of positive semidefinite matrices and the second-order cone.

3 Deriving the Symmetric-cone Multiplicative Update Algorithm

Let 𝒦 be a symmetric cone, and let X ∈ ℝ^{m×n}_+ be an entrywise non-negative matrix. Our goal is to compute vectors {a_1, …, a_m} and {b_1, …, b_n} belonging to the cone 𝒦 such that X_ij ≈ ⟨a_i, b_j⟩ for all i ∈ [m], j ∈ [n].

More formally, we frame our problem as a minimization instance over the squared loss objective:

min Σ_{i,j} (X_ij − ⟨a_i, b_j⟩)²  subject to  a_i ∈ 𝒦, b_j ∈ 𝒦.   (15)

In order to (approximately) solve 15, we alternate between minimizing over the a_i's and the b_j's. Consider the sub-problem corresponding to fixing the variables {b_j} and minimizing over the {a_i}. The objective 15 is separable in the a_i's. After dropping the suffix i to simplify our notation, the problem simplifies to

min_{a ∈ 𝒦} f(a) := ‖x − 𝒜(a)‖²,   (16)

where x ∈ ℝⁿ is a fixed vector (the corresponding row of X) and 𝒜 is the linear mapping 𝒜(a) = (⟨b_1, a⟩, …, ⟨b_n, a⟩)ᵀ.

We solve the convex optimization problem 16 via the Majorization-Minimization (MM) approach (see, e.g., [17] and references therein). In order to solve a problem using the MM framework, we need to identify a family of auxiliary functions g(·; a), indexed by a ∈ 𝒦, satisfying these conditions:

g(ξ; a) ≥ f(ξ) for all ξ, and g(a; a) = f(a).   (17)

The MM update scheme is given by a⁺ ∈ argmin_ξ g(ξ; a). Subsequently, the objective is non-increasing along the trajectory of the iterates, which follows easily from the inequalities

f(a⁺) ≤ g(a⁺; a) ≤ g(a; a) = f(a).

Recall that in our setting, the goal is to minimize the function f(a) = ‖x − 𝒜(a)‖² over the symmetric cone 𝒦. As the objective is quadratic, the Taylor expansion of f at the past iterate a is exact:

f(ξ) = f(a) + ⟨∇f(a), ξ − a⟩ + ⟨ξ − a, 𝒜*𝒜(ξ − a)⟩,   (18)

where ∇f(a) = 2𝒜*(𝒜(a) − x), and 𝒜* is the adjoint of 𝒜 given by 𝒜*(x) = Σ_l x_l b_l.

Suppose we restrict our search to quadratic auxiliary functions of the form

g(ξ; a) = f(a) + ⟨∇f(a), ξ − a⟩ + ⟨ξ − a, Λ(ξ − a)⟩,   (19)

for some appropriate positive definite linear map Λ. Using such an auxiliary function, to compute the next iterate we need to minimize the function in 19 over the cone:

a⁺ = argmin_{ξ ∈ 𝒦} g(ξ; a).   (20)

Note that although the objective 19 is strictly convex – choosing Λ = P_u⁻¹ for some u ∈ int 𝒦 ensures that Λ is positive definite (recall 9) – it is not immediately clear how one handles the constraint ξ ∈ 𝒦. Instead, we show that it is possible to make a specific choice of u such that the minimizer of the unconstrained auxiliary problem 20 lies in the interior of 𝒦. In such a setting, we would have

argmin_{ξ ∈ 𝒦} g(ξ; a) = argmin_{ξ ∈ 𝒥} g(ξ; a),   (21)

and hence we can simply calculate a⁺ as the unconstrained minimum of 19, namely

a⁺ = a − ½ P_u ∇f(a) = a − P_u(𝒜*𝒜(a)) + P_u(𝒜*(x)).   (22)

Furthermore, suppose we pick u so that

P_u(𝒜*𝒜(a)) = a.   (23)

Then, the MM update is given by a⁺ = P_u(𝒜*(x)).

Suppose that the entries of x are non-negative, and that the vector 𝒜*(x) = Σ_l x_l b_l is non-zero. As x ≥ 0 and b_l ∈ 𝒦, the vector 𝒜*(x) is a non-negative linear sum of elements in 𝒦, and hence belongs to 𝒦; from 7, it follows that a⁺ = P_u(𝒜*(x)) remains in 𝒦 so long as u ∈ int 𝒦.

The equation 23 specifies u uniquely; specifically, 23 states that u is the scaling point taking z := 𝒜*𝒜(a) to a, which by the scaling point interpretation of the geometric mean (recall 12) implies that

u = a # z⁻¹ = a # (𝒜*𝒜(a))⁻¹.   (24)

Note that for 24 to exist, we require a and 𝒜*𝒜(a) to be in int 𝒦.

Finally, to check that the function given in 19 corresponding to the choice 24 of u is indeed an auxiliary function, we need to verify that the two conditions 17 are satisfied. The proof of this fact is deferred to the next section and is the main technical result in this paper.

Summarizing the preceding discussion, employing the MM approach to minimize the function f over the symmetric cone 𝒦 using an auxiliary function of the form 19 with u = a # (𝒜*𝒜(a))⁻¹ leads to the following update rule:

a ← P_{a # (𝒜*𝒜(a))⁻¹}(𝒜*(x)).

In a similar fashion, we get an update rule for the b_j's when the a_i's are fixed, where 𝒜 is replaced by the corresponding linear map built from the a_i's.

We summarize our procedure for computing factorizations over symmetric cones in Algorithm 1.

Input: A non-negative matrix X ∈ ℝ^{m×n}_+
Output: Vectors {a_i}, {b_j} ⊆ 𝒦 with X_ij ≈ ⟨a_i, b_j⟩

  While stopping criterion not satisfied:
    update each a_i ← P_{a_i # (𝒜*𝒜(a_i))⁻¹}(𝒜*(x_i)), where x_i is the i-th row of X;
    update each b_j analogously, with the roles of the a's and b's interchanged.
Algorithm 1 Symmetric-Cone Multiplicative Update (SCMU) algorithm
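The following is a minimal sketch (our own code and helper names, not a reference implementation) of a single SCMU update over the second-order cone, assembled from the Jordan-algebraic ingredients of Section 2; it is consistent with the orthant and PSD specializations given next. We use the Euclidean inner product, which is a positive multiple of the canonical one on this simple EJA:

```python
import numpy as np

def jp(a, b):                       # Jordan product x∘y = (x·y, x0*ybar + y0*xbar)
    return np.concatenate(([a @ b], a[0] * b[1:] + b[0] * a[1:]))

def soc_power(x, p):                # powers via the spectral decomposition
    x0, xbar = x[0], x[1:]
    nrm = np.linalg.norm(xbar)
    w = xbar / nrm if nrm > 0 else np.zeros_like(xbar)
    cp = 0.5 * np.concatenate(([1.0], w))
    cm = 0.5 * np.concatenate(([1.0], -w))
    return (x0 + nrm) ** p * cp + (x0 - nrm) ** p * cm

def quad(u, x):                     # quadratic representation P_u = 2L_u^2 - L_{u^2}
    return 2 * jp(u, jp(u, x)) - jp(jp(u, u), x)

def geo_mean(a, b):                 # a # b = P_{a^{1/2}} (P_{a^{-1/2}} b)^{1/2}
    return quad(soc_power(a, 0.5),
                soc_power(quad(soc_power(a, -0.5), b), 0.5))

def scmu_update(a, Bs, x):
    """One multiplicative update of a single factor a, the b_l's being fixed."""
    z = sum(np.dot(b, a) * b for b in Bs)    # z = A*(A(a))
    s = sum(xl * b for xl, b in zip(x, Bs))  # s = A*(x)
    return quad(geo_mean(a, soc_power(z, -1.0)), s)   # P_{a # z^{-1}}(A*(x))

# Tiny demo: the squared loss is non-increasing under the update.
rng = np.random.default_rng(6)
mk = lambda: np.concatenate(([2.0], rng.uniform(-0.5, 0.5, 2)))  # interior of SOC_3
Bs = [mk() for _ in range(5)]
a, x = mk(), rng.random(5)
loss = lambda a: sum((xl - np.dot(b, a)) ** 2 for xl, b in zip(x, Bs))
print(loss(a), loss(scmu_update(a, Bs, x)))
```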

3.1 Two important special cases

First, we specialize the SCMU algorithm to the setting 𝒦 = ℝ^k_+, in which case the SCMU algorithm gives an iterative method for computing NMFs. The non-negative orthant is the cone of squares of the EJA (ℝ^k, ∘), where the Jordan product is componentwise multiplication, i.e., x ∘ y = (x_1y_1, …, x_ky_k). Moreover, the trace of x is the sum of its entries (its 1-norm on the orthant), the Lyapunov transformation is L_x = Diag(x), and the quadratic mapping is P_x = Diag(x)². Finally, the metric geometric mean 11 of a, b ∈ int ℝ^k_+ is given componentwise by (a # b)_i = √(a_ib_i).
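A quick check of these orthant formulas (illustrative, not code from the paper):

```python
import numpy as np

rng = np.random.default_rng(4)
a, b = rng.random(5) + 0.1, rng.random(5) + 0.1   # interior of the orthant

g = np.sqrt(a * b)                                # a # b, entrywise
print(np.allclose(np.diag(a) @ np.diag(a) @ b, a * a * b))  # P_a(b) = a^2 ⊙ b
print(np.allclose(g * (1 / a) * g, b))            # P_{a#b}(a^{-1}) = b, cf. 12
```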

Putting everything together, the k-th coordinate of the vector a_i is updated by:

(a_i)_k ← (a_i)_k · (Σ_j X_ij (b_j)_k) / (Σ_j ⟨a_i, b_j⟩ (b_j)_k).   (25)

These updates correspond to the multiplicative update rule introduced by Lee and Seung in [19], which is one of the most widely used approaches for calculating NMFs.
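For reference, a minimal NumPy implementation of the updates 25 in matrix form (a sketch with hypothetical shapes and an eps guard on the divisions, not the authors' reference code):

```python
import numpy as np

def mu_nmf(X, k, iters=200, eps=1e-12, seed=0):
    rng = np.random.default_rng(seed)
    m, n = X.shape
    A = rng.random((m, k)) + 0.1                  # rows play the role of the a_i
    B = rng.random((k, n)) + 0.1                  # columns play the role of the b_j
    for _ in range(iters):
        A *= (X @ B.T) / (A @ B @ B.T + eps)      # update the a_i's, B fixed
        B *= (A.T @ X) / (A.T @ A @ B + eps)      # update the b_j's, A fixed
    return A, B

X = np.abs(np.random.default_rng(1).standard_normal((8, 6)))
A, B = mu_nmf(X, k=4)
print(np.linalg.norm(X - A @ B))                  # loss is non-increasing in iters
```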

Next, we specialize the SCMU algorithm for cone factorizations with PSD factors (with real entries). Letting 𝕊ⁿ denote the space of real symmetric n×n matrices, the (real) PSD cone 𝕊₊ⁿ is the cone of squares of the EJA (𝕊ⁿ, ∘), where the Jordan product is given by U ∘ V = (UV + VU)/2. In this setting, the Lyapunov operator and the quadratic representation are superoperators (i.e., linear operators acting on a vector space of linear operators); identifying a matrix with its vectorization, they are concretely given by

L_U = ½(U ⊗ I + I ⊗ U) and P_U = U ⊗ U,

where vec is the vectorization operator and ⊗ the Kronecker product. Using that (U ⊗ U)vec(V) = vec(UVU), we get

P_U(V) = UVU.   (26)

Moreover, the trace is just the usual trace of a symmetric matrix (i.e., the sum of its eigenvalues). Lastly, using 26, the metric geometric mean 11 specializes to

A # B = A^{1/2}(A^{−1/2} B A^{−1/2})^{1/2} A^{1/2},

the usual geometric mean of two positive definite matrices, e.g. see [3]. Putting everything together, setting Z_i = Σ_j tr(A_iB_j)B_j and S_i = Σ_j X_ij B_j, the SCMU algorithm in the case 𝒦 = 𝕊₊ⁿ specializes to:

A_i ← (A_i # Z_i⁻¹) S_i (A_i # Z_i⁻¹).

This is exactly the Matrix Multiplicative Update method derived in [22].
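A hedged sketch of one such update for a single PSD factor A with the B_j's fixed (function and variable names are ours, not from [22]):

```python
import numpy as np
from scipy.linalg import sqrtm

def geo_mean(A, B):                               # A # B for positive definite A, B
    Ah = np.real(sqrtm(A)); Ahi = np.linalg.inv(Ah)
    return Ah @ np.real(sqrtm(Ahi @ B @ Ahi)) @ Ah

def mmu_update(A, Bs, x):
    Z = sum(np.trace(A @ Bj) * Bj for Bj in Bs)   # Z = Σ_j tr(A B_j) B_j
    S = sum(xj * Bj for xj, Bj in zip(x, Bs))     # S = Σ_j x_j B_j
    G = geo_mean(A, np.linalg.inv(Z))             # scaling point A # Z^{-1}
    return G @ S @ G                              # congruence keeps PSD-ness

rng = np.random.default_rng(5)
mk = lambda: (lambda M: M @ M.T + np.eye(3))(rng.standard_normal((3, 3)))
Bs = [mk() for _ in range(4)]
A, x = mk(), rng.random(4)
loss = lambda A: sum((xj - np.trace(A @ Bj)) ** 2 for xj, Bj in zip(x, Bs))
print(loss(A), loss(mmu_update(A, Bs, x)))        # squared loss non-increasing
```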

4 Performance Guarantees of the SCMU Algorithm

Our first result shows that, given a symmetric cone 𝒦, the function g defined in 19 does indeed parameterize a family of auxiliary functions for f over 𝒦. To do so, we need to verify that the two properties given in 17 do indeed hold. We obviously have that g(a; a) = f(a), and hence it remains to verify the domination property, namely that

g(ξ; a) ≥ f(ξ) for all ξ ∈ 𝒦.   (27)

In fact, we show in the next theorem that this bound holds for all ξ ∈ 𝒥.

Theorem 4.1.

Let b_1, …, b_n ∈ 𝒦 and define 𝒜 to be the linear map

𝒜(v) = (⟨b_1, v⟩, …, ⟨b_n, v⟩)ᵀ,

where v ∈ 𝒥. Furthermore, let a ∈ int 𝒦 with z := 𝒜*𝒜(a) ∈ int 𝒦, and let u = a # z⁻¹.

Then, we have that

𝒜*𝒜 ⪯ P_u⁻¹.   (28)

In particular, this immediately implies that the domination property holds:

g(ξ; a) ≥ f(ξ) for all ξ ∈ 𝒥.   (29)
Proof.

We first focus on simple EJAs. We need to show the validity of the generalized operator inequality 28 with respect to the given inner product. Nevertheless, as for simple EJAs any associative inner product is a positive multiple of the canonical one, it suffices to prove 28 for the canonical inner product.

The proof of 28 is broken down into two steps. First, we show that it suffices to consider the special case where u = e. More precisely, we first prove the inequality

(30)

for all such that . Assuming that 30 holds, we let , and let . Then

Thus, by 11 we have

By 30 we have . Consequently, we have

where in the second equality we used 4. To conclude the proof, we prove 30 by showing the following two properties in Lemma 4.2 and Lemma 4.3 respectively:

(31)
(32)

Assuming the validity of 31 and 32, it is now easy to conclude that 30 holds. By definition of the map and 14 we have

Consequently, 30 is equivalent to

(33)

Lastly, the proof of 33 follows from the following chain of inequalities:

where for the first inequality we use 32, for the second equality we use property 3 of the quadratic representation, and for the last inequality we use 31.

Lastly, we consider the case where the EJA is a direct sum of simple ones. In this case, the cone of squares is a direct sum of simple symmetric cones, i.e., 𝒦 = 𝒦_1 ⊕ ⋯ ⊕ 𝒦_s, and the Jordan product acts blockwise on the corresponding decomposition of 𝒥. Then, it is easy to check that the relevant operators are separable with respect to the blocks corresponding to these individual simple EJAs. Thus, if the operator inequality 28 holds for each individual block, it holds for the full-sized operators. ∎

Next we proceed with the proof of 31.

Lemma 4.2.

For any we have that

Proof.

We will show that

For any EJA the function

is jointly concave for any fixed choice of the remaining data [8, Theorem 3.1]. This is the extension of Lieb's Concavity Theorem to the more general setting of EJAs. Also, by [8, Lemma 3.1] we have

and thus, it follows that for any fixed the mapping

is concave; i.e., for any and we have

Setting and , and using that we get

which is exactly 31. ∎

Next we proceed with the proof of 32.

Lemma 4.3.

For any we have that

Proof.

For this, note that: