
Efficient algorithms for tensor scaling, quantum marginals and moment polytopes

We present a polynomial time algorithm to approximately scale tensors of any format to arbitrary prescribed marginals (whenever possible). This unifies and generalizes a sequence of past works on matrix, operator and tensor scaling. Our algorithm provides an efficient weak membership oracle for the associated moment polytopes, an important family of implicitly-defined convex polytopes with exponentially many facets and a wide range of applications. These include the entanglement polytopes from quantum information theory (in particular, we obtain an efficient solution to the notorious one-body quantum marginal problem) and the Kronecker polytopes from representation theory (which capture the asymptotic support of Kronecker coefficients). Our algorithm can be applied to succinct descriptions of the input tensor whenever the marginals can be efficiently computed, as in the important case of matrix product states or tensor-train decompositions, widely used in computational physics and numerical mathematics. We strengthen and generalize the alternating minimization approach of previous papers by introducing the theory of highest weight vectors from representation theory into the numerical optimization framework. We show that highest weight vectors are natural potential functions for scaling algorithms and prove new bounds on their evaluations to obtain polynomial-time convergence. Our techniques are general, and we believe that they will be instrumental in obtaining efficient algorithms for moment polytopes beyond the ones considered here, and more broadly, for other optimization problems possessing natural symmetries.



1 Introduction and summary of results

1.1 Moment polytopes

As this paper is quite technical, with some material that is non-standard for computer scientists, we begin by motivating the main object we study, the moment polytope, which is extremely natural from an optimization perspective. Consider first the following diverse set of problems, and try to pick out common features among them (besides the obvious guess that they are all special cases of the framework we consider in this paper).

  1. The Schur-Horn Theorem: Can a given Hermitian matrix be conjugated by unitary matrices to achieve a given diagonal?

  2. Eigenvalues of sums of Hermitian matrices: Do there exist Hermitian matrices $A$, $B$, $C$ with prescribed eigenvalues such that $A + B = C$?

  3. Optimization: Can a given non-negative matrix be converted to another with prescribed row and column sums, by only reweighting its rows and columns?

  4. Quantum information: Can multiple parties, each holding a particle of a pure quantum state, locally transform their particles so that each particle is maximally entangled with the others?

  5. Analytic inequalities: Given linear maps $B_1, \ldots, B_m$ (with $B_j : \mathbb{R}^n \to \mathbb{R}^{n_j}$) and exponents $p_1, \ldots, p_m \ge 0$, does there exist a finite constant $C$ such that for all integrable functions $f_j : \mathbb{R}^{n_j} \to \mathbb{R}_{\ge 0}$ we have

    $\int_{\mathbb{R}^n} \prod_{j=1}^m f_j(B_j x)^{p_j}\, dx \;\le\; C \prod_{j=1}^m \Bigl( \int_{\mathbb{R}^{n_j}} f_j \Bigr)^{p_j}$?

    (These are the celebrated Brascamp-Lieb inequalities, which capture many other important inequalities such as Hölder's, Loomis-Whitney, and many more; see for instance [GGOW17] for a more detailed discussion.) An important special case of this framework is Cauchy-Schwarz, with $m = 2$ and $p_1 = p_2 = 1/2$.

  6. Algebraic complexity: Given an arithmetic formula (with inversion gates) in non-commuting variables, is it non-zero?

  7. Polynomial support: Given oracle access to a homogeneous polynomial $p$ with non-negative integer coefficients in $n$ variables, is a specified monomial (given as an integer vector of exponents) in the Newton polytope of $p$? (Given a polynomial $p$, define its support as the set of monomials whose coefficient in $p$ is nonzero. The Newton polytope of $p$ is the convex hull of the exponent vectors of these monomials.)

Some of the problems above are in $\mathsf{P}$, and for the others there are sufficient hints that they may be in $\mathsf{P}$ as well (see [Hor54, LSW98, BGO17, GGOW17, GGOW16, Gur05]). While they may seem non-linear in their inputs, convexity plays an important role in each of them, as they all reduce to solving linear programs (implicitly defined, with a large number of facets). More specifically, each input to each problem defines a point and a polytope, and the answer is yes iff the point is in the polytope. These polytopes turn out to be special cases of moment polytopes.

This appearance of linearity and convexity is quite surprising, in some settings more so than others. Indeed, moment polytopes arise (and are used to understand problems) in many diverse settings such as symplectic geometry, algebraic geometry, lattice theory and others [GS, dS08]. The snag is that these polytopes are often defined by a huge number of inequalities (e.g., see [GGOW17]); typically the number is exponential or larger in the dimension of the input. (Even so, in many of these areas mere finiteness constitutes progress, as even decidability may not be obvious.) This motivates our efforts to develop efficient algorithms for them.

In order to explain the appearance of convex polytopes in these settings, we need to notice another common aspect of all the problems above: their answers remain invariant under some group action! This is easy to see in some of the examples, which explicitly specify the groups. In the first, for matrices of size $n$, it is $\mathrm{U}(n)$, the group of unitary transformations conjugating the input. In the second, each of the three matrices may be conjugated by a unitary. In the third, it is the product of two groups of (positive) diagonal invertible matrices, which scale (respectively) the rows and columns. In the fourth problem, as each party is allowed to perform quantum measurements with post-selection, the group representing each party's operations is $\mathrm{SL}(n_i)$ if its particle has $n_i$ states, and so the full group is a direct product of these $\mathrm{SL}(n_i)$'s. The fifth problem is invariant under basis changes in the host space $\mathbb{R}^n$ and the target spaces $\mathbb{R}^{n_j}$. (This reveals the Brascamp-Lieb polytopes [BCCT08, GGOW17] as special cases of (slices of) moment polytopes, which have an efficient weak separation oracle.)

The group in the sixth problem is much harder to guess without Cohn's characterization of the free skew field, but turns out to be $\mathrm{SL}(n) \times \mathrm{SL}(n)$, acting on a different representation of the formulas. In the seventh, though it may not seem useful at first sight, the group of invertible diagonal matrices (an algebraic torus) acts by simply scaling every variable of the polynomial by a nonzero constant factor.

Having mentioned the two common features of the problems above (convexity and invariance under a group action), we will now illustrate how one can use the structure of the group action to obtain moment polytopes. Let $G$ be a “nice” group (the technical definition requires the group to be algebraic, reductive, connected and so on, but for the purpose of this paper one can think of groups like $\mathrm{GL}(n)$, $\mathrm{SL}(n)$, their direct products, etc.) acting linearly and continuously on a vector space $V$, and let $v$ be a point in $V$. The orbit of $v$ is the set of all vectors obtained by the action of $G$ on $v$. The orbit closure of $v$ is simply the closure of its orbit in the Euclidean topology. As the previous paragraph observed, all of the problems above are questions about orbit closures, which suggests that understanding orbit closures is a fundamental task with many applications. A natural approach to studying such orbit closures is to look at the infinitesimal action of the group at every point.

This brings us to the moment map, denoted by $\mu$, which is essentially the gradient of the log of the norm along the group action. (Indeed, the original name was momentum map, inspired by Hamiltonian physics, in which momentum is proportional to the derivative of position. Moment maps are common in physics, where they are used to obtain conserved quantities, i.e. invariants, from symmetries of the phase space of symplectic manifolds describing some Hamiltonian system.) In the general setting, we have the action of a continuous group on a manifold, and the moment map provides a reverse map, from the manifold to the group (or more precisely, to the dual of the Lie algebra of the group). More explicitly, for each point $v \in V$ we can define the function $f_v(g) = \log \|g \cdot v\|^2$, and $\mu(v)$ will be the gradient of $f_v$ evaluated at the identity element of $G$. The moment map carries a lot of information about the group action, and one of its striking features is that the set of possible spectra of the image of any orbit closure under the moment map is a rational convex polytope [Kos73, Ati82a, GS82a, Kir84a]! That is a mouthful, so consider an example to see what we mean. Consider the action of $\mathrm{GL}(n)$ on some vector space $V$. Then the moment map maps $V \setminus \{0\}$ to $\mathrm{Herm}(n)$ (the set of all $n \times n$ Hermitian matrices), and the collection $\{\mathrm{spec}(\mu(v))\}$, as $v$ varies over an orbit closure, forms a rational convex polytope. Here $\mathrm{spec}(A)$ denotes the vector of eigenvalues of $A$ arranged in decreasing order. Note that $\mu(v)$ is a quadratic function of $v$, so the appearance of convexity is extremely surprising and non-trivial. This polytope, which we will see more explicitly in the next section, is the so-called moment polytope of the group action on the orbit of $v$.

In the matrix scaling case (Problem 3), it turns out that the moment map applied to a matrix $A$ gives us precisely the marginals of $A$ (that is, the vector of row sums and column sums, normalized to sum to $1$). (There is a slight technicality here: the moment map actually reads off the marginals of the matrix of absolute values squared of the entries of $A$.) Thus, testing whether $A$ can be scaled to another matrix with prescribed row and column sums is equivalent to testing whether the prescribed vector of row and column sums belongs to the moment polytope of the action of pairs of diagonal matrices on $A$. Similarly, all of the seven problems listed above fit into this framework (membership in a moment polytope) for a suitable choice of group and representation, although for some of them it is non-trivial to phrase them as moment polytope problems.
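To make the parenthetical technicality concrete, here is a minimal numpy sketch (the function name and example matrix are our own illustration, not from the paper): it computes the point whose polytope membership the scaling question asks about.

```python
import numpy as np

def matrix_marginals(A):
    """Moment map for matrix scaling, per the technicality above: read off the
    row and column marginals of the entrywise squared magnitudes of A."""
    W = np.abs(A) ** 2
    W = W / W.sum()                       # normalize to total mass 1
    return W.sum(axis=1), W.sum(axis=0)   # (row marginals, column marginals)

r, c = matrix_marginals(np.array([[1.0, 2.0], [0.0, 1.0]]))
print(r, c)  # the point whose membership in the moment polytope we test
```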

The reader might notice the dual nature of the problems above: they are both of algebraic and of analytic nature. This phenomenon is extremely general and crucial for our paper. The analytic nature helps in designing algorithms, making the problem amenable to general optimization techniques, while the algebraic nature helps with the analysis of these analytic algorithms and provides potential functions to track the progress they make. We will see that this is the case for us as well.

1.2 Our setting

In this paper, we will be concerned with the moment polytopes of a natural “basis change” group action on tensors, which are of interest for several reasons. The moment polytopes in this setting capture fundamental problems in quantum many-body physics (the so-called one-body quantum marginal problem). They also capture fundamental problems in representation theory related to Kronecker coefficients, which are central objects of study in algebraic combinatorics and play an important role in geometric complexity theory. Moreover, as we will see, these moment polytopes generalize many of the settings described above, and we believe that their complexity is representative of the complexity of general moment polytopes.

These moment polytopes (and their related problems) are most natural to state from the point of view of quantum systems and their quantum marginals (which generalize the classical notion of marginals of a probability distribution on several variables), so we start by defining these first. But before we define quantum systems, some brief notation must be established.

Let $\mathrm{Ten}(n_1, \ldots, n_d) = \mathbb{C}^{n_1} \otimes \cdots \otimes \mathbb{C}^{n_d}$ denote the space of $d$-dimensional tensors of format $n_1 \times \cdots \times n_d$, and let $X$ be a tensor in $\mathrm{Ten}(n_1, \ldots, n_d)$. If we regard $X$ as a vector, with $X^\dagger$ being its conjugate transpose, then $X X^\dagger$ is a Hermitian positive semidefinite (PSD) operator on $\mathbb{C}^{n_1} \otimes \cdots \otimes \mathbb{C}^{n_d}$. We will denote by $\|X\|$ the norm of $X$ (when viewed as a vector). With this notation in mind, we then define a quantum system with $d$ subsystems as a PSD operator $\rho$ on $\mathbb{C}^{n_1} \otimes \cdots \otimes \mathbb{C}^{n_d}$ with unit trace. (A reader not familiar with the basics of quantum systems may want to skip a couple of paragraphs ahead.)

Given a quantum system $\rho$ on $\mathbb{C}^{n_1} \otimes \cdots \otimes \mathbb{C}^{n_d}$ and a subset $S \subseteq [d]$, we define its (quantum) marginals or reduced density matrices by $\rho_S = \mathrm{tr}_{S^c}(\rho)$, where $\mathrm{tr}_{S^c}$ denotes the partial trace over the tensor factors not in $S$. In the same way that $\rho$ describes the state of the entire quantum system, $\rho_S$ characterizes the state of the subsystems labeled by $S$ (in an analogous way to the classical marginal of a probability distribution). For $S = \{i\}$, we write $\rho_i := \rho_{\{i\}}$; these operators are known as the one-body marginals or one-body reduced density matrices of $\rho$. Each $\rho_i$ is uniquely characterized by the property that

$\mathrm{tr}(\rho_i M) = \mathrm{tr}\bigl(\rho\,(I \otimes \cdots \otimes M \otimes \cdots \otimes I)\bigr)$   (1)

(with $M$ in the $i$-th factor) for all operators $M$ on $\mathbb{C}^{n_i}$.

For a tensor $X$ and a given subset $S$ of the subsystems, the marginals of $X X^\dagger$ with respect to $S$ have a particularly simple description: using the standard basis, identify $X$ with an $n_S \times n_{S^c}$ matrix $X_S$, where we denote $n_S = \prod_{i \in S} n_i$. The matrix $X_S$ is known as a flattening, unfolding, or matricization [Lan12, Hac12] of the tensor $X$. Then, $\rho_S = X_S X_S^\dagger$ is its Gram matrix (normalized by $\|X\|^2$ if we want unit trace).
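Concretely, here is a minimal numpy sketch of this recipe (the helper name one_body_marginals is our own, and the GHZ-style example is only an illustration):

```python
import numpy as np

def one_body_marginals(X):
    """Compute the one-body reduced density matrices rho_1, ..., rho_d of a
    tensor X of shape (n_1, ..., n_d): rho_i is the (normalized) Gram matrix
    of the i-th flattening of X."""
    X = X / np.linalg.norm(X)  # work with a unit-norm tensor
    marginals = []
    for i in range(X.ndim):
        # Move axis i to the front and flatten the rest: an n_i x (prod n_j) matrix.
        M = np.moveaxis(X, i, 0).reshape(X.shape[i], -1)
        marginals.append(M @ M.conj().T)  # Gram matrix = i-th marginal
    return marginals

# Example: a GHZ-like diagonal unit tensor has maximally mixed marginals.
X = np.zeros((2, 2, 2)); X[0, 0, 0] = X[1, 1, 1] = 1
for rho in one_body_marginals(X):
    print(np.round(rho, 3))  # each prints 0.5 * identity
```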

Given a Hermitian operator $A$ on $\mathbb{C}^n$ (i.e., an $n \times n$ Hermitian matrix), we write $\mathrm{spec}(A) \in \mathbb{R}^n$ for the vector of eigenvalues of $A$, ordered non-increasingly. If $A$ is PSD with unit trace then its eigenvalues form a probability distribution, so $\mathrm{spec}(A)$ is an element of the set of non-increasingly ordered probability vectors in $\mathbb{R}^n$. We also abbreviate $\mathrm{spec}_i(\rho) := \mathrm{spec}(\rho_i)$. We will be particularly interested in characterizing the eigenvalues of the one-body marginals, motivated by the following fundamental problem in quantum mechanics [Kly06]:

Problem 1.1 (One-body quantum marginal problem).

Given non-increasingly ordered rational probability vectors $p_i \in \mathbb{Q}^{n_i}$ for $i \in [d]$, decide if there exists a unit-norm tensor $X \in \mathrm{Ten}(n_1, \ldots, n_d)$ such that $\mathrm{spec}(\rho_i) = p_i$ for all $i$, where $\rho = X X^\dagger$.

Remark 1.2.

Note that the above problem is equivalent to the following: given density matrices (PSD matrices with unit trace) $\rho_1, \ldots, \rho_d$, determine if there exists a tensor (pure state) $X$ whose one-body marginals are exactly $\rho_1, \ldots, \rho_d$. Since a unitary change of basis comes for free on each subsystem, only the eigenvalues of the $\rho_i$ are relevant.

The above problem is extremely fundamental from the point of view of quantum many-body physics. It is a special case of the more general quantum marginal problem, which puts constraints on the marginals of multiple overlapping subsystems and is known to be QMA-complete (for a growing number of subsystems) [Liu06].

We note that the normalization to trace one is natural: since $\mathrm{tr}\,\rho_i = \mathrm{tr}\,\rho$ for all $i$, we can simultaneously rescale all marginals simply by rescaling the tensor.

Now we discuss how Problem 1.1 can be phrased as a question about moment polytopes [NM84, Bri87, Kly06, Res10, VW17, Wal14]. Let $G = \mathrm{GL}(n_1) \times \cdots \times \mathrm{GL}(n_d)$, where $\mathrm{GL}(n)$ denotes the group of invertible $n \times n$ matrices. Then $G$ acts on $\mathrm{Ten}(n_0, n_1, \ldots, n_d)$ by

$g \cdot X = (I_{n_0} \otimes g_1 \otimes \cdots \otimes g_d)\, X, \qquad g = (g_1, \ldots, g_d) \in G.$

As the group acts by rescaling slices of the tensor, we will call any $g \cdot X$ a tensor scaling of $X$. (The extra coordinate with dimension $n_0$, on which the group does not act, can equivalently be thought of as enumerating an $n_0$-tuple of tensors in $\mathrm{Ten}(n_1, \ldots, n_d)$, with the group acting simultaneously on all the tensors in the tuple. Much of the theory remains similar if one sets $n_0 = 1$, and that can be done mentally on a first reading. In the quantum language, it is the difference between acting on pure states vs. acting on mixed states.)

What is the moment map in this setting? It turns out that the moment map captures exactly the notion of one-body quantum marginals. It is more convenient to define the moment map on projective space (namely, to restrict ourselves to tensors of unit norm), since we don't care about scalar multiples. We will denote the projective space corresponding to $\mathrm{Ten}(n_0, \ldots, n_d)$ by $\mathbb{P}(\mathrm{Ten}(n_0, \ldots, n_d))$ and identify it with the set of rank-one trace-one PSD operators $\rho = X X^\dagger / \|X\|^2$. Then the moment map can be written as (after identifying the Lie algebra of $G$ with its dual)

$\mu(X) = (\rho_1, \ldots, \rho_d) \in \mathrm{Herm}(n_1) \times \cdots \times \mathrm{Herm}(n_d),$   (2)

where $\mathrm{Herm}(n)$ denotes the space of Hermitian $n \times n$ matrices. Now consider a projective subvariety $\mathcal{X}$ of $\mathbb{P}(\mathrm{Ten}(n_0, \ldots, n_d))$, such as the full projective space itself or an orbit closure $\mathcal{X} = \overline{G \cdot X}$ for some given tensor $X$ (the closure can be taken either in the Euclidean or in the Zariski topology; in general, $\mathcal{X}$ can be any $G$-stable irreducible projective subvariety). Let us look at the collection of marginal eigenvalues when $\mu$ is restricted to tensors in $\mathcal{X}$:

$\Delta(\mathcal{X}) := \bigl\{ \bigl(\mathrm{spec}(\rho_1), \ldots, \mathrm{spec}(\rho_d)\bigr) : X \in \mathcal{X} \bigr\}.$   (3)

We emphasize again the amazing, surprising and non-trivial fact that $\Delta(\mathcal{X})$ is a rational convex polytope [NM84, Kir84b, Kir84a, Bri87], known as the moment polytope or Kirwan polytope of $\mathcal{X}$. (Note that we have identified $\mathcal{X}$ with a set of rank-one density matrices, so it is far from being a convex set; yet the spectra of its image under the moment map form a convex polytope.) This means that $\Delta(\mathcal{X})$ can in principle be given in terms of finitely many affine inequalities in the eigenvalues of the one-body marginals [Kly06, Res10, VW17]. In particular, the preceding applies to the full projective space, so we can rephrase Problem 1.1 as follows: given $p = (p_1, \ldots, p_d)$, is it a point in $\Delta(\mathbb{P}(\mathrm{Ten}(n_1, \ldots, n_d)))$? More generally, we can consider the following decision problem:

Problem 1.3 (General moment polytope).

Given $p = (p_1, \ldots, p_d)$ and a $G$-stable projective subvariety $\mathcal{X}$, decide if there exists a tensor $X \in \mathcal{X}$ such that $\mathrm{spec}(\rho_i) = p_i$ for all $i$; that is, decide if $p \in \Delta(\mathcal{X})$.

When $\mathcal{X} = \overline{G \cdot X}$ is the orbit closure of some given tensor $X$, we will abbreviate the moment polytope by $\Delta(X)$. In quantum information theory, moment polytopes of orbit closures have been called entanglement polytopes, as they characterize multipartite entanglement from the perspective of the one-body marginals [WDGC13, SOK14]. But, along with the corresponding invariant-theoretic multiplicities, they are also of interest in algebraic and geometric complexity theory [BLMW11, BI11, CDW12, CVZ17]. The corresponding decision problem is the following:

Problem 1.4 (Moment polytope of orbit closure).

Given a tensor $X$ and $p = (p_1, \ldots, p_d)$, decide if there exists $Y \in \overline{G \cdot X}$ such that $\mathrm{spec}(\rho_i(Y)) = p_i$ for all $i$.

That is, Problem 1.4 asks whether $p$ is a point in $\Delta(X)$. (When $n_0 = 1$, there is a physical interpretation of the orbit closure: $Y \in \overline{G \cdot X}$ means that $Y$ can be obtained to arbitrary precision from $X$, which is naturally understood as a $d$-partite quantum state, by a class of quantum operations known as stochastic local operations and classical communication (SLOCC) [BPR00]. SLOCC can be intuitively understood as follows: we imagine that different parties hold the different subsystems of a quantum state; SLOCC then corresponds to a sequence of local quantum operations and measurements, where we allow for post-selection on specific measurement outcomes.) Problem 1.4 then asks: given a tensor $X$, does there exist a $Y$ obtainable by a sequence of SLOCC operations from $X$ s.t. $\mathrm{spec}(\rho_i(Y)) = p_i$ for all $i$? This is a generalization of the SLOCC entanglement distillation question, where $p_i$ is the uniform distribution for all $i$.

One can show that Problem 1.4 is intimately related to Problems 1.3 and 1.1: $p \in \Delta(\mathcal{X})$ iff $p \in \Delta(X)$ for a generic $X \in \mathcal{X}$ (Corollary 2.6). We will explain this in Section 2. We will therefore focus our attention on Problem 1.4.

It is natural to go beyond the decision problem and look for an algorithm that finds a tensor $Y$ with the desired marginals, as well as the group element $g$ that transforms $X$ into $Y$. Since such a $Y$ will in general only lie in the closure of the orbit through $X$, we demand only that the marginals be correct up to some target accuracy.

Definition 1.5 (-close).

The marginals of a quantum state $\rho$ are said to be $\varepsilon$-close to $p = (p_1, \ldots, p_d)$ if $\|\mathrm{spec}(\rho_i) - p_i\|_{\ell_1} \le \varepsilon$ for $i \in [d]$. Here, $\|x\|_{\ell_1} = \sum_j |x_j|$ is the $\ell_1$-norm.

Problem 1.6 (Tensor scaling).

Given a tensor $X$, a target $p = (p_1, \ldots, p_d)$, and $\varepsilon > 0$, find $g \in G$ such that $Y = g \cdot X$ has marginals that are $\varepsilon$-close to $p$.

While it may not be immediately clear, there exist scalings as in Problem 1.6 for every $\varepsilon > 0$ if and only if the answer to Problem 1.4 is yes, i.e., if and only if $p \in \Delta(X)$.

The polytopes $\Delta(X)$ admit an alternative characterization in terms of invariant theory [NM84]. We explain this connection in Section 2, as it is central to the analysis of our algorithms. For now, we only mention an important special case. Let $g(\lambda, \mu, \nu)$ denote the Kronecker coefficients, which are fundamental objects in the classical representation theory of the symmetric and general linear groups [FH13, Sta00]. They also feature in geometric complexity theory as a potential way of creating representation-theoretic obstructions [Mul07, BLMW11]. For example, $g(\lambda, \mu, \nu)$ can be defined as the multiplicity of the irreducible $S_k$-representation $[\lambda]$ in the tensor product $[\mu] \otimes [\nu]$. Here, $\lambda$, $\mu$, and $\nu$ are partitions of $k$, which we may think of as nonincreasing vectors in $\mathbb{Z}_{\ge 0}^n$ whose entries sum to $k$. Then,

$\exists m \ge 1 : g(m\lambda, m\mu, m\nu) \neq 0 \iff \Bigl(\tfrac{\lambda}{k}, \tfrac{\mu}{k}, \tfrac{\nu}{k}\Bigr) \in \Delta\bigl(\mathbb{P}(\mathrm{Ten}(n, n, n))\bigr),$   (4)

so the solution to the one-body quantum marginal problem captures precisely the asymptotic support of the Kronecker coefficients [CM06, Kly06, CHM07]. We note that the problem of deciding whether $g(\lambda, \mu, \nu) \neq 0$ is known to be NP-hard [IMW17]. However, since the asymptotic vanishing of Kronecker coefficients is captured by the quantum marginal problem, it has been conjectured that it should have a polynomial-time algorithm, and we make progress towards this question. (We note that the closely related Littlewood-Richardson coefficients, which capture the same problem for the representations of the general linear group, satisfy the so-called saturation property: $c_{\lambda\mu}^{\nu} \neq 0$ iff $c_{k\lambda, k\mu}^{k\nu} \neq 0$ for all $k \ge 1$ [KT99]. Hence the asymptotic support is the same as the support in that case, and this is also a key ingredient in the polynomial-time algorithms for testing whether $c_{\lambda\mu}^{\nu} \neq 0$ [BI12].) Since Kronecker coefficients are so poorly understood, understanding their asymptotic support would go a long way towards understanding them.

1.3 Prior work

As mentioned above, Problem 1.1 can be approached by first computing (the defining inequalities of) the moment polytope. The problem of computing moment polytopes has a long history in mathematics (e.g., [Ati82b, GS82b, GS84, Kir84b, Kir84a, Bri99, BS00, Res10, Bel10]). That the one-body quantum marginal problem falls into this framework was first noticed by Klyachko [Kly04], who gave a complete description of the polytopes in terms of finite lists of inequalities (cf. [DH05, Kly06, AK08]). Before that, only low-dimensional special cases were known [HSS03, Bra03, Fra02]. Further developments include the minimal complete description from [Res10] and the cohomology-free variant [VW17]. Yet, all these descriptions in terms of inequalities are largely computationally infeasible; explicit descriptions are known only for small formats [Kly06, VW17] and when all local dimensions are two [HSS03].

Problems 1.3 and 1.4 can in principle be approached using classical computational invariant theory (e.g., [Stu08, DK15, WDGC13]), based on the invariant-theoretic description of $\Delta(\mathcal{X})$ and degree bounds (we recall both in Section 2). In practice, however, this is completely infeasible except for very small dimensions. The problem of describing $\Delta(X)$ also falls into the framework of [Res10], but it is not clear how to turn this into an algorithm. In summary, all the methods described above are computationally expensive and take time at least exponential in the input size.

None of the preceding algebraic methods can be used to solve Problem 1.6, since they only decide membership but do not produce the transformation that yields a tensor with the desired target spectra. This calls for the development of numerical algorithms for Problem 1.6. Curiously, this development stemmed from motivations in algebraic complexity and the polynomial identity testing (PIT) problem. The first such algorithm was proposed in [Gur04]. Its complexity analysis, which brought in the connection to invariant theory (and other fields, some mentioned above), was achieved in [GGOW16]. In the language we use here, it deals with $d = 2$ (operator scaling) and uniform marginals, and results in polynomial-time algorithms for problems in the diverse areas discussed there. (The underlying algebraic problem associated with operator scaling, namely non-commutative singularity and the rank of symbolic matrices, found a different, algebraic algorithm in the works of [IQS17, DM15].) The operator scaling problem was then extended in two directions, which we mention next: one direction being general values of $d$ (tensor scaling with uniform marginals), and the other being $d = 2$ with arbitrary marginals.

For general $d$, a deterministic algorithm was given in [BGO17] (based on a proposal in [VDDM03]). Very recently, a randomized polynomial-time algorithm for operator scaling to general marginals was given in [Fra18]. The two papers [BGO17, Fra18] study two different generalizations of the operator scaling problem in [GGOW16]. The present paper completes a natural square by studying a common generalization of the problems studied in [BGO17, Fra18]. All these algorithms can be seen as noncommutative generalizations of the Sinkhorn-Knopp algorithm for matrix scaling [Sin64, SK67].

It was recently shown that Problem 1.1 is in NP ∩ coNP [BCMW17]. In view of Eq. (4), this should be contrasted with the NP-hardness of deciding whether a single Kronecker coefficient is zero or not [IMW17].

1.4 Summary of results

Our main result in this paper is a randomized algorithm for tensor scaling to general marginals (Problem 1.6). As a consequence, we obtain algorithms for all other problems.

Theorem 1.7.

There is a randomized algorithm, running in time polynomial in the input size and $1/\varepsilon$, that takes as input a tensor $X \in \mathrm{Ten}(n_0, n_1, \ldots, n_d)$ with Gaussian integer entries (specified as a list of real and imaginary parts, each encoded in binary, with bit size at most $b$) and $p = (p_1, \ldots, p_d)$ with rational entries (specified as a list of numerators and denominators, each encoded in binary, with bit size at most $b$). The algorithm either correctly identifies that $p \notin \Delta(X)$, or it outputs a scaling $Y = g \cdot X$ such that the marginals of $Y$ are $\varepsilon$-close to the target spectra $p_1, \ldots, p_d$. Here the input size is the total bit size of $X$ and $p$.

As a consequence of Theorem 1.7, we obtain a randomized algorithm for a promise version of the membership Problem 1.4 (and hence for Problem 1.1, see Corollary 2.6).

Corollary 1.8.

There is a randomized algorithm, running in time polynomial in the input size and $1/\varepsilon$, that takes as input a tensor $X$ with Gaussian integer entries (specified as a list of real and imaginary parts, each encoded in binary, with bit size at most $b$) and $p = (p_1, \ldots, p_d)$ with rational entries (specified as a list of numerators and denominators, each encoded in binary, with bit size at most $b$). The algorithm distinguishes between the following two cases:

  1. $p \in \Delta(X)$.

  2. $p$ is $\varepsilon$-far (in $\ell_1$ norm) from any point in $\Delta(X)$.

Here the input size is the total bit size of $X$ and $p$.

This yields the following corollary.

Corollary 1.9.

There is a randomized algorithm, running in time polynomial in the input size and $1/\varepsilon$, that takes as input $p = (p_1, \ldots, p_d)$ with rational entries (specified as a list of numerators and denominators, each encoded in binary, with bit size at most $b$). The algorithm distinguishes between the following two cases:

  1. $p \in \Delta(\mathbb{P}(\mathrm{Ten}(n_1, \ldots, n_d)))$, i.e., there exists a unit-norm tensor $X$ such that $\mathrm{spec}(\rho_i) = p_i$ for all $i$.

  2. $p$ is $\varepsilon$-far (in $\ell_1$ norm) from any point in $\Delta(\mathbb{P}(\mathrm{Ten}(n_1, \ldots, n_d)))$.

As described before, Problem 1.1 captures the asymptotic vanishing of Kronecker coefficients. Hence we get the following corollary which describes a randomized polynomial time algorithm for a promise version of the asymptotic Kronecker problem.

Corollary 1.10.

There is a randomized algorithm, running in time polynomial in the input size and $1/\varepsilon$, that takes as input three partitions $\lambda, \mu, \nu$ of the same size $k$, with entries described in binary with bit size at most $b$. The algorithm distinguishes between the following two cases:

  1. There exists an integer $m \ge 1$ s.t. $g(m\lambda, m\mu, m\nu) \neq 0$.

  2. For all $(\lambda', \mu', \nu')$ s.t. $g(\lambda', \mu', \nu') \neq 0$, it holds that $(\lambda', \mu', \nu') / |\lambda'|$ is $\varepsilon$-far (in $\ell_1$-norm) from $(\lambda, \mu, \nu) / k$.

Here $g$ denotes the Kronecker coefficient and $|\lambda'|$ denotes the sum of the entries of $\lambda'$.

In many applications, the tensor $X$ can be represented much more succinctly than by its $n_0 n_1 \cdots n_d$ coordinates. If the representation is preserved by scalings and allows for efficient computation of the marginals, then this yields a useful optimization of Algorithm 1. A prime example is the class of so-called matrix product states or tensor-train decompositions with polynomial bond dimension [VDDM03, Orú14]. We won't define these states here (see Section 6 for a formal definition); we will just say that they have much smaller (exponentially smaller in $d$) descriptions than a list of all the coordinates of the tensor. This class includes the unit tensors and the matrix multiplication tensors, which are central objects in algebraic complexity theory [BI11, BCS13] and whose moment polytopes are not known!

Theorem 1.11 (Informal).

There is a randomized algorithm, running in time polynomial in the input size and $1/\varepsilon$, that takes as input a matrix product state (given by its defining matrices) and $p = (p_1, \ldots, p_d)$ with rational entries (specified as a list of numerators and denominators, each encoded in binary, with bit size at most $b$). The algorithm either correctly identifies that $p \notin \Delta(X)$, or it outputs a scaling $Y = g \cdot X$ such that the marginals of $Y$ are $\varepsilon$-close to the target spectra $p_1, \ldots, p_d$.

It is a very exciting open problem to improve the running-time dependence on $\varepsilon$ in Corollary 1.8, Corollary 1.9 and Corollary 1.10 from $\mathrm{poly}(1/\varepsilon)$ to $\mathrm{poly}(\log(1/\varepsilon))$. This would yield randomized polynomial-time algorithms for Problem 1.1, Problem 1.4 and the asymptotic Kronecker problem, due to the following theorem that we prove in Section 5.

Theorem 1.12 (Minimal gap).

Let $X$ be a nonzero tensor and $p = (p_1, \ldots, p_d)$ rational. If $Y = g \cdot X$ is a scaling with marginals that are $\varepsilon$-close to $p$, for $\varepsilon$ below an explicit bound that is inverse exponential in the input parameters, then $p \in \Delta(X)$. Here $\ell$ is the minimal integer s.t. $\ell p$ has integral entries, and the bound depends only on $\ell$ and the dimensions $n_0, \ldots, n_d$.

An analogous result for the full moment polytope was proven in [BCMW17]. We believe that the inverse-exponential bound in the above theorem cannot be improved to an inverse-polynomial bound. Therefore, developing scaling algorithms with $\mathrm{poly}(\log(1/\varepsilon))$ runtime dependence is of paramount importance.

Before describing our algorithm and the high-level intuition for its analysis, let us describe the algorithm and analysis for the rather special case of matrix scaling, which turns out to be very enlightening.

1.5 Simple example: matrix scaling

The matrix scaling problem (Problem 3 in Section 1.1) provides us with a template for what is to come, and understanding the evolution of a particular algorithm for this problem will give us intuition on how to solve the more general tensor scaling problem, and how invariant theory naturally appears.

If one wants to scale a given non-negative matrix $A$ to a doubly stochastic matrix (that is, one whose rows and columns each sum to $1$), a natural algorithm (first proposed in [Sin64]) arises from the fact that the group of scalings is a Cartesian product: we can alternately use scalings of the form $(R, I)$ to normalize the row sums of $A$ and scalings of the form $(I, C)$ to normalize the column sums of $A$.

To this end, set $R$ to be the diagonal matrix whose $i$-th diagonal entry is the inverse of the sum of the elements of the $i$-th row of $A$, and define $C$ in a similar way for the columns of $A$. The algorithm can be described as follows: repeatedly (for a polynomial number of iterations) apply the following steps:

  • Normalize the rows of $A$. That is, replace $A$ by $RA$.

  • Normalize the columns of $A$. That is, replace $A$ by $AC$.

If, throughout this process, the matrix never gets sufficiently close to a doubly stochastic matrix (in $\ell_1$ distance), then we conclude that $A$ cannot be scaled to doubly stochastic; otherwise we conclude that it can. The process also gives us a way to obtain the scalings that approach doubly stochastic: while there are multiple algorithms for the decision problem (which turns out to be the bipartite perfect matching problem), not all of them help find the scalings!
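The iteration is simple enough to state in a few lines of code. The following is a minimal numpy sketch (the function name and stopping rule are our own choices, not from the paper):

```python
import numpy as np

def sinkhorn(A, iters=1000, tol=1e-9):
    """Alternately normalize the rows and columns of a non-negative matrix A.
    Returns the scaled matrix B = R @ A @ C together with the diagonal
    scalings (stored as vectors r, c)."""
    B = A.astype(float)
    r = np.ones(B.shape[0]); c = np.ones(B.shape[1])
    for _ in range(iters):
        s = 1.0 / B.sum(axis=1); B = B * s[:, None]; r *= s   # row step
        s = 1.0 / B.sum(axis=0); B = B * s[None, :]; c *= s   # column step
        if np.abs(B.sum(axis=1) - 1).sum() < tol:             # near doubly stochastic?
            break
    return B, r, c

B, r, c = sinkhorn(np.array([[1.0, 2.0], [3.0, 4.0]]))
print(np.round(B, 4))  # rows and columns each sum (approximately) to 1
```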

The analysis of this algorithm (from [LSW98]; also see [GY98] for a different potential function) is extremely simple, and follows a three-step approach based on a progress measure: the permanent $\mathrm{per}(A)$.

The following two properties of the potential function will be useful for us.

  1. If $A$ is scalable to doubly stochastic, then $\mathrm{per}(A) > 0$.

  2. $\mathrm{per}(A) \le 1$ if $A$ is row- or column-normalized.

The three-step approach then is the following:

  1. [Lower bound]: Initially, $\mathrm{per}(A)$ is at least inverse exponential in $n$ (wlog we assume $A$ is row-normalized; there is also some dependency on the bit complexity of the input that we are ignoring).

  2. [Progress per step]: If $A$ is row- or column-normalized and sufficiently far from being doubly stochastic, then normalizing increases $\mathrm{per}(A)$. One can explicitly bound the increase using a robust version of the AM-GM inequality.

  3. [Upper bound]: $\mathrm{per}(A)$ is bounded by $1$ if $A$ is row- or column-normalized.

This three-step analysis shows that the scaling algorithm solves the doubly stochastic scaling problem in polynomial time. The difficult part of the analysis is coming up with a potential function satisfying the properties above. This is the role played by invariant theory later. A source of good potential functions will turn out to be highest weight vectors, which are (informally speaking) “eigenvectors” of the action of certain subgroups of the main group action. Note that the permanent is an eigenvector of the action of pairs of diagonal matrices, since $\mathrm{per}(RAC)$ equals $\det(R)\det(C)\,\mathrm{per}(A)$ for diagonal $R, C$.

If we want to solve the more general scaling problem, where we are given prescribed values for the row and column sums, say as non-negative integer vectors $r$ and $c$ with $\sum_i r_i = \sum_j c_j$, the same natural algorithm can be applied. The only change one needs to make in the algorithm above is that we now normalize the rows of $A$ to have sums $r_i$ and the columns to have sums $c_j$ (see the sketch below). The analysis is also quite similar: one can choose the potential function, for example, to be the permanent of the matrix obtained from $A$ by repeating row $i$ exactly $r_i$ times and column $j$ exactly $c_j$ times. However, the distinction between the uniform and the non-uniform versions of the problem is much starker in our higher-dimensional non-commutative setting, as we will see next.
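Under these assumptions, the only change to the earlier Sinkhorn sketch is the normalization target (again our own illustrative code):

```python
import numpy as np

def sinkhorn_general(A, r, c, iters=1000):
    """Scale the non-negative matrix A towards row sums r and column sums c
    (assumes sum(r) == sum(c)); a direct variant of the uniform sketch above."""
    B = A.astype(float)
    for _ in range(iters):
        B = B * (r / B.sum(axis=1))[:, None]   # rows now sum to r
        B = B * (c / B.sum(axis=0))[None, :]   # columns now sum to c
    return B

B = sinkhorn_general(np.array([[1.0, 2.0], [3.0, 4.0]]),
                     r=np.array([1.0, 2.0]), c=np.array([2.0, 1.0]))
print(B.sum(axis=1), B.sum(axis=0))  # approximately [1, 2] and [2, 1]
```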

1.6 Techniques and proof overview

Our algorithm and its analysis generalize two recent works [BGO17, Fra18], which in turn generalize the analysis of matrix scaling in Section 1.5. The paper [BGO17] studies the special case where each $p_i$ is the uniform distribution, while the paper [Fra18] studies the special case $d = 2$. Our algorithm is a natural common generalization of the algorithms in [Fra18, BGO17], while our analysis generalizes the analysis in [BGO17], replacing the use of invariants with highest weight vectors (we will explain what these are later).

Let us develop some intuition for the algorithm. It is usually the case with scaling problems, as we saw with matrix scaling, and more generally in the framework of alternating minimization, that a single constraint at a time is easy to satisfy by scaling. The same is true for the problem at hand. We are given a tensor $X$ with one-body marginals $\rho_1, \ldots, \rho_d$. Suppose we want $\rho_i = P_i := \mathrm{diag}(p_i)$. We act on $X$ by $g = (I, \ldots, I, P_i^{1/2} \rho_i^{-1/2}, I, \ldots, I)$, where the non-trivial element is in the $i$-th location. This will satisfy the $i$-th constraint. Or indeed, one can choose any matrix $g_i$ s.t. $g_i \rho_i g_i^\dagger = P_i$ and act on $X$ by $(I, \ldots, I, g_i, I, \ldots, I)$; this will also satisfy the $i$-th constraint. By choosing each time to fix the index $i$ whose marginal is “farthest” from its target spectrum, we have defined an iterative algorithm (up to the choice of $g_i$ at each step) that keeps alternately fixing the constraints. It turns out that this algorithm works (for any choice of $g_i$ at each step!) when the $p_i$ are all uniform, and converges in a polynomial number of iterations [BGO17].

Interestingly, the choice of $g_i$ that works for general $p_i$'s is that of upper-triangular matrices! (This choice works for all $p_i$'s; we don't know if upper-triangularity is necessary. There is also a nice interpolation between the case of uniform $p_i$'s and $p_i$'s with distinct entries; see Section 6.4.) This was the choice made in [Fra18] as well. This restriction on the scaling factors will make the analysis more complicated, as we shall soon see. One intuitive reason for the difference between the uniform and the general case is the following: in the general case, we made an arbitrary decision to try to scale $X$ to have marginals $\mathrm{diag}(p_1), \ldots, \mathrm{diag}(p_d)$, while we could have chosen to scale it to any $\sigma_1, \ldots, \sigma_d$ s.t. $\mathrm{spec}(\sigma_i) = p_i$. This choice of basis is not present in the uniform case, since all bases are the same!

This restriction on the scaling factors creates another problem: it disconnects the orbit space (see the example below). Thus, we need to initialize the algorithm with a random basis change of the given input, and only then resume the restricted scaling. This idea is used as well in [Fra18]. We explain, via an example, why this random basis change (or at least a “clever” basis change) is needed at the start of the algorithm. Consider the diagonal unit tensor $X$, with $X_{j_1 \ldots j_d} = 1$ iff $j_1 = \cdots = j_d$ (and $0$ otherwise). It is easy to see that without the initial randomization, the algorithm (whose upper-triangular $g_i$, obtained from the Cholesky decomposition of a diagonal marginal, is in fact diagonal here) would only produce diagonal tensors ($Y_{j_1 \ldots j_d} \neq 0$ only if $j_1 = \cdots = j_d$). And the marginals of any such tensor are isospectral. On the other hand, the $G$-orbit of $X$ is dense in the tensor space, and so $\Delta(X)$ is the moment polytope of the full space. In particular, $X$ can be scaled to tensors with non-isospectral marginals.
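A quick numerical sanity check of this obstruction, reusing the hypothetical one_body_marginals helper sketched in Section 1.2:

```python
import numpy as np

# Every diagonal tensor has identical one-body marginals, so diagonal
# (in particular, the algorithm's upper-triangular) scalings of the
# diagonal unit tensor can never reach non-isospectral targets.
X = np.zeros((3, 3, 3))
X[0, 0, 0], X[1, 1, 1], X[2, 2, 2] = 1.0, 2.0, 0.5  # an arbitrary diagonal tensor
rhos = one_body_marginals(X)                         # helper from the earlier sketch
print(all(np.allclose(rhos[0], r) for r in rhos))    # True: all marginals coincide
```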

The algorithm is described as Algorithm 1. The following is the main theorem regarding the analysis of Algorithm 1, from which Theorem 1.7 follows up to an analysis of the bit complexity of Algorithm 1.

Input: $X \in \mathrm{Ten}(n_0, n_1, \ldots, n_d)$ with Gaussian integer entries (specified as a list of real and imaginary parts, each encoded in binary, with bit size at most $b$) and $p = (p_1, \ldots, p_d)$ with rational entries (specified as a list of numerators and denominators, each encoded in binary, with bit size at most $b$) such that each $p_i$ is a non-increasingly ordered probability vector.

Output: Either the algorithm correctly identifies that $p \notin \Delta(X)$, or it outputs $g \in G$ such that the marginals of $g \cdot X$ satisfy Eq. (5); in particular, the marginals are $\varepsilon$-close to the target spectra $p_1, \ldots, p_d$.

Algorithm:

  1. Let $\ell$ be the least integer such that $\ell p_i$ has integer entries for all $i$ (i.e., $\ell$ is the common denominator of all the entries of the $p_i$). Let $g = (g_1, \ldots, g_d)$ denote a tuple of matrices ($g_i$ is $n_i \times n_i$) whose entries are chosen independently and uniformly at random from a sufficiently large finite set of integers (its size is governed by the degree bounds discussed in Section 2.3).

  2. For $i = 1, \ldots, d$: if the marginal $\rho_i$ of $g \cdot X$ is singular, then output “$p \notin \Delta(X)$” and halt.
    Otherwise, update $X \leftarrow g \cdot X / \|g \cdot X\|$.

  3. For $t = 1, \ldots, T$, repeat the following:

    • Compute, for $i \in [d]$, the one-body marginals $\rho_i$ of the current tensor $X$ and the distances $\varepsilon_i := \|\rho_i - \mathrm{diag}(p_i)\|_1$.

    • Select an index $i$ for which $\varepsilon_i$ is largest. If $\varepsilon_i \le \varepsilon$, output the accumulated scaling $g$ and halt.

    • Compute the Cholesky decomposition $\rho_i = U U^\dagger$, where $U$ is an upper-triangular matrix. (Usually the Cholesky decomposition refers to $M = L L^\dagger$ with $L$ lower triangular. However, using such a decomposition for a different matrix, one easily obtains $M = U U^\dagger$ with $U$ upper triangular: simply set $U = J L' J$, where $J$ is the permutation matrix which swaps $j$ and $n - j + 1$, and $L'$ is the lower-triangular Cholesky factor of $J M J$.) Update $X \leftarrow (I, \ldots, I, \mathrm{diag}(p_i)^{1/2} U^{-1}, I, \ldots, I) \cdot X$, with the non-trivial factor in position $i$. (Note that this update preserves the unit norm, since the new $i$-th marginal $\mathrm{diag}(p_i)$ has unit trace.)
  4. Output that $p \notin \Delta(X)$.

Algorithm 1 Scaling algorithm for Theorem 1.13
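For concreteness, here is a compact numerical sketch of the main loop, under stated assumptions: $n_0 = 1$, full-support targets, a fixed random range, and no bit-complexity truncation (see Section 3.3); all names and parameter choices are our own illustration.

```python
import numpy as np

def marginal(X, i):
    """One-body marginal of the unit tensor X on subsystem i (Gram matrix of a flattening)."""
    M = np.moveaxis(X, i, 0).reshape(X.shape[i], -1)
    return M @ M.conj().T

def apply_on_factor(X, g, i):
    """Act with the matrix g on the i-th tensor factor of X."""
    return np.moveaxis(np.tensordot(g, np.moveaxis(X, i, 0), axes=1), 0, i)

def tensor_scale(X, targets, eps=1e-4, iters=100000, K=1000, seed=0):
    """Sketch of Algorithm 1 (n_0 = 1): scale X so that spec(rho_i) ~ p_i.
    Returns the scaled unit tensor, or None when a singular marginal is found
    (evidence that p lies outside Delta(X))."""
    rng = np.random.default_rng(seed)
    d = X.ndim
    for i in range(d):  # step 1: random basis change with integer entries
        g = rng.integers(1, K + 1, size=(X.shape[i],) * 2).astype(float)
        X = apply_on_factor(X, g, i)
    X = X / np.linalg.norm(X)  # step 2: normalize
    for _ in range(iters):  # step 3: alternating scaling by upper triangulars
        rhos = [marginal(X, i) for i in range(d)]
        dists = [np.abs(np.linalg.eigvalsh(rhos[i] - np.diag(targets[i]))).sum()
                 for i in range(d)]  # trace-norm distances to diag(p_i)
        i = int(np.argmax(dists))
        if dists[i] <= eps:
            return X
        if np.linalg.eigvalsh(rhos[i])[0] < 1e-12:  # singular marginal (cf. step 2)
            return None
        n = rhos[i].shape[0]
        J = np.eye(n)[::-1]                      # reversal permutation
        L = np.linalg.cholesky(J @ rhos[i] @ J)  # J rho J = L L^dagger, L lower triangular
        U = J @ L @ J                            # rho = U U^dagger, U upper triangular
        g = np.diag(np.sqrt(targets[i])) @ np.linalg.inv(U)  # upper triangular, g rho g^dagger = P_i
        X = apply_on_factor(X, g, i)
        X = X / np.linalg.norm(X)  # norm is preserved exactly; renormalize for numerical drift
    return X
```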
Theorem 1.13 (Tensor scaling).

Let $X \in \mathrm{Ten}(n_0, n_1, \ldots, n_d)$ be a (nonzero) tensor whose entries are Gaussian integers of bitsize no more than $b$. Also, let $p = (p_1, \ldots, p_d)$ with rational entries of bitsize no more than $b$, such that each $p_i$ is a non-increasingly ordered probability vector. Finally, let $\varepsilon > 0$, and let the number of iterations $T$ in Algorithm 1 be a sufficiently large polynomial in the input size and $1/\varepsilon$.

Then, with probability at least 1/2, Algorithm 1 either correctly identifies that $p \notin \Delta(X)$, or it outputs $g \in G$ such that the marginals of $g \cdot X$ are $\varepsilon$-close to $p$. In fact, in the latter case we have

$\|\rho_i(g \cdot X) - \mathrm{diag}(p_i)\|_1 \le \varepsilon \quad \text{for all } i \in [d],$   (5)

where $\|\cdot\|_1$ is the trace norm.

Remark 1.14.

Note that the condition $\|\rho_i - \mathrm{diag}(p_i)\|_1 \le \varepsilon$ implies that $\|\mathrm{spec}(\rho_i) - p_i\|_{\ell_1} \le \varepsilon$; see Lemma 3.3.

To analyze our algorithm and prove Theorem 1.13, we follow a three-step argument similar to the analysis in Section 1.5. This has been used to great effect for operator scaling and tensor scaling in [Gur04, GGOW16, BGO17, Fra18] after identifying the appropriate potential function.

As we described in Section 1.5, the appropriate potential functions to choose are those which are eigenvectors of an appropriate group action. In the matrix scaling case, we were acting by pairs of diagonal matrices, and hence we chose the potential function to be the permanent, which is an eigenvector for this group action. In our algorithm, we are acting by the group corresponding to (direct products of) upper-triangular matrices (this is known as the Borel subgroup). So for us, the right potential functions to consider are functions which are eigenvectors for the action of (tuples of) upper-triangular matrices. One such class of functions are the so-called highest weight vectors from representation theory, which we come to next. (Here we restrict our attention to the action on polynomials because that is what we need to describe the intuition for the analysis of the algorithm. But the discussion of weight vectors applies to arbitrary rational representations of the group; see Section 2.1.)

What are highest weight vectors? We have the action of $G = \mathrm{GL}(n_1) \times \cdots \times \mathrm{GL}(n_d)$ on $V = \mathrm{Ten}(n_0, n_1, \ldots, n_d)$. Let us consider the space of degree-$m$ homogeneous polynomial functions on $V$, denoted by $\mathbb{C}[V]_m$. The action of $G$ on $V$ induces an action of $G$ on $\mathbb{C}[V]_m$, given by $(g \cdot f)(v) = f(g^{-1} \cdot v)$. Consider a tuple of integer vectors $\lambda = (\lambda_1, \ldots, \lambda_d)$, $\lambda_i \in \mathbb{Z}^{n_i}$. Then we say that $f \in \mathbb{C}[V]_m$ is a highest weight vector with weight $\lambda$ if

$f(b \cdot v) = \Bigl( \prod_{i=1}^d \prod_{j=1}^{n_i} (b_i)_{jj}^{\lambda_{i,j}} \Bigr) f(v)$

for all $b = (b_1, \ldots, b_d)$ such that $b_i$ is an upper-triangular matrix for each $i$. Note that this necessitates $\sum_{j=1}^{n_i} \lambda_{i,j} = m$ for each $i$. This also necessitates (not trivial to see why) that $\lambda_{i,1} \ge \lambda_{i,2} \ge \cdots \ge \lambda_{i,n_i}$ for all $i$.

The following two properties of highest weight vectors will be crucial for our analysis:

  1. [[NM84], see Theorem 2.4]: Let $p$ be a rational vector. Then $p \in \Delta(X)$ iff there exists an integer $\ell \ge 1$ s.t. $\ell p$ has integer entries and there exists a highest weight vector $f$ with weight determined by $\ell p$ (up to the sign conventions fixed in Section 2) s.t. $f(Y) \neq 0$ for some $Y \in G \cdot X$. This extends a fact used in previous papers: the uniform vector is in $\Delta(X)$ iff some invariant polynomial does not vanish on $X$.

  2. [Proposition 2.9]: The space of highest weight vectors with weight determined by $\ell p$ and degree $m$ is spanned by polynomials $f$ with integer coefficients that satisfy the following bound on their evaluations:

    $|f(Y)| \le E \cdot \|Y\|^m$ for all tensors $Y$, with $E$ singly exponential in the degree $m$.   (6)

    This extends an identical bound in past papers from invariant polynomials to highest weight vectors.

We use classical constructions of highest weight vectors [Pro07, BI13, BGO17] to derive the second fact. These constructions are only semi-explicit (e.g., it is not clear if they can be evaluated efficiently); however, they suffice for us because we only need a bound on their evaluations for their use as a potential function. We note that such bounds on their evaluations had not been observed before in the invariant theory literature (except in [BGO17] for the special case of invariants), whereas for us they are extremely crucial! We also emphasize that it is crucial for us that the bound is singly exponential in the degree. Some naive strategies, such as using solution-size bounds for linear systems, only yield bounds that are doubly exponential in the degree.

The potential function we use is $|f(X_t)|$, evaluated at the current tensor $X_t$. Here $f$ is some highest weight vector of degree $m$ (for some $m$), with integer coefficients and weight determined by $\ell p$, that satisfies $f(Y) \neq 0$ for some $Y$ in the orbit of $X$, as well as Eq. (6). Such an $f$ exists by the discussion above. Using these properties, a three-step analysis, similar to the one in Section 1.5, follows the outline below.

  1. [Lower bound]: Since $f(Y) \neq 0$ for some $Y \in G \cdot X$, for a random choice of $g$ in Step (1) we have $f(g \cdot X) \neq 0$ with high probability. Furthermore, since we choose $f$ to have integer coefficients and $g \cdot X$ has Gaussian integer entries, $|f(g \cdot X)| \ge 1$. After the normalization in Step (2), we get a lower bound on $|f(X_0)|$ for the starting tensor $X_0$. It is not hard to see that this lower bound is inverse exponential in the input size.

  2. [Progress per step]: $|f(X_t)|$ increases at each step. Furthermore, if the current marginals are “far” from the target spectra, then one can explicitly bound the increase. Here the highest weight vector property of $f$ as well as Pinsker's inequality from information theory play an important role.

  3. [Upper bound]: $|f(X_t)|$ is always bounded above by the single-exponential quantity from Eq. (6). This follows from Eq. (6) and the fact that we maintain the unit-norm property of $X_t$ after the normalization in Step (2) of the algorithm.

These three steps imply that within a polynomial number of iterations, the algorithm gets close to the target spectra. A complete analysis is presented in Section 3.2; a schematic version of the argument appears below. Note that to ensure that we only use a polynomial number of random bits for the initial randomization, we need the highest weight vectors to have degree at most exponential in the input parameters. This is achieved by relying on Derksen's degree bounds [Der01] (see Proposition 2.5).
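Schematically, with hypothetical placeholders $L_0$ (initial lower bound), $c$ (progress rate) and $E$ (the upper bound from Eq. (6)) standing in for the explicit quantities proved in Section 3.2, the three steps combine as follows:

```latex
% Schematic form of the three-step argument (constants are placeholders):
\begin{align*}
  \log|f(X_0)| &\ge -L_0 && \text{(lower bound at the start)}\\
  \log|f(X_{t+1})| - \log|f(X_t)| &\ge c\,\delta^2 && \text{(progress while some marginal is $\delta$-far)}\\
  \log|f(X_t)| &\le \log E \quad \forall t && \text{(upper bound, from Eq.~(6))}
\end{align*}
% Hence the algorithm must halt within T <= (L_0 + log E) / (c * eps^2) iterations.
```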

1.7 Additional discussion

We would like to point out two important distinctions between the analysis for matrix scaling in Section 1.5 and our analysis here. First, as we have seen, there is a major difference between the uniform and the non-uniform versions of our problem, while this was not the case for matrix scaling. This phenomenon is general and marks a distinction between commutative and non-commutative group actions. It has to do with the fact that all irreducible representations of commutative groups are one-dimensional, whereas for non-commutative groups they are not. Secondly, in the matrix scaling analysis, the upper bound was easy to obtain as well. For us, in contrast, the upper bound step is the hardest and requires the use of deep results in representation theory. The upper bound steps were the cause of the main difficulty in the papers [GGOW16, BGO17, Fra18] as well (in some of the papers, the lower bound is the hard step, due to the use of a dual kind of potential function), and this is one key point of distinction between commutative and non-commutative group actions.

We believe that our framework of using the highest weight vectors as potential functions for the analysis of analytic algorithms is the right way to approach moment polytope problems - even beyond the cases that we consider in this paper.

The approach taken in [Fra18] (for the case $d = 2$) is one of reducing the non-uniform version of the problem to the uniform version, which was solved in [GGOW16] for $d = 2$ (the reduction in [GGOW17] is a simple special case of the reduction in [Fra18]). The reduction is complicated and a bit ad hoc. We generalize this reduction to our setting (general $d$) in Section 4, providing a somewhat more principled view of the reduction along the way. However, it still seems rather specialized and mysterious compared to the general reduction in geometric invariant theory from the “non-uniform” to the “uniform” case (also known as the shifting trick, see Section 2.2).

We also note that applying the results of [BGO17] to the reduction in Section 4 in a black-box manner does not yield our main theorem (Theorem 1.7): the number of iterations would be exponential in the bit-complexity of $p$, and we would even require an exponential number of bits for the randomization step! To remedy these issues with the reduction in Section 4, one must delve into the relationship between the reduction and the invariant polynomials. We will see, by a fairly involved calculation, that invariant polynomials evaluated on the reduction result in the same construction of highest weight vectors anyway. This teaches us two lessons:

  1. Highest weight vectors are the only suitable potential functions in sight. Though it may have other conceptual benefits, the reduction in Section 4 is no better than the shifting trick for the purpose of obtaining potential functions!

  2. We had to look at the construction of highest weight vectors in Section 2.4 before calculating them from the reduction - the calculation might not have been so easy a priori! Again, the classical construction of highest weight vectors saves the day.

It is interesting to discuss some of the salient features and possible variations of Algorithm 1 (we expand on these points in the main text):

  • Iterations and randomness. The algorithm terminates after at most polynomially many (in the input size and $1/\varepsilon$) iterations and uses a polynomial number of bits of randomness. For fixed or even inverse-polynomial $\varepsilon$, this is polynomial in the input size. In fact, this is better than the number of iterations in [Fra18]: there, the number of iterations also depended on the common denominator of the target spectra.

  • Bit complexity: To get an algorithm with truly polynomial running time, one needs to truncate the group elements $g_i$ to a polynomial number of bits after the decimal point. We explain why this doesn't affect the performance of the algorithm in Section 3.3.

  • Degenerate spectra. If some $p_i$ is degenerate, i.e., $p_{i,j} = p_{i,j+1}$ for some $j$, then we may replace the Cholesky decomposition in Step 3 by a decomposition into block upper-triangular matrices, where the block sizes are the degeneracies; the set of such matrices is a so-called parabolic subgroup of the general linear group (Section 6.4). Moreover, the random matrices $g_i$ need only be generic up to the action of the parabolic subgroup. In particular, when scaling to uniform spectra, no randomization is required and we can use Hermitian square roots, so Algorithm 1 reduces to the uniform tensor scaling algorithm of [BGO17].

  • Singular spectra. As written, Item 3 of Algorithm 1 fails if the spectra are singular, that is, if $p_{i,j} = 0$ for some $i$ and $j$. However, in this case one may first pass to a smaller tensor $X'$ obtained by restricting each index $i$ with singular $p_i$ to the last $r_i$ coordinates, where $r_i$ is the number of nonzero entries of $p_i$. We'll show in Section 3.4 that $X$ is scalable by upper triangulars to the marginals $p$ if and only if $X'$ is scalable by upper triangulars to the restriction of $p$ to its support.

We discuss extensions of Algorithm 1 for more general varieties with “good” parametrizations in Section 6.

1.8 Conclusions and open problems

We provide an efficient weak membership oracle for moment polytopes corresponding to a natural class of group actions on tensors. This generalizes recent works on operator and tensor scaling and also yields efficient algorithms for promise versions of the one-body quantum marginal problem and the asymptotic support of Kronecker coefficients. Our work leaves open several interesting questions some of which we state below.

  • Improve the dependency on the error $\varepsilon$ in the running time of Algorithm 1 to $\mathrm{poly}(\log(1/\varepsilon))$. As discussed, this would immediately yield polynomial-time algorithms for the one-body quantum marginal problem. This is open even for the uniform version of the problem. Here the notion of geodesic convexity of certain “capacity” optimization problems should play a key role (e.g., see [AZGL18]).

  • Extend the weak membership oracle we develop to moment polytopes of other group actions, using Kirwan's gradient flow [Kir84b] as proposed in [Wal14]. The quantitative tools developed in this paper naturally extend to this setup, and we will elaborate on this in forthcoming work.

  • Develop separation oracles for moment polytopes. A related question is: can we optimize over moment polytopes? This would have algorithmic applications to the problem of computing quantum functionals, as described in [CVZ17]. In that paper, Strassen's support functionals are generalized to quantum functionals, which are defined by convex optimization over the entanglement polytope. Thus, separation oracles for moment polytopes could lead to efficient algorithms for computing quantum functionals, which are important for comparing tensor powers (see [Str86, Str88]).

  • Find natural instances of combinatorial optimization problems which can be encoded as moment polytopes. Some examples can be found in [GGOW17].

1.9 Roadmap of the paper

In Section 2, we present results from geometric invariant theory and explain how they can be made quantitative. We use this in Section 3, where we analyze the proposed tensor scaling algorithm. In Section 4, we explain how the reduction in [Fra18] can be naturally understood in the framework of this paper. In Section 5, we show a lower bound on the distance to the moment polytope of any rational point not contained in it. This lower bound depends only on the description of the rational point and the dimension of our tensor space, and it allows us to solve membership problems by using the tensor scaling algorithm. In Section 6, we extend our algorithm to general varieties and degenerate spectra. In Appendix A.1 we discuss the Borel polytope, providing an alternate proof that it is in fact a rational polytope.

Acknowledgements

We would like to thank Shalev Ben David, Robin Kothari, Anand Natarajan, Frank Verstraete, John Watrous, John Wright, and Jeroen Zuiddam for interesting discussions.

PB is partially supported by DFG grant BU 1371 2-2. CF is supported in part by Simons Foundation award 332622. MW acknowledges financial support by the NWO through Veni grant no. 680-47-459. AW is partially supported by NSF grant CCF-1412958.

2 Geometric invariant theory

In this section, we present some results from geometric invariant theory that will feature centrally in the analysis of our algorithm in Section 3. While stated for tensors, all results in this section can easily be extended to arbitrary rational representations of connected complex reductive algebraic groups. Most of the results are well known and only some are new; all previously known results are cited with references, and we make sure to highlight the new components. Section 2.1 discusses the basics of highest weight theory. Section 2.2 gives a formal definition of the moment map and also discusses the so-called “shifting trick” that reduces the problem of membership in moment polytopes to a null-cone problem. Section 2.3 considers degree bounds for highest weight vectors, which are used to bound the initial randomness used in Algorithm 1. Section 2.4 recalls a classical construction of highest weight vectors and uses this construction to prove bounds on their evaluations (crucial in the analysis of Algorithm 1). Section 2.5 develops a necessary and sufficient condition for Borel scalability (i.e., scaling using tuples of upper-triangular matrices).

As before, let $G = \mathrm{GL}(n_1) \times \cdots \times \mathrm{GL}(n_d)$, let $V = \mathrm{Ten}(n_0, n_1, \ldots, n_d)$, let $\mathbb{P}(V)$ denote the corresponding projective space, and let $\mathcal{X} \subseteq \mathbb{P}(V)$ be a $G$-stable irreducible projective subvariety (e.g., an orbit closure).

2.1 Highest weight theory

We first recall the representation theory of $\mathrm{GL}(n)$ (see, e.g., [FH13] for an introduction). Let $W$ be a finite-dimensional $\mathrm{GL}(n)$-representation, equipped with a unitarily invariant inner product. Let $T(n) \subseteq \mathrm{GL}(n)$ denote the subgroup consisting of invertible diagonal matrices, called the maximal torus of $\mathrm{GL}(n)$. Since $T(n)$ is commutative, its action can be jointly diagonalized. Thus, any finite-dimensional $T(n)$-representation $W$ can be written as a direct sum of so-called weight spaces $W_\omega$, where $t \in T(n)$ acts on any vector $w \in W_\omega$ as $t \cdot w = \bigl( \prod_{j=1}^n t_{jj}^{\omega_j} \bigr) w$. Here, $\omega \in \mathbb{Z}^n$ is an integer vector and