1 Introduction and summary of results
1.1 Moment polytopes
As this paper is quite technical, with some material nonstandard for computer scientists, we begin by motivating the main object we study, the moment polytope, which is extremely natural from an optimization perspective. Consider first the following diverse set of problems, and try to pick out common features among them (besides the obvious guess that they are all special cases of the framework we consider in this paper).

The Schur-Horn theorem: Can a given Hermitian matrix be conjugated by unitary matrices to achieve a given diagonal?

Eigenvalues of sums of Hermitian matrices: Do there exist Hermitian matrices A, B, C with prescribed eigenvalues such that A + B = C?
Optimization: Can a given nonnegative matrix be converted to another with prescribed row and column sums, by only reweighting its rows and columns?

Quantum information: Can multiple parties, each holding a particle of a pure quantum state, locally transform their particles so that each particle is maximally entangled with the others?

Analytic inequalities: Given linear maps B_1, …, B_m and nonnegative reals p_1, …, p_m, does there exist a finite constant C such that for all integrable functions f_1, …, f_m we have
∫ ∏_{j=1}^m f_j(B_j x)^{p_j} dx ≤ C ∏_{j=1}^m (∫ f_j)^{p_j}?
An important special case of this framework is the Cauchy-Schwarz inequality. (These inequalities are the celebrated Brascamp-Lieb inequalities, which capture many other important inequalities such as Hölder's, Loomis-Whitney, and many more. See for instance [GGOW17] for a more detailed discussion.)

Algebraic complexity: Given an arithmetic formula (with inversion gates) in noncommuting variables, is it nonzero?

Polynomial support: Given oracle access to a homogeneous polynomial with nonnegative integer coefficients, is a specified monomial (given as an integer vector of exponents) in its Newton polytope? (Given a polynomial p, define its support as the set of monomials whose coefficient in p is nonzero. The Newton polytope of p is the convex hull of the exponent vectors of these monomials.)
Some of the problems above are in P, and for others there are sufficient hints that they are as well (see [Hor54, LSW98, BGO17, GGOW17, GGOW16, Gur05]). While they may seem nonlinear in their inputs, convexity plays an important role in each of them, as they all reduce to solving linear programs (implicitly defined, with a large number of facets). More specifically, each input to each problem defines a point and a polytope, and the answer is yes iff the point is in the polytope. These polytopes turn out to be special cases of moment polytopes.

This appearance of linearity and convexity is quite surprising, in some settings more so than others. Indeed, moment polytopes arise (and are used to understand problems) in many diverse settings such as symplectic geometry, algebraic geometry, lattice theory and others [GS, dS08]. The snag is that these polytopes are often defined by a huge number of inequalities (e.g., see [GGOW17]); typically the number is exponential or larger in the dimension of the input. (However, in many of these areas even finiteness constitutes progress, as even decidability may not be obvious.) This motivates our efforts to develop efficient algorithms for them.
In order to explain the appearance of convex polytopes in these settings, we need to notice another common aspect of all the problems above: their answers remain invariant under some group action! This is easy to see in some of the examples, which explicitly specify the groups. In the first, for matrices of size n, it is the unitary group U(n), whose elements conjugate the input. In the second, each of the three matrices may be conjugated by a unitary. In the third, it is the product of two (positive) diagonal invertible matrices, which scale (respectively) the rows and columns. In the fourth problem, as each party is allowed to perform quantum measurements with postselection, the group representing each party's operations consists of the invertible matrices acting on its particle's state space, and the full group is a direct product of these. The fifth problem is invariant to basis changes in the host space and the other spaces. (This reveals the Brascamp-Lieb polytopes [BCCT08, GGOW17] as special cases of (slices of) moment polytopes, which have an efficient weak separation oracle.)
The sixth is much harder to guess without Cohn's characterization of the free skew field, but turns out to be a left-right action of a product of two general linear groups on a different representation of the formulas. In the seventh, though it may not seem useful at first sight, the group of invertible diagonal matrices acts by simply scaling every variable of the polynomial by a nonzero constant factor.

Having mentioned the two common features of the problems above (convexity and invariance under a group action), we now illustrate how one can use the structure of the group action in order to obtain moment polytopes. Let G be a "nice" group (the technical definition requires the group to be algebraic, reductive, connected, and so on; for the purpose of this paper, one can think of groups like GL(n), SL(n), their direct products, etc.) acting linearly and continuously on a vector space V, and let v be a point in V. The orbit of v is the set of all vectors obtained by the action of G on v. The orbit closure of v is simply the closure of its orbit in the Euclidean topology. As the previous paragraph observed, all of the problems above are questions about orbit closures, which suggests that understanding orbit closures is a fundamental task with many applications. A natural approach to studying such orbit closures is to look at the infinitesimal action of the group at every point.
This brings us to the moment map, denoted by μ, which is essentially a gradient of the log of the norm along the group action. (Indeed, the original name was momentum map, inspired by Hamiltonian physics, in which momentum is proportional to the derivative of position. Moment maps are common in physics, where they are used to obtain conserved quantities (i.e., invariants) from symmetries of the phase space of symplectic manifolds describing some Hamiltonian system.) In the general setting, we have the action of a continuous group on a manifold, and the moment map provides a reverse map, from the manifold to the group (or, more precisely, to the dual of the Lie algebra of the group). More explicitly, for each point v we can define the function f_v(g) = log ‖g · v‖, and μ(v) is the gradient of f_v evaluated at the identity element of the group. The moment map carries a lot of information about the group action, and one of its striking features is that the set of possible spectra of the image of any orbit closure under the moment map is a rational convex polytope [Kos73, Ati82a, GS82a, Kir84a]! That is a mouthful, so consider an example to see what we mean. Consider the action of GL(n) on some vector space V. Then the moment map maps V to the set of n × n Hermitian matrices, and the collection λ(μ(v)), as v varies over an orbit closure, forms a rational convex polytope. Here λ(M) denotes the vector of eigenvalues of M arranged in decreasing order. Note that μ(v) is a quadratic function of v, so the appearance of convexity is extremely surprising and nontrivial. This polytope, which we will see more explicitly in the next section, is the so-called moment polytope of the group action on the orbit of v.
In the matrix scaling case (Problem 3), it turns out that the moment map applied to a certain matrix gives us precisely the marginals of that matrix (that is, the vector of row sums and column sums, normalized to sum to 1). (There is a slight technicality here: the moment map is actually applied to the matrix of absolute values squared of the entries.) Thus, testing whether a matrix can be scaled to another matrix with prescribed row and column sums is equivalent to testing whether the prescribed vector of row and column sums belongs to the moment polytope of the scaling action on the matrix. Similarly, all of the seven problems listed above fit into this framework (membership in a moment polytope) for a suitable choice of group and representation. (For some of the problems mentioned above, it is nontrivial to phrase them in terms of moment polytopes.)
The reader might notice the dual nature of the problems above: they are of an algebraic as well as an analytic nature. This phenomenon is extremely general and crucial for our paper. The analytic nature helps in designing algorithms, making the problem amenable to general optimization techniques, while the algebraic nature helps with the analysis of these analytic algorithms and provides potential functions to track the progress they make. We will see that this is the case for us as well.
1.2 Our setting
In this paper, we will be concerned with the moment polytopes of a natural "basis change" group action on tensors, which are of interest for several reasons. The moment polytopes in this setting capture fundamental problems in quantum many-body physics: the so-called one-body quantum marginal problem. They also capture fundamental problems in representation theory related to Kronecker coefficients, which are central objects of study in algebraic combinatorics and play an important role in geometric complexity theory. Moreover, as we will see, these moment polytopes generalize many of the settings described above, and we believe that their complexity is representative of the complexity of general moment polytopes.
These moment polytopes (and their related problems) are most natural to state from the point of view of quantum systems and their quantum marginals (which generalize the classical notion of marginals of a probability distribution on several variables), so we start by defining them. But before we define quantum systems, some brief notation must be established. Let V = C^{n_1} ⊗ ⋯ ⊗ C^{n_d} denote the space of d-dimensional tensors of format n_1 × ⋯ × n_d, and let ψ be a tensor in V. If we regard ψ as a vector, with ψ† being its conjugate transpose, then ψψ† is a Hermitian positive semidefinite (PSD) operator on V. We will denote by ‖ψ‖ the norm of ψ (when viewed as a vector). With this notation in mind, we define a quantum system with d subsystems as a PSD operator ρ on V with unit trace. (A reader not familiar with the basics of quantum systems may want to skip a couple of paragraphs ahead.)
Given a quantum system ρ on V and a subset S ⊆ [d], we define its (quantum) marginals or reduced density matrices by ρ_S := tr_{S^c}(ρ), where tr_{S^c} denotes the partial trace over the tensor factors not in S. In the same way that ρ describes the state of the entire quantum system, ρ_S characterizes the state of the subsystems labeled by S (in a way analogous to the classical marginal of a probability distribution). For S = {k}, we write ρ_k := ρ_{{k}}; these operators are known as the one-body marginals or one-body reduced density matrices of ρ. Each ρ_k is uniquely characterized by the property that
tr(ρ_k X) = tr(ρ (I ⊗ ⋯ ⊗ X ⊗ ⋯ ⊗ I))    (1)
for all operators X on C^{n_k}, where on the right-hand side X sits in the k-th tensor factor.
For a tensor ψ and a given subset S of the subsystems, the marginals of ρ = ψψ† have a particularly simple description: using the standard basis, identify ψ with a matrix M_S of size n_S × n_{S^c}, where we denote n_S := ∏_{k∈S} n_k. The matrix M_S is known as a flattening, unfolding, or matricization [Lan12, Hac12] of the tensor ψ. Then, ρ_S = M_S M_S† is its Gram matrix.
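To make the flattening picture concrete, here is a short numerical sketch (our illustration, not code from the paper) that computes the one-body marginals of a tensor as normalized Gram matrices of its flattenings:

```python
import numpy as np

def one_body_marginals(psi):
    """One-body reduced density matrices of a tensor psi.

    rho_k is the Gram matrix of the k-th flattening (rows indexed by the
    k-th tensor factor), normalized by the squared norm of psi.
    """
    psi = np.asarray(psi, dtype=complex)
    norm2 = np.vdot(psi, psi).real
    marginals = []
    for k in range(psi.ndim):
        M = np.moveaxis(psi, k, 0).reshape(psi.shape[k], -1)  # k-th flattening
        marginals.append(M @ M.conj().T / norm2)
    return marginals

# Example: a random 2 x 3 x 4 tensor; each marginal is Hermitian PSD with unit trace.
rng = np.random.default_rng(1)
psi = rng.normal(size=(2, 3, 4)) + 1j * rng.normal(size=(2, 3, 4))
rhos = one_body_marginals(psi)
```

Each ρ_k here is an n_k × n_k matrix, matching the characterization in Eq. (1).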
Given a Hermitian operator ρ on C^n (i.e., an n × n Hermitian matrix), we write λ(ρ) for the vector of eigenvalues of ρ, ordered nonincreasingly. If ρ is PSD with unit trace then its eigenvalues form a probability distribution, so λ(ρ) is an element of the probability simplex. We also abbreviate λ_k := λ(ρ_k). We will be particularly interested in characterizing the eigenvalues of the one-body marginals, motivated by the following fundamental problem in quantum mechanics [Kly06]:
Problem 1.1 (Onebody quantum marginal problem).
Given p^{(1)} ∈ R^{n_1}, …, p^{(d)} ∈ R^{n_d}, decide if there exists a tensor ψ such that λ(ρ_k) = p^{(k)} for all k ∈ [d].
Remark 1.2.
Note that the above problem is equivalent to the following: given density matrices (PSD matrices with unit trace) σ_1, …, σ_d, determine if there exists a tensor ψ (a pure state) such that ρ_k = σ_k for all k. Since a unitary change of basis comes for free on each subsystem, only the eigenvalues of the σ_k are relevant.
The above problem is extremely fundamental from the point of view of quantum many-body physics. It is a special case of the more general quantum marginal problem, which puts constraints on the marginals of multiple overlapping subsystems and is known to be QMA-complete (for a growing number of subsystems) [Liu06].
We note that the normalization to trace one is natural; since the marginals all have the same trace ‖ψ‖², we can simultaneously rescale all marginals simply by rescaling the tensor.
Now we discuss how Problem 1.1 can be phrased as a question about moment polytopes [NM84, Bri87, Kly06, Res10, VW17, Wal14]. Let G = GL(n_1) × ⋯ × GL(n_d), where GL(n) denotes the group of invertible n × n complex matrices. Then G acts on C^{n_0} ⊗ C^{n_1} ⊗ ⋯ ⊗ C^{n_d} by
g · ψ = (I ⊗ g_1 ⊗ ⋯ ⊗ g_d) ψ.
As the group acts by rescaling slices of the tensor, we will call any g · ψ a tensor scaling of ψ. (The extra coordinate with dimension n_0 can be equivalently thought of as enumerating an n_0-tuple of tensors in C^{n_1} ⊗ ⋯ ⊗ C^{n_d}, with the group acting simultaneously on all the tensors in the tuple. Much of the theory remains similar if one sets n_0 = 1, and that can be done mentally on a first reading. In the quantum language, it is the difference between acting on pure states and acting on mixed states.)
What is the moment map in this setting? It turns out that the moment map captures exactly the notion of one-body quantum marginals. It is more convenient to define the moment map on projective space (namely, to restrict ourselves to tensors of unit norm), since we do not care about scalar multiples. We will denote by P(V) the projective space corresponding to the tensor space and identify it with the set of rank-one trace-one PSD operators ρ = ψψ†/‖ψ‖². Then the moment map can be written as (after identifying the Lie algebra of G with its dual)
μ(ρ) = (ρ_1, …, ρ_d) ∈ Herm(n_1) × ⋯ × Herm(n_d),    (2)
where Herm(n) denotes the space of n × n Hermitian matrices. Now consider a projective subvariety X of P(V), such as P(V) itself or an orbit closure X = closure(G · ψ) for some given tensor ψ. (Here, the closure can be taken either in the Euclidean or in the Zariski topology. In general, X can be any G-stable irreducible projective subvariety of P(V).) Let us look at the collection of marginal eigenvalues when restricted to tensors in X:
Δ(X) := { (λ(ρ_1), …, λ(ρ_d)) : ρ ∈ X }.    (3)
We emphasize again the amazing, surprising and nontrivial fact that Δ(X) is a rational convex polytope [NM84, Kir84b, Kir84a, Bri87], known as the moment polytope or Kirwan polytope of X. (Note that we have identified X with a set of rank-one density matrices, hence it is far from being a convex set; yet the spectra of its image under the moment map form a convex set.) This means that Δ(X) can in principle be given in terms of finitely many affine inequalities in the eigenvalues of the one-body marginals [Kly06, Res10, VW17]. In particular, the preceding applies to X = P(V), so we can rephrase Problem 1.1 as follows: Given p = (p^{(1)}, …, p^{(d)}), is it a point in Δ(P(V))? More generally, we can consider the following decision problem:
Problem 1.3 (General moment polytope).
Given p = (p^{(1)}, …, p^{(d)}), decide if there exists ρ ∈ X such that λ(ρ_k) = p^{(k)} for all k ∈ [d].
When X is the orbit closure of some given tensor ψ, we will abbreviate the moment polytope by Δ(ψ). In quantum information theory, moment polytopes of orbit closures have been called entanglement polytopes, as they characterize multipartite entanglement from the perspective of the one-body marginals [WDGC13, SOK14]. But, along with the corresponding invariant-theoretic multiplicities, they are also of interest in algebraic and geometric complexity theory [BLMW11, BI11, CDW12, CVZ17]. The corresponding decision problem is the following:
Problem 1.4 (Moment polytope of orbit closure).
Given a tensor ψ and p = (p^{(1)}, …, p^{(d)}), decide if there exists ρ in the orbit closure of ψ such that λ(ρ_k) = p^{(k)} for all k ∈ [d].
That is, Problem 1.4 asks whether p is a point in Δ(ψ). (When n_0 = 1, there is a physical interpretation of the orbit closure: a tensor φ lies in the orbit closure of ψ iff φ can be obtained to arbitrary precision from ψ (which is naturally understood as a d-partite quantum state) by a class of quantum operations known as stochastic local operations and classical communication (SLOCC) [BPR00]. SLOCC can be intuitively understood as follows: we imagine that d different parties hold the different subsystems of a quantum state; SLOCC then corresponds to a sequence of local quantum operations and measurements, where we allow for postselection on specific measurement outcomes. Problem 1.4 then asks: given a tensor ψ, does there exist a state obtainable from ψ by a sequence of SLOCC operations whose one-body marginals have spectra p^{(1)}, …, p^{(d)}? This is a generalization of the SLOCC entanglement distillation question, where each p^{(k)} is the uniform distribution.)

One can show that Problem 1.4 is intimately related to Problems 1.3 and 1.1: p ∈ Δ(X) iff p ∈ Δ(ψ) for a generic ψ ∈ X (Corollary 2.6). We will explain this in Section 2. We will therefore focus our attention on Problem 1.4.
It is natural to go beyond the decision problem and look for an algorithm that finds a tensor with the desired marginals, as well as the group element that transforms ψ into it. Since such a tensor may lie only in the closure of the orbit of ψ, we demand only that the marginals are correct up to some target accuracy.
Definition 1.5 (ε-close).
The marginals of ψ are said to be ε-close to p = (p^{(1)}, …, p^{(d)}) if ‖ρ_k − diag(p^{(k)})‖₁ ≤ ε for all k ∈ [d]. Here, ‖·‖₁ is the trace norm.
Problem 1.6 (Tensor scaling).
Given a tensor ψ, p = (p^{(1)}, …, p^{(d)}), and ε > 0, find g ∈ G such that g · ψ has marginals that are ε-close to p.
While it may not be immediately clear, there exist scalings as in Problem 1.6 for every ε > 0 if and only if the answer to Problem 1.4 is yes, i.e., if and only if p ∈ Δ(ψ).
The polytopes Δ(ψ) admit an alternative characterization in terms of invariant theory [NM84]. We explain this connection in Section 2, as it is central to the analysis of our algorithms. For now, we only mention an important special case. Let g_{λ,μ,ν} denote the Kronecker coefficients, which are fundamental objects in the classical representation theory of the symmetric and general linear groups [FH13, Sta00]. They also feature in geometric complexity theory as a potential way of creating representation-theoretic obstructions [Mul07, BLMW11]. For example, g_{λ,μ,ν} can be defined as the multiplicity of the irreducible representation [ν] in the tensor product [λ] ⊗ [μ]. Here, λ, μ, and ν are partitions of some integer k, which we may think of as nonincreasing vectors of nonnegative integers summing to k. Then,
Δ(C^{n_1} ⊗ C^{n_2} ⊗ C^{n_3}) = closure of { (λ/k, μ/k, ν/k) : g_{λ,μ,ν} > 0, λ, μ, ν partitions of k },    (4)
so the solution to the one-body quantum marginal problem captures precisely the asymptotic support of the Kronecker coefficients [CM06, Kly06, CHM07]. We note that deciding whether g_{λ,μ,ν} > 0 is known to be NP-hard [IMW17]. However, since the asymptotic vanishing of Kronecker coefficients is captured by the quantum marginal problem, it has been conjectured that it should have a polynomial-time algorithm, and we make progress towards this question. (We note that the closely related Littlewood-Richardson coefficients, which capture the same problem for representations of the general linear group, satisfy the so-called saturation property: c_{λ,μ}^ν > 0 iff c_{kλ,kμ}^{kν} > 0 for all k ≥ 1 [KT99]. Hence the asymptotic support is the same as the support in this case, and this is also a key ingredient in the polynomial-time algorithms for testing whether c_{λ,μ}^ν > 0 [BI12].) Since Kronecker coefficients are so poorly understood, understanding their asymptotic support would go a long way towards understanding them.
1.3 Prior work
As mentioned above, Problem 1.1 can be approached by first computing (the defining inequalities of) the moment polytope Δ(P(V)). The problem of computing moment polytopes has a long history in mathematics (e.g., [Ati82b, GS82b, GS84, Kir84b, Kir84a, Bri99, BS00, Res10, Bel10]). That the one-body quantum marginal problem falls into this framework was first noticed by Klyachko [Kly04], who gave a complete description of the polytopes in terms of finite lists of inequalities (cf. [DH05, Kly06, AK08]). Before that, only low-dimensional special cases were known [HSS03, Bra03, Fra02]. Further developments include the minimal complete description from [Res10] and the cohomology-free variant [VW17]. Yet, all these descriptions in terms of inequalities are largely computationally infeasible; explicit descriptions are known only for small formats [Kly06, VW17] and when all dimensions are two [HSS03].
Problems 1.3 and 1.4 can in principle be approached using classical computational invariant theory (e.g., [Stu08, DK15, WDGC13]), based on the invariant-theoretic description of Δ(ψ) and degree bounds (we recall both in Section 2). In practice, however, this is completely infeasible except for very small dimensions. The problem of describing Δ(ψ) also falls into the framework of [Res10], but it is not clear how to turn this into an algorithm. In summary, all the methods described above are computationally expensive and take time at least exponential in the input size.
None of the preceding algebraic methods can be used to solve Problem 1.6, since they only decide membership but do not produce the transformation that yields a tensor with the desired target spectra. This calls for the development of numerical algorithms for Problem 1.6. Curiously, this development stemmed from motivations in algebraic complexity and the PIT problem. The first such algorithm was proposed in [Gur04]. Its complexity analysis, which brought in the connection to invariant theory (and other fields, some mentioned above), was achieved in [GGOW16]. In the language we use here, it deals with d = 2 (operator scaling) and uniform marginals, and results in polynomial-time algorithms for problems in the diverse areas discussed there. (The underlying algebraic problem associated with operator scaling, namely noncommutative singularity and the rank of symbolic matrices, found a different, algebraic algorithm in the works of [IQS17, DM15].) The operator scaling problem was then extended in two directions, which we mention next: one direction being general values of d (tensor scaling) and the other being d = 2 with arbitrary marginals.
For general d, a deterministic algorithm was given in [BGO17] (based on a proposal in [VDDM03]). Very recently, a randomized polynomial-time algorithm for operator scaling to general marginals was given in [Fra18]. The two papers [BGO17, Fra18] study two different generalizations of the operator scaling problem in [GGOW16]. The present paper completes a natural square by studying a common generalization of the problems studied in [BGO17, Fra18]. All these algorithms can be seen as noncommutative generalizations of the Sinkhorn-Knopp algorithm for matrix scaling [Sin64, SK67].
It was recently shown that Problem 1.1 is in NP ∩ coNP [BCMW17]. In view of Eq. 4, this should be contrasted with the NP-hardness of deciding whether a single Kronecker coefficient is zero or not [IMW17].
1.4 Summary of results
Our main result in this paper is a randomized algorithm for tensor scaling to general marginals (Problem 1.6). As a consequence, we obtain algorithms for all other problems.
Theorem 1.7.
There is a randomized algorithm, running in time polynomial in the total bit size b of the input and in 1/ε, that takes as input a tensor ψ with Gaussian integer entries (specified as a list of real and imaginary parts, each encoded in binary) and p = (p^{(1)}, …, p^{(d)}) with rational entries (specified as a list of numerators and denominators, each encoded in binary). The algorithm either correctly identifies that p ∉ Δ(ψ), or it outputs a scaling g · ψ such that the marginals of g · ψ are ε-close to the target spectra p^{(1)}, …, p^{(d)}.
As a consequence of Theorem 1.7, we obtain a randomized algorithm for a promise version of the membership Problem 1.4 (and hence for Problem 1.1, see Corollary 2.6).
Corollary 1.8.
There is a randomized algorithm, running in time polynomial in the total bit size b of the input and in 1/ε, that takes as input a tensor ψ with Gaussian integer entries (specified as a list of real and imaginary parts, each encoded in binary) and p = (p^{(1)}, …, p^{(d)}) with rational entries (specified as a list of numerators and denominators, each encoded in binary). The algorithm distinguishes between the following two cases:

p ∈ Δ(ψ).

p is ε-far (in norm) from every point of Δ(ψ).
This yields the following corollary.
Corollary 1.9.
There is a randomized algorithm, running in time polynomial in the total bit size b of the input and in 1/ε, that takes as input p = (p^{(1)}, …, p^{(d)}) with rational entries (specified as a list of numerators and denominators, each encoded in binary). The algorithm distinguishes between the following two cases:

p ∈ Δ(P(V)), i.e., there exists a tensor ψ such that λ(ρ_k) = p^{(k)} for all k ∈ [d].

p is ε-far (in norm) from every point of Δ(P(V)).
As described before, Problem 1.1 captures the asymptotic vanishing of Kronecker coefficients. Hence we get the following corollary, which gives a randomized polynomial-time algorithm for a promise version of the asymptotic Kronecker problem.
Corollary 1.10.
There is a randomized algorithm, running in time polynomial in the total input bit size and in 1/ε, that takes as input three partitions λ, μ, ν with entries encoded in binary with bit size at most b. The algorithm distinguishes between the following two cases:

There exists an integer k ≥ 1 s.t. g_{kλ, kμ, kν} > 0.

For all (λ', μ', ν') s.t. g_{λ',μ',ν'} > 0, it holds that (λ'/|λ'|, μ'/|μ'|, ν'/|ν'|) is ε-far (in norm) from (λ/|λ|, μ/|μ|, ν/|ν|).

Here g_{λ,μ,ν} denotes the Kronecker coefficient, and |λ| denotes the sum of the entries of λ.
In many applications, the tensor ψ can be represented more succinctly than by listing its many coordinates. If the representation is preserved by scalings and allows for efficient computation of the marginals, then this yields a useful optimization of Algorithm 1. A prime example is the class of so-called matrix-product states, or tensor-train decompositions, with polynomial bond dimension [VDDM03, Orú14]. We won't define these states here (see Section 6 for a formal definition); we just note that they have much smaller (exponentially smaller in d) descriptions than a list of all the coordinates of the tensor. This class includes the unit tensors and the matrix multiplication tensors, which are central objects in algebraic complexity theory [BI11, BCS13] and whose moment polytopes are not known!
Theorem 1.11 (Informal).
There is a randomized algorithm, running in time polynomial in the input size and in 1/ε, that takes as input a matrix-product state ψ and p = (p^{(1)}, …, p^{(d)}) with rational entries (specified as a list of numerators and denominators, each encoded in binary). The algorithm either correctly identifies that p ∉ Δ(ψ), or it outputs a scaling g · ψ such that the marginals of g · ψ are ε-close to the target spectra p^{(1)}, …, p^{(d)}.
It is a very exciting open problem to improve the running time dependence on 1/ε in Corollary 1.8, Corollary 1.9 and Corollary 1.10 to polylogarithmic in 1/ε. This would yield randomized polynomial-time algorithms for Problem 1.1, Problem 1.4 and the asymptotic Kronecker problem, due to the following theorem that we prove in Section 5.
Theorem 1.12 (Minimal gap).
Let ψ be a nonzero tensor. If g · ψ is a scaling whose marginals are ε-close to p, where ε is below an explicit bound that is inverse-exponential in the dimensions and in ℓ (see Section 5), then p ∈ Δ(ψ). Here ℓ is the minimal integer s.t. ℓ p has integral entries.
An analogous result for the full moment polytope was proven in [BCMW17]. We believe that the inverse-exponential bound in the above theorem cannot be improved to an inverse-polynomial bound. Therefore, developing scaling algorithms whose runtime depends polylogarithmically on 1/ε is of paramount importance.
Before describing our algorithm and the high-level intuition for its analysis, let us describe the algorithm and analysis for the rather special case of matrix scaling, which turns out to be very enlightening.
1.5 Simple example: matrix scaling
The matrix scaling problem (Problem 3 in Section 1.1) provides us with a template for what is to come; understanding a particular algorithm for this problem will give us intuition about how to solve the more general tensor scaling problem, and about how invariant theory naturally appears.
If one wants to scale a given nonnegative matrix A to a doubly stochastic matrix (that is, one whose rows and columns each sum to 1), a natural algorithm (first proposed in [Sin64]) arises from the fact that the group of scalings is a Cartesian product: we can alternately use scalings of the form (D, I) to normalize the row sums of A and scalings of the form (I, E) to normalize the column sums of A.

To this end, let D_A be the diagonal matrix whose i-th diagonal entry is the inverse of the sum of the i-th row of A, and define E_A similarly for the columns of A. The algorithm can be described as follows: repeatedly (for a polynomial number of iterations) apply the following steps:

Normalize the rows of A. That is, A ← D_A A.

Normalize the columns of A. That is, A ← A E_A.
If, throughout this process, the matrix never gets sufficiently close to a doubly stochastic matrix (in, say, ℓ₂ distance), then we conclude that A cannot be scaled to doubly stochastic; otherwise we conclude that it can. The process also gives us a way to obtain the scalings that approach doubly stochastic: while there are multiple algorithms for the decision problem (which turns out to be equivalent to bipartite perfect matching), not all of them help find the scalings!
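The alternating procedure above is easy to state in code. The following is a minimal numerical sketch of the Sinkhorn iteration (our illustration, not the paper's pseudocode):

```python
import numpy as np

def sinkhorn(A, iters=1000, tol=1e-9):
    """Alternately normalize the rows and columns of a nonnegative matrix.

    If A can be scaled to doubly stochastic, the iterates converge;
    otherwise the row/column sums stay bounded away from all-ones.
    """
    A = np.array(A, dtype=float)
    for _ in range(iters):
        A /= A.sum(axis=1, keepdims=True)   # row normalization:    A <- D_A A
        A /= A.sum(axis=0, keepdims=True)   # column normalization: A <- A E_A
        if np.abs(A.sum(axis=1) - 1).max() < tol:  # columns are exact here
            break
    return A

B = sinkhorn([[1.0, 2.0], [3.0, 4.0]])
```

For a strictly positive matrix, such as the 2 × 2 example above, the iteration always converges to a doubly stochastic matrix.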
The analysis of this algorithm (from [LSW98]; see also [GY98] for a different potential function) is extremely simple, and follows a three-step approach based on a progress measure: the permanent per(A).
The following two properties of this potential function will be useful for us.

If A is scalable to doubly stochastic, then per(A) > 0.

per(A) ≤ 1 if A is row- or column-normalized.
The three-step approach then is the following:

[Lower bound]: Initially, per(A) is not too small (wlog we assume A is row-normalized; there is also some dependence on the bit complexity of the input that we are ignoring).

[Progress per step]: If A is row- or column-normalized and sufficiently far from being doubly stochastic, then normalizing increases per(A). One can explicitly bound the increase using a robust version of the AM-GM inequality.

[Upper bound]: per(A) is bounded by 1 if A is row- or column-normalized.
This three-step analysis shows that the scaling algorithm solves the doubly stochastic scaling problem in polynomial time. The difficult part of the analysis is coming up with a potential function satisfying the properties above. This is the role played by invariant theory later. A source of good potential functions will turn out to be highest weight vectors, which are (informally speaking) "eigenvectors" of the action of certain subgroups of the main group. Note that the permanent is such an eigenvector of the action of pairs of diagonal matrices, since per(DAE) equals det(D) det(E) per(A) for diagonal D and E.

If we want to solve the more general scaling problem, where we are given prescribed values for the row and column sums, say as nonnegative integer vectors r and c, the same natural algorithm can be applied. The only change one needs to make in the algorithm above is that we now normalize the rows of A to have sums r_i and the columns to have sums c_j. The analysis is also quite similar: one can choose the potential function, for example, to be the permanent of the matrix obtained from A by repeating the i-th row r_i times and the j-th column c_j times. However, the distinction between the uniform and the nonuniform versions of the problem is much starker in our higher-dimensional noncommutative setting, as we will see next.
1.6 Techniques and proof overview
Our algorithm and its analysis generalize two recent works [BGO17, Fra18], which in turn generalize the analysis of matrix scaling in Section 1.5. The paper [BGO17] studies the special case when each p^{(k)} is the uniform distribution, while the paper [Fra18] studies the special case d = 2. Our algorithm is a natural common generalization of the algorithms in [Fra18, BGO17], while our analysis generalizes the analysis in [BGO17], replacing the use of invariants with highest weight vectors (we will explain what these are later).
Let us develop some intuition for the algorithm. It is usually the case with scaling problems, as we saw with matrix scaling, and more generally in the framework of alternating minimization, that one of the constraints is easy to satisfy by scaling. The same is true for the problem at hand. We are given a tensor ψ. Suppose we want ρ_k = diag(p^{(k)}). With the shorthand P = diag(p^{(k)}), we act on ψ by the group element (I, …, I, P^{1/2} ρ_k^{−1/2}, I, …, I), where the nontrivial element is in the k-th location. This will satisfy the constraint. Indeed, one can choose any matrix g s.t. g ρ_k g† = P and act on ψ by the corresponding group element; this will also satisfy the constraint. By choosing at each step to fix the index which is "farthest" from its target spectrum, we have defined an iterative algorithm (up to the choice of g at each step) that keeps alternately fixing the constraints. It turns out that this algorithm works (for any choice of g at each step!) when the p^{(k)} are all uniform, and converges in a polynomial number of iterations [BGO17].
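For intuition, here is a numerical sketch of this alternating scheme in the uniform case, acting at each step by ρ_k^{−1/2} on the worst subsystem and renormalizing. This is our illustrative rendition, not the paper's Algorithm 1:

```python
import numpy as np

def inv_sqrt(rho):
    """Inverse square root of a positive definite Hermitian matrix."""
    w, U = np.linalg.eigh(rho)
    return U @ np.diag(w ** -0.5) @ U.conj().T

def scale_to_uniform(psi, iters=500, tol=1e-8):
    """Alternating scaling toward uniform marginals rho_k = I / n_k.

    At each step, act on the subsystem whose marginal is farthest from
    uniform by rho_k^{-1/2}, then renormalize the tensor.
    """
    psi = np.asarray(psi, dtype=complex)
    psi = psi / np.linalg.norm(psi)
    for _ in range(iters):
        flats = [np.moveaxis(psi, k, 0).reshape(n, -1)
                 for k, n in enumerate(psi.shape)]
        margs = [M @ M.conj().T for M in flats]
        errs = [np.linalg.norm(r - np.eye(len(r)) / len(r)) for r in margs]
        k = int(np.argmax(errs))
        if errs[k] < tol:
            break
        moved = np.tensordot(inv_sqrt(margs[k]), np.moveaxis(psi, k, 0), axes=1)
        psi = np.moveaxis(moved, 0, k)
        psi = psi / np.linalg.norm(psi)
    return psi

# A generic (random) 2 x 2 x 2 tensor can be scaled to uniform marginals.
rng = np.random.default_rng(2)
psi = scale_to_uniform(rng.normal(size=(2, 2, 2)) + 1j * rng.normal(size=(2, 2, 2)))
```

Acting by ρ_k^{−1/2} and renormalizing makes the k-th marginal exactly I/n_k, at the cost of perturbing the others, which is why the iteration must alternate.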
Interestingly, the choice of g that works for general p^{(k)} is that of upper triangular matrices! (This choice works for all p^{(k)}. We don't know if this choice of upper-triangularity is necessary. There is also a nice interpolation between the case of uniform p^{(k)} and p^{(k)} with distinct entries; see Section 6.4.) This was the choice made in [Fra18] as well. This restriction on scaling factors will make the analysis more complicated, as we shall soon see. One intuitive reason for the difference between the uniform and the general case is the following: in the general case, we made an arbitrary decision to scale ψ so that ρ_k = diag(p^{(k)}), while we could have chosen to scale it to any ρ_k with spectrum p^{(k)}. This choice of basis is not present in the uniform case, since in that case all bases are the same!

This restriction on scaling factors creates another problem: it disconnects the orbit space (see the example below). Thus, we need to initialize the algorithm with a random basis change of the given input, and only then apply the restricted scaling. This idea is used in [Fra18] as well. We explain, via an example, why this random basis change (or at least a "clever" basis change) is needed at the start of the algorithm. Consider the diagonal unit tensor ψ, with ψ_{i,j,k} nonzero iff i = j = k. It is easy to see that without the initial randomization, the algorithm (which chooses an upper triangular g at each step) would only produce diagonal tensors, and the marginals of any such tensor are isospectral. On the other hand, the orbit of ψ is dense, and so Δ(ψ) is the full moment polytope. In particular, ψ can be scaled to tensors with non-isospectral marginals.
The algorithm is described as Algorithm 1. The following is the main theorem regarding the analysis of Algorithm 1, from which Theorem 1.7 follows up to an analysis of the bit complexity of Algorithm 1.
Theorem 1.13 (Tensor scaling).
Let be a (nonzero) tensor whose entries are Gaussian integers of bitsize no more than . Also, let with rational entries of bitsize no more than such that for all . Finally, let .
Then, with probability at least 1/2, Algorithm 1 either correctly identifies that , or it outputs such that the marginals of are close to . In fact, we have
(5) 
in the latter case, where is the trace norm.
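The closeness guarantee of the theorem is measured in trace norm. As a small illustrative sketch (the specific matrices are made up for the example), the distance between a marginal and the diagonal matrix with the target spectrum can be computed from eigenvalues, since the trace norm of a Hermitian matrix is the sum of the absolute values of its eigenvalues:

```python
import numpy as np

def trace_norm(A):
    """Trace norm ||A||_1; for Hermitian A this is the sum of |eigenvalues|."""
    return np.abs(np.linalg.eigvalsh(A)).sum()

# distance of a (made-up) marginal rho to a target spectrum p, i.e. the
# kind of quantity the theorem bounds by the error parameter
rho = np.array([[0.6, 0.1], [0.1, 0.4]])
p = np.array([0.7, 0.3])
dist = trace_norm(rho - np.diag(p))
print(round(dist, 6))   # 0.282843, i.e. 2 * sqrt(0.02)
```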
Remark 1.14.
To analyze our algorithm and prove Theorem 1.13, we follow a three-step argument similar to the analysis in Section 1.5. This has been used to great effect for operator scaling and tensor scaling in [Gur04, GGOW16, BGO17, Fra18], after identifying the appropriate potential function.
As we described in Section 1.5, the appropriate potential functions to choose are the ones which are eigenvectors of an appropriate group action. In the matrix scaling case, we were acting by and hence we chose the potential function to be the permanent, which is an eigenvector for this group action. In our algorithm, we are acting by the group corresponding to (direct products of) upper triangular matrices (known as the Borel subgroup). So for us, the right potential functions to consider are functions which are eigenvectors for the action of (tuples of) upper triangular matrices. One such class of functions are the so-called highest weight vectors from representation theory^{22}^{22}22Here we restrict our attention to the action on polynomials because that is what we need to describe the intuition for the analysis of the algorithm. But the discussion of weight vectors applies to arbitrary (rational) representations of the group , see Section 2.1., which we come to next.
What are highest weight vectors? We have the action of on . Let us consider the space of degree polynomial functions on , denoted by . The action of on induces an action of on given by . Consider a tuple of vectors , . Then we say that is a highest weight vector with weight if
for all such that is an upper triangular matrix for each . Note that this necessitates for each . It also necessitates (though this is not trivial to see) that for all , .
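The prototypical example of such an eigenvector is the determinant. In the sketch below, the action convention (b · p)(X) = p(bᵀX) is an assumption made for illustration, since conventions vary; under it, the determinant transforms by the character ∏ᵢ bᵢᵢ for upper triangular b, i.e. it is a weight vector of weight (1, …, 1):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
X = rng.standard_normal((n, n))
b = np.triu(rng.standard_normal((n, n)))   # upper triangular group element

# p(X) = det(X); under the illustrative action (b . p)(X) = p(b^T X),
# det picks up exactly the product of the diagonal entries of b:
lhs = np.linalg.det(b.T @ X)
rhs = np.prod(np.diag(b)) * np.linalg.det(X)
print(np.isclose(lhs, rhs))   # True
```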
The following two properties of highest weight vectors will be crucial for our analysis:

[[NM84], see Theorem 2.4]: Let be a rational vector. Then iff there exists an integer s.t. has integer entries and there exists a highest weight vector with weight s.t. for some . Here . This extends a fact used in previous papers: the uniform vector is in iff some invariant polynomial does not vanish on .

[Proposition 2.9] The space of highest weight vectors with weight is spanned by polynomials with integer coefficients that satisfy the following bound
(6)
This extends an analogous bound in past papers from invariant polynomials to highest weight vectors.
We use classical constructions of highest weight vectors [Pro07, BI13, BGO17] to derive the second fact. These constructions are only semi-explicit (e.g., it is not clear whether they can be evaluated efficiently), but they suffice for us, since we only need a bound on their evaluations for their use as a potential function. We note that such bounds on their evaluations have not been observed before in the invariant theory literature (except in [BGO17], for the special case of invariants), whereas for us they are crucial! We also emphasize that it is essential for us that the bound is singly exponential in . Naive strategies based on solution sizes for linear systems only yield bounds that are doubly exponential in .
The potential function we use is . Here is some highest weight vector of degree (for some ), with integer coefficients and weight , that satisfies as well as Eq. 6. Such a exists by the discussion above. Using these properties, a three-step analysis, similar to the one in Section 1.5, proceeds along the following outline.

[Lower bound]: Since for some , for a random choice of we have . Furthermore, since we choose to have integer coefficients, . After the normalization in Step (2), we get . It is not hard to see that .

[Progress per step]: increases at each step. Furthermore, if the current spectra are “far” from the target spectra, then one can explicitly bound the increase. Here the highest weight vector property of as well as Pinsker’s inequality from information theory play an important role.

[Upper bound]: always. This follows from Eq. 6 and the fact that we maintain the unit norm property of after the normalization in Step (2) of the algorithm.
These three steps imply that after a polynomial number of iterations, one gets close to the target spectrum. A complete analysis is presented in Section 3.2. Note that to ensure that we use only a polynomial number of random bits for the initial randomization, we need the highest weight vectors to have degree at most exponential in the input parameters. This is achieved by relying on Derksen’s degree bounds [Der01] (see Proposition 2.5).
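The information-theoretic ingredient in the progress-per-step argument, Pinsker's inequality, states that for probability distributions p, q, the KL divergence satisfies D(p‖q) ≥ ½‖p − q‖₁² (in nats), so a large ℓ₁-distance from the target forces a large potential increase. A quick numeric sanity check:

```python
import numpy as np

def kl(p, q):
    """KL divergence D(p || q) in nats, for strictly positive distributions."""
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(2)
for _ in range(1000):
    p = rng.random(4) + 1e-9; p /= p.sum()
    q = rng.random(4) + 1e-9; q /= q.sum()
    l1 = np.abs(p - q).sum()
    assert kl(p, q) >= l1 ** 2 / 2 - 1e-12   # Pinsker's inequality
print("Pinsker's inequality holds on all samples")
```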
1.7 Additional discussion
We would like to point out two important distinctions between the analysis for matrix scaling in Section 1.5 and our analysis here. First, as we have seen, there is a major difference between the uniform and the nonuniform versions of our problem, while this was not the case for matrix scaling. This phenomenon is general and marks a distinction between commutative and noncommutative group actions. It stems from the fact that all irreducible representations of commutative groups are one-dimensional, whereas for noncommutative groups they are not. Second, in the matrix scaling analysis, the upper bound was easy to obtain as well, whereas for us the upper bound step is the hardest and requires deep results from representation theory. The upper bound steps were the main source of difficulty in the papers [GGOW16, BGO17, Fra18] as well,^{23}^{23}23In some of these papers, the lower bound is the hard step, due to the use of a dual kind of potential function. and this is a key point of distinction between commutative and noncommutative group actions.
We believe that our framework of using highest weight vectors as potential functions for the analysis of analytic algorithms is the right way to approach moment polytope problems, even beyond the cases considered in this paper.
The approach taken in [Fra18] (for the case of ) is to reduce the nonuniform version of the problem to the uniform version, which was solved in [GGOW16] for the case of (the reduction in [GGOW17] is a simple special case of the reduction in [Fra18]). The reduction is complicated and somewhat ad hoc. We generalize this reduction to our setting () in Section 4, providing a somewhat more principled view of it along the way. However, it still seems rather specialized and mysterious compared to the general reduction in geometric invariant theory from the “nonuniform” to the “uniform” case (also known as the shifting trick; see Section 2.2).
We also note that applying the results of [BGO17] to the reduction in Section 4 in a black-box manner does not yield our main theorem (Theorem 1.7): the number of iterations would be exponential in the bit complexity of , and we would even require an exponential number of bits for the randomization step! To remedy these issues with the reduction in Section 4, one must delve into the relationship between the reduction and the invariant polynomials. We will see, by a fairly involved calculation, that invariant polynomials evaluated on the reduction result in the same construction of highest weight vectors anyway. This teaches us two lessons:

Highest weight vectors are the only suitable potential functions in sight. Though it may have other conceptual benefits, the reduction in Section 4 is no better than the shifting trick for the purpose of obtaining potential functions!

We had to look at the construction of highest weight vectors in Section 2.4 before calculating them from the reduction; the calculation might not have been so easy a priori! Again, the classical construction of highest weight vectors saves the day.
It is interesting to discuss some of the salient features and possible variations of Algorithm 1 (we expand on these points in the main text):

Iterations and randomness. The algorithm terminates after at most iterations and uses bits of randomness. For fixed, or even inverse polynomial, , this is polynomial in the input size. In fact, this is better than the number of iterations in [Fra18], where the number of iterations also depended on .

Bit complexity. To get an algorithm with truly polynomial running time, one needs to truncate the group elements ’s to a polynomial number of bits after the decimal point. We explain why this does not affect the performance of the algorithm in Section 3.3.

Degenerate spectra. If is degenerate, i.e., for some , then we may replace the Cholesky decomposition in step 3 by a decomposition into two block upper triangular matrices, where the block sizes are the degeneracies; the set of such matrices is a so-called parabolic subgroup of the general linear group (Section 6.4). Moreover, the random matrix need only be generic up to the action of the parabolic subgroup. In particular, when scaling to uniform spectra, no randomization is required and we can use Hermitian square roots, so Algorithm 1 reduces to the uniform tensor scaling algorithm of [BGO17].
Singular spectra. As written, Item 3 of Algorithm 1 fails if the spectra are singular, that is, if for some we have . However, in this case one may first pass to a smaller tensor obtained by restricting the index to the last coordinates. We show in Section 3.4 that is scalable by upper triangulars to marginals if and only if is scalable by upper triangulars to , .
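The Cholesky-based step referred to above can be sketched numerically. The particular "reverse Cholesky" construction below is one standard way (an illustrative choice, not necessarily the paper's exact convention) to obtain an upper triangular b with b ρ b† = I for a positive definite marginal ρ; for singular ρ it raises an error, matching the singular-spectra caveat:

```python
import numpy as np

def triangular_scaler(rho):
    """Upper triangular b with b rho b^H = I.  We factor the flipped matrix
    J rho J = L L^H (L lower triangular) and set b = J L^{-1} J, which is
    upper triangular.  Raises LinAlgError when rho is singular."""
    J = np.eye(len(rho))[::-1]                # flip permutation matrix
    L = np.linalg.cholesky(J @ rho @ J)       # lower triangular Cholesky factor
    return J @ np.linalg.inv(L) @ J           # flipping conjugates lower to upper

rho = np.array([[2.0, 0.5], [0.5, 1.0]])      # a positive definite marginal
b = triangular_scaler(rho)
print(np.allclose(np.tril(b, -1), 0))                  # True: b is upper triangular
print(np.allclose(b @ rho @ b.conj().T, np.eye(2)))    # True: b scales rho to I
```

Replacing the triangular factor by a block-triangular one, as described in the degenerate-spectra item, corresponds to working with a parabolic subgroup instead of the Borel.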
We discuss extensions of Algorithm 1 for more general varieties with “good” parametrizations in Section 6.
1.8 Conclusions and open problems
We provide an efficient weak membership oracle for moment polytopes corresponding to a natural class of group actions on tensors. This generalizes recent work on operator and tensor scaling and also yields efficient algorithms for promise versions of the one-body quantum marginal problem and the asymptotic support of Kronecker coefficients. Our work leaves open several interesting questions, some of which we state below.

Improve the dependence on the error in the running time of Algorithm 1 to . As discussed, this would immediately yield polynomial-time algorithms for the one-body quantum marginal problem. This is open even for the uniform version of the problem. Here the notion of geodesic convexity of certain “capacity” optimization problems should play a key role (e.g., see [AZGL18]).

Develop separation oracles for moment polytopes. A related question: can we optimize over moment polytopes? This would have algorithmic applications to the problem of computing quantum functionals, as described in [CVZ17]. In that paper, Strassen’s support functionals are generalized to quantum functionals, which are defined by convex optimization over the entanglement polytope. Thus, separation oracles for moment polytopes could lead to efficient algorithms for computing quantum functionals, which are important for comparing tensor powers (see [Str86, Str88]).

Find natural instances of combinatorial optimization problems which can be encoded as moment polytopes. Some examples can be found in [GGOW17].
1.9 Roadmap of the paper
In Section 2, we present results from geometric invariant theory and explain how they can be made quantitative. We use these in Section 3, where we analyze the proposed tensor scaling algorithm. In Section 4, we explain how the reduction in [Fra18] can be naturally understood in the framework of this paper. In Section 5, we show a lower bound on the distance to the moment polytope of any rational point not contained in it. This lower bound depends only on the description of the rational point and the dimension of our tensor space , and it allows us to solve membership problems by using the tensor scaling algorithm. In Section 6, we extend our algorithm to general varieties and degenerate spectra. In Appendix A.1, we discuss the Borel polytope, providing an alternate proof that it is in fact a rational polytope.
Acknowledgements
We would like to thank Shalev Ben David, Robin Kothari, Anand Natarajan, Frank Verstraete, John Watrous, John Wright, and Jeroen Zuiddam for interesting discussions.
PB is partially supported by DFG grant BU 1371 22. CF is supported in part by Simons Foundation award 332622. MW acknowledges financial support by the NWO through Veni grant no. 68047459. AW is partially supported by NSF grant CCF1412958.
2 Geometric invariant theory
In this section, we present some results from geometric invariant theory that will feature centrally in the analysis of our algorithm in Section 3. While stated for tensors, all results in this section extend easily to arbitrary rational representations of connected complex reductive algebraic groups. Most of the results are well known and only some are new; all previously known results are cited with references, and we highlight the new components. Section 2.1 discusses the basics of highest weight theory. Section 2.2 gives a formal definition of the moment map and also discusses the so-called “shifting trick” that reduces the problem of membership in moment polytopes to a null cone problem. Section 2.3 considers degree bounds for highest weight vectors, which are used to bound the initial randomness used in Algorithm 1. Section 2.4 recalls a classical construction of highest weight vectors and uses it to prove bounds on their evaluations (crucial in the analysis of Algorithm 1). Section 2.5 develops a necessary and sufficient condition for Borel scalability (i.e., scaling using tuples of upper triangular matrices).
As before, let , , , and let be a stable irreducible projective subvariety (e.g., an orbit closure).
2.1 Highest weight theory
We first recall the representation theory of (see, e.g., [FH13] for an introduction). Let be a finite-dimensional representation, equipped with an invariant inner product. Let denote the subgroup of invertible diagonal matrices, called the maximal torus of . Since is commutative, its action can be jointly diagonalized. Thus, any finite-dimensional representation can be written as a direct sum of so-called weight spaces, , where acts on any vector as for all . Here, is an integer vector and