Let $\mathbb{K}$ be any field of characteristic zero and let $f \in \mathbb{K}[x]$ be a univariate polynomial. This work concerns the study of expressions of $f$ as a linear combination of powers of affine forms.
We consider expressions of $f$ of the form:
$$f = \sum_{i=1}^{s} \alpha_i (x - a_i)^{e_i}$$
with $\alpha_i, a_i \in \mathbb{K}$ and $e_i \in \mathbb{N}$. We denote by $\mathrm{AffPow}_{\mathbb{K}}(f)$ the minimum value $s$ such that there exists a representation of the previous form with $s$ terms.
This model was already studied in , where we gave explicit examples of polynomials of degree $d$ requiring a number of terms growing with $d$ for the field $\mathbb{K} = \mathbb{R}$.
The main goal of this work is to design algorithms that reconstruct the optimal representation of polynomials in this model, i.e., algorithms that receive $f$ as input and compute the exact value of $\mathrm{AffPow}_{\mathbb{K}}(f)$ together with a set of triplets $(\alpha_i, a_i, e_i)$ of coefficients, nodes and exponents such that $f = \sum_{i=1}^{s} \alpha_i (x - a_i)^{e_i}$. We assume that $f$ is given in dense representation, i.e., as a tuple of $\deg(f) + 1$ elements of $\mathbb{K}$.
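To fix ideas, a decomposition in this model can be expanded back into the dense representation via the binomial theorem. The following Python sketch (ours, not part of the paper; the helper name is hypothetical) converts a list of (coefficient, node, exponent) triplets into the tuple of dense coefficients:

```python
from math import comb
from fractions import Fraction

def expand_affine_powers(terms, degree):
    """Expand sum_i alpha_i * (x - a_i)**e_i into the dense list of
    coefficients [c_0, ..., c_degree] (c_j multiplies x^j).
    Assumes every exponent e_i is at most `degree`."""
    coeffs = [Fraction(0)] * (degree + 1)
    for alpha, a, e in terms:
        # binomial theorem: (x - a)^e = sum_j C(e, j) * (-a)^(e-j) * x^j
        for j in range(e + 1):
            coeffs[j] += Fraction(alpha) * comb(e, j) * Fraction(-a) ** (e - j)
    return coeffs

# 2*(x-1)^2 + 3*(x+1) = 2x^2 - x + 5, i.e. dense coefficients (5, -1, 2)
print(expand_affine_powers([(2, 1, 2), (3, -1, 1)], 2))
```

The reconstruction problem studied in the paper is the inverse direction: from the dense coefficients back to a shortest list of triplets.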
Model 1.1 extends two already well-studied models. The first one is the Waring model, where all the exponents $e_i$ are equal to the degree of the polynomial, i.e., $e_i = \deg(f)$ for all $i$.
For a polynomial $f$ of degree $d$, we consider expressions of $f$ of the form:
$$f = \sum_{i=1}^{s} \alpha_i (x - a_i)^{d}$$
with $\alpha_i, a_i \in \mathbb{K}$. The Waring rank of $f$ is the minimum value $s$ such that there exists a representation of the previous form with $s$ terms.
Waring rank has been studied by algebraists and geometers since the 19th century. The algorithmic study of Model 1.2 is usually attributed to Sylvester. We refer to  for the historical background and to section 1.3 of that book for a description of the algorithm (see also Kleppe  and Proposition 46 of Kayal ). Most of the subsequent work was devoted to the multivariate generalization of Model 1.2 (in the literature, Waring rank is usually defined for homogeneous polynomials; after homogenization, the univariate Model 1.2 becomes bivariate, and the “multivariate generalization” therefore deals with homogeneous polynomials in 3 variables or more), with much of the 20th century work focused on the determination of the Waring rank of generic polynomials [1, 6, 14]. A few recent papers [20, 5] have begun to investigate the Waring rank of specific polynomials such as monomials, sums of coprime monomials, the permanent and the determinant.
The second model that we generalize is the Sparsest Shift model, where all the shifts are required to be equal.
For a polynomial $f$, we consider expressions of $f$ of the form:
$$f = \sum_{i=1}^{s} \alpha_i (x - a)^{e_i}$$
with $\alpha_i, a \in \mathbb{K}$ and $e_i \in \mathbb{N}$, and we define the sparsest shift complexity of $f$ as the minimum value $s$ such that there exists a representation of the previous form with $s$ terms.
This model and its variations have been studied in the computer science literature at least since Borodin and Tiwari . Some of these papers deal with multivariate generalizations [12, 9], with “supersparse” polynomials (in that model, the size of a monomial of degree $j$ is defined to be $\log j$ rather than $j$ as in the usual dense encoding) , or establish conditions for the uniqueness of the sparsest shift . It is suggested at the end of  to allow “multiple shifts” instead of a single shift, and this is just what we do in this paper. More precisely, as is apparent from Model 1.1, we do not place any constraint on the number of distinct shifts: it can be as high as the number of affine powers. It would also make sense to place an upper bound
on the number of distinct shifts. This would provide a smooth interpolation between the sparsest shift model (where there is a single shift) and Model 1.1 (where the number of distinct shifts is unrestricted).
1.1 Our results
We provide both structural and algorithmic results. Our structural results are presented in Section 3. We compare the expressive power of our 3 models: sums of affine powers, sparsest shift and the Waring decomposition. Namely, we show that some polynomials have a much smaller expression as a sum of affine powers than in the sparsest shift or Waring models. Moreover, we show that the Waring and sparsest shift models are “orthogonal” in the sense that (except in one trivial case) no polynomial can have a small representation in both models at the same time. We also show that some real polynomials have a short expression as a sum of affine powers over the field of complex numbers, but not over the field of real numbers. Finally, we study the uniqueness of the optimal representation as a sum of affine powers. It turns out that our reconstruction algorithms only work in a regime where the uniqueness of optimal representations is guaranteed.
As already explained, we present algorithms that find the optimal representation of an input polynomial . We achieve this goal in several cases, but we do not solve the problem in its full generality. One typical result is as follows (see Theorem 4.4 in Section 4 for a more detailed statement which includes a description of the algorithm).
Let $f \in \mathbb{K}[x]$ be a polynomial that can be written as
$$f = \sum_{i=1}^{s} \alpha_i (x - a_i)^{e_i}$$
where the constants $a_i$ are all distinct, $\alpha_i \neq 0$, and $e_i \in \mathbb{N}$. Assume moreover that each exponent $e_i$ is sufficiently large compared to $R_i$, where $R_i$ denotes the number of indices $j$ such that $e_j \geq e_i$.
Then, $\mathrm{AffPow}_{\mathbb{K}}(f) = s$. Moreover, there is a polynomial time algorithm that receives $f$ as input and computes the $s$-tuples of coefficients $(\alpha_i)_{i \leq s}$, of nodes $(a_i)_{i \leq s}$ and of exponents $(e_i)_{i \leq s}$.
From the point of view of the optimality of representations, it is quite natural to assume an upper bound on the numbers $R_i$. Indeed, if there is an index $i$ such that more than $e_i + 1$ of the powers $(x - a_j)^{e_j}$ have exponent at most $e_i$, then these powers are linearly dependent (they all lie in the space of polynomials of degree at most $e_i$, which has dimension $e_i + 1$), and there would be a smaller expression of $f$ as a linear combination of these polynomials. (It is hardly more difficult to show that a bound of this kind must hold for any optimal expression; see [8, Proposition 18].) We would therefore have $\mathrm{AffPow}_{\mathbb{K}}(f) < s$ instead of $\mathrm{AffPow}_{\mathbb{K}}(f) = s$. It would nonetheless be interesting to relax the assumption on the exponents in this theorem. Another restriction is the assumption that the shifts $a_i$ are all distinct. We relax that assumption in Section 5, but we still need to assume that all the exponents corresponding to a given shift belong to a “small” interval (see Theorem 5.3 for a precise statement). Alternatively, we can assume instead that there is a large gap between the exponents in two consecutive occurrences of the same shift, as in Theorem 5.8.
In Section 6 we extend the sum of affine powers model to several variables. We consider expressions of the form
$$f = \sum_{i=1}^{s} \alpha_i \, \ell_i^{e_i} \qquad (1)$$
where $\alpha_i \in \mathbb{K}$, $e_i \in \mathbb{N}$, and $\ell_i$ is a (non-constant) linear form for all $i$. This is clearly a generalization of the univariate model 1.1 and of multivariate Waring decomposition. Work on multivariate sparsest shift has developed in a different direction: one idea  has been to transform the input polynomial into a sparse polynomial by applying a (possibly) different shift to each variable. The model from  is more general than , and we do not generalize either of these two models. Our algorithmic strategy for reconstructing expressions of the form (1) is to transform the multivariate problem into univariate problems by projection, and to “lift” the solutions of different projections to a solution of the multivariate problem. This can be viewed as an analogue of “case 1” of Kayal’s algorithm for Waring decomposition [17, Theorem 5].
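As a toy illustration of the projection step (a simplification of our actual strategy, with hypothetical names), restricting a polynomial to a line $t \mapsto t \cdot v$ through the origin produces a univariate polynomial; each $\ell_i$ restricts to an affine function of $t$, so an expression of the form (1) restricts to a univariate sum of affine powers:

```python
def restrict_to_line(coeffs, direction):
    """Restrict a multivariate polynomial, given as a dict mapping
    exponent tuples to coefficients, to the line t -> t * direction.
    Returns the dense coefficient list of the univariate result in t."""
    degree = max((sum(e) for e in coeffs), default=0)
    out = [0] * (degree + 1)
    for expts, c in coeffs.items():
        val = c
        for e, v in zip(expts, direction):
            val *= v ** e
        out[sum(expts)] += val   # a degree-m monomial contributes to t^m
    return out

# f(x, y) = x^2 + x*y restricted to (t, 2t): t^2 + 2t^2 = 3t^2
print(restrict_to_line({(2, 0): 1, (1, 1): 1}, (1, 2)))  # → [0, 0, 3]
```

Lifting the univariate solutions obtained from several such directions back to a multivariate decomposition is the nontrivial part of the algorithm of Section 6.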
1.2 Main tools
Most of our results (the structural results about real polynomials from Section 3.1 rely instead on Birkhoff interpolation ) hinge on the study of certain differential equations satisfied by the input polynomial $f$. We consider differential equations of the form
$$\sum_{i=0}^{k} P_i(x) \, f^{(i)}(x) = 0 \qquad (2)$$
where the $P_i$’s are polynomials. If the degree of $P_i$ is bounded by $l + i$ for every $i$, we say that (2) is a Shifted Differential Equation (SDE) of order $k$ and shift $l$. Section 2 recalls some (mostly standard) background on differential equations and the Wronskian determinant.
When $f$ is a polynomial with an expression of size $s$ in Model 1.1, we prove in Proposition 2.6 that $f$ satisfies a “small” SDE, of order $2s-1$ and shift zero. The basic idea behind our algorithms is to look for one of these “small” SDEs satisfied by $f$, and hope that the powers $(x - a_i)^{e_i}$ in an optimal decomposition of $f$ satisfy the same SDE. This isn’t just wishful thinking because the SDE from Proposition 2.6 is satisfied not only by $f$ but also by the powers $(x - a_i)^{e_i}$.
Unfortunately, this basic idea by itself does not yield efficient algorithms. The main difficulty is that $f$ could satisfy several SDEs of the required order and shift 0. By Remark 2.7 we can efficiently find such a SDE, but what if we don’t find the “right” SDE, i.e., the SDE which (by Proposition 2.6) is guaranteed to be satisfied by $f$ and by the powers $(x - a_i)^{e_i}$? One way around this difficulty is to assume that the exponents $e_i$ are all sufficiently large compared to $s$. In this case we can show that every SDE of the required order and shift which is satisfied by $f$ is also satisfied by the powers $(x - a_i)^{e_i}$. This fact is established in Corollary 4.2, and yields the following result (see Theorem 4.3 in Section 4 for a more detailed statement which includes a description of the algorithm).
Theorem 1.5 (Big exponents).
Let $f \in \mathbb{K}[x]$ be a polynomial that can be written as
$$f = \sum_{i=1}^{s} \alpha_i (x - a_i)^{e_i}$$
where the constants $a_i$ are all distinct, $\alpha_i \neq 0$, and each exponent $e_i$ is sufficiently large compared to $s$. Then, $\mathrm{AffPow}_{\mathbb{K}}(f) = s$. Moreover, there is a polynomial time algorithm that receives $f$ as input and computes the $s$-tuples of coefficients $(\alpha_i)_{i \leq s}$, of nodes $(a_i)_{i \leq s}$ and of exponents $(e_i)_{i \leq s}$.
The algorithm of Theorem 1.4 is more involved: contrary to Theorem 1.5, we cannot determine all the terms in a single pass. Solving the SDE only allows the determination of some (high degree) terms. We must then subtract these terms from , and iterate.
In the first version of this paper (arxiv.org/abs/1607.05420v1), instead of the SDE of Proposition 2.6 we used a SDE of different order and shift, originating from the Wronskian determinant (compare the two versions of Proposition 2.6). Switching to the new SDE led to significant improvements in most of our algorithmic results. For instance, in the first version of Theorem 1.5 the exponents $e_i$ had to satisfy a more stringent condition than in the current version.
1.3 Models of computation
Our algorithms take as inputs polynomials with coefficients in an arbitrary field $\mathbb{K}$ of characteristic 0. At this level of generality, we need to be able to perform arithmetic operations (additions, multiplications) and equality tests between elements of $\mathbb{K}$. When we write that an algorithm runs in polynomial time, we mean that the number of such steps is polynomial in the input size. This is a fairly standard setup for algebraic algorithms (it is also interesting to analyze the bit complexity of our algorithms for some specific fields such as the field of rational numbers; more on this at the end of this subsection and in Section 1.4). An input polynomial of degree $d$ is represented simply by the list of coefficients of its monomials, and its size thus equals $d + 1$. In addition to arithmetic operations and equality tests, we need to be able to compute roots of polynomials with coefficients in $\mathbb{K}$. This is in general unavoidable: for an optimal decomposition of $f$ in Model 1.1, the coefficients $\alpha_i$ and the nodes $a_i$ may lie in an extension field of $\mathbb{K}$ (see Section 3 and more precisely Example 3.3 in Section 3.1 for the case $\mathbb{K} = \mathbb{R}$). If the optimal decomposition has size $s$, we need to compute roots of polynomials of degree at most $s$ (except in the algorithm of Theorem 5.3, where the degree of these polynomials also depends on a parameter of the algorithm; see Theorem 5.3 for details). As a rule, root finding is used only to output the nodes $a_i$ of the optimal decomposition (once the $a_i$’s have been determined, we also need to do some linear algebra computations with these nodes to determine the coefficients $\alpha_i$), but the “internal working” of our algorithms remains purely rational (i.e., requires only arithmetic operations and comparisons). This is similar to the symbolic algorithm for univariate sparsest shifts of Giesbrecht, Kaltofen and Lee (, p. 408 of the journal paper), which also needs access to a polynomial root finder.
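The linear algebra step just mentioned (recovering the coefficients once the nodes and exponents are known) can be sketched as follows in Python over the rationals; this is our illustration of the idea, with hypothetical names, not the paper's actual procedure:

```python
from math import comb
from fractions import Fraction

def affine_power_coeffs(a, e, degree):
    """Dense coefficients of (x - a)^e, padded to length degree + 1.
    Assumes e <= degree."""
    col = [Fraction(0)] * (degree + 1)
    for j in range(e + 1):
        col[j] = Fraction(comb(e, j)) * Fraction(-a) ** (e - j)
    return col

def solve_for_alphas(f, nodes_exps):
    """Given f (dense coefficients) and pairs (a_i, e_i), solve
    f = sum_i alpha_i (x - a_i)^{e_i} for the alpha_i by Gaussian
    elimination on the dense coefficient vectors."""
    d = len(f) - 1
    cols = [affine_power_coeffs(a, e, d) for a, e in nodes_exps]
    n = len(cols)
    # augmented matrix, one row per monomial degree 0..d
    M = [[cols[c][r] for c in range(n)] + [Fraction(f[r])] for r in range(d + 1)]
    r = 0
    for c in range(n):
        piv = next((i for i in range(r, d + 1) if M[i][c] != 0), None)
        if piv is None:
            raise ValueError("the given powers are linearly dependent")
        M[r], M[piv] = M[piv], M[r]
        inv = M[r][c]
        M[r] = [x / inv for x in M[r]]
        for i in range(d + 1):
            if i != r and M[i][c] != 0:
                fac = M[i][c]
                M[i] = [x - fac * y for x, y in zip(M[i], M[r])]
        r += 1
    if any(M[i][n] != 0 for i in range(r, d + 1)):
        raise ValueError("f is not in the span of the given powers")
    return [M[i][n] for i in range(n)]

# f = 2(x-1)^2 + 3(x+1) = 2x^2 - x + 5, nodes/exponents (1, 2) and (-1, 1)
print(solve_for_alphas([5, -1, 2], [(1, 2), (-1, 1)]))
```

Note that this step is purely rational: only the computation of the nodes themselves requires a root finder.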
The one exception to this rule is the algorithm of Theorem 1.4. As mentioned at the end of Section 1.2, this is an iterative algorithm. At each step of the iteration we have to compute roots of polynomials (roots which may lie outside $\mathbb{K}$), and we keep computing with these roots in the subsequent iterations. For more details see Theorem 4.4 and the discussion after that theorem. We make a first step toward removing root finding from the internal working of this algorithm in Proposition 4.5.
We also take some steps toward the analysis of our algorithms in the bit model of computation. We focus on the algorithm of Theorem 1.4 since it is the most difficult to analyze due to its iterative nature. We show in Proposition 4.6 that for polynomials with integer coefficients, this algorithm can be implemented in the bit model to run in time polynomial in the bit size of the output. We do not have a polynomial running time bound as a function of the input size (more on this in Section 1.4).
1.4 Future work
One could try to extend the results of this paper in several directions. For instance, one could try to handle “supersparse” polynomials as in the Sparsest Shift algorithm of . The multivariate case would also deserve further study. As explained above, we proceed by reduction to the univariate case, but one could try to design more “genuinely multivariate” algorithms. For Waring decomposition, such an algorithm is proposed in “case 2” of [17, Theorem 5]. Its analysis relies on a randomness assumption on the input (our multivariate algorithm is randomized, but in this paper we never assume that the input polynomial is random).
One should also keep in mind, however, that the basic univariate problem studied in the present paper is far from completely solved: our algorithms all rely on some assumptions on the exponents $e_i$ in a decomposition of $f$, and some algorithms also rely on a distinctness assumption for the shifts $a_i$. It would be very interesting to weaken these assumptions, or even to remove them entirely. With a view toward this question, one could first try to improve the lower bounds from . Indeed, the same tools (Wronskians, shifted differential equations) turn out to be useful for the two problems (lower bounds and reconstruction algorithms), but the lower bound problem appears to be easier. For real polynomials we have already obtained optimal lower bounds in  using Birkhoff interpolation, but it remains to give an algorithmic application of this lower bound method.
Another issue that we have only begun to address is the analysis of the bit complexity of our algorithms. It would be straightforward to give a polynomial bit size bound for, e.g., the algorithm of Theorem 4.3 but this issue seems to be more subtle for Theorem 1.4 due to the iterative nature of our algorithm. It is in fact not clear that there exists a solution of size polynomially bounded in the input size (i.e., in the bit size of given as a sum of monomials). More precisely, we ask the following question.
We define the dense size of a polynomial $f \in \mathbb{Z}[x]$ as the total bit size of its coefficients in the dense (monomial basis) representation. Assume that $f$ can be written as
with $\alpha_i, a_i \in \mathbb{Q}$ and $e_i \in \mathbb{N}$, and that this decomposition satisfies the conditions of Theorem 1.4: the constants $a_i$ are all distinct, $\alpha_i \neq 0$, and each exponent $e_i$ is sufficiently large compared to $R_i$, where $R_i$ denotes the number of indices $j$ such that $e_j \geq e_i$.
Is it possible to bound the bit size of the constants $\alpha_i$ and $a_i$ by a polynomial function of the dense size of $f$?
As explained at the end of Section 1.3, under the same conditions we have a decomposition algorithm that runs in time polynomial in the bit size of the output. It follows that the above question has a positive answer if and only if there is a decomposition algorithm that runs in time polynomial in the bit size of the input (i.e., in time polynomial in the dense size of ).
One could also ask similar questions in the case where the conditions of Theorem 1.4 do not hold. For instance, assuming that has an optimal decomposition with integer coefficients, is there such a decomposition where the coefficients are of size polynomial in the size of ?
In this section we present some tools that are useful for their algorithmic applications in Sections 4 and 5. Section 3 can be read independently, except for the proof of Proposition 3.11 and Theorem 5.3 which use the Wronskian.
2.1 The Wronskian
The Wronskian is a classical tool in the study of differential equations, where it can be used to show that a set of solutions is linearly independent.
Definition 2.1 (Wronskian).
For univariate functions $f_1, \ldots, f_k$ which are $k - 1$ times differentiable, the Wronskian is defined as
$$W(f_1, \ldots, f_k)(x) = \det \begin{pmatrix} f_1(x) & f_2(x) & \cdots & f_k(x) \\ f_1'(x) & f_2'(x) & \cdots & f_k'(x) \\ \vdots & \vdots & \ddots & \vdots \\ f_1^{(k-1)}(x) & f_2^{(k-1)}(x) & \cdots & f_k^{(k-1)}(x) \end{pmatrix}.$$
It is a classical result, going back at least to , that the Wronskian captures the linear dependence of polynomials in .
For $f_1, \ldots, f_k \in \mathbb{K}[x]$, the polynomials $f_1, \ldots, f_k$ are linearly dependent if and only if the Wronskian $W(f_1, \ldots, f_k)$ vanishes everywhere.
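This criterion is easy to experiment with. The following Python sketch (ours, not from the paper) evaluates the Wronskian determinant of dense polynomials exactly at rational points; for linearly dependent polynomials every such evaluation vanishes:

```python
from fractions import Fraction

def poly_derivative(p):
    """Derivative of a dense coefficient list [c0, c1, ...]."""
    return [Fraction(j) * c for j, c in enumerate(p)][1:] or [Fraction(0)]

def poly_eval(p, x):
    """Horner evaluation over exact rationals."""
    acc = Fraction(0)
    for c in reversed(p):
        acc = acc * x + c
    return acc

def det(m):
    """Determinant over Fraction by Gaussian elimination."""
    m = [row[:] for row in m]
    n, d = len(m), Fraction(1)
    for i in range(n):
        piv = next((r for r in range(i, n) if m[r][i] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != i:
            m[i], m[piv] = m[piv], m[i]
            d = -d
        d *= m[i][i]
        for r in range(i + 1, n):
            fac = m[r][i] / m[i][i]
            m[r] = [a - fac * b for a, b in zip(m[r], m[i])]
    return d

def wronskian_at(polys, x):
    """Evaluate the Wronskian of k dense polynomials at the point x."""
    ps = [list(map(Fraction, p)) for p in polys]
    rows = []
    for _ in range(len(ps)):
        rows.append([poly_eval(p, x) for p in ps])
        ps = [poly_derivative(p) for p in ps]
    return det(rows)

# x^2, (x+1)^2 and 2x+1 are dependent: (x+1)^2 - x^2 = 2x + 1,
# so their Wronskian vanishes identically.
print(all(wronskian_at([[0, 0, 1], [1, 2, 1], [1, 2]], Fraction(t)) == 0
          for t in range(5)))   # True
# 1, x, x^2 are independent: their Wronskian is the constant 2.
print(wronskian_at([[1], [0, 1], [0, 0, 1]], Fraction(0)))   # 2
```

Since the Wronskian of polynomials is itself a polynomial, checking that it vanishes at more points than its degree bound certifies that it vanishes identically.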
For every polynomial $g$ and every point $a$ we denote by $\mathrm{mult}(g, a)$ the multiplicity of $a$ as a root of $g$, i.e., $\mathrm{mult}(g, a)$ is the maximum integer $m$ such that $(x - a)^m$ divides $g$. The following result from  gives a Wronskian-based bound on the multiplicity of a root in a sum of polynomials.
Let $f_1, \ldots, f_k$ be some linearly independent polynomials, let $f = f_1 + \cdots + f_k$, and let $W = W(f_1, \ldots, f_k)$ be their Wronskian. Then for every point $a$:
$$\mathrm{mult}(f, a) \leq k - 1 + \mathrm{mult}(W, a),$$
where $\mathrm{mult}(W, a)$ is finite since $W \neq 0$.
In  one can find several properties of the Wronskian (which have been known since the 19th century). In this work we will use the following properties, which can be easily derived from those of . For the sake of completeness we include a short proof.
Let be linearly independent polynomials and let . If with and for all , then divides . Moreover, if , then
Hence, if we set , then
Proof. Consider the Wronskian matrix whose -th entry is with , . Since divides , then , for some of degree . Since divides every element in the -th column of , we can factor it out from the Wronskian. This proves that divides . Once we have factored out for all , we observe that , where is the determinant of a matrix whose -th entry has degree for all and . Hence, Finally, we observe that if :
For , the upper bound for follows directly from Lemma 2.3.
We observe that the result holds when some of the .
2.2 Shifted Differential Equations
A Shifted Differential Equation (SDE) is a differential equation of the form
$$\sum_{i=0}^{k} P_i(x) \, f^{(i)}(x) = 0,$$
where $f$ is the unknown function and the $P_i$ are polynomials in $x$.
The quantity $k$ is called the order of the equation, and the quantity $\max_i (\deg(P_i) - i)$ is called the shift. We will usually denote such a differential equation by $\mathrm{SDE}(k, l)$, where $k$ is the order and $l$ the shift.
One of the key ingredients for our results is that if $\mathrm{AffPow}_{\mathbb{K}}(f)$ is small, then $f$ satisfies a “small” SDE. More precisely:
Let $f \in \mathbb{K}[x]$ and let $f$ be written as
$$f = \sum_{i=1}^{s} \alpha_i (x - a_i)^{e_i}$$
where $a_i \in \mathbb{K}$, $e_i \in \mathbb{N}$, and $\alpha_i \neq 0$ for all $i$.
Then, $f$ satisfies a $\mathrm{SDE}(2s - 1, 0)$ which is also satisfied by the terms $(x - a_i)^{e_i}$. In particular, if $s = \mathrm{AffPow}_{\mathbb{K}}(f)$, then $f$ satisfies a $\mathrm{SDE}(2s - 1, 0)$.
Proof. If we can find a $\mathrm{SDE}(2s - 1, 0)$ which is satisfied by all the terms $(x - a_i)^{e_i}$, by linearity the same SDE will be satisfied by $f$ and the theorem will be proved. The existence of this common SDE is equivalent to the existence of a nonzero solution for the following linear system in the unknowns $c_{i,j}$ (the coefficients of the polynomials $P_i = \sum_{j=0}^{i} c_{i,j} x^j$):
$$\sum_{i=0}^{k} \sum_{j=0}^{i} c_{i,j} \, x^j \left[(x - a_t)^{e_t}\right]^{(i)} = 0, \qquad 1 \leq t \leq s,$$
where $k = 2s - 1$. There are $(k+1)(k+2)/2 = s(2s+1)$ unknowns, so we need to show that the matrix of this linear system has rank smaller than $s(2s+1)$. It suffices to show that for each fixed value of $t$, the subsystem:
$$\sum_{i=0}^{k} \sum_{j=0}^{i} c_{i,j} \, x^j \left[(x - a_t)^{e_t}\right]^{(i)} = 0$$
has a matrix of rank at most $2s$. In other words, we have to show that the subspace $V_t$ spanned by the polynomials $x^j \left[(x - a_t)^{e_t}\right]^{(i)}$ (for $0 \leq j \leq i \leq k$) has dimension at most $2s$. But $V_t$ is included in the subspace spanned by the polynomials
$(x - a_t)^{e_t - k} x^m$ with $0 \leq m \leq k$ (when $e_t < k$ all the polynomials in $V_t$ have degree at most $e_t < k$ and the bound is immediate). This is due to the fact that each polynomial $x^j \left[(x - a_t)^{e_t}\right]^{(i)}$ is a scalar multiple of $(x - a_t)^{e_t - k} \cdot x^j (x - a_t)^{k - i}$, and $x^j (x - a_t)^{k - i}$ has degree at most $k$. We conclude that $\dim V_t \leq k + 1 = 2s$.
A polynomial $f$ satisfies a $\mathrm{SDE}(k, l)$ if and only if the polynomials $x^j f^{(i)}(x)$, for $0 \leq i \leq k$ and $0 \leq j \leq l + i$, are linearly dependent over $\mathbb{K}$. The existence of such a SDE can therefore be decided efficiently by linear algebra, and when a $\mathrm{SDE}(k, l)$ exists it can be found explicitly by solving the corresponding linear system (see, e.g., [23, Corollary 3.3a] for an analysis of linear system solving in the bit model of computation). We use this fact repeatedly in the algorithms of Sections 4 and 5.
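This linear-algebra search can be sketched as follows in Python over the rationals (our illustration, with hypothetical function names): the columns of the matrix are the dense coefficient vectors of the polynomials $x^j f^{(i)}$, and a nonzero kernel vector yields the polynomials $P_i$ of a SDE:

```python
from fractions import Fraction

def derivative(p):
    """Derivative of a dense coefficient list [c0, c1, ...]."""
    return [Fraction(j) * c for j, c in enumerate(p)][1:] or [Fraction(0)]

def nullspace_vector(A):
    """One nonzero kernel vector of A (rows over Fraction), or None."""
    rows, ncols = len(A), len(A[0])
    M = [row[:] for row in A]
    pivots, r = [], 0
    for c in range(ncols):
        piv = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        inv = M[r][c]
        M[r] = [x / inv for x in M[r]]
        for i in range(rows):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == rows:
            break
    free = [c for c in range(ncols) if c not in pivots]
    if not free:
        return None
    v = [Fraction(0)] * ncols
    v[free[0]] = Fraction(1)
    for i, c in enumerate(pivots):
        v[c] = -M[i][free[0]]
    return v

def find_sde(f, k, l):
    """Search for a nonzero SDE  sum_{i=0}^k P_i(x) f^{(i)}(x) = 0  with
    deg P_i <= l + i.  Returns the P_i as coefficient lists, or None."""
    f = list(map(Fraction, f))
    d = len(f) - 1
    cols, index, g = [], [], f
    for i in range(k + 1):
        for j in range(l + i + 1):
            cols.append([Fraction(0)] * j + g)   # coeffs of x^j * f^{(i)}
            index.append((i, j))
        g = derivative(g)
    nrows = d + k + l + 1                        # bounds every column's length
    A = [[col[r] if r < len(col) else Fraction(0) for col in cols]
         for r in range(nrows)]
    v = nullspace_vector(A)
    if v is None:
        return None
    Ps = [[Fraction(0)] * (l + i + 1) for i in range(k + 1)]
    for (i, j), c in zip(index, v):
        Ps[i][j] = c
    return Ps

# f = (x-1)^3 + (x+1)^3 = 2x^3 + 6x  (dense: [0, 6, 0, 2])
print(find_sde([0, 6, 0, 2], 2, 1) is not None)  # True
print(find_sde([0, 6, 0, 2], 1, 0))              # None
```

The bit-size analysis of this step reduces to the analysis of exact rational linear system solving, as discussed above.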
In this paper we will use some results concerning the set of solutions of a SDE. They are particular cases of properties that apply to linear homogeneous differential equations.
The set of polynomial solutions of a SDE of order $k$ is a vector space of dimension at most $k$.
Given two SDE of order :
we say that they are equivalent if for all . The following result can be found in [22, Property 61]. We include a short proof.
For any set of $k$ $\mathbb{K}$-linearly independent polynomials $f_1, \ldots, f_k$, there exists a unique SDE (up to equivalence) of order $k$ satisfied simultaneously by all the $f_i$’s.
Proof. Suppose there exist two different SDE satisfied by , namely:
Then, we set for all . By definition we have that and we aim at proving that for all . Assume that there exists such that . Then, the following SDE
has order and is satisfied by , a contradiction to Lemma 2.8.
3 Structural results
In this section we compare the expressive power of our 3 models: sums of affine powers, sparsest shift and the Waring decomposition. We will see in Section 3.2 that some polynomials have a much smaller expression as a sum of affine powers than in the sparsest shift or Waring models. Moreover, we show that the Waring and sparsest shift models are “orthogonal” in the sense that (except in one trivial case) no polynomial can have a small representation in both models at the same time.
We begin this investigation of structural properties with the field of real numbers, where an especially strong version of orthogonality holds true. We also show that some real polynomials have a short expression as a sum of affine powers over the field of complex numbers, but not over the field of real numbers. This observation has algorithmic implications: given a polynomial $f \in \mathbb{R}[x]$, we may have to work in a field extension of $\mathbb{R}$ to find the optimal representation for $f$. These “real” results can be derived fairly quickly from results in our previous paper . We then move to arbitrary fields of characteristic zero in Section 3.2. Finally, we study the uniqueness of optimal representations in Section 3.3. It turns out that the algorithms of Sections 4 and 5 only work in a regime where the uniqueness of optimal representations is guaranteed.
3.1 The real case
In  the authors considered polynomials with real coefficients and proved the following result.
[8, Theorem 13] Consider a polynomial identity of the form:
where the are distinct constants, the constants are not all zero, the and are arbitrary constants, and for every . Then, we must have .
Let be a polynomial of the form:
For every we denote by the number of exponents smaller than , i.e., .
If for all , then . Moreover, if for all then (3) is the unique optimal expression for .
Proof. Suppose that can be written in another way
with . Set and denote by (respectively, ) the index such that (respectively, ). Note that one of the two indices will be equal to 0 if the exponent appears only in one of the two expressions (3) and (4).
We can rewrite this as
with , and .
To prove the first assertion, let us assume that for all . Assume also for contradiction that and . By Theorem 3.1, we must have . The upper bounds on and imply . However we have from our assumption that , which contradicts the previous inequality. This shows that , i.e., if then the highest degree terms are the same. Continuing by induction, we find that all the terms in the two expressions are equal. In particular we would have , a contradiction. This shows that , i.e., that .
To prove the second assertion, let us now assume further that for all . Assume also that . By Theorem 3.1, either or . In the second case, the upper bounds on and imply that . This is in contradiction with the assumption that . We conclude that it must be equal to 0, i.e., the highest degree terms are the same. Continuing by induction, we obtain that all the terms of the two decompositions are equal, thus showing that (3) is the unique optimal expression for in this model.
Let be a field extension of . Theorem 1 in  shows that whenever the value is "small", then it is equal to ; more precisely, if then . This is no longer the case for the Affine Power model as the following example shows.
As a consequence of Theorem 3.1 we can easily derive the following result.
Let be a polynomial of degree . Either for some (and ), or the following holds:
Proof. We set and and assume that . We write in two different ways:
where the are all distinct, and . Let us move the term to the left hand side of the equation. We then have two cases to consider:
if for all , we have terms on the left hand side of the equation and terms on the right hand side. Theorem 3.1 shows that .
If for some , we have or terms on the left hand side of the equation and terms on the right hand side. By Theorem 3.1, .
Consider the degree polynomial
We observe that and . Hence, the inequality in Corollary 3.4 is optimal up to one unit.
A proof similar to that of Corollary 3.4 yields the following result:
Let be a polynomial of degree . Either or the following inequality holds:
3.2 Fields of characteristic zero
We now switch from the real field to an arbitrary field of characteristic zero. By definition we have and for any polynomial . We show in Example 3.7 that there are polynomials such that is much smaller than both and .
We first make some basic observations about Sparsest Shift. For any $a \in \mathbb{K}$, the polynomials $(x - a)^i$ for $0 \leq i \leq d$ are linearly independent, hence $f$ can be uniquely expressed as $f = \sum_{i=0}^{d} c_i (x - a)^i$ where $c_i \in \mathbb{K}$. Consider such a decomposition for $f$, and let $s$ be the number of nonzero terms. It follows that the $d + 1 - s$ derivatives $f^{(i)}$ with $c_i = 0$ admit $a$ as a common root.
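These observations give a simple test for the sparsity of $f$ at a candidate shift $a$: expand $f$ in powers of $x - a$ (Taylor expansion at $a$) and count the nonzero coefficients. A small Python sketch, ours rather than the paper's:

```python
from fractions import Fraction

def coeffs_at_shift(f, a):
    """Write f = sum_i c_i (x - a)^i and return [c_0, ..., c_d],
    using c_i = f^{(i)}(a) / i!  (Taylor expansion at a)."""
    g = list(map(Fraction, f))
    out, fact = [], 1
    for i in range(len(f)):
        val = Fraction(0)
        for c in reversed(g):          # Horner evaluation of f^{(i)} at a
            val = val * a + c
        out.append(val / fact)
        fact *= i + 1
        g = [Fraction(j) * c for j, c in enumerate(g)][1:] or [Fraction(0)]
    return out

def sparsity_at(f, a):
    """Number of nonzero terms of f written in powers of (x - a)."""
    return sum(c != 0 for c in coeffs_at_shift(f, a))

# f = (x-2)^5 + 3(x-2), expanded in the monomial basis:
f = [-38, 83, -80, 40, -10, 1]
print(sparsity_at(f, 2), sparsity_at(f, 0))  # → 2 6
```

Finding the sparsest shift itself is of course the hard part; this routine only evaluates the sparsity once a candidate shift is given.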
For every , we consider the polynomial . It is easy to check that for all . By [5, Proposition 3.1] we have that if with , then ; and thus we get that .
One can easily check that for every , the polynomials and do not share a common root. Consider a decomposition of in the sparsest shift model. By the above observations, for any pair of consecutive coefficients in this decomposition at least one of the 2 coefficients is nonzero. This implies that .
In the remainder of Section 3.2 we give (in Proposition 3.9) a weaker version of Corollary 3.4 that works for any field of characteristic zero. Moreover, for we provide a family of polynomials showing that the bound from Proposition 3.9 is sharp.
Let , , and let be distinct constants. If , then the set of polynomials
is linearly independent.
Let be a polynomial of degree . Either for some (and ), or the following holds:
Proof. We set and and assume that . We express in two different ways:
with all distinct and . First, we are going to prove that for all . Indeed, if there exists such that , then we set and differentiate the previous equality times to obtain
where for all . From this equality, we deduce that the set
is linearly dependent. However,
The polynomials on the right-hand side are of degree at most , and they are linearly independent by Jordan’s lemma. This is a contradiction since is linearly dependent. We have proved that for all , and we conclude that
3.3 Uniqueness results for sums of affine powers
The following result is an analogue of Theorem 3.1 for polynomials with coefficients in $\mathbb{K}$, where $\mathbb{K}$ is any field of characteristic zero.
Consider a polynomial identity of the form:
where the are distinct, the are not all zero, are arbitrary, and for every . Then we must have .
Proof. We assume