Given a perfect field of positive characteristic p, we address the following question: how quickly can one compute the Nth coefficient of an algebraic power series f, where N is assumed to be a large positive integer?
where is assumed to be a large positive integer? This question was recognized as a very important one in complexity theory, as well as in various applications to algorithmic number theory: Atkin-Swinnerton-Dyer congruences, integer factorization, discrete logarithm and point-counting [10, 3].
As such, the question is rather vague; both the data structure and the computation model have to be stated more precisely. The algebraic series f will be specified as a root of a polynomial E, of degree d and of height h. To make this specification unequivocal, we will make several assumptions. First, we assume that E is separable, that is, E and its derivative are coprime. Second, we assume that E is irreducible. (The first assumption is not always implied by the second one.) Note that both assumptions are satisfied if E is assumed to be the minimal polynomial of f, and that irreducibility implies separability as soon as we know that E has at least one root in the ring of power series. The polynomial E might have several roots there. In order to specify uniquely its root f, we further assume that we are given a nonnegative integer v together with initial coefficients such that
In other words, the data structure used to represent f is the polynomial E together with the initial coefficients of f. (Actually, fewer coefficients are enough to ensure the uniqueness of f; however, the full set is needed to ensure its existence. For this reason, we will always assume that the coefficients of f are given up to index v.) We observe that it is always possible to choose v less than or equal to the t-adic valuation of the X-resultant of E and its derivative.
Under these assumptions, the classical Newton iteration allows the computation of the first N coefficients of f in quasi-linear complexity Õ(N). Here, and in the whole article (with the notable exception of Section 4), the algorithmic cost is measured by counting the number of basic arithmetic operations and applications of the Frobenius map (x ↦ x^p) and of its inverse (x ↦ x^{1/p}) in the ground field. The soft-O notation Õ(·) indicates that polylogarithmic factors in the argument are omitted. Newton’s iteration thus provides a quasi-optimal algorithm to compute the first N coefficients of f. A natural and important question is whether faster alternatives exist for computing the coefficient f_N alone.
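For concreteness, the Newton iteration can be sketched in a few lines. The toy instance below is our own choice, not the running example of the text: over F_7, we compute the root f of E(X) = tX^2 - X + 1 with f(0) = 1, whose coefficients are the Catalan numbers reduced mod 7.

```python
# Newton iteration for a power series root over F_p, sketched on a toy
# instance (our choice, not the paper's running example): the root f of
# E(X) = t*X^2 - X + 1 over F_7, i.e. the Catalan generating function mod 7.
# Series are represented as coefficient lists [a_0, a_1, ...].

p = 7  # an odd prime, so that E'(f) = 2*t*f - 1 is invertible at t = 0

def mul(a, b, n):
    """Product of two truncated series, modulo t^n."""
    c = [0] * n
    for i, ai in enumerate(a[:n]):
        if ai:
            for j, bj in enumerate(b[:n - i]):
                c[i + j] = (c[i + j] + ai * bj) % p
    return c

def inv(a, n):
    """Series inverse by the Newton iteration b <- b*(2 - a*b)."""
    b = [pow(a[0], p - 2, p)]
    prec = 1
    while prec < n:
        prec = min(2 * prec, n)
        ab = mul(a, b, prec)
        corr = [(-c) % p for c in ab]
        corr[0] = (2 - ab[0]) % p
        b = mul(b, corr, prec)
    return b

def newton_root(n):
    """First n coefficients of the root of E, by f <- f - E(f)/E'(f)."""
    f = [1]  # E(1) = t, so f = 1 is correct at precision 1
    prec = 1
    while prec < n:
        prec = min(2 * prec, n)
        f = (f + [0] * prec)[:prec]
        f2 = mul(f, f, prec)
        Ef = [(([0] + f2)[i] - f[i]) % p for i in range(prec)]  # t*f^2 - f
        Ef[0] = (Ef[0] + 1) % p                                 # ... + 1
        dEf = [(2 * ([0] + f)[i]) % p for i in range(prec)]     # 2*t*f
        dEf[0] = (dEf[0] - 1) % p                               # ... - 1
        step = mul(Ef, inv(dEf, prec), prec)
        f = [(f[i] - step[i]) % p for i in range(prec)]
    return f
```

The doubling of the working precision at each step is what makes the total cost quasi-linear in the target precision.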
With the exception of the rational case, where the Nth coefficient can be computed in complexity O(log N) by binary powering, the most efficient algorithm currently known to compute f_N in characteristic zero has complexity Õ(√N). It relies on baby-step / giant-step techniques, combined with fast multipoint evaluation.
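The binary powering scheme for the rational case is classical; here is a minimal sketch on a standard instance of our choice: the Nth Fibonacci number, i.e. the Nth coefficient of t/(1 - t - t^2), obtained in O(log N) arithmetic operations by repeated squaring of the companion matrix of the denominator.

```python
# Binary powering for the rational case: the Nth coefficient of
# t/(1 - t - t^2) (the Nth Fibonacci number) in O(log N) arithmetic
# operations, by repeated squaring of the 2x2 companion matrix.

def mat_mul(A, B):
    """Product of two 2x2 integer matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def fib(N):
    """Nth Fibonacci number via [[1,1],[1,0]]^N = [[F(N+1),F(N)],[F(N),F(N-1)]]."""
    R = [[1, 0], [0, 1]]      # identity
    M = [[1, 1], [1, 0]]      # companion matrix of 1 - t - t^2
    while N:
        if N & 1:
            R = mat_mul(R, M)  # multiply in the current bit
        M = mat_mul(M, M)      # square for the next bit
        N >>= 1
    return R[0][1]
```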
Surprisingly, in positive characteristic p, a radically different approach leads to a spectacular complexity drop to O(log N). However, the big-O term hides a (potentially exponential) dependency in the characteristic p. The good behavior of this estimate with respect to the index N results from two facts. First, if the index N is written in radix p as N = (N_{k-1} … N_1 N_0)_p, then the coefficient f_N is given by the simple formula
where the S_r (for 0 ≤ r < p) are the section operators defined by
Second, Christol’s theorem ensures that f lies in a finite-dimensional vector space stable under the action of the section operators. Consequently, with respect to a fixed basis of this vector space, one can express f as a column vector, the section operators as square matrices, and the evaluation at 0 as a row vector. Formula (1) then becomes
Since k is about log N / log p, and since the size of the matrices does not depend on N, formula (3) yields an algorithm of complexity O(log N). This observation (valid for any p-automatic sequence) is due to Allouche and Shallit [1, Cor. 4.5]. However, it hides the need to first find the linear representation. As shown in [2, Ex. 5], already in the case of a finite prime field, translating p-automaticity in terms of linear algebra can yield matrices whose size grows rapidly with p. Thus, their precomputation has a huge impact on the cost with respect to the prime number p.
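The mechanism of Formula (3) is easy to demonstrate on the simplest 2-automatic example, the Thue–Morse sequence; this is a toy illustration of linear representations of our choice, not the algebraic setting of the paper.

```python
# Evaluating a 2-automatic sequence from a linear representation (a row
# vector, one square matrix per digit, a column vector), in the spirit of
# Allouche-Shallit.  Toy example: the Thue-Morse sequence
# t_N = (number of ones in the binary expansion of N) mod 2,
# which satisfies t_{2n} = t_n and t_{2n+1} = 1 - t_n.

# State vector V(n) = (t_n, 1 - t_n); then V(2n + d) = A[d] * V(n).
A = [
    [[1, 0], [0, 1]],   # digit 0: state unchanged
    [[0, 1], [1, 0]],   # digit 1: swap t_n and 1 - t_n
]
row = [1, 0]            # reads t_N off the final state
col = [0, 1]            # initial state V(0) = (0, 1)

def thue_morse(N):
    digits = []
    while N:
        digits.append(N & 1)
        N >>= 1
    v = col[:]
    for d in reversed(digits):  # most significant digit first: V(n) -> V(2n+d)
        v = [sum(A[d][i][j] * v[j] for j in range(2)) for i in range(2)]
    return row[0] * v[0] + row[1] * v[1]
```

The number of matrix products equals the number of binary digits of N, hence the O(log N) cost once the representation is known.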
In the particular case of a prime field, and under an additional assumption on p, this was improved by building on an idea originally introduced by Christol: one can compute f_N in lower complexity. So far, this was the best complexity result for this task.
Contributions. We further improve the aforementioned complexity result (Theorem 3.4, Section 3.2); here ω denotes the exponent of matrix multiplication. In the case where the ground field is finite, we propose an even faster algorithm, with bit complexity linear in log N and quasi-linear in √p (Theorem 4.1, Section 4). It is obtained by blending the approach of Section 3.2 with ideas and techniques imported from the characteristic-zero case. All these successive algorithmic improvements are consequences of our main theoretical result (Theorem 2.2, Section 2.2), which can be thought of as an effective version of Christol’s Theorem (and in particular reproves it).
2. Effective version of Christol’s theorem
We keep the notation of the introduction. Christol’s theorem is stated as follows.
Theorem 2.1 (Christol).
Let f be a formal power series that is algebraic over the field of rational functions, the ground field being perfect of positive characteristic. Then there exists a finite-dimensional vector space containing f and stable by the section operators.
The aim of this section is to state and to prove an effective version of Theorem 2.1, on which our forthcoming algorithms will be built. Our approach follows the initial treatment by Christol, which is based on Furstenberg’s theorem [14, Thm. 2]. For the application we have in mind, it turns out that the initial version of Furstenberg’s theorem is inefficient; hence we will first need to strengthen it, considering residues around a moving point instead of residues at the origin. Another new ingredient we shall use is a globalization argument allowing us to compare section operators at different points. This argument is formalized through Frobenius operators and is closely related to the Cartier operator used in a beautiful geometric proof of Christol’s theorem due to Deligne and Speyer, and further studied by Bridy.
2.1. Frobenius and sections
Recall that the ground field is assumed to be a perfect field of prime characteristic p, for example a finite field. We consider the field of rational functions over it.
Since the ground field is perfect, the Frobenius endomorphism defined by x ↦ x^p is an automorphism of it. It extends to a ring homomorphism, still denoted in the same way, on the field of rational functions, raising an element to the power p. This homomorphism is an isomorphism onto its image and its inverse can be written
where each , with , maps onto itself.
The use in (4) of the same notation as in Formula (2) is not a mere coincidence. The algebraic series f provides an embedding of the field it generates over the rational functions into the field of Laurent series, namely evaluation at f. The Frobenius operator extends from rational functions to Laurent series, and the same holds for the sections. These extensions are exactly those of Eq. (2). The operators in Eq. (4) then appear as global variants of those in Eq. (2). Moreover, global and local operators are compatible, in the sense that they satisfy
As for rational functions, the Frobenius operator and the section operators induce, respectively, a ring isomorphism from onto and maps () from onto such that . The operators and () are not -linear but only -linear. More precisely, for any in , in , and in ,
In other words, both the Frobenius and the section operators are actually semi-linear.
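Over a prime field the coefficientwise Frobenius is trivial, and the decomposition underlying Eq. (4) becomes the elementary identity g(t) = Σ_{r<p} t^r (S_r g)(t)^p. The following sketch (with the hypothetical choice p = 5, our own toy setup) verifies it on a random polynomial.

```python
# Check, over the prime field F_5, the decomposition underlying Eq. (4):
# every polynomial g satisfies g(t) = sum_{r=0}^{p-1} t^r * (S_r g)(t)^p,
# where S_r keeps the coefficients of index congruent to r mod p.
# (Over F_p the p-th root of a coefficient is the coefficient itself.)
import random

p = 5

def section(g, r):
    """S_r(sum a_i t^i) = sum a_{p*i+r}^(1/p) t^i; over F_p, a^(1/p) = a."""
    return g[r::p]

def frobenius(g):
    """g(t)^p = g(t^p) over F_p: the coefficient at index i moves to p*i."""
    out = [0] * (p * (len(g) - 1) + 1) if g else []
    for i, c in enumerate(g):
        out[p * i] = c
    return out

def add_shifted(parts):
    """Sum of the polynomials t^r * parts[r], as one coefficient list."""
    n = max(r + len(g) for r, g in enumerate(parts))
    out = [0] * n
    for r, g in enumerate(parts):
        for i, c in enumerate(g):
            out[r + i] = (out[r + i] + c) % p
    return out

random.seed(0)
g = [random.randrange(p) for _ in range(23)]
rebuilt = add_shifted([frobenius(section(g, r)) for r in range(p)])
assert rebuilt[:len(g)] == g and not any(rebuilt[len(g):])
```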
2.2. The key theorem
Let be the set of polynomials such that and .
For and for , there exists a (unique) polynomial in such that
The rest of this subsection is devoted to the proof of Theorem 2.2. Although mainly algebraic, the proof is based on the rather analytic remark that any algebraic function can be obtained as a residue of some rational function (see Lemma 2.3). This idea was already used by Furstenberg, whose work has been inspiring for us. The main new insight of our proof is the following: we replace several small branches around zero by a single branch around a moving point. In order to make the argument work, we shall further need to relate the behavior of the section operators around zero and around the aforementioned moving point. This is where the reinterpretation of the section operators in terms of Frobenius operators will be useful.
We consider the ring of power series over . Its fraction field is the field of Laurent series over . There is an embedding taking a polynomial in to its Taylor expansion around . Formally, it is simply obtained by mapping the variable to . It extends to a field extension . We will often write for the image of in . The field is moreover endowed with a residue map , defined by (by convention, if ). It is clearly -linear.
For any polynomial , the following equality holds:
Since is a simple root of , the series has a simple zero at . This means that it can be written with , . Taking the logarithmic derivative with respect to gives
akin to [14, Formula (15), p. 276], from which we derive
where . Since does not vanish at , the series has no pole at . Therefore, the residue of is nothing but . Besides, the residue of the second summand vanishes. All in all, the residue of is . ∎
We now introduce analogues of section operators over . For this, we first observe that the Frobenius operator defines an isomorphism . Moreover is a field extension of of degree . A basis of over is of course , but it will be more convenient for our purposes to use a different one. It is given by the next lemma.
The family is a basis of over .
For simplicity, we set . We have:
where is the square matrix whose entry (for ) is . In particular, is upper triangular and all its diagonal entries are equal to . Thus is invertible and the conclusion follows. ∎
For and in , we define the section operators by
(These operations look like those used in [2, §3.2], but they are not exactly the same.) Clearly extends the operator defined by Eq. (2) and for all . We observe moreover that the ’s stabilize the subrings and , since corresponds to .
The following commutation relation holds over :
Let us write as with and for all . Its image under can be expressed in two different ways as follows:
We identify the coefficient in . To do so, we observe that the terms obtained with do not contribute, while the contribution of the term is the residue of . We then get
Going back to the definition of , we derive , from which the lemma follows. ∎
Proof of Theorem 2.2.
(compare with [2, §3.2]). The stability of under follows from the fact that is the minimal polynomial of over . If we know in addition that lies in , then is in and, therefore, falls in as well. Theorem 2.2 is proved. ∎
It is possible to slightly vary the bounds on the degree and the height, and to derive this way other stability statements. For example, starting from a polynomial with and , we have:
with and . Moreover provided that .
Another remark in this direction is the following: if has degree at most , the section has degree at most for any . Indeed, has degree at most . In other words, the subspace is stable by the section operators ().
3. Application to algorithmics
Theorem 2.2 exhibits an easy-to-handle finite-dimensional vector space which is stable under the section operators. In this section, we derive from it two efficient algorithms that compute the Nth term of f in time linear in log N. The first is less efficient, but easier to understand; we present it mainly for pedagogical purposes.
3.1. First algorithm: modular computations
The first algorithm we will design follows rather straightforwardly from Theorem 2.2. It consists of the following steps:
we compute the matrix giving the action of the Frobenius with respect to the “modified monomial basis” ;
we deduce the matrix of with respect to ;
we extract from it the matrices of the section operators ;
we compute the Nth coefficient of f using Formula (1).
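To make the last step concrete, here is a toy check of Formula (1) over the prime field F_5 (an example of our choice, with f the Catalan series mod 5, which satisfies t f^2 - f + 1 = 0): applying the section operators along the base-p digits of N, least significant digit first, leaves f_N as the constant coefficient.

```python
# Toy check of Formula (1) over F_5: f_N is the constant coefficient of
# S_{N_{k-1}}(...S_{N_1}(S_{N_0}(f))...), the digits of N being taken in
# radix p, least significant first.  Here f is the Catalan generating
# function reduced mod 5 (our own example, satisfying t*f^2 - f + 1 = 0),
# truncated to enough terms to survive the repeated sections.

p = 5
PREC = 400

def catalan_mod_p(n):
    """First n Catalan numbers mod p, via C_{k+1} = sum_i C_i * C_{k-i}."""
    c = [1]
    for k in range(n - 1):
        c.append(sum(c[i] * c[k - i] for i in range(k + 1)) % p)
    return c

def section(coeffs, r):
    """S_r(sum a_i t^i) = sum a_{p*i+r}^(1/p) t^i; over F_p, a^(1/p) = a."""
    return coeffs[r::p]

def coeff_by_sections(f, N):
    """Constant coefficient after applying the sections along the digits of N."""
    while N:
        f = section(f, N % p)
        N //= p
    return f[0]

f = catalan_mod_p(PREC)
assert coeff_by_sections(f, 137) == f[137]
```

Each section divides the working precision by p, so a direct use of this truncated representation needs about N initial coefficients; the point of Theorem 2.2 is to act instead on a vector space whose dimension is independent of N.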
Let us be a bit more precise (though we will not give full details because we will design in §3.2 below an even faster algorithm). Let be the matrix of in the basis ; its th column contains the coordinates of the vector in the basis , which are also the coordinates of in the monomial basis . It is easily seen that the matrix of with respect to is , which is, by definition, the matrix obtained by applying to each entry of .
We now discuss the complexity of the computation of . Thanks to Theorem 2.2 and Eq. (4), we know that its entries are polynomials of degree at most . However, this bound is not valid for the entries of . Indeed, in full generality, the latter are rational fractions whose numerators and denominators have degrees of magnitude . In order to save the extra factor , we rely on modular techniques: we choose a polynomial of degree and perform all computations modulo . To make the method work, must be chosen in such a way that both and make sense modulo , i.e. must be coprime with the denominators of the entries of . The latter condition discards a small number of choices, so that a random polynomial
will be convenient with high probability.
Using fast polynomial and matrix algorithms, the computation of modulo can be achieved within operations in , while the inversion of modulo requires operations in , where ω is the matrix multiplication exponent. Since we count an application of as a unique operation, the cost of the first two steps is as well. The third step is free as it only consists in reorganizing coefficients. As for the evaluation of Formula (1), each application of has a cost of operations in . The total complexity of our algorithm is then operations in .
Actually, we do not need to apply the Frobenius inverse since, at the end of the computation, we can simply raise the last intermediate result to a suitable power of p. The stated complexity can then be reached even if we do not count an application of the inverse Frobenius as a single operation.
A detailed example
Consider and the polynomial
It admits a unique root in which is congruent to modulo .
The matrix of the Frobenius with respect to the basis can be written , where and the largest entry of have degrees and , respectively. However, by Theorem 2.2, we know that has polynomial entries of degree at most . Noticing that is not a root of the resultant of and , we can compute and its inverse modulo . The result of this computation is partly displayed in Figure 1. We observe that the maximal degree of the entries of is and reaches our bound (which is thus tight for this example). We furthermore observe that is block triangular, as expected after Remark 2.6.
Let us now compute the images of under the section operators. For this, we write in . We then have to compute the product . As a result, we obtain:
Rearranging the terms, we finally find that
To conclude this example, suppose that we want to compute the th coefficient of . Applying Eq. (1), we find that it is equal to the constant coefficient of . Therefore we have to compute . Repeating twice what we have done before, we end up with
Plugging in the above equality, we get , from which we conclude that .
In the above example, only the constant coefficient of was needed to carry out the whole computation. This is related to the fact that has -adic valuation . More generally, if has -adic valuation , we will need the first coefficients of since the final division by will induce a loss of -adic precision of “digits”. This does not change the complexity bound, since .
3.2. Second algorithm: Hermite–Padé approximation
For obvious reasons related to the size of the computed objects, we cannot hope to achieve a complexity lower than linear in that size using the approach of Section 3.1. However, the exponent can still be improved. In order to achieve this, we return to Theorem 2.2. The key idea is to leap efficiently from the polynomial to the polynomial in Formula (7).
Let in and . By Theorem 2.2, there exists in such that , or, equivalently,
The algorithmic question is to recover efficiently the ’s starting from the ’s. Identifying coefficients in Eq. (8) yields a linear system over in the coefficients of the unknown polynomials . This system has unknowns and an infinite number of linear equations. The point is that the following truncated version of Eq. (8)
is sufficient to uniquely determine . This is a direct consequence of the following.
If in satisfies , then .
The resultant of and with respect to is a polynomial of degree at most . On the other hand, we have a Bézout relation
where and are bivariate polynomials in . By evaluating the previous equality at it follows that
holds in , and therefore . Thus and have a non-trivial common factor; since is irreducible, it must divide . But , so . ∎
Solving Eq. (9) amounts to solving a Hermite–Padé approximation problem. In terms of linear algebra, it translates into solving a linear system over the ground field in the coefficients of the unknown polynomials. This system has finitely many unknowns and linear equations. Moreover, it has a very special shape: it has a quasi-Toeplitz structure, with small displacement rank. Therefore, it can be solved using fast algorithms for structured matrices [17, 4]. These algorithms first compute a (quasi-)inverse of the matrix encoding the homogeneous part of the system, using a compact data structure called displacement generators; then, they apply it to the vector encoding the inhomogeneous part. The first step has complexity , the second step has complexity .
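Ignoring the quasi-Toeplitz speedup, the kind of Hermite–Padé problem at stake can also be solved by dense linear algebra. The sketch below (an example of our choice, over F_101) recovers the defining relation 1 - f + t f^2 = 0 of the Catalan series from its first few coefficients.

```python
# Hermite-Pade approximation by dense linear algebra over F_p (a sketch
# ignoring the quasi-Toeplitz speedup).  We look for A_0, A_1, A_2 of
# degree <= 1, not all zero, with A_0 + A_1*f + A_2*f^2 = 0 mod t^sigma.
# Toy input (our choice): f the Catalan series mod p = 101, whose exact
# relation is 1 - f + t*f^2 = 0, so the kernel is spanned by (1, -1, t).

p = 101
sigma = 8   # enough equations to exclude spurious solutions (resultant bound)

def catalan_mod_p(n):
    c = [1]
    for k in range(n - 1):
        c.append(sum(c[i] * c[k - i] for i in range(k + 1)) % p)
    return c

def series_mul(a, b, n):
    c = [0] * n
    for i in range(min(len(a), n)):
        for j in range(min(len(b), n - i)):
            c[i + j] = (c[i + j] + a[i] * b[j]) % p
    return c

def kernel_vector(M, ncols):
    """One nonzero kernel vector of M over F_p, by Gauss-Jordan elimination."""
    M = [row[:] for row in M]
    pivots = {}   # pivot column -> row holding it
    r = 0
    for col in range(ncols):
        piv = next((i for i in range(r, len(M)) if M[i][col]), None)
        if piv is None:   # free column: produce the associated kernel vector
            v = [0] * ncols
            v[col] = 1
            for c2, row in pivots.items():
                v[c2] = (-M[row][col] * pow(M[row][c2], p - 2, p)) % p
            return v
        M[r], M[piv] = M[piv], M[r]
        inv = pow(M[r][col], p - 2, p)
        for i in range(len(M)):
            if i != r and M[i][col]:
                t = (M[i][col] * inv) % p
                M[i] = [(x - t * y) % p for x, y in zip(M[i], M[r])]
        pivots[col] = r
        r += 1
    return None

f = catalan_mod_p(sigma)
powers = [[1] + [0] * (sigma - 1), f, series_mul(f, f, sigma)]
# Unknowns (a_{0,0}, a_{0,1}, a_{1,0}, a_{1,1}, a_{2,0}, a_{2,1}) with
# A_i = a_{i,0} + a_{i,1}*t; row n asks that the coefficient of t^n vanish.
M = [[powers[i][n - s] if n >= s else 0 for i in range(3) for s in (0, 1)]
     for n in range(sigma)]
v = kernel_vector(M, 6)
v = [x * pow(v[0], p - 2, p) % p for x in v]    # normalize A_0(0) = 1
assert v == [1, 0, p - 1, 0, 0, 1]              # i.e. A_0 = 1, A_1 = -1, A_2 = t
```

The structured algorithms referred to in the text do the same job in quasi-linear rather than cubic time.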
In our setting, we will need to solve several systems of this type, one for each digit of N in radix p. An important feature is that these systems share the same homogeneous part, which only depends on the coefficients of the power series occurring on the right-hand side of (9). Only the inhomogeneous parts vary: they depend on the current linear combination. Putting these facts together yields Algorithm 1 and the following complexity result.
Let be a perfect field with characteristic . Let be an irreducible polynomial in of height and degree . We assume that we are given a nonnegative integer and a polynomial such that and .
Then there exists a unique series congruent to modulo for which . Moreover, Algorithm 1 computes the Nth coefficient of this series for a cost of operations in the ground field.
The first assertion is Hensel’s Lemma [12, Th. 7.3].
The precomputation of modulo for can be performed using Newton iteration, for a total cost of operations in . As explained above, this is enough to set up the homogeneous part of the quasi-Toeplitz system; its inversion has cost .
Let us turn to the main body of the computation, which depends on the index N. For each p-digit of N, we first construct the inhomogeneous part of the system by extracting the relevant coefficients of the series at hand. We then apply the inverse of the system to it, using a naive matrix-vector multiplication. (One can actually achieve this step more cheaply using the quasi-Toeplitz structure; however, this is not that useful since the cost of the previous step is already of the same order.) This is done once per digit. The other steps of the algorithm have negligible cost. ∎
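The reference to Hensel’s lemma can be made concrete by a minimal lifting sketch (a hypothetical toy instance of our choice, unrelated to the series of the text): a simple root modulo p lifts to any p-adic precision by the same Newton step as above, the precision doubling at each iteration.

```python
# Minimal Hensel lifting: a root x0 of F mod p with F'(x0) invertible lifts
# to a unique root mod p^k, by the Newton step x <- x - F(x)/F'(x) with
# doubling precision.  Hypothetical toy instance: a square root of 2 in the
# 7-adic integers.

def hensel_lift(F, dF, x, p, k):
    """Lift a simple root x of F modulo p to a root modulo p^k."""
    assert F(x) % p == 0 and dF(x) % p != 0
    prec = 1
    while prec < k:
        prec = min(2 * prec, k)
        m = p ** prec
        x = (x - F(x) * pow(dF(x), -1, m)) % m  # Newton step at precision m
    return x

F = lambda x: x * x - 2
dF = lambda x: 2 * x
r = hensel_lift(F, dF, 3, 7, 3)    # root of x^2 = 2 modulo 7^3 = 343
assert (r * r - 2) % 343 == 0 and r % 7 == 3
```

(The modular inverse via `pow(a, -1, m)` requires Python 3.8 or later.)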
4. Improving the complexity with respect to
As shown in Theorem 3.4, Algorithm 1 has a nice complexity with respect to the parameters d, h and log N: it is polynomial with small exponents. However, the complexity with respect to p is not that good, as it is exponential in log p, which is the relevant parameter. Thus, when p is large, Algorithm 1 runs slowly and is no longer usable.
For this reason, it is important to improve the complexity with respect to p. In this section, we introduce some ideas to achieve this. More precisely, our aim is to design an algorithm whose complexity with respect to p is quasi-linear in √p, and remains polynomial in all other relevant parameters. In the current state of knowledge, it seems difficult to decrease further the exponent 1/2 on p; indeed, the question addressed in this paper is related to other intensively studied questions (e.g., counting points via p-adic cohomologies) for which the √p barrier has not been overcome yet.
Notations and assumptions
We keep the notation of previous sections. We make one additional hypothesis: the ground field is a finite field. We assume that it is represented as a quotient by an irreducible monic polynomial over the prime field. We choose a monic polynomial lifting it. We set up accordingly an unramified extension of the ring of p-adic integers.
The algorithm we are going to design is not algebraic, in the sense that it does not perform only algebraic operations in the ground field, but will sometimes work over a p-adic ring (or, more exactly, over finite quotients of it). For this reason, throughout this section, we will use bit complexity instead of algebraic complexity.
We use the notation to indicate a quantity whose growth is at most polynomial in . The precise result we will prove reads as follows.
Under the assumptions of Theorem 3.4 and the above extra assumptions, there exists an algorithm of bit complexity that computes the Nth coefficient of f.
If p is bounded by a (fixed) polynomial in the other parameters, then Theorem 4.1 has been proved already. In the sequel, we will then always assume that p is large.