Suppose with prime and is a univariate polynomial with degree and all coefficients having absolute value less than . Let denote the number of roots of in (see, e.g., [24, 23, 2, 18, 14, 28] for further background on prime power rings). Computing is a fundamental problem occurring in polynomial factoring [21, 9, 4, 25, 15], coding theory , and cryptography . The function is also a basic ingredient in the study of Igusa zeta functions and algorithms over [16, 11, 10, 4, 30, 5, 20, 29, 6, 1].
In spite of the fundamental nature of computing , the fastest earlier general algorithms had complexity exponential in :  gave a deterministic algorithm taking time . While the -constant was not stated in , the proof of the main theorem there indicates that the dependence on in their algorithm is linear in . Note that counting the roots via brute-force takes time , so the algorithm from  is preferable, at least theoretically, for . Here, we present a simpler, dramatically faster randomized algorithm (Algorithm 2.3 of the next section) that appears practical for all .
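As a baseline for comparison, brute-force counting evaluates the polynomial at every residue of the prime power modulus. The following is a minimal Python sketch of that idea, not the authors' Maple code; the names `coeffs`, `p`, and `k` are illustrative assumptions, with `coeffs[i]` holding the coefficient of x^i and p^k serving as the modulus.

```python
def count_roots_brute_force(coeffs, p, k):
    """Count x in {0, ..., p^k - 1} with f(x) = 0 mod p^k.

    coeffs[i] is the integer coefficient of x^i (an assumed convention).
    The loop visits all p^k residues, so the cost is exponential in k.
    """
    modulus = p ** k

    def evaluate(x):
        # Horner's method, reducing mod p^k at every step.
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * x + c) % modulus
        return acc

    return sum(1 for x in range(modulus) if evaluate(x) == 0)


if __name__ == "__main__":
    print(count_roots_brute_force([0, 0, 1], 3, 4))   # x^2 mod 3^4 -> 9
    print(count_roots_brute_force([-1, 0, 1], 2, 3))  # x^2 - 1 mod 2^3 -> 4
```

The second call illustrates why root counts in prime power rings can exceed the degree: x^2 - 1 has the four roots 1, 3, 5, 7 modulo 8.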
Following the notation above, there is a Las Vegas randomized algorithm that computes in time . In particular, the number of random bits needed is , and the space needed is bits.
“Las Vegas” here means that, with a fixed error probability (which we can take to be, say, ), our algorithm under-estimates the number of roots. Our algorithm otherwise gives a correct root count, and always correctly announces whether the output count is correct or not. This type of randomization is standard in numerous number-theoretic algorithms, such as the fastest current algorithms for factoring polynomials over finite fields or primality checking (see, e.g., [2, 17, 7]).
At a high level, our algorithm here and the algorithm from  are similar in that they reduce the main problem to a collection of computations, mostly in the finite field , indexed by the nodes of a tree with size at worst exponential in . Also, both algorithms count by partitioning the roots in into clusters having the same mod reduction. Put another way, for each root of the mod reduction of , we calculate the number of “lifts” has to a root , paying special attention to those that are degenerate: The latter kind of root might not lift to a root in , or might lift to as many as roots in (see, e.g., Lemma 2.1 below). Another subtlety to be aware of is that we compute the number of roots in , without listing all of them: The number of roots in can be as high as : Indeed, consider the polynomials (when ) and (when ). So we can’t attain a time or space bound sub-exponential in unless we do something more clever than naively store every root (see Remark 1.2 below).
In finer detail, the algorithm from  solves a “small” polynomial system at each node of a recursion tree (using a specially tailored Gröbner basis computation ), while our algorithm performs a univariate factorization in at each node of a smaller recursion tree. Our use of fast factorization (as in ) is why we resort to randomness, but this pays off: Gaining access to individual roots in (as suggested in ) enables us to give a more streamlined algorithm.
von zur Gathen and Hartlieb presented in  a randomized polynomial-time algorithm to compute all factorizations of certain . (Examples like show that unique factorization fails badly for , and the number of possible factorizations can be exponential in .) Their algorithm is particularly interesting since it uses a compact data structure to encode all the (possibly exponentially many) factorizations of . Unfortunately, their algorithm has the restriction that not divide the discriminant of . Their complexity bound, in our notation, is the sum of and a term involving the complexity of finding the mod reduction of a factorization over (see Remarks 4.10–4.12 from ). The complexity of just counting the number of possible factorizations (or just the number of possible linear factors) of from their data structure does not appear to be stated.
Creating an efficient classification of the roots of in (also improving the data structure from  by removing all restrictions on ), within time polynomial in , is a problem we hope to address in future work.
For the reader interested in implementations, we have a preliminary Maple implementation of Algorithm 2.3 freely downloadable from www.math.tamu.edu/~rojas/countpk.map. A few timings (all done on a Dell XPS13 Laptop with 8Gb RAM and a 256Gb SSD, running Ubuntu Linux 14.04) are listed below. (The timings in years were based on extrapolating, without counting the necessary expansion of laptop memory beyond 8Gb, from examples with much smaller already taking over an hour.)
Example | Brute-force | Our Maple code
Random degree | years | 0.077sec.
Random degree | years | 0.116sec.
 | 9min. 18sec. | 20.075sec
 | years | 40.019sec.
 | years | 45.988sec.
 | years | 1min. 50.323sec.
Our Maple implementations of brute-force and Algorithm 2.3 here are 5 lines long and 16 lines long, respectively. In particular, our random above were generated by taking uniformly random integer coefficients in and then multiplying (or ) random cubic examples together: This results in longer timings for our code than directly picking a single random polynomial of high degree. The actual numbers of roots in the last examples are respectively , ,
1.1. A Recurrence from Partial Factorizations
Throughout this paper, we will use the integers to represent elements of , unless otherwise specified. With this understanding, we will use the following notation:
For any we let denote the mod reduction of and, for any root of , we call degenerate if and only if mod . Letting denote the usual -adic valuation with , we then define . Finally, fixing , let us inductively define a set of pairs as follows: We set . Then, for any with and any degenerate root of with , we define , and mod .
The “perturbations” of will help us keep track of how the roots of in cluster (in a -adic metric sense) about the roots of . Since is merely the coefficient of in the Taylor expansion of about , it is clear that is always an integer (under the assumptions above).
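The Taylor-coefficient remark above suggests a direct way to compute a perturbation. The sketch below assumes that s(f, ζ) is the minimum over i of i plus the p-adic valuation of the i-th Taylor coefficient of f about ζ, which is the same as the largest power of p dividing every coefficient of f(ζ + px). The function names `ord_p` and `perturbation`, and the low-degree-first coefficient list, are illustrative choices rather than notation from the text.

```python
from math import comb


def ord_p(n, p):
    """p-adic valuation of a nonzero integer n."""
    v = 0
    while n % p == 0:
        n //= p
        v += 1
    return v


def perturbation(coeffs, zeta, p):
    """Return (s, g), where g(x) = p^(-s) * f(zeta + p*x).

    coeffs[i] is the coefficient of x^i in f, and f is assumed nonzero.
    s is the largest power of p dividing all coefficients of f(zeta + p*x),
    so g has integer coefficients and is not identically 0 mod p.
    """
    d = len(coeffs) - 1
    shifted = []
    for i in range(d + 1):
        # Coefficient of x^i in the Taylor expansion of f about zeta.
        taylor_i = sum(coeffs[j] * comb(j, i) * zeta ** (j - i)
                       for j in range(i, d + 1))
        # Substituting p*x for x multiplies that coefficient by p^i.
        shifted.append(taylor_i * p ** i)
    s = min(ord_p(c, p) for c in shifted if c != 0)
    return s, [c // p ** s for c in shifted]


if __name__ == "__main__":
    print(perturbation([0, 0, 1], 0, 3))    # f = x^2, zeta = 0, p = 3
    print(perturbation([1, -2, 1], 1, 5))   # f = (x - 1)^2, zeta = 1, p = 5
    print(perturbation([3, 0, 1], 0, 3))    # f = x^2 + 3, zeta = 0, p = 3
```

In the third call s equals 1: the root 0 of x^2 + 3 mod 3 is degenerate, yet f(3x) = 9x^2 + 3 is never divisible by 9, so this root has no lift modulo 9.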
We will see in the next section how can be identified with a finite rooted directed tree. In particular, it is easy to see that the set is always finite since, by construction, only with and are possible (see also Lemma 3.3 of the next section).
Let us take , , and . Setting , a simple calculation then shows that , which has roots in . The root is non-degenerate so the only possible would be an .
In particular, and thus and mod . Since and is a non-degenerate root of , we see that the only possible would be an .
In particular, , so and mod , which only has non-degenerate roots. So by Definition 1.3 there can be no and thus our collection of pairs consists of just pairs.
Using base- expansion, there is an obvious bijection between the ring of -adic integers and the set of root-based paths in an infinite -ary tree. It is then natural to build a (finite) tree to store the roots of in . This type of tree structure was studied earlier by Schmidt and Stewart in [26, 27], from the point of view of classification and (in our notation) upper bounds on . However, it will be more algorithmically efficient to endow our set with a tree structure. The following fundamental lemma relates to a recursion tree structure on .
Following the notation above, let denote the number of non-degenerate roots of in . Then, provided and is not identically mod , we have
We prove Lemma 1.5 in the next section, where it will also easily follow that Lemma 1.5 applies recursively, i.e., our root counting formula still holds if one replaces with . There we also show how Lemma 1.5 leads to our recursive algorithm (Algorithm 2.3) for computing . Note that by construction, implies that is a degenerate root of . So the two sums above range over certain degenerate roots of . Note also that depends only on the residue class of mod , so we will often abuse the notations and by allowing as well. The following example illustrates how can be computed recursively.
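For concreteness, the recurrence can be written out as below, followed by a small worked instance. The symbols are assumptions made for this sketch: N_{p,k}(f) for the number of roots of f in Z/(p^k), n_p(f) for the number of non-degenerate roots of the mod p reduction, s(f, ζ_0) for the valuation from Definition 1.3, and f_{1,ζ_0} for the corresponding perturbation. The multipliers p^{k-1} and p^{s-1} are read off from the lift counts in the proof of Lemma 1.5, so they should be checked against the lemma's own statement.

```latex
% Assumed notation; \zeta_0 ranges over degenerate roots of f mod p.
N_{p,k}(f) \;=\; n_p(f)
  \;+\; \sum_{\substack{\zeta_0 \\ s(f,\zeta_0) \ge k}} p^{\,k-1}
  \;+\; \sum_{\substack{\zeta_0 \\ 2 \le s(f,\zeta_0) \le k-1}}
        p^{\,s(f,\zeta_0)-1}\, N_{p,\,k-s(f,\zeta_0)}\!\left(f_{1,\zeta_0}\right)

% Worked instance: f(x) = x^2, k = 3, any prime p.
% The only root mod p is \zeta_0 = 0, and it is degenerate since f'(0) = 0.
% f(0 + p x) = p^2 x^2, so s(f,0) = 2, f_{1,0}(x) = x^2, and k - s(f,0) = 1.
N_{p,3}(x^2) \;=\; 0 \;+\; 0 \;+\; p^{\,2-1}\, N_{p,1}(x^2) \;=\; p \cdot 1 \;=\; p
```

This agrees with a direct count: x^2 vanishes mod p^3 exactly when p^2 divides x, which gives p residues mod p^3.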
2. Algebraic Preliminaries and Our Algorithm
Let us first recall the following version of Hensel’s Lemma:
(See, e.g., [22, Thm. 2.3, pg. 87, Sec. 2.6].) Suppose is not identically zero mod , N, and is a non-degenerate root of . Then there is a unique with mod and mod .
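The unique lift promised by Hensel's Lemma can be computed one base-p digit at a time. The following Python sketch uses the standard linear Hensel iteration; the function name `hensel_lift` and its argument conventions (`coeffs[i]` is the coefficient of x^i, `root` is a root mod p) are assumptions for illustration. The three-argument `pow` with exponent -1 requires Python 3.8 or later.

```python
def hensel_lift(coeffs, root, p, k):
    """Lift a non-degenerate root of f mod p to the unique root mod p^k."""

    def evaluate(cs, x, m):
        # Horner evaluation mod m.
        acc = 0
        for c in reversed(cs):
            acc = (acc * x + c) % m
        return acc

    deriv = [i * c for i, c in enumerate(coeffs)][1:]
    if evaluate(coeffs, root, p) != 0:
        raise ValueError("root is not a root of f mod p")
    slope = evaluate(deriv, root, p)
    if slope == 0:
        raise ValueError("degenerate root: Hensel's Lemma does not apply")

    inv = pow(slope, -1, p)  # inverse of f'(root) mod p
    r = root % p
    for j in range(2, k + 1):
        m = p ** j
        # r is a root mod p^(j-1); this correction makes it a root mod p^j.
        r = (r - evaluate(coeffs, r, m) * inv) % m
    return r


if __name__ == "__main__":
    # x^2 - 2 has the non-degenerate root 3 mod 7.
    print(hensel_lift([-2, 0, 1], 3, 7, 3))  # -> 108, and 108^2 = 2 mod 343
```

Each pass through the loop fixes one more digit, so a non-degenerate root costs k - 1 evaluations and contributes exactly one root to the count.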
The following lemma enables us to understand the lifts of degenerate roots of .
Following the notation of Lemma 2.1, suppose instead that is a root of of (finite) multiplicity . Suppose also that and there is a with mod and mod . Then .
Proof of Lemma 2.2: We may assume, by base- expansion that for some . Note that mod since is a degenerate root. Note also that for all . Letting we then see that mod . So mod implies that mod and thus .
To conclude, our multiplicity assumption implies that mod . So then and thus .
We are now ready to state our main algorithm.
Algorithm 2.3 (RandomizedPrimePowerRootCounting).
Input. with prime and satisfying
Output. An integer that, with probability at least , is exactly .
1: Let .
3: Let . Return.
5: Let . Return.
7: Let .
8: For a degenerate root of do
9: Let .
11: Let .
13: Let .
16: If the preceding For loop did not access all the degenerate roots of
17: Print “Sorry, your Las Vegas factoring method failed.
You have an under-count so you should try re-running.”
19: Print “If you’ve seen no under-count messages then your count is correct!”
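The recursion behind Algorithm 2.3 can be sketched in Python as follows. This is a deterministic stand-in, not the algorithm as stated: it finds roots mod p by exhaustive search instead of randomized finite field factoring, so it never under-counts but costs time proportional to p at every node. The multipliers p^(k-1) and p^(s-1) are assumptions read off from the proof of Lemma 1.5, and the names `count_roots` and `brute_force` are hypothetical.

```python
from math import comb


def ord_p(n, p):
    """p-adic valuation of a nonzero integer n."""
    v = 0
    while n % p == 0:
        n //= p
        v += 1
    return v


def evaluate(cs, x, m):
    """Horner evaluation of the polynomial cs (low degree first) mod m."""
    acc = 0
    for c in reversed(cs):
        acc = (acc * x + c) % m
    return acc


def count_roots(coeffs, p, k):
    """Number of roots of f in Z/(p^k), by recursing on perturbations."""
    f = [c % p ** k for c in coeffs]
    if all(c == 0 for c in f):
        return p ** k                      # every residue is a root
    v = min(ord_p(c, p) for c in f if c)   # v < k since 0 < c < p^k
    if v > 0:
        # f = p^v * g: roots of g mod p^(k-v) each give p^v roots mod p^k.
        return p ** v * count_roots([c // p ** v for c in f], p, k - v)

    deriv = [i * c for i, c in enumerate(f)][1:]
    d = len(f) - 1
    total = 0
    for z in range(p):                     # exhaustive search (assumption)
        if evaluate(f, z, p) != 0:
            continue
        if evaluate(deriv, z, p) != 0:
            total += 1                     # non-degenerate: unique lift
            continue
        # Coefficients of f(z + p*x); their common p-power is p^s.
        g = [p ** i * sum(f[j] * comb(j, i) * z ** (j - i)
                          for j in range(i, d + 1)) for i in range(d + 1)]
        s = min(ord_p(c, p) for c in g if c)
        if s >= k:
            total += p ** (k - 1)          # every lift of z is a root
        else:
            total += p ** (s - 1) * count_roots(
                [c // p ** s for c in g], p, k - s)
    return total


def brute_force(coeffs, p, k):
    """Reference count by trying every residue mod p^k."""
    return sum(1 for x in range(p ** k) if evaluate(coeffs, x, p ** k) == 0)


if __name__ == "__main__":
    print(count_roots([0, 0, 1], 3, 4), brute_force([0, 0, 1], 3, 4))  # 9 9
```

Comparing `count_roots` with `brute_force` on small inputs is a cheap sanity check on the assumed multipliers. Replacing the loop over `range(p)` with a finite field root finder is where randomness would enter.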
Before proving the correctness of Algorithm 2.3, it will be important to prove our earlier key lemma.
Proof of Lemma 1.5: Proving our formula clearly reduces to determining how many lifts each possible root of has to a root of in . Toward this end, note that Lemma 2.1 implies that each non-degenerate lifts to a unique root of in . In particular, this accounts for the summand in our formula. So now we merely need to count the lifts of the degenerate roots.
Assume is a degenerate root of , write via base- expansion as before, set , and let . Clearly then, mod and, by construction, and is not identically mod .
If then mod independent of . So there are exactly values of with mod . This accounts for the second summand in our formula.
If then is a root of with mod if and only if mod . Also, (thanks to Lemma 2.2) because is a degenerate root. Since the base- digits do not appear in the last equality, the number of possible lifts of is thus exactly times the number of roots of . So this accounts for the third summand in our formula and we are done.
We are at last ready to prove the correctness of Algorithm 2.3.
Proof of Correctness of Algorithm 2.3: Assume temporarily that Algorithm 2.3 is correct when is not identically mod . Since (for any integers with ) mod mod , Steps 1–6 of our algorithm then clearly correctly dispose of the case where is identically mod . So let us now prove correctness when is not identically mod . Applying Lemma 1.5, we then see that it is enough to prove that the value of is the value of our formula for when the For loops of Algorithm 2.3 run correctly.
Step 7 ensures that the value of is initialized as . Steps 8–15 (once the For loop is completed) then simply add the second and third summands of our formula to thus ensuring that , provided the For loop has run correctly, along with all the For loops in the recursive calls to RandomizedPrimePowerRootCounting. Should any of these For loops run incorrectly, Steps 16–20 ensure that our algorithm correctly announces an under-count. So we are done.
3. Our Complexity Bound: Proving Theorem 1.1
Let us now introduce a tree structure on that will simplify our complexity analysis.
Let us identify the elements of with nodes of a labelled rooted directed tree defined inductively as follows:
The root node of is labelled .
The non-root nodes of are uniquely labelled by each with .
There is an edge from node to node if and only if and there is a degenerate root of with and .
The label of a directed edge from node to node is .
In particular, the edges are labelled by powers of in , and the labels of the nodes lie in .
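The tree just defined can be built explicitly for small inputs. The Python sketch below makes several assumptions: the input polynomial is not identically 0 mod p, degenerate roots are found by exhaustive search over residues mod p, children are created only when 2 <= s <= k - 1 at the parent, and each edge carries the label p^(s-1). The dictionary keys and the function name `build_tree` are hypothetical.

```python
from math import comb


def ord_p(n, p):
    """p-adic valuation of a nonzero integer n."""
    v = 0
    while n % p == 0:
        n //= p
        v += 1
    return v


def evaluate(cs, x, m):
    """Horner evaluation of the polynomial cs (low degree first) mod m."""
    acc = 0
    for c in reversed(cs):
        acc = (acc * x + c) % m
    return acc


def build_tree(coeffs, p, k):
    """Return the nodes of the tree as a list of dicts (root first).

    Assumes f is not identically 0 mod p.
    """
    nodes = [{"f": [c % p ** k for c in coeffs], "k": k, "depth": 0,
              "zeta": 0, "parent": None, "edge_label": None}]
    todo = [0]
    while todo:
        parent_index = todo.pop()
        parent = nodes[parent_index]
        f, kk, depth = parent["f"], parent["k"], parent["depth"]
        deriv = [i * c for i, c in enumerate(f)][1:]
        d = len(f) - 1
        for z in range(p):
            if evaluate(f, z, p) != 0 or evaluate(deriv, z, p) != 0:
                continue  # z is not a degenerate root of f mod p
            shifted = [p ** i * sum(f[j] * comb(j, i) * z ** (j - i)
                                    for j in range(i, d + 1))
                       for i in range(d + 1)]
            s = min(ord_p(c, p) for c in shifted if c)
            if 2 <= s <= kk - 1:
                child_k = kk - s
                nodes.append({
                    "f": [(c // p ** s) % p ** child_k for c in shifted],
                    "k": child_k,
                    "depth": depth + 1,
                    "zeta": parent["zeta"] + p ** depth * z,
                    "parent": parent_index,
                    "edge_label": p ** (s - 1),
                })
                todo.append(len(nodes) - 1)
    return nodes


if __name__ == "__main__":
    for node in build_tree([0, 0, 1], 3, 5):  # f = x^2, p = 3, k = 5
        print(node["depth"], node["k"], node["edge_label"])
```

Because every edge lowers the exponent by at least 2 and a node needs exponent at least 3 to have a child, the depth in this sketch never exceeds (k - 1) // 2.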
Letting (from Example 1.4) and, the trees and are drawn below, on the left and right respectively:
Note that has depth
Recall, from Example 1.6, that has exactly roots in .
The following lemma will be central in our complexity analysis.
Suppose with prime, and has degree . Then:
The depth of is at most .
The degree of the root node of is at most .
The degree of any non-root node of labelled , with parent and , is at most . In particular,
has at most nodes at depth , and thus a total of no greater than nodes.
Assertion (2): Since has degree , and the multiplicity of any degenerate root of is at least , we see that has no more than degenerate roots in . Every edge emanating from the root node of corresponds to a unique degenerate root of (and not every degenerate root of need yield a valid edge emanating from the root of ), so we are done.
Assertion (3): The degree bound for non-root nodes follows similarly to the degree bound for the root node: Letting , it suffices to prove that for all . Note that we must have since for . So then, the coefficient of in must be divisible by for all . In other words, the coefficient of in must be divisible by for all , and thus . That follows from the definition of , and since and is a strictly decreasing function of .
To prove the final bound, note that Lemma 2.2 implies that the term in the sum is at most the multiplicity of the root of . Since the sum of the multiplicities of the degenerate roots of is no greater than , we are done.
Assertion (4): By Assertion (3), the sum of the degrees of the (as ranges over all depth node labels of ) is no greater than , which is at most .
By applying Assertion (3) to all nodes of depth , the sum of the degrees of the (as ranges over all depth node labels of ) is no greater than the sum of the degrees of the (as ranges over all depth node labels of ).
Since we thus obtain that, for every depth , the sum of the degrees of the (as ranges over all depth node labels of ) is no greater than . So by the final part of Assertion (3), our tree has no more than nodes at any fixed depth . So by Assertion (1) we are done.
We are at last ready to prove our main theorem.
Proof of Theorem 1.1: Since we already proved at the end of the last section that Algorithm 2.3 is correct, it suffices to prove the stated complexity bound for Algorithm 2.3. Proving that Algorithm 2.3 runs as fast as stated will follow easily from (a) the fast randomized Kedlaya-Umans factoring algorithm from  and (b) applying Lemma 3.3 to show that the number of necessary factorizations and -adic valuation calculations is well-bounded.
More precisely, the For loops and recursive calls of Algorithm 2.3 can be interpreted as a depth-first search of , with being built along the way. In particular, we begin at the root node by factoring in via , in order to find the degenerate roots of . (Factoring in fact dominates the complexity of the gcd computation that gives us if we use a deterministic near linear-time gcd algorithm such as that of Knuth and Schönhage (see, e.g., [BCS97, Ch. 3]).) This factorization takes time and requires random bits.
Now, in order to continue the recursion, we need to compute -adic valuations of polynomial coefficients in order to find the and determine the edges emanating from our root. Expanding each can clearly be done mod , so each such expansion takes time no worse than via Horner’s method and fast finite ring arithmetic (see, e.g., [2, 13]). Computing then takes time no worse than using, say, the standard binary method for evaluating powers of . There are no more than possible (thanks to Lemma 3.3), so the total work so far is
(To simplify our bound, we are rolling multiplicative constants into the exponent, at the price of a negligible increase in the little- terms in the exponent.)
The remaining work can then be bounded similarly, but with one small twist: By Assertion (4) of Lemma 3.3, the number of nodes at depth of our tree is never more than and, by Assertion (3), the sum of the degrees of the at level is no greater than .
Now observe that (for ) the amount of work needed to compute the at level (which are used to define the polynomials at level ) is no greater than, and this will be dominated by the subsequent computations of the expansions of the . In particular, by the basic calculus inequality (valid for any ), the total amount of work for the factorizations for each subsequent level of will be
with a total of random bits needed. The expansions of the at level will take time no greater than to compute. So our total work at each subsequent level is then
So then, the total amount of work for our entire tree will be
and the number of random bits needed is .
We are nearly done, but we must still ensure that our algorithm has the correct Las Vegas properties. In particular, while finite field factoring can be assumed to succeed with probability , we use multiple calls to finite field factoring, each of which could fail. The simplest solution is to simply run our finite field factoring algorithm sufficiently many times to reduce the over-all error probability. In particular, thanks to Lemma 3.3, it is enough to enforce a success probability of for each application of finite field factoring. This implies that we should run the algorithm from  many times each time we need a factorization over . So, multiplying our last total by , this yields a final complexity bound of