The characteristic polynomial is a fundamental invariant of a matrix: its roots give the eigenvalues, and the trace and determinant can be extracted from its coefficients. In fact, the best known division-free algorithm for computing determinants over arbitrary rings [kaltofen:92a] does so using the characteristic polynomial. Over -adic fields, computing the characteristic polynomial is a key ingredient in algorithms for counting points of varieties over finite fields (see [kedlaya:01a, harvey:07a, harvey:14a].
When computing with -adic matrices, the lack of infinite memory implies that the entries may only be approximated at some finite precision . As a consequence, in designing algorithms for such matrices one must analyze not only the running time of the algorithm but also the accuracy of the result.
Let be known at precision . The simplest approach for computing the characteristic polynomial of is to compute either using recursive row expansion or various division free algorithms [seifullin:02a] [kaltofen:92a]. There are two issues with these methods. First, they are slower than alternatives that allow division, requiring , and operations. Second, while the lack of division implies that the result is accurate modulo as long as , they still do not yield the optimal precision.
A faster approach over a field is to compute the Frobenius normal form of , which is achievable in running time [storjohann:01a]. However, the fact that it uses division frequently leads to catastrophic losses of precision. In many examples, no precision remains at the end of the calculation.
Instead, we separate the computation of the precision of from the computation of an approximation to . Given some precision on , we use [caruso-roe-vaccon:14a]*Lem. 3.4 to find the best possible precision for . The analysis of this precision is the subject of much of this paper. With this precision known, the actual calculation of may proceed by lifting to a temporarily higher precision and then using a sufficiently stable algorithm (see Remark 5.3).
One benefit of this approach is that we may account for diffuse precision: precision that is not localized on any single coefficient of . For example, let , consider a diagonal matrix with diagonal entries , let and set . The valuation of the coefficient of in will be , and if and is known with precision then the constant term of will be known with precision larger than (see [caruso-roe-vaccon:15a]*Prop. 3.2).
As long as none of the eigenvalues of are congruent to modulo , then none of the coefficients of the characteristic polynomial of will have precision larger than . But , so the precision content of these two polynomials should be equivalent. The solution is that the extra precision in is diffuse and not visible on any individual coefficient. We formalize this phenomenon using lattices; see Section 2.1 for further explanation, and [caruso:17a]*§3.2.2 for a specific example of the relationship between and .
Since the description of Kedlaya’s algorithm in [kedlaya:01a], the computation of characteristic polynomials over -adic numbers has become a crucial ingredient in many counting-points algorithms. For example, [harvey:07a, harvey:14a] use -adic cohomology and the characteristic polynomial of Frobenius to compute zeta functions of hyperelliptic curves.
In most of these papers, the precision analysis usually deals with great details on how to obtain the matrices (e.g. of action of Frobenius) that are involved in the point-counting schemes. However, the computation of their characteristic polynomials is often a little bit less thoroughly studied: some refer to fast algorithms (using division), while others apply division-free algorithms.
In [caruso-roe-vaccon:15a], the authors have begun the application of the theory of differential precision of [caruso-roe-vaccon:14a] to the stable computation of characteristic polynomials. They have obtained a way to express the optimal precision on the characteristic polynomial, but have not given practical algorithms to attain this optimal precision.
The contribution of this paper.
Thanks to the application the framework of differential precision of [caruso-roe-vaccon:14a] in [caruso-roe-vaccon:15a], we know that the precision of the characteristic polynomial of a matrix is determined by the comatrix In this article, we provide:
Corollary 2.4: a simple, criterion to decide whether is defined at precision higher than the precision of (when ).
Theorem 3.11: a algorithm with operations in to compute the optimal precision on each coefficient of (when is given with uniform precision on its entries).
Proposition 5.6: a algorithm to compute the optimal precision on each eigenvalue of
Organization of the article.
In Section 2, we review the differential theory of precision developed in [caruso-roe-vaccon:14a] and apply it to the specific case of the characteristic polynomial, giving conditions under which the differential will be surjective (and thus provide a good measure of precision). We also give a condition based on reduction modulo that determines whether the characteristic polynomial will have a higher precision that the input matrix, and show that the image of the set of integral matrices has the structure of an -module when is itself integral. Finally, we give a compact description of , the main ingredient in the differential.
In Section 3, we develop algorithms to approximate the Hessenberg form of , and through it to find and thus find the precision of the characteristic polynomial of . In Section 4, we give a algorithm to compute the compact description of .
Finally, we propose in Section 5 algorithms to compute the optimal coefficient-wise precision for the characteristic polynomial. We also give the results of some experiments demonstrating that these methods can lead to dramatic gains in precision over standard interval arithmetic. We close with results describing the precision associated to eigenvalues of a matrix.
Throughout the paper, will refer to a complete, discrete valuation field, to its valuation, its ring of integers and a uniformizer. We will write that if there exists some such that We will write for an matrix over , and the characteristic polynomial map, for the characteristic polynomial of and for the differential of at , as a linear map from to the space of polynomials of degree less than . We fix an such that the multiplication of two matrices over a ring is in operations in the ring. Currently, the smallest known is less than thanks to [legall:14a]. We will denote by
the identity matrix of rankin When there is no ambiguity, we will drop this for scalar matrices, e.g. for and denotes Finally, we write for the elementary divisors of , sorted in increasing order of valuation.
2 Theoretical study
2.1 The theory of p-adic precision
We recall some of the definitions and results of [caruso-roe-vaccon:14a] as a foundation for our discussion of the precision for the characteristic polynomial of a matrix. We will be concerned with two -manifolds in what follows: the space of matrices with entries in and the space of monic degree polynomials over . Given a matrix , the most general kind of precision structure we may attach to is a lattice in the tangent space at . However, representing an arbitrary lattice requires basis vectors, each with entries. We therefore frequently work with certain classes of lattices, either jagged lattices where we specify a precision for each matrix entry or flat lattices where every entry is known to a fixed precision . Similarly, precision for monic polynomials can be specified by giving a lattice in the tangent space at , or restricted to jagged or flat precision in the interest of simplicity.
Let be the characteristic polynomial map. Our analysis of the precision behavior of rests upon the computation of its derivative , using [caruso-roe-vaccon:14a]*Lem. 3.4. For a matrix , we identify the tangent space at with itself, and the tangent space at with the space of polynomials of degree less than . Let denote the comatrix of (when , we have ) and the differential at . Recall [caruso-roe-vaccon:14a]*Appendix B [caruso-roe-vaccon:15a]*§3.3 that is given by
For , the following conditions are equivalent:
the differential is surjective
the matrix has a cyclic vector (i.e. is similar to a companion matrix)
the eigenspaces ofover the algebraic closure of all have dimension
the characteristic polynomial of is equal to the minimal polynomial of .
For any , the image of at will be the same as the image of at , so we may assume that is a companion matrix. For a companion matrix, the bottom row of consists of so is surjective.
Now suppose that has a repeated eigenvalue over . After conjugating into Jordan normal form over , the entries of will also be block diagonal, and divisible within each block by the product of , where ranges over the eigenvalues and dimensions of the other Jordan blocks. Since occurs in two Jordan blocks, will divide every entry of and will not be surjective.
We also have an analogue of Proposition 2.1 for integral matrices.
For , the following conditions are equivalent:
the image of under is .
the reduction of modulo has a cyclic vector.
Write (resp. ) for the open (resp. closed) ball of radius in , and let denote the elementary divisors of .
Suppose that satisfies one of the conditions in Proposition 2.1, and let
Then, for all and all , any lattice such that satisfies:
Recall [caruso-roe-vaccon:15a]*Def. 3.3 that the precision polygon of is the lower convex hull of the Newton polygons of the entries of . By [caruso-roe-vaccon:15a]*Prop. 3.4, the endpoints of the precision polygon occur at height and . By convexity, .
Since the coefficients of are given by polynomials in the entries of with integral coefficients, [caruso-roe-vaccon:15a]*Prop. 2.2 implies the conclusion.
The relationship between precision and the images of lattices under allows us to apply Proposition 2.2 to determine when the precision of the characteristic polynomial is the minimum possible.
Suppose that is known with precision . Then the characteristic polynomial of has precision lattice strictly contained in if and only if the reduction of modulo does not have a cyclic vector.
Note that this criterion is checkable using operations in the residue field [storjohann:01a].
2.2 Stability under multiplication by
By definition, the codomain of is . However, when is given, is canonically isomorphic to as a -vector space. For our purpose, it will often be convenient to view as an -linear mapping .
Let be the subring of consisting of polynomials for which , and as a submodule of . Then is stable under multiplication by .
Let and . By (1), is given by the -span of the entries of . Using the fact that the product of matrix with its comatrix is the determinant, and thus . The span of the entries of the left hand side is precisely , while the span of the entries of the right hand side is contained within since .
If , then is stable under multiplication by and hence is a module over .
2.3 Compact form of
Let be the companion matrix associated to :
with . By Proposition 2.1, there exists a matrix such that . Applying the same result to the transpose of
, we find that there exists another invertible matrixsuch that .
We keep the previous notations and assumptions. Let be the row vector . Then
for some .
Write . From , we deduce . Therefore each column of lies in the right kernel of modulo . On the other hand, a direct computation shows that every column vector lying in the right kernel of modulo can be written as for some . We deduce that for some row vector . Applying the same reasoning with , we find that can be written for some and we are done.
Proposition 2.7 shows that can be encoded by the datum of the quadruple whose total size stays within : the polynomials and are determined by coefficients while we need entries to write down the matrices and . We shall see moreover in Section 4 that interesting information can be read off of this short form .
With the previous notation, if the quadruple for is which can be computed in operations in This is faster than computing which is, at first sight, in operations in
via Hessenberg form
In this section, we combine the computation of a Hessenberg form of a matrix and the computation of the inverse through the Smith normal form (SNF) over a complete discrete valuation field (CDVF) to compute and . If , then only division by invertible elements of will occur.
3.1 Hessenberg form
We begin with the computation of an approximate Hessenberg form.
A Hessenberg matrix is a matrix with
Given integers , an approximate Hessenberg matrix is a matrix with
If and is an (approximate) Hessenberg matrix similar to , we say that H is an (approximate) Hessenberg form of
It is not hard to prove that every matrix over a field admits a Hessenberg form. We prove here that over if a matrix is known at finite (jagged) precision, we can compute an approximate Hessenberg form of it. Moreover, we can provide an exact change of basis matrix. It relies on the following algorithm.
Algorithm 1: Approximate Hessenberg form computation
Input: a matrix in
1. for do
2. swap the row with a row () s.t. is minimal.
3. for do
4. Eliminate the significant digits of by pivoting with row using a matrix
Algorithm 1 computes and realizing an approximate Hessenberg form of is exact over finite extensions of and , and the computation is in operations in at precision the maximum precision of a coefficent in
Let us assume that is a finite extensions of or Inside the nested for loop, if we want to eliminate with pivot with the ’s being units, the corresponding coefficient of the corresponding shear matrix is the lift(in or adequate extension) of Exactness follows directly. Over other fields, we can not lift, but the computations are still valid. The rest is clear.
From a Hessenberg form of it is well known that one can compute the characteristic polynomial of in operations in [Cohen:2013]*pp. 55–56. However, this computation involves division, and its precision behavior is not easy to quantify.
3.2 Computation of the inverse
In this section, we prove that to compute the inverse of a matrix over a CDVF , the Smith normal form is precision-wise optimal in the flat-precision case. We first recall the differential of matrix inversion.
Let Then for It is always surjective.
We then have the following result about the loss in precision when computing the inverse.
Let . If is a flat precision of on then can be computed at precision by a SNF computation and this lower-bound is optimal, at least when is large.
The smallest valuation of a coefficient of is It is for and it is then clear that can be obtained as the valuation of a coefficient of and the smallest that can be achieved this way for in a precision lattice of flat precision. Hence the optimality of the bound given, at least when is large [caruso-roe-vaccon:14a]*Lem. 3.4.
Now, the computation of the Smith normal form was described in [Vaccon-these]. From known at flat precision we can obtain an exact , and and known at precision at least , with coefficients in and determinant in realizing an Smith normal form of There is no loss in precision when computing and Since the smallest valuation occurring in is we see that is known at precision at least which concludes the proof.
3.3 The comatrix of
In this section, we compute for a Hessenberg matrix using the Smith normal form computation of the previous section. The entries of lie in , which is not a CDVF, so we may not directly apply the methods of the previous section. However, we may relate to , whose entries lie in the CDVF . In this way, we compute using an SNF method, with no division in .
First, we need a lemma relating comatrices of similar matrices:
If and are such that then:
The second ingredient we need is reciprocal polynomials. We extend its definition to matrices of polynomials.
Let and of degree at most We define the reciprocal polynomial of order of as Let a matrix of polynomials of degree at most We denote by the matrix with .
We then have the following result :
It all comes down to the following result: let a matrix of polynomials of degree at most then Indeed, one can use multilinearity of the determinant on to prove this result. It directly implies the second part of the lemma; the first part follows from the fact that the entries of and of are determinants of size .
This lemma allows us to compute instead of This has a remarkable advantage: the pivots during the computation of the SNF of are units of and are known in advance to be on the diagonal. This leads to a very smooth precision and complexity behaviour when the input matrix lives in
Algorithm 2: Approximate
Input: an approximate Hessenberg matrix in
1. While updating , track and so that is always satisfied.
2. for do
3. Eliminate, modulo the coefficients for using the invertible pivot (with ).
4. for do
5. Eliminate, modulo the coefficients using the invertible pivot
7. Rescale to get
8. 111The product should be implemented by sequential row operations corresponding to the eliminations in Step 5 in order to avoid a product of two matrices in .
Let be an approximate Hessenberg matrix. Then, using Algorithm 2, one can compute and in operations in at the precision given by
First, the operations of the lines 2 and 3 use operations in at the precision given by Indeed, since is an approximate Heisenberg matrix, when we use as pivot the only other nonzero coefficient in its column is . As a consequence, when performing this column-pivoting, only two rows ( and ) lead to operations in other than checking precision. Hence, line 3 costs for the computation of Following line 1, the computation of is done by operations on rows, starting from the identity matrix. The order in which the entries of are cleared implies that is just filled in as an upper triangular matrix: no additional operations in are required. Thus the total cost for lines 2 and 3 is indeed operations.
For lines 4 and 5, there are only eliminations, resulting in a cost for the computation of . Rather than actually construct , we just track the eliminations performed in order to do the corresponding row operations on , since we only need the product .
Line 6 is in and 7 in
Thanks to the fact that the only corresponds to the product of shear matrices, the product on line 8 is in We emphasize that no division has been done throughout the algorithm. Line 9 is costless, and the result is then proved.
If does not have coefficients in we may apply Algorithms 1 and 2 to in operations in , and then divide the coefficient of in the resulting polynomial by .
We will see in Section 5 that for an entry matrix with coefficients known at flat precision, Algorithms 1 and 2 are enough to know the optimal jagged precision on
3.4 The comatrix of
In this section, we combine Proposition 2.7 with Algorithm 2 to compute the comatrix of when is squarefree. Note that this condition on is equivalent to being diagonalizable under the assumption that is surjective. The result is the following algorithm, where the only divisions are for gcd and modular inverse computations.
Algorithm 3: Approximate
Input: an approx. with
0. Find and approximate Hessenberg, such that using Algorithm 1.
1. Compute and using Algorithm 2.
2. Do for random by doing for some Compute
3. Similarily compute for corresponding to adding a random linear combination of the columns of index to the first column of
4. If then go to 2.
5. Let be the inverse of .
6. Let and .
For such that Algorithm 3 computes in average complexity operations in . The only divisions occur in taking gcds and inverses modulo .
As we have already seen, completing Steps 0 and 1 is in Multiplying by or or their inverse corresponds to operations on rows or columns over a matrix with coefficients in of degree at most Thus, it is in Step 5 is in Step 6 in and Step 7 in . All that is to prove is that the set of and to avoid is of dimension at most The idea is to work modulo for a root of (in an algebraic closure) and then apply Chinese Remainder Theorem. The goal of the Step is to ensure the first row of contains an invertible entry modulo Since is of rank one, the
’s have to avoid an affine hyperplane so thatis a non-zero vector. Hence for to contain an invertible coefficient, a finite union of affine hyperplane is to avoid. Similarly, the goal of Step 3 is to put an invertible coefficient (modulo ) on and again, only a finite union of affine hyperplane is to avoid. Hence, the set that the ’s have to avoid is a finite union of hyperplane, and hence, is of dimension at most Thus, almost any choice of leads to a matrix passing the test in Step 4. This concludes the proof.
As in the previous section, it is possible to scale so as to get coefficients in and apply the previous algorithm.
We refer to [caruso:15a] for the handling of the precision of gcd and modular inverse computations. In this article, ways to tame the loss of precision coming from divisions are explored, following the methods of [caruso-roe-vaccon:14a].
via Frobenius form
The algorithm designed in the previous section computes the differential of at a given matrix for a cost of operations in . This seems to be optimal given that the (naive) size of the is : it is a matrix of size . It turns out however that improvements are still possible! Indeed, thanks to Proposition 2.7, the matrix of admits a compact form which can be encoded using only coefficients. The aim of this short section is to design a fast algorithm (with complexity ) for computing this short form. The price to pay is that divisions in
appear, which can be an issue regarding to precision in particular cases. In this section, we only estimate the number of operations inand not their behaviour on precision.
From now on, we fix a matrix for which is surjective. Let be the quadruple encoding the short form of ; we recall that they are related by the relations:
An approximation to can be computed in operations in (e.g. as a by-product of [storjohann:01a]).
The matrix can be computed as follows. Pick . Define for all . The ’s can be computed in operations in e.g. using the first algorithm of [keller-gehrig:85a]. Let be the matrix whose rows are the ’s for . Remark that is invertible if and only if is a basis of if and only if is a cyclic vector. Moreover after base change to the basis , the matrix takes the shape (3). In other words, if is invertible, then is a solution of , where is the companion matrix similar to . Moreover, observe that the condition “
is invertible” is open for the Zariski topology. It then happens with high probability as soon as it is not empty, that is as soon asadmits a cyclic vector, which holds by assumption.
The characteristic polynomial can be recovered thanks to the relation .
Now, instead of directly computing , we first compute a matrix with the property that . To do so, we apply the same strategy as above except that we start with the vector (and not with a random vector). A simple computation shows that, for , the vector has the shape:
with starting zeros. Therefore the ’s form a basis of , i.e. is always a cyclic vector of . Once has been computed, we recover using the relation .
It remains to compute the scaling factor . For this, we write the relation:
which comes from Eq. (4) after multiplication on the left by and multiplication on the right by . We observe moreover that the first row of is . Evaluating the top left entry of Eq. (5), we end up with the relation:
No further computation are then needed to derive the value of . We summarize this section with the following theorem:
Given such that is surjective, then one can compute in such that in operations in
5 Optimal jagged precision
In the previous Sections, 3 and 4, we have proposed algorithms to obtain the comatrix of Our motivation for these computations is to then be able to understand what is the optimal precision on In this section, we provide some answers to this question, along with numerical evidence. We also show that it is then possible to derive optimal precision of eigenvalues of
5.1 On the characteristic polynomial
For , let be the mapping taking a polynomial to its coefficients in . By applying [caruso-roe-vaccon:14a]*Lem. 3.4 to the composite , one can figure out the optimal precision on the -th coefficient of the characteristic polynomial of (at least if is given at enough precision).
Let us consider more precisely the case where is given at jagged precision: the entry of is given at precision for some integers . Lemma 3.4 of [caruso-roe-vaccon:14a] then shows that the optimal precision on the -th coefficient of is where is given by the formula:
where is the entry of the comatrix .
If is given at (high enough) jagged precision, then we can compute the optimal jagged precision on in operations in .
We have seen in §3 and §4 that the computation of the matrix can be carried out within operations in (either with the Hessenberg method or the Frobenius method). We conclude by applying Eq. (6) which requires no further operation in (but evaluations of valuations and manipulations of integers).
If is given at (high enough) flat precision, then we can avoid the final base change step in the Hessenberg method. Indeed, observe that, thanks to Lemma 3.6, we can write:
where lies in . Moreover, the latter condition implies that runs over when runs over . As a consequence, the integer giving the optimal precision on the -th coefficient of is also equal to where is the entry of , where is the Hessenberg form of .
As a consequence of the previous discussion, once the optimal jagged precision is known, it is possible to lift the entries of to a sufficiently large precision, rescale them to have entries in and then use Algorithm 2 to compute the characteristic polynomial. The output might then need to be rescaled and truncated at the optimal precision. This requires operations in and unfortunately, for several instances, may require to increase a lot the precision.
Numerical experiments. We have made numerical experiments in SageMath [sage] in order to compare the optimal precision obtained with the methods explained above with the actual precision obtained by the software. For doing so, we picked a sample of random matrices in where all the entries are given at the same relative precision. We recall that, in SageMath, random elements are generated as follows. We fix an integer — the so-called relative precision — and generate elements of of the shape
where is a random integer generated according to the distribution:
and is an integer in the range , selected uniformly at random.
Once this sample has been generated, we computed, for each , the three following quantities:
the optimal precision on the -th coefficient of the characteristic polynomial of given by Eq. (6)
in the capped relative model222Each coefficient carries its own precision which is updated after each elementary arithmetical operation., the precision gotten on the -th coefficient of the characteristic polynomial of computed via the call:
in the model of floating-point arithmetic (see[caruso:17a]*§2.3), the number of correct digits of the -th coefficient of the characteristic polynomial of .
The keyword algorithm="df" forces SageMath to use the division free algorithm of [seifullin:02a]. It is likely that, proceeding so, we limit the loss of precision.
The table of Figure 1 summarizes the results obtained.
|Average loss of accuracy|
Results for a sample of instances
It should be read as follows. First, the acronyms CR and FP refers to “capped relative” and “floating-point” respectively. The numbers displayed in the table are the average loss of relative precision. More precisely, if
is the relative precision at which the entries of the input random matrixhave been generated and is the valuation of the -th coefficient of , then:
the column “Optimal” is the average of the quantities (where is defined by Eq. (6)):