Computations with p-adic numbers

01/24/2017 ∙ by Xavier Caruso, et al. ∙ 0

This document contains the notes of a lecture I gave at the "Journées Nationales du Calcul Formel" (JNCF) in January 2017. The aim of the lecture was to discuss low-level algorithmics for p-adic numbers. It is divided into two main parts: first, we present various implementations of p-adic numbers and compare them; second, we introduce a general framework for studying precision issues and apply it in several concrete situations.


Introduction

The field of $p$-adic numbers, $\mathbb{Q}_p$, was first introduced by Kurt Hensel at the end of the 19th century in a short paper written in German [36]. From that time, the popularity of $p$-adic numbers has grown without interruption throughout the 20th century. Their first success was materialized by the famous Hasse–Minkowski Theorem [75], which states that a Diophantine equation of the form $P(x_1, \ldots, x_n) = 0$, where $P$ is a polynomial of total degree at most $2$, has a solution over $\mathbb{Q}$ if and only if it has a solution over $\mathbb{R}$ and a solution over $\mathbb{Q}_p$ for all prime numbers $p$. This characterization is quite interesting because testing whether a polynomial equation has a $p$-adic solution can be carried out in a very efficient way using analytic methods, just like over the reals. This kind of strategy is nowadays ubiquitous in many areas of Number Theory and Arithmetic Geometry. After Diophantine equations, other typical examples come from the study of number fields: we hope to derive interesting information about a number field $K$ by studying carefully all its $p$-adic incarnations $K \otimes_{\mathbb{Q}} \mathbb{Q}_p$. The ramification of $K$, its Galois properties, etc. can be (and very often are) studied in this manner [69, 65]. Class field theory, which provides a precise description of all abelian extensions of a given number field (an abelian extension is a Galois extension whose Galois group is abelian), is also formulated in this language [66]. The importance of $p$-adic numbers is so prominent today that there is still very active research on theories dedicated to purely $p$-adic objects: one can mention for instance the study of $p$-adic geometry and $p$-adic cohomologies [6, 58], the theory of $p$-adic differential equations [50], Coleman's theory of $p$-adic integration [24], $p$-adic Hodge theory [14], the $p$-adic Langlands correspondence [5], the study of $p$-adic modular forms [34], $p$-adic $\zeta$-functions [52] and $L$-functions [22], etc. The proof of Fermat's Last Theorem by Wiles and Taylor [81, 78] is stamped with many of these ideas and developments.

Over the last decades, $p$-adic methods have taken some importance in Symbolic Computation as well. For a long time, $p$-adic methods have been used for factoring polynomials over $\mathbb{Q}$ [56]. More recently, there has been a wide diversification of the use of $p$-adic numbers for effective computations: Bostan et al. [13] used Newton sums for polynomials over $\mathbb{Z}_p$ to compute composed products for polynomials over $\mathbb{F}_p$; Gaudry et al. [32] used $p$-adic lifting methods to generate genus 2 CM hyperelliptic curves; Kedlaya [49], Lauder [54] and many followers used $p$-adic cohomology to count points on hyperelliptic curves over finite fields; Lercier and Sirvent [57] computed isogenies between elliptic curves over finite fields using $p$-adic differential equations.

The need to build solid foundations for the algorithmics of $p$-adic numbers has then emerged. This is however not straightforward because a single $p$-adic number encompasses an infinite amount of information (the infinite sequence of its digits) and then necessarily needs to be truncated in order to fit in the memory of a computer. From this point of view, $p$-adic numbers behave very similarly to real numbers and the questions that emerge when we are trying to implement $p$-adic numbers are often the same as the questions arising when dealing with rounding errors in the real setting [62, 26, 63]. The algorithmic study of $p$-adic numbers is then located at the frontier between Symbolic Computation and Numerical Analysis and imports ideas and results coming from both of these domains.

Content and organization of this course.

This course focuses on the low-level implementation of $p$-adic numbers (and then deliberately omits high-level algorithms making use of $p$-adic numbers) and pursues two main objectives. The first one is to introduce and discuss the most standard strategies for implementing $p$-adic numbers on computers. We shall detail three of them, each having its own spirit: (1) the zealous arithmetic, which is inspired by interval arithmetic in the real setting, (2) the lazy arithmetic with its relaxed improvement and (3) the $p$-adic floating-point arithmetic, the last two being inspired by the eponymous approaches in the real setting.

The second aim of this course is to develop a general theory giving quite powerful tools to study the propagation of accuracy in the $p$-adic world. The basic underlying idea is to linearize the situation (and then model the propagation of accuracy using differentials); it is once again inspired by classical methods in the real case. However, it turns out that the non-archimedean nature of $\mathbb{Q}_p$ (i.e. the fact that $\mathbb{Z}$ is bounded in $\mathbb{Q}_p$) is the source of many simplifications which will allow us to state much more accurate results and to go much further in the $p$-adic setting. As an example, we shall see that the theory of $p$-adic precision yields a general strategy for increasing the numerical stability of any given algorithm (assuming that the problem it solves is well-conditioned).

This course is organized as follows. §1 is devoted to the introduction of $p$-adic numbers: we define them, prove their main properties and discuss in more detail their place in Number Theory, Arithmetic Geometry and Symbolic Computation. The standard implementations of $p$-adic numbers mentioned above are presented in §2. A careful comparison between them is moreover proposed and supported by many examples coming from linear algebra and commutative algebra. Finally, in §3, we present the aforementioned theory of $p$-adic precision. We then detail its applications: we will notably examine many very concrete situations and, for each of them, we will explain how the theory of $p$-adic precision helps us either in quantifying the qualities of a given algorithm regarding numerical stability or, even better, in improving them.

Acknowledgments.

This document contains the (augmented) notes of a lecture I gave at the “Journées Nationales du Calcul Formel” (JNCF, the (French) National Computer Algebra Days) in January 2017. I heartily thank the organizers and the scientific committee of the JNCF for giving me the opportunity to give these lectures and for encouraging me to write down these notes. I am very grateful to Delphine Boucher, Nicolas Brisebarre, Claude-Pierre Jeannerod, Marc Mezzarobba and Tristan Vaccon for their careful reading and their helpful comments on an earlier version of these notes.

Notation.

We use standard notation for the sets of numbers: $\mathbb{N}$ is the set of natural integers (including $0$), $\mathbb{Z}$ is the set of relative integers, $\mathbb{Q}$ is the set of rational numbers and $\mathbb{R}$ is the set of real numbers. We will sometimes use the soft-$O$ notation $\tilde{O}(\cdot)$ for writing complexities; we recall that, given a sequence of positive real numbers $(u_n)$, $\tilde{O}(u_n)$ is defined as the union of the sets $O(u_n \log^k u_n)$ for $k$ varying in $\mathbb{N}$.

Throughout this course, the letter $p$ always refers to a fixed prime number.

1 Introduction to $p$-adic numbers

In this first section, we define $p$-adic numbers, discuss their basic properties and try to explain, by selecting a few relevant examples, their place in Number Theory, Algebraic Geometry and Symbolic Computation. The presentation below is deliberately kept brief; we refer the interested reader to [2, 35] for a more complete exposition of the theory of $p$-adic numbers.

1.1 Definition and first properties

$p$-adic numbers are very ambivalent objects which can be viewed from many different angles: computational, algebraic, analytic. It turns out that each point of view leads to its own definition of $p$-adic numbers: computer scientists often prefer viewing a $p$-adic number as a sequence of digits while algebraists prefer speaking of projective limits and analysts are more comfortable with Banach spaces and completions. Of course all these approaches have their own interest and understanding the intersections between them is often the key behind the most important advances.

In this subsection, we briefly survey all the standard definitions of $p$-adic numbers and provide several mental representations in order to help the reader, as much as possible, to develop a good $p$-adic intuition.

1.1.1 Down-to-earth definition

Recall that each positive integer $n$ can be written in base $p$, that is as a finite sum:

$$n = a_0 + a_1 p + a_2 p^2 + \cdots + a_\ell p^\ell$$

where the $a_i$'s are integers between $0$ and $p-1$, the so-called digits. This writing is moreover unique assuming that the most significant digit $a_\ell$ does not vanish. A possible strategy to compute the expansion in base $p$ goes as follows. We first compute $a_0$ by noting that it is necessarily the remainder in the Euclidean division of $n$ by $p$: indeed it is congruent to $n$ modulo $p$ and lies in the range $[0, p-1]$ by definition. Once $a_0$ is known, we compute $n_1 = \frac{n - a_0}{p}$, which is also the quotient in the Euclidean division of $n$ by $p$. Clearly $n_1 = a_1 + a_2 p + \cdots + a_\ell p^{\ell-1}$ and we can now compute $a_1$ by repeating the same strategy. Figure 1.1 shows a simple execution of this algorithm.

Figure 1.1: Expansion of in base
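The repeated Euclidean division described above can be sketched in a few lines of Python. This is only an illustration: the function name `digits_in_base` and the sample values $1742$ and $7$ are our own choices and are not meant to reproduce the figure.

```python
def digits_in_base(n, p):
    """Return the base-p digits of a positive integer n, least significant first."""
    if n <= 0 or p < 2:
        raise ValueError("expected n > 0 and p >= 2")
    digits = []
    while n > 0:
        # The remainder is the next digit; the quotient is what remains to expand.
        n, a = divmod(n, p)
        digits.append(a)
    return digits


# 1742 = 6 + 3*7 + 0*7^2 + 5*7^3
print(digits_in_base(1742, 7))  # [6, 3, 0, 5]
```

The digits are stored least significant first, which is the natural order for $p$-adic computations since the expansion of a $p$-adic integer extends to the left.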

By definition, a $p$-adic integer is an infinite formal sum of the shape:

$$x = a_0 + a_1 p + a_2 p^2 + \cdots + a_i p^i + \cdots$$

where the $a_i$'s are integers between $0$ and $p-1$. In other words, a $p$-adic integer is an integer written in base $p$ with an infinite number of digits. We will sometimes alternatively write $x$ as follows:

$$x = \overline{\ldots a_i \ldots a_2 a_1 a_0}^{\,p}$$

or simply

$$x = \ldots a_i \ldots a_2 a_1 a_0$$

when no confusion can arise. The set of $p$-adic integers is denoted by $\mathbb{Z}_p$. It is endowed with a natural structure of commutative ring. Indeed, we can add, subtract and multiply $p$-adic integers using the schoolbook method; note that handling carries is possible since they propagate to the left.

                 

Figure 1.2: Addition and multiplication in

The set of natural integers $\mathbb{N}$ appears naturally as a subset of $\mathbb{Z}_p$: it consists of the $p$-adic integers $\ldots a_i \ldots a_2 a_1 a_0$ for which $a_i = 0$ when $i$ is large enough. Note in particular that the integer $p$ writes $\ldots 0010$ in $\mathbb{Z}_p$ and more generally $p^n$ writes $\ldots 0010 \ldots 0$ with $n$ ending zeros. As a consequence, a $p$-adic integer is a multiple of $p^n$ if and only if it ends with (at least) $n$ zeros. Remark that negative integers are $p$-adic integers as well: the opposite of $n$ is, by definition, the result of the subtraction $0 - n$.
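To make the schoolbook method concrete, here is a minimal Python sketch operating on truncated expansions, that is, on lists containing the first digits of a $p$-adic integer (least significant first). The helper names `add_digits`, `mul_digits` and `neg_digits` are hypothetical and chosen for this illustration only; carries that would affect digits beyond the truncation are simply dropped.

```python
def add_digits(x, y, p):
    """Add two truncated p-adic integers given as digit lists (least significant first)."""
    result, carry = [], 0
    for a, b in zip(x, y):
        carry, digit = divmod(a + b + carry, p)
        result.append(digit)
    return result  # the last carry only affects digits beyond the truncation


def mul_digits(x, y, p):
    """Multiply two truncated p-adic integers, keeping as many digits as the shorter input."""
    n = min(len(x), len(y))
    result, carry = [], 0
    for k in range(n):
        # Digit k of the product collects all x[i] * y[j] with i + j = k, plus the carry.
        total = carry + sum(x[i] * y[k - i] for i in range(k + 1))
        carry, digit = divmod(total, p)
        result.append(digit)
    return result


def neg_digits(x, p):
    """Digits of 0 - x, computed by schoolbook subtraction with borrows."""
    result, borrow = [], 0
    for a in x:
        d = -a - borrow
        borrow = 1 if d < 0 else 0
        result.append(d % p)
    return result


# The opposite of 1 in Z_7 has all its digits equal to 6.
print(neg_digits([1, 0, 0, 0], 7))  # [6, 6, 6, 6]
```

Working with $n$ digits amounts to computing modulo $p^n$, which is why dropping the final carry is harmless here.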


Similarly, we define a $p$-adic number as a formal infinite sum of the shape:

$$x = a_{-n} p^{-n} + a_{-n+1} p^{-n+1} + \cdots + a_i p^i + \cdots$$

where $n$ is an integer which may depend on $x$. Alternatively, we will write:

$$x = \overline{\ldots a_i \ldots a_2 a_1 a_0 \, . \, a_{-1} a_{-2} \ldots a_{-n}}^{\,p}$$

and, when no confusion may arise, we will freely remove the bar and the trailing $p$. A $p$-adic number is then nothing but a “decimal” number written in base $p$ with an infinite number of digits before the decimal mark and a finite number of digits after the decimal mark. Addition and multiplication extend to $p$-adic numbers as well.

The set of $p$-adic numbers is denoted by $\mathbb{Q}_p$. Clearly $\mathbb{Q}_p = \mathbb{Z}_p[\frac{1}{p}]$. We shall see later (cf. Proposition 1.1) that $\mathbb{Q}_p$ is actually the fraction field of $\mathbb{Z}_p$; in particular it is a field and $\mathbb{Q}$, which is the fraction field of $\mathbb{Z}$, naturally embeds into $\mathbb{Q}_p$.

1.1.2 Second definition: projective limits

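From the point of view of addition and multiplication, the last digit of a $p$-adic integer behaves like an integer modulo $p$, that is an element of the finite field $\mathbb{F}_p = \mathbb{Z}/p\mathbb{Z}$. In other words, the map $\pi_1 : \mathbb{Z}_p \to \mathbb{Z}/p\mathbb{Z}$ taking a $p$-adic integer $x = a_0 + a_1 p + a_2 p^2 + \cdots$ to the class of $a_0$ modulo $p$ is a ring homomorphism. More generally, given a positive integer $n$, the map:

$$\pi_n : \mathbb{Z}_p \to \mathbb{Z}/p^n\mathbb{Z}, \quad a_0 + a_1 p + a_2 p^2 + \cdots \;\mapsto\; (a_0 + a_1 p + \cdots + a_{n-1} p^{n-1}) \bmod p^n$$

is a ring homomorphism. These morphisms are compatible in the following sense: for all $x \in \mathbb{Z}_p$, we have $\pi_{n+1}(x) \equiv \pi_n(x) \pmod{p^n}$ (and more generally $\pi_m(x) \equiv \pi_n(x) \pmod{p^n}$ provided that $m \geq n$). Putting the $\pi_n$'s all together, we end up with a ring homomorphism:

$$\pi : \mathbb{Z}_p \to \varprojlim_n \mathbb{Z}/p^n\mathbb{Z}, \quad x \mapsto (\pi_1(x), \pi_2(x), \ldots)$$

where $\varprojlim_n \mathbb{Z}/p^n\mathbb{Z}$ is by definition the subring of $\prod_{n \geq 1} \mathbb{Z}/p^n\mathbb{Z}$ consisting of sequences $(x_1, x_2, \ldots)$ for which $x_{n+1} \equiv x_n \pmod{p^n}$ for all $n$: it is called the projective limit of the $\mathbb{Z}/p^n\mathbb{Z}$'s.

Conversely, consider a sequence $(x_1, x_2, \ldots) \in \varprojlim_n \mathbb{Z}/p^n\mathbb{Z}$. In a slight abuse of notation, continue to write $x_n$ for the unique integer of the range $[0, p^n - 1]$ which is congruent to $x_n$ modulo $p^n$ and write it in base $p$:

$$x_n = a_{n,0} + a_{n,1} p + \cdots + a_{n,n-1} p^{n-1}$$

(the expansion stops at $n-1$ since $x_n < p^n$ by construction). The condition $x_{n+1} \equiv x_n \pmod{p^n}$ implies that $a_{n+1,i} = a_{n,i}$ for all $i \in [0, n-1]$. In other words, when $i$ remains fixed, the sequence $(a_{n,i})_{n > i}$ is constant and thus converges to some $a_i$. Set:

$$\psi(x_1, x_2, \ldots) = \ldots a_i \ldots a_2 a_1 a_0 \in \mathbb{Z}_p.$$

We define this way a map $\psi : \varprojlim_n \mathbb{Z}/p^n\mathbb{Z} \to \mathbb{Z}_p$ which is by construction a left and a right inverse of $\pi$. In other words, $\pi$ and $\psi$ are isomorphisms which are inverses of each other.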
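The passage from a compatible sequence of residues to the digits of the corresponding $p$-adic integer can be illustrated as follows. This is a sketch restricted to finitely many levels; the names `projections`, `is_compatible` and `digits_from_sequence` are assumptions made for this example.

```python
def projections(x, p, depth):
    """Residues of an ordinary integer x modulo p, p^2, ..., p^depth."""
    return [x % p**n for n in range(1, depth + 1)]


def is_compatible(seq, p):
    """Check that seq[k + 1] is congruent to seq[k] modulo p^(k + 1) for all k."""
    return all((seq[k + 1] - seq[k]) % p**(k + 1) == 0 for k in range(len(seq) - 1))


def digits_from_sequence(seq, p):
    """First len(seq) digits (least significant first) attached to a compatible sequence."""
    if not is_compatible(seq, p):
        raise ValueError("the residues are not compatible")
    # The residue modulo p^(k + 1) determines the digits of index 0, ..., k;
    # we read off the digit of index k.
    return [(x % p**(k + 1)) // p**k for k, x in enumerate(seq)]


residues = projections(-1, 5, 4)
print(residues)                           # [4, 24, 124, 624]
print(digits_from_sequence(residues, 5))  # [4, 4, 4, 4]
```

The example recovers the fact that $-1$ is the $5$-adic integer all of whose digits are equal to $4$.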

The above discussion allows us to give an alternative definition of $\mathbb{Z}_p$, which is:

$$\mathbb{Z}_p = \varprojlim_n \mathbb{Z}/p^n\mathbb{Z}.$$

The reduction map $\pi_n : \mathbb{Z}_p \to \mathbb{Z}/p^n\mathbb{Z}$ then corresponds to the projection onto the $n$-th factor. This definition is more abstract and it seems more difficult to handle as well. However it has the enormous advantage of making the ring structure appear clearly and, for this reason, it is often much more useful and powerful than the down-to-earth definition of §1.1.1. As a typical example, let us prove the following proposition.

Proposition 1.1.
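  1. An element $x = \ldots a_2 a_1 a_0 \in \mathbb{Z}_p$ is invertible in $\mathbb{Z}_p$ if and only if $a_0$ does not vanish.

  2. The ring $\mathbb{Q}_p$ is the fraction field of $\mathbb{Z}_p$; in particular, it is a field.

Proof.

(1) Let $x = \ldots a_2 a_1 a_0 \in \mathbb{Z}_p$ and write $x_n$ for the class of $x$ modulo $p^n$. Viewing $\mathbb{Z}_p$ as $\varprojlim_n \mathbb{Z}/p^n\mathbb{Z}$, we find that $x$ is invertible in $\mathbb{Z}_p$ if and only if $x_n$ is invertible in $\mathbb{Z}/p^n\mathbb{Z}$ for all $n$. The latter condition is equivalent to requiring that $x_n$ and $p^n$ are coprime for all $n$. Noting that $p$ is prime, this is further equivalent to the fact that $x_1 = a_0$ does not vanish in $\mathbb{Z}/p\mathbb{Z}$.

(2) By definition $\mathbb{Q}_p = \mathbb{Z}_p[\frac{1}{p}]$. It is then enough to prove that any nonzero $p$-adic integer $x$ can be written as a product $x = p^n u$ where $n$ is a nonnegative integer and $u$ is a unit in $\mathbb{Z}_p$. Let $n$ be the number of zeros at the end of the $p$-adic expansion of $x$ (or, equivalently, the largest integer $n$ such that $p^n$ divides $x$). Then $x$ can be written $p^n u$ where $u$ is a $p$-adic integer whose last digit does not vanish. By the first part of the proposition, $u$ is then invertible in $\mathbb{Z}_p$ and we are done. ∎

We note that the first statement of Proposition 1.1 shows that the subset of non-invertible elements of $\mathbb{Z}_p$ is exactly the kernel of the reduction map $\mathbb{Z}_p \to \mathbb{Z}/p\mathbb{Z}$. We deduce from this that $\mathbb{Z}_p$ is a local ring with maximal ideal $p\mathbb{Z}_p$.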

1.1.3 Valuation and norm

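We define the $p$-adic valuation of the nonzero $p$-adic number

$$x = \ldots a_i \ldots a_2 a_1 a_0 \, . \, a_{-1} a_{-2} \ldots a_{-n}$$

as the smallest (possibly negative) integer $v$ for which $a_v$ does not vanish. We denote it $\mathrm{val}_p(x)$ or simply $\mathrm{val}(x)$ if no confusion may arise. Alternatively $\mathrm{val}(x)$ can be defined as the largest integer $v$ such that $x \in p^v \mathbb{Z}_p$. When $x = 0$, we put $\mathrm{val}(0) = +\infty$. We define this way a function $\mathrm{val} : \mathbb{Q}_p \to \mathbb{Z} \cup \{+\infty\}$. Writing down the computations (and remembering that $p$ is prime), we immediately check the following compatibility properties for all $x, y \in \mathbb{Q}_p$:

  (1) $\mathrm{val}(x+y) \geq \min(\mathrm{val}(x), \mathrm{val}(y))$,

  (2) $\mathrm{val}(xy) = \mathrm{val}(x) + \mathrm{val}(y)$.

Note moreover that the equality $\mathrm{val}(x+y) = \min(\mathrm{val}(x), \mathrm{val}(y))$ does hold as soon as $\mathrm{val}(x) \neq \mathrm{val}(y)$. As we shall see later, this property reflects the tree structure of $\mathbb{Q}_p$ (see §1.1.5).

The $p$-adic norm $|\cdot|_p$ is defined by $|x|_p = p^{-\mathrm{val}(x)}$ for $x \in \mathbb{Q}_p$ (with the convention $|0|_p = 0$). In the sequel, when no confusion can arise, we shall often write $|\cdot|$ instead of $|\cdot|_p$. The properties (1) and (2) above immediately translate as follows:

  (1') $|x+y| \leq \max(|x|, |y|)$ and equality holds if $|x| \neq |y|$,

  (2') $|xy| = |x| \cdot |y|$.

Remark that (1') implies that $|\cdot|$ satisfies the triangle inequality, that is $|x+y| \leq |x| + |y|$ for all $x, y \in \mathbb{Q}_p$. It is however much stronger: we say that the $p$-adic norm is ultrametric or non-Archimedean. We will see later that ultrametricity has strong consequences on the topology of $\mathbb{Q}_p$ (see for example Corollary 1.3 below) and strongly influences the calculus with $p$-adic (univariate and multivariate) functions as well (see §3.1.4). This is far from being anecdotal; on the contrary, this will be the starting point of the theory of $p$-adic precision we will develop in §3.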
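The valuation and the norm are easy to experiment with on rational numbers, which form a subfield of the $p$-adic numbers. The following Python sketch is only meant as an illustration; the function names `val` and `norm` are our own choices, and exact rational arithmetic is delegated to the standard `fractions` module.

```python
from fractions import Fraction
import math


def val(x, p):
    """p-adic valuation of a rational number x (math.inf when x == 0)."""
    x = Fraction(x)
    if x == 0:
        return math.inf
    v, num, den = 0, x.numerator, x.denominator
    while num % p == 0:      # powers of p in the numerator count positively
        num //= p
        v += 1
    while den % p == 0:      # powers of p in the denominator count negatively
        den //= p
        v -= 1
    return v


def norm(x, p):
    """p-adic norm p^(-val(x)) of a rational number x, returned as a Fraction."""
    if Fraction(x) == 0:
        return Fraction(0)
    return Fraction(p) ** (-val(x, p))


x, y = Fraction(50, 3), Fraction(3, 25)
print(val(x, 5), val(y, 5))                          # 2 -2
print(val(x * y, 5) == val(x, 5) + val(y, 5))        # True
print(norm(x + y, 5) == max(norm(x, 5), norm(y, 5))) # True, since the norms differ
```

In the last line the two norms are different, so the ultrametric inequality is an equality, as stated above.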

The $p$-adic norm defines a natural distance $d$ on $\mathbb{Q}_p$ as follows: we agree that the distance between two $p$-adic numbers $x$ and $y$ is $d(x, y) = |x - y|$. Again this distance is ultrametric in the sense that:

$$d(x, z) \leq \max\big(d(x, y), d(y, z)\big).$$

Moreover the equality holds as soon as $d(x, y) \neq d(y, z)$: all triangles in $\mathbb{Q}_p$ are isosceles! Observe also that $d$ takes its values in a proper subset of $\mathbb{R}^+$ (namely $\{0\} \cup \{p^n : n \in \mathbb{Z}\}$) whose unique accumulation point is $0$. This property has surprising consequences; for example, closed balls of positive radius are also open balls and vice versa. In particular $\mathbb{Z}_p$ is open (in $\mathbb{Q}_p$) and compact according to the topology defined by the distance. From now on, we endow $\mathbb{Q}_p$ with this topology.

Clearly, a $p$-adic number lies in $\mathbb{Z}_p$ if and only if its $p$-adic valuation is nonnegative, that is if and only if its $p$-adic norm is at most $1$. In other words, $\mathbb{Z}_p$ appears as the closed unit ball in $\mathbb{Q}_p$. Viewed this way, it is remarkable that it is stable under addition (compare with $\mathbb{R}$); it is however a direct consequence of the ultrametricity. Similarly, by Proposition 1.1, a $p$-adic integer is invertible in $\mathbb{Z}_p$ if and only if it has norm $1$, meaning that the group of units of $\mathbb{Z}_p$ is then the unit sphere in $\mathbb{Q}_p$. As for the maximal ideal of $\mathbb{Z}_p$, it consists of elements of positive valuation and then appears as the open unit ball in $\mathbb{Q}_p$ (which is also the closed ball of radius $p^{-1}$).

1.1.4 Completeness

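The following important proposition shows that $\mathbb{Q}_p$ is nothing but the completion of $\mathbb{Q}$ according to the $p$-adic distance. In that sense, $\mathbb{Q}_p$ arises in a very natural way… just as does $\mathbb{R}$.

Proposition 1.2.

The space $\mathbb{Q}_p$ equipped with its natural distance is complete (in the sense that every Cauchy sequence converges). Moreover $\mathbb{Q}$ is dense in $\mathbb{Q}_p$.

Proof.

We first prove that $\mathbb{Q}_p$ is complete. Let $(u_n)_{n \geq 0}$ be a $\mathbb{Q}_p$-valued Cauchy sequence. It is then bounded and, rescaling the $u_n$'s by a uniform scalar, we may assume that $|u_n| \leq 1$ (i.e. $u_n \in \mathbb{Z}_p$) for all $n$. For each $n$, write:

$$u_n = \sum_{i \geq 0} a_{n,i} \, p^i$$

with $a_{n,i} \in \{0, 1, \ldots, p-1\}$. Fix an integer $i_0 \geq 0$ and set $\varepsilon = p^{-(i_0+1)}$. Since $(u_n)$ is a Cauchy sequence, there exists a rank $N$ with the property that $|u_n - u_m| \leq \varepsilon$ for all $n, m \geq N$. Coming back to the definition of the $p$-adic norm, we find that $u_n - u_m$ is divisible by $p^{i_0+1}$. Writing $u_n = u_m + (u_n - u_m)$ and computing the sum, we get $a_{n,i} = a_{m,i}$ for all $i \leq i_0$. In particular the sequence $(a_{n,i_0})_{n \geq 0}$ is ultimately constant. Let $a_{i_0}$ denote its limit. Now define $\ell = \sum_{i \geq 0} a_i p^i \in \mathbb{Z}_p$ and consider again $\varepsilon > 0$, now arbitrary. Let $i_0$ be an integer such that $p^{-i_0} \leq \varepsilon$. By construction, there exists a rank $N$ for which $a_{n,i} = a_i$ whenever $n \geq N$ and $i < i_0$. For $n \geq N$, the difference $u_n - \ell$ is then divisible by $p^{i_0}$ and hence has norm at most $\varepsilon$. Hence $(u_n)$ converges to $\ell$.

We now prove that $\mathbb{Q}$ is dense in $\mathbb{Q}_p$. Since $\mathbb{Q}_p = \mathbb{Z}_p[\frac{1}{p}]$, it is enough to prove that $\mathbb{Z}$ is dense in $\mathbb{Z}_p$. Pick $x \in \mathbb{Z}_p$ and write $x = \sum_{i \geq 0} a_i p^i$. For a nonnegative integer $n$, set $x_n = \sum_{i=0}^{n-1} a_i p^i$. Clearly $x_n$ is an integer and the sequence $(x_n)$ converges to $x$. The density follows. ∎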

Corollary 1.3.

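Let $(u_n)_{n \geq 0}$ be a sequence of $p$-adic numbers. The series $\sum_{n \geq 0} u_n$ converges in $\mathbb{Q}_p$ if and only if its general term $u_n$ converges to $0$.

Proof.

Set $s_n = \sum_{i=0}^{n-1} u_i$. Clearly $u_n = s_{n+1} - s_n$ for all $n$. If $(s_n)$ converges to a limit $s$, then $u_n$ converges to $s - s = 0$. We now assume that $u_n$ goes to $0$. We claim that $(s_n)$ is a Cauchy sequence (and therefore converges). Indeed, let $\varepsilon > 0$ and pick an integer $N$ for which $|u_i| \leq \varepsilon$ for all $i \geq N$. Given two integers $m$ and $n$ with $m > n \geq N$, we have:

$$|s_m - s_n| = \Big| \sum_{i=n}^{m-1} u_i \Big| \leq \max\big(|u_n|, |u_{n+1}|, \ldots, |u_{m-1}|\big)$$

thanks to ultrametricity. Therefore $|s_m - s_n| \leq \varepsilon$ and we are done. ∎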
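As a simple illustration of Corollary 1.3, the geometric series with general term $p^n$ converges $p$-adically, since its general term tends to $0$; its sum is the inverse of $1 - p$. The short Python check below (with the hypothetical helper `geometric_partial_sum`) verifies that the partial sums get $p$-adically closer and closer to this inverse.

```python
def geometric_partial_sum(p, terms):
    """Return 1 + p + p^2 + ... + p^(terms - 1) as an ordinary integer."""
    return sum(p**n for n in range(terms))


p = 5
for terms in (1, 2, 4, 8):
    s = geometric_partial_sum(p, terms)
    # (1 - p) * s = 1 - p^terms, hence (1 - p) * s - 1 is divisible by p^terms.
    assert ((1 - p) * s - 1) % p**terms == 0
```

In other words, the partial sum with $n$ terms agrees with the inverse of $1 - p$ on its last $n$ digits.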

1.1.5 Tree representation

Figure 1.3: Tree representation of $\mathbb{Z}_2$

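Geometrically, it is often convenient and meaningful to represent $\mathbb{Z}_p$ as the infinite full $p$-ary tree. In order to explain this representation, we need a definition.

Definition 1.4.

For $h \in \mathbb{N}$ and $a \in \mathbb{Z}_p$, we set:

$$I_{h,a} = \big\{ x \in \mathbb{Z}_p \;:\; x \equiv a \pmod{p^h} \big\}.$$

An interval of $\mathbb{Z}_p$ is a subset of $\mathbb{Z}_p$ of the form $I_{h,a}$ for some $h \in \mathbb{N}$ and $a \in \mathbb{Z}_p$.

If $a$ decomposes in base $p$ as $a = a_0 + a_1 p + a_2 p^2 + \cdots$, the interval $I_{h,a}$ consists exactly of the $p$-adic integers whose last $h$ digits are $a_{h-1} \ldots a_1 a_0$ in this order. On the other hand, from the analytic point of view, the condition $x \equiv a \pmod{p^h}$ is equivalent to $|x - a| \leq p^{-h}$. Thus the interval $I_{h,a}$ is nothing but the closed ball of centre $a$ and radius $p^{-h}$. Even better, the intervals of $\mathbb{Z}_p$ are exactly the closed balls of $\mathbb{Z}_p$.

Clearly $I_{h,a} = I_{h',a'}$ if and only if $h = h'$ and $a \equiv a' \pmod{p^h}$. In particular, given an interval $I$ of $\mathbb{Z}_p$, there is exactly one integer $h$ such that $I = I_{h,a}$ for some $a$. We will denote it by $h(I)$ and call it the height of $I$. We note that there exist exactly $p^h$ intervals of $\mathbb{Z}_p$ of height $h$ since these intervals are indexed by the classes modulo $p^h$ (or equivalently by the sequences of $h$ digits between $0$ and $p-1$).

From the topological point of view, intervals behave like $\mathbb{Z}_p$: they are at the same time open and compact.

We now define the tree of $\mathbb{Z}_p$, denoted by $\mathcal{T}(\mathbb{Z}_p)$, as follows: its vertices are the intervals of $\mathbb{Z}_p$ and we put an edge $I \to J$ whenever $h(J) = h(I) + 1$ and $J \subset I$. A picture of $\mathcal{T}(\mathbb{Z}_2)$ is represented on Figure 1.3. The labels indicated on the vertices are the last digits of $a$. Coming back to a general $p$, we observe that the height of an interval corresponds to the usual height function in the tree $\mathcal{T}(\mathbb{Z}_p)$. Moreover, given two intervals $I$ and $J$, the inclusion $J \subset I$ holds if and only if there exists a path from $I$ to $J$.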

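Elements of $\mathbb{Z}_p$ bijectively correspond to infinite paths of $\mathcal{T}(\mathbb{Z}_p)$ starting from the root through the following correspondence: an element $x \in \mathbb{Z}_p$ is encoded by the path

$$I_{0,x} \to I_{1,x} \to I_{2,x} \to \cdots$$

Under this encoding, an infinite path of $\mathcal{T}(\mathbb{Z}_p)$ starting from the root

$$I_0 \to I_1 \to I_2 \to \cdots$$

corresponds to a uniquely determined $p$-adic integer, which is the unique element $x$ lying in the decreasing intersection $\bigcap_{h \geq 0} I_h$. Concretely each new $I_h$ determines a new digit of $x$; the whole collection of the $I_h$'s then defines $x$ entirely. The distance on $\mathbb{Z}_p$ can be visualized on $\mathcal{T}(\mathbb{Z}_p)$ as well: given $x, y \in \mathbb{Z}_p$, we have $d(x, y) = p^{-h}$ where $h$ is the height where the paths attached to $x$ and $y$ separate.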
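For ordinary integers, the height at which two paths separate can be computed by comparing residues modulo increasing powers of $p$. The sketch below is a minimal illustration; the name `separation_height` and the cut-off parameter `max_height` (needed because two equal integers never separate) are assumptions of this example.

```python
def separation_height(x, y, p, max_height=64):
    """Largest h <= max_height such that x and y lie in the same interval of height h."""
    h = 0
    # x and y share the vertex of height h + 1 exactly when x = y modulo p^(h + 1).
    while h < max_height and (x - y) % p**(h + 1) == 0:
        h += 1
    return h


# 3 and 53 differ by 50 = 2 * 5^2: their paths in the tree of Z_5 separate at height 2,
# so their 5-adic distance is 5^(-2).
print(separation_height(3, 53, 5))  # 2
```

The returned height is nothing but the valuation of $x - y$, in agreement with the formula for the distance given above.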

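The above construction easily extends to $\mathbb{Q}_p$.

Definition 1.5.

For $h \in \mathbb{Z}$ and $a \in \mathbb{Q}_p$, we set:

$$I_{h,a} = \big\{ x \in \mathbb{Q}_p \;:\; |x - a| \leq p^{-h} \big\}.$$

A bounded interval of $\mathbb{Q}_p$ is a subset of $\mathbb{Q}_p$ of the form $I_{h,a}$ for some $h \in \mathbb{Z}$ and $a \in \mathbb{Q}_p$.

Similarly to the case of $\mathbb{Z}_p$, a bounded interval of $\mathbb{Q}_p$ of height $h$ is a subset of $\mathbb{Q}_p$ consisting of $p$-adic numbers whose digits at the positions $< h$ are fixed (they have to agree with the digits of $a$ at the same positions).

The graph $\mathcal{T}(\mathbb{Q}_p)$ is defined as follows: its vertices are the bounded intervals of $\mathbb{Q}_p$ while there is an edge $I \to J$ if $h(J) = h(I) + 1$ and $J \subset I$. We draw the attention of the reader to the fact that $\mathcal{T}(\mathbb{Q}_p)$ is a tree but it is not rooted: there does not exist a largest bounded interval in $\mathbb{Q}_p$. To understand better the structure of $\mathcal{T}(\mathbb{Q}_p)$, let us define, for any integer $v$, the subgraph $\mathcal{T}(p^v \mathbb{Z}_p)$ of $\mathcal{T}(\mathbb{Q}_p)$ consisting of intervals which are contained in $p^v \mathbb{Z}_p$. From the fact that $\mathbb{Q}_p$ is the union of all $p^v \mathbb{Z}_p$, we derive that $\mathcal{T}(\mathbb{Q}_p) = \bigcup_{v \in \mathbb{Z}} \mathcal{T}(p^v \mathbb{Z}_p)$. Moreover, for all $v$, $\mathcal{T}(p^v \mathbb{Z}_p)$ is a rooted tree (with root $p^v \mathbb{Z}_p$) which is isomorphic to $\mathcal{T}(\mathbb{Z}_p)$ except that the height function is shifted by $v$. The tree $\mathcal{T}(p^{v-1} \mathbb{Z}_p)$ is thus obtained by juxtaposing $p$ copies of $\mathcal{T}(p^v \mathbb{Z}_p)$ and linking their roots to a common parent (which then becomes the new root).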

1.2 Newton iteration over the $p$-adic numbers

Newton iteration is a well-known tool in Numerical Analysis for approximating a zero of a “nice” function defined on a real interval. More precisely, given a differentiable function $f : [a, b] \to \mathbb{R}$, we define a recursive sequence $(x_i)_{i \geq 0}$ by:

$$x_0 \in [a, b], \qquad x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}. \tag{1.1}$$

Under some assumptions, one can prove that the sequence $(x_i)$ converges to a zero of $f$, namely $x_\infty$. Moreover the convergence is very rapid since, assuming that $f$ is twice differentiable, we usually have an inequality of the shape $|x_\infty - x_i| \leq \rho^{2^i}$ for some $\rho \in (0, 1)$. In other words, the number of correct digits roughly doubles at each iteration. The Newton recurrence (1.1) has a nice geometrical interpretation as well: the value $x_{i+1}$ is the $x$-coordinate of the intersection point of the $x$-axis with the tangent to the curve $y = f(x)$ at the point of abscissa $x_i$ (see Figure 1.4).

Figure 1.4: Newton iteration over the reals

1.2.1 Hensel’s Lemma

It is quite remarkable that the above discussion extends almost verbatim when $\mathbb{R}$ is replaced by $\mathbb{Q}_p$. Actually, extending the notion of differentiability to $p$-adic functions is quite subtle and probably the most difficult part. This will be achieved in §3.1.3 (for functions of class $C^1$) and §3.2.3 (for functions of class $C^2$). For now, we prefer avoiding these technicalities and restricting ourselves to the simpler (but still interesting) case of polynomials. For this particular case, the Newton iteration is known as Hensel's Lemma and already appears in Hensel's seminal paper [36] in which $p$-adic numbers are introduced.

Let $f(X) = a_0 + a_1 X + \cdots + a_n X^n$ be a polynomial in the variable $X$ with coefficients in $\mathbb{Q}_p$. Recall that the derivative of $f$ can be defined in a purely algebraic way as $f'(X) = a_1 + 2 a_2 X + \cdots + n a_n X^{n-1}$.

Theorem 1.6 (Hensel’s Lemma).

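Let $f \in \mathbb{Z}_p[X]$ be a polynomial with coefficients in $\mathbb{Z}_p$. We suppose that we are given some $a \in \mathbb{Z}_p$ with the property that $|f(a)| < |f'(a)|^2$. Then the sequence $(x_i)_{i \geq 0}$ defined by the recurrence:

$$x_0 = a, \qquad x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}$$

is well defined and converges to $x_\infty \in \mathbb{Z}_p$ with $f(x_\infty) = 0$. The rate of convergence is given by:

$$|x_\infty - x_i| \leq |f'(a)| \cdot \left( \frac{|f(a)|}{|f'(a)|^2} \right)^{2^i}.$$

Moreover $x_\infty$ is the unique root of $f$ in the open ball of centre $a$ and radius $|f'(a)|$.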

The proof of the above theorem is based on the next lemma:

Lemma 1.7.

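Given $x, h \in \mathbb{Z}_p$, we have:

  1. $|f(x+h) - f(x)| \leq |h|$,

  2. $|f(x+h) - f(x) - h f'(x)| \leq |h|^2$.

Proof.

For any nonnegative integer $i$, define $f^{[i]} = \frac{1}{i!} f^{(i)}$ where $f^{(i)}$ stands for the $i$-th derivative of $f$. Taylor's formula then reads:

$$f(x+h) - f(x) = \sum_{i \geq 1} h^i f^{[i]}(x). \tag{1.2}$$

Moreover, a direct computation shows that the coefficients of $f^{[i]}$ are obtained from those of $f$ by multiplying by binomial coefficients. Therefore $f^{[i]}$ has coefficients in $\mathbb{Z}_p$. Hence $f^{[i]}(x) \in \mathbb{Z}_p$, i.e. $|f^{[i]}(x)| \leq 1$, for all $i$. We deduce that each summand of the right hand side of (1.2) has norm at most $|h|$. The first assertion follows while the second is proved similarly. ∎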

Proof of Theorem 1.6.

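Define $\rho = \frac{|f(a)|}{|f'(a)|^2}$. We first prove by induction on $i$ the following conjunction:

$$(H_i): \quad |f'(x_i)| = |f'(a)| \quad \text{and} \quad |f(x_i)| \leq |f'(a)|^2 \cdot \rho^{2^i}.$$

Clearly $(H_0)$ holds. We assume now that $(H_i)$ holds for some $i \geq 0$. We put $h = -\frac{f(x_i)}{f'(x_i)}$ so that $x_{i+1} = x_i + h$; note that $|h| \leq |f'(a)| \cdot \rho^{2^i} < |f'(a)|$. We write $f'(x_{i+1}) = \big(f'(x_{i+1}) - f'(x_i)\big) + f'(x_i)$. Observe that the first summand has norm at most $|h| < |f'(a)|$ by the first assertion of Lemma 1.7 (applied to $f'$), while $|f'(x_i)| = |f'(a)|$ by the induction hypothesis. The norm of $f'(x_{i+1})$ is then the maximum of the norms of the two summands, which is $|f'(a)|$. Now, applying again Lemma 1.7, we get $|f(x_{i+1})| \leq |h|^2 \leq |f'(a)|^2 \cdot \rho^{2^{i+1}}$ and the induction goes through.

Coming back to the recurrence defining the $x_i$'s, we get:

$$|x_{i+1} - x_i| = \frac{|f(x_i)|}{|f'(x_i)|} \leq |f'(a)| \cdot \rho^{2^i}. \tag{1.3}$$

By Corollary 1.3, this implies the convergence of the sequence $(x_i)$. Its limit $x_\infty$ is a solution to the equation $x_\infty = x_\infty - \frac{f(x_\infty)}{f'(x_\infty)}$. Thus $f(x_\infty)$ has to vanish. The announced rate of convergence follows from Eq. (1.3) thanks to ultrametricity.

It remains to prove uniqueness. For this, consider $y \in \mathbb{Z}_p$ with $f(y) = 0$ and $|y - a| < |f'(a)|$. Since $|x_\infty - a| < |f'(a)|$, we deduce $|y - x_\infty| < |f'(a)|$ as well. Applying Lemma 1.7 with $x = x_\infty$ and $h = y - x_\infty$, we find $|h f'(x_\infty)| \leq |h|^2$. Since $|h| < |f'(a)| = |f'(x_\infty)|$, this implies $|h| = 0$, i.e. $y = x_\infty$. Uniqueness is proved. ∎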

Remark 1.8.

All conclusions of Theorem 1.6 are still valid for any sequence $(x_i)_{i \geq 0}$ with $x_0 = a$ satisfying the weaker assumption:

$$\left| x_{i+1} - x_i + \frac{f(x_i)}{f'(x_i)} \right| \leq |f'(a)| \cdot \left( \frac{|f(a)|}{|f'(a)|^2} \right)^{2^{i+1}}$$

(the proof is entirely similar). Roughly speaking, this stronger version allows us to work with approximations at each iteration. It will play a quite important role for algorithmic purposes (notably in §2.1.3) since computers cannot handle exact $p$-adic numbers but always need to work with truncations.
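To illustrate how Hensel's Lemma is used with truncations, here is a minimal Python sketch of the Newton iteration performed modulo increasing powers of $p$. It is written under simplifying assumptions that are ours: the polynomial has integer coefficients and the starting point is a simple root modulo $p$ (so that the derivative is a unit and the number of correct digits exactly doubles). The function name `hensel_lift` is hypothetical, and the modular inverse relies on the three-argument `pow` with exponent $-1$ (Python 3.8 or later).

```python
def hensel_lift(coeffs, a, p, prec):
    """Lift a simple root a of f modulo p to a root of f modulo p**prec.

    coeffs lists the integer coefficients of f, constant term first.
    """
    def evaluate(cs, x, m):
        # Horner evaluation modulo m.
        acc = 0
        for c in reversed(cs):
            acc = (acc * x + c) % m
        return acc

    deriv = [i * c for i, c in enumerate(coeffs)][1:]
    if evaluate(coeffs, a, p) != 0 or evaluate(deriv, a, p) == 0:
        raise ValueError("a must be a simple root of f modulo p")

    x, k = a % p, 1            # invariant: f(x) = 0 modulo p^k
    while k < prec:
        k = min(2 * k, prec)   # the precision doubles at each iteration
        m = p**k
        fx = evaluate(coeffs, x, m)
        dfx = evaluate(deriv, x, m)
        x = (x - fx * pow(dfx, -1, m)) % m   # truncated Newton step
    return x


# A root of X^2 - 2 in Z_7 lifted from 3 (since 3^2 = 9 = 2 modulo 7).
r = hensel_lift([-2, 0, 1], 3, 7, 10)
assert (r * r - 2) % 7**10 == 0 and r % 7 == 3
```

Each iteration only computes modulo the precision that is actually guaranteed after the step, which is the kind of approximation allowed by the remark above.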

1.2.2 Computation of the inverse

A classical application of Newton iteration is the computation of the inverse: for computing the inverse of a real number $c$, we introduce the function $f(x) = \frac{1}{x} - c$ and the associated Newton scheme. This leads to the recurrence $x_{i+1} = 2 x_i - c x_i^2$ with initial value $x_0$ where $x_0$ is sufficiently close to $\frac{1}{c}$.

In the $p$-adic setting, the same strategy applies (although it does not appear as a direct consequence of Hensel's Lemma since the mapping $x \mapsto \frac{1}{x} - c$ is not polynomial). Anyway, let us pick an invertible element $c \in \mathbb{Z}_p$, i.e. $|c| = 1$, and define the sequence $(x_i)_{i \geq 0}$ by:

$$x_0 = a, \qquad x_{i+1} = 2 x_i - c x_i^2$$

where $a$ is any $p$-adic integer whose last digit is the inverse modulo $p$ of the last digit of $c$. Computing such an element reduces to computing a modular inverse and thus is efficiently feasible.

Proposition 1.9.

The sequence $(x_i)_{i \geq 0}$ defined above converges to $\frac{1}{c}$. More precisely, we have $|c x_i - 1| \leq p^{-2^i}$ for all $i$.

Proof.

We prove the last statement of the proposition by induction on $i$. By construction of $x_0$, it holds for $i = 0$. Now observe that $c x_{i+1} - 1 = 2 c x_i - c^2 x_i^2 - 1 = -(c x_i - 1)^2$. Taking norms on both sides, we get $|c x_{i+1} - 1| = |c x_i - 1|^2$ and the induction goes through. ∎
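The recurrence above translates directly into code once we agree to work modulo a power of $p$. The following Python sketch (the name `padic_inverse` is our own) computes an inverse of an integer $c$ prime to $p$ modulo $p^{\text{prec}}$; only the initial value requires a modular inversion, all subsequent steps use additions and multiplications.

```python
def padic_inverse(c, p, prec):
    """Inverse of c modulo p**prec, for an integer c that is not divisible by p."""
    if c % p == 0:
        raise ValueError("c must be invertible modulo p")
    x = pow(c, -1, p)          # initial value: inverse of the last digit (Python 3.8+)
    k = 1                      # invariant: c * x = 1 modulo p^k
    while k < prec:
        k = min(2 * k, prec)
        m = p**k
        x = (2 * x - c * x * x) % m
    return x


x = padic_inverse(3, 5, 8)
assert (3 * x) % 5**8 == 1
```

As in Proposition 1.9, the number of correct digits doubles at each iteration, so about $\log_2(\text{prec})$ iterations are enough.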

Quite interestingly, this method extends readily to the non-commutative case. Let us illustrate this by showing how it can be used to compute inverses of matrices with coefficients in $\mathbb{Z}_p$. Assume then that we are starting with a matrix $M \in M_d(\mathbb{Z}_p)$ whose reduction modulo $p$ is invertible, i.e. there exists a matrix $Z_0$ such that $M Z_0 \equiv I_d \pmod{p}$ where $I_d$ is the identity matrix of size $d$. Note that the computation of $Z_0$ reduces to the inversion of a $d \times d$ matrix over the finite field $\mathbb{F}_p$ and so can be done efficiently. We now define the sequence $(Z_i)_{i \geq 0}$ by the recurrence:

$$Z_{i+1} = 2 Z_i - Z_i M Z_i$$

(be careful with the order in the last term). Mimicking the proof of Proposition 1.9, we write:

$$M Z_{i+1} - I_d = 2 M Z_i - M Z_i M Z_i - I_d = -(M Z_i - I_d)^2$$

and obtain this way that each entry of $M Z_i - I_d$ has norm at most $p^{-2^i}$. Therefore $Z_i$ converges to a matrix $Z$ satisfying $M Z = I_d$, i.e. $Z = M^{-1}$ (which in particular implies that $M$ is invertible). A similar argument works for $p$-adic skew polynomials and $p$-adic differential operators as well.
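The matrix version of the iteration can be sketched in plain Python as follows. This is only an illustration under our own conventions: matrices are lists of rows with integer entries, the initial approximation is obtained by a straightforward Gauss-Jordan elimination over the finite field with $p$ elements, and the helper names `mat_mul`, `mat_inverse_mod_p` and `padic_matrix_inverse` are hypothetical.

```python
def mat_mul(A, B, m):
    """Product of two square matrices with entries reduced modulo m."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) % m for j in range(n)]
            for i in range(n)]


def mat_inverse_mod_p(M, p):
    """Inverse of M over the field with p elements (Gauss-Jordan elimination)."""
    n = len(M)
    A = [[M[i][j] % p for j in range(n)] + [int(i == j) for j in range(n)]
         for i in range(n)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not invertible modulo p")
        A[col], A[pivot] = A[pivot], A[col]
        inv = pow(A[col][col], -1, p)
        A[col] = [(v * inv) % p for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [(vr - factor * vc) % p for vr, vc in zip(A[r], A[col])]
    return [row[n:] for row in A]


def padic_matrix_inverse(M, p, prec):
    """Inverse of M modulo p**prec, assuming M is invertible modulo p."""
    n = len(M)
    Z = mat_inverse_mod_p(M, p)
    k = 1                       # invariant: M * Z = identity modulo p^k
    while k < prec:
        k = min(2 * k, prec)
        m = p**k
        ZMZ = mat_mul(mat_mul(Z, M, m), Z, m)   # mind the order of the factors
        Z = [[(2 * Z[i][j] - ZMZ[i][j]) % m for j in range(n)] for i in range(n)]
    return Z


M = [[1, 2], [3, 4]]
Z = padic_matrix_inverse(M, 5, 6)
assert mat_mul(M, Z, 5**6) == [[1, 0], [0, 1]]
```

Only the very first step involves an elimination; the lifting itself uses two matrix products per iteration.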

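1.2.3 Square roots in $\mathbb{Q}_p$

Another important application of Newton iteration is the computation of square roots. Again, the classical scheme over $\mathbb{R}$ applies without (substantial) modification over $\mathbb{Q}_p$.

Let $c$ be a nonzero $p$-adic number. If the $p$-adic valuation of $c$ is odd, then $c$ is clearly not a square in $\mathbb{Q}_p$ and it does not make sense to compute its square root. On the contrary, if the $p$-adic valuation of $c$ is even, then we can write $c = p^{2n} c'$ where $n$ is an integer and $c'$ is a unit in $\mathbb{Z}_p$. Computing a square root of $c$ then reduces to computing a square root of $c'$; in other words, we may assume that $c$ is invertible in $\mathbb{Z}_p$, i.e. $|c| = 1$.

We now introduce the polynomial function $f(x) = x^2 - c$. In order to apply Hensel's Lemma (Theorem 1.6), we need a first rough approximation $a$ of $\sqrt{c}$. Precisely, we need to find some $a \in \mathbb{Z}_p$ with the property that $|f(a)| < |f'(a)|^2 = |2a|^2$. From $|c| = 1$, the above inequality implies $|a| = 1$ as well and therefore can be rewritten as $a^2 \equiv c \pmod{p^v}$ where $v = 3$ if $p = 2$ and $v = 1$ otherwise. We then first need to compute a square root of $c$ modulo $p^v$. If $p = 2$, this can be achieved simply by looking at the table of squares modulo $8$:

$$\begin{array}{c|cccccccc} a & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ \hline a^2 \bmod 8 & 0 & 1 & 4 & 1 & 0 & 1 & 4 & 1 \end{array}$$

Observe moreover that $c$ is necessarily odd (since it is assumed to be invertible in $\mathbb{Z}_2$). If it is congruent to $1$ modulo $8$, one can take $a = 1$; otherwise, there is no solution and $c$ has no square root in $\mathbb{Q}_2$. When $p > 2$, we have to compute a square root in the finite field $\mathbb{F}_p$ for which efficient algorithms are known [23, §1.5]. If $c \bmod p$ is not a square in $\mathbb{F}_p$, then $c$ does not admit a square root in $\mathbb{Q}_p$ either.

Once we have computed the initial value $a$, we consider the recursive sequence $(x_i)_{i \geq 0}$ defined by $x_0 = a$ and

$$x_{i+1} = x_i - \frac{x_i^2 - c}{2 x_i} = \frac{1}{2} \left( x_i + \frac{c}{x_i} \right).$$

By Hensel's Lemma, it converges to a limit $x_\infty$ which is a square root of $c$. Moreover the rate of convergence is given by:

$$|x_\infty - x_i| \leq p^{-2^i} \;\text{ if } p > 2, \qquad |x_\infty - x_i| \leq 2^{-(2^i + 1)} \;\text{ if } p = 2,$$

meaning that the number of correct digits of $x_i$ is at least $2^i$ (resp. $2^i + 1$) when $p > 2$ (resp. $p = 2$).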
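The whole procedure can be sketched in Python for an integer $c$ which is a unit modulo $p$. Several choices below are ours and should be read as assumptions: the function name `padic_sqrt`, the convention of returning `None` when no square root exists, and the naive exhaustive search used for the square root modulo an odd $p$ (in place of the efficient algorithms mentioned above). For $p = 2$ the division by $2$ costs one digit, so the number $e$ of correct digits evolves as $e \mapsto 2e - 1$ starting from $e = 2$.

```python
def padic_sqrt(c, p, prec):
    """A square root of the unit c modulo p**prec, or None if c is not a square."""
    if c % p == 0:
        raise ValueError("c must be invertible modulo p")
    if p == 2:
        if c % 8 != 1:
            return None
        x, e = 1, 2                     # x_0 = 1 agrees with a root modulo 4
        while e < prec:
            e = min(2 * e - 1, prec)
            t = (x * x - c) % 2**(e + 1)            # an even number
            x = (x - (t // 2) * pow(x, -1, 2**e)) % 2**e
        return x % 2**prec
    # Odd p: naive search for a square root modulo p (fine for small p only).
    a = next((r for r in range(1, p) if (r * r - c) % p == 0), None)
    if a is None:
        return None
    x, e = a, 1
    while e < prec:
        e = min(2 * e, prec)
        m = p**e
        x = (x + c * pow(x, -1, m)) * pow(2, -1, m) % m   # x <- (x + c/x) / 2
    return x


r = padic_sqrt(2, 7, 10)
assert (r * r - 2) % 7**10 == 0
s = padic_sqrt(17, 2, 8)
assert (s * s - 17) % 2**8 == 0
```

The other square root is obtained by taking the opposite of the returned value.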

1.3 Similarities with formal and Laurent series

According to the very first definition of $\mathbb{Z}_p$ we have given (see §1.1.1), $p$-adic integers have formal similarities with formal series over $\mathbb{F}_p$: they are both described as infinite series with coefficients between $0$ and $p-1$, the main difference being that additions and multiplications involve carries in the $p$-adic case while they do not for formal series. From a more abstract point of view, the parallel between $\mathbb{Z}_p$ and $\mathbb{F}_p[[t]]$ is also apparent. For example $\mathbb{F}_p[[t]]$ is also endowed with a valuation; this valuation defines a distance on $\mathbb{F}_p[[t]]$ for which