# A formal proof of Hensel's lemma over the p-adic integers

The field of p-adic numbers Q_p and the ring of p-adic integers Z_p are essential constructions of modern number theory. Hensel's lemma, described by Gouvêa as the "most important algebraic property of the p-adic numbers," shows the existence of roots of polynomials over Z_p provided an initial seed point. The theorem can be proved for the p-adics with significantly weaker hypotheses than for general rings. We construct Q_p and Z_p in the Lean proof assistant, with various associated algebraic properties, and formally prove a strong form of Hensel's lemma. The proof lies at the intersection of algebraic and analytic reasoning and demonstrates how the Lean mathematical library handles such a heterogeneous topic.

## 1. Introduction

It has long been a goal of the formalized mathematics community to verify the typical undergraduate mathematics curriculum in a unified way using a single proof assistant. Various systems have achieved this goal to varying degrees (Bancerek et al., 2018; Harrison, [n. d.]; Mahboubi and Tassi, [n. d.]; Megill, 2006; Nipkow, [n. d.]), and it now seems reasonable to say that most components of this curriculum have been formalized in one system or another. Undeniably, though, some fields have received much heavier attention than others; in particular, it appears that formal developments in number theory and geometry have lagged behind those in other domains. This imbalance becomes even greater when one looks beyond undergraduate mathematics. While a few landmark projects have verified deep mathematical results (Gonthier, 2008; Gonthier et al., 2013; Hales et al., 2017), the associated theory developments have been thin and specific to the target theorems. Research-level theorems have been formalized, but research-level theories remain largely untouched.

A recent project begun at the Vrije Universiteit Amsterdam aims to address this imbalance by bringing together traditional mathematicians, formalizers, and tool developers to work toward modern results in number theory. With researchers working at all levels of the theorem proving pipeline, the project will search for technology and design decisions that make it plausible to formalize deep mathematics that spans across fields. This paper breaks some initial ground to build toward this goal by constructing the p-adic numbers ℚ_p and the p-adic integers ℤ_p in the Lean proof assistant and verifying Hensel’s lemma, a foundational result about these numbers.

The p-adics are a fundamental object of study in number theory with both theoretic and numeric applications. Their construction involves a mix of analytic and algebraic methods. For this reason, they make an excellent (or even necessary) point from which to embark on a project to formalize number theory. Hensel’s lemma, an analogue of Newton’s method for approximating roots, holds a prominent place in the study of the p-adics. Its computational applications make it of interest to number theorists and computer scientists alike.

In Section 2, we give an informal overview of the p-adic numbers and Hensel’s lemma, outlining the construction and proof followed in our formalization. Section 3 briefly describes the Lean theorem prover and the mathematical library on which this project depends. Sections 4 and 5 explain the formal construction of ℚ_p and ℤ_p and the formal proof of Hensel’s lemma, respectively, focusing on design decisions made during the formalization process. In Sections 6 and 7, we consider related work and reflect on the project.

The formalization described in this paper is incorporated into the Lean mathematical library, available on GitHub. Since this library is regularly changing, we preserve a snapshot of its status at the time this paper was submitted. This snapshot, and a map between this paper and the formalization, can be found on the author’s website. The code blocks presented in this paper should be read as schematic, not literal: we sometimes change names, omit universe levels, and swap implicit and explicit arguments for the sake of presentation.

## 2. The p-adic Numbers and Hensel’s Lemma

Readers who have seen the construction of the real numbers ℝ via Cauchy sequences of rationals will find the construction of ℚ_p familiar. To motivate stepping from ℚ to ℝ, we traditionally point to the fact that ℚ is incomplete: there are sequences of rational numbers that seem to approach a value but do not converge to any rational number. As an example, consider the sequence s where the entry at index n is the greatest n-digit rational number whose square is less than 2. This sequence has no limit in ℚ, since √2 is irrational. We obtain ℝ by creating points to represent the limit of s and similarly “convergent” sequences.

More precisely, we say that a sequence s : ℕ → ℚ is Cauchy if for every positive ε ∈ ℚ, there exists a number N such that for all k ≥ N, |s_N − s_k| < ε. Two sequences s and t are equivalent, written s ∼ t, if for every positive ε ∈ ℚ, there exists an N such that for all k ≥ N, |s_k − t_k| < ε. Morally, a Cauchy sequence is one whose terms eventually get arbitrarily close to each other, and should converge to a (possibly irrational) value; two Cauchy sequences are equivalent if they should converge to the same value. The set of real numbers ℝ is defined to be the quotient of the set of Cauchy sequences with respect to ∼, which is to say that a real number is a set of equivalent Cauchy sequences. It can be shown that Cauchy sequences inherit the field operations from ℚ, and that these operations respect ∼, so they can be lifted to the quotient.
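As a small numeric illustration of the sequence described above, the following Python sketch (not part of the Lean development; the helper name `sqrt2_truncation` is hypothetical) computes the decimal truncations of √2 as exact rationals and records how far apart consecutive terms are.

```python
from fractions import Fraction
from math import isqrt

def sqrt2_truncation(n):
    """Greatest rational with n decimal digits (one before the point)
    whose square is less than 2, e.g. 1, 1.4, 1.41, 1.414, ..."""
    scale = 10 ** (n - 1)
    # isqrt floors the integer square root, so the square stays below 2.
    return Fraction(isqrt(2 * scale * scale), scale)

terms = [sqrt2_truncation(n) for n in range(1, 8)]

# Consecutive terms get closer and closer under the standard absolute value.
gaps = [abs(terms[i + 1] - terms[i]) for i in range(len(terms) - 1)]
```

Every term is rational and its square stays below 2, while the gaps shrink by roughly a factor of ten per step, which is the behavior the Cauchy condition captures.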

We often call ℝ the completion of ℚ because it is the smallest extension of ℚ in which all Cauchy sequences converge. But it is more accurate to say that ℝ is a completion of ℚ, since the notion of a Cauchy sequence is parametrized by a notion of closeness. A function |·| on ℚ is a (generic) absolute value if it is positive-definite (|0| = 0 and |x| > 0 otherwise), subadditive (|x + y| ≤ |x| + |y|), and multiplicative (|xy| = |x||y|). We can replay the construction above, replacing the standard absolute value on ℚ with any generic absolute value. If we use the trivial absolute value (|0| = 0 and |x| = 1 for x ≠ 0), then a sequence of rationals is Cauchy if and only if it is eventually constant, and so the completion process adds no new points to ℚ.

There are infinitely many absolute values on ℚ, even identifying scalar multiples: every prime number induces a unique absolute value. Fix p ∈ ℕ with p > 1, and for z ∈ ℤ with z ≠ 0, define the p-adic valuation ν_p(z) to be the largest n ∈ ℕ such that p^n ∣ z, which we read as “p^n divides z.” This valuation extends to ℚ by setting ν_p(q/r) = ν_p(q) − ν_p(r), where q and r are coprime. The p-adic norm on ℚ is defined by |x|_p = p^(−ν_p(x)) for x ≠ 0, with |0|_p = 0. If p is prime, this function is an absolute value. Surprisingly, this exhausts the list of possibilities. Ostrowski’s theorem (Ostrowski, 1916) states that any absolute value on ℚ is (a positive scalar power of) either the standard absolute value, the trivial absolute value, or the p-adic norm for some prime p. The p-adic numbers ℚ_p are the completion of ℚ with respect to the p-adic norm. Since a positive scalar power of an absolute value induces an isomorphic completion, we can focus our attention on ℝ and the family ℚ_p.
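The valuation and norm just defined are easy to compute on concrete rationals. The following is a minimal Python sketch using exact `Fraction` arithmetic; the function names are chosen for illustration and are not taken from the Lean library, and unlike the Lean definitions discussed later, this sketch raises an error outside the intended domain rather than returning 0.

```python
from fractions import Fraction

def padic_val_int(p, z):
    """Largest n such that p**n divides the nonzero integer z."""
    if p < 2 or z == 0:
        raise ValueError("need p > 1 and z != 0")
    n = 0
    while z % p == 0:
        z //= p
        n += 1
    return n

def padic_val(p, q):
    """Valuation of a nonzero rational: numerator minus denominator."""
    q = Fraction(q)  # Fraction keeps numerator and denominator coprime
    return padic_val_int(p, q.numerator) - padic_val_int(p, q.denominator)

def padic_norm(p, q):
    """p-adic norm: p**(-valuation) for q != 0, and 0 for q == 0."""
    q = Fraction(q)
    if q == 0:
        return Fraction(0)
    return Fraction(p) ** (-padic_val(p, q))
```

For example, 50/3 = 2 · 5² / 3 has 5-adic valuation 2 and 5-adic norm 1/25, while its 3-adic norm is 3, since the factor of 3 sits in the denominator.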

To get an intuition for how the p-adic numbers behave, consider what it means for two rationals to be close under the p-adic norm. A rational x has small p-adic norm if p^(−ν_p(x)) is small, that is, if ν_p(x) is large, which means that x is divisible by a large power of p. Thus |x − y|_p is small if x and y are separated by a multiple of p^n for some large n. The elements of a Cauchy sequence, then, are separated by multiples of larger and larger powers of p.

The traditional decimal expansion allows us to write any nonzero real number in the form ±(a_k 10^k + a_(k−1) 10^(k−1) + ⋯), where k is a (finite, possibly negative) integer and each a_i ∈ {0, …, 9} with a_k ≠ 0. We can analogously define a p-adic expansion allowing us to write any nonzero element of ℚ_p uniquely in the form a_k p^k + a_(k+1) p^(k+1) + ⋯, with 0 ≤ a_i < p and a_k ≠ 0 (Figure 1). Such a series does not necessarily converge to a real number, but it does converge under the p-adic norm. For n < m, the difference between the nth and mth partial sums is divisible by p^(n+1), so its norm is bounded by p^(−(n+1)), which tends to 0 as n grows.
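For rational inputs, the digits of the p-adic expansion can be computed one at a time. The Python sketch below is an illustration under the assumption that p is prime; the name `padic_digits` is hypothetical. It factors out the power of p, then repeatedly reads off the residue modulo p and divides by p.

```python
from fractions import Fraction

def padic_digits(p, q, n):
    """Return (k, digits): the starting exponent k and the first n digits
    a_k, a_(k+1), ... of the p-adic expansion of a nonzero rational q.
    Assumes p is prime."""
    q = Fraction(q)
    if q == 0:
        raise ValueError("zero has no leading digit")
    k = 0
    while q.numerator % p == 0:      # factor p out of the numerator
        q /= p
        k += 1
    while q.denominator % p == 0:    # factor p out of the denominator
        q *= p
        k -= 1
    digits = []
    for _ in range(n):
        # The denominator is now coprime to p, so it is invertible mod p.
        d = (q.numerator * pow(q.denominator, -1, p)) % p
        digits.append(d)
        q = (q - d) / p
    return k, digits
```

With p = 5, the number −1 expands as 4 + 4·5 + 4·5² + ⋯, and 1/3 begins 2 + 3·5 + 1·5² + 3·5³; indeed 3 · (2 + 15 + 25) = 126, which is 1 modulo 125.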

The p-adic norm on ℚ extends to a norm on ℚ_p, such that a number whose p-adic expansion begins at index k, with first digit a_k ≠ 0, has norm p^(−k). These norms share a useful (if counterintuitive) property. The familiar triangle inequality states that |x + y| ≤ |x| + |y| for any x and y. But we can show something stronger for the p-adic norm, namely the nonarchimedean property, which states that

 |x+y|_p≤max(|x|_p,|y|_p).

In fact, if |x|_p ≠ |y|_p, it holds that |x+y|_p = max(|x|_p,|y|_p).
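The nonarchimedean property is easy to spot-check on rationals. The following Python sketch is only a numeric sanity check over a small sample, not a proof; the helper names are hypothetical and p is assumed to be greater than 1.

```python
from fractions import Fraction
from itertools import product

def padic_norm(p, q):
    """p-adic norm of a rational q, computed exactly."""
    q = Fraction(q)
    if q == 0:
        return Fraction(0)
    num, den, v = q.numerator, q.denominator, 0
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return Fraction(p) ** (-v)

def ultrametric_holds(p, x, y):
    """Check |x+y|_p <= max(|x|_p, |y|_p), with equality when the norms differ."""
    nx, ny, ns = padic_norm(p, x), padic_norm(p, y), padic_norm(p, x + y)
    if ns > max(nx, ny):
        return False
    if nx != ny and ns != max(nx, ny):
        return False
    return True

samples = [Fraction(n, d) for n in range(-6, 7) for d in range(1, 7)]
all_ok = all(ultrametric_holds(5, x, y) for x, y in product(samples, repeat=2))
```

The inequality can be strict only when the two norms agree: for instance |2|_5 = |3|_5 = 1, but |2 + 3|_5 = 1/5.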

Under the standard absolute value, any Cauchy sequence of integers is eventually constant, but this is not the case under the p-adic norm. It can be shown that such a sequence converges to a p-adic number z with |z|_p ≤ 1, and so we define the p-adic integers ℤ_p = {z ∈ ℚ_p : |z|_p ≤ 1} (Figure 2). Equivalently, a p-adic integer is a p-adic number whose p-adic expansion has no nonzero values to the right of the decimal point. Because of the nonarchimedean property, the sum of two p-adic integers is still an integer, and thus ℤ_p forms a ring.
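A concrete sequence of integers that is Cauchy under the p-adic norm without being eventually constant is the sequence of partial sums 1 + p + p² + ⋯ + p^n. The Python sketch below (an illustration with hypothetical helper names, taking p = 5) measures the 5-adic distance from each partial sum to the rational 1/(1 − p).

```python
from fractions import Fraction

def padic_norm(p, q):
    """p-adic norm of a rational q, computed exactly."""
    q = Fraction(q)
    if q == 0:
        return Fraction(0)
    num, den, v = q.numerator, q.denominator, 0
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return Fraction(p) ** (-v)

def partial_sum(p, n):
    """The integer 1 + p + p**2 + ... + p**n."""
    return sum(p ** i for i in range(n + 1))

p = 5
limit = Fraction(1, 1 - p)   # the geometric series formula gives -1/4
distances = [padic_norm(p, partial_sum(p, n) - limit) for n in range(4)]
```

The distances are 1/5, 1/25, 1/125, 1/625, so the integer sequence converges 5-adically to −1/4, a rational with 5-adic norm 1 and hence a 5-adic integer.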

As a complete structure with a nonarchimedean norm, ℚ_p is a natural setting to develop a theory of analysis. Many of the familiar notions of calculus over ℝ are simplified in this setting. For instance, deciding whether an infinite series Σ a_n of real numbers converges can be a subtle problem, and calculus students traditionally learn a list of convergence tests to answer it. Over ℚ_p, the nonarchimedean property guarantees that such a series converges if and only if a_n → 0. Mahler’s theorem (Mahler, 1958) describes a remarkably straightforward characterization of continuous functions on ℤ_p.

Applications of p-adic analysis arise in many areas of number theory, including in the studies of Diophantine equations and arithmetic progressions (Lech, 1953). In computer science, the p-adics can be used to implement efficient rational arithmetic (Hehner and Horspool, 1979). The p-adic integers are particularly useful for establishing facts about divisibility and modularity. Just as analysis over ℚ_p is in some sense simpler than analysis over ℝ, the algebraic structure on ℤ_p makes these results comparatively easy to obtain. Another application is in the method of Chabauty–Coleman (McCallum and Poonen, 2012), which can often be used to determine rational points on algebraic varieties. This method is used in the resolution of certain generalized Fermat equations (Dahmen and Siksek, 2014), closely related to the mathematics that the Lean Forward project will address.

Gouvêa (Gouvêa, 1997) cites Hensel’s lemma (Hensel, 1908) as the “most important algebraic property of the p-adic numbers.” This result, which establishes a connection between the number-theoretic properties and the analysis of polynomial functions over ℤ_p, is the backbone of the study of the p-adics. It is often applied to prove the (non)existence of solutions to polynomial equations over various rings; in computer science, it appears in floating point rounding algorithms. Hensel’s lemma is stated in the literature in many forms. The central idea is that for any univariate polynomial f over ℤ_p, if one can find a point a ∈ ℤ_p such that f(a) and f′(a) satisfy certain requirements, then f has a unique root within a neighborhood of a. (We state the hypotheses explicitly in Section 5.)

Hensel’s lemma can be used to reduce the problem of finding roots of a polynomial over ℤ_p to the (finite) problem of finding roots over ℤ/p^kℤ, typically for small k. The local-global principle, also known as the Hasse or Hasse-Minkowski principle, is one of the central principles of Diophantine geometry (Serre, 1973). It describes a general system that aims to translate questions about roots over ℚ to questions about roots over ℝ and ℚ_p, which are often easier to answer. A striking application of this principle shows that a quadratic form over ℚ has nontrivial roots in ℚ if and only if it has nontrivial roots in ℝ and in ℚ_p for all primes p. The scope and applications of the local-global principle are actively explored in number theory today; Browning (Browning, 2018) gives a survey of recent results.

## 3. The Lean Mathematical Library

The Lean proof assistant, developed principally by Leonardo de Moura, was first released in 2014 (de Moura et al., 2014). Lean implements a version of the calculus of inductive constructions (CIC) (Coquand and Paulin, 1990) with support for quotient types and classical reasoning. Since the release of the most recent version in 2017 (de Moura et al., 2017), there has been a concerted effort to develop mathlib, a comprehensive library for use in mathematics and computer science (Carneiro, 2018).

Lean’s mathlib is younger and smaller than similar libraries in other systems, such as Coq’s Mathematical Components (Mahboubi and Tassi, [n. d.]) or Isabelle’s Archive of Formal Proofs (Nipkow, [n. d.]), but it contains developments in many important areas of mathematics. It notably includes a proof of the law of quadratic reciprocity, a model of ZFC, and the construction of the Lebesgue measure on ℝ.

The datatypes available in mathlib include the concrete types commonly found in mathematics, among them ℕ, ℤ, ℚ, ℝ, and ℂ; finite sets and multisets over a base type; and embeddings and isomorphisms between types. The algebraic hierarchy of mathlib is designed using type classes, which endow a base type with extra structure in the forms of operations, properties, and notation (Spitters and van der Weegen, 2011; Wadler and Blott, 1989). Lean’s type class resolution mechanism automatically manages inheritance between type classes (Figure 3). If a type class T₂ extends (directly or by transitivity) a type class T₁, any theorem proved over T₁ will apply to any type that instantiates T₂. The algebraic hierarchy begins with semigroups and monoids and extends to rich structures including fields, Noetherian rings, and principal ideal domains. Van Doorn, von Raumer, and Buchholz (van Doorn et al., 2017) give a more detailed explanation of how type classes are used to define an algebraic hierarchy in Lean.

Topological structure is also managed using type classes. In particular, topologies on metric spaces, normed spaces, and similar structures are inherited from the topology defined on uniform spaces (Bourbaki, 1998), of which all of these structures are instances. Topological notions such as limits and continuity are defined using filters (Hölzl et al., 2013), which specialize to more familiar definitions on metric or normed spaces.

In contrast to many other libraries for CIC-based systems, mathlib does not focus on constructive mathematics. Most of the core datatypes are defined computably, making them able to be reduced in the kernel or virtual machine. But the more abstract mathematical theories freely use classical logic; these theories are mostly noncomputable. Since the system can easily track the computability of a declaration, terms that do not depend on additional axioms will still compute.

Lean features a powerful metaprogramming framework that allows users to write custom tactics in the language of Lean itself (Ebner et al., 2017). There are a number of such tactics included in mathlib. Relevant to this project are linarith, which proves linear inequality goals using certified Fourier-Motzkin elimination; ring, a tactic based on Gregoire and Mahboubi’s work in Coq (Gregoire and Mahboubi, 2005) which normalizes expressions in the language of (semi)rings; and wlog, which reduces symmetric goals to a single case.

The development described in this paper uses a large portion of mathlib. In particular, it makes use of the concrete number types, along with many lemmas concerning divisibility and modular arithmetic; the topology library, for properties about continuity and limits; the analysis library, for the definitions of normed rings and fields and the topological properties of these structures; the abstract algebra library, to derive additional algebraic structure on ℤ_p; and the polynomial library, which is needed even to state Hensel’s lemma. This project has led to contributions to mathlib in all of these domains.

Readers unused to Lean syntax should note that explicit arguments to declarations are enclosed in parentheses (), implicit arguments are enclosed in curly brackets {}, and type class arguments are enclosed in square brackets []. Only explicit arguments are given by the user when applying a declaration. For instance, writing a theorem as

lemma one_mul {α : Type} [group α] (a : α) :
1 * a = a

specifies that the type α is supposed to be inferred automatically (say, from the argument a). The group structure on α, which is introduced anonymously, should be inferred by type class resolution. Given a term z whose type carries a group instance, Lean will confirm that one_mul z is a proof that 1 * z = z.

Another important feature of Lean syntax is its projection notation. Suppose S is a structure (or record) type with a field val, and t : S. The typical way to access the val field of t is by S.val t; here S.val is a compound name, with val living in the namespace S. Lean also admits the abbreviation t.val, using the period to separate a term and a name. This notation is not restricted to projections, although it is most commonly used there. In general, if a term named T.op has been defined and t : T, then t.op abbreviates T.op where t is inserted as the first argument of type T. For a concrete example, consider the type polynomial α and the operator

polynomial.eval : α → polynomial α → α

which evaluates a univariate polynomial at an argument. If we have F : polynomial α and a : α, we can use the notation F.eval a in place of polynomial.eval a F. This notation can be nested, e.g. to replace

polynomial.eval a (polynomial.derivative F)

with F.derivative.eval a.

## 4. Formalizing the p-adic Numbers

In this section we describe the formal construction of ℚ_p and ℤ_p and the proofs of their associated algebraic properties. We approximately follow the presentation from Gouvêa (Gouvêa, 1997), although many of the ideas here are canonical in the mathematical literature. Broadly, our construction goes by the following plan:

1. Define the p-adic valuation on ℤ, extend it to ℚ, and use this to define the p-adic norm.

2. Show that the p-adic norm on ℚ is a non-archimedean absolute value.

3. Define ℚ_p as the completion of ℚ with respect to the p-adic norm.

4. Show that ℚ_p inherits field operations and a norm from ℚ.

5. Define ℤ_p as a subtype of ℚ_p, and show that it instantiates various algebraic structures.

Throughout this development, we will fix a natural number p. Some proofs in step 2 assume only that p > 1. In the rest of the development, we work under the assumption that p is prime. We manage this primality assumption using type classes, so such arguments never need to be given explicitly. In the code snippets below, we typically assume that these arguments have been fixed as parameters, and only include them in the signatures of our functions when we wish to highlight them.

The valuation and norm functions defined in step 1 are total: instead of taking proofs (e.g. that p is prime) as arguments, they return the value 0 when their arguments are not in the intended domain. This approach to defining partial functions is common in logics that support only total functions. Proofs of properties of these functions assume that the arguments are in the intended domain; these proofs are often inferred by the type class mechanism and are thus transparent to the user.

### 4.1. The p-adic Valuation and Norm on Q

The p-adic valuation ν_p(z) of an integer z is the largest n ∈ ℕ such that p^n ∣ z. This extends to ℚ by setting ν_p(q/r) = ν_p(q) − ν_p(r) when q and r are coprime. We define these functions in Lean using the operator nat.find_greatest P b, which returns the greatest n ≤ b satisfying the predicate P. Recall that z.nat_abs, q.num, and q.denom are projection notation for int.nat_abs z, rat.num q, and rat.denom q respectively.

def padic_val (p : ℕ) (z : ℤ) : ℕ :=
if z = 0 then 0
else if p > 1 then
nat.find_greatest (λ k, (p ^ k) ∣ z) z.nat_abs
else 0
def padic_val_rat (p : ℕ) (q : ℚ) : ℤ :=
(padic_val p q.num : ℤ) - (padic_val p q.denom : ℤ)
def padic_norm (p : ℕ) (q : ℚ) : ℚ :=
if q = 0 then 0
else (p : ℚ)^(-(padic_val_rat p q))

These definitions are computable and can thus be evaluated on closed inputs. Note that padic_val and padic_norm both require the natural number p as an explicit argument. In general, p cannot be inferred from context. This makes it difficult to introduce generic notation for these functions, or to use ℚ to instantiate type classes that depend on the norm, such as normed_field. This complication will be resolved once we define ℚ_p, since p will be an argument to the type of p-adic numbers.

### 4.2. Properties of the p-adic Norm

Proving the essential properties of ν_p is similarly straightforward, under the assumption that p > 1. The only lemmas that require p to be prime are the multiplicative properties, e.g.:

lemma mul {m n : ℤ} (hm : m ≠ 0) (hn : n ≠ 0) :
padic_val p m + padic_val p n = padic_val p (m * n)

For the most part, the properties of padic_norm follow from analogous properties of padic_val_rat, which themselves follow from analogous properties of padic_val. Lifting proofs requires some care with casts between ℕ, ℤ, and ℚ.

The most involved proof in this section is the core of the later proof that the p-adic norm is nonarchimedean.

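theorem min_le_padic_val_rat_add {q r : ℚ}
(hq : q ≠ 0) (hr : r ≠ 0) (hqr : q + r ≠ 0) :
min (padic_val_rat p q) (padic_val_rat p r) ≤ padic_val_rat p (q + r)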

Proving this fact requires an elementary but subtle computation. Once it is completed, the proof that padic_norm p instantiates the is_absolute_value type class (Figure 4) follows quickly. This instance depends on the primality of p, which is inferred by type class resolution.

### 4.3. Completing Q

There are many related notions of Cauchy completions in the mathematical literature, varying in the level of abstraction and in the structure on the base space. We considered a number of options for constructing ℚ_p.

The first and most generic option was to perform the uniform completion of ℚ with respect to the uniform structure generated by the p-adic norm (James, 1999). A uniform space is an abstraction that falls somewhere in between a metric space, in which every two points are separated by a real-valued distance, and a topological space, which provides a generic but unquantified notion of “separatedness.” A uniform structure allows one to consider relative distances between points without assigning concrete values to these distances. Any uniform space X can be completed by considering the space of Cauchy filters over X, where a Cauchy filter is a topological generalization of a Cauchy sequence. When the uniform structure on X is induced by a metric (or norm), this construction reduces to the completion described in Section 2.

The uniform completion process has been formalized in Lean and could, in principle, be immediately specialized to obtain ℚ_p. However, the generality of this construction is not so amenable to a “concrete” number type like ℚ_p. It is quite difficult to lift operations that are not uniformly continuous, such as multiplication, from the base type to the completion space. Furthermore, one must reconcile the filter-based notion of completeness with a sequential notion in order to prove Hensel’s lemma, which depends on finding a limit of a sequence of p-adic integers. These complications created by the generality of the uniform completion came with few upsides; it seemed more prudent to take a different approach. Regardless of the initial construction, ℚ_p is easily instantiated as a complete uniform space after the fact.

A second option was to specialize the uniform completion to the completion of a normed structure. (We rejected the idea of using the metric completion, which falls in between, since it comes with all the disadvantages of the norm completion and more.) Under this specialization, the interface for lifting operations looks more familiar. Norm completions are not uncommon in mathematics, so although implementing an interface for this would take some initial effort, it could be reused in the future.

Two downsides discouraged us from this approach. First, the norm referred to in an arbitrary normed space is typically real-valued. Defining ℚ_p would thus depend on ℝ, which is already a completion of ℚ (using a different completion process). While logically sound, this approach would be pedagogically dubious, since it removes the direct analogy between ℝ and ℚ_p; it would also obscure the fact that the p-adic norm only takes rational values. A second, more practical, concern with this approach was related to the necessary type class inference. The norm on a space is generally inferred automatically, and the default instance for ℚ is the traditional absolute value. To use the p-adic norm instead, we would have to locally override this instance. Doing this would be possible but could lead to complications in future developments that use the p-adic norm and traditional absolute value on ℚ simultaneously.

The option we elected to follow is to directly define the Cauchy sequence completion of a type α, with respect to an absolute value on α taking values in an arbitrary ordered field (Figure 4). If α has a ring or field structure, this structure lifts immediately to the completion. This generalizes the former definition of ℝ in mathlib; we implement both ℝ and ℚ_p as instances of this general construction. It has the added benefit that the ring operations on ℚ_p are computable, although this property is not used in the current development. The downside to this approach is that some work must be done to connect the concrete ε-N definition of convergence to more general topological notions. This extra work can be minimized by instantiating ℚ_p as a normed field, which we do in step 4.

def padic (p : ℕ) [prime p] : Type :=
completion ℚ (padic_norm p)

The completion function takes two explicit arguments, a type and a field-valued function on that type. An implicit argument, inferred by type class resolution, shows that the field-valued function is an absolute value. Since the p-adic norm is only an absolute value if p is prime, it is essential to assume primality in the definition of ℚ_p. This hypothesis is also an implicit type class argument and normally will be inferred automatically.

### 4.4. Operations on Qp

The field operations on ℚ_p are obtained for free through the Cauchy sequence construction. It also follows directly that ℚ is embedded in ℚ_p. (Any constant sequence of rationals is Cauchy and is not equivalent to any other constant sequence of rationals, so each rational q induces a unique p-adic number.)

It takes more effort to lift the p-adic norm on ℚ to a norm on ℚ_p. Intuitively, one might be tempted to claim that “the norm of the limit is the limit of the norms,” that is, that we should define the norm of a Cauchy sequence to be the limit of the norms of each entry. But since the p-adic norm is rational-valued, and ℚ is not complete, this would unhelpfully produce an ℝ-valued norm. Note the contrast with the real numbers: because the linear order on ℚ lifts to ℝ, a real-valued norm makes sense. Instead, we exploit an important property of the p-adic norm, derived from the fact that its values lie in the set {0} ∪ {p^n : n ∈ ℤ}. Because these values are separated (except at 0), the norms of the entries of any (nonzero) Cauchy sequence are eventually constant.

lemma stationary {f : cau_seq ℚ (padic_norm p)}
(hf : ¬ f ≈ 0) : ∃ N, ∀ m n, m ≥ N → n ≥ N →
padic_norm p (f n) = padic_norm p (f m)
def norm (f : cau_seq ℚ (padic_norm p)) : ℚ :=
if hf : f ≈ 0 then 0
else padic_norm p (f (stationary_point hf))
def padic_norm_e : ℚ_[p] → ℚ :=
quotient.lift norm norm_respects_equiv

Thus, the limit is indeed rational valued, and we can define a rational-valued norm on the type of Cauchy sequences. This norm respects equivalence, so it can be lifted to the quotient ℚ_p. With some work, the norm can be shown to preserve the essential properties of the norm on ℚ, including the nonarchimedean property. We can also check that the norm is indeed an extension of the norm on ℚ, meaning that for any q ∈ ℚ, the norm of the image of q in ℚ_p is equal to |q|_p.

Since we have defined ℚ_p as the completion of ℚ with respect to the p-adic norm, and the p-adic norm extends to ℚ_p, it is important to check that ℚ_p is in fact complete with respect to its norm. We again highlight the difference here between ℝ and ℚ_p. Since the absolute value on ℝ is real-valued, as opposed to rational-valued, the arguments that ℝ and ℚ_p are complete differ in significant ways. We do not use a general proof to cover both cases, since despite some structural similarity, the generalization is rather convoluted. We also prove that ℚ is dense in ℚ_p, meaning that every x ∈ ℚ_p is arbitrarily close to the image of q for some q ∈ ℚ. The proof is similar to that of the analogous statement in ℝ, with a separate implementation to account for the different absolute value.

We have thus established that ℚ_p is a complete field, densely embedding ℚ, with a nonarchimedean norm that extends the p-adic norm on ℚ. These properties uniquely characterize ℚ_p: any structure with these properties is isomorphic to ℚ_p. (We have not yet formalized this statement.)

Finally, we instantiate ℚ_p as a normed field. From this instance, ℚ_p inherits a topology and uniform structure. The only complication, mentioned above, is that the generic norm of a normed ring is real-valued instead of rational-valued. But since the essential properties of the p-adic norm are already established, casting to ℝ is less troublesome here; similarly, the pedagogical concerns about using ℝ in the construction of ℚ_p are no longer relevant.

From the normed field instance, we inherit the generic notation ∥x∥ for the norm of a p-adic number x. Unlike for the p-adic norm on ℚ, there is no ambiguity here about the parameter p, since it can be inferred from the type of x.

### 4.5. Defining Zp

The p-adic integers ℤ_p are traditionally defined as the subset {z ∈ ℚ_p : |z|_p ≤ 1}. This is equivalent to the completion of ℤ using the p-adic norm, but for formalization purposes, the former definition is much simpler.

def padic_int (p : ℕ) [prime p] : Type :=
{z : ℚ_[p] // ∥z∥ ≤ 1}

The notation here is for Lean’s subtype data structure, meaning that a term z : ℤ_[p] is a dependent pair of a term x : ℚ_[p] with a proof that ∥x∥ ≤ 1. Note that this is not a “strict” subtype, in the sense that the term z does not have type ℚ_[p]; rather z.val, the first projection of z, has this type. We can move between the two types with little friction by defining a coercion from ℤ_[p] to ℚ_[p]. However, it is still convenient to minimize this kind of context shift, as we will discuss in Section 5.

From the properties of the p-adic norm, we obtain that ℤ_p is closed under sums and products and show that it forms a subring of ℚ_p. This subring has algebraic structure that makes it a fruitful object of study. Most fundamentally, we instantiate ℤ_p as a normed commutative local ring with maximal ideal {z ∈ ℤ_p : |z|_p < 1}. We also show it is complete (meaning that any Cauchy sequence of p-adic integers converges to a p-adic integer) and that it densely embeds ℤ.

As with ℚ_p, the topology on ℤ_p is inherited from its norm. The open sets are generated by the family of balls {x ∈ ℤ_p : |x − a|_p < r}, ranging over a ∈ ℤ_p and r > 0.

## 5. Formalizing Hensel’s Lemma

Hensel’s lemma establishes another fundamental algebraic property of ℤ_p. This result provides simple criteria for locating p-adic integer roots of a polynomial; it is widely applied in p-adic analysis, and is also used in approximation algorithms in computer science (Martin-Dorel et al., 2015). The general notion of a Henselian local ring, defined to be a local ring for which Hensel’s lemma holds, appears in algebraic geometry. Weaker analogues of Hensel’s lemma hold over other structures, including the standard integers ℤ, but the hypotheses of these analogues are harder to satisfy than those for ℤ_p.

The formal proof of Hensel’s lemma follows a writeup by Conrad (Conrad, [n. d.]). Conrad’s description is more concrete than Gouvêa’s (Gouvêa, 1997) and avoids unnecessary detours into the group $\mathbb{Z}/p^n\mathbb{Z}$, although the approaches are schematically identical. We slightly modify Conrad’s proof to perform as much computation as possible inside $\mathbb{Z}_p$, without stepping into $\mathbb{Q}_p$.

Conrad (Conrad, [n. d.], Theorem 4.1) states Hensel’s lemma as follows. Here, $\mathbb{Z}_p[X]$ is the ring of univariate polynomials over a variable $X$ with coefficients in $\mathbb{Z}_p$. The derivative $f'(X)$ is the formal polynomial derivative, which does not rely on the notion of a limit.

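###### Theorem (Hensel’s lemma).

Suppose that $f(X) \in \mathbb{Z}_p[X]$ and $a \in \mathbb{Z}_p$ satisfy $\|f(a)\| < \|f'(a)\|^2$. There exists a unique $\alpha \in \mathbb{Z}_p$ such that $f(\alpha) = 0$ and $\|\alpha - a\| < \|f'(a)\|$. Furthermore, it holds that $\|\alpha - a\| = \|f(a)/f'(a)\|$ and $\|f'(\alpha)\| = \|f'(a)\|$.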

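Hensel’s lemma is sometimes stated with the requirements $f(a) \equiv 0 \pmod{p}$ and $f'(a) \not\equiv 0 \pmod{p}$, that is, $\|f(a)\| < 1$ and $\|f'(a)\| = 1$. This is a weaker corollary of what we state here, since these requirements give $\|f(a)\| < 1 = \|f'(a)\|^2$. The statement we have proven in Lean is a direct translation of the stronger version.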
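A small worked example, chosen here for illustration and not taken from the formalization, shows a case where only the stronger hypothesis applies. Take $p = 2$, $f(X) = X^2 - 17$, and seed $a = 1$:

```latex
\begin{align*}
f(a) &= 1 - 17 = -16, & \|f(a)\| &= 2^{-4}, \\
f'(a) &= 2a = 2,      & \|f'(a)\|^2 &= (2^{-1})^2 = 2^{-2}, \\
\|f(a)\| &= 2^{-4} < 2^{-2} = \|f'(a)\|^2.
\end{align*}
```

The derivative condition of the weaker form fails because $f'(1) = 2 \equiv 0 \pmod{2}$, yet the norm hypothesis holds. The theorem therefore yields a unique square root $\alpha$ of 17 in $\mathbb{Z}_2$ with $\|\alpha - 1\| < 2^{-1}$, and moreover $\|\alpha - 1\| = 2^{-4}/2^{-1} = 2^{-3}$.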

theorem hensels_lemma {p : ℕ} [hp : prime p]
  {F : polynomial ℤ_[p]} {a : ℤ_[p]} :
  ∥F.eval a∥ < ∥F.derivative.eval a∥^2 →
  ∃ z : ℤ_[p], F.eval z = 0 ∧
    ∥z - a∥ < ∥F.derivative.eval a∥ ∧
    ∥z - a∥ = ∥F.eval a∥ / ∥F.derivative.eval a∥ ∧
    ∥F.derivative.eval z∥ = ∥F.derivative.eval a∥ ∧
    ∀ z' : ℤ_[p], F.eval z' = 0 →
      ∥z' - a∥ < ∥F.derivative.eval a∥ → z' = z

While Hensel’s lemma can be proved in various different settings at different levels of generality, nearly all proofs follow the same approach: starting with the seed point $a$, they recursively define a sequence of approximations to the desired root, and argue that this sequence converges. This argument is typically seen as an analogy to Newton’s method for finding roots of real functions.

Our proof goes by the following sketch:

1. Establish two generic polynomial identities that will be used at multiple points of the proof.

2. Define a constant, depending only on $f$ and $a$, that will be used to bound various quantities.

3. Define a recursive sequence $(a_n)$, simultaneously proving bounds on the values of $f$ along this sequence.

4. Show that this sequence is Cauchy.

5. Show that the limit of this Cauchy sequence has the desired properties, in particular that it is a root of $f$.

6. Show that this root is unique within a neighborhood of $a$.

This sketch comes directly from Conrad. Our approach diverges slightly in step 3, where we reconfigure the recursion to avoid unnecessary casts between $\mathbb{Z}_p$ and $\mathbb{Q}_p$. We also assume that $f(a) \neq 0$ for much of the proof, and handle the case $f(a) = 0$ later as a (simple) degenerate case. Conrad does not make this special case explicit, but the argument fails at a crucial point if $f(a) = 0$.

### 5.1. Polynomial Identities

Two polynomial identities are used in the proof to rewrite expressions into forms that we can more easily bound. These identities are not specific to the $p$-adic numbers.

The first identity allows us to separate components of $f(x)$ and $f'(x)$ from the expansion of $f(x + y)$.

lemma binom_exp (f : polynomial α) (x y : α) :
  ∃ k : α, f.eval (x + y) = f.eval x +
    (f.derivative.eval x) * y + k * y^2

This identity follows from a similar statement on commutative semirings.

def binom_exp {α} [comm_semiring α] (x y : α) :
  ∀ n : ℕ, ∃ k : α,
    (x + y)^n = x^n + n*x^(n-1)*y + k * y^2

After inducting on $n$, this proof follows nearly automatically using Lean’s ring tactic. The only manual input is the value to instantiate k. This value was computed using computer algebra software, and using a link to such software (e.g. (Lewis, 2017)), even this step could potentially be automated.

The second identity shows that $x - y$ divides $f(x) - f(y)$.

lemma eval_sub (f : polynomial α) (x y : α) :
  ∃ z : α, f.eval x - f.eval y = z*(x - y)

This also follows from a similar algebraic statement, which is proved by induction and ring evaluation.
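As a concrete instance of both identities (an illustrative example, not part of the formal development), take $f(X) = X^3$, so that $f'(X) = 3X^2$:

```latex
\begin{align*}
(x + y)^3 &= x^3 + 3x^2 y + (3x + y)\,y^2,
  && \text{so } k = 3x + y, \\
x^3 - y^3 &= (x^2 + xy + y^2)(x - y),
  && \text{so } z = x^2 + xy + y^2.
\end{align*}
```

For a general monomial $X^n$ the same pattern gives $k = \sum_{j \ge 2} \binom{n}{j} x^{n-j} y^{j-2}$ and $z = \sum_{j=0}^{n-1} x^{n-1-j} y^{j}$, and the statements for arbitrary polynomials follow by linearity.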

### 5.2. A Bounding Value

In the subsequent steps we will fix F : polynomial ℤ_[p] and a : ℤ_[p], and assume that the inequality

∥F.eval a∥ < ∥F.derivative.eval a∥^2

holds. (These assumptions are taken as parameters in Lean, which are automatically inserted into declarations throughout the duration of a section.) To establish bounds on the terms of the sequence we will define in step 3, we define an auxiliary constant T.

def T : ℝ :=
  ∥(F.eval a).val / ((F.derivative.eval a).val)^2∥

The division must take place in $\mathbb{Q}_p$, since $\mathbb{Z}_p$ is not a field. However, our hypothesis guarantees that T < 1, so the quotient is in fact an integer. It is trivial to prove the following alternate characterization of T (which uses the norm on $\mathbb{Z}_p$), along with various simple facts about T that will be useful for establishing bounds.

lemma T_def :
  T = ∥F.eval a∥ / ∥F.derivative.eval a∥^2

### 5.3. Defining the Newton Sequence

The core step of the proof of Hensel’s lemma is to define a sequence of values that converge to the desired solution. The recursion is typically given by

$$a_0 = a, \qquad a_{n+1} = a_n - \frac{f(a_n)}{f'(a_n)}$$

in informal texts. But without further information, this sequence lives in $\mathbb{Q}_p$ instead of $\mathbb{Z}_p$; we must establish properties about $f(a_n)$ and $f'(a_n)$ before concluding that $a_{n+1}$ is an integer. In an informal presentation, it is not a problem to first define the sequence in $\mathbb{Q}_p$ and show integrality afterward. But doing so in our formal development would introduce another layer of casts, one which we would prefer to avoid. We pay a cost to avoid it: the recursion to build $(a_n)$ must incorporate the properties needed to prove integrality, making it slightly clumsier.
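The informal recursion can be run on exact rationals to see both phenomena at once: the iterates are computed with a division in the field, yet each one has $p$-adic norm at most 1. The Python sketch below is an illustration under assumed inputs ($f(X) = X^2 - 2$, $p = 7$, seed $a = 3$); the function names are hypothetical and nothing here is taken from the Lean code.

```python
from fractions import Fraction


def padic_norm(x, p):
    """p-adic norm of a rational number x (0 maps to 0)."""
    x = Fraction(x)
    if x == 0:
        return Fraction(0)
    v, num, den = 0, x.numerator, x.denominator
    while num % p == 0:
        num, v = num // p, v + 1
    while den % p == 0:
        den, v = den // p, v - 1
    return Fraction(1, p) ** v


def newton_sequence(f, df, a, steps):
    """Return [a_0, ..., a_steps] with a_{n+1} = a_n - f(a_n)/f'(a_n), computed in Q."""
    seq = [Fraction(a)]
    for _ in range(steps):
        z = seq[-1]
        seq.append(z - f(z) / df(z))
    return seq


# Assumed example data: f(X) = X^2 - 2 over the 7-adics, seed a = 3.
p = 7
f = lambda x: x * x - 2
df = lambda x: 2 * x
seq = newton_sequence(f, df, 3, 3)

# The bounding constant T = |f(a)| / |f'(a)|^2 for this example.
T = padic_norm(f(seq[0]), p) / padic_norm(df(seq[0]), p) ** 2

for n, z in enumerate(seq):
    bound = padic_norm(df(seq[0]), p) ** 2 * T ** (2 ** n)
    # Columns: index, iterate, |a_n| <= 1 ?, |f(a_n)| <= |f'(a)|^2 * T^(2^n) ?
    print(n, z, padic_norm(z, p) <= 1, padic_norm(f(z), p) <= bound)
```

In this example $T = 1/7$, the first iterate is $11/6$, and the norms of $f(a_n)$ for $n = 0, 1, 2, 3$ are $7^{-1}, 7^{-2}, 7^{-4}, 7^{-8}$, so the error exponent doubles at every step.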

We define our induction hypothesis as follows:

def ih (n : ℕ) (z : ℤ_[p]) : Prop :=
  ∥F.derivative.eval z∥ = ∥F.derivative.eval a∥ ∧
  ∥F.eval z∥ ≤ ∥F.derivative.eval a∥^2 * T ^ (2^n)

To construct our Newton sequence, we must (1) provide a value satisfying ih 0, and (2) assuming we have a value z : ℤ_[p] satisfying ih n z, produce a value z' : ℤ_[p] satisfying ih (n+1) z'. The informal recursion indicates that our base value should be a, and it is no trouble to prove ih 0 a. Under the assumption ih n z, we can check that

∥F.eval z / F.derivative.eval z∥ ≤ 1

and so the recursive value

z' := z - F.eval z / F.derivative.eval z

is indeed an integer.

The more difficult part of this induction is to show that ih (n+1) z' holds. While there is no deep theory needed to do this, we must calculate some chains of inequalities that, while relatively straightforward, are long and nonlinear. These computations invoke the inductive hypothesis on z, the nonarchimedean property of the $p$-adic norm, and both polynomial identities described in step 1. It takes roughly 70 lines of Lean code to perform these computations, compared to roughly 10 in the informal presentation. Many of these computations fall under the scope of the tool Polya (Avigad et al., 2016) developed by the author. In the future, such a tool could be used to significantly condense this portion of our proof.
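In informal notation, writing $a_n$ for z and $a_{n+1}$ for the next value, the inductive step can be summarized as follows. This is a sketch of the standard pen-and-paper computation using the two identities of Section 5.1, not a transcription of the Lean proof; the names $h$, $k$, and $z$ are local to this display.

```latex
\begin{align*}
h &:= -\frac{f(a_n)}{f'(a_n)}, \qquad a_{n+1} = a_n + h, \\
\|h\| &= \frac{\|f(a_n)\|}{\|f'(a_n)\|} \le \|f'(a)\|\, T^{2^n} < \|f'(a)\|, \\
f(a_{n+1}) &= f(a_n) + f'(a_n)\,h + k\,h^2 = k\,h^2
  \quad \text{for some } k \in \mathbb{Z}_p, \\
\|f(a_{n+1})\| &\le \|h\|^2 \le \|f'(a)\|^2\, T^{2^{n+1}}, \\
f'(a_{n+1}) - f'(a_n) &= z\,h \quad \text{for some } z \in \mathbb{Z}_p,
  \qquad \|z\,h\| \le \|h\| < \|f'(a_n)\|, \\
\|f'(a_{n+1})\| &= \|f'(a_n)\| = \|f'(a)\|.
\end{align*}
```

The last line uses the nonarchimedean property in its sharp form: if $\|u\| < \|v\|$ then $\|u + v\| = \|v\|$. The bounds $\|k\| \le 1$ and $\|z\| \le 1$ hold because both elements are $p$-adic integers.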

These computations are sufficient to define the following:

def newton_seq (n : ℕ) : {z : ℤ_[p] // ih n z}

Projecting the first components, we obtain a sequence of $p$-adic integers satisfying the induction hypothesis.

### 5.4. The Newton Sequence is Cauchy

The sequence we have defined should lead us to the root of F promised by Hensel’s lemma. To reach it, we must show that newton_seq is Cauchy, so that the completeness of $\mathbb{Z}_p$ guarantees that a limit exists. We first establish the following lemma, which follows from another inequality computation:

lemma newton_seq_dist {n k : ℕ} (hnk : n ≤ k) :
  ∥newton_seq k - newton_seq n∥ ≤
    ∥F.derivative.eval a∥ * T^(2^n)

(It is here that the special case $f(a) = 0$ diverges from the general argument.) Since 0 ≤ T < 1, Lean’s analysis library makes it easy to show that the right hand side tends to 0, from which we can deduce (from general properties of sequences) that newton_seq is Cauchy. We can thus define soln : ℤ_[p] to be the limit of newton_seq.
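The inequality computation behind newton_seq_dist can be sketched informally as below, writing $a_m$ for newton_seq m. This is the usual ultrametric argument, given here as an illustration and not as a description of the Lean proof script.

```latex
\begin{align*}
\|a_{m+1} - a_m\| &= \frac{\|f(a_m)\|}{\|f'(a_m)\|}
  \le \frac{\|f'(a)\|^2\, T^{2^m}}{\|f'(a)\|} = \|f'(a)\|\, T^{2^m}, \\
\|a_k - a_n\| &= \Bigl\| \sum_{m=n}^{k-1} (a_{m+1} - a_m) \Bigr\|
  \le \max_{n \le m < k} \|a_{m+1} - a_m\|
  \le \|f'(a)\|\, T^{2^n} \qquad (n \le k).
\end{align*}
```

The final step uses $0 \le T < 1$, so that $T^{2^m}$ is nonincreasing in $m$. Unlike in the real case, no summation of a geometric series is needed: the nonarchimedean inequality bounds the norm of the sum by the largest term.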

### 5.5. Properties of the Limit

From our induction hypothesis, we see that the values of ∥F.eval (newton_seq n)∥ tend to 0 as n grows. It follows from the continuity of the norm (proved generally over normed spaces) that ∥F.eval soln∥ = 0, and thus F.eval soln = 0, so we have found a root. The equation

∥F.derivative.eval soln∥ = ∥F.derivative.eval a∥

similarly follows from the induction hypothesis and the continuity of the norm and polynomial evaluation. A third limit argument shows that

∥soln - a∥ = ∥F.eval a∥ / ∥F.derivative.eval a∥

which implies the (less precise but sometimes more useful) bound ∥soln - a∥ < ∥F.derivative.eval a∥, the last property we sought.

The limit arguments in this section, when unfolded into the language of metric spaces, appear as frustrating manipulations of small numbers $\epsilon$ and large numbers $N$. We work to avoid as much frustration as possible by making these arguments topologically. Establishing general results about Cauchy sequences on topological spaces lets us keep the manipulations largely isolated.

### 5.6. Uniqueness of the Solution

Hensel’s lemma does more than just locate a root of the polynomial $f$: it guarantees that the root is the only one within a neighborhood of the seed point $a$. The uniqueness proof follows from a short computation using the first polynomial identity from step 1.

lemma soln_unique (z : ℤ_[p]) (he : F.eval z = 0)
  (hlt : ∥z - a∥ < ∥F.derivative.eval a∥) :
  z = soln
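The short computation can be sketched as follows, writing $\alpha$ for soln and $h = z - \alpha$. This is the standard informal argument, included for illustration; $k$ is the element provided by the first polynomial identity.

```latex
\begin{align*}
\|h\| &\le \max\bigl(\|z - a\|,\ \|\alpha - a\|\bigr) < \|f'(a)\| = \|f'(\alpha)\|, \\
0 &= f(z) - f(\alpha) = f'(\alpha)\,h + k\,h^2 = h\,\bigl(f'(\alpha) + k\,h\bigr), \\
\|k\,h\| &\le \|h\| < \|f'(\alpha)\|
  \ \Longrightarrow\ \|f'(\alpha) + k\,h\| = \|f'(\alpha)\| \neq 0
  \ \Longrightarrow\ h = 0.
\end{align*}
```

The conclusion $h = 0$ uses the fact that $\mathbb{Z}_p$ has no zero divisors, so a product vanishes only if one of its factors does.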

When $a$ is already a root of $f$, uniqueness follows even more directly. We can thus show Hensel’s lemma by a case distinction on whether $f(a) = 0$, providing $a$ as a witness in the special case and soln in the general case.

## 6. Related Work

Although number theory is underrepresented in proof assistant libraries compared to other fields of mathematics, various projects have formalized results in this area. The following incomplete list indicates the depth and flavor of such projects.

The prime number theorem has been a popular target for formalization, verified first in Isabelle/HOL by Avigad, Donnelly, Gray, and Raff (Avigad et al., 2007) and subsequently by Harrison in HOL Light (Harrison, 2009), Carneiro in Metamath (Carneiro, 2016), and others. Isabelle’s Archive of Formal Proofs contains a number of related entries, including Eberl’s proof of Dirichlet’s theorem (Eberl, 2017). Elliptic curves and their number theoretic consequences have been addressed in multiple formalizations, including by Bartzia and Strub (Bartzia and Strub, 2014). The transcendence of $e$ and $\pi$, a result that is at least adjacent to number theory, was first formalized in Coq by Bernard, Bertot, Rideau, and Strub (Bernard et al., 2016). Analyses of the solutions to Pell’s equations have been formalized in various systems, as has the proof of Fermat’s little theorem; these and other classical results appear on Wiedijk’s list of formalizations of 100 fundamental theorems (Wiedijk, [n. d.]).

The only formal construction of $\mathbb{Q}_p$ and $\mathbb{Z}_p$ found in the literature is by Pelayo, Voevodsky, and Warren (Pelayo et al., 2015), carried out in the Coq UniMath library (Voevodsky et al., [n. d.]). Because of the univalent foundations of UniMath, it is difficult to compare their approach with ours. One immediate difference is that Pelayo et al. begin with an algebraic construction of $\mathbb{Z}_p$ rather than an analytic construction of $\mathbb{Q}_p$. This construction defines $\mathbb{Z}_p$ as a quotient on the ring of formal power series of $\mathbb{Z}$, and goes on to define $\mathbb{Q}_p$ as the field of fractions on $\mathbb{Z}_p$. The algebraic approach is perhaps more appropriate in a univalent setting. A complete theory of the $p$-adics ultimately requires both analytic and algebraic structure. No matter which is chosen for the initial construction, the properties of the other must eventually be derived. The UniMath development ends soon after the ring and field structures are defined, and does not prove any theorems about $\mathbb{Z}_p$ or $\mathbb{Q}_p$.

An undocumented construction of $\mathbb{Q}_p$ by Harrison is also found in the HOL Light repository (Harrison, [n. d.]), where it is written that the development is “meant as an example of using metric space completion.” The development defines $\mathbb{Q}_p$ as the metric completion of $\mathbb{Q}$, which we believe is better avoided for pedagogical reasons (Section 4.3). Since metric space completion does not preserve the field operations on $\mathbb{Q}$, much of the construction is dedicated to redefining these operations on $\mathbb{Q}_p$. This development ends once the field structure on $\mathbb{Q}_p$ is established, and does not prove any results about the type. It is interesting to note how the construction in a simply typed logic differs from those in dependent type theory. HOL Light does not allow one to define the type $\mathbb{Q}_p$ depending on $p$ (nor on a proof that $p$ is prime). Instead, Harrison defines a general type padic that contains the image of $\mathbb{Q}_p$ for each $p$. Such an approach is common in HOL-based systems, but is rarely used in systems like Lean, where the dependencies pose no problems.

Martin-Dorel, Hanrot, Mayero, and Théry describe a Coq formalization of Hensel’s lemma for the standard integers, and show some of its applications in verifying bounds on rounding errors (Martin-Dorel et al., 2015). Their approach is more explicitly algorithmic than ours, as their applications involve computing solutions to polynomials modulo powers of $p$ (a process known as Hensel lifting). For this purpose, it is reasonable to operate over the standard integers, since there are complications in defining $\mathbb{Q}_p$ as a computable field. The statement of Hensel’s lemma is rather less elegant than over $\mathbb{Z}_p$, however, and its import as an algebraic property is hidden.
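For readers unfamiliar with Hensel lifting over the standard integers, the following Python sketch shows the basic idea under the simple hypotheses $f(r) \equiv 0$ and $f'(r) \not\equiv 0 \pmod{p}$. It is a generic illustration, not the algorithm of the Coq development cited above, and the function name `hensel_lift` is hypothetical.

```python
def hensel_lift(f, df, root, p, k):
    """Lift a root of f modulo p to a root modulo p**k.

    Assumes f(root) = 0 (mod p) and that f'(root) is a unit modulo p.
    """
    if f(root) % p != 0 or df(root) % p == 0:
        raise ValueError("need f(root) = 0 and f'(root) != 0 modulo p")
    target = p ** k
    modulus = p
    while modulus < target:
        # One Newton step squares the modulus; cap it at the target.
        modulus = min(modulus * modulus, target)
        inverse = pow(df(root), -1, modulus)  # modular inverse, Python 3.8+
        root = (root - f(root) * inverse) % modulus
    return root


# Assumed example: a square root of 2 modulo 7**4, starting from 3**2 = 2 (mod 7).
f = lambda x: x * x - 2
df = lambda x: 2 * x
r = hensel_lift(f, df, 3, 7, 4)
print(r, (r * r - 2) % 7 ** 4)
```

Each pass computes the same Newton step as the $p$-adic proof, but with every quantity reduced modulo a power of $p$, so the result is only an approximation of the $p$-adic root to a fixed precision.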

## 7. Concluding Thoughts

The $p$-adic numbers and Hensel’s lemma are important tools of modern number theory, including the study of Diophantine equations. A recent project begun at the Vrije Universiteit Amsterdam aims to formalize results in this area. Constructing $\mathbb{Q}_p$ and $\mathbb{Z}_p$ are essential steps toward this goal. We plan to pursue consequences of Hensel’s lemma in future work, beginning with applications of the local-global principle. The $p$-adic numbers can be abstracted in different ways, and Hensel’s lemma proved in more general contexts; we plan to explore these possibilities. More concretely, it is often useful to consider the alternative characterization of $\mathbb{Z}_p$ as the inverse limit of the rings $\mathbb{Z}/p^n\mathbb{Z}$. We plan to show the connection between this algebraic approach and our analytic approach.

This development has also served as a case study for using Lean for such a project. The mix of analysis, algebra, and concrete computation makes the $p$-adic numbers an interesting target; we found that Lean and its libraries were up to the task. Similar developments could certainly be made using other proof assistants, but we found the classical library and notational features of Lean to be quite helpful here; dependent type theory is a convenient logical foundation to use, since $\mathbb{Q}_p$ is naturally a dependent type. Two particularly painful parts of the project were managing casts between various number structures and proving long but straightforward nonlinear inequalities. The former is especially likely to occur in further number theoretic developments, since it is often necessary to move between $\mathbb{N}$, $\mathbb{Z}$, $\mathbb{Q}$, $\mathbb{R}$, $\mathbb{Z}_p$, $\mathbb{Q}_p$, and other number structures. We hope to develop tools to assist with this movement using Lean’s metaprogramming capabilities.

As always, it is difficult to directly compare the lengths of our formalization and the informal proofs we followed, since they begin at different levels of background knowledge. The portion of our formalization that proves Hensel’s lemma is around 400 lines of code, corresponding to 1.5 pages of text in Conrad’s informal proof; tools mentioned in the previous paragraph should significantly decrease this ratio. This development has added around 4500 lines of code to the mathlib repository.

###### Acknowledgements.
We would like to thank Jeremy Avigad, Alexander Bentkamp, Jasmin Blanchette, Kevin Buzzard, Sander Dahmen, Johannes Hölzl, Petar Vukmirović, and the Lean mathlib community for their support, advice, and comments. We acknowledge support from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement No. 713999, Matryoshka).

## References

• Avigad et al. (2007) Jeremy Avigad, Kevin Donnelly, David Gray, and Paul Raff. 2007. A formally verified proof of the prime number theorem. ACM Trans. Comput. Logic 9, 1 (2007), 2.
• Avigad et al. (2016) Jeremy Avigad, Robert Y. Lewis, and Cody Roux. 2016. A heuristic prover for real inequalities. Journal of Automated Reasoning (2016).
• Bancerek et al. (2018) Grzegorz Bancerek, Czesław Byliński, Adam Grabowski, Artur Korniłowicz, Roman Matuszewski, Adam Naumowicz, and Karol Pak. 2018. The Role of the Mizar Mathematical Library for Interactive Proof Development in Mizar. Journal of Automated Reasoning 61, 1 (01 Jun 2018), 9–32.
• Bartzia and Strub (2014) Evmorfia-Iro Bartzia and Pierre-Yves Strub. 2014. A Formal Library for Elliptic Curves in the Coq Proof Assistant. In Interactive Theorem Proving, Gerwin Klein and Ruben Gamboa (Eds.). Springer International Publishing, Cham, 77–92.
• Bernard et al. (2016) Sophie Bernard, Yves Bertot, Laurence Rideau, and Pierre-Yves Strub. 2016. Formal Proofs of Transcendence for e and Pi As an Application of Multivariate and Symmetric Polynomials. In Proceedings of the 5th ACM SIGPLAN Conference on Certified Programs and Proofs (CPP 2016). ACM, New York, NY, USA, 76–87.
• Bourbaki (1998) Nicolas Bourbaki. 1998. General topology. Chapters 1–4. Springer, Berlin. vii+437 pages. Translated from the French, Reprint of the 1989 English translation.
• Browning (2018) T. D. Browning. 2018. How often does the Hasse principle hold? In Algebraic geometry: Salt Lake City 2015. Proc. Sympos. Pure Math., Vol. 97. Amer. Math. Soc., Providence, RI, 89–102.
• Carneiro (2016) Mario Carneiro. 2016. Formalization of the prime number theorem and Dirichlet’s theorem. In Proceedings of the 9th Conference on Intelligent Computer Mathematics (CICM 2016). 10–13.
• Carneiro (2018) Mario Carneiro. 2018. The Lean 3 Mathematical Library (presentation).
• Coquand and Paulin (1990) Thierry Coquand and Christine Paulin. 1990. Inductively defined types. In COLOG-88 (Tallinn, 1988). Lec. Notes in Comp. Sci., Vol. 417. Springer, Berlin, 50–66.
• Dahmen and Siksek (2014) Sander R. Dahmen and Samir Siksek. 2014. Perfect powers expressible as sums of two fifth or seventh powers. Acta Arith. 164, 1 (2014), 65–100.
• de Moura et al. (2017) Leonardo de Moura, Gabriel Ebner, Jared Roesch, and Sebastian Ullrich. 2017. The Lean Theorem Prover (presentation).
• de Moura et al. (2014) Leonardo de Moura, Soonho Kong, Jeremy Avigad, Floris van Doorn, and Jakob von Raumer. 2014. The Lean Theorem Prover.
• Eberl (2017) Manuel Eberl. 2017. Dirichlet L-Functions and Dirichlet’s Theorem. Archive of Formal Proofs (Dec. 2017). http://isa-afp.org/entries/DirichletL.html, Formal proof development.
• Ebner et al. (2017) Gabriel Ebner, Sebastian Ullrich, Jared Roesch, Jeremy Avigad, and Leonardo de Moura. 2017. A metaprogramming framework for formal verification. Proceedings of the ACM on Programming Languages 1, ICFP (2017), 34.
• Gonthier (2008) Georges Gonthier. 2008. Formal proof–the four-color theorem. Notices of the AMS 55, 11 (2008), 1382–1393.
• Gonthier et al. (2013) Georges Gonthier, Andrea Asperti, Jeremy Avigad, Yves Bertot, Cyril Cohen, François Garillot, Stéphane Le Roux, Assia Mahboubi, Russell O’Connor, Sidi Ould Biha, Ioana Pasca, Laurence Rideau, Alexey Solovyev, Enrico Tassi, and Laurent Théry. 2013. A Machine-Checked Proof of the Odd Order Theorem. In Interactive Theorem Proving, Sandrine Blazy, Christine Paulin-Mohring, and David Pichardie (Eds.). Springer, Berlin, 163–179.
• Gouvêa (1997) Fernando Q. Gouvêa. 1997. p-adic Numbers (second ed.). Springer, Berlin. vi+298 pages.
• Gregoire and Mahboubi (2005) Benjamin Gregoire and Assia Mahboubi. 2005. Proving Equalities in a Commutative Ring Done Right in Coq. In Theorem Proving in Higher Order Logics (TPHOLs 2005), LNCS 3603. Springer, 98–113.
• Hales et al. (2017) Thomas Hales, Mark Adams, Gertrud Bauer, Tat Dat Dang, John Harrison, Hoang Le Truong, Cezary Kaliszyk, Victor Magron, Sean McLaughlin, Tat Thang Nguyen, et al. 2017. A formal proof of the Kepler conjecture. In Forum of Mathematics, Pi, Vol. 5. Cambridge University Press.
• Harrison ([n. d.]) John Harrison. [n. d.]. The HOL Light theorem prover.
• Harrison (2009) John Harrison. 2009. Formalizing an analytic proof of the Prime Number Theorem. Journal of Automated Reasoning 43 (2009), 243–261.
• Hehner and Horspool (1979) Eric C. R. Hehner and RNS Horspool. 1979. A new representation of the rational numbers for fast easy arithmetic. SIAM J. Comput. 8, 2 (1979), 124–134.
• Hensel (1908) Kurt Hensel. 1908. Theorie der algebraischen Zahlen. Number v. 1 in Cornell University Library historical math monographs. B. G. Teubner.
• Hölzl et al. (2013) Johannes Hölzl, Fabian Immler, and Brian Huffman. 2013. Type classes and filters for mathematical analysis in Isabelle/HOL. In International Conference on Interactive Theorem Proving. Springer, 279–294.
• James (1999) Ioan James. 1999. Topologies and uniformities. Springer-Verlag London, Ltd., London. xvi+230 pages. Revised version of Topological and uniform spaces [Springer, New York, 1987; MR0884154 (89b:54001)].
• Lech (1953) Christer Lech. 1953. A note on recurring series. Ark. Mat. 2, 5 (08 1953), 417–421.
• Lewis (2017) Robert Y. Lewis. 2017. An extensible ad hoc interface between Lean and Mathematica. Electronic Proceedings in Theoretical Computer Science 262 (Dec 2017), 23–37.
• Mahboubi and Tassi ([n. d.]) Assia Mahboubi and Enrico Tassi. [n. d.]. Mathematical Components.
• Mahler (1958) K. Mahler. 1958.

An interpolation series for continuous functions of a