Distant decimals of π

09/06/2017 ∙ by Yves Bertot, et al. ∙ 0

We describe how to compute very far decimals of π and how to provide formal guarantees that the decimals we compute are correct. In particular, we report on an experiment where 1 million decimals of π and the billionth hexadecimal (without the preceding ones) have been computed in a formally verified way. Three methods have been studied, the first one relying on a spigot formula to obtain at a reasonable cost only one distant digit (more precisely a hexadecimal digit, because the numeration basis is 16) and the other two relying on arithmetic-geometric means. All proofs and computations can be made inside the Coq system. We detail the new formalized material that was necessary for this achievement and the techniques employed to guarantee the accuracy of the computed digits, in spite of the necessity to work with fixed precision numerical computation.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

The number has been exciting the curiosity of mathematicians for centuries. Ingenious formulas to compute this number manually were devised since antiquity with Archimede’s exhaustion method and a notable step forward achieved in the eighteenth century, when John Machin devised the famous formula he used to compute one hundred decimals of .

Today, thanks to electronic computers, the representation of in fractional notation is known up to tens of trillions of decimal digits. Establishing such records raises some questions. How do we know that the digits computed by the record-setting algorithms are correct? The accepted approach is to perform two computations using two different algorithms. In particular, with the help of a spigot formula, it is possible to perform a statistical verification, simply checking that a few randomly spread digits are computed correctly.

In this article, we study the best known spigot formula, an algorithm able to compute a faraway digit at a cost that is much lower than computing all the digits up to that position. We also study two algorithms based on arithmetic geometric means, which are based on iterations that double the number of digits known at each step. For these algorithms, we perform all the proofs in real analysis that show that they do converge towards , giving the rate of convergence, and we then show that all the computations in a framework of fixed precision computations, where computations are only approximated by rational numbers with a fixed denominator, are indeed correct, with a formally proved bound on the difference between the result and . Last we show how we implement the computations in the framework of our theorem prover.

The first algorithm, due to Bailey, Borwein, and Plouffe relies on a formula of the following shape, known as the BBP formula BBP97 .

Because each term of the sum is multiplied by it appears that approximately terms of the infinite sum are needed to compute the value of the nth hexadecimal digit. Moreover, if we are only interested in the value of the nth digit, the sum of terms can be partitioned in two parts, where the first contains the terms such that and the second contains terms that will only contribute when carries need to be propagated.

We shall describe how this algorithm is proved correct and what techniques are used to make this algorithm run inside the Coq theorem prover.

The second and third algorithms rely on a process known as the arithmetic-geometric mean. This process considers two inputs and and successively computes two sequences and such that , , and

In the particular case where and , the values and are functions of that are easily shown to be continuous and differentiable and it is useful to consider the two functions

A first computation of is expressed by the following equality:

Truncations of this infinite product are shown to approximate with a number of decimals that doubles every time a factor is added. This is the basis for the second algorithm.

The third algorithm also uses the arithmetic geometric mean for and , but performs a sum and a single division:

It is sensible to use index in the numerator and in the sum of the denominator, because this gives approximations with comparable precisions of their respective limits. This is the basis for the third algorithm. This third algorithm was introduced in 1976 independently by Brent and Salamin Brent76 ; Salamin76 . It is the one implemented in the mpfr library for high-precision computation mpfr to compute .

In this paper, we will recapitulate the mathematical proofs of these algorithms (sections 2 and 3), and show what parts of existing libraries of real analysis we were able to reuse and what parts we needed to extend.

For each of the algorithms, we study first the mathematical foundations, then we concentrate on implementations where all computations are done with a single-precision fixed-point arithmetic, which amounts to forcing all intermediate results to be rational numbers with a common denominator. This framework imposes that we perform more proofs concerning bounds on accumulated rounding errors.

Context of this work.

All the work described in this paper was done using the Coq proof assistant coq . This system provides a library describing the basic definition of real analysis, known as the standard Coq library for reals, where the existence of the type of real numbers as an ordered, archimedian, and complete field with decidable comparison is assumed. This choice of foundation makes that mathematics based on this library is inherently classical, and real numbers are abstract values which cannot be exploited in the programming language that comes in Coq’s type theory.

The standard Coq library for reals provides notions like convergent sequences, series, power series, integrals, and derivatives. In particular, the sine and cosine functions are defined as power series, is defined as twice the first positive root of the cosine function, and the library provides a first approximation of as being between and . It also provides a formal description of Machin formulas, relating computation of to a variety of computations of arctangent at rational arguments, so that it is already possible to compute relatively close approximations of , as illustrated in BertotAllais14 .

The standard Coq library implements principles that were designed at the end of the 1990s, where values whose existence is questionable should always be guarded by a proof of existence. These principles turned out to be impractical for ambitious formalized mathematics in real analysis, and a new library called Coquelicot BLM15 was designed to extend the standard Coq library and achieve a more friendly and regular interface for most of the concepts, especially limits, derivatives, and integrals. The developments described in this paper rely on Coquelicot.

Many of the intermediate level steps of these proofs are performed automatically. The important parts of our working context in this respect are the Psatz library, especially the psatzl tactic DBLP:conf/types/Besson06 , which solves reliably all questions that can be described as linear arithmetic problems in real numbers and lia DBLP:conf/types/Besson06 , which solves similar problems in integers and natural numbers. Another tool that was used more and more intensively during the development of our formal proofs is the interval tactic interval , which uses interval arithmetic to prove bounds on mathematical formulas of intermediate complexity. Incidentally, the interval tactic also provides a simple way to prove that belongs to an interval with rational coefficients.

Intensive computations are performed using a library for computing with very large integers, called BigZ GregoireTheryIJCAR2006 . It is quite notable that this library contains an implementation of an optimized algorithm to compute square roots of large integers RacineCarreeGMP .

2 The BBP formula

In this section we first recapitulate the main mathematical formula that makes it possible to compute a single hexadecimal at a low cost BBP97 .

Then, we describe an implementation of an algorithm that performs the relevant computation and can be run directly inside the Coq theorem prover.

2.1 Proof of the BBP formula

2.1.1 The mathematical Proof

We give here a detailed proof of the formula established by David Bayley, Peter Borwein and Simon Plouffe. The level of detail is chosen to mirror the difficulties encountered in the formalization work.

(1)

We first study the properties of the sum for a given such that :

(2)

By using the notation and the laws of integration, we get

(3)

Thanks to uniform convergence, the series and the integral can be exchanged and we can then factor out and recognize a geometric series in .

(4)

Now replacing the values in the right hand side of (1), we get:

(5)

Then, with the variable change and algebraic calculations on the integrand

(6)

We recognize here the respective derivatives of , and . Most of these functions have null or compensating values at the bounds of the integral, leaving only one interesting term:

2.1.2 The formalization of the proof

The current version of our formal proof, compatible with Coq version 8.5 and 8.6 coq is available on the world-wide web PlouffeAGMSources . To formalize this proof, we use the Coquelicot library intensively. This library deals with series, power series, integrals and provides some theorems linking these notions that we need for our proof. In Coquelicot, series (named Series) are defined as in standard mathematics as the sum of the terms of an infinite sequence (of type in our case) and power series (PSeries) are the series of terms of the form . The beginning of the formalisation follows the proof (steps  (2) to (3)). Then, one of the key arguments of the proof is the exchange of the integral sign and the series allowing the transition from equation (3) to equation (4). The corresponding theorem provided by Coquelicot is the following:

Lemma RInt_PSeries (a : nat -> R) (x : R) :
   Rbar_lt (Rabs x) (CV_radius a) ->
   RInt (PSeries a) 0 x = PSeries (PS_Int a) x.

where (PSeries (PS_Int )) is the series whose (n+1)-th term is coming from the equality: . We use this lemma as a rewriting rule from right to left.

Note that the RInt_PSeries theorem assumes that the integrated function is a power series (not a simple series), that is, a series whose terms have the form . In our case, the term of the series is , that is . To transform it into an equivalent power series we have first to transform the series into a power series. For that purpose, we define the hole function.

Definition hole (n : nat) (a : nat -> R) (i : nat) :=
  if n mod k =? 0  then a (i / n)  else 0.

and prove the equality given in the following lemma.

Lemma fill_holes k a x :
  k <> 0 -> ex_pseries a (x ^ k) ->
  PSeries (hole k a) x = Series (fun n => a n * x ^ (k * n)).

The premise written in the second line of fill_holes expresses that the series converges. This equality expresses that the series of term is equivalent to the power series which terms are when n is a multiple of and 0 otherwise.

Then by combining fill_holes with the Coquelicot function (PS_incr_n a n), that shifts the coefficients of the series to transform it into that is a power series, we prove the PSeries_hole lemma.

Lemma PSeries_hole x a d k :
  0 <= x < 1 ->
  Series (fun i : nat => a * x ^ (d + S n * i)) =
  PSeries (PS_incr_n (hole (S k) (fun _ : nat => a)) n) x

Moreover, the RInt_PSeries theorem contains the hypothesis that the absolute value of the upper bound of the integral, that is , is less than the radius of convergence of the power series associated to . This is proved in the following lemma.

Lemma PS_cv x a :
  (forall n : nat, 0 <= a n <= 1) ->
  0 <= x -> x < 1 -> Rbar_lt (Rabs x) (CV_radius a)

It should be noted that in our case is either or and the hypothesis forall n : nat, 0 <= a n <= 1 is easily satisfied.

In summary, the first part of the proof is formalized by the Sk_Rint lemma:

Lemma Sk_Rint k (a := fun i => / (16 ^ i * (8 * i + k))) :
  0 < k ->
  Series a =
   sqrt 2 ^ k  *
       RInt (fun x => x ^ (n - 1) / (1 - x ^ 8)) 0 (/ sqrt 2).

that computes the value of given by (4) from the definition (2) of Sk.

The remaining of the formalized proof follows closely the mathematical proof described in the previous section. We first perform an integration by substitution (starting from equation (5)), replacing the variable by , by rewriting (from right to left) with the RInt_comp_lin Coquelicot lemma.

Lemma RInt_comp_lin f u v a b :
  RInt (fun y : R => u * f (u * y + v)) a b =
  RInt f (u * a + v) (u * b + v)

This lemma assumes that the substitution function is a linear function, which is the case here.

Then we decompose S into three parts (by computation) to obtain equation (6), actually decomposed into three integrals that are computed in lemmas RInt_Spart1, RInt_Spart2, and RInt_Spart3 respectively. For instance:

Lemma RInt_Spart3 :
  RInt (fun x => (4 * x) / (x ^ 2 - 2)) 0 1 = 2 * (ln 1 - ln 2).

Finally, we obtain the final result, based on the equality .

2.2 Computing the nth decimal of using the Plouffe formula

We now describe how the formula (1) can be used to compute a particular decimal of effectively. This formula is a summation of four terms where each term has the form for some . Digits are then expressed in hexadecimal (base 16). Natural numbers strictly less than are used to simulate a modular arithmetic with bits, where is the precision of computation. We first explain how the computation of for a given is performed. Then, we describe how the four computations are combined to get the final digit.

We want to get the digit at position . The first operation is to scale the sum by a factor to be able to use integer arithmetic. In what follows, we need that is greater than four. If we consider (the integer part of ), the digit we are looking for is composed of its bits , , , that can be computed using basic integer operations: . Using integer arithmetic, we are going to compute an approximation of by splitting the sum into three parts

(7)

In the first part, the inner term can be rewritten as where both divisor and dividend are natural numbers. The division can be performed in several stages. To understand this, it is worth comparing the fractional and integer part of with the bits of .

For illustration, let us consider the case where , , , and . The number we wish to compute is

and we only need to know the first 4 bits, that is we need to know this number modulo . The ratio is , and modulo 16 this is 5. Now, we can look at the number . If we note and the quotient and the remainder of the division on the left (when viewed as an integer division), we have

Since we eventually want to take this number modulo , the left part of the sum, , does not impact the result and we only need to compute , in other words . In our illustration case, we have and , so we do recover the right 4 bits. Also, because we are only interested in bits that are part of the integral part of the result, we can use integer division to perform the last operation.

These computations are performed in the following Coq function, that progresses by modifying a state datatype containing the current index and the current sum. In this function, we also take care of keeping the sum under , because we are only concerned with this sum modulo .

Inductive NstateF := NStateF (i : nat) (res : nat).

Doing an iteration is performed by

Definition NiterF k (st : NstateF) :=
  let (i, res) := st in
  let r := 8 * i + k in
  let res := res + (2 ^ p * (16 ^ (d - 1 - i) mod r)) / r in
  let res := if res < 2 ^ p then res else res - 2 ^ p in
  NStateF (i + 1) res.

The summation is performed by iterations:

Definition NiterL k := iter d  (NiterF k)  (NStateF 0 0).

The result of NiterL is a natural number. What we need to prove is that it is a modular result and it is not so far from the real value. As we have turned an exact division into a division over natural numbers, the error is at most 1. After iterations, it is at most . This is stated by the following lemma.

Lemma sumLE k (f := fun i => ((16 ^ d / 16) * 2 ^ p) /
                              (16 ^ i * (8 * i + k))) :
  0 < k ->
  let (_, res) := NiterL p d k in
  exists u : nat, 0 <= sum_f_R0 f (d - 1) - res - u * 2 ^ p < d.

where sum_f_R0 f n represents the summation .

Let us now turn our attention to the second part of the iteration of formula (7).

All the terms of this sum are less than . As terms get smaller by a factor of at least 16, we consider only terms. We first build a datatype that contains the current index, the current shift and the current result:

Inductive NstateG := NStateG (i : nat) (s : nat) (res : nat).

We then define what is a step:

Definition NiterG k (st : NstateG) :=
  let (i, s, res) := st in
  let r := 8 * i + k in
  let res := res + (s / r) in
  NStateG (i + 1) (s / 16) res.

and we iterate times:

Definition NiterR k :=
  iter (p / 4)  (NiterG k) (NStateG d (2 ^ (p - 4)) 0).

Here we do not need any modulo since the result fits in bits and as the contribution of each iteration makes an error of at most one unit with the division by , the total error is then bounded by . This is stated by the following lemma.

Lemma sumRE k (f := fun i =>
                      ((16 ^ d / 16) * 2 ^ p) /
                       (16 ^ (d + i) * (8 * (d + i) + k))) :
  0 < k -> 0 < p / 4 ->
  let (_, _, s1) := NiterR k in
  0 <= sum_f_R0 f (p / 4 - 1) - s1 < p / 4.

The last summation is even simpler. We do not need to perform any computation. all the terms are smaller than 1 and quickly decreasing. It is then easy to prove that this summation is strictly smaller than 1.

Adding the two computations, we get our approximation.

Definition NsumV k :=
  let (_, res1) := NiterL k in
  let (_, _, res2) := NiterR k in res1 + res2.

We know that it is an under approximation and the error is less than .

We are now ready to define our function that extracts the digit:

Definition NpiDigit :=
  let delta := d + p / 4 + 1 in
  if (3 < p) then
    if 8 * delta < 2 ^ (p - 4) then
      let Y := 4 * (NsumV 1) +
             (9 * 2^ p -
             (2 * NsumV 4 + NsumV 5 + NsumV 6 + 4 * delta)) in
      let v1 := (Y + 8 * delta) mod 2 ^ p / 2 ^ (p - 4) in
      let v2 := Y mod 2 ^ p / 2 ^ (p - 4) in
      if v1 = v2 then Some v2 else None
    else None
  else None.

This deserves a few comments. In this function, the variable delta represents the error that is done by one application of NsumV. When adding the different sums, we are then going to make an overall error of 8 * delta. Moreover, we know that NsumV is an under approximation. The variable Y computes an under approximation of the result: for those sums that appear negatively, the under approximation is obtained adding delta to the sum before taking the opposite. This explains the fragment ... + 4 * delta that appears on the seventh line. Each of the sums obtained by NsumV actually is a natural number smaller than , when it is multiplied by a negative coefficient, this should be represented by . Accumulating all the compensating instances of leads to the fragment 9 * 2 ^ p - ... that appears on the sixth line.

After all these computations, Y + 8 * delta is an over approximation. If both Y and Y + 8 * delta give the same digit, we are sure that this digit is valid.

The correctness of the NpiDigit function is proved with respect to the definition of what is the digit at place in base of a real number , i.e. we take the integer part of and we take the modulo :

Definition Rdigit (b : nat) (d : nat) (r : R) :=
  (Int_part ((Rabs r) * (b ^ d))) mod b.

The correctness is simply stated as

Lemma NpiDigit_correct k :
  NpiDigit = Some k -> Rdigit 16 d PI = k.

Note that this is a partial correctness statement. A program that always returns None also satisfies this statement. If we look at the actual program, it is clear that one can precompute a that fulfills the first two tests, the equality test is another story. A long sequence of 0 (or F) may require a very high precision.

This program is executable but almost useless since it is based on a Peano representation of the natural numbers. Our next step was to derive an equivalent program using a more efficient representation of natural numbers, provided by the type BigN GregoireTheryIJCAR2006 . This code also receives some optimizations to implement faster operations of multiplications and divisions by powers of 2 and fast modular exponentiations.

Computing within Coq that 2 is the millionth decimal in hexadecimal of with a precision of 28 bits (27 are required for the first two tests and 28 for the equality test) takes less than 2 minutes. In order to reach the billionth decimal, we implement a very naive parallelization for a machine with at least four cores: each sum is computed on a different core generating a theorem then the final result is computed using these four theorems. With this technique, we get the millionth decimal, 2, in 25 seconds and the billionth decimal, 8, in 19 hours. Note that we could further parallelize inside the individual sums to compute partial sums and then use Coq theorems to glue them together.

3 Algorithms to compute based on arithmetic geometric means

In principle, all the mathematics that we had to describe formally in our study of arithmetic geometric means and the number are available from the mathematical litterature, essentially from the monograph by J. M. Borwein and P. B. Borwein BorweinAGM and the initial papers by R. Brent Brent76 , E. Salamin Salamin76 . However, we had difficulties using these sources as a reference, because they rely on an extensive mathematical culture from the reader. As a result, we were actually guided by a variety of sources on the world-wide web, including an exam for the selection of French high-school mathematical teachers CapesAGM95 . It feels useful to repeat these mathematical facts in a first section, hoping that they are exposed at a sufficiently elementary level to be understood by a wider audience. However, some details may still be missing from this exposition and they can be recovered from the formal development itself.

This section describes two algorithms, but their mathematical justification has a lot in common. The first algorithm that we present came to us as the object of an exam for high-school teachers CapesAGM95 , but in reality this algorithm is neither the first one to have been designed by mathematicians, nor the most efficient of the two. However, it is interesting that it brings us good tools to help proving the second one, which is actually more traditional (that second algorithm dates from 1976 Brent76 ; Salamin76 , and it is the one implemented in the mpfr library mpfr ) and more efficient (we shall see that it requires much less divisions).

In a second part of our study, we concentrate on the accumulation of errors during the computations and show that we can also prove bounds on this. This part of our study is more original, as it is almost never covered in the mathematical litterature, however it re-uses most of the results we exposed in a previous article Bertot:CPP15 .

3.1 Mathematical basics for arithmetic geometric means

Here we enumerate a large collection of steps that make it possible to go from the basic notion of arithmetic-geometric means to the computation of a value of

, together with estimates of the quality of approximations.

This is a long section, consisting of many simple facts, but some of the detailed computations are left untold. Explanations given between the formulas should be helpful for the reader to recover most of the steps. However, missing information can be found directly in the actual formal development PlouffeAGMSources .

The arithmetic-geometric process.

As already explained in section 1, the arithmetic-geometric mean of two numbers and is obtained by defining sequences and such that , and

A few tests using high precision calculators show that the two sequences and converge rapidly to a common value , with the number of common digits doubling at each iteration. The sequence provides over approximations and the sequence under approximations. Here is an example computation (for each line, we stopped printing values at the first differing digit between and ).

0 1 0.5
1 0.75 0.70…
2 0.7285… 0.7282…
3 0.72839552… 0.72839550…
4 0.7283955155234534… 0.7283955155234533…

The function also benefits from a scalar multiplication property:

(8)

For the sake of computing approximations of , we will mostly be interested in the sequences and stemming from and .

Elliptic integrals.

We will be interested in complete elliptic integrals of the first kind, noted . The usual definition of these integrals has the following form

(9)

But it can be proved that the following equality holds, when setting and , and using a change of variable (we only use the form ):

(10)

Note that the integrand in is symmetric, so that is also half of the integral with infinities as bounds. With the change of variables , then reasoning by induction and taking the limit, we also have the following equalities

(11)
Equivalence when and derivatives.

Another interesting property for elliptic integrals of the first kind can be obtained by the variable change on the integral on the right-hand side of equation (10).

(12)

Studying this integral when tends to gives the equivalences for and :

(13)

For the rest of this section, we will assume that is a value in the open interval and that and . Coming back to the sequences and , the following property can be established.

(14)

We can repeat times and use the fact that .

(15)

Still under the assumption of and , we define as follows:

(16)

Through separate calculation, involving Equation (13) and the definition of , we establish the following properties.

(17)

These derivatives converge uniformly to their limit. Moreover, the sequence of derivatives of is growing and converges uniformly. This guarantees that is also differentiable and and its derivative is the limit of the derivatives of . We can then obtain the following two equations, the second is our main central formula.

(18)

We define the functions and . These sequence satisfy

(19)

and the following important chain of comparisons.

(20)
Computing with and (the Borwein algorithm).

The first algorithm we will present, proposed by J. M. Borwein and P. B.Borwein, consists in approximating using the sequences and . The value is approximated using and using , all values being taken in .

From the definition, we can easily derive the following properties:

(21)

Repeating the products, we get the following definition of a sequence and the proof of its limit:

(22)
Convergence speed.

For an arbitrary in the open interval , using a Taylor expansion of the function of order two, and then reasoning by induction, we get the following results:

(23)

For , we obtain the following bound:

(24)

Using the comparisons of line (20) and then reasoning by induction we obtain our final error estimate:

(25)
Computing one million decimals.

The first element of the sequence that is close to with an error smaller than is obtained for satisfying the following comparison.

(26)

For one million hexadecimals, only needs to be larger than .

Computing with an infinite sum (the Brent-Salamin algorithm).

The formula described in this section probably appears in Gauss’ work and is repeated by King

King1924 . It was published and clarified for implementation on modern computers by Brent Brent76 and Salamin Salamin76 . A good account of the historical aspects is given by Almkvist and Berndt AlmkvistBerndt88 . Our presentation relies on a mathematical exposition given by Gourevitch Gourevitch99 .

In the variant proposed by Brent and Salamin, we compute the right-hand side of the main central formula by computing and the ratio . We first introduce a third function .

(27)

The derivative of function can be expressed with and after combination with equation 17, this gives a formula for the derivative of at .

(28)

The derivative of this ratio can be compared to the difference of the ratio of over at two successive indices, which can be repeated times.

(29)

We can then use equations (27) and (18), where is the limit of to obtain the final definition and limit.

(30)
Speed of convergence.

We can link the Brent-Salamin algorithm with the Borwein algorithm in the following manner:

(31)

Combining bounds (20), (24), and (25) we obtain this first approximation.

(32)

This first approximation is too coarse, as it gives the impression that is needed when is enough (the exponent of 2 in bound (32) is while it is in bound (25)). We can make it better by noting that the difference between and is and the difference between and is significantly smaller than , while not being .

(33)

For one million decimals of , we can still use .

Each algorithm computes square roots to compute or . However, the first one uses division to obtain value , while the second one only performs divisions by , which are less costly, and a single full division at the end of the computation to compute . In our experiments computing these algorithms inside Coq, the second one is twice as fast.

3.2 Formalization issues for arithmetic geometric means

In this section, we describe the parts of our development where we had to proceed differently from the mathematical exposition in section 3.1. Many difficulties arose from gaps in the existing libraries for real analysis.

The arithmetic geometric mean functions.

For a given and , the functions and actually are functions of and that are defined mutually recursively. Instead of a mutual recursion between two functions, we chose to simply describe a function ag that takes three arguments and returns a pair of two arguments. This can be written in the following manner:

Fixpoint ag (a b : R) (n : nat) :=
  match n with
    0 => (a, b)
  | S p => ag ((a + b) / 2) (sqrt (a * b)) p
  end.

This functions takes three arguments, two of which are real numbers, and the third one is a natural number. When the natural number is , then the result is the pair of the real numbers, thus expressing that and . When the natural number is the successor of some , then the two real number arguments are modified in accordance to the arithmetic-geometric mean process, and then the -th argument of the sequence starting with these new values is computed.

As an abbreviation we also use the following definitions, for the special case when the first input is .

Definition a_ (n : nat) (x : R) := fst (ag 1 x n).

Definition b_ (n : nat) (x : R) := snd (ag 1 x n).

The function ag_step seems to perform the operation in a different order, but in fact we can really show that and as expected, thanks to a proof by induction on . This is expressed with theorems of the following form:

Lemma a_step n x : a_ (S n) x = (a_ n x + b_ n x) / 2.

Lemma b_step n x : b_ (S n) x = sqrt (a_ n x * b_ n x).
Limits and filters.

Cartan Filtres-cartan proposed in 1937 a general notion that made it possible to develop notions of limits in a uniform way, whether they concern limits of continuous function of limits of sequences. This notion, known as filters is provided in formalized mathematics in Isabelle Isabelle-filters and more recently in the Coquelicot library BLM15 . It is also present in a simplified form as convergence nets in Hol-Light Harrison98 .

Filters are not real numbers, but objects designed to represent ways to approach a limit. There are many kinds of filters, attached to a wide variety of types, but for our purposes we will mostly be interested in seven kinds of filters.

  • eventually represents the limit towards , but only for natural numbers,

  • locally represents a limit approaching a real number from any side,

  • at_point represents a limit that is actually not a limit but an exact value: you approach because you are bound to be exactly ,

  • at_right represents a limit approaching from the right, that is, only taking values that are greater than (and not itself),

  • at_left represents a limit approaching from the left,

  • Rbar_locally p_infty describes a limit going to ,

  • Rbar_locally m_infty describes a limit going to .

There is a general notion called filterlim to express that the value returned by tends to a value described by the filter when its input is described by . For instance, we constructed formal proofs for the following two theorems:

Lemma lim_atan_p_infty :
  filterlim atan (Rbar_locally p_infty) (at_left (PI / 2)).

Lemma lim_atan_m_infty :
  filterlim atan (Rbar_locally m_infty) (at_right (-PI / 2)).

In principle, filters make it possible to avoid the usual proofs of topology and analysis, using faster techniques to relate input and output filters for continuous functions Isabelle-filters . In practice, for precise proofs like the ones above (which use the at_right and at_left filters), we still need to revert to a traditional framework.

Improper integrals.

The Coq standard library of real numbers has been providing proper integrals for a long time, more precisely Riemann integrals. The Coquelicot library adds an incomplete treatement of improper integrals on top of this. For improper integrals the bounds are described as limits rather than as direct real numbers. For the needs of this experiment, we need to be able to cut improper integrals into pieces, perform variable changes, and compute the improper integral

(34)

The Coquelicot library provides two predicates to describe improper integrals, the first one has the form111the name can be decomposed in is R for Riemann, Int for Integral, and gen for generalized.

is_Rint_gen \(f\) \(B_{1}\) \(B_{2}\) \(v\)
The meaning of this predicate is “the improper integral of function between bounds and converges and has value ”. The second predicate is named ex_Rint_gen and it simply takes the same first three arguments as is_Rint_gen, to express that there exists a value such that is_Rint_gen holds. The Coquelicot library does not provide a functional form, but there is a general tool to construct functions from relations where one argument is uniquely determined by the others, called iota in that library.

Concerning elliptic integrals, as a first step we need to express the convergence of the improper integral in equation (10). For this we need a general theorem of bounded convergence, which is described formally in our development, because it is not provided by the library. Informally, the statement is that the improper integral of a positive function is guaranteed to converge if that function is bounded above by another function that is known to converge. Here is the formal statement of this theorem:

Lemma ex_RInt_gen_bound (g : R -> R) (f : R -> R) F G
  {PF : ProperFilter F} {PG : ProperFilter G} :
  filter_Rlt F G ->
  ex_RInt_gen g F G ->
  filter_prod F G
    (fun p => (forall x, fst p < x < snd p -> 0 <= f x <= g x) /\
       ex_RInt f (fst p) (snd p)) ->
    ex_RInt_gen f F G.

This statement exhibits a concept that we needed to devise, the concept of comparison between filters on the real line, which we denote filter_Rlt. This concept will be described in further detail in a later section. Three other lines in this theorem statement deserve more explanations, the lines starting at filter_prod. These lines express that a property must ultimately be satisfied for pairs of real numbers whose components tend simultaneously to the limits described by the filters F and G, which here also serve as bounds for two generalized Riemann integrals. This property is the conjunction of two facts, first for any argument between the pair of numbers, the function f is non-negative and less than or equal to g at that argument, second the function f is Riemann-integrable between the pair of numbers.

Using this theorem of bounded convergence, we can prove that the function

is integrable between and as soon as both and are positive, using the function

as the bounding function, where , and then proving that this one is integrable by showing that its integral is related to the arctangent function.

Having proved the integrability, we then define a function that returns the following integral value:

The definition is done in the following two steps:

Definition ellf (a b : R) x :=
   /sqrt ((x ^ 2 + a ^ 2) * (x ^ 2 + b ^ 2)).

Definition ell (a b : R) :=
  iota (fun v => is_RInt_gen (ellf a b)
             (Rbar_locally m_infty) (Rbar_locally p_infty) v).

The value of ell is properly defined when and are positive. This is expressed with the following theorems, and will be guaranteed in all other theorems where ell occurs.

Lemma is_RInt_gen_ell a b : 0 < a -> 0 < b ->
  is_RInt_gen (ellf a b)
     (Rbar_locally m_infty) (Rbar_locally p_infty) (ell a b).

Lemma ell_unique a b v : 0 < a -> 0 < b ->
  is_RInt_gen (ellf a b)
     (Rbar_locally m_infty) (Rbar_locally p_infty) v ->
  v = ell a b.
An order on filters.

On several occasions, we need to express that the bounds of improper integrals follow the natural order on the real line. However, these bounds may refer to no real point. For instance, there is no real number that corresponds to the limit , but it is still clear that this limit represents a place on the real line which is smaller than or . This kind of comparison is necessary in the statement of ex_RInt_gen_bound, as stated above, because the comparison between functions would be vacuously true when the bounds of the interval are interchanged.

We decided to introduce a new concept, written filter_Rlt to express that when tends to and tends to , we know that ultimately . To be more precise about the definition of filter_Rlt, we need to know more about the nature of filters.

Filters simply are sets of sets. Every filter contains the complete set of elements of the type being considered, it is stable by intersection, and it is stable by the operation of taking a superset. Moreover, when a filter does not contain the empty set, it is called a proper filter. For instance, the filter Rbar_locally p_infty contains all intervals of the form and their supersets, the filter locally x contains all open balls centered in x and their supersets, and the filter at_right x contains the intersections of all members of locally x with the interval .

With two filters and on types and , it is possible to construct a product filter on , which contains all cartesian products of a set in and a set in and their supersets. This corresponds to pairs of points which tend simultaneously towards the limits described by and .

To define a comparison between filters on the real line, we state that is less than if there exists a middle point , so that the product filter accepts the set of pairs such that . In other words, this means that as tends to and to , it ultimately holds that . In yet other words, if there exists an such that the filter contains and contains , then is less than . These are expressed by the following definition and the following theorem:

Definition filter_Rlt F1 F2 :=
  exists m, filter_prod F1 F2 (fun p => fst p < m < snd p).

Lemma filter_Rlt_witness m (F1 F2  : (R -> Prop) -> Prop) :
  F1 (Rgt m) -> F2 (Rlt m) ->  filter_Rlt F1 F2.

We proved a few comparisons between filters, for instance at_right is smaller than Rbar_locally p_infty for any real , at_left is smaller than at_right if , but at_right is only smaller than at_left when .

We can reproduce for improper integrals the results given by the Chasles relations for proper Riemann integrals. Here is an example of a Chasles relation: if is integrable between and and , then is integrable between and and between , and the integrals satisfy the following relation:

This theorem is provided in the Coquelicot library for , , and taken as real numbers. With the order of filters, we can simply re-formulate this theorem for and being arbitrary filters, and being a real number between them. This is expressed as follows:

Lemma ex_RInt_gen_cut (a : R) (F G: (R -> Prop) -> Prop)
   {FF : ProperFilter F} {FG : ProperFilter G} (f : R -> R) :
   filter_Rlt F (at_point a) -> filter_Rlt (at_point a) G ->
   ex_RInt_gen f F G -> ex_RInt_gen f (at_point a) G.

We are still considering whether this theorem should be improved, using the filter locally instead of at_point for the intermediate integration bound.

The theorem ex_RInt_gen_cut is used three times, once to establish equation (11) and twice to establish equation (12) at page 11.

From improper to proper integrals.

Through variable changes, improper integrals can be transformed into proper integrals and vice-versa. For instance, the change of variable leading to equation (11) actually leads to the correspondence.

The lower bounds of the two integrals correspond to each other with respect to the variable change , but the first lower bound needs to be considered proper for later uses, while the lower bound for the second integral is necessarily improper. To make it possible to change from one to the other, we establish a theorem that makes it possible to transform a limit bound into a real one.

Lemma is_RInt_gen_at_right_at_point (f : R -> R) (a : R) F
  {FF : ProperFilter F v} :
  locally a (continuous f) -> is_RInt_gen f (at_right a) F v ->
  is_RInt_gen f (at_point a) F v.

This theorem contains an hypothesis stating that should be well behaved around the real point being considered, the lower bound. In this case, we use an hypothesis of continuity around this point, but this hypothesis could probably be made weaker.

Limit equivalence.

Equations (13.1) and (13.2) at page 13 rely on the concept of equivalent functions at a limit. For our development, we have not developed a separate concept for this, instead we expressed statements as the ratio between the equivalent functions having limit 1 when the input tends to the limit of interest. For instance equation (13.1) is expressed formally using the following lemma:

Lemma M1x_at_0 : filterlim (fun x => M 1 x / (- PI / (2 * ln x)))
                  (at_right 0) (locally 1).

In this theorem, the fact that tends to on the right is expressed by using the filter (at_right 0).

We did not develop a general library of equivalence, but we still gave ourself a tool following the transitivity of this equivalence relation. This theorem is expressed in the following manner:

Lemma equiv_trans F {FF : Filter F} (f g h : R -> R) :
  F (fun x => g x <> 0) -> F (fun x => h x <> 0) ->
  filterlim (fun x => f x / g x) F (locally 1) ->
  filterlim (fun x => g x / h x) F (locally 1) ->
  filterlim (fun x => f x / h x) F (locally 1).

The hypotheses like F (fun x => g x <> 0) express that in the vicinity of the limit denoted by F, the function should be non-zero. The rest of the theorem expresses that if is equivalent to and is equivalent to , then is equivalent to . To perform this proof, we need to leave the realm of filters and fall back on the traditional framework.

Uniform convergence and derivatives.

During our experiments, we found that the concept of uniform convergence does not fit well in the framework of filters as provided by the Coquelicot library. The sensible approach would be to consider a notion of balls on the space of functions, where a function is inside the ball centered in if the value of is never further from the value of than the ball radius, for every in the input type. One would then need to instantiate the general structures of topology provided by Coquelicot to this notion of ball, in particular the structures of UniformSpace and NormedModule. In practice, this does not provide all the tools we need, because we actually want to restrict the concept of uniform convergence to subsets of the whole type. In this case the structure of UniformSpace is still appropriate, but the concept of NormedModule is not, because two functions that differ outside the considered subset may have distance 0 when only considering their values inside the subset.

The alternative is provided by a treatment of uniform convergence that was developed in Coq’s standard library of real numbers at the end of the 1990’s, with a notion denoted CVU , where is a sequence of functions from to , is a function from to , is a number in and is a positive real number. The meaning is that the sequence of function converges uniformly towards inside the ball centered in of radius . We needed a formal description of a theorem stating that when the derivatives of a convergent sequence of functions tend uniformly to a limit function , this function is the derivative of the limit of the sequence .

There is already a similar theorem in Coq’s standard library, with the following statement:

derivable_pt_lim_CVU :
forall fn fn’ f g x c r,
Boule c r x ->
(forall y n, Boule c r y ->
                          derivable_pt_lim (fn n) y (fn’ n y)) ->
(forall y, Boule c r y -> Un_cv (fun n : nat => fn n y) (f y)) ->
CVU fn’ g c r ->
(forall y : R, Boule c r y -> continuity_pt g y) ->
derivable_pt_lim f x (g x)

However, this theorem is sometimes impractical to use, because it requires that we already know the limit derivative to be continuous, a condition that can actually be removed. For this reason, we developed a new formal proof for the theorem, with the following statement222It turns out that the theorem derivable_pt_lim_CVU was already introduced by a previous study on the implementation of in the Coq standard library of real numbers BertotAllais14 .

Lemma CVU_derivable :
forall f f’ g g’ c r,
 CVU f’ g’ c r ->
 (forall x, Boule c r x -> Un_cv (fun n => f n x) (g x)) ->
 (forall n x, Boule c r x ->
                      derivable_pt_lim (f n) x (f’ n x)) ->
 forall x, Boule c r x -> derivable_pt_lim g x (g’ x).

In this theorem’s statement, the third line expresses that the derivatives f’ converge uniformly towards the function g’, the fourth line expresses that the functions f converge simply towards the function g inside the ball of center c and radius r, the fifth and sixth line express that the functions f are differentiable everywhere inside the ball and the derivative is f’, and the seventh line concludes that the function g is differentiable everywhere inside the ball and the derivative is g’. While most of the theorems we wrote are expressed using concepts from the Coquelicot library, this one is only expressed with concepts coming from Coq’s standard library of real numbers, but all these concepts, apart from CVU, have a Coquelicot equivalent (and Coquelicot provides the foreign function interface): Boule c r x is equivalent to Ball c r x in Coquelicot, Un_cv is equivalent to filterlim Eventually (locally ), and derivable_pt_lim is equivalent to is_derive.

We used the theorem CVU_derivable twice in our development, once to establish that function is differentiable everywhere in the open interval and the sequence of derivatives of the functions converges to its derivative, and once to establish that the derivatives of the functions converge to , as in equation (18).

Automatic proofs.

In this development, we make an extensive use of divisions and square root. To reason about these functions, it is often necessary to show that the argument is non-zero (for division), or positive (for square root). There are very few automatic tools to establish this kind of results in general about real numbers, especially in our case, where we rely on a few transcendental functions. For linear arithmetic formulas, there exists a tool call psatzl R DBLP:conf/types/Besson06

, that is very useful and robust in handling of conjunctions and its use of facts from the current context. Unfortunately, we have many expressions that are not linear. We decided to implement a semi-automatic tactic for the specific purpose of proving that numbers are positive, with the following ordered heuristics:

  • Any positive number is non-zero,

  • all exponentials are positive,

  • , , and are positive,

  • the power, inverse, square root of positive numbers is positive,

  • the product of positive numbers is positive,

  • the sum of an absolute value or a square and a positive number is positive,

  • the sum of two positive numbers are positive,

  • the minimum of two positive numbers is positive,

  • a number recognized by the psatzl R tactic to be positive is positive.

This semi-automatic tactic can easily be implemented using Coq’s tactic programming language Ltac. We named this tactic lt0 and it is used extensively in our development.

Given a function like , the Coquelicot library provides automatic tools (mainly a tactic called auto_derive) to show that this function is differentiable under conditions that are explicitly computed. For this to work, the tool needs to rely on a database of facts concerning all functions involved. In this case, the database must of course contain facts about exponentiation, square roots, and the inverse function. As a result, the tactic auto_derive produces conditions, expressing that must be positive and the whole square root expression must be non zero.

The tactic auto_derive is used more than 40 times in our development, mostly because there is no automatic tool to show the continuity of functions and we rely on a theorem that states that any differentiable function is continuous, so that we often prove derivability only to prove continuity.

When proving that the functions and are differentiable, we need to rely on a more elementary proof tool, called auto_derive_fun. When given a function to derive, which contains functions that are not known in the database, it builds an extra hypothesis, which says that the whole expression is differentiable as soon as the unknown functions are differentiable. This is especially useful in this case, because the proof that is differentiable is done recursively, so that there is no pre-existing theorem stating that and are differentiable when studying the derivative of . For instance, we can call the following tactic:

auto_derive_fun (fun y => sqrt (a_ n y * b_ n y)); intros D.

This creates a new hypothesis named D with the following statement:

 D : forall x : R,
    ex_derive (fun x0 : R => a_ n x0) x /\
    ex_derive (fun x0 : R => b_ n x0) x /\
    0 < a_ n x * b_ n x /\ True ->
    is_derive (fun x0 : R => sqrt (a_ n x0 * b_ n x0)) x
      ((1 * Derive (fun x0 : R => a_ n x0) x * b_ n x +
        a_ n x * (1 * Derive (fun x0 : R => b_ n x0) x)) *
       / (2 * sqrt (a_ n x * b_ n x)))

Another place where automation provides valuable help is when we wish to find good approximations or bounds for values. The interval tactic interval works on goals consisting of such comparisons and solves them right away, as long as it knows about all the functions involved. Here is an example of a comparison that is easily solved by this tactic:

   (1 + ((1 + sqrt 2)/(2 * sqrt (sqrt 2))))
          / (1 + / sqrt (/ sqrt 2)) < 1

An example of expression where interval fails, is when the expressions being considered are far too large. In our case, we wish to prove that

The numbers being considered are too close to 0 for interval to work.

The solution to this problem is to first use monotonicity properties of either the logarithm function (in the current version of our development) or the exponential function (in the first version), thus resorting to symbolic computation before finishing off with the interval tactic.

The interval tactic already knows about the constant, so that it is quite artificial to combine our result from formula (25) and this tactic to obtain approximations of but we can still make this experiment and establish that the member of the sequence is a good enough approximation to know all first 10 digits of . Here is the statement:

Lemma first_computation :
   3141592653/10 ^ 9 < agmpi 3 /\
     agmpi 3 + 4 * agmpi 0 * Rpower 531 (- 2 ^ 2)
    < 3141592654/10 ^ 9.

We simply expand fully agmpi, simplify instances of and using the equations (21), and then ask the interval tactic to finish the comparisons. We need to instruct the tactic to use 40 bits of precision. This takes some time (about a second for each of the two comparisons), and we conjecture that the expansion of all functions leads to sub-expression duplication, leading also to duplication of work. When aiming for more distant decimals, we will need to apply another solution.

4 Computing large numbers of decimals

Theorem provers based on type theory have the advantage that they provide computation capabilities on inductive types. For instance, the Coq system provides a type of integers that supports comfortable computations for integers with size going up to . Here is an example computation, which feels instantaneous to the user.

Compute (2 ^331)%Z.
     = 174980057982640953949800178169409709228253554471456994914
  06164851279623993595007385788105416184430592
     : Z

By their very nature, real numbers cannot be provided as an inductive datatype in type theory. Thus the Compute command will not perform any computation for the similar expression concerning real numbers. The reason is that while some real numbers are defined like integers by applying simple finite operations on basic constants like 0 and 1, other are only obtained by applying a limiting process, which cannot be represented by a finite computation. Thus, it does not make sense to ask to compute an expression like in the real numbers, because there is no way to provide a better representation of this number than its definition. On the other hand, what we usually mean by computing is to provide a suitable approximation of this number. This is supported in the Coq system by the interval tactic, but only when we are in the process of constructing a proof, as in the following example:

Lemma anything : 12 / 10 < sqrt 2.
Proof.
interval_intro (sqrt 2).


1 subgoal

  H : 759250124 * / 536870912 <= sqrt 2 <= 759250125 * / 536870912
  ============================
   12 / 10 < sqrt 2

What we see in this dialog is that the system creates a new hypothesis (named H) that provides a new fact giving an approximation of . In this hypothesis, the common numerator appearing in both fractions is actually the number . Concerning notations, readers will have to know that Coq writes / a for the inverse of a, so that 3 * / 2 is 3 times the inverse of 2. A human mathematician would normally write 3 / 2 and Coq also accepts this syntax.

One may argue that 759250124 * / 536870912 is not much better than sqrt 2 to represent that number, and actually this ratio is not exact, but it can be used to help proving that is larger or smaller than another number.

Direct computation on the integer datatype can also be used to approximate computations in real numbers. For instance, we can compute the same numerator for an approximation of by computing the integer square root of .

Compute (Z.sqrt (2 * (2 ^ 29) ^ 2)).
     = 759250124%Z
     : Z

This approach of computing integer values for numerators of rational numbers with a fixed denominator is the one we are going to exploit to compute the first million digits of , using three advantages provided by the Coq system:

  1. The Coq system provides an implementation of big integers, which can withstand computations of the order of .

  2. The big integers library already contains an efficient implementation of integer square roots.

  3. The Coq system provides a computation methodology where code is compiled into OCaml and then into binary format for fast computation.

4.1 A framework for high-precision computation

If we choose to represent every computation on real numbers by a computation on corresponding approximations of these numbers, we need to express how each operation will be performed and interpreted. We simply provide five values and functions that implement the elementary values of and the elementary operations: multiplication, addition, division, the number 1, and the number 2.

We choose to represent the real number by the integer where is a scaling factor that is mostly fixed for the whole computation. For readability, it is often practical to use a power of 10 as a scaling factor, but in this paper, we will also see that we can benefit from also using scaling factors that are powers of 2 or powers of 16. Actually, it is not even necessary that the scaling factor be any power of a small number, but it turns out that it is the most practical case.

Conversely, we shall note the real value represented by the integer . Simply, this number is .

When is the scaling factor, the real number is represented by the integer and the real number is represented by the number . So , . So, we define the following two functions to describe the representations of and with respect to a given scaling factor, in Coq syntax where we use the name magnifier for the scaling factor.

Definition h1 (magnifier : bigZ) := magnifier.
Definition h2 magnifier := (2 * magnifier)%bigZ.

When multiplying two real numbers and , we need to multiply their representations and take care of the scaling. To understand how to handle the scaling, we should look at the following equality:

To obtain the integer that will represent this result, we need to multiply the product of the represented numbers by and then take the largest integer below. This is

The combination of the division operation and taking the largest integer below is performed by integer division. So we define our high-precision multiplication as follow.

Definition hmult (magnifier x y : bigZ) :=
  (x * y / magnifier)%bigZ.

For division and square root, we reason similarly.

For addition, nothing needs to be implemented, we can directly use integer computation. The scaling factor is transmitted naturally (and linearly from the operands to the result). Similarly, multiplication by an integer can be represented directly with integer multiplication, without having to first scale the integer.

Here are a few examples. To compute to a precision of , we can run the following computation.

Compute let magnifier := (10 ^ 5)%bigZ in
  hdiv magnifier magnifier (3 * magnifier).
     = 33333%bigZ
     : BigZ.t_

The following illustrates how to compute to the same precision.

Compute let magnifier := (10 ^ 5)%bigZ in
  hsqrt magnifier (2 * magnifier).
     = 141421%bigZ
     : BigZ.t_

In both examples, the real number of interest has the order of magnitude of and is represented by a 5 or 6 digit integer. When we want to compute one million decimals of we should handle integers whose decimal representation has approximately one million digits. Computation with this kind of numbers takes time. As an example, we propose a computation that handles the 1 million digit representation of

and avoids displaying this number (it only checks that the millionth decimal is odd).

Time Eval  native_compute in
   BigZ.odd (BigZ.sqrt (2 * 10 ^ (2 * 10 ^ 6))).
     = true
     : bool
Finished transaction in 91.278 secs (90.218u,0.617s) (successful)

This example also illustrates the use of a different evaluation strategy in the Coq system, called native_compute. This evaluation strategy relies on compiling the executed code in OCaml and then on relying on the most efficient variant of the OCaml compiler to produce a code that is executed and whose results are integrated in the memory of the Coq system full-throttle . This strategy relies on the OCaml compiler and the operating system linker in ways that are more demanding than traditional uses of Coq. Still it is the same compiler that is being used as the one used to compile the Coq system, so that the trusted base is not changed drastically in this new approach.

When it comes to time constraints, all scaling factors are not as efficient. In conventional computer arithmetics, it is well-known that multiplications by powers of 2 are less costly, because they can simply be implemented by shifts on the binary representation of numbers. This property is also true for Coq’s implementation of big integers. If we compare the computation of when the scaling factor is or , we get a performance ratio of 1.5, the latter setting is faster even though the scaling factor and the intermediate values are slightly larger.

It is also interesting to understand how to stage computations, so that we avoid performing the same computation twice. For this problem, we have to be careful, because values that are precomputed don’t have the same size as their original description, and this may not be supported by the native_compute chain of evaluation. Indeed, the following experiment fails.

Require Import BigZ.

Definition mag := Eval native_compute in (10 ^ (10 ^ 6))%bigZ.

Time Definition z1 := Eval native_compute in
   let v := mag in (BigZ.sqrt (v * BigZ.sqrt (v * v * 2)))%bigZ.

This examples makes Coq fail, because the definition of mag with the pragma Eval native_compute in makes that the value is precomputed, thus creating a huge object of the Gallina language, which is then passed as a program for the OCaml compiler to compile when constructing z1. The compiler fails because the input program is too large.

On the other hand, the following computation succeeds:

Eval native_compute in
   let v := (10 ^ (10 ^ 6))%bigZ in
   (BigZ.sqrt (v * BigZ.sqrt (v * v * 2))).

4.2 The full approximating algorithm

Using all elementary operations described in the previous section, we can describe the recursive algorithm to compute approximations of in the following manner.

Fixpoint hpi_rec (magnifier : bigZ)
  (n : nat) (s2 y z prod : bigZ) {struct n} : bigZ :=
  match n with
  | 0%nat =>
      hmult magnifier (h2 magnifier + s2) prod
  | S p =>
      let sy := hsqrt magnifier y in
      let ny := hdiv magnifier (h1 magnifier + y) (2 * sy) in
      let nz :=
        hdiv magnifier (h1 magnifier + hmult magnifier z y)
          (hmult magnifier (h1 magnifier + z) sy) in
      hpi_rec magnifier p s2 ny nz
        (hmult magnifier prod
           (hdiv magnifier (h1 magnifier + ny)
                           (h1 magnifier + nz)))
  end.

This function takes as input the scaling factor magnifier, a number of iteration n, the integer s2 representing , the integer y representing for some natural number larger than 0, the integer z representing , and the integer prod representing the value

It computes an integer approximating , but not exactly this number. The number s2 is passed as an argument to make sure it is not computed twice, because it is already needed to compute the initial values of y, z, and prod. This recursive function is wrapped in the following functions.

Definition hs2 (magnifier : bigZ) :=
  hsqrt magnifier (h2 magnifier).

Definition hsyz (magnifier : bigZ) :=
  let hs2 := hs2 magnifier in
  let hss2 := hsqrt magnifier hs2 in
  (hs2, (hdiv magnifier (h1 magnifier + hs2) (2 * hss2)), hss2).

Definition hpi (magnifier : bigZ) (n : nat) :=
match n with
| 0%nat =>
    (h2 magnifier + (hs2 magnifier))%bigZ
| S p =>
    let ’(s2, y1 , z1) := hsyz magnifier in
    hpi_rec magnifier p s2 y1 z1
      (hdiv magnifier (h1 magnifier + y1)
         (h1 magnifier + z1))
end.

We can use this function hpi to compute approximations of at a variety of precisions. Here is a collection of trials performed on a powerful machine.

scale(iterations)
time 9s 4s 5m30s 2m30s

This table illustrates the advantage there is to compute with a scaling factor that is a power of 2. Each column where the scaling factor is a power of 2 gives an approximation that is slightly more precise than the column to its left, at a fraction of the cost in time. Even if our objective is to obtain decimals of , it should be efficient to first perform the computations of all the iterations with a magnifier that is a power of 2, only to change the scaling factor at the end of the computation, this is the solution we choose eventually.

There remains a question about how much precision is lost when so many computations are performed with elementary operations that each provide only approximations of the mathematical operation. Experimental evidence shows that when computing 17 iterations with a magnifier of the last two digits are wrong. The next section shows how we prove bounds on the accumulated error in the concrete computation.

5 Proofs about approximate computations

When proving facts about approximate computations, we want to abstract away from the fact that the computations are performed with a datatype that provides fast computation with big integers. What really matters is that we approximate each operation on real numbers by another operation on real numbers and we have a clear description of how the approximation works. In the next section, we describe the abstract setting and the proofs performed in this setting. In a later section, we show how this abstract setting is related to the concrete setting of computing with integers and with the particular datatype of big integers.

5.1 Abstract reasoning on approximate computations

In the case of fixed precision computation as we described in the previous section, we know that all operations are approximated from below by a value which is no further than a fixed allowance . This does not guarantee that all values are approximated from below, because one of the approximated operations is division, and dividing by an approximation from below may yield an approximation from above.

For this reason, most of our formal proofs about approximations are performed in a section where we assume the existence of a collection of functions and their properties.

The header of our working section has the following content.

Variables (e : R) (r_div : R -> R -> R) (r_sqrt : R -> R)
           (r_mult : R -> R -> R).

Hypothesis ce : 0 < e < /1000.

Hypothesis r_mult_spec :
  forall x y, 0 <= x -> 0 <= y ->
   x * y - e < r_mult x y <= x * y.

In this header, we introduce a constant e, which is used to bound the error made in each elementary operation, we assume that e is positive and suitably small, and then we describe how each rounded operation behaves with respect to the mathematical operation it is supposed to represent. For multiplication, the hypothesis named r_mult_spec describes that the inputs are expected to be positive numbers, and that the result of r_mul x y is smaller than or equal to the product, but the difference is smaller than e in absolute value. We have similar specification hypotheses for the rounded division r_div and the rounded square root r_sqrt. We then use these rounded operations to describe the computations performed in the algorithm.

We can now study how the computation of the various sequences of the algorithm are rounded, and how errors accumulate. Considering the sequence , the computation at each step is represented by the following expression.

r_div (1 + y) (2 * (r_sqrt y))

In this expression, we have to assume that y comes from a previous computation, and for this reason it is tainted with some error h. The question we wish to address has the following form: if we know that is tainted with an error that is smaller that a given allowance , can we show that is tainted with an error that is smaller than for some well-behaved function ? How much bigger than must be?

We were able to answer two questions:

  • if the accumulated error on computing is smaller than , then the accumulated error on computing is also smaller than (so for the sequence , the function is the identity function),

  • the allowance needs to be at least (and not more).

This is quite surprising. Errors don’t really accumulate for this sequence.

In retrospect, there are good reasons for this. Rounding errors in the division operation make the result go down, but rounding errors in the square root make the result go up. On the other hand, the input value may be tainted by an error , but this error is only multiplied by the derivative of the function

It happens that this derivative never exceeds in the region of interest.

As an illustration, let’s assume , we want to compute , and we are working with three digits of precision. The value of is but it is rounded down to . is but the rounded computation give , is . In our computation, we actually compute . This is an over approximation of , but this is rounded down to : the last rounding down compensates the over-approximation introduced when dividing by the previously rounded down square root. If our input representation of is an approximation, for example we compute with or , we still obtain .

In the end, the lemma we are able to prove has the following statement.

Lemma y_error e’ y h :
  e’ < /10 -> e <= e’ / 2 -> 1 <= y <= 71/50 -> Rabs h < e’ ->
  let y1 := (1 + y)/(2 * sqrt y) in
  y1 - e’ < r_div (1 + (y + h)) (2 * (r_sqrt (y + h))) < y1 + e’.

The proof is organized in four parts, where the first part consists in replacing the operations with rounding by expressions where an explicit error ratio is displayed. We basically construct a value e1, taken in the interval , so that the following equality holds.

r_sqrt (y + h) = sqrt (y + h) + e1 *  e’

We prefer to define e1 as a ratio between constant bounds, rather than a value in an interval whose bounds are expressed in e’, because the automatic tactic interval handles values between numeric constants better. We do the same for the division, introducing a ratio e2.

The second part of the proof consists in showing that the propagated error from previous computations has limited impact on the final error. This is stated as follows.

set (y2 := (1 + (y + h)) / (2 * sqrt (y + h))).
assert (propagated_error : Rabs (y2 - y1) < e’ / 14).

This step is proved by applying the mean value theorem, using the derivative of the function , which was already computed during the proof of convergence of the sequence. The interval tactic is practical here to show the absolute value of the derivative of that function at any point between y and y + h is below . The mean value theorem makes it possible to factor out the input error in the comparisons, so that we eventually obtain a comparison of an expression with a constant, which we resolve using the interval tactic.

The other two parts of the proof are concerned with providing a bound for the impact of the rounding errors introduced by the current computation. Each part is concerned with one direction, and in each case only one of the two possible rounding errors need to be considered.

The proof for the lemma y_error is quite long (just under 100 lines), but this is only a preliminary step for the proof of lemma z_error, which shows that the errors accumulated when computing the sequence can also be bounded in a constant fashion. The statement of this lemma has the following shape.

Lemma z_error e’ y z h h’ :
  e’ < /50 -> e <= e’ / 4 -> 1 < y < 51/50 -> 1 < z < 6/5 ->
  Rabs h < e’ -> Rabs h’ < e’ ->
  let v := (1 + z * y)/((1 + z) * sqrt y) in
  v - e’ < r_div (1 + r_mult (z + h’) (y + h))
            (r_mult (1 + (z + h’)) (r_sqrt (y + h))) < v + e’.

In this statement, the fragment

   r_div (1 + r_mult (z + h’) (y + h))
       (r_mult (1 + (z + h’) (r_sqrt (y + h)))

represents the computed expression with rounding operations, using inputs that are tainted by errors h and h’, while the fragment

(1 + z * y) /((1 + z) * sqrt y)

represents the ratio .

This proof is more complex. In this case, we are also able to show that errors do not grow as we compute more elements of the sequence: they stay stable at about 4 times the elementary rounding error introduced by each rounding operation. The proof of this lemma is around 170 lines long.

The next step in the computation is to compute the product of ratios . For each ratio, we establish a bound on the error as expressed by the following lemma.

Lemma quotient_error : forall e’ y z h h’, e’ < / 40 ->
  Rabs h < e’ / 2 -> Rabs h’ < e’ -> e <= e’ / 4 ->
  1 < y < 51 / 50 -> 1 < z < 6 / 5 ->
  Rabs (r_div (1 + (y + h)) (1 + (z + h’)) -
              (1 + y)/(1 + z)) <  13 / 10 * e’.

The difference between the second hypothesis (on Rabs h) and the third hypothesis Rabs h’ handles the fact that we don’t have as precise a bound on error for the computation of and for . The result is that the error on the ratio is bounded at a value just above 5 times the elementary error e.

It remains to prove a bound on the error introduced when computing the iterated product. This is done by induction on the number of iterations. The following lemma is used as the induction step: when p represents the product of terms and v represents one of the ratios, the product of p and v with accumulated errors, adding the error for the rounded multiplication increases by the error on the ratio.

Lemma product_error_step :
  forall p v e1 e2 h h’, 0 <= e1 <= /100 -> 0 <= e2 <= /100 ->
    e < /5 * e2 -> /2 < p < 921/1000 ->
    /2 < v <= 1 -> Rabs h < e1 -> Rabs h’ < e2 ->
    Rabs (r_mult (p + h) (v + h’) - p * v) < e1 + 23/20 * e2.

At this point we write functions rpi_rec and rpi so that they mirror exactly the functions hpi_rec and