1 Exact real computation
Exact real computation is an approach to numerical computation where real numbers appear as first class objects and the semantics of operations over them exactly agrees with the usual mathematical operations. In practice, real number objects support an extraction of approximations to any requested arbitrarily high accuracy. This approach is applied not only to real numbers but also to elements of other continuum spaces, for example, real continuous functions, real differentiable functions, real analytic functions and various subsets of Euclidean spaces. Exact real computation is an implementation of the theory of computable analysis. For background on computability in analysis seee.g., [25, 22, 29, 24].
We start with a brief overview of how real numbers are represented and used in AERN2, our Haskell library for exact real computation.
An approximation to a real number with a known bound of the approximation error is represented by a “ball” with a centre and a non-negative radius , where is the set of dyadic numbers. Let be the set of such balls (equivalently, intervals with dyadic endpoints) and .
The centre and radius are represented in a floating-point-like format111Currently, the centre and radius are MPFR numbers by default.. The centre has an unlimited precision (except for physical computer constraints) and the radius has a mantissa precision fixed at 53 bits. To emphasise the use of multi-precision numbers, the ball type is called MPBall in our Haskell implementation222To reproduce the script, start ghci using stack ghci aern2-fnreps:exe:aern2-fnreps-ops or equivalent.:
An MPBall implicitly holds a floating-point precision, which defines how much rounding is used in operations involving the ball. In binary operations, the higher of the two precisions is used. This is apparent in the following computations:
1.2 Bottom-up type derivations
MPBall implements the standard Haskell numerical type classes Num, Fractional and Floating. Nevertheless, by default AERN2 modules use mixed-types-num333https://hackage.haskell.org/package/mixed-types-num, an altered version of the Haskell standard Prelude, in which the numerical and related type classes are replaced by different ones, inspired by dynamically-typed languages. The type of an expression is derived bottom-up, i.e., integer literals are always of type Integer and rational literals of type Rational, binary operations and relations support operands of different types, defining the result type depending on the operand types:
Partial operations such as division return values in an error-collecting monad, instead of throwing an exception:
The latter example above indicates a potential error instead of a certain error because the ball t1-t1 contains zero but also non-zero values and the ball t1 is treated as a set of values, in which only one is the intended one but we do not know which one. The dependency between the two copies of t1 is lost.
The name CN refers to a type function whose canonical name is much longer, which means that types reported by ghc are harder to read:
For convenient propagation, values in this monad can be used as operands:
When there is no risk of a numerical error, one can use alternative, exception-throwing operations:
1.3 Partial ball comparisons
Interval and ball comparisons can be blurred when balls overlap. To facilitate safe and convenient ball comparisons, comparison relations over MPBall do not return a Bool as in standard Prelude but Maybe Bool:
For convenience, there are also “surely” and “maybe” comparison relations that return a Bool:
1.4 Real numbers
Exact real numbers in AERN2 are represented as functions from an accuracy specification to an MPBall. Accuracy is specified using bitsSG s g where s is a strict bound on the ball radius in terms of bits and g is an indicative guide for the size of the error:
Comparisons for real numbers are not decidable. This means that real number comparisons cannot have a simple Boolean return type. In AERN2, such comparisons return a function from accuracy to Maybe Bool, which allows us to try and decide comparison relation with a certain effort and test whether it succeeded or failed:
Also, due to the infinite nature of real numbers, partial functions into the real numbers usually cannot simply return a CN CauchyReal. Instead, partial real functions usually return a CauchyRealCN, which encodes a function from accuracy to CN MPBall:
The above example also demonstrates that the formatting function show, when applied to a real number, uses a default accuracy.
1.5 Landscape of representations
Various approaches to representing real numbers have been proposed and implemented. Some theoretically oriented works represent real numbers as steams of signed binary digits or generalised digits. For example, Escardó’s RealPCF  uses streams of contractive affine refinements and Potts et al’s IC-Reals  uses streams of contractive linear fractional transformations.
According to the results of the competitions [3, 21], stream-based approaches tend to be relatively slow. The fastest implementations of exact real arithmetic seem to be iRRAM , written in C++. This package first uses interval/ball arithmetic at a fixed precision. If the precision is insufficient to decide branch conditions or final results are not sufficiently accurate, the computation is scrapped and restarted with a higher precision. This approach is repeated until the program succeeds. In this approach there are no real number objects but the program as a whole has a real number semantics. AERN2 allows one to write a program featuring real numbers as an abstract type and execute the same program either in the iRRAM manner or using exact real objects.
There seems to be a consensus that the most practically feasible representation of real numbers is via sequences or nets of interval/ball approximations, whether or not the sequences appear directly as first-class objects or is obtained indirectly via iRRAM-style re-computations with increasing precision.
The situation is much less clear regarding the representation of continuous real functions. There are many candidate representations and comparing their computational complexity and practical performance is a matter of ongoing research. This paper gives an overview of the most prominent candidate representations and their implementations in AERN2, while our other paper  focuses on comparing these representations. We first review which operations involving continuous functions are to be implemented for these representations.
2 Operations of interest on
In this section we switch attention to the space of continuous real functions over the interval and various operations of interest over this space. First of all, it should be possible to evaluate a real function at a point:
Currently, our main applications for the arithmetic over are range computation and integration for unary real functions:
In AERN2 the operation is expressed as follows:
The type class constraint CanMaximiseOverDom f d declares that the specific representation type f can use the function maximumOverDom with domains of type d. The type function MaximumOverDomType specifies how the resulting real number will be represented. The domain is typically an Interval Dyadic and the result is typically either MPBall or CauchyReal. If the function is a ball-like function approximation similar to MPBall, the result is an MPBall, and, if the function is exact, the result is a CauchyReal.
The integral operation is represented in AERN2 analogously, using the function integrateOverDom, type class CanIntegrateOverDom, and type function
Typically, a unary continuous real function is given by a symbolic expression with one real variable, such as:
Nevertheless, sometimes a function is given without a symbolic representation, for example, if it is a solution of a differential equation or it comes from another “black box” or external source. Here we focus on computation that can be applied not only to functions that are given symbolically, but to any (unary, bounded-domain) continuous real function. (In the following section we define a number of representations that can accommodate all functions.) We will therefore not make any use of a symbolic representation even when we have one.
To build new functions from existing functions, we should be able to apply common real operations pointwise to continuous functions:
We also require pointwise applications of common trigonometric functions. In Haskell and AERN2, there is no need to define new syntax for these pointwise function operations. We simply make our function types instances of the numeric type classes and use the standard operation syntax. For example, the parameter x in for our polymorphic function bumpy can be a function as well as a number:
where x_BF is the identity function over real numbers and UnaryBallFun is one of our representations of functions. The result function of type UnaryBallFun is built by the pointwise function operations , , , , and as well as by implicit coercions of integers into constant functions.
The identity function over some interval domain d is typically defined as follows:
is a dummy variable name. The polymorphic functionvarFn requires a variable name so that it can be used also for building multi-variate projection functions . The type of the variable name depends on the function type. In our case the variable type is always the unit type () with the unique dummy value ().
We consider also a few non-pointwise operations, namely composition, primitive function and parametric maximisation, although AERN2 does not yet have implementations of these for all of our representations:
Note that differentiation is not a computable operation on the subset of differentiable functions in .
3 Representations of
In this section we introduce a number of representations of and their AERN2 Haskell implementations. The Haskell implementations actually support also partial real functions using the CN monad for the result values. Nevertheless, we will focus only on total functions in this paper.
Some representations encode a real function as a program-level function that returns approximations of the functions at different points or over small intervals. Other representations use convergent collections of polynomials or similar approximations, each close to the function over the whole of its domain. Finally, locally approximating representations combine features of both of these approaches.
3.1 Point-evaluating representations
Package aern2-fun defines the following representations of 444Actually, these representations support over any compact real interval , not only .:
BFun, Haskell type UnaryBallFun, encodes an by a Haskell function of type CN MPBall -> CN MPBall, with the properties:
DBFun, Haskell type UnaryBallDFun, encodes an by a pair of UnaryBallFun, with representing exactly as in BFun, and representing the (partially defined) derivative in the following sense:
(For convenience, the pair is encoded as two elements of a list.)
Fun, Haskell type UnaryModFun, encodes an by a pair of Haskell functions:
with the properties:
Variants of Fun are typically used in studies that focus on computability. The function is a (localised) modulus of continuity of the encoded function.
Variants of BFun are typically used in implementations of exact real arithmetic. BFun is usually implemented via an arbitrary-precision interval arithmetic. As there are good implementations of interval arithmetic, BFun is a sensible practical choice. DBFun is a simple adaptation of BFun for (almost everywhere) differentiable functions.
3.1.1 Basic Operations
Pointwise operations, constant function and identity function constructions are implemented for BFun and DBFun by the usual interval extensions of the operations and functions (and also their derivatives in case of DBFun). For example, pointwise division for DBFun is the following mapping on pairs of partial ball functions:
Pointwise operations for Fun encodings are more subtle than those for BFun due to the need to define a valid modulus of continuity for the result. For example, the Fun reciprocal is the following mapping of function encodings:
where returns an integer with , and is the smallest or the second-smallest integer with this property.
3.2 Globally approximating representations
The representations Fun, BFun and DBFun all have in common that they are local in the sense that they describe a function a point at a time, or a small neighbourhood at a time. The following representations are global in the sense that they provide information about a function over its complete interval domain at once. The simplest such representation describes a function by an indexed family of “polynomial balls” that converge to the function.
3.2.1 Polynomial balls
A polynomial ball is a pair where the centre is a univariate polynomial with dyadic coefficients and is a non-negative dyadic radius.
Let denote the set of polynomials with dyadic coefficients
and the set of all polynomial balls.
For we write iff
For , let .
We use the AERN2 type ChPoly MPBall to represent . ChPoly t is a type of polynomials in the Chebyshev basis with coefficients of type t555Note that a polynomial has dyadic coefficients in the Chebyshev basis iff it has dyadic coefficients in the monomial basis.. Technically, ChPoly MPBall stands for polynomials with ball coefficients, but we use only a subset which coincides with polynomial balls. To do this, we enforce the invariant that all coefficients except the coefficient of the constant term have zero radius, i.e., they are dyadic numbers. The radius of the constant term is the radius of the polynomial ball. The following example illustrates the basics of working with type ChPoly MPBall:
Note that when constructing a ChPoly MPBall, it is necessary to specify an interval domain for the ball. Since the Chebyshev basis works well only on the domain , the type ChPoly t contains a specification of a dyadic interval domain which is transparently translated to the internal domain . In the above example we used the domain . Repeating it over the domain gives a different degree reduction:
The quality of the polynomial degree reduction is down to using the Chebyshev basis. In the monomial basis, a degree reduction over would replace the quadratic term with , leading to the polynomial [1 <2^0] + [2 0]*x. Over the domain it would lead to [3 <2^1] + [2 0]*x.
Normally, in AERN2 code we do not directly reduce the degree of polynomials. Instead, there is automatic “sweeping”, ie dropping insignificant Chebyshev terms, in most operations. Terms are sweeped as much as possible while respecting a given accuracy guide. The accuracy guide is an accuracy value embedded in each polynomial ball. In the above examples, it is part of the parameter for varFn, namely bits 1, which roughly corresponds to a permitted accuracy loss of . In our simple example, we see the effect of the automatic sweeping only when we make the accuracy guide extremely loose:
3.2.2 Rational function balls
A rational function ball over the domain is a pair of polynomial balls , over and a dyadic such that for each dyadic . Note that is either strictly positive or strictly negative over .
Let denote the set of all rational function balls over the domain .
For we write iff .
The nominal width of is defined as
where , respectively , is the lower bound, respectively upper bound, of over , computed by the maximisation algorithm described in Section 4 with accuracy .
Within our implementation, rational function balls are represented by the Haskell type Frac MPBall. A value of the type Frac MPBall is a record formed of two ChPoly MPBall values, one for the denominator and one for the numerator, and an MPBall which is a lower bound to the denominator on the domain.
3.2.3 Piecewise polynomial balls
A piecewise polynomial ball over the domain comprises:
A dyadic partition ,
A family .
While each is defined over , it is only used over the domain . For example, the width of is
In our implementation, piecewise polynomials are given by the Haskell type PPoly MPBall. A value of PPoly MPBall is a record formed of a value of type DyadicInterval, representing the domain, and a list of pairs of type (DyadicInterval, ChPoly MPBall) representing the partition of the domain together with the polynomial approximations on each piece of the partition.
3.2.4 Cauchy representations
Sequences of polynomials or rational functions are used in the following representations of :
Poly encodes an as a sequence , Haskell type Accuracy -> ChPoly MPBall, with and .
PPoly encodes an as a sequence of piecewise polynomial balls converging to . The sequence has the Haskell type Accuracy -> PPoly, with and .
Frac encodes an as a sequence of rational function balls converging to . The sequence has the Haskell type Accuracy -> Frac, with and .
Note that these sequences are not necessarily fast converging, i.e., the rate of convergence is not necessarily . Although our implementations aim for this fast rate, we have not found an efficient way to guarantee it.
3.3 Locally approximating representations
Now we define representations that combine features of both point-evaluating representations and globally-approximating representations where a polynomial or rational approximation with arbitrarily high accuracy is available over any dyadic sub-interval of the domain .
The representation encodes by
a dependent-type function that maps each to a Poly-name of .
Its Haskell type is
DyadicInterval -> Accuracy -> PPoly.
Representations and are defined analogously.
4 Range computation
4.1 Range computation for evaluation-based representations
The representations , , and use a simple maximisation algorithm based on subdivision. All three representations encode a function via an interval inclusion. Our maximisation algorithm takes as input an interval inclusion and returns as output the global maximum as a real number.
Our maximisation algorithm has to take into account that the types UnaryBallFun and UnaryBallDFun produce outputs of type CN MPBall, which may represent either an interval value or an error. In order to model this mathematically, let us introduce the space of dyadic rational intervals with a bottom element added. This bottom element is intended to represent an undefined value. We say that a point is defined if it is different from . The arithmetic operations on extend to by letting the result of the operation be if any of the operands is . We say that is certainly smaller than if both and are defined and the right endpoint of is smaller than the left endpoint of .
A maximisation segment for a function is a tuple with and . We define a total preorder on the set of all maximisation segments of as follows: If and are maximisation segments then if and only if is undefined or is defined and the right endpoint of is smaller than the right endpoint of .
4.2 Real root counting
Our maximisation algorithm for representations based on (local) polynomial or rational approximations is essentially an improved version of Algorithm 1. It enhances Algorithm 1 with a real root counting technique that allows us to locally identify critical points and regions of monotonicity of the polynomial approximation.
This real root counting technique goes back to Vincent  and Uspensky  and is based on counting sign variations in the Bernstein basis. We follow here the presentation in [1, Chapter 10] (see also the bibliographical notes there).
Let be a polynomial of degree at most with integer coefficients. Let
be some bounded open interval. Our goal is to find a good estimate for the number of real roots ofin (counted with multiplicity).
We start with an elementary observation known as Descartes’s law of signs: the number of sign variations in the coefficients666When counting the sign variations, zeroes are ignored, i.e., the number of sign variations in a list is equal to the number of sign variations of the list with all zeroes removed (with the convention that the number of sign variations in the empty list is zero). of is an upper bound for the number of positive real roots, counted with multiplicity, and the difference between the number of sign variations and the number of positive real roots is even. In particular, if there are no sign variations then has no positive real roots, and if there is exactly one sign variation then has a unique (and simple) positive real root. In order to use this idea to estimate the number of roots in we apply a transformation to to obtain a polynomial whose positive real roots correspond to the roots of in . We define three basic transformations:
Now consider the polynomial
Intuitively, the roots in the open interval are first shifted to the interval , then contracted into the interval , then their reciprocal is taken, sending them to , and finally they are shifted to . Thus, the roots of in correspond to the roots of in . Interestingly, the application of this transformation can be viewed as a change of basis: The Bernstein polynomials of degree for are
These polynomials form a partition of unity and a basis of polynomials of degree at most . One can show that is just the representation of in the Bernstein basis for of degree . Hence, we can estimate the number of real roots of on by first translating into the Bernstein basis and then counting the sign variations. Let us proceed to show that this estimate is sufficiently good to yield a polynomial-time root counting algorithm.
If is a list of numbers, we denote by the number of sign variations in this list. Let us denote the coefficients of in the Bernstein basis of degree for by . Let denote the space of compact intervals with rational endpoints, including degenerate intervals (i.e., points). The following Lemma combines Descartes’s law of signs with the “Theorem of three circles” from .
Let , let be a compact rational interval. Then
If then has no roots in .
If then has a unique root in .
If has no complex roots in the disk with diameter , then .
If has a unique simple complex root in the union of the two disks which circumscribe the equilateral triangles based in , then .
The coefficients of a polynomial in the Bernstein basis can be computed in polynomial time. If the coefficients for an interval are known, the coefficients for subintervals and can be computed from these coefficients in a more efficient manner:
Lemma 2 ().
There exists a polytime algorithm which takes as input a polynomial and an interval and outputs the list of coefficients of .
There exists a polytime algorithm which takes as input an interval , a point (not necessarily in ), and the list of coefficients of a polynomial represented in the Bernstein basis for , and outputs the coefficients of in the Bernstein bases for and for respectively.
The root counting technique we have sketched here can be used to isolate all real roots of a polynomial in an interval in polynomial time. In order to get rid of multiple roots, we first compute the separable part of , which can be done in polynomial time using signed subresultant sequences (see e.g., [1, Algorithm 8.23]). Then we translate to , which by Lemma 2.1 can be done in polynomial time, and count the real roots. If the result is different from or , we use Lemma 2.2 to compute and and apply this idea recursively, removing all intervals with zero sign variations, keeping all intervals with one sign variation, and splitting all intervals with more sign variations. For a proof that this will take polynomial time see [1, Algorithm 10.5].
This in turn yields an algorithm for computing the global maximum of a polynomial on an interval in polynomial time: isolate the real roots of the separable part of the derivative . Use the bisection method to approximate the roots up to sufficient accuracy. Evaluate on the approximate roots and on the endpoints of the interval and take the maximum over the list of results.
Our maximisation algorithm can be viewed as a combination of this idea with the subdivision scheme used in Algorithm 1.
4.3 A generic interface for range computation
Our algorithm provides a generic interface that allows us to use it in several different contexts. For an interval , let denote its radius.
Let be a real function defined on some compact interval. A tuple
is called local maximisation data for on with accuracy if:
for all intervals .
for some and all .
whenever in the Hausdorff metric.
for some polynomial which has the same roots in as the derivative of some function with for all .
and have the same sign for all .
Note that the third condition in particular applies to degenerate intervals, and hence implies that for all .
Let be a real function defined on some compact interval. A maximiser for is a function
such that is local maximisation data for on with accuracy , subject to the following two monotonicity conditions:
If and then the size of the last three components of is smaller than the size of the last three components of .
If and then, if and , then for all .
Note that the inequality for and in the second condition is the opposite of the inequality in the first condition.
The generic interface is used by the representations , , and as well as by their local counterparts. Recall that a -name of a function is a function
for all . A -name can be viewed as a special case of this, where is independent of and .
Given we can compute a maximiser for as follows: Given , compute the polynomial and translate it to the monomial basis. Let
where is the evaluation function of the polynomial
, the vectorrepresents in the Bernstein basis on , where is the of the denominators of the coefficients of in the monomial basis, and . Here, the Bernstein coefficients are computed from a representation in the monomial basis as outlined in the beginning of Section 4.2.
Analogously, a -name of a function is a function
for all . Again, a -name can be viewed as a special case of this, where is independent of and .
Given we can compute a maximiser for as follows: Given , compute the rational function and translate both and into the monomial basis. Let
where is the evaluation function of the rational function , , being the of the denominators of the coefficients of in the monomial basis, and .
4.4 The maximisation algorithm for approximation-based representations
Let be a continuous real function. Let be local maximisation data for , where . A maximisation interval for with data is given by a union type with two variants:
A search interval is a tuple
where , contains the maximum of over the interval , and for some constant .
A critical interval is a tuple
where , and contains the maximum of over the interval
In both cases we call and the endpoints of the maximisation interval, the value, and the accuracy.
Maximisation intervals are endowed with the following total preorder: Let and be maximisation intervals for , not necessarily associated with the same local approximation data. Then if and only if the right endpoint of the value of is smaller than the right endpoint of the value of . Our algorithm relies on two auxiliary algorithms for creating search intervals:
We are now ready to describe our maximisation algorithm.
Algorithm 4 is correct.
The queue will never be empty, for whenever an interval is removed from the priority queue, either the algorithm terminates or at least one new interval is added to the queue. The algorithm will terminate eventually, for if a maximisation interval is removed from the queue, then by the Lipschitz condition on the function (Definition 3.3) and the monotonicity of the maximiser (Definition 4.2) the radius of the values of the maximisation intervals which are added to the queue is at most half the radius of the value of , and the algorithm terminates as soon as an interval with sufficiently small radius is removed.
Let . By construction, in every iteration of the loop, one of the maximisation intervals in the queue contains . We claim that if an interval with value is removed from the queue then has to contain . Since contains a value of , we have . Thus, if does not contain the maximum, then . But there exists some interval in the queue which contains the maximum of . Let denote its value. Then we have and hence , contradicting the fact that is removed first. It follows that if is removed and the radius of its value is smaller than , then is a valid output. ∎
In contrast to Algorithm 4, Algorithm 5 uses a single global approximation rather than local approximations. It processes every element of the priority queue in every step, rather than just looking at the “most promising” element. It thus simulates a certain worst-case scenario for Algorithm 4. By the monotonicity assumption on maximisers, the running time of Algorithm 4 is majorised by the running time of Algorithm 5. In order to estimate the running time of Algorithm 5 we need a more technical estimate on the growth of the bitsize of the coefficients in the Bernstein basis.
Lemma 7 ().
Consider the algorithm from Lemma 2.2 which computes from the Bernstein coefficients , the coefficients and . If is a bound on the bitsize of the elements of and is a bound on the bitsize of and , then the bitsize of the elements of and is bounded by .
Algorithm 5 runs in polynomial time.
Let be the local maximisation data which is used by the algorithm, where . The main thing to show is that the number of intervals the algorithm processes is bounded polynomially in and the size of . The intervals and coefficients considered in the algorithm can be arranged in a binary tree as follows:
Label the root with the interval .
If is the label of a node of the tree…
If has diameter smaller than , then the node is a leaf. Otherwise…
If the node is a leaf.
If the node is a leaf.
If the node has two successors, labelled respectively with and .
It follows from the Lipschitz condition on that the height of the tree is bounded by . At each level of the tree, consider the number of nodes with . By Lemma 1.3, each such node can be associated with a complex root of the polynomial . Hence there are at most nodes which aren’t leaves at each level. It follows that the number of nodes in the tree is bounded polynomially in and . It remains to show that the size of the Bernstein coefficients associated with each interval is bounded polynomially in and the bitsize of . Let us write for the number of terms in . Let denote a bound on the bitsize of the elements of and denote a bound on the bitsize of and . Then by Lemma 7, the size of and is bounded by and the size of is bounded by . It follows that the size of the Bernstein coefficients on the