    # Certified Roundoff Error Bounds using Bernstein Expansions and Sparse Krivine-Stengle Representations

Floating point error is a drawback of embedded systems implementation that is difficult to avoid. Computing rigorous upper bounds of roundoff errors is absolutely necessary for the validation of critical software. This problem of computing rigorous upper bounds is even more challenging when addressing non-linear programs. In this paper, we propose and compare two new algorithms based on Bernstein expansions and sparse Krivine-Stengle representations, adapted from the field of the global optimization, to compute upper bounds of roundoff errors for programs implementing polynomial and rational functions. We also provide the convergence rate of these two algorithms. We release two related software package FPBern and FPKriSten, and compare them with the state-of-the-art tools. We show that these two methods achieve competitive performance, while providing accurate upper bounds by comparison with the other tools.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Theoretical models, algorithms, and programs are often analyzed and designed in real algebra. However, their implementation on computers often uses floating point algebra: this conversion from real numbers and their operations to floating point is not without errors. Indeed, due to finite memory and binary encoding in computers, real numbers cannot be exactly represented by floating point numbers. Moreover, numerous properties of the real algebra are not preserved such as associativity.

The consequences of such imprecisions become particularly significant in safety-critical systems, especially in embedded systems which often include control components implemented as computer programs. When implementing an algorithm designed in real algebra, and initially tested on computers with single or double floating point precision, one would like to ensure that the roundoff error is not too large on more limited platforms (small processor, low memory capacity) by computing their accurate upper bounds.

For programs implementing linear functions, SAT/SMT solvers as well as affine arithmetic are efficient tools to obtain good upper bounds. When extending to programs with non-linear polynomial or rational functions, the problem of determining a precise upper bound becomes substantially more difficult, since polynomial optimization problems are in general NP-hard . We can cite at least three closely related and recent frameworks designed to provide upper bounds of roundoff errors for non-linear programs. FPTaylor  is a tool based on Taylor-interval methods, while Rosa  combines SMT with interval arithmetic. Real2Float  relies on Putinar representations of positive polynomials while exploiting sparsity in a similar way as the second method that we propose in this paper.

The contributions of this paper are two methods, coming from the field of polynomial optimization, to compute upper bounds on roundoff errors of programs involving polynomial or rational functions. The first method is based on Bernstein expansions of polynomials, while the second relies on sparse Krivine-Stengle certificates for positive polynomials. In practice, these methods (presented in Section 3) provide accurate bounds at a reasonable computational cost. Indeed, the size of the Bernstein expansions used in the first method as well as the size of the LP relaxation problems considered in the second method are both linear w.r.t. the number of roundoff error variables.

### 1.1 Overview

Before explaining in detail each method, let us first illustrate the addressed problem on an example. Let be the degree two polynomial defined by:

 f(x):=x2−x ,∀x∈X=[0,1].

When approximating the value of at a given real number , one actually computes the floating point result , with all the real operators ,, being substituted by their associated floating point operators , , , and being represented by the floating point number (see Section 2.1 for more details on floating point arithmetics). A simple rounding model consists of introducing an error term for each floating point operation, as well as for each floating point variable. For instance, corresponds to , where is the error term between and , and is the one associated to the operation . Let

be the vector of all error terms

. Given for all , with being the machine precision, we can write the floating point approximation of as follows:

 ^f(x,\eb)=(((1+e1)x(1+e1)x)(1+e2)−x(1+e1))(1+e3).

Then, the absolute roundoff error is defined by:

 r(x,\eb):=maxx∈[0,1]\eb∈[−ε,ε]3(|^f(x,\eb)−f(x)|)  .

However, we can make this computation easier with a slight approximation: with being the sum of the terms of which are linear in , and the sum of the terms which are non-linear in . The term can then be over-approximated by which is in general negligible compared to , and can be bounded using standard interval arithmetic. For this reason, we focus on computing an upper bound of . In the context of our example, is given by:

 l(x,\eb)=(2x2−x)e1+x2e2+(x2−x)e3. (1)

We divide each error term by , and then consider the (scaled) linear part of the roundoff error with the error terms . For all , and , one can easily compute a valid upper bound of with interval arithmetic. Using the same notation for elementary operations in interval arithmetic, one has , yielding .
Using the first method based on Bernstein expansions detailed in Section 3.1, we obtained as an upper bound of after s of computation using FPBern(b) a rational arithmetic implementation. With the second method based on sparse Krivine-Stengle representation detailed in Section 3.2, we also obtained an upper bound of in s.
Although on this particular example, the method based on sparse Krivine-Stengle representations appears to be more time-efficient, in general the computational cost of the method based on Bernstein expansions is lower. For this example, the bounds provided by both methods are tighter than the ones determined by interval arithmetic. We emphasize the fact that the bounds provided by our two methods can be certified. Indeed, in the first case, the Bernstein coefficients (see Sections 2.2 and 3.1) can be computed either with rational arithmetic or certified interval arithmetic to ensure guaranteed values of upper bounds. In the second case, the nonnegativity certificates are directly provided by sparse Krivine-Stengle representations.

### 1.2 Related Works

We first mention two tools, based on positivity certificates, to compute roundoff error bounds. The first tool, related to , relies on an approach similar to our second method. It uses dense Krivine-Stengle representations of positive polynomials to cast the initial problem as a finite dimensional LP problem. To reduce the size of this possibly large LP, 

provides heuristics to eliminate some variables and constraints in the dense representation. However, this approach has the main drawback of loosing the property of convergence toward optimal solutions of the initial problem. Our second method uses sparse representations and is based on the previous works

 and , allowing to ensure the convergence towards optimal solutions while greatly reducing the computational cost of LP problems. Another tool, Real2Float , exploits sparsity in the same way while using Putinar representations of positive polynomials, leading to solving semidefinite (SDP) problems. Bounds provided by such SDP relaxations are in general more precise than LP relaxations , but their solving cost is higher.
Several other tools are available to compute floating point roundoff errors. SMT solvers are efficient when handling linear programs, but often provide coarse bounds for non-linear programs, e.g. when the analysis is done in isolation . The Rosa  tool is a solver mixing SMT and interval arithmetic which can compile functional SCALA programs implementing non-linear functions (involving operations and polynomials) as well as conditional statements. SMT solvers are theoretically able to output certificates which can be validated externally afterwards. FPTaylor tool  relies on Symbolic Taylor expansion method, which consists of a branch and bound algorithm based on interval arithmetic. Bernstein expansions have been extensively used to handle systems of polynomial equations and inequalities, as well as polynomial optimization (see for example [9, 10, 11, 12]). In , the authors provide a method to extend the range of Bernstein expansions to handle the case of rational function over a box. This approach consists of expanding both numerators and denominators. Yet, to the best of our knowledge, there is no tool based on Bernstein expansions in the context of roundoff error computation. The Gappa tool provides certified bounds with elaborated interval arithmetic procedure relying on multiple-precision dyadic fractions. The static analysis tool FLUCTUAT  performs forward computation (by contrast with optimization) to analyze floating point C programs. Both FLUCTUAT and Gappa use a different rounding model (see Section 2.1), also available in FPTaylor, that we do not handle in our current implementation. Some tools also allow formal validation of certified bounds. FPTaylor, Real2Float , as well as Gappa  provide formal proof certificates, with HOL-Light  for the first case, and Coq  for the two other ones.

### 1.3 Key Contributions

Here is a summary of our key contributions:

• We present two new algorithms to compute upper bounds of floating point roundoff errors for programs involving multivariate polynomials. The first algorithm is based on Bernstein expansions and handle programs implementing rational functions with box constrained input sets. The second algorithm relies on sparse Krivine-Stengle representations and handles programs implementing polynomial functions with input sets defined as conjunctions of finitely many polynomial inequalities. We also propose a theoretical framework to guarantee the validity of upper bounds computed with both algorithms (see Section 3). In addition, we give an alternative shorter proof in Section 2.4 for the existence of Krivine-Stengle representations for sparse positive polynomials (proof of Theorem 5). We study in Section 3.3 the convergence rate of the two algorithms towards the maximal value of the linear part of the roundoff error.

• We release two software packages based on each algorithm. The first one, called FPBern, computes the bounds using Bernstein expansions, with two modules built on top of the C++ software related to : FPBern(a) is a module using double precision floating point arithmetic while the second module FPBern(b) uses rational arithmetic. The second one FPKriSten computes the bounds using Krivine-Stengle representations in Matlab. FPKriSten is built on top of the implementation related to .

• We compare our two methods implemented in FPBern and FPKriSten to three state-of-the-art methods. Our new methods have precisions comparable to that of these tools (Real2Float, Rosa, FPTaylor). At the same time, FPBern(a) and FPBern(b) show an important time performance improvement, while FPKriSten has similar time performances compared with the other tools, yielding promising results.

This work is the follow-up of our previous contribution . The main novelties, both theoretical and practical, are the following: in , we could only handle polynomial programs with box input constraints. For , the extension to rational functions relies on . We brought major updates to the C++ code of  (b). For , an extension to semialgebraic input sets was already theoretically possible in  with the hierarchy of LP relaxations based on sparse Krivine-Stengle representations, and in this current version, we have updated Section 3.2 accordingly to handle this more general case. We have carefully implemented this extension in our software package in order to not compromise efficiency. Additional benchmarks provided in Section 4 illustrate the abilities of both algorithms to tackle a wider range of numerical programs. Another novelty is the complexity analysis of the two algorithms in the case of polynomial programs with box constrained input sets. This study is inspired by the framework presented in , yielding error bounds in the context of polynomial optimization over the hypercube.

The rest of the paper is organized as follows: in Section 2, we give basic background on floating point arithmetic, Bernstein expansions and Krivine-Stengle representations. In Section 3 we describe the main contributions, that is the computation of roundoff error bounds using Bernstein expansions and sparse Krivine-Stengle representations. Finally, in Section 4 we compare the performance and precision of our two methods with the existing tools, and show the advantages of our tools.

## 2 Preliminaries

We first recall useful notation on multivariate calculus. For and the multi-index , we denote by the product . We also define , and .
The notation is the nested sum . Equivalently is equal to the nested product .
Given another multi-index , the inequality (resp. ) means that the inequality holds for each sub-index: (resp. ). Moreover, the binomial coefficient is the product .
Let be the vector space of multivariate polynomials. Given , we associate a multi-degree to , with each standing for the degree of with respect to the variable . Then, we can write , with (also denoted by ) being the coefficients of in the monomial basis and each is a multi-index. The degree of is given by . As an example, if then and . For the polynomial used in Section 1.1, one has and .

### 2.1 Floating Point Arithmetic

This section gives background on floating point arithmetic, inspired from material available in [2, Section 3]. The IEEE754 standard  defines a binary floating point number as a triple of significant, sign, and exponent (denoted by ) which represents the numerical value . The standard describes 3 formats (32, 64, and 128 bits) which differ by the size of the significant and the exponent, as well as special values (such as NaN, the infinities). Let be the set of floating point numbers, the rounding operator is defined by the function which takes a real number and returns the closest floating point number rounded to the nearest, toward zero, or toward . A simple model of rounding is given by the following formula:

 rnd(x)=x(1+e)+u,

with , and . The value is the maximal relative error (given by the machine precision ), and is the maximal absolute error for numbers very close to . For example, in the single (32 bits) format, is equal to while equals . In general is negligible compared to , thus we neglect terms depending on in the remainder of this paper.
Given an operation op : , let be the corresponding floating point operation. An operation is exactly rounded when for all .
In the IEEE754 standard the following operations are defined as exactly rounded: and the fma operation333The fma operator is defined by fma()=.. It follows that for these operations we have the continuation of the simple rounding model .
The previous rounding model is called “simple” in contrast with more improved rounding model. Given the function , then the improved rounding model is defined by: , for all . As the function pc is piecewise constant, this rounding model needs design of algorithms based on successive subdivisions, which is not currently handled in our methods. In the tools FLUCTUAT, Gappa, and FPTaylor , combining branch and bound algorithms with interval arithmetic is adapted to roundoff error computation with such rounding model.

### 2.2 Bernstein Expansions of Polynomials

In this section we give background on the Bernstein expansion, which is important to understand the contribution detailed subsequently in Section 3.1. Given a multivariate polynomial , we recall how to compute a lower bound of where . The next result can be retrieved in [21, Theorem 2]:

###### Theorem 1 (Multivariate Bernstein expansion).

Given a multivariate polynomial and with the multi-degree of , then the Bernstein expansion of multi-degree of is given by:

 f(x)=∑\gbf\gb\xb\gb=∑\ab≤\kbb(f)\abB\kb,\ab(x). (2)

where (also denoted by when there is no confusion) are the Bernstein coefficients (of multi-degree ) of , and are the Bernstein basis polynomials defined by and . The Bernstein coefficients of are given as follows:

 b\ab=∑\bb<\ab(\ab\bb)(\kb\bb)f\bb,\bzero≤\ab≤\kb. (3)

Bernstein expansions having numerous properties, we give only four of them which are useful for Section 3.1. For a more exhaustive introduction to Bernstein expansions, as well as proofs of the basic properties, we refer the interested reader to .

###### Property 1 (Cardinality [10, (3.14)]).

The number of Bernstein coefficients in the Bernstein expansion of multi-degree is equal to

###### Property 2 (Linearity [10, (3.2.3)]).

Given two polynomials and , one has:

 b(cf1+f2)\ab=cb(f1)\ab+b(f2)\ab ,∀c∈\R,

where the above Bernstein expansions are of the same multi-degree.

###### Property 3 (Enclosure [10, (3.2.4)]).

The minimum (resp. maximum) of a polynomial over can be lower bounded (resp. upper bounded) by the minimum (resp. maximum) of its Bernstein coefficients:

 min\ab≤\kbb\ab≤f(\xb)≤max\ab≤\kbb\ab,  ∀\xb∈[0,1]n.
###### Property 4 (Sharpness [10, (3.2.5)]).

If the minimum (resp. maximum) of the is reached at coinciding with a corner of the box , then is the minimum (resp. maximum) of over .

Property 1 gives the maximal computational cost needed to find a lower bound of for a Bernstein expansion of fixed multi-degree . Property 3 is used to bound from below optimal values, while Property 4 allows determining in some cases if the lower bound is optimal.

### 2.3 Bounds of Rational Functions with Bernstein Expansions

In this section we recall how to obtain bounds for the range of multivariate rational functions by using Bernstein expansions. The following result can be found in [13, Theorem 3.1].

###### Theorem 2.

Let of respective multi-degrees and . Given , let us denote by and the Bernstein coefficients of multi-degree for and , respectively. Let us assume that is positive over and that all Bernstein coefficients are positive. Then, for , one has

 min\ab≤\kbb(f1)\abb(f2)\ab≤f(\xb)≤max\ab≤\kbb(f1)\abb(f2)\ab,  ∀\xb∈[0,1]n. (4)

Note that to use the bounds from Theorem 2, one first has to compute the Bernstein coefficients of for sufficiently large multi-degree in order to ensure that the denominators in (4) do not vanish. We emphasize that our assumption on the sign of is equivalent to suppose that for all .

### 2.4 Dense and Sparse Krivine-Stengle Representations

In this section, we first give the necessary background on Krivine-Stengle representations, used in the context of polynomial optimization. Then, we present a sparse version based on . These notions are applied later in Section 3.2.

#### 2.4.1 Dense Krivine-Stengle representations

Krivine-Stengle certificates for positive polynomials can first be found in [22, 23] (see also [24, Theorem 1(b)]). Such certificates give representations of positive polynomials over a compact set , with . We denote . The compact set is called a basic semialgebraic set, that is a set defined by a conjunction of finitely many polynomial inequalities. In the sequel, we assume that and that ( are among the polynomials in the definition of . This implies that the family generates as an -algebra, which is a necessary assumption for Theorem 3.
Given and , let us define the polynomial .
For instance on the two-dimensional unit box, one has , . With and , one has .

###### Theorem 3 (Dense Krivine-Stengle representations).

Let be a positive polynomial over . Then there exist and a finite number of nonnegative weights such that:

 \pfunc(\xb)=∑|\ab+\bb|≤kλ\ab,\bbh\ab,\bb(\xb),∀\xb∈Rn. (5)

We denote by the set of polynomials having a dense Krivine-Stengle representation (of degree at most ) as in (5). It is possible to compute the weights by identifying in the monomial basis the coefficients of the polynomials in the left and right sides of (5). Given and denoting by the monomial coefficients of , with , the fulfill the following equalities:

 \pfunc\gb=∑|\ab+\bb|≤kλ\ab,\bb(h\ab,\bb)\gb,∀\gb∈Nn\kh. (6)

#### 2.4.2 Global optimization using the dense Krivine-Stengle representations

Here we consider the polynomial minimization problem , with a polynomial of degree . We can rewrite this problem as the following infinite dimensional problem:

 f––:= maxt∈R  t, (7) s.t.  f(\xb)−t≥0,∀\xb∈\cSet.

The idea is to look for a hierarchy of finite dimensional linear programming (LP) relaxations by using Krivine-Stengle representations of the positive polynomial involved in Problem (7). Applying Theorem 3 to this polynomial, we obtain the following LP problem for each :

 f––k:=maxt,λ\ab,\bb t, (8) s.t (f−t)\gb=∑|\ab+\bb|≤kλ\ab,\bb(h\ab,\bb)\gb,∀\gb∈Nn\kh, λ\ab,\bb≥0.

Note that . As in [24, (4)], one has:

###### Theorem 4 (Dense Krivine-Stengle LP relaxations).

The sequence of optimal values satisfies as . Moreover each is a lower bound of .

At fixed , the total number of variables of LP (8) is given by the number of and , that is , where is the dimension of . The number of constraints is equal to the cardinality of , which is . We recall that . In the particular case where is an hypercube, the LP has variables and constraints.

#### 2.4.3 Sparse Krivine-Stengle representations

We now explain how to derive less computationally expensive LP relaxations, by relying on sparse Krivine-Stengle representations. For , let be the ring of polynomials restricted to the variables . We borrow the notion of a sparsity pattern from [7, Assumption 1]:

###### Definition 1 (Sparsity Pattern).

Given , , and for all , a sparsity pattern is defined by the four following conditions:

• can be written as: with ,

• for all , for all ,

• and ,

• (Running Intersection Property) for all , there exists s.t. .

As an example, the four conditions stated in Definition 1 are satisfied while considering on the hypercube . Indeed, one has , with , . Taking and , one has for all , .

Let us consider a given sparsity pattern as stated above. By noting , , then the set yields subsets , with . If is a compact subset of then each is a compact subset of . As in the dense case, let us note , for given .

The following result, a sparse variant of Theorem 3, can be retrieved from [7, Theorem 1] but we also provide here a shorter alternative proof by using .

###### Theorem 5 (Sparse Krivine-Stengle representations).

Let be given and assume that there exist and , , which satisfy the four conditions stated in Definition 1. If is positive over , then there exist , such that and over . In addition, there exist and finitely many nonnegative weights , , such that:

 ϕj=∑|\abj+\bbj|≤kλ\abj,\bbjh\abj,\bbj ,j=1,…,m. (9)
###### Proof.

From [6, Lemma 3], there exist , , such that and on . Applying Theorem 3 on each , there exist and finitely many nonnegative weights such that By taking , we complete the representations with as many zero as necessary to obtain the desired result.∎

As in Section 2.4.1, we note the set of functions with sparse Krivine-Stengle representations (of degree at most ) given in Theorem 5.

In Theorem 5, one assumes that can be written as the sum , where each is not necessarily positive. The first result of the theorem states that can be written as another sum , where each is now positive. As in the dense case, the can be computed by equalizing the coefficients in the monomial basis. We also obtain a hierarchy of LP relaxations to approximate the solution of polynomial optimization problems. For the sake of conciseness, we only provide these relaxations as well as their computational costs in the particular context of roundoff error bounds in Section 3.2.

## 3 Two new algorithms to compute roundoff error bounds

This section is dedicated to our main contributions. We provide two new algorithms to compute absolute roundoff error bounds using either Bernstein expansions or sparse Krivine-Stengle representations. The first algorithm, denoted by , takes as input a program implementing the expression of a rational function , with variables satisfying input constraints encoded by the product of closed intervals. After adequate change of variables, we assume without loss of generality that . The second algorithm, denoted by , takes as input a program implementing a polynomial expression , with variables satisfying input constraints encoded by a basic compact semialgebraic set , as in Section 2.4.1.

Following the simple rounding model described in Section 2.1, we denote by the rounded expression of after introduction of the rounding variables (one additional variable is introduced for each real variable or constant as well as for each arithmetic operation ,, or ). For a given machine epsilon , these error variables also satisfy a set of constraints encoded by the box .
As explained in [4, Section 3.1], we can decompose the roundoff error as follows: , where . One obtains an enclosure of using interval arithmetic to bound second-order error terms in the Taylor expansion of w.r.t.  (as in [2, 4]).
Let us note the interval enclosure of , with and . For each , we also define on . After dividing each error variable by , we now consider the optimization of the (scaled) linear part of the roundoff error.

### 3.1 Bernstein expansions of roundoff errors

The first method is the approximation of (resp. ) by using Bernstein expansions of the polynomials involved in .

#### 3.1.1 Polynomial expressions

We start with the simpler case where is the scaled linear part of the roundoff error of a program implementing a polynomial expression . Let be the multi-degree of . Note that is greater than the multi-degree of , appearing in the definition of , for all . Our procedure is based on the following proposition:

###### Proposition 1.

For each , the polynomial can be bounded as follows:

 l–′\kb≤l′(\xb,\eb)≤¯l′\kb,∀(\xb,\eb)∈\X×\E, (10)

with and .

###### Proof.

We write the polynomial for a given . Property 3 provides the enclosure of w.r.t. for a given :

 min\ab≤\kbb(l′\eb)\ab≤l′\eb(\xb)≤max\ab≤\kbb(l′\eb)\ab,∀\xb∈[0,1]n, (11)

where each Bernstein coefficient satisfies by Property 2 (each being a scalar in ). The proof of the left inequality comes from:

 min\eb∈[−1,1]m(min\ab≤\kb(m∑j=1ejb(sj)\ab)) =min\ab≤\kb(min\eb∈[−1,1]m(m∑j=1ejb(sj)\ab)) =min\ab≤\kbm∑j=1−|b(sj)\ab| =−max\ab≤\kbm∑j=1|b(sj)\ab|  .

The proof of the right inequality is similar.∎

###### Remark 1.

By Property 1, the computational cost of is since we need to compute the Bernstein coefficients for each . This cost is polynomial in the degree and exponential in but is linear in .

#### 3.1.2 Rational function expressions

We now consider the more general case where is the scaled linear part of the roundoff error of a program implementing a rational expression with for all . Let and be the multi-degrees of and and . For all , we can write each , appearing in the definition of , as , with and of multi-degrees less than , and for all .
We extend Proposition 1 as follows.

###### Proposition 2.

For each , the rational function can be bounded as follows:

 l–′\kb≤l′(\xb,\eb)≤¯l′\kb,∀(\xb,\eb)∈\X×\E, (12)

with and .

###### Proof.

We first handle the case when for some , there exists such that . In this case, one has and , so both inequalities trivially hold.
Next, let us assume that all considered Bernstein coefficients of are positive. By Theorem 2, one has for all :

 min\ab≤\kbb(pj)\abb(q2j)\ab≤pj(\x)qj(\x)2≤max\ab≤\kbb(pj)\abb(q2j)\ab,∀\xb∈[0,1]n,

yielding

 −max\ab≤\kb|b(pj)\ab||b(q2j)\ab|≤pj(\x)qj(\x)2ej≤max\ab≤\kb|b(pj)\ab||b(q2j)\ab|,∀\xb∈[0,1]n,

which implies the desired result. ∎

#### 3.1.3 Algorithm \fpbern

The algorithm , stated in Figure 1 takes as input , the set of bound constraints over , a rational function expression , the corresponding rounded expression with rounding variable and the set of bound constraints over . From Line line:r to Line line:iabound, the algorithm is implemented exactly as in   as well as in  . The absolute roundoff error is decomposed as the sum of an expression and a remainder and the enclosure of is computed thanks to a subroutine performing basic interval arithmetics. The main difference between and (resp. ) is that the enclosure of is obtained in Line line:blbelow thanks the computation of the Bernstein expansion as in Proposition 2.