Floating-point numbers are widely used to represent real numbers in computer programs since they offer a good trade-off between efficiency and precision. The round-off error of a floating-point expression is the difference between the ideal computation in real arithmetic and the actual floating-point computation. These round-off errors accumulate during numerical computations. Besides having a direct effect on the result of mathematical operations, round-off errors may significantly impact the control flow of a program. This happens when the guard of a conditional statement contains a floating-point expression whose round-off error makes the actual Boolean value of the guard differ from the value that would be obtained assuming real arithmetic. In this case, the conditional statement is called an unstable test. Unstable tests are an inherent feature of floating-point programs. In general, it is not possible to completely avoid them. However, it is possible to mitigate their effect by transforming the original program into another program that conservatively (and soundly) detects and corrects unstable tests.
This paper presents a program transformation technique to transform a given program into a new one that returns either the same result of the original program or a warning when the real and floating-point flows may diverge. This transformation is parametric with respect to two Boolean abstractions that take into consideration the round-off error in the expressions occurring in the guard. The transformation replaces the unstable conditions with more restrictive conditions that are guaranteed to preserve the control flow of stable tests. The correctness of the proposed transformation is formally verified in the Prototype Verification System (PVS) [OwreRS92].
The remainder of the paper is organized as follows. sec:fp_err provides technical background on floating-point numbers and round-off errors. The proposed program transformation technique is presented in sec:transformation. sec:polycarp illustrates this technique by transforming the core logic of an algorithm for polygon containment that is part of a geofencing system developed by NASA. sec:related discusses related work and sec:concl concludes the paper.
2 Round-Off Errors and Unstable Tests
A floating-point number can be formalized as a pair of integers , where is called the significand and the exponent of the float [BoldoMunoz06, Daumas2001]. A floating-point format is defined as a pair of integers , where is called the precision and is called the minimal exponent. For instance, IEEE single and double precision floating-point numbers are specified by the formats and , respectively. A canonical float is a float such that is either a normal or subnormal. A normal float is a float such that the significand cannot be multiplied by the radix and still fit in the format. A subnormal float is a float having the minimal exponent such that its significand can be multiplied by the radix and still fit in the format. Henceforth, will denote the set of floating-point numbers in canonical form and the expression will denote a floating-point number in . A conversion function is defined to refer to the real number represented by a given float, i.e., .
The expression denotes the floating-point number in format closest to . The format will be omitted when clear from the context. Let be a floating-point number that represents a real number , the difference is called the round-off error (or rounding error) of with respect to .
2.1 Unstable tests
Given a set of pre-defined floating-point operations, the corresponding set of operations over real numbers, a finite set of variables representing real values, and a finite set of variables representing floating-point values, where and are disjoint, the sets and of arithmetic expressions over real numbers and over floating-point numbers, respectively, are defined by the following grammar.
where , , , , , , , . It is assumed that there is a function that associates to each floating-point variable a variable representing the real value of . The function converts an arithmetic expression on floating-point numbers to an arithmetic expression on real numbers. It is defined by simply replacing each floating-point operation with the corresponding one on real numbers and by applying and to floating-point values and variables, respectively.
Boolean expressions are defined by the following grammar.
where and . The conjunction , disjunction , negation , , and have the usual classical logic meaning. The symbols and denote the domain of Boolean expressions over real and floating-point numbers, respectively. The function converts a Boolean expression on floating-point numbers to a Boolean expression on real numbers. Given a variable assignment , denotes the evaluation of the real Boolean expression . Similarly, given and , denotes the evaluation of the floating-point Boolean expression .
The expression language considered in this paper contains binary and -ary conditionals, let expressions, arithmetic expressions, and a warning exceptional statement. Given a set of function symbols, the syntax of program expressions in is given by the following grammar.
where , , , and . The notation denotes a list of branches.
A program is a function declaration of the form , where are pairwise distinct variables in and all free variables appearing in are in . The natural number is called the arity of . The set of programs is denoted as .
When if-then-else guards contain floating-point expressions, the output of the considered program is not only directly influenced by rounding errors, but also by the error of taking the incorrect branch in the case of unstable tests.
Definition 1 (Conditional Instability)
A function declaration is said to have an unstable conditional when its body contains a conditional statement of the form and there exist two assignments and such that for all , and . Otherwise, the conditional expression is said to be stable.
In other words, a conditional statement (or test) is unstable when there exists an assignments from the free variables in to such that evaluates to a different Boolean value with respect to its real valued counterpart . In these cases, the program is said to follow an unstable path, otherwise, when the flows coincide, it is said to follow a stable path.
2.2 Floating-Point Denotational Semantics
This section presents a compositional denotational semantics for the expression language of Formula (1) that models both real and floating-point path conditions and outputs. This semantics is a modification of the one introduced in [MoscatoTDM17] and [TitoloFMM18]
. The proposed semantics collects for each combination of real and floating-point program paths: the real and floating-point path conditions, two symbolic expressions representing the value of the output assuming the use of real and floating-point arithmetic, respectively, and a flag indicating if the element refers to either a stable or an unstable path. This information is stored in aconditional tuple.
Definition 2 (Conditional Tuple)
A conditional tuple is an expression of the form , where , , , , and .
Intuitively, indicates that if the condition is satisfied, the output of the ideal real-valued implementation of the program is and the output of the floating-point execution is . The sub-index is used to mark by construction whether a conditional tuple corresponds to an unstable path, when , or to a stable path, when . The element represents the output of the warning construct. Let be the set of all conditional error bounds, and be the domain formed by sets of conditional error bounds.
An environment is defined as a function mapping a variable to a set of conditional tuples, i.e., . The empty environment is denoted as and maps every variable to the empty set .
Given , the semantics of program expressions is defined in fig:sem as a function that returns the set of conditional tuples representing the possible real and floating-point computations and their corresponding path conditions. The operator denotes the least upper bound of the domain of conditional error bounds.
The semantics of a variable consists of two cases. If belongs to the environment, then the variable has been previously bound to a program expression through a let-expression. In this case, the semantics of is exactly the semantics of . If does not belong to the environment, then is a parameter of the function. Here, a new conditional error bound is added with a placeholder representing the real value of . The semantics of a floating-point arithmetic operation is computed by composing the semantics of its operands. The real and floating-point values are obtained by applying the corresponding arithmetic operation to the values of the operands. The new conditions are obtained as the combination of the conditions of the operands. The semantics of the expression updates the current environment by associating with variable the semantics of expression .
The semantics of the conditional uses an auxiliary operator .
Definition 3 (Condition propagation operator)
Given and , if , otherwise it is undefined. The definition of naturally extends to sets of conditional tuples: given , .
The semantics of and are enriched with the information about the fact that real and floating-point control flows match, i.e., both and have the same value. In addition, new conditional tuples are built to model the unstable cases when real and floating-point control flows do not coincide and, therefore, real and floating-point computations diverge. For example, if is satisfied but is not, the branch is taken in the floating-point computation, but the would have been taken in the real one. In this case, the real condition and its corresponding output are taken from the semantics of , while the floating-point condition and its corresponding output are taken from the semantics of . The condition is propagated in order to model that holds but does not. The conditional tuples representing this case are marked with .
Similarly, the semantics of an n-ary conditional is composed of stable and unstable cases. The stable cases are built from the semantics of all the program sub-expressions by enriching them with the information stating that the correspondent guard and its real counter-part hold and all the previous guards and their real counterparts do not hold. All the unstable combinations are built by combining the real parts of the semantics of a program expression and the floating-point contributions of a different program expression . In addition, the operator is used to propagate the information that the real guard of and the floating-point guard of hold, while the guards of the previous branches do not hold.
3 Program Transformation
In this section, a program transformation is proposed for detecting when round-off errors affect the evaluation of floating-point conditionals and for ensuring that when the floating-point control flow diverges from the real one a warning is issued. The proposed transformation takes into account round-off errors by abstracting the Boolean expressions in the guards of the original program. This is done by means of two Boolean abstractions .
Given , let be the set of free variables in . For all , , and such that , and satisfy the following properties.
Property 1 states that for all floating-point Boolean expressions , implies both and its real counterpart. Symmetrically, Property 2 ensures that implies both the negation of and the negation of its real counterpart.
The Boolean abstractions and can be instantiated as follows for conjunctions and disjunction of sign tests. Properties 1 and 2 are formally proven in PVS to hold for the following definitions of and . Let and such that .
The abstractions performed for sign tests are not correct for generic inequalities of the form . In this case, to compensate for the round-off errors of both expressions, additional floating-point operations must be performed. Thus, the round-off error generated by such operations needs to be considered as well to obtain a sound approximation. The naive application of this strategy leads to a non-terminating transformation. The design of an effective approximation for these generic inequalities is left as future work.
The program transformation is defined as follows.
Definition 4 (Program Transformation)
Let be a floating-point program that does not contain any statements, the transformed program is defined as where is defined as follows.
In the case of the binary conditional statement, the then branch of the transformed program is taken when is satisfied. By Property 1, this means that in the original program both and hold and, thus, the then branch is taken in both real and floating-point control flows. Similarly, the else branch of the transformed program is taken when holds. This means, by Property 2, that in the original program the else branch is taken in both real and floating-point control flows. In the case real and floating-flows diverge, neither nor is satisfied and a warning is returned.
In the case of the n-ary conditional statements, the guard of the -th branch is replaced by the conjunction of and for all the previous branches . By properties 1 and 2, it follows that the transformed program takes the -th branch only when the same branch is taken in both real and floating-point control flows of the original program. Additionally, a warning is issued by the transformed program when real and floating-point control flows of the original program differ.
The following theorem states the correctness of the program transformation . If the transformed program returns an output different from , then the original program follows a stable path and returns the floating-point output . Furthermore, in the case the original program presents an unstable behavior, the transformed program returns .
Theorem 3.1 (Program Transformation Correctness)
Given , , and , such that for all , :
for all such that , there exists such that and ;
for all , there exists such that .
The program transformation defined in def:prog_trans has been formalized and th:corr_trans has been proven correct in PVS.111This formalization is available at https://shemesh.larc.nasa.gov/fm/PRECiSA.
It is important to remark that the intended semantics of the floating-point transformed program is the real-valued semantics of the original one, i.e., the real-valued semantics of the transformed program is irrelevant. Therefore, even if the transformed program presents unstable tests, th:corr_trans ensures that its floating-point control flow preserves the control flow of stable tests in the original program.
Consider the program eps_line, which is part of the ACCoRD conflict detection and resolution algorithm [DowekMC05]. This function is used to compute an implicitly coordinated horizontal resolution direction for the aircraft involved in a pair-wise conflict.
where and are floating-point variables. For example, if the values of such variables are assumed to lie in the range , the tool PRECiSA [MoscatoTDM17, TitoloFMM18]
can be used to compute the round-off error estimationfor . PRECiSA is a tool that over-approximates the round-off error of floating-point programs. It is fully automatic and generates PVS proof certificates that guarantee the correctness of the error estimations with respect to the floating-point IEEE-754 standard. The following program is obtained by using the transformation with the Boolean approximations of ex:beta.
The condition never holds since is a positive number. Therefore, the transformed program never returns 0. Indeed, when is close to , the test is unstable. The transformed program detects these unstable cases and returns a warning.
4 Case Study: PolyCARP algorithm
PolyCARP222PolyCARP is available at https://github.com/nasa/polycarp. (Algorithms for Computations with Polygons) [NarkawiczH16, NarkawiczMD17] is a suite of algorithms for geo-containment applications. One of the main applications of PolyCARP is to provide geofencing capabilities to unmanned aerial systems (UAS), i.e., detecting whether a UAS is inside or outside a given geographical region, which is modeled using a 2D polygon with a minimum and a maximum altitude. Another application of PolyCARP is the detection of weather cells, modeled as moving polygons, along an aircraft trajectory.
A core piece of logic in PolyCARP is the polygon containment algorithm, i.e., the algorithm that checks whether or not a point lies in the interior of a polygon. Algorithms for polygon containment have to be carefully implemented since numerical errors may lead to wrong answers, even in cases where the point is far from the boundaries of the polygon. PolyCARP uses several techniques to detect if a point is contained in a polygon. One of these techniques relies on the computation of the winding number. This number corresponds to the number of times the polygon winds around .
Consider two consecutive vertices and of the polygon in the Cartesian plane with the point as the origin. The function winding_number_edge checks in which quadrants and are located and counts how many axes are crossed by the edge . If and belong to the same quadrant, the contribution of the edge to the winding number is 0 since no axis is crossed. If and lie in adjacent quadrants, the contribution is 1 (respectively -1) if moving from to along the edge is in counterclockwise (respectively clockwise) direction. In the case and are in opposite quadrants, the determinant is computed for checking the direction of the edge. If it is counterclockwise the contribution is 2, otherwise it is -2. The winding number is obtained as the sum of the contributions of all the edges of the polygon. If the result is 0 or 4, the point is inside the polygon, otherwise, it is outside.
The function winding_number_edge has been verified in PVS using real arithmetic. However, due to floating-point errors, taking the incorrect branch for one of the edges in the computation of the winding number may result in an incorrect conclusion about the position of the point with respect to the polygon. In order to overcome this problem, the transformation of def:prog_trans is applied to the function resulting in the following function. Given initial bounds for the input variables, PRECiSA [MoscatoTDM17, TitoloFMM18] can be used to compute the round-off error estimations for , , , and the determinant, which are denoted , , , , an