1 Introduction
The demands on the precision and extent of simulations and other scientific computing applications increase continuously. At the same time, Moore's law for the technological advance of processors faces a foreseeable stagnation, which will make the development of more efficient software in this area unavoidable. As scientific software is designed with more sophisticated algorithms, distributed computing, efficient memory technologies, etc., the number of pitfalls by which flaws can accidentally be introduced into the code grows.
Bugs in scientific software are hard to discover by testing, since they may only occur for inputs that arise after a long (simulation) runtime. This challenge in software development thus appears to be an opportunity for static formal analyses, which analyse the code symbolically and thus cover all possible runtime situations – regardless of the time required to reach a value during an actual run.
In order to verify the hypothesis "Formal methods can be employed successfully in the design process for scientific software", we tested it against two case studies from real scientific applications, for which the location of problematic code had previously been identified. In one case, the challenge was to find a correct implementation which meets the intention of the original code, without repeating its flaws. In the other case, the flaw had already been corrected. The challenge was to verify that the correction solves the existing problem.
The paper continues with a presentation of the two case studies, which is followed by a summary of the observations made during the analysis process, and it concludes with an outlook into future work.
2 Case Study I: Load Distribution
For the efficient use of the resources provided by large computing clusters, load distribution is a crucial challenge. In this case study, we analyse an algorithm that rebalances the load within a cluster when the number of tasks to be computed changes. An original version of the algorithm in question had been in use in scientific computation software, and its correctness could not be established by manual code inspection.
This document reports how formal analyses helped us to formulate a correct load balancing algorithm by

finding a subtle flaw in the original implementation, using a custom fuzzer, and by

proving the correctness of an improved implementation, using Why3 [2].
Problem Statement
The starting point for this case study was an algorithm taken from a scientific computation library which was actively in use. The algorithm computes a new distribution of tasks onto cluster nodes from a given distribution of tasks. This routine is called within a simulation framework when the number of simulated entities is refined (usually increased).
More formally, the problem can be stated as follows: Given a set $O = \{o_1, \ldots, o_n\}$ with $o_i \in \mathbb{N}$ and $\sum_{i=1}^{n} o_i = N$, and a natural number $N'$, produce a set $O' = \{o'_1, \ldots, o'_n\}$ such that $\sum_{i=1}^{n} o'_i = N'$. The respective values $o_i$ and $o'_i$ are the number of tasks to be run on the $i$-th node, before and after load balancing. The objective of the algorithm is to obtain a new load distribution that is “close” to the old distribution, i.e., $o'_i$ should be close to $o_i \cdot N'/N$.
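As a small, made-up instance: for $n = 2$ nodes with old loads $o_1 = 3$ and $o_2 = 5$ (so the old total is $N = 8$), and a new total of $N' = 16$ tasks, the exact proportional shares are
\[
o_1 \cdot \frac{N'}{N} = 6, \qquad o_2 \cdot \frac{N'}{N} = 10,
\]
so $O' = \{6, 10\}$ satisfies the sum constraint exactly. When the shares are not integral, the algorithm has to round while still preserving $\sum_{i=1}^{n} o'_i = N'$.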
Starting Point
The original algorithm, shown in Listing 1, used floating-point numbers to calculate the proportional share $o_i \cdot N'/N$ for each node $i$. The integral part of this value was then assigned to $o'_i$, while the fractional parts were accumulated until they made up a full task, which was then assigned to the current node. This algorithm computes a correct distribution among the nodes if executed on precise rational numbers. However, when floating-point values are used to approximate the rationals, the imprecision can result in tasks getting lost in the balancing process, i.e., it can happen that $\sum_{i=1}^{n} o'_i < N'$.
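The flawed scheme can be sketched as follows (a minimal reconstruction, not the original code; the function and variable names are our own):

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Absolute comparison with FLT_EPSILON, as described in the text. */
static int is_nearly_equal(float a, float b) {
    return fabsf(a - b) < FLT_EPSILON;
}

/* Distribute nprime tasks over n nodes, proportionally to the old
   loads o[0..n-1] (which sum to total). Writes new loads to out. */
static void rebalance(int n, const long *o, long total,
                      long nprime, long *out) {
    float rest = 0.0f;                    /* accumulated fractional parts */
    for (int i = 0; i < n; i++) {
        float q = (float)o[i] * (float)nprime / (float)total;
        out[i] = (long)q;                 /* integral part of the share   */
        rest += q - (float)out[i];        /* carry the fractional part    */
        if (is_nearly_equal(rest, 1.0f)) {/* a full task has accumulated  */
            out[i] += 1;
            rest -= 1.0f;
        }
    }
}
```

On exact rationals, the carried `rest` reaches exactly 1 whenever a full task has accumulated; with `float` arithmetic it may only come close to 1 and fail the comparison, so the task is silently dropped.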
Automatic Discovery of Counterexamples
  $o_1$        $o_2$        $N'$            rest
  1048627      524206       1099511627744   0.9998779296875
  32779        536870892    1099511627779   0.999881774187088
  67108824     33554439     1099511627792   0.9998779296875

Fig. 1: Counterexamples found by the fuzzer; the last column shows the final value of rest.
Providing concrete counterexamples first requires a definition of the isNearlyEqual method. In the original implementation, this function was implemented as an absolute comparison using FLT_EPSILON as the value for $\varepsilon$. FLT_EPSILON is defined as the difference between 1 and the smallest floating-point number of type float that is greater than 1 [4]. Under the assumption that float refers to the 32-bit floating-point type defined in IEEE 754, as is usually the case, this value is $2^{-23} \approx 1.19 \cdot 10^{-7}$.
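The behaviour of such an absolute comparison can be illustrated directly (our own sketch, modelled after the description above):

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Absolute epsilon comparison, as in the original isNearlyEqual. */
static int is_nearly_equal(float a, float b) {
    return fabsf(a - b) < FLT_EPSILON;
}
```

An absolute threshold of size FLT_EPSILON $= 2^{-23}$ is only meaningful for values of magnitude around 1: near 1000, for instance, neighbouring floats are already about $6 \cdot 10^{-5}$ apart, so no two distinct floats there are ever "nearly equal" under this test.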
A careful analysis leads to the conclusion that it should be possible to find a counterexample with only two nodes. However, the search space consisting of the three long values $o_1$, $o_2$ and $N'$ is still too large to be explored exhaustively. Thus, we first reduced the search space further, and then used random fuzzing to discover counterexamples. To reduce the search space, we drew the values randomly from small offsets of integer powers of two. The rationale for this choice is that small offsets from large integer powers use a large range of the precision, and are thereby likely to lead to imprecisions during calculations.
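Such a sampling strategy might look as follows (our own sketch; the exact exponent and offset ranges used in the actual fuzzer are assumptions chosen to match the counterexamples shown below):

```c
#include <stdlib.h>

/* Draw a positive long as a small offset from a power of two,
   e.g. 2^40 - 32; the ranges below are illustrative assumptions. */
static long sample_near_power_of_two(void) {
    int  k = 15 + rand() % 26;   /* exponent in [15, 40]   */
    long d = rand() % 64;        /* small offset in [0, 63] */
    long base = 1L << k;
    return (rand() % 2) ? base + d : base - d;
}
```

The values in Fig. 1 fit this shape; e.g. $1099511627744 = 2^{40} - 32$ and $32779 = 2^{15} + 11$.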
The fuzzer finds numerous counterexamples within a matter of seconds, a few of which can be seen in Fig. 1. In each case, the final value of rest is below $0.99989$, and thereby well below 1 − FLT_EPSILON.
Improved Algorithm
To remedy the problems in the original algorithm, we eliminated all uses of floating-point numbers in favour of integer operations. Then we verified three different properties of this algorithm using Why3:

No tasks are lost, i.e., $\sum_{i=1}^{n} o'_i = N'$.

The resulting values $o'_i$ are close to the exact proportional shares. In particular, the following holds: $|o'_i - o_i \cdot N'/N| < 1$.
The integer algorithm is equivalent to the original algorithm if rationals are used instead of floating-point values (i.e., if all computations are exact and no rounding effects occur).
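One possible integer-only formulation is the following prefix-floor scheme (our own sketch, not the verified implementation; it assumes that the product `prefix * nprime` does not overflow):

```c
#include <assert.h>

/* Integer rebalancing sketch: o[i] are the old loads (summing to
   total), nprime is the new total; out[i] gets the new load of node i. */
static void rebalance_int(int n, const long *o, long total,
                          long nprime, long *out) {
    long prefix = 0;    /* running sum of old loads            */
    long assigned = 0;  /* tasks assigned to nodes 0..i so far */
    for (int i = 0; i < n; i++) {
        prefix += o[i];
        long upto = (prefix * nprime) / total; /* floor of exact prefix share */
        out[i] = upto - assigned;
        assigned = upto;
    }
    /* after the last iteration prefix == total,
       hence assigned == nprime exactly: no task can be lost */
}
```

The no-loss property holds by construction, since the last prefix share is $(N \cdot N')/N = N'$ computed exactly in integers; on exact rationals this scheme coincides with accumulating fractional parts until a full task is reached.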
Property 3 is particularly interesting, since proving the functional equivalence of the original algorithm to the new integer algorithm ensures that the intentions of the original author, which can include domain knowledge not available to the verification engineer, are preserved.
While the verification of property 1 succeeded automatically in Why3 after adding lemmas to assist the proof search, properties 2 and 3 required reasoning about floating-point values, which is a weakness of most automatic theorem provers. We thus had to resort to time-intensive interactive proofs using the Coq theorem prover [5].
3 Case Study II: Convex Hull
Calculating the convex hull of a set of points is a problem which is commonly found in scientific computation applications. However, while the algorithms for the computation of the hull are mathematically simple and straightforward, numerical errors, caused by implementations using floating-point values instead of real numbers, can lead to wrong results.
In the two-dimensional case, the implications are not too severe: a point close to an edge of the convex hull might be wrongly included in (or wrongly excluded from) the hull polygon. But the result is always a valid polygon which is close to the desired result.
The situation is different if three-dimensional data is taken as input. The additional dimension requires keeping a record of the facets making up the convex polyhedron. This situation is computationally considerably more sophisticated than the two-dimensional case, as it requires calculating which side of a facet (front or back) is faced by a point. For points close to the facet, such a calculation may come up with the wrong result due to floating-point imprecision. The situation becomes worse if this computation errs in different directions for one point and two facets: then the convex hull algorithm, which relies on these computations, wrongly includes or excludes facets from the hull, with the catastrophic outcome that the result is not only imprecise, but not even a (closed) polyhedron at all – a far more severe problem.
This problem has been known for a long time; it was already reported by Barber et al. [1] when presenting Quickhull, a widely employed algorithm for convex hull computation.
The existing implementation considered here was known to suffer from such errors, and it was also known that errors had occurred in practice. As workarounds were implemented in the code to mitigate the problem, the number of observed errors decreased, but the effectiveness of the solution for the general case remained unclear. The workarounds focus on a particular method which, given a plane spanned by three points and a fourth point, decides whether that point is above, below, or coplanar to the plane.
As mentioned above, due to floating-point imprecision, the wrong decision might be made, with fatal effects. In an attempt to avoid faulty results due to rounding errors, the original method was modified to compute the output three times; each time, a different one of the three spanning points is chosen as the base point of the plane. The result computed by the majority is returned as the final result.
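The workaround, as described, can be sketched like this (our own reconstruction; the names and the concrete predicate – the sign of the scalar triple product as the orientation test – are assumptions):

```c
#include <assert.h>

typedef struct { double x, y, z; } Vec3;

/* Orientation of point d relative to the plane through base, p, q:
   the sign of the scalar triple product (p-base) x (q-base) . (d-base). */
static double orient_from(Vec3 base, Vec3 p, Vec3 q, Vec3 d) {
    double ux = p.x-base.x, uy = p.y-base.y, uz = p.z-base.z;
    double vx = q.x-base.x, vy = q.y-base.y, vz = q.z-base.z;
    double wx = d.x-base.x, wy = d.y-base.y, wz = d.z-base.z;
    return (uy*vz - uz*vy)*wx + (uz*vx - ux*vz)*wy + (ux*vy - uy*vx)*wz;
}

static int sign(double v) { return (v > 0) - (v < 0); }

/* Compute the orientation three times, each time with a different
   spanning point as base, and return the majority (0 if none). */
static int orientation_majority(Vec3 a, Vec3 b, Vec3 c, Vec3 d) {
    int s1 = sign(orient_from(a, b, c, d));
    int s2 = sign(orient_from(b, c, a, d));
    int s3 = sign(orient_from(c, a, b, d));
    if (s1 == s2 || s1 == s3) return s1;
    if (s2 == s3) return s2;
    return 0;
}
```

On exact arithmetic, the three evaluations agree (cyclically permuting the base point is an even permutation and preserves the sign); the question answered in this case study is whether the majority vote also rules out inconsistent answers under rounding.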
In this case study, we did not apply formal methods to prove a correctness hypothesis as in Section 2; instead, we succeeded in disproving that the introduced workaround works in all cases. We used the state-of-the-art SMT solver Z3 [3] to find inputs where the majority vote computes a result different from the exact result (on real numbers, not floats). Since this verification task depended heavily on the semantics of floating-point arithmetic, which is very difficult to handle in a deductive fashion, we chose to model it using the SMT solver. This technology (in almost all cases) reduces problems on floating-point values to problems on corresponding bit-vectors according to the IEEE 754 standard. These are then resolved into an instance of the propositional SAT problem, which can be solved using a Boolean satisfiability solver.
While the search was successful, it required a careful analysis of the problem to reduce the search space. Floating-point problems are known to lead to SAT instances that are difficult to solve. A thorough manual analysis of the situation allowed us to narrow the search space for the floating-point values considerably. Even with the reduced search space, Z3 required approximately 100 hours (on a virtualised Intel Xeon E3 at 2.6 GHz) to find a counterexample. On the other hand, it was virtually impossible to come up manually with concrete numbers that constitute a counterexample, because the dependencies of individual bits in floating-point arithmetic are difficult to follow.
4 Observations
In this report, we have looked at how formal methods can be applied to two orthogonal problems. The formal techniques employed in this study can be separated into finding counterexamples that disprove that existing algorithms satisfy certain properties, and into verifying the correctness of algorithms. The latter can again be separated into using formal verification to prove correctness with respect to an abstract specification, and into relational techniques which prove the equivalence to an existing algorithm.
While it is still too early to draw definitive conclusions from these early investigations, they give rise to some interesting observations of the potential applicability of formal methods in the field.
One interesting observation is that it is often possible to isolate the problems faced in large projects to relatively small and isolated pieces of code. While the formal analysis of the original project might not be feasible due to size and complexity, analysing these small pieces of code is significantly easier. However, extracting such a piece of code is a process that often requires domain knowledge. The provision of tools which help or automate this extraction, such that it can be done by the domain experts themselves, could reduce the time needed to produce code samples suitable for formal analysis.
While we have been able to find counterexamples automatically using SMT solvers, this relies heavily on an upfront reduction of the search space, which requires experience with formal methods and cannot be carried out by scientists themselves. Automating this process such that it could be applied by the developers of the original algorithm would be highly desirable, but requires further research. When comparing the use of fuzzers and SMT solvers to find counterexamples, fuzzers are significantly better at providing fast feedback suitable for interactive use. For intricate problems, on the other hand, random fuzzing is not likely to discover edge cases. A combination of SMT solvers and fuzzers, requiring only a single specification, could help to combine the respective benefits.
Overall, our experience shows that formal techniques can be employed successfully to assist scientists. However, this relies on the ability to isolate problems to smaller code samples, which is a tedious process. Furthermore, while the large effort required for a formal verification can be justified for algorithms used for longer periods of time, immediate feedback provided for scientists during development would be highly desirable, but is not easily achievable using existing techniques.
References
 [1] Barber, C.B., Dobkin, D.P., Huhdanpaa, H.: The quickhull algorithm for convex hulls. ACM Trans. Math. Softw. 22(4), 469–483 (Dec 1996), http://doi.acm.org/10.1145/235815.235821
 [2] Bobot, F., Filliâtre, J.C., Marché, C., Paskevich, A.: Why3: Shepherd your herd of provers. In: Boogie 2011: First International Workshop on Intermediate Verification Languages. pp. 53–64. Wrocław, Poland (August 2011), https://hal.inria.fr/hal00790310
 [3] de Moura, L., Bjørner, N.: Z3: An efficient SMT solver. In: Ramakrishnan, C.R., Rehof, J. (eds.) Tools and Algorithms for the Construction and Analysis of Systems. pp. 337–340. Springer Berlin Heidelberg, Berlin, Heidelberg (2008)
 [4] glibc development team: Floating point parameters. https://www.gnu.org/software/libc/manual/html_node/FloatingPointParameters.html, [Online; accessed 01-February-2017]
 [5] The Coq development team: The Coq proof assistant reference manual, version 8.6. LogiCal Project (2004), http://coq.inria.fr