A Verified Timsort C Implementation in Isabelle/HOL

12/08/2018 ∙ by Yu Zhang, et al. ∙ Beihang University

Formal verification of traditional algorithms is of great significance due to their wide application in state-of-the-art software. Timsort is a complicated, hybrid, stable sorting algorithm derived from merge sort and insertion sort. Although the Timsort implementation in OpenJDK has been formally verified, there is still no standard, formally verified Timsort implementation in the C programming language. This paper studies a Timsort implementation and its formal verification using Simpl, a generic imperative language in Isabelle/HOL. We then manually generate a C implementation of Timsort from the verified Simpl specification. Due to the C-like concrete syntax of Simpl, the code generation is straightforward. The C implementation has also been tested with a set of random test cases.




1 Introduction

Formal verification is considered a promising way to ensure the reliability of programs. With the development of verification tools, fully formal verification of large and complex programs has become possible in recent years [2, 3]. Formal verification of traditional algorithms is of great significance due to their wide application in state-of-the-art software. The goal of this paper is the functional verification of sorting algorithms together with the generation of C source code. We investigate Timsort, a hybrid stable sorting algorithm derived from merge sort and insertion sort and designed to work well on many kinds of real-world data.

Tim Peters invented the Timsort algorithm and applied it in the Python standard library. It has since been used to sort arrays of non-primitive type in Java, on the Android platform, and in GNU Octave. Gouw et al. [1] have carried out formal verification of OpenJDK's Timsort implementation using KeY. Timsort is the main sorting algorithm provided by the Java standard library. KeY is a Java verification tool and a semi-automatic, interactive theorem prover covering nearly full sequential Java. However, there is still no standard, formally verified Timsort implementation in the C programming language. Tim Peters [5] himself released a C version of Timsort, which is part of the C implementation of Python's list data structure and has not been verified.

This paper studies a Timsort implementation and its formal verification using Simpl [7] in Isabelle/HOL. Unlike KeY, which mainly focuses on Java programs, Simpl is a generic imperative language embedded into Isabelle/HOL that was designed as an intermediate language for program verification. In Isabelle/HOL, GHC's sorting algorithm for lists has been formalized and its correctness and stability have been proved in [9]. The Quicksort algorithm has been verified in Simpl based on a split heap model [8], at a cost of fewer than 100 interactive proofs. Simpl has been used extensively in the formal verification of the seL4 OS kernel [2], where the C source code of the kernel is automatically translated into a Simpl specification by the verified tools CParser and AutoCorres. Lars Noschinski et al. [4] have formally verified a certifying algorithm checker for connectedness of graphs, written in C and taken from the LEDA library, using Simpl and AutoCorres. Besides, in order to reason about concurrent programs, Sanan et al. extended Simpl to CSimpl [6], which adds concurrency-oriented language features and verification techniques.

In this paper, we specify the Timsort algorithm using Simpl in the Isabelle/HOL theorem prover, and then generate real C source code after its functional verification. (The Isabelle/HOL specification and proof, and the generated C code, are available at https://github.com/LVPGroup/TimSort/.) As a first step, the C code generation is done manually according to the Simpl specification. Thanks to the C-like concrete syntax of Simpl in Isabelle/HOL, the generation is straightforward and could easily be implemented by a translator in the future. Our approach differs from the post-hoc verification of Timsort within KeY [1] in three respects. First, we use Simpl and Isabelle/HOL to specify and verify the Timsort algorithm with a machine-checked proof, and then export the specification into C code. Second, KeY is a proof assistant designed for Java programs, whilst Isabelle/HOL and Simpl are more general. Therefore, it is possible to generate a verified Timsort implementation in other imperative languages. Third, Simpl is embedded in Isabelle/HOL, so we can make use of its comprehensive libraries and stronger solvers and provers. We therefore expect the verification to come at a lower cost than its counterpart in KeY.

2 Preliminaries

2.1 Timsort Algorithm

Timsort is an effective combination of merge sort and insertion sort, which subtly modifies the two classical algorithms to reach better performance. It is a stable sort with a complexity of $O(n \log n)$ in the worst case and $O(n)$ in the best case. Timsort is designed to take advantage of partial ordering that already exists in the data, so it is remarkably fast on nearly sorted and reverse-sorted sequences. The procedure of Timsort basically follows the pattern of divide-and-conquer:

  • Divide an input array into sub-arrays with a minimal length

  • Sort each sub-array by binary sort (a combination of binary search and insertion sort)

  • Merge all the sorted sub-arrays into a single array using a modified merge sort

The keys of Timsort lie in the details of these steps. We refer to a sub-array as a run and to the minimal length of runs as minrun. The first step is to calculate the parameter minrun. It should not be too large, because insertion sort is only effective for short arrays. It also should not be too small, because that leads to more merge iterations in the last step. Experiments show that values between 32 and 65 work well. Moreover, the optimal value is one for which $n/minrun$ is a power of 2, where $n$ is the length of the input array, because merge sort works perfectly on balanced sub-arrays. Since such an integer does not exist for every possible value of $n$, we pick a value in the range (32, 65) such that $n/minrun$ is a power of 2 or is strictly less than a power of 2.
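One common way to obtain such a value is the bit-shifting scheme described in Tim Peters' notes on list sorting. The sketch below assumes a threshold of 64; OpenJDK uses a smaller threshold, and the specification in this paper fixes the value instead of computing it. The name min_run_length is ours.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: keep the six most significant bits of n and add 1
 * if any of the lower bits that were shifted out is set. For n >= 64 the
 * result lies in [32, 64], and n / result is a power of 2 or slightly
 * below one. Arrays shorter than 64 are returned unchanged. */
static size_t min_run_length(size_t n) {
    size_t r = 0;               /* becomes 1 if any shifted-out bit is 1 */
    assert(n > 0);
    while (n >= 64) {
        r |= n & 1;
        n >>= 1;
    }
    return n + r;
}
```

For an array of 1000 elements this yields 63, so the array splits into just under 16 runs of minimal length.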

The second step is to divide the input array into runs. We first count the number of consecutive increasing or decreasing elements starting from the current pointer. If this number is greater than minrun, the sorted sub-array is counted as a run, and if it is decreasing, it is reversed in place. Otherwise, we extend the sub-array to length minrun and use binary sort to keep it sorted.
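The run-detection step can be sketched in C as follows. This is a minimal illustration on int arrays, modelled on the structure of OpenJDK's countRunAndMakeAscending and binarySort; the helper names reverse_range and next_run are ours, and the code is not the generated C code of this paper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reverse a[lo, hi) in place. */
static void reverse_range(int *a, size_t lo, size_t hi) {
    while (lo + 1 < hi) {
        hi--;
        int t = a[lo]; a[lo] = a[hi]; a[hi] = t;
        lo++;
    }
}

/* Length of the run starting at lo; a strictly descending run is reversed
 * (strictness keeps the sort stable). */
static size_t count_run_and_make_ascending(int *a, size_t lo, size_t hi) {
    assert(lo < hi);
    size_t run_hi = lo + 1;
    if (run_hi == hi) return 1;
    if (a[run_hi++] < a[lo]) {
        while (run_hi < hi && a[run_hi] < a[run_hi - 1]) run_hi++;
        reverse_range(a, lo, run_hi);
    } else {
        while (run_hi < hi && a[run_hi] >= a[run_hi - 1]) run_hi++;
    }
    return run_hi - lo;
}

/* Binary insertion sort of a[lo, hi), given that a[lo, start) is sorted. */
static void binary_sort(int *a, size_t lo, size_t hi, size_t start) {
    if (start == lo) start++;
    for (; start < hi; start++) {
        int pivot = a[start];
        size_t left = lo, right = start;
        while (left < right) {              /* binary search for the slot */
            size_t mid = left + (right - left) / 2;
            if (pivot < a[mid]) right = mid; else left = mid + 1;
        }
        memmove(&a[left + 1], &a[left], (start - left) * sizeof a[0]);
        a[left] = pivot;
    }
}

/* Find the next run at lo and extend it to min_run elements if too short. */
static size_t next_run(int *a, size_t lo, size_t hi, size_t min_run) {
    size_t run_len = count_run_and_make_ascending(a, lo, hi);
    if (run_len < min_run) {
        size_t force = (hi - lo < min_run) ? hi - lo : min_run;
        binary_sort(a, lo, lo + force, lo + run_len);
        run_len = force;
    }
    return run_len;
}
```

Calling next_run repeatedly, each time starting where the previous run ended, partitions the whole array into sorted runs.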

The last step is to merge all these sorted sub-arrays. It is always wise to merge sub-arrays of equal or similar size. To achieve this, Timsort uses two stacks to store the indexes and lengths of these sub-arrays. Every time a new run is created, its length is pushed onto the stack runLen and its index onto the stack runBase. More importantly, if the top three elements of runLen are X, Y and Z, with Z on top, the stack maintains two invariants: $X > Y + Z$ and $Y > Z$.

These invariants aim at keeping run lengths as close to each other as possible to ensure balanced merges, which are more efficient. Once a new element is pushed and the rules are broken, Y is merged with the smaller of X and Z. The merging continues until both invariants are satisfied. After the complete input array has been divided into runs, the top two runs in the stack are merged repeatedly until only one remains, which is the sorted array. For example, suppose the lengths of the runs in the stack are 128, 64, 32, 16, 8, 4, 2 and the last sub-array arrives with length 2. Then there will be 7 perfectly balanced merges.
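As a small illustration, the two rules can be checked over a whole stack of run lengths. The function below is a sketch with a name of our choosing; it tests every entry rather than only the top three, which is the stronger property that matters for the stack size bound discussed later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if every run is longer than the run above it and longer
 * than the sum of the two runs above it. Index 0 is the bottom of the
 * stack and index stack_size - 1 is the top. */
static bool stack_invariant_holds(const size_t *run_len, size_t stack_size) {
    assert(run_len != NULL || stack_size == 0);
    for (size_t i = 0; i + 1 < stack_size; i++) {
        if (run_len[i] <= run_len[i + 1])
            return false;                       /* violates Y > Z */
        if (i + 2 < stack_size &&
            run_len[i] <= run_len[i + 1] + run_len[i + 2])
            return false;                       /* violates X > Y + Z */
    }
    return true;
}
```

The stack 128, 64, 32, 16, 8, 4, 2 from the example satisfies the check, and pushing a second run of length 2 breaks it, which is what triggers the chain of merges.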

2.2 Simpl in Isabelle/HOL

Schirmer introduces in [7] a verification framework for imperative sequential programs developed in Isabelle/HOL. The verification framework includes a generic imperative language, called Simpl, which is composed of the necessary constructors to capture most of the features present in common sequential languages, such as conditional branching, loops, abrupt termination and exceptions, assertions, mutually recursive functions, expressions with side effects, and nondeterminism. Additionally, Simpl can express memory related features like the memory heap, pointers, and pointers to functions. The Simpl verification framework also includes a Floyd/Hoare-like logic to reason about partial and total correctness, and on top of it, the framework implements a verification condition generator (VCG) to ease the verification process.

The syntax of Simpl (shown in Fig. 1) is defined in terms of states, of type 's; a set of fault types, of type 'f; and a set of procedure names, of type 'p. The constructor Skip indicates program termination; Seq c1 c2, Cond b c1 c2, and While b c are respectively the standard constructors for sequential, conditional, and loop statements. Throw and Catch c1 c2 are the counterparts of Skip and Seq c1 c2 for abrupt termination, and they make it possible to model exceptions. Call p invokes procedure p; Guard f g c represents assertions, where c is executed if the guard g holds in the current state, and a fault of type 'f is raised otherwise. Finally, Spec r introduces nondeterministic behavior expressed by the relation r, and DynCom cs provides state-dependent dynamic command transformation using the function cs, which is used to model blocks and functions with arguments. Function calls in Simpl are implemented by the dynamic command.

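    type_synonym 's bexp = "'s set"

    datatype ('s, 'p, 'f) com =
        Skip
      | Throw
      | Basic "'s ⇒ 's"
      | Spec "('s × 's) set"
      | Call "'p"
      | Seq "('s, 'p, 'f) com" "('s, 'p, 'f) com"
      | Cond "'s bexp" "('s, 'p, 'f) com" "('s, 'p, 'f) com"
      | While "'s bexp" "('s, 'p, 'f) com"
      | DynCom "'s ⇒ ('s, 'p, 'f) com"
      | Guard "'f" "'s bexp" "('s, 'p, 'f) com"
      | Catch "('s, 'p, 'f) com" "('s, 'p, 'f) com"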

Figure 1: Abstract Syntax of Simpl Language

Based on operational semantics, Simpl implements a Hoare proof system for the functional correctness of programs. In Simpl, a Hoare logic specification has the form:

$\Gamma, \Theta \vdash P \; c \; Q, A$

$\Gamma$ is the procedure environment. $\Theta$ is a set of assumptions that contains the specifications we can use while verifying the program $c$. $P$, $Q$ and $A$ are the precondition, the postcondition for normal termination, and the postcondition for abrupt termination, respectively. Both partial and total correctness are defined inductively in Simpl, and both are proved sound and complete with respect to their semantics. The main tool in Simpl for applying Hoare logic to programs is a verification condition generator, implemented as a tactic called vcg. For a specification $\Gamma, \Theta \vdash P \; c \; Q, A$, applying vcg reduces the problem to the form $P \subseteq WP$, where $WP$ is the weakest precondition of $c$ with respect to $Q$ and $A$.

Here, we illustrate how to specify and verify programs in Simpl. First, we use the keyword “procedures” to define a procedure and specify its signature and body as follows.

    procedures Fac (N::nat | R::nat)
      "IF ´N = 0 THEN ´R :== 1
       ELSE ´R :== CALL Fac(´N - 1);;
            ´R :== ´N * ´R
       FI"

Then, we use pre- and postconditions to define its correctness specification and use Hoare logic to prove it. We prove the specification Fac_spec to show the correctness of the procedure. First, we apply the rule HoarePartial.ProcRec1 to expand the body of the procedure. Then, the method vcg reduces the problem to the level of first-order logic. Finally, we solve it automatically.

    lemma (in Fac_impl) Fac_spec:
      shows "∀n. Γ ⊢ ⦃´N = n⦄ ´R :== PROC Fac(´N) ⦃´R = fac n⦄"
      apply (hoare_rule HoarePartial.ProcRec1)
      apply vcg
      apply clarsimp
      apply (case_tac N)
      apply auto
      done

So far we have obtained the specification of the factorial procedure, and we are now able to make use of it. The syntax for a procedure call is straightforward: when it reaches a procedure call, the verification condition generator looks up the specification and applies the rule instantiated with that specification.

    lemma (in Fac_impl)
      shows "Γ ⊢ ⦃´N = 3⦄ ´R :== CALL Fac(´N) ⦃´R = 6⦄"
      apply vcg
      apply (auto simp add: numeral_3_eq_3)
      done
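To give a feeling for how closely Simpl's concrete syntax follows C, the factorial procedure could be written in C roughly as below. This is our own illustrative rendering: the input parameter N becomes a function argument and the result parameter R becomes the return value. Unlike Isabelle's nat, a C integer is bounded, so the sketch adds an overflow guard that the Simpl version does not need.

```c
#include <assert.h>
#include <stdint.h>

/* C counterpart of the Simpl procedure Fac (N | R). */
static uint32_t fac(uint32_t n) {
    uint32_t r;
    assert(n <= 12);            /* 13! does not fit into 32 bits */
    if (n == 0) {
        r = 1;
    } else {
        r = fac(n - 1);         /* R :== CALL Fac(N - 1) */
        r = n * r;              /* R :== N * R */
    }
    return r;
}
```

The second lemma above corresponds to the observation that fac(3) evaluates to 6.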

3 Timsort Implementation by Simpl

As mentioned in Section 1, the first step is to specify the Timsort algorithm using Simpl. We develop the Simpl specification of Timsort according to OpenJDK's implementation. Thanks to the expressiveness of Simpl, the specification is a direct mapping from the implementation for most of the statements. However, some features of Java differ from the general language model, making it necessary to introduce additional Simpl specifications to model them. In summary, we specify all functions of Timsort as shown in Fig. 2.

Figure 2: Timsort functions and their call graph implemented in Simpl Specification

3.1 Instance Variables

Java is an object-oriented programming language where data and methods are encapsulated into classes. The Timsort algorithm is implemented as a class in Java, so it has its own instance variables and methods. In Simpl, we use global variables to model the Timsort instance variables, and procedures to model the class methods. Simpl uses a hoarestate to store variables, and the name of the hoarestate that contains global variables begins with the prefix “globals_”. The declaration is as below:

    hoarestate globals_var =
      stack_size :: nat
      run_base :: "nat list"
      run_len :: "nat list"
      stack_len :: nat
      a :: "int list"
      global_min_gallop :: nat

Most of the variables are declared in the same way as in the Java implementation, except for the stacks and their size. Because both stacks run_base and run_len store non-negative elements and arrays are modeled as lists in Simpl, they have the type “nat list”, which means a list of natural numbers. Similarly, the variable stack_size is the size of the two stacks, so it is defined with type nat. Isabelle/HOL also has the type int, but the advantage of nat over int is that many auxiliary definitions and lemmas are defined using natural numbers and their inductive structure. As a result, we can use these definitions and lemmas for free.

3.2 Restate Methods in Simpl

3.2.1 System methods

The Timsort implementation in Java copies part of an array from a source position to a destination position during binary sort and during merge sort in gallop mode. This is achieved by the system method System.arraycopy() in Java. It is a native method, which means it is written in another programming language and may be executed differently on different architectures and virtual machines. In most programming languages, a function to copy memory is provided by the standard library, which we assume to be correct at this stage. We therefore define this method at the Isabelle/HOL level and use it directly in the Simpl specification; from the point of view of Simpl it looks like a method from the “library”. Over the Isabelle/HOL specification we prove the properties of the system method that are necessary for the correctness of the Simpl specification. Moreover, additional lemmas can be proven easily for future use. The definition for copying $l$ elements of ys, starting at position $m$, to position $n$ of xs is as below:

    definition list_copy :: "'a list ⇒ nat ⇒ 'a list ⇒ nat ⇒ nat ⇒ 'a list" where
      "list_copy xs n ys m l = take n xs @ take l (drop m ys) @ drop (n + l) xs"

Because Java throws an IndexOutOfBoundsException when the source position plus the copy length exceeds the length of the source array, or when the destination position plus the copy length exceeds the length of the destination array, we can use these constraints as assumptions to establish the correctness of our definition and some useful lemmas. Here we prove that the elements of the result array are preserved with respect to the original arrays and that the length of the array does not change.

    lemma list_copy_len [simp]:
      "m + l ≤ length ys ⟹ n + l ≤ length xs ⟹
       length (list_copy xs n ys m l) = length xs"
      by (auto simp add: list_copy_def)

    lemma list_copy_i_front [simp]:
      "n + l ≤ length xs ⟹ m + l ≤ length ys ⟹
       i < n ⟹ list_copy xs n ys m l ! i = xs ! i"
      by (auto simp add: list_copy_def)

    lemma list_copy_i_mid [simp]:
      "n + l ≤ length xs ⟹ m + l ≤ length ys ⟹
       i ≥ n ⟹ i < n + l ⟹ list_copy xs n ys m l ! i = ys ! (i - n + m)"
      apply (auto simp add: list_copy_def)
      apply (subgoal_tac "min (length xs) n = n")
       apply simp
       apply (subgoal_tac "i - n < l")
      by (auto simp add: add.commute)

    lemma list_copy_i_end [simp]:
      "n + l ≤ length xs ⟹ m + l ≤ length ys ⟹
       i ≥ n + l ⟹ i < length xs ⟹ list_copy xs n ys m l ! i = xs ! i"
      apply (auto simp add: list_copy_def)
      apply (subgoal_tac "min (length xs) n + min (length ys - m) l = n + l")
      by auto

Abstracting a procedure from the Simpl level to the Isabelle/HOL level is very useful when a piece of code can be assumed correct, because in this way we only need to deal with several simple lemmas. Otherwise, the verification condition generator would create many complex pre- and postconditions to prove.
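On the C side, the same operation can be expressed with the standard library. The sketch below is an assumption about how such a helper might look, not the code generated in this work: it overwrites the destination in place instead of returning a new list, and it turns the two bounds assumptions of the lemmas into assertions. memmove is used so that overlapping source and destination ranges are handled, as with System.arraycopy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Overwrite xs[n .. n+l) with ys[m .. m+l). The explicit length
 * parameters are ours; they play the role of length xs and length ys. */
static void list_copy(int *xs, size_t xs_len, size_t n,
                      const int *ys, size_t ys_len, size_t m, size_t l) {
    assert(n + l <= xs_len);    /* destination range fits */
    assert(m + l <= ys_len);    /* source range fits */
    memmove(xs + n, ys + m, l * sizeof xs[0]);
}
```

The front, middle and end lemmas then describe the three regions of xs after the call: untouched below n, equal to the copied slice of ys in between, and untouched from n + l onwards.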

3.2.2 Deep and shallow copy

For efficiency reasons, Java instance variables are passed as reference arguments to class methods. In Simpl, however, all variables are passed by value, which amounts to a deep copy of the parameter, so modifications are not propagated back to the global variable when the procedure finishes. Although Simpl allows pointers to be defined, adding pointers would make the verification more complicated. Since Simpl allows multiple variables to be returned, arguments passed by reference in the Java implementation become returned variables in the specification instead of pointers, which makes it possible to reflect modifications to these arguments.

For instance, in OpenJDK's implementation, when binarySort is called to extend a partially sorted sub-array to the minimal length, the field variable a is passed to the method as a parameter, because access to a local variable is faster than access to a field variable. In our specification, after we sort a sub-array, the sorted list is returned to the caller and we assign the returned value to the original list variable. Therefore, the procedure call of binary sort is defined as:

    ´a :== CALL binary_sort(´a, ´lo, ´lo + ´force, ´lo + ´run_len_i)

3.2.3 Methods with bitwise operations

Two private methods involve bitwise operations: minRunLength and ensureCapacity. The method minRunLength calculates a number $k$ within a fixed range such that $n/k$ is close to, but strictly less than, an exact power of 2, where $n$ is the length of the array. This return value is the minimum acceptable run length for an array of the specified length, where a run is an ordered segment of the original array that will be merged later. In general, the purpose of the method is to find a suitable threshold to improve performance. Therefore, in our Simpl implementation, we simply set this value to 16. Similarly, the method ensureCapacity keeps the extra space used comparatively low. We therefore just create enough space for the new arrays, which simplifies the verification.

3.2.4 An example of translation

The method mergeCollapse is defined as follows in Java:

    private void mergeCollapse() {
        while (stackSize > 1) {
            int n = stackSize - 2;
            if (n > 0 && runLen[n-1] <= runLen[n] + runLen[n+1]) {
                if (runLen[n - 1] < runLen[n + 1])
                    n--;
                mergeAt(n);
            } else if (runLen[n] <= runLen[n + 1]) {
                mergeAt(n);
            } else {
                break; // Invariant is established
            }
        }
    }

We translated it to the Simpl specification as follows. The translation of assignment statements, if-branches and while-loops is straightforward. A try-catch structure is used here to create the same effect as a break statement.

    procedures (imports globals_var) merge_collapse ()
      where n :: nat in
      "TRY
         WHILE ´stack_size > 1 DO
           ´n :== ´stack_size - 2;;
           IF (´n > 0 ∧ ´run_len!(´n-1) ≤ ´run_len!´n + ´run_len!(´n+1)) ∨
              (´n > 1 ∧ ´run_len!(´n-2) ≤ ´run_len!(´n-1) + ´run_len!´n) THEN
             IF ´run_len!(´n-1) < ´run_len!(´n+1) THEN
               ´n :== ´n - 1
             FI
           ELSE
             IF ´n < 0 ∨ ´run_len!´n > ´run_len!(´n+1) THEN
               THROW
             FI
           FI;;
           CALL merge_at(´n)
         OD
       CATCH SKIP END"
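A C rendering of this procedure might look like the sketch below, in which the TRY ... THROW ... CATCH SKIP pattern becomes a plain break. To keep the example self-contained we assume a lengths-only model of the run stack: the names follow the specification, but merge_at and push_run are simplified stand-ins that only update run lengths and never touch an array, and MAX_STACK and merge_count are ours.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STACK 40                 /* assumed capacity for this sketch */

static size_t run_len[MAX_STACK];    /* lengths of the pending runs */
static size_t stack_size;            /* number of pending runs */
static size_t merge_count;           /* bookkeeping for the example only */

/* Stand-in for merge_at: merges runs n and n + 1, lengths only. */
static void merge_at(size_t n) {
    run_len[n] += run_len[n + 1];
    if (n + 3 == stack_size)         /* merged below the top: slide it down */
        run_len[n + 1] = run_len[n + 2];
    stack_size--;
    merge_count++;
}

static void push_run(size_t len) {
    assert(stack_size < MAX_STACK);
    run_len[stack_size++] = len;
}

static void merge_collapse(void) {
    while (stack_size > 1) {
        size_t n = stack_size - 2;
        if ((n > 0 && run_len[n - 1] <= run_len[n] + run_len[n + 1]) ||
            (n > 1 && run_len[n - 2] <= run_len[n - 1] + run_len[n])) {
            if (run_len[n - 1] < run_len[n + 1])
                n--;
        } else if (run_len[n] > run_len[n + 1]) {
            /* THROW in the specification; n < 0 cannot occur for size_t */
            break;
        }
        merge_at(n);
    }
}
```

Pushing the run lengths 128, 64, 32, 16, 8, 4, 2 and 2 in turn, with a call to merge_collapse after each push, performs no merge until the last push and then seven merges in a row, leaving a single run of length 256.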

4 Formal Verification by Hoare Logic

For formal verification, many aspects can be verified, such as functional correctness, sorting stability and the absence of illegal array indexes. In our work, we mainly focus on the stack invariant, which is the most important feature of Timsort. Gouw et al. [1] found a bug that breaks the invariant in OpenJDK's implementation. In this section, we start from the broken invariant of the stack and then prove the invariants of the stack-relevant and stack-irrelevant procedures.

4.1 Broken Invariant

In Timsort, a collection of sorted pieces of the array is maintained in a stack. These pieces are kept in a pattern in which the length of each piece is larger than the length of the next piece and, if there are two following pieces, larger than the sum of their lengths. These two rules ensure the efficiency of merge sort. Moreover, as mentioned above, another rule is that the minimal length of a sorted piece is 16 in our implementation. The Timsort implementation refers to these sorted pieces as runs, so we call them runs in the rest of this section, and we call the rules about the stack the stack invariant. With the stack invariant, it is possible to infer that the space needed by the stack is bounded by a fixed number that can be calculated immediately from the length of the array to sort. Based on this property, the space for the stack can be allocated at initialization, as OpenJDK does. The following piece of code shows part of the constructor of the Timsort class.

    int stackLen = (len < 120 ? 5 :
                    len < 1542 ? 10 :
                    len < 119151 ? 19 : 40);
    runBase = new int[stackLen];
    runLen = new int[stackLen];

If the length of the array is less than 120, then 5 elements are enough for the stack. Actually, 4 is already a safe bound, and the bounds in OpenJDK are slightly adjusted for more safety. Similarly, if the length of the array is less than 1542, then 9 is the safe bound. The default value is 40 because the maximal value of an integer in Java is $2^{31} - 1$.

Then, for a fixed stack length $l$, we can verify that it is safe if the array length is no longer than $u \cdot (fib(l+1) - 1) + fib2(l+1) - (l+1)$, where $u$ is the minimal run length (16 in our specification) and fib and fib2 are the Fibonacci series and a modified Fibonacci series.

    fun fib :: "nat ⇒ nat" where
      "fib 0 = 1"
    | "fib (Suc 0) = 1"
    | "fib (Suc (Suc n)) = fib n + fib (Suc n)"

    fun fib2 :: "nat ⇒ nat" where
      "fib2 0 = 0"
    | "fib2 (Suc 0) = 1"
    | "fib2 (Suc (Suc n)) = fib2 n + fib2 (Suc n) + 1"

We prove that this is the safe bound by showing that in the worst case the exact length of the array is this value. The worst case means that each element in the stack is exactly the least value it can be. So in the worst case, the sum of a full stack is, by definition, the least length of the array to be sorted. Consequently, arrays with fewer elements than this least length are safe because they cannot cause a stack overflow. So we prove that the least value of each element is $u \cdot fib(k) + fib2(k)$, where $k$ is the index of the element counted from the top of the stack.

    lemma run_len_elem_lower_bound:
      "⟦∀i. 3 ≤ i ∧ i ≤ l ⟶ elem_inv rl (l - i) u;
        elem_bigger_than_next rl (l - 2);
        elem_larger_than_bound rl (l - 1) u;
        length rl ≥ l; l ≥ 2; k < l⟧
       ⟹ rl ! (l - 1 - k) ≥ u * fib k + fib2 k"

As a result of the lower bound on every element, we can conclude that the sum of the stack is at least $u \cdot (fib(l+1) - 1) + fib2(l+1) - (l+1)$.

    lemma run_len_sum_lower_bound:
      "⟦∀i. 3 ≤ i ∧ i ≤ l ⟶ elem_inv rl (l - i) u;
        elem_bigger_than_next rl (l - 2);
        elem_larger_than_bound rl (l - 1) u;
        length rl ≥ l; l ≥ 2⟧
       ⟹ sumn rl l ≥ u * (fib (l + 1) - 1) + fib2 (l + 1) - (l + 1)"

In order to maintain the stack invariant, every time a new run is pushed onto the top of the stack, the method mergeCollapse is called to check whether the stack invariant holds. If it does not, two consecutive runs are merged into a larger run, and the check is repeated until the stack invariant holds. However, the termination condition in the old version is not strong enough to make sure that the stack invariant holds for all the elements in the stack. The result is that we cannot conclude that the space allocated to the stack in the constructor is adequate, and a runtime error might happen.

A counterexample that breaks the stack invariant, and a worst case that does cause the index-out-of-bounds exception, are given in [1] together with the fixed version of mergeCollapse. We express the new version, along with the other related methods, using Simpl in Isabelle/HOL and verify that the stack invariant does hold, ensuring that an implementation of our specification is correct and thus does not trigger the error.
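The thresholds 120, 1542 and 119151 in the constructor can be cross-checked numerically against the two series. The following sketch assumes the bound takes the form $u \cdot (fib(l+1) - 1) + fib2(l+1) - (l+1)$ with $u = 16$, and compares it with the direct sum of the per-element lower bounds $u \cdot fib(k) + fib2(k)$; the function names other than fib and fib2 are ours.

```c
#include <assert.h>
#include <stdint.h>

/* fib 0 = 1, fib 1 = 1, fib (n+2) = fib n + fib (n+1) */
static uint64_t fib(unsigned n) {
    uint64_t a = 1, b = 1;
    for (unsigned i = 0; i < n; i++) { uint64_t t = a + b; a = b; b = t; }
    return a;
}

/* fib2 0 = 0, fib2 1 = 1, fib2 (n+2) = fib2 n + fib2 (n+1) + 1 */
static uint64_t fib2(unsigned n) {
    uint64_t a = 0, b = 1;
    for (unsigned i = 0; i < n; i++) { uint64_t t = a + b + 1; a = b; b = t; }
    return a;
}

/* Closed form of the least total length of l runs of minimal length u. */
static uint64_t closed_form_bound(unsigned l, uint64_t u) {
    assert(l >= 2 && l <= 60);      /* small enough to avoid overflow for small u */
    return u * (fib(l + 1) - 1) + fib2(l + 1) - (l + 1);
}

/* The same quantity as an explicit sum over the stack entries. */
static uint64_t summed_bound(unsigned l, uint64_t u) {
    uint64_t sum = 0;
    for (unsigned k = 0; k < l; k++)
        sum += u * fib(k) + fib2(k);
    return sum;
}
```

With $u = 16$ the closed form gives 119, 1541 and 119150 for stack lengths 4, 9 and 18, each one less than the corresponding threshold in the constructor.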

4.2 Prove Stack Invariant Relevant Procedures

To prove the stack invariant, we need only consider the procedures that modify the elements in the stack. In Timsort these procedures are push_run, merge_at, merge_collapse, and merge_force_collapse. This subsection discusses their invariant proofs.

Because most of the work on the stack invariant has been done in KeY [1], we adapt it to Isabelle and prove it on the Timsort procedures specified in Simpl. There are several properties that must hold throughout the execution of the program. For example, given that the length of the stack is stack_len, the index stack_size that points to the top of the stack should satisfy $0 \le stack\_size \le stack\_len$. In KeY, properties that hold all the time are called class invariants and are annotated with the keyword “invariant”, so that they do not need to be added manually to the pre- and postconditions of each procedure. In Simpl, we can define pre- and postconditions that imply the invariant. The invariant is defined in Isabelle as follows.

    definition invariant :: "nat list ⇒ nat list ⇒ int list ⇒ nat ⇒ bool" where
      "invariant run_len run_base a stack_size ≡
         size run_base = size run_len ∧
         (size a < 120 ⟶ size run_len = 4) ∧
         (size a ≥ 120 ∧ size a < 1542 ⟶ size run_len = 9) ∧
         (size a ≥ 1542 ∧ size a < 119151 ⟶ size run_len = 18) ∧
         (size a ≥ 119151 ∧ size a < 2917196496 ⟶ size run_len = 39) ∧
         size a < 2917196496 ∧
         run_base!0 + sumn run_len stack_size ≤ size a ∧
         stack_size ≥ 0 ∧ stack_size ≤ size run_base ∧
         (∀i. (i ≥ 5 ∧ i ≤ stack_size) ⟶ elem_inv run_len (stack_size - i) 16) ∧
         (stack_size ≥ 4 ⟶ elem_bigger_than_next run_len (stack_size - 4)) ∧
         (stack_size ≥ 3 ⟶ elem_larger_than_bound run_len (stack_size - 3) 16) ∧
         (stack_size ≥ 2 ⟶ elem_larger_than_bound run_len (stack_size - 2) 16) ∧
         (stack_size ≥ 1 ⟶ elem_larger_than_bound run_len (stack_size - 1) 1) ∧
         (∀i. (i ≥ 0 ∧ i < stack_size - 1) ⟶ run_base!i + run_len!i = run_base!(i+1)) ∧
         elem_larger_than_bound run_base 0 0"

The invariant covers all aspects of the two stacks that store the indexes and lengths of runs. The sizes of the two stacks must be the same. The size is an exact integer determined by the length of the input array, which must be less than 2917196496. The sum of all the lengths in run_len plus the index of the first run cannot exceed the length of the input array. stack_size always points to the top of the stack and can never be greater than the size of the stack. After a merge or after a new element is pushed onto the stack, the stack invariant may be broken temporarily, so the conditions on run lengths here are looser than the stack invariant. Finally, the index of each run plus its length must equal the index of the next run, for every valid run stored in the stacks.
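Some of these conditions are simple enough to be checked at run time in a C implementation, for example in debug builds. The function below is a sketch of our own that covers only the structural part described above: the bound on stack_size, the contiguity of neighbouring runs, and the total length. It does not cover the conditions on run lengths, whose definitions (elem_inv and the related predicates) are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Checks a structural subset of the invariant for a non-empty stack.
 * The parameter names mirror the specification; array_len stands for
 * size a and stack_len for the allocated size of both stacks. */
static bool stacks_consistent(const size_t *run_base, const size_t *run_len,
                              size_t stack_size, size_t stack_len,
                              size_t array_len) {
    assert(run_base != NULL && run_len != NULL);
    if (stack_size > stack_len)
        return false;                        /* top index out of range */
    if (stack_size == 0)
        return true;                         /* nothing stored yet */
    size_t total = run_base[0];
    for (size_t i = 0; i < stack_size; i++) {
        if (i + 1 < stack_size &&
            run_base[i] + run_len[i] != run_base[i + 1])
            return false;                    /* runs must be adjacent */
        total += run_len[i];
    }
    return total <= array_len;               /* runs fit into the array */
}
```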

Next, we discuss the functional correctness of Timsort procedures, where the invariant defined above is included in the pre- and post-condition of each procedure.

The functional correctness of the procedure push_run is defined by pre- and postconditions as follows. In Simpl, a variable with σ at its upper left (written ⇗σ⇖x below) denotes the old value of the variable in the state of the precondition. The procedure push_run is called every time the information of a new run is about to be stored in the stacks. Before it is called, the stack invariant should be satisfied strictly. Moreover, a valid run is supposed to have a non-negative index and a positive length, and its index should be its predecessor's index plus the predecessor's length if it has a predecessor. Also, this run cannot exceed the input array. These constraints compose the precondition of push_run. After the execution of push_run, not only should the basic functionality be met, but we also expect that the other elements are not changed by the procedure.

    lemma (in push_run_impl) push_run_spec:
      "∀σ. Γ ⊢
        ⦃σ. ´run_len_i > 0 ∧ ´run_len_i ≤ size ´a ∧ ´run_base_i ≥ 0 ∧
            (´stack_size > 0 ⟶
               ´run_base_i = ´run_base!(´stack_size-1) + ´run_len!(´stack_size-1)) ∧
            ´run_len_i + ´run_base_i ≤ size ´a ∧
            (∀i. (i ≥ 3 ∧ i ≤ ´stack_size) ⟶ elem_inv ´run_len (´stack_size - i) 16) ∧
            (´stack_size ≥ 2 ⟶ elem_bigger_than_next ´run_len (´stack_size - 2)) ∧
            (´stack_size ≥ 1 ⟶ elem_larger_than_bound ´run_len (´stack_size - 1) 16) ∧
            ´stack_size ≥ 0 ∧ ´stack_size < size ´run_len ∧
            invariant ´run_len ´run_base ´a ´stack_size⦄
        PROC push_run(´run_base_i, ´run_len_i)
        ⦃´run_base!(⇗σ⇖stack_size) = ⇗σ⇖run_base_i ∧
         ´run_len!(⇗σ⇖stack_size) = ⇗σ⇖run_len_i ∧
         ´stack_size = ⇗σ⇖stack_size + 1 ∧
         ´run_base_i = ⇗σ⇖run_base_i ∧ ´run_len_i = ⇗σ⇖run_len_i ∧
         (∀i. (i ≥ 0 ∧ i < ´stack_size - 1) ⟶ ´run_len!i = ⇗σ⇖run_len!i) ∧
         (∀i. (i ≥ 0 ∧ i < ´stack_size - 1) ⟶ ´run_base!i = ⇗σ⇖run_base!i) ∧
         invariant ´run_len ´run_base ´a ´stack_size ∧
         size ´a = size ⇗σ⇖a⦄"

Now we discuss the procedure merge_at. Its pre- and postconditions are defined as follows. For the top three elements X, Y and Z of run_len, there are different merge strategies for different cases. The merge can happen either on X and Y or on Y and Z, depending on the conditions. So the index i may point either at X or at Y if there are at least three elements. After the merge, besides the basic functionality, we can also prove that the sum of the valid elements in run_len does not change, which means that merging does not change the total number of elements. Moreover, we expect that the last element of run_len does not decrease after the merge, which means that the merge either makes the last valid run longer or leaves it unchanged.

    lemma (in merge_at_impl) merge_at_spec:
      "∀σ. Γ ⊢
        ⦃σ. (´stack_size = 2 ⟶ ´i = ´stack_size - 2) ∧
            (´stack_size ≥ 3 ⟶ ´i = ´stack_size - 2 ∨ ´i = ´stack_size - 3) ∧
            ´stack_size > 1 ∧ ´i ≥ 0 ∧ ´a ≠ [] ∧
            invariant ´run_len ´run_base ´a ´stack_size⦄
        PROC merge_at(´i)
        ⦃´i = ⇗σ⇖i ∧ ´run_base!0 = ⇗σ⇖run_base!0 ∧ size ´a = size ⇗σ⇖a ∧
         ´stack_size = ⇗σ⇖stack_size - 1 ∧
         ´run_len!´i = ⇗σ⇖run_len!´i + ⇗σ⇖run_len!(´i+1) ∧
         (´i = ⇗σ⇖stack_size - 3 ⟶
            ´run_len!(´i+1) = ⇗σ⇖run_len!(´i+2) ∧
            ´run_base!(´i+1) = ⇗σ⇖run_base!(´i+2)) ∧
         invariant ´run_len ´run_base ´a ´stack_size ∧
         sumn ´run_len ´stack_size = sumn ⇗σ⇖run_len ⇗σ⇖stack_size ∧
         ´run_len!(´stack_size-1) ≥ ⇗σ⇖run_len!(⇗σ⇖stack_size-1)⦄"

The pre- and postconditions of the procedure merge_collapse are defined as follows. Every time a new run is pushed onto the stack, the algorithm checks whether the stack invariant holds and merges a few sub-arrays to re-establish it, which is what the procedure merge_collapse does. Because the length of the new run is not known, the precondition only constrains the other elements in the stack invariant. But after the procedure is done, the stack invariant must hold strictly for each element. Again, some corollaries come together with the postcondition, just as in the specification of the procedure merge_at.

    lemma (in merge_collapse_impl) merge_collapse_spec:
      "∀σ. Γ ⊢
        ⦃σ. ´stack_size > 0 ∧
            (´stack_size ≥ 4 ⟶ elem_inv ´run_len (´stack_size - 4) 16) ∧
            (´stack_size ≥ 3 ⟶ elem_bigger_than_next ´run_len (´stack_size - 3)) ∧
            ´a ≠ [] ∧
            invariant ´run_len ´run_base ´a ´stack_size⦄
        PROC merge_collapse()
        ⦃(∀i. (i ≥ 3 ∧ i ≤ ´stack_size) ⟶ elem_inv ´run_len (´stack_size - i) 16) ∧
         (´stack_size ≥ 2 ⟶ elem_bigger_than_next ´run_len (´stack_size - 2)) ∧
         sumn ´run_len ´stack_size = sumn ⇗σ⇖run_len ⇗σ⇖stack_size ∧
         ´run_len!(´stack_size-1) ≥ ⇗σ⇖run_len!(⇗σ⇖stack_size-1) ∧
         ´stack_size > 0 ∧ ´stack_size ≤ ⇗σ⇖stack_size ∧
         invariant ´run_len ´run_base ´a ´stack_size ∧
         size ´a = size ⇗σ⇖a ∧ ´run_base!0 = ⇗σ⇖run_base!0⦄"

Finally, the procedure merge_force_collapse is called after the entire input array has been divided into runs, and merges all the runs from the back to the beginning. There is no special requirement before this procedure, and we only expect that exactly one run remains at the end.

    lemma (in merge_force_collapse_impl) merge_force_collapse_spec:
      "∀σ. Γ ⊢
        ⦃σ. ´stack_size > 0 ∧ ´a ≠ [] ∧
            invariant ´run_len ´run_base ´a ´stack_size⦄
        PROC merge_force_collapse()
        ⦃´stack_size = 1 ∧
         invariant ´run_len ´run_base ´a ´stack_size ∧
         size ´a = size ⇗σ⇖a⦄"

4.3 Prove Stack Invariant Irrelevant Procedures

For the procedures that do not modify the stack, the specifications are relatively simple. We prove that their parameters are valid and that the invariant mentioned above still holds. There are several ways to prove the invariant. On the one hand, we can add it to the pre- and postconditions and prove it as usual. On the other hand, we can use the modifies specification provided by Simpl, which describes which global variables are modified by the procedure. In this paper, we use the second way. For instance, the modifies condition of the procedure merge_lo is defined as follows.

    lemma (in merge_lo_impl) merge_lo_modifies:
      shows "∀σ. Γ ⊢ {σ} PROC merge_lo(´base1, ´len1, ´base2, ´len2)
                     {t. t may_only_modify_globals σ in [a]}"

With such a condition, the caller of merge_lo knows that the procedure modifies only the array, and that the global variables that are not modified are the same in the new state as in the state before the procedure call. This implicitly proves that if the related properties hold before the call, they hold after the call as well, because the variables they depend on are not changed at all.

4.4 Modularity

In the process of verification, several specifications involve a large number of pre- and postconditions. As a result, it is complicated for the verification condition generator of Simpl to simplify these conditions. Moreover, if we modify only one of the conditions in a large specification, the whole proof needs to be rearranged, because one of the proof steps might fail after the change and the following steps would become invalid. Therefore, we need modularity to divide a large specification into several small pieces. For example, consider the specification $\Gamma, \Theta \vdash P \; c \; (Q \cap R), (A \cap B)$. If both $P$ and $Q \cap R$ contain a large number of conditions, the proof will be large as a consequence. However, it is possible to prove the two sub-specifications $\Gamma, \Theta \vdash P \; c \; Q, A$ and $\Gamma, \Theta \vdash P \; c \; R, B$ and then conclude the original specification by applying the rule PostConjI.

    lemma PostConjI:
      assumes deriv_Q: "Γ,Θ ⊢/F P c Q,A"
      assumes deriv_R: "Γ,Θ ⊢/F P c R,B"
      shows "Γ,Θ ⊢/F P c (Q ∩ R),(A ∩ B)"

Furthermore, if only a subset $P'$ of the conditions in $P$ is needed to derive the postcondition $Q$, then we can get rid of the redundant assumptions in the first sub-specification, which can be achieved by the rule conseqPre.

    lemma conseqPre:
      "Γ,Θ ⊢t/F P' c Q,A ⟹ P ⊆ P' ⟹ Γ,Θ ⊢t/F P c Q,A"

Therefore, we can reduce the size of specifications to ease the verification. Another benefit is modularity: we can prove different aspects of an algorithm in different specifications. For example, we do not need to prove the correctness and the stability of a sorting algorithm at the same time; it is possible to verify them separately.

5 Evaluation and Discussion

5.1 Generated C code

For this stage, we manually develop the C code according to the Simpl specification. This process is simple because most Simpl statements can be directly mapped to C statements. We replace the array copy and array allocation operations with C standard library functions that copy an area of memory and allocate a new area for arrays. Besides, we pass an array to a function as a parameter by passing its pointer, because arrays can only be shallow-copied in C; this also avoids returning two values from a function. Further, we translate try-catch clauses according to the behaviour they specify. If abrupt termination follows an assignment to the return value, we translate it as a return in C. Likewise, if it begins a new iteration of a loop or quits the loop, it becomes a continue or a break. For a try-catch structure in a nested loop, we define flag variables that tell which level of loop to break or continue, which avoids the goto statement. The other parts are straightforward.
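The flag-variable translation for nested loops can be sketched as follows. This is a minimal, hypothetical example rather than code from the Timsort implementation: the function `first_nonnegative_row` and the flags `done` and `skip_row` are invented here to show how an abrupt termination raised in an inner loop can be turned into a continue or break of the outer loop without goto.

```c
#include <stddef.h>

/* Hypothetical example: return the index of the first row of a
 * rows x cols matrix (stored row by row) whose entries are all
 * non-negative, or -1 if there is no such row. */
int first_nonnegative_row(const int *m, size_t rows, size_t cols)
{
    int result = -1;   /* return value, assigned before leaving the loops */
    int done = 0;      /* flag: quit the outer loop */
    size_t i, j;

    for (i = 0; i < rows && !done; i++) {
        int skip_row = 0;   /* flag: start the next outer iteration */

        for (j = 0; j < cols; j++) {
            if (m[i * cols + j] < 0) {
                /* abrupt termination meaning "continue the outer loop" */
                skip_row = 1;
                break;
            }
        }
        if (skip_row)
            continue;

        /* abrupt termination meaning "assign the result, quit the outer loop" */
        result = (int)i;
        done = 1;
    }
    return result;
}
```

The inner break only leaves the inner loop, so the flag carries the intended target level to the enclosing loop, which then decides whether to continue or stop.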

We tested the C implementation with simple random test cases. Randomness here does not mean that each element is generated randomly, because in that case there is only a slight chance of obtaining an ascending or descending sub-array longer than the minimal length bound for runs. Consequently, almost all runs would be forced to extend to the minimal length. Timsort does not perform at its best when all runs have equal length, because it then hardly differs from a traditional merge sort with a threshold. Therefore, we generate random numbers that represent the lengths of ascending sub-arrays, and fill each sub-array with increasing random numbers to obtain the test case. To check the correctness of the results, it is not sufficient to traverse the result array and assert that no element is greater than its successor. It is also necessary to check that all elements are preserved. So our method is to copy the test array and sort the copy with the quicksort function of the C standard library. We assume that this function is correct, and we take the fact that the two implementations produce exactly the same results as evidence that the Timsort implementation in C is correct.
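The test procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' test harness: `fill_with_runs`, `cmp_int`, and `run_one_test` are hypothetical names, the value ranges are arbitrary, and `sort_under_test` is a placeholder insertion sort standing in for the Timsort implementation so that the example is self-contained.

```c
#include <stdlib.h>
#include <string.h>

/* Fill a[0..n) with consecutive ascending runs. Each run has a random
 * length in [1, max_run] and starts from a fresh random value.
 * Assumes max_run is small enough that the values do not overflow int. */
static void fill_with_runs(int *a, size_t n, size_t max_run)
{
    size_t i = 0;
    while (i < n) {
        size_t run = 1 + (size_t)rand() % max_run;
        int v = rand() % 1000;
        for (; run > 0 && i < n; run--, i++) {
            a[i] = v;
            v += 1 + rand() % 10;   /* strictly ascending inside a run */
        }
    }
}

/* Comparison function for qsort on int. */
static int cmp_int(const void *x, const void *y)
{
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);
}

/* Placeholder for the sort being tested (here: insertion sort). */
static void sort_under_test(int *a, size_t n)
{
    size_t i, j;
    for (i = 1; i < n; i++) {
        int key = a[i];
        for (j = i; j > 0 && a[j - 1] > key; j--)
            a[j] = a[j - 1];
        a[j] = key;
    }
}

/* Returns 1 if sort_under_test and qsort agree on one random test case,
 * and 0 on disagreement or allocation failure. */
int run_one_test(size_t n, size_t max_run, unsigned seed)
{
    int *a, *ref, ok;

    if (n == 0)
        return 1;           /* nothing to sort */
    srand(seed);
    a = malloc(n * sizeof *a);
    ref = malloc(n * sizeof *ref);
    if (a == NULL || ref == NULL) {
        free(a);
        free(ref);
        return 0;
    }
    fill_with_runs(a, n, max_run);
    memcpy(ref, a, n * sizeof *a);          /* keep a copy for the reference sort */
    sort_under_test(a, n);
    qsort(ref, n, sizeof *ref, cmp_int);
    ok = memcmp(a, ref, n * sizeof *a) == 0; /* same order and same elements */
    free(a);
    free(ref);
    return ok;
}
```

Comparing the whole output against the reference checks both that the result is ordered and that no element was lost or duplicated, which a simple sortedness scan would miss.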

It is desirable to automate the translation from Simpl to C. First, for large programs it is hard to carry out the translation manually. Second, generating code by hand is not reliable, because manual translation is prone to introducing errors. Although there are methods to check the consistency between Simpl code and the corresponding C code, it is much better if reliable C code can be generated directly. To achieve this, we expect to create a verified compiler from Simpl to C embedded in Isabelle/HOL.

5.2 Statistics of Specification and Proof

We use Isabelle/HOL as the specification and verification system for our work. All derivations of our proofs have passed through the Isabelle proof kernel. We developed 400 lines of Simpl specification, which is at the same scale as the implementation in OpenJDK. We used 1,600 lines of proof to show its correctness; most of the effort went into the stack-related invariants. The C implementation of Timsort generated from the Simpl specification is 500 lines. Compared with the large number (5000) of manual proof steps in KeY [1], our work requires less proof effort in Simpl and shows the advantage of using its VCG.

6 Conclusion and Future Work

We have formally verified Timsort using Isabelle/HOL and Simpl. We first specified the Timsort algorithm in the generic imperative language Simpl, based on the implementation in OpenJDK. Then we verified the relevant properties of the specification w.r.t. the stack, using the Simpl VCG to reduce the verification to higher-order logic problems that can be solved with the automated proof methods of Isabelle/HOL. Finally, we manually produced a Timsort implementation in C from our verified Simpl specification.

In the future, we plan to formally verify the full functional correctness of the Timsort implementation and to use benchmarks to fully test the performance of our Timsort implementation in C. To complete the verification framework, we are considering building a verified translator from Simpl specifications to C.


  • [1] de Gouw, S., Rot, J., de Boer, F.S., Bubel, R., Hähnle, R.: OpenJDK’s Java.utils.Collection.sort() is broken: The good, the bad and the worst case. In: Kroening, D., Păsăreanu, C.S. (eds.) Computer Aided Verification. pp. 273–289. Springer International Publishing, Cham (2015)
  • [2] Klein, G., Elphinstone, K., Heiser, G., Andronick, J., Cock, D., Derrin, P., Elkaduwe, D., Engelhardt, K., Kolanski, R., Norrish, M., et al.: seL4: Formal verification of an OS kernel. In: Proceedings of ACM SIGOPS 22nd Symposium on Operating Systems Principles. pp. 207–220. SOSP’09, ACM Press, Big Sky, Montana, USA (2009)
  • [3] Leroy, X.: Formal verification of a realistic compiler. Communications of the ACM 52(7), 107–115 (July 2009)
  • [4] Noschinski, L., Rizkallah, C., Mehlhorn, K.: Verification of certifying computations through AutoCorres and Simpl. In: Badger, J.M., Rozier, K.Y. (eds.) NASA Formal Methods. pp. 46–61. Springer International Publishing, Cham (2014)
  • [5] Peters, T.: Timsort description in C, http://svn.python.org/projects/python/trunk/Objects/listobject.c
  • [6] Sanán, D., Zhao, Y., Hou, Z., Zhang, F., Tiu, A., Liu, Y.: CSimpl: A rely-guarantee-based framework for verifying concurrent programs. In: Legay, A., Margaria, T. (eds.) Tools and Algorithms for the Construction and Analysis of Systems. pp. 481–498. Springer Berlin Heidelberg, Berlin, Heidelberg (2017)
  • [7] Schirmer, N.: Verification of Sequential Imperative Programs in Isabelle/HOL. Ph.D. thesis, Technische Universität München (2006)
  • [8] Schirmer, N.: A Sequential Imperative Programming Language Syntax, Semantics, Hoare Logics and Verification Environment. Archive of Formal Proofs (Feb 2008)
  • [9] Sternagel, C.: Proof Pearl—A Mechanized Proof of GHC’s Mergesort. Journal of Automated Reasoning 51(4), 357–370 (Dec 2013)