1 Introduction
Formal verification has been considered a promising way to ensure the reliability of programs. With the development of verification tools, it has become possible in recent years to carry out fully formal verification of large and complex programs [2, 3]. Formal verification of traditional algorithms is of great significance because of their wide application in state-of-the-art software. The goal of this paper is the functional verification of sorting algorithms and the generation of C source code. We investigate the Timsort algorithm, a hybrid stable sorting algorithm derived from merge sort and insertion sort and designed to work well on many kinds of real-world data.
Tim Peters invented the Timsort algorithm and applied it in the Python standard library. It has since also been used to sort arrays of non-primitive type in Java, on the Android platform and in GNU Octave. Gouw et al. [1] have carried out formal verification of OpenJDK’s Timsort implementation using KeY. Timsort is the default algorithm for sorting objects in the Java standard library. KeY is a Java verification tool and a semi-automatic, interactive theorem prover covering nearly all of sequential Java. However, there is still no standard, formally verified Timsort implementation in the C programming language. Tim Peters [5] himself released a C version of Timsort, which is part of the C implementation of Python’s list data structure and has not been verified.
This paper studies a Timsort implementation and its formal verification using Simpl [7] in Isabelle/HOL. Unlike KeY, which mainly focuses on Java programs, Simpl is a generic imperative language embedded into Isabelle/HOL that was designed as an intermediate language for program verification. In Isabelle/HOL, GHC’s sorting algorithm for lists has been formalized, and its correctness and stability have been proved [9]. The Quicksort algorithm has been verified in Simpl based on a split heap model [8], requiring fewer than 100 interactive proof steps. Simpl has been used extensively in the formal verification of the seL4 OS kernel [2], where the C source code of the kernel is automatically translated into a Simpl specification by the verified tools CParser and AutoCorres. Noschinski et al. [4] have formally verified, using Simpl and AutoCorres, a checker for a certifying graph-connectedness algorithm from the LEDA library, written in C. Besides, in order to reason about concurrent programs, Sanan et al. extended Simpl to CSimpl [6], which adds concurrency-oriented language features and verification techniques.

In this paper, we specify the Timsort algorithm using Simpl in the Isabelle/HOL theorem prover and then generate real C source code after its functional verification (the Isabelle/HOL specification and proof, and the generated C code, are available at https://github.com/LVPGroup/TimSort/). As a first step, the C code is generated manually from the Simpl specification. Thanks to the C-like concrete syntax of Simpl in Isabelle/HOL, the generation is straightforward and could easily be implemented by a translator in the future. Our approach differs from the post-hoc verification of Timsort in KeY [1] in three respects. First, we use Simpl and Isabelle/HOL to specify and verify the Timsort algorithm with a machine-checked proof, and then export the specification into C code. Second, KeY is a proof assistant designed for Java programs, whilst Isabelle/HOL and Simpl are more general; it is therefore possible to generate a verified Timsort implementation in other imperative languages as well. Third, Simpl is embedded in Isabelle/HOL, so we can make use of its comprehensive libraries and stronger solvers and provers. We therefore expect the verification to come at a lower cost than its counterpart in KeY.
2 Preliminaries
2.1 Timsort Algorithm
Timsort is an effective combination of merge sort and insertion sort, which subtly modifies the two classical algorithms to achieve better performance. It is a stable sort with a time complexity of $O(n \log n)$ in the worst case and $O(n)$ in the best case. Timsort is designed to take advantage of partial ordering that already exists in the data, so it is remarkably fast on nearly sorted and reverse-sorted sequences. The procedure of Timsort basically follows the divide-and-conquer pattern:

Divide an input array into subarrays with a minimal length

Sort each subarray by binary sort (a combination of binary search and insertion sort)

Merge all the sorted subarrays into a single array using a modified merge sort
The key ideas of Timsort lie in the details of these steps. We refer to each sorted subarray as a run and to the minimal length of runs as minrun. The first step is to calculate the parameter minrun. It should not be too large, because insertion sort is only effective on short arrays. It also should not be too small, because that leads to more merge iterations in the last step. Experiments show that values between 32 and 64 work well. Moreover, the optimal choice is one for which $n/\mathit{minrun}$ is a power of 2, where $n$ is the length of the input array, because merge sort works best on balanced subarrays. Since such a value does not exist for every possible $n$, we pick minrun in the range 32 to 64 such that $n/\mathit{minrun}$ is either a power of 2 or close to, but strictly less than, a power of 2.
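A minimal sketch of this choice of minrun in C is given below. It follows the bit trick of CPython’s merge_compute_minrun (keep the six most significant bits of n and add 1 if any remaining bit is set); the name compute_minrun is our own, and OpenJDK’s minRunLength applies the same idea with a threshold of 32 instead of 64.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch: choose minrun for an array of n elements.  Keep the six most
 * significant bits of n and add 1 if any of the shifted-out bits is set.
 * For n >= 64 the result lies in 32..64, and n / minrun is a power of 2
 * or close to, but strictly less than, a power of 2.  For n < 64 the
 * whole array is treated as a single run. */
static size_t compute_minrun(size_t n)
{
    size_t r = 0;               /* becomes 1 once any shifted-out bit is 1 */
    while (n >= 64) {
        r |= n & 1;
        n >>= 1;
    }
    return n + r;
}
```

For instance, compute_minrun(2048) is 32 (2048/32 = 64) and compute_minrun(2112) is 33 (2112/33 = 64), while an array shorter than 64 elements is sorted as a single run.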
The second step is to divide the input array into runs. Starting from the current position, we first count the number of consecutive elements that are in ascending or strictly descending order; a strictly descending sequence is reversed in place. If the count is at least minrun, this sorted subarray is taken as a run. Otherwise, we extend the subarray to length minrun and use binary sort to keep it sorted.
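The run-detection step can be sketched in C as follows, for an array of int. The helper names (count_run_and_make_ascending, binary_sort, next_run) loosely follow OpenJDK’s method names but are our own illustration, not the paper’s generated code. Descending runs must be strictly descending so that reversing them cannot reorder equal elements, which keeps the sort stable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reverse a[lo..hi) in place. */
static void reverse_range(int *a, size_t lo, size_t hi)
{
    while (hi - lo > 1) {
        int t = a[lo];
        a[lo] = a[hi - 1];
        a[hi - 1] = t;
        lo++;
        hi--;
    }
}

/* Length of the natural run starting at lo (requires lo < hi).  A strictly
 * descending run is reversed so that every run ends up ascending. */
static size_t count_run_and_make_ascending(int *a, size_t lo, size_t hi)
{
    assert(lo < hi);
    size_t run_hi = lo + 1;
    if (run_hi == hi)
        return 1;
    if (a[run_hi++] < a[lo]) {                  /* strictly descending */
        while (run_hi < hi && a[run_hi] < a[run_hi - 1])
            run_hi++;
        reverse_range(a, lo, run_hi);
    } else {                                    /* non-descending */
        while (run_hi < hi && a[run_hi] >= a[run_hi - 1])
            run_hi++;
    }
    return run_hi - lo;
}

/* Binary insertion sort of a[lo..hi), where a[lo..start) is already sorted. */
static void binary_sort(int *a, size_t lo, size_t hi, size_t start)
{
    if (start == lo)
        start++;
    for (; start < hi; start++) {
        int pivot = a[start];
        size_t left = lo, right = start;
        while (left < right) {                  /* binary search for the slot */
            size_t mid = left + (right - left) / 2;
            if (pivot < a[mid])
                right = mid;
            else
                left = mid + 1;                 /* equal keys go right: stable */
        }
        memmove(&a[left + 1], &a[left], (start - left) * sizeof a[0]);
        a[left] = pivot;
    }
}

/* Produce one run starting at lo: the natural run, extended with binary
 * sort to min(minrun, hi - lo) elements if it is shorter than minrun. */
static size_t next_run(int *a, size_t lo, size_t hi, size_t minrun)
{
    size_t n = count_run_and_make_ascending(a, lo, hi);
    if (n < minrun) {
        size_t force = (hi - lo < minrun) ? hi - lo : minrun;
        binary_sort(a, lo, lo + force, lo + n);
        n = force;
    }
    return n;
}
```

For the input 5, 4, 3, 1, 2, 9, 8, the natural run is 5, 4, 3, 1 (reversed to 1, 3, 4, 5); with a minrun larger than the array, next_run then binary-sorts the remaining three elements into it.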
The last step is to merge all these sorted subarrays. It is always wise to merge subarrays of equal or similar size. To achieve this, Timsort uses two stacks to store the starting indexes and lengths of the runs. Every time a new run is created, its length is pushed onto the stack run_len and its starting index onto the stack run_base. More importantly, if X, Y and Z denote the lengths of the top three runs, with Z on top, the stack maintains two invariants: $X > Y + Z$ and $Y > Z$.
These invariants keep run lengths as close to each other as possible to ensure balanced merges, which are more efficient. Whenever a newly pushed run breaks these rules, Y is merged with the smaller of X and Z. The merging continues until both invariants are satisfied. After the whole input array has been divided into runs, the top two runs on the stack are merged repeatedly until only one run remains, which is the sorted array. For example, suppose the lengths of the runs on the stack are 128, 64, 32, 16, 8, 4 and 2, and the last subarray has length 2. Then there will be 7 perfectly balanced merges.
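The two invariants can be stated as a small C predicate over the run-length stack, checked for every adjacent window rather than only for the top three entries. This is an illustrative sketch; the name stack_invariant_holds and the bottom-to-top array layout are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* run_len[0] is the bottom of the stack and run_len[size - 1] the top.
 * For every window X, Y, Z (read from bottom to top) require X > Y + Z,
 * and for every adjacent pair Y, Z require Y > Z. */
static bool stack_invariant_holds(const size_t *run_len, size_t size)
{
    for (size_t i = 0; i + 1 < size; i++) {
        if (run_len[i] <= run_len[i + 1])                   /* Y > Z */
            return false;
        if (i + 2 < size && run_len[i] <= run_len[i + 1] + run_len[i + 2])
            return false;                                   /* X > Y + Z */
    }
    return true;
}
```

On the example above, the stack 128, 64, 32, 16, 8, 4, 2 satisfies the predicate, and pushing the final run of length 2 violates it (4 is not greater than 2 + 2), which triggers the first of the 7 merges.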
2.2 Simpl in Isabelle/HOL
In [7], Schirmer introduces a verification framework for imperative sequential programs developed in Isabelle/HOL. The framework includes a generic imperative language, called Simpl, which is composed of the constructors needed to capture most of the features present in common sequential languages, such as conditional branching, loops, abrupt termination and exceptions, assertions, mutually recursive functions, expressions with side effects, and nondeterminism. Additionally, Simpl can express memory-related features such as the heap, pointers, and pointers to functions. The Simpl verification framework also includes a Floyd/Hoare-like logic to reason about partial and total correctness, and on top of it, the framework implements a verification condition generator (VCG) to ease the verification process.
The syntax of Simpl (shown in Fig. 1) is defined in terms of states, of type ’s; a set of fault types, of type ’f; and a set of procedure names, of type ’p. The constructor Skip indicates program termination; Seq c1 c2, Cond b c1 c2 and While b c are the standard constructors for sequential, conditional and loop statements, respectively. Throw and Catch c1 c2 are the counterparts of Skip and Seq c1 c2 for abrupt termination, and they allow exceptions to be modeled. Call p invokes procedure p; Guard f g c represents assertions, where c is executed if the guard g holds in the current state and a fault f of type ’f is raised otherwise. Finally, Spec r introduces nondeterministic behavior expressed by the relation r, and DynCom cs provides state-dependent dynamic command transformation using the function cs, which is used to model blocks and procedures with arguments. Procedure calls with arguments in Simpl are implemented by the dynamic command.
Based on its operational semantics, Simpl implements a Hoare proof system for the functional correctness of programs. In Simpl, a Hoare logic specification has the form

$$\Gamma, \Theta \vdash P\ c\ Q, A$$

where Γ is the procedure environment, Θ is a set of assumptions containing the specifications we can use while verifying the program c, and P, Q and A are the precondition, the postcondition for normal termination and the postcondition for abrupt termination, respectively. Both partial and total correctness are defined inductively in Simpl, and both are proved sound and complete with respect to the semantics. The main tool in Simpl for applying Hoare logic to programs is a verification condition generator implemented as a tactic called vcg. For a specification $\Gamma, \Theta \vdash P\ c\ Q, A$, applying vcg reduces the problem to the form $P \subseteq P'$, where $P'$ is the weakest precondition of c with respect to Q and A.
Here, we illustrate how to specify and verify programs in Simpl. First, we use the keyword “procedures” to define a procedure and specify its signature and body as follows.
    procedures Fac (N::nat | R::nat)
      "IF ´N = 0 THEN ´R :== 1
       ELSE ´R :== CALL Fac(´N - 1);;
            ´R :== ´N * ´R
       FI"
Then, we use pre- and postconditions to define its correctness specification and prove it with Hoare logic. First, we apply the rule HoarePartial.ProcRec1 to expand the body of the procedure. Then, the vcg method reduces the problem to a plain logical verification condition. Finally, we solve it automatically.
    lemma (in Fac_impl) Fac_spec:
      shows "∀n. Γ⊢ ⦃´N = n⦄ ´R :== PROC Fac(´N) ⦃´R = fac n⦄"
      apply (hoare_rule HoarePartial.ProcRec1)
      apply vcg
      apply clarsimp
      apply (case_tac N)
      apply auto
      done
So far we have obtained the specification of the factorial procedure, and we can now make use of it. The syntax for procedure calls is straightforward: when it reaches a procedure call, the verification condition generator looks up the procedure’s specification and applies the call rule instantiated with it.

    lemma (in Fac_impl)
      shows "Γ⊢ ⦃´N = 3⦄ ´R :== CALL Fac(´N) ⦃´R = 6⦄"
      apply vcg
      apply (auto simp add: numeral_3_eq_3)
      done
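To illustrate how directly Simpl’s C-like syntax maps to C, the Fac procedure could be rendered as below. This is our own sketch in the spirit of the manual translation described in Section 5.1; the result parameter R becomes the return value, and unsigned long is an assumed type.

```c
#include <assert.h>

/* C rendering of the Simpl procedure Fac (N | R):
 *   IF N = 0 THEN R :== 1 ELSE R :== CALL Fac(N - 1);; R :== N * R FI */
static unsigned long fac(unsigned long n)
{
    unsigned long r;
    if (n == 0) {
        r = 1;
    } else {
        r = fac(n - 1);     /* recursive CALL */
        r = n * r;
    }
    return r;
}
```

One difference to keep in mind: Simpl’s nat is unbounded, whereas unsigned long wraps around on overflow, so this C version agrees with the specification only while n! fits in the type.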
3 Timsort Implementation by Simpl
As mentioned in Section 1, the first step is to specify the Timsort algorithm in Simpl. We develop the Simpl specification of Timsort following OpenJDK’s implementation. Thanks to the expressiveness of Simpl, the specification is a direct mapping of the implementation for most statements. However, some features of Java differ from the general language model, making it necessary to introduce additional Simpl specifications to model them. In summary, we specify all functions of Timsort as shown in Fig. 2.
3.1 Instance Variables
Java is an object-oriented programming language in which data and methods are encapsulated in classes. The Timsort algorithm is implemented as a class in Java, so it has its own instance variables and methods. In Simpl, we use global variables to model Timsort’s instance variables and procedures to model its methods. Simpl uses a hoarestate to store variables, and the name of the hoarestate that contains global variables begins with the prefix “globals”. The declaration is as follows:
    hoarestate globals_var =
      stack_size :: nat
      run_base :: "nat list"
      run_len :: "nat list"
      stack_len :: nat
      a :: "int list"
      global_min_gallop :: nat
Most variables are declared in the same way as in the Java implementation, except the stacks and their size. Because both stacks, run_base and run_len, store non-negative elements, and arrays are modeled as lists in Simpl, they have the type “nat list”, i.e., a list of natural numbers. Similarly, the variable holding the size of the two stacks has type nat. Isabelle/HOL also provides the type int, but the advantage of nat over int is that many auxiliary definitions and lemmas are stated using natural numbers and their inductive structure. As a result, we can use these definitions and lemmas for free.
3.2 Restate Methods in Simpl
3.2.1 System methods
The Timsort implementation in Java copies part of an array from a source position to a destination position during binary sort and during merging in galloping mode. This is done with the system method System.arraycopy(). It is a native method, which means it is written in another programming language and may be executed differently on different architectures and virtual machines. In most programming languages, a memory-copy function is provided by the standard library, and we assume it to be correct at this stage. We therefore define this method at the Isabelle/HOL level, where it is used directly in the Simpl specification; from the point of view of Simpl it looks like a “library” method. On top of the Isabelle/HOL definition, we prove the properties of the system method that are needed for the correctness of the Simpl specification, and additional lemmas can easily be proven for future use. The definition of copying l elements of ys, starting at position m, into xs at position n is as follows:
    definition list_copy :: "'a list ⇒ nat ⇒ 'a list ⇒ nat ⇒ nat ⇒ 'a list" where
      "list_copy xs n ys m l ≡ take n xs @ take l (drop m ys) @ drop (n+l) xs"
Since Java throws an IndexOutOfBoundsException when the source position plus the copy length exceeds the length of the source array, or when the destination position plus the copy length exceeds the length of the destination array, we can use these constraints as assumptions when proving the correctness of our definition and some useful lemmas. Here we prove that the length of the array does not change and that every element of the result is taken from the corresponding position of one of the original arrays.
    lemma list_copy_len [simp]:
      "m + l ≤ length ys ⟹ n + l ≤ length xs ⟹
       length (list_copy xs n ys m l) = length xs"
      by (auto simp add: list_copy_def)

    lemma list_copy_i_front [simp]:
      "n + l ≤ length xs ⟹ m + l ≤ length ys ⟹ i < n ⟹
       list_copy xs n ys m l ! i = xs ! i"
      by (auto simp add: list_copy_def)

    lemma list_copy_i_mid [simp]:
      "n + l ≤ length xs ⟹ m + l ≤ length ys ⟹ i ≥ n ⟹ i < n + l ⟹
       list_copy xs n ys m l ! i = ys ! (i - n + m)"
      apply (auto simp add: list_copy_def)
      apply (subgoal_tac "min (length xs) n = n")
       apply simp
       apply (subgoal_tac "i - n < l")
      by (auto simp add: add.commute)

    lemma list_copy_i_end [simp]:
      "n + l ≤ length xs ⟹ m + l ≤ length ys ⟹ i ≥ n + l ⟹ i < length xs ⟹
       list_copy xs n ys m l ! i = xs ! i"
      apply (auto simp add: list_copy_def)
      apply (subgoal_tac "min (length xs) n + min (length ys - m) l = n + l")
      by auto

Abstracting procedures from the Simpl level to the Isabelle/HOL level is very useful when a piece of code can be assumed correct, because we then only need to deal with a few simple lemmas. Otherwise, the verification condition generator would create many complex pre- and postconditions to prove.
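For comparison, an in-place C counterpart of list_copy could look like the sketch below. The signature with explicit lengths is our own assumption; the two assertions correspond to the bounds under which Java’s System.arraycopy does not throw and which the lemmas above take as assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy l elements of ys, starting at position m, into xs at position n.
 * memmove is used so that overlapping source and destination (xs == ys)
 * behave as if the data were first copied to a temporary array. */
static void list_copy(int *xs, size_t xs_len, size_t n,
                      const int *ys, size_t ys_len, size_t m, size_t l)
{
    assert(n + l <= xs_len);    /* destination range inside xs */
    assert(m + l <= ys_len);    /* source range inside ys */
    memmove(&xs[n], &ys[m], l * sizeof xs[0]);
}
```

Checking the result at indexes below n, between n and n + l, and from n + l onwards mirrors the front, middle and end lemmas: only the middle range changes, and it holds ys[i - n + m].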
3.2.2 Deep and shallow copy
In Java, instance variables holding arrays are passed to methods as references, so no copy is made. In Simpl, however, all variables are passed by value, which amounts to a deep copy of the parameter, so modifications are not propagated back to the global variable when the procedure finishes. Although Simpl supports pointers, using them would make the verification more complicated. Since Simpl procedures can return multiple variables, we do not use pointers; instead, the reference-passed arguments of the Java implementation become returned variables in the specification, which allows modifications of these arguments to be reflected.
For instance, in OpenJDK’s implementation, when binarySort is called to extend a partially sorted subarray to the minimal length, the field variable a is passed to the method as a parameter, because accessing a local variable is faster than accessing a field. In our specification, binary sort returns the sorted list to the caller, which assigns the returned value to the original list variable. The call of binary sort is therefore specified as:

    ´a :== CALL binary_sort(´a, ´lo, ´lo + ´force, ´lo + ´run_len_i)
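The copy-in/copy-out discipline can be mimicked in C by wrapping an array in a struct, since structs are copied on assignment and when passed to or returned from functions. The sketch below is only an analogy for the Simpl semantics; int_list, CAP and insertion_sort_value are hypothetical names, and a plain insertion sort stands in for binary_sort.

```c
#include <assert.h>
#include <stddef.h>

#define CAP 16                  /* hypothetical fixed capacity */

/* A struct holding an array is copied by value, like a Simpl list. */
typedef struct {
    int data[CAP];
    size_t len;
} int_list;

/* Value-passing style: the callee sorts its own copy and returns it. */
static int_list insertion_sort_value(int_list xs)
{
    assert(xs.len <= CAP);
    for (size_t i = 1; i < xs.len; i++) {
        int pivot = xs.data[i];
        size_t j = i;
        while (j > 0 && xs.data[j - 1] > pivot) {
            xs.data[j] = xs.data[j - 1];
            j--;
        }
        xs.data[j] = pivot;
    }
    return xs;
}
```

Calling the function without assigning its result leaves the caller’s list untouched; only an assignment of the form a = insertion_sort_value(a) makes the change visible, as in the Simpl call above.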
3.2.3 Methods with bitwise operations
Two private methods involve bitwise operations: minRunLength and ensureCapacity. For an array of length $n \ge 32$, minRunLength calculates a number k such that $16 \le k \le 32$ and $n/k$ is close to, but strictly less than, an exact power of 2. This return value is the minimum acceptable run length for an array of the specified length, where a run is an ordered segment of the original array that will be merged later. In general, the purpose of the method is to find a suitable threshold to improve performance, and 16 is the lower end of its range. Therefore, in our Simpl specification, we simply set this value to 16. Similarly, the purpose of ensureCapacity is to keep the extra space used comparatively low, so we simply allocate enough space for new arrays with a helper function, which simplifies the verification.
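For reference, the buffer-size rule that ensureCapacity applies, and that our specification replaces with plain allocation, can be transcribed to C as below. This is our reading of OpenJDK’s code rather than part of the verified specification; new_tmp_size is a hypothetical name and uint32_t stands in for Java’s int.

```c
#include <assert.h>
#include <stdint.h>

/* Round min_capacity up to the smallest power of two strictly greater than
 * it by smearing its highest set bit to the right and adding 1, but never
 * return more than half the array length (array_len >> 1 mirrors Java's
 * a.length >>> 1).  If the result would be negative as a Java int, fall
 * back to min_capacity. */
static uint32_t new_tmp_size(uint32_t min_capacity, uint32_t array_len)
{
    assert(min_capacity <= INT32_MAX);        /* a non-negative Java int */
    uint32_t size = min_capacity;
    size |= size >> 1;
    size |= size >> 2;
    size |= size >> 4;
    size |= size >> 8;
    size |= size >> 16;
    size++;
    if (size > INT32_MAX)                     /* Java: newSize < 0 */
        return min_capacity;
    uint32_t half = array_len >> 1;
    return size < half ? size : half;
}
```

For example, a request for 100 elements in an array of 1000 yields a 128-element buffer, and a request for 128 yields 256, since the result is strictly greater than the request.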
3.2.4 An example of translation
The mergeCollapse method is defined as follows in Java:
We translate it into the Simpl specification below. The translation of assignment statements, if-branches and while-loops is straightforward. A try-catch structure is used here to obtain the same effect as a break statement.

    procedures (imports globals_var)
      merge_collapse()
      where n::nat in
      "TRY
         WHILE ´stack_size > 1 DO
           ´n :== ´stack_size - 2;;
           IF (´n > 0 ∧ ´run_len!(´n-1) ≤ ´run_len!´n + ´run_len!(´n+1)) ∨
              (´n > 1 ∧ ´run_len!(´n-2) ≤ ´run_len!(´n-1) + ´run_len!´n)
           THEN
             IF ´run_len!(´n-1) < ´run_len!(´n+1) THEN ´n :== ´n - 1 FI
           ELSE
             IF ´n < 0 ∨ ´run_len!´n > ´run_len!(´n+1) THEN THROW FI
           FI;;
           CALL merge_at(´n)
         OD
       CATCH SKIP END"
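Applying the translation scheme of Section 5.1 to this procedure, where THROW inside TRY ... CATCH SKIP END becomes a break, gives roughly the C code below. It is a sketch rather than the authors’ generated code: run_stack, MAX_STACK and push_run are illustrative, and this merge_at only updates the two stacks the way OpenJDK’s mergeAt does, without merging any array contents.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STACK 49            /* illustrative capacity */

typedef struct {
    size_t run_base[MAX_STACK];
    size_t run_len[MAX_STACK];
    size_t stack_size;
    size_t merges;              /* number of merge_at calls, for illustration */
} run_stack;

static void push_run(run_stack *s, size_t base, size_t len)
{
    assert(s->stack_size < MAX_STACK);
    s->run_base[s->stack_size] = base;
    s->run_len[s->stack_size] = len;
    s->stack_size++;
}

/* Bookkeeping-only stand-in: merges runs i and i + 1 on the stacks. */
static void merge_at(run_stack *s, size_t i)
{
    assert(i + 2 == s->stack_size || i + 3 == s->stack_size);
    s->run_len[i] += s->run_len[i + 1];
    if (i + 3 == s->stack_size) {             /* the top run slides down */
        s->run_base[i + 1] = s->run_base[i + 2];
        s->run_len[i + 1] = s->run_len[i + 2];
    }
    s->stack_size--;
    s->merges++;
}

static void merge_collapse(run_stack *s)
{
    while (s->stack_size > 1) {
        size_t n = s->stack_size - 2;
        const size_t *len = s->run_len;
        if ((n > 0 && len[n - 1] <= len[n] + len[n + 1]) ||
            (n > 1 && len[n - 2] <= len[n - 1] + len[n])) {
            if (len[n - 1] < len[n + 1])
                n--;
        } else if (len[n] > len[n + 1]) {     /* n < 0 is impossible: unsigned */
            break;                            /* THROW ... CATCH SKIP END */
        }
        merge_at(s, n);
    }
}
```

Replaying the example of Section 2.1 (pushing runs of lengths 128, 64, 32, 16, 8, 4, 2 and 2, and calling merge_collapse after each push) performs exactly 7 merges and leaves a single run of length 256.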
4 Formal Verification by Hoare Logic
Many aspects can be formally verified, such as functional correctness, sorting stability and the absence of illegal array indexes. In our work, we mainly focus on the stack invariant, which is the most important property of Timsort. Gouw et al. [1] found a bug in OpenJDK’s implementation that breaks the invariant. In this section, we start from the broken invariant of the stack and then prove the invariants for stack-relevant and stack-irrelevant procedures.
4.1 Broken Invariant
In Timsort, sorted pieces of the array are maintained in a stack. These pieces are kept such that the length of each piece is larger than the length of the next piece and larger than the sum of the lengths of the next two pieces, if they exist. These two rules ensure the efficiency of merge sort. Moreover, as mentioned above, another rule is that the minimal length of a sorted piece is 16 in our implementation. The Timsort implementation refers to these sorted pieces as runs, so we call them runs in the rest of this section, and we call these rules about the stack the stack invariant. From the stack invariant, one can infer that the space needed by the stack is bounded by a number that can be calculated directly from the length of the array to sort. Based on this property, the space for the stack can be allocated at initialization, as OpenJDK does. The following code shows the constructor of the TimSort class.
If the length of the array is less than 120, then 5 elements are enough for the stack. Actually, 4 is already a safe bound; the bounds in OpenJDK are slightly enlarged for extra safety. Similarly, if the length of the array is less than 1542, then 9 is a safe bound. The default value is 40, because the maximal value of an int in Java is $2^{31} - 1$.
More generally, for a fixed stack length l, we can verify that the stack is safe if the array length is no longer than $u\,(\mathit{fib}(l+1) - 1) + \mathit{fib2}(l+1) - (l+1)$, where u = 16 is the minimal run length and fib and fib2 are the Fibonacci series and a modified Fibonacci series, respectively:

    fun fib :: "nat ⇒ nat" where
      "fib 0 = 1"
    | "fib (Suc 0) = 1"
    | "fib (Suc (Suc n)) = fib n + fib (Suc n)"

    fun fib2 :: "nat ⇒ nat" where
      "fib2 0 = 0"
    | "fib2 (Suc 0) = 1"
    | "fib2 (Suc (Suc n)) = fib2 n + fib2 (Suc n) + 1"

We prove that this is the safe bound by showing that, in the worst case, the length of the array is exactly this value. The worst case means that each element in the stack takes the least value it can have, so the sum of a full stack is the least length of an array that can fill the stack. Consequently, arrays with fewer elements than this least length are safe, because they cannot cause a stack overflow. We first prove that the least value of each element is $u \cdot \mathit{fib}(k) + \mathit{fib2}(k)$, where k is the position of the element counted from the top of the stack:

    lemma run_len_elem_lower_bound:
      "∀i. 3 ≤ i ∧ i ≤ l ⟶ elem_inv rl (l-i) u ⟹
       elem_bigger_than_next rl (l-2) ⟹
       elem_larger_than_bound rl (l-1) u ⟹
       length rl ≥ l ⟹ l ≥ 2 ⟹ k < l ⟹
       rl!(l-1-k) ≥ u * fib k + fib2 k"

From the lower bound of every element, we conclude that the sum of the stack is at least $u\,(\mathit{fib}(l+1) - 1) + \mathit{fib2}(l+1) - (l+1)$:

    lemma run_len_sum_lower_bound:
      "∀i. 3 ≤ i ∧ i ≤ l ⟶ elem_inv rl (l-i) u ⟹
       elem_bigger_than_next rl (l-2) ⟹
       elem_larger_than_bound rl (l-1) u ⟹
       length rl ≥ l ⟹ l ≥ 2 ⟹
       sumn rl l ≥ u * (fib (l+1) - 1) + fib2 (l+1) - (l+1)"

To maintain the stack invariant, the method mergeCollapse is called every time a new run is pushed onto the top of the stack, to check whether the stack invariant holds. If it does not, two adjacent runs are merged into a larger run, and this is repeated until the stack invariant holds. However, the termination condition in the old version is not strong enough to ensure that the stack invariant holds for all elements in the stack. As a result, we cannot conclude that the space allocated to the stack in the constructor is adequate, and a runtime error might occur. A counterexample that breaks the stack invariant, and a worst case that does cause an index-out-of-bounds exception, are given in [1] together with the fixed version of mergeCollapse.
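The two lemmas can be sanity-checked numerically. The sketch below computes fib and fib2 as defined above, sums the per-element lower bound $u \cdot \mathit{fib}(k) + \mathit{fib2}(k)$ over a full stack with u = 16, and compares the result with the closed form. The function names are our own.

```c
#include <assert.h>
#include <stdint.h>

/* fib 0 = 1, fib 1 = 1, fib (n+2) = fib n + fib (n+1) */
static uint64_t fib(unsigned n)
{
    uint64_t cur = 1, next = 1;
    for (unsigned i = 0; i < n; i++) {
        uint64_t t = cur + next;
        cur = next;
        next = t;
    }
    return cur;
}

/* fib2 0 = 0, fib2 1 = 1, fib2 (n+2) = fib2 n + fib2 (n+1) + 1 */
static uint64_t fib2(unsigned n)
{
    uint64_t cur = 0, next = 1;
    for (unsigned i = 0; i < n; i++) {
        uint64_t t = cur + next + 1;
        cur = next;
        next = t;
    }
    return cur;
}

/* Least possible length of the run k positions below the top. */
static uint64_t min_run_len(unsigned k, uint64_t u)
{
    return u * fib(k) + fib2(k);
}

/* Worst case: a full stack of l runs, each at its least possible length. */
static uint64_t worst_case_len(unsigned l, uint64_t u)
{
    uint64_t sum = 0;
    for (unsigned k = 0; k < l; k++)
        sum += min_run_len(k, u);
    return sum;
}

/* Closed form stated by the sum lemma. */
static uint64_t worst_case_len_closed(unsigned l, uint64_t u)
{
    return u * (fib(l + 1) - 1) + fib2(l + 1) - (l + 1);
}
```

For stack lengths 4, 9, 18 and 39 this yields 119, 1541, 119150 and 2917196495, one less than the array-length thresholds 120, 1542, 119151 and 2917196496 that appear in the invariant in Section 4.2.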
We specify the fixed version, along with the other related methods, in Simpl and verify that the stack invariant does hold, which ensures that an implementation of our specification is correct in this respect and does not trigger the error.
4.2 Prove Stack Invariant Relevant Procedures
To prove the stack invariant, we only need to consider the procedures that modify the elements of the stack. In Timsort, these procedures are push_run, merge_at, merge_collapse and merge_force_collapse. This subsection discusses their invariant proofs.
Because most of the work on the stack invariant has been done in KeY [1], we adopt those results in Isabelle and prove them for the Timsort procedures specified in Simpl. Several properties must be satisfied throughout the execution of the program. For example, given that the stack has length len, the index stack_size, which points to the top of the stack, should satisfy $0 \le \mathit{stack\_size} \le \mathit{len}$. In KeY, properties that hold all the time are called class invariants and are annotated with the keyword “invariant”, so that they do not need to be added manually to the pre- and postconditions of each procedure. In Simpl, we instead define pre- and postconditions that imply the invariant. The invariant is defined in Isabelle as follows.
    definition invariant :: "nat list ⇒ nat list ⇒ int list ⇒ nat ⇒ bool"
    where "invariant run_len run_base a stack_size ≡
      size run_base = size run_len ∧
      (size a < 120 ⟶ size run_len = 4) ∧
      (size a ≥ 120 ∧ size a < 1542 ⟶ size run_len = 9) ∧
      (size a ≥ 1542 ∧ size a < 119151 ⟶ size run_len = 18) ∧
      (size a ≥ 119151 ∧ size a < 2917196496 ⟶ size run_len = 39) ∧
      size a < 2917196496 ∧
      run_base!0 + sumn run_len stack_size ≤ size a ∧
      stack_size ≥ 0 ∧ stack_size ≤ size run_base ∧
      (∀i. i ≥ 5 ∧ i ≤ stack_size ⟶ elem_inv run_len (stack_size-i) 16) ∧
      (stack_size ≥ 4 ⟶ elem_bigger_than_next run_len (stack_size-4)) ∧
      (stack_size ≥ 3 ⟶ elem_larger_than_bound run_len (stack_size-3) 16) ∧
      (stack_size ≥ 2 ⟶ elem_larger_than_bound run_len (stack_size-2) 16) ∧
      (stack_size ≥ 1 ⟶ elem_larger_than_bound run_len (stack_size-1) 1) ∧
      (∀i. i ≥ 0 ∧ i < stack_size-1 ⟶ run_base!i + run_len!i = run_base!(i+1)) ∧
      elem_larger_than_bound run_base 0 0"
The invariant covers all aspects of the two stacks that store the indexes and lengths of runs. The sizes of the two stacks must be the same. The stack size is a fixed number determined by the length of the input array, which must be less than 2917196496. The sum of all lengths in run_len plus the index of the first run cannot exceed the length of the input array. stack_size always points to the top of the stack and can never be greater than the size of the stack. After a merge, or after a new element is pushed onto the stack, the stack invariant may be broken temporarily, so the run-length conditions here are looser than the stack invariant. Finally, for every valid run stored in the stacks, its index plus its length must equal the index of the next run.
Next, we discuss the functional correctness of the Timsort procedures, where the invariant defined above is included in the pre- and postcondition of each procedure.
The functional correctness of push_run is specified by the following pre- and postconditions. In Simpl, a variable written with σ at its upper left (⇗σ⇖x) denotes the value of x in the initial state, i.e., the state of the precondition. The procedure push_run is called every time the information of a new run is about to be stored in the stacks. Before it is called, the stack invariant must hold strictly. Moreover, a valid run must have a non-negative index and a positive length, and its index must equal its predecessor’s index plus the predecessor’s length, if there is a predecessor. Also, the run cannot exceed the input array. These constraints form the precondition of push_run. After the execution of push_run, not only should its basic functionality be met, we also expect that the other elements are not changed by the procedure.

    lemma (in push_run_impl) push_run_spec:
      "∀σ. Γ⊢ ⦃σ. ´run_len_i > 0 ∧ ´run_len_i ≤ size ´a ∧ ´run_base_i ≥ 0 ∧
             (´stack_size > 0 ⟶
                ´run_base_i = ´run_base!(´stack_size-1) + ´run_len!(´stack_size-1)) ∧
             ´run_len_i + ´run_base_i ≤ size ´a ∧
             (∀i. i ≥ 3 ∧ i ≤ ´stack_size ⟶ elem_inv ´run_len (´stack_size-i) 16) ∧
             (´stack_size ≥ 2 ⟶ elem_bigger_than_next ´run_len (´stack_size-2)) ∧
             (´stack_size ≥ 1 ⟶ elem_larger_than_bound ´run_len (´stack_size-1) 16) ∧
             ´stack_size ≥ 0 ∧ ´stack_size < size ´run_len ∧
             invariant ´run_len ´run_base ´a ´stack_size⦄
           PROC push_run(´run_base_i, ´run_len_i)
           ⦃´run_base!(⇗σ⇖stack_size) = ⇗σ⇖run_base_i ∧
            ´run_len!(⇗σ⇖stack_size) = ⇗σ⇖run_len_i ∧
            ´stack_size = ⇗σ⇖stack_size + 1 ∧
            ´run_base_i = ⇗σ⇖run_base_i ∧ ´run_len_i = ⇗σ⇖run_len_i ∧
            (∀i. i ≥ 0 ∧ i < ´stack_size-1 ⟶ ´run_len!i = ⇗σ⇖run_len!i) ∧
            (∀i. i ≥ 0 ∧ i < ´stack_size-1 ⟶ ´run_base!i = ⇗σ⇖run_base!i) ∧
            invariant ´run_len ´run_base ´a ´stack_size ∧ size ´a = size ⇗σ⇖a⦄"
Now, we discuss the procedure merge_at, whose pre- and postconditions are defined as follows. For the top three elements of run_len, there are different merge strategies for different cases: depending on the conditions, the merge happens either on the top two runs or on the second and third runs from the top. So, if there are at least three elements, the index i may point either at stack_size - 2 or at stack_size - 3. Beyond the basic functionality, we can also prove that the sum of the valid elements in run_len does not change, which means that merging does not change the total number of elements. Moreover, we expect that the last element of run_len does not decrease, which means that merging either makes the last valid run longer or leaves it unchanged.

    lemma (in merge_at_impl) merge_at_spec:
      "∀σ. Γ⊢ ⦃σ. (´stack_size = 2 ⟶ ´i = ´stack_size-2) ∧
             (´stack_size ≥ 3 ⟶ ´i = ´stack_size-2 ∨ ´i = ´stack_size-3) ∧
             ´stack_size > 1 ∧ ´i ≥ 0 ∧ ´a ≠ [] ∧
             invariant ´run_len ´run_base ´a ´stack_size⦄
           PROC merge_at(´i)
           ⦃´i = ⇗σ⇖i ∧ ´run_base!0 = ⇗σ⇖run_base!0 ∧ size ´a = size ⇗σ⇖a ∧
            ´stack_size = ⇗σ⇖stack_size - 1 ∧
            ´run_len!´i = ⇗σ⇖run_len!´i + ⇗σ⇖run_len!(´i+1) ∧
            (´i = ⇗σ⇖stack_size-3 ⟶
               ´run_len!(´i+1) = ⇗σ⇖run_len!(´i+2) ∧
               ´run_base!(´i+1) = ⇗σ⇖run_base!(´i+2)) ∧
            invariant ´run_len ´run_base ´a ´stack_size ∧
            sumn ´run_len ´stack_size = sumn ⇗σ⇖run_len ⇗σ⇖stack_size ∧
            ´run_len!(´stack_size-1) ≥ ⇗σ⇖run_len!(⇗σ⇖stack_size-1)⦄"
The pre- and postconditions of merge_collapse are defined as follows. Every time a new run is pushed onto the stack, the algorithm checks whether the stack invariant holds and merges some subarrays to re-establish it, which is what merge_collapse does. Because the length of the new run is not known, the precondition only constrains the other elements of the stack through the stack invariant. After the procedure finishes, however, the stack invariant must hold strictly for every element. Again, some corollaries come together with the postcondition, just as in the specification of merge_at.

    lemma (in merge_collapse_impl) merge_collapse_spec:
      "∀σ. Γ⊢ ⦃σ. ´stack_size > 0 ∧
             (´stack_size ≥ 4 ⟶ elem_inv ´run_len (´stack_size-4) 16) ∧
             (´stack_size ≥ 3 ⟶ elem_bigger_than_next ´run_len (´stack_size-3)) ∧
             ´a ≠ [] ∧ invariant ´run_len ´run_base ´a ´stack_size⦄
           PROC merge_collapse()
           ⦃(∀i. i ≥ 3 ∧ i ≤ ´stack_size ⟶ elem_inv ´run_len (´stack_size-i) 16) ∧
            (´stack_size ≥ 2 ⟶ elem_bigger_than_next ´run_len (´stack_size-2)) ∧
            sumn ´run_len ´stack_size = sumn ⇗σ⇖run_len ⇗σ⇖stack_size ∧
            ´run_len!(´stack_size-1) ≥ ⇗σ⇖run_len!(⇗σ⇖stack_size-1) ∧
            ´stack_size > 0 ∧ ´stack_size ≤ ⇗σ⇖stack_size ∧
            invariant ´run_len ´run_base ´a ´stack_size ∧
            size ´a = size ⇗σ⇖a ∧ ´run_base!0 = ⇗σ⇖run_base!0⦄"
Finally, for the procedure merge_force_collapse: after the entire input array has been divided into runs, all runs are merged from the top of the stack towards the bottom. There is no special requirement before this procedure, and we only expect that exactly one run remains at the end.

    lemma (in merge_force_collapse_impl) merge_force_collapse_spec:
      "∀σ. Γ⊢ ⦃σ. ´stack_size > 0 ∧ ´a ≠ [] ∧
             invariant ´run_len ´run_base ´a ´stack_size⦄
           PROC merge_force_collapse()
           ⦃´stack_size = 1 ∧ invariant ´run_len ´run_base ´a ´stack_size ∧
            size ´a = size ⇗σ⇖a⦄"
4.3 Prove Stack Invariant Irrelevant Procedures
For the procedures that do not modify the stack, the specifications are relatively simple. We prove that their parameters are valid and that the invariant mentioned above still holds. There are two ways to prove the invariant. On the one hand, we can add it to the pre- and postconditions and prove it as usual. On the other hand, we can use the modifies specifications provided by Simpl, which state which global variables a procedure may modify. In this paper, we use the second approach. For instance, the modifies specification of merge_lo is defined as follows.
    lemma (in merge_lo_impl) merge_lo_modifies:
      shows "∀σ. Γ⊢ {σ} PROC merge_lo(´base1, ´len1, ´base2, ´len2)
                 {t. t may_only_modify_globals σ in [a]}"
With this specification, a caller of merge_lo knows that the procedure only modifies the array a, and that all other global variables have the same values after the call as before it. This implicitly proves that if the related properties hold before the call, they also hold after it, because the variables they depend on are not changed at all.
4.4 Modularity
In the verification, several specifications involve a large number of pre- and postconditions. As a result, it is hard for Simpl’s verification condition generator to simplify these conditions. Moreover, if we modify only one of the conditions in such a large specification, the whole proof may need to be rearranged, because one of the proof steps may fail after the change and invalidate the subsequent steps. We therefore need modularity to divide a large specification into several small pieces. For example, consider the specification $\Gamma, \Theta \vdash P\ c\ (Q \cap R), (A \cap B)$: if both Q and R contain a large number of conditions, the proof will be correspondingly large. However, it is possible to prove the two sub-specifications $\Gamma, \Theta \vdash P\ c\ Q, A$ and $\Gamma, \Theta \vdash P\ c\ R, B$ separately and then conclude the original specification by applying the rule PostConjI:

    lemma PostConjI:
      assumes deriv_Q: "Γ,Θ⊢⇘/F⇙ P c Q,A"
      assumes deriv_R: "Γ,Θ⊢⇘/F⇙ P c R,B"
      shows "Γ,Θ⊢⇘/F⇙ P c (Q ∩ R),(A ∩ B)"

Furthermore, if a precondition P′ with fewer conjuncts (so that $P \subseteq P'$ as sets of states) is sufficient to derive the postcondition Q, we can drop the redundant assumptions in the first sub-specification using the rule conseqPre:

    lemma conseqPre: "Γ,Θ⊢⇩t⇘/F⇙ P' c Q,A ⟹ P ⊆ P' ⟹ Γ,Θ⊢⇩t⇘/F⇙ P c Q,A"

Therefore, we can reduce the size of the specifications to ease the verification. Another benefit of modularity is that different aspects of an algorithm can be proved in different specifications. For example, we do not need to prove the correctness and stability of a sorting algorithm at the same time; instead, they can be verified separately.
5 Evaluation and Discussion
5.1 Generated C code
At this stage, we develop the C code manually from the Simpl specification. This process is simple because most Simpl statements can be mapped directly to C statements. We replace list_copy and the array-allocation function with C standard library functions that copy a memory area and allocate a new one, respectively. Besides, we pass an array to a function through a pointer, because C arrays cannot be passed by value the way Simpl lists are; this also avoids returning two values from a function. Further, we translate try-catch clauses according to the behavior they specify. If an abrupt termination follows an assignment to the return value, we translate it as a return in C. Likewise, if it is meant to begin a new iteration of a loop or to leave a loop, it becomes a continue or a break. For a try-catch structure in a nested loop, we define flag variables that tell which level of the loop to break or continue, avoiding goto statements. The other parts are straightforward.
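The flag-based translation of an abrupt exit from nested loops can be illustrated with a small, self-contained C example. It is a generic pattern rather than code from the Timsort translation; find_common and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A Simpl TRY ... THROW ... CATCH SKIP END around two nested WHILE loops,
 * translated with a flag instead of goto.  Returns true and stores in *pi
 * and *pj the first pair (in row-major order) with a[i] == b[j]. */
static bool find_common(const int *a, size_t na, const int *b, size_t nb,
                        size_t *pi, size_t *pj)
{
    bool done = false;                  /* set where Simpl would THROW */
    size_t i = 0;
    while (!done && i < na) {
        size_t j = 0;
        while (!done && j < nb) {
            if (a[i] == b[j]) {
                *pi = i;
                *pj = j;
                done = true;            /* leave the inner loop ... */
            } else {
                j++;
            }
        }
        if (!done)                      /* ... and the outer loop */
            i++;
    }
    return done;
}
```

Each loop condition tests the flag, so setting it has the effect of the THROW: both loops stop and control reaches the code after them, which corresponds to CATCH SKIP END.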
We tested the C implementation with simple random test cases. Random here does not mean that each element is generated independently at random, because then it is unlikely to obtain an ascending or descending sequence longer than the minimal run length. Consequently, almost all subarrays would be forced to extend to the minimal length. Timsort does not perform optimally when all runs have equal length, since it then behaves essentially like a traditional merge sort with a threshold. Therefore, we first generate random numbers representing the lengths of ascending subarrays and then fill each subarray with increasing random numbers to obtain a test case. To check the results, it is not sufficient to traverse the result array and assert that every element is less than or equal to its successor; it is also necessary to check that all elements are preserved. Our method is therefore to copy the test array and sort the copy with qsort from the C standard library. We assume that qsort is correct and consider the Timsort implementation in C correct on a test case when the two produce exactly the same result.
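A test harness along the lines described above might look like the following sketch. make_runs, check_once and sort_under_test are hypothetical names, and sort_under_test is a binary insertion sort acting as a placeholder for the generated Timsort. Note that the C standard does not require qsort to be a quicksort, and since it is not guaranteed to be stable it serves as a reference only for plain int keys, where equal elements are indistinguishable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for int that avoids the overflow of x - y. */
static int cmp_int(const void *p, const void *q)
{
    int x = *(const int *)p, y = *(const int *)q;
    return (x > y) - (x < y);
}

/* Placeholder for the implementation under test. */
static void sort_under_test(int *a, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        int pivot = a[i];
        size_t left = 0, right = i;
        while (left < right) {
            size_t mid = left + (right - left) / 2;
            if (pivot < a[mid]) right = mid; else left = mid + 1;
        }
        memmove(&a[left + 1], &a[left], (i - left) * sizeof a[0]);
        a[left] = pivot;
    }
}

/* Fill a[0..n) with ascending runs of random length 1..max_run,
 * each made of strictly increasing random values. */
static void make_runs(int *a, size_t n, size_t max_run)
{
    size_t i = 0;
    while (i < n) {
        size_t len = 1 + (size_t)rand() % max_run;
        int v = rand() % 1000;
        for (size_t k = 0; k < len && i < n; k++, i++) {
            v += 1 + rand() % 10;
            a[i] = v;
        }
    }
}

/* Compare the result with a qsort-sorted copy; this checks both the
 * order and that all elements are preserved. */
static bool check_once(size_t n, size_t max_run)
{
    assert(n > 0 && max_run > 0);
    int *a = malloc(n * sizeof *a);
    int *ref = malloc(n * sizeof *ref);
    assert(a != NULL && ref != NULL);
    make_runs(a, n, max_run);
    memcpy(ref, a, n * sizeof *a);
    qsort(ref, n, sizeof *ref, cmp_int);
    sort_under_test(a, n);
    bool same = memcmp(a, ref, n * sizeof *a) == 0;
    free(a);
    free(ref);
    return same;
}
```

Replacing sort_under_test with the generated Timsort entry point, and choosing max_run above the minimal run length, gives test cases with natural runs of varied lengths.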
It is desirable to automate the translation from Simpl to C. First, manual translation is hard to carry out for large programs. Second, hand-written code is not reliable, because manual translation is prone to introducing errors. Although there are methods to check the consistency between Simpl code and the corresponding C code, it is much better to generate reliable C code directly. To achieve this, we plan to develop a verified compiler from Simpl to C embedded in Isabelle/HOL.
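A verified compiler is future work, but the syntax-directed core of such a translator is small. The sketch below, with hypothetical names throughout, prints C for a subset of Simpl's statement forms (Skip, Basic, Seq, Cond, While). As a simplifying assumption, assignments and guards are stored as C source strings, whereas in Simpl they are a state-update function and a set of states; procedure calls, Guard, and exceptions (Throw/Catch) are omitted, and nothing here is verified.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* A hypothetical subset of Simpl's statement forms. */
typedef enum { S_SKIP, S_BASIC, S_SEQ, S_COND, S_WHILE } Kind;

typedef struct Stmt {
    Kind kind;
    const char *text;             /* assignment (S_BASIC) or guard (S_COND, S_WHILE) as C text */
    const struct Stmt *c1, *c2;   /* sub-statements; unused ones are NULL */
} Stmt;

/* Fixed-size output buffer; output beyond its capacity is truncated. */
typedef struct {
    char *buf;
    size_t cap, len;
} Out;

static void out_init(Out *o, char *buf, size_t cap)
{
    assert(cap > 0);
    o->buf = buf;
    o->cap = cap;
    o->len = 0;
    buf[0] = '\0';
}

static void append(Out *o, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int k = vsnprintf(o->buf + o->len, o->cap - o->len, fmt, ap);
    va_end(ap);
    if (k < 0)
        return;
    o->len += (size_t)k;
    if (o->len >= o->cap)
        o->len = o->cap - 1;      /* output was truncated */
}

/* Emit C for statement s, indented by two spaces per nesting level d. */
static void emit(const Stmt *s, int d, Out *o)
{
    switch (s->kind) {
    case S_SKIP:                  /* Skip needs no C code */
        break;
    case S_BASIC:
        append(o, "%*s%s;\n", 2 * d, "", s->text);
        break;
    case S_SEQ:
        emit(s->c1, d, o);
        emit(s->c2, d, o);
        break;
    case S_COND:
        append(o, "%*sif (%s) {\n", 2 * d, "", s->text);
        emit(s->c1, d + 1, o);
        append(o, "%*s} else {\n", 2 * d, "");
        emit(s->c2, d + 1, o);
        append(o, "%*s}\n", 2 * d, "");
        break;
    case S_WHILE:
        append(o, "%*swhile (%s) {\n", 2 * d, "", s->text);
        emit(s->c1, d + 1, o);
        append(o, "%*s}\n", 2 * d, "");
        break;
    }
}
```

Each case maps one statement form to one C construct, which is what makes the manual translation straightforward; a verified version would additionally have to relate the semantics of both sides.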
5.2 Statistics of Specification and Proof
We use Isabelle/HOL as the specification and verification system for our work. All derivations of our proofs have passed through the Isabelle proof kernel. We developed 400 lines of Simpl specification, which is comparable in size to the OpenJDK implementation. The proof of its correctness comprises 1,600 lines, most of which concern the invariants on the run stack. The C implementation of Timsort generated from the Simpl specification has 500 lines. Compared with the large number (on the order of 5,000) of manual proof steps in KeY [1], our proof effort in Simpl is smaller, which shows the advantage of using its verification condition generator (VCG).
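To make the stack invariant concrete, the sketch below models Timsort's run stack by run lengths only (merging two runs adds their lengths). It checks the invariant Timsort is designed to maintain, as analysed in [1]: len[i] > len[i+1] + len[i+2] for every three consecutive runs, and len[s-2] > len[s-1] for the two topmost runs. This may differ in form from the invariant stated in the Isabelle/HOL proof. `merge_collapse_original` transcribes the original merge rule, which inspects only the top three runs, and `merge_collapse_fixed` is our reading of the repaired rule proposed in [1]; all names are ours. Pushing the run lengths 120, 80, 25, 20, 30 makes the original rule stop at 120, 80, 45, 30, where 120 ≤ 80 + 45 violates the invariant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_RUNS 64

/* Only run lengths are modelled; merging two runs adds their lengths. */
typedef struct {
    long len[MAX_RUNS];
    size_t size;
} RunStack;

static void push_run(RunStack *st, long n)
{
    assert(st->size < MAX_RUNS && n > 0);
    st->len[st->size++] = n;
}

/* Merge runs i and i + 1, shifting the runs above them down by one. */
static void merge_at(RunStack *st, size_t i)
{
    st->len[i] += st->len[i + 1];
    for (size_t k = i + 1; k + 1 < st->size; k++)
        st->len[k] = st->len[k + 1];
    st->size--;
}

/* len[i] > len[i+1] + len[i+2] for all triples, and len[s-2] > len[s-1]. */
static bool invariant_holds(const RunStack *st)
{
    size_t s = st->size;
    for (size_t i = 0; i + 2 < s; i++)
        if (st->len[i] <= st->len[i + 1] + st->len[i + 2])
            return false;
    return s < 2 || st->len[s - 2] > st->len[s - 1];
}

/* Original rule: only the top three runs are inspected. */
static void merge_collapse_original(RunStack *st)
{
    while (st->size > 1) {
        size_t n = st->size - 2;
        if (n > 0 && st->len[n - 1] <= st->len[n] + st->len[n + 1]) {
            if (st->len[n - 1] < st->len[n + 1])
                n--;
            merge_at(st, n);
        } else if (st->len[n] <= st->len[n + 1]) {
            merge_at(st, n);
        } else {
            break;
        }
    }
}

/* Repaired rule: the fourth run from the top is inspected as well. */
static void merge_collapse_fixed(RunStack *st)
{
    while (st->size > 1) {
        size_t n = st->size - 2;
        if ((n >= 1 && st->len[n - 1] <= st->len[n] + st->len[n + 1]) ||
            (n >= 2 && st->len[n - 2] <= st->len[n] + st->len[n - 1])) {
            if (st->len[n - 1] < st->len[n + 1])
                n--;
        } else if (st->len[n] > st->len[n + 1]) {
            break;    /* the top runs satisfy the invariant again */
        }
        merge_at(st, n);
    }
}
```

In both versions a merge changes only runs near the top, so re-checking the top two triples and the top pair suffices to restore the invariant for the whole stack; the original rule omits the second triple.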
6 Conclusion and Future Work
We have formally verified Timsort using Isabelle/HOL and Simpl. We first specified the Timsort algorithm in the generic imperative language Simpl, based on the implementation in OpenJDK. We then verified the relevant properties of the run stack on this specification, using the Simpl VCG to reduce the verification to higher-order logic problems that can be solved by the automated proof procedures of Isabelle/HOL. Finally, we manually produced a verified Timsort implementation in C from our Simpl specification.
In the future, we plan to formally verify the full functional correctness of the Timsort implementation and to use benchmarks to thoroughly evaluate the performance of our Timsort implementation in C. To complete the verification framework, we also plan to build a verified translator from Simpl specifications to C.
References
[1] de Gouw, S., Rot, J., de Boer, F.S., Bubel, R., Hähnle, R.: OpenJDK’s java.utils.Collection.sort() is broken: The good, the bad and the worst case. In: Kroening, D., Păsăreanu, C.S. (eds.) Computer Aided Verification. pp. 273–289. Springer International Publishing, Cham (2015)
[2] Klein, G., Elphinstone, K., Heiser, G., Andronick, J., Cock, D., Derrin, P., Elkaduwe, D., Engelhardt, K., Kolanski, R., Norrish, M., et al.: seL4: Formal verification of an OS kernel. In: Proceedings of ACM SIGOPS 22nd Symposium on Operating Systems Principles. pp. 207–220. SOSP’09, ACM Press, Big Sky, Montana, USA (2009)
[3] Leroy, X.: Formal verification of a realistic compiler. Communications of the ACM 52(7), 107–115 (July 2009)
[4] Noschinski, L., Rizkallah, C., Mehlhorn, K.: Verification of certifying computations through AutoCorres and Simpl. In: Badger, J.M., Rozier, K.Y. (eds.) NASA Formal Methods. pp. 46–61. Springer International Publishing, Cham (2014)
[5] Peters, T.: Timsort description in C, http://svn.python.org/projects/python/trunk/Objects/listobject.c
[6] Sanán, D., Zhao, Y., Hou, Z., Zhang, F., Tiu, A., Liu, Y.: CSimpl: A rely-guarantee-based framework for verifying concurrent programs. In: Legay, A., Margaria, T. (eds.) Tools and Algorithms for the Construction and Analysis of Systems. pp. 481–498. Springer Berlin Heidelberg, Berlin, Heidelberg (2017)
[7] Schirmer, N.: Verification of Sequential Imperative Programs in Isabelle/HOL. Ph.D. thesis, Technische Universität München (2006)
[8] Schirmer, N.: A Sequential Imperative Programming Language Syntax, Semantics, Hoare Logics and Verification Environment. Archive of Formal Proofs (Feb 2008)
[9] Sternagel, C.: Proof Pearl—A Mechanized Proof of GHC’s Mergesort. Journal of Automated Reasoning 51(4), 357–370 (Dec 2013)