1 Introduction
Hoare logic – together with strongest-postcondition or weakest-precondition calculi – allows one to verify properties of programs defined by bounded sequences of instructions [20]. Given a precondition φ satisfied by the inputs of a program P, algorithms exist to compute the strongest formula ψ such that {φ} P {ψ} holds, meaning that if φ holds initially then ψ is satisfied after P is executed, and any formula that holds after P is executed is a logical consequence of ψ. To check that the final state satisfies some formula ξ, we thus only have to check that ξ is a logical consequence of ψ. However, in order to handle programs containing loops, it is necessary to associate each loop occurring within the program with an inductive invariant. An inductive invariant for a given loop is a formula that holds every time the program enters the loop (i.e., it must be a logical consequence of the preconditions of the loop), and is preserved by the sequence of instructions in the loop. Testing whether a formula is an inductive invariant is a straightforward task, and the difficulty resides in generating candidate invariants. These can be supplied by the programmer, but this is a rather tedious and time-consuming task; for usability and scalability, it is preferable to generate those formulas automatically when possible. In this paper, we describe a system to generate such invariants in a completely automated way, via abductive reasoning modulo theories, based on the methodology developed in [13]. Roughly speaking, the algorithm works as follows. Given a program decorated with a set of assertions that are to be established, all loops are first assigned the same candidate invariant ⊤ (true). These invariants are obviously sound: they hold before the loops and are preserved by the sequence of instructions in the loop; however, they are usually not strong enough to prove the assertions decorating the program.
They are therefore strengthened by adding hypotheses that are sufficient to ensure that the assertions hold; these hypotheses are generated by a tool that performs abductive inferences, and the strengthened formulas are candidate invariants. Additional strengthening steps are taken to guarantee that these candidates are actual invariants, i.e., that they are preserved by the sequence of instructions in the loop. These steps are iterated until a set of candidate invariants that are indeed inductive is obtained.
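The iteration described above can be outlined as a small skeleton. This is a hypothetical sketch, not the actual implementation: `proves`, `abduce` and `is_inductive` are stand-ins for the calls to Why3 and GPiD described later, and formulas are plain strings.

```python
from collections import namedtuple

# Hypothetical skeleton of the strengthening loop; `proves`, `abduce` and
# `is_inductive` stand in for the Why3 / GPiD calls of the real system.
Hypothesis = namedtuple("Hypothesis", "loop formula")

def strengthen_invariants(loops, assertions, proves, abduce, is_inductive):
    # Start from the trivially sound invariant `true` for every loop.
    invariants = {l: "true" for l in loops}
    pending = [a for a in assertions if not proves(invariants, a)]
    while pending:
        goal = pending.pop()
        for hyp in abduce(invariants, goal):            # candidate strengthenings
            candidate = dict(invariants)
            candidate[hyp.loop] = f"({candidate[hyp.loop]}) /\\ ({hyp.formula})"
            # Keep the candidate only if it is still inductive and proves the goal.
            if is_inductive(candidate, hyp.loop) and proves(candidate, goal):
                invariants = candidate
                break
        else:
            return None                                 # abduction failed
        pending = [a for a in assertions if not proves(invariants, a)]
    return invariants

# Toy instantiation: the "prover" just checks textual containment.
result = strengthen_invariants(
    ["L"], ["x >= 0"],
    proves=lambda inv, a: a in inv["L"],
    abduce=lambda inv, goal: [Hypothesis("L", goal)],
    is_inductive=lambda inv, loop: True)
assert result["L"] == "(true) /\\ (x >= 0)"
```

The real system differs in every component, but the control structure (strengthen, check inductiveness, recurse on remaining obligations) is the one described in this section.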
We rely on two existing systems to accomplish this task. The first one is Why3 (see, e.g., http://why3.lri.fr/ or [16]), a well-known and widely used platform for deductive program verification, which is used to compute verification conditions and verify assertions. The second system, GPiD, is designed to generate implicants of quantifier-free formulas modulo theories [14] (an implicant of a formula φ is a formula ψ such that ψ ⊨ φ; it is the dual notion of an implicate). This system is used as an abductive reasoning procedure, thanks to the following property: if φ does not entail ψ, then finding a hypothesis A such that φ ∧ A ⊨ ψ is equivalent to finding an implicant A of φ ⇒ ψ. GPiD is generic, since it only relies on the existence of a decision procedure for the considered theory (counterexamples, when available, are exploited to speed up the generation of the implicants). Both systems are connected in the Ilinva framework.
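The property underlying this use of implicant generation can be checked on a toy propositional example; the brute-force `entails` function below is illustrative and merely plays the role of the theory decision procedure.

```python
from itertools import product

# Brute-force propositional entailment over the given variables, standing in
# for the theory decision procedure.
def entails(premise, conclusion, variables):
    envs = (dict(zip(variables, bits))
            for bits in product([False, True], repeat=len(variables)))
    return all(conclusion(env) for env in envs if premise(env))

# phi = (x -> y) does not entail psi = y on its own, but the abduced
# hypothesis A = x repairs the entailment, and A is exactly an implicant
# of phi -> psi.
phi = lambda e: (not e["x"]) or e["y"]
psi = lambda e: e["y"]
A   = lambda e: e["x"]
vs  = ["x", "y"]

assert not entails(phi, psi, vs)
assert entails(lambda e: A(e) and phi(e), psi, vs)          # A /\ phi |= psi
assert entails(A, lambda e: (not phi(e)) or psi(e), vs)     # A |= phi -> psi
```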
Related Work.
A large number of different techniques have been proposed to generate loop invariants automatically, especially over numeric domains [9, 10], but also in more expressive logics, for programs containing arrays or expressible using combinations of theories [26, 8, 23, 18, 22, 24]. We only briefly review the main ideas of the most popular and successful approaches. Methods based on abstract interpretation (see, e.g., [11, 25]) work by executing the program in a symbolic way, on some abstract domain, and try to compute over-estimations of the possible states of the memory after an unbounded number of iterations of the loop. Counterexamples generated from runs can be exploited to refine the considered abstraction [17, 19]: upon detection of a run for which the assertion is violated, if the run does not correspond to a concrete execution path, then the considered abstraction may be refined to dismiss it. Candidate invariants can also be inferred by generating formulas from user-provided patterns and testing them against particular executions of the program [15]. Formulas that are violated in any of the runs can be rejected, and the soundness of the remaining candidates can be checked afterwards. Invariants can also be computed using iterative backward algorithms [27], starting from the postcondition and computing weakest preconditions until a fixpoint is reached (if any). Other approaches [21] have explored the use of quantifier elimination to refine properties obtained using a representation of all execution paths.
The work that is closest to our approach is [13], which presents an algorithm to compute invariants as Boolean combinations of linear constraints over integers. The algorithm is similar to ours, and also uses abduction to strengthen candidate invariants so that verification conditions are satisfied. The algorithms differ in the way verification conditions and abductive hypotheses are processed: in our approach, the conditions always propagate forward from one invariant to another along execution paths, and we eagerly ensure that all the loop invariants are inductive. Another difference is that we use a completely different technique to perform abductive reasoning: the approach of [13] is based on model construction and quantifier elimination for Presburger arithmetic, whereas ours uses a generic algorithm, assuming only the existence of a decision procedure for the underlying theory. This makes it possible to generate invariants expressed in theories that are out of the scope of [13].
Contribution.
The main contribution is the implementation of a general framework for the generation of loop invariants, connecting the Why3 platform and GPiD. The evaluation demonstrates that the system can generate loop invariants for a wide range of theories, though it suffers from a large search space, which may induce long computation times.
2 Verification Conditions
In what follows, we consider formulas in a base logic expressing properties of the memory and assume that such formulas are closed under the usual Boolean connectives. These formulas are interpreted modulo some theory 𝒯, and ⊨ denotes logical entailment w.r.t. 𝒯. The memory is modified by programs, which are (possibly empty) sequences of instructions, inductively defined as follows:

p ::= ι | assume φ | assert φ | p₁ ; p₂ | if C then p₁ else p₂ | while C do {ν} p₁

where p₁ and p₂ are programs, C is a condition on the state of the memory, φ is a formula and ι is a base instruction. Assumptions correspond to formulas that are taken as hypotheses; they are mostly useful to specify preconditions. Assertions correspond to formulas that are to be proved. Base instructions are left unspecified; they depend on the target language and application domain, and may include, for instance, assignments and pointer redirection. The formula ν in the while loop is a candidate loop invariant, meant to hold every time condition C is tested. In our setting, each candidate loop invariant will initially be set to ⊤ before invoking Ilinva (except when another formula is provided by, e.g., the user), and the program will iteratively update these formulas. We assume that conditions contain no instructions, i.e., that the evaluation of these conditions does not affect the memory. We say that two programs are equivalent up to invariants if they are identical up to their candidate loop invariants.
An example of a program is provided in Figure 1. It uses assignments on integers and the usual constructors and functions on lists as base instructions. It contains one loop, with a candidate invariant, and one assertion; we will generate an invariant for this loop.
A location is a finite sequence of natural numbers. The empty location is denoted by ε, and the concatenation of two locations l and l′ is denoted by l·l′. If l is a location and S is a set of locations, then l·S denotes the set {l·l′ | l′ ∈ S}. The set loc(p) of locations in a program or instruction p is inductively defined as follows:

– If p is an empty sequence, then loc(p) = {0}.
– If p = ι₁; …; ιₙ with n ≥ 1, then loc(p) = {i | 0 ≤ i ≤ n} ∪ ⋃ᵢ i·loc(ιᵢ).
– If ι is a base instruction or an assumption/assertion, then loc(ι) = ∅.
– If ι = if C then p₁ else p₂, then loc(ι) = 1·loc(p₁) ∪ 2·loc(p₂).
– If ι = while C do {ν} p₁, then loc(ι) = 1·loc(p₁).
For instance, a program ι₁; ι₂, where ι₁ and ι₂ denote base instructions, has three locations: 0 (beginning of the program), 1 (between ι₁ and ι₂) and 2 (end of the program). Note that there are no locations within an atomic instruction. The program in Figure 1 has eight locations. We denote by p⟨l⟩ the instruction occurring just after location l in p (if any):

– If p = ι₁; …; ιₙ, then p⟨i⟩ = ι_{i+1} for all 0 ≤ i < n, and p⟨i·l⟩ = ιᵢ⟨l⟩ for all 1 ≤ i ≤ n.
– If ι = if C then p₁ else p₂, then ι⟨1·l⟩ = p₁⟨l⟩ and ι⟨2·l⟩ = p₂⟨l⟩.
– If ι = while C do {ν} p₁, then ι⟨1·l⟩ = p₁⟨l⟩.
Note that p⟨l⟩ is a partial function, since locations denoting the end of a sequence do not correspond to an instruction. We denote by L_w(p) the set of locations l in p such that p⟨l⟩ is a loop; the instructions at these locations are the loops occurring in p. For instance, if p denotes the program in Figure 1, then L_w(p) is a singleton containing the location of its unique loop.
We denote by ⪯ the usual order on locations: l ⪯ l′ iff either there exist numbers i < j and locations u, v, w such that l = u·i·v and l′ = u·j·w, or there exists a location u such that l′ = l·u.
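As an illustration, a location calculus of this kind can be realized as follows. This is our own encoding for illustration only: programs are lists of instructions, an instruction is either a string (base instruction) or a tagged tuple, and locations are tuples of naturals.

```python
# Illustrative realization of a location calculus of this kind; the AST
# encoding and numbering scheme are one consistent choice made for this
# sketch, not necessarily the paper's exact definitions.

def locations(prog):
    """Locations of a program, i.e., a list of instructions. An instruction
    is either a base instruction (a string) or a tuple
    ('if', cond, p1, p2) / ('while', cond, body) with sub-programs."""
    locs = {(i,) for i in range(len(prog) + 1)}     # points between instructions
    for i, instr in enumerate(prog, start=1):
        for sub in sub_locations(instr):
            locs.add((i,) + sub)
    return locs

def sub_locations(instr):
    if isinstance(instr, tuple) and instr[0] == "if":
        return ({(1,) + l for l in locations(instr[2])} |
                {(2,) + l for l in locations(instr[3])})
    if isinstance(instr, tuple) and instr[0] == "while":
        return {(1,) + l for l in locations(instr[2])}
    return set()                                    # no locations inside atoms

# A two-instruction straight-line program has three locations:
assert locations(["i1", "i2"]) == {(0,), (1,), (2,)}
```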
We assume the existence of a procedure that, given a program p, generates a set VC(p) of verification conditions for p. These verification conditions are implications, each of which is meant to be valid. Given a program p, the set VC(p) can be decomposed as follows:

– Assertion conditions, which ensure that the assertion formulas hold at the corresponding locations in the program. These conditions also include additional properties to prevent memory access errors, e.g., to verify that the index of an array is within the defined valid range of indexes. The set of assertion conditions for program p is denoted by 𝒜(p).
– Propagation conditions, ensuring that loop invariants do propagate. Given a loop occurring at position l in program p, we denote by 𝒫_l(p) the set of conditions ensuring that the loop invariant for l propagates.
– Loop preconditions, ensuring that the loop invariants hold when the corresponding loop is entered. Given a loop occurring at position l in program p, we denote by ℐ_l(p) the set of conditions ensuring that the loop invariant holds before the loop at l is entered.
Thus, VC(p) is the union of these three sets of conditions, taken over all the loops in p. Such verification conditions are generally defined using standard weakest-precondition or strongest-postcondition calculi (see, e.g., [12]), where loop invariants are used as under-approximations. Formal definitions are recalled in Figures 2 and 3 (the definition for the base instructions depends on the application language and is thus omitted). For the sake of readability, we assume, by a slight abuse of notation, that the condition C is also a formula in the base logic.
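As an illustration of the weakest-precondition calculus mentioned above, the rules for assignments and assumptions can be sketched over purely textual formulas; the loop and assertion rules of Figures 2 and 3 are omitted, and the representation of instructions is our own.

```python
import re

def substitute(formula, var, expr):
    # Textual substitution of a variable by an expression, as in wp(x := e, Q).
    return re.sub(rf"\b{re.escape(var)}\b", f"({expr})", formula)

def wp(program, post):
    """Weakest precondition of a straight-line program w.r.t. `post`.
    Instructions: ('assign', x, e) or ('assume', phi)."""
    for instr in reversed(program):
        if instr[0] == "assign":
            post = substitute(post, instr[1], instr[2])
        elif instr[0] == "assume":
            post = f"({instr[1]}) -> ({post})"
    return post

# wp(x := x + 1; x := 2 * x, x > 0) = (2 * (x + 1)) > 0
assert wp([("assign", "x", "x + 1"), ("assign", "x", "2 * x")],
          "x > 0") == "(2 * (x + 1)) > 0"
```

A real calculus works on abstract syntax rather than strings, but the backward traversal and the substitution rule for assignments are the same.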

The formula in the last line states that the loop invariant holds when the loop is entered, that it propagates, and that it entails the subsequent formula; the vector mentioned there denotes the vector of variables occurring in the corresponding formula. This makes it possible to define the goal of the paper in a more formal way: our aim is to define an algorithm that, given a program p, constructs a program p′, identical to p up to its loop invariants (i.e., constructs loop invariants for each loop in p), such that VC(p′) only contains valid formulas. Note that all the loops and invariants must be handled globally, since verification conditions depend on one another.

The strongest-postcondition calculus describes the state of the memory after the program is executed. The conditions corresponding to loops are approximated using the provided loop invariants (the corresponding verification conditions are not stated).
3 Abduction
As mentioned above, abductive reasoning will be performed by generating implicants. Because it would not be efficient to blindly generate all implicants of a formula, this generation is controlled by fixing the literals that can occur in an implicant. We thus consider a set of literals in the considered logic, called the abducible literals.
Definition 1
Let φ be a formula. An implicant of φ (modulo 𝒯) is a 𝒯-satisfiable conjunction (or set) of abducible literals l₁ ∧ ⋯ ∧ lₙ such that l₁ ∧ ⋯ ∧ lₙ ⊨ φ.
We use the procedure GPiD described in [14] to generate implicants. A simplified version of this procedure is presented in Algorithm 1. A call to GPiD is meant to generate implicants of the input formula that: (i) extend the current set of hypotheses; (ii) are as general as possible; and (iii) satisfy a given abstract property. When the current set of hypotheses is not itself an implicant, a subset of relevant literals is computed from the abducible literals, and for each literal in this subset, a recursive call is made to the procedure after augmenting the hypotheses with this literal and discarding all the abducibles that become irrelevant. In particular, the algorithm is parameterized by an ordering on abducible literals, which is used to ensure that sets of hypotheses are explored in a non-redundant way. The algorithm relies on the existence of a decision procedure for testing satisfiability in the considered theory. In practice, this procedure does not need to be terminating or complete^2 (^2However, Theorem 3.1 only holds if the proof procedure is terminating and complete.); e.g., it may be called with a timeout, any "unknown" result being handled as "satisfiable". When available, a model of the current formula is used to prune the search space by dismissing some abducible literals. In practice, no such model may be available, either because no model-building algorithm exists for the considered theory or because of termination issues; in this case, no such pruning is performed. The abstract property on sets of literals is used to control the form of the generated implicants; for example, it is possible to force the algorithm to only generate implicants up to a fixed maximal size. For Theorem 3.1 to hold, it is simply required that the property be closed under subsets, i.e., that whenever it holds for a set of abducible literals, it also holds for every subset of that set.
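A toy propositional analogue of this search can be written as follows; truth-table entailment replaces the SMT solver, and the literal ordering, irrelevance filtering and model-based pruning of the actual algorithm are omitted.

```python
from itertools import product

# Toy analogue of the implicant search: abducible literals are
# (name, polarity) pairs, entailment is checked by truth tables, and a
# branch is abandoned as soon as the current hypotheses become unsatisfiable.

def holds(env, lit):
    name, positive = lit
    return env[name] == positive

def implicants(goal, abducibles, variables, hyps=()):
    envs = [dict(zip(variables, bits))
            for bits in product([False, True], repeat=len(variables))]
    sat = [e for e in envs if all(holds(e, l) for l in hyps)]
    if not sat:
        return []                       # hypotheses inconsistent: prune
    if all(goal(e) for e in sat):
        return [set(hyps)]              # hyps entail the goal: an implicant
    out = []
    for i in range(len(abducibles)):
        # extend with later literals only, so each set is explored once
        out += implicants(goal, abducibles[i + 1:], variables,
                          hyps + (abducibles[i],))
    return out

goal = lambda e: e["x"] or e["y"]       # implicants of x \/ y
abds = [("x", True), ("x", False), ("y", True), ("y", False)]
found = implicants(goal, abds, ["x", "y"])
assert {("x", True)} in found and {("y", True)} in found
```

Because a branch returns as soon as its hypotheses entail the goal, the most general implicants are found without exploring their strict supersets, mirroring requirement (ii) above.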
Compared to [14], details that are irrelevant for the purpose of the present paper are skipped and the procedure has been adapted to generate implicants instead of implicates (implicants and implicates are dual notions).
Theorem 3.1 ([14])
The call to GPiD terminates and returns a set of implicants of the input formula, all satisfying the given property. Further, if the property is closed under subsets, then for every implicant of the formula satisfying the property, the returned set contains an implicant that is more general than it.
This procedure also comes with generic algorithms for pruning redundant implicants, i.e., for removing all implicants that are strictly less general than another generated implicant (see [14]).
4 Generating Loop Invariants
In this section, we present an algorithm for the generation of loop invariants. As explained in Section 2, we distinguish between several kinds of verification conditions, which will be handled in different ways: assertion and propagation conditions on the one hand, and loop preconditions on the other hand. As can be seen from the rules in Figure 2, loop invariants can occur as antecedents in verification conditions; this is typically the case when a loop occurs just before an assertion in some execution path. In such a situation, we say that the considered condition depends on the loop. When a condition depends on a loop, strengthening the invariant of that loop strengthens the hypotheses of the verification condition, i.e., makes the condition less general (easier to prove).
This principle is used in Algorithm 2, which we briefly describe before going into details. Starting with a program p in which it is assumed that every loop invariant is inductive, the algorithm attempts to recursively generate invariants that make all assertion conditions in p valid. It begins by selecting a non-valid assertion condition and a loop on which it depends, then generates a set of hypotheses that would make the condition valid. For each such hypothesis, a loop location is selected, and a weakest precondition at that location causing the hypothesis to hold at the location of the condition is computed. This formula is added to the invariant of the loop at the selected location, so that if this invariant was ν, the new candidate invariant is the conjunction of ν with the computed formula. If the strengthened invariant does not hold before entering the loop, then the hypothesis is discarded; otherwise, the program attempts to update the other loop invariants to ensure that the strengthened invariant propagates. When this succeeds, a recursive call is made with the updated invariants to handle the other non-valid assertion conditions.
Procedure Abduce (invoked by Algorithm 2) is described in Algorithm 3. It generates formulas that logically entail a given goal; it is used to generate the candidate hypotheses for strengthening. It first extracts a set of abducible literals, by collecting variables and symbols from the program and/or from the theory and combining them to create literals up to a certain depth (procedure GetAbducibles). To avoid redundancy, this task is actually done in two steps: a set of abducible literals for the entire program is initially constructed (this is done once, at the beginning of the search), and depending on the considered program location, a subset of these literals is selected. The abducible literals that are logically entailed by the current hypotheses modulo the theory are filtered out, and procedure GPiD is called to generate implicants of the goal. Finally, implicants are combined to form disjunctive formulas. Note that another way of generating disjunctions of literals would be to add these disjunctions to the initial set of abducible literals, but this solution would greatly increase the search space.
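The two-step construction of abducible literals up to a certain depth can be sketched as follows; the predicate and function names (`<=`, `=`, `len`) are purely illustrative, as is the textual encoding of literals.

```python
from itertools import product

# Hypothetical generator of abducible literals: combine program variables
# with comparison predicates, and (at depth > 1) allow one more level of
# function application per extra depth unit.

def abducibles(variables, predicates, functions, depth=1):
    terms = list(variables)
    for _ in range(depth - 1):
        terms += [f"{f}({t})" for f in functions for t in terms]
    lits = [f"{p}({a}, {b})" for p in predicates
            for a, b in product(terms, repeat=2) if a != b]
    return sorted(set(lits))

lits = abducibles(["x", "y"], ["<=", "="], ["len"], depth=2)
assert "<=(x, y)" in lits and "<=(len(x), y)" in lits
```

The combinatorial growth visible here (quadratic in the number of terms, which itself grows with depth) is precisely why the selection of a location-dependent subset of abducibles matters in practice.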
Each of the hypotheses generated by Abduce is used to strengthen the invariant of a loop occurring at some position in Algorithm 2. The strengthening formula is computed by applying the weakest-precondition calculus to a program obtained from p by ignoring all loops between the position of the strengthened loop and that of the condition, since these loops have corresponding invariants. To this purpose, we define a function which backpropagates abductive hypotheses from a location to an earlier one (see Figure 4). This is done by extracting the part of the code between the two locations while ignoring loops, and computing the weakest precondition corresponding to this part of the code and the considered formula.

In Figure 4, one notation denotes the program obtained from p by removing all while instructions, and another denotes the concatenation operator on programs.
The addition of a hypothesis to the invariant of a loop ensures that the considered assertion holds, but it is also necessary to ensure that this strengthened invariant is still inductive. This is done as follows. Algorithm 2 filters away all candidates for which the precondition before entering the loop is no longer valid, and Algorithm 4 ensures that the candidate still propagates. This algorithm behaves similarly to Algorithm 2 (testing the propagation conditions instead of the assertion conditions), except that it strengthens the invariants that correspond either to the considered loop, or to other loops occurring within it (in the case of nested loops). Note that in this case, properties must be propagated forward, from the location of the considered condition to the actual location of the strengthened invariant, using a strongest-postcondition calculus (see Figure 4). This technique avoids considering hypotheses that do not propagate.
When applied to the program in Figure 1, Ilinva first sets the initial invariant of the loop to ⊤ and considers the assertion. As the corresponding entailment does not hold, it calls GPiD to get an implicant of the associated abduction problem. Assume that GPiD returns a (trivial) solution. As this hypothesis indeed holds when the loop is entered (this can be checked by computing its weakest precondition w.r.t. the instructions preceding the loop), Ilinva adds it to the invariant of the loop and calls Ind. Since the strengthened invariant does not propagate, Ind further strengthens it, eventually yielding a correct solution.
The efficiency of Algorithm 2 crucially depends on the order in which candidate hypotheses are processed for the strengthening operation. The heuristic used in our current implementation is to try the simplest hypotheses with the highest priority. Abducible atoms are therefore ordered as follows: first Boolean variables, then equations between variables of the same sort, then applications of predicate symbols to variables (of the appropriate sorts), and finally deep literals involving function symbols (up to a certain depth). In every case, negative literals are also considered, with the same priority as the corresponding atom. Similarly, unit implicants are tested before non-unit ones, and single implicants before disjunctions of implicants. In the iteration of Algorithm 2, the loops that are closest to the considered assertions are considered first. Numerous parameters are used to control the application of the procedures, by fixing limits on the number of abducible literals that may be considered and on the maximal size of implicants. When a call to Ilinva fails, these parameters are increased, following an iterative-deepening search strategy. The parameter controlling the maximal number of implicants in the disjunctions is fixed outside of this deepening loop, as it has a strong impact on the computation cost. The following theorem states the main properties of the algorithm.
Theorem 4.1
Let p be a program whose propagation conditions and loop preconditions are all valid. If Ilinva(p) terminates and returns a program p′ (instead of failing), then p′ is identical to p up to its loop invariants and VC(p′) is valid modulo 𝒯. Furthermore, if the considered set of abducible literals is finite (i.e., if all abducible literals are taken from one finite set, for all formulas), then Ilinva(p) terminates.
Proof
The proof is provided in the extended version of the paper; we sketch it here. The proof is by induction on the recursive calls. It is clear that the returned program is identical to p up to its loop invariants, because the algorithm only modifies loop invariants. By construction, the assertion conditions must be valid when the program is returned. By hypothesis, the propagation conditions and loop preconditions are valid, and by definition the set of verification conditions is the union of these three sets of conditions; thus it is valid in this case. Furthermore, it is easy to check, by inspection of Algorithm 4, that all the recursive calls to Ilinva occur on programs whose propagation conditions are valid. All the loop preconditions are also valid, due to the tests in Algorithm 2 and Algorithm 4 (indeed, it is clear that the strengthening of the invariant at some location preserves the validity of the preconditions of the other loops). Thus the precondition above holds for these recursive calls, and the result follows by the induction hypothesis.
Termination is immediate, since there are only finitely many possible candidate invariants built on a finite set of abducible literals; the (strict) strengthening relation thus forms a well-founded order. At each recursive call, one of the invariants is strictly strengthened and the others are left unchanged, hence the multiset of invariants is strictly decreasing according to the multiset extension of the strengthening relation.
5 Implementation
5.1 Overview
The Ilinva algorithm described in Section 4 has been implemented by connecting Why3 with GPiD. A workflow graph of this implementation is shown in Figure 5. The input file (a WhyML program) and a configuration are forwarded to the tool via the command line. The tool then loads this input file within a wrapper, where it identifies the candidate loop invariants that may be strengthened by the system. This wrapper also modifies the candidate loop invariants within the program when they are strengthened, and can export the corresponding file at any time.
The main system then forwards the configuration to the invariant generation algorithm. During the execution of the Ilinva algorithm, the WhyML wrapper is tasked to query Why3 to check whether the latter is able to prove all the assertions of the updated program, and if not, to recover the verification conditions that are not satisfied. Selected conditions are transferred to the main generator, which creates appropriate abduction tasks for them, asks GPiD for implicants, selects the meaningful ones and strengthens the associated candidate loop invariants accordingly. When a proof for the verification conditions of the program is found, the file wrapper returns the program updated with the corresponding loop invariants. GPiD candidates are pruned when they contradict the initial conditions of the loops. Note that both GPiD and Why3 call external SMT solvers to check the satisfiability of formulas; in particular, the GPiD toolbox is easy to plug into any SMTlib2-compliant SMT solver. The framework is generic, in the sense that it could be plugged into other systems, both to generate and verify proof obligations and to strengthen loop invariants. It is also independent of the constructions used for defining the language: other constructions (e.g., for loops) can be considered, provided they are handled by the program verification framework.
Given an input program written in WhyML, Why3 generates a verification condition whose validity ensures that all the asserted properties are verified (including additional conditions related to, e.g., memory safety). This initial verification condition is split by Why3 into several subtasks. These conditions are enriched with all appropriate information (e.g., theories, axioms, …) and sent to external SMT solvers to check satisfiability. The conditions we are interested in are those linked to the proofs of the program assertions, as well as those ensuring that the candidate loop invariants are inductive. In our implementation, Why3 is used as a black box, and we merely recover the files that are passed from Why3 to the SMT solvers, together with additional configuration data for the solvers that we can extract from Why3. If a proof obligation fails, we relate the corresponding file to the assertion in the WhyML program and extract the set of abducible literals as explained in Section 4, restricting ourselves to symbols corresponding to WhyML variables, functions and predicates. We then tune the SMTlib2 file to adapt it for computations by GPiD, and invoke GPiD with the same SMT solver as the one used by Why3 to check satisfiability, as the problem is expected to be optimized/configured for it. We also configure GPiD to skip the exploration of subtrees that would produce candidate invariants that do not satisfy the loop preconditions. GPiD returns a stream of solutions to the abductive reasoning problem. We then backward-translate these formulas into the WhyML formalism and use them to strengthen loop invariants. For efficiency, the systems run in parallel: the generation of abductive hypotheses (by GPiD, via the procedure Abduce) and their processing in WhyML (via Ilinva) are organized as a pipeline, where new abduction solutions are computed during the processing of the first ones.
To bridge Ilinva and Why3, we devised an interface that is able to analyze WhyML programs and identify loop locations and the corresponding invariants. It invokes Why3 to generate and prove the associated verification tasks, and recovers the failed ones. The library also includes tools to extract and modify loop invariants, to extract variables and reference variables from WhyML files, as well as types, predicates and functions, and wrappers to call the Why3 executable and external tools, and to extract the files sent by Why3 to the SMT solvers.
5.2 Distribution
The Abdulot framework is available on GitHub [7]. It contains a revamped interface to the GPiD libraries and algorithm, a generic library implementing the Ilinva algorithm automatically plugged with GPiD, the code interface for Why3 and the related executables. GPiD interfaces and related executables are generated for CVC4, Z3 and AltErgo (the three solvers the Why3 documentation recommends as an initial setup; see also the "External Provers" page at http://why3.lri.fr/) via their SMTlib2 interface. Note that the SMT solvers are not provided by our framework; they must be installed separately (all versions with an SMTlib2-compatible interface are supported). Additional interfaces and executables can be produced using the C++ libraries of MiniSAT, CVC4 and Z3 if a supported version is available. (The AltErgo interface provided by the tool uses an SMTlib2 interface that is under heavy development and that, in practice, does not work well with the examples we send it.)
The framework also provides libraries and toolbox executables to work with abducible files, C++ libraries to handle WhyML files, helpers for the generation of abducible literals out of SMTlib2 files, and an extensive Lisp parser. It also includes documentation, which explains in particular how to extend the framework to other solvers and program verification frameworks. All the tools can be compiled with any C++11-compliant compiler. The full list of dependencies is available in the documentation, as well as a dependency graph of the different parts of the framework.
6 Experiments
We use benchmarks collected from several sources [13, 4, 5, 6, 1, 2, 3] (see also [7] for a more detailed view of the benchmark sources), with additional examples corresponding to standard algorithms for division and exponentiation (involving lists, arrays, and non-linear arithmetic). Some of these benchmarks have been translated by hand from C or Java into WhyML. In all cases, the initial invariant associated with each loop is ⊤. We used Z3 for the benchmarks containing real arithmetic, AltErgo for lists and arrays, and CVC4 in all the other cases. All examples are shipped with the sources of the Ilinva tool.
6.1 Results
We ran Ilinva on each example, first without disjunctive invariants (i.e., disabling disjunctions in procedure Abduce), then allowing disjunctions of bounded size. The results are reported in Figure 6. For each example, we report whether our tool was able to generate invariants allowing Why3 to check the soundness of all program assertions before the timeout, in which case we also report the time Ilinva took to do so (columns T(C) when generating conjunctions only and T(D) when generating implicants containing disjunctions). We also report the number of candidate invariants that were tried (columns C(C) and C(D)) and the number of abducible literals that were sent to the GPiD algorithm (column Abd). Note that the number of candidate invariants does not correspond to the number of SMT calls made by the system: those made by GPiD to generate these candidates are not taken into account. Runs were subject to a global timeout. For some of the examples that we deemed interesting, we allowed the algorithm to run beyond it; we report those cases by putting the results between parentheses. Light gray cells indicate that the program terminated before the timeout without returning any solution, and dark gray cells indicate that the timeout was reached. Empty cells mean that the tool could not generate any candidate invariant. The last column of both tables reports the time Why3 takes to prove all the assertions of an example when correct invariants are provided.
The tests were performed on a computer powered by a dualcore Intel i5 processor running at 1.3GHz with 4 GB of RAM, under macOS 10.14.3. We used Why3 version 1.2.0 and the SMT solvers AltErgo (version 2.2.0), CVC4 (prerelease of version 1.7) and Z3 (version 4.7.1).
An essential point concerns the handling of local solver timeouts. Indeed, most calls to the SMT solver in the abductive reasoning procedure involve satisfiable formulas, and solvers usually take a long time to terminate on such formulas (or, in the worst case, do not terminate at all if the theory is not decidable, e.g., for problems involving first-order axioms). We thus need to set a timeout after which a call is considered as satisfiable (see Section 3). Obviously, we neither want this timeout to be too high, as it can significantly increase computation time, nor too low, since it could make us miss solutions. We decided to set this timeout to one second, independently of the solver used, after measuring the computation time of the Why3 verification conditions that were already satisfied (for which the solver returns unsat) across all benchmarks. We worked under the assumption that the computation time required to prove the other verification conditions, when possible, would be roughly similar.
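This policy can be sketched as a small wrapper around a solver invocation; the solver command line and default budget below are illustrative, and any result other than a definite `unsat` within the budget (including a local timeout) is treated as satisfiable.

```python
import subprocess

def check_sat(smt2_file, solver=("z3", "-smt2"), budget=1.0):
    """Run the solver with a local time budget; any outcome other than a
    definite `unsat` (including a timeout) is treated as satisfiable."""
    try:
        proc = subprocess.run([*solver, smt2_file], capture_output=True,
                              text=True, timeout=budget)
        lines = proc.stdout.strip().splitlines()
        answer = lines[-1] if lines else "unknown"
    except subprocess.TimeoutExpired:
        answer = "unknown"              # local timeout: assume satisfiable
    return "unsat" if answer == "unsat" else "sat"
```

With the illustrative default, this would invoke `z3 -smt2 file.smt2` under a one-second budget, mirroring the choice discussed above.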
6.2 Discussion
As can be observed, Ilinva is able to generate solutions for a wide range of theories, although execution times are usually high. The number of candidate invariants that must be tried is also large, which has a major impact on the efficiency and scalability of the approach.
When applied to examples involving arithmetic invariants, the program is rather slow compared to the approach based on minimization and quantifier elimination [13]. This is not surprising: it is very unlikely that a purely generic approach, based on a model-based tree exploration algorithm involving many calls to an SMT solver, can compete with a more specific procedure exploiting particular properties of the considered theory. We also wish to emphasize that basing our framework on an external program verification system (which itself calls external solvers) involves a significant overhead compared to a more integrated approach: for instance, for the Oxy examples (taken from [13]), the time used by Why3 to check the verification conditions once the correct invariants have been generated is often greater than the total time reported in [13] for computing the invariants and checking all properties. Of course, our choice also has clear advantages in terms of genericity and evolvability.
When applied to theories that are harder for SMT solvers, the algorithm can still generate suitable invariants. However, due to the large number of candidates it tries, combined with the heavy cost of each verification (which can take several seconds), it may take a long time to do so.
The number of abducible literals has a strong impact on the efficiency of the process, leading to timeouts when the considered program contains many variables or function/predicate symbols. The abduction depth, in contrast, is rather low in all examples ( or ).
Our prototype has some technical limitations that significantly increase execution time. For instance, we use SMT-LIB 2 files for communication between GPiD and CVC4 or Z3, instead of using the available APIs. We fell back on this solution, which is clearly not optimal for performance, because we experienced many problems coping with the numerous changes in the API specifications when updating the solvers to their latest versions. The fact that Why3 is used as a black box also consumes time, first in the (backward and forward) translations (e.g., to associate program variables with their logical counterparts), but also in the verification tasks, which have to be rechecked from the beginning each time an invariant is updated. Our aim in the present paper was not to devise an efficient system, but rather to assess the feasibility and usefulness of the approach. Still, the cost of the numerous calls to the SMT solvers and the size of the search tree of the abduction procedure remain the bottleneck of the approach, especially for hard theories (e.g., non-linear arithmetic) for which most calls with satisfiable formulas end in a local timeout (see Section 6.1).
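To make the file-based communication concrete, the following sketch serializes an entailment check as an SMT-LIB 2 script written to a temporary `.smt2` file. The helper `smtlib2_query` and the example declarations are hypothetical, not code from GPiD; only the SMT-LIB commands themselves (`set-logic`, `assert`, `check-sat`) are standard.

```python
import os
import tempfile

def smtlib2_query(decls, hypotheses, goal, logic="QF_LIA"):
    """Serialize an entailment check H |= G as an SMT-LIB 2 script.

    Entailment is tested by checking H together with not(G) for
    unsatisfiability, which is how verification conditions are
    usually discharged.
    """
    lines = [f"(set-logic {logic})"]
    lines += decls
    lines += [f"(assert {h})" for h in hypotheses]
    lines.append(f"(assert (not {goal}))")
    lines.append("(check-sat)")
    return "\n".join(lines)

script = smtlib2_query(["(declare-const x Int)", "(declare-const y Int)"],
                       ["(>= x 0)", "(= y (+ x 1))"],
                       "(>= y 1)")
# Write the script to a temporary .smt2 file for an external solver,
# which would then be invoked on the file and its answer parsed back.
fd, path = tempfile.mkstemp(suffix=".smt2")
with os.fdopen(fd, "w") as f:
    f.write(script)
os.remove(path)
```

The round trip through the file system, plus process start-up of the solver on every query, is the kind of overhead a direct API integration would avoid.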
7 Conclusion and Future Work
By combining our generic system GPiD for abductive reasoning modulo theories with the Why3 platform to generate verification conditions, we obtained a tool to check properties of WhyML programs, which is able to compute loop invariants in a purely automated way.
The main drawback of our approach is that the set of possible abducible literals is large, yielding a huge search space, especially if disjunctions of implicants are considered. Therefore, we believe that our system in its current state is mainly useful in an interactive setting. For instance, the user could provide the properties of interest for some of the loops and let the system automatically compute suitable invariants by combining these properties, or the program could rely on the user to choose between different solutions to the abduction problem before applying the strengthening. Besides, the system is also useful for dealing with theories for which no specific abductive reasoning procedure exists, especially for reasoning in the presence of user-defined symbols or axioms.
In the future, we will focus on the definition of suitable heuristics for automatically selecting abducible literals and ordering them, to reduce the search space and avoid backtracking. The number of occurrences of symbols should be taken into account, as well as properties propagated from previous invariant strengthenings. A promising approach is to use dynamic program analysis tools to select relevant abducibles. It would also be interesting to adapt the GPiD algorithm to explore the search space breadth-first, to ensure that the simplest solutions are always generated first. Another option is to give Ilinva more precise control over the GPiD algorithm, e.g., to explore some branches more deeply, based on information related to the verification conditions. GPiD could also be tuned to generate disjunctions of solutions more efficiently.
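A symbol-occurrence heuristic of the kind mentioned above can be sketched as follows. This is an illustrative mock-up, not the heuristic implemented in Ilinva: symbols are crudely extracted as alphanumeric tokens, and `rank_abducibles` is a hypothetical helper.

```python
import re
from collections import Counter

def rank_abducibles(abducibles, reference_terms):
    """Order abducible literals by how many of their symbols occur in
    the failed verification condition and previous strengthenings.

    Literals sharing more symbols with `reference_terms` are tried
    first, on the assumption that they are more likely to be relevant.
    """
    tokens = lambda s: re.findall(r"[A-Za-z_]\w*", s)
    freq = Counter(tok for t in reference_terms for tok in tokens(t))
    # Score a literal by the total frequency of its distinct symbols.
    score = lambda lit: sum(freq[tok] for tok in set(tokens(lit)))
    return sorted(abducibles, key=score, reverse=True)

# Literals mentioning symbols of the proof obligation come first:
ranked = rank_abducibles(["(>= z 0)", "(>= i n)", "(= i 0)"],
                         ["(< i n)", "(>= i 0)"])
print(ranked[0])   # the literal over i and n is ranked highest
```

Such a ranking would not shrink the search space by itself, but combined with a branch-and-bound exploration it lets the most promising branches of the abduction tree be visited first.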
From a more technical point of view, a tighter integration with the Why3 platform would certainly be beneficial, as explained in Section 6.2. The framework could also be extended to handle procedures and functions (with pre- and postconditions).
A current limitation of our tool is that it cannot handle problems in which Why3 relies on a combination of different solvers to check the desired properties. In this case, Ilinva cannot generate the invariants, as the same SMT solver is used for each abduction problem (trying all solvers in parallel on every problem would be possible in theory, but would greatly increase the search space). This problem could be overcome by using heuristics to select the solver best suited to a given problem.
From a theoretical point of view, it would be interesting to investigate the completeness of our approach. It is clear that no general completeness result can hold, due to the usual theoretical limitations; however, if we assume that a program such that is valid exists, does the call always succeed? This would of course require that the invariants in can be constructed from abducibles occurring in the set returned by the procedure GetAbducibles.
References
 [1] http://toccata.lri.fr/gallery/.
 [2] http://pauillac.inria.fr/~levy//why3/sorting/.
 [3] https://www.lri.fr/~sboldo/research.html.
 [4] InvGen tool. http://pub.ist.ac.at/~agupta/invgen/.
 [5] NEC Labs NECLA verification benchmarks. http://www.neclabs.com/research/system/systems SAV website/benchmarks.php.
 [6] Satconv benchmarks.
 [7] Abdulot framework / GPiD-Ilinva tool suite. https://github.com/sellamiy/GPiDFramework.
 [8] D. Beyer, T. A. Henzinger, R. Majumdar, and A. Rybalchenko. Invariant synthesis for combined theories. In Verification, Model Checking, and Abstract Interpretation, 8th International Conference, VMCAI 2007, Nice, Proceedings, 2007.
 [9] A. R. Bradley. IC3 and beyond: Incremental, inductive verification. In Computer Aided Verification - 24th International Conference, CAV 2012, Berkeley, CA, USA, July 7-13, 2012, Proceedings, page 4, 2012.
 [10] A. R. Bradley and Z. Manna. Property-directed incremental invariant generation. Formal Asp. Comput., 20(4-5):379–405, 2008.
 [11] P. Cousot and N. Halbwachs. Automatic discovery of linear restraints among variables of a program. In Proceedings of the 5th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, POPL ’78, New York, 1978. ACM.
 [12] E. W. Dijkstra. A Discipline of Programming. Prentice Hall PTR, Upper Saddle River, NJ, USA, 1st edition, 1997.
 [13] I. Dillig, T. Dillig, B. Li, and K. L. McMillan. Inductive invariant generation via abductive inference. In A. L. Hosking, P. T. Eugster, and C. V. Lopes, editors, Proceedings of OOPSLA 2013, Indianapolis, pages 443–456. ACM, 2013.
 [14] M. Echenim, N. Peltier, and Y. Sellami. A generic framework for implicate generation modulo theories. In D. Galmiche, S. Schulz, and R. Sebastiani, editors, IJCAR 2018, Oxford, volume 10900 of LNCS, pages 279–294. Springer, 2018.
 [15] M. D. Ernst, J. Cockrell, W. G. Griswold, and D. Notkin. Dynamically discovering likely program invariants to support program evolution. In Proceedings of the 21st International Conference on Software Engineering, ICSE ’99, pages 213–224, New York, NY, USA, 1999. ACM.
 [16] J.C. Filliâtre and A. Paskevich. Why3 — where programs meet provers. In M. Felleisen and P. Gardner, editors, Proceedings of the 22nd European Symposium on Programming, volume 7792 of Lecture Notes in Computer Science, pages 125–128. Springer, Mar. 2013.
 [17] C. Flanagan and K. R. M. Leino. Houdini, an annotation assistant for ESC/Java. In J. N. Oliveira and P. Zave, editors, FME 2001: Formal Methods for Increasing Software Productivity, pages 500–517, Berlin, Heidelberg, 2001. Springer Berlin Heidelberg.
 [18] S. Ghilardi and S. Ranise. Backward reachability of arraybased systems by SMT solving: Termination and invariant synthesis. Logical Methods in Computer Science, 6(4), 2010.
 [19] T. A. Henzinger, R. Jhala, R. Majumdar, and G. Sutre. Software verification with BLAST. In T. Ball and S. K. Rajamani, editors, Model Checking Software, pages 235–239, Berlin, Heidelberg, 2003. Springer Berlin Heidelberg.
 [20] C. A. R. Hoare. An axiomatic basis for computer programming. Commun. ACM, 12(10):576–580, Oct. 1969.
 [21] D. Kapur. A quantifierelimination based heuristic for automatically generating inductive assertions for programs. J. Systems Science & Complexity, 19, 2006.
 [22] A. Karbyshev, N. Bjørner, S. Itzhaky, N. Rinetzky, and S. Shoham. Propertydirected inference of universal invariants or proving their absence. J. ACM, 64(1):7:1–7:33, 2017.
 [23] L. Kovács and A. Voronkov. Finding loop invariants for programs over arrays using a theorem prover. In Fundamental Approaches to Software Engineering, 12th International Conference, FASE 2009, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009, York, UK, March 22-29, 2009. Proceedings, pages 470–485, 2009.
 [24] L. Kovács and A. Voronkov. Interpolation and symbol elimination. In Automated Deduction - CADE-22, 22nd International Conference on Automated Deduction, Montreal, Canada, August 2-7, 2009. Proceedings, pages 199–213, 2009.
 [25] A. Miné. The octagon abstract domain. Higher Order Symbol. Comput., 19, 2006.
 [26] O. Padon, N. Immerman, S. Shoham, A. Karbyshev, and M. Sagiv. Decidability of inferring inductive invariants. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016, St. Petersburg, FL, USA, January 20-22, 2016, pages 217–231, 2016.
 [27] N. Suzuki and K. Ishihata. Implementation of an array bound checker. 1977.