Identifying Overly Restrictive Matching Patterns in SMT-based Program Verifiers

05/10/2021
by   Alexandra Bugariu, et al.
ETH Zurich

Universal quantifiers occur frequently in proof obligations produced by program verifiers, for instance, to axiomatize uninterpreted functions and to express properties of arrays. SMT-based verifiers typically reason about them via E-matching, an SMT algorithm that requires syntactic matching patterns to guide the quantifier instantiations. Devising good matching patterns is challenging. In particular, overly restrictive patterns may lead to spurious verification errors if the quantifiers needed for a proof are not instantiated; they may also conceal unsoundness caused by inconsistent axiomatizations. In this paper, we present the first technique that identifies and helps the users remedy the effects of overly restrictive matching patterns. We designed a novel algorithm to synthesize missing triggering terms required to complete a proof. Tool developers can use this information to refine their matching patterns and prevent similar verification errors, or to fix a detected unsoundness.


1 Introduction

Proof obligations frequently contain universal quantifiers, both in the specification and to encode the semantics of the programming language. Most deductive verifiers [1, 3, 6, 9, 13, 22, 36] rely on SMT solvers to discharge the proof obligations via E-matching [12]. This SMT algorithm requires syntactic matching patterns of ground terms (called patterns in the following) to control the instantiations. A pattern attached to a quantified formula instructs the solver to instantiate the quantifier only when it finds a triggering term that matches the pattern. The patterns can be written manually or inferred automatically. However, devising them is challenging [19, 23]. Overly permissive patterns may lead to unnecessary instantiations that slow down verification or even cause non-termination (if each instantiation produces a new triggering term, in a so-called matching loop [12]). Overly restrictive patterns may prevent the instantiations needed to complete a proof; they cause two major problems in program verification: incompleteness and undetected unsoundness.

Incompleteness.

Overly restrictive patterns may cause spurious verification errors when the proof of valid proof obligations fails. Fig. 1 illustrates this case. The integer x represents the address of a node, and the uninterpreted functions len and nxt encode operations on linked lists. The axiom defines len: its result is positive and the last node points to itself. The assertion directly follows from the axiom, but the proof fails because the proof obligation does not contain the triggering term len(nxt(7)); thus, the axiom does not get instantiated. However, realistic proof obligations often contain hundreds of quantifiers [33], which makes the manual identification of missing triggering terms extremely difficult.

function len(x: int): int;
function nxt(x: int): int;
axiom (forall x: int :: {len(nxt(x))}
        len(x) > 0 && (nxt(x) == x ==> len(x) == 1) &&
        (nxt(x) != x ==> len(x) == len(nxt(x)) + 1));
procedure trivial() { assert len(7) > 0; }
Figure 1: Example (in Boogie [5]) that leads to a spurious error. The assertion follows from the axiom, but the axiom does not get instantiated without a triggering term.
Unsoundness.

Most of the universal quantifiers in proof obligations appear in axioms over uninterpreted functions (to encode type information, heap models, datatypes, etc.). To obtain sound results, these axioms must be consistent (i.e., satisfiable); otherwise all proof obligations hold trivially. Consistency can be proved once and for all by showing the existence of a model, as part of the soundness proof. However, this solution is difficult to apply for those verifiers which generate axioms dynamically, depending on the program to be verified. Proving consistency then requires verifying the algorithm that generates the axioms for all possible inputs, and needs to consider many subtle issues [10, 29, 20].

A more practical approach is to check if the axioms generated for a given program are consistent. However, this check also depends on triggering: an SMT solver may fail to prove unsat if the triggering terms needed to instantiate the contradictory axioms are missing. The unsoundness can thus remain undetected.

Figure 2: Fragment of an old version of Dafny’s sequence axiomatization. and are uninterpreted types. All the named functions are uninterpreted. To improve readability, we use mathematical notation throughout this paper instead of SMT-LIB syntax [8].

For example, Dafny’s [22] sequence axiomatization from June 2008 contained an inconsistency found only over a year later. A fragment of this axiomatization is shown in Fig. 2. It expresses that empty sequences and sequences obtained through the operation are well-typed (), that the length of a type-correct sequence must be non-negative (), and that constructs a new sequence of the required length (). The intended behavior of is to update the element at index in sequence to . However, since there are no constraints on the parameter , can be used with a negative length, leading to a contradiction with . This error cannot be detected by checking the satisfiability of the formula , as no axiom gets instantiated.

This work.

For SMT-based deductive verifiers, discharging proof obligations and revealing inconsistencies in axiomatizations require a solver to prove unsat via E-matching. (Verification techniques based on proof assistants are out of scope.) Given an SMT formula for which E-matching yields unknown due to insufficient quantifier instantiations, our technique generates suitable triggering terms that allow the solver to complete the proof. These terms enable users to understand and remedy the revealed completeness or soundness issue. Since the SMT queries for the verification of different input programs are typically very similar, fixing such issues benefits the verification of many or even all future runs of the verifier.

Fixing the incompleteness.

For Fig. 1, our technique finds the triggering term len(nxt(7)), which allows one to fix the incompleteness. Tool users (who cannot change the axioms) can add the term to the program; e.g., adding var t: int; t := len(nxt(7)) before the assertion has no effect on the execution, but triggers the instantiation of the axiom. Tool developers can devise less restrictive patterns. For instance, they can move the conjunct len(x) > 0 to a separate axiom with the pattern {len(x)} (simply changing the axiom's pattern to {len(x)} would cause matching loops). Alternatively, tool developers can adapt the encoding to emit additional triggering terms enforcing certain instantiations [17, 19].
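The effect of the missing triggering term can be illustrated with a toy matcher. The following is a minimal sketch, not the solver's actual E-matching: it matches purely syntactically (real E-matching works modulo the equalities known to the solver), and the term encoding as nested tuples as well as the helper names `subterms`, `match`, and `instantiations` are assumptions made for this illustration.

```python
# Toy syntactic matcher (illustrative only; real E-matching is modulo equalities).
# Terms are nested tuples such as ("len", ("nxt", 7)); variables are plain strings.

def subterms(term):
    """Yield a term and all of its sub-terms."""
    yield term
    if isinstance(term, tuple):
        for arg in term[1:]:
            yield from subterms(arg)

def match(pattern, term, variables, binding=None):
    """Match a pattern against a ground term; return a variable binding or None."""
    binding = dict(binding or {})
    if isinstance(pattern, str) and pattern in variables:
        if pattern in binding and binding[pattern] != term:
            return None
        binding[pattern] = term
        return binding
    if isinstance(pattern, tuple) and isinstance(term, tuple):
        if pattern[0] != term[0] or len(pattern) != len(term):
            return None
        for p_arg, t_arg in zip(pattern[1:], term[1:]):
            binding = match(p_arg, t_arg, variables, binding)
            if binding is None:
                return None
        return binding
    return binding if pattern == term else None

def instantiations(pattern, variables, ground_terms):
    """Collect the bindings under which some ground sub-term matches the pattern."""
    found = []
    for ground in ground_terms:
        for sub in subterms(ground):
            b = match(pattern, sub, variables)
            if b is not None and b not in found:
                found.append(b)
    return found

PATTERN = ("len", ("nxt", "x"))                          # pattern of the axiom in Fig. 1
assertion_terms = [("len", 7)]                           # from: assert len(7) > 0
with_trigger = assertion_terms + [("len", ("nxt", 7))]   # after: t := len(nxt(7))

print(instantiations(PATTERN, {"x"}, assertion_terms))   # no binding: axiom not instantiated
print(instantiations(PATTERN, {"x"}, with_trigger))      # x bound to 7
```

With only `len(7)` available, the pattern has no match; once `len(nxt(7))` is present, the quantified variable is bound to 7 and the axiom can be instantiated.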

Fixing the unsoundness.

In Fig. 2, our triggering term (for a fresh value ) is sufficient to detect the unsoundness (as shown in Appx. A). Tool developers can use this information to add a precondition to , which prevents the construction of sequences with negative lengths.

Soundness modulo patterns.

Fig. 3 illustrates another scenario: Boogie's [5] map axiomatization is inconsistent by design at the SMT level [21], but this behavior cannot be exposed from Boogie, as the type system prevents the required instantiations. Thus it does not affect Boogie's soundness. It is nevertheless important to detect it because it could surface if Boogie were extended to support quantifier instantiation algorithms that are not based on E-matching (such as MBQI [16]) or first-order provers. They could unsoundly classify an incorrect program that uses this map axiomatization as correct. Since one of its axioms states that storing a key-value pair into a map results in a new map with a potentially different type, one can prove that two different types (e.g., Boolean and Int) are equal in SMT. This example shows that the problems tackled in this paper cannot be solved by simply switching to other instantiation strategies: these are not the preferred choices of most verifiers [1, 3, 6, 9, 13, 22, 36], and may produce unsound results for verifiers designed for E-matching with axiomatizations sound only modulo patterns.

Figure 3: Fragment of Boogie’s map axiomatization, which is sound only modulo patterns. and are uninterpreted types. All the named functions are uninterpreted.
Contributions.

This paper makes the following technical contributions:

  1. We present the first automated technique that allows the developers to detect completeness issues in program verifiers and soundness problems in their axiomatizations. Moreover, our approach helps them devise better triggering strategies for all future runs of their tool with E-matching.

  2. We developed a novel algorithm for synthesizing the triggering terms necessary to complete unsatisfiability proofs using E-matching. Since quantifier instantiation is undecidable for first-order formulas over uninterpreted functions, our algorithm might not terminate. However, all identified triggering terms are indeed sufficient to complete the proof; there are no false positives.

  3. We evaluated our technique on benchmarks with known triggering problems from four program verifiers. Our experimental results show that it successfully synthesized the missing triggering terms in 65.6% of the cases, and can significantly reduce the human effort in localizing and fixing the errors.

Outline.

The rest of the paper is organized as follows: Sec. 2 gives an overview of our technique; the details follow in Sec. 3. In Sec. 4, we present our experimental results. We discuss related work in Sec. 5, and conclude in Sec. 6.

2 Overview

Figure 4: Main steps of our algorithm that helps the developers of program verifiers devise better triggering strategies. Rounded boxes depict processing steps and arrows data.

Our goal is to synthesize missing triggering terms, i.e., concrete instantiations for (a small subset of) the quantified variables of an input formula I, which are necessary for the solver to prove its unsatisfiability. Intuitively, these triggering terms include counter-examples to the satisfiability of I and can be obtained from a model of its negation. For example, is unsatisfiable, and a counter-example is a model of its negation .

However, this idea does not apply to formulas over uninterpreted functions, which are common in proof obligations. The negation of , where is an uninterpreted function, is . This is a second-order constraint (it quantifies over functions), and cannot be encoded in SMT, which supports only first-order logic. We thus take a different approach.

Let $F$ be a second-order formula. We define its approximation as:

$F_{\approx} := F[\exists \bar{f} / \forall \bar{f}]$   (*)

where $\bar{f}$ are uninterpreted functions. The approximation considers only one interpretation, not all possible interpretations for each uninterpreted function.

We therefore construct a candidate triggering term from a model of and check if it is sufficient to prove that I is unsatisfiable (due to the approximation, a model is no longer guaranteed to be a counter-example for the original formula).

The four main steps of our algorithm are depicted in Fig. 4. The algorithm is stand-alone, i.e., not integrated into, nor dependent on any specific SMT solver. We illustrate it on the inconsistent axioms from Fig. 5 (which we assume are part of a larger axiomatization). To show that is unsatisfiable, the solver requires the triggering term . The corresponding instantiations of and generate contradictory constraints: and . In the following, we explain how we obtain this triggering term systematically.

Figure 5: Formulas that set contradictory constraints on the function . Synthesizing the triggering term requires theory reasoning and syntactic term unification.
Step 1: Clustering.

As typical proof obligations or axiomatizations contain hundreds of quantifiers, exploring combinations of triggering terms for all of them does not scale. To prune the search space, we exploit the fact that I is unsatisfiable only if there exist instantiations of some (in the worst case all) of its quantified conjuncts such that they produce contradictory constraints on some uninterpreted functions. (If there is a contradiction among the quantifier-free conjuncts, the solver will detect it directly.) We identify clusters of formulas that share function symbols and then process each cluster separately. In Fig. 5, and share the function symbol , so we build the cluster .

Step 2: Syntactic unification.

The formulas within clusters usually contain uninterpreted functions applied to different arguments (e.g., is applied to in and to in ). We thus perform syntactic unification to identify sharing constraints on the quantified variables (which we call rewritings and denote their set by ) such that instantiations that satisfy these rewritings generate formulas with common terms (on which they might set contradictory constraints). and share the term if we perform the rewritings .

Step 3: Identifying candidate triggering terms.

The cluster from step 1 contains a contradiction if there exists a formula in such that: (1) is unsatisfiable by itself, or (2) contradicts at least one other formula from .

To address scenario (1), we ask an SMT solver for a model of the formula , where is defined in (*2) above. After Skolemization, is quantifier-free, so the solver is generally able to provide a model if one exists. We then obtain a candidate triggering term by substituting the quantified variables from the patterns of the formulas in with their corresponding values from the model.

However, scenario (1) is not sufficient to expose the contradiction from Fig. 5, since both and are individually satisfiable. Our algorithm thus also derives stronger formulas corresponding to scenario (2). That is, it will next consider the case where contradicts , whose encoding into first-order logic is: , where is the set of rewritings identified in step 2, used to connect the quantified variables. This formula is universally-quantified (since is), so the solver cannot prove its satisfiability and generate models. We solve this problem by requiring to contradict the instantiation of , which is a weaker constraint. Let be an arbitrary formula. We define its instantiation as:

(**)

where are variables. Then is equivalent to . (To simplify the notation, here and in the following formulas, we omit existential quantifiers.) All its models set to 7. Substituting by (according to ) and by 7 (its value from the model) in the patterns of and yields the candidate triggering term .

Step 4: Validation.

Once we have found a candidate triggering term, we add it to the original formula I (wrapped in a fresh uninterpreted function, to make it available to E-matching, but not affect the input’s satisfiability) and check if the solver can prove unsat. If so, our algorithm terminates successfully and reports the synthesized triggering term (after a minimization step that removes unnecessary sub-terms); otherwise, we go back to step 3 to obtain another candidate. In our example, the triggering term is sufficient to complete the proof.

3 Synthesizing Triggering Terms

Next, we define the input formulas (Sec. 3.1), explain the details of our algorithm (Sec. 3.2) and discuss its limitations (Sec. 3.3). Appx. C and Appx. E present extensions that enable complex proofs and optimizations used in Sec. 4.

3.1 Input formula

To simplify our algorithm, we pre-process the inputs (i.e., the proof obligations or the axioms of a verifier): we Skolemize existential quantifiers and transform all propositional formulas into negation normal form (NNF), where negation is applied only to literals and the only logical connectives are conjunction and disjunction; we also apply the distributivity of disjunction over conjunction and split conjunctions into separate formulas. These steps preserve satisfiability and the semantics of patterns (Appx. E addresses scalability issues). The resulting formulas follow the grammar in Fig. 6. Literals may include interpreted and uninterpreted functions, variables and constants. Free variables are nullary functions. Quantified variables can have interpreted or uninterpreted types, and the pre-processing ensures that their names are globally unique. We assume that each quantifier is equipped with a pattern (if none is provided, we run the solver to infer one). Patterns are combinations of uninterpreted functions and must mention all quantified variables. Since there are no existential quantifiers after Skolemization, we use the term quantifier to denote universal quantifiers.
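The propositional part of this pre-processing can be sketched as follows. This is an illustrative sketch only: it covers negation normal form and the splitting of top-level conjunctions, but not Skolemization, quantifiers, or the distribution of disjunction over conjunction. The tuple encoding of formulas and the names `nnf` and `split_conjuncts` are assumptions of this example, not part of the described tool (which, according to the setup, relies on the solver's own NNF transformation).

```python
# Formulas are atoms (strings) or tuples: ("not", f), ("and", f, g, ...),
# ("or", f, g, ...), ("implies", f, g). This encoding is hypothetical.

def nnf(f):
    """Push negations down to the atoms and eliminate implications."""
    if isinstance(f, str):
        return f
    op = f[0]
    if op == "implies":
        return nnf(("or", ("not", f[1]), f[2]))
    if op in ("and", "or"):
        return (op,) + tuple(nnf(arg) for arg in f[1:])
    if op != "not":
        raise ValueError(f"unknown connective: {op}")
    g = f[1]
    if isinstance(g, str):
        return f                                   # negated atom is a literal
    if g[0] == "not":
        return nnf(g[1])                           # double negation
    if g[0] == "and":                              # De Morgan
        return ("or",) + tuple(nnf(("not", arg)) for arg in g[1:])
    if g[0] == "or":                               # De Morgan
        return ("and",) + tuple(nnf(("not", arg)) for arg in g[1:])
    if g[0] == "implies":                          # not (a ==> b)  ==  a and not b
        return nnf(("and", g[1], ("not", g[2])))
    raise ValueError(f"unknown connective: {g[0]}")

def split_conjuncts(f):
    """Split nested top-level conjunctions into a flat list of formulas."""
    if isinstance(f, tuple) and f[0] == "and":
        return [c for arg in f[1:] for c in split_conjuncts(arg)]
    return [f]

formula = ("not", ("implies", "p", ("and", "q", "r")))
print(split_conjuncts(nnf(formula)))
```

For the sample formula, the negated implication becomes the conjunction of `p` and a disjunction of two negated atoms, which is then split into two separate formulas.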

Figure 6: Grammar of input formulas I. Inputs are conjunctions of formulas , which are (typically quantified) disjunctions of literals ( or ) or nested quantified formulas. Each quantifier is equipped with a pattern . denotes a (non-empty) list of variables.

3.2 Algorithm

Arguments : I --- input formula, also treated as set of conjuncts
 --- similarity threshold for clustering
 --- maximum depth for clustering
 --- maximum number of different models
Result: The synthesized triggering term or None, if no term was found
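 1 Procedure synthesizeTriggeringTerm
 2   foreach depth from 0 up to the maximum depth do
 3     foreach quantified conjunct of I do
 4       foreach (cluster C, rewritings) ∈ clustersRewritings(I, depth) do   // Steps 1, 2
 5         Inst ← empty map
 6         foreach similar formula in C do
 7           Inst[similar formula] ← its instantiations (one per covered disjunct)
 8         Inst[conjunct] ← the negated conjunct
 9         foreach combination of the Inst[...] entries do   // Cartesian product
10           G ← conjunction of the combination and the rewritings
11           foreach m from 1 up to the maximum number of models do
12             resG, model ← checkSat(G)
13             if resG ≠ SAT then
14               break   // No models if G is not SAT
15             term ← candidateTerm(C, rewritings, model)   // Step 3
16             resI, _ ← checkSat(I ∧ term)   // Step 4
17             if resI = UNSAT then
18               return minimized(term)   // Success
19             G ← G ∧ ¬model   // Avoid this model next iteration
20   return None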
Algorithm 1 Our algorithm for synthesizing triggering terms that enable unsatisfiability proofs. We assume that all quantified variables are globally unique and I does not contain nested quantifiers. The auxiliary procedures clustersRewritings and candidateTerm are presented in Alg. 2 and Alg. 3.

The pseudo-code of our algorithm is given in Alg. 1. It takes as input an SMT formula I (defined in Fig. 6), which we treat in a slight abuse of notation as both a formula and a set of conjuncts. Three other parameters allow us to customize the search strategy and are discussed later. The algorithm yields a triggering term that enables the unsat proof, or None, if no term was found. We assume here that I contains no nested quantifiers and present those later in this section.

The algorithm iterates over each quantified conjunct of I (Alg. 1, line 3) and checks if it is individually unsatisfiable (for depth = 0). For complex proofs, this is usually not sufficient, as I is typically inconsistent due to a combination of conjuncts (as in Fig. 5). In such cases, the algorithm proceeds as follows:

Arguments : I --- input formula, also treated as set of conjuncts
 — quantified conjunct of I, i.e., I | is
 --- similarity threshold for clustering
depth --- current depth for clustering
Result: A set of pairs, consisting of clusters and their corresponding rewritings
1 Procedure clustersRewritings
2       if depth = 0 then
3             return
      simFormulas and (F, f, )}   // Step 1
5      4 rewritings foreach  simFormulas do
             rws unify()   // Step 2
6             if rws and is  then
7                   simFormulas simFormulas
8            rewritings[] rws
      return {() | simFormulas and rewritings[]                       and qvars(C): and lhs(r)
Algorithm 2 Auxiliary procedure for Alg. 1, which identifies clusters of formulas similar to and their rewritings. sim is defined in text (step 1). unify is a first-order unification algorithm (not shown); it returns a set of rewritings with restricted shapes, defined in text (step 2).
Step 1: Clustering.

It constructs clusters of formulas similar to the current conjunct (Alg. 2, line 4), based on their Jaccard similarity index. Let $F_i$ and $F_j$ be two arbitrary formulas, and $S_i$ and $S_j$ their respective sets of uninterpreted function symbols (from their bodies and the patterns). The Jaccard similarity index is defined as:

$J(F_i, F_j) = \frac{|S_i \cap S_j|}{|S_i \cup S_j|}$

(the number of common uninterpreted functions divided by the total number). For Fig. 5, , , .

Our algorithm explores the search space by iteratively expanding clusters to include transitively-similar formulas up to a maximum depth (parameter in Alg. 1). For two formulas , we define the similarity function as:

where is a similarity threshold used to parameterize our algorithm.

The initial cluster () includes all the conjuncts of I that are directly similar to . Each subsequent iteration adds the conjuncts that are directly similar to an element of the cluster from the previous iteration, that is, transitively similar to . This search strategy allows us to gradually strengthen the formulas (used to synthesize candidate terms in step 3) without overly constraining them (an over-constrained formula is unsatisfiable, and has no models).
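The clustering step can be sketched in a few lines. This is a minimal sketch under stated assumptions: each formula is represented only by its set of uninterpreted function symbols, two formulas count as directly similar when their Jaccard index is at least the threshold (the exact similarity function is an assumption here), and the names `jaccard`, `cluster`, and the sample formulas `F0` to `F3` are hypothetical.

```python
def jaccard(a, b):
    """Number of common symbols divided by the total number of symbols."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster(symbols, start, threshold, max_depth):
    """Return the formulas transitively similar to `start`, up to `max_depth` steps.

    symbols: dict mapping a formula name to its set of uninterpreted function symbols.
    """
    current = set()
    for _ in range(max_depth):
        seeds = current or {start}       # depth 1 compares against `start` itself
        added = {
            g for g in symbols
            if g != start and g not in current
            and any(jaccard(symbols[f], symbols[g]) >= threshold for f in seeds)
        }
        if not added:
            break                        # fixed point reached before max_depth
        current |= added
    return current

# Hypothetical formulas and their function symbols.
symbols = {"F0": {"f"}, "F1": {"f", "g"}, "F2": {"g", "h"}, "F3": {"k"}}

print(cluster(symbols, "F0", threshold=0.3, max_depth=1))
print(cluster(symbols, "F0", threshold=0.3, max_depth=2))
```

At depth 1 only `F1` joins the cluster, because it shares `f` with `F0`. At depth 2, `F2` is added through its similarity to `F1`, while `F3` shares no symbol with any cluster member and stays out.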

Step 2: Syntactic unification.

Next (Alg. 2, line 8) we identify rewritings, i.e., constraints under which two similar quantified formulas share terms. (Appx. D presents the quantifier-free case.) We obtain the rewritings by performing a simplified form of syntactic term unification, which reduces their number to a practical size. Our rewritings are directed equalities. For two formulas and and an uninterpreted function  they have one of the following two shapes:
     (1) , where is a quantified variable of , are terms from defined below, contains a term and contains a term ,
     (2) , where is a quantified variable of , are terms from defined below, contains a term and contains a term ,

where is a constant , a quantified variable , or a composite function occurring in the formula and are arbitrary (interpreted or uninterpreted) functions. That is, we determine the most general unifier [4] only for those terms that have uninterpreted functions as the outer-most functions and quantified variables as arguments. The unification algorithm is standard (except for the restricted shapes), so it is not shown explicitly.
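To make the unification step concrete, here is a sketch of standard first-order syntactic unification over tuple-encoded terms. It is not the restricted variant described above (it does not limit the shapes of the resulting rewritings), and the encoding, the function names, and the sample terms `f(x0)` and `f(g(x1))` are assumptions chosen for illustration.

```python
# Terms are nested tuples such as ("f", ("g", "x1")); variables are strings
# listed in `variables`; everything else is treated as a constant.

def walk(term, subst):
    """Follow variable bindings until reaching a non-bound term."""
    while isinstance(term, str) and term in subst:
        term = subst[term]
    return term

def occurs(var, term, subst):
    """Occurs check: does `var` appear inside `term`?"""
    term = walk(term, subst)
    if term == var:
        return True
    return isinstance(term, tuple) and any(occurs(var, a, subst) for a in term[1:])

def unify(s, t, variables, subst=None):
    """Return a substitution (variable -> term) unifying s and t, or None."""
    subst = dict(subst or {})
    s, t = walk(s, subst), walk(t, subst)
    if s == t:
        return subst
    if isinstance(s, str) and s in variables:
        if occurs(s, t, subst):
            return None
        subst[s] = t
        return subst
    if isinstance(t, str) and t in variables:
        return unify(t, s, variables, subst)
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        for a, b in zip(s[1:], t[1:]):
            subst = unify(a, b, variables, subst)
            if subst is None:
                return None
        return subst
    return None                          # clash between different symbols or constants

qvars = {"x0", "x1"}
print(unify(("f", "x0"), ("f", ("g", "x1")), qvars))   # rewriting x0 = g(x1)
print(unify(("f", "x0", 7), ("f", 3, "x1"), qvars))    # x0 = 3 and x1 = 7
print(unify(("f", 1), ("f", 2), qvars))                # None: constants clash
```

Each entry of the returned dictionary plays the role of a directed rewriting: the key is a quantified variable and the value is the term it must be replaced with so that both formulas share a term.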

Since a term may appear more than once in , or unifies with multiple similar formulas through the same quantified variable, we can obtain alternative rewritings for a quantified variable. In such cases, we either duplicate or split the cluster, such that in each cluster-rewriting pair, each quantified variable is rewritten at most once (see Alg. 2, line 12). In Fig. 7, both and are similar to (all three formulas share the uninterpreted symbol ). Since the unification produces alternative rewritings for ( and ), the procedure clustersRewritings returns the pairs .

Figure 7: Formulas that set contradictory constraints on the function . Synthesizing the triggering term requires clusters of similar formulas with alternative rewritings.
Step 3: Identifying candidate terms.

From the clusters and the rewritings (identified before), we then derive quantifier-free formulas (Alg. 1, line 10), and, if they are satisfiable, construct the candidate triggering terms from their models (Alg. 1, line 15). Each formula consists of: (1)  (defined in (*2), which is equivalent to , since has the shape from Alg. 1, line 3), (2) the instantiations (see (**2)) of all the similar formulas from the cluster, and (3) the corresponding rewritings . (Since we assume that all the quantified variables are globally unique, we do not perform variable renaming for the instantiations).

If a similar formula has multiple disjuncts , the solver uses short-circuiting semantics when generating the model for . That is, if it can find a model that satisfies the first disjunct, it does not consider the remaining ones. To obtain more diverse models, we synthesize formulas that cover each disjunct, i.e., make sure that it evaluates to at least once. We thus compute multiple instantiations of each similar formula, of the form: (see Alg. 1, line 7). To consider all the combinations of disjuncts, we derive the formula from the Cartesian product of the instantiations (Alg. 1, line 9). (To present the pseudo-code in a concise way, we store in the instantiations map as well (Alg. 1, line 8), even if it does not represent the instantiation of .)
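The construction of the combined formulas can be sketched with `itertools.product`. This sketch makes an assumption about the exact shape of the instantiations: the k-th instantiation of a formula requires its k-th disjunct to hold and all earlier disjuncts to be false. Formulas are plain strings, and the names `covering_instantiations`, `combined_formulas`, and the sample constraints are hypothetical.

```python
from itertools import product

def covering_instantiations(disjuncts):
    """One instantiation per disjunct, forcing that disjunct to be the first true one.

    Assumed shape: not(D_0) and ... and not(D_{k-1}) and D_k.
    """
    instantiations = []
    for k, disjunct in enumerate(disjuncts):
        negated_earlier = [f"not ({d})" for d in disjuncts[:k]]
        instantiations.append(" and ".join(negated_earlier + [f"({disjunct})"]))
    return instantiations

def combined_formulas(negated_conjunct, similar_formulas, rewritings):
    """Yield one quantifier-free formula per combination of instantiations."""
    options = [covering_instantiations(ds) for ds in similar_formulas]
    for combination in product(*options):          # Cartesian product
        yield " and ".join([negated_conjunct, *combination, *rewritings])

# Hypothetical cluster: one similar formula with two disjuncts, one with a single disjunct.
similar = [["x1 < 0", "f(x1) = 1"], ["g(x2) > 0"]]
for formula in combined_formulas("not (f(x0) > 7)", similar, ["x0 = x1"]):
    print(formula)
```

Two formulas are produced here, one per disjunct of the first similar formula. Each would be sent to the solver separately, so a model is requested for every way of satisfying the similar formulas.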

In Fig. 8, is similar to and . has two disjuncts and thus two possible instantiations: . The formula for the first instantiation is satisfiable, but none of the values the solver can assign to (which are all greater or equal to ) are sufficient for the unsatisfiability proof to succeed. The second instantiation adds additional constraints: instead of , it requires (. The resulting formula has a unique solution for , namely 0, and the triggering term is sufficient to prove unsat.

Figure 8: Formulas that set contradictory constraints on the function . Synthesizing the triggering term requires instantiations that cover all the disjuncts.
Arguments :  --- set of formulas in the cluster
 --- set of rewritings for the cluster
model --- SMT model, mapping variables to values
Result: A triggering term with no semantic information
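1 Procedure candidateTerm
2   P ← patterns of the formulas in the cluster
3   while the set of rewritings is not empty do
4     choose and remove a rewriting r from the set of rewritings
5     P ← P with lhs(r) replaced by rhs(r)
6     replace lhs(r) by rhs(r) in the remaining rewritings
7   foreach x ∈ qvars(C) do
8     P ← P with x replaced by model(x)
9   return "dummy" + "(" + P + ")"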
Algorithm 3 Auxiliary procedure for Alg. 1, which constructs a triggering term from the given cluster, rewritings, and SMT model. dummy is a fresh function symbol, which conveys no information about the truth value of the candidate term; thus conjoining it to the input preserves (un)satisfiability.

The procedure candidateTerm from Alg. 3 synthesizes a candidate triggering term from the model of and the rewritings . We first collect all the patterns of the formulas from the cluster (Alg. 3, line 2), i.e., of and of its similar conjuncts (see Alg. 1, line 15). Then, we apply the rewritings, in an arbitrary order (Alg. 3, lines 3–6). That is, we substitute the quantified variable from the left hand side of the rewriting with the right hand side term and propagate this substitution to the remaining rewritings. This step allows us to include in the synthesized triggering terms additional information, which cannot be provided by the solver. Then (Alg. 3, lines 7–8) we substitute the remaining variables with their constant values from the model (i.e., constants for built-in types, and fresh, unconstrained variables for uninterpreted types). The resulting triggering term is wrapped in an application to a fresh, uninterpreted function dummy to ensure that conjoining it to I does not change I’s satisfiability.
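The steps of this procedure can be sketched as follows. This is a minimal sketch, assuming terms are encoded as nested tuples, rewritings as a dictionary from a quantified variable to its replacement term, and the model as a dictionary from variables to values; the helper names and the sample patterns are hypothetical.

```python
def substitute(term, var, value):
    """Replace every occurrence of variable `var` in `term` by `value`."""
    if term == var:
        return value
    if isinstance(term, tuple):
        # Keep the function symbol (position 0) and rewrite only the arguments.
        return (term[0],) + tuple(substitute(a, var, value) for a in term[1:])
    return term

def candidate_term(patterns, rewritings, model):
    """Build a candidate triggering term from patterns, rewritings, and a model."""
    patterns = list(patterns)
    pending = list(rewritings.items())             # (lhs variable, rhs term) pairs
    while pending:
        lhs, rhs = pending.pop()                   # arbitrary order
        patterns = [substitute(p, lhs, rhs) for p in patterns]
        # Propagate the substitution to the remaining rewritings.
        pending = [(l, substitute(r, lhs, rhs)) for l, r in pending]
    for var, value in model.items():               # remaining variables get model values
        patterns = [substitute(p, var, value) for p in patterns]
    return ("dummy", *patterns)                    # fresh wrapper function

patterns = [("f", "x0"), ("h", "x1")]              # hypothetical patterns of a cluster
rewritings = {"x0": ("g", "x1")}                   # x0 is rewritten to g(x1)
model = {"x1": 7}                                  # value taken from the solver's model

print(candidate_term(patterns, rewritings, model))
```

The rewriting is applied before the model values, so the resulting term keeps the structure `g(...)` dictated by the unification rather than whatever value the solver happened to pick for `x0`.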

Step 4: Validation.

We validate the candidate triggering term by checking if its conjunction with I is unsatisfiable, i.e., if these particular interpretations for the uninterpreted functions generalize to all interpretations (Alg. 1, line 16). If this is the case, we return the minimized triggering term (Alg. 1, line 18). The function dummy has multiple arguments, each of them corresponding to one pattern from the cluster (Alg. 3, line 9). This is an over-approximation of the required triggering terms (once instantiated, the formulas may trigger each other), so minimized removes redundant (sub-)terms. If the candidate term does not validate, we re-iterate its construction up to a bound (the maximum number of different models) and strengthen the formula to obtain a different model (Alg. 1, lines 19 and 11). Appx. B discusses heuristics for obtaining diverse models.
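One simple way to realize such a minimization is a greedy pass that drops an argument whenever the proof still succeeds without it. The sketch below is an assumption about how this could be done, not the procedure used by the tool: it only removes whole top-level arguments (not sub-terms), and `proves_unsat` is a hypothetical callback standing in for the solver call that validates a candidate.

```python
def minimize(arguments, proves_unsat):
    """Greedily drop arguments that are not needed for the unsat proof.

    arguments:    list of terms wrapped by the dummy function.
    proves_unsat: callback returning True if the given terms still enable the proof.
    """
    kept = list(arguments)
    index = 0
    while index < len(kept):
        trial = kept[:index] + kept[index + 1:]
        if trial and proves_unsat(trial):
            kept = trial                 # argument was redundant; re-check same index
        else:
            index += 1                   # argument is needed; move on
    return kept

# Stand-in for the solver: the proof succeeds iff f(g(7)) is among the terms.
needed = ("f", ("g", 7))
def proves_unsat(terms):
    return needed in terms

candidate = [("h", 1), ("f", ("g", 7)), ("k", 2)]
print(minimize(candidate, proves_unsat))
```

Each call to the callback corresponds to one validation query, so the number of solver calls in this sketch is linear in the number of arguments.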

Nested quantifiers.

Our algorithm also supports nested quantifiers. Nested existential quantifiers in positive positions and nested universal quantifiers in negative positions are replaced in NNF by new, uninterpreted Skolem functions. Step 2 is also applicable to them: Skolem functions with arguments (the quantified variables from the outer scope) are unified as regular uninterpreted functions; they can also appear as in a rewriting, but not as the left-hand side (we do not perform higher-order unification). In such cases, the result is imprecise: the unification of and produces only the rewriting .

After pre-processing, the conjunct and the similar formulas may still contain nested universal quantifiers. is always negated in , thus it becomes, after Skolemization, quantifier-free. To ensure that is also quantifier-free (and the solver can generate a model), we extend the algorithm to recursively instantiate similar formulas with nested quantifiers when computing the instantiations.

3.3 Limitations

Next, we discuss the limitations of our technique, as well as possible solutions.

Applicability.

Our algorithm effectively addresses a common cause of failed unsatisfiability proofs in program verification, i.e., missing triggering terms. Other causes (e.g., incompleteness in the solver’s decision procedures due to undecidable theories) are beyond the scope of our work. Also, our algorithm is tailored to unsatisfiability proofs; satisfiability proofs cannot be reduced to unsatisfiability proofs by negating the input, because the negation cannot usually be encoded in SMT (as we have illustrated in Sec. 2).

SMT solvers.

Our algorithm synthesizes triggering terms as long as the SMT solver can find models for our quantifier-free formulas. However, solvers are incomplete, i.e., they can return unknown and generate only partial models, which are not guaranteed to be correct. Nonetheless, we also use partial models, as the validation step (step 4 in Fig. 4) ensures that they do not lead to false positives.

Patterns.

Since our algorithm is based on patterns (provided or inferred), it will not succeed if they do not permit the necessary instantiations. For example, the formula is unsatisfiable. However, the SMT solver cannot automatically infer a pattern from the body of the quantifier, since equality is an interpreted function and must not occur in a pattern. Thus E-matching (and implicitly our algorithm) cannot solve this example, unless the user provides as pattern some uninterpreted function that mentions both and (e.g., ).

Bounds and rewritings.

Synthesizing triggering terms is generally undecidable. We ensure termination by bounding the search space through various customizable parameters, thus our algorithm misses results not found within these bounds. We also only unify applications of uninterpreted functions, which are common in verification. Efficiently supporting interpreted functions (especially equality) is very challenging for inputs with a small number of types (e.g., from Boogie [5]).

Despite these limitations, our algorithm effectively synthesizes the triggering terms required in practical examples, as we experimentally show next.

4 Evaluation

Evaluating our work requires benchmarks with known triggering issues (i.e., for which E-matching yields unknown). Since there is no publicly available suite, in Sec. 4.1 we used manually-collected benchmarks from four verifiers [22, 35, 39, 24]. Our algorithm succeeded for 65.6% of them. To evaluate its applicability to other verifiers, in Sec. 4.2 we used SMT-COMP [33] inputs. As they were not designed to expose triggering issues, we developed a filtering step (see Appx. F) to automatically identify the subset that falls into this category. The results show that our algorithm is also suited for Spec# [6], VCC [31], and Havoc [9]. Sec. 4.3 illustrates that our triggering terms are simpler than the unsat proofs produced by quantifier instantiation and refutation techniques, enabling one to fix the root cause of the revealed issues.

Setup.

We used Z3 (4.8.10) [11] to infer the patterns, generate the models and validate the candidate terms. However, our tool can be used with any solver that supports E-matching and exposes the inferred patterns. We used Z3’s NNF tactic to transform the inputs into NNF and locality-sensitive hashing to compute the clusters. We fixed Z3’s random seeds to arbitrary values (sat.random_seed to 488, smt.random_seed to 599, and nlsat.seed to 611). We set the (soft) timeout to 600s and the memory limit to 6 GB per run and used a 1s timeout for obtaining a model and for validating a candidate term. The experiments were conducted on a Linux server with 252 GB of RAM and 32 Intel Xeon CPUs at 3.3 GHz.
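The following is a minimal sketch of how the seeds and resource limits listed above could be passed to the Z3 command-line binary. The numeric values come from the setup description; the helper name `z3_command`, the choice of command-line flags over solver options in the input file, and the conversion of 6 GB to 6144 megabytes are assumptions made for illustration, not a description of the actual experimental scripts.

```python
from typing import List

# Values from the setup paragraph; how they were passed to Z3 is an assumption.
SEEDS = {"sat.random_seed": 488, "smt.random_seed": 599, "nlsat.seed": 611}
SOFT_TIMEOUT_S = 600
MEMORY_LIMIT_GB = 6


def z3_command(smt2_file: str) -> List[str]:
    """Build a Z3 command line for one run (hypothetical helper)."""
    args = ["z3"]
    # Z3's -t flag sets the soft timeout in milliseconds.
    args.append(f"-t:{SOFT_TIMEOUT_S * 1000}")
    # Z3's -memory flag sets the memory limit in megabytes.
    args.append(f"-memory:{MEMORY_LIMIT_GB * 1024}")
    # Module parameters are given as name=value pairs.
    args += [f"{name}={value}" for name, value in SEEDS.items()]
    args.append(smt2_file)
    return args
```

The resulting list can be handed to a process launcher such as `subprocess.run`; the 1s timeouts for model generation and candidate validation would have to be set separately for those individual solver calls.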

4.1 Effectiveness on verification benchmarks with triggering issues

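| Source | # | Conjuncts (min-max) | Quantifiers (min-max) | C0 (default) | C1 | C2 | C3 (type) | C4 (sub) | Our work | Z3 (MBQI) | CVC4 (enum inst) | Vampire (CASC, Z3) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dafny | 4 | 6-16 | 5-16 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 2 |
| F* | 2 | 18-2388 | 15-2543 | 1 | 1 | 1 | 1 | 2 | 2 | 1 | 0 | 2 |
| Gobra | 11 | 64-78 | 50-63 | 5 | 10 | 1 | 7 | 10 | 11 | 6 | 0 | 11 |
| Viper | 15 | 84-143 | 68-203 | 7 | 5 | 3 | 5 | 5 | 7 | 11 | 0 | 15 |
| Total | 32 | | | | | | | | 21 (65.6%) | 19 (59.3%) | 0 (0%) | 30 (93.7%) |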

= similarity threshold; = batch size; type = type-based constraints; sub = sub-terms    C0: ; ; type; sub
Table 1: Results on verification benchmarks with known triggering issues. The columns show: the source of the benchmarks, the number of files (#), their number of conjuncts and of quantifiers (minimum-maximum), the number of files for which five configurations (C0–C4) synthesized suitable triggering terms, our results across all configurations, and the number of unsat proofs generated by Z3 (with MBQI [16]), CVC4 (with enumerative instantiation [27]), and Vampire [18] (in CASC mode [34], using Z3 for ground theory reasoning).

First, we used manually-collected benchmarks with known triggering issues from Dafny [22], F* [35], Gobra [39], and Viper [24]. We reconstructed 4 and 2 inconsistent axiomatizations from Dafny and F*, respectively, based on the changes from the repositories and the messages from the issue trackers; we obtained 11 inconsistent axiomatizations of arrays and option types from Gobra’s developers and collected 15 incompleteness issues from Viper’s test suite [37], with at least one assertion needed only for triggering. These contain algorithms for arrays, binomial heaps, binary search trees, and regression tests. The file sizes (minimum-maximum number of formulas or quantifiers) are shown in Tab. 1, columns 3–4.

Configurations.

We ran our tool with five configurations, to also analyze the impact of its parameters (see Alg. 1 and Appx. C). The default configuration C0 has: (similarity threshold), (batch size, i.e., the number of candidate terms validated together), type (no type-based constraints), sub (no unification for sub-terms). The other configurations differ from C0 in the parameters shown in Tab. 1. All configurations use (maximum transitivity depth), (maximum number of different models), and 600s timeout per file.

Results.

Columns 5–9 in Tab. 1 show the number of files solved by each configuration; column 10 summarizes the files solved by at least one. Overall, we found suitable triggering terms for 65.6% of the files, including all F* and Gobra benchmarks. An F* unsoundness exposed by all configurations in 60s is given in Fig. 9. It required two developers to be manually diagnosed based on a bug report [14]. A simplified Gobra axiomatization for option types, solved by C4 in 13s, is shown in Fig. 11. Gobra’s team spent one week identifying some of the issues. As our triggering terms for F* and Gobra were similar to the manually-written ones, they could have reduced the human effort in localizing and fixing the errors.
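As a sanity check on the totals in Tab. 1, the short sketch below recomputes them from the per-source rows. The counts are copied from the table; the helper names are ours. The reported percentages appear to be truncated rather than rounded to one decimal (e.g., 19/32 = 59.375% is listed as 59.3%), so the sketch truncates as well.

```python
# Per-source counts copied from Tab. 1:
# (files, our work, Z3 with MBQI, CVC4, Vampire).
TABLE_1 = {
    "Dafny": (4, 1, 1, 0, 2),
    "F*": (2, 2, 1, 0, 2),
    "Gobra": (11, 11, 6, 0, 11),
    "Viper": (15, 7, 11, 0, 15),
}


def totals(table):
    """Sum each column over all benchmark sources."""
    return tuple(sum(column) for column in zip(*table.values()))


def percent_truncated(solved, total):
    """Percentage truncated (not rounded) to one decimal place."""
    return (solved * 1000 // total) / 10


files, ours, mbqi, cvc4, vampire = totals(TABLE_1)
for name, solved in [("our work", ours), ("MBQI", mbqi),
                     ("CVC4", cvc4), ("Vampire", vampire)]:
    print(f"{name}: {solved}/{files} ({percent_truncated(solved, files)}%)")
```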

Our algorithm synthesized missing triggering terms for 7 Viper files, including the array maximum example [2], for which E-matching could not prove that the maximal element in a strictly increasing array of size 3 is its last element. Our triggering term loc(a,2) (loc maps arrays and integers to heap locations) can be added by a user of the verifier to their postcondition. A developer can fix the root cause of the incompleteness by including a generalization of the triggering term to arbitrary array sizes: len(a)!=0 ==> x==loc(a,len(a)-1).val. Both result in E-matching refuting the proof obligation in under 0.1s. We also exposed another case where Boogie (used by Viper) is sound only modulo patterns (as in Fig. 3).

4.2 Effectiveness on SMT-COMP benchmarks

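| Source | # | Conjuncts (min-max) | Quantifiers (min-max) | C0 (default) | C1 | C2 | C3 (type) | C4 (sub) | Our work | Z3 (MBQI) | CVC4 (enum inst) | Vampire (CASC, Z3) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Spec# | 33 | 28-2363 | 25-645 | 16 | 16 | 14 | 16 | 15 | 16 | 16 | 0 | 29 |
| VCC/Havoc | 14 | 129-1126 | 100-1027 | 11 | 9 | 5 | 11 | 9 | 11 | 12 | 0 | 14 |
| Simplify | 1 | 256 | 129 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| BWI | 13 | 189-384 | 198-456 | 1 | 1 | 2 | 1 | 1 | 2 | 12 | 0 | 12 |
| Total | 61 | | | | | | | | 29 (47.5%) | 41 (67.2%) | 0 (0%) | 55 (90.1%) |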


= similarity threshold; = batch size; type = type-based constraints; sub = sub-terms        C0: ; ; type; sub
Table 2: Results on SMT-COMP inputs. The columns have the structure from Tab. 1.

Next, we considered 61 SMT-COMP [33] benchmarks from Spec# [6], VCC [31], Havoc [9], Simplify [12], and the Bit-Width-Independent (BWI) encoding [25].

Results.

The results are shown in Tab. 2. Our algorithm enabled E-matching to refute 47.5% of the files, most of them from Spec# and VCC/Havoc. We manually inspected some BWI benchmarks (for which the algorithm had worse results) and observed that the validation step times out even with a much higher timeout. This shows that some candidate terms trigger matching loops and explains why C2 (which validates them individually) solved one more file. Extending our algorithm to avoid matching loops, by construction, is left as future work.

4.3 Comparison with unsatisfiability proofs

As an alternative to our work, tool developers could try to manually identify triggering issues from refutation proofs, but these do not consider patterns and are harder to understand. Columns 11–13 in Tab. 1 and Tab. 2 show the number of proofs produced by Z3 with MBQI [16], CVC4 [7] with enumerative instantiation [27], and Vampire [18] using Z3 for ground theory reasoning [26] and the CASC [34] portfolio mode with competition presets. CVC4 failed for all examples (it cannot construct proofs for quantified logics), whereas Vampire refuted most of them. Our algorithm outperformed MBQI for F* and Gobra and had similar results for Dafny, Spec#, and VCC/Havoc. All our configurations solved two VCC/Havoc files not solved by MBQI (Appx. D shows an example). Moreover, our triggering terms are much simpler and directly highlight the root cause of the issues. Compared to our generated term loc(a,2), MBQI’s proof for Viper’s array maximum example has 2135 lines and over 700 reasoning steps, while Vampire’s proof has 348 lines and 101 inference steps. Other proofs have similar complexity.

Vampire and MBQI cannot replace our technique: as most deductive verifiers employ E-matching, it is important to help the developers use the algorithm of their choice and return sound results even if they rely on patterns for soundness (as in Fig. 3). Our tool can also produce multiple triggering terms (see Appx. C), thus it can reveal multiple triggering issues for the same input formula.

5 Related Work

To our knowledge, no other approach automatically produces the information needed by developers to remedy the effects of overly restrictive patterns. Quantifier instantiation and refutation techniques (discussed next) can produce unsatisfiability proofs, but these are much more complex than our triggering terms.

Quantifier instantiation techniques.

Model-based quantifier instantiation [16] (MBQI) was designed for sat formulas. It checks if the models obtained for the quantifier-free part of the input satisfy the quantifiers, whereas we check if the synthesized triggering terms obtained for some interpretation of the uninterpreted functions generalize to all interpretations. In some cases, MBQI can also generate unsatisfiability proofs, but they require expert knowledge to be understood; our triggering terms are much simpler.

Counterexample-guided quantifier instantiation [28] is a technique for sat formulas, which synthesizes computable functions from logical specifications. It is applicable to functions whose specifications have explicit syntactic restrictions on the space of possible solutions, which is usually not the case for axiomatizations. Thus the technique cannot directly solve the complementary problem of proving soundness of the axiomatization.

E-matching-based approaches.

Rümmer [30] proposed a calculus for first-order logic modulo linear integer arithmetic that integrates constraint-based free variable reasoning with E-matching. Our algorithm does not require reasoning steps, so it is applicable to formulas from all the logics supported by the SMT solver. Enumerative instantiation [27] is an approach that exhaustively enumerates ground terms from a set of ordered, quantifier-free terms from the input. It can be used to refute formulas with quantifiers, but not to construct proofs (see Sec. 4.3). Our algorithm derives quantifier-free formulas and synthesizes the triggering terms from their models, even if the input does not have a quantifier-free part. It also uses syntactic information to construct complex triggering terms.

Theorem provers.

First-order theorem provers (e.g., Vampire [18]) also generate refutation proofs. More recent works combine a superposition calculus with theory reasoning [38, 26], integrating SAT/SMT solvers with theorem provers. We also use unification, but to synthesize triggering terms required by E-matching. However, our triggering terms are much simpler than Vampire’s proofs and can be used to improve the triggering strategies for all future runs of the verifier.

6 Conclusions

We have presented the first automated technique that enables the developers of verifiers to remedy the effects of overly restrictive patterns. Since discharging proof obligations and identifying inconsistencies in axiomatizations require an SMT solver to prove the unsatisfiability of a formula via E-matching, we developed a novel algorithm for synthesizing triggering terms that allow the solver to complete the proof. Our approach is effective for a diverse set of verifiers, and can significantly reduce the human effort in localizing and fixing triggering issues.

Acknowledgements

We would like to thank the reviewers for their insightful comments. We are also grateful to Felix Wolf for providing us the Gobra benchmarks, and to Evgenii Kotelnikov for his detailed explanations about Vampire.

References

  • [1] A. Amighi, S. Blom, and M. Huisman (2016) VerCors: A layered approach to practical verification of concurrent software. In PDP, pp. 495–503. External Links: Link Cited by: §1, §1.
  • [2] (2021) Array maximum, by elimination. Note: http://viper.ethz.ch/examples/max-array-elimination.html Cited by: §4.1.
  • [3] V. Astrauskas, P. Müller, F. Poli, and A. J. Summers (2019) Leveraging Rust types for modular specification and verification. In Object-Oriented Programming Systems, Languages, and Applications (OOPSLA), Vol. 3, pp. 147:1–147:30. External Links: Document Cited by: §1, §1.
  • [4] F. Baader and W. Snyder (2001) Unification theory. In Handbook of Automated Reasoning, J. A. Robinson and A. Voronkov (Eds.), pp. 445–532. Cited by: Appendix E, §3.2.
  • [5] M. Barnett, B. E. Chang, R. DeLine, B. Jacobs, and K. R. M. Leino (2005) Boogie: a modular reusable verifier for object-oriented programs. In Formal Methods for Components and Objects (FMCO), F. S. de Boer, M. M. Bonsangue, S. Graf, and W. P. de Roever (Eds.), Lecture Notes in Computer Science, Vol. 5, pp. 364–387. Cited by: Figure 1, §1, §3.3.
  • [6] M. Barnett, M. Fähndrich, K. R. M. Leino, P. Müller, W. Schulte, and H. Venter (2011-06) Specification and verification: the Spec# experience. Communications of the ACM 54 (6), pp. 81–91. Cited by: Appendix F, §1, §1, §4.2, §4.
  • [7] C. Barrett, C. L. Conway, M. Deters, L. Hadarean, D. Jovanović, T. King, A. Reynolds, and C. Tinelli (2011) CVC4. In Computer Aided Verification, G. Gopalakrishnan and S. Qadeer (Eds.), Berlin, Heidelberg, pp. 171–177. External Links: ISBN 978-3-642-22110-1 Cited by: §4.3.
  • [8] C. Barrett, P. Fontaine, and C. Tinelli (2017) The SMT-LIB Standard: Version 2.6. Technical report Department of Computer Science, The University of Iowa. Note: Available at www.SMT-LIB.org Cited by: Appendix B, Figure 2.
  • [9] S. Chatterjee, S. K. Lahiri, S. Qadeer, and Z. Rakamarić (2007) A reachability predicate for analyzing low-level software. In Tools and Algorithms for the Construction and Analysis of Systems, O. Grumberg and M. Huth (Eds.), Berlin, Heidelberg, pp. 19–33. External Links: ISBN 978-3-540-71209-1 Cited by: Figure 15, Appendix F, §1, §1, §4.2, §4.
  • [10] Á. Darvas and K. R. M. Leino (2007) Practical reasoning about invocations and implementations of pure methods. In Fundamental Approaches to Software Engineering (FASE), M. B. Dwyer and A. Lopes (Eds.), LNCS, Vol. 4422, pp. 336–351. Cited by: §1.
  • [11] L. de Moura and N. Bjørner (2008) Z3: an efficient SMT solver. In Tools and Algorithms for the Construction and Analysis of Systems, C. R. Ramakrishnan and J. Rehof (Eds.), Berlin, Heidelberg, pp. 337–340. External Links: ISBN 978-3-540-78800-3 Cited by: §4.
  • [12] D. Detlefs, G. Nelson, and J. B. Saxe (2005-05) Simplify: a theorem prover for program checking. J. ACM 52 (3), pp. 365–473. External Links: ISSN 0004-5411, Link, Document Cited by: Appendix F, §1, §4.2.
  • [13] M. Eilers and P. Müller (2018) Nagini: a static verifier for python. In Computer Aided Verification (CAV), H. Chockler and G. Weissenbacher (Eds.), LNCS, Vol. 10982, pp. 596–603. External Links: Document Cited by: §1, §1.
  • [14] (2021) F* issue 1848. Note: https://github.com/FStarLang/FStar/issues/1848 Cited by: §4.1.
  • [15] M. Gario and A. Micheli (2015) PySMT: a solver-agnostic library for fast prototyping of SMT-based algorithms. In SMT Workshop 2015, Cited by: Appendix F.
  • [16] Y. Ge and L. de Moura (2009) Complete instantiation for quantified formulas in satisfiability modulo theories. In Computer Aided Verification, A. Bouajjani and O. Maler (Eds.), Berlin, Heidelberg, pp. 306–320. External Links: ISBN 978-3-642-02658-4 Cited by: §1, §4.3, Table 1, §5.
  • [17] S. Heule, I. T. Kassios, P. Müller, and A. J. Summers (2013) Verification condition generation for permission logics with abstract predicates and abstraction functions. In European Conference on Object-Oriented Programming (ECOOP), G. Castagna (Ed.), Lecture Notes in Computer Science, Vol. 7920, pp. 451–476. Cited by: §1.
  • [18] L. Kovács and A. Voronkov (2013) First-order theorem proving and Vampire. In Computer Aided Verification, N. Sharygina and H. Veith (Eds.), Berlin, Heidelberg, pp. 1–35. External Links: ISBN 978-3-642-39799-8 Cited by: §4.3, Table 1, §5.
  • [19] K. R. M. Leino and R. Monahan (2009) Reasoning about comprehensions with first-order SMT solvers. In Proceedings of the 2009 ACM Symposium on Applied Computing, SAC’09, New York, NY, USA, pp. 615–622. External Links: ISBN 9781605581668, Link, Document Cited by: Appendix D, §1, §1.
  • [20] K. R. M. Leino and P. Müller (2008) Verification of equivalent-results methods. In European Symposium on Programming (ESOP), S. Drossopoulou (Ed.), Lecture Notes in Computer Science, Vol. 4960, pp. 307–321. Cited by: §1.
  • [21] K. R. M. Leino and P. Rümmer (2010) A polymorphic intermediate verification language: design and logical encoding. In Tools and Algorithms for the Construction and Analysis of Systems, J. Esparza and R. Majumdar (Eds.), Berlin, Heidelberg, pp. 312–327. External Links: ISBN 978-3-642-12002-2 Cited by: §1.
  • [22] K. R. M. Leino (2010) Dafny: an automatic program verifier for functional correctness. In Logic for Programming, Artificial Intelligence, and Reasoning, E. M. Clarke and A. Voronkov (Eds.), Berlin, Heidelberg, pp. 348–370. External Links: ISBN 978-3-642-17511-4 Cited by: §1, §1, §1, §4.1, §4.
  • [23] M. Moskal (2009) Programming with triggers. In SMT, ACM International Conference Proceeding Series, Vol. 375, pp. 20–29. Cited by: §1.
  • [24] P. Müller, M. Schwerhoff, and A. J. Summers (2016) Viper: a verification infrastructure for permission-based reasoning. In Verification, Model Checking, and Abstract Interpretation (VMCAI), B. Jobstmann and K. R. M. Leino (Eds.), LNCS, Vol. 9583, pp. 41–62. Cited by: §4.1, §4.
  • [25] A. Niemetz, M. Preiner, A. Reynolds, Y. Zohar, C. Barrett, and C. Tinelli (2019) Towards bit-width-independent proofs in SMT solvers. In Automated Deduction – CADE 27, P. Fontaine (Ed.), Cham, pp. 366–384. External Links: ISBN 978-3-030-29436-6 Cited by: Appendix F, §4.2.
  • [26] G. Reger, N. Bjorner, M. Suda, and A. Voronkov (2016) AVATAR modulo theories. In GCAI 2016. 2nd Global Conference on Artificial Intelligence, C. Benzmüller, G. Sutcliffe, and R. Rojas (Eds.), EPiC Series in Computing, Vol. 41, pp. 39–52. External Links: ISSN 2398-7340, Link, Document Cited by: §4.3, §5.
  • [27] A. Reynolds, H. Barbosa, and P. Fontaine (2018) Revisiting enumerative instantiation. In Tools and Algorithms for the Construction and Analysis of Systems, D. Beyer and M. Huisman (Eds.), Cham, pp. 112–131. External Links: ISBN 978-3-319-89963-3 Cited by: §4.3, Table 1, §5.
  • [28] A. Reynolds, M. Deters, V. Kuncak, C. Tinelli, and C. Barrett (2015) Counterexample-guided quantifier instantiation for synthesis in SMT. In Computer Aided Verification, D. Kroening and C. S. Păsăreanu (Eds.), Cham, pp. 198–216. External Links: ISBN 978-3-319-21668-3 Cited by: §5.
  • [29] A. Rudich, Á. Darvas, and P. Müller (2008) Checking well-formedness of pure-method specifications. In Formal Methods (FM), J. Cuellar and T. Maibaum (Eds.), Lecture Notes in Computer Science, Vol. 5014, pp. 68–83. Cited by: §1.
  • [30] P. Rümmer (2012) E-matching with free variables. In Logic for Programming, Artificial Intelligence, and Reasoning, N. Bjørner and A. Voronkov (Eds.), Berlin, Heidelberg, pp. 359–374. External Links: ISBN 978-3-642-28717-6 Cited by: §5.
  • [31] W. Schulte (2008-01) VCC: contract-based modular verification of concurrent C. In 31st International Conference on Software Engineering, ICSE 2009. External Links: Link Cited by: Figure 15, Appendix F, §4.2, §4.
  • [32] SMT-COMP 2019 (2019) The 14th international satisfiability modulo theories competition (including pending benchmarks). Note: https://smt-comp.github.io/2019/,https://clc-gitlab.cs.uiowa.edu:2443/SMT-LIB-benchmarks-tmp/benchmarks-pending Cited by: Figure 10, Appendix C.
  • [33] SMT-COMP 2020 (2020) The 15th international satisfiability modulo theories competition. Note: https://smt-comp.github.io/2020/ Cited by: Figure 15, Appendix F, §1, §4.2, §4.
  • [34] G. Sutcliffe (2016) The CADE ATP System Competition - CASC. AI Magazine 37 (2), pp. 99–101. Cited by: §4.3, Table 1.
  • [35] N. Swamy, C. Hritcu, C. Keller, A. Rastogi, A. Delignat-Lavaud, S. Forest, K. Bhargavan, C. Fournet, P. Strub, M. Kohlweiss, J. Zinzindohoue, and S. Zanella-Béguelin (2016) Dependent types and multi-monadic effects in F*. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’16, New York, NY, USA, pp. 256–270. External Links: ISBN 9781450335492, Link, Document Cited by: Figure 9, §4.1, §4.
  • [36] N. Swamy, J. Weinberger, C. Schlesinger, J. Chen, and B. Livshits (2013) Verifying higher-order programs with the Dijkstra monad. In Proceedings of the 34th annual ACM SIGPLAN conference on Programming Language Design and Implementation, PLDI ’13, pp. 387–398. External Links: Link Cited by: §1, §1.
  • [37] (2021) Viper test suite. Note: https://github.com/viperproject/silver/tree/master/src/test/resources Cited by: §4.1.
  • [38] A. Voronkov (2014) AVATAR: the architecture for first-order theorem provers. In Computer Aided Verification, A. Biere and R. Bloem (Eds.), Cham, pp. 696–710. External Links: ISBN 978-3-319-08867-9 Cited by: §5.
  • [39] F. A. Wolf, L. Arquint, M. Clochard, W. Oortwijn, J. C. Pereira, and P. Müller (2021) Gobra: modular specification and verification of go programs. In Computer Aided Verification (CAV), A. Silva and K. R. M. Leino (Eds.), pp. 367–379. Cited by: §4.1, §4.

Appendix A Background: E-matching

In this section, we briefly discuss the E-matching-related terminology and explain how this quantifier-instantiation algorithm works on an example.

Patterns vs triggering terms.

Patterns are syntactic hints attached to quantifiers which instruct the SMT solver when to perform an instantiation. In Fig. 2, the quantified formula will be instantiated only when a triggering term that matches the pattern is encountered during the SMT run (i.e., the triggering term is present in the quantifier-free part of the input formula or is obtained by the solver from the body of a previously-instantiated quantifier).
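The distinction between a pattern and a triggering term can be illustrated with a small, purely syntactic matcher. This is only a sketch: real E-matching works modulo the equalities known to the solver, which is not modeled here, and the term encoding (nested tuples, variables prefixed with `?`) as well as the function names `f`, `g`, and `h` are hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Tuple, Union

# A term is a constant/variable name or a tuple (function, arg1, ..., argN).
Term = Union[str, Tuple]


def is_var(t: Term) -> bool:
    """Pattern variables are strings that start with '?'."""
    return isinstance(t, str) and t.startswith("?")


def match(pattern: Term, term: Term,
          subst: Dict[str, Term]) -> Optional[Dict[str, Term]]:
    """Syntactically match `pattern` against a ground `term`."""
    if is_var(pattern):
        if pattern in subst:
            return subst if subst[pattern] == term else None
        return {**subst, pattern: term}
    if isinstance(pattern, str) or isinstance(term, str):
        return subst if pattern == term else None
    if pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p_arg, t_arg in zip(pattern[1:], term[1:]):
        subst = match(p_arg, t_arg, subst)
        if subst is None:
            return None
    return subst


def subterms(term: Term) -> Iterator[Term]:
    """Yield a term and all of its sub-terms."""
    yield term
    if not isinstance(term, str):
        for arg in term[1:]:
            yield from subterms(arg)


def instantiations(pattern: Term,
                   ground_terms: List[Term]) -> List[Dict[str, Term]]:
    """Substitutions for which some ground sub-term matches the pattern."""
    found: List[Dict[str, Term]] = []
    for ground in ground_terms:
        for sub in subterms(ground):
            result = match(pattern, sub, {})
            if result is not None and result not in found:
                found.append(result)
    return found
```

With the pattern `("f", ("g", "?x"))`, the ground term `("f", "a")` yields no instantiation, whereas a ground term containing `("f", ("g", "a"))` acts as a triggering term and yields the substitution that maps `?x` to `a`.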

E-matching.

We now illustrate how E-matching works on the example from Fig. 2; in particular, we show how our synthesized triggering term helps the solver to prove unsat when added to the axiomatization ( is a fresh variable of type ). Due to space constraints, we omit unnecessary instantiations. The sub-terms and trigger the instantiation of and , respectively. The solver obtains the body of the quantifiers for these particular values:

Since the first disjunct of evaluates to (from ), the solver learns that the second disjunct must hold (i.e., the length must be -1); we abbreviate it as L = -1. Further, the sub-terms and of the synthesized triggering term lead to the instantiation of and , respectively:

from triggers :

By equalizing the arguments of the outer-most in , the solver learns that the first disjunct of is . The second disjunct must thus hold (i.e., the length should be positive); we abbreviate it as . Since , the unsatisfiability proof succeeds.

Appendix B Diverse models

In this section, we explain the importance of the parameter from Alg. 1 (the maximum number of models) and discuss heuristics for obtaining diverse models.

Figure 9: Inconsistent axiom from F* [35]. is an uninterpreted function. Synthesizing the triggering term requires diverse models.

Let us consider the formula from Fig. 9, which was part of an axiomatization with 2,495 axioms. axiomatizes the uninterpreted function and is inconsistent, because there exist two integers whose real division ("/") is not an integer. The model produced by the solver for the formula is . is defined ("/" is a total function [8]), but its result is not specified. Thus the solver cannot validate this model (i.e., it returns unknown).

In such cases or when the candidate term does not generalize to all interpretations of the uninterpreted functions, we re-iterate its construction, up to the bound (Alg. 1, line 11). For this, we strengthen the previously-derived formula to force the solver to find a different model. In Fig. 9, if we simply exclude previous models, we can obtain a sequence of models with different values for the numerator, but with the same value (0) for the denominator. There are infinitely many such models, and all of them fail to validate for the same reason.

There are various heuristics one can employ to guide the solver’s search for a new model and our algorithm can be parameterized with different ones. In our experiments, we interpret the conjunct from Alg. 1, line 19 as . The first component requires all the variables to have different values than before. This requirement may be too strong for some variables, but as we use only soft constraints, the solver may ignore some constraints if it cannot generate a satisfying assignment.

The second part requires models from different equivalence classes, where an equivalence class includes all the variables that are equal in the model. For example, if the model is , where is a value of the corresponding type, then and belong to the same equivalence class. Considering equivalence classes is particularly important for variables of uninterpreted types; the solver cannot provide actual values for them, thus it assigns fresh, unconstrained variables. However, different fresh variables do not lead to diverse models.
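The two diversity heuristics can be sketched as checks over models represented as plain dictionaries. This is an illustration under our own assumptions: the helper names and the value labels such as `T!0` are hypothetical, and the sketch only tests the two properties on given models, whereas the algorithm encodes them as soft constraints for the solver.

```python
from typing import Dict, FrozenSet, Hashable, Set

# A model maps variable names to the values assigned by the solver.
Model = Dict[str, Hashable]


def equivalence_classes(model: Model) -> Set[FrozenSet[str]]:
    """Group the variables that the model maps to the same value."""
    by_value: Dict[Hashable, Set[str]] = {}
    for var, value in model.items():
        by_value.setdefault(value, set()).add(var)
    return {frozenset(group) for group in by_value.values()}


def all_values_differ(old: Model, new: Model) -> bool:
    """First heuristic: every variable gets a different value than before."""
    return all(new[var] != old[var] for var in old)


def is_diverse(old: Model, new: Model) -> bool:
    """Second heuristic: the partition into equivalence classes changes."""
    return equivalence_classes(old) != equivalence_classes(new)


# Hypothetical models over two variables of an uninterpreted type T.
m1 = {"x": "T!0", "y": "T!0"}
m2 = {"x": "T!1", "y": "T!1"}  # fresh values, same equivalence classes
m3 = {"x": "T!1", "y": "T!2"}  # x and y are no longer equal
```

Here `m2` satisfies the first heuristic with respect to `m1` but is not diverse, since `x` and `y` remain equal; `m3` changes the equivalence classes, which is the kind of model the second heuristic asks for.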

Appendix C Extensions

Next, we describe various extensions of our algorithm that enable complex proofs.

Combining multiple candidate terms.

In Alg. 1, each candidate term is validated separately. To enable proofs that require multiple instantiations of the same formula, we developed an extension that validates multiple triggering terms at the same time. In such cases, the algorithm returns a set of terms that are necessary and sufficient to prove unsat. Fig. 10 presents a simple example from SMT-COMP 2019 pending benchmarks [32]. The input is unsatisfiable, as there does not exist an interpretation for the function that satisfies all the constraints: requires to be ; if is instantiated for , the solver learns that must be as well; however, if , then must be , which is a contradiction. Exposing the inconsistency thus requires two instantiations of , triggered by and , respectively. We generate both triggering terms, but in separate iterations (independently, both fail to validate). However, by validating them simultaneously (i.e., conjoin both of them to I), our algorithm identifies the required triggering term .
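The idea of validating several candidate terms together can be sketched as a search for a smallest set of candidates that validates. This is not the procedure from Alg. 1: the function `smallest_validating_set`, the size bound, and the `validates` callback (which stands in for the solver-based validation step) are assumptions made for illustration, and the candidate terms in the example are placeholders.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, Optional


def smallest_validating_set(
    candidates: Iterable[str],
    validates: Callable[[FrozenSet[str]], bool],
    max_size: int,
) -> Optional[FrozenSet[str]]:
    """Return a smallest subset of candidates accepted by `validates`.

    Subsets are tried in order of increasing size, so singletons are
    checked before any combination; None means nothing was found within
    the size bound.
    """
    pool = list(candidates)
    for size in range(1, max_size + 1):
        for subset in combinations(pool, size):
            chosen = frozenset(subset)
            if validates(chosen):
                return chosen
    return None


# Stub oracle: the proof succeeds only if both hypothetical terms are present.
required = frozenset({"f(c1)", "f(c2)"})


def oracle(terms: FrozenSet[str]) -> bool:
    return required <= terms


result = smallest_validating_set(["f(c1)", "g(c1)", "f(c2)"], oracle, max_size=2)
```

In this toy setting each candidate fails on its own, and only the pair is accepted, mirroring the situation where two instantiations of the same quantifier are needed. An exhaustive subset search grows exponentially with the number of candidates, so a size bound (or batching) is needed in practice.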

Figure 10: Benchmark from SMT-COMP 2019 [32]. The formulas set contradictory constraints on the function . is an uninterpreted type, and are user-defined constants of type . Synthesizing the triggering term requires multiple candidate terms. We use conjunctions here for simplicity, but our pre-processing applies distributivity of disjunction over conjunction and splits into three different formulas with unique names for the quantified variables.

Unification across multiple instantiations.

The clusters constructed by our algorithm are sets (see Alg. 2, line 12), so they contain a formula at most once, even if it is similar to multiple other formulas from the cluster. We thus consider the rewritings for multiple instantiations of the same formula separately, in different iterations. To handle cases that require multiple (but boundedly many) instantiations, we extend the algorithm with a parameter , which bounds the maximum frequency of a quantified conjunct within the formulas . That is, it allows a similar quantified formula, as well as itself, to be added to a cluster more than once (after performing variable renaming, to ensure that the names of the quantified variables are still globally unique). This results in an equisatisfiable formula for which our algorithm determines multiple triggering terms. Inputs whose unsatisfiability proofs require an unbounded number of instantiations typically contain a matching loop, thus we do not consider them here.

Figure 11: Fragment of Gobra’s option types axiomatization. is an uninterpreted type, is a user-defined constant of type . have multi-patterns (Appx. D). Synthesizing the triggering term requires type-based constraints.

Type-based constraints.

The rewritings of the form can be too imprecise (especially for quantified variables of uninterpreted types), as they do not constrain the . In Fig. 11, the solver cannot provide concrete values of type for and , it can only assign fresh, unconstrained variables (e.g., and ). However, the triggering terms