1 Introduction
Proof obligations frequently contain universal quantifiers, both in the specification and to encode the semantics of the programming language. Most deductive verifiers [1, 3, 6, 9, 13, 22, 36] rely on SMT solvers to discharge the proof obligations via E-matching [12]. This SMT algorithm requires syntactic matching patterns of ground terms (called patterns in the following) to control the instantiations. The pattern in the formula instructs the solver to instantiate the quantifier only when it finds a triggering term that matches the pattern. The patterns can be written manually or inferred automatically. However, devising them is challenging [19, 23]. Overly permissive patterns may lead to unnecessary instantiations that slow down verification or even cause non-termination (if each instantiation produces a new triggering term, in a so-called matching loop [12]). Overly restrictive patterns may prevent the instantiations needed to complete a proof; they cause two major problems in program verification: incompleteness and undetected unsoundness.
Incompleteness.
Overly restrictive patterns may cause spurious verification errors when the proof of valid proof obligations fails. Fig. 1 illustrates this case. The integer x represents the address of a node, and the uninterpreted functions len and nxt encode operations on linked lists. The axiom defines len: its result is positive and the last node points to itself. The assertion directly follows from the axiom, but the proof fails because the proof obligation does not contain the triggering term len(nxt(7)); thus, the axiom does not get instantiated. However, realistic proof obligations often contain hundreds of quantifiers [33], which makes the manual identification of missing triggering terms extremely difficult.
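To make the matching mechanics concrete, the following is a minimal, purely syntactic sketch of how E-matching selects instantiations (real solvers match modulo the current ground equalities; the term representation and helper names here are ours, not the solver's):

```python
# Terms are nested tuples such as ("len", ("nxt", 7)); pattern variables are
# strings such as "x". This is a purely syntactic approximation of E-matching.

def match(pattern, term, subst):
    """Try to extend `subst` so that `pattern` matches the ground `term`."""
    if isinstance(pattern, str):          # a quantified variable
        if pattern in subst:
            return subst if subst[pattern] == term else None
        return {**subst, pattern: term}
    if (not isinstance(term, tuple) or pattern[0] != term[0]
            or len(pattern) != len(term)):
        return None
    for p, t in zip(pattern[1:], term[1:]):
        subst = match(p, t, subst)
        if subst is None:
            return None
    return subst

def instantiations(pattern, ground_terms):
    """All substitutions obtained by matching `pattern` against ground terms."""
    result = []
    for t in ground_terms:
        s = match(pattern, t, {})
        if s is not None:
            result.append(s)
    return result

# The proof obligation of Fig. 1 mentions len(7) and nxt(7), but not the
# triggering term len(nxt(7)), so a pattern {len(nxt(x))} never fires:
pattern = ("len", ("nxt", "x"))
assert instantiations(pattern, [("len", 7), ("nxt", 7)]) == []
# Adding the missing triggering term enables the instantiation x := 7:
assert instantiations(pattern,
                      [("len", 7), ("nxt", 7), ("len", ("nxt", 7))]) == [{"x": 7}]
```

This illustrates why adding a single ground term to the proof obligation can unblock an instantiation that the axiom's pattern otherwise prevents.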
Unsoundness.
Most of the universal quantifiers in proof obligations appear in axioms over uninterpreted functions (to encode type information, heap models, datatypes, etc.). To obtain sound results, these axioms must be consistent (i.e., satisfiable); otherwise all proof obligations hold trivially. Consistency can be proved once and for all by showing the existence of a model, as part of the soundness proof. However, this solution is difficult to apply to verifiers that generate axioms dynamically, depending on the program to be verified. Proving consistency then requires verifying the algorithm that generates the axioms for all possible inputs, and needs to consider many subtle issues [10, 29, 20].
A more practical approach is to check if the axioms generated for a given program are consistent. However, this check also depends on triggering: an SMT solver may fail to prove unsat if the triggering terms needed to instantiate the contradictory axioms are missing. The unsoundness can thus remain undetected.
For example, Dafny’s [22] sequence axiomatization from June 2008 contained an inconsistency found only over a year later. A fragment of this axiomatization is shown in Fig. 2. It expresses that empty sequences and sequences obtained through the operation are well-typed (–), that the length of a type-correct sequence must be non-negative (), and that constructs a new sequence of the required length (). The intended behavior of is to update the element at index in sequence to . However, since there are no constraints on the parameter , can be used with a negative length, leading to a contradiction with . This error cannot be detected by checking the satisfiability of the formula , as no axiom gets instantiated.
This work.
For SMT-based deductive verifiers, discharging proof obligations and revealing inconsistencies in axiomatizations require a solver to prove unsat via E-matching. (Verification techniques based on proof assistants are out of scope.) Given an SMT formula for which E-matching yields unknown due to insufficient quantifier instantiations, our technique generates suitable triggering terms that allow the solver to complete the proof. These terms enable users to understand and remedy the revealed completeness or soundness issue. Since the SMT queries for the verification of different input programs are typically very similar, fixing such issues benefits the verification of many or even all future runs of the verifier.
Fixing the incompleteness.
For Fig. 1, our technique finds the triggering term len(nxt(7)), which allows one to fix the incompleteness. Tool users (who cannot change the axioms) can add the term to the program; e.g., adding var t: int; t := len(nxt(7)) before the assertion has no effect on the execution, but triggers the instantiation of the axiom. Tool developers can devise less restrictive patterns. For instance, they can move the conjunct len(x) > 0 to a separate axiom with the pattern {len(x)} (simply changing the original axiom’s pattern to {len(x)} would cause matching loops). Alternatively, tool developers can adapt the encoding to emit additional triggering terms enforcing certain instantiations [17, 19].
Fixing the unsoundness.
Soundness modulo patterns.
Fig. 3 illustrates another scenario: Boogie’s [5] map axiomatization is inconsistent by design at the SMT level [21], but this behavior cannot be exposed from Boogie, as the type system prevents the required instantiations. Thus it does not affect Boogie’s soundness. It is nevertheless important to detect it, because it could surface if Boogie were extended to support quantifier instantiation algorithms that are not based on E-matching (such as MBQI [16]) or first-order provers. These could unsoundly classify an incorrect program that uses this map axiomatization as correct. Since the axiom states that storing a key-value pair into a map results in a new map with a potentially different type, one can prove that two different types (e.g., Boolean and Int) are equal in SMT. This example shows that the problems tackled in this paper cannot be solved by simply switching to other instantiation strategies: these are not the preferred choices of most verifiers [1, 3, 6, 9, 13, 22, 36], and may produce unsound results for verifiers designed for E-matching with axiomatizations sound only modulo patterns.
Contributions.
This paper makes the following technical contributions:

We present the first automated technique that allows developers to detect completeness issues in program verifiers and soundness problems in their axiomatizations. Moreover, our approach helps them devise better triggering strategies for all future runs of their tool with E-matching.

We developed a novel algorithm for synthesizing the triggering terms necessary to complete unsatisfiability proofs using E-matching. Since quantifier instantiation is undecidable for first-order formulas over uninterpreted functions, our algorithm might not terminate. However, all identified triggering terms are indeed sufficient to complete the proof; there are no false positives.

We evaluated our technique on benchmarks with known triggering problems from four program verifiers. Our experimental results show that it successfully synthesized the missing triggering terms in 65.6% of the cases, and can significantly reduce the human effort in localizing and fixing the errors.
Outline.
2 Overview
Our goal is to synthesize missing triggering terms, i.e., concrete instantiations for (a small subset of) the quantified variables of an input formula I, which are necessary for the solver to prove its unsatisfiability. Intuitively, these triggering terms include counterexamples to the satisfiability of I and can be obtained from a model of its negation. For example, is unsatisfiable, and a counterexample is a model of its negation .
However, this idea does not apply to formulas over uninterpreted functions, which are common in proof obligations. The negation of , where is an uninterpreted function, is . This is a second-order constraint (it quantifies over functions) and cannot be encoded in SMT, which supports only first-order logic. We thus take a different approach.
Let G = ∃f₁, …, fₙ. H be a second-order formula, where f₁, …, fₙ range over functions. We define its approximation ~G as:
(*) ~G ≝ H,
where f₁, …, fₙ are uninterpreted functions. The approximation considers only one interpretation, not all possible interpretations, for each uninterpreted function.
We therefore construct a candidate triggering term from a model of and check if it is sufficient to prove that I is unsatisfiable (due to the approximation, a model is no longer guaranteed to be a counterexample for the original formula).
The four main steps of our algorithm are depicted in Fig. 4. The algorithm is standalone, i.e., neither integrated into nor dependent on any specific SMT solver. We illustrate it on the inconsistent axioms from Fig. 5 (which we assume are part of a larger axiomatization). To show that is unsatisfiable, the solver requires the triggering term . The corresponding instantiations of and generate contradictory constraints: and . In the following, we explain how we obtain this triggering term systematically.
Step 1: Clustering.
As typical proof obligations or axiomatizations contain hundreds of quantifiers, exploring combinations of triggering terms for all of them does not scale. To prune the search space, we exploit the fact that I is unsatisfiable only if there exist instantiations of some (in the worst case all) of its quantified conjuncts such that they produce contradictory constraints on some uninterpreted functions. (If there is a contradiction among the quantifierfree conjuncts, the solver will detect it directly.) We identify clusters of formulas that share function symbols and then process each cluster separately. In Fig. 5, and share the function symbol , so we build the cluster .
Step 2: Syntactic unification.
The formulas within clusters usually contain uninterpreted functions applied to different arguments (e.g., is applied to in and to in ). We thus perform syntactic unification to identify sharing constraints on the quantified variables (which we call rewritings and denote their set by ) such that instantiations that satisfy these rewritings generate formulas with common terms (on which they might set contradictory constraints). and share the term if we perform the rewritings .
Step 3: Identifying candidate triggering terms.
The cluster from step 1 contains a contradiction if there exists a formula in such that: (1) is unsatisfiable by itself, or (2) contradicts at least one other formula from .
To address scenario (1), we ask an SMT solver for a model of the formula , where is defined in (*) above. After Skolemization, is quantifier-free, so the solver is generally able to provide a model if one exists. We then obtain a candidate triggering term by substituting the quantified variables from the patterns of the formulas in with their corresponding values from the model.
However, scenario (1) is not sufficient to expose the contradiction from Fig. 5, since both and are individually satisfiable. Our algorithm thus also derives stronger formulas corresponding to scenario (2). That is, it will next consider the case where contradicts , whose encoding into first-order logic is: , where is the set of rewritings identified in step 2, used to connect the quantified variables. This formula is universally quantified (since is), so the solver cannot prove its satisfiability and generate models. We solve this problem by requiring to contradict the instantiation of , which is a weaker constraint. Let F be an arbitrary formula. We define its instantiation as:
(**) inst(F) ≝ B[x₁ ↦ y₁, …, xₙ ↦ yₙ], for F = ∀x₁, …, xₙ. B,
where y₁, …, yₙ are variables. Then is equivalent to . (To simplify the notation, here and in the following formulas, we omit existential quantifiers.) All its models set to 7. Substituting by (according to ) and by 7 (its value from the model) in the patterns of and yields the candidate triggering term .
Step 4: Validation.
Once we have found a candidate triggering term, we add it to the original formula I (wrapped in a fresh uninterpreted function, to make it available to Ematching, but not affect the input’s satisfiability) and check if the solver can prove unsat. If so, our algorithm terminates successfully and reports the synthesized triggering term (after a minimization step that removes unnecessary subterms); otherwise, we go back to step 3 to obtain another candidate. In our example, the triggering term is sufficient to complete the proof.
3 Synthesizing Triggering Terms
Next, we define the input formulas (Sec. 3.1), explain the details of our algorithm (Sec. 3.2) and discuss its limitations (Sec. 3.3). Appx. C and Appx. E present extensions that enable complex proofs and optimizations used in Sec. 4.
3.1 Input formula
To simplify our algorithm, we preprocess the inputs (i.e., the proof obligations or the axioms of a verifier): we Skolemize existential quantifiers and transform all propositional formulas into negation normal form (NNF), where negation is applied only to literals and the only logical connectives are conjunction and disjunction; we also apply the distributivity of disjunction over conjunction and split conjunctions into separate formulas. These steps preserve satisfiability and the semantics of patterns (Appx. E addresses scalability issues). The resulting formulas follow the grammar in Fig. 6. Literals may include interpreted and uninterpreted functions, variables and constants. Free variables are nullary functions. Quantified variables can have interpreted or uninterpreted types, and the preprocessing ensures that their names are globally unique. We assume that each quantifier is equipped with a pattern (if none is provided, we run the solver to infer one). Patterns are combinations of uninterpreted functions and must mention all quantified variables. Since there are no existential quantifiers after Skolemization, we use the term quantifier to denote universal quantifiers.
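The propositional part of this preprocessing can be sketched on a toy AST (the tuple representation and helper names are ours; the real preprocessing additionally handles quantifiers, patterns, and Skolemization):

```python
# Formulas are tuples ("and", a, b), ("or", a, b), ("not", a); strings are
# atomic literals.

def nnf(t):
    """Push negations down to literals (double negation, De Morgan)."""
    if isinstance(t, str):
        return t
    op = t[0]
    if op == "not":
        s = t[1]
        if isinstance(s, str):
            return t
        if s[0] == "not":
            return nnf(s[1])
        if s[0] == "and":
            return ("or", nnf(("not", s[1])), nnf(("not", s[2])))
        if s[0] == "or":
            return ("and", nnf(("not", s[1])), nnf(("not", s[2])))
    return (op, nnf(t[1]), nnf(t[2]))

def distribute(t):
    """Distribute disjunction over conjunction: a ∨ (b ∧ c) ⇝ (a∨b) ∧ (a∨c)."""
    if isinstance(t, str) or t[0] == "not":
        return t
    a, b = distribute(t[1]), distribute(t[2])
    if t[0] == "or":
        if isinstance(a, tuple) and a[0] == "and":
            return ("and", distribute(("or", a[1], b)), distribute(("or", a[2], b)))
        if isinstance(b, tuple) and b[0] == "and":
            return ("and", distribute(("or", a, b[1])), distribute(("or", a, b[2])))
    return (t[0], a, b)

def split(t):
    """Split top-level conjunctions into separate formulas."""
    if isinstance(t, tuple) and t[0] == "and":
        return split(t[1]) + split(t[2])
    return [t]

f = ("not", ("and", "p", ("or", "q", "r")))   # ¬(p ∧ (q ∨ r))
# NNF gives ¬p ∨ (¬q ∧ ¬r); distributing and splitting yields two conjuncts:
assert split(distribute(nnf(f))) == [("or", ("not", "p"), ("not", "q")),
                                     ("or", ("not", "p"), ("not", "r"))]
```

Each resulting conjunct is then treated as a separate formula by the algorithm.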
3.2 Algorithm
The pseudocode of our algorithm is given in Alg. 1. It takes as input an SMT formula I (defined in Fig. 6), which we treat in a slight abuse of notation as both a formula and a set of conjuncts. Three other parameters allow us to customize the search strategy and are discussed later. The algorithm yields a triggering term that enables the unsat proof, or None, if no term was found. We assume here that I contains no nested quantifiers and present those later in this section.
The algorithm iterates over each quantified conjunct of I (Alg. 1, line 3) and checks if is individually unsatisfiable (for depth = ). For complex proofs, this is usually not sufficient, as I is typically inconsistent due to a combination of conjuncts ( in Fig. 5). In such cases, the algorithm proceeds as follows:
Step 1: Clustering.
It constructs clusters of formulas similar to (Alg. 2, line 4), based on their Jaccard similarity index. Let F and G be two arbitrary formulas, and S_F and S_G their respective sets of uninterpreted function symbols (from their bodies and the patterns). The Jaccard similarity index is defined as:
J(F, G) = |S_F ∩ S_G| / |S_F ∪ S_G|
(the number of common uninterpreted functions divided by the total number). For Fig. 5, , , .
Our algorithm explores the search space by iteratively expanding clusters to include transitively similar formulas up to a maximum depth (parameter in Alg. 1). For two formulas F and G, we define the similarity function as:
similar(F, G) ≝ J(F, G) ≥ σ,
where σ is a similarity threshold used to parameterize our algorithm.
The initial cluster () includes all the conjuncts of I that are directly similar to . Each subsequent iteration adds the conjuncts that are directly similar to an element of the cluster from the previous iteration, that is, transitively similar to . This search strategy allows us to gradually strengthen the formulas (used to synthesize candidate terms in step 3) without overly constraining them (an overconstrained formula is unsatisfiable, and has no models).
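A minimal sketch of this clustering, assuming each conjunct is abstracted to its set of uninterpreted function symbols (the names, threshold, and depth below are illustrative, not the tool's defaults):

```python
# Step 1 (clustering) sketch: group conjuncts by Jaccard similarity of their
# uninterpreted function symbols, expanding transitively up to `depth` steps.

def jaccard(symbols_f, symbols_g):
    """Shared uninterpreted symbols divided by all symbols of both formulas."""
    return len(symbols_f & symbols_g) / len(symbols_f | symbols_g)

def cluster(target, symbols, sigma=0.5, depth=2):
    """Names of conjuncts transitively similar to `target` (a key of `symbols`)."""
    selected, frontier = {target}, {target}
    for _ in range(depth):
        frontier = {g for g in symbols
                    if g not in selected and
                    any(jaccard(symbols[g], symbols[f]) >= sigma for f in frontier)}
        if not frontier:
            break
        selected |= frontier
    return selected - {target}

# Hypothetical conjuncts: F1 shares f with F0; F2 shares g with F1 only.
symbols = {"F0": {"f"}, "F1": {"f", "g"}, "F2": {"g"}, "F3": {"k"}}
assert cluster("F0", symbols, sigma=0.5, depth=1) == {"F1"}
assert cluster("F0", symbols, sigma=0.5, depth=2) == {"F1", "F2"}
```

Increasing the depth pulls in conjuncts that are only transitively similar, gradually strengthening the formulas used in step 3 without over-constraining them.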
Step 2: Syntactic unification.
Next (Alg. 2, line 8) we identify rewritings, i.e., constraints under which two similar quantified formulas share terms.
(Appx. D presents the quantifier-free case.) We obtain the rewritings by performing a simplified form of syntactic term unification, which reduces their number to a practical size. Our rewritings are directed equalities. For two formulas F and G and an uninterpreted function, they have one of the following two shapes:
(1) , where is a quantified variable of , are terms from defined below, contains a term and contains a term ,
(2) , where is a quantified variable of , are terms from defined below, contains a term and contains a term ,
where is a constant , a quantified variable , or a composite function occurring in the formula and are arbitrary (interpreted or uninterpreted) functions. That is, we determine the most general unifier [4] only for those terms that have uninterpreted functions as the outermost functions and quantified variables as arguments. The unification algorithm is standard (except for the restricted shapes), so it is not shown explicitly.
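The restricted unification can be sketched as follows (terms are nested tuples and quantified variables are strings; the occurs check and the subterm cases are omitted, and all names are illustrative):

```python
# Step 2 (syntactic unification) sketch: unify two applications of the same
# uninterpreted function, emitting directed rewritings "variable -> term"
# instead of a full most-general unifier.

def rewritings(term_f, term_g, vars_f, vars_g):
    """Directed rewritings making `term_f` and `term_g` equal, or None."""
    if isinstance(term_f, str) and term_f in vars_f:
        return [(term_f, term_g)]               # shape (1): x_F -> t_G
    if isinstance(term_g, str) and term_g in vars_g:
        return [(term_g, term_f)]               # shape (2): x_G -> t_F
    if (isinstance(term_f, tuple) and isinstance(term_g, tuple)
            and term_f[0] == term_g[0] and len(term_f) == len(term_g)):
        result = []
        for a, b in zip(term_f[1:], term_g[1:]):
            sub = rewritings(a, b, vars_f, vars_g)
            if sub is None:
                return None
            result += sub
        return result
    return None if term_f != term_g else []

# f(x) from one formula against f(g(y)) from another: rewriting x -> g(y).
assert rewritings(("f", "x"), ("f", ("g", "y")), {"x"}, {"y"}) == [("x", ("g", "y"))]
# Different outermost uninterpreted functions do not unify:
assert rewritings(("f", "x"), ("h", "y"), {"x"}, {"y"}) is None
```

Instantiations satisfying such rewritings make the two formulas share a ground term, on which they may impose contradictory constraints.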
Since a term may appear more than once in a formula, or may unify with multiple similar formulas through the same quantified variable, we can obtain alternative rewritings for a quantified variable. In such cases, we either duplicate or split the cluster, such that in each cluster-rewriting pair, each quantified variable is rewritten at most once (see Alg. 2, line 12). In Fig. 7, both and are similar to (all three formulas share the uninterpreted symbol ). Since the unification produces alternative rewritings for ( and ), the procedure clustersRewritings returns the pairs .
Step 3: Identifying candidate terms.
From the clusters and the rewritings (identified before), we then derive quantifier-free formulas (Alg. 1, line 10), and, if they are satisfiable, construct the candidate triggering terms from their models (Alg. 1, line 15). Each formula consists of: (1) (defined in (*), which is equivalent to , since has the shape from Alg. 1, line 3), (2) the instantiations (see (**)) of all the similar formulas from the cluster, and (3) the corresponding rewritings . (Since we assume that all the quantified variables are globally unique, we do not perform variable renaming for the instantiations.)
If a similar formula has multiple disjuncts, the solver uses short-circuiting semantics when generating the model for . That is, if it can find a model that satisfies the first disjunct, it does not consider the remaining ones. To obtain more diverse models, we synthesize formulas that cover each disjunct, i.e., make sure that it evaluates to true at least once. We thus compute multiple instantiations of each similar formula, of the form: (see Alg. 1, line 7). To consider all the combinations of disjuncts, we derive the formula from the Cartesian product of the instantiations (Alg. 1, line 9). (To present the pseudocode in a concise way, we store in the instantiations map as well (Alg. 1, line 8), even if it does not represent the instantiation of .)
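A sketch of this construction, assuming the strengthened instantiations have the shape ¬D₁ ∧ … ∧ ¬Dᵢ₋₁ ∧ Dᵢ (this exact shape is our assumption; the function names are illustrative):

```python
from itertools import product

# Step 3 (disjunct coverage) sketch: each similar formula with disjuncts
# D1 ∨ … ∨ Dk contributes k strengthened instantiations, and we try every
# combination across the cluster, so that each disjunct is forced to hold in
# at least one candidate formula.

def coverage_instantiations(disjuncts):
    """Instantiations ¬D1 ∧ … ∧ ¬D(i-1) ∧ Di, one per disjunct Di."""
    return [[("not", d) for d in disjuncts[:i]] + [disjuncts[i]]
            for i in range(len(disjuncts))]

def candidate_formulas(similar_formulas):
    """One conjunction of constraints per combination of covered disjuncts."""
    per_formula = [coverage_instantiations(f) for f in similar_formulas]
    return [[c for choice in combo for c in choice]
            for combo in product(*per_formula)]

# Two similar formulas, with 2 and 1 disjuncts respectively -> 2 combinations:
combos = candidate_formulas([["d1", "d2"], ["e1"]])
assert combos == [["d1", "e1"], [("not", "d1"), "d2", "e1"]]
```

Each combination is conjoined with the negated conjunct and the rewritings before the solver is asked for a model.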
In Fig. 8, is similar to and . has two disjuncts and thus two possible instantiations: . The formula for the first instantiation is satisfiable, but none of the values the solver can assign to (which are all greater than or equal to ) are sufficient for the unsatisfiability proof to succeed. The second instantiation adds additional constraints: instead of , it requires . The resulting formula has a unique solution for , namely 0, and the triggering term is sufficient to prove unsat.
The procedure candidateTerm from Alg. 3 synthesizes a candidate triggering term from the model of and the rewritings . We first collect all the patterns of the formulas from the cluster (Alg. 3, line 2), i.e., of and of its similar conjuncts (see Alg. 1, line 15). Then, we apply the rewritings, in an arbitrary order (Alg. 3, lines 3–6). That is, we substitute the quantified variable from the left-hand side of the rewriting with the right-hand side term and propagate this substitution to the remaining rewritings. This step allows us to include in the synthesized triggering terms additional information, which cannot be provided by the solver. Then (Alg. 3, lines 7–8) we substitute the remaining variables with their constant values from the model (i.e., constants for builtin types, and fresh, unconstrained variables for uninterpreted types). The resulting triggering term is wrapped in an application to a fresh, uninterpreted function dummy to ensure that conjoining it to I does not change I’s satisfiability.
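A sketch of candidateTerm, using nested tuples for terms and strings for quantified variables (the representation and names are ours):

```python
# candidateTerm (Alg. 3) sketch: apply the directed rewritings to the
# collected patterns, propagating each substitution to the remaining
# rewritings, then replace the leftover variables by their model values and
# wrap the result in a fresh uninterpreted function `dummy`.

def substitute(term, var, value):
    """Replace every occurrence of variable `var` in `term` by `value`."""
    if term == var:
        return value
    if isinstance(term, tuple):
        return (term[0],) + tuple(substitute(a, var, value) for a in term[1:])
    return term

def candidate_term(patterns, rewritings, model):
    terms = list(patterns)
    pending = list(rewritings)
    while pending:
        var, value = pending.pop(0)
        terms = [substitute(t, var, value) for t in terms]
        # propagate the substitution to the remaining rewritings
        pending = [(v, substitute(r, var, value)) for v, r in pending]
    for var, value in model.items():
        terms = [substitute(t, var, value) for t in terms]
    return ("dummy",) + tuple(terms)

# Patterns f(x) and f(g(y)), rewriting x -> g(y), model y = 7:
assert candidate_term([("f", "x"), ("f", ("g", "y"))],
                      [("x", ("g", "y"))],
                      {"y": 7}) == ("dummy", ("f", ("g", 7)), ("f", ("g", 7)))
```

The dummy wrapper makes the term available to E-matching without affecting the input's satisfiability; the duplicated arguments are later pruned by the minimization step.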
Step 4: Validation.
We validate the candidate triggering term by checking if is unsatisfiable, i.e., if these particular interpretations for the uninterpreted functions generalize to all interpretations (Alg. 1, line 16). If this is the case, then we return the minimized triggering term (Alg. 1, line 18). The function has multiple arguments, each of them corresponding to one pattern from the cluster (Alg. 3, line 9). This is an over-approximation of the required triggering terms (once instantiated, the formulas may trigger each other), so minimized removes redundant (sub)terms. If does not validate, we reiterate its construction up to a bound and strengthen the formula to obtain a different model (Alg. 1, lines 19 and 11). Appx. B discusses heuristics for obtaining diverse models.
Nested quantifiers.
Our algorithm also supports nested quantifiers. Nested existential quantifiers in positive positions and nested universal quantifiers in negative positions are replaced in NNF by new, uninterpreted Skolem functions. Step 2 is also applicable to them: Skolem functions with arguments (the quantified variables from the outer scope) are unified as regular uninterpreted functions; they can also appear as in a rewriting, but not as the left-hand side (we do not perform higher-order unification). In such cases, the result is imprecise: the unification of and produces only the rewriting .
After preprocessing, the conjunct and the similar formulas may still contain nested universal quantifiers. is always negated in , thus it becomes, after Skolemization, quantifier-free. To ensure that is also quantifier-free (and the solver can generate a model), we extend the algorithm to recursively instantiate similar formulas with nested quantifiers when computing the instantiations.
3.3 Limitations
Next, we discuss the limitations of our technique, as well as possible solutions.
Applicability.
Our algorithm effectively addresses a common cause of failed unsatisfiability proofs in program verification, i.e., missing triggering terms. Other causes (e.g., incompleteness in the solver’s decision procedures due to undecidable theories) are beyond the scope of our work. Also, our algorithm is tailored to unsatisfiability proofs; satisfiability proofs cannot be reduced to unsatisfiability proofs by negating the input, because the negation cannot usually be encoded in SMT (as we have illustrated in Sec. 2).
SMT solvers.
Our algorithm synthesizes triggering terms as long as the SMT solver can find models for our quantifier-free formulas. However, solvers are incomplete, i.e., they can return unknown and generate only partial models, which are not guaranteed to be correct. Nonetheless, we also use partial models, as the validation step (step 4 in Fig. 4) ensures that they do not lead to false positives.
Patterns.
Since our algorithm is based on patterns (provided or inferred), it will not succeed if they do not permit the necessary instantiations. For example, the formula is unsatisfiable. However, the SMT solver cannot automatically infer a pattern from the body of the quantifier, since equality is an interpreted function and must not occur in a pattern. Thus E-matching (and implicitly our algorithm) cannot solve this example, unless the user provides as pattern some uninterpreted function that mentions both and (e.g., ).
Bounds and rewritings.
Synthesizing triggering terms is generally undecidable. We ensure termination by bounding the search space through various customizable parameters, thus our algorithm misses results not found within these bounds. We also only unify applications of uninterpreted functions, which are common in verification. Efficiently supporting interpreted functions (especially equality) is very challenging for inputs with a small number of types (e.g., from Boogie [5]).
Despite these limitations, our algorithm effectively synthesizes the triggering terms required in practical examples, as we experimentally show next.
4 Evaluation
Evaluating our work requires benchmarks with known triggering issues (i.e., for which E-matching yields unknown). Since there is no publicly available suite, in Sec. 4.1 we used manually-collected benchmarks from four verifiers [22, 35, 39, 24]. Our algorithm succeeded for 65.6% of them. To evaluate its applicability to other verifiers, in Sec. 4.2 we used SMT-COMP [33] inputs. As they were not designed to expose triggering issues, we developed a filtering step (see Appx. F) to automatically identify the subset that falls into this category. The results show that our algorithm is also suited for the verifiers from [6, 31, 9]. Sec. 4.3 illustrates that our triggering terms are simpler than the unsat proofs produced by quantifier instantiation and refutation techniques, enabling one to fix the root cause of the revealed issues.
Setup.
We used Z3 (4.8.10) [11] to infer the patterns, generate the models, and validate the candidate terms. However, our tool can be used with any solver that supports E-matching and exposes the inferred patterns. We used Z3’s NNF tactic to transform the inputs into NNF and locality-sensitive hashing to compute the clusters. We fixed Z3’s random seeds to arbitrary values (sat.random_seed to 488, smt.random_seed to 599, and nlsat.seed to 611). We set the (soft) timeout to 600s and the memory limit to 6 GB per run and used a 1s timeout for obtaining a model and for validating a candidate term. The experiments were conducted on a Linux server with 252 GB of RAM and 32 Intel Xeon CPUs at 3.3 GHz.
4.1 Effectiveness on verification benchmarks with triggering issues
Source      Files  Formulas   Quantifiers  C0  C1  C2  C3  C4  Our work    Z3 (MBQI)   CVC4 (enum inst)  Vampire (CASC-Z3)
Dafny           4  6–16       5–16          1   1   1   1   0  1           1           0                  2
F*              2  18–2388    15–2543       1   1   1   1   2  2           1           0                  2
Gobra          11  64–78      50–63         5  10   1   7  10  11          6           0                 11
Viper          15  84–143     68–203        7   5   3   5   5  7          11           0                 15
Total          32                                              21 (65.6%)  19 (59.3%)  0 (0%)            30 (93.7%)

C0 = default configuration; C1 = different similarity threshold; C2 = different batch size; C3 = type-based constraints (type); C4 = unification for subterms (sub). The Formulas and Quantifiers columns give minimum–maximum counts per file.
First, we used manually-collected benchmarks with known triggering issues from Dafny [22], F* [35], Gobra [39], and Viper [24]. We reconstructed 4 and 2 inconsistent axiomatizations from Dafny and F*, respectively, based on the changes from the repositories and the messages from the issue trackers; we obtained 11 inconsistent axiomatizations of arrays and option types from Gobra’s developers and collected 15 incompleteness issues from Viper’s test suite [37], with at least one assertion needed only for triggering. These contain algorithms for arrays, binomial heaps, binary search trees, and regression tests. The file sizes (minimum–maximum number of formulas or quantifiers) are shown in Tab. 1, columns 3–4.
Configurations.
We ran our tool with five configurations, to also analyze the impact of its parameters (see Alg. 1 and Appx. C). The default configuration C0 has: (similarity threshold), (batch size, i.e., the number of candidate terms validated together), type (no type-based constraints), sub (no unification for subterms). The other configurations differ from C0 in the parameters shown in Tab. 1. All configurations use (maximum transitivity depth), (maximum number of different models), and a 600s timeout per file.
Results.
Columns 5–9 in Tab. 1 show the number of files solved by each configuration; column 10 summarizes the files solved by at least one. Overall, we found suitable triggering terms for 65.6% of the files, including all F* and Gobra benchmarks. An F* unsoundness exposed by all configurations in 60s is given in Fig. 9. It required two developers to diagnose it manually based on a bug report [14]. A simplified Gobra axiomatization for option types, solved by C4 in 13s, is shown in Fig. 11. Gobra’s team spent one week identifying some of the issues. As our triggering terms for F* and Gobra were similar to the manually-written ones, they could have reduced the human effort in localizing and fixing the errors.
Our algorithm synthesized missing triggering terms for 7 Viper files, including the array maximum example [2], for which E-matching could not prove that the maximal element in a strictly increasing array of size 3 is its last element. Our triggering term loc(a,2) (loc maps arrays and integers to heap locations) can be added by a user of the verifier to their postcondition. A developer can fix the root cause of the incompleteness by including a generalization of the triggering term to arbitrary array sizes: len(a)!=0 ==> x==loc(a,len(a)-1).val. Both result in E-matching refuting the proof obligation in under 0.1s. We also exposed another case where Boogie (used by Viper) is sound only modulo patterns (as in Fig. 3).
4.2 Effectiveness on SMT-COMP benchmarks
Source      Files  Formulas   Quantifiers  C0  C1  C2  C3  C4  Our work    Z3 (MBQI)   CVC4 (enum inst)  Vampire (CASC-Z3)
Spec#          33  28–2363    25–645       16  16  14  16  15  16          16          0                 29
VCC/Havoc      14  129–1126   100–1027     11   9   5  11   9  11          12          0                 14
Simplify        1  256        129           0   0   0   0   0  0           1           0                  0
BWI            13  189–384    198–456       1   1   2   1   1  2           12          0                 12
Total          61                                              29 (47.5%)  41 (67.2%)  0 (0%)            55 (90.1%)

C0 = default configuration; C1 = different similarity threshold; C2 = different batch size; C3 = type-based constraints (type); C4 = unification for subterms (sub). The Formulas and Quantifiers columns give minimum–maximum counts per file.
Next, we considered 61 SMT-COMP [33] benchmarks from Spec# [6], VCC [31], Havoc [9], Simplify [12], and the Bit-Width-Independent (BWI) encoding [25].
Results.
The results are shown in Tab. 2. Our algorithm enabled E-matching to refute 47.5% of the files, most of them from Spec# and VCC/Havoc. We manually inspected some BWI benchmarks (for which the algorithm had worse results) and observed that the validation step times out even with a much higher timeout. This shows that some candidate terms trigger matching loops and explains why C2 (which validates them individually) solved one more file. Extending our algorithm to avoid matching loops by construction is left as future work.
4.3 Comparison with unsatisfiability proofs
As an alternative to our work, tool developers could try to manually identify triggering issues from refutation proofs, but these do not consider patterns and are harder to understand. Columns 11–13 in Tab. 1 and Tab. 2 show the number of proofs produced by Z3 with MBQI [16], CVC4 [7] with enumerative instantiation [27], and Vampire [18] using Z3 for ground theory reasoning [26] and the CASC [34] portfolio mode with competition presets. CVC4 failed for all examples (it cannot construct proofs for quantified logics), while Vampire refuted most of them. Our algorithm outperformed MBQI for F* and Gobra and had similar results for Dafny, Spec#, and VCC/Havoc. All our configurations solved two VCC/Havoc files not solved by MBQI (Appx. D shows an example). Moreover, our triggering terms are much simpler and directly highlight the root cause of the issues. Compared to our generated term loc(a,2), MBQI’s proof for Viper’s array maximum example has 2135 lines and over 700 reasoning steps, while Vampire’s proof has 348 lines and 101 inference steps. Other proofs have similar complexity.
Vampire and MBQI cannot replace our technique: as most deductive verifiers employ Ematching, it is important to help the developers use the algorithm of their choice and return sound results even if they rely on patterns for soundness (as in Fig. 3). Our tool can also produce multiple triggering terms (see Appx. C), thus it can reveal multiple triggering issues for the same input formula.
5 Related Work
To our knowledge, no other approach automatically produces the information needed by developers to remedy the effects of overly restrictive patterns. Quantifier instantiation and refutation techniques (discussed next) can produce unsatisfiability proofs, but these are much more complex than our triggering terms.
Quantifier instantiation techniques.
Model-based quantifier instantiation [16] (MBQI) was designed for sat formulas. It checks whether the models obtained for the quantifier-free part of the input satisfy the quantifiers, whereas we check whether the synthesized triggering terms obtained for some interpretation of the uninterpreted functions generalize to all interpretations. In some cases, MBQI can also generate unsatisfiability proofs, but they require expert knowledge to be understood; our triggering terms are much simpler. Counterexample-guided quantifier instantiation [28] is a technique for sat formulas which synthesizes computable functions from logical specifications. It is applicable to functions whose specifications place explicit syntactic restrictions on the space of possible solutions, which is usually not the case for axiomatizations. Thus, the technique cannot directly solve the complementary problem of proving soundness of the axiomatization.
E-matching-based approaches.
Rümmer [30] proposed a calculus for first-order logic modulo linear integer arithmetic that integrates constraint-based free-variable reasoning with E-matching. Our algorithm does not require such reasoning steps, so it is applicable to formulas from all the logics supported by the SMT solver. Enumerative instantiation [27] exhaustively enumerates ground terms from a set of ordered, quantifier-free terms from the input. It can be used to refute formulas with quantifiers, but not to construct proofs (see Sec. 4.3). Our algorithm derives quantifier-free formulas and synthesizes the triggering terms from their models, even if the input does not have a quantifier-free part. It also uses syntactic information to construct complex triggering terms.
Theorem provers.
First-order theorem provers (e.g., Vampire [18]) also generate refutation proofs. More recent work combines a superposition calculus with theory reasoning [38, 26], integrating SAT/SMT solvers with theorem provers. We also use unification, but to synthesize the triggering terms required by E-matching. Moreover, our triggering terms are much simpler than Vampire’s proofs and can be used to improve the triggering strategies for all future runs of the verifier.
6 Conclusions
We have presented the first automated technique that enables the developers of verifiers to remedy the effects of overly restrictive patterns. Since discharging proof obligations and identifying inconsistencies in axiomatizations require an SMT solver to prove the unsatisfiability of a formula via E-matching, we developed a novel algorithm for synthesizing the triggering terms that allow the solver to complete the proof. Our approach is effective for a diverse set of verifiers and can significantly reduce the human effort of localizing and fixing triggering issues.
Acknowledgements
We would like to thank the reviewers for their insightful comments. We are also grateful to Felix Wolf for providing us the Gobra benchmarks, and to Evgenii Kotelnikov for his detailed explanations about Vampire.
References
 [1] (2016) VerCors: A layered approach to practical verification of concurrent software. In PDP, pp. 495–503.
 [2] (2021) Array maximum, by elimination. http://viper.ethz.ch/examples/maxarrayelimination.html
 [3] (2019) Leveraging Rust types for modular specification and verification. In Object-Oriented Programming Systems, Languages, and Applications (OOPSLA), Vol. 3, pp. 147:1–147:30.
 [4] (2001) Unification theory. In Handbook of Automated Reasoning, J. A. Robinson and A. Voronkov (Eds.), pp. 445–532.
 [5] (2005) Boogie: a modular reusable verifier for object-oriented programs. In Formal Methods for Components and Objects (FMCO), F. S. de Boer, M. M. Bonsangue, S. Graf, and W. P. de Roever (Eds.), Lecture Notes in Computer Science, Vol. 5, pp. 364–387.
 [6] (2011) Specification and verification: the Spec# experience. Communications of the ACM 54 (6), pp. 81–91.
 [7] (2011) CVC4. In Computer Aided Verification, G. Gopalakrishnan and S. Qadeer (Eds.), Berlin, Heidelberg, pp. 171–177.
 [8] (2017) The SMT-LIB Standard: Version 2.6. Technical report, Department of Computer Science, The University of Iowa. Available at www.SMT-LIB.org.
 [9] (2007) A reachability predicate for analyzing low-level software. In Tools and Algorithms for the Construction and Analysis of Systems, O. Grumberg and M. Huth (Eds.), Berlin, Heidelberg, pp. 19–33.
 [10] (2007) Practical reasoning about invocations and implementations of pure methods. In Fundamental Approaches to Software Engineering (FASE), M. B. Dwyer and A. Lopes (Eds.), LNCS, Vol. 4422, pp. 336–351.
 [11] (2008) Z3: an efficient SMT solver. In Tools and Algorithms for the Construction and Analysis of Systems, C. R. Ramakrishnan and J. Rehof (Eds.), Berlin, Heidelberg, pp. 337–340.
 [12] (2005) Simplify: a theorem prover for program checking. J. ACM 52 (3), pp. 365–473.
 [13] (2018) Nagini: a static verifier for Python. In Computer Aided Verification (CAV), H. Chockler and G. Weissenbacher (Eds.), LNCS, Vol. 10982, pp. 596–603.
 [14] (2021) F* issue 1848. https://github.com/FStarLang/FStar/issues/1848
 [15] (2015) PySMT: a solver-agnostic library for fast prototyping of SMT-based algorithms. In SMT Workshop 2015.
 [16] (2009) Complete instantiation for quantified formulas in satisfiability modulo theories. In Computer Aided Verification, A. Bouajjani and O. Maler (Eds.), Berlin, Heidelberg, pp. 306–320.
 [17] (2013) Verification condition generation for permission logics with abstract predicates and abstraction functions. In European Conference on Object-Oriented Programming (ECOOP), G. Castagna (Ed.), Lecture Notes in Computer Science, Vol. 7920, pp. 451–476.
 [18] (2013) First-order theorem proving and Vampire. In Computer Aided Verification, N. Sharygina and H. Veith (Eds.), Berlin, Heidelberg, pp. 1–35.
 [19] (2009) Reasoning about comprehensions with first-order SMT solvers. In Proceedings of the 2009 ACM Symposium on Applied Computing, SAC ’09, New York, NY, USA, pp. 615–622.
 [20] (2008) Verification of equivalent-results methods. In European Symposium on Programming (ESOP), S. Drossopoulou (Ed.), Lecture Notes in Computer Science, Vol. 4960, pp. 307–321.
 [21] (2010) A polymorphic intermediate verification language: design and logical encoding. In Tools and Algorithms for the Construction and Analysis of Systems, J. Esparza and R. Majumdar (Eds.), Berlin, Heidelberg, pp. 312–327.
 [22] (2010) Dafny: an automatic program verifier for functional correctness. In Logic for Programming, Artificial Intelligence, and Reasoning, E. M. Clarke and A. Voronkov (Eds.), Berlin, Heidelberg, pp. 348–370.
 [23] (2009) Programming with triggers. In SMT, ACM International Conference Proceeding Series, Vol. 375, pp. 20–29.
 [24] (2016) Viper: a verification infrastructure for permission-based reasoning. In Verification, Model Checking, and Abstract Interpretation (VMCAI), B. Jobstmann and K. R. M. Leino (Eds.), LNCS, Vol. 9583, pp. 41–62.
 [25] (2019) Towards bit-width-independent proofs in SMT solvers. In Automated Deduction – CADE 27, P. Fontaine (Ed.), Cham, pp. 366–384.
 [26] (2016) AVATAR modulo theories. In GCAI 2016. 2nd Global Conference on Artificial Intelligence, C. Benzmüller, G. Sutcliffe, and R. Rojas (Eds.), EPiC Series in Computing, Vol. 41, pp. 39–52.
 [27] (2018) Revisiting enumerative instantiation. In Tools and Algorithms for the Construction and Analysis of Systems, D. Beyer and M. Huisman (Eds.), Cham, pp. 112–131.
 [28] (2015) Counterexample-guided quantifier instantiation for synthesis in SMT. In Computer Aided Verification, D. Kroening and C. S. Păsăreanu (Eds.), Cham, pp. 198–216.
 [29] (2008) Checking well-formedness of pure-method specifications. In Formal Methods (FM), J. Cuellar and T. Maibaum (Eds.), Lecture Notes in Computer Science, Vol. 5014, pp. 68–83.
 [30] (2012) E-matching with free variables. In Logic for Programming, Artificial Intelligence, and Reasoning, N. Bjørner and A. Voronkov (Eds.), Berlin, Heidelberg, pp. 359–374.
 [31] (2008) VCC: contract-based modular verification of concurrent C. In 31st International Conference on Software Engineering, ICSE 2009.
 [32] (2019) The 14th international satisfiability modulo theories competition (including pending benchmarks). https://smtcomp.github.io/2019/, https://clcgitlab.cs.uiowa.edu:2443/SMTLIBbenchmarkstmp/benchmarkspending
 [33] (2020) The 15th international satisfiability modulo theories competition. https://smtcomp.github.io/2020/
 [34] (2016) The CADE ATP System Competition – CASC. AI Magazine 37 (2), pp. 99–101.
 [35] (2016) Dependent types and multi-monadic effects in F*. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’16, New York, NY, USA, pp. 256–270.
 [36] (2013) Verifying higher-order programs with the Dijkstra monad. In Proceedings of the 34th Annual ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’13, pp. 387–398.
 [37] (2021) Viper test suite. https://github.com/viperproject/silver/tree/master/src/test/resources
 [38] (2014) AVATAR: the architecture for first-order theorem provers. In Computer Aided Verification, A. Biere and R. Bloem (Eds.), Cham, pp. 696–710.
 [39] (2021) Gobra: modular specification and verification of Go programs. In Computer Aided Verification (CAV), A. Silva and K. R. M. Leino (Eds.), pp. 367–379.
Appendix A Background: E-matching
In this section, we briefly discuss the E-matching-related terminology and explain how this quantifier-instantiation algorithm works on an example.
Patterns vs. triggering terms.
Patterns are syntactic hints attached to quantifiers that instruct the SMT solver when to perform an instantiation. In Fig. 2, the quantified formula will be instantiated only when a triggering term that matches the pattern is encountered during the SMT run (i.e., the triggering term is present in the quantifier-free part of the input formula or is obtained by the solver from the body of a previously-instantiated quantifier).
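The matching step behind this mechanism can be sketched in a few lines of Python (an illustration under our own representation, not the solvers' implementation): terms are nested tuples, e.g., `("len", ("nxt", 7))` stands for len(nxt(7)), and quantified variables inside a pattern are strings prefixed with `?`.

```python
# Illustrative sketch of syntactic pattern matching (not a solver's code).
# A pattern variable ("?x") matches any ground term, but consistently:
# the same variable must match the same term everywhere in the pattern.

def match(pattern, term, subst=None):
    """Match `pattern` against ground `term`; return the substitution for
    the pattern's quantified variables, or None if there is no match."""
    if subst is None:
        subst = {}
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in subst:                      # already bound: must agree
            return subst if subst[pattern] == term else None
        subst[pattern] = term                     # first occurrence: bind it
        return subst
    if isinstance(pattern, tuple) and isinstance(term, tuple):
        if len(pattern) != len(term) or pattern[0] != term[0]:
            return None                           # different function symbols
        for p, t in zip(pattern[1:], term[1:]):
            subst = match(p, t, subst)
            if subst is None:
                return None
        return subst
    return subst if pattern == term else None     # constants must be equal

# The pattern len(nxt(?x)) matches the triggering term len(nxt(7)),
# so the quantifier would be instantiated with x = 7.
print(match(("len", ("nxt", "?x")), ("len", ("nxt", 7))))  # → {'?x': 7}
```

If no ground term matches the pattern, `match` never succeeds and the quantifier is simply not instantiated, which is exactly the incompleteness scenario from Fig. 1.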
E-matching.
We now illustrate how E-matching works on the example from Fig. 2; in particular, we show how our synthesized triggering term helps the solver to prove unsat when added to the axiomatization ( is a fresh variable of type ). Due to space constraints, we omit unnecessary instantiations. The subterms and trigger the instantiation of and , respectively. The solver obtains the bodies of the quantifiers for these particular values:
Since the first disjunct of evaluates to (from ), the solver learns that the second disjunct must hold (i.e., the length must be 1); we abbreviate it as L = 1. Further, the subterms and of the synthesized triggering term lead to the instantiation of and , respectively:
from triggers :
By equating the arguments of the outermost in , the solver learns that the first disjunct of is . The second disjunct must thus hold (i.e., the length should be positive); we abbreviate it as . Since , the unsatisfiability proof succeeds.
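The instantiation process described in this appendix can be simulated by a toy saturation round (a hedged sketch; the tuple representation of terms and the `instantiate` functions are our own illustration, not the solver's data structures):

```python
# Toy E-matching round: each quantifier is modeled as a pair
# (head, instantiate), where a ground subterm with function symbol `head`
# triggers the quantifier, and `instantiate` returns the ground terms
# contributed by the instantiated body.

def subterms(t):
    """Enumerate a term and all of its subterms."""
    yield t
    if isinstance(t, tuple):
        for arg in t[1:]:
            yield from subterms(arg)

def ematch_round(ground, quantifiers):
    """One round: find triggering terms for each pattern and collect the
    new ground terms produced by the corresponding instantiations."""
    new = set()
    for head, instantiate in quantifiers:
        for g in ground:
            for s in subterms(g):
                if isinstance(s, tuple) and s[0] == head:
                    new.update(instantiate(s[1]))
    return new - ground

# Schematic version of the len axiom from Fig. 1: instantiating the
# quantifier for len(nxt(x)) makes len(x) available as a new ground term,
# which may in turn trigger further instantiations in later rounds.
quants = [("len",
           lambda a: {("len", a[1])} if isinstance(a, tuple) else set())]
ground = {("len", ("nxt", 7))}
print(ematch_round(ground, quants))  # → {('len', 7)}
```

Iterating such rounds until no new terms appear (or a limit is hit) mirrors how instantiations feed new triggering terms back to the matcher; a quantifier that keeps producing fresh matching terms in every round is exactly a matching loop.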
Appendix B Diverse models
In this section, we explain the importance of the parameter from Alg. 1 (the maximum number of models) and discuss heuristics for obtaining diverse models.
Let us consider the formula from Fig. 9, which was part of an axiomatization with 2,495 axioms. axiomatizes the uninterpreted function and is inconsistent, because there exist two integers whose real division ("/") is not an integer. The model produced by the solver for the formula is . is defined ("/" is a total function [8]), but its result is not specified. Thus the solver cannot validate this model (i.e., it returns unknown).
In such cases, or when the candidate term does not generalize to all interpretations of the uninterpreted functions, we reiterate its construction, up to the bound (Alg. 1, line 11). For this, we strengthen the previously-derived formula to force the solver to find a different model. In Fig. 9, if we simply exclude previous models, we can obtain a sequence of models with different values for the numerator, but with the same value (0) for the denominator. There are infinitely many such models, and all of them fail to validate for the same reason.
There are various heuristics one can employ to guide the solver’s search for a new model, and our algorithm can be parameterized with different ones. In our experiments, we interpret the conjunct from Alg. 1, line 19 as . The first component requires all the variables to have different values from before. This requirement may be too strong for some variables, but since we use only soft constraints, the solver may ignore some of them if it cannot otherwise generate a satisfying assignment.
The second part requires models from different equivalence classes, where an equivalence class includes all the variables that are equal in the model. For example, if the model is , where is a value of the corresponding type, then and belong to the same equivalence class. Considering equivalence classes is particularly important for variables of uninterpreted types; the solver cannot provide actual values for them, thus it assigns fresh, unconstrained variables. However, different fresh variables do not lead to diverse models.
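The equivalence-class criterion can be sketched as follows (an illustration with an assumed dict representation of models, not the tool's implementation; for uninterpreted types the values are opaque fresh names, so only the induced equalities between variables matter):

```python
# Diversity heuristic sketch: two models count as "the same" if they induce
# the same partition of the variables into equivalence classes, even when
# the concrete (or fresh) values assigned to the classes differ.

def equivalence_classes(model):
    """Group the variables that receive the same value in the model."""
    by_value = {}
    for var, val in model.items():
        by_value.setdefault(val, set()).add(var)
    return {frozenset(cls) for cls in by_value.values()}

def same_partition(m1, m2):
    """True if the two models induce identical equivalence classes."""
    return equivalence_classes(m1) == equivalence_classes(m2)

m1 = {"x": "v0", "y": "v0", "z": "v1"}  # x = y, z separate
m2 = {"x": "v7", "y": "v7", "z": "v9"}  # same classes, fresh values only
m3 = {"x": "v2", "y": "v3", "z": "v3"}  # y = z: a genuinely new model
print(same_partition(m1, m2), same_partition(m1, m3))  # → True False
```

A diversity constraint built from `equivalence_classes` would reject m2 (it merely renames the fresh values of m1) but accept m3, which merges different variables and can therefore exercise a different interpretation of the uninterpreted functions.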
Appendix C Extensions
Next, we describe various extensions of our algorithm that enable complex proofs.
Combining multiple candidate terms.
In Alg. 1, each candidate term is validated separately. To enable proofs that require multiple instantiations of the same formula, we developed an extension that validates multiple triggering terms at the same time. In such cases, the algorithm returns a set of terms that are necessary and sufficient to prove unsat. Fig. 10 presents a simple example from the SMT-COMP 2019 pending benchmarks [32]. The input is unsatisfiable, as there does not exist an interpretation for the function that satisfies all the constraints: requires to be ; if is instantiated for , the solver learns that must be as well; however, if , then must be , which is a contradiction. Exposing the inconsistency thus requires two instantiations of , triggered by and , respectively. We generate both triggering terms, but in separate iterations (independently, both fail to validate). However, by validating them simultaneously (i.e., conjoining both of them to I), our algorithm identifies the required triggering term .
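The combined validation can be sketched as a search over subsets of candidates (an illustration only; `refutes` stands in for the actual validation step, which runs the solver with the candidate terms conjoined to the input, and the terms here are schematic strings):

```python
# Sketch of joint candidate validation: candidates that fail to validate
# individually are re-tried in combination, smallest subsets first, so the
# returned set is both sufficient and minimal.

from itertools import combinations

def validate(candidates, refutes, max_size=2):
    """Return the smallest subset of candidates for which `refutes`
    succeeds, or None if no subset up to `max_size` suffices."""
    for k in range(1, max_size + 1):
        for subset in combinations(candidates, k):
            if refutes(set(subset)):
                return set(subset)
    return None

# Fig. 10 scenario, schematically: neither f(a) nor f(b) alone lets
# E-matching derive the contradiction, but both together do.
needed = {"f(a)", "f(b)"}
print(validate(["f(a)", "f(b)"], lambda terms: needed <= terms))
```

Enumerating subsets smallest-first keeps the reported set of triggering terms minimal, which matches the goal of handing developers the simplest possible explanation of the triggering issue.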
Unification across multiple instantiations.
The clusters constructed by our algorithm are sets (see Alg. 2, line 12), so they contain a formula at most once, even if it is similar to multiple other formulas from the cluster. We thus consider the rewritings for multiple instantiations of the same formula separately, in different iterations. To handle cases that require multiple (but boundedly many) instantiations, we extend the algorithm with a parameter , which bounds the maximum frequency of a quantified conjunct within the formulas . That is, it allows a similar quantified formula, as well as itself, to be added to a cluster more than once (after performing variable renaming, to ensure that the names of the quantified variables are still globally unique). This results in an equisatisfiable formula for which our algorithm determines multiple triggering terms. Inputs whose unsatisfiability proofs require an unbounded number of instantiations typically contain a matching loop, thus we do not consider them here.
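The renaming step that keeps duplicated conjuncts well-formed can be sketched as follows (a minimal illustration under an assumed tuple representation of formula bodies; `duplicate` and `rename` are hypothetical helpers, not part of the actual tool):

```python
# Bounded duplication sketch: each extra copy of a quantified conjunct gets
# freshly renamed bound variables, so variable names remain globally unique
# across the (equisatisfiable) duplicated formula.

import itertools

_fresh = itertools.count()  # global supply of fresh name suffixes

def rename(term, mapping):
    """Replace bound-variable names (keys of `mapping`) inside a term."""
    if isinstance(term, tuple):
        return tuple(rename(t, mapping) for t in term)
    return mapping.get(term, term)

def duplicate(conjunct, bound_vars, copies):
    """Return `copies` alpha-renamed copies of a quantified conjunct."""
    result = []
    for _ in range(copies):
        mapping = {v: f"{v}_{next(_fresh)}" for v in bound_vars}
        result.append(rename(conjunct, mapping))
    return result

body = ("=", ("f", "x"), "x")  # schematic body of: forall x. f(x) = x
print(duplicate(body, ["x"], 2))
```

Each copy can then be rewritten and unified independently, which is what allows the extended algorithm to derive several triggering terms for the same quantified formula.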
Type-based constraints.
The rewritings of the form can be too imprecise (especially for quantified variables of uninterpreted types), as they do not constrain the . In Fig. 11, the solver cannot provide concrete values of type for and ; it can only assign fresh, unconstrained variables (e.g., and ). However, the triggering terms