Trace Logic for Inductive Loop Reasoning

by Pamina Georgiou, et al.

We propose trace logic, an instance of many-sorted first-order logic, to automate the partial correctness verification of programs containing loops. Trace logic generalizes semantics of program locations and captures loop semantics by encoding properties at arbitrary timepoints and loop iterations. We guide and automate inductive loop reasoning in trace logic by using generic trace lemmas capturing inductive loop invariants. Our work is implemented in the RAPID framework, by extending and integrating superposition-based first-order reasoning within RAPID. We successfully used RAPID to prove correctness of many programs whose functional behavior is best summarized in the first-order theories of linear integer arithmetic, arrays and inductive data types.








I Introduction

One of the main challenges in automating software verification comes with handling inductive reasoning over programs containing loops. Until recently, automated reasoning in formal verification was the primary domain of satisfiability modulo theory (SMT) solvers De Moura and Bjørner (2008); Barrett et al. (2011), yielding powerful advancements for inferring and proving loop properties with linear arithmetic and limited use of quantifiers, see e.g. Karbyshev et al. (2015); Gurfinkel et al. (2018); Fedyukovich et al. (2019). Formal verification, however, also requires reasoning about unbounded data types, such as arrays, and inductively defined data types. Specifying, for example as shown in Figure 1, that every element in the array b is initialized by a non-negative array element of a requires reasoning with quantifiers and is best expressed in many-sorted extensions of first-order logic. Yet, the recent progress in automation for quantified reasoning in first-order theorem proving has not yet been fully integrated in formal verification. In this paper we address such a use of first-order reasoning and propose trace logic $\mathcal{L}$, an instance of many-sorted first-order logic, to automate the partial correctness verification of program loops, by expressing program semantics in $\mathcal{L}$ and using $\mathcal{L}$ in combination with superposition-based first-order theorem proving.


In our previous work Barthe et al. (2019), an initial version of trace logic was introduced to formalize and prove relational properties. In this paper, we go beyond Barthe et al. (2019) and turn trace logic into an efficient approach to loop (safety) verification. We propose trace logic as a unifying framework to reason about both relational and safety properties expressed in full first-order logic with theories. We bring the following contributions.

(i) We generalize the semantics of program locations by treating them as functions of execution timepoints. In essence, unlike other works  Bjørner et al. (2015); Kobayashi et al. (2020); Chakraborty et al. (2020); Ish-Shalom et al. (2020), we formalize program properties at arbitrary timepoints of locations.

(ii) Thanks to this generalization, we provide a non-recursive axiomatization of program semantics in trace logic and prove completeness of our axiomatization with respect to Hoare logic. Our semantics in trace logic supports arbitrary quantification over loop iterations (Section V).

(iii) We guide and automate inductive loop reasoning in trace logic $\mathcal{L}$, by using generic trace lemmas capturing inductive loop invariants (Section VI). We prove soundness of each trace lemma we introduce.

(iv) We bring first-order theorem proving into the landscape of formal verification, by extending recent results in superposition-based reasoning Gleiss et al. (2020); Gleiss and Suda (2020); Kovács et al. (2017) with support for trace logic properties, complementing SMT-based verification methods in the area (Section VI). As logical consequences of our trace lemmas are also loop invariants, superposition-based reasoning in trace logic makes it possible to automatically find the loop invariants that are needed for proving safety assertions of program loops.

(v) We implemented our approach in the Rapid framework and combined Rapid with new extensions of the first-order theorem prover Vampire. We successfully evaluated our work on more than 100 benchmarks taken from the SV-Comp repository Beyer (2019), mainly consisting of safety verification challenges over programs containing arrays of arbitrary length and integers (Section VII). Our experiments show that Rapid automatically proves safety of many examples that, to the best of our knowledge, cannot be handled by other methods.

II Running Example

1        func main() {
2                const Int[] a;
3
4                Int[] b;
5                Int i = 0;
6                Int j = 0;
7                while (i < a.length) {
8                        if (a[i] $\geq$ 0) {
9                                b[j] = a[i];
10                                j = j + 1;
11                        }
12                        i = i + 1;
13                }
14        }
15        assert ($\forall$k$_\mathbb{I}$. $\exists$l$_\mathbb{I}$. ((0 $\leq$ k $<$ j $\wedge$ a.length $\geq$ 0) $\rightarrow$ b(k) = a(l)))
Fig. 1: Program copying positive elements from array a to b.

We illustrate and motivate our work with Figure 1. This program iterates over a constant integer array a of arbitrary length and copies positive values into a new array b. We are interested in proving the safety assertion given at line 15: given that the length a.length of a is not negative, every element in b is an element from a. Expressing such a property requires alternations of quantifiers in the first-order theories of linear integer arithmetic and arrays, as formalized in line 15. We write $k_{\mathbb{I}}$ and $l_{\mathbb{I}}$ to specify that $k$ and $l$ are of sort integer $\mathbb{I}$.
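To make the running example concrete, the following Python sketch executes the loop of Figure 1 on a given list and checks the assertion of line 15 by brute force. It is only an illustration under stated assumptions: the function names `copy_nonnegative` and `assertion_holds` are hypothetical, the guard is assumed to be `a[i] >= 0`, the unbounded array b is modelled as a dictionary, and the existential witness `l` is restricted to valid indices of a (a slightly stronger check than the one in line 15).

```python
def copy_nonnegative(a):
    """Run the loop of Figure 1 on a concrete list a; return (b, j)."""
    b = {}  # unbounded array b, modelled as a mapping position -> value
    i = 0
    j = 0
    while i < len(a):
        if a[i] >= 0:  # assumed guard: keep the non-negative elements
            b[j] = a[i]
            j = j + 1
        i = i + 1
    return b, j


def assertion_holds(a, b, j):
    """Line 15 on valid indices: every b[k] with 0 <= k < j occurs somewhere in a."""
    return all(any(b[k] == a[l] for l in range(len(a))) for k in range(j))


example = [3, -1, 0, 7, -5]
b, j = copy_nonnegative(example)
print(b, j, assertion_holds(example, b, j))
```

Such a test only covers individual inputs; the point of trace logic is to prove the assertion for arrays of arbitrary length.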

While the safety assertion of line 15 holds, proving correctness of Figure 1 is challenging for most state-of-the-art approaches, such as Gurfinkel et al. (2015); Karbyshev et al. (2015); Gurfinkel et al. (2018); Fedyukovich et al. (2019). The reason is that proving safety of Figure 1 needs inductive invariants with existential/alternating quantification and involves inductive reasoning over arbitrarily bounded loop iterations/timepoints. In this paper we address these challenges as follows.

(i) We extend the semantics of program locations to describe locations parameterized by timepoints, allowing us to express values of program variables at arbitrary program locations within arbitrary loop iterations. We write for example $i(l_{12}(it))$ to denote the value of program variable i at location $l_{12}$ in a loop iteration $it$, where the location $l_{12}$ corresponds to the program line 12. We reserve the constant $end$ for specifying the last program location $l_{15}$, that is line 15, corresponding to a terminating program execution of Figure 1. We then write $b(end, k)$ to capture the value of array b at timepoint $end$ and position $k$. For simplicity, as a is a constant array, we simply write $a(k)$ instead of $a(end, k)$.

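(ii) Exploiting the semantics of program locations, we formalize the safety assertion of line 15 in trace logic $\mathcal{L}$ as follows:

$$\forall k_{\mathbb{I}}.\ \exists l_{\mathbb{I}}.\ \big((0 \leq k < j(end) \wedge a.length \geq 0) \rightarrow b(end, k) = a(l)\big) \tag{1}$$
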
(iii) We express the semantics of Figure 1 as a set of first-order formulas in trace logic $\mathcal{L}$, encoding values and dependencies among program variables at arbitrary loop iterations. We extend this set with so-called trace lemmas, to automate inductive reasoning in trace logic $\mathcal{L}$. One such trace lemma exploits the semantics of updates to j, allowing us to infer that every value of j between $0$ and $j(end)$, and thus each position at which the array b has been updated, is given by some loop iteration. Moreover, updates to j happen at different loop iterations and thus a position j at which b is updated is visited uniquely throughout Figure 1.

(iv) We finally establish validity of (1), by deriving (1) to be a logical consequence of this extended set of formulas.

III Preliminaries

We assume familiarity with standard first-order logic with equality and sorts. We write $\simeq$ for equality and $x_S$ to denote that a logical variable $x$ has sort $S$. We denote by $\mathbb{I}$ the set of integer numbers and by $\mathbb{B}$ the boolean sort. The term algebra of natural numbers is denoted by $\mathbb{N}$, with constructors $0$ and successor $suc$. We also consider the symbols $pred$ and $\leq$ as part of the signature of $\mathbb{N}$, interpreted respectively as the predecessor function and the less-than-or-equal relation.

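Let $A$ be a first-order formula with one free variable $x$ of sort $\mathbb{N}$. We recall the standard (step-wise) induction schema for natural numbers as being

$$\big(A(0) \wedge \forall x'_{\mathbb{N}}.\ (A(x') \rightarrow A(suc(x')))\big) \rightarrow \forall x_{\mathbb{N}}.\ A(x) \tag{2}$$

In our work, we use a variation of the induction schema (2) to reason about intervals of loop iterations. Namely, we use the following schema of bounded induction

$$\big(A(bl) \wedge \forall x'_{\mathbb{N}}.\ ((bl \leq x' < br \wedge A(x')) \rightarrow A(suc(x')))\big) \rightarrow \forall x_{\mathbb{N}}.\ (bl \leq x \leq br \rightarrow A(x)) \tag{3}$$

where $bl, br$ are term algebra expressions of $\mathbb{N}$, called respectively the left and right bounds of bounded induction.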

IV Programming Model

We consider programs written in an imperative while-like programming language $\mathcal{W}$. This section recalls terminology from Barthe et al. (2019), however adapted to our setting of safety verification. Unlike Barthe et al. (2019), we do not consider multiple program traces in $\mathcal{W}$. In Section V, we then introduce a generalized program semantics in trace logic $\mathcal{L}$, extended with reachability predicates.

func main(){ context }
if( condition ){ context } else { context }
while( condition ){ context }
statement; … ; statement
Fig. 2: Grammar of $\mathcal{W}$.

Figure 2 shows the (partial) grammar of our programming model $\mathcal{W}$, emphasizing the use of contexts to capture lists of statements. An input program in $\mathcal{W}$ has a single main-function, with arbitrary nestings of if-then-else conditionals and while-statements. We consider mutable and constant variables, where variables are either integer-valued numeric variables or arrays of such numeric variables. We include standard side-effect free expressions over booleans and integers.

IV-A Locations and Timepoints

A program in $\mathcal{W}$ is considered as a set of locations, with each location corresponding to positions/lines of program statements in the program. Given a program statement s, we denote by $l_s$ its (program) location. We reserve the location $l_{end}$ to denote the end of a program. For programs with loops, some program locations might be revisited multiple times. We therefore model locations corresponding to a statement s as functions of iterations when the respective location is visited. For simplicity, we write $l_s$ also for the functional representation of the location of s. We thus consider locations as timepoints of a program and treat them as being functions $l_s$ over iterations. The target sort of locations $l_s$ is $\mathbb{L}$. For each enclosing loop of a statement s, the function symbol $l_s$ takes arguments of sort $\mathbb{N}$, corresponding to loop iterations. Further, when s is a loop itself, we also introduce a function symbol $nl_s$ with argument and target sort $\mathbb{N}$; intuitively, $nl_s$ corresponds to the last loop iteration of s. We denote the set of all function symbols $l_s$ as $S_{Tp}$, whereas the set of all function symbols $nl_s$ is written as $S_n$.

Example 1

We refer to program statements s by their (first) line number in Figure 1. Thus, $l_5$ encodes the timepoint corresponding to the first assignment of i in the program (line 5). We write $l_7(0)$ and $l_7(nl_7)$ to denote the timepoints of the first and last loop iteration, respectively. The timepoints $l_8(suc(0))$ and $l_8(it)$ correspond to the beginning of the loop body in the second and the $it$-th loop iterations, respectively.

IV-B Expressions over Timepoints

We next introduce commonly used expressions over timepoints. For each while-statement w of $\mathcal{W}$, we introduce a function $it^w$ that returns a unique variable of sort $\mathbb{N}$ for w, denoting loop iterations of w.

Let $w_1, \dots, w_k$ be the enclosing loops for statement s and consider an arbitrary term $it$ of sort $\mathbb{N}$. We define $tp_s$ to be the expressions denoting the timepoints of statements s as

$tp_s := l_s(it^{w_1}, \dots, it^{w_k})$ if s is non-while statement
$tp_s(it) := l_s(it^{w_1}, \dots, it^{w_k}, it)$ if s is while-statement
$lastIt_s := nl_s(it^{w_1}, \dots, it^{w_k})$ if s is while-statement

If s is a while-statement, we also introduce to denote the last iteration of s. Further, consider an arbitrary subprogram p, that is, p is either a statement or a context. The timepoint (parameterized by an iteration of each enclosing loop) denotes the timepoint when the execution of p has started and is defined as

We also introduce the timepoint to denote the timepoint upon which a subprogram p has been completely evaluated and define it as

Finally, if is the topmost statement of the top-level context in main(), we define

IV-C Program Variables

We express values of program variables v at various timepoints of the program execution. To this end, we model (numeric) variables v as functions $v: \mathbb{L} \mapsto \mathbb{I}$ where $v(tp)$ gives the value of v at timepoint $tp$. For array variables v, we add an additional argument of sort $\mathbb{I}$, corresponding to the position where the array is accessed; that is, $v: \mathbb{L} \times \mathbb{I} \mapsto \mathbb{I}$. The set of such function symbols corresponding to program variables is denoted by $S_V$.

Our framework for constant, non-mutable variables can be simplified by omitting the timepoint argument in the functional representation of such program variables, as illustrated below.

Example 2

For Figure 1, we denote by $i(l_5)$ the value of program variable i before being assigned in line 5.

As the array variable a is non-mutable (specified by const in the program), we write $a(i(l_8(it)))$ for the value of array a at the position corresponding to the current value of i at timepoint $l_8(it)$. For the mutable array b, we consider timepoints where b has been updated and write $b(l_9(it), j(l_9(it)))$ for the array b at position j at the timepoint $l_9(it)$ during the loop.

We emphasize that we consider (numeric) program variables v to be of sort $\mathbb{I}$, whereas loop iterations $it$ are of sort $\mathbb{N}$.
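The view of program variables as functions of timepoints can be illustrated by recording a concrete execution of Figure 1. The Python sketch below is an assumption-laden illustration, not part of the formalization: timepoints are modelled as tuples such as `("l7", it)`, the helper `run_with_trace` is a hypothetical name, the guard is assumed to be `a[i] >= 0`, and each recorded state is the state before the statement at that location is executed.

```python
def run_with_trace(a):
    """Execute Figure 1 and record the state at every visited timepoint.

    Returns (trace, nl) where trace maps a timepoint to the values of i, j, b
    and nl is the last iteration of the loop at line 7.
    """
    trace = {}
    state = {"i": None, "j": None, "b": {}}  # None: not yet initialised

    def record(tp):
        # Snapshot of the state before the statement at timepoint tp runs.
        trace[tp] = {"i": state["i"], "j": state["j"], "b": dict(state["b"])}

    record(("l5",))
    state["i"] = 0
    record(("l6",))
    state["j"] = 0
    it = 0
    while True:
        record(("l7", it))
        if not state["i"] < len(a):
            break
        record(("l8", it))
        if a[state["i"]] >= 0:  # assumed guard
            record(("l9", it))
            state["b"][state["j"]] = a[state["i"]]
            record(("l10", it))
            state["j"] += 1
        record(("l12", it))
        state["i"] += 1
        it += 1
    record(("end",))
    return trace, it


trace, nl = run_with_trace([3, -1, 0])
print(nl, trace[("l12", 1)]["i"], trace[("end",)]["b"])
```

Looking up `trace[("l12", 1)]["i"]` then plays the role of the term that denotes the value of i at line 12 in the second loop iteration.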

IV-D Program Expressions

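Arithmetic constants and program expressions are modeled using integer functions and predicates. Let e be an arbitrary program expression and write $[\![e]\!](tp)$ to denote the value of the evaluation of e at timepoint $tp$.

Let $v \in S_V$, that is a function denoting a program variable v. Consider $e, e_1, e_2$ to be program expressions and let $tp_1, tp_2$ denote two timepoints. We define

$$Eq(v, tp_1, tp_2) := \begin{cases} \forall pos_{\mathbb{I}}.\ v(tp_1, pos) \simeq v(tp_2, pos), & \text{if v is an array} \\ v(tp_1) \simeq v(tp_2), & \text{otherwise} \end{cases}$$

to denote that the program variable v has the same values at $tp_1$ and $tp_2$.

We further introduce

$$EqAll(tp_1, tp_2) := \bigwedge_{v \in S_V} Eq(v, tp_1, tp_2)$$

to define that all program variables have the same values at timepoints $tp_1$ and $tp_2$. We also define

$$Update(v, e, tp_1, tp_2) := v(tp_2) \simeq [\![e]\!](tp_1) \wedge \bigwedge_{v' \in S_V \setminus \{v\}} Eq(v', tp_1, tp_2)$$

asserting that the numeric program variable v has been updated while all other program variables v' remain unchanged. This definition is further extended to array updates as

$$UpdateArr(v, e_1, e_2, tp_1, tp_2) := \forall pos_{\mathbb{I}}.\ \big(pos \not\simeq [\![e_1]\!](tp_1) \rightarrow v(tp_2, pos) \simeq v(tp_1, pos)\big) \wedge v(tp_2, [\![e_1]\!](tp_1)) \simeq [\![e_2]\!](tp_1) \wedge \bigwedge_{v' \in S_V \setminus \{v\}} Eq(v', tp_1, tp_2)$$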

Example 3

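In Figure 1, we refer to the value of i+1 at timepoint $l_{12}(it)$ as $i(l_{12}(it)) + 1$. Let $S_V$ be the set of function symbols representing the program variables of Figure 1.

For an update of j in line 10 at some iteration $it$, we derive

$$j(l_{12}(it)) \simeq j(l_{10}(it)) + 1 \wedge i(l_{12}(it)) \simeq i(l_{10}(it)) \wedge \forall pos_{\mathbb{I}}.\ b(l_{12}(it), pos) \simeq b(l_{10}(it), pos)$$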
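The predicates described in this section (same value of one variable at two timepoints, same values of all variables, and update of exactly one numeric variable) can be mimicked on recorded states. The following Python sketch is a hypothetical rendering: the names `eq`, `eq_all` and `update`, the dictionary-based state representation and the two hand-written states around line 10 of Figure 1 are all assumptions made for illustration.

```python
def eq(trace, v, tp1, tp2):
    """Variable v has the same value at timepoints tp1 and tp2."""
    return trace[tp1][v] == trace[tp2][v]


def eq_all(trace, tp1, tp2):
    """All program variables have the same values at tp1 and tp2."""
    return all(eq(trace, v, tp1, tp2) for v in trace[tp1])


def update(trace, v, new_value, tp1, tp2):
    """v holds new_value at tp2 and every other variable is unchanged."""
    others_unchanged = all(eq(trace, u, tp1, tp2) for u in trace[tp1] if u != v)
    return trace[tp2][v] == new_value and others_unchanged


# Hand-written states before line 10 and before line 12 in one loop iteration.
trace = {
    "l10": {"i": 0, "j": 0, "b": {0: 3}},
    "l12": {"i": 0, "j": 1, "b": {0: 3}},
}
print(update(trace, "j", trace["l10"]["j"] + 1, "l10", "l12"))
```

Here the new value of j is computed from the state at the earlier timepoint, mirroring the convention that an assigned expression is evaluated before the assignment takes effect.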

V Axiomatic Semantics in Trace Logic

Trace logic $\mathcal{L}$ has been introduced in Barthe et al. (2019), yet for the setting of relational verification. In this paper we generalize the formalization of Barthe et al. (2019) in three ways. First, (i) we define program semantics in a non-recursive manner using the predicate $Reach$ to characterize the set of reachable locations within a given program context (Section V-B). Second, and most importantly, (ii) we prove completeness of trace logic $\mathcal{L}$ with respect to Hoare Logic (Theorem 2), which could not have been achieved in the setting of Barthe et al. (2019). Finally, (iii) we introduce the use of trace logic $\mathcal{L}$ for safety verification (Section VI).

V-A Trace Logic

Trace logic $\mathcal{L}$ is an instance of many-sorted first-order logic with equality. We define the signature $\Sigma(\mathcal{L})$ of trace logic as

$$\Sigma(\mathcal{L}) := S_{\mathbb{N}} \cup S_{\mathbb{I}} \cup S_{Tp} \cup S_V \cup S_n$$

containing the signatures of the theory of natural numbers (term algebra) $\mathbb{N}$ and integers $\mathbb{I}$, as well as the respective sets of timepoints, program variables and last iteration symbols as defined in Section IV.

We next define the semantics of $\mathcal{W}$ in trace logic $\mathcal{L}$.

V-B Reachability and its Axiomatization

We introduce a predicate $Reach$ to capture the set of timepoints reachable in an execution and use $Reach$ to define the axiomatic semantics of $\mathcal{W}$ in trace logic $\mathcal{L}$. We define reachability $Reach$ as a predicate over timepoints, in contrast to defining reachability as a predicate over program configurations such as in Hoder and Bjørner (2012); Bjørner et al. (2015); Fedyukovich et al. (2019); Ish-Shalom et al. (2020).

We axiomatize $Reach$ using trace logic formulas as follows.

Definition 1 ($Reach$-predicate)

For any context , any statement s, let be the expression denoting a potential branching condition in s. We define

For any non-while statement occurring in context c, let

and for any while-statement occurring in context c, let

Finally let .

Note that our reachability predicate allows specifying properties about intermediate timepoints (since those properties can only hold if the referred timepoints are reached) and supports reasoning about which locations are reached.

V-C Axiomatic Semantics of $\mathcal{W}$

We axiomatize the semantics of each program statement in $\mathcal{W}$, and define the semantics of a program in $\mathcal{W}$ as the conjunction of all these axioms.

Let p be an arbitrary, but fixed program in $\mathcal{W}$; we give our definitions relative to p. The semantics of p, denoted by $[\![p]\!]$, consists of a conjunction of one implication per statement, where each implication has the reachability of the start-timepoint of the statement as premise and the semantics of the statement as conclusion:

where is the set of iterations of all enclosing loops of some statement s in p, and the semantics of program statements s is defined as follows.


Let s be a statement skip. Then

Integer assignments

Let s be an assignment v = e, where v is an integer-valued program variable and e is an expression. The evaluation of s is performed in one step such that, after the evaluation, the variable v has the same value as e before the evaluation. All other variables remain unchanged and thus

Array assignments

Consider s of the form a[e$_1$] = e$_2$, with a being an array variable and e$_1$, e$_2$ being expressions. The assignment is evaluated in one step. After the evaluation of s, the array a contains the value of e$_2$ before the evaluation at the position corresponding to the value of e$_1$ before the evaluation. The values at all other positions of a and all other program variables remain unchanged and hence

Conditional if-then-else Statements

Let s be if(Cond){c$_1$} else {c$_2$}. The semantics of s states that entering the if-branch and/or entering the else-branch does not change the values of the variables and we have


where the semantics of the expression Cond is according to Section IV-D.


Let s be the while-statement while(Cond){c}. We refer to Cond as the loop condition. The semantics of s is captured by the conjunction of the following three properties: (7a) the last iteration of s is the first iteration where Cond does not hold, (7b) entering the loop body does not change the values of the variables, (7c) the values of the variables at the end of evaluating s are the same as the variable values at the loop condition location in the last iteration of s. As such, we have
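The role of the last iteration in properties (7a) and (7c) can be checked on a concrete loop. The Python sketch below is a minimal illustration under assumptions: the helpers `loop_states` and `check_while_axioms` are hypothetical names, states are plain dictionaries, and only terminating loops are handled (matching the partial correctness setting).

```python
def loop_states(state, cond, body, max_iters=10_000):
    """States at the loop-condition check, one entry per iteration 0..nl."""
    states = [dict(state)]
    while cond(states[-1]):
        if len(states) > max_iters:
            raise RuntimeError("loop did not terminate within max_iters")
        states.append(body(dict(states[-1])))
    return states


def check_while_axioms(states, cond):
    """Return (nl, ok): nl is the last iteration, ok says (7a) holds for it."""
    nl = len(states) - 1
    holds_before = all(cond(states[it]) for it in range(nl))  # Cond holds before nl
    fails_at_nl = not cond(states[nl])                        # Cond fails at nl
    return nl, holds_before and fails_at_nl


cond = lambda s: s["i"] < 4
body = lambda s: {**s, "i": s["i"] + 1}
states = loop_states({"i": 0}, cond, body)
nl, ok = check_while_axioms(states, cond)
print(nl, ok, states[nl])
```

In this sketch the state after the loop is simply `states[nl]`, which corresponds to property (7c).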


V-D Soundness and Completeness.

The axiomatic semantics of $\mathcal{W}$ in trace logic $\mathcal{L}$ is sound. That is, given a program p in $\mathcal{W}$ and a trace logic property $F$, we have that any interpretation in $[\![p]\!]$ is a model of $F$ according to the small-step operational semantics of $\mathcal{W}$. We conclude the next theorem and refer to the appendix for details.

Theorem 1 ($\mathcal{W}$-Soundness)

Let p be a program. Then the axiomatic semantics $[\![p]\!]$ is sound with respect to standard small-step operational semantics.

Next, we show that the axiomatic semantics of $\mathcal{W}$ in trace logic $\mathcal{L}$ is complete with respect to Hoare logic Hoare (1969), as follows.

Intuitively, a Hoare Triple $\{F_1\}\ p\ \{F_2\}$ corresponds to the trace logic formula

$$[F_1](start) \rightarrow [F_2](end) \tag{8}$$

where the expressions $[F_1](start)$ and $[F_2](end)$ denote the result of adding to each program variable in $F_1$ and $F_2$ the timepoints $start$ respectively $end$ as first arguments. We therefore define that the axiomatic semantics of $\mathcal{W}$ is complete with respect to Hoare logic, if for any Hoare triple $\{F_1\}\ p\ \{F_2\}$ valid relative to the background theory $\mathcal{T}$, the corresponding trace logic formula (8) is derivable from the axiomatic semantics of $\mathcal{W}$ in the background theory $\mathcal{T}$. With this definition at hand, we get the following result, proved formally in the appendix.

Theorem 2 ($\mathcal{W}$-Completeness with respect to Hoare logic)

The axiomatic semantics of $\mathcal{W}$ in trace logic $\mathcal{L}$ is complete with respect to Hoare logic.

VI Trace Logic for Safety Verification

We now introduce the use of trace logic $\mathcal{L}$ for verifying safety properties of programs. We consider safety properties $F$ expressed in first-order logic with theories, as illustrated in line 15 of Figure 1. Thanks to soundness and completeness of the axiomatic semantics of $\mathcal{W}$, a partially correct program p with regard to $F$ can be proved to be correct using the axiomatic semantics of $\mathcal{W}$ in trace logic $\mathcal{L}$. That is, we assume termination and establish partial program correctness. Assuming the existence of an iteration violating the loop condition can help backward reasoning and, in particular, automatic splitting of loop iteration intervals.

However, proving correctness of a program p annotated with a safety property $F$ faces the reasoning challenges of the underlying logic, in our case of trace logic. Due to the presence of loops in $\mathcal{W}$, a challenging aspect in using trace logic for safety verification is to handle inductive reasoning, as induction cannot be generally expressed in first-order logic. To circumvent the challenge of inductive reasoning and automate verification using trace logic, we introduce a set of first-order lemmas, called trace lemmas, and extend the semantics of programs in trace logic with these trace lemmas. Trace lemmas describe generic inductive properties over arbitrary loop iterations and any logical consequence of trace lemmas yields a valid program loop property as well. We next summarize our approach to program verification using trace logic and then address the challenge of inductive reasoning in trace logic $\mathcal{L}$.

VI-A Safety Verification in Trace Logic

Given a program p in $\mathcal{W}$ and a safety property $F$,

(i) we express program semantics $[\![p]\!]$ in trace logic $\mathcal{L}$, as given in Section V;

(ii) we formalize the safety property in trace logic $\mathcal{L}$, that is we express $F$ by using program variables as functions of locations and timepoints (similarly as in (1)). For simplicity, let us denote the trace logic formalization of $F$ also by $F$;

(iii) we introduce instances $T_{\mathcal{L}_p}$ of a set $T_{\mathcal{L}}$ of trace lemmas, by instantiating trace lemmas with program variables, locations and timepoints of p;

(iv) to verify $F$, we then show that $F$ is a logical consequence of $[\![p]\!] \wedge T_{\mathcal{L}_p}$;

(v) however, to conclude that p is partially correct with regard to $F$, two more challenges need to be addressed. First, in addition to Theorem 1, soundness of our trace lemmas $T_{\mathcal{L}}$ needs to be established, implying that our trace lemma instances $T_{\mathcal{L}_p}$ are also sound. Soundness of $T_{\mathcal{L}_p}$ then implies validity of $F$, whenever $F$ is proven to be a logical consequence of the sound formulas $[\![p]\!] \wedge T_{\mathcal{L}_p}$. However, to ensure that $F$ is provable in trace logic, as a second challenge we need to ensure that our trace lemmas $T_{\mathcal{L}}$, and thus their instances $T_{\mathcal{L}_p}$, are strong enough to prove $F$. That is, proving that $F$ is a safety assertion of p in our setting requires finding a suitable set $T_{\mathcal{L}}$ of trace lemmas.

In the remainder of this section, we address (v) and show that our trace lemmas $T_{\mathcal{L}}$ are sound consequences of bounded induction (Section VI-B). Practical evidence for using our trace lemmas is further given in Section VII-B.

VI-B Trace Lemmas for Verification

Trace logic properties support arbitrary quantification over timepoints and describe values of program variables at arbitrary loop iterations and timepoints. We can therefore relate timepoints with values of program variables in trace logic $\mathcal{L}$, allowing us to describe the value distributions of program variables as functions of timepoints throughout program executions. As such, trace logic $\mathcal{L}$ supports

  1. reasoning about the existence of a specific loop iteration, allowing us to split the range of loop iterations at a particular timepoint, based on the safety property we want to prove. For example, we can express and derive loop iterations corresponding to timepoints where one program variable takes a specific value for the first time during loop execution;

  2. universal quantification over the array content and range of loop iterations bounded by two arbitrary left and right bounds, allowing us to apply instances of the induction scheme (3) within a range of loop iterations bounded, for example, by the first and the last iteration of some while-statement s.

Addressing these benefits of trace logic, we express generic patterns of inductive program properties as trace lemmas.

Identifying a suitable set of trace lemmas to automate inductive reasoning in trace logic $\mathcal{L}$ is however challenging and domain-specific. We propose three trace lemmas for inductive reasoning over arrays and integers, by considering

  (A) one trace lemma describing how values of program variables change during an interval of loop iterations;

  (B) two trace lemmas to describe the behavior of loop counters.

We prove soundness of our trace lemmas; below we include only one proof and refer to the appendix for further details.

(A1) Value Evolution Trace Lemma

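Let w be a while-statement, let v be a mutable program variable and let $\circ$ be a reflexive and transitive relation, that is $\simeq$ or $\leq$ in the setting of trace logic. Writing $tp_w(x)$ for the timepoint of w in loop iteration $x$, the value evolution trace lemma of w, v, and $\circ$ is defined as

$$\forall bl_{\mathbb{N}}, br_{\mathbb{N}}.\ \Big(\forall x_{\mathbb{N}}.\ \big((bl \leq x < br \wedge v(tp_w(bl)) \circ v(tp_w(x))) \rightarrow v(tp_w(bl)) \circ v(tp_w(suc(x)))\big) \rightarrow \big(bl \leq br \rightarrow v(tp_w(bl)) \circ v(tp_w(br))\big)\Big) \tag{A1}$$

In our work, the value evolution trace lemma is mainly instantiated with the equality predicate $\simeq$ to conclude that the value of a variable does not change during a range of loop iterations, provided that the variable value does not change at any of the considered loop iterations.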

Example 4

For Figure 1, the value evolution trace lemma (A1) yields the property

which allows to prove that the value of b at some position j remains the same from the timepoint the value was first set until the end of program execution. That is, we derive .

We next prove soundness of our trace lemma (A1).

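Proof (Soundness Proof of Value Evolution Trace Lemma (A1)) Let $bl$ and $br$ be arbitrary but fixed and assume that the premise of the outermost implication of (A1) holds. That is,

$$\forall x_{\mathbb{N}}.\ \big((bl \leq x < br \wedge v(tp_w(bl)) \circ v(tp_w(x))) \rightarrow v(tp_w(bl)) \circ v(tp_w(suc(x)))\big) \tag{9}$$

We use the induction axiom scheme (3) and consider its instance with $A(x) := v(tp_w(bl)) \circ v(tp_w(x))$, yielding the following instance of (3):

$$\begin{aligned} \big(& v(tp_w(bl)) \circ v(tp_w(bl)) && \text{(10a)} \\ \wedge\ & \forall x'_{\mathbb{N}}.\ \big((bl \leq x' < br \wedge v(tp_w(bl)) \circ v(tp_w(x'))) \rightarrow v(tp_w(bl)) \circ v(tp_w(suc(x')))\big)\big) && \text{(10b)} \\ \rightarrow\ & \forall x_{\mathbb{N}}.\ \big(bl \leq x \leq br \rightarrow v(tp_w(bl)) \circ v(tp_w(x))\big) && \text{(10c)} \end{aligned}$$

Note that the base case property (10a) holds since $\circ$ is reflexive. Further, the inductive case (10b) also holds since it is implied by (9). We thus derive property (10c), and in particular $bl \leq br \leq br \rightarrow v(tp_w(bl)) \circ v(tp_w(br))$. Since $\leq$ is reflexive, we conclude $bl \leq br \rightarrow v(tp_w(bl)) \circ v(tp_w(br))$, thus proving our trace lemma (A1).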
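The soundness argument can be sanity-checked on finite value sequences. The Python sketch below assumes the shape of the lemma suggested by the proof: if the relation between the value at the left bound and the current value is preserved by every step in the interval, then it holds between the values at the left and right bounds. The function name `value_evolution_holds` and the list-based encoding of the values of v at successive iterations are hypothetical.

```python
import operator


def value_evolution_holds(vals, rel):
    """Brute-force check of the assumed lemma shape for all bounds bl, br.

    vals[x] stands for the value of v at iteration x; rel is the relation.
    """
    n = len(vals)
    for bl in range(n):
        for br in range(n):
            # Premise: each step in [bl, br) preserves rel(vals[bl], .).
            premise = all(
                (not rel(vals[bl], vals[x])) or rel(vals[bl], vals[x + 1])
                for x in range(bl, br)
            )
            # Conclusion: bl <= br implies rel(vals[bl], vals[br]).
            conclusion = (not bl <= br) or rel(vals[bl], vals[br])
            if premise and not conclusion:
                return False
    return True


print(value_evolution_holds([1, 1, 2, 2, 5], operator.eq))
print(value_evolution_holds([1, 1, 2, 2, 5], operator.le))
print(value_evolution_holds([1, 2], operator.lt))  # lt is not reflexive
```

The last call shows why reflexivity matters: for a non-reflexive relation the case where both bounds coincide already fails.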

(B1) Intermediate Value Trace Lemma

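Let w be a while-statement and let v be a mutable program variable. We call v dense if the following holds:

$$Dense_{w,v} := \forall it_{\mathbb{N}}.\ \big(it < lastIt_w \rightarrow (v(tp_w(suc(it))) \simeq v(tp_w(it)) \vee v(tp_w(suc(it))) \simeq v(tp_w(it)) + 1)\big)$$

where $lastIt_w$ denotes the last iteration of w. The intermediate value trace lemma of w and v is defined as

$$\forall x_{\mathbb{I}}.\ \Big(\big(Dense_{w,v} \wedge v(tp_w(0)) \leq x < v(tp_w(lastIt_w))\big) \rightarrow \exists it_{\mathbb{N}}.\ \big(it < lastIt_w \wedge v(tp_w(it)) \simeq x \wedge v(tp_w(suc(it))) \simeq v(tp_w(it)) + 1\big)\Big) \tag{B1}$$

The intermediate value trace lemma (B1) allows us to conclude that if the variable v is dense, and if the value $x$ is between the value of v at the beginning of the loop and the value of v at the end of the loop, then there is an iteration in the loop where v has exactly the value $x$ and is incremented. This trace lemma is mostly used to find specific iterations corresponding to positions in an array.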

Example 5

In Figure 1, using trace lemma (B1) we synthesize the iteration such that .
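The search for such an iteration can be illustrated on the values that j takes at the loop condition of Figure 1. The Python sketch below works under assumptions: density is taken to mean that the variable either keeps its value or grows by exactly one in each iteration, and the names `is_dense` and `witness_iteration` are hypothetical. The sample values correspond to running Figure 1 on the input [3, -1, 0, 7] with the guard assumed to be `a[i] >= 0`.

```python
def is_dense(vals):
    """Each step keeps the value or increments it by exactly one (assumed definition)."""
    return all(vals[it + 1] in (vals[it], vals[it] + 1) for it in range(len(vals) - 1))


def witness_iteration(vals, x):
    """First iteration where the variable equals x and is incremented, else None."""
    for it in range(len(vals) - 1):
        if vals[it] == x and vals[it + 1] == vals[it] + 1:
            return it
    return None


# Values of j at the loop condition in iterations 0..4 for the input [3, -1, 0, 7].
j_vals = [0, 1, 1, 2, 3]
print(is_dense(j_vals), [witness_iteration(j_vals, x) for x in range(3)])
```

For a dense sequence, every value between the first value (inclusive) and the last value (exclusive) has such a witness, which is the content of the lemma on this finite example.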

(B2) Iteration Injectivity Trace Lemma

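Let w be a while-statement and let v be a mutable program variable. Writing $Dense_{w,v}$ for the property that v is dense in w and $lastIt_w$ for the last iteration of w, the iteration injectivity trace lemma of w and v is

$$\forall it^1_{\mathbb{N}}, it^2_{\mathbb{N}}.\ \Big(\big(Dense_{w,v} \wedge v(tp_w(suc(it^1))) \simeq v(tp_w(it^1)) + 1 \wedge it^1 < it^2 \leq lastIt_w\big) \rightarrow v(tp_w(it^1)) \not\simeq v(tp_w(it^2))\Big) \tag{B2}$$

The trace lemma (B2) states that a strongly-dense variable visits each array-position at most once. As a consequence, if each array position is visited only once in a loop, we know that its value has not changed after the first visit, and in particular the value at the end of the loop is the value after the first visit.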

Example 6

Trace lemma (B2) is necessary in Figure 1 to apply the value evolution trace lemma (A1) for b, as we need to make sure we will never reach the same position of j twice.
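The uniqueness of visited positions can likewise be checked on a finite sequence of counter values. The Python sketch below assumes that injectivity is stated for iterations at which the variable is incremented: once the variable has left a value by an increment, it never takes that value again in a later iteration. The function name `injective_on_increments` is hypothetical.

```python
def injective_on_increments(vals):
    """If vals is incremented at it1, no later iteration it2 has the value vals[it1]."""
    n = len(vals)
    return all(
        vals[it1] != vals[it2]
        for it1 in range(n - 1)
        if vals[it1 + 1] == vals[it1] + 1
        for it2 in range(it1 + 1, n)
    )


print(injective_on_increments([0, 1, 1, 2, 3]))  # dense counter such as j in Figure 1
print(injective_on_increments([0, 1, 0, 1]))     # not dense: value 0 is revisited
```

The second sequence is not dense, which illustrates why the density premise is needed for the conclusion.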

Based on the soundness of our trace lemmas, we conclude the next result.

Theorem 3 (Trace Lemmas and Induction)

Let p be a program. Let $L$ be a trace lemma for some while-statement w of p and some variable v of p. Then $L$ is a consequence of the bounded induction scheme (3) and of the axiomatic semantics of $[\![p]\!]$ in trace logic $\mathcal{L}$.

VII Implementation and Experiments

VII-A Implementation

We implemented our approach in the Rapid tool, written in C++ and available at

Rapid takes as input a program in the while-language together with a property expressed in trace logic using the smt-lib syntax Barrett et al. (2017). Rapid outputs (i) the program semantics as in Section V, (ii) instantiations of trace lemmas for each mutable variable and for each loop of the program, as discussed in Section VI-B, and (iii) the safety property, expressed in trace logic and encoded in the smt-lib syntax.

For establishing safety, we pass the generated reasoning task to the first-order theorem prover Vampire Kovács and Voronkov (2013) to prove the safety property from the program semantics and the instantiated trace lemmas, as discussed in Section VI-A. (We also established the soundness of each trace lemma instance separately by running additional validity queries with Vampire.) Vampire searches for a proof by refuting the negation of the property based on saturation of a set of clauses with respect to a set of inference rules such as resolution and superposition.

In our experiments, we use a custom version of Vampire with a timeout of 60 seconds, in two different configurations. On the one hand, we use a first configuration of Rapid, where we tune Vampire to the trace logic domain using (i) existing options and (ii) domain-specific implementation to guide the high-level proof search. On the other hand, we use a second configuration of Rapid, which extends the first one with recent techniques from Gleiss et al. (2020); Gleiss and Suda (2020) improving theory reasoning in equational theories. As such, the second configuration represents the result of a fundamental effort to improve Vampire's reasoning for software verification. In particular, theory split queues Gleiss and Suda (2020) present a partial solution to the prevalent challenge of combining quantification and light-weight theory reasoning, drastically improving first-order reasoning in applications of software verification, as shown next.

VII-B Experimental Results

Benchmark                    Rapid (first configuration)   Rapid (second configuration)
atleast_one_iteration_0      ✓    ✓
atleast_one_iteration_1      ✓    ✓
find_sentinel                ✓    ✓
find1_0                      -    ✓
find1_1                      -    ✓
find2_0                      -    ✓
find2_1                      ✓    ✓
indexn_is_arraylength_0      ✓    ✓
indexn_is_arraylength_1      -    ✓
set_to_one                   ✓    ✓
str_cpy_3                    ✓    ✓

both_or_none                 -    ✓
check_equal_set_flag_1       -    ✓
collect_indices_eq_val_0     -    ✓
collect_indices_eq_val_1     -    ✓
copy                         -    ✓
copy_absolute_0              -    ✓
copy_absolute_1              -    ✓
copy_nonzero_0               -    ✓
copy_partial                 -    ✓
copy_positive_0              -    ✓
copy_two_indices             -    ✓
find_max_0                   -    ✓
find_max_2                   -    ✓
find_max_from_second_0       -    -
find_max_local_2             -    -
find_max_up_to_0             -    -
find_max_up_to_2             -    -
find_min_0                   -    ✓
find_min_2                   -    ✓
find_min_local_2             -    -
find_min_up_to_0             -    -
find_min_up_to_2             -    -
find1_4                      -    ✓
find2_4                      ✓    ✓
in_place_max                 -    ✓
inc_by_one_0                 -    ✓
inc_by_one_1                 -    ✓
inc_by_one_harder_0          -    ✓
inc_by_one_harder_1          -    ✓
init                         -    ✓
init_conditionally_0         -    ✓
init_conditionally_1         -    ✓
init_non_constant_0          -    ✓
init_non_constant_1          -    ✓
init_non_constant_2          -    ✓
init_non_constant_3          -    ✓
init_non_constant_easy_0     -    ✓
init_non_constant_easy_1     -    ✓
init_non_constant_easy_2     -    ✓
init_non_constant_easy_3     -    ✓
init_partial                 -    ✓
init_prev_plus_one_0         -    ✓
init_prev_plus_one_1         -    ✓
init_prev_plus_one_alt_0     -    ✓
init_prev_plus_one_alt_1     -    ✓
max_prop_0                   -    ✓
max_prop_1                   -    ✓
merge_interleave_0           -    -
merge_interleave_1           -    -
min_prop_0                   -    ✓
min_prop_1                   -    ✓
partition_0                  -    ✓
partition_1                  -    ✓
push_back                    -    ✓
reverse                      -    ✓
str_cpy_0                    -    ✓
str_cpy_1                    -    ✓
str_cpy_2                    ✓    ✓
swap_0                       -    ✓
swap_1                       -    ✓
vector_addition              -    ✓
vector_subtraction           -    ✓

check_equal_set_flag_0       ✓    ✓
find_max_1                   -    -
find_max_from_second_1       -    -
find1_2                      ✓    ✓
find1_3                      ✓    ✓
find2_2                      ✓    ✓
find2_3                      ✓    ✓

collect_indices_eq_val_2     -    ✓
collect_indices_eq_val_3     -    -
copy_nonzero_1               -    ✓
copy_positive_1              -    ✓
find_max_local_0             -    -
find_max_local_1             -    -
find_max_up_to_1             -    -
find_min_1                   -    -
find_min_local_0             -    -
find_min_local_1             -    -
find_min_up_to_1             -    -
merge_interleave_2           -    -
partition_2                  -    ✓
partition_3                  -    ✓
partition_4                  -    -
partition_5                  -    ✓
partition_6                  -    -
partition-harder_0           -    ✓
partition-harder_1           -    ✓
partition-harder_2           -    -
partition-harder_3           -    -
partition-harder_4           -    -
str_len                      ✓    ✓

Total solved                 15   78
TABLE I: Experimental results

We considered challenging Java- and C-like verification benchmarks from the SV-Comp repository Beyer (2019), containing the combination of loops and arrays. We omitted those examples for which the task is to find bugs in the form of counterexample traces, as well as those examples that cannot be expressed in our programming model, such as examples with explicit memory management. In order to improve the set of benchmarks, we also included additional challenging programs and functional properties. As a result, we obtained benchmarks ranging over 45 unique programs with a total of 103 tested properties. Our benchmarks are available in the Rapid repository.

We manually transformed those benchmarks into our input format. SV-Comp benchmarks encode properties featuring universal quantification by extending the corresponding program with an additional loop containing a standard C-like assertion. For instance, a property of the form

∀ i. (0 ≤ i < a.length → P(a[i]))

for some predicate P would be encoded by extending the program with a loop

for(int i = 0; i < a.length; i++) assert(P(a[i]));

While this encoding loses explicit structure and results in a harder reasoning task, it is necessary as other tools do not support explicit universal quantification in their input language. In contrast, our approach can handle arbitrarily quantified properties over unbounded data structures. We, thus, directly formulate universally quantified properties, without using any program transformations.
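
To make the difference between the two formulations concrete, the following minimal Python sketch contrasts the loop-based encoding with a direct statement of the universally quantified property. The predicate `is_positive` and the array contents are hypothetical and chosen only for illustration; the sketch merely evaluates the property on one concrete array, whereas the verification task is to establish it for all inputs.

```python
def is_positive(x):
    """Hypothetical predicate P; any Boolean test on an element would do."""
    return x > 0


def check_with_assertion_loop(a):
    """Loop-based encoding: one assertion per index, as in the extended program."""
    for i in range(len(a)):
        assert is_positive(a[i]), f"property violated at index {i}"
    return True


def check_quantified(a):
    """Direct formulation: for all i with 0 <= i < len(a), P(a[i]) holds."""
    return all(is_positive(a[i]) for i in range(len(a)))


a = [3, 1, 4, 1, 5]  # hypothetical array contents
loop_result = check_with_assertion_loop(a)
quantified_result = check_quantified(a)
```

In the loop-based form the quantifier is only implicit in the control flow, which is why a verifier has to reason about an additional loop; in the direct form the quantified structure of the property is visible as such.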

The results of our experiments are presented in Table I. We divided the results into four segments in the following order: the first eleven problems are quantifier-free, the largest part of 62 problems are universally quantified, seven problems are existentially quantified, while the last 23 problems contain quantifier alternations. First, we are interested in the overall number of problems we are able to prove correct. In the extended Rapid configuration, which represents our main configuration, Vampire is able to prove 78 out of 103 encodings. In particular, we verify the program of Figure 1, corresponding to benchmark copy_positive_1, as well as other challenging properties that involve quantifier alternations, such as partition_5.

Second, we are interested in comparing the results of the baseline and the extended Rapid configurations, in order to understand the importance of recently developed techniques from Gleiss et al. (2020) and Gleiss and Suda (2020) for reasoning in the trace logic domain. While the baseline configuration is only able to prove 15 out of 103 properties, the extended configuration is able to prove 78 properties, that is, it improves over the baseline by 63 examples. Moreover, only the extended configuration is able to prove advanced properties involving quantifier alternations. We therefore see that the extended configuration drastically outperforms the baseline, suggesting that the recently developed techniques are essential for efficient reasoning in trace logic.

Third, we are interested in what kinds of properties Rapid can prove. It comes as no surprise that all quantifier-free instances could be proved. Out of 62 universally quantified properties, Rapid could establish correctness of 53. More interestingly, Rapid proves 14 out of 30 benchmarks containing either existentially quantified properties or properties with quantifier alternations. The benchmarks that could not be solved by Rapid are primarily universally and alternatingly quantified properties that need additional trace lemmas relating values of multiple program variables.
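
The figures quoted in this section can be cross-checked with a few lines of arithmetic. The following Python sketch uses only the counts stated in the text; the variable and key names are ours and purely illustrative.

```python
# Number of properties per segment, as stated in the text.
segments = {
    "quantifier_free": 11,
    "universal": 62,
    "existential": 7,
    "alternating": 23,
}
total = sum(segments.values())

# Properties proved by the main configuration, per group, as stated in the text.
proved = {
    "quantifier_free": 11,
    "universal": 53,
    "existential_or_alternating": 14,
}
proved_total = sum(proved.values())

# Properties proved by the other configuration, as stated in the text.
other_total = 15
improvement = proved_total - other_total
```

The sums agree with the reported totals: 103 properties overall, 78 proved by the main configuration, and a difference of 63 between the two configurations.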

Comparing with other tools. We compare our work against other approaches in Section VIII. Here, we omit a direct comparison of Rapid with other tools for the following reasons:
(1) Our benchmark suite includes 62 universally quantified and 11 non-quantified properties that could technically be supported by state-of-the-art tools such as Spacer/SeaHorn and FreqHorn. Our benchmarks, however, also include 30 benchmarks with existential (7 examples) and alternating quantification (23 examples) that these tools cannot handle. As these examples depend on invariants that are alternatingly or at least existentially quantified, we believe these other tools cannot solve these benchmarks, while Rapid could solve 14 examples in this domain.
(2) In our preliminary work Barthe et al. (2019), we already compared our reasoning within Rapid against Z3 and CVC4. These experiments showed that due to the fundamental difference in handling variables as functions over timepoints in our semantics, Rapid outperformed SMT-based reasoning approaches.
(3) Our program semantics is different from the one used in Horn clause verification techniques.

Concerning previous approaches with first-order reasoners, the benchmarks of Gleiss et al. (2018) represent a subset of 55 examples from our current benchmark suite: only 21 examples from our benchmark suite could be proved by Gleiss et al. (2018). For instance, our example in Figure 1 could not be proven in Gleiss et al. (2018). We believe that our work can be combined with approaches from Kovács and Voronkov (2009); Gleiss et al. (2018) to derive non-trivial invariants and loop bounds from saturation-based proof search. Our work can, thus, complement existing tools in proving complex quantified properties.

VIII Related Work

Our work is closely related to recent efforts in using first-order theorem provers for proving software properties Kovács and Voronkov (2009); Gleiss et al. (2018). While Gleiss et al. (2018) capture program semantics in the first-order language of extended expressions over loop iterations, in our work we further generalize the semantics of program locations and consider program expressions over loop iterations and arbitrary timepoints. Further, we introduce and prove trace lemmas to automate inductive reasoning based on bounded induction over loop iterations. Our generalizations in trace logic proved to be necessary to automate the verification of properties with arbitrary quantification, which could not be effectively achieved in Gleiss et al. (2018). Our work is not restricted to reasoning about single loops as in Gleiss et al. (2018).

Compared to Barthe et al. (2019), we provide a non-recursive generalization of the axiomatic semantics of programs in trace logic, prove completeness of our axiomatization in trace logic, ensure soundness of our trace lemmas and use trace logic for safety verification.

In comparison to verification approaches based on program transformations Kobayashi et al. (2020); Chakraborty et al. (2020); Yang et al. (2019), we do not require user-provided functions to transform program states to smaller-sized states Ish-Shalom et al. (2020), nor are we restricted to universal properties generated by symbolic executions Chakraborty et al. (2020). Rather, we use only three trace lemmas that we prove sound and automate the verification of first-order properties, possibly with alternations of quantifiers.

The works Dillig et al. (2010); Cousot et al. (2011) consider expressive abstract domains and limit the generation of universal invariants to these domains, while supporting potentially more generic program grammars than our language. Our work, however, can verify universal and/or existential first-order properties with theories, which is not the case in Kobayashi et al. (2020); Chakraborty et al. (2020); Dillig et al. (2010); Cousot et al. (2011). Verifying universal loop properties with arrays by implicitly finding invariants is addressed in Gurfinkel et al. (2018); Fedyukovich et al. (2019); Komuravelli et al. (2015); Fedyukovich et al. (2017); Fedyukovich and Bodík (2018); Matsushita et al. (2020), and by using constrained Horn clause reasoning within property-driven reachability analysis in Hoder and Bjørner (2012); Cimatti and Griggio (2012).

Another line of research proposes abstraction and lazy interpolation Alberti et al. (2012); Afzal et al. (2020), as well as recurrence solving with SMT-based reasoning Rajkhowa and Lin (2018). Synthesis-based approaches, such as Fedyukovich et al. (2019), are shown to be successful when it comes to inferring universally quantified invariants and proving program correctness from these invariants. Synthesis-based term enumeration is used also in Yang et al. (2019) in combination with user-provided invariant templates. Compared to these works, we do not consider programs only as a sequence of states, but model program values as functions of loop iterations and timepoints. We synthesize bounds on loop iterations and infer first-order loop invariants as logical consequences of our trace lemmas and program semantics in trace logic.
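
As an informal illustration of viewing program values as functions of loop iterations, the following Python sketch records the value of each variable at every iteration of a simple loop and then checks a candidate invariant on the recorded trace, using a base case at iteration 0 and an inductive step for every iteration below the last one. This is not the encoding used in Rapid; the program, the function names and the invariant are hypothetical and serve only to show the shape of the argument.

```python
def run_with_trace(n):
    """Run `i = 0; while i < n: a[i] = 1; i += 1` and record the values of
    i and a at the start of each iteration (the last entry is the state in
    which the loop condition fails)."""
    a = [0] * n
    i = 0
    trace = [{"i": i, "a": list(a)}]          # iteration 0
    while i < n:
        a[i] = 1
        i += 1
        trace.append({"i": i, "a": list(a)})  # next iteration
    return trace


def invariant(state):
    """Candidate invariant: every position below i has been set to 1."""
    return all(state["a"][k] == 1 for k in range(state["i"]))


def bounded_induction_holds(trace):
    """Base case at iteration 0 and inductive step for all it < last."""
    last = len(trace) - 1
    base = invariant(trace[0])
    step = all((not invariant(trace[it])) or invariant(trace[it + 1])
               for it in range(last))
    return base and step


trace = run_with_trace(4)
final_state = trace[-1]
```

On a concrete trace this is only a finite check. In a symbolic setting the same base case and step would be stated for all iterations up to the loop bound, rather than evaluated on recorded values.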

IX Conclusion

We introduced trace logic to reason about safety properties of loops over arrays. Trace logic supports explicit timepoint reasoning to allow arbitrary quantification over loop iterations. We use trace lemmas as consequences of bounded induction to automate inductive loop reasoning in trace logic. We formalize the axiomatic semantics of programs in trace logic and prove it to be both sound and complete. We report on our implementation in the Rapid framework, allowing us to use superposition-based reasoning in trace logic for verifying challenging verification examples. Generalizing our work to termination analysis and extending our programming language, and its semantics in trace logic, with more complex constructs are interesting tasks for future work.

Acknowledgements. This work was funded by the ERC Starting Grant 2014 SYMCAR 639270, the ERC Proof of Concept Grant 2018 SYMELS 842066, the Wallenberg Academy Fellowship 2014 TheProSE, and the Austrian FWF research project W1255-N23.


References

  • M. Afzal, S. Chakraborty, A. Chauhan, B. Chimdyalwar, P. Darke, A. Gupta, S. Kumar, C. Babu, D. Unadkat, and R. Venkatesh (2020) VeriAbs: verification by abstraction and test generation (competition contribution). In TACAS, pp. 383–387.
  • F. Alberti, R. Bruttomesso, S. Ghilardi, S. Ranise, and N. Sharygina (2012) Lazy abstraction with interpolants for arrays. In LPAR, pp. 46–61.
  • C. Barrett, C. L. Conway, M. Deters, L. Hadarean, D. Jovanović, T. King, A. Reynolds, and C. Tinelli (2011) CVC4. In CAV, pp. 171–177.
  • C. Barrett, P. Fontaine, and C. Tinelli (2017) The SMT-LIB standard: version 2.6. Technical report, Department of Computer Science, The University of Iowa.
  • G. Barthe, R. Eilers, P. Georgiou, B. Gleiss, L. Kovács, and M. Maffei (2019) Verifying relational properties using trace logic. In FMCAD, pp. 170–178.
  • D. Beyer (2019) Automatic verification of C and Java programs: SV-COMP 2019. In TACAS, pp. 133–155.
  • N. Bjørner, A. Gurfinkel, K. McMillan, and A. Rybalchenko (2015) Horn clause solvers for program verification. In Fields of Logic and Computation II, pp. 24–51.
  • S. Chakraborty, A. Gupta, and D. Unadkat (2020) Verifying array manipulating programs with full-program induction. In TACAS, pp. 22–39.
  • A. Cimatti and A. Griggio (2012) Software model checking via IC3. In CAV, pp. 277–293.
  • P. Cousot, R. Cousot, and F. Logozzo (2011) A parametric segmentation functor for fully automatic and scalable array content analysis. In POPL, pp. 105–118.
  • L. De Moura and N. Bjørner (2008) Z3: an efficient SMT solver. In TACAS, pp. 337–340.
  • I. Dillig, T. Dillig, and A. Aiken (2010) Fluid updates: beyond strong vs. weak updates. In ESOP, pp. 246–266.
  • G. Fedyukovich and R. Bodík (2018) Accelerating syntax-guided invariant synthesis. In TACAS, pp. 251–269.
  • G. Fedyukovich, S. J. Kaufman, and R. Bodík (2017) Sampling invariants from frequency distributions. In FMCAD, pp. 100–107.
  • G. Fedyukovich, S. Prabhu, K. Madhukar, and A. Gupta (2019) Quantified invariants via syntax-guided synthesis. In CAV, pp. 259–277.
  • B. Gleiss, L. Kovács, and J. Rath (2020) Subsumption demodulation in first-order theorem proving. In IJCAR.
  • B. Gleiss, L. Kovács, and S. Robillard (2018) Loop analysis by quantification over iterations. In LPAR, pp. 381–399.
  • B. Gleiss and M. Suda (2020) Layered clause selection for theory reasoning. In IJCAR.
  • A. Gurfinkel, T. Kahsai, A. Komuravelli, and J. A. Navas (2015) The SeaHorn verification framework. In CAV, pp. 343–361.
  • A. Gurfinkel, S. Shoham, and Y. Vizel (2018) Quantifiers on demand. In ATVA, pp. 248–266.
  • C. A. R. Hoare (1969) An axiomatic basis for computer programming. Communications of the ACM 12 (10), pp. 576–580.
  • K. Hoder and N. Bjørner (2012) Generalized property directed reachability. In SAT, pp. 157–171.
  • O. Ish-Shalom, S. Itzhaky, N. Rinetzky, and S. Shoham (2020) Putting the squeeze on array programs: loop verification via inductive rank reduction. In VMCAI, pp. 112–135.
  • A. Karbyshev, N. Bjørner, S. Itzhaky, N. Rinetzky, and S. Shoham (2015) Property-directed inference of universal invariants or proving their absence. In CAV, pp. 583–602.
  • N. Kobayashi, G. Fedyukovich, and A. Gupta (2020) Fold/unfold transformations for fixpoint logic. In TACAS, pp. 195–214.
  • A. Komuravelli, N. Bjørner, A. Gurfinkel, and K. L. McMillan (2015) Compositional verification of procedural programs using Horn clauses over integers and arrays. In FMCAD, pp. 89–96.
  • L. Kovács, S. Robillard, and A. Voronkov (2017) Coming to terms with quantified reasoning. In POPL, pp. 260–270.
  • L. Kovács and A. Voronkov (2009) Finding loop invariants for programs over arrays using a theorem prover. In FASE, pp. 470–485.
  • L. Kovács and A. Voronkov (2013) First-order theorem proving and Vampire. In CAV, pp. 1–35.
  • Y. Matsushita, T. Tsukada, and N. Kobayashi (2020) RustHorn: CHC-based verification for Rust programs. In ESOP, pp. 484–514.
  • P. Rajkhowa and F. Lin (2018) Extending VIAP to handle array programs. In VSTTE, pp. 38–49.
  • W. Yang, G. Fedyukovich, and A. Gupta (2019) Lemma synthesis for automating induction over algebraic data types. In CP, pp. 600–617.