Modern software verification techniques employ a number of heuristics for reasoning about loops. While these heuristics are often effective, they are unpredictable. For example, an abstract interpreter may fail to find the most precise invariant expressible in the language of its abstract domain due to imprecise widening; similarly, a software model checker might fail to terminate because it generates interpolants that are insufficiently general. This paper presents a loop summarization technique that is powerful, in the sense that it generates expressive loop invariants, and predictable, in the sense that we can make theoretical guarantees about invariant quality.
The key idea behind our technique is to leverage reachability results of vector addition systems (VAS) for invariant generation. Vector addition systems are a class of infinite-state transition systems with decidable reachability; VAS are classically used as a model of parallel systems. We consider a variation of VAS, rational VAS with resets (ℚ-VASR), wherein there is a finite number of rational-typed variables and a finite set of transitions that simultaneously update each variable in the system by either adding a constant value or (re)setting the variable to a constant value. Our interest in ℚ-VASRs stems from the fact that there is a (polytime) procedure to compute a linear arithmetic formula that represents a ℚ-VASR’s reachability relation.
Since the reachability relation of a ℚ-VASR is computable, the dynamics of ℚ-VASR can be analyzed without relying on heuristic techniques. However, there is a gap between ℚ-VASR and the loops that we are interested in summarizing. The latter typically use a rich set of operations (memory manipulation, conditionals, non-constant increments, non-linear arithmetic, etc.) and cannot be analyzed precisely. We bridge the gap with a procedure that, for any loop, synthesizes a ℚ-VASR that simulates the loop. The reachability relation of the ℚ-VASR can be used to over-approximate the behavior of the loop. Moreover, we prove that if a loop is expressed in linear rational arithmetic (LRA), our procedure synthesizes a best ℚ-VASR abstraction, in the sense that it simulates any other ℚ-VASR that simulates the loop. That is, the procedure does not make arbitrary heuristic choices, but rather synthesizes a best approximation of the loop in the language of ℚ-VASR.
ℚ-VASRs over-approximate multi-path loops by treating the choice between paths as non-deterministic. We show that we can recover some conditional control flow information and inter-path control dependencies by partitioning the states of the loop, and encoding this partitioning by extending ℚ-VASR with control states (ℚ-VASR with states, ℚ-VASRS). We give a procedure for synthesizing a ℚ-VASRS that simulates a given loop; we may then use the reachability relation of a ℚ-VASRS to summarize the loop. We prove that, for a fixed program state partition, this procedure computes best ℚ-VASRS abstractions for LRA formulas. Additionally, we give a state-partitioning algorithm that yields a monotone loop summarization procedure (more accurate information about loop bodies results in more accurate loop summaries).
Finally, we note that our analysis techniques extend to complex control structures, such as nested loops, by employing summarization compositionally (i.e., “bottom-up”). For example, our analysis summarizes a nested loop by first summarizing its inner loops, and then uses the summaries to analyze the outer loop. As a result of compositionality, our analysis can be applied to partial programs, is easy to parallelize, and has the potential to scale to large code bases.
The main contributions of the paper are as follows:
- We present a procedure to synthesize ℚ-VASR abstractions of transition formulas. For transition formulas in linear rational arithmetic, this ℚ-VASR is a best abstraction.
- We present a technique for improving the precision of our analysis by using ℚ-VASR with states to capture loop control structure.
- We implement the proposed invariant generation technique and show that its ability to verify user assertions is comparable to software model checkers, while providing theoretical guarantees of termination and invariant quality.
This section illustrates the high-level structure of our invariant generation scheme. The goal is to compute a transition formula that summarizes the behavior of a given program. A transition formula is a formula over a set of program variables Var along with primed copies Var′, representing the state of the program before and after executing a computation (respectively). This style of analysis can be extended from a simple block-structured language to one with control flow and recursive procedures using the framework of algebraic program analysis [18, 11]. For any given program, a transition formula can be computed by recursion on syntax:
where is a function that computes an over-approximation of the transitive closure of a transition formula. The contribution of this paper is a method for computing this operation, which is based on first over-approximating the input transition formula by a ℚ-VASR, and then computing the (exact) reachability relation of the ℚ-VASR.
We illustrate the analysis on the integer model of a persistent queue data structure pictured in Figure 1. The example consists of two operations (enqueue and dequeue), as well as a test harness (harness) that non-deterministically executes enqueue and dequeue operations. The queue achieves amortized O(1)-time enqueue and dequeue by implementing the queue as two lists, front and back (whose lengths are modeled as front_len and back_len, respectively); the sequence of elements in the queue is the front list followed by the reverse of the back list. We will show that the queue operates in amortized O(1) time by finding a summary for harness that implies a linear bound on mem_ops (the number of memory operations in the computation) in terms of nb_ops (the total number of enqueue/dequeue operations executed in some sequence of operations).
To analyze the queue, we proceed compositionally, in “bottom-up” fashion (i.e., starting from deeply-nested code and working our way back up to a summary for harness). There are two loops of interest, one in dequeue and one in harness. Since the dequeue loop is nested inside the harness loop, dequeue is analyzed first. We first compute a transition formula that represents one execution of the body of the loop:
Observe that each variable in the loop is incremented by a constant value. As a result, the loop update can be captured faithfully by a vector addition system. In particular, we see that this loop body formula is simulated by a four-dimensional vector addition system, where the simulation relation and ℚ-VASR are as follows:
A formula representing the reachability relation of a vector addition system can be computed in polytime. For the case of , a formula representing steps of the ℚ-VASR is simply
To capture information about the pre-condition (post-condition) of the loop, we may project the primed variables to obtain , and similarly project the unprimed variables to obtain . Finally, combining the ℚ-VASR update formula, the simulation formula , and the pre/post-condition, we get the following approximation of the dequeue loop’s behavior:
Using this summary for the dequeue loop, we may proceed to compute a transition formula for the body of the harness loop (omitted for brevity). Just as with the dequeue loop, we analyze the harness loop by computing a simulation ℚ-VASR, , that simulates () it:
Unlike the dequeue loop, we do not get an exact characterization of the dynamics of each changed variable. In particular, in the slow dequeue path through the loop, the values of front_len, back_len, and mem_ops change by a variable amount. The variable back_len is set to 0, so its behavior can be captured by a reset. The dynamics of front_len and mem_ops cannot be captured by a ℚ-VASR, but (using our dequeue summary) we can observe that the sum front_len + back_len is decremented by 1, and the sum mem_ops + 3 · back_len is incremented by 2.
We compute the following formula that captures the reachability relation of (taking steps of enqueue, steps of dequeue fast, and steps of dequeue slow):
Using this update formula (along with pre/post-condition formulas), we obtain a summary for the harness loop (omitted for brevity). Using this summary we can prove some interesting features of the data structure (supposing that we start in a state where all variables are zero): mem_ops is at most 4 times nb_ops (i.e., enqueue and dequeue use O(1) amortized memory operations), and size is the sum of front_len and back_len.
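The two properties above can be spot-checked by direct simulation. The following is a minimal sketch, not the analysis itself: Figure 1 is not reproduced in this text, so the queue model below is a reconstruction, and the per-operation costs (1 memory operation per enqueue, 3 per element moved from back to front, 2 per dequeue) are assumptions chosen to agree with the prose (the slow dequeue path leaves front_len + back_len decremented by 1 and mem_ops + 3 · back_len incremented by 2). The function name `run_harness` is hypothetical.

```python
import random

def run_harness(steps, seed):
    """Randomly interleave enqueue/dequeue; check both properties after each op."""
    rng = random.Random(seed)
    s = dict(front_len=0, back_len=0, size=0, mem_ops=0, nb_ops=0)
    for _ in range(steps):
        s["nb_ops"] += 1
        if s["size"] == 0 or rng.random() < 0.5:
            # enqueue: push onto the back list (assumed cost: 1 memory op)
            s["back_len"] += 1
            s["size"] += 1
            s["mem_ops"] += 1
        else:
            # dequeue: if front is empty, first reverse back onto front
            if s["front_len"] == 0:
                while s["back_len"] != 0:
                    s["front_len"] += 1
                    s["back_len"] -= 1
                    s["mem_ops"] += 3  # assumed cost per moved element
            s["front_len"] -= 1
            s["size"] -= 1
            s["mem_ops"] += 2  # assumed cost of removing the front element
        # The two properties stated for the harness summary
        assert s["mem_ops"] <= 4 * s["nb_ops"]
        assert s["size"] == s["front_len"] + s["back_len"]
    return s

final = run_harness(steps=1000, seed=0)
```

Random testing of this kind only samples executions; the summary computed by the analysis covers all of them.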
We now take a moment to define what a transition system is, the transition systems of interest in this paper (transition formulas, ℚ-VASR, ℚ-VASRS), and the notation used throughout the paper.
A transition system is a pair $(C, \rightarrow)$ where
- $C$ is a potentially infinite set of configurations
- $\rightarrow \subseteq C \times C$ is a transition relation.
For a transition relation $\rightarrow$, we use $\rightarrow^*$ to denote its reflexive, transitive closure.
For a vector $\mathbf{a}$, we use $\mathrm{diag}(\mathbf{a})$ to denote the diagonal matrix with $\mathbf{a}$ on the diagonal. For two vectors $\mathbf{a}$ and $\mathbf{b}$ of the same dimension $d$, we use $\mathbf{a} \cdot \mathbf{b}$ to denote the inner product $\sum_{i=1}^{d} a_i b_i$ and $\mathbf{a} * \mathbf{b}$ to denote the pointwise (aka Hadamard) product (i.e., the $d$-vector where entry $i$ is equal to $a_i b_i$).
For any natural number $i$, we use $\mathbf{e}_i$ to denote the standard basis vector in the $i$th direction (i.e., the vector consisting of all zeros except that the $i$th entry is 1), where the dimension of $\mathbf{e}_i$ is understood from context. We use $I_n$ to denote the $n \times n$ identity matrix, or simply $I$ if $n$ is understood from context.
For any natural number pair, , matrix , and set , define to be the submatrix of obtained by deleting the rows not in (i.e., if we enumerate in order as then is the matrix whose th row is the th row of ). Observe that for any and , .
An n-transition formula is a (or ) formula whose free variables range over and . The free variables designate the state before and after a transition and () denotes the existential fragment of linear integer (rational) arithmetic.
The syntax for is as follows:
where is a variable symbol and is a rational number. Observe that (without loss of generality) we assume that formulas are free of negation.
An -transition formula, , defines a transition system where
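A d-dimensional rational vector addition system with resets (ℚ-VASR) is a finite set $V \subseteq \{0,1\}^d \times \mathbb{Q}^d$ of transitions. Each transition $(\mathbf{r}, \mathbf{a}) \in V$ consists of a binary reset vector $\mathbf{r}$ and a rational addition vector $\mathbf{a}$, both of dimension $d$. $V$ defines a transition system $(\mathbb{Q}^d, \rightarrow_V)$ where $\mathbf{u} \rightarrow_V \mathbf{v}$ if and only if $\mathbf{v} = \mathbf{r} * \mathbf{u} + \mathbf{a}$ for some $(\mathbf{r}, \mathbf{a}) \in V$ (with $*$ the pointwise product, so that a 0 entry in $\mathbf{r}$ resets the corresponding dimension).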
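A d-dimensional rational vector addition system with resets and states (ℚ-VASRS), $\mathcal{V} = (Q, E)$, is a finite set of states, $Q$, together with a finite set of edges, $E \subseteq Q \times \{0,1\}^d \times \mathbb{Q}^d \times Q$. $\mathcal{V}$ defines a transition system over configurations $(q, \mathbf{u}) \in Q \times \mathbb{Q}^d$ where $(q_1, \mathbf{u}) \rightarrow_{\mathcal{V}} (q_2, \mathbf{v})$ if and only if $\mathbf{v} = \mathbf{r} * \mathbf{u} + \mathbf{a}$ for some edge $(q_1, \mathbf{r}, \mathbf{a}, q_2) \in E$.
The reachability relation of a ℚ-VASRS is definable in Presburger arithmetic.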
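To make the one-step semantics of a ℚ-VASR concrete, the sketch below applies a transition consisting of a binary reset vector and a rational addition vector. It assumes the convention that a 0 entry in the reset vector resets the dimension to the constant in the addition vector, while a 1 entry adds that constant. The function names and the two-dimensional example system are hypothetical.

```python
from fractions import Fraction

def step(u, transition):
    """Apply one transition (r, a): v = r * u + a, computed pointwise."""
    r, a = transition
    return tuple(ri * ui + ai for ri, ui, ai in zip(r, u, a))

def successors(u, vasr):
    """All configurations reachable from u in exactly one step."""
    return {step(u, t) for t in vasr}

# Hypothetical 2-dimensional system over variables (x, y).
add_xy = ((1, 1), (Fraction(1), Fraction(1, 2)))  # x += 1, y += 1/2
reset_y = ((1, 0), (Fraction(0), Fraction(3)))    # x unchanged, y := 3
V = [add_xy, reset_y]

start = (Fraction(2), Fraction(5))
next_configs = successors(start, V)
```

Exact rational arithmetic (`fractions.Fraction`) is used so that equality checks on configurations are not affected by floating-point rounding.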
Theorem 2.1 ()
There is a polytime algorithm which, given a -dimensional ℚ-VASRS, , computes a transition formula such that for all for some states if and only if .
ℚ-VASRs are a special case of ℚ-VASRSs with a single state and so this theorem applies to ℚ-VASRs as well.
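For intuition about why reachability admits a closed form, consider the simplest special case: a single state and no resets. There, the order of transitions is irrelevant, and the configuration reached depends only on how many times each transition fires. The sketch below checks this on a hypothetical three-dimensional system; it covers only this reset-free special case and is not the general construction of the theorem, which must also account for resets and control states.

```python
from fractions import Fraction
from itertools import permutations

def run(u, additions):
    """Execute a sequence of reset-free transitions, one after another."""
    for a in additions:
        u = tuple(x + y for x, y in zip(u, a))
    return u

def closed_form(u, additions, counts):
    """u + sum_i counts[i] * additions[i], with no iteration over steps."""
    return tuple(
        ui + sum(k * a[i] for k, a in zip(counts, additions))
        for i, ui in enumerate(u)
    )

# Hypothetical addition vectors and start configuration.
a1 = (1, -1, 3)
a2 = (0, 0, Fraction(1, 2))
u0 = (0, 5, 0)

expected = closed_form(u0, [a1, a2], counts=(2, 1))
orders = set(permutations([a1, a1, a2]))
all_agree = all(run(u0, order) == expected for order in orders)
```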
3 Approximating loops with vector addition systems
In this section, we describe a method for over-approximating the transitive closure of a transition formula using a ℚ-VASR. This procedure immediately extends to computing summaries for programs using the method outlined in Section 1.1.
The core algorithmic problem that we answer in this section is: given a transition formula, how can we synthesize a best abstraction of that formula’s dynamics as a ℚ-VASR? We begin by formalizing the problem: in particular, we define what it means for a ℚ-VASR to simulate a transition formula and what it means for an abstraction to be “best.”
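Let $A = (\mathbb{Q}^n, \rightarrow_A)$ and $B = (\mathbb{Q}^m, \rightarrow_B)$ be transition systems operating over rational vector spaces. A linear simulation from $A$ to $B$ is a linear transformation $S : \mathbb{Q}^n \rightarrow \mathbb{Q}^m$ such that for all $\mathbf{u}, \mathbf{v} \in \mathbb{Q}^n$ in which $\mathbf{u} \rightarrow_A \mathbf{v}$, we have $S\mathbf{u} \rightarrow_B S\mathbf{v}$. We use $A \Vdash_S B$ to denote that $S$ is a linear simulation from $A$ to $B$.
In particular, if $F$ is an $n$-transition formula, $V$ is a $d$-dimensional ℚ-VASR, and $S$ is a $d \times n$ matrix, then $F \Vdash_S V$ exactly when , where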
is a transition formula representing the transitions that simulates under the transformation . The key property of simulations is that if , then (the reachability relation of under the inverse image of ) over-approximates the transitive closure of .
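The definition of linear simulation can be exercised on sampled transitions. The sketch below is a test on finitely many samples, not a proof of simulation, and everything in it is hypothetical: the concrete step models a slow dequeue over the pair (front_len, back_len) that moves all of back onto front and then removes one element, and the candidate matrix tracks front_len + back_len and back_len. It assumes the reset convention in which a 0 entry of the reset vector resets a dimension.

```python
from fractions import Fraction

def mat_vec(S, x):
    """Matrix-vector product over the rationals."""
    return tuple(sum(Fraction(s) * xi for s, xi in zip(row, x)) for row in S)

def vasr_step(u, transition):
    """Pointwise v = r * u + a."""
    r, a = transition
    return tuple(ri * ui + ai for ri, ui, ai in zip(r, u, a))

def simulates_on(S, vasr, samples):
    """For each sampled step (u, v), check that S u steps to S v in the VASR."""
    return all(
        any(vasr_step(mat_vec(S, u), t) == mat_vec(S, v) for t in vasr)
        for u, v in samples
    )

# Hypothetical concrete steps: (front_len, back_len) = (0, b) -> (b - 1, 0).
samples = [((0, b), (b - 1, 0)) for b in range(1, 6)]
V = [((1, 0), (-1, 0))]        # dimension 0 decremented by 1, dimension 1 reset to 0
S = [[1, 1], [0, 1]]           # tracks front_len + back_len and back_len
identity = [[1, 0], [0, 1]]    # tracks front_len and back_len directly

ok_with_S = simulates_on(S, V, samples)
ok_with_identity = simulates_on(identity, V, samples)
```

The identity matrix fails here because front_len changes by a variable amount, whereas the sum front_len + back_len changes by a constant; choosing the right linear combinations is exactly what the synthesis procedure automates.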
Our task is to synthesize a linear transformation and a ℚ-VASR such that . We call a pair , consisting of a rational matrix , and a -dimensional ℚ-VASR a ℚ-VASR abstraction; we say that is the concrete dimension of and is the abstract dimension. We say that is a ℚ-VASR abstraction of if . A transition formula has many ℚ-VASR abstractions, and so we are interested in comparing their precision. We define a preorder on ℚ-VASR abstractions, where iff there exists a linear transformation such that and ( and are the abstract dimensions of and , respectively).
Thus, our problem can be stated as follows: given a transition formula , synthesize a ℚ-VASR abstraction of such that is best in the sense that we have for any ℚ-VASR abstraction of . A solution to this problem is given in Algorithm 1.
Algorithm 1 follows the familiar pattern of an AllSat-style loop. We begin with an empty ℚ-VASR abstraction (), and iteratively build the abstraction up to over-approximate all possible behaviors of . The formula maintains the set of transitions that are allowed by but not simulated by the current ℚ-VASR abstraction. Each abstraction round proceeds as follows. First, we sample a model of (i.e., a transition that is allowed by but not simulated by ). We then generalize that transition to a set of transitions by using to select a cube of the DNF of that contains . We then compute a ℚ-VASR abstraction of , using the procedure described in Section 3.1. We combine this ℚ-VASR abstraction with the current one () by computing a least upper bound (in order), using the procedure described in Section 3.2. Finally, we block any transition in from being sampled again by conjoining to . The loop terminates when is unsatisfiable, in which case we have that .
Theorem 3.1 ()
3.1 Abstracting conjunctive transition formulas
In this section, we show how to compute a ℚ-VASR abstraction for a consistent conjunctive formula. The intuition is that, since linear rational arithmetic is a convex theory, the best ℚ-VASR abstraction consists of a single transition. (For formulas involving integer arithmetic, our procedure produces a ℚ-VASR abstraction that is not guaranteed to be best, precisely because linear integer arithmetic is not convex.)
Let be a formula that is consistent and conjunctive. Observe that the set , which represents linear combinations of variables that are reset across , forms a vector space. Similarly, the set representing linear combinations of variables that are incremented across forms a vector space, . We compute bases for both and , and respectively. We then define to be the ℚ-VASR abstraction , where
Let be the formula , where is a Skolem constant. The vector space of resets has basis (representing that is reset to 1). The vector space of increments has basis (representing that increases by 2 and decreases by 1). A best abstraction of is thus the three-dimensional ℚ-VASR , with simulation matrix . In particular, notice that since the variable is both incremented and reset, it is represented by two different dimensions in .
Proposition 1 ()
For any consistent, conjunctive transition formula , is a ℚ-VASR abstraction of . If is expressed in , is best.
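The reset and increment spaces can be computed by plain linear algebra in one restricted special case, sketched below under stated assumptions. Suppose the formula is a deterministic affine update x' = Ax + b with an unconstrained pre-state (an assumption; the procedure in the text handles arbitrary consistent conjunctive formulas, which this sketch does not). Then a row vector s is reset across the update exactly when sA = 0, and incremented exactly when s(A − I) = 0; in both cases the associated constant is s · b. The function names, and the choice to list reset rows before increment rows, are hypothetical.

```python
from fractions import Fraction

def null_space(M, ncols):
    """Basis of {x : M x = 0}, by Gauss-Jordan elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in M]
    pivots, r = [], 0
    for c in range(ncols):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        pivot = rows[r][c]
        rows[r] = [v / pivot for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [vi - f * vr for vi, vr in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    basis = []
    for free in (c for c in range(ncols) if c not in pivots):
        x = [Fraction(0)] * ncols
        x[free] = Fraction(1)
        for i, pc in enumerate(pivots):
            x[pc] = -rows[i][free]
        basis.append(x)
    return basis

def abstract_affine(A, b):
    """Simulation matrix S and one transition (r, a) for x' = A x + b."""
    n = len(A)
    At = [list(col) for col in zip(*A)]
    AmIt = [[At[i][j] - (1 if i == j else 0) for j in range(n)] for i in range(n)]
    resets = null_space(At, n)    # rows s with s A = 0
    incs = null_space(AmIt, n)    # rows s with s (A - I) = 0
    S = resets + incs
    r = [0] * len(resets) + [1] * len(incs)   # 0 = reset, 1 = increment
    a = [sum(si * bi for si, bi in zip(s, b)) for s in S]
    return S, r, a

# Hypothetical update over (x, y, z): x' = x + 1, y' = x + y, z' = 5.
A = [[1, 0, 0], [1, 1, 0], [0, 0, 0]]
b = [1, 0, 5]
S, r, a = abstract_affine(A, b)
```

In this example z is captured as a reset to 5 and x as an increment by 1, while y (whose change depends on x) is not captured by any dimension.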
3.2 Least upper bound
In this section, we show how to compute least upper bounds w.r.t. the order. Given two ℚ-VASR abstractions and , their least upper bound is a ℚ-VASR abstraction such that (1) , (2) , and (3) for any ℚ-VASR abstraction satisfying (1) and (2), we have .
Supposing that and , there must exist linear simulations and such that , , and . The intuition behind our approach is that we will compute and , and derive ℚ-VASR as the union of the image of under and the image of under . Computation of and relies on (1) the constraints on and induced by the expected equation , and (2) the fact that if is a linear simulation from to any other ℚ-VASR, then must satisfy a certain structural property. This property is called coherence, as defined in the following.
Let be a -dimensional ℚ-VASR. are coherent dimensions of if for all transitions (i.e., every transition of that resets also resets and vice versa). denotes that and are coherent dimensions of . Observe that forms an equivalence relation on . We refer to the equivalence classes of as coherence classes.
A row vector is coherent with respect to if and only if for all , and implies . Equivalently, is coherent if there is some coherence class and some row vector such that . If is non-zero then the coherence class is uniquely determined; in this case we use to denote .
A matrix is coherent with respect to if and only if each of its rows is coherent with respect to .
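The coherence definitions above translate directly into a grouping computation: two dimensions are coherent when they have the same reset pattern across all transitions, and a row vector is coherent when its non-zero entries fall within a single coherence class. The sketch below is illustrative; the function names and the three-dimensional example system are hypothetical.

```python
from collections import defaultdict

def coherence_classes(vasr, d):
    """Group dimensions 0..d-1 by their reset pattern across all transitions."""
    by_pattern = defaultdict(list)
    for i in range(d):
        pattern = tuple(r[i] for r, _ in vasr)
        by_pattern[pattern].append(i)
    return sorted(by_pattern.values())

def is_coherent_row(row, classes):
    """A row is coherent if its non-zero entries lie within one coherence class."""
    support = {i for i, t in enumerate(row) if t != 0}
    return not support or any(support <= set(c) for c in classes)

def is_coherent_matrix(T, classes):
    return all(is_coherent_row(row, classes) for row in T)

# Hypothetical system: dimensions 0 and 1 are always reset together.
V = [((0, 0, 1), (1, 2, 0)), ((1, 1, 0), (3, -1, 5))]
classes = coherence_classes(V, 3)
```

Here `(1, 2, 0)` is a coherent row, whereas `(1, 0, 1)` is not, since it mixes dimensions that are reset by different transitions.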
Lemma 1 ()
Let be a -dimensional ℚ-VASR, let be an -dimensional ℚ-VASR, and let be a linear transformation such that . Then is coherent with respect to .
For a -dimensional ℚ-VASR, , and a linear transformation, , that is both coherent with respect to and has no zero rows, there is a unique -dimensional ℚ-VASR, , such that and is minimal in the inclusion order; we use to denote this ℚ-VASR. More explicitly, is defined as
where is the reset vector translated along , defined as the -dimensional vector with for some arbitrary representative . (Recall that since is coherent and non-zero it is associated with a unique coherence class , and that we must have for all , so that the choice of representative is irrelevant.) The intuition behind is that each row of corresponds to a unique coherence class of , and either all the dimensions in are reset (in which case we take ) or none of them are (in which case we take ). Observe that for any , . The following lemma gives an intuitive characterization of image.
Lemma 2 ()
Let be a -dimensional ℚ-VASR and let be a matrix that is coherent with respect to . For all , iff .
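The image construction can be sketched as follows: for each transition, the addition vector is mapped through the matrix, and the reset bit of each row is read off from any dimension in that row's support (coherence makes the choice irrelevant). The code assumes the matrix is coherent and has no zero rows, and it uses the convention that a 0 entry of the reset vector means reset. The example system, the matrix `T`, and the function names are hypothetical; the final check only confirms on one sample configuration that each step of the original system is matched by the corresponding step of the image.

```python
from fractions import Fraction

def mat_vec(T, x):
    return tuple(sum(Fraction(t) * xi for t, xi in zip(row, x)) for row in T)

def step(u, transition):
    r, a = transition
    return tuple(ri * ui + ai for ri, ui, ai in zip(r, u, a))

def image(vasr, T):
    """Image of a VASR under a coherent matrix T that has no zero rows."""
    # One representative dimension per row: any index with a non-zero entry.
    reps = [next(j for j, t in enumerate(row) if t != 0) for row in T]
    return [(tuple(r[j] for j in reps), mat_vec(T, a)) for r, a in vasr]

# Hypothetical system with coherence classes {0, 1} and {2}.
V = [((0, 0, 1), (1, 2, 0)), ((1, 1, 0), (3, -1, 5))]
T = [[1, 2, 0], [0, 0, 1], [1, -1, 0]]
img = image(V, T)

u = (1, 1, 1)
matches = all(
    mat_vec(T, step(u, t)) == step(mat_vec(T, u), t_img)
    for t, t_img in zip(V, img)
)
```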
Before describing our least upper bound algorithm, we must define a technical condition that is both assumed and preserved by the procedure:
A ℚ-VASR abstraction is normal if there is no non-zero vector that is coherent with respect to such that (i.e., the rows of that correspond to any coherence class of are linearly independent).
Intuitively, a ℚ-VASR abstraction that is not normal contains information that is either inconsistent or redundant.
We now present Algorithm 2, our algorithm for computing least upper bounds of ℚ-VASR abstractions. Let and be ℚ-VASR abstractions. Our goal is to find two matrices and such that (1) , (2) is coherent w.r.t. , and (3) is coherent w.r.t. . We find the best such and iteratively. For each pair of coherence classes of and of , we compute matrices and such that (i) , (ii) , (iii) , and (iv) are maximal, in the sense that the rows of form a basis of a vector space that contains the rowspace of any matrix such that and satisfy (i)-(iii). We form and simply by collecting all such and . Properties (1)-(3) together ensure that the ℚ-VASR abstraction where and is an upper bound on and . The fact that and are constructed from matrices that satisfy (iv) over all pairs of coherence classes ensures that this abstraction is below any other upper bound in the simulation preorder, i.e., that it is a least upper bound.
Proposition 2 ()
Let and be normal ℚ-VASR abstractions of equal concrete dimension. Then the ℚ-VASR abstraction computed by Algorithm 2 is normal and is a least upper bound on and .
4 Control flow and ℚ-VASRS
In this section, we give a method for improving the precision of our loop summarization technique by using ℚ-VASRS, ℚ-VASR extended with control states. While ℚ-VASRs over-approximate control flow using non-determinism, ℚ-VASRSs can encode patterns such as oscillating and multi-phase loops. Section 5 demonstrates that the ability to analyze such patterns greatly increases the accuracy of loop summaries for some loops.
We begin with an example that demonstrates the precision gained by ℚ-VASRS. The loop in Figure 2 oscillates between (1) incrementing variable by and (2) incrementing both variables and by . Suppose that we wish to prove that, starting with the configuration , the loop maintains the invariant that . The (best) ℚ-VASR abstraction of the loop, pictured in Figure 2, over-approximates the control flow of the loop by treating the conditional branch in the loop as a non-deterministic branch. This over-approximation may violate the invariant by repeatedly executing the path where both variables are incremented. On the other hand, the ℚ-VASRS abstraction of the loop pictured in Figure 2 captures the understanding that the loop must oscillate between the two paths. The loop summary obtained from the reachability relation of this ℚ-VASRS is powerful enough to prove that the invariant holds (under the precondition ).
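The difference between the two abstractions can be observed with a bounded reachability computation. The sketch below is illustrative only: Figure 2 is not reproduced in this text, so the variable pair (i, x), the increments of 1, the initial configuration (0, 0), the state names, and the invariant 2x ≤ i are all assumptions chosen to match the shape of the example. Resets are omitted since the example does not need them.

```python
def add(u, a):
    return tuple(x + y for x, y in zip(u, a))

def reach_vasr(start, additions, steps):
    """Configurations reachable in at most `steps` steps, any path at each step."""
    seen = frontier = {start}
    for _ in range(steps):
        frontier = {add(u, a) for u in frontier for a in additions}
        seen = seen | frontier
    return seen

def reach_vasrs(start_state, start, edges, steps):
    """As above, but each edge (src, addition, dst) is enabled only in state src."""
    seen = frontier = {(start_state, start)}
    for _ in range(steps):
        frontier = {(dst, add(u, a))
                    for q, u in frontier
                    for src, a, dst in edges if src == q}
        seen = seen | frontier
    return {u for _, u in seen}

# Hypothetical oscillating loop over (i, x).
ONLY_I, BOTH = (1, 0), (1, 1)
edges = [("even", ONLY_I, "odd"), ("odd", BOTH, "even")]

def invariant(u):
    return 2 * u[1] <= u[0]  # assumed invariant: 2x <= i

holds_with_states = all(invariant(u) for u in reach_vasrs("even", (0, 0), edges, 10))
holds_without_states = all(invariant(u) for u in reach_vasr((0, 0), [ONLY_I, BOTH], 10))
```

Without control states, the configuration (1, 1) is reachable by taking the second path immediately, which violates the assumed invariant; with control states, the paths must alternate and the invariant holds on every explored configuration.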
4.1 Technical details
An -predicate ℚ-VASRS is a ℚ-VASRS, , such that each control state is a predicate over the variables and the predicates in are pairwise inconsistent (for all , is unsatisfiable).
We extend linear simulations to -predicate ℚ-VASRS as follows:
Let be an -state transition formula and let be an -predicate ℚ-VASRS of dimension . We say that a linear transformation is a linear simulation from to if for all such that , (1) there is a (unique) such that , (2) there is a (unique) such that , and (3) .
Let and be -predicate ℚ-VASRS of dimensions and , respectively. We say that a linear transformation is a linear simulation from to if for all and for all such that , there exists (unique) such that (1) , (2) , and (3)
Observe that if has a linear simulation to , then must be finer than in the sense that for each there is a (unique) such that .
We define a ℚ-VASRS abstraction to be a pair consisting of a rational matrix and an -predicate ℚ-VASRS of dimension . We extend the simulation preorder to ℚ-VASRS abstractions in the natural way. Extending the definition of “best” abstractions requires more care, since we can always find a “better” ℚ-VASRS abstraction (strictly smaller in order) by using a finer state partition. However, if we consider only -predicate ℚ-VASRS that share the same set of control states, then best abstractions do exist and can be computed using Algorithm 3.
Algorithm 3 works as follows. First, for each pair of formulas , compute a best ℚ-VASR abstraction of the formula (where denotes with unprimed variables replaced by primed ones) and call it . over-approximates the transitions of that begin in a program state satisfying and end in a program state satisfying . Second, we compute the least upper bound of all ℚ-VASR abstractions to get a best ℚ-VASR abstraction for . Computing the least upper bound has the effect of reconciling the ℚ-VASR abstractions corresponding to different edges in the ℚ-VASRS, but does not maintain the provenance of the ℚ-VASR transitions (i.e., which transformers correspond to which edges). To reconstruct provenance, we compute the linear simulation from to , and compute the edges from to as the image of under .
Given an -transition formula and control states , Algorithm 3 computes the best -predicate ℚ-VASRS abstraction of with control states .
We now describe Algorithm 4, which uses ℚ-VASRS to over-approximate the transitive closure of transition formulas. Towards our goal of predictable program analysis, we desire our analysis to be monotone in the sense that if and are transition formulas such that entails , then the over-approximate transitive summary of entails the over-approximate transitive summary of . The key property we desire in a procedure for generating control state predicates is monotonicity: if , then the control states of should be at least as fine as the control states of . We can achieve this by taking the set of control states of to be the set of topologically connected regions of (lines 4-4). Unfortunately, this set of predicates fails the contract of abstract-VASRS, because there may exist a transition such that . As a result, does not necessarily approximate ; however, it does over-approximate . An over-approximation of the transitive closure of can easily be obtained from (the over-approximation of the transitive closure of obtained from the ℚ-VASRS abstraction ()) by sequentially composing with the identity relation or (line 4).
The abstract-VASRS algorithm uses predicates to infer the control structure of a ℚ-VASRS, but after computing the ℚ-VASRS abstraction, iter-VASRS makes no further use of the predicates (i.e., the predicates are irrelevant in the computation of ). Predicates can be used to improve iter-VASRS as follows. The reachability relation of a ℚ-VASRS is expressed by a formula that uses auxiliary variables to represent the states at which the computation begins and ends. These variables can be used to encode that the pre-state of the transitive closure must satisfy the predicate corresponding to the begin state and the post-state must satisfy the predicate corresponding to the end state. As an example, consider Figure 2 and suppose that we wish to prove the invariant under the pre-condition . While this invariant holds, we cannot prove it because there is a counterexample if the computation begins at . By applying the above improvement, we can prove that the computation must begin at , and the invariant is verified.
The goal of our evaluation is to answer the following questions:
- Are ℚ-VASR sufficiently expressive to be able to generate accurate loop summaries?
- Does the ℚ-VASRS technique improve upon the precision of ℚ-VASR?
- Are the ℚ-VASR/ℚ-VASRS loop summarization algorithms performant?
We implemented our loop summarization procedure and the compositional whole-program summarization technique described in Section 1.1. We ran it on a suite of 149 benchmarks, drawn from the C4B and HOLA suites, as well as the safe, integer-only benchmarks in the loops category of SV-Comp 2016. We ran each benchmark with a time-out of 5 minutes, and recorded how many benchmarks were proved safe by our ℚ-VASR-based technique and our ℚ-VASRS-based technique. For context, we also compare with CRA (a related loop summarization technique), as well as SeaHorn and UltimateAutomizer (state-of-the-art software model checkers). The results are shown in Figure 3.
The number of assertions proved correct using ℚ-VASR is comparable to both SeaHorn and UltimateAutomizer, demonstrating that ℚ-VASR can indeed model interesting loop phenomena. ℚ-VASRS-based summarization significantly improves precision, proving the correctness of 91% of assertions in the suite, more than any other tool.
ℚ-VASR-based summarization is the most performant of all the compared techniques, followed by CRA and ℚ-VASRS. SeaHorn and UltimateAutomizer employ abstraction-refinement loops, and so take significantly longer to run the test suite.
6 Related work
Our analysis follows the same high-level structure as compositional recurrence analysis (CRA) [5, 12]. Our analysis differs from CRA in the way that it summarizes loops: we compute loop summaries by over-approximating loops by vector addition systems and computing reachability relations, whereas CRA computes loop summaries by extracting recurrence relations and computing closed forms. The advantage of our approach is that we can use ℚ-VASR to accurately model multi-path loops and can make theoretical guarantees about the precision of our analysis; the advantage of CRA is its ability to generate non-linear invariants.
Vector addition systems
Our invariant generation method builds upon Haase and Halfon’s polytime procedure for computing the reachability relation of integer vector addition systems with states and resets. Generalization from the integer case to the rational case is straightforward. Continuous Petri nets are a related generalization of vector addition systems, where time is taken to be continuous (ℚ-VASR, in contrast, have rational state spaces but discrete time). Reachability for continuous Petri nets is polytime and transitive closure is definable in linear arithmetic.
Sinn et al. present a technique for resource bound analysis which is related to our loop summarization procedure. Sinn et al.’s method is based on computing a lossy vector addition system with states that simulates a piece of code, proving termination of the VASS, and then extracting resource bounds from the ranking function. Our method differs in several respects. First, Sinn et al. model programs using vector addition systems with states over the natural numbers, which enables them to use termination bounds for VASS to compute upper bounds on resource usage. We use VASS with resets over the rationals, which (in contrast to VASS) have a Presburger-definable reachability relation, enabling us to summarize loops. Moreover, Sinn et al.’s method for extracting VASS models of programs is heuristic, whereas our method gives precision guarantees.
The main contribution of this paper is a technique for synthesizing the best abstraction of a transition formula expressible in the language of ℚ-VASR (with or without states). This is closely related to the symbolic abstraction problem, which computes the best abstraction of a formula within an abstract domain. The problem of computing best abstractions has been undertaken for finite-height abstract domains, template constraint matrices (including intervals and octagons), and polyhedra [19, 5]. Our best abstraction result differs in that (1) it is for a disjunctive domain and (2) the notion of “best” is based on simulation rather than the typical order-theoretic framework.
- [1] M. Blondin, A. Finkel, C. Haase, and S. Haddad. Approaching the coverability problem continuously. In TACAS, pages 480–496, 2016.
- [2] Q. Carbonneaux, J. Hoffmann, and Z. Shao. Compositional certified resource bounds. In PLDI, 2015.
- [3] R. David and H. Alla. Continuous Petri nets. In Proc. 8th Eur. Workshop Applic. Theory Petri Nets, pages 275–294, 1987.
- [4] I. Dillig, T. Dillig, B. Li, and K. McMillan. Inductive invariant generation via abductive inference. In OOPSLA, 2013.
- [5] A. Farzan and Z. Kincaid. Compositional recurrence analysis. In FMCAD, 2015.
- [6] E. Fraca and S. Haddad. Complexity analysis of continuous Petri nets. Fundam. Inf., 137(1):1–28, Jan. 2015.
- [7] A. Gurfinkel, T. Kahsai, A. Komuravelli, and J. Navas. The SeaHorn verification framework. In CAV, 2015.
- [8] C. Haase and S. Halfon. Integer vector addition systems with states. In Reachability Problems, pages 112–124, 2014.
- [9] M. Heizmann, Y. Chen, D. Dietsch, M. Greitschus, J. Hoenicke, Y. Li, A. Nutz, B. Musa, C. Schilling, T. Schindler, and A. Podelski. Ultimate Automizer and the search for perfect interpolants - (competition contribution). In TACAS, pages 447–451, 2018.
- [10] R. M. Karp and R. E. Miller. Parallel program schemata. J. Comput. Syst. Sci., 3(2):147–195, May 1969.
- [11] Z. Kincaid, J. Breck, A. Forouhi Boroujeni, and T. Reps. Compositional recurrence analysis revisited. In PLDI, 2017.
-  Z.