1 Introduction
Modern software verification techniques employ a number of heuristics for reasoning about loops. While these heuristics are often effective, they are unpredictable. For example, an abstract interpreter may fail to find the most precise invariant expressible in the language of its abstract domain due to imprecise widening; similarly, a software model checker might fail to terminate because it generates interpolants that are insufficiently general. This paper presents a loop summarization technique that is powerful, in the sense that it generates expressive loop invariants, and predictable, in the sense that we can make theoretical guarantees about invariant quality.
The key idea behind our technique is to leverage reachability results of vector addition systems (VAS) for invariant generation. Vector addition systems are a class of infinite-state transition systems with decidable reachability; VAS are classically used as a model of parallel systems [10]. We consider a variation of VAS, rational VAS with resets (VASR), wherein there is a finite number of rational-typed variables and a finite set of transitions that simultaneously update each variable in the system by either adding a constant value or (re)setting the variable to a constant value. Our interest in VASRs stems from the fact that there is a (polytime) procedure to compute a linear arithmetic formula that represents a VASR’s reachability relation [8].
Since the reachability relation of a VASR is computable, the dynamics of VASR can be analyzed without relying on heuristic techniques. However, there is a gap between VASR and the loops that we are interested in summarizing. The latter typically use a rich set of operations (memory manipulation, conditionals, non-constant increments, non-linear arithmetic, etc.) and cannot be analyzed precisely. We bridge the gap with a procedure that, for any loop, synthesizes a VASR that simulates the loop. The reachability relation of the VASR can be used to overapproximate the behavior of the loop. Moreover, we prove that if a loop is expressed in linear rational arithmetic (LRA), our procedure synthesizes a best VASR abstraction, in the sense that it simulates any other VASR that simulates the loop. That is, the procedure does not make arbitrary heuristic choices, but rather synthesizes a best approximation of the loop in the language of VASR.
VASRs overapproximate multi-path loops by treating the choice between paths as nondeterministic. We show that we can recover some conditional control flow information and inter-path control dependencies by partitioning the states of the loop, and encoding this partitioning by extending VASR with control states (VASR with states, VASRS). We give a procedure for synthesizing a VASRS that simulates a given loop; we may then use the reachability relation of a VASRS to summarize the loop. We prove that, for a fixed program state partition, this procedure computes best VASRS abstractions for LRA formulas. Additionally, we give a state-partitioning algorithm that yields a monotone loop summarization procedure (more accurate information about loop bodies results in more accurate loop summaries).
Finally, we note that our analysis techniques extend to complex control structures, such as nested loops, by employing summarization compositionally (i.e., “bottom-up”). For example, our analysis summarizes a nested loop by first summarizing its inner loops, and then uses the summaries to analyze the outer loop. As a result of compositionality, our analysis can be applied to partial programs, is easy to parallelize, and has the potential to scale to large code bases.
The main contributions of the paper are as follows:

We present a procedure to synthesize VASR abstractions of transition formulas. For transition formulas in linear rational arithmetic, this VASR is a best abstraction.

We present a technique for improving the precision of our analysis by using VASR with states to capture loop control structure.

We implement the proposed invariant generation technique and show that its ability to verify user assertions is comparable to software model checkers, while providing theoretical guarantees of termination and invariant quality.
1.1 Outline
This section illustrates the high-level structure of our invariant generation scheme. The goal is to compute a transition formula that summarizes the behavior of a given program. A transition formula is a formula over a set of program variables Var along with primed copies Var', representing the state of the program before and after executing a computation (respectively). For any given program, a transition formula can be computed by recursion on syntax (this style of analysis can be extended from a simple block-structured language to one with control flow and recursive procedures using the framework of algebraic program analysis [18, 11]):
where is a function that computes an overapproximation of the transitive closure of a transition formula. The contribution of this paper is a method for computing this operation, which is based on first overapproximating the input transition formula by a VASR, and then computing the (exact) reachability relation of the VASR.
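The defining equations of this recursion are not reproduced in this text. A common formulation for a simple while-language is sketched below; it is an assumption on our part, not necessarily the exact definition intended here. The notation TF[[P]] for the transition formula of a program P, the fresh variable copy X, and the star for the overapproximate transitive closure operator are all names introduced for illustration.

```latex
\begin{align*}
\mathit{TF}[\![x := e]\!] &\triangleq x' = e \land \bigwedge_{y \in \mathit{Var},\ y \neq x} y' = y \\
\mathit{TF}[\![\texttt{if } c \texttt{ then } P_1 \texttt{ else } P_2]\!] &\triangleq
  (c \land \mathit{TF}[\![P_1]\!]) \lor (\lnot c \land \mathit{TF}[\![P_2]\!]) \\
\mathit{TF}[\![P_1 ; P_2]\!] &\triangleq
  \exists X.\ \mathit{TF}[\![P_1]\!][\mathit{Var}' \mapsto X] \land \mathit{TF}[\![P_2]\!][\mathit{Var} \mapsto X] \\
\mathit{TF}[\![\texttt{while } c \texttt{ do } P]\!] &\triangleq
  (c \land \mathit{TF}[\![P]\!])^{\star} \land (\lnot c)[\mathit{Var} \mapsto \mathit{Var}']
\end{align*}
```

Under this reading, only the last case needs an approximation: assignment, branching, and sequencing are exact, so the precision of the whole analysis rests on the star operator.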
We illustrate the analysis on the integer model of a persistent queue data structure pictured in Figure 1. The example consists of two operations (enqueue and dequeue), as well as a test harness (harness) that nondeterministically executes enqueue and dequeue operations. The queue achieves O(1) amortized time enqueue and dequeue by implementing the queue as two lists, front and back (whose lengths are modeled as front_len and back_len, respectively); the sequence of elements in the queue is the front list followed by the reverse of the back list. We will show that the queue operates in O(1) amortized time by finding a summary for harness that implies a linear bound on mem_ops (the number of memory operations in the computation) in terms of nb_ops (the total number of enqueue/dequeue operations executed in some sequence of operations).
To analyze the queue, we proceed compositionally, in “bottom-up” fashion (i.e., starting from deeply-nested code and working our way back up to a summary for harness). There are two loops of interest, one in dequeue and one in harness. Since the dequeue loop is nested inside the harness loop, dequeue is analyzed first. We first compute a transition formula that represents one execution of the body of the loop:
Observe that each variable in the loop is incremented by a constant value. As a result, the loop update can be captured faithfully by a vector addition system. In particular, we see that this loop body formula is simulated by a four-dimensional vector addition system, where the simulation relation and VASR are as follows:
A formula representing the reachability relation of a vector addition system can be computed in polytime. For the case of , a formula representing steps of the VASR is simply
To capture information about the precondition (postcondition) of the loop, we may project the primed variables to obtain , and similarly project the unprimed variables to obtain . Finally, combining the VASR update formula, the simulation formula , and the pre/postcondition, we get the following approximation of the dequeue loop’s behavior:
Using this summary for the dequeue loop, we may proceed to compute a transition formula for the body of the harness loop (omitted for brevity). Just as with the dequeue loop, we analyze the harness loop by computing a simulation VASR, , that simulates () it:
Unlike the dequeue loop, we do not get an exact characterization of the dynamics of each changed variable. In particular, in the slow dequeue path through the loop, the values of front_len, back_len, and mem_ops change by a variable amount. The variable back_len is set to 0, so its behavior can be captured by a reset. The dynamics of front_len and mem_ops cannot be captured by a VASR, but (using our dequeue summary) we can observe that the sum front_len + back_len is decremented by 1, and the sum mem_ops + 3·back_len is incremented by 2.
We compute the following formula that captures the reachability relation of (taking steps of enqueue, steps of dequeue fast, and steps of dequeue slow):
Using this update formula (along with pre/postcondition formulas), we obtain a summary for the harness loop (omitted for brevity). Using this summary we can prove some interesting features of the data structure (supposing that we start in a state where all variables are zero): mem_ops is at most 4 times nb_ops (i.e., enqueue and dequeue use O(1) amortized memory operations), and size is the sum of front_len and back_len.
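The two properties above can be sanity-checked by direct simulation. The sketch below is not the analysis itself, and Figure 1 is not reproduced in this text, so the queue model is an assumption reconstructed from the surrounding description: enqueue costs 1 memory operation, moving one element from back to front costs 3, and the final pop in dequeue costs 2. The function names `run_harness` and `satisfies_summary` are ours.

```python
import random


def run_harness(num_ops, seed):
    """Run an assumed integer model of the two-list queue; return all visited states."""
    rng = random.Random(seed)
    s = {"front_len": 0, "back_len": 0, "mem_ops": 0, "nb_ops": 0, "size": 0}
    history = [dict(s)]
    for _ in range(num_ops):
        s["nb_ops"] += 1
        if s["size"] == 0 or rng.random() < 0.5:
            # enqueue: push onto the back list (assumed cost: 1 memory operation)
            s["back_len"] += 1
            s["mem_ops"] += 1
            s["size"] += 1
        else:
            # slow path of dequeue: move the whole back list onto the front list
            if s["front_len"] == 0:
                while s["back_len"] != 0:
                    s["front_len"] += 1
                    s["back_len"] -= 1
                    s["mem_ops"] += 3  # assumed cost per moved element
            # pop from the front list (assumed cost: 2 memory operations)
            s["front_len"] -= 1
            s["mem_ops"] += 2
            s["size"] -= 1
        history.append(dict(s))
    return history


def satisfies_summary(s):
    """The two properties claimed for runs that start with all variables at zero."""
    return (s["mem_ops"] <= 4 * s["nb_ops"]
            and s["size"] == s["front_len"] + s["back_len"])


if __name__ == "__main__":
    for seed in range(20):
        assert all(satisfies_summary(s) for s in run_harness(200, seed))
    print("all sampled runs satisfy both properties")
```

Random testing of this kind can only fail to find a counterexample; the point of the summary is that it establishes the same bound for every run.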
2 Background
We now take a moment to define what a transition system is, the transition systems of interest in this paper (transition formulas, VASR, VASRS), and the notation used throughout the paper.
Definition 1
A transition system is a pair where

is a potentially infinite set of configurations

is a transition relation.
For a transition relation , we use to denote its reflexive, transitive closure.
For a vector , we use to denote the diagonal matrix with on the diagonal. For two vectors and of the same dimension , we use to denote the inner product and to denote the pointwise (aka Hadamard) product (i.e., the vector where entry is equal to ).
For any natural number , we use to denote the standard basis vector in the th direction (i.e., the vector consisting of all zeros except the th entry, which is 1), where the dimension of is understood from context. We use to denote the identity matrix, or simply if is understood from context.
For any natural number pair, , matrix , and set , define to be the submatrix of obtained by deleting the rows not in (i.e., if we enumerate in order as then is the matrix whose th row is the th row of ). Observe that for any and , .
An n-transition formula is a (or ) formula whose free variables range over and . The free variables designate the state before and after a transition and () denotes the existential fragment of linear integer (rational) arithmetic.
The syntax for is as follows:
where is a variable symbol and is a rational number. Observe that (without loss of generality) we assume that formulas are free of negation.
An transition formula, , defines a transition system where


.
Definition 2
A d-dimensional rational vector addition system with resets (VASR) is a finite set . Each consists of a binary reset vector , and a rational addition vector , both of dimension . defines a transition system where
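The transition relation of a VASR can be read operationally. The sketch below assumes the convention that a reset-vector entry of 0 resets the coordinate to the corresponding entry of the addition vector, while an entry of 1 adds that entry to the old value, so that one step maps a configuration u to the pointwise product of r and u, plus a. The names `vasr_step`, `successors`, and the example system `V` are ours, introduced only for illustration.

```python
from fractions import Fraction


def vasr_step(state, transition):
    """Apply one transition (reset, add): coordinate i becomes reset[i] * state[i] + add[i]."""
    reset, add = transition
    return tuple(r * u + a for r, u, a in zip(reset, state, add))


def successors(state, vasr):
    """All configurations reachable from `state` in exactly one step."""
    return {vasr_step(state, t) for t in vasr}


# A hypothetical two-dimensional VASR with two transitions.
V = [
    ((1, 1), (Fraction(1), Fraction(-1, 2))),  # add 1 to dim 0, subtract 1/2 from dim 1
    ((1, 0), (Fraction(0), Fraction(5))),      # leave dim 0 unchanged, reset dim 1 to 5
]

if __name__ == "__main__":
    u = (Fraction(0), Fraction(3))
    for v in sorted(successors(u, V)):
        print(u, "->", v)
```

Using `Fraction` keeps the arithmetic exact, which matches the rational state space; floating point would blur the distinction between a reset to a constant and an addition that happens to land near it.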
Definition 3
A d-dimensional rational vector addition system with resets and states (VASRS), , is a finite set of states, , together with a finite set of edges, . defines a transition system where

.

The reachability relation of a VASRS is definable in Presburger arithmetic.
Theorem 2.1 ([8])
There is a polytime algorithm which, given a dimensional VASRS, , computes a transition formula such that for all for some states if and only if .
VASRs are a special case of VASRSs with a single state and so this theorem applies to VASRs as well.
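To give a feel for the shape of such formulas, consider the special case of a VASR with no resets, that is, one whose transitions only add vectors a_1, ..., a_n (the names a_i, k_i, u, and v are ours). Additions commute, and rational VASR place no sign constraint on configurations, so v is reachable from u exactly when some choice of repetition counts accounts for the difference:

```latex
\exists k_1, \dots, k_n \in \mathbb{N}.\quad v = u + \sum_{i=1}^{n} k_i \, a_i
```

Resets and control states break this commutativity, since the order of transitions then matters; handling them requires the construction of [8], which this sketch does not attempt to reproduce.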
3 Approximating loops with vector addition systems
In this section, we describe a method for overapproximating the transitive closure of a transition formula using a VASR. This procedure immediately extends to computing summaries for programs using the method outlined in Section 1.1.
The core algorithmic problem that we answer in this section is: given a transition formula, how can we synthesize a best abstraction of that formula's dynamics as a VASR? We begin by formalizing the problem: in particular, we define what it means for a VASR to simulate a transition formula and what it means for an abstraction to be “best.”
Definition 4
Let and be transition systems operating over rational vector spaces. A linear simulation from to is a linear transformation such that for all in which , we have . We use to denote that is a linear simulation from to .
In particular, if is an transition formula, V is a dimensional VASR, and is a matrix, then exactly when , where
is a transition formula representing the transitions that simulates under the transformation . The key property of simulations is that if , then (the reachability relation of under the inverse image of ) overapproximates the transitive closure of .
Our task is to synthesize a linear transformation and a VASR such that . We call a pair , consisting of a rational matrix , and a dimensional VASR a VASR abstraction; we say that is the concrete dimension of and is the abstract dimension. We say that is a VASR abstraction of if . A transition formula has many VASR abstractions, and so we are interested in comparing their precision. We define a preorder on VASR abstractions, where iff there exists a linear transformation such that and ( and are the abstract dimensions of and , respectively).
Thus, our problem can be stated as follows: given a transition formula , synthesize a VASR abstraction of such that is best in the sense that we have for any VASR abstraction of . A solution to this problem is given in Algorithm 1.
Algorithm 1 follows the familiar pattern of an AllSat-style loop. We begin with an empty VASR abstraction (), and build the abstraction up to overapproximate all possible behaviors of iteratively. The formula maintains the set of transitions that are allowed by but not simulated by the current VASR abstraction. Each abstraction round proceeds as follows: First, we sample a model of (i.e., a transition that is allowed by but not simulated by ). We then generalize that transition to a set of transitions by using to select a cube of the DNF of that contains . We then compute a VASR abstraction of , using the procedure described in Section 3.1. We combine this VASR abstraction with the current one () by computing a least upper bound (in order), using the procedure described in Section 3.2. Finally, we block any transition in from being sampled again by conjoining to . The loop terminates when is unsatisfiable, in which case we have that .
Theorem 3.1 ()
3.1 Abstracting conjunctive transition formulas
In this section, we show how to compute a VASR abstraction for a consistent conjunctive formula. The intuition is that, since is a convex theory, the best VASR abstraction consists of a single transition. (For formulas, our procedure produces a VASR abstraction that is not guaranteed to be best, precisely because is not convex.)
Let be a formula that is consistent and conjunctive. Observe that the set , which represents linear combinations of variables that are reset across , forms a vector space. Similarly, the set representing linear combinations of variables that are incremented across forms a vector space, . We compute bases for both and , and respectively. We then define to be the VASR abstraction , where
Example 1
Let be the formula , where is a Skolem constant. The vector space of resets has basis (representing that is reset to 1). The vector space of increments has basis (representing that increases by 2 and decreases by 1). A best abstraction of is thus the threedimensional VASR , with simulation matrix . In particular, notice that since the variable is both incremented and reset, it is represented by two different dimensions in .
Proposition 1 ()
For any consistent, conjunctive transition formula , is a VASR abstraction of . If is expressed in , is best.
3.2 Least upper bound
In this section, we show how to compute least upper bounds w.r.t. the order. Given two VASR abstractions and , their least upper bound is a VASR abstraction such that (1) (2) , and (3) for any VASR abstraction satisfying (1) and (2), we have .
Supposing that and , there must exist linear simulations and such that , , and . The intuition behind our approach is that we will compute and , and derive VASR as the union of the image of under and the image of under . Computation of and relies on (1) the constraints on and induced by the expected equation (2) the fact that if is a linear simulation from to any other VASR, then must satisfy a certain structural property. This property is called coherence, as defined in the following.
Definition 5
Let be a dimensional VASR. are coherent dimensions of if for all transitions (i.e., every transition of that resets also resets and vice versa). denotes that and are coherent dimensions of . Observe that forms an equivalence relation on . We refer to the equivalence classes of as coherence classes.
A row vector is coherent with respect to if and only if for all , and implies . Equivalently, is coherent if there is some coherence class and some row vector such that . If is nonzero then the coherence class is uniquely determined; in this case we use to denote .
A matrix is coherent with respect to if and only if each of its rows is coherent with respect to .
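Coherence classes are straightforward to compute: two dimensions are coherent exactly when they have the same reset bit in every transition. The sketch below groups dimensions by that per-transition signature and tests whether a row vector is coherent. It assumes a VASR is represented as a list of (reset, add) pairs with 0 meaning reset, and the names `coherence_classes` and `is_coherent_row` are ours.

```python
def coherence_classes(vasr, dim):
    """Partition dimensions 0..dim-1 by their reset bits across all transitions."""
    groups = {}
    for i in range(dim):
        signature = tuple(reset[i] for reset, _add in vasr)
        groups.setdefault(signature, []).append(i)
    return sorted(groups.values())


def is_coherent_row(row, classes):
    """A row is coherent if all of its nonzero entries lie in a single coherence class."""
    support = {i for i, x in enumerate(row) if x != 0}
    return any(support <= set(c) for c in classes)


# A hypothetical three-dimensional VASR: dimensions 0 and 1 are always reset together.
V = [
    ((0, 0, 1), (1, 2, 3)),  # resets dims 0 and 1, increments dim 2
    ((1, 1, 0), (0, 0, 7)),  # keeps dims 0 and 1, resets dim 2
]

if __name__ == "__main__":
    classes = coherence_classes(V, 3)
    print(classes)
    print(is_coherent_row((2, -1, 0), classes))
    print(is_coherent_row((1, 0, 1), classes))
```

The zero row counts as coherent here, since the condition on pairs of nonzero entries holds vacuously.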
Lemma 1 ()
Let be a dimensional VASR, let be an dimensional VASR, and let be a linear transformation such that . Then is coherent with respect to .
For a dimensional VASR, , and a linear transformation, , that is both coherent with respect to and has no zero rows, there is a unique dimensional VASR, , such that and is minimal in the inclusion order; we use to denote this VASR. More explicitly, is defined as
where is the reset vector translated along , defined as the dimensional vector with for some arbitrary representative . (Recall that since is coherent and nonzero it is associated with a unique coherence class , and that we must have for all , so the choice of representative is irrelevant.) The intuition behind is that each row of corresponds to a unique coherence class of , and either all the dimensions in are reset (in which case we take ) or none of them are (in which case we take ). Observe that for any , . The following lemma gives an intuitive characterization of image.
Lemma 2 ()
Let be a dimensional VASR and let be a matrix that is coherent with respect to . For all , iff .
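The image construction can be sketched concretely. The code below assumes the same (reset, add) representation with 0 meaning reset, and a matrix `T` given as a list of rows; for each transition it reads off one reset bit per row from the dimensions in that row's support and maps the addition vector through `T`. The names `image`, `apply`, and `vasr_step` are ours, and the final check is only a spot test of the commuting property on one configuration, not a proof.

```python
from fractions import Fraction


def vasr_step(state, transition):
    """One VASR step: coordinate i becomes reset[i] * state[i] + add[i]."""
    reset, add = transition
    return tuple(r * u + a for r, u, a in zip(reset, state, add))


def apply(T, vec):
    """Matrix-vector product, with T given as a list of rows."""
    return tuple(sum(x * v for x, v in zip(row, vec)) for row in T)


def image(vasr, T):
    """Translate each transition of `vasr` along T (T must be coherent, with no zero rows)."""
    out = []
    for reset, add in vasr:
        new_reset = []
        for row in T:
            # Reset bits of the dimensions this row depends on; they must all agree.
            bits = {reset[j] for j, x in enumerate(row) if x != 0}
            if len(bits) != 1:
                raise ValueError("each row of T must be nonzero and coherent")
            new_reset.append(bits.pop())
        out.append((tuple(new_reset), apply(T, add)))
    return out


# Hypothetical example: dims 0 and 1 form one coherence class, dim 2 another.
V = [
    ((0, 0, 1), (Fraction(1), Fraction(2), Fraction(3))),
    ((1, 1, 0), (Fraction(0), Fraction(0), Fraction(7))),
]
T = [(1, 1, 0), (0, 0, 2)]

if __name__ == "__main__":
    W = image(V, T)
    u = (Fraction(5), Fraction(6), Fraction(9))
    for t, w in zip(V, W):
        # Stepping in V and then applying T agrees with applying T and stepping in W.
        assert apply(T, vasr_step(u, t)) == vasr_step(apply(T, u), w)
    print(W)
```

A row whose support mixes two coherence classes is rejected as soon as some transition resets one class but not the other, which is exactly the situation in which no single reset bit could describe that row.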
Before describing our least upper bound algorithm, we must define a technical condition that is both assumed and preserved by the procedure:
Definition 6
A VASR abstraction is normal if there is no nonzero vector that is coherent with respect to such that (i.e., the rows of that correspond to any coherence class of are linearly independent).
Intuitively, a VASR abstraction that is not normal contains information that is either inconsistent or redundant.
We now present Algorithm 2, our algorithm for computing least upper bounds of VASR abstractions. Let and be VASR abstractions. Our goal is to find two matrices and such that (1) , (2) is coherent w.r.t. , and (3) is coherent w.r.t. . We find the best such and iteratively. For each pair of coherence classes of and of , we compute matrices and such that (i) , (ii) , (iii) , and (iv) are maximal, in the sense that the rows of form a basis for a vector space that contains the rowspace of any matrix such that and satisfy (i)-(iii). We form and simply by collecting all such and . Properties (1)-(3) together ensure that the VASR abstraction where and is an upper bound on and . The fact that and are constructed from matrices that satisfy (iv) over all pairs of coherence classes ensures that is any other upper bound.
Proposition 2 ()
Let and be normal VASR abstractions of equal concrete dimension. Then the VASR abstraction computed by Algorithm 2 is normal and is a least upper bound on and .
4 Control Flow and VASRS
In this section, we give a method for improving the precision of our loop summarization technique by using VASRS, VASR extended with control states. While VASRs overapproximate control flow using nondeterminism, VASRSs can encode patterns such as oscillating and multiphase loops. Section 5 demonstrates that the ability to analyze such patterns greatly increases the accuracy of loop summaries for some loops.
We begin with an example that demonstrates the precision gained by VASRS. The loop in Figure 2 oscillates between (1) incrementing variable by and (2) incrementing both variables and by . Suppose that we wish to prove that, starting with the configuration , the loop maintains the invariant that . The (best) VASR abstraction of the loop, pictured in Figure 2, overapproximates the control flow of the loop by treating the conditional branch in the loop as a nondeterministic branch. This overapproximation may violate the invariant by repeatedly executing the path where both variables are incremented. On the other hand, the VASRS abstraction of the loop pictured in Figure 2 captures the understanding that the loop must oscillate between the two paths. The loop summary obtained from the reachability relation of this VASRS is powerful enough to prove the invariant holds (under the precondition ).
4.1 Technical details
Definition 7
An predicate VASRS is a VASRS, , such that each control state is a predicate over the variables and the predicates in are pairwise inconsistent (for all , is unsatisfiable).
We extend linear simulations to predicate VASRS as follows:

Let be an state transition formula and let be an predicate VASRS of dimension . We say that a linear transformation is a linear simulation from to if for all such that , (1) there is a (unique) such that , (2) there is a (unique) such that , and (3) .

Let and be predicate VASRS of dimensions and , respectively. We say that a linear transformation is a linear simulation from to if for all and for all such that , there exists (unique) such that (1) , (2) , and (3)
Observe that if has a linear simulation to , then must be finer than in the sense that (1) for each there is a (unique) such that .
We define a VASRS abstraction to be a pair consisting of a rational matrix and an predicate VASRS of dimension . We extend the simulation preorder to VASRS abstractions in the natural way. Extending the definition of “best” abstractions requires more care, since we can always find a “better” VASRS abstraction (strictly smaller in order) by using a finer state partition. However, if we consider only predicate VASRS that share the same set of control states, then best abstractions do exist and can be computed using Algorithm 3.
Algorithm 3 works as follows. First, for each pair of formulas , compute a best VASR abstraction of the formula (where denotes with unprimed variables replaced by primed ones) and call it . overapproximates the transitions of that begin in a program state satisfying and end in a program state satisfying . Second, we compute the least upper bound of all VASR abstractions to get a best VASR abstraction for . Computing the least upper bound has the effect of reconciling the VASR abstractions corresponding to different edges in the VASRS, but does not maintain the provenance of the VASR transitions (i.e., which transformers correspond to which edges). To reconstruct provenance, we compute the linear simulation from to , and compute the edges from to as the image of under .
Proposition 3
Given a transition formula and control states , Algorithm 3 computes the best predicate VASRS abstraction of with control states .
We now describe Algorithm 4, which uses VASRS to overapproximate the transitive closure of transition formulas. Towards our goal of predictable program analysis, we desire our analysis to be monotone in the sense that if and are transition formulas such that entails , then the overapproximate transitive summary of entails the overapproximate transitive summary of . The key property we desire in a procedure for generating control state predicates is monotonicity: if , then the control states of should be at least as fine as the control states of . We can achieve this by taking the set of control states of to be the set of topologically connected regions of (lines 44). Unfortunately, this set of predicates fails the contract of abstractVASRS, because there may exist a transition such that . As a result, does not necessarily approximate ; however, it does overapproximate . An overapproximation of the transitive closure of can easily be obtained from (the overapproximation of the transitive closure of obtained from the VASRS abstraction ()) by sequentially composing with the identity relation or (line 4).
Precision improvements
The abstractVASRS algorithm uses predicates to infer the control structure of a VASRS, but after computing the VASRS abstraction, iterVASRS makes no further use of the predicates (i.e., the predicates are irrelevant in the computation of ). Predicates can be used to improve iterVASRS as follows. The reachability relation of a VASRS is expressed by a formula that uses auxiliary variables to represent the state at which the computation begins and ends [8]. These variables can be used to encode that the prestate of the transitive closure must satisfy the predicate corresponding to the begin state and the poststate must satisfy the predicate corresponding to the end state. As an example, consider Figure 2 and suppose that we wish to prove the invariant under the precondition . While this invariant holds, we cannot prove it because there is a counterexample if the computation begins at . By applying the above improvement, we can prove that the computation must begin at , and the invariant is verified.
5 Evaluation
The goal of our evaluation is to answer the following questions:

Are VASRs sufficiently expressive to generate accurate loop summaries?

Does the VASRS technique improve upon the precision of VASR?

Are the VASR/VASRS loop summarization algorithms performant?
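| suite | total | VASR #safe | VASR time | VASRS #safe | VASRS time | CRA #safe | CRA time | SeaHorn #safe | SeaHorn time | UltAuto #safe | UltAuto time |
|---|---|---|---|---|---|---|---|---|---|---|---|
| C4B | 35 | 21 | 22.1 | 31 | 28.6 | 27 | 23.4 | 25 | 3020.0 | 25 | 3053.8 |
| HOLA | 46 | 32 | 41.3 | 39 | 63.1 | 40 | 38.3 | 39 | 2116.0 | 38 | 2655.1 |
| svcomp-int | 69 | 59 | 40.6 | 66 | 160.8 | 64 | 53.1 | 61 | 2132.6 | 56 | 4856.1 |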
We implemented our loop summarization procedure and the compositional whole-program summarization technique described in Section 1.1. We ran on a suite of 149 benchmarks, drawn from the C4B [2] and HOLA [4] suites, as well as the safe, integer-only benchmarks in the loops category of SV-COMP 2016 [17]. We ran each benchmark with a timeout of 5 minutes, and recorded how many benchmarks were proved safe by our VASR-based technique and our VASRS-based technique. For context, we also compare with CRA [12] (a related loop summarization technique), as well as SeaHorn [7] and UltimateAutomizer [9] (state-of-the-art software model checkers). The results are shown in Figure 3.
The number of assertions proved correct using VASR is comparable to both SeaHorn and UltimateAutomizer, demonstrating that VASR can indeed model interesting loop phenomena. VASRS-based summarization significantly improves precision, proving the correctness of 91% of assertions in the suite, and more than any other tool.
VASR-based summarization is the most performant of all the compared techniques, followed by CRA and VASRS. SeaHorn and UltimateAutomizer employ abstraction-refinement loops, and so take significantly longer to run the test suite.
6 Related work
Compositional analysis
Our analysis follows the same high-level structure as compositional recurrence analysis (CRA) [5, 12]. Our analysis differs from CRA in the way that it summarizes loops: we compute loop summaries by overapproximating loops by vector addition systems and computing reachability relations, whereas CRA computes loop summaries by extracting recurrence relations and computing closed forms. The advantage of our approach is that we can use VASR to accurately model multi-path loops and can make theoretical guarantees about the precision of our analysis; the advantage of CRA is its ability to generate non-linear invariants.
Vector addition systems
Our invariant generation method builds upon Haase and Halfon’s polytime procedure for computing the reachability relation of integer vector addition systems with states and resets [8]. Generalization from the integer case to the rational case is straightforward. Continuous Petri nets [3] are a related generalization of vector addition systems, where time is taken to be continuous (VASR, in contrast, have rational state spaces but discrete time). Reachability for continuous Petri nets is polytime [6] and transitive closure is definable in linear arithmetic [1].
Sinn et al. present a technique for resource bound analysis which is related to our loop summarization procedure [16]. Sinn et al.’s method is based on computing a lossy vector addition system with states that simulates a piece of code, proving termination of the VASS, and then extracting resource bounds from the ranking function. Our method differs in several respects. First, Sinn et al. model programs using vector addition systems with states over the natural numbers, which enables them to use termination bounds for VASS to compute upper bounds on resource usage. We use VASS with resets over the rationals, which (in contrast to VASS) have a Presburger-definable reachability relation, enabling us to summarize loops. Moreover, Sinn et al.’s method for extracting VASS models of programs is heuristic, whereas our method gives precision guarantees.
Symbolic abstraction
The main contribution of this paper is a technique for synthesizing the best abstraction of a transition formula expressible in the language of VASR (with or without states). This is closely related to the symbolic abstraction problem, which computes the best abstraction of a formula within an abstract domain. The problem of computing best abstractions has been undertaken for finite-height abstract domains [15], template constraint matrices (including intervals and octagons) [13], and polyhedra [19, 5]. Our best abstraction result differs in that (1) it is for a disjunctive domain and (2) the notion of “best” is based on simulation rather than the typical order-theoretic framework.
References
 [1] M. Blondin, A. Finkel, C. Haase, and S. Haddad. Approaching the coverability problem continuously. In TACAS, pages 480–496, 2016.
 [2] Q. Carbonneaux, J. Hoffmann, and Z. Shao. Compositional certified resource bounds. In PLDI, 2015.
 [3] R. David and H. Alla. Continuous Petri nets. In Proc. 8th Eur. Workshop Applic. Theory Petri Nets, pages 275–294, 1987.
 [4] I. Dillig, T. Dillig, B. Li, and K. McMillan. Inductive invariant generation via abductive inference. In OOPSLA, 2013.
 [5] A. Farzan and Z. Kincaid. Compositional recurrence analysis. In FMCAD, 2015.
 [6] E. Fraca and S. Haddad. Complexity analysis of continuous petri nets. Fundam. Inf., 137(1):1–28, Jan. 2015.
 [7] A. Gurfinkel, T. Kahsai, A. Komuravelli, and J. Navas. The SeaHorn verification framework. In CAV, 2015.
 [8] C. Haase and S. Halfon. Integer vector addition systems with states. In Reachability Problems, pages 112–124, 2014.
 [9] M. Heizmann, Y. Chen, D. Dietsch, M. Greitschus, J. Hoenicke, Y. Li, A. Nutz, B. Musa, C. Schilling, T. Schindler, and A. Podelski. Ultimate Automizer and the search for perfect interpolants (competition contribution). In TACAS, pages 447–451, 2018.
 [10] R. M. Karp and R. E. Miller. Parallel program schemata. J. Comput. Syst. Sci., 3(2):147–195, May 1969.
 [11] Z. Kincaid, J. Breck, A. Forouhi Boroujeni, and T. Reps. Compositional recurrence analysis revisited. In PLDI, 2017.
 [12] Z.