## 1 Overview

2LS is a static analysis and verification tool for sequential C programs. At its core, it uses the kIkI algorithm (k-invariants and k-induction) [1], which integrates bounded model checking, k-induction, and abstract interpretation into a single, scalable framework. kIkI relies on incremental SAT solving to find proofs and refutations of assertions, as well as to perform termination analysis [2].

This year’s competition version introduces new product and power domain combinations to support invariant inference for programs that manipulate shape and content of dynamic data structures [4]. Moreover, there is an improved encoding of memory safety properties.

#### Architecture.

The architecture of 2LS has been described in previous
competition contributions [6, 5].
In brief, 2LS is built upon the CPROVER
infrastructure [3] and thus uses
*GOTO programs* as the internal program representation.
The analysed program is
translated into an acyclic,
over-approximate static single assignment (SSA) form, in which loops
are cut at the edges returning to the loop head. Subsequently, 2LS
refines this over-approximation by computing inductive invariants in
various abstract domains represented by parametrised logical formulae,
so-called templates [1]. The competition version uses
the zones domain for numerical variables in combination with
our shape domain for pointer-typed variables and the symbolic paths
domain described below.
The SSA form is bit-blasted into a propositional formula and given to a
SAT solver. The kIkI algorithm then incrementally amends the formula
to perform loop unwindings and invariant inference based on
template-based synthesis [1].

## 2 New Features

The major improvements for SV-COMP’19 are all related to the analysis of heap-manipulating programs. We build on the shape domain presented last year [5] and introduce abstract domain combinations that allow us to analyse both the shape and the content of dynamic data structures. Furthermore, we present an encoding of the assertions used for verifying memory safety properties.

### 2.1 Combinations of Abstract Domains

The new capability of 2LS to jointly analyse the shape and content of dynamic data structures takes advantage of its template-based synthesis engine. Invariants are computed in several abstract domains, each of which has the form of a template, and the analysis engine handles the domain combinators.

#### Memory model

In our memory model, we represent dynamically allocated objects by
so-called *abstract dynamic objects*. Each such object is an
abstraction of a number of concrete dynamic objects allocated by the
same malloc call (i.e. at the same program location) [4].

#### Shape Domain

For analysing the shape of the heap, we use an improved version of the
shape domain that we introduced last year [5]. The domain
over-approximates the *points-to* relation between pointers and
symbolic addresses of memory objects in the analysed program: for each
pointer $p$ (a pointer-typed variable or a pointer-typed field of an
abstract dynamic object), we compute the set of all addresses that $p$
may point to [4].

#### Template Polyhedra Domain

For analysing numerical values, we use the template polyhedra abstract
domains, particularly the *interval* and the *zones*
domains [1].
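As a rough illustration of the two kinds of constraints, the sketch below shows a loop whose final assertion needs both an interval bound on `x` and a zones-style bound on the difference `y - x`. The function name `zones_demo` and the constants `LIMIT` and `OFFSET` are assumptions made for this example and do not come from the 2LS benchmarks; the assertions are checked at run time here, whereas a verifier would have to infer the invariants statically.

```c
#include <assert.h>

#define LIMIT  10  /* hypothetical loop bound */
#define OFFSET 5   /* hypothetical constant difference between y and x */

/* Runs the loop and returns the final value of y. */
int zones_demo(void)
{
    int x = 0;
    int y = OFFSET;

    while (x < LIMIT) {
        /* interval invariant: constant bounds on a single variable */
        assert(0 <= x && x <= LIMIT);
        /* zones invariant: constant bounds on a difference of variables */
        assert(y - x <= OFFSET && x - y <= -OFFSET);
        x++;
        y++;
    }

    /* follows from x == LIMIT at loop exit plus the zones invariant */
    assert(y == LIMIT + OFFSET);
    return y;
}
```

The interval domain alone can bound `x` and `y` separately but cannot relate them, which is why the last assertion relies on the difference constraint.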

#### Shape and Polyhedra Domain Combination

Since both domains have the form of a template formula, we simply use them side-by-side in a product domain combination—the resulting formula is a conjunction of the two template formulae [4].

This combination allows 2LS to infer, e.g., invariants describing an unbounded singly-linked list whose nodes contain values between 1 and 10. We show an example of such a list in Figure 1. Here, all list nodes are abstracted by a single abstract dynamic object (i.e. we assume that they are all allocated at the same program location). The invariant inferred by 2LS for such a list might look as follows:

$$(ao.next = \&ao \;\lor\; ao.next = \text{null}) \;\land\; (1 \leq ao.val \leq 10) \tag{1}$$

where $ao$ denotes the abstract dynamic object representing the list nodes
and $val$ denotes the field holding the value of a node.
The first conjunct, a disjunction, describes the shape of the list: the *next* field
of each node points to some node of the list or to null. Here,
$ao.next$ is an abstraction of the *next* fields of all concrete objects
represented by $ao$. Analogously, $\&ao$ is an
abstraction of the symbolic addresses of all represented objects. The second
conjunct is an invariant in the interval domain over all
values stored in the list: it expresses the fact that the value of each node
lies in the interval between 1 and 10.
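A program of the kind described above can be sketched as follows. This is a minimal example written for illustration, not one of the contributed benchmarks; the names `struct node`, `val`, `build_list`, `check_values`, and `free_list` are assumptions. All nodes are allocated by a single `malloc` call, so they would be summarised by one abstract dynamic object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct node {
    int val;            /* hypothetical content field */
    struct node *next;
};

/* Frees every node of the list and returns how many were freed. */
unsigned free_list(struct node *head)
{
    unsigned count = 0;
    while (head != NULL) {
        struct node *next = head->next;
        free(head);
        head = next;
        count++;
    }
    return count;
}

/* Builds a list of up to n nodes whose values cycle through 1..10.
 * All nodes come from the single malloc call below; the loop stops
 * early if malloc fails. */
struct node *build_list(unsigned n)
{
    struct node *head = NULL;
    for (unsigned i = 0; i < n; i++) {
        struct node *nd = malloc(sizeof *nd);
        if (nd == NULL)
            break;
        nd->val = (int)(i % 10) + 1;
        nd->next = head;
        head = nd;
    }
    return head;
}

/* Walks the list and asserts the content part of the invariant. */
void check_values(const struct node *head)
{
    for (const struct node *it = head; it != NULL; it = it->next)
        assert(1 <= it->val && it->val <= 10);
}
```

Proving the assertion in `check_values` for a list of arbitrary length requires knowing both that `next` stays within the list (or is null) and that every stored value lies in the interval, which is the joint shape and content reasoning discussed here.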

### 2.2 Symbolic Paths

To improve precision of the analysis, we let 2LS compute different
invariants for different *symbolic paths* taken by the analysed
program. We require a symbolic path to express which loops were
executed at least once. This allows us to distinguish situations in which
an abstract dynamic object does not represent any actually allocated
object, and hence the invariant for that abstract dynamic object is not
valid [4].

The symbolic path domain allows us to iteratively compute a set of symbolic paths $\pi_1, \ldots, \pi_n$ (represented by guard variables in the SSA) with associated shape and data invariants $Inv_1, \ldots, Inv_n$. The aggregated invariant is then $\bigwedge_{i=1}^{n} (\pi_i \Rightarrow Inv_i)$, which corresponds to a power domain combination.
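The need for path-specific invariants can be seen in a small sketch where the allocating loop may run zero times. The function `paths_demo` and the flag `node_allocated` are hypothetical; the flag only loosely mimics a guard variable, and the two assertions mimic one invariant per path, each guarded by the path it belongs to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct cell {
    int val;
    struct cell *next;
};

/* Allocates up to n cells, checks a path-specific invariant,
 * then frees the cells and returns how many were freed. */
unsigned paths_demo(unsigned n)
{
    struct cell *head = NULL;
    int node_allocated = 0;   /* set once a cell has actually been allocated */

    for (unsigned i = 0; i < n; i++) {
        struct cell *c = malloc(sizeof *c);
        if (c == NULL)
            break;
        node_allocated = 1;
        c->val = 1;
        c->next = head;
        head = c;
    }

    if (!node_allocated) {
        /* path without any allocation: no cell exists, so nothing
         * can be said about cell contents */
        assert(head == NULL);
    } else {
        /* path with at least one allocation: the content invariant applies */
        assert(head != NULL && head->val == 1);
    }

    unsigned count = 0;
    while (head != NULL) {
        struct cell *next = head->next;
        free(head);
        head = next;
        count++;
    }
    return count;
}
```

A single invariant covering both paths would have to be weak enough to hold when no cell was ever allocated, which is the precision loss that separate per-path invariants avoid.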

### 2.3 Memory Safety

To verify memory safety, appropriate assertions are inserted into the program. We now describe the structure of these assertions for different types of memory errors.

#### Dereferencing/Freeing a null Pointer

To check for this kind of error, we add an assertion $p \neq \text{null}$ to each location where `*p` or `free(p)` occurs [4]. Since the shape domain over-approximates the set of all addresses that $p$ may point to, the absence of such errors can be proven. If a possible error is found, we use BMC to check its reachability.
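The effect of this instrumentation can be pictured with a hand-written sketch, assuming the inserted assertion is a null check on the pointer. The function `null_check_demo` is hypothetical, and the assertions are written out manually here, whereas 2LS would add them internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stores value in a heap cell, reads it back, and frees the cell.
 * Returns the value read, or -1 if the allocation failed. */
int null_check_demo(int value)
{
    int *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;      /* without this guard the assertions below could fail */

    assert(p != NULL);  /* check placed before the dereference *p */
    *p = value;

    assert(p != NULL);  /* check placed before the dereference *p */
    int result = *p;

    assert(p != NULL);  /* check placed before free(p) */
    free(p);
    return result;
}
```

If the guard after `malloc` were removed, the set of addresses that `p` may point to would include null and the first assertion could no longer be proven.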

#### Dereferencing/Freeing a Freed Pointer

Using a single abstract dynamic object to represent multiple concrete
objects poses problems when trying to determine if a particular
concrete object (within the abstract one) has already been freed or
not. To resolve this, for each abstract dynamic object,
we non-deterministically select a single concrete object that it
represents and materialize it as a separate object. After that,
every time the object is freed, we non-deterministically set a
special variable to *true*. This allows us to
generate a corresponding assertion for each location
containing `*p` or `free(p)` [4].

#### Memory Leaks

To find a memory leak, we check whether, at the end of the program, there is a materialized object that has not been freed. If such an object exists, a memory leak is present (some object represented by the corresponding abstract dynamic object has not been freed). On the other hand, the absence of memory leaks can only be proven for programs without loops (or with loops that can be fully unwound). This is because a check on a single materialized object is not sufficient to prove that all objects represented by the corresponding abstract object were freed [4].
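The idea of following one selected object with a "freed" flag can be imitated at run time. The sketch below is only an analogue under several assumptions: the caller picks the tracked allocation through `track_this` (2LS selects it non-deterministically), the flag is a plain boolean, and the tracked block is kept in quarantine after `checked_free` so that its address cannot be reused. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Shadow state for ONE tracked ("materialized") allocation. */
static void *tracked = NULL;
static bool tracked_freed = false;

/* Allocates memory; the first successful call with track_this set
 * becomes the tracked object. */
void *tracked_malloc(size_t size, bool track_this)
{
    void *p = malloc(size);
    if (p != NULL && track_this && tracked == NULL) {
        tracked = p;
        tracked_freed = false;
    }
    return p;
}

/* Check placed before a dereference: no use after free. */
void checked_use(const void *p)
{
    assert(!(p != NULL && p == tracked && tracked_freed));
}

/* Check placed before free(p): no double free of the tracked object. */
void checked_free(void *p)
{
    if (p == NULL)
        return;
    if (p == tracked) {
        assert(!tracked_freed);
        tracked_freed = true;   /* block stays quarantined until the final check */
        return;
    }
    free(p);
}

/* End-of-program check: reports a leak if the tracked object was never
 * freed, then releases the block and resets the shadow state. */
bool end_of_program_leak(void)
{
    bool leak = (tracked != NULL) && !tracked_freed;
    free(tracked);
    tracked = NULL;
    tracked_freed = false;
    return leak;
}
```

Because only one object is followed, a negative answer from `end_of_program_leak` says nothing about the untracked allocations, which mirrors the limitation stated above for proving the absence of leaks.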

## 3 Strengths and Weaknesses

This year’s improvements mostly influenced results in the MemSafety category where 2LS narrowly missed the podium in 4th place. There were many new benchmarks and a new sub-category (MemCleanup) whose benchmarks were handled well by 2LS.

One of the main strengths of 2LS is the verification of programs requiring joint reasoning about the shape and content of dynamic data structures. There were no such benchmarks in previous SV-COMP editions, so we contributed 10 of our own. Combining our shape domain with the zones domain for value analysis allows 2LS to successfully verify 9 of these 10 benchmarks (the remaining one timed out). None of the other tools was able to verify more than 3 of these benchmarks.

Still, many challenges and limitations remain. The heap domain is quite simple and over-approximates the heap too coarsely to allow us to analyse complicated properties of dynamic data structures. Moreover, reasoning about array contents is still lacking, and the kIkI algorithm of 2LS does not yet support recursion. Finally, there is a large number of unconfirmed witnesses, especially in the termination analysis (500 points lost).

## 4 Tool Setup

The competition submission is based on 2LS version 0.7 (executable available at
https://gitlab.com/sosy-lab/sv-comp/archives-2019).
The archive contains the binaries needed to run 2LS
(2ls-binary, goto-cc), so no further installation is needed.
There is also a wrapper script 2ls, which is used by BenchExec
to run the tool over the verification benchmarks. See the wrapper
script also for the relevant command line options given to 2LS.
Further information about the contents of the archive can be found in the
README file. The tool info module for 2LS is called
two_ls.py and the benchmark definition file is called
2ls.xml.
As a back end, the competition submission of 2LS uses Glucose 4.0.
2LS competes in all categories except Concurrency and Java.

## 5 Software Project

2LS is maintained by Peter Schrammel with pull requests contributed
by the community (see https://github.com/diffblue/2ls/graphs/contributors).
It is publicly available under a BSD-style license.
The source code is available at http://www.github.com/diffblue/2ls.

## References

- [1] Brain, M., Joshi, S., Kroening, D., Schrammel, P.: Safety Verification and Refutation by k-Invariants and k-Induction. In: SAS. LNCS, vol. 9291, pp. 145–161. Springer (2015)
- [2] Chen, H.Y., David, C., Kroening, D., Schrammel, P., Wachter, B.: Bit-Precise Procedure-Modular Termination Proofs. TOPLAS 40 (2017)
- [3] Clarke, E.M., Kroening, D., Lerda, F.: A tool for checking ANSI-C programs. In: TACAS. LNCS, vol. 2988, pp. 168–176. Springer (2004)
- [4] Malík, V., Hruska, M., Schrammel, P., Vojnar, T.: Template-based verification of heap-manipulating programs. In: FMCAD. pp. 103–111 (2018)
- [5] Malík, V., Martiček, Š., Schrammel, P., Srivas, M., Vojnar, T., Wahlang, J.: 2LS: Memory Safety and Non-termination (Competition Contribution). In: TACAS. pp. 417–421. Springer (2018)
- [6] Schrammel, P., Kroening, D.: 2LS for Program Analysis (Competition Contribution). In: TACAS. LNCS, vol. 9636, pp. 905–907. Springer (2016)
