1 Introduction
Symbolic methods are commonly used in program analysis, verification and related disciplines. Symbolic execution has found numerous use cases in test generation and concolic testing and is widely deployed in practice. Likewise, many modern software verification tools are based on bounded model checking, which combines symbolic execution with SMT solvers to successfully attack hard problems in their problem domain.
On one hand, multiple production-quality SMT solvers are readily available and even provide a common interface [Barrett et al., 2016]. While a certain degree of integration is required to achieve optimal performance, solvers have attained nearly commodity status. This is in stark contrast to symbolic interpretation, which is usually implemented ad hoc and is not reusable across tools at all. The only exception may be KLEE [Cadar et al., 2008], a symbolic interpreter for LLVM bitcode, which is used as a backend by a few analysis tools. Undoubtedly, the fact that it is based on the (ubiquitous) LLVM intermediate language has helped it foster wider adoption.
Arguably, interpreters (virtual machines) for controlled program execution, as required by dynamic analysis tools, are already complex enough, without involving symbolic computation. To faithfully interpret real-world programs, many features are required, including an efficient memory representation, support for threads and exceptions, and a mechanism to deal with system calls. Complexity is, however, undesirable in any system and even more so in verification tools.
For these reasons, we propose to lift symbolic computation into a separate, self-contained module with minimal interfaces to the rest of the verification or analysis system (see Figure 1). The best way to achieve this is to make it compilation-based, that is, provide a transformation that turns ordinary (explicit) programs into symbolic programs automatically. The transformed program only uses explicit operations, but it uses them to manipulate symbolic expressions and as a result can be executed using off-the-shelf components.
The expected result is that the proposed transformation can be combined with an existing solver and a standard explicit interpreter of LLVM bitcode. Depending on how one combines those ingredients, one obtains different analysis tools. As an example, in Section 5.3, we use the transformation, an existing explicit-state model checker and the SMT solver STP [Ganesh and Dill, 2007] to build a simple control-explicit, data-symbolic (CEDS) [Bauch et al., 2016] model checker. Building a tool which implements symbolic execution would be even simpler. (In fact, a control-explicit, data-symbolic model checker already contains a subroutine, in our case about 200 lines, which effectively implements a symbolic executor.)
1.1 Goals
Our primary goal is to design a selfcontained program transformation that can be used in conjunction with other components to piece together symbolic analysis and verification tools. We would like the transformation to exhibit the following properties:

allow mixing of explicit and symbolic computation in a single program,

expose a small interface to the rest of the system, and finally

impose minimal runtime overhead.
The first property is important because it often does not make sense to perform all computation within a program symbolically. For instance, a symbolic execution engine may wish to natively perform library calls requested by the program. Therefore, it ought to be possible to request, from the outset, that a particular value in the program is symbolic or explicit.
It is unfortunately not possible to execute the symbolised program in a context that is completely unaware of symbolic computation. However, the requirements imposed on the execution environment can be minimised and defined clearly (see Section 5.4). Finally, exploring all executions that arise even from a single input sequence is already expensive; when the transformation is used in the context of model checking, we would like to incur as small an additional penalty as possible.
1.2 Contribution
The idea that various tasks can be shifted between compile time and run time is as old as higher-level programming languages. In the context of verification, there is a large variety of approaches that put different tasks at different points between these two extremes. Symbolic computation is traditionally found near the interpretation end of the spectrum.
Our contribution is to challenge this conventional wisdom and show that this technique can be shifted much farther towards the compilation end. Further, by treating symbolic computation as an abstract domain, we pave the way for other abstract domains to be approached in this manner. Finally, all relevant source code and benchmark data are freely available online (https://divine.fi.muni.cz/2018/sym).
2 Related Work
Program verification techniques based on symbolic execution [King, 1976], symbolic program code analysis [nie, 1999] and symbolic approach to model checking [McMillan, 1993] have been the subject of extensive research.
As for symbolic execution, the approach most closely related to ours is represented by the KLEE symbolic execution engine [Cadar et al., 2008], which performs symbolic execution on top of LLVM IR [Lattner and Adve, 2004]. Besides standalone usage as a symbolic executor, KLEE has also become a backend tool for other types of analyses and for verification. For example, the tool Symbiotic [Chalupa et al., 2017] combines code instrumentation and slicing with KLEE to detect errors in C programs.
Besides symbolic execution, other forms of abstract interpretation, such as predicate abstraction, are often used in code analysis. The most successful approaches are based either on counterexample-guided abstraction refinement (CEGAR) [Clarke et al., 2000] or on lazy abstraction with interpolants [Albarghouthi et al., 2012], which are implemented in tools such as BLAST [Beyer et al., 2007] and CPAchecker [Beyer and Keremoglu, 2011]. There are numerous research results in this direction, summarised in e.g. [Beyer and Löwe, 2015, Sousa et al., 2017, Weißenbacher, 2010]. A verification algorithm that goes beyond static program code analysis and combines predicate abstraction with concrete execution and dynamic analysis has also been introduced [Daniel and Parízek, 2015]. This approach can successfully verify programs that feature unbounded loops and recursion, unlike standard symbolic execution.
Using instrumentation (as opposed to interpretation) for symbolic verification has been proposed a few times, but the only extant implementation that works with realistic programs is derived from the CUTE [Sen et al., 2005] family of concolic testing systems, i.e. the tools CREST [Burnim and Sen, 2008] and jCUTE [Sen and Agha, 2006]. In particular, CREST uses the CIL toolkit (CIL is short for C Intermediate Language [Necula et al., 2002], a simplified subset of the C language; the toolkit can automatically translate standard C into the intermediate CIL form, optionally as three-address code, which is the feature used in CREST) to insert additional calls into the program that perform the symbolic part of concolic execution. The approach as described in [Sen et al., 2005] is limited to symbolic computation, unlike the present paper, which works with arbitrary abstract domains.
A related process was proposed by Khurshid et al. [Khurshid et al., 2003]: in this case, hand-annotated code was processed by Java PathFinder [Havelund and Pressburger, 2000], an explicit-state model checker. Our approach, in contrast, is fully automatic and more general.
Finally, besides symbolic code analysis and symbolic execution, there are approaches that perform symbolic model checking as such. The key differentiating aspect of symbolic model checking is the ability to decide equality of symbolically represented states. This is important in particular for verification of parallel and reactive programs, where the state space contains diamonds or loops, respectively. The tool SymDIVINE [Mrázek et al., 2016] is focused on bit-precise symbolic model checking of parallel C and C++ programs. It extends standard explicit state space exploration with SMT machinery to handle nondeterministic data values. As such, SymDIVINE is halfway between a symbolic executor and an explicit-state model checker. Unlike the solution presented in this paper, SymDIVINE does not separate the symbolic interpreter from the core of the model checker. In general, symbolic model checking is more often used with synchronous systems, for example [Cavada et al., 2014].
3 Abstraction as a Transformation
While in the present work, our main goal is to transform a concrete program into one that performs symbolic computation, it is expedient to formulate the problem more generally. We will think in terms of an abstraction, in the sense of abstract interpretation, which has two main components: it affects how program states are represented and it affects the computation of transitions between those states. There are two levels on which the abstraction operates:

the static level, which concerns syntactic constructs and the type system,

the dynamic (semantic) level, which concerns actual execution and values.
In the rest of this section, we will define syntactic abstraction (which covers the static aspects) as a means of encoding abstract semantics into a concrete program. While it is convenient to think of the transformed program in terms of abstract values and abstract operations, it is also important to keep in mind that at a lower level, each abstract value is concretely represented (encoded). Likewise, abstract operations (instructions) are realised as sequences of concrete instructions which operate on the concrete representation of abstract values (see Figure 4, left). Those considerations are at the core of the second, dynamic, aspect of abstraction. Reflecting this structure, the program transformation therefore proceeds in two steps:

1. the input program is (syntactically) abstracted: concrete values are replaced with abstract values and concrete instructions are replaced with abstract instructions;

2. abstract instructions are replaced by their concrete realisation.
The remainder of this section is organised as follows: in Section 3.1, we describe the expected concrete semantics of the input program. Section 3.2 then introduces syntactic abstraction, Section 3.3 deals with representation and typing of values in the abstracted program, and Section 3.6 goes on to describe the treatment of instructions. Section 3.7 briefly discusses interactions of multiple domains within a program and finally Section 3.8 gives an overview of the relational abstract domains that we use to perform symbolic computation.
3.1 States and Transitions
We are interested in general programs, e.g. those written in the C programming language. Abstraction is often described in terms of states and transitions. In the case of C programs, a state is described by the content of memory (including registers). Transitions describe how a state changes during the computation performed by a given program. In this paper, we will use small-step semantics, partly because the prototype implementation is based on LLVM (programs in LLVM IR are in a partial SSA form, a special case of three-address code [Aho, 2007], and three-address code is essentially small-step semantics in an executable form) and in part because it is a natural choice for describing parallel programs.
In this description, the transitions between program states are given by the effect of individual instructions on program state. Which instruction is executed and which part of the program state it affects is governed by the source state. Our discussion of abstract transitions will therefore focus on the effects of instructions: as an example, the add instruction obtains two values of a specified bit width from some locations in the program state, computes their sum and stores the result to a third location.
3.2 Syntactic Abstraction
The input program is given as a collection of functions, each consisting of a control flow graph where nodes are basic blocks – each a sequence of non-branching instructions. Memory access is always explicit: there are instructions for reading and writing memory, but memory is never directly copied, or directly used in computation. While this further restricts the semantics of the input program, it is not at the expense of generality: programs can be easily put in this form, often using commodity tools.
With these considerations in mind, the goal of what we will call syntactic abstraction is to replace some of the concrete instructions with their abstract counterparts. The general idea is illustrated in Figure 2.
Apart from a few special cases, an abstract instruction takes abstract values as its inputs and produces an abstract value as its result. The specific meaning of those abstract instructions and abstract values then defines the semantic abstraction. Performing syntactic abstraction on the program therefore results in a modified program that performs abstract computation. In other words, the transformed program directly operates on abstract states, and the effect of the program on abstract states defines the abstract transition system.
We posit that syntactic abstraction, as explained in the following sections, will automatically lead to a good semantic abstraction – i.e. one that fits the standard definition: a set of concrete states can be mapped to an abstract state, an abstract state can be realised as a set of concrete states and those operations are compatible in the usual sense.
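To make the idea of syntactic abstraction concrete, the following sketch shows what the replacement of a concrete instruction by its abstract counterpart might look like at the source level. All names (Abstract, a_add, lift) are illustrative stand-ins, not the actual implementation; terms are represented as plain strings only for readability:

```cpp
#include <string>

// Hypothetical abstract value: a handle to a term, here just its textual form.
struct Abstract { std::string term; };

// Concrete instruction: addition on machine integers.
int add(int a, int b) { return a + b; }

// Abstract counterpart inserted by the syntactic abstraction: instead of
// computing a sum, it builds a representation of the sum of its operands.
Abstract a_add(const Abstract &a, const Abstract &b) {
    return { "(add " + a.term + " " + b.term + ")" };
}

// lift: the semantic counterpart of the type map, taking a concrete value
// into the abstract domain (as a constant term).
Abstract lift(int v) { return { std::to_string(v) }; }
```

Where the original program computed `add(x, y)`, the abstracted program computes `a_add(x, y)` on abstract values, lifting any concrete operands first.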
3.3 Abstract Values and Static Types
A distinguishing feature of the syntactic approach to abstraction is that it admits a static type system. In other words, the variables in the program can be assigned consistent types which respect the boundary between abstract and concrete values. While a type system is a useful consistency check, its main importance lies in facilitating a description of how syntactic abstraction operates. (Additionally, since the SSA portion of the LLVM IR is already statically typed, we can take advantage of this existing type system in the implementation. Nonetheless, the treatment in this section does not depend on LLVM and would be applicable to any dataflow-oriented program representation.)
We start by assuming the existence of a set of concrete scalar types and of concrete pointer types. We then build up all relevant types from the set of scalar types: the set of types derived from a set of scalars is defined inductively as follows:

every scalar type is itself a derived type,

if two types are derived, so is their product,

if two types are derived, so is their disjoint union,

if a type is derived, so is the pointer type to it.
In other words, this derived set describes finite (non-recursive) algebraic types over the set of concrete scalars and pointers.
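The inductive construction above can be mirrored directly in C++ type aliases; this is only an illustration of the closure rules, with `AbsInt` standing in for a hypothetical abstract scalar type:

```cpp
#include <cstdint>
#include <utility>
#include <variant>

// A stand-in for an abstract scalar type (e.g. a handle to a term).
struct AbsInt { uint64_t id; };

using Scalar  = int32_t;                      // a concrete scalar in the base set
using Product = std::pair<Scalar, AbsInt>;    // product of two derived types
using Union   = std::variant<Scalar, AbsInt>; // disjoint union of derived types
using Pointer = Product *;                    // pointer to a derived type
```

Note that `Product` and `Union` mix concrete and abstract fields, which is exactly the "mixed types" situation described in the next paragraph.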
A fundamental building block of the syntactic abstraction is, for each abstract domain, a bijective map from the set of concrete scalar types to the set of abstract scalar types of that domain (since multiple abstract domains can coexist in a single program, each domain carries its own map; the set of all abstract scalar types is the union of their images). Each value which exists in the abstracted program then belongs to a type derived from this extended set of scalars – in other words, values are built up from concrete and abstract scalars.
In particular, this means that the abstraction works with mixed types – products and unions with both concrete and abstract fields. Likewise, it is possible to form pointers to both abstract values and to mixed aggregates.
3.4 Semantic Abstraction
The scalar type maps and their inverses let us move from concrete to abstract scalar types (and back) and are strictly a syntactic construct. Their semantic (dynamic) counterparts are lift and lower: these are not maps, but rather abstract operations (instructions). Just as the type maps translate between concrete and abstract types, lift goes from concrete to abstract values and lower the other way around. While the type maps, lift and lower are defined on scalar types and scalar values respectively, they can all be naturally extended to the set of all derived types (and their corresponding values).
3.5 Representation
Besides the scalar abstraction map, there is another type map, which we will call the representation map: it maps each abstract scalar type to a concrete type and describes how abstract values are represented at runtime. This is to emphasise that abstract values are, in the end, encoded using concrete values that belong to particular concrete types. Moreover, the representation type of an abstract scalar is, in general, unrelated to the original concrete type. An abstract floating point number may, for instance, be represented by a concrete pointer to a concrete aggregate made of two 32-bit integers.
While lift and lower are the value-level counterparts of the abstraction map, we need another pair of operations to accompany the representation map. We will call them freeze and thaw, and they map between an abstract value and its concrete representation. The idea is that memory manipulation (and manipulation of any concrete aggregates) is done entirely in terms of the representation types (using frozen values), but abstract operations on scalar values are defined in terms of the abstract type (i.e. thawed values). The use of freezing and thawing is illustrated in Figure 3.
One challenge in the implementation of freeze and thaw is that the memory layout of a program should not change as a side effect of the transformation. (The exact layout of data – structures, arrays, dynamic memory – is normally the responsibility of the program itself, more so in the case of intermediate or low-level languages. For this reason, it is often the case that the program will make various assumptions about relationships among addresses within the same memory object. It is impractical, if not impossible, to automatically adapt the program to a different data layout, e.g. in case the size of a scalar value would change due to abstraction.) This means that for many abstract domains, the freeze operation must be able to store additional data associated with a given address, and thaw must be able to obtain this data efficiently. While this is an implementation issue, it is an important part of the interface between the transformed program and the underlying execution or verification platform. However, since the program is transformed as a whole, there is no need to explicitly track this additional data at runtime: the only way a value can be copied from one memory address to another is via a load instruction followed by a store, both of which are instrumented and as such also transfer the supplementary data.
An additional role of the freeze/thaw pair is to maintain dynamic type information at runtime. While it is easy to assign static types to instruction operands and results, this is not true for memory locations: different parts of the program can load values of different static types from the same memory address. For this reason, the type system which governs memory use must be dynamic and allow dispatch on the actual (runtime) type of the value stored at a given memory location.
3.6 Abstract Instructions
As indicated at the start of this section, it is advantageous to formulate the transformation in two phases, using intermediate abstract instructions. Abstract instructions take abstract values as operands and give back abstract values as their results. It is, however, of crucial importance that each abstract instruction can be realised as a suitable sequence of concrete instructions. This is what makes it possible to eventually obtain an abstract program that does not actually contain any abstract instructions and execute it using standard (concrete, explicit) methods.
In the first (abstraction) phase, concrete instructions are replaced with their abstract versions: an instruction inst is replaced with a_inst, whose operand and result types are the abstract counterparts of the original. Additionally, lift, lower, freeze and thaw are inserted as required. (For instance, concrete operands to abstract operations are lifted, and arguments to necessarily concrete functions, e.g. real system calls, are lowered; memory stores are replaced with freeze and loads with thaw.) The implementation is free to decide which instructions to abstract and where to insert value lifting and lowering, as long as it obeys typing constraints. The specific approach we have taken is discussed in Section 3.7 and the implementation aspects are described in Section 5.2.
After the first phase is finished, the program may be further manipulated in its abstract form before continuing the second phase of the abstraction. This gives a practical, implementationdriven reason for performing the abstraction transformation in two steps, in addition to the conceptual clarity it provides.
In the second step, all abstract operations, including lift and lower, are realised using concrete subroutines. The realisation (implementation) of a_inst operates directly on the representation types of its operands and result, clearly obviating the need for thawing and refreezing the value.
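A minimal sketch of this second phase follows: the representation type of an abstract value is assumed (for illustration only) to be a pointer to a heap-allocated node, and the realisation of a_add works directly on that representation, so no freeze/thaw boundary is crossed. The names Node and a_add_impl are hypothetical:

```cpp
#include <string>

// Representation of an abstract value: a pointer to a term node.
struct Node {
    std::string op;             // the operation in the root position
    const Node *left, *right;   // operand subterms (null for leaves)
};

// Realisation of the abstract instruction a_add: representation type in,
// representation type out, implemented as an ordinary concrete subroutine.
const Node *a_add_impl(const Node *a, const Node *b) {
    return new Node{ "add", a, b };
}
```

After this phase, the program contains no abstract instructions at all, only calls to such concrete subroutines, and can therefore be executed by a standard explicit interpreter.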
3.7 Abstract Domains
Necessarily, the values an abstracted program manipulates will come from at least two different domains: the concrete domain and the chosen abstract domain, in line with the first requirement laid out in Section 1.1. This is because it is usually impractical to abstract all values that appear in the program. Additional abstract domains, therefore, do not pose any new conceptual problems.
For the sake of simplicity, we only consider instructions where all operands come from the same domain (this holds for both the concrete and for abstract domains). Moreover, the only instructions where the domain of the result does not agree with the domain of the operands are cross-domain conversion operations, which take care of transitioning values from one domain to another. The two most important instances of those operations are lift and lower, introduced above. (The names lift and lower allude to the relationship of the abstract and the concrete domain. In applications with multiple abstract domains, it may be expedient to include additional instructions that convert directly from one abstract domain to another, although in theory it is always possible to go through the concrete domain.)
Even though cross-domain conversions are necessary in the program, it is a major task of the proposed transformation to minimise their number. A natural approach that minimises unwanted domain transitions is to propagate abstract domains along the data flow of the program. That is, if an abstract instruction a_inst is already in the program and its result is also used as an operand elsewhere, we prefer to lift all the users of that result into the abstract domain of a_inst (cf. Figure 4, right), instead of lowering the result into the concrete domain. This simple technique, which we call value propagation, forms the core of our entire approach (see also Figure 2). It is worth noting that this is particularly simple to do for programs in (partial) SSA form (again, this is true of LLVM bitcode, which simplifies our prototype implementation somewhat), although the variables which are not part of the SSA form are still somewhat challenging. Those are covered by the freeze and thaw operations, which are discussed in more detail in Section 5.1.
Given the above, a logical starting point is to pick an initial set of instructions that we wish to lift into an abstract domain. Those could be explicit lift instructions placed in the program by hand, they could be picked by static analysis, or could be the result of abstraction refinement. The abstract program can be then obtained by applying value propagation to this initial set of abstract instructions.
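The value-propagation step described above amounts to a transitive closure over def-use edges, which can be sketched as a simple worklist algorithm. This is a schematic reconstruction (instructions are integer ids, and the def-use relation is an explicit map), not the actual LLVM-based implementation:

```cpp
#include <map>
#include <queue>
#include <set>
#include <vector>

// Given a def-use relation (instruction id -> ids of its users) and an
// initial set of abstracted instructions, mark every transitive user as
// abstract as well.
std::set<int> propagate(const std::map<int, std::vector<int>> &users,
                        const std::set<int> &roots) {
    std::set<int> abstracted(roots);
    std::queue<int> work;
    for (int r : roots) work.push(r);
    while (!work.empty()) {
        int inst = work.front(); work.pop();
        auto it = users.find(inst);
        if (it == users.end()) continue;       // result is unused
        for (int u : it->second)
            if (abstracted.insert(u).second)   // newly marked abstract
                work.push(u);
    }
    return abstracted;
}
```

In the real transformation, each instruction marked this way is then replaced by its a_inst counterpart, with lift inserted for any operands that remain concrete.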
3.8 Constraints and Relational Domains
The last important aspect of abstraction is its effect on control flow of the program. It is often the case that control flow depends on specific values of variables via conditional branching. The condition on the branch is typically a predicate on some value, or a relationship among multiple values that appear in the program. If the involved values are, in fact, abstract values, it is quite possible that both results of the predicate or comparison are admissible and that the conditional branch can therefore go both ways. The way we deal with this in the transformation is that the program makes a nondeterministic choice on the direction of the branch. How this nondeterministic choice is implemented is again deferred to the execution environment. In any case, the choice of direction provides additional information – constraints – on the possible values of variables (cf. Figure 6).
We encode those constraints into assume instructions: given an abstract value and the constraint, assume computes the constrained value. Additionally, depending on the abstract domain, it may be desirable to constrain values other than those directly involved in the comparison. Alternatively, relational domains may be able to encode constraint information themselves: this is in particular the case in the term domain which realises symbolic computation. Therefore, for the purposes of the present paper, simply inserting a single assume instruction on each outgoing edge of the conditional is sufficient.
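The shape of a transformed conditional branch can be sketched as follows. Both choose and assume here are trivial stand-ins: in a real system, choose is provided by the execution environment (which would explore both outcomes), and assume would attach the constraint implied by the branch direction to the abstract value:

```cpp
struct Abstract { int id; };   // opaque stand-in for an abstract value

// Stand-in for the environment-provided nondeterministic choice operator;
// a real verifier would enumerate all n outcomes instead of picking one.
int choose(int n) { (void)n; return 0; }

// Stand-in for assume: the real operation computes the value constrained
// by the branch direction (and records the constraint in the state).
Abstract assume(Abstract v, bool truth) { (void)truth; return v; }

int branch_on(Abstract cond) {
    if (choose(2) == 0) {
        cond = assume(cond, true);    // constraint: the condition holds
        return 1;                     // 'then' branch
    } else {
        cond = assume(cond, false);   // constraint: the condition fails
        return 0;                     // 'else' branch
    }
}
```

Since the stub choose always returns 0, this sketch deterministically takes the 'then' branch; the point is only the placement of assume on each outgoing edge.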
3.9 Summary
In the above, we have set up abstraction in such a way that it fits into a transformationbased approach. In particular, we have separated syntactic and semantic abstraction and shown how the former induces the latter. The proposed syntactic abstraction captures how the program is changed, while semantic abstraction captures the dynamic (execution) aspects of abstract interpretation.
4 Symbolic Computation
Now that we have described how to perform program abstraction as a transformation, the remaining task is to recast symbolic computation as an abstract domain. Fortunately, this is not very hard: the abstract values in the domain are terms, while the abstract instructions simply construct corresponding terms from their operands. In other words, symbolic computation is realised by a free algebra (that is, the term algebra). The input values of the program correspond to nullary symbols – in practice, a unique nullary symbol is created each time the program obtains a value from its input. All the remaining values are built up as terms of bit-vector operations and constants. We will refer to the abstract domain thus formed as the term domain.

It is not hard to see that a program transformed this way will simply perform part of its computation symbolically in the usual sense. Additionally, as the computation progresses, assume instructions impose a collection of constraints on the nullary symbols of the abstract algebra (i.e. the input values). Each constraint takes the form of a term with a relational symbol in the root position. These constraints become part of the abstract state, effectively ensuring that the term domain is fully relational. (An abstract domain is called relational when it is capable of preserving information about relationships among various abstract values that appear in the program.)
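A minimal sketch of the term domain follows: input() mints a fresh nullary symbol, operations build terms over those symbols, and assume accumulates the path condition. Terms are rendered as SMT-LIB-style strings purely for illustration; all names are hypothetical:

```cpp
#include <string>
#include <vector>

static int fresh = 0;                              // counter for nullary symbols
static std::vector<std::string> path_condition;    // constraints in this state

// A unique nullary symbol each time the program obtains an input value.
std::string input() { return "v" + std::to_string(fresh++); }

// Abstract instructions build terms of bit-vector operations.
std::string a_mul(const std::string &a, const std::string &b) {
    return "(bvmul " + a + " " + b + ")";
}

// A constraint: a term with a relational symbol in the root position.
std::string a_ult(const std::string &a, const std::string &b) {
    return "(bvult " + a + " " + b + ")";
}

// assume makes the constraint part of the abstract state.
void assume(const std::string &constraint) {
    path_condition.push_back(constraint);
}
```

Because the constraints relate nullary symbols to one another, the resulting domain is fully relational in the sense defined above.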
It is a requirement of abstract interpretation that it is possible to construct an abstract state from a set of concrete states. In the term domain, this can be achieved by assigning a fresh nullary symbol to each memory location that differs in some of the concrete states. (In the present paper, we only deal with abstract, i.e. symbolic, values. The structure of the program state, that is, the arrangement of the program memory, is taken to be always represented explicitly, i.e., it belongs squarely to the concrete domain.) We then impose constraints that ensure that exactly the input set of concrete states is represented by the resulting abstract state. For instance, if the input set of concrete states differs only in the value of a single variable, which takes four distinct values across the 4 input states, a suitable constraint would restrict that variable to exactly those four values.
In some cases, it is impossible to construct the requisite constraints using only conjunction and relational operators. To ensure that the term domain forms a lattice (in particular that a least upper bound always exists), it is necessary to allow the constraints to use logical disjunction.
While the above considerations regarding constraints are an important part of the theoretical underpinnings of the approach, it is almost always entirely impractical to shift back and forth between concrete and abstract states. In practice, therefore, the constraints described in this section simply arise through the assume mechanism described in Section 3.8. As such, the constraints that appear in a given state form a path condition. Finally, we note that the least upper bound of abstract states defined above corresponds to path conditions which arise from path merging in symbolic execution.
5 Implementation
We have implemented the proposed program transformation on top of LLVM, using its C++ API. Both the transformation and all additional code (model checker and solver integration) were written in C++. The transformation itself is the largest component, totalling 3200 lines of code.
5.1 Freeze and Thaw
As mentioned in Section 3.7, our implementation is based on the simple idea of maximum propagation of abstract values along the data flow of the program. While the SSA part of the algorithm is essentially trivial, storing abstract values in program memory is slightly more challenging. The purpose of freeze and thaw is to overcome this issue.
While the dynamic type system that freeze and thaw provide to the transformed program and the ability to store additional data associated with a given memory address are largely orthogonal at the conceptual level, they are closely related at the level of implementation. This is because in principle, a dynamic type system only requires that additional information is attached to values manipulated by the program, and that this information is correctly propagated. Since apart from memory access, the program is statically typed, it is sufficient to perform dynamic type checks (and dispatch) when a value is thawed, while freeze simply stores the incoming static type.
Implementation-wise, our target platform is a virtual machine with provisions for associating user-defined metadata with arbitrary memory addresses. This makes the implementation of freeze and thaw simple and efficient. However, in case such a mechanism is not available, it is sufficient to implement an associative map, using addresses as keys, inside the program.
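The fallback mentioned above, an associative map keyed by address, might look like the following sketch. A frozen entry stores both the representation of the abstract value and a dynamic type tag, so that thaw can dispatch on the actual runtime type; all names are illustrative:

```cpp
#include <cstdint>
#include <map>

enum class Type { Concrete, Term };         // dynamic type tag
struct Frozen { Type type; uint64_t payload; };

// Shadow map associating supplementary data with memory addresses.
static std::map<const void *, Frozen> shadow;

// freeze stores the representation and the (static) type of the incoming value.
void freeze(const void *addr, Type t, uint64_t payload) {
    shadow[addr] = Frozen{ t, payload };
}

// thaw dispatches on the dynamic type recorded by the matching freeze;
// memory never frozen is treated as concrete.
Frozen thaw(const void *addr) {
    auto it = shadow.find(addr);
    if (it != shadow.end()) return it->second;
    return Frozen{ Type::Concrete, 0 };
}
```

Since every load and store in the transformed program goes through this pair, the supplementary data automatically follows values as they are copied through memory.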
5.2 Domains
In real-world programs, there are often variables which do not benefit from abstraction or from symbolic treatment, and are best represented explicitly. For this reason, the top-level abstract domain that we use is the disjoint union (i.e. the type-level sum) of the concrete domain and the term domain. If we denote the concrete domain by C and the symbolic (term) domain by T, the top-level type is the sum C + T.
Since the freeze and thaw operations maintain dynamic type information in the executing program, it is possible to quickly compute operations for which both operands are concrete (explicit). If both operands are symbolic, a symbolic operation is directly invoked, while in the remaining case – one symbolic and one concrete argument – the concrete argument is first lifted into the symbolic (term) domain. The procedure is called a lifter and is automatically synthesised for each abstract operation that appears in the program. An example of a lifter is given in Figure 4 (right).
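The dispatch a lifter performs can be sketched as follows, with the sum of the concrete and term domains modelled as a std::variant and terms again rendered as strings; the names are illustrative, not the synthesised code itself:

```cpp
#include <string>
#include <variant>

// Sum of the concrete domain (int) and the term domain (string).
using Value = std::variant<int, std::string>;

std::string lift(int v) { return std::to_string(v); }

std::string term_add(const std::string &a, const std::string &b) {
    return "(add " + a + " " + b + ")";
}

// Synthesised lifter for add: stay concrete when possible, otherwise lift
// the concrete side into the term domain and build a term.
Value lifter_add(const Value &a, const Value &b) {
    if (a.index() == 0 && b.index() == 0)         // both concrete
        return std::get<int>(a) + std::get<int>(b);
    auto as_term = [](const Value &v) {
        return v.index() == 0 ? lift(std::get<int>(v))
                              : std::get<std::string>(v);
    };
    return term_add(as_term(a), as_term(b));      // at least one symbolic
}
```

This is what keeps fully concrete computation cheap: the symbolic machinery is only engaged when at least one operand is actually symbolic.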
It is also possible to use the product of the concrete and the term domain, which corresponds to concolic execution (i.e. it maintains both a concrete and a symbolic value at the same time). This requires the additional provision that assume instructions obtain concrete values that satisfy the symbolic constraints on their abstract counterparts (an SMT solver will typically provide a model in case the assumptions are feasible, which can then be used to reconstruct the requisite concrete values).
5.3 Execution & Model Checking
We represent the terms described in Section 4 by a simple tree data structure. The abstract instructions that correspond to operations on values construct a tree representation of the requisite term by joining their operands under a new root node, where only the operation in the root node depends on the specific abstract instruction. The approach is illustrated in Figures 5, 6 and 7.
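A minimal version of this tree construction might look as follows; the node layout and the SMT-LIB-style serialisation (which anticipates the conversion to solver queries discussed below) are illustrative assumptions:

```python
# Minimal term tree: every abstract operation joins its operands under a
# fresh root node, and only the root's op varies between instructions.

class Node:
    def __init__(self, op, *children):
        self.op, self.children = op, children

    def to_smt(self):
        """Serialise the tree as an SMT-LIB-style s-expression."""
        if not self.children:
            return self.op
        return "(" + " ".join([self.op] + [c.to_smt() for c in self.children]) + ")"

def abstract_add(a, b):
    return Node("bvadd", a, b)   # the instruction determines only the root op

def abstract_mul(a, b):
    return Node("bvmul", a, b)
```

Because the instructions only ever allocate a new root, subterms are shared between program states for free, and extraction amounts to walking the tree.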
This arrangement makes it easy to extract the terms from program state and convert them to a form appropriate for further processing by the analysis tool. Recall that one of the motivating applications of the proposed approach was symbolic model checking. In this case, the state space is explored by an explicit-state model checker and the extracted terms are converted into SMT queries. To this end, the model checker must be slightly extended and coupled to an SMT solver, since:

transitions of the program must be checked for feasibility,

the state equality check must compare terms semantically, not syntactically.
Of course, the terms extracted so far must be excluded from the byte-wise comparison performed on the remaining (concrete) parts of program states. In our case, the required changes to the model checker were quite minor, amounting to about 1200 lines of code.
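The two solver-backed extensions listed above can be sketched as follows. The is_sat callback stands in for an SMT solver and is an assumption of this sketch; it is the only point at which the model checker needs to talk to the solver:

```python
# Sketch of the two model-checker extensions: feasibility of transitions and
# semantic (rather than syntactic) comparison of extracted terms.

def transition_feasible(path_constraint, is_sat):
    """A transition is enabled only if its accumulated constraint has a model."""
    return is_sat(path_constraint)

def terms_equivalent(phi, psi, is_sat):
    """phi and psi describe the same set of states iff their symmetric
    difference (phi xor psi) is unsatisfiable."""
    return not is_sat(("xor", phi, psi))
```

Syntactically different terms (e.g. x + 1 > 2 and x > 1) can thus still be recognised as the same state, which is what keeps the explored state space small.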
5.4 Interfaces
One of the goals of the proposed approach was to minimise interfaces between the abstracted program and the verification or execution environment (recall goal 2 set in Section 1.1). In total, there are four interactions at play:

nondeterministic choice: under abstraction, conditionals in the program may be undetermined, and both branches may need to be explored; the abstraction uses a nondeterministic choice operator to capture this effect and defers an exploration strategy to the verifier

freeze and thaw must be provided as an interface for storing abstract values in program memory

enumeration of enabled (feasible) transitions must take the abstract values into account, if required by the domain(s) used in the program

state equality (if applicable in the verification approach) must be extended to take the employed abstract domains into account
The latter two points depend on the chosen abstract domains. For the term domains, both interfaces reduce to extracting abstract values (terms) from program state and executing an SMT query.
6 Evaluation
First of all, we have checked the performance of the transformation itself. On C programs from the SV-COMP suite, the transformation time was negligible; on more complex C++ programs, it took at most a few seconds, which is still fast compared to the subsequent analysis.
As described in Section 5, we have built a simple tool which integrates the proposed transformation with an explicit-state model checker and an SMT solver. The experimental evaluation was done using this prototype integration (denoted ‘*’ in summary tables).
6.1 Code Complexity
One of our criteria for the approach presented in this paper was reduced code complexity. While counting lines of code is not a very sophisticated metric, it is a reasonably good proxy for complexity and is easily obtained.^{14}

^{14} We have used the utility sloccount to obtain estimates of module size in terms of lines of code.
The results are summarised in Table 1.

component           *       KLEE    Sym     CBMC
transformation      3.2     0       0       (22)
virtual machine     (10)    15      6       7.5
exploration         (1.5)   1.2     1       2.3
solver integration  1.2     8       0       14
SAT solver          (45)    (45)    (23)    (5.5)
SMT solver          (80)    (80)    (400)   16
runtime support     1       0       0       0
total unique        5.4     24.2    7       39.8
total shared        136.5   125     423     27.5

Table 1: Module sizes in thousands of lines of code. Parenthesised figures denote shared, off-the-shelf components, which count towards the shared total; the remaining figures count towards the unique total.
6.2 Benchmarks
For benchmarking, we have used a subset of the SV-COMP [Beyer, 2016] test cases, namely 7 categories, summarised in Table 2, along with statistics from our prototype tool. We have only taken examples with finite state spaces, since the simple approach outlined in Section 5.3 cannot handle infinite recursion or infinite accumulation loops. In total, we have selected 1160 SV-COMP inputs. In many cases (especially in the array category), the benchmarks are parametric: we have included both the original SV-COMP instance and smaller instances to check that the approach works correctly, even if it takes a long time or exceeds the memory limit on the instances included in SV-COMP.
In all cases, the time limit, for each test case separately, was 10 minutes (wall time) and the memory limit was 10 GiB. The test machines were equipped with 4 Intel Xeon 5130 cores clocked at 2 GHz and 16 GiB of RAM.
In addition to the present approach, we have evaluated two other tools: CBMC 5.8 and Sym, both of which are symbolic model checkers targeting C code. The overall results of the comparison, in terms of the number of cases solved, are presented in Table 3.
6.3 Comparison 1: CBMC
The results from CBMC 5.8 were obtained using the tool’s default configuration. CBMC [Kroening and Tautschnig, 2014] is a mature bounded model checker for C programs with a good track record in SV-COMP. It is built around a symbolic interpreter for ‘goto programs’, its own intermediate form, not entirely dissimilar in spirit to CIL. Besides KLEE, the CBMC toolkit is among the best-established members of the interpretation-based school of symbolic computation.
tag           solved  oot/oom  states    search   ce
array         96      94       170.3 k   52:00    54:15
bitvector     17      15       3166      3:12     2:33
loops         72      106      14.0 k    53:52    11:40
productlines  336     239      20.2 M    4:36:44  43:11
pthread       9       36       609.4 k   3:31     0:54
recursion     47      34       3955      16:16    7:41
systemc       14      45       25.0 k    3:29     1:34
total         591     569

Table 2: Per-category statistics for our prototype (‘*’): test cases solved, cases that ran out of time or memory (oot/oom), states generated, and cumulative wall times for the search and for counterexample (ce) runs.
tag           total  *     Sym   CBMC
array         190    96    68    93
bitvector     32     17    9     2
loops         178    72    67    9
productlines  575    336   411   234
pthread       45     9     0     1
recursion     81     47    43    22
systemc       59     14    27    0
total         1160   591   625   361

Table 3: Number of test cases solved by each tool within the 10 minute limit.
Besides the total number of test cases solved (within the 10 minute limit), we were interested in comparing the time required to do so. Time requirements are summarised in Table 4.
With regard to its state space exploration strategy, CBMC can be thought of as middle ground between the approach taken by KLEE and that of our proposed tool. On one hand, KLEE, being a symbolic executor, does not attempt to identify already-visited program states. On the other, CBMC is a bounded model checker, which means it stores a single formula representing the entire set of reachable states. Our present approach, being based on an explicit-state model checker, stores sets of program states and compares them for equality using an SMT solver.
tag           models₁  *        CBMC   models₂  *      Sym
array         73       34:16    13:58  58       3:18   42:54
bitvector     2        0:37     0:01   9        0:55   2:30
loops         4        0:03     0:02   62       22:25  19:04
productlines  182      4:08:24  7:25   183      0:30   28:33
pthread       0        0        0      0        0      0
recursion     22       0:01     0:13   43       4:02   13:58
systemc       0        0        0      14       3:29   6:43

Table 4: Cumulative solving times, restricted in each comparison to the benchmarks solved by both tools; the models₁ and models₂ columns give the number of such benchmarks for the CBMC and Sym comparisons, respectively.
6.4 Comparison 2: Sym
Sym [Mrázek et al., 2016] is a pre-existing, interpretation-based symbolic model checker which also works with bitcode. Similarly to our approach, Sym relies on a state equality checker, in this case based on quantified bit-vector formulae. In theory, this yields coarser state equivalence and consequently smaller state spaces, but we could not confirm this on our set of benchmarks: the total number of states stored across the benchmarks that finished with both tools was 802 thousand for Sym and 93 thousand with the approach described in this paper. Additionally, quantified bit-vector satisfiability queries are typically much more expensive than those used by our prototype tool, which can help explain the speed difference between the tools.
7 Conclusion
We have presented an alternative, compilation-based approach to symbolic execution (and abstract interpretation in general), in contrast to the more traditional interpretation-based approach. We have shown that the proposed approach has important advantages and no serious drawbacks. Most importantly, our technique is modular to a degree not possible with symbolic or abstract interpreters, which makes the implementation of software analysis and verification tools based on symbolic execution almost trivial. An important side benefit is that the approach allows for abstract domains other than the term domain, opening up a different class of verification algorithms with a comparatively small investment.
References
 Nielson et al. [1999] Flemming Nielson, Hanne Riis Nielson, and Chris Hankin. Principles of Program Analysis. Springer, 1999. ISBN 9783540654100. doi: 10.1007/978-3-662-03811-6.
 Aho [2007] Alfred V. Aho. Compilers: Principles, Techniques, & Tools. AddisonWesley series in computer science. Pearson/Addison Wesley, 2007. ISBN 9780321486813.
 Albarghouthi et al. [2012] Aws Albarghouthi, Arie Gurfinkel, and Marsha Chechik. From UnderApproximations to OverApproximations and Back. In Tools and Algorithms for the Construction and Analysis of Systems, volume 7214 of LNCS, pages 157–172, Berlin, Heidelberg, 2012. Springer.
 Barrett et al. [2016] Clark Barrett, Pascal Fontaine, and Cesare Tinelli. The Satisfiability Modulo Theories Library (SMTLIB). www.SMTLIB.org, 2016.
 Bauch et al. [2016] Petr Bauch, Vojtech Havel, and Jiri Barnat. Control explicitdata symbolic model checking. ACM Trans. Softw. Eng. Methodol., 25(2):15:1–15:48, 2016. doi: 10.1145/2888393.
 Beyer [2016] Dirk Beyer. Reliable and reproducible competition results with BenchExec and witnesses – report on SV-COMP 2016. In TACAS, pages 887–904. Springer, 2016. ISBN 9783662496732. doi: 10.1007/978-3-662-49674-9_55.
 Beyer and Keremoglu [2011] Dirk Beyer and M. Erkan Keremoglu. CPAchecker: A Tool for Configurable Software Verification. In Computer Aided Verification, volume 6806 of LNCS, pages 184–190. Springer, 2011. ISBN 9783642221101.
 Beyer and Löwe [2015] Dirk Beyer and Stefan Löwe. Interpolation for Value Analysis. In Software Engineering & Management, volume 239 of LNI, pages 73–74. GI, 2015.
 Beyer et al. [2007] Dirk Beyer, Thomas A. Henzinger, Ranjit Jhala, and Rupak Majumdar. The software model checker Blast. International Journal on Software Tools for Technology Transfer, 9(5):505–525, Oct 2007. ISSN 14332787. doi: 10.1007/s100090070044z.
 Burnim and Sen [2008] J. Burnim and K. Sen. Heuristics for scalable dynamic test generation. In Proceedings of the 2008 23rd IEEE/ACM International Conference on Automated Software Engineering, ASE ’08, pages 443–446, Washington, DC, USA, 2008. IEEE Computer Society. ISBN 9781424421879. doi: 10.1109/ASE.2008.69.
 Cadar et al. [2008] Cristian Cadar, Daniel Dunbar, and Dawson R. Engler. KLEE: Unassisted and automatic generation of highcoverage tests for complex systems programs. In USENIX Symposium on Operating Systems Design and Implementation, pages 209–224. USENIX Association, 2008.
 Cavada et al. [2014] Roberto Cavada, Alessandro Cimatti, Michele Dorigatti, Alberto Griggio, Alessandro Mariotti, Andrea Micheli, Sergio Mover, Marco Roveri, and Stefano Tonetta. The nuXmv Symbolic Model Checker. In Computer Aided Verification (CAV), volume 8559 of LNCS, pages 334–342. Springer, 2014.
 Chalupa et al. [2017] Marek Chalupa, Martina Vitovská, Martin Jonáš, Jiří Slabý, and Jan Strejček. Symbiotic 4: Beyond reachability. In Tools and Algorithms for the Construction and Analysis of Systems (TACAS), volume 10206 of LNCS, pages 385–389. Springer, 2017. ISBN 9783662545805.
 Clarke et al. [2000] Edmund Clarke, Orna Grumberg, Somesh Jha, Yuan Lu, and Helmut Veith. CounterexampleGuided Abstraction Refinement. In Computer Aided Verification (CAV), volume 1855 of LNCS, pages 154–169. Springer, 2000.
 Daniel and Parízek [2015] Jakub Daniel and Pavel Parízek. PANDA: Simultaneous Predicate Abstraction and Concrete Execution. In Hardware and Software: Verification and Testing, volume 9434 of LNCS, pages 87–103. Springer International Publishing, 2015. ISBN 9783319262871.
 Ganesh and Dill [2007] Vijay Ganesh and David L. Dill. A decision procedure for bitvectors and arrays. In Werner Damm and Holger Hermanns, editors, Computer Aided Verification, pages 519–531. Springer, 2007. ISBN 9783540733683.
 Havelund and Pressburger [2000] Klaus Havelund and Thomas Pressburger. Model checking JAVA programs using JAVA PathFinder. International Journal on Software Tools for Technology Transfer, 2(4):366–381, 2000.
 Khurshid et al. [2003] Sarfraz Khurshid, Corina S. Păsăreanu, and Willem Visser. Generalized symbolic execution for model checking and testing. In Proceedings of the 9th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS’03, pages 553–568. SpringerVerlag, 2003.
 King [1976] James C. King. Symbolic execution and program testing. Commun. ACM, 19(7):385–394, 1976. ISSN 00010782.
 Kroening and Tautschnig [2014] Daniel Kroening and Michael Tautschnig. CBMC – C bounded model checker. In TACAS, pages 389–391. Springer, 2014. ISBN 9783642548628. doi: 10.1007/978-3-642-54862-8_26.
 Lattner and Adve [2004] Chris Lattner and Vikram Adve. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In Proceedings of the 2004 International Symposium on Code Generation and Optimization (CGO’04), Palo Alto, California, Mar 2004.
 McMillan [1993] Kenneth L. McMillan. Symbolic Model Checking. Kluwer Academic Publishers, 1993. ISBN 0792393805.
 Mrázek et al. [2016] Jan Mrázek, Petr Bauch, Henrich Lauko, and Jiří Barnat. SymDIVINE: Tool for ControlExplicit DataSymbolic State Space Exploration. In Model Checking Software, volume 9641 of LNCS, pages 208–213. Springer, 2016.
 Necula et al. [2002] George C. Necula, Scott McPeak, Shree P. Rahul, and Westley Weimer. CIL: Intermediate language and tools for analysis and transformation of C programs. In R. Nigel Horspool, editor, Compiler Construction, pages 213–228, Berlin, Heidelberg, 2002. Springer. ISBN 9783540459378.
 Sen and Agha [2006] Koushik Sen and Gul Agha. CUTE and jCUTE: Concolic unit testing and explicit path model-checking tools. In Thomas Ball and Robert B. Jones, editors, CAV, pages 419–423, 2006.
 Sen et al. [2005] Koushik Sen, Darko Marinov, and Gul Agha. CUTE: A concolic unit testing engine for C. In Michel Wermelinger and Harald Gall, editors, ESEC/SIGSOFT FSE, pages 263–272, 2005.
 Sousa et al. [2017] Marcelo Sousa, César Rodríguez, Vijay D’Silva, and Daniel Kroening. Abstract Interpretation with Unfoldings. In Computer Aided Verification (CAV), volume 10427 of LNCS, pages 197–216. Springer, 2017.
 Weißenbacher [2010] Georg Weißenbacher. Program Analysis with Interpolants. PhD thesis, University of Oxford, 2010.