Distributed Bounded Model Checking

Program verification is a resource-hungry task. This paper looks at the problem of parallelizing SMT-based automated program verification, specifically bounded model checking, so that it can be distributed and executed on a cluster of machines. We present an algorithm that dynamically unfolds the call graph of the program and frequently splits it to create sub-tasks that can be solved in parallel. The algorithm is adaptive, controlling the splitting rate according to available resources, and also leverages information from the SMT solver to split where most complexity lies in the search. We implemented our algorithm by modifying CORRAL, the verifier used by Microsoft's Static Driver Verifier (SDV), and evaluated it on a series of hard SDV benchmarks.

I Introduction

Program verification has a long history of over five decades and it has been consistently challenged over this entire duration by the continued increase in the size and complexity of software. As the efficiency of techniques and solvers has increased, so has the amount of software that is written. For this reason, scalability remains central to the applicability of program verification in practice.

This paper studies the problem of automated program verification. In particular, we consider Bounded Model Checking (BMC) [DBLP:conf/dac/ClarkeKY03]: the problem of reasoning over the entire space of program inputs but only over a subset of program paths, typically up to a bound on the number of loop iterations and recursive calls. BMC side-steps the need for (expensive and undecidable) inductive invariant generation and instead directly harnesses the power of SAT/SMT solvers in a decidable fragment of logic. BMC techniques are popular; they are implemented in most program verification tools today [DBLP:conf/tacas/Beyer19, Table 5].

Our goal is to scale BMC by parallelizing the verification task and distributing it across multiple machines to make use of larger compute and memory resources. The presence of several public cloud providers has made it easy to set up and manage a cluster of machines. While this distributed platform is available to us, there is a shortage of verification tools that can exploit it.

Parallelizing BMC.  BMC works by generating logical encodings, often called verification conditions or VCs, for a subset of program paths that are then fed to an SMT solver to look for potential assertion violations in the program. We aim to retain the same architecture, where we continue to use the SMT solver as a black-box, but generate multiple different VCs in parallel to search over disjoint sets of program paths. This allows us to directly consume future improvements in SMT solvers, retaining one of the key advantages of BMC.

Our technique works by splitting the set of program paths into disjoint subsets that are then searched independently in parallel. The splitting is done by simply picking a control node and considering the set of paths that go through the node, and the set of paths that do not. Splitting can happen multiple times. The decisions of what node to split and when to split are both taken dynamically by our technique. We refer to the BMC problem restricted to a set of splitting decisions (i.e., nodes that must be taken, and nodes that must be avoided) as a verification partition.

Verification starts by creating multiple processes, each of which has access to the input program; the processes are connected over the network. One process is designated as the server while the rest are called clients. The search starts sequentially on one of the clients, which applies standard BMC to the input program. At some point in time, controlled by the splitting rate, the client chooses a splitting node, thus creating two partitions. The client continues verification on one of the partitions, and sends the other partition to the server. The server is only responsible for coordination; it does not do verification itself. It accumulates the partitions (represented as sets of splitting decisions) coming in from the clients and farms them off to idle clients for verification. Clients can split multiple times. This continues until a client reports a counterexample (in which case, it must be a counterexample in the original program) or the server runs out of partitions and all clients become idle (in which case, the BMC problem is concluded as safe).

The splitting rate is adjusted according to the current number of idle clients: it is reduced when all clients are busy, and then increased as more clients become available.

Splitting has some challenges that we illustrate using the following snippet of code.

procedure main() {
   var x := 0;
   if (…) { call foo(); x := 1; }
   if (…) { call bar(); }
   if (…) { call baz(); }
   assert(x == 1 || expr);
}

Suppose that the assertion at the end of main is the one that we wish to verify (or find a counterexample for) and all uses of variable x are shown in the snippet. The main procedure calls multiple other procedures, each of which can manipulate global variables of the program (not shown). In this case, if we split on the call to foo, then one partition (the one that must take foo) becomes trivial: it is easy to see that the assertion holds in that partition, irrespective of what happens in the rest of the program. We refer to this as a trivial split. Each split incurs an overhead when a partition is shipped to another client, where the verification context for that partition must be set up from scratch. Trivial splits are troublesome because they accumulate this overhead without any real benefit in trimming down the search. Unfortunately, it is hard to avoid trivial splits altogether because doing so can involve custom (solver-specific) reasoning (e.g., the fact that variable x is not modified outside of main). Our technique instead aims to reduce the overhead of splitting when possible. The server prioritizes sending a partition back to the client that generated it. Each client uses the incremental solving APIs of SMT solvers to remember backtracking points of previous splits that it had produced. This allows a client to get set up for one of its previous partitions much faster, thus reducing overhead.
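To make this concrete, the following is a minimal sketch (hypothetical names; not Hydra's actual code, which builds on Corral) of the incremental-solving pattern, using the z3 Python bindings: the client sets a backtracking point before asserting a splitting decision, so flipping the decision later does not require rebuilding the verification context.

# Minimal sketch of split backtracking via incremental solving (z3py).
from z3 import Solver, Int, Bool, Not, Implies

s = Solver()
x = Int('x')
b_foo = Bool('b_foo')            # control variable of the block that calls foo
s.add(Implies(b_foo, x == 1))    # stand-in for the VC built so far
s.add(x != 1)                    # stand-in for the negated assertion

s.push()                         # backtracking point, set just before the split
s.add(Not(b_foo))                # must-avoid side: paths that skip the call to foo
print(s.check())                 # sat: the search continues in this partition

s.pop()                          # cheap return to the pre-split solver state
s.push()
s.add(b_foo)                     # must-reach side: paths that take the call to foo
print(s.check())                 # unsat: the trivial split from the example above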

Next, consider splitting on the call to bar. In this case, both of the generated partitions must still reason about baz because taking or avoiding bar has no implications on the call to baz. If bar turns out to be simple, while most of the complexity lies inside baz, then both partitions will end up doing the same work and diminish the benefits of parallelization. In this case, we rely on extracting information from the solver (via an unsat core) to make informed splitting choices and avoid duplicating work across partitions.

Implementation.  We have implemented our technique in a tool called Hydra. The sequential BMC technique used by Hydra is stratified inlining (SI) [corral], also referred to as inertial refinement [DBLP:conf/fmcad/Sinha10]. SI incrementally builds the VC of a program by lazily inlining procedure calls. Hydra keeps track of the expanding VC, and frequently splits it by picking a splitting node that has already been inlined in the VC.

We evaluated Hydra on Windows device driver benchmarks obtained using the Static Driver Verifier [SDV, sdvurl]. These benchmarks extensively exercise the various features of C such as heaps, pointers, arrays, and bit-vector operations [DBLP:conf/sigsoft/LalQ14], and can take days to verify in a sequential setting.

The contributions of this paper are as follows:

  • We propose a distributed design to enable solving large verification problems on a cluster of machines (Section IV-A and Section IV-B);

  • We design a proof-guided splitting strategy that enables a lazy, semantic division of the verification task (Section III-B and Section IV-C);

  • We implemented our design in a tool called Hydra that achieves a 20× speedup on 32 clients, solving additional benchmarks on which the sequential version timed out (Section V).

The rest of the paper is organized as follows. Section II covers background on VC generation and the stratified inlining algorithm. Section III discusses how the search is decomposed for parallel exploration, while Section IV presents the design of Hydra. Section V presents an evaluation of Hydra and Section VI discusses related work.

II Background

We describe our techniques on a class of passified imperative programs. Such a program can have multiple procedures. Each procedure has a set of labelled basic blocks, where each block contains a list of statements followed by a goto or a return. A statement can only be an assume or a procedure call. A goto statement takes multiple block labels and non-deterministically jumps to one of them. A procedure can have any number of formal input arguments and local variables. Local variables are assumed to be non-deterministically initialized, i.e., their initial value is unconstrained. An assume statement takes an arbitrary expression over the variables in scope. An example program is shown in Figure 1.

Passified programs do not have global variables, return parameters of procedures, or assignments. These restrictions are without loss of generality because programs with these features can be easily converted to a passified program [DBLP:conf/rp/LalQ13]; such conversion is readily available in tools like Boogie [boogie]. We also leave the expression syntax unspecified: we only require that expressions can be directly encoded in SMT. Our implementation uses linear arithmetic, fixed-size bit-vectors, uninterpreted functions, and extensional arrays. This combination is sufficient to support C programs [DBLP:conf/sigsoft/LalQ14, conf/popl/LahiriQ08].

We aim to solve the following safety verification problem: given a passified program P, is the end of its main procedure reachable, i.e., is there an execution of main that reaches its return statement? This question is answered Yes (or UnSafe) by producing such an execution, and the answer is No (or Safe) if there is no such execution. Furthermore, we only consider a bounded version of the problem where P cannot have loops or recursion. (In other words, loops and recursive calls must be unrolled up to a fixed depth.) This problem is decidable with NEXPTIME complexity [DBLP:conf/rp/LalQ13]. We next outline VC generation for single-procedure (Section II-A) and multi-procedure (Section II-B) programs.

procedure main() {
  int x, y, z; bool c;
  L0: goto L1, L2;
  L1: assume c;
      call foo(x,z);
      goto L3;
  L2: assume !c;
      call bar(x,z);
      goto L3;
  L3: call baz(y);
      goto L4;
  L4: assume z != 0;
      return;
}
procedure baz(int y) {
  L10: assume y == 3;
       return;
}
procedure foo(int x, int z) {
  bool d;
  L5: goto L6, L7;
  L6: assume d;
      assume z == x + 1;
      goto L8;
  L7: assume !d;
      assume z == x - 1;
      goto L8;
  L8: return;
}
procedure bar(int x, int z) {
  L9: assume z == x + 5;
      return;
}
Fig. 1: An Example of a Passified Program

II-A VC generation for a single procedure

Let P be a procedure that takes a sequence of arguments. Further, assume that P does not include procedure calls. In that case, we construct a formula VC(P), with the arguments of P as its free variables, such that P has a terminating execution from a given valuation of the arguments if and only if VC(P) is satisfiable under that valuation.

The VC is constructed as follows. For each block labelled ℓ, let b_ℓ be a fresh Boolean variable and id_ℓ be a unique integer constant. Let succ(ℓ) be the set of successor blocks of ℓ (mentioned in the goto statement at the end of block ℓ, if any). Further, let e_ℓ be the conjunction of all assumed expressions in the block. Let VC_ℓ be b_ℓ ⟹ e_ℓ if the block ends in a return statement, otherwise let it be:

b_ℓ ⟹ e_ℓ ∧ ⋁_{ℓ′ ∈ succ(ℓ)} ( b_{ℓ′} ∧ cf(id_ℓ) = id_{ℓ′} )    (1)

where cf is an uninterpreted function called the control-flow function.

The variables b_ℓ are collectively referred to as control variables. Intuitively, b_ℓ is true when control reaches the beginning of block ℓ during the procedure's execution. The constraint VC_ℓ means that if control reaches block ℓ, then it must satisfy the assumed constraints of the block (e_ℓ) and pick at least one successor block to jump to. The function cf simply records the chosen successor.

Let ℓ₀ be the label of the first block of P (where procedure execution begins). Let Labels(P) be the set of block labels in P. Then, VC(P) is b_{ℓ₀} ∧ ⋀_{ℓ ∈ Labels(P)} VC_ℓ. If the VC is satisfiable, then one can read off a counterexample trace from a satisfying assignment by simply looking at the model for cf. As an example, the VC of procedure foo of Figure 1 is given in Figure 2.

The arguments of a procedure are its interface variables and we make these explicit in the VC. For instance, we will write VC(foo)(x, z) to make it explicit that x and z are the interface variables (free variables), while the rest of the variables are implicitly existentially quantified.
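As an illustration of this construction, here is a small sketch (variable and helper names are ours, not the paper's implementation) that builds the VC of procedure foo from Figure 1 using the z3 Python bindings:

# Sketch of single-procedure VC generation (Section II-A) for foo of Figure 1.
from z3 import (Bool, Int, Function, IntSort, And, Or, Not, Implies,
                Solver, sat)

cf = Function('cf', IntSort(), IntSort())     # the control-flow function
x, z = Int('x'), Int('z')                     # interface variables of foo
d = Bool('d')                                 # local variable of foo

blocks = ['L5', 'L6', 'L7', 'L8']
b = {l: Bool('b_' + l) for l in blocks}       # control variable b_l per block
ident = {l: i for i, l in enumerate(blocks)}  # unique constant id_l per block

def edge(src, dst):
    # choose successor dst and record the choice in cf (Equation 1)
    return And(b[dst], cf(ident[src]) == ident[dst])

vc_foo = And(
    b['L5'],  # execution starts at the first block
    Implies(b['L5'], Or(edge('L5', 'L6'), edge('L5', 'L7'))),
    Implies(b['L6'], And(d, z == x + 1, edge('L6', 'L8'))),
    Implies(b['L7'], And(Not(d), z == x - 1, edge('L7', 'L8'))),
    # L8 ends in return, so it contributes no successor constraint
)

s = Solver()
s.add(vc_foo)
assert s.check() == sat          # foo has a terminating execution
print(s.model())                 # read off the trace from the model of cf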

II-B Stratified Inlining

Inlining all procedure calls can result in an exponential blowup in program size. For that reason, the stratified inlining (SI) algorithm [corral] constructs the VC of a program in a lazy fashion. For ease of description, assume that each block can have at most one procedure call. For a procedure P, let pVC(P), called the partial VC, be the VC of P constructed as described in the previous section, where each procedure call is replaced with an "assume true" statement.


b_L5 ∧
(b_L5 ⟹ (b_L6 ∧ cf(id_L5) = id_L6) ∨ (b_L7 ∧ cf(id_L5) = id_L7)) ∧
(b_L6 ⟹ d ∧ z = x + 1 ∧ b_L8 ∧ cf(id_L6) = id_L8) ∧
(b_L7 ⟹ ¬d ∧ z = x − 1 ∧ b_L8 ∧ cf(id_L7) = id_L8)

Fig. 2: The VC of procedure foo from Figure 1

Given that programs can only have assume statements, the partial VC of a procedure represents an over-approximation of the procedure's behaviors, one where it optimistically assumes that each callee simply returns. Similarly, for a procedure P, if we replace each call with an "assume false" statement, then we get an under-approximation of P. The VC of this under-approximation can be obtained by simply setting the control variable to false for each block with an "assume false" statement. For instance, pVC(main) is an over-approximation of main, whereas the following is an under-approximation: pVC(main) ∧ ¬b_L1 ∧ ¬b_L2 ∧ ¬b_L3.
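For example, here is a tiny sketch (our encoding; the cf bookkeeping of Equation 1 is omitted for brevity) that contrasts the two approximations of main from Figure 1 with the z3 Python bindings:

# Over- vs. under-approximation of main (calls treated as "assume true").
from z3 import Solver, Bool, Int, And, Or, Implies, Not

b = {l: Bool('b_' + l) for l in ['L0', 'L1', 'L2', 'L3', 'L4']}
c, z = Bool('c'), Int('z')

pvc_main = And(
    b['L0'],
    Implies(b['L0'], Or(b['L1'], b['L2'])),
    Implies(b['L1'], And(c, b['L3'])),        # call foo(x,z) over-approximated
    Implies(b['L2'], And(Not(c), b['L3'])),   # call bar(x,z) over-approximated
    Implies(b['L3'], b['L4']),                # call baz(y)   over-approximated
    Implies(b['L4'], z != 0),
)

s = Solver()
s.add(pvc_main)
print(s.check())   # sat: the over-approximation reaches the end of main

# Under-approximation: block the control variables of blocks with open calls.
print(s.check(Not(b['L1']), Not(b['L2']), Not(b['L3'])))   # unsat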

A static callsite is defined as a pair (ℓ, P) that represents the (unique) call to procedure P in block ℓ. For instance, main of Figure 1 has three callsites: (L1, foo), (L2, bar), and (L3, baz). A dynamic callsite is a stack of static callsites that represents the runtime stack during a program's execution. We assume that main is always present at the bottom of the stack for any dynamic callsite. For instance, [main, (L1, foo)] represents the call stack where main executed to reach L1 and then called foo.

For a procedure P, let CS(P) be the set of static callsites in P. Given a static callsite s and a dynamic callsite c, let push(s, c) be the dynamic callsite where s is pushed on top of the stack c. SI can be required to inline the same procedure multiple times. Suppose that a procedure P₁ calls P₂ twice, once in block ℓ₁ and once in block ℓ₂. Dynamic callsites help distinguish between the two instances of P₂: the first will have (ℓ₁, P₂) on top of the stack and the latter will have (ℓ₂, P₂) on top of the stack.

We must take care to avoid variable name clashes between different VCs as we inline procedures. For a dynamic callsite c with procedure P at its top, let pVC(P, c) be the partial VC of P (as described earlier in the section), except that its construction uses globally fresh control variables (the variables b_ℓ of Equation 1), globally fresh block identifiers (the constants id_ℓ of Equation 1), as well as globally fresh instances of the local variables. In pVC(P, c), the argument c is only used for bookkeeping purposes: let ctrl(ℓ, c) refer to the control variable used for block ℓ when constructing pVC(P, c). If c is push((ℓ, P), c′), then let ctrl(c) be ctrl(ℓ, c′), i.e., the control variable of the calling block in the caller's VC. Similarly, if P is called from procedure P′ in block ℓ, then let IV(c) be the set of interface variables (actuals) supplied for the call to P in block ℓ of the caller's partial VC.

Input: A Program P with starting procedure main
Input: An SMT solver S
Output: Safe, or UnSafe(τ)
1  C ← { push(s, [main]) | s ∈ CS(main) }
2  S.Assert(pVC(main, [main]))
3  while true do
4        outcome ← SIStep(P, C, S)
5        if outcome == Safe or outcome == UnSafe(τ) then
6              return outcome
7        else
8              let NoDecision(uc, I, C′) = outcome
9              C ← C′
Algorithm 1 The Stratified Inlining algorithm.
Input: A Program P, a set of open callsites C
Input: An SMT solver S
Output: Safe, UnSafe(τ), or NoDecision(uc, I, C′)
1  // Under-approximate check
2  S.Push()
3  forall c ∈ C do
4        S.Assert(¬ctrl(c))
5  if S.Check() == SAT then
6        return UnSafe(S.Model())
7  else
8        uc ← S.UnsatCore()
9  S.Pop()
10 // Over-approximate check
11 if S.Check() == UNSAT then
12       return Safe
13 else
14       τ ← S.Model()
15       I ← { c ∈ C | τ passes through c }
16       C′ ← C \ I
17       forall c ∈ I do
18             C′ ← C′ ∪ Inline(c, S)
19       return NoDecision(uc, I, C′)
Algorithm 2 SIStep(P, C, S)
Input: A dynamic callsite c, An SMT solver S
Output: A set of open callsites
1  let P = the procedure called at the top of c
2  S.Assert(ctrl(c) ⟹ pVC(P, c)(IV(c)))
3  return { push(s, c) | s ∈ CS(P) }
Algorithm 3 Inline(c, S)

The SI algorithm is shown in Algorithm 1. The algorithm requires an SMT solver with the usual interface. We use the Push API to set a backtracking point and a Pop API that backtracks by removing all asserted constraints until a matching Push call. Further, we assume that a counterexample trace can be extracted from a model returned by the solver.

The algorithm works by iteratively refining over-approximations of the program (in hope of getting an early Safe verdict) and under-approximations of the program (in hope of getting an early UnSafe verdict). Both these approximations are refined by inlining procedures.

Line 1 initializes the set C of open dynamic callsites. This set represents procedure calls that have not been inlined yet. The partial VC of main is asserted on the solver in Line 2. SI then iteratively calls SIStep (Algorithm 2), which either returns a definitive verdict (Line 6) or refines the set of open callsites (Line 9).

The SIStep routine, shown in Algorithm 2, does an under-approximate check (Lines 2 to 6) by assuming that calls at each of the open callsites cannot return (Line 4). If it finds a counterexample trace, SI returns UnSafe. This trace is guaranteed to only go through inlined procedure calls because all the open ones were blocked. Ignore the call that gathers the unsat core on Line 8 for now; we use this information in the next section.

Next, SIStep does an over-approximate check (Line 11). If this is UNSAT, then SI returns Safe. If the check was satisfiable, then we construct the counterexample trace τ from the model provided by the solver (Line 14). This trace is guaranteed to go through at least one open callsite (because the under-approximate check was UNSAT). The SI algorithm proceeds to inline the procedures called at each of the open callsites that the trace goes through. Such callsites are recorded in the variable I (Line 15); these get returned for bookkeeping purposes (used in the next section). Callsites in I are inlined by asserting the partial VC of the callee, as shown in Line 2 of Algorithm 3. Read the asserted constraint as follows: if the control variable of the calling block is set to true, then the VC of the callee must be satisfied. The use of interface variables ensures that formals are substituted with actuals for the procedure call. New callsites that are created as a result of the inlining are recorded in C′ and eventually added back to C (Line 18). Control then returns to SI and the process repeats. An example illustrating the execution of SI is shown in Table I.

Define a call tree to be a (prefix-closed) set of dynamic callsites that represents all dynamic callsites that have been inlined by the SI algorithm at any point in time. We call this set a tree because it can be represented as an unfolding of the program's call graph.

SIStep   Action                                    Open Callsites       Inlined Callsites
Step-0   Assert pVC(main, [main])                  [main, (L1,foo)],    [main]
                                                   [main, (L2,bar)],
                                                   [main, (L3,baz)]
Step-1   Underapprox check: UNSAT                  [main, (L2,bar)]     [main],
         Overapprox check: SAT                                          [main, (L1,foo)],
         Assert pVC(foo, [main, (L1,foo)])                              [main, (L3,baz)]
         Assert pVC(baz, [main, (L3,baz)])
Step-2   Underapprox check: SAT                    [main, (L2,bar)]     (unchanged)
         Return UnSafe
TABLE I: Execution of SI on the program of Fig. 1

III Splitting the Search

Hydra employs a decomposition-based strategy to achieve parallelism. During the course of execution of the SI algorithm, Hydra splits the current verification task by picking a dynamic callsite c that has already been inlined by SI. This generates two partitions: one that requires executions to pass through c (referred to as the must-reach partition), and the other that requires executions to avoid c (referred to as the must-avoid partition). This strategy provides an exhaustive and path-disjoint partitioning of the search space.

Formally, a partition is a pair (T, D) where T is a call tree (i.e., a set of inlined callsites) and D is a set of decisions (either mustreach(c) or mustavoid(c) for callsites c ∈ T). As notational shorthand, for a partition ρ = (T, D) and a callsite c, let ρ + c be the partition (T ∪ {c}, D); similarly, for a decision d, let ρ + d be the partition (T, D ∪ {d}). Further, let calltree(ρ) be T and decisions(ρ) be D. One can also see the above strategy as dividing the proof obligation (correctness theorem) on the complete program into a set of lemmas corresponding to each of the partitions.

This section addresses two primary concerns: how to enforce splitting decisions during the search (Section III-A), and how to choose a callsite for splitting (Section III-B).

III-A Encoding splitting decisions in SI as constraints

The constraint for mustavoid(c) is relatively straightforward: it is simply ¬ctrl(c). Asserting this constraint any time after SI has inlined c will ensure that control cannot go through c, thus SI will avoid c altogether.

We next describe the encoding of the must-reach constraint by first looking at the single-procedure case. For a procedure P, we introduce must-reach control variables mr_ℓ, one for each basic block ℓ of P. Intuitively, setting mr_ℓ to true should mean that control must go through block ℓ. Recall from Section II that the VC of a procedure uses id_ℓ as a unique integer constant for block ℓ and cf as the control-flow function. We define mustreach(ℓ) as the following constraint:

mr_ℓ ⟹ ⋁_{ℓ′ : ℓ ∈ succ(ℓ′)} ( mr_{ℓ′} ∧ cf(id_{ℓ′}) = id_ℓ )    for each non-entry block ℓ    (2)

This constraint enforces that if a block must be reached, then one of its predecessors must be reached, and the successor choice recorded by the control-flow function must point to this block; this is what ties the constraint to the procedure's VC. For any block ℓ, asserting mr_ℓ and the mustreach constraints, in addition to the VC of P, will enforce that control must pass through block ℓ. The proof is straightforward and we omit it from this paper.

For multi-procedure programs, we construct the must-reach constraint inductively. Let mustreach(ℓ, c) be the constraint mustreach(ℓ), but where the block identifiers and control variables are the ones used in pVC(P, c). We construct mustreach(c) by induction over the length of c. If c is [main], then mustreach(c) is true. Otherwise, if c is push((ℓ, P), c′), then mustreach(c) is mustreach(c′) conjoined with the must-reach variable of the calling block ℓ in the caller's VC (together with the corresponding mustreach(ℓ, c′) constraints): the caller's callsite must itself be reached, and within the caller, control must reach the calling block.
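Continuing the foo sketch from Section II-A (again with our names, as an illustration under the encoding above), the following shows Equation 2 in action: asserting mr_L6 forces every model through block L6.

# Must-reach encoding (Equation 2) for foo of Figure 1 (z3py sketch).
from z3 import (Bool, Int, Function, IntSort, And, Or, Not, Implies,
                Solver, sat, is_true)

cf = Function('cf', IntSort(), IntSort())
x, z = Int('x'), Int('z')
d = Bool('d')
blocks = ['L5', 'L6', 'L7', 'L8']
b = {l: Bool('b_' + l) for l in blocks}
mr = {l: Bool('mr_' + l) for l in blocks}      # must-reach control variables
ident = {l: i for i, l in enumerate(blocks)}   # unique block identifiers
preds = {'L6': ['L5'], 'L7': ['L5'], 'L8': ['L6', 'L7']}

def edge(src, dst):
    return And(b[dst], cf(ident[src]) == ident[dst])

vc_foo = And(
    b['L5'],
    Implies(b['L5'], Or(edge('L5', 'L6'), edge('L5', 'L7'))),
    Implies(b['L6'], And(d, z == x + 1, edge('L6', 'L8'))),
    Implies(b['L7'], And(Not(d), z == x - 1, edge('L7', 'L8'))),
)

# Equation 2: a must-reach block needs a must-reach predecessor whose
# successor choice, recorded in cf, points at this block. The entry
# block L5 is trivially reached.
must_reach = And([Implies(mr[l],
                          Or([And(mr[p], cf(ident[p]) == ident[l])
                              for p in preds[l]]))
                  for l in preds])

s = Solver()
s.add(vc_foo, must_reach, mr['L6'])   # demand that control reach L6
assert s.check() == sat
m = s.model()
assert is_true(m.eval(b['L6'], model_completion=True))  # L6 is on the trace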

III-B Choosing a splitting candidate

Given an unsatisfiable formula φ, expressed as a conjunction of clauses c₁ ∧ … ∧ cₙ, a minimal unsatisfiable core (min-unsatcore) is a subset of the clauses whose conjunction is still unsatisfiable, while the conjunction of every proper subset of it is satisfiable.

Consider the under-approximate check made by SI (Lines 2 to 6 of Algorithm 2), where it blocks open callsites and attempts to find a counterexample in the currently inlined portion of the program. This check is a conjunction of constraints, passed via S, of two forms: first, the (partial) VCs of inlined callsites (Line 2 of Algorithm 3), and second, the blocking of open callsites (Line 4 of Algorithm 2). If the check is unsatisfiable, then we extract its min-unsatcore and represent it as a set UC of callsites (that may be inlined or open). The set UC represents the current proof of safety of the program. Inlined callsites that are not part of UC are deemed search-irrelevant because whether they were inlined or not is immaterial to concluding safety of the program (at this point in the search). Formally, those callsites could have been left open (i.e., over-approximated) and the check would still be unsatisfiable. The solver is therefore likely to spend its energy searching and expanding the UC portion of the calltree as the search proceeds further. Consequently, we restrict splitting to a callsite chosen from UC, so that we split where the search complexity lies.
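The following is a sketch (our names; not Corral's implementation) of how such a core can be obtained from the z3 Python bindings: inlined partial VCs are asserted with tracking literals and blocked open callsites are passed as assumptions, so both kinds of constraints can appear in the unsat core.

# Extracting the set UC from the under-approximate check (z3py sketch).
from z3 import Solver, Bool, Int, Implies, Not, unsat, set_param

set_param('smt.core.minimize', True)    # best-effort core minimization

s = Solver()
x = Int('x')
b_foo, b_bar = Bool('b_foo'), Bool('b_bar')  # control vars of two call blocks

# Inlined callsite foo: track its partial VC so it can appear in the core.
s.assert_and_track(Implies(b_foo, x == 1), 'inlined_foo')
s.add(b_foo, x != 1)                    # stand-in for the rest of main's VC

# Under-approximate check: block the open callsite bar via an assumption.
if s.check(Not(b_bar)) == unsat:
    print(s.unsat_core())   # typically [inlined_foo]: bar is search-irrelevant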

Consider the inlining tree shown in Figure 3, where the open callsites appear as dotted circles and the inlined ones are shown as solid circles; the shaded nodes are the callsites that appear in the min-unsatcore (UC). In this case, both baz and baz1 are ruled out for falling outside UC. If we pick some other callsite to split, say qux, then the must-reach partition of that split is likely to search in the subtree rooted at qux, whereas the must-avoid partition will search the portion excluding the subtree rooted at qux. We use a simple heuristic that roughly balances these partitions. Let the current inlined calltree be T and let subtree(c) be the subtree of T rooted at callsite c. We choose the splitting callsite as the one that has the maximum number of relevant callsites in its subtree (excluding main because that would be a trivial split). Formally, the splitting callsite is:

argmax_{c ∈ (UC ∩ T) \ {[main]}} | subtree(c) ∩ UC |

In our example, we will pick bar for splitting.

Fig. 3: Proof-guided splitting

We note that this choice of balancing the partitions is just a heuristic. In general, there may be dependencies between callsites: blocking one callsite can block others, or force others to become must-reach, because of control-flow dependencies in the program. Our heuristic does not capture these dependencies. Furthermore, in our implementation, we do not insist on obtaining a minimal unsat core, in order to reduce the time spent computing it. Solvers generally provide best-effort unsat core minimization (e.g., the core.minimize option in Z3).
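To make the subtree heuristic concrete, here is a small sketch (our names; dynamic callsites are modelled as tuples, and the example tree is a hypothetical rendering of Figure 3):

# Proof-guided splitting heuristic: argmax over relevant subtree sizes.
def subtree(tree, c):
    """All callsites in the call tree whose stack extends c (including c)."""
    return {d for d in tree if d[:len(c)] == c}

def choose_split(tree, uc):
    """Pick the relevant callsite with the most relevant callsites below it."""
    root = min(tree, key=len)                 # the [main] callsite
    candidates = (uc & tree) - {root}
    return max(candidates, key=lambda c: len(subtree(tree, c) & uc))

# Hypothetical call tree loosely following Figure 3; uc holds the shaded nodes.
tree = {('main',), ('main', 'foo'), ('main', 'bar'), ('main', 'baz'),
        ('main', 'bar', 'qux'), ('main', 'bar', 'baz1')}
uc = {('main',), ('main', 'foo'), ('main', 'bar'), ('main', 'bar', 'qux')}
print(choose_split(tree, uc))   # ('main', 'bar') wins with 2 relevant callsites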

IV Hydra Design and Implementation

Hydra employs a client-server distributed architecture with a single server and multiple clients. The server (Section IV-B) is responsible for coordination while verification happens on the clients (Section IV-A). A client can decide to split its current search, at which point it sends one partition to the server while it continues on the other partition. If a client finishes its current search with a Safe verdict, it contacts the server to borrow a new partition that it starts solving.

IV-A Client Design

Input: A Program P
Input: An SMT solver S
1  while true do
2        ρ ← SendReceive(GET_PARTITION)
3        outcome ← Verify(P, ρ, S)
4        Send(OUTCOME, outcome)
Algorithm 4 Client-side verification algorithm

All clients implement Algorithm 4. We use SendReceive for a message-response interaction with the server; Send is the asynchronous version where a message is sent to the server but a response is not expected. A client repeatedly requests the server for a partition (Line 2), solves it (Line 3) and sends the result back to the server on completion (Line 4). Each client uses its own dedicated SMT solver (S) for verification.

Verify (Algorithm 5) maintains a stack of decisions dstack and a set of open callsites C. It starts off by preparing the input partition (Lines 1 to 6): it inlines the calltree of the partition and asserts all of its splitting decisions. The client then enters a verification loop (Line 7) that repeatedly uses SIStep (Line 8) to expand its search. If a counterexample is found (Line 9), the client returns an UnSafe verdict back to the server. If SIStep returns NoDecision, it implies that some more procedures were inlined but the search remained inconclusive; in this case, we perform the necessary bookkeeping on the set of currently open callsites (C), the newly inlined callsites (I), and the min-unsatcore from the unsat query (uc).

If SIStep returned Safe, then the search on the current partition has finished and the client must pick another partition to solve. This is done by returning the Safe verdict (Line 20). The check on Line 14 is an optimization that we describe later in this section.

After checking the outcome of SIStep, the client decides if it is time to split its search. This is referred to abstractly as "TimeToSplit" on Line 21: the exact time is communicated by the server to the client (see Section IV-C). For splitting, the client picks a callsite in accordance with our proof-guided splitting heuristic (from Section III-B) using the stored unsat core uc (Line 22). We note that the correctness of our technique does not rely on when a split happens or on which splitting callsite is chosen. Therefore, these decisions can be guided by heuristics and tuned to optimize performance.

Input: A Program P, A partition ρ = (T, D) of P, A solver S
Output: Safe, or UnSafe(τ)
1  S.reset(); dstack ← []; C ← ∅
2  // Set up the input partition
3  forall c ∈ T do C ← (C ∪ Inline(c, S)) \ T
4  forall d ∈ D do
5        if d == mustreach(c) then S.Assert(mustreach(c))
6        if d == mustavoid(c) then S.Assert(¬ctrl(c))
7  while true do
8        outcome ← SIStep(P, C, S)
9        if outcome == UnSafe(τ) then
10             return outcome
11       else if outcome == NoDecision(uc′, I, C′) then
12             uc ← uc′; C ← C′
13       else // outcome == Safe
14             if SendReceive(POP) == YES then
15                   repeat
16                         (c, d) ← dstack.Pop(); S.Pop()
17                   until d == MUSTAVOID
18                   S.Push(); S.Assert(mustreach(c)); dstack.Push((c, MUSTREACH))
19             else
20                   return outcome
21       if TimeToSplit then
22             c ← a callsite chosen from uc (Section III-B)
23             S.Push(); S.Assert(¬ctrl(c)); dstack.Push((c, MUSTAVOID))
24             Send(SEND_PARTITION, (T′, D′ + mustreach(c)))   // T′, D′: current calltree and decisions
Algorithm 5 Verify(P, ρ, S)

After splitting, the client continues along the partition with the mustavoid(c) decision (call this partition ρ₁). The other partition (ρ₂, carrying mustreach(c)) is sent to the server (Line 24). Note further that on Line 23, the client creates a backtracking point just before the decision on c is asserted. This backtracking point is exploited in Lines 14 to 18. When the client finishes its search on ρ₁, it pings the server to ask if ρ₂ has already been handed to a different client or not. If not, it simply backtracks the solver state and asserts the flipped decision to immediately get set up for search on ρ₂. This way, the client avoids the expensive setup of initializing a new partition (Lines 1 to 6). Because splitting can happen multiple times, the loop on Lines 15 to 17 is necessary to follow along the recorded stack of decisions.

IV-B Server Design

We assume that each client has an associated unique identifier. Each message coming from a client is automatically tagged with the client's identifier. The server maintains two data structures. The first is an array Q of double-ended queues: the queue Q[i] stores all partitions produced by client i. The second is a queue idle of clients that are currently idle.

The server processes incoming messages as follows. On receiving the message (SEND_PARTITION, ρ) from client i, it does a push-left to insert ρ into Q[i]. (The manipulation of Q[i] is depicted in Figure 4.) This ensures that the latest partitions (which have a larger number of decisions and a larger call tree) from a particular client appear on the left of Q[i].

On receiving the message GET_PARTITION from client i, the server needs to reply with a partition because i has just become idle. If all queues are empty, then i is inserted into idle and the client is kept waiting for a reply. Otherwise, the server picks the longest queue, does a pop-right, and replies to the client. This strategy attempts to avoid skew in queue sizes. Further, the rightmost partition is the smallest in that queue, which minimizes the setup time for that partition on the client that will get it. As more partitions are reported to the server (via SEND_PARTITION), the server loops through idle, replying to as many idle clients as possible with partitions popped-right from the currently longest queue.

The POP message from client i implies that the client wishes to backtrack to its previously reported partition. Because reported partitions are pushed-left, and other clients (on GET_PARTITION) steal from the right, the previously reported partition from client i is exactly the leftmost one in Q[i], if any. Thus, the server replies YES back to the client if Q[i] is non-empty, followed by a pop-left; otherwise, the server replies NO.

The server additionally listens for OUTCOME messages. If any client reports UnSafe, all clients are terminated and the UnSafe verdict is returned to the user. The server returns the Safe verdict to the user when all queues in Q are empty and all clients are idle (i.e., idle consists of all clients).

Our design of the work queue Q, as an array of size-sorted double-ended queues, is in contrast with the centralized queue that is standard in classical work-stealing algorithms. It is useful for avoiding skew in queue sizes, distributing smaller partitions first, and enabling the client-backtracking optimization.
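A simplified sketch (names ours) of this queue discipline, with one deque per client:

# Server-side work queues (Section IV-B), one double-ended queue per client.
from collections import deque

class Server:
    def __init__(self, num_clients):
        self.Q = [deque() for _ in range(num_clients)]  # Q[i]: partitions from client i
        self.idle = deque()                             # clients awaiting work

    def on_send_partition(self, i, partition):
        self.Q[i].appendleft(partition)   # push-left: newest (largest) on the left

    def on_get_partition(self, i):
        longest = max(self.Q, key=len)
        if not longest:                   # all queues empty: keep client i waiting
            self.idle.append(i)
            return None
        return longest.pop()              # pop-right: smallest partition first

    def on_pop_request(self, i):
        # Client i wants to resume the partition it reported most recently;
        # if no one stole it, it is still the leftmost entry of Q[i].
        if self.Q[i]:
            self.Q[i].popleft()
            return 'YES'
        return 'NO'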

Fig. 4: Maintaining the double-ended queues

IV-C Adaptive rate of splitting

While a low splitting rate inhibits parallelism, a high rate increases the partition-initialization overhead on the clients. Hydra uses a dynamic split rate determined by the number of idle clients and the number of partitions available at the server. Each client maintains a split time interval t (in seconds) and splits the search ("TimeToSplit" of Algorithm 5) if t seconds have elapsed since the last split. The value of t starts at a constant δ and is updated by the server as follows:

t ← δ · |Q[i]| / |idle|    if |idle| > 0
t ← β · t                  otherwise                                      (3)

In the first case, a client's splitting is slowed down in proportion to its queue size (divided by the number of idle clients). The second case applies when there are no idle clients; increasing t by a factor of β reduces the rate of splitting drastically. We use fixed constants for δ and β in our experiments.
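In code, the update could look as follows (a sketch of our reading of Equation 3; DELTA and BETA stand for the constants δ and β, whose experimental values are not reproduced here):

# Server-side update of client i's split interval t (Equation 3 sketch).
def update_split_interval(t, queue_len, num_idle, DELTA, BETA):
    if num_idle > 0:
        # slow splitting in proportion to the client's queue size,
        # tempered by the number of idle clients that need work
        return DELTA * queue_len / num_idle
    return BETA * t   # no idle clients: drastically reduce the splitting rate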

V Experimental Results

We evaluated Hydra on SDV benchmarks [sdvbench], compiled from real-world code that exercises all features of the C language: loops and recursion (up to a bounded depth), pointers, arrays, the heap, bit-vector operations, etc. The performance of Hydra was compared against Corral [corral], which implements the sequential stratified inlining algorithm. Corral forms a good baseline because it has been optimized heavily for SDV over the years [DBLP:conf/sigsoft/LalQ14].

We only selected hard benchmarks (where Corral took at least 200 seconds to solve, or timed out). We ran Hydra with 32 clients. The timeout was set to one hour. We conducted our experiments with the server running on one machine (16 cores, 64 GB RAM) and the 32 clients running on another machine (72 cores with Intel Xeon Platinum 8168 CPUs and 144 GB RAM), communicating via HTTP calls. As clients never communicate amongst themselves, this setup is equivalent to running the clients on different machines.

(a) Scatter plot of running times
(b) Histogram of speedup of Hydra over Corral
(c) Cactus plot of instances solved
Fig. 5: Comparison of Hydra against Corral on SDV benchmarks

V-A Hydra versus Corral

Instances Solved.  Hydra solved 99 instances (30% of the benchmark suite) on which Corral timed out (34 of these were Safe and the remaining 65 were Unsafe). Conversely, Corral solved 12 instances (4%) on which Hydra timed out. We did not investigate these cases in detail; in a practical scenario, one can simply dedicate a single client to run Corral and get the best of both tools. Overall, Hydra solved 183 instances (55%) while Corral solved only 96 (29%). Interestingly, there were 138 instances (41%) that were unsolved by both Hydra and Corral, indicating the need for further improvements.

Verification Time.  In terms of running time, Hydra was significantly faster than Corral in most (84%) cases: Figure 5(a) shows a scatter plot of running times, and Figure 5(b) is a histogram of the speedup of Hydra over Corral. Several instances enjoyed very large speedups; a small fraction of instances had bounded slowdowns as well. Over all instances, both the mean and the median speedup favor Hydra. Speedup figures exclude cases in which one of the tools timed out.

Scalability.  Figure 5(c) is a cactus plot illustrating the scalability of Hydra with the number of clients. Corral is able to solve only 58 instances within 1000 seconds. As expected, running Hydra with only a single client results in worse performance than Corral (only 46 instances solved within 1000 seconds). However, performance improves significantly with the number of clients (166 instances solved within 1000 seconds with 32 clients).

V-B Effectiveness of proof-guided splitting

Empirical Analysis.  We define the dissimilarity of a client i with respect to a client j as D(i, j) = 1 − |I_i ∩ I_j| / |I_i ∪ I_j|, where I_i and I_j denote the sets of callsites that i and j, respectively, have inlined when Hydra finishes. A high value of D(i, j) implies that the clients did a different search. Note, however, that D(i, j) will never be 1 because certain callsites (like main) will always need to be inlined by each client.

Across all benchmarks and all client pairs, the average dissimilarity value was 0.55. This indicates a substantial difference among the inlined calltrees across clients.

Statistical Analysis.  We implemented a randomized splitting algorithm that (1) decides whether or not to split at each inlining step uniformly at random, and (2) upon deciding to split, selects the splitting callsite uniformly at random.

We ran this randomized splitting algorithm 5 times on each program and compared the minimum verification time of these 5 runs for each instance against that of Hydra. Using the Wilcoxon signed-rank test, we found that Hydra is statistically better than the randomized splitting algorithm, with a p-value of 0.0012, indicating that the performance of the splitting heuristic is not accidental.
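The comparison can be reproduced along these lines (hypothetical data; SciPy's implementation of the Wilcoxon signed-rank test):

# Paired one-sided Wilcoxon signed-rank test over per-instance times.
from scipy.stats import wilcoxon

# Hypothetical per-instance times (seconds): Hydra vs. best of 5 random-split runs.
hydra_times = [12.0, 40.5, 7.2, 95.0, 33.3]
random_split_times = [30.1, 55.0, 19.8, 140.2, 31.0]

stat, p_value = wilcoxon(hydra_times, random_split_times, alternative='less')
print(p_value)   # small p-value: Hydra's times are statistically smaller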

V-C Server optimizations

We measured the performance impact of the server-side queue implementation on Hydra. We compared our double-ended queues from Section IV-B against a classical work-stealing queue implementation. Our implementation allowed Hydra to complete on 40% more cases, ones where using the classical version made Hydra time out. Further, Hydra was 8.5 times faster when both implementations terminated with a verdict.

In terms of controlling the splitting rate, both the running time and the number of splits were found to be statistically better with split-rate feedback enabled.

VI Related Work

Parallelizing SAT/SMT solvers.  In contrast to parallelizing verification tasks, parallelizing SAT/SMT solvers has attracted wider attention. There have been two popular, mutually incomparable [Marescotti2018] approaches to parallelizing satisfiability problems: portfolio-based techniques [Chaki2016, Hyvarinen2008, Wintersteiger2009] and divide-and-conquer techniques (decomposition [Niklas2004, Hamadi2011] or partitioning [Zhang1996, Martins2010, Bohm1996, Jurkowiak2001]). Portfolio-based strategies either run multiple different algorithms or multiple instances of a randomized algorithm. They tend to work well in the presence of a heavy-tailed distribution of problem hardness.

Divide-and-conquer strategies are most similar to our work. They either use static partitioning, based on the structure of the problem [Marescotti2017], or dynamic partitioning [Martins2010] based on run-time heuristics. However, unlike partitioning on individual variables at the logical level, we split at the program level based on the program's call graph. In our setting, the VC of a program can be exponential in the size of the program. This makes it hard to directly use parallelized solvers; we must split even before the entire VC is generated. Furthermore, parallelized solvers are still not as mainstream as sequential solvers. Using solvers as a black box allows us to directly leverage continued improvements in solver technology.

Parallelizing program verification.  Saturn [aiken2007overview] is one of the earlier attempts at parallelizing program verification. Saturn performs a bottom-up analysis of the call graph, generating summaries of procedures in parallel. While the intra-procedural analysis of Saturn is precise, it only retains abstractions of function summaries, and thus cannot produce precise refutations of assertions like BMC can.

There have been attempts at parallelizing a top-down abstraction-based verifier [albarghouthi2012parallelizing] as well as the property-directed reachability (PDR) algorithm [Bradley2011, Een2011, Marescotti2017, Chaki2016] and k-induction [Kahsai2011, Blicha2020]. These all rely on the discovery of inductive invariants for proof generation, a fundamentally different problem than BMC. It would be interesting future work to study the relative speedups obtained for parallelization in these respective domains.

Closer to BMC, parallelization has been proposed by a partitioning of the control-flow graph [ganai2008d]. This approach does static partitioning (based on program slicing) and does not consider procedures at all (hence, must rely on inlining all procedures). Further, it has only been evaluated on a single benchmark program. Our technique, on the other hand, performs dynamic partitioning, supports procedures and has been much more extensively evaluated.

References