Reachability Logic for Low-Level Programs

Automatic exploit generation is a relatively new area of research. Work in this area aims to automate the manual and labor-intensive task of finding exploits in software. In this paper we present a novel program logic, called Reachability Logic, to support automatic exploit generation. The logic formally defines the relation between the reachability of an assertion and the preconditions under which it can be reached. This relation is then used to calculate the search space of preconditions. We show that Reachability Logic is a powerful tool for automatically finding evidence that an assertion is reachable. We verify that the system works for small litmus tests, as well as real-world algorithms. An implementation has been developed, and the entire system is proven to be sound and complete in a theorem prover. This work represents an important step towards formally verified automatic exploit generation.




1 Introduction

Exploit generation is the task of finding a security vulnerability in a program, as well as a way to leverage that vulnerability. This is typically seen as a laborious manual task requiring intricate knowledge of the system under investigation. In contrast, recent studies consider automated exploit generation (AEG) [6, 30]. Even though AEG is “still in its infancy” [2], the knowledge obtained by the exposed exploits has already proven to be valuable to guard and improve software.

Exploit generation, from an abstract point of view, is about reasoning over reachability of unwanted states. For example, a common exploit is a buffer overflow that overwrites the return address, stored at the top of the stack frame. For such exploits, the problem of AEG can be reformulated as deciding whether, just before a return, a state is reachable in which the return address has been modified with respect to the initial state. Other types of exploits may aim to redirect control flow using indirect (dynamically computed) jumps. For such exploits, the problem of AEG can be reformulated as deciding whether a state is reachable in which an indirection leads to an arbitrary address. To summarize, a large part of AEG can be reduced to reasoning over reachability of assertions in low-level programs.

Current program logics are not suitable for reasoning over reachability of assertions. Hoare Logic [17] reasons over all possible execution paths to prove correctness of a program with regard to some property. In AEG, a vulnerable state will typically only occur in some corner case, making Hoare Logic unsuitable. Reverse Hoare Logic [13] reasons over total reachability: it defines the relation between a state and all execution paths that lead to it. Listing every possible path that leads to a certain state is difficult, if not infeasible, which makes Reverse Hoare Logic impractical for AEG. For AEG, knowing that there is one path that leads to the assertion under investigation is enough.

This paper proposes a novel theoretical foundation for reachability that can be used for AEG. The first step is to formalize an academic programming language similar to the well-known While [17] language. Whereas While is intended to be an abstract model of high-level programming languages, this paper proposes jump as an abstract model of low-level representations of executable behavior such as assembly or LLVM IR [23]. The reason is that AEG typically applies to executables: it considers low-level properties (e.g., stack overflows or return-oriented programming) on low-level code. The jump language is characterized by being low-level, having unstructured control flow (jumps instead of loops) and an unstructured flat memory model. Moreover, it is non-deterministic, allowing us to model the uncertainty of the semantics of various constructs found in executables [15, 11].

In a similar fashion to Hoare Logic being defined for While, this paper proposes Reachability Logic (RL) for jump. Reachability Logic consists of triples with a precondition, a program, and a postcondition. Assuming the postcondition models a state, a reachability triple states that if the precondition holds, there exists at least one path to a state that satisfies the postcondition. Whereas Hoare Logic is intended to reason over correctness, RL is thus intended to reason over reachability.

Hoare Logic comes with a proof methodology (precondition generation) that allows establishing correctness of a program. In similar fashion, we provide a methodology for precondition generation for RL. A key observation is that the nature of precondition generation differs between the two logics. Hoare Logic requires over-approximative knowledge, e.g., loop invariants and full path exploration, as it is intended for verification. Reachability Logic allows formal reasoning with under-approximative knowledge: no invariants or full path exploration are necessary. We show that Reachability Logic defines a traversable search space. The precondition that shows reachability, if any path exists, is somewhere in that search space. The precondition generation is proven to be sound and complete: the search space describes only actual reachability evidence, and it describes all possible ways to reach the intended state. The cost of having both soundness and completeness is that the search space becomes infinite. However, finding one path from assertion to initial state suffices, and thus there is no need for full search space traversal to find exploits.

We demonstrate applicability of the proposed theory in two ways. First, we implement a search space generator for RL and demonstrate on various litmus tests how a search space is generated and traversed, and how preconditions are found. Second, we consider two realistic assembly implementations (Quicksort and Karatsuba multiplication) and model them in jump. We show how evidence of reachability is automatically generated.

To summarize, the contributions of this paper are:

  • A novel logic for reasoning over reachability of assertions in low-level programs;

  • A formally proven correct precondition function that automatically computes initializations leading to states described by the postcondition;

  • The application of this logic to various litmus tests as well as two assembly implementations.

All results, source code and the formalized proof of correctness in Isabelle/HOL [25, 22, 12] are publicly available.

Section 2 introduces RL and argues why existing logics are not suitable for AEG. Sections 3 and 4 introduce the jump language and the exploit generation mechanism for it. Litmus tests and applications are described in Sections 5 and 6. Section 7 discusses the search space. Related work is discussed in Section 8 before we conclude in Section 9.

2 Reachability Logic compared to other program logics

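Name | Definition | Usage
Hoare Logic | $\forall \sigma\, \sigma'.\ P(\sigma) \wedge \sigma \rightarrow_p \sigma' \implies Q(\sigma')$ | Correctness
Reverse Hoare | $\forall \sigma'.\ Q(\sigma') \implies \exists \sigma.\ P(\sigma) \wedge \sigma \rightarrow_p \sigma'$ | Total reachability
Reachability Logic | $\forall \sigma.\ P(\sigma) \implies \exists \sigma'.\ \sigma \rightarrow_p \sigma' \wedge Q(\sigma')$ | Reachability
Table 1: Overview of program logics for a precondition $P$, a program $p$ and a postcondition $Q$. Arrow $\rightarrow_p$ denotes the transition relation induced by program $p$.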

Reachability Logic (RL) revolves around triples consisting of a precondition $P$, a program $p$, and a postcondition $Q$. Here, $Q$ is a state-predicate that describes a state to be reached. State-predicate $P$ represents from where a state is reachable: how must the machine be initialized for $Q$ to be possible? These predicates thus formulate properties over states, which assign values to memory and registers, including heap, stack frame, and instruction pointer.

Table 1 provides definitions for both some existing logics and the proposed Reachability Logic. Both Hoare Logic (HL) and Reverse Hoare Logic (RHL) can be extended with a frame rule, producing a “separation” version of the logic [26]. We consider the most elementary versions of these logics. A reachability triple expresses that for any state satisfying predicate $P$ there exists some non-deterministic execution such that a state satisfying $Q$ is reached.
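The three readings in Table 1 can be compared on a finite example. The sketch below is illustrative only and is not part of the paper's formalization: it assumes a program is given as a function `step` that maps a state to the set of states it may end in, and the names `holds_hl`, `holds_rhl`, `holds_rl` and the toy program are hypothetical.

```python
def holds_hl(P, Q, step, states):
    """Hoare triple: every execution from a P-state ends in a Q-state."""
    return all(Q(t) for s in states if P(s) for t in step(s))

def holds_rhl(P, Q, step, states):
    """Reverse Hoare triple: every Q-state is reached from some P-state."""
    return all(any(P(s) and t in step(s) for s in states)
               for t in states if Q(t))

def holds_rl(P, Q, step, states):
    """Reachability triple: every P-state has some execution to a Q-state."""
    return all(any(Q(t) for t in step(s)) for s in states if P(s))

# Hypothetical toy program: nondeterministically keep the state or add one.
states = range(6)
step = lambda s: {s, min(s + 1, 5)}
P = lambda s: s == 2   # start in state 2
Q = lambda s: s >= 3   # reach any state of at least 3

hl = holds_hl(P, Q, step, states)    # fails: 2 may stay at 2
rhl = holds_rhl(P, Q, step, states)  # fails: 4 and 5 are not reached from 2
rl = holds_rl(P, Q, step, states)    # holds: 2 may move to 3
```

Only the reachability reading accepts this triple, which matches the argument that a single witnessing path should suffice.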

Hoare Logic

HL [17] shows absence of bugs, with respect to some property, or, in other words, program correctness. We argue why it cannot be used for reachability. First, consider an approach based on deriving an HL triple of the form $\{P\}\,p\,\{Q\}$. This would express that all executions of the program lead to a desired state, which is unrealistic. Typically only a few cases lead to the desired state and as such this approach would often provide no information on reachability. Second, consider an approach based on deriving an HL triple of the form $\{P\}\,p\,\{\neg Q\}$. The intuition would be to derive preconditions that ensure that the Q-state cannot be reached. The negation of such a precondition may then be an initialization where the exploit does happen.

However, there are various counterarguments to this hypothetical approach. First, by definition of Hoare triples, an initial state satisfying $\neg P$ does not necessarily lead to the Q-state, i.e., this approach does not show reachability. Second, the negation of $P$ can grossly over-approximate the set of initial states that may lead to the Q-state, even in the case of a hypothetical perfect weakest precondition generation (which is infeasible in practice due to scalability issues [4]).

As a final argument against the use of HL for reachability, we argue that HL inherently requires reasoning over all execution paths. It requires full path exploration and invariants over loops, which is known as the Achilles’ heel of formal verification. A logic to be used for AEG should not require any over-approximative reasoning: it reasons over some path, or some number of iterations of a loop. A logic for AEG should not require loop invariants.

Reverse Hoare Logic

RHL [13] shows total reachability. Specifically, it shows that all $Q$-states are reachable from a $P$-state. RHL requires characterizing all states leading to $Q$, which can easily become infeasible. We illustrate this with the RHL triple below.

From precondition , there is a valid path to a state satisfying postcondition . However, looking at the definition of RHL in Table 1, this is not a valid RHL triple. For all states that satisfy the postcondition, there must be a state that satisfies the precondition and leads to it. For example, is a state that satisfies the postcondition, but no state that satisfies the precondition leads to it. RHL triples need to specify every state that leads to the postcondition. In the example above, this is trivial, but in more involved cases, this will quickly become infeasible. In the case of loops, the precondition might be infinitely big, or require a manually written loop variant. This makes RHL unsuitable as a foundation for AEG. For the purpose of AEG, it suffices to find one path, and indeed the following RL-triple holds: . In other words, the universal quantification in RHL triples again requires a form of over-approximative reasoning (specifically: loop invariants), which should not be necessary in the case of AEG.

Incorrectness Logic

IL [24] has its basis in RHL, but with some changes. The postcondition has an added exit-condition, indicating if the program terminated in an error or not. Like RHL, IL uses manually defined loop variants to deal with loops. Finally, IL replaces conditionals by so called assumes, that assume the conditional to hold or not, depending on the selected branch. While these changes make it easier to work with IL, the triples are still the same as in RHL.

Hoare Logic, Reverse Hoare Logic and Incorrectness Logic are not suitable for reasoning about reachability in the context of AEG. The key difference is that Reachability Logic is the only logic that satisfies all of the following: (1) it is able to reason over the existence and reachability of paths, (2) it allows under-approximative semantics of the program under investigation, and (3) it is able to deal with non-determinism. Reachability Logic thus allows us to express the relation between a postcondition to be reached and an initial state configuration. RL is suitable not only for describing this relation, but also for automatically generating preconditions. The coming sections describe RL in more detail, including precondition generation and examples to illustrate this process.

3 The jump language

The jump language is intended as an abstract representation of low-level languages such as LLVM IR [23] or assembly. It has no explicit control flow; instead it has jumps to addresses. It consists of basic blocks of elementary statements that end with either a jump or an exit. Blocks are labeled with addresses. Memory is modeled as a mapping from addresses to values (see Definition 2 below). Variables represent registers and flags. The values stored in variables or memory are words (bit-vectors).

The following design decisions have been made regarding jump.


Non-determinism

We explicitly include non-determinism through a nondeterministic assignment statement that retrieves some value out of a set. Non-determinism allows modeling external functions whose behavior is unknown, dealing with uncertain semantics of assembly instructions, and modeling user input and IO. This statement is the only source of non-determinism in jump.

Unstructured memory

Memory essentially consists of a flat mapping of addresses to values. There is no explicit notion of heap, stack frame, data section, or global variables. This is purposefully chosen as it allows to reason over pointer aliasing. For example, it allows Reachability Logic to formulate statements as “the initial value of this pointer should be equal to the initial value of register ” which is interpreted as a pointer pointing to the return address at the top of the stack frame.

No structured control-flow

All control flow happens through either conditional jumps or indirect jumps. Indirect control flow is typically introduced by a compiler in case of switch-statements, callbacks, and to implement dynamic dispatch. Note that a normal instruction such as the x86 instruction ret implicitly is an indirect jump as well.

Definition 1

A jump program is defined as the pair where is the entry address, and is a mapping from addresses to blocks. A block is defined by the grammar in Figure 1.

Figure 1: Syntax of jump. Blocks: sequence, exit, conditional jump, indirect jump. Statements: variable assignment, nondeterministic assignment, store $v$ in address $e$. Expressions: value, variable, dereference, binary operation, negation.

A block consists of a sequence of zero or more statements, followed by either a conditional jump, an indirect jump, or an exit. The conditional jump has two target addresses: it jumps to one only if the given expression evaluates to zero, otherwise to the other. The indirect jump calculates the value of its expression and jumps to the block at that address. Statements can be assignments or stores. A deterministic assignment writes the value of an expression $e$ to a variable $v$. A nondeterministic assignment writes some value to variable $v$, chosen from the set of values for which expression $e$ evaluates to non-zero. Note that since expressions can read from memory, an assignment can model a load instruction. A store writes the value of variable $v$ into memory at address $e$. Expressions consist of values, variables, dereferencing, binary operations and negation.

The state consists of values assigned to variables and memory. We first define the memory model.

Definition 2

The tuple is a memory model, where is the type of the memory itself and is the type of addresses. Function is of type and function is of type . A memory model must satisfy:

We assume values can bijectively be cast to addresses and we do so freely.

Definition 3

A state is a tuple where is of type and is of type .

Semantics are expressed through transition relations , and that respectively define state transitions induced by programs, blocks, and statements (see Figure 2). For example, notation denotes a transition induced by program from state to state . Notation denotes the evaluation of expression  in state to value .

Figure 2: Semantics of jump. Rules for evaluation of expressions are omitted, except for the dereference operator.

The semantics are largely straightforward. A program is evaluated by evaluating the block pointed to by the entry address. A conditional jump is evaluated by evaluating the condition, and then the target block. Indirect jumps are evaluated in a similar manner, by evaluating the expression to obtain the block to jump to. Non-standard is the nondeterministic assignment ndassign, which evaluates expression $e$ after substituting some value for the variable $v$. For any value for which $e$ evaluates to non-zero, a transition may occur. A store evaluates expression $e$, producing some address, and writes the value of variable $v$ to the corresponding region in memory. A load uses the read function of the memory model to read from memory.
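The informal description of the semantics can be made concrete with a small interpreter. This is a sketch under stated assumptions, not the formalization of Figure 2: the tuple encoding, the 8-bit word size `WORD`, reading uninitialized variables and memory as 0, treating negation as logical negation, and the `fuel` bound on the number of jumps are all choices made here for illustration. The conditional jump follows the prose above: it takes its first listed target when the condition evaluates to zero.

```python
import operator as op

WORD = 256  # assumption: 8-bit words, so nondeterministic choices are finite

def ev(e, vs, mem):
    """Evaluate expression e under variables vs and memory mem."""
    tag = e[0]
    if tag == "val":
        return e[1] % WORD
    if tag == "var":
        return vs.get(e[1], 0)            # assumption: unset variables read 0
    if tag == "deref":
        return mem.get(ev(e[1], vs, mem), 0)
    if tag == "not":                      # assumption: logical negation
        return int(ev(e[1], vs, mem) == 0)
    if tag == "op":
        return int(e[1](ev(e[2], vs, mem), ev(e[3], vs, mem))) % WORD
    raise ValueError(tag)

def step_stmt(s, vs, mem):
    """Yield every successor state (vs, mem) of a single statement."""
    if s[0] == "assign":                  # v := e
        yield {**vs, s[1]: ev(s[2], vs, mem)}, mem
    elif s[0] == "ndassign":              # v := any w making e non-zero
        for w in range(WORD):
            cand = {**vs, s[1]: w}
            if ev(s[2], cand, mem) != 0:
                yield cand, mem
    elif s[0] == "store":                 # memory[e] := v
        yield vs, {**mem, ev(s[1], vs, mem): vs.get(s[2], 0)}
    else:
        raise ValueError(s[0])

def run_block(blocks, b, vs, mem, fuel):
    """Yield final states reachable from block b within `fuel` jumps."""
    tag = b[0]
    if tag == "exit":
        yield vs, mem
    elif tag == "seq":
        for vs2, mem2 in step_stmt(b[1], vs, mem):
            yield from run_block(blocks, b[2], vs2, mem2, fuel)
    elif fuel > 0:
        if tag == "cjump":                # first target if condition is zero
            target = b[2] if ev(b[1], vs, mem) == 0 else b[3]
        else:                             # "ijump": computed target
            target = ev(b[1], vs, mem)
        if target in blocks:
            yield from run_block(blocks, blocks[target], vs, mem, fuel - 1)

def run(entry, blocks, vs, mem, fuel=3):
    return list(run_block(blocks, blocks[entry], vs, mem, fuel))

# Hypothetical program: pick x with 1 < x < n, exit if n mod x == 0, else retry.
x, n = ("var", "x"), ("var", "n")
in_range = ("op", op.and_, ("op", op.lt, ("val", 1), x), ("op", op.lt, x, n))
blocks = {
    0: ("seq", ("ndassign", "x", in_range), ("cjump", ("op", op.mod, n, x), 1, 0)),
    1: ("exit",),
}
finals_6 = run(0, blocks, {"n": 6}, {})
finals_7 = run(0, blocks, {"n": 7}, {})
```

Because `run` enumerates all nondeterministic choices, a non-empty result is direct evidence that an exit is reachable from the given initial state.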

4 Precondition generation

This section formalizes precondition generation functions for Reachability Logic. The central idea is to formulate a transformation function that takes as input (1) a program $p$ and (2) a postcondition $Q$, and produces as output a disjunctive set of preconditions. This transformation function follows the recursive structure of jump, i.e., we formulate three functions that perform transformations relative to a program, a block and a statement, respectively.

When applied statement-by-statement, these functions populate the reachability search space. This search space is an acyclic graph, with symbolic predicates as vertices and the initial postcondition as root. It contains an edge between $Q$ and $P$ labeled with statement $s$ if and only if application of the statement-level function to statement $s$ and postcondition $Q$ produces a set containing precondition $P$.

Below, the definition of predicates that form pre- and postconditions is given.

=  Existential quantification, expression

Predicates are expressions (true if and only if they evaluate to non-zero), but can also contain outermost existential quantifiers. The predicate means there exists a value for such that both and hold.

Given a program and a postcondition defined in the predicate language above, a transformation is sound if it generates preconditions that form a reachability triple. Soundness means that a generated precondition actually represents an initial state that non-deterministically leads to the Q-state. To define soundness, we first define the notion of a reachability triple relative to blocks (instead of a program as a whole as in Section 2):

Definition 4

A reachability triple for block $b$, with precondition $P$ and postcondition $Q$, is defined as: for every state $\sigma$ satisfying $P$, there exists a state $\sigma'$ satisfying $Q$ such that block $b$ induces a transition from $\sigma$ to $\sigma'$.

We repeat this definition to stress that a reachability triple over block $b$ intuitively means that precondition $P$ leads to the desired state when running the block and subsequent blocks jumped to, until an exit, i.e., not just running the instructions within block $b$ itself. This is due to the nature of the transition relation for blocks (see Figure 2). A similar definition can also be made for statements: a reachability triple for statement $s$ is defined for the transition relation for statements and thus concerns the execution of the individual statement only.

Definition 5

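The transformation function for programs is sound if and only if, for any program $p$ and postcondition $Q$, every precondition $P$ it generates forms a reachability triple with $p$ and $Q$.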

Similarly, soundness is defined for blocks and statements.

Figure 3 shows the transformation functions. The function for programs starts at the entry block of the program. The program is then traversed in the style of a right fold [29]: starting at the entry, the program is traversed up to an exit point, from which postcondition transformation happens. The function for blocks is identical to standard weakest-precondition generation in the cases of sequence and exit. In the case of a conditional jump, two paths are explored. Both could lead to exploits, as long as the branching conditions remain internally consistent. In the case of an indirect jump, all possible addresses that can be jumped to are explored.



Figure 3: Transformation functions for a program .

The function for statements is standard in the case of deterministic assignment. In the case of nondeterministic assignment, according to the execution semantics, some value needs to be found that fulfills the condition $e$. That existentially quantified value is substituted for variable $v$ in the postcondition.
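The two assignment cases can be sketched as substitutions on a small symbolic predicate language. This is an illustration rather than the definition in Figure 3: the tuple encoding, the operator table `OPS`, the fresh-name scheme `_w0, _w1, ...` and the finite `domain` used to check predicates are all assumptions made here.

```python
import itertools
import operator

OPS = {"+": operator.add, "==": operator.eq, "<": operator.lt,
       "and": lambda a, b: bool(a) and bool(b)}
_fresh = itertools.count()

def subst(p, v, r):
    """Replace free occurrences of variable v in predicate p by expression r."""
    tag = p[0]
    if tag == "val":
        return p
    if tag == "var":
        return r if p[1] == v else p
    if tag == "op":
        return ("op", p[1], subst(p[2], v, r), subst(p[3], v, r))
    if tag == "exists":                   # a bound v shadows the substitution
        return p if p[1] == v else ("exists", p[1], subst(p[2], v, r))
    raise ValueError(tag)

def pre_assign(v, e, post):
    """Deterministic v := e: substitute e for v in the postcondition."""
    return subst(post, v, e)

def pre_ndassign(v, cond, post):
    """Nondeterministic v := w with cond non-zero: quantify over a fresh w."""
    w = f"_w{next(_fresh)}"
    fresh = ("var", w)
    return ("exists", w,
            ("op", "and", subst(cond, v, fresh), subst(post, v, fresh)))

def holds(p, env, domain=range(16)):
    """Check predicate p in env, with quantifiers ranging over a small domain."""
    tag = p[0]
    if tag == "val":
        return p[1]
    if tag == "var":
        return env[p[1]]
    if tag == "op":
        return int(OPS[p[1]](holds(p[2], env, domain), holds(p[3], env, domain)))
    if tag == "exists":
        return int(any(holds(p[2], {**env, p[1]: w}, domain) for w in domain))
    raise ValueError(tag)

post = ("op", "==", ("var", "x"), ("val", 5))            # postcondition x == 5
pre_det = pre_assign("x", ("op", "+", ("var", "y"), ("val", 1)), post)
pre_nd = pre_ndassign("x", ("op", "<", ("var", "x"), ("var", "n")), post)
```

For `x := y + 1` the generated precondition is `y + 1 == 5`; for the nondeterministic assignment constrained by `x < n` it is an existential that is satisfiable exactly when some value below `n` equals 5.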

In the case of memory assignment, predicate transformation is a bit more complex. Consider the following example:

If and alias, then will be 43 after execution. The postcondition can only hold if  and  are separate.

We explicitly encode assumptions about memory separation into the generated preconditions. The function listed in Figure 4 takes care of this. It takes as input the address  to which value is written, and the postcondition . It returns a set of tuples where is the precondition and provides the pointer-relations under which that substitution holds. For example, we have . This indicates two possible substitutions when transforming postcondition into precondition:

All other cases of merely propagate the case generation.

Figure 4: Case definitions for precondition of store
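The alias/separate case split for a store can be sketched as follows. This is a simplified reading of the idea behind Figure 4, not a transcription of it: the function name `store_cases`, the tuple encoding, and the variable names `a`, `b` and `v` are hypothetical, and regions are treated as single cells, so partial overlap is ignored.

```python
def store_cases(a, v, q):
    """For the store memory[a] := v, return all (q', relations) pairs.

    q' is the postcondition q rewritten to the state before the store and
    relations lists the pointer relations under which that rewriting is valid.
    """
    tag = q[0]
    if tag in ("val", "var"):
        return [(q, [])]
    if tag == "op":
        return [(("op", q[1], left, right), rel_l + rel_r)
                for left, rel_l in store_cases(a, v, q[2])
                for right, rel_r in store_cases(a, v, q[3])]
    if tag == "deref":
        out = []
        for e, rel in store_cases(a, v, q[1]):
            # Aliasing: the read sees the value that was just stored.
            out.append((("var", v), rel + [("op", "==", e, a)]))
            # Separate: the read is unaffected by the store.
            out.append((("deref", e), rel + [("op", "!=", e, a)]))
        return out
    raise ValueError(tag)

# Postcondition *b == 42 after storing variable v at address a.
post = ("op", "==", ("deref", ("var", "b")), ("val", 42))
result = store_cases(("var", "a"), "v", post)
```

Each dereference doubles the number of cases. Combinations whose relations contradict each other are unsatisfiable and can be discarded by the pruning described below.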

There are no special rules for dealing with loops. Instead, loops are unrolled by the precondition generation. In the case of infinite iterations, the reachability search space will be infinitely large. To deal with this search space, we order and prune the space. Theorem 4.1 states a basic property of reachability triples that is used for the purpose of pruning. Section 5 describes how the space is ordered to manage large search spaces.

Theorem 4.1 (Preservation of unsatisfiability)

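For any program $p$ and conditions $P$ and $Q$ such that $P$, $p$ and $Q$ form a reachability triple: if $Q$ is unsatisfiable, then $P$ is unsatisfiable.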

The above can directly be concluded from the definition of a reachability triple, as given in Section 2. Once an unsatisfiable condition is generated, the precondition generation can be halted, and the condition discarded.

We validate our precondition generation function by proving it is both sound and complete. Theorems 4.2 and 4.3 define these respective properties.

Theorem 4.2 (Soundness of precondition generation)

The transformation functions for programs, blocks and statements are sound.

Theorem 4.3 (Completeness of precondition generation)

Having both soundness and completeness means that the reachability space defines all and only valid preconditions for a certain program and postcondition.

Both theorems, including (1) the syntax and semantics of jump, (2) the syntax and semantics of the predicates, and (3) the transformation functions, have been formalized and proven correct in the Isabelle/HOL theorem prover. The proof, including a small example of exploit generation within Isabelle/HOL, constitutes roughly 1000 lines of code. Proof scripts are publicly available. To prove completeness, Theorem 4.3 imposes two restrictions. First, we require execution of a program under a state described by $P$ to terminate. If a program does not terminate, it is impossible to construct a $P'$ for this program, and therefore completeness does not hold. Second, we show the theorem holds for programs without indirect jumps. In practice, however, this premise has little to no impact: every jump program containing indirect jumps can be manually converted to one with only direct jumps. Under these restrictions, given that $P$ is a precondition for program $p$ and postcondition $Q$, the precondition generation will generate a $P'$ that is non-strictly weaker than $P$. An equivalent of Theorem 4.3 also holds for blocks and statements.

5 Litmus tests

This section presents several litmus tests that demonstrate the functionality of Reachability Logic. All of the examples have been tested in our prototype implementation in Haskell, and are available online.

The prototype implements the functions similar to how they are presented above. Some changes were made to make the system more user friendly. The functions are defined as non-deterministic functions, building up a tree as a search space. Branches at the same level originate from a conditional, and deeper branches indicate a deeper level of jumps. On top of that, a basic simplification step is applied to the generated predicates, to make them more readable.

The reachability search space can be infinitely large. This is why the implementation builds up the search space as a tree structure. This orders the search space, making it feasible to search the infinite space in a structured way. Although some rudimentary ordering is done, efficiently searching the reachability space is explicitly left as future work.

5.0.1 Memory: Double store

Figure 5: Precondition generation for double write example

Our first litmus test demonstrates how Reachability Logic deals with symbolic memory access. Figure 5 lists the program on the left. The program stores the content of memory position in memory position , and then stores the contents of position in position . On the right, a schematic representation of the precondition generation is given for postcondition . Precondition generation works from back to front, so we start at the bottom. The exit command leaves the condition unaltered. We then arrive at a store statement. Here, we split into two cases, one for the case where memory location and are separate, one for the aliasing case. The second store splits the conditions up once more, for the separate and aligned cases. At the top of the tree, we find the final preconditions.

5.0.2 Infinite exploit space: Long division

Figure 6: Precondition generation for long division example. A dashed arrow leads to an unsatisfiable precondition.

Our next litmus test demonstrates conditional jumps, loops, infinite reachability space and postcondition pruning. Figure 6 lists the program blocks on the left. The blocks are labeled though , with block is the entry point. Variables and signify the input. The program divides by , by means of long division. If is larger than , the divisor without remainder is returned in variable . The variable is updated, and after execution holds the remainder from division.

In this case, we want to derive that a state is reachable which clearly should not be, to show that there is a bug in the program. The program behaves incorrectly when after execution, the remainder stored in is equal to or larger than the divisor . We use this, , as our exploit postcondition.

The right of Figure 6 represents precondition generation. We start back to front. does not alter the postcondition, so we just copy it. Then, we either execute block 0, 1 or 2, depending on what condition holds. If we came directly from block 0, then must hold, so our precondition is , which is false, indicated by the lightning bolt. If we came from block 1, then must have held. Block 1 updates with , leading to the precondition . Note that this precondition is unsatisfiable. By Theorem 4.1 we know that we can discard it.

The last block to look at, is block 2. To arrive here, we must have had that . The body of block 2 updates , and we end up with . Here, we see the loop unfolding at work. We have executed the loop body once, and the function generates two alternatives. We exit the loop, indicated by the arrow pointing up, or we run another iteration, indicated by the arrow pointing right.

Ending the loop at this point again leads to a precondition that is unsatisfiable, and we can prune it. Running the loop a second time, and then exiting leads to a precondition that is satisfiable, and completing the calculation, leads us to the first viable precondition for the postcondition .

The precondition function does not stop at this point. It will continue to unroll the loop an infinite amount of times, making the exploit space infinitely large. By ordering the space as shown in this example, we can deal with infinity.

5.0.3 Nondeterministic loop

A defining feature of both jump and Reachability Logic, is the ability to deal with nondeterminism. The program listed in Figure 7 is a very small example of such a program. It assigns a random integer value to the variable , between 1 and some value . Then, if modulo is zero, the program jumps to , otherwise it repeats itself.

Figure 7: Precondition generation for nondeterministic loop example

For this example, we are simply interested in termination. So as a postcondition, we pick . The block does not change the condition. Executing the loop once, means that must hold. The nondeterministic assignment updates with any value between 1 and . Translated to precondition, this means that there exists an such that .

This condition holds if and only if is not prime. What makes this example interesting is the contrast with other program logics like Hoare logic. With Hoare Logic, we are only able to prove that if the program terminates, was not prime. Reachability Logic however allows us to show that if is not prime, then the program can terminate, which is a much more powerful statement.
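The claim that the generated precondition characterizes non-primes can be checked by brute force. The sketch below assumes the bounds of the nondeterministic choice are exclusive (a choice of 1 or of the bound itself would divide trivially) and restricts attention to bounds of at least 2; the names `n`, `x`, `can_terminate` and `is_prime` are hypothetical, since the original symbols are not available here.

```python
import math

def can_terminate(n):
    """Precondition after one unrolling: some x with 1 < x < n divides n."""
    return any(n % x == 0 for x in range(2, n))

def is_prime(n):
    """Trial division up to the integer square root of n."""
    return n >= 2 and all(n % d for d in range(2, math.isqrt(n) + 1))

# For every n from 2 to 199, exit is reachable exactly when n is not prime.
agreement = all(can_terminate(n) == (not is_prime(n)) for n in range(2, 200))
```

Unrolling the loop more often does not change this condition, because each iteration draws a new value independently of the previous one.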

5.0.4 Indirect Jumps

Our last litmus test demonstrates how Reachability Logic deals with indirect jumps. Switch-statements consisting of many cases are often compiled into jump tables. These are typically combined with a guard for values not handled by the jump table. Figure 8 shows a model of this.

Figure 8: Precondition generation for indirect jump example.

Execution starts at block 0. Here, is set to 0, and the conditional jump checks if x is smaller than 0 or larger than 3. If so, we jump to exit. If not, we jump to block 1, which is the start of our guard. The indirect jump jumps to the block label stored in . Blocks 2, 3, and 4 signify the guard options.

The goal of this example is to demonstrate how an indirect jump works with our precondition generation function. As a postcondition, we select .

Starting at block 9, we again work our way up the execution back to front. Block 9 itself has no effect on the postcondition, so it is copied. We can reach block 9 from four different locations in the code, namely block 0, 1, 3 and 4.

Block 0 assigns 0 to , which leads to a contradiction in our precondition, namely . Block 1 is the indirect jump. We can only have reached block 9 from here if was equal to 9. In turn, we could have reached block 1 from itself, or 0. If we indeed came from itself, we must have made an indirect jump to 1, requiring x to be equal to 1 as well, which is a contradiction. If we came from 0, we have the same problem as before, is set to 0 and so must hold.

Block 3 always jumps to 9, so it has no effect on the condition. The only way to reach block 3 is from block 1, the indirect jump. This requires x to be equal to 3. We can reach block 1 from itself and 0. Both lead to a contradiction; if we came from 1, then , which is false, if we came from 0, must hold.

Block 4 is our last hope to find a satisfiable precondition. The block itself assigns 5 to , leading to the precondition . Block 4 can be reached from block 2 or 1. Block 2 is a simple jump, so we copy the condition. Block 2 is only reachable from block 1, which adds to the condition. We can only reach 1 from itself or 0. If we came from itself, we get a contradiction immediately. If we came from 0, we know that the jump condition was false, so we add its negation to the condition. This leads to the first satisfiable precondition.

If we came directly from block 1, we know that should hold. Analogous to the case described directly above, we obtain that the only feasible route is via block 0, which again leads to a satisfiable precondition.

The interesting point of this example is that, although we have an indirect jump, the number of paths to explore stays rather limited. In principle, an indirect jump can target any address, but in practice the feasible targets are limited by the conditions that must hold.

6 Case Studies

We present results from applying Reachability Logic to two realistic examples. These case studies were performed using the prototype mentioned in Section 5.

Faulty Partitioning for Quicksort

The core of any quicksort algorithm is the partitioning algorithm. One well-known partitioning algorithm is the one invented by Tony Hoare [16], which selects a pivot element and then transforms an input data set into two smaller sets, depending on the relative ordering of the elements in the data set to the pivot. This scheme seems superficially very simple, but it is very easy to get wrong. For instance, the following code implements a plausible-looking variant of this partitioning scheme, which is “nearly correct”.

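#include <stdlib.h>  /* rand, size_t */

/* Helper assumed by the listing: exchange two array elements. */
static void swap(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* Intentionally faulty variant of Hoare partitioning, analyzed below. */
void quicksort(int a[], size_t N) {
  if(N <= 1) return;
  int pivot = a[rand()%N];
  int i = 0, j = N-1;
  while(i <= j) {
    while(i < j  && a[i] <= pivot) i++;
    while(i <= j && a[j] >= pivot) j--;
    swap(&a[i++], &a[j--]);  /* unconditional swap: the defect under study */
  }
  quicksort(a, j+1);
  quicksort(a+i, N-i);
}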

The partitioning scheme can be translated into a jump program relatively easily; selection of the pivot can be modeled using a non-deterministic assignment.

We are interested in detecting out-of-bounds memory access. We add bounds checks to the program, and state triggering one of them as our postcondition. Running the resultant program through our implementation for an array of size 3 will then generate an exploit-precondition: the program can go out of bounds if the following condition holds:

Informally, this condition says that a[0] is not the minimal element of the array. The reason for this is that if the minimal element is chosen as the pivot, and a[0] is not equal to it, the first inner loop will simply fall through, and after the second loop, j will have become -1, pointing outside the array before the swap occurs. A fix for this would be to make the swap conditional, replacing it with:

    if(i <= j) swap(&a[i++], &a[j--]);

This will in fact prevent any out-of-bounds memory access. However, another way any version of quicksort can fail dramatically is when the recursive calls are performed with incorrect parameters. For example, if a recursive call receives the entire array again at the end of the partitioning scheme (for the calls above, when j+1 = N or i = 0), we will end up in an infinite recursive loop. If we specify this as a postcondition of the partitioning scheme, we find that the same preconditions are generated as before.
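The out-of-bounds behaviour described above can be reproduced by hand with a small bounds-checked simulation of one partitioning pass. The following is only an illustrative sketch in C, not the jump program or the prototype used in this paper; the names pass_result, partition_pass and some_pivot_goes_oob are hypothetical, and the pivot is passed in explicitly instead of being chosen by rand().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_N 8  /* arbitrary upper bound on the array size for this sketch */

/* Outcome of one partitioning pass. */
typedef struct {
    int  oob;   /* 1 if the swap would touch an index outside [0, n) */
    long i, j;  /* loop indices when the pass stopped */
} pass_result;

/* Simulate the partitioning loop of the listing on a copy of in[0..n-1].
 * guarded == 0: unconditional swap (the faulty variant).
 * guarded != 0: swap only if i <= j (the proposed fix). */
pass_result partition_pass(const int *in, size_t n, int pivot, int guarded)
{
    int a[MAX_N];
    pass_result res = {0, 0, 0};
    long i = 0, j;

    assert(n >= 2 && n <= MAX_N);  /* quicksort returns early for N <= 1 */
    memcpy(a, in, n * sizeof a[0]);
    j = (long)n - 1;

    while (i <= j) {
        while (i <  j && a[i] <= pivot) i++;
        while (i <= j && a[j] >= pivot) j--;
        if (guarded && i > j) break;  /* fixed variant skips the swap */
        if (i < 0 || j < 0 || i >= (long)n || j >= (long)n) {
            res.oob = 1;  /* report instead of performing the bad access */
            break;
        }
        int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
        i++; j--;
    }
    res.i = i;
    res.j = j;
    return res;
}

/* 1 if some pivot taken from the array drives the faulty variant out of bounds. */
int some_pivot_goes_oob(const int *in, size_t n)
{
    for (size_t k = 0; k < n; k++)
        if (partition_pass(in, n, in[k], 0).oob) return 1;
    return 0;
}
```

For the array {2, 1, 3} with pivot 1, the faulty variant stops with i = 0 and j = -1 and reports an out-of-bounds swap, whereas for {1, 2, 3} no choice of pivot from the array does. With the guarded swap the same input no longer goes out of bounds, but the pass still ends with i = 0, so the second recursive call would receive the whole array again.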

The functional correctness of the partitioning scheme can also be examined: is it actually the case that all the elements moved towards the left-hand side of the array are less than or equal to the pivot, and that the elements to the right are greater than or equal to the pivot? To examine this, we can specify as an exploit condition that the input to the first recursive invocation of quicksort contains an element greater than the pivot; this finds no satisfiable conditions (as it is not true). However, specifying this for the input sent to the second invocation of quicksort instead, our prototype will essentially start generating counter-examples. For example, if the first element is the pivot, and strictly less than the middle element but strictly higher than the third element, partitioning fails.


Several assembly routines for multiplying multi-precision integers on an 8-bit AVR controller were verified by Schoolderman [28]. However, it was also discovered that some of these routines could compute incorrect results if their arguments aliased with the memory location intended to store the result. A full verification like this appears to require significant effort; however, if we are only interested in finding aliasing bugs, RL seems ideally suited to find these.

We focused on the smallest routine exhibiting the problem: the 48-bit multiplication routine as originally developed by Hutter and Schwabe [20]. This routine computes a product of two 48-bit integers using Karatsuba’s method, splitting its inputs into two 24-bit halves, performing three 24-bit multiplications with these, and combining the results. In the process, the lowest 24 bits of the result are known early on and written to memory before the upper half of the inputs is read, causing an aliasing bug.

To model this in jump, registers and the carry flags are modeled as jump variables, whereas the memory space is modeled using jump addresses. Every AVR instruction is modeled by a sequence of jump statements. For example, the instruction ADD , can be expressed by the sequence:

By adding the appropriate binary operators to the syntax of Figure 1, every instruction required for the program (of which there are only a handful) can be modeled, allowing the entire multiplication routine (consisting of 136 instructions) to be expressed as a jump program. The memory accesses, which operate on three bytes at a time, were modeled as single memory operations on a three-byte memory region.
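To give a rough impression of this style of modeling, the sketch below expresses the AVR ADD and ADC instructions as plain assignments on a register file and a carry variable. It is written in C rather than jump and is not the sequence used in our model; the type avr_state and the function names are hypothetical, and only the carry flag is modeled (the other status flags are ignored).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical machine state: 32 eight-bit registers and the carry flag. */
typedef struct {
    uint8_t reg[32];
    uint8_t carry;  /* 0 or 1 */
} avr_state;

/* ADD Rd, Rr: Rd <- Rd + Rr, carry <- carry out of bit 7. */
void avr_add(avr_state *s, unsigned d, unsigned r)
{
    assert(d < 32 && r < 32);
    unsigned sum = (unsigned)s->reg[d] + (unsigned)s->reg[r];
    s->carry  = (uint8_t)(sum >> 8);    /* 1 exactly when the sum exceeds 0xFF */
    s->reg[d] = (uint8_t)(sum & 0xFFu);
}

/* ADC Rd, Rr: Rd <- Rd + Rr + carry, carry <- carry out of bit 7. */
void avr_adc(avr_state *s, unsigned d, unsigned r)
{
    assert(d < 32 && r < 32);
    unsigned sum = (unsigned)s->reg[d] + (unsigned)s->reg[r] + s->carry;
    s->carry  = (uint8_t)(sum >> 8);
    s->reg[d] = (uint8_t)(sum & 0xFFu);
}

/* Usage example: a 16-bit addition built from one ADD and one ADC. */
uint16_t add16_demo(uint16_t x, uint16_t y)
{
    avr_state s = {{0}, 0};
    s.reg[0] = (uint8_t)(x & 0xFFu);  s.reg[1] = (uint8_t)(x >> 8);
    s.reg[2] = (uint8_t)(y & 0xFFu);  s.reg[3] = (uint8_t)(y >> 8);
    avr_add(&s, 0, 2);  /* low bytes, sets the carry */
    avr_adc(&s, 1, 3);  /* high bytes, consumes the carry */
    return (uint16_t)(s.reg[0] | (s.reg[1] << 8));
}
```

Each instruction becomes two or three assignments, which is why a routine of this size remains manageable once every register and flag is treated as an ordinary variable.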

As seen in Section 5, generated preconditions can be fairly verbose, and we expected the same in this case. To remedy this somewhat, we extended the Haskell implementation with constant folding and other simplifications, both to manage the search space of possible preconditions more efficiently and to prune areas of the search space that can easily be determined to be impossible. In a more production-oriented setting, SMT solving and/or a robust expression simplifier can be used to do this more efficiently than our naive Haskell implementation.

For the precondition, we look at the case where . Clearly the expected result should be , i.e. the 96-bit result should consist of 12 bytes, all of which contain , except for the seventh byte which should hold . As a postcondition, we therefore specify that this byte does not hold .

Running the jump version of the 48-bit Karatsuba code through our analysis resulted in a handful of preconditions. Some of these simplify to false, as they express impossible aliasing conditions; an SMT solver would be able to discard these easily. However, seven preconditions remained that are completely plausible and satisfiable. They fall into three categories:

  • the two inputs alias, and their high 24 bits overlap with the low 24 bits of the result

  • the two inputs are disjoint, and one of them partially overlaps with the result as before

  • the two inputs are partially aliased, and one of them partially overlaps with the result

These are exactly the cases we would expect: the issue is caused by either input (or both) sharing its high 24 bits with the low 24 bits of the output location. Had we not fixed the values of the two inputs, the analysis would have generated more complex preconditions; however, this case shows that there is an easy instance in which they are satisfied.
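The aliasing failure in this case study can be illustrated with a scaled-down model in C. The sketch below is not the AVR routine: it multiplies two 32-bit integers using 16-bit halves (instead of 48-bit integers with 24-bit halves), uses the textbook additive form of Karatsuba's method, and all names (load_le, store_le, mul32_karatsuba, run_mul) as well as the memory layouts are assumptions made for illustration. What it shares with the routine discussed above is the order of memory operations: the low part of the result is written before the high halves of the inputs are read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read/write little-endian integers of nbytes (at most 8) bytes. */
static uint64_t load_le(const uint8_t *p, size_t nbytes)
{
    uint64_t v = 0;
    for (size_t k = 0; k < nbytes; k++) v |= (uint64_t)p[k] << (8 * k);
    return v;
}

static void store_le(uint8_t *p, uint64_t v, size_t nbytes)
{
    for (size_t k = 0; k < nbytes; k++) p[k] = (uint8_t)(v >> (8 * k));
}

/* r[0..7] <- a[0..3] * b[0..3], correct only if r does not overlap a or b. */
void mul32_karatsuba(uint8_t *r, const uint8_t *a, const uint8_t *b)
{
    assert(r && a && b);
    uint64_t al = load_le(a, 2), bl = load_le(b, 2);
    uint64_t lo = al * bl;
    store_le(r, lo & 0xFFFFu, 2);                 /* early write of the low 16 bits */
    uint64_t ah = load_le(a + 2, 2);              /* late read of the high halves   */
    uint64_t bh = load_le(b + 2, 2);
    uint64_t hi  = ah * bh;
    uint64_t mid = (al + ah) * (bl + bh) - lo - hi;  /* equals al*bh + ah*bl */
    store_le(r + 2, (lo >> 16) + mid + (hi << 16), 6);
}

/* Place x and y in a small memory and multiply them.
 * aliased != 0: the high 16 bits of the first input share storage with
 * the low 16 bits of the result. */
uint64_t run_mul(uint32_t x, uint32_t y, int aliased)
{
    uint8_t mem[16] = {0};
    uint8_t *a, *b, *r;
    if (aliased) { a = mem; r = mem + 2; b = mem + 10; }
    else         { r = mem; a = mem + 8; b = mem + 12; }
    store_le(a, x, 4);
    store_le(b, y, 4);
    mul32_karatsuba(r, a, b);
    return load_le(r, 8);
}
```

With both inputs set to 2^16, the disjoint layout yields 2^32 as expected, while the aliased layout yields 0: the early write clears the high half of the first input before it is read.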

7 Discussion

Key to our approach is the reachability space. The size and shape of this search space directly influence the effectiveness or even feasibility of Reachability Logic. State space explosion is a very typical and problematic effect for approaches like symbolic execution and loop unrolling.

How is this different for Reachability Logic? First of all, the goal of RL is not to say something about all possible execution paths, but only about one. This means that instead of proving properties over the entire search space, we just need to find one execution path that is viable. The nature of the reachability space is such that it only contains paths that are finite in length, but there may be infinitely many of them due to loop unrolling and indirect jumps. This means that we only need to search through this space in breadth, not in depth, to find one path to the assertion. On top of that, the reachability space can be ordered in such a way as to make searching more efficient. For the litmus tests and case studies, this is already done to improve performance.
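The difference between searching in breadth and in depth can be sketched on a bare control-flow graph. The C code below is a simplified illustration and not our implementation: it tracks only block labels and ignores the conditions attached to each path, and the names backward_bfs_depth and example_depth are hypothetical. It walks backwards from a target block one level at a time, so a loop in the program adds longer paths to later levels without preventing the search from finding a shorter one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_BLOCKS 16  /* arbitrary bound for this sketch */

/* pred[b][p] != 0 means control can flow from block p to block b.
 * Returns the length of a shortest backward path from target to entry,
 * or -1 if none is found within max_depth steps. */
int backward_bfs_depth(unsigned char pred[MAX_BLOCKS][MAX_BLOCKS],
                       size_t nblocks, size_t entry, size_t target,
                       int max_depth)
{
    unsigned char frontier[MAX_BLOCKS] = {0};
    unsigned char next[MAX_BLOCKS];

    assert(nblocks <= MAX_BLOCKS && entry < nblocks && target < nblocks);
    frontier[target] = 1;

    for (int depth = 0; depth <= max_depth; depth++) {
        if (frontier[entry]) return depth;  /* one viable path suffices */
        memset(next, 0, sizeof next);
        for (size_t b = 0; b < nblocks; b++) {
            if (!frontier[b]) continue;
            for (size_t p = 0; p < nblocks; p++)
                if (pred[b][p]) next[p] = 1;  /* step to every predecessor */
        }
        memcpy(frontier, next, sizeof frontier);
    }
    return -1;
}

/* Usage example: block 0 -> block 1, block 1 -> block 1 (a loop),
 * block 1 -> block 2. Search backwards from block 2 to block 0. */
int example_depth(void)
{
    unsigned char pred[MAX_BLOCKS][MAX_BLOCKS] = {{0}};
    pred[1][0] = 1;
    pred[1][1] = 1;
    pred[2][1] = 1;
    return backward_bfs_depth(pred, 3, 0, 2, 10);
}
```

In the example, the self-loop on block 1 gives rise to infinitely many backward paths, yet the level-by-level search reaches the entry block after two steps. A depth-first strategy that always follows the loop edge first would never get there.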

8 Related work

Besides the program logics mentioned in Section 2, several other angles have been explored to (automatically) reason over programs, reachability and exploits.

Brumley et al. [6] generate exploits from patches. Given a program and a patched version of that program, they are able to find the exploit that was fixed in the patched version, but to which the original program is vulnerable. An obvious drawback of this approach is that a patched version of the program needs to be available for it to work.

Avgerinos et al. [2] present an automatic exploit generation system (AEG) that only requires the source code of the program to be exploited. They generate LLVM code from the source and analyze this code using symbolic execution and some safety property to obtain exploits. Their approach uses heuristics, and the safety property used is not discussed or listed in the paper. No guarantees are given about the search space, as opposed to our solution, where we formally verify the reachability search space.

As an alternative to AEG, automatic vulnerability detection is a less strictly defined line of research into finding vulnerabilities. Russell et al. [27] use deep representation learning to do the job. They use the large set of open-source code available online, labeled with vulnerability information, to train their tool. Their results are very promising, but as with all machine learning-based solutions, their approach does not come with any guarantees.

Symbolic execution runs a program with symbols instead of actual input. Running the program with these symbolic inputs results in a complete overview of the program's behavior. Symbolic execution is extensively used for software testing [5, 7, 8]. Cadar and Sen [9] provide a great overview of the applications of symbolic execution for this purpose. The biggest downside of symbolic execution is that it describes the complete program behavior, and therefore quickly becomes infeasible, due to the many paths to be described.

Symbolic backward execution (SBE) attempts to mitigate the downside of reasoning over all possible paths by looking at a target location in source code. Charreteur and Gotlieb present a method for generating test input based on SBE for Java bytecode [10]. Dinges and Agha augment this approach with concrete execution as well [14]. Although there are some similarities between SBE and RL, there are three crucial differences. First of all, RL is proven to be sound and complete, providing the user with guarantees over the result. Second, RL is able to deal with non-determinism, modeling external functions, uncertain semantics of some instructions, etc. Finally, RL only looks at the postcondition, instead of a location in code. This means that only those statements that affect the postcondition need to be taken into account, making RL much more feasible than SBE.

A wide variety of static bug-finding techniques is available. Static analysis has been applied to detect security bugs [19], runtime errors [18] and even type errors [21]. These analyses are used in practice to detect bugs, for example in the FindBugs application [3], which employs simple static techniques to find bugs in Java code. Static analyses have to be tailored to a specific type of bug. Reachability Logic, on the other hand, enables the detection of the reachability of any postcondition.

Anand et al. [1] provide a nice overview of the field of test case generation. They identify five techniques for test case generation: symbolic execution, model-based testing, combinatorial testing, adaptive random testing and search-based testing. In general, our approach is a much more formal one than existing test case generation. We are interested in formally defining the entire reachability space in such a way that it is sound and complete. Test case generation has a different goal in mind, namely selecting suitable test cases for a certain program. Many of the techniques mentioned do not guarantee full coverage. Reachability Logic, in contrast, does provide this guarantee.

9 Conclusion

In this paper, we have presented the novel program logic Reachability Logic. RL is well suited for proving that an assertion is reachable in a nondeterministic program. Existing program logics such as HL and RHL have a different focus and are unsuitable for this purpose.

We have developed a low-level language called jump, having jumps instead of explicit control flow. For this language we presented a precondition generation function, which we have formally proven to be sound and complete. This precondition function thus generates a sound and complete reachability space. To validate our approach, we have presented several litmus tests that illustrate the precondition generation function, as well as real world case studies that find bugs. An implementation of the system has been developed, in which both the litmus tests and case studies have been run. The entire system is formally proven correct in the Isabelle/HOL theorem prover.

Test case generation research offers a plethora of techniques to show errors in software by means of finding the right test. While there are many different ways to generate test cases, full coverage is not guaranteed. Vulnerabilities have also been found using machine learning, but this method likewise does not come with any guarantees. Symbolic execution comes closer, since it is able to prove properties over all paths in the program, but state space explosion prevents it from being applied to large code bases. Some initial work has been done on automatic exploit generation. Compared to existing work, we take a formal approach, so that we are able to give the user the guarantee that all preconditions are sound.

Acknowledgements

This work is supported by the Defense Advanced Research Projects Agency (DARPA) under Agreement No. HR00112090028 and contract N6600121C4028 and the US Office of Naval Research (ONR) under grant N00014-17-1-2297.


  • [1] S. Anand, E. K. Burke, T. Y. Chen, J. A. Clark, M. B. Cohen, W. Grieskamp, M. Harman, M. J. Harrold, and P. McMinn (2013) An orchestrated survey of methodologies for automated software test case generation. J. Syst. Softw. 86 (8), pp. 1978–2001.
  • [2] T. Avgerinos, S. K. Cha, B. L. T. Hao, and D. Brumley (2011) AEG: automatic exploit generation. See DBLP:conf/ndss/2011.
  • [3] N. Ayewah, D. Hovemeyer, J. D. Morgenthaler, J. Penix, and W. Pugh (2008) Using static analysis to find bugs. IEEE Softw. 25 (5), pp. 22–29.
  • [4] M. Barnett and K. R. M. Leino (2005) Weakest-precondition of unstructured programs. In Proceedings of the 6th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering, pp. 82–87.
  • [5] R. S. Boyer, B. Elspas, and K. N. Levitt (1975) SELECT—a formal system for testing and debugging programs by symbolic execution. ACM SigPlan Notices 10 (6), pp. 234–245.
  • [6] D. Brumley, P. Poosankam, D. X. Song, and J. Zheng (2008) Automatic patch-based exploit generation is possible: techniques and implications. See DBLP:conf/sp/2008, pp. 143–157.
  • [7] J. R. Burch, E. M. Clarke, K. L. McMillan, D. L. Dill, and L. Hwang (1992) Symbolic model checking: 10^20 states and beyond. Information and computation 98 (2), pp. 142–170.
  • [8] C. Cadar, D. Dunbar, D. R. Engler, et al. (2008) Klee: unassisted and automatic generation of high-coverage tests for complex systems programs. In OSDI, Vol. 8, pp. 209–224.
  • [9] C. Cadar and K. Sen (2013) Symbolic execution for software testing: three decades later. Commun. ACM 56 (2), pp. 82–90.
  • [10] F. Charreteur and A. Gotlieb (2010) Constraint-based test input generation for java bytecode. In IEEE 21st International Symposium on Software Reliability Engineering, ISSRE 2010, San Jose, CA, USA, 1-4 November 2010, pp. 131–140.
  • [11] S. Dasgupta, D. Park, T. Kasampalis, V. S. Adve, and G. Roşu (2019) A complete formal semantics of x86-64 user-level instruction set architecture. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 1133–1148.
  • [12] J. Dawson (2009) Isabelle theories for machine words. Electronic Notes in Theoretical Computer Science 250 (1), pp. 55–70.
  • [13] E. de Vries and V. Koutavas (2011) Reverse hoare logic. See DBLP:conf/sefm/2011, pp. 155–171.
  • [14] P. Dinges and G. A. Agha (2014) Targeted test input generation using symbolic-concrete backward execution. In ACM/IEEE International Conference on Automated Software Engineering, ASE ’14, Vasteras, Sweden - September 15 - 19, 2014, I. Crnkovic, M. Chechik, and P. Grünbacher (Eds.), pp. 31–36.
  • [15] S. Heule, E. Schkufza, R. Sharma, and A. Aiken (2016) Stratified synthesis: automatically learning the x86-64 instruction set. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 237–250.
  • [16] C. A. R. Hoare (1961) Algorithm 64: quicksort. Commun. ACM 4 (7), pp. 321. ISSN 0001-0782.
  • [17] C. A. R. Hoare (1969) An axiomatic basis for computer programming. Commun. ACM 12 (10), pp. 576–580.
  • [18] D. Hovemeyer, J. Spacco, and W. Pugh (2005) Evaluating and tuning a static analysis to find null pointer bugs. See DBLP:conf/paste/2005, pp. 13–19.
  • [19] Y. Huang, F. Yu, C. Hang, C. Tsai, D. Lee, and S. Kuo (2004) Securing web application code by static analysis and runtime protection. See DBLP:conf/www/2004, pp. 40–52.
  • [20] M. Hutter and P. Schwabe (2015) Multiprecision multiplication on AVR revisited. Journal of Cryptographic Engineering 5 (3), pp. 201–214.
  • [21] S. H. Jensen, A. Møller, and P. Thiemann (2009) Type analysis for javascript. See DBLP:conf/sas/2009, pp. 238–255.
  • [22] F. Kammüller, M. Wenzel, and L. C. Paulson (1999) Locales a sectioning concept for isabelle. In International Conference on Theorem Proving in Higher Order Logics, pp. 149–165.
  • [23] C. Lattner and V. S. Adve (2004) LLVM: A compilation framework for lifelong program analysis & transformation. See DBLP:conf/cgo/2004, pp. 75–88.
  • [24] P. W. O’Hearn (2020) Incorrectness logic. Proc. ACM Program. Lang. 4 (POPL), pp. 10:1–10:32.
  • [25] L. C. Paulson (1994) Isabelle: a generic theorem prover. Vol. 828, Springer Science & Business Media.
  • [26] A. Raad, J. Berdine, H. Dang, D. Dreyer, P. W. O’Hearn, and J. Villard (2020) Local reasoning about the presence of bugs: incorrectness separation logic. See DBLP:conf/cav/2020-2, pp. 225–252.
  • [27] R. L. Russell, L. Y. Kim, L. H. Hamilton, T. Lazovich, J. Harer, O. Ozdemir, P. M. Ellingwood, and M. W. McConley (2018) Automated vulnerability detection in source code using deep representation learning. See DBLP:conf/icmla/2018, pp. 757–762.
  • [28] M. Schoolderman (2017) Verifying branch-free assembly code in why3. In Verified Software. Theories, Tools, and Experiments, A. Paskevich and T. Wies (Eds.), Cham, pp. 66–83. ISBN 978-3-319-72308-2.
  • [29] T. Sheard and L. Fegaras (1993) A fold for all seasons. In Proceedings of the conference on Functional programming languages and computer architecture, pp. 233–242.
  • [30] L. Xu, W. Jia, W. Dong, and Y. Li (2018) Automatic exploit generation for buffer overflow vulnerabilities. In 2018 IEEE International Conference on Software Quality, Reliability and Security Companion (QRS-C), pp. 463–468.