TrABin: Trustworthy Analyses of Binaries

Verification of microkernels, device drivers, and crypto routines requires analyses at the binary level. In order to automate these analyses, in the last years several binary analysis platforms have been introduced. These platforms share a common design: the adoption of hardware-independent intermediate representations, a mechanism to translate architecture dependent code to this representation, and a set of architecture independent analyses that process the intermediate representation. The usage of these platforms to verify software introduces the need for trusting both the correctness of the translation from binary code to intermediate language (called transpilation) and the correctness of the analyses. Achieving a high degree of trust is challenging since the transpilation must handle (i) all the side effects of the instructions, (ii) multiple instruction encodings (e.g. ARM Thumb), and (iii) variable instruction length (e.g. Intel). Similarly, analyses can use complex transformations (e.g. loop unrolling) and simplifications (e.g. partial evaluation) of the artifacts, whose bugs can jeopardize correctness of the results. We overcome these problems by developing a binary analysis platform on top of the interactive theorem prover HOL4. First, we formally model a binary intermediate language and we prove correctness of several supporting tools (i.e. a type checker). Then, we implement two proof-producing transpilers, which respectively translate ARMv8 and CortexM0 programs to the intermediate language and generate a certificate. This certificate is a HOL4 proof demonstrating correctness of the translation. As demonstrating analysis, we implement a proof-producing weakest precondition generator, which can be used to verify that a given loop-free program fragment satisfies a contract. Finally, we use an AES encryption implementation to benchmark our platform.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

07/27/2018

Sound Transpilation from Binary to Machine-Independent Code

In order to handle the complexity and heterogeneity of mod- ern instruct...
10/22/2019

Metamath Zero: The Cartesian Theorem Prover

As the usage of theorem prover technology expands, so too does the relia...
08/31/2018

Wasabi: A Framework for Dynamically Analyzing WebAssembly

WebAssembly is the new low-level language for the web and has now been i...
09/08/2020

On Architecture to Architecture Mapping for Concurrency

Mapping programs from one architecture to another plays a key role in te...
05/16/2022

Natural Language Specifications in Proof Assistants

Interactive proof assistants are computer programs carefully constructed...
08/29/2019

ICurry

FlatCurry is a well-established intermediate representation of Curry pro...
04/04/2020

The Collection Virtual Machine: An Abstraction for Multi-Frontend Multi-Backend Data Analysis

Getting the best performance from the ever-increasing number of hardware...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Despite the existence of formally verified compilers, the verification of binary code is a critical task to guarantee trustworthiness of systems. This is particularly necessary for software mixing high-level language with assembly (system software), using ad-hoc languages and compilers (specialized software), in presence of instruction set extensions (like for encryption and hashing), and when the source code is not available (binary blobs). This necessity is not only limited to the general-purpose computing scenario but also applies to connected embedded systems, where software bugs can enable a remote attacker to tamper with the security of automobiles, payment services, and smart IoT devices.

The need of semi-automatic analysis techniques for binary code has lead to the development of several tools song2008bitblaze ; brumley2011bap ; shoshitaishvili2016state . To handle the complexity and heterogeneity of modern instruction set architectures (ISA), all these tools follow a common design (see Figure 1): They have introduced a platform independent intermediate representation that allows to implement analysis independently of (i) names and number of registers, (ii) instruction decoding, (iii) endianness of memory access, and (iv) instruction side-effects (like updating conditional flags or the stack pointer). This intermediate representation is often a dialect of the Valgrind’s IR nethercote2003valgrind .

Figure 1: Architecture of the analysis platform. Gray components and non-dashed arrows are contributions described in this paper

Even if the existing binary analysis platforms have been proved successful thanks to the automation they provide, their usage for verifying software introduces the need of trusting both the transpiler (i.e. the tool translating from machine code to intermediate language) and the analysis. Soundness of the transpiler should not be foregone: It may have to handle multiple instruction encodings (e.g. ARM Thumb), variable instruction length (e.g. Intel), and complex side effects of instructions (e.g. ARM branch with link and conditional executions). Clearly, a transpiler bug jeopardizes the soundness of all analyses done on the intermediate representation. Similarly, complex analyses involve program transformations (e.g. loop unrolling and resolution of indirect jumps) and simplifications (e.g. partial evaluation) that are difficult to implement correctly.

To handle these issues we implement a binary analysis tookit whose results are machine checkable proofs. The prototype tookit consists of four components: (i) formal models of the Instruction Set Architectures (ISAs), (ii) the formal model of the intermediate language, called Binary Intermediate Representation (BIR), (iii) a proof-producing transpiler, and (iv) proof-producing analysis tools. As verification platform, we selected the interactive theorem prover HOL4, due to the existing availability of formal models of commodity ISAs  HOL4Armv8 ; fox2012directions . Here, we chose ARMv8 armv8manual and CortexM0 armv6mmanual as demonstrating ISAs. For the target language, we implemented a deep embedding of a machine independent language, which can represent effects to registers, flags, and memory of common architectures and it is relatively simple to analyse. Verification of the transpilation is done via two HOL4 proof producing procedures, which translate respectively ARMv8 and CortexM0 programs to IL programs, and yield the HOL4 proof that demonstrates the correctness of the result. The theorem establishes a simulation between the input binary program and the generated IL program, showing that the two programs have the same behavior. Our contribution enables a verifier to prove properties of the generated IL program (i.e. by directly using the theorem prover or proof-producing analysis techniques) and to transfer them to the original binary program using the generated simulation theorems. As demonstrating analysis, we implement a proof-producing weakest precondition propagator, which can be used to verify that a given loop-free program fragment satisfies a contract.

Outline

We present the state of the art and the previous works relating to our contribution in Section 2. Section 3 introduces the HOL4 formal models of the ISAs and the BIR language. Section 4 and Section 5 present the two proof-producing tools: the certifying transpiler and the weakest precondition generation. We demonstrate that the theorems produced by the proof producing tools can be used to transfer verification conditions in Section 6. In Section 7, we test and evaluate our tool. We give concluding remarks in Section 8.

New contributions

We briefly describe the new contributions of this paper with respect to metere2017sound . The weakest precondition generation and the corresponding optimization is a new proof producing tool. Therefore Section 5 was not present in metere2017sound . Technically, the BIR model and the transpiler have been heavily re-engineered, for this reason Sections 3.2 and 4 have been adapted to introduce some of the new concepts (e.g. the weak transition relation) and tools (e.g. the type checker, the pre-verified theorems that speed up the transpilation). Also, Section 7 has been rewritten, since it evaluates the new transpiler and the weakest precondition procedure. Finally, the transpiler has been extended to support CortexM0. This allows us to demonstrate that the transpiler can be easily adapted to support new architectures and the requirements for specific proofs for the transpiler are limited.

2 Related work

Recent work has shown that formal techniques are ready to achieve detailed verification of real software, making it possible to provide low-level platforms with unprecedented security guarantees klein2009sel4 ; alkassar2008verisoft ; dam2013machine . For such system software, limiting the verification to the source code level is undesirable. A modern compiler (e.g. GCC) consists of several millions of lines of code, in contrast to micro-kernels that consist of few thousand lines of code, making it difficult to trust the compiler output even when optimization is disabled111An example of a bug found in GCC: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80180.

To overcome this limitation, formally verified compilers leroy2009formal ; boldo2013formally ; kumar2014cakeml and proof/producing compilers li2007structure have been developed. Similarly to our work, these compilers use detailed models of the underlying ISA to show the correctness of their output. This usually involves a simulation theorem, which demonstrates that the behavior of the produced binary code resembles the one specified by the semantics of the high level language (e.g. C or ML). These theorems permit properties verified at the source-level to be automatically transferred to the binary-level. For instance, CompCert has been used in beringer2015verified to verify security of OpenSSL HMAC by transferring functional correctness of the source code to the produced binary.

Even if formally verified compilers obviate the need for trusting their output, they do not fulfill all the needs of verified system software. Some of these compilers target languages that are unsuitable for developing system software (e.g. ML cannot be used to develop a microkernel due to its garbage collector). Also, they do not support mixing the high-level language with assembly code, which is necessary for storing and restoring the CPU context or for managing the page table. Some of the effects of these operations can break the assumptions made to define a precise semantics of the high level language (e.g. a memory write can alter the page table which in turn affects the virtual memory layout). Also, some properties (e.g. absence of side channels created by non-secure accesses to the caches) cannot be verified at the source code level; the analysis must be aware of the exact sequence of memory accesses performed by the software. Finally, binary blob analysis is imperative for verifying memory safety of binary code whose source code is not available (e.g. the power management of ARM trusted firmware).

Unfortunately, detailed formal specifications of machine languages (e.g. the ones used to verify compiler correctness fox2010specification ) consist of thousands of lines of definitions. The complexity of these models makes them unusable to directly verify any binary code that is not a toy example. Moreover, the target verification tools, usually interactive theorem provers, provide little or no support for either automatic reasoning or reuse of algorithms among different hardware models. To make machine-code verification proofs reusable by different architectures, Myreen et al. DBLP:conf/fmcad/MyreenGS08 developed a proof-producing decompilation procedure. Those tools have been implemented in the HOL4 system and have been used by the seL4 project to check that the binary code produced by the compiler is correct, permitting to transfer properties verified at the source code level to the actual binary code executed by the CPU sewell2013translation . The same framework has been used to verify a bignum integer library myreen2013proof . However, the automation provided by this framework is still far from what is provided by today’s binary analysis platforms (e.g. song2008bitblaze ; brumley2011bap ; shoshitaishvili2016state ). These provide tools to compute and analyze control-flow graphs, to perform abstract interpretation and symbolic execution, to verify contracts, to verify information flow properties balliu2014automating , and to analyze side channels CacheAudit . On the other hand, their usage requires to trust both the transpiler and the implementation of the analysis. Due to the complexity of writing a transpiler for each architecture, recent work has been done to synthesize the transpiler from compiler backends Hasabnis:2016:LAI:2954680.2872380 . However, this requires to trust both: the synthesis procedure and the compiler backend.

Regarding trustworthy weakest precondition generation, which is our demonstrating analisys, Vogels et al. vogels2010machine verified the soundness of an algorithm for a simple imperative while language in Coq. However, their work does not fit the needs of a trustworthy verification condition generator for a verification toolkit, since the target language is not designed to handle unstructured binary programs.

3 Formal Models

3.1 The ARMv8 and CortexM0 models

In our work, we use the ARMv8 and CortexM0 models developed by Fox HOL4Armv8 , which are constructed from the pseudocode described in the ARM specifications armv8manual ; armv6mmanual . These models provide detailed HOL4 formalization of the effects of the instructions, taking into account the different execution modes, flags, and other characteristics of the processor behavior.

We start describing the ARMv8 model. The system state is modeled as a tuple . Here, represents a sequence of 64-bit general purpose registers. We identify the -th register with . The tuple contains the special registers representing the program counter, the stack pointer, and the link register respectively. The tuple represents the current processor state and contains the arithmetical flags. The 64-bit addressable memory is modeled as the function . Finally, the system behavior is represented by the deterministic transition relation , describing how the ARMv8 state reaches the state by executing a single instruction. The transition relation models the behavior of standard ARMv8 ISA, including fetching four bytes from memory, decoding the instruction, and applying its effects to registers, flags, and memory. Hereafter, we use to access tuple fields (e.g.  states for the program counter of the state ) and to represent all possible states.

The CortexM0 model has a similar flavour, with main differences consisting of the general purpose registers being 32-bit and the memory being 32-bit addressable memory. Also, CortexM0 has variable encoding, allowing each instruction to use either two or four bytes. Hereafter, when needed, we use subscripts and to respectively identify ARMv8 and CortexM0 models, i.e.  is the transition relation of the CortexM0 model.

The HOL4 machine models consist of hundreds of definitions and their complexity makes it difficult to analyze large programs. To simplify the analyses, the models are equipped with a mechanism to statically compute the effects of a single instruction via the function. Let be the binary encoding of an instruction and be the address where the instruction is stored, then the function returns a list of step theorems . Each theorem has the following structure:

where is a function that reads the instruction from the memory. Intuitively, each step theorem describes one of the possible behaviors of the instruction and consists of the guard condition that enables the transition and the function that transforms the starting state into the next state. We use three examples from the ARMv8 model to illustrate this mechanism.

Let the instruction stored at the address 0x1000000c be the addition of the registers and into the register (whose encoding is 0x8b000020), the step function produces the following step theorem:

(where updates the register of the state with ). In this case, only one theorem is generated, and there is no guard condition (i.e.  is a tautology).

Some machine instructions (i.e. conditional branches) can have different behavior according to the value of some state components. In these cases, the step function produces as many theorems as the number of possible execution cases. For example, the output of the step function for the Signed Greater Than branch instruction consists of the following two theorems:

That is, if the test succeeds (i.e.  holds) then the jump is taken (in this case jumping back in a loop to the address ), otherwise (i.e.  holds) the jump is not taken (the program counter is updated to point to the next instruction). Notice that for every state the condition hold.

Finally, some instructions (i.e. memory stores) can have unsound behavior if some conditions are not met. In these cases, the step function generates the step theorems only for the correct behaviors; for a given instruction, let be the generated theorems and the corresponding guards, the behavior of the instruction is soundly deduced by the step function for every state such that holds and can not be deduced otherwise. For example, the output of the step function for a memory store consists of the theorem:

Intuitively, the step function can predict the behavior only for states having the target address (i.e. ) aligned.

3.2 The BIR model

Our platform uses the machine independent Binary Intermediate Representation (BIR). In this representation, a statement has only explicit state changes, i.e. there are no implicit side effects, and it can only affect one variable.

BIR’s syntax is depicted in Table 1. A program is a list of blocks, each one consisting of a uniquely identifying label (i.e. a string or an integer), a list of block statements, and one control flow statement. In the following we assume that all programs are well defined, i.e. they have no duplicate block labels. A statement can affect the state by (i) assigning the evaluation of an expression to a variable, (ii) terminating the system in a failure state if an assertion does not hold. A control flow statement can (conditionally or unconditionally) modify the control flow. As usual, labels are used to refer to the specific locations in the program and can be the target of jump statements.

BIR expressions are built using constants (i.e. strings and integers), conditionals (i.e. ), standard binary and unary operators (ranged over by and respectively) for finite integer arithmetic and casting, and accessing variables of the environment (i.e. ). Additionally, two types of expressions can operate on memories. The expression reads bytes from the memory starting from the address . The expression returns a new memory in which all the locations have the same values as the initial memory except the addresses where that contain the chunks of . Figure 2 provides an example of a BIR program.

Table 1: BIR’s syntax

Hereafter, we use to represent the set of all possible strings. These can be used to identify both labels and variable names. We use to range over BIR data types; let , the type for words of -bits is denoted by and the type for memories addressed using -bits is denoted by . We use and to represent the set of all BIR types and values respectively.

This BIR program pops and pushes a register from/to the stack. The register is modeled by the variable , the stack pointer by the variable , and the memory by the variable . The program consists of two blocks, which are labeled and . The first block assigns to the content of starting from , then it increases the stack pointer. The second block saves into the stack and decreases the stack pointer. Notice that returns a modified copy of and does not directly modify . Therefore, to model a memory write, the new memory must be explicitly assigned to the variable .

Figure 2: Example of a BIR program

A BIR environment maps variable names (given as strings) to pairs of type and value; . Types of variables are immutable and any wrongly typed operation produces a run-time failure. The semantics of BIR expressions is modeled by the evaluation function : It takes an expression and an environment and yields either a value having a type in or . The evaluation intuitively follows the semantics of operations by recursively evaluating the sub-expressions given as operands. The value results when operators and types are incompatible, thus modeling a type error.

A BIR state is a pair of an environment and a program counter . While executing a program, the program counter is and is the label of the executing block. In the cases of either type mismatch or failed assertion, the execution terminates setting the program counter to either (type mismatch) or (failed assertion). Notice that the program is not part of the state, disallowing run-time changes to the program.

The system behavior is modeled by the deterministic transition relation , which describes the execution of one BIR block. In HOL4, this relation is modeled by the execution function , which defines the small step semantics of an entire block. Hereafter, we use when the program counter of the resulting states represents one of the possible errors. The relation is defined on top of two other functions: models the environment effects of a single block statement , and models the program counter resulting by executing a single control flow statement . Both functions can return and in case of violated assertions and type errors respectively.

The execution of assigns the evaluation of the expression to the variable . Let and be the type of , the value of the variable is updated in the context (). The statement fails in case of a type mismatch: or . The statement has no effects if the expression evaluates to true (i.e. ) and terminates in an error state otherwise.

The execution of jumps to the referenced block, by setting the program counter to . If the type of is neither string nor integer then the statement fails. The statement changes the control flow based on the condition . The statement fails if the type of the condition is not or the targets (i.e.  or ) are not valid labels. Notice that the targets of the jump are evaluated using the current context, allowing BIR to express indirect jumps that are resolved at run-time.

The HOL4 model is equipped with several supporting tools and definitions, which simplify the development of the transpiler and analyses. A weak transition relation is defined to hide some executions steps. Let be a set of labels, is a partial function built on top of the single-block small step semantics, which yields the first reachable state that is an error state or that has the program counter in . The function is undefined if no error and no label in are reachable. A formally verified type checker permits to rule out run-time type errors. Well typed programs cannot have wrongly typed expressions (i.e. arguments of binary operators must have the same type, values to store in memory must match the specified type, etc.), use only one type per variable name, use expressions with correct types in statements (i.e. conditions in assertions and conditional jumps must be boolean expressions). A pair consisting of a program and an environment is well typed if the program is well typed and if for every variable the type of variable in the environment matches the usage of the variable in the program. Well typed program fragments that start from well typed environments cannot cause type errors and can only reach well typed environments.

4 Certifying Transpiler

The translation procedure uses a mapping of HOL4 machine states to BIR states. Every machine state field must be mapped to a BIR variable or to the program counter. Our framework provides one default mapping for each supported architecture: ARMv8 and CortexM0. In both cases, the i-th register is mapped to the variable (i.e.  represents the register number zero), the variable represents the system memory, the BIR program counter reflects the machine’s program counter, and every flag is mapped to a proper variable. This mapping induces a simulation relation that relates BIR states to machine states.

To transform a program to the corresponding BIR fragment, we need to capture all possible effects of the program execution in terms of affected registers, flags and memory locations. The generated BIR fragment should emulate the behaviour of the instructions executed on the machine. This goal is accomplished by reusing the function and the following two HOL4 certifying procedures.

  • A procedure to translate HOL4 word terms (i.e. those having type , , etc.) to BIR expressions. This procedure is used to convert the guards of the step theorems and the expressions contained in the transformation functions.

  • A procedure to translate a single instruction to the corresponding BIR fragment. This procedure computes the possible effects of an instruction using the transformation functions of the step theorems.

To phrase the theorem produced by the transpiler we introduce the following notations. A binary program is represented by a finite set of pairs , where each pair represents that the instruction is located at the address . The predicate states that the program is stored in the memory of the state (formally, ). A single machine instruction can be transpiled to multiple blocks. The transpiler uses a naming convention to distinguish the label of the first block produced for an instruction from the labels of the other blocks. Hereafter, we use to identify the set of labels that represents the entry point of instructions. We denote transitions of machine states with and transitions of BIR visiting with . The translation procedure generates a theorem that resembles compiler correctness222The ISA and BIR transition systems are deterministic, thus the transition relations are functions. For this reason we omit quantifiers over the states on the right hand side of transitions, since they are unique.:

Theorem

Let be the entry point of the program . For every state and BIR state , if , , and , then

  1. for every if then
    , and

  2. for every if then
    .

The meaning of the transpiler theorem is depicted in Figure 3. Each machine instruction is translated to multiple blocks, the first one having a label in , and each block consisting of multiple statements. Assuming that the program is stored in machine memory, the state is configured to start the execution from the entry point of the program, and the initial HOL4 machine state resembles the initial BIR state, then (1) for every state reachable by the ISA model, there is an execution of the BIR program that results (after visiting blocks whose labels are in ) in either an error state () or in a state that resembles , and (2) for every state reachable by the BIR program after reaching the first block of an instruction, there is an execution of the machine that re-establishes the simulation relation.

Figure 3: The theorem demonstrated by the transpiler

Error states permit to identify if an initial configuration can cause a program to reach a state that cannot be handled by the transpiler (e.g. self-modifying programs or programs containing instructions whose behavior cannot be predicted by the step function). It is worth noticing that these cases cannot be identified statically without knowing the program preconditions (e.g. misaligned memory accesses can be caused by the initial content of the stack where pointers are stored) and must be ruled out when verifying the program.

4.1 Translation of expressions

In order to build the transpiler on top of the step function, the HOL4 expressions occurring in the guards and the transformation functions must be converted to BIR expressions. For example, while translating the binary instruction 0x54fffe8c of Section 3.1 to a conditional jump, the expressions and must be expressed in BIR to generate the condition and the target of the jump respectively.

Let be a HOL4 expression, the output of the transpiler is the theorem , stating that, if the environment satisfies the assumption , then the evaluation of is . These assumptions usually constrain the values of the variables in the environment to match the free variables of the HOL4 expressions. For instance, for the expression the transpiler generates the theorem .

If a HOL4 operator has no direct correspondence in BIR, the transpiler uses a set of manually verified theorems to justify the emulation of the operator via a composition of the primitive BIR operators. This is the case for expressions involving bit extractions (i.e. most significant bit, least significant bit, etc), alignment, reversing endianness, and rotation.

For several ISA models, special care must be taken to convert expressions involved in updating status flags. For instance, both ARMv8 and CortexM0 use so called NZCV status flags for conditional execution, where

  • Negative is set if the result of a data processing instruction was negative

  • Zero is set if the result is zero

  • Carry is set if is set if an addition, subtraction or compare causes a result bigger than word size

  • oVerflow is set if an addition, subtraction or compare produces a signed result bigger than 31/63 bit (for CortexM0 and ARMv8 respectively), i.e. the largest representable positive number

The expressions produced by the ISA models for these flags involve conversion of words to natural numbers and arithmetic operations with arbitrary precision. For example, following the pseudocode of the ARMv8 reference manual armv8manual , the carry flag in 64-bit additions is computed by the expression , where and is their interpretation as natural numbers. Both the inequality and the addition cannot be directly converted as BIR expression, because BIR can only handle finite arithmetics333This design choice simplifies the development of analyses for BIR and the integration of external tools, like SMT solvers supporting bitvectors. For the carry flag the transpiler uses the theorem , where and are complement and unsigned comparison of bitvectors respectively.

4.2 Translation of single instructions

The transpilation of a single instruction takes three arguments: the binary code of the instruction, the address of the instruction in memory, and a set of memory address ranges . The latter argument identifies which memory addresses should not be modified by the instruction and is used to guarantee that the program is not self-modifying. In fact, a self-modifying program cannot be transformed to equivalent BIR programs (due to BIR following the Harvard architecture). If an instruction modifies the program code then the translated BIR program must terminate in an error state. The addresses in are used to instrument the instruction transpiler with the information about where the program code is stored. Hereafter we use to represent that the transpiler produces the program fragment (sequence of blocks) . Also, we use to represent that a program contains the fragment .

Figure 4: BIR fragment generated to one instruction

The transpiler uses the function to compute the behavior of the input instruction and to generate the step theorems , where is . Hereafter we assume that the function generates two theorems (i.e. ), which is the case for conditional instructions in CortexM0 and branches in ARMv8. We will comment on the other cases at the end of this section.

A single machine instruction can be translated to multiple BIR blocks, following the template of Figure 4. The label of the first block is equal to the address of the instruction and is the only block having an integer label. The other two blocks have string labels and represent the effects of the two step theorems.

The behavior of the instruction can be soundly deduced by the step function only if one of the predicates holds (see Section 3.1). The transpiler simplifies the disjunction of the guards demonstrating (where is a HOL4 predicate) and translates it to a BIR expression (demonstrating ). The BIR statement is generated as first statement of the first block. Intuitively, if a machine state does not satisfy any guard, then every similar BIR state does not satisfy the assertion, causing the BIR program to terminate in a error state. On the other hand, if the BIR state satisfies the assertion, then every similar machine state satisfies at least one of the guards, thus the instruction’s behavior can be deduced by the step function.

The second task is to redirect the BIR control flow to the proper internal block according to the guards of the step theorems. The procedure translates to a BIR expression ( demonstrating ) and is generated as last statement of the first block. Intuitively, a BIR state executes if and only if the similar machine state satisfies .

The third task is to translate the effects of the instruction on every field of the machine state for every step theorem . Let be one field of the machine state (e.g.  is the register zero) and let be the corresponding variable of BIR according to the relation . The transpiler computes the new value of the field (and demonstrates ). If then the machine state’s field is not affected by the instruction and the corresponding variable should not be modified by the generated BIR block, otherwise the variable must be updated accordingly. The expression is translated to obtain the theorem and the BIR statement is generated. The need of a temporary variable is due to the presence of instructions that can affect several variables, and whose resulting values depend on each other (i.e. imagine an instruction swapping registers zero and one, where ). After all field values have been computed and stored into the temporary variables, these are copied into the original variables via the statement .

Special care is needed for memory updates (i.e. ). The BIR program should fail if the original program updates a memory location in . The transpiler inspects the expression to identify the addresses that can be changed by the instruction and extracts the corresponding set of expressions (in CortexM0 and ARMv8 a single instruction can store multiple registers). The expression (which guarantees that no modified address belongs to the reserved memory region) is translated to obtain the theorem and the BIR statement is added as preamble of the block. If the machine instruction modifies an address in , then the corresponding BIR state does not satisfy the assertion, causing the BIR program to terminate in an error state.

Finally, the program counter field is used to generate statements that update the control flow. The expression is translated to and is appended to the BIR fragment. If possible, is first simplified to be a constant, which reduces the number of indirect jumps in the BIR program.

This procedure is generalized to handle arbitrary number of step theorems, using one block per theorem. Moreover, the transpiler optimizes some common cases. If the transformation function modifies only the program counter (i.e. a conditional instruction, which behaves as NOP if the instruction condition is not met) then is not generated and the translation of is used in place of ”ad-j” in . If there is only one step theorem, then the is merged with and the conditional jump is removed. If an updated state field is not used to compute the value of other fields then the temporary variable is not used.

A large part of the HOL4 implementation focuses on optimizing the verification that the generated fragment resembles the original machine instruction. This is done by preproved theorems about the template-blocks (i.e. ), which enable the transpiler to use the intermediate theorems generated for expressions and the step theorems to establish the instruction-theorem. The preproved theorems for the template-blocks also ensure that these are well-typed, statically guaranteeing that a generated fragment cannot cause a type error.

Theorem

Let . For every machine state , BIR state , and BIR program if , , , and , then

  1. if and 444We remark that is deterministic and left total, and is deterministic then

The theorem shows (1) the BIR program either fails or reaches the first block of an instruction starting from the first block of the translated one (i.e. the internal blocks do not introduce loops), and (2) if the complete execution of the generated blocks succeeds then the BIR program behaves equivalently to the machine instruction and memory in is not modified.

4.3 Transpiling programs

The theorems generated for every instruction are composed to verify Theorem 4. Property (1) is verified by induction over , using predicate . This ensures that the program is in memory after the execution of each instruction, thus allowing to make the assumption of the translation theorem (i.e. ) an invariant.

Property (2) is verified by induction over . Since then the program counter of points to the one of the produced by the transpiler. Therefore we can use the corresponding instruction-theorem to show that exists. This and the fact that the ISA transition relation is total enable part (2) of the instruction-theorem, showing that the machine instruction behaves equivalently to the BIR block.

4.4 Support for more architectures

In the following, we review the modifications of the certifying procedures needed to support other common computer architectures, like MIPS, x86 and ARMv7-A. The transpiler has three main dependencies: A formal model of the architecture, a function producing step theorems, and the definitions of a simulation relation. There exist HOL4 models for x86, x64, ARMv7-A, RISC-V, and MIPS, which are equipped with the corresponding step function. The simulation relation can differ for each architecture since it maps machine state fields to BIR variables. In fact, the name, the number, and the type of registers can be very different among unrelated architectures. However, defining the simulation relation is straightforward, since it simply requires to map machine state fields to BIR variables.

The expression translation has to handle the expressions of guard conditions and transformation functions that are present in the step theorems. Since these use HOL4 number and word theories, independently of the architecture, big parts of the translation of Section 4.1 can be reused. There are two exceptions: One is the possible usage of different word lengths, and the other is the need of proving helper theorems to justify the emulation of operators that have no direct correspondence in BIR (e.g. for the computation of the carry flag in CortexM0 and ARMv8, or to support specialized instructions for encryption).

Defining the simulation relation and extending the expression translation enable the transpilation of single instructions of Section 4.2 to support a new architecture, without requiring further modifications. In fact, the structure of the produced BIR blocks is architecture independent and is ready to support some peculiarities of MIPS and x64.

4.4.1 Delay slots

On the MIPS architecture, jump and branch instructions have a “delay slot”. This means that the instruction after the jump is executed before the jump is executed. The HOL4 model for MIPS handles delay slots using the shadow registers BranchDelay (), which can be either unset or the address of an instruction. This register can be mapped to BIR using a boolean variable (), which holds if the shadow register is set, and a word variable (), which represents the register value.

Let be a MIPS state, the transition relation is undefined if is set and the instruction makes a jump (i.e. jumps are not allowed after jumps). Otherwise it yields a state where

  • if is set, otherwise

  • is the target of jump if the instruction modifies the control flow, otherwise is unset

The step theorems can be accordingly generated. Let be the step theorem’s guards of an instruction and let be the condition that causes the instruction to modify the control flow (i.e.  is always false if the current instruction is neither a jump nor a branch), holds if is unset or does not hold. Following the template of Figure 4, the transpiler will produce the assertion , where . Non-jump and non-branch instructions need multiple step theorems to model delay slots. These instructions could be translated via two BIR blocks: one executed when holds and that jumps to ; and one that jumps to the next instruction when BD_SET does not hold. Finally, the variables and can be update like all other register variables.

4.4.2 Different calling conventions

Intel x86 and x64 architectures have different calling conventions. In fact, in x64 a limited number of parameters can be passed via registers. It is worth noticing that the transpiler does not make any assumption on the calling convention used. In fact, the transpiler procedure does not need to know symbols and can handle programs that violate standard calling convention. In practice, different calling conventions will result in different assembly instructions, whose behavior is captured by the step theorems.

5 Weakest preconditon generation

Contract based verification is a convenient approach for compositionally verifying properties of systems. Due to the unstructured nature of binary code, a binary program (and therefore the corresponding BIR program) can have multiple entry and exit points. For this reason we adapt the common notion of Hoare triples. Hereafter we assume programs and environments to be well typed. Let and be two partial functions mapping labels to BIR boolean expressions, we say that a BIR program satisfies the contract , if the execution of the program starting from an entry point and a state satisfying the precondition establishes the postcondition whenever it reaches an exit point , formally

Definition (BIR triple)

holds iff for every , , if , then and .

There are well known semi-automatic techniques to verify contracts for program fragments that are loop free and whose control flow can be statically identified. A common approach is using precondition propagation, which computes a precondition from a program and a postcondition . Hence, the algorithm must ensure that the triple holds. Also, if for every the precondition is the weakest condition ensuring that is met, then the algorithm is called weakest precondition propagation. The desired contract holds if for every the precondition implies the computed precondition . Therefore, the overall workflow is: specify a desired contract, automatically compute the weakest precondition, and prove the implication condition.

5.1 WP generation

Since the BIR transition relation models the complete execution of one block, we consider a block the unit for which the algorithm operate. The weakest precondition is propagated by following the control flow graph (CFG) backwards. The CFG is a directed acyclic graph (due to absence of loops) with multiple entry points and multiple exit points. For instance, Figure 5a depicts the CFG of a program with blocks having labels through , two entry points (i.e.  and ) and two exit points ( and ).

Our approach is to iteratively extend a partial function , which initially equals the postcondition partial function . This function maps labels to their respective weakest precondition or postcondition in the case of exit points. Formally, the procedure preserves the invariant and (where is the restriction of a function to the domain and is set complement). Once includes all entry points of the program, the weakest precondition is obtained by the restriction . For each iteration, the algorithm uses the following rule: It (1) selects the label of a BIR block for which the domain of contains the labels of all successors (i.e. the weakest precondition of the successors have already been computed); (2) computes the weakest precondition for the label , demonstrating ; (3) extends as . Figure 5 depicts an example of this procedure. The partial function is defined for labels of dark gray nodes, and light gray nodes represent nodes that can be selected by (1). Figure 5a shows the initial state where only the exit nodes are mapped and Figure 5b shows the results of the first iteration if is selected. The algorithm extends with the weakest precondition for , making available for selection in the next iteration. Notice that after the first iteration, we cannot directly compute the weakest precondition of , since we do not know the weakest precondition of the child node labeled .

(a) Initial state
(b) State after one iteration
Figure 5: Iterations to compute the weakest precondition.

To compute the weakest precondition of a block compositionally, the definition of triples is lift to smaller elements of BIR, i.e. the execution of single blocks and single statements:

Definition

Let be a block having label , be a statement, and be a control flow statement:

  • holds iff for every if and then

  • holds iff for every if and then and

  • holds iff for every if and then and

To compute the weakest precondition for the label (i.e. ) the proof producing procedure uses the following rules: Assuming that is defined for every child of and that the program block having label is , the weakest precondition of can be computed by propagating the preconditions through the block, by computing the weakest precondition of the execution of (i.e. ). As usual for sequential composition, the block precondition is computed by propagating the postcondition backwards.

Notice that, differently than blocks and control flow statements, internal block statements always have one successor, therefore their postcondition is a boolean expression instead of a partial function. The rules for the block statements and are standard (we use to represent the substitution of every occurrence of the variable name in with the expression ):

The rules for the control flow statements and are standard for unstructured programs. Both rules access for the successor labels, which requires to identify the CFG of the program statically (e.g. the program fragment is free of indirect jumps), and to have already computed the preconditions for the successor labels (, , and ).

Steps (2) and (3) of the weakest precondition procedure have been formally verified, by demonstrating soundness of the rules. Step (1) is a proof producing procedure, which dynamically demonstrates for each iteration that the selection of is correct (i.e. that contains all successors of ). This frees us from verifying that the algorithm to select

and compute the CFG graph is correct, enabling to integrate heuristics to handle indirect jumps, which can be incomplete and difficult to verify. Also, our procedure only demonstrates soundness of the generated precondition, but does not prove that it is the weakest one. This task requires to dynamically build a counterexample showing that any weaker condition is not a precondition. The lack of this proof does not affect the usage of the tools, since it is not needed for contract based verification.

5.2 Optimization

Weakest precondition propagation has two well known scalability issues. Firstly, branches in the CFG can cause an exponential blowup. For instance, in Figure 5 the precondition of node is propagated twice (via and ) and occurs twice as sub-expression of the precondition of the branch node . There exist approaches leino2005efficient to handle this problem, however they generate preconditions that are difficult to handle with SMT solvers, which can preclude their usage for practical contract verification. The second problem, which Section 7 demonstrates to be critical for our scenario, is that expression substitutions introduced by assignments can exponentially increase the size of preconditions. Consider the following program fragment:

The weakest precondition of is . This equals when external substitutions are expanded. This behavior is common in BIR programs that model binary programs, due to the presence of indirect loads and stores. For example, in the following fragment

both and are loaded from , leading the expression modeling the new value of the variable to contain three occurrences of .

A solution to this issue is using single dynamic assignment and passification. The program is transformed into an “equivalent” one, which ensures that each variable is only assigned once for every possible execution path. Then a second transformation generates a program that has assumptions in place of assignment. However, this approach requires proof-producing transformers and additional machinery to transfer properties from the the passified program to the original one.

For the scope of contract based verification, we can obtain the same conditions without applying these transformations. Our goal is to generate a precondition and to check that for every BIR state this is entailed by the contract precondition , i.e. checking that is a tautology (we write this as ). This drives our strategy: (1) we generate a weakest precondition that contains substitutions, but we do not expand them to prevent the exponential growth of ; (2) we take the precondition and we generate a substitution free condition using a proof producing procedure, which ensures that if and only if ; (3) the original contract is verified by checking unsatisfability of .

For step (2) we developed a proof producing procedure that uses a set of preproved inference rules. The rules capture all syntactic forms which can be produced by the weakest precondition generation, i.e. conjunction, implication and substitution. If the precondition is a conjunction, we can recursively transform the two conjuncts under the common premise and combine the transformed preconditions:

If is the implication , we can include in the premise and transform recursively. Then we can restore the original implication form using the transformed predicate :

If is the substitution and is free in , we apply the following rule. This rule prevents the blowup by avoiding applying the substitution and instead using the fresh variable as an abbreviation for in . Before continuing the application of transformation rules, we have to remove the substitution . This ensures that each application of this transformation rule removes one substitution from . We achieve this by recursively applying the substitution until we reach another substitution . Normally, the application of substitution would require the application of the inner substitution first, which would reintroduce the blowup problem. Instead, we rewrite the expression as follows: This moves the inner substitution out by individually applying the substitution to every free occurrence of in and . This means that the substitution should not be applied to if .

Notice that the last tautology transformation rule cannot be applied if , since it can introduce type errors. Consider the predicate substitution . Here, the two references to have different types, i.e. integer and boolean. By using this rule and applying all substitutions, we would obtain . However, this equality does not hold, since the left side can be a tautology while the right side cannot, since it is a wrongly typed expression. Practically, if is not free in then the substitution has no effect and can be simply removed:

This simplification procedure can be applied to the previous example, where the weakest precondition of is . Let be a precondition, the simplification prevents the four repeated occurrences of the variable in the final substitution:

6 Applications

The TrABin’s verification work flow consists of three tasks: (1) transpile a binary program to BIR, (2) proving that the BIR program does not reach error states, (3) proving that the desired properties of the BIR program hold, and (4) using the refinement relation to transfer these properties to the original binary program.

Task (2) can be done following the strategy of Section 5. Let be the partial function whose domain is the program entry points and values are the corresponding preconditions, and let be the partial function whose domain is the exit points of the program which all map to the constant true, error freedom of program can be verified by simply establishing the contract .

For (3) and (4) we show that the transpiler output and BIR contract verification can be used for four common verification tasks: Control Flow Graph (CFG) analysis, contract-based verification, partial correctness refinement, and verification of termination.

Knowing the CFG of a program is essential to many compiler optimizations and static analysis tools. Furthermore, proving control flow integrity ensures resiliency against return-oriented programming Shacham:2007:GIF:1315245.1315313 and jump-oriented programming attacks bletsch2011jump . In its simplest form, the CFG consists of a directed connected graph , whose node set is : The graph contains if the program can flow from the address to the address by executing a single instruction; The nodes represent the entry points of the program, which can be multiple due to binary programs being unstructured.

Analyzing the CFG of a binary program requires to deal with indirect jumps. Even if the source program avoids using function pointers, indirect jumps are introduced by the compiler, e.g. to handle function exits and exceptions (for example the ARM link register is used to track the return address of functions and can be pushed to and popped from the stack). For this reason, the correctness of the control flow depends on the integrity of the stack itself. Thus, verifying the CFG of a program requires assuming a precondition for every entry point , which constraints the content of the heap, stack and registers.

Definition (Control flow graph integrity)

For every machine state such that , , and , for every , if and then .

It is straightforward to show that CFG integrity can be verified by using the transpiler theorem, by defining a BIR precondition that corresponds to , and by proving the following verification conditions.

Verification Condition (BIR control flow integrity)

For every such that , and for every if , then , , and .

Verification Condition (Transfer of precondition)

For every and such that , if then .

Contract based verification for a binary program consists of verifying that a program meets the contract , when its executions start at an entry point and end at one of the exit points .

Definition (Contract verification)

For every and such that , and , if