
User Experience with Language-Independent Formal Verification

by   Suhabe Bugrara, et al.

The goal of this paper is to help mainstream programmers routinely use formal verification on their smart contracts by 1) proposing a new YAML-format for writing general-purpose formal specifications, 2) demonstrating how a formal specification can be incrementally built up without needing advanced training, and 3) showing how formal specifications can be tested by using program mutation.



1. Introduction

The goal of this paper is to help mainstream programmers routinely use formal verification on their smart contracts. It attempts to achieve this by:

  1. proposing a simple, general-purpose, easy-to-read YAML format for writing formal specifications that is purely syntactic sugar for the K Framework (Roşu and Şerbănuţă, 2010) and can thus, in principle, be used for any smart contract language (Section 4),

  2. demonstrating how a programmer can incrementally build up a formal specification for their program without advanced training and without needing to understand the internals of the formal verification prover (Section 5),

  3. showing how formal specifications can be tested by constructing mutant variations of the program (Section 7).

Section 2 gives an overview of the K Framework. Section 3 discusses the major usability challenges that state-of-the-art formal verification systems share that have inhibited their mainstream adoption. Section 6 describes the partial formal specification written for the SimpleMultiSig, a real Ethereum multisig smart contract.

2. K Framework

The K Framework (Roşu and Şerbănuţă, 2010) is a disruptive system in programming languages that seeks to enable the design and implementation of programming tools (such as compilers, virtual machines, deductive verifiers, and others) independently of a particular programming language. Its vision is that, once the syntax and operational semantics of a programming language are specified in K, the K Framework can automatically generate programming tools in a correct-by-construction manner.

K semantics have been successfully written and tested for several popular and widely-used languages including C (Hathhorn et al., 2015), Java (Bogdanas and Rosu, 2015), EVM (Hildenbrandt et al., 2018), JavaScript (Park et al., 2015), x86-64 (Dasgupta et al., 2019), and others. The K Framework itself and these individual language semantics are open-sourced under the UIUC License, which is permissive and free. One of the distinctive advantages of a K semantics for a programming language is that it is executable in the sense that the K Framework can immediately generate an interpreter for the language directly from the K semantics. This interpreter can be used to execute any program written in that language, which enables one to actually write and run test cases on the semantics. In contrast, for example, the EVM Yellow Paper (Wood, ) is an English language semantics that is not executable and has been found to be unclear, under-specified, and, in exceptional cases, inconsistent with actual EVM implementations (Hildenbrandt et al., 2018). The Ethereum Foundation has been considering adopting the K-EVM semantics as the official semantics for the EVM.

Even at this early stage of the area, many new programming languages have already been proposed, developed or adopted to support decentralization. Some examples are Move, Motoko, Solidity, Vyper, Serpent, and WebAssembly. It is impractical and wasteful to repeatedly build a new formal verification system for each language. Furthermore, programming languages are constantly evolving, and keeping a formal verification system up-to-date with the latest versions is expensive. The power of the K Framework is that formal verification systems can be developed independently of a particular programming language. Once the semantics of the language is formally defined in K, a K-based prover can immediately be used to formally verify properties of programs written in that language.

3. Usability

This section discusses some of the usability issues that state-of-the-art formal verification systems typically have.

3.1. User Interactions

A user typically interacts with a formal verification system in several different ways:

  • The user writes a formal specification of the program in a declarative, logic-based specification language that the prover must be designed to handle.

  • The user writes formal summaries that are attached to difficult-to-reason-about blocks of the program, such as a particular function or loop. A formal summary is typically written in the same logic language as the actual formal specification and either partially or fully encodes the block’s behavior or gives the prover hints about how to reason about the block.

  • The user reads the final output of the prover which ideally is a formal proof that can be mechanically checked by a proof checker that is orders of magnitude simpler and smaller than the prover itself. Unfortunately, most systems simply output either "yes, the program adheres to the spec" or "no, the program does not adhere to the spec", which introduces trust issues discussed in Section 3.5.

  • The user reads the auxiliary output of the prover that may give the user some information about why the prover is not able to prove the spec within a reasonable time period. This auxiliary output typically requires deep understanding of the internals of the prover’s implementation as well as its underlying mathematical foundations.

3.2. Undecidability

The undecidability of the halting problem (Turing, 1937) has a direct impact on the usability of formal verification systems. Rice’s theorem (Rice, 1953), an immediate consequence of the halting problem, states that all non-trivial, semantic properties of programs are undecidable. Thus, it is impossible for a prover to be fully automatic across all programs, for any interesting semantic property. This limitation implies that a user will inevitably need to interact with the prover, in some manner beyond simply writing the formal specification, because the prover cannot, in general, be fully automatic and will need the user’s help to push the proof through.

For example, provers require that the user supply a formal invariant for each dynamically-bound loop in a program, and these invariants are difficult for non-experts to write. See the encodepacked_keccak00 (§ 5.6) and encodepacked_keccak01 (§ 5.6) programs, the ecrecoverloop01 (§ 5.9) program, and the storage02 (§ 5.12) program.

3.3. Language Level and Compilation Stage

A practical formal verification system will make many design decisions in its implementation, but one of the most impactful is choosing the programming language and compilation stage the prover will operate at. Systems will typically take one of the following approaches:

3.3.1. Higher Language Level / Early Compilation Stage

With this approach, the prover is designed to work directly on the program in its original form, in the source language in which it was written, before any compilation step and without any significant source code transformations. This approach has several disadvantages:

  1. High-level languages such as Solidity or C++ are enormous, and it would require immense engineering resources to model every language construct. For example, the Solidity language includes inline assembly, which means that a prover that operates at the Solidity level would need to be engineered to reason about every inline assembly instruction as well as the rest of the high-level Solidity language.

  2. The prover would only work for the chosen high-level programming language and would need to be actively maintained and updated for every new version of the language, which is costly especially for a language like Solidity which is constantly being iterated on (Foundation, 2019a).

  3. The prover’s conclusion about the correctness of the program would only apply to the high-level program and not the final binary produced by the compiler, which is what actually gets executed. The compiler could have a bug, for example in its optimization passes, which incorrectly changes the semantics of the program in its binary form.

On the other hand, a very compelling advantage of this approach, which perhaps compensates for the disadvantages listed above, is that because the prover works at the language level at which the program was written, it makes it easier for the user to interact with the prover, make sense of its behavior, and understand its auxiliary output (Section 3.1). As explained in Section 3.2, the user will inevitably need to interact closely with a prover because of the theoretical limits of fully automated formal verification.

3.3.2. Lower Language Level / Late Compilation Stage

With this approach, the prover is designed to work after the last compilation stage, directly on the binary produced by the compiler, for example on optimized EVM bytecode or x86 instructions. This approach has several major advantages which somewhat mirror the disadvantages of operating at a higher level:

  1. Lower level languages such as EVM bytecode and x86 will reduce the richness of higher level language constructs to a set of basic word-level operations. For example, in Solidity, the mapping data structure is compiled down to simple key-value pairs in an account’s storage. This reduction drastically simplifies the engineering of the prover.

  2. The prover’s conclusion about the correctness of the program applies to the program in its executable form, which means the user does not need to trust the compiler.

  3. The prover can be applied to programs originally written in any high-level language that compiles down to the same bytecode or binary language. For example, a prover that works on EVM bytecode could be used for programs originally written in both Solidity and Vyper by compiling them down to EVM bytecode first.

However, a critical disadvantage of this approach is that the user experience of interacting with a prover that operates at the bytecode level is prohibitively tedious and time-consuming. Some examples of this phenomenon are discussed in Section 5 in the context of proving properties of the SimpleMultiSig smart contract at the EVM bytecode level.

3.3.3. Intermediate Language Level / Mid Compilation Stage

A common approach is to design the prover to operate on the intermediate language of a standard compiler infrastructure, such as LLVM (Lattner and Adve, 2004) or Yul (Foundation, 2019b). An intermediate language reduces the higher level language to a more manageable set of operations and data structures, but also retains some of the important high-level constructs that make formal verification easier such as structured control flow, functions and some type information. This approach is a compromise between strictly operating the prover at either a higher-level or lower-level language, which mitigates the impact of both their advantages and disadvantages. For example:

  • LLVM bitcode is much more readable than x86 assembly but still much less readable than C++. Readability is important because it enables the user to interact with the prover more effectively.

  • Many higher-level languages compile down to LLVM bitcode which enables the prover to work on programs originally written in any of those languages, but now the user has to trust the portion of the compiler infrastructure that generates machine code from LLVM bitcode because the prover only operates on the LLVM bitcode.

One notable instance of this approach is to create a new, specialized intermediate programming language specifically designed for formal verification instead of machine-code generation. For example, the solc-verify (Hajdu and Jovanovic, 2019) and Verisol (Lahiri et al., 2018) verification tools translate Solidity programs to Boogie (Barnett et al., 2005), a verification intermediate language. Similarly, in (Bhargavan et al., 2016), Solidity programs are translated to F*.

3.4. Formalization of Correctness Properties

The first step to formally verifying a program is informally, yet rigorously, describing what the intended correct behavior of the program is, and then translating this informal description into a formal specification that a mechanized prover can understand. This translation step is critical because the proof generated by the prover assumes that the formal specification is correct in the sense that it faithfully captures the intended correctness properties of the program.

3.4.1. Formal specifications need to be simple for humans to read and understand

The premise of formal verification is that a formal specification is much simpler to understand, and to be convinced is correct, than the program itself. It is unclear how useful formal verification is when the formal specification is longer or more difficult to read than the program itself.

3.4.2. A partial versus a full formal specification

Users typically do not write formal specifications for all correctness properties of a program because writing and proving formal specifications is time-consuming and difficult. Instead, in practice, users consider the cost and benefit of formally specifying a property. See Section 5.4 for an example of making this practical trade-off.

3.5. Provers are not trustworthy

A prover often consists of hundreds of thousands of lines of highly complex code, so bugs in the prover are inevitable. Such a bug could cause the prover to incorrectly claim that a program follows its specification when it does not. A technique called proof-carrying code (Necula, 1997), which aims to address this challenge, has been extensively researched in academia but has not been used in practice.

3.6. Investigating Why the Prover Fails

The prover may fail to prove a specification for several reasons:

  1. The specification has a bug.

  2. The program has a bug.

  3. The prover has a bug.

  4. The language semantics has a bug.

  5. The underlying constraint solver (e.g., Z3 (de Moura and Bjørner, 2008)) has a bug.

  6. An incorrect lemma was supplied.

  7. The prover is not powerful enough to reason automatically about the program and specification.

Trying to figure out why the prover fails is tedious, difficult and time-consuming because typically a user would need to understand the internals of the prover implementation as well as its method of deductive reasoning. Conceptually, a prover works by carefully exploring the entire, exponentially-sized state space of the program under all possible inputs. Some tools such as KLab (DappHub, 2018) have been developed to help users navigate this state space, similar to how a debugger can be used to step through the state changes of a program under a single input.

3.7. EVM Bytecode

Some aspects of the design of EVM bytecode make formal verification tedious, difficult, and time-consuming. For example, the EVM does not have functions, which means that a user will need to specify the actual program counter of the EVM bytecode that the specification applies to. If the user changes the Solidity program and recompiles it to EVM bytecode, the user will need to manually update the program counter in the specification. This manual process is error-prone because if the user gets it wrong, the specification will be incorrect and thus a buggy program may pass through undetected.

4. K-YAML

The K specification language is based on matching logic (Roşu, 2017) which makes it powerful, expressive, and flexible, but its syntax can be difficult for new users to read. K’s syntax was originally designed to make it easier for formal methods experts to specify complete executable semantics for entire programming languages, and thus, one of its design priorities was optimizing the succinctness of writing individual rules. However, for programmers without experience in formal methods who only want to use K to write correctness specifications for their programs, the syntax can initially seem prohibitively esoteric.

This paper proposes a simple YAML-based format for structuring K specifications called K-YAML that enables users to write easier-to-read specifications for formally verifying programs. YAML was chosen because it is a widely-used, human-readable, system configuration file format that programmers are comfortable with. The key design decision of K-YAML is to be purely syntactic sugar for K and to not abstract away any part of it in order to retain its power and expressivity.

A K-YAML specification is a list of spec blocks where each block has the following structure:

- rule:
    name: <Name-of-Spec>
    inherits: <Name-of-Parent-Spec>
    if:
      match:
        <If-Terms>
      where:
        <If-Constraints>
    then:
      match:
        <Then-Terms>
      where:
        <Then-Constraints>

  • The name key designates a name for this spec block that can be referenced elsewhere in the specification.

  • The inherits key references another spec block to indicate that this one inherits from it. (Section 5.5)

  • The if key is the precondition, which defines the set of initial program states of interest using two components: match and where. Conceptually, the match component is similar to Rust’s match operator which accepts patterns over terms and variables to describe the structure of data and then matches and binds a value against the structure. Here, match is a dictionary whose keys are K configuration cells and whose values are K terms over symbolic variables. The where component is a list of conjuncts over K predicates and symbolic variables that have possibly been bound by the match component. A program state is in the precondition if it matches the match dictionary and satisfies the where constraints.

  • The then key is the postcondition which specifies what the final program states should be in terms of the initial program states defined by the precondition. It uses the same match and where mechanism to specify this set of final program states.
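
The match and where mechanism described above can be illustrated with a toy model over concrete program states. This is only a sketch in Python: the helper names bind and in_precondition, the convention that upper-case strings are variables, and the sample callData cell are all hypothetical, and the real K prover matches symbolic K terms rather than Python values.

```python
# Toy model of a K-YAML precondition (hypothetical; the real prover works on
# symbolic K terms, not concrete Python values).

def bind(pattern, value, env):
    """Match one cell value against a pattern, binding variables in env.

    A pattern is either a variable name written in upper case (e.g. "A0")
    or a concrete value that must equal the cell value.
    """
    if isinstance(pattern, str) and pattern.isupper():
        if pattern in env:                 # variable already bound earlier
            return env[pattern] == value
        env[pattern] = value               # bind a fresh variable
        return True
    return pattern == value                # concrete pattern


def in_precondition(spec_if, state):
    """Return the variable bindings if state is in the precondition, else None."""
    env = {}
    for cell, pattern in spec_if["match"].items():
        if cell not in state or not bind(pattern, state[cell], env):
            return None
    # Every where constraint must hold under the bindings produced by match.
    if all(constraint(env) for constraint in spec_if.get("where", [])):
        return env
    return None


spec_if = {
    "match": {"callData": "A0"},
    "where": [lambda env: 0 <= env["A0"] < 2**256],
}

print(in_precondition(spec_if, {"callData": 7}))    # {'A0': 7}
print(in_precondition(spec_if, {"callData": -1}))   # None
```

The then key can be read the same way: the final state must match the then dictionary under the bindings that the precondition produced.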

Section 5 gives many examples of small programs and the corresponding correctness specifications in K-YAML.

5. Verification Walkthrough

This section details some of my experience using the K prover to verify a few properties of the SimpleMultiSig smart contract. I was not involved in the K prover’s design and implementation, and thus, I needed to find a way to use it effectively without understanding its internals. My approach was to start by formally verifying simple programs and then incrementally adding new functionality until I reconstructed the SimpleMultiSig. This approach enabled me to use the prover as a black box as much as possible, thus minimizing the need to understand its internals. However, I did need to have a thorough understanding of the K-EVM semantics (Hildenbrandt et al., 2018) to make progress.

5.1. Starting Point

As a starting point, I used the simple program below which has a single function named execute that returns the value 5:

pragma solidity 0.5.0;
contract simple00 {
    function execute() public returns (uint) {
        return 5;
    }
}

Then I wrote the specification below to try to prove that each transaction to the execute function always returns the value 5 successfully. More technically stated, the specification ensures that when the calldata of a transaction matches the ABI-encoding of execute’s function-selector, the value returned is the ABI-encoding of the constant 5 as a 32-byte uint256, and the status code of the transaction is EVMC_SUCCESS:

- rule:
    if:
      match:
        callData: #abiCallData("execute", .TypedArgs)
    then:
      match:
        statusCode: EVMC_SUCCESS
        output: #encodeArgs(#uint256(5))

Then I compiled the program to EVM bytecode using the Solidity compiler and invoked the K prover on the bytecode and the specification. After a few seconds, the prover returned True, which indicates that it was able to formally prove that the program passes the specification for all possible transactions, in the context of all possible blockchain states.
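
The expected value of the output cell is concrete enough to compute by hand. As a minimal sketch in Python, assuming the standard Solidity ABI layout of a uint256 as one big-endian 32-byte word (the helper name encode_uint256 is hypothetical and is not part of KEVM):

```python
def encode_uint256(value: int) -> bytes:
    """ABI-encode an unsigned integer as one 32-byte big-endian word."""
    if not 0 <= value < 2**256:
        raise ValueError("value does not fit in uint256")
    return value.to_bytes(32, "big")


# Expected return data for execute(): 31 zero bytes followed by 0x05.
expected_output = encode_uint256(5)
print(expected_output.hex())
```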

5.2. Using function parameters

Next, I took the previous program and modified it by 1) adding a parameter a0 to the execute function and 2) returning a0 instead of the constant 5.

pragma solidity 0.5.0;
contract simple02 {
    function execute(uint a0) public returns (uint) {
        return a0;
    }
}

The specification had to be modified as follows:

  1. The callData cell was changed by introducing a fresh symbolic variable A0 that is bound to the first, 32-byte argument of the calldata.

  2. The output cell was changed to return A0, thus specifying that execute returns the value of its first, 32-byte argument.

  3. Finally, a where constraint is added to constrain the value of the symbolic variable A0 to be within the uint256 domain, which includes all integers between 0 and 2^256 - 1.
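
The uint256 range constraint can be read concretely as a simple bounds check. The sketch below is in Python and reflects my reading of the constraint as 0 <= A0 < 2^256; the function name range_uint is hypothetical and only mirrors the shape of the K predicate.

```python
def range_uint(width: int, value: int) -> bool:
    """Concrete reading of a range constraint: 0 <= value < 2**width."""
    return 0 <= value < 2**width


MAX_UINT256 = 2**256 - 1
print(range_uint(256, 0), range_uint(256, MAX_UINT256))        # True True
print(range_uint(256, -1), range_uint(256, MAX_UINT256 + 1))   # False False
```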

- rule:
    if:
      match:
        callData: #abiCallData("execute", #uint256(A0))
      where:
        - #rangeUInt(256, A0)
    then:
      match:
        statusCode: EVMC_SUCCESS
        output: #encodeArgs(#uint256(A0))

5.3. Static Arrays

Next, I changed the program to accept a static array parameter and return its first element.

pragma solidity 0.5.0;
contract staticarray00 {
    function execute(uint[3] memory a)
            public returns (uint) {
        return a[0];
    }
}

Because the KEVM semantics did not have an existing high-level construct for expressing static array calldata parameters, I needed to add it myself. Fortunately, the ABI encoding of statically-sized data structures is simple: they are straightforwardly flattened into a tuple of 32-byte elements. I defined a new K rule named #abiCallData2 that behaves exactly like the existing #abiCallData function except that it enables the user to specify the full function signature instead of only the function name:

syntax WordStack ::=
    #abiCallData2 ( String , TypedArgs ) [function]

rule #abiCallData2( FSIG , ARGS )
  => #parseByteStack(substrString(Keccak256(FSIG), 0, 8))
  ++ #encodeArgs(ARGS)

This new #abiCallData2 function enabled me to represent the static array parameter a as the tuple (A0, A1, A2) where A0, A1, A2 are 32-byte, fresh symbolic variables:

- rule:
    if:
      match:
        callData: #abiCallData2("execute(uint256[3])",
          #uint256(A0), #uint256(A1), #uint256(A2))
      where:
        - #rangeUInt(256, A0)
        - #rangeUInt(256, A1)
        - #rangeUInt(256, A2)
    then:
      match:
        statusCode: EVMC_SUCCESS
        output: #encodeArgs(#uint256(A0))

This is a good example of the flexibility and extensibility of K compared to other verification systems: extensions can be incorporated without needing to make changes to the underlying prover. The new K rule above gave the abstraction needed to succinctly write the spec.
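
To make the flattening concrete, the following Python sketch lays out calldata for a uint[3] argument as a 4-byte selector followed by three 32-byte words, mirroring the tuple (A0, A1, A2). The function names are hypothetical, and the selector is a placeholder: the real one is the first 4 bytes of the keccak-256 hash of the signature string, which this sketch does not compute.

```python
def encode_uint256(value: int) -> bytes:
    """ABI-encode an unsigned integer as one 32-byte big-endian word."""
    if not 0 <= value < 2**256:
        raise ValueError("value does not fit in uint256")
    return value.to_bytes(32, "big")


def encode_static_array(elements) -> bytes:
    """Flatten a statically-sized uint array into consecutive 32-byte words."""
    return b"".join(encode_uint256(e) for e in elements)


def call_data(selector: bytes, encoded_args: bytes) -> bytes:
    """Calldata layout: 4-byte function selector, then the encoded arguments."""
    if len(selector) != 4:
        raise ValueError("a function selector is exactly 4 bytes")
    return selector + encoded_args


# Placeholder only: NOT the real selector of execute(uint256[3]).
PLACEHOLDER_SELECTOR = bytes(4)

data = call_data(PLACEHOLDER_SELECTOR, encode_static_array([10, 20, 30]))
print(len(data))                            # 100 = 4 + 3 * 32
print(int.from_bytes(data[4:36], "big"))    # 10, the element the program returns
```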

5.4. Dynamic byte arrays

This next program accepts a dynamically-sized, bytes-typed parameter named data and returns its length:

pragma solidity 0.5.0;
contract bytes00 {
    function execute(bytes memory data)
            public returns (uint) {

        return data.length;
    }
}

The specification follows the same pattern as the previous specifications but uses the KEVM #buf construct to bind dynamically-sized calldata:

- rule:
    if:
      match:
        callData: #abiCallData2("execute(bytes)",
                    #bytes(#buf(DATA_LEN,DATA)))
      where:
        - #range(0 <= DATA_LEN < 2 ^Int 16)
    then:
      match:
        statusCode: EVMC_SUCCESS
        output: #encodeArgs(#uint256(DATA_LEN))

The #buf(DATA_LEN,DATA) expression is a symbolic byte buffer that introduces two fresh symbolic variables to bind the data parameter. DATA_LEN represents data’s length and DATA represents its contents.

Note the where constraint in the specification, which bounds DATA_LEN to be between 0 (inclusive) and 2^16 (exclusive). This constraint is needed because the EVM imposes an upper limit on the length of a bytes array since it must fit into memory, which is 32-byte addressable. However, this upper bound is actually not tight enough to make the specification pass because the EVM will need to store other data in memory besides the bytes array. I could not figure out a quick way to calculate the tightest possible upper bound on DATA_LEN, so I did what was done for other K verification projects such as the GnosisSafe (Verification, 2019), which was to arbitrarily constrain DATA_LEN to be less than 2^16.

Thus, technically speaking, the correctness guarantees given by the formal proof would not apply to transactions whose data parameter is 2^16 bytes or longer. Practically speaking, it is unclear whether it would be worth the effort to figure out the tightest upper bound. This situation is an example of the common trade-off between the precision and cost of partial versus full formal verification, as discussed in Section 3.4.2.

This is a good example of three usability challenges: automation (Section 3.2), low-level specifications (Section 3.3.3), and prover debugging (Section 3.6). The user needs to understand the EVM at a low level to write a passing specification, and the actual tight bound to make this specification pass is hard for a user to calculate. Furthermore, the user’s first attempt at writing this specification would likely constrain DATA_LEN too loosely, and when the prover fails to prove it, the reason is far from obvious. The output of the K prover does not help because it only gives the set of all intermediate proof states. If the prover implementation were able to generate a counter-example, that is, a concrete transaction that demonstrates the program violating the specification, the user might have a better chance of figuring out why the specification fails.
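
For readers unfamiliar with how a bytes argument appears in calldata, the Python sketch below encodes a single bytes argument under the standard Solidity ABI layout (an offset word, a length word, then the data right-padded to a multiple of 32 bytes) and checks the length against the 2^16 bound used in the specification. The function names are hypothetical, and the 4-byte selector is omitted.

```python
DATA_LEN_BOUND = 2**16  # the arbitrary bound chosen in the specification


def encode_bytes_arg(data: bytes) -> bytes:
    """ABI-encode one dynamic bytes argument (selector not included)."""
    offset = (32).to_bytes(32, "big")         # the tail starts right after the head
    length = len(data).to_bytes(32, "big")    # plays the role of DATA_LEN
    padding = bytes(-len(data) % 32)          # right-pad to a 32-byte multiple
    return offset + length + data + padding


def covered_by_spec(data: bytes) -> bool:
    """True when the proof's guarantee applies to this argument length."""
    return 0 <= len(data) < DATA_LEN_BOUND


encoded = encode_bytes_arg(b"abc")
print(len(encoded))                                 # 96
print(int.from_bytes(encoded[32:64], "big"))        # 3
print(covered_by_spec(bytes(DATA_LEN_BOUND - 1)))   # True
print(covered_by_spec(bytes(DATA_LEN_BOUND)))       # False
```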

5.5. Transaction Reverts

This next program uses Solidity’s require statement which reverts the transaction when the a0 > 0 condition does not hold:

pragma solidity 0.5.0;
contract requires00 {
    function execute(uint256 a0)
            public returns (uint256) {

        require(a0 > 0);
        return 5;
    }
}

The specification now needs to cover more than one program path: a success path when a0 > 0 and a revert path when a0 <= 0. Thus it has two spec blocks, one for each path:

- spec:
    name: a0gt0
    if:
      match:
        callData: #abiCallData2("execute(uint256)", #uint256(A0))
      where:
        - A0 >Int 0
        - #rangeUInt(256, A0)
    then:
      match:
        output: #encodeArgs(#uint256(5))
        statusCode: EVMC_SUCCESS

- spec:
    name: a0le0
    if:
      match:
        callData: #abiCallData2("execute(uint256)", #uint256(A0))
      where:
        - A0 <=Int 0
        - #rangeUInt(256, A0)
    then:
      match:
        statusCode: EVMC_REVERT

The first spec, named a0gt0, specifies the a0 > 0 case: the status code of the execute function is EVMC_SUCCESS and the output is the constant 5. The second spec, named a0le0, specifies the a0 <= 0 case: the status code of the execute function is EVMC_REVERT and the output is unspecified.
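
The two paths can be mimicked with a concrete model of the contract. The Python sketch below is hypothetical throughout (the execute model and the two check functions are mine), and it only samples a handful of inputs, which is testing rather than the all-inputs proof that the K prover produces.

```python
SUCCESS, REVERT = "EVMC_SUCCESS", "EVMC_REVERT"


def execute(a0: int):
    """Concrete model of requires00.execute: returns (status code, output)."""
    if not a0 > 0:            # require(a0 > 0) fails
        return REVERT, None   # the output on the revert path is unspecified
    return SUCCESS, 5


def check_a0gt0(a0: int) -> bool:
    """Postcondition of the a0gt0 block: success and output 5."""
    status, output = execute(a0)
    return status == SUCCESS and output == 5


def check_a0le0(a0: int) -> bool:
    """Postcondition of the a0le0 block: revert, output not constrained."""
    status, _ = execute(a0)
    return status == REVERT


# Each sample falls into exactly one precondition within the uint256 domain.
samples = [0, 1, 2, 2**255, 2**256 - 1]
for a0 in samples:
    assert check_a0gt0(a0) if a0 > 0 else check_a0le0(a0)
```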

The specification can be slightly rewritten so that common parts of the two spec blocks can be shared, instead of duplicated, in a parent spec by using the inherits property:

- rule:
    name: base
    if:
      match:
        callData: #abiCallData2("execute(uint256)", #uint256(A0))
      where:
        - #rangeUInt(256, A0)

- rule:
    name: a0gt0
    inherits: base
    if:
      where:
        - A0 >Int 0
    then:
      match:
        output: #encodeArgs(#uint256(5))
        statusCode: EVMC_SUCCESS

- rule:
    name: a0le0
    inherits: base
    if:
      where:
        - A0 <=Int 0
    then:
      match:
        statusCode: EVMC_REVERT

The rewritten specification above introduces a third block named base that lifts the common parts, and the a0gt0 and a0le0 blocks now inherit the base block. While it may not seem that this refactoring makes the specification more concise in this particular example, when specifying larger programs, this feature will be necessary to keep the specification readable, as can be seen in the final SimpleMultiSigT3 specification in Section 6.3.
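
One plausible reading of inheritance is a recursive merge in which a child block adds to its parent's match dictionaries and where lists. The Python sketch below implements that reading; the merge rules and the resolve function are assumptions made for illustration and are not taken from the K-YAML implementation.

```python
import copy


def resolve(spec, specs_by_name):
    """Merge a spec block with its parent chain (assumed merge semantics)."""
    parent_name = spec.get("inherits")
    if parent_name is None:
        return copy.deepcopy(spec)
    merged = resolve(specs_by_name[parent_name], specs_by_name)
    merged["name"] = spec["name"]
    for section in ("if", "then"):
        child = spec.get(section, {})
        target = merged.setdefault(section, {})
        # Child cells override or extend the parent's match dictionary.
        target.setdefault("match", {}).update(child.get("match", {}))
        # Child constraints are appended to the parent's where list.
        target.setdefault("where", []).extend(child.get("where", []))
    return merged


base = {
    "name": "base",
    "if": {
        "match": {"callData": '#abiCallData2("execute(uint256)", #uint256(A0))'},
        "where": ["#rangeUInt(256, A0)"],
    },
}
a0gt0 = {
    "name": "a0gt0",
    "inherits": "base",
    "if": {"where": ["A0 >Int 0"]},
    "then": {"match": {"output": "#encodeArgs(#uint256(5))",
                       "statusCode": "EVMC_SUCCESS"}},
}

specs = {s["name"]: s for s in (base, a0gt0)}
flat = resolve(a0gt0, specs)
print(flat["if"]["where"])   # ['#rangeUInt(256, A0)', 'A0 >Int 0']
```

Under this reading, the resolved a0gt0 block carries the same precondition as the standalone version written without inherits.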

5.6. Hashing packed encoding

This next program calculates and returns the keccak-256 cryptographic hash of a bytes32 parameter prefixed with the single byte 0x01:

pragma solidity 0.4.24;
contract encodepacked_keccak00 {
    function execute(bytes32 a0)
            pure external returns(bytes32) {
        return keccak256(
                    abi.encodePacked(byte(0x01),a0));
    }
}

The specification uses the KEVM semantics function named keccak that abstracts the keccak-256 cryptographic hash function as an uninterpreted function that assumes all the hashed values appearing in each execution trace are collision-free:

- spec:
    if:
      match:
        callData: #abiCallData2("execute(bytes32)",
                    #bytes32(A0))
      where:
        - #rangeUInt(256, A0)
    then:
      match:
        output: #encodeArgs(#bytes32(keccak(1 :
                  #encodeArgs(#uint256(A0)))))
        statusCode: EVMC_SUCCESS

This abstraction of keccak-256 is the typical approach taken by practical formal verification systems, where complex library functions are abstracted by writing a formal specification for their behavior called a formal summary, and then applying the formal summary at call sites invoking the library function instead of directly verifying the library function’s code. This method of abstraction enables the user to decompose the verification task into two steps: first formally verifying the program that uses the library function, and then separately verifying the implementation of the library function. See Section 5.8 for another example of how this formal summary technique is used for programs that verify signatures.
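
The idea of treating the hash as an uninterpreted function can be sketched with a symbolic term that is never evaluated. In the Python model below, the Keccak class is hypothetical and is not how KEVM represents the function: two applications are equal exactly when their arguments are equal, which is the collision-free assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Keccak:
    """Symbolic application keccak(arg); never evaluated to real hash bytes."""
    arg: bytes


# Packed encoding in the program: the byte 0x01 followed by a 32-byte value.
a = Keccak(b"\x01" + bytes(32))
b = Keccak(b"\x01" + bytes(32))
c = Keccak(b"\x01" + bytes(31) + b"\x01")

print(a == b)   # True: the same argument gives the same term
print(a == c)   # False: different arguments are assumed never to collide
```

A prover can reason about such terms using equality alone, without ever modeling the internals of the hash function.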

Now, when I tried to run the prover, I found that it failed to prove the specification even though the specification is correct. It turned out that the reason is that the actual EVM bytecode that this Solidity program compiles down to has a loop that is essentially an inlined "memcpy" for copying calldata to local memory. As discussed in Section 3.2, in general, provers will not be able to reason about loops with dynamic bounds automatically unless the user supplies additional information such as a loop invariant.

The figure below illustrates this memcpy loop in the control flow graph of the EVM bytecode of the encodepacked_keccak00 program.

I could have spent time writing a loop invariant but fortunately, in this case, there’s a simpler approach which was to update to the 0.5.0 version of the Solidity compiler instead of 0.4.24 because 0.5.0 has optimizations that are able to optimize away the loop entirely. The following program, which has been changed to use pragma solidity 0.5.0, has an acyclic control flow graph and the prover is able verify it.

pragma solidity 0.5.0;
contract encodepacked_keccak01 {
    function execute(bytes32 a0)
            pure external returns(bytes32) {
        return keccak256(
            abi.encodePacked(byte(0x01),a0));
    }
}

This program is a good example of several usability challenges that formal verification systems typically exhibit: automation, low-level specifications, and prover debugging. It would be difficult for a user to figure out that the compiled EVM bytecode has a hidden loop that does not appear in the original Solidity source program. If the prover were fully automated, this hidden loop would not be an issue because the prover would be able to prove the specification without user intervention. But, because full automation is impossible due to undecidability, the user needs to supply the prover with a loop invariant in terms of the compiled EVM bytecode. This means the user has to decompile the bytecode, identify the exact instruction counters of the loop entry and exit, and make sense of the difficult-to-follow stack-based push/pop/dup instructions of the EVM.

5.7. Statically-sized loops

This next program has a loop that iterates three times after an initial require check:

pragma solidity 0.5.0;
contract staticloop00 {
    function execute(uint a0)
            pure external returns(uint256) {
        uint sum = a0;
        require (a0 < 10);
        for (uint i = 0; i < 3; i++) {
            sum += i;
        }
        return sum;
    }
}

Note that statically-bound loops, such as this one, are much easier for a prover to reason about automatically than dynamically-bound loops. In principle, a prover can automatically and fully unroll a statically-sized loop into straight-line code. In contrast, dynamically-bound loops require the user to supply a loop invariant.
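As a rough illustration of why a fixed bound helps, the sketch below mirrors the staticloop00 function in Python and places it next to a fully unrolled version. The Python function names are hypothetical, and the ValueError merely stands in for a failed require; this is not how the prover itself works internally, only a picture of the straight-line code that unrolling produces.

```python
def execute_loop(a0: int) -> int:
    """Python mirror of staticloop00.execute: a loop with a fixed bound of 3."""
    if not a0 < 10:
        raise ValueError("revert")  # stands in for a failed require
    total = a0
    for i in range(3):
        total += i
    return total


def execute_unrolled(a0: int) -> int:
    """The same function after fully unrolling the three iterations."""
    if not a0 < 10:
        raise ValueError("revert")  # stands in for a failed require
    total = a0
    total += 0  # iteration i = 0
    total += 1  # iteration i = 1
    total += 2  # iteration i = 2
    return total


# Both versions agree on every accepted input and return a0 + 3.
for a0 in range(10):
    assert execute_loop(a0) == execute_unrolled(a0) == a0 + 3
```

Because the unrolled version has no back edge, its control flow graph is acyclic, which is the property that lets a prover handle it without a loop invariant.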

The specification below checks that, if the function parameter is an integer within [0, 10), the transaction will return the value of $a0 + 3$ with an EVMC_SUCCESS status code. Otherwise, it will revert with an EVMC_REVERT status code:

  - rule:
      if:
        match:
          callData: #abiCallData2("execute(uint256)", #uint256(A0))
        where:
          - #range(0 <= A0 < 10)
      then:
        match:
          statusCode: EVMC_SUCCESS
          output: #encodeArgs(#uint256(A0 +Int 3))

5.8. Recovering an address from a signature

This next program accepts a 32-byte message and the (V,R,S) components of an ECDSA signature. It uses the Solidity ecrecover function to verify the signature on the message and recover the signer address, and otherwise reverts:

pragma solidity 0.5.0;
contract ecrecover00 {
    function execute(bytes32 hash, uint8 sigV,
                     bytes32 sigR, bytes32 sigS)
                     pure external returns(address) {
        address a = ecrecover(hash, sigV, sigR, sigS);
        require(a > address(0));
        return a;
    }
}

The specification below has three blocks: 1) the sigvalid block specifies that the behavior of the program on a transaction with a valid signature is to return the signer address, 2) the siginvalid block specifies that it should revert if the signature is not valid, and 3) the base block is a parent of the other two blocks that keeps the common components.

  - rule:
      name: base
      if:
        match:
          callData: #abiCallData2("execute(bytes32,uint8,bytes32,bytes32)",
              #bytes32(HASH), #uint8(SIGV),
              #bytes32(SIGR), #bytes32(SIGS))
        where:
          - #rangeUInt(256, HASH)
          - #rangeUInt(8, SIGV)
          - #rangeBytes(32, SIGR)
          - #rangeBytes(32, SIGS)
          - ECREC_DATA ==K
              #encodeArgs(#bytes32(HASH), #uint8(SIGV),
                 #bytes32(SIGR), #bytes32(SIGS))
  - rule:
      name: sigvalid
      inherits: base
      if:
        where:
          - RECOVERED ==Int #symEcrec(ECREC_DATA)
          - not: #ecrecEmpty(ECREC_DATA)
      then:
        match:
          statusCode: EVMC_SUCCESS
          output: #encodeArgs(#address(RECOVERED))
  - rule:
      name: siginvalid
      inherits: base
      if:
        where:
          - #ecrecEmpty(ECREC_DATA)
      then:
        match:
          statusCode: EVMC_REVERT
          output: _

The KEVM semantics defines two functions named #symEcrec and #ecrecEmpty that abstract the ecrecover precompile as an uninterpreted function, similar to how the keccak-256 cryptographic hash function is abstracted (Section 5.6).
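To give an intuition for what "uninterpreted" means here, the following toy Python model may help. It is only a sketch under stated assumptions: the class UninterpretedFunction and the string symbols it returns are hypothetical and are not part of KEVM. The one property it captures is that nothing is assumed about the function except that equal inputs give equal outputs.

```python
import itertools


class UninterpretedFunction:
    """Toy model: the only known fact is that equal inputs give equal outputs."""

    def __init__(self, name):
        self.name = name
        self._table = {}                  # inputs seen so far -> symbolic output
        self._fresh = itertools.count()   # supply of fresh symbol indices

    def __call__(self, *args):
        # First use of an input tuple allocates a fresh symbol for it.
        if args not in self._table:
            self._table[args] = f"{self.name}_{next(self._fresh)}"
        return self._table[args]


# Hypothetical stand-in for the abstraction of the ecrecover precompile.
sym_ecrec = UninterpretedFunction("ecrec")

a = sym_ecrec("hash", 27, "r", "s")
b = sym_ecrec("hash", 27, "r", "s")
c = sym_ecrec("hash", 28, "r", "s")

assert a == b  # same inputs: provably the same result
# Different inputs get different symbols, meaning "not known to be equal".
# A real prover would not conclude that the two results are definitely distinct.
assert a != c
```

The benefit of this style of abstraction is that a specification can talk about the recovered address symbolically without the prover ever executing the elliptic-curve arithmetic.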

5.9. Recovering a static array of signatures

This next program uses a static loop (Section 5.7) to iterate over a static array (Section 5.3) of signatures (Section 5.8) and reverts (Section 5.5) if any of them do not verify. Otherwise, the function returns successfully. This program uses four different constructs that were previously verified independently of each other in earlier sections, but now they are combined into a single program. This is a good example of how a user can incrementally build up the specification of their program without needing to understand the internals of how the prover works.

pragma solidity 0.5.0;
contract ecrecoverloop00 {
    function execute(bytes32 hash,
                     uint8[2] memory sigV,
                     bytes32[2] memory sigR,
                     bytes32[2] memory sigS)
            pure public {
        for (uint i = 0; i < 2; i++) {
            address a = ecrecover(hash, sigV[i],
                                    sigR[i], sigS[i]);
            require(a > address(0));
        }
    }
}

The specification has four blocks. The base block serves as the prefix of the three other blocks and specifies the behavior related to function entry, such as calldata. The sigs-valid block specifies the success case where all signatures can be verified. The sig0-invalid and sig1-invalid blocks specify the cases where the first and second signatures, respectively, cannot be verified:

  - rule:
      name: base
      if:
        match:
          callData: #abiCallData2("execute(bytes32,uint8[2],bytes32[2],bytes32[2])",
                      #bytes32(HASH),
                      #uint8(SIGV0), #uint8(SIGV1),
                      #bytes32(SIGR0), #bytes32(SIGR1),
                      #bytes32(SIGS0), #bytes32(SIGS1))
        where:
          - #rangeUInt(256, HASH)
          - #rangeUInt(8, SIGV0)
          - #rangeUInt(8, SIGV1)
          - #rangeBytes(32, SIGR0)
          - #rangeBytes(32, SIGR1)
          - #rangeBytes(32, SIGS0)
          - #rangeBytes(32, SIGS1)
          - ECREC_DATA0 ==K
              #encodeArgs(#bytes32(HASH), #uint8(SIGV0),
              #bytes32(SIGR0), #bytes32(SIGS0))
          - ECREC_DATA1 ==K
              #encodeArgs(#bytes32(HASH), #uint8(SIGV1),
              #bytes32(SIGR1), #bytes32(SIGS1))
  - rule:
      name: sigs-valid
      inherits: base
      if:
        where:
          - RECOVERED0 ==Int #symEcrec(ECREC_DATA0)
          - RECOVERED1 ==Int #symEcrec(ECREC_DATA1)
          - not: #ecrecEmpty(ECREC_DATA0)
          - not: #ecrecEmpty(ECREC_DATA1)
      then:
        match:
          statusCode: EVMC_SUCCESS
  - rule:
      name: sig0-invalid
      inherits: base
      if:
        where:
          - #ecrecEmpty(ECREC_DATA0)
      then:
        match:
          statusCode: EVMC_REVERT
  - rule:
      name: sig1-invalid
      inherits: base
      if:
        where:
          - #ecrecEmpty(ECREC_DATA1)
      then:
        match:
          statusCode: EVMC_REVERT

The prover was able to prove the specification above. But, then, I made a slight modification to the ecrecoverloop00 program by only adding a new bytes-typed parameter named data, which is not used at all in the body of the Solidity function:

pragma solidity 0.5.0;
contract ecrecoverloop01 {
    function execute(bytes32 hash, bytes memory data,
                     uint8[2] memory sigV,
                     bytes32[2] memory sigR,
                     bytes32[2] memory sigS)
            pure public {
        for (uint i = 0; i < 2; i++) {
            address a = ecrecover(hash, sigV[i],
                                    sigR[i], sigS[i]);
            require(a > address(0));
        }
    }
}

To clarify, the reason I added this dead parameter is that my goal is to eventually reconstruct the SimpleMultiSig program which accepts such a bytes parameter in the same function that verifies signatures using a loop.

Counterintuitively, simply adding this dead data parameter causes the prover to time out on the sig1-invalid specification when using a time limit of 30 minutes. In contrast, without this parameter, the prover returns successfully within three minutes. It turned out that, in order to fix this issue, I needed to add the additional, superfluous where constraint not: #ecrecEmpty(ECREC_DATA0) on line 6 to the sig1-invalid specification:

1- spec:
2    name: sig1-invalid
3    inherits: base
4    if:
5      where:
6        - not: #ecrecEmpty( ECREC_DATA0 )
7        - #ecrecEmpty(ECREC_DATA1)
8    then:
9      match:
10        statusCode: EVMC_REVERT

This additional constraint is superfluous in the logical sense because it does not change the logical meaning of the specification as a whole. Writing $E_0$ for #ecrecEmpty(ECREC_DATA0) and $E_1$ for #ecrecEmpty(ECREC_DATA1), the combined revert condition $E_0 \lor (\lnot E_0 \land E_1)$ is equivalent to $E_0 \lor E_1$. Conceptually, adding this additional conjunct helps the prover succeed because it directly captures the behavior of the for-loop. There may exist multiple invalid signatures, but the for-loop terminates when it reaches the very first invalid signature, so the specification needed to faithfully capture this behavior.
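The claim that the extra conjunct leaves the overall meaning unchanged can be checked by brute force. The sketch below is a minimal illustration, assuming that e0 and e1 are Booleans standing for "the first signature fails to recover" and "the second signature fails to recover"; the function names are hypothetical.

```python
from itertools import product


def reverts_without_conjunct(e0: bool, e1: bool) -> bool:
    """Revert condition when sig1-invalid only requires e1."""
    return e0 or e1


def reverts_with_conjunct(e0: bool, e1: bool) -> bool:
    """Revert condition when sig1-invalid additionally requires not e0."""
    return e0 or ((not e0) and e1)


def first_failing_index(e0: bool, e1: bool):
    """Model of the loop: it stops at the first signature that fails."""
    for index, failed in enumerate([e0, e1]):
        if failed:
            return index
    return None


for e0, e1 in product([False, True], repeat=2):
    # The two formulations accept exactly the same transactions.
    assert reverts_without_conjunct(e0, e1) == reverts_with_conjunct(e0, e1)
    # The strengthened sig1-invalid case is exactly "the loop stopped at index 1".
    assert ((not e0) and e1) == (first_failing_index(e0, e1) == 1)
```

The second assertion is the useful one for intuition: with the extra conjunct, each revert rule corresponds to exactly one exit point of the loop, so the rules no longer overlap.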

5.10. Reading from storage

This next program reads the storage variable n and returns its value:

pragma solidity 0.5.0;
contract storage00 {
    uint private n;
    function execute() view public returns(uint) {
        return n;
    }
}

The corresponding specification below shows how the KEVM semantics abstracts storage as an integer map, represented here by a fresh symbolic variable S, and uses the KEVM select function to read the slot whose offset is calculated per Solidity's interpretation of the EVM storage:

  - rule:
      if:
        match:
          storage: S
          callData: #abiCallData2("execute()", .TypedArgs)
        where:
          - N ==Int select(S, #hashedLocation("Solidity",
                                    0, .IntList))
          - #rangeUInt(256, N)
      then:
        match:
          output: #encodeArgs(#uint256(N))
          statusCode: EVMC_SUCCESS

5.11. Writing to storage

This next program writes the constant 5 to the storage variable n, reads n and then returns its value:

pragma solidity 0.5.0;
contract storage01 {
    uint private n;
    function execute() public returns(uint) {
        n = 5;
        return n;
    }
}

The EVM will give a refund of ether to the caller if a transaction writes the value of 0 to a storage location, which enables the EVM to stop explicitly tracking that storage location on-chain. The rationale of this refund mechanism is to incentivize users to free up storage they are not using anymore. For this specific program, however, there is never a refund because storage is always written with the value 5, not 0. Thus, the specification below has the refund cell set to 0 indicating a caller will never receive a refund under any transaction.

  - rule:
      if:
        match:
          storage: S0
          refund: 0
          callData: #abiCallData2("execute()", .TypedArgs)
        where:
          - N0 ==Int select(S0, #hashedLocation("Solidity",
                                    0, .IntList))
          - #rangeUInt(256, N0)
      then:
        match:
          storage: S1
          output: #encodeArgs(#uint256(N1))
          statusCode: EVMC_SUCCESS
        where:
          - N1 ==Int select(S1, #hashedLocation("Solidity",
                                    0, .IntList))
          - N1 ==Int 5

The prover should have been able to prove the specification above; however, it was not able to do so. Interestingly, the prover constructed a symbolic expression that had a value of zero, but it was unable to simplify this expression to zero on its own. To fix this, I needed to add a new lemma that helped the prover understand that the expression does in fact simplify to zero:

rule Rsstore(BYZANTIUM, NEW, CURR, ORIG) => 0 requires NEW =/=Int 

5.12. Writing overflow to storage

This next program increments the storage variable n, which will overflow when n is $2^{256} - 1$:

pragma solidity 0.5.0;
contract storage02 {
    uint private n;
    function execute() public returns(uint) {
        n = n + 1;
        return n;
    }
}

The specification needs to handle the non-overflow and overflow cases separately because the +Int operator used by the KEVM semantics is integer addition, rather than modular addition:

  - rule:
      name: base
      if:
        match:
          callData: #abiCallData2("execute()", .TypedArgs)
          storage: S0
          refund: _
        where:
          - N0 ==Int select(S0, #hashedLocation("Solidity", 0, .IntList))
      then:
        match:
          storage: S1
          refund: _
          statusCode: EVMC_SUCCESS
          output: #encodeArgs(#uint256(N1))
  - rule:
      name: no-overflow
      inherits: base
      if:
        where:
          - #range(0 <= N0 < 2 ^Int 256 -Int 1)
          - N1 ==Int N0 +Int 1
  - rule:
      name: overflow
      inherits: base
      if:
        where:
          - N0 ==Int 2 ^Int 256 -Int 1
          - N1 ==Int 

The prover was able to prove the no-overflow specification automatically without any user intervention, but not the overflow specification. The reason was that the output cell of the final proof state contained a term built from chop(select(S0, 0) + 1), which represents a 32-byte buffer whose value is one plus the zeroth storage slot under modular addition. The final proof state also had a constraint that fixes the zeroth storage slot to $2^{256} - 1$. Ideally, the prover should have been able to simplify chop(select(S0, 0) + 1) to the value 0 on its own, but I needed to add the following lemma, which simply states that applying the chop function to $2^{256}$ should simplify to 0:

rule chop(I) => 0 requires I ==Int pow256
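The arithmetic behind this lemma can be checked with a few lines of Python. This is a sketch that assumes chop reduces an unbounded integer modulo $2^{256}$, which matches the description of modular addition above; the Python names POW256 and chop are stand-ins for the K identifiers pow256 and chop.

```python
POW256 = 2 ** 256  # stand-in for the K constant pow256


def chop(i: int) -> int:
    """Wrap an unbounded integer into the 256-bit word range (assumed semantics)."""
    return i % POW256


n0 = POW256 - 1                 # largest value a uint256 storage slot can hold

assert n0 + 1 == POW256         # integer addition (+Int) does not wrap
assert chop(n0 + 1) == 0        # the overflow case: the lemma chop(pow256) => 0
assert chop(41 + 1) == 42       # the no-overflow case: chop leaves the sum unchanged
```

This also shows why the specification needs two cases: with unbounded integer addition, n + 1 only coincides with the stored result when no wrap-around occurs.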

5.13. Calling an address

This next program invokes the destination parameter with a call payload passed by the caller:

pragma solidity 0.5.0;
contract call00 {
    function execute(bool condition, uint gasLimit,
                     uint value, bytes memory data,
                     address destination) public {
        require(condition);
        bool success = false;
        assembly {
            success := call(gasLimit, destination,
                            value, add(data, 0x20),
                            mload(data), 0, 0)
        }
        require(success);
    }
}

The specification checks the following correctness properties:

  • Call to destination is invoked at least once if condition is true

  • Call to destination is not invoked if condition is false

  • Call to destination is invoked at most once if condition is true

  • Call to destination is invoked with the correct value, gas limit, return start/length

  • Call to destination is invoked with correct data start/length

  • execute reverts if call to destination returns an error code

  • execute succeeds if call to destination returns a success code

  • No storage reads or writes after the call to destination.

  - rule:
      name: base
      if:
        match:
            callData:  #abiCallData("execute",
                #bool(CONDITION),
                #uint256(GAS_LIMIT), #uint256(VALUE),
                #bytes(#buf(DATA_LEN,DATA)),
                #address(DESTINATION))
            callLog: .List
        where:
            - #rangeAddress(DESTINATION)
            - #rangeUInt(8, CONDITION)
            - #rangeUInt(256, GAS_LIMIT)
            - #rangeUInt(256, VALUE)
            - #rangeUInt(256, DATA_LEN)
            - DATA_LEN <Int 2 ^Int 16
            - CALL_PC ==Int 364
            - CONDITION ==Int 1
      then:
        match:
          callLog: ListItem ( 0 CALL_PC GAS_LIMIT
                      DESTINATION VALUE ARGSTART_C
                      ARGWIDTH_C 0 0 LM_C .List .List )
                   .List
        where:
          - selectRange(LM_C, ARGSTART_C, ARGWIDTH_C) ==K
                    #buf(DATA_LEN,DATA)
  - rule:
      name: call-succeeded
      inherits: base
      if:
        where:
          - #callSuccess(0, DESTINATION)
      then:
        match:
          statusCode: EVMC_SUCCESS
  - rule:
      name: call-failed
      inherits: base
      if:
        where:
          - #callFailure(0, DESTINATION)
      then:
        match:
          statusCode: EVMC_REVERT

To check these properties, I needed to extend the KEVM semantics by adding a new cell in the KEVM configuration named callLog to keep track of a list of all the call invocations during transaction execution. Each element in the list is a tuple representing callsite information including the call index, program counter, gas limit, memory offset and size of the parameters and return value. I also added two new cells named readLog and writeLog that keep track of all the storage reads and writes to check that none occur after call instructions. The details are omitted here due to space limitations.
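Since the details of these cells are omitted, the following Python sketch only illustrates the idea of checking properties against a log of calls and storage accesses. Everything in it is hypothetical: the CallRecord fields are a simplified subset of the callsite information described above, and the event list is an assumed flattening of the callLog, readLog and writeLog cells into one ordered trace.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CallRecord:
    """Simplified, hypothetical callsite record (subset of the fields above)."""
    index: int
    pc: int
    gas_limit: int
    destination: int
    value: int


def called_exactly_once(call_log: List[CallRecord], destination: int) -> bool:
    """True if the log holds a single call and it targets the destination."""
    return len(call_log) == 1 and call_log[0].destination == destination


def no_storage_access_after_call(events: List[str]) -> bool:
    """events is an ordered trace of 'call', 'read' and 'write' markers."""
    seen_call = False
    for event in events:
        if event == "call":
            seen_call = True
        elif seen_call and event in ("read", "write"):
            return False
    return True


log = [CallRecord(index=0, pc=364, gas_limit=100000, destination=0xABC, value=5)]
assert called_exactly_once(log, 0xABC)
assert not called_exactly_once(log + log, 0xABC)   # a second call is rejected
assert no_storage_access_after_call(["read", "write", "call"])
assert not no_storage_access_after_call(["call", "write"])
```

In the actual specification these checks are expressed declaratively, by matching the final callLog cell against a one-element list, rather than by running code over a trace.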

6. SimpleMultiSig

Multisig wallets are compelling targets for formal verification because they control high-value assets. The SimpleMultiSig smart contract (Lundqvist, 2015, 2017) is a minimal multisig wallet written in Solidity for the EVM. Its code is included in the appendix under Listing 1.

6.1. Correctness Properties

This section aims to give a complete, but informal, list of high-level correctness properties that a SimpleMultiSig implementation should have.

  • If the transaction does not include a call payload, then the transaction is rejected and has no effect on the Ethereum state.

  • The threshold and set of owners can never be changed once the program is deployed to an account.

  • The program is not susceptible to replay attacks. Once a transaction’s call payload is executed successfully on-chain, the transaction, or a part thereof, can never be used to have any further effect on the Ethereum state.

  • If the transaction does not have at least a threshold number of owner signatures that each sign the transaction’s call payload, the transaction has no effect on the Ethereum state.

  • If the transaction does have at least a threshold number of owner signatures that each sign the transaction’s call payload, then the call payload is invoked exactly once intraprocedurally.

  • If the transaction’s call payload is invoked but does not execute successfully, the transaction has no effect on the Ethereum state.

  • The program is effectively callback free (Grossman et al., 2018) and thus has no re-entrancy vulnerabilities.

  • Ether is never locked in the program.

  • The program has no other functionality.

6.2. Modifications

I made two changes to the SimpleMultiSig implementation to make it easier to formally verify using KEVM and called this new version SimpleMultiSigT3 as shown in Listing 2 in the appendix.

6.2.1. Modification #1: Statically-sized Arrays

The SimpleMultiSig uses dynamically-sized array parameters in the execute function to pass the components of the signatures. In principle, KEVM can be used to reason about dynamic arrays; however, to automate this reasoning, KEVM would need to be extended with additional lemmas, which are difficult for a non-expert to write correctly. Thus, I modified the code by replacing these dynamically-sized array parameters with statically-sized arrays, which means that the program only works for a specific threshold. For example, Listing 2 in the appendix shows how the program would change when the threshold is statically set to 3.

Fortunately, this modification does not restrict the functionality of the SimpleMultiSig because the threshold is immutable anyway once the constructor initializes the contract account. However, it does mean that users would need to code-generate the program for the specific threshold that they needed. Incidentally, using statically-sized arrays also has the advantage of reducing gas costs by:

  • reducing transaction size since the ABI encoding of the static arrays is shorter,

  • reducing overhead of copying the array calldata into memory,

  • eliminating the two require checks on the lengths of the signature parameters since the lengths are statically-enforced,

  • eliminating memory reads on the signature array-length checks in signature validation loop, and

  • reducing the size of the bytecode

To simplify the presentation in this paper, I use the specific code-generation of the SimpleMultiSig, named SimpleMultiSigT3, where the threshold is statically fixed to three, but the discussion and the specification can be straightforwardly adapted to any threshold.

6.2.2. Modification #2: Solidity version

I also replaced the directive pragma solidity ^0.4.24 with pragma solidity 0.5.0 to restrict the scope of the verification to EVM bytecode generated by a Solidity 0.5.0 compiler. The Solidity 0.4.24 compiler is missing some code optimizations that the 0.5.0 version introduces. These optimizations generate significantly simpler bytecode which was easier for KEVM to reason about automatically as discussed in Section 5.6.

6.3. Partial Formal Specification of SimpleMultiSigT3

The partial formal specification written for SimpleMultiSigT3 is given in Listing 5 in the Appendix. The specification consists of seven individual K rules informally described below.

  • The executor-invalid rule checks that the execute function reverts if the executor argument is not 0 and is not equal to the msg.sender of the transaction.

  • The sigcheck-fail-revert-0 rule checks that the execute function reverts if the first signature is not valid for the EIP712 encoding of the multisig payload, which includes the gasLimit, destination, executor, value, and data arguments, and the nonce storage variable.

  • The sigcheck-fail-revert-1 rule is the same as the previous rule but for the second signature.

  • The sigcheck-fail-revert-2 rule is the same as the previous rule but for the third signature.

  • The ownercheck-fail-revert rule checks that the execute function reverts if any of the three addresses recovered from the signatures are not owners. And, the function reverts if the three addresses are not in a strictly increasing order. In other words, the first address must be numerically less than the second address, and the second address must be numerically less than the third address. This requirement ensures the addresses are unique.

  • The call-failure rule checks that the execute function reverts if the call to the destination argument returns 0, which indicates the call reverted, for example, due to running out of gas or other reasons.

  • The call-success rule checks that the execute function succeeds if 1) the executor argument is either 0 or is equal to msg.sender, 2) the three signatures are all valid and each recover to unique owners of the wallet, 3) the call invokes the destination exactly once and with the correct multisig payload, 4) the call returns 1, which indicates that it succeeded, and 5) no storage variables are modified after the call.
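The ownercheck-fail-revert rule above relies on the fact that a strictly increasing sequence cannot contain duplicates. A small Python sketch of that argument follows; the function name is hypothetical and addresses are modelled as plain integers.

```python
from itertools import product


def strictly_increasing(addresses) -> bool:
    """True if every address is numerically less than the one after it."""
    return all(a < b for a, b in zip(addresses, addresses[1:]))


assert strictly_increasing([1, 2, 3])
assert not strictly_increasing([1, 1, 3])   # a repeated signer is rejected
assert not strictly_increasing([2, 1, 3])   # distinct but unordered is also rejected

# Exhaustive check over a small domain: ordering implies uniqueness.
for triple in product(range(5), repeat=3):
    if strictly_increasing(list(triple)):
        assert len(set(triple)) == 3
```

The ordering requirement is stricter than uniqueness alone, as the third assertion shows, but it needs only one comparison per adjacent pair of signatures instead of comparing every pair.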

Note that these rules do not comprehensively cover all the correctness properties listed in Section 6.1. For example, they do not check the following aspects of the SimpleMultiSig, among others, which we leave for future work:

  • storage initialization by constructor

  • ether balance updates

  • re-entrancy

  • bounds on gas usage

  • dynamic array calldata

Furthermore, a full formal verification would also require informally, yet rigorously, arguing that the formal specification faithfully captures the correctness properties in Section 6.1.

The table below gives the results of running the seven K rules through the prover over the SimpleMultiSigT3 program using a c5.2xlarge AWS EC2 instance, which has 16GB of RAM and a 3.4 GHz Intel Xeon Platinum 8000 processor.

Rule                      Time (min)   Result
executor-invalid          2.2          proved true
sigcheck-fail-revert-0    7.1          proved true
sigcheck-fail-revert-1    11.2         proved true
sigcheck-fail-revert-2    18.9         proved true
ownercheck-fail-revert    44.2         proved true
call-failure              45.9         proved true
call-success              48.6         proved true
Table 1. Results of running the seven K rules through the prover over SimpleMultiSigT3.

7. Testing Specifications

The output of the prover is a simple "yes" or "no": was it, or was it not, able to prove the specification? This meant that I had to trust that the prover itself did not have bugs. However, as is usually the case with formal verification provers, its implementation is large, extremely complex, and depends on sophisticated mathematics and algorithms which could easily have been coded incorrectly (Section 3.5). Because the stakes of a bug in the SimpleMultiSig are so high, I had to find a way to at least spot check the prover’s work.

Similarly, I also had to trust that the formal specification I wrote itself did not have any bugs, which is very possible considering how long it is and how unusual specification languages are to a programmer.

To try to mitigate both of these trust issues, I created thirty-two faulty variations of the SimpleMultiSigT3 program. In the software testing literature, these faulty variations are called mutants (Jia and Harman, 2011) and are constructed by taking the original program and changing it slightly to add a bug. Traditionally, mutants have been used to evaluate the effectiveness of manual test suites, but here I use the same idea to spot check the specification and prover. I ran the prover with the formal specification in Section 6.3 over each of the mutants and then checked to make sure that the prover failed to prove the specification on each mutant. The remainder of this section discusses a few examples of the different types of mutants used, and the full list of the test results can be found at

7.1. Example Test Case: Call Mutation

The call_7.sol mutant adds a bug to SimpleMultiSigT3 by adding a second call to the destination:

Table 2 below gives the results of running the prover on this mutant. This test case passes because at least one of the K rules fails, specifically, the call-failure and the call-success rules.

Rule                      Time (min)   Result
executor-invalid          2.0          proved true
sigcheck-fail-revert-0    6.6          proved true
sigcheck-fail-revert-1    11.3         proved true
sigcheck-fail-revert-2    19.2         proved true
ownercheck-fail-revert    42.5         proved true
call-failure              51.6         error
call-success              53.0         error
Table 2. Results of running the seven K rules through the prover over a mutant which invokes the destination address more than once.
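The pass criterion used for this test case can be written down in a few lines. The sketch below is an illustration only: the function mutant_detected and the dictionary layout are hypothetical, while the outcomes are copied from Table 2.

```python
PROVED = "proved true"


def mutant_detected(results: dict) -> bool:
    """A mutant is detected if at least one rule is not proved.

    Any other outcome (an error or a timeout) counts as a failure to prove.
    """
    return any(outcome != PROVED for outcome in results.values())


# Outcomes for the call_7.sol mutant, copied from Table 2.
call_7 = {
    "executor-invalid": "proved true",
    "sigcheck-fail-revert-0": "proved true",
    "sigcheck-fail-revert-1": "proved true",
    "sigcheck-fail-revert-2": "proved true",
    "ownercheck-fail-revert": "proved true",
    "call-failure": "error",
    "call-success": "error",
}

failing = sorted(rule for rule, outcome in call_7.items() if outcome != PROVED)

assert mutant_detected(call_7)
assert failing == ["call-failure", "call-success"]
```

A mutant for which every rule is still proved would be the interesting case, since it would point to either a gap in the specification or a bug in the prover.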

7.2. Example Test Case: EIP712 Encoding Mutation

The eip712_0.sol mutant adds a bug to the EIP712 encoding of the multisig payload by changing the first argument of the abi.encoding function invocation to 0 instead of TXHASH_TYPE

Table 3 below gives the results of running the K prover on this mutant. All except one of the specifications time out, which means that the prover process failed to exit within the three-hour time limit. I chose three hours because, for the correct SimpleMultiSigT3.sol program, the prover exits in under an hour for each of the specifications. I make the assumption that if the prover takes at least three times longer than it would on the correct program, then it will not be able to prove the specification on the incorrect program even if it was run for longer than three hours.

Rule                      Time (min)   Result
executor-invalid          2.2          proved true
sigcheck-fail-revert-0    180          timeout
sigcheck-fail-revert-1    180          timeout
sigcheck-fail-revert-2    180          timeout
ownercheck-fail-revert    180          timeout
call-failure              180          timeout
call-success              180          timeout
Table 3. Results of running the seven K rules through the prover on a mutant which incorrectly passes the value 0 to the EIP712 encoding rather than TXHASH_TYPE.

7.3. Example Test Case: Signature Checking Mutation

The sigcheck_5.sol mutant adds a bug to SimpleMultiSigT3 by removing the require check which ensures the signatures are unique by enforcing that they are passed to the function in strictly increasing order of their recovered addresses:

Table 4 below gives the results of running the prover on this mutant. This test case passes because at least one of the K rules fails, specifically, all of them except for the executor-invalid rule.

Rule                      Time (min)   Result
executor-invalid          2.1          proved true
sigcheck-fail-revert-0    180.0        timeout
sigcheck-fail-revert-1    30.9         error
sigcheck-fail-revert-2    24.1         error
ownercheck-fail-revert    49.6         error
call-failure              46.9         error
call-success              50.0         error
Table 4. Results of running the seven K rules through the prover over a mutant which removes the uniqueness check on the signatures.
This work was funded by ConsenSys R&D. Thanks to Mario Alvarez, Joseph Chow, Robert Drost, Christian Lundkvist, and Valentin Wüstholz for their thoughtful feedback throughout the project. Thanks to Grigore Rosu for his support of this work, his commitment to open research and generously open-sourcing and liberally licensing the K Framework. Thanks to Denis Bogdanas, Dwight Guth, Everett Hildenbrandt, Daejun Park, and Yi Zhang for answering my technical questions about the K prover and devising the additional lemmas needed to push some of the proofs through.


  • M. Barnett, B. E. Chang, R. DeLine, B. Jacobs, and K. R. M. Leino (2005) Boogie: A modular reusable verifier for object-oriented programs. In Formal Methods for Components and Objects, 4th International Symposium, FMCO 2005, Amsterdam, The Netherlands, November 1-4, 2005, Revised Lectures, pp. 364–387. Cited by: §3.3.3.
  • K. Bhargavan, A. Delignat-Lavaud, C. Fournet, A. Gollamudi, G. Gonthier, N. Kobeissi, N. Kulatova, A. Rastogi, T. Sibut-Pinote, N. Swamy, and S. Zanella-Béguelin (2016) Formal verification of smart contracts: short paper. In Proceedings of the 2016 ACM Workshop on Programming Languages and Analysis for Security, PLAS ’16, New York, NY, USA, pp. 91–96. ISBN 978-1-4503-4574-3. Cited by: §3.3.3.
  • D. Bogdanas and G. Rosu (2015) K-Java: A complete semantics of Java. In Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2015, Mumbai, India, January 15-17, 2015, pp. 445–456. Cited by: §2.
  • DappHub (2018) K lab proof explorer. Cited by: §3.6.
  • S. Dasgupta, D. Park, T. Kasampalis, V. S. Adve, and G. Rosu (2019) A complete formal semantics of x86-64 user-level instruction set architecture. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2019, Phoenix, AZ, USA, June 22-26, 2019, pp. 1133–1148. Cited by: §2.
  • L. de Moura and N. Bjørner (2008) Z3: An efficient SMT solver. In Tools and Algorithms for the Construction and Analysis of Systems, C. R. Ramakrishnan and J. Rehof (Eds.), Berlin, Heidelberg, pp. 337–340. ISBN 978-3-540-78800-3. Cited by: item 5.
  • Ethereum Foundation (2019a) Solidity release page. Cited by: item 2.
  • Ethereum Foundation (2019b) Yul. Cited by: §3.3.3.
  • S. Grossman, I. Abraham, G. Golan-Gueta, Y. Michalevsky, N. Rinetzky, M. Sagiv, and Y. Zohar (2018) Online detection of effectively callback free objects with applications to smart contracts. PACMPL 2 (POPL), pp. 48:1–48:28. Cited by: 7th item.
  • Á. Hajdu and D. Jovanovic (2019) Solc-verify: A modular verifier for Solidity smart contracts. CoRR abs/1907.04262. Cited by: §3.3.3.
  • C. Hathhorn, C. Ellison, and G. Roşu (2015) Defining the undefinedness of C. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’15), pp. 336–345. Cited by: §2.
  • E. Hildenbrandt, M. Saxena, N. Rodrigues, X. Zhu, P. Daian, D. Guth, B. M. Moore, D. Park, Y. Zhang, A. Stefanescu, and G. Rosu (2018) KEVM: A complete formal semantics of the Ethereum virtual machine. In 31st IEEE Computer Security Foundations Symposium, CSF 2018, Oxford, United Kingdom, July 9-12, 2018, pp. 204–217. Cited by: §2, §5.
  • Y. Jia and M. Harman (2011) An analysis and survey of the development of mutation testing. IEEE Trans. Software Eng. 37 (5), pp. 649–678. Cited by: §7.
  • S. K. Lahiri, S. Chen, Y. Wang, and I. Dillig (2018) Formal specification and verification of smart contracts for Azure Blockchain. CoRR abs/1812.08829. Cited by: §3.3.3.
  • C. Lattner and V. S. Adve (2004) LLVM: A compilation framework for lifelong program analysis & transformation. pp. 75–88. Cited by: §3.3.3.
  • C. Lundkvist (2015) SimpleMultiSig.sol. Cited by: Listing 1, §6.
  • C. Lundkvist (2017) Exploring simpler Ethereum multisig contracts. Cited by: §6.
  • G. C. Necula (1997) Proof-carrying code. In Proceedings of the 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’97, New York, NY, USA, pp. 106–119. ISBN 0-89791-853-3. Cited by: §3.5.
  • D. Park, A. Ştefănescu, and G. Roşu (2015) KJS: A complete formal semantics of JavaScript. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’15), pp. 346–356. Cited by: §2.
  • H. G. Rice (1953) Classes of recursively enumerable sets and their decision problems. Transactions of the American Mathematical Society 74, pp. 358–366. Cited by: §3.2.
  • G. Roşu and T. F. Şerbănuţă (2010) An overview of the K semantic framework. Journal of Logic and Algebraic Programming 79 (6), pp. 397–434. Cited by: item 1, §2.
  • G. Roşu (2017) Matching logic. Logical Methods in Computer Science 13 (4), pp. 1–61. Cited by: §4.
  • A. M. Turing (1937) On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society s2-42 (1), pp. 230–265. Cited by: §3.2.
  • Runtime Verification (2019) Formal verification report for the GnosisSafe. Cited by: §5.4.
  • G. Wood. Ethereum: A secure decentralised generalised transaction ledger, EIP-150 revision. Cited by: §2.