The goal of this paper is to help mainstream programmers routinely use formal verification on their smart contracts. It attempts to achieve this by:
demonstrating how a programmer can incrementally build up a formal specification for their program without advanced training and without needing to understand the internals of the formal verification prover (Section 5), and
showing how formal specifications can be tested by constructing mutant variations of the program (Section 7).
Section 2 gives an overview of the K Framework. Section 3 discusses the major usability challenges that state-of-the-art formal verification systems share and that have inhibited their mainstream adoption. Section 6 describes the partial formal specification written for SimpleMultiSig, a real Ethereum multisig smart contract.
2. K Framework
The K Framework (Roşu and Şerbănuţă, 2010) is a disruptive system in programming languages that seeks to enable the design and implementation of programming tools (such as compilers, virtual machines, deductive verifiers, and others) independently of a particular programming language. Its vision is that, once the syntax and operational semantics of a programming language are specified in K, the K Framework can automatically generate programming tools in a correct-by-construction manner. Complete K semantics have been developed for several real-world languages, the EVM (Hildenbrandt et al., 2018) among others. The K Framework itself and these individual language semantics are open-sourced under the UIUC License, which is permissive and free. One of the distinctive advantages of a K semantics for a programming language is that it is executable, in the sense that the K Framework can immediately generate an interpreter for the language directly from the K semantics. This interpreter can be used to execute any program written in that language, which enables one to actually write and run test cases against the semantics. In contrast, for example, the EVM Yellow Paper (Wood, 2014) is an English-language semantics that is not executable and has been found to be unclear, under-specified, and, in exceptional cases, inconsistent with actual EVM implementations (Hildenbrandt et al., 2018). The Ethereum Foundation has been considering adopting the K-EVM semantics as the official semantics for the EVM.
Even at this early stage of the area, many new programming languages have already been proposed, developed, or adopted to support decentralization. Some examples are Move, Motoko, Solidity, Vyper, Serpent, and WebAssembly. It is impractical and wasteful to repeatedly build a new formal verification system for each language. Furthermore, programming languages constantly evolve, and keeping a formal verification system up-to-date with the latest version of a language is expensive. The power of the K Framework is that formal verification systems can be developed independently of a particular programming language. Once the semantics of the language is formally defined in K, a K-based prover can immediately be used to formally verify properties of programs written in that language.
3. Usability Challenges
This section discusses some of the usability issues that state-of-the-art formal verification systems typically have.
3.1. User Interactions
A user typically interacts with a formal verification system in several different ways:
The user writes a formal specification of the program in a declarative, logic-based specification language that the prover must be designed to handle.
The user writes formal summaries that are attached to difficult-to-reason-about blocks of the program, such as a particular function or loop. A formal summary is typically written in the same logic language as the actual formal specification and either partially or fully encodes the block’s behavior or gives the prover hints about how to reason about the block.
The user reads the final output of the prover, which ideally is a formal proof that can be mechanically checked by a proof checker that is orders of magnitude simpler and smaller than the prover itself. Unfortunately, most systems simply output either "yes, the program adheres to the spec" or "no, the program does not adhere to the spec", which introduces trust issues discussed in Section 3.5.
The user reads the auxiliary output of the prover that may give the user some information about why the prover is not able to prove the spec within a reasonable time period. This auxiliary output typically requires deep understanding of the internals of the prover’s implementation as well as its underlying mathematical foundations.
3.2. Automation
The undecidability of the halting problem (Turing, 1937) has a direct impact on the usability of formal verification systems. Rice's theorem (Rice, 1953), an immediate consequence of the halting problem, states that all non-trivial semantic properties of programs are undecidable. Thus, it is impossible for a prover to be fully automatic across all programs for any interesting semantic property. This limitation implies that a user will inevitably need to interact with the prover in some manner beyond simply writing the formal specification, because the prover cannot, in general, be fully automatic and will need the user's help to push the proof through.
For example, provers require that the user supply a formal invariant for each dynamically-bound loop in a program, and these invariants are difficult for non-experts to write. For examples, see the encodepacked_keccak00 and encodepacked_keccak01 programs (§ 5.6), the ecrecoverloop01 program (§ 5.9), and the storage02 program (§ 5.12).
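Such an invariant can be illustrated with a small sketch: a dynamically-bound loop whose invariant is checked at runtime. A deductive prover needs the same fact supplied declaratively by the user; the example and its invariant are illustrative, not taken from the paper's programs.

```python
def sum_array(xs):
    """Sum a list with a dynamically-bound loop, checking the
    loop invariant  total == sum(xs[:i])  on every iteration."""
    total = 0
    for i in range(len(xs)):
        # Loop invariant: before iteration i, total equals the sum of
        # the first i elements. A deductive prover needs this fact
        # supplied by the user; here it is merely checked at runtime.
        assert total == sum(xs[:i])
        total += xs[i]
    assert total == sum(xs)  # postcondition follows from the invariant
    return total

assert sum_array([1, 2, 3]) == 6
```

The runtime checks only exercise concrete inputs; a prover uses the same invariant to reason about all inputs at once.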
3.3. Language Level and Compilation Stage
A practical formal verification system must make many design decisions in its implementation, but one of the most impactful is the choice of programming language and compilation stage at which the prover operates. Systems typically take one of the following approaches:
3.3.1. Higher Language Level / Early Compilation Stage
With this approach, the prover is designed to work directly on the program in its original form, in the source language in which it was written, before any compilation step and without any significant source code transformations. This approach has several disadvantages:
High-level languages such as Solidity or C++ are enormous, and it would require immense engineering resources to model every language construct. For example, the Solidity language includes inline assembly, which means that a prover operating at the Solidity level would need to be engineered to reason about every inline assembly instruction as well as the rest of the high-level Solidity language.
The prover would only work for the chosen high-level programming language and would need to be actively maintained and updated for every new version of the language, which is costly especially for a language like Solidity which is constantly being iterated on (Foundation, 2019a).
The prover's conclusion about the correctness of the program would only apply to the high-level program and not to the final binary produced by the compiler, which is what actually gets executed. The compiler could have a bug, for example in its optimization passes, which incorrectly changes the semantics of the program in its binary form.
On the other hand, a very compelling advantage of this approach, which perhaps compensates for the disadvantages listed above, is that, because the prover works at the language level at which the program was written, it is easier for the user to interact with the prover, make sense of its behavior, and understand its auxiliary output (Section 3.1). As explained in Section 3.2, the user will inevitably need to interact closely with a prover because of the theoretical limits of fully automated formal verification.
3.3.2. Lower Language Level / Late Compilation Stage
With this approach, the prover is designed to work after the last compilation stage, directly on the binary produced by the compiler, for example on optimized EVM bytecode or x86 instructions. This approach has several major advantages, which somewhat mirror the disadvantages of operating at a higher level:
Lower-level languages such as EVM bytecode and x86 reduce the richness of higher-level language constructs to a set of basic word-level operations. For example, in Solidity, the mapping data structure is compiled down to simple key-value pairs in an account's storage. This reduction drastically simplifies the engineering of the prover.
The prover’s conclusion about the correctness of the program applies to the program in its executable form, which means the user does not need to trust the compiler.
The prover can be applied to programs originally written in any high-level language that compiles down to the same bytecode or binary language. For example, a prover that works on EVM bytecode could be used for programs originally written in both Solidity and Vyper by compiling them down to EVM bytecode first.
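The mapping-to-storage reduction mentioned above can be sketched concretely. Solidity derives a mapping entry's slot as the keccak-256 hash of the key concatenated with the mapping's declared slot; since Python's standard library lacks keccak-256, sha256 stands in for it here, so the computed slot values are illustrative only.

```python
import hashlib

def mapping_slot(key: int, base_slot: int) -> int:
    """Storage slot for mapping[key] declared at base_slot.
    Solidity uses keccak256(pad32(key) ++ pad32(base_slot));
    sha256 is a stand-in, as the Python standard library
    does not ship keccak-256."""
    blob = key.to_bytes(32, "big") + base_slot.to_bytes(32, "big")
    return int.from_bytes(hashlib.sha256(blob).digest(), "big")

# An account's storage after compilation is just a flat word-to-word
# map; the rich mapping construct has disappeared.
storage = {}
storage[mapping_slot(0xABCD, 3)] = 42   # mapping at slot 3, key 0xABCD
assert storage[mapping_slot(0xABCD, 3)] == 42
```

A bytecode-level prover therefore only ever reasons about reads and writes to this flat map, never about the mapping abstraction itself.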
However, a critical disadvantage of this approach is that the user experience of interacting with a prover that operates at the bytecode level is prohibitively tedious and time-consuming. Some examples of this phenomenon are discussed in Section 5 in the context of proving properties of the SimpleMultiSig smart contract at the EVM bytecode level.
3.3.3. Intermediate Language Level / Mid Compilation Stage
A common approach is to design the prover to operate on the intermediate language of a standard compiler infrastructure, such as LLVM (Lattner and Adve, 2004) or Yul (Foundation, 2019b). An intermediate language reduces the higher-level language to a more manageable set of operations and data structures, but also retains some of the important high-level constructs that make formal verification easier, such as structured control flow, functions, and some type information. This approach is a compromise between operating the prover strictly at either a higher-level or a lower-level language, and it tempers both the advantages and the disadvantages of each. For example:
LLVM bitcode is much more readable than x86 assembly but still much less readable than C++. Readability is important because it enables the user to interact with the prover more effectively.
Many higher-level languages compile down to LLVM bitcode which enables the prover to work on programs originally written in any of those languages, but now the user has to trust the portion of the compiler infrastructure that generates machine code from LLVM bitcode because the prover only operates on the LLVM bitcode.
One notable instance of this approach is to create a new, specialized intermediate programming language specifically designed for formal verification instead of machine-code generation. For example, the solc-verify (Hajdu and Jovanovic, 2019) and VeriSol (Lahiri et al., 2018) verification tools translate Solidity programs to Boogie (Barnett et al., 2005), a verification intermediate language. Similarly, in (Bhargavan et al., 2016), Solidity programs are translated to F*.
3.4. Formalization of Correctness Properties
The first step to formally verifying a program is informally, yet rigorously, describing what the intended correct behavior of the program is, and then translating this informal description into a formal specification that a mechanized prover can understand. This translation step is critical because the proof generated by the prover assumes that the formal specification is correct in the sense that it faithfully captures the intended correctness properties of the program.
3.4.1. Formal specifications need to be simple for humans to read and understand
The premise of formal verification is that a formal specification is much easier to understand, and to be convinced is correct, than the program itself. It is unclear how useful formal verification is when the formal specification is longer or more difficult to read than the program.
3.4.2. A partial versus a full formal specification
Users typically do not write formal specifications for all correctness properties of a program because writing and proving formal specifications is time-consuming and difficult. Instead, in practice, users consider the cost and benefit of formally specifying a property. See Section 5.4 for an example of making this practical trade-off.
3.5. Provers are not trustworthy
A prover often consists of hundreds of thousands of lines of highly complex code, so bugs in the prover are inevitable and could cause it to incorrectly claim that a program follows its specification when it does not. A technique called proof-carrying code (Necula, 1997), which aims to address this challenge, has been extensively researched in academia but has not been used in practice.
3.6. Investigating Why the Prover Fails
The prover may fail to prove a specification for several reasons:
The specification has a bug.
The program has a bug.
The prover has a bug.
The language semantics has a bug.
The underlying constraint solver (e.g., Z3 (de Moura and Bjørner, 2008)) has a bug.
An incorrect lemma was supplied.
The prover is not powerful enough to reason automatically about the program and specification.
Trying to figure out why the prover fails is tedious, difficult, and time-consuming because the user typically needs to understand the internals of the prover's implementation as well as its method of deductive reasoning. Conceptually, a prover works by carefully exploring the entire, exponentially-sized state space of the program under all possible inputs. Some tools such as KLab (DappHub, 2018) have been developed to help users navigate this state space, similar to how a debugger can be used to step through the state changes of a program under a single input.
3.7. EVM Bytecode
Several design aspects of EVM bytecode make formal verification tedious, difficult, and time-consuming. For example, the EVM does not have functions, which means that a user needs to specify the actual program counter of the EVM bytecode location that a specification applies to. If the user changes the Solidity program and recompiles it to EVM bytecode, the user must manually update the program counter in the specification. This manual process is error-prone: if the user gets it wrong, the specification will be incorrect, and a buggy program may pass through undetected.
4. K-YAML
The K specification language is based on matching logic (Roşu, 2017), which makes it powerful, expressive, and flexible, but its syntax can be difficult for new users to read. K's syntax was originally designed to make it easier for formal methods experts to specify complete executable semantics for entire programming languages, and thus one of its design priorities was the succinctness of writing individual rules. However, for programmers without experience in formal methods who only want to use K to write correctness specifications for their programs, the syntax can initially seem prohibitively esoteric.
This paper proposes a simple YAML-based format for structuring K specifications called K-YAML that enables users to write easier-to-read specifications for formally verifying programs. YAML was chosen because it is a widely-used, human-readable, system configuration file format that programmers are comfortable with. The key design decision of K-YAML is to be purely syntactic sugar for K and to not abstract away any part of it in order to retain its power and expressivity.
A K-YAML specification is a list of spec blocks, where each block has the following structure:
The name key designates a name for this spec block that can be referenced elsewhere in the specification.
The inherits key references another spec block to indicate that this one inherits from it (Section 5.5).
The if key is the precondition, which defines the set of initial program states of interest using two components: match and where. Conceptually, the match component is similar to Rust's match expression, which accepts patterns over terms and variables to describe the structure of data and then matches and binds a value against that structure. Here, match is a dictionary whose keys are K configuration cells and whose values are K terms over symbolic variables. The where component is a list of conjuncts over K predicates and symbolic variables that have possibly been bound by the match component. A program state is in the precondition if it matches the match dictionary and satisfies the where conjuncts.
The then key is the postcondition, which specifies what the final program states should be in terms of the initial program states defined by the precondition. It uses the same match and where mechanism to specify this set of final program states.
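To make the shape concrete, the sketch below transcribes a hypothetical spec block into a Python dict mirroring the key structure just described. The block name, cell names, and predicates are illustrative stand-ins, not taken from a real specification.

```python
# A hypothetical K-YAML spec block, transcribed as a Python dict to
# show the name / inherits / if / then structure. Cell names and
# predicates are illustrative placeholders.
spec_block = {
    "name": "a0gt0",
    "inherits": "base",
    "if": {
        # match: K configuration cells mapped to K terms over
        # symbolic variables (A0 is a fresh symbolic variable).
        "match": {"callData": '#abiCallData("execute", #uint256(A0))'},
        # where: conjuncts over K predicates and bound variables.
        "where": ["#rangeUInt(256, A0)", "A0 >Int 0"],
    },
    "then": {
        # Postcondition, expressed with the same match/where mechanism.
        "match": {"statusCode": "EVMC_SUCCESS"},
        "where": [],
    },
}
assert set(spec_block) == {"name", "inherits", "if", "then"}
```

In actual K-YAML the same structure would be written as YAML; the dict form is used here only to make the nesting explicit.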
Section 5 gives many examples of small programs and the corresponding correctness specifications in K-YAML.
5. Verification Walkthrough
This section details some of my experience using the K prover to verify a few properties of the SimpleMultiSig smart contract. I was not involved in the K prover's design and implementation, and thus I needed to find a way to use it effectively without understanding its internals. My approach was to start by formally verifying simple programs and then incrementally add new functionality until I reconstructed SimpleMultiSig. This approach enabled me to use the prover as a black box as much as possible, minimizing the need to understand its internals. However, I did need a thorough understanding of the K-EVM semantics (Hildenbrandt et al., 2018) to make progress.
5.1. Starting Point
As a starting point, I used the simple program below, which has a single function named execute that returns the value 5:
Then I wrote the specification below to try to prove that each transaction to the execute function always returns the value 5 successfully. More technically stated, the specification ensures that when the calldata of a transaction matches the ABI encoding of execute's function selector, the value returned is the ABI encoding of the constant 5 as a 32-byte uint256, and the status code of the transaction is EVMC_SUCCESS.
Then I compiled the program to EVM bytecode using the Solidity compiler and invoked the K prover on the bytecode and the specification. After a few seconds, the prover returned True, indicating that it was able to formally prove that the program passes the specification for all possible transactions, in the context of all possible blockchain states.
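For reference, the expected output in such a specification is just the 32-byte big-endian (ABI) encoding of the constant 5, which can be computed directly:

```python
# ABI-encode the constant 5 as a 32-byte uint256: 31 zero bytes
# followed by 0x05. This is the value the output cell must match.
expected_output = (5).to_bytes(32, "big")

assert len(expected_output) == 32
assert int.from_bytes(expected_output, "big") == 5
```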
5.2. Using function parameters
Next, I took the previous program and modified it by 1) adding a parameter a0 to the execute function and 2) returning a0 instead of the constant 5.
The specification had to be modified as follows:
The callData cell was changed by introducing a fresh symbolic variable A0 that is bound to the first, 32-byte argument of the calldata.
The output cell was changed to return A0, thus specifying that execute returns the value of its first, 32-byte argument.
A where constraint was added to constrain the value of the symbolic variable A0 to the uint256 domain, which includes all integers between 0 and 2^256 - 1.
5.3. Static Arrays
Next, I changed the program to accept a static array parameter and return its first element.
Because the KEVM semantics did not have an existing high-level construct for expressing static array calldata parameters, I needed to add it myself. Fortunately, the ABI encoding of statically-sized data structures is simple: they are straightforwardly flattened into a tuple of 32-byte elements. I defined a new K rule named #abiCallData2 that behaves exactly like the existing #abiCallData function except that it enables the user to specify the full function signature instead of only the function name:
The #abiCallData2 function enabled me to represent the static array parameter a as the tuple (A0, A1, A2), where A0, A1, and A2 are 32-byte, fresh symbolic variables:
This is a good example of the flexibility and extensibility of K compared to other verification systems: extensions can be incorporated without needing to make changes to the underlying prover. The new K rule above gave the abstraction needed to succinctly write the spec.
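The flattening described above is easy to sketch: a hypothetical encoder that lays out a uint256[3] as three consecutive 32-byte words, modeling the tuple (A0, A1, A2). The helper is illustrative, not part of KEVM.

```python
def encode_static_uint256_array(values):
    """ABI-encode a statically-sized uint256[n]: each element is
    flattened in place into a 32-byte big-endian word, with no
    offset or length prefix (unlike dynamic arrays)."""
    assert all(0 <= v < 2**256 for v in values)
    return b"".join(v.to_bytes(32, "big") for v in values)

blob = encode_static_uint256_array([7, 8, 9])   # models (A0, A1, A2)
assert len(blob) == 3 * 32
assert int.from_bytes(blob[32:64], "big") == 8  # second element in place
```

Because the layout is a fixed-width concatenation, each symbolic variable binds a word at a statically known calldata position.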
5.4. Dynamic byte arrays
This next program accepts a dynamically-sized, bytes-typed parameter named data and returns its length:
The specification follows the same pattern as the previous specifications but uses the KEVM #buf construct to bind dynamically-sized calldata: the #buf(DATA_LEN, DATA) expression is a symbolic byte buffer that introduces two fresh symbolic variables, where DATA_LEN binds data's length and DATA represents its contents.
A where constraint in the specification bounds the length DATA_LEN. Some bound is needed because the EVM imposes an upper limit on the length of a bytes array, since it must fit into memory, which is 32-byte addressable. However, this natural upper bound is not tight enough to make the specification pass, because the EVM needs to store other data in memory besides the bytes array. I could not figure out a quick way to calculate the tightest possible upper bound on DATA_LEN, so I did what was done for other K verification projects such as GnosisSafe (Verification, 2019), which was to arbitrarily constrain DATA_LEN to a smaller fixed bound.
Thus, technically speaking, the correctness guarantees given by the formal proof do not apply to transactions whose data parameter is longer than that bound. Practically speaking, it is unclear whether it would be worth the effort to figure out the tightest upper bound. This situation is an example of the common trade-off between the precision and cost of partial versus full formal verification, as discussed in Section 3.4.2.
This is a good example of three usability challenges: automation (Section 3.2), low-level specifications (Section 3.3.3), and prover debugging (Section 3.6). The user needs to understand the EVM at a low level to write a passing specification, and the tight bound needed to make the specification pass is hard for a user to calculate. Furthermore, a user's first attempt at writing this specification would likely constrain DATA_LEN too loosely, and when the prover fails to prove it, the reason is very non-obvious; the output of the K prover does not help because it only gives the set of all intermediate proof states. If the prover were able to generate a counter-example, that is, a concrete transaction demonstrating the program violating the specification, the user might have a better chance of figuring out why the specification fails.
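The calldata shape that #buf binds can be sketched concretely. The helper below is a hypothetical encoder, not part of KEVM: it ABI-encodes a single dynamic bytes argument as a head offset, a 32-byte length word, and the contents zero-padded to a multiple of 32 bytes.

```python
def encode_dynamic_bytes(data: bytes, head_offset: int = 0x20) -> bytes:
    """ABI-encode one dynamic `bytes` argument: a 32-byte offset to
    the tail, a 32-byte length (what DATA_LEN binds), then the
    contents (what DATA binds) padded to a multiple of 32 bytes."""
    padded_len = -(-len(data) // 32) * 32   # round length up to 32
    return (head_offset.to_bytes(32, "big")
            + len(data).to_bytes(32, "big")
            + data.ljust(padded_len, b"\x00"))

enc = encode_dynamic_bytes(b"abc")
assert len(enc) == 96                        # offset + length + one word
assert int.from_bytes(enc[32:64], "big") == 3
assert enc[64:67] == b"abc"
```

Unlike the static-array case, the total calldata size depends on the length word, which is why the specification must constrain DATA_LEN symbolically.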
5.5. Transaction Reverts
This next program uses Solidity's require statement, which reverts the transaction when the a0 > 0 condition does not hold:
The specification now needs to cover more than one program path: a success path when a0 > 0 and a revert path when a0 <= 0. Thus it has two spec blocks, one for each path: a0gt0 specifies the a0 > 0 case, in which the status code of the execute function is EVMC_SUCCESS and the output is the constant 5; a0le0 specifies the a0 <= 0 case, in which the status code is EVMC_REVERT and the output is unspecified.
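The two paths can be modeled concretely. The sketch below is an illustrative stand-in for the compiled program, with a Python exception playing the role of EVMC_REVERT:

```python
class Revert(Exception):
    """Stand-in for an EVM transaction revert (EVMC_REVERT)."""

def execute(a0: int) -> int:
    # Models `require(a0 > 0); return 5;` from the Solidity program.
    if not a0 > 0:
        raise Revert
    return 5

# Success path (the a0gt0 case): returns 5 with EVMC_SUCCESS.
assert execute(1) == 5
# Revert path (the a0le0 case): the transaction reverts.
try:
    execute(0)
    assert False, "expected a revert"
except Revert:
    pass
```

Each spec block corresponds to one of these two disjoint sets of behaviors, so together they cover every transaction.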
The specification can be slightly rewritten so that the common parts of the two spec blocks are shared, instead of duplicated, in a parent spec block by using the inherits key. The rewritten specification introduces a third block named base that lifts out the common parts, and the a0gt0 and a0le0 blocks now inherit from the base block. While this refactoring may not make the specification more concise in this particular example, when specifying larger programs this feature is necessary to keep the specification readable, as can be seen in the final SimpleMultiSigT3 specification in Section 6.3.
5.6. Hashing packed encoding
This next program calculates and returns the keccak-256 cryptographic hash of a bytes array parameter prefixed with the single byte 0x1:
The specification uses the KEVM semantics function named keccak256, which abstracts the keccak-256 cryptographic hash function as an uninterpreted function, under the assumption that all hashed values appearing in each execution trace are collision-free:
This abstraction of keccak-256 is the typical approach taken by practical formal verification systems, where complex library functions are abstracted by writing a formal specification for their behavior called a formal summary, and then applying the formal summary at call sites invoking the library function instead of directly verifying the library function’s code. This method of abstraction enables the user to decompose the verification task into two steps: first formally verifying the program that uses the library function, and then separately verifying the implementation of the library function. See Section 5.8 for another example of how this formal summary technique is used for programs that verify signatures.
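The collision-freedom assumption behind this abstraction can be sketched as a small model: an uninterpreted function realized as a memo table that hands out a fresh value per distinct input, so equal outputs imply equal inputs within a trace. This is an illustrative model, not KEVM's implementation.

```python
import itertools

class UninterpretedHash:
    """Models keccak256 as an uninterpreted, collision-free function:
    each distinct input gets a fresh value, so within one execution
    trace equal outputs imply equal inputs (the collision-freedom
    assumption the abstraction relies on)."""
    def __init__(self):
        self._table = {}
        self._fresh = itertools.count()

    def __call__(self, data: bytes) -> int:
        if data not in self._table:
            self._table[data] = next(self._fresh)
        return self._table[data]

h = UninterpretedHash()
assert h(b"\x01" + b"payload") == h(b"\x01" + b"payload")  # deterministic
assert h(b"a") != h(b"b")                                  # collision-free
```

The prover reasons only with these two properties, never with the bit-level definition of keccak-256.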
Now, when I tried to run the prover, I found that it failed to prove the specification even though the specification is correct. It turned out that the actual EVM bytecode that this Solidity program compiles down to has a loop that is essentially an inlined "memcpy" for copying calldata to local memory. As discussed in Section 3.2, in general, provers cannot reason about loops with dynamic bounds automatically unless the user supplies additional information such as a loop invariant.
The figure below illustrates this memcpy loop in the control flow graph of the EVM bytecode of the encodepacked_keccak00 program.
I could have spent time writing a loop invariant, but fortunately, in this case, there is a simpler approach: updating to version 0.5.0 of the Solidity compiler instead of 0.4.24, because 0.5.0 has optimizations that optimize the loop away entirely. The following program, which has been changed to use pragma solidity 0.5.0, has an acyclic control flow graph, and the prover is able to verify it.
This program is a good example of several usability challenges that formal verification systems typically exhibit: automation, low-level specifications, and prover debugging. It would be difficult for a user to figure out that the compiled EVM bytecode has a hidden loop that does not appear in the original Solidity source program. If the prover were fully automated, this hidden loop would not be an issue because the prover would simply prove the specification without user intervention. But, because it cannot be fully automated due to undecidability, the user must supply the prover with a loop invariant in terms of the compiled EVM bytecode, which means the user has to decompile the bytecode, identify the exact program counters of the loop entry and exit, and make sense of the difficult-to-follow, stack-based push/pop/dup instructions of the EVM.
5.7. Statically-sized loops
This next program has a loop that iterates three times after an initial check:
Note that statically-bound loops, such as this one, are much easier for a prover to reason about automatically than a dynamically-bound loop. In principle, a prover can automatically and fully unroll a statically-sized loop into straight-line code. In contrast, dynamically-bound loops require that the user supply a loop-invariant.
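The unrolling argument can be made concrete with a small sketch: a loop with a static trip count and its fully unrolled straight-line equivalent agree on all inputs. The functions are illustrative, not the paper's program.

```python
def looped(x):
    # Statically-bound loop: the trip count is a compile-time constant.
    total = x
    for _ in range(3):
        total += 1
    return total

def unrolled(x):
    # The same computation after full unrolling into straight-line
    # code, which a prover can perform automatically when the bound
    # is static; no loop invariant is needed.
    total = x
    total += 1
    total += 1
    total += 1
    return total

assert all(looped(x) == unrolled(x) for x in range(10))
```

With a dynamic bound, no finite unrolling covers all inputs, which is why those loops need a user-supplied invariant instead.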
The specification below checks that, if the function parameter is an integer within [0, 10), the transaction returns the expected value with an EVMC_SUCCESS status code. Otherwise, it reverts with an EVMC_REVERT status code:
5.8. Recovering an address from a signature
This next program accepts a 32-byte message and the (V, R, S) components of an ECDSA signature. It uses the Solidity ecrecover function to verify the signature on the message and recover the signer address; otherwise, it reverts:
The specification below has three blocks: 1) the sigvalid block specifies that the behavior of the program on a transaction with a valid signature is to return the signer address, 2) the siginvalid block specifies that it should revert if the signature is not valid, and 3) the base block is a parent of the other two blocks that keeps the common components.
The KEVM semantics defines two functions, named #ecrec and #ecrecEmpty, that abstract the ecrecover precompile as an uninterpreted function, similar to how the keccak-256 cryptographic hash function is abstracted (Section 5.6).
5.9. Recovering a static array of signatures
This next program uses a static loop (Section 5.7) to iterate over a static array (Section 5.3) of signatures (Section 5.8) and reverts (Section 5.5) if any of them do not verify. Otherwise, the function returns successfully. This program uses four different constructs that were previously verified independently of each other in earlier sections, but now they are combined into a single program. This is a good example of how a user can incrementally build up the specification of their program without needing to understand the internals of how the prover works.
The specification has four blocks. The base block serves as the prefix of the three other blocks and specifies behavior related to function entry, such as calldata. The sigs-valid block specifies the success case where all signatures can be verified. The sig0-invalid and sig1-invalid blocks specify the cases where the first and second signatures, respectively, cannot be verified:
The prover was able to prove the specification above. But then I made a slight modification to the ecrecoverloop00 program by adding a new bytes-typed parameter named data, which is not used at all in the body of the Solidity function:
To clarify, the reason I added this dead parameter is that my goal is to eventually reconstruct the SimpleMultiSig program, which accepts such a bytes parameter in the same function that verifies signatures using a loop.
Unintuitively, simply adding this dead data parameter causes the prover to time out on the sig1-invalid specification when using a time limit of 30 minutes. In contrast, without this parameter, the prover returns successfully within three minutes. It turned out that, in order to fix this issue, I needed to add the additional, superfluous constraint not: #ecrecEmpty(ECREC_DATA0) (line 6) to the sig1-invalid block.
This additional constraint is superfluous in the logical sense because it does not change the specification's overall meaning: the disjunction of the two invalid-signature cases is unchanged, since A or B is logically equivalent to A or (not A and B). Conceptually, adding this conjunct helps the prover succeed because it directly captures the behavior of the for-loop. There may exist multiple invalid signatures, but the for-loop terminates when it reaches the very first invalid signature, so the specification needed to faithfully capture this behavior.
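The case split over invalid signatures rests on a standard propositional identity, which the short check below verifies exhaustively; here A and B are illustrative stand-ins for "the first signature fails to recover" and "the second signature fails to recover", not part of the K specification.

```python
# Exhaustively check the propositional identity behind the case split:
#   A or B    is equivalent to    A or (not A and B)
# i.e., restricting the second case to "first signature valid" does not
# change the union of behaviors covered by the two blocks.
for A in (False, True):
    for B in (False, True):
        assert (A or B) == (A or ((not A) and B))
```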
5.10. Reading from storage
This next program reads the storage variable n and returns its value:
The corresponding specification below shows how the KEVM semantics abstracts storage as an integer map, represented by the fresh symbolic variable S, and uses the KEVM select function to calculate the offset per Solidity's interpretation of EVM storage:
5.11. Writing to storage
This next program writes the constant 5 to the storage variable n and then returns its value:
The EVM gives a gas refund to the caller when a transaction writes the value 0 to a storage location, which enables the EVM to stop explicitly tracking that storage location on-chain. The rationale for this refund mechanism is to incentivize users to free up storage they are no longer using. For this specific program, however, there is never a refund because storage is always written with the value 5, not 0. Thus, the specification below has the refund cell set to 0, indicating that a caller never receives a refund under any transaction.
The prover should have been able to prove the specification above; however, it was not able to do so. Interestingly, the prover constructed a symbolic expression whose value was zero, but it was unable to simplify the expression to zero on its own. To fix this, I needed to add a new lemma that helped the prover see that the expression does in fact simplify to zero:
5.12. Writing overflow to storage
This next program increments the storage variable n, which will overflow when n is 2^256 - 1, the maximum uint256 value:
The specification needs to handle the non-overflow and overflow cases separately because the +Int operator used by the KEVM semantics is unbounded integer addition, rather than modular addition:
The prover was able to prove the
no-overflow specification automatically without any user intervention, but not the
overflow specification. The reason was that the
output cell of the final proof state had the term
which represents a 32-byte buffer whose value is one plus the zeroth storage slot under modular addition. The final proof state also had the constraint
which constrains the zeroth storage slot to . Ideally, the prover should have been able to simplify
chop(select(S0, 0) + 1) to the value 0 on its own, but I needed to add the following lemma which simply states that applying the
chop function to should simplify to 0:
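The chop reasoning above can be checked concretely. The sketch below models chop as reduction modulo 2^256 and +Int as unbounded integer addition, which is an assumed but faithful reading of the KEVM operators named in the text; the overflow case reproduces the lemma's claim on concrete values.

```python
# Concrete model of KEVM's `chop` (reduction modulo 2^256) and the
# case split the specification makes: `+Int` is unbounded integer
# addition, so the overflow case must apply `chop` explicitly.

WORD = 2 ** 256

def chop(x: int) -> int:
    """Reduce an unbounded integer to a 256-bit EVM word."""
    return x % WORD

def increment(n: int) -> int:
    """Model of `n + 1` as the EVM computes it."""
    return chop(n + 1)

# No-overflow case: chop is the identity on in-range values.
assert increment(41) == 42
# Overflow case: when n is the maximum word, the lemma's claim
# that chop(select(S0, 0) + 1) simplifies to 0 holds concretely.
assert increment(WORD - 1) == 0
```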
5.13. Calling an address
This next program invokes the
destination parameter with a call payload passed by the caller:
The specification checks the following correctness properties:
destination is invoked at least once if
destination is not invoked if
destination is invoked at most once if
destination is invoked with the correct value, gas limit, and return start/length
destination is invoked with the correct data start/length
execute reverts if the call to destination returns an error code
execute succeeds if the call to destination returns a success code
no storage reads or writes occur after the call to
To check these properties, I needed to extend the KEVM semantics by adding a new cell in the KEVM configuration named
callLog to keep track of a list of all the call invocations during transaction execution. Each element in the list is a tuple of callsite information, including the call index, program counter, gas limit, and the memory offsets and sizes of the parameters and return value. I also added two further cells, among them
writeLog, to keep track of all the storage reads and writes so the specification can check that none occur after a call instruction. The details are omitted here due to space limitations.
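The property these cells make checkable can be sketched on a concrete execution trace. The event encoding below is hypothetical (the text only names the cells, not their contents); the check is that no storage read or write follows any CALL instruction.

```python
# Sketch of the trace property enabled by the callLog/writeLog cells:
# given an execution trace of (kind, program_counter) events, where
# kind is "CALL", "SLOAD", or "SSTORE", verify that no storage access
# occurs after a call instruction.

def no_storage_access_after_call(trace):
    """Return True iff no SLOAD/SSTORE event follows a CALL event."""
    seen_call = False
    for kind, _pc in trace:
        if kind == "CALL":
            seen_call = True
        elif kind in ("SLOAD", "SSTORE") and seen_call:
            return False
    return True

assert no_storage_access_after_call([("SLOAD", 10), ("CALL", 50)])
assert not no_storage_access_after_call([("CALL", 50), ("SSTORE", 60)])
```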
6. SimpleMultiSig
Multisig wallets are compelling targets for formal verification because they control high-value assets. The
SimpleMultiSig smart contract (Lundqvist, 2015, 2017) is a minimal multisig wallet written in Solidity for the EVM. Its code is included in the appendix under Listing 1.
6.1. Correctness Properties
This section aims to give a complete, but informal, list of high-level correctness properties that a
SimpleMultiSig implementation should have.
If the transaction does not include a call payload, then the transaction is rejected and has no effect on the Ethereum state.
The threshold and set of owners can never be changed once the program is deployed to an account.
The program is not susceptible to replay attacks. Once a transaction’s call payload is executed successfully on-chain, the transaction, or a part thereof, can never be used to have any further effect on the Ethereum state.
If the transaction does not have at least a threshold number of owner signatures that each sign the transaction’s call payload, the transaction has no effect on the Ethereum state.
If the transaction does have at least a threshold number of owner signatures that each sign the transaction’s call payload, then the call payload is invoked exactly once intraprocedurally.
If the transaction’s call payload is invoked but does not execute successfully, the transaction has no effect on the Ethereum state.
The program is effectively callback free (Grossman et al., 2018) and thus has no re-entrancy vulnerabilities.
Ether is never locked in the program.
The program has no other functionality.
6.2. Modifications
I made two changes to the SimpleMultiSig implementation to make it easier to formally verify using KEVM and called this new version
SimpleMultiSigT3 as shown in Listing 2 in the appendix.
6.2.1. Modification #1: Statically-sized Arrays
SimpleMultiSig uses dynamically-sized array parameters in the
execute function to pass the components of the signatures. In principle, KEVM can be used to reason about dynamic arrays; however, automating this reasoning would require extending KEVM with additional lemmas, which are difficult for a non-expert to write correctly. Thus, I modified the code by replacing these dynamically-sized array parameters with statically-sized arrays, which means that the program only works for a specific threshold. For example, Listing 2 in the appendix shows how the program changes when the threshold is statically set to 3.
Fortunately, this modification does not restrict the functionality of the
SimpleMultiSig because the threshold is immutable anyway once the constructor initializes the contract account. However, it does mean that users would need to code-generate the program for the specific threshold that they needed. Incidentally, using statically-sized arrays also has the advantage of reducing gas costs by:
reducing transaction size since the ABI encoding of the static arrays is shorter,
reducing overhead of copying the array calldata into memory,
eliminating the two require checks on the lengths of the signature parameters since the lengths are statically enforced,
eliminating memory reads on the signature array-length checks in the signature validation loop, and
reducing the size of the bytecode.
To simplify the presentation in this paper, I use the specific code generation of SimpleMultiSig, named
SimpleMultiSigT3, where the threshold is statically fixed to three, but the discussion and the specification can be straightforwardly adapted to any threshold.
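The code-generation step can be sketched as a small template instantiation. The execute signature below paraphrases Listing 2 and is an assumption rather than the verified artifact; the point is that each threshold yields a distinct, statically-sized program.

```python
# Sketch of generating a SimpleMultiSig variant whose signature
# arrays are statically sized to a fixed threshold. The function
# signature template is an assumed paraphrase of Listing 2.

TEMPLATE = (
    "function execute(uint8[{t}] sigV, bytes32[{t}] sigR, "
    "bytes32[{t}] sigS, address destination, uint value, "
    "bytes data, address executor, uint gasLimit) public"
)

def generate_execute_signature(threshold: int) -> str:
    """Emit the statically-sized execute signature for one threshold."""
    return TEMPLATE.format(t=threshold)

# SimpleMultiSigT3 corresponds to threshold 3:
assert "uint8[3] sigV" in generate_execute_signature(3)
```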
6.2.2. Modification #2: Solidity version
I also replaced the directive
pragma solidity ^0.4.24 with
pragma solidity 0.5.0 to restrict the scope of the verification to EVM bytecode generated by a Solidity 0.5.0 compiler. The Solidity 0.4.24 compiler is missing some code optimizations that the 0.5.0 version introduces. These optimizations generate significantly simpler bytecode which was easier for KEVM to reason about automatically as discussed in Section 5.6.
6.3. Partial Formal Specification of SimpleMultiSigT3
The partial formal specification written for
SimpleMultiSigT3 is given in Listing 5 in the Appendix. The specification consists of seven individual K rules informally described below.
The executor-invalid rule checks that the execute function reverts if the executor argument is not 0 and is not equal to the msg.sender of the transaction.
The sigcheck-fail-revert-0 rule checks that the execute function reverts if the first signature is not valid for the EIP712 encoding of the multisig payload, which includes the data arguments.
The sigcheck-fail-revert-1 rule is the same as the previous rule but for the second signature.
The sigcheck-fail-revert-2 rule is the same as the previous rule but for the third signature.
The ownercheck-fail-revert rule checks that the execute function reverts if any of the three addresses recovered from the signatures is not an owner. The function also reverts if the three addresses are not in strictly increasing order; that is, the first address must be numerically less than the second, and the second must be numerically less than the third. This requirement ensures the addresses are unique.
The call-failure rule checks that the execute function reverts if the call to the destination argument returns 0, which indicates that the call reverted, for example, due to running out of gas or other reasons.
The call-success rule checks that the execute function succeeds if 1) the executor argument is either 0 or equal to msg.sender, 2) the three signatures are all valid and each recovers to a unique owner of the wallet, 3) the call invokes the destination exactly once and with the correct multisig payload, 4) the call returns 1, which indicates that it succeeded, and 5) no storage variables are modified after the call.
Note that these rules do not comprehensively cover all the correctness properties listed in Section 6.1. For example, they do not check the following aspects of the SimpleMultiSig, among others, which I leave for future work:
storage initialization by constructor
ether balance updates
bounds on gas usage
dynamic array calldata
Furthermore, a full formal verification would also require informally, yet rigorously, arguing that the formal specification faithfully captures the correctness properties in Section 6.1.
The table below gives the results of running the seven K rules through the prover over the
SimpleMultiSigT3 program using a c5.2xlarge AWS EC2 instance, which has 16GB of RAM and a 3.4 GHz Intel Xeon Platinum 8000 processor.
| Time | Result |
| 2.2 | proved true |
| 7.1 | proved true |
| 11.2 | proved true |
| 18.9 | proved true |
| 44.2 | proved true |
| 45.9 | proved true |
| 48.6 | proved true |
7. Testing Specifications
The output of the prover is a simple “yes” or “no”: was it, or was it not, able to prove the specification? This meant that I had to trust that the prover itself did not have bugs; as is usually the case with formal verification provers, its implementation is large, extremely complex, and depends on sophisticated mathematics and algorithms that could easily have been coded incorrectly (Section 3.5). Because the stakes of a bug in the
SimpleMultiSig are so high, I had to find a way to at least spot-check the prover’s work.
Similarly, I also had to trust that the formal specification I wrote did not itself have any bugs, which is very possible considering how long it is and how unfamiliar specification languages are to most programmers.
To try to mitigate both of these trust issues, I created thirty-two faulty variations of the
SimpleMultiSigT3 program. In the software testing literature, these faulty variations are called mutants (Jia and Harman, 2011) and are constructed by taking the original program and changing it slightly to add a bug. Traditionally, mutants have been used to evaluate the effectiveness of manual test suites, but here I use the same idea to spot check the specification and prover. I ran the prover with the formal specification in Section 6.3 over each of the mutants and then checked to make sure that the prover failed to prove the specification on each mutant. The remainder of this section discusses a few examples of the different types of mutants used, and the full list of the test results can be found at
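The mutant-based spot check can be sketched as a small harness: for each mutant, every K rule is run through the prover, and the test passes only if at least one rule fails to prove. The run_prover stub and its failure table below are hypothetical stand-ins for invoking the real K prover.

```python
# Sketch of the mutation-testing harness described above. A real
# harness would shell out to the K prover; here a stub records which
# (program, rule) pairs are assumed unprovable.

KNOWN_FAILURES = {("call_7.sol", "call-success"),
                  ("call_7.sol", "call-failure")}
RULES = ["executor-invalid", "call-failure", "call-success"]

def run_prover(program: str, rule: str) -> bool:
    """Stand-in: True iff the prover proves `rule` on `program`."""
    return (program, rule) not in KNOWN_FAILURES

def mutant_test_passes(mutant: str) -> bool:
    """A mutant test passes when some rule is *not* provable on it."""
    return not all(run_prover(mutant, rule) for rule in RULES)

assert mutant_test_passes("call_7.sol")                # bug detected
assert not mutant_test_passes("SimpleMultiSigT3.sol")  # original proves
```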
7.1. Example Test Case: Call Mutation
The call_7.sol mutant adds a bug to
SimpleMultiSigT3 by adding a second call to the
Table 2 below gives the results of running the prover on this mutant. This test case passes because at least one of the K rules fails, specifically, the
call-failure and the
| Time | Result |
| 2.0 | proved true |
| 6.6 | proved true |
| 11.3 | proved true |
| 19.2 | proved true |
| 42.5 | proved true |
7.2. Example Test Case: EIP712 Encoding Mutation
The eip712_0.sol mutant adds a bug to the EIP712 encoding of the multisig payload by changing the first argument of the
abi.encoding function invocation to 0 instead of
Table 3 below gives the results of running the K prover on this mutant. All except one of the specifications time out, meaning the prover process failed to exit within the three-hour time limit. I chose three hours because, for the correct
SimpleMultiSigT3.sol program, the prover exits in under an hour on each of the specifications. I assume that if the prover takes at least three times longer than it does on the correct program, then it will not be able to prove the specification on the incorrect program even if run for longer than three hours.
| Time | Result |
| 2.2 | proved true |
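The timeout policy described above can be sketched with an ordinary subprocess timeout. The prover command line is not specified in the text, so the cmd argument here is a placeholder; only the classification into proved/failed/timeout reflects the methodology.

```python
# Sketch of the three-hour timeout policy for prover runs.
import subprocess

TIMEOUT_SECONDS = 3 * 60 * 60  # 3x the worst-case time on the correct program

def prove_with_timeout(cmd, timeout=TIMEOUT_SECONDS):
    """Run a prover command; classify the outcome."""
    try:
        result = subprocess.run(cmd, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "timeout"          # treated as a failure to prove
    return "proved" if result.returncode == 0 else "failed"
```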
7.3. Example Test Case: Signature Checking Mutation
The sigcheck_5.sol mutant adds a bug to
SimpleMultiSigT3 by removing the
require check which ensures the signatures are unique by enforcing that they are passed to the function in strictly increasing order of their recovered addresses:
Table 4 below gives the results of running the prover on this mutant. This test case passes because at least one of the K rules fails; in fact, all of them fail except for the
| Time | Result |
| 2.1 | proved true |
Acknowledgements. This work was funded by ConsenSys R&D. Thanks to Mario Alvarez, Joseph Chow, Robert Drost, Christian Lundkvist, and Valentin Wüstholz for their thoughtful feedback throughout the project. Thanks to Grigore Rosu for his support of this work, his commitment to open research, and his generosity in open-sourcing and liberally licensing the K Framework. Thanks to Denis Bogdanas, Dwight Guth, Everett Hildenbrandt, Daejun Park, and Yi Zhang for answering my technical questions about the K prover and devising the additional lemmas needed to push some of the proofs through.
- Boogie: A modular reusable verifier for object-oriented programs. In Formal Methods for Components and Objects, 4th International Symposium, FMCO 2005, Amsterdam, The Netherlands, November 1-4, 2005, Revised Lectures, pp. 364–387. Cited by: §3.3.3.
- Formal verification of smart contracts: short paper. In Proceedings of the 2016 ACM Workshop on Programming Languages and Analysis for Security, PLAS ’16, New York, NY, USA, pp. 91–96. Cited by: §3.3.3.
- K-Java: A complete semantics of Java. In Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2015, Mumbai, India, January 15-17, 2015, pp. 445–456. Cited by: §2.
- K lab proof explorer. Cited by: §3.6.
- A complete formal semantics of x86-64 user-level instruction set architecture. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2019, Phoenix, AZ, USA, June 22-26, 2019, pp. 1133–1148. Cited by: §2.
- Z3: An efficient SMT solver. In Tools and Algorithms for the Construction and Analysis of Systems, C. R. Ramakrishnan and J. Rehof (Eds.), Berlin, Heidelberg, pp. 337–340. Cited by: item 5.
- Solidity release page. Cited by: item 2.
- Yul. Cited by: §3.3.3.
- Online detection of effectively callback free objects with applications to smart contracts. PACMPL 2 (POPL), pp. 48:1–48:28. Cited by: 7th item.
- Solc-verify: A modular verifier for Solidity smart contracts. CoRR abs/1907.04262. Cited by: §3.3.3.
- Defining the undefinedness of C. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’15), pp. 336–345. Cited by: §2.
- KEVM: A complete formal semantics of the Ethereum virtual machine. In 31st IEEE Computer Security Foundations Symposium, CSF 2018, Oxford, United Kingdom, July 9-12, 2018, pp. 204–217. Cited by: §2, §5.
- An analysis and survey of the development of mutation testing. IEEE Trans. Software Eng. 37 (5), pp. 649–678. Cited by: §7.
- Formal specification and verification of smart contracts for Azure blockchain. CoRR abs/1812.08829. Cited by: §3.3.3.
- LLVM: A compilation framework for lifelong program analysis & transformation. pp. 75–88. Cited by: §3.3.3.
- SimpleMultiSig.sol. Cited by: Listing 1, §6.
- Exploring simpler Ethereum multisig contracts. Cited by: §6.
- Proof-carrying code. In Proceedings of the 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’97, New York, NY, USA, pp. 106–119. Cited by: §3.5.
- Classes of recursively enumerable sets and their decision problems. Transactions of the American Mathematical Society 74, pp. 358–366. Cited by: §3.2.
- An overview of the K semantic framework. Journal of Logic and Algebraic Programming 79 (6), pp. 397–434. Cited by: item 1, §2.
- Matching logic. Logical Methods in Computer Science 13 (4), pp. 1–61. Cited by: §4.
- On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society s2-42 (1), pp. 230–265. Cited by: §3.2.
- Formal verification report for the GnosisSafe. Cited by: §5.4.
- Ethereum: A secure decentralised generalised transaction ledger, EIP-150 revision. Cited by: §2.