VeriSmart: A Highly Precise Safety Verifier for Ethereum Smart Contracts

08/29/2019 ∙ by Sunbeom So, et al. ∙ Korea University

We present VeriSmart, a highly precise verifier for ensuring arithmetic safety of Ethereum smart contracts. Writing safe smart contracts without unintended behavior is critically important because smart contracts are immutable and even a single flaw can cause huge financial damage. In particular, ensuring that arithmetic operations are safe is one of the most important and common security concerns of Ethereum smart contracts nowadays. In response, several safety analyzers have been proposed over the past few years, but the state of the art is still unsatisfactory: no existing tool achieves high precision and high recall at the same time, so existing tools either produce annoying false alarms or miss critical bugs. By contrast, VeriSmart aims to be an uncompromising analyzer that performs exhaustive verification without sacrificing precision or scalability, thereby greatly reducing the burden of manually checking undiscovered or incorrectly reported issues. To achieve this goal, we present a new domain-specific algorithm for verifying smart contracts, which automatically discovers and leverages the transaction invariants that are essential for precisely analyzing smart contracts. Evaluation with real-world smart contracts shows that VeriSmart can detect all arithmetic bugs with a negligible number of false alarms, far outperforming existing analyzers.


I Introduction

Safe smart contracts are indispensable for trustworthy blockchain ecosystems. Blockchain is widely recognized as one of the most disruptive technologies, and smart contracts lie at the heart of this revolution (e.g., [1, 2]). Smart contracts are computer programs that run on blockchains in order to automatically fulfill agreed obligations between untrusted parties without intermediaries. Unfortunately, despite their potential, smart contracts are more likely to be vulnerable than traditional programs because of their unique characteristics such as openness and immutability [3]. As a result, unsafe smart contracts are prevalent and are increasingly becoming a serious threat to the success of blockchain technology. For example, recent infamous attacks on the Ethereum blockchain such as the DAO [4] and the Parity Wallet [5] attacks were caused by unsafe smart contracts.

In this paper, we present VeriSmart, a fully automated safety analyzer for verifying Ethereum smart contracts with a particular focus on arithmetic safety. We focus on detecting arithmetic bugs such as integer over/underflows and division-by-zeros because smart contracts typically involve many arithmetic operations, which are a major source of security vulnerabilities nowadays. For example, arithmetic over/underflows account for 95.7% (487/509) of the CVEs assigned to Ethereum smart contracts, as shown in Table I. Even worse, arithmetic bugs, once exploited, are likely to cause significant and unexpected financial damage (e.g., the integer overflow in the SmartMesh contract [6] explained in Section II). Our goal is to detect all arithmetic bugs before smart contracts are deployed on the blockchain.

Arithmetic Over/underflow | Bad Randomness | Access Control | Unsafe Input Dependency | Others | Total
487 (95.7%)               | 10 (1.9%)      | 4 (0.8%)       | 4 (0.8%)                | 4 (0.8%) | 509
TABLE I: Statistics on CVE-reported security vulnerabilities of Ethereum smart contracts (as of May 31, 2019)

Unlike existing techniques, VeriSmart aims to be a truly practical tool by performing automatic, scalable, exhaustive, yet highly precise verification of smart contracts. Recent years have seen increased interest in automated tools for analyzing the arithmetic safety of smart contracts [7, 8, 9, 10, 11, 12]. However, existing tools are still unsatisfactory. A major weakness of bug-finding approaches (e.g., [7, 9, 8, 10]) is that they are likely to miss fatal bugs (i.e., they produce false negatives), because they do not consider all possible behaviors of the program. On the other hand, verification approaches (e.g., [11, 12]) are exhaustive and therefore miss no vulnerabilities, but they typically achieve this at the expense of precision (i.e., they produce false positives). In practice, both false negatives and false positives burden developers with an error-prone and time-consuming process of manually examining undiscovered issues or incorrectly reported alarms. VeriSmart aims to overcome these shortcomings of existing approaches by being exhaustive yet precise.

To achieve this goal, we present a new verification algorithm for smart contracts. The key feature of the algorithm, which departs significantly from existing analyzers for smart contracts [7, 8, 9, 10, 11, 12], is that it automatically discovers domain-specific invariants of smart contracts during the verification process. In particular, our algorithm automates the discovery of transaction invariants, which are distinctive properties of smart contracts that hold under arbitrary interleavings of transactions and make it possible to analyze smart contracts exhaustively without exploring all program paths separately. A technical challenge is to efficiently discover precise invariants in a huge search space. We propose an effective algorithm tailored to typical smart contracts, which iteratively generates and validates candidate invariants in a feedback loop akin to the CEGIS (counterexample-guided inductive synthesis) framework [13, 14, 15]. Our algorithm is general and can be used for analyzing a wide range of safety properties of smart contracts beyond arithmetic safety.

Experimental results show that our algorithm is much more effective than existing techniques for analyzing Ethereum smart contracts. We first evaluated the effectiveness of VeriSmart by comparing it with four state-of-the-art bug-finders: Osiris [7], Oyente [9], Mythril [8], and Manticore [10]. An in-depth study on 60 contracts with CVE-reported vulnerabilities shows that VeriSmart detects all known vulnerabilities with a negligible false positive rate (0.41%). By contrast, the existing bug-finders failed to detect a large fraction of the known vulnerabilities and had higher false positive rates. We also compared VeriSmart with two state-of-the-art verifiers, Zeus [11] and SMTChecker [12]. The results show that VeriSmart is significantly more precise than both, thanks to its ability to discover transaction invariants of smart contracts automatically.

Contributions

Our contributions are as follows:

  • We present a new verification algorithm for smart contracts (Section III). This is the first CEGIS-style algorithm that leverages transaction invariants automatically during the verification process.

  • We provide VeriSmart, a practical implementation of our algorithm that supports the full Solidity language, the de facto standard programming language for writing Ethereum smart contracts.

  • We provide an in-depth evaluation of VeriSmart in comparison with six analyzers [7, 9, 8, 10, 11, 12]. All experimental results are reproducible, as we make our tool and data publicly available (http://prl.korea.ac.kr/verismart).

II Motivating Examples

In this section, we illustrate central features of VeriSmart with examples. We use three real-world smart contracts to highlight key aspects of VeriSmart that differ from existing analyzers.

Example 1

Figure 1 shows a simplified function from the SmartMesh token contract (CVE-2018-10376). In April 2018, an attacker exploited a vulnerability in this function and succeeded in creating an extremely large amount of unauthorized tokens. This vulnerability, named proxyOverflow, was due to an unexpected integer overflow.

The transferProxy function is responsible for transferring a designated amount of tokens (value) from a source address (from) to a destination address (to) while paying transaction fees (fee) to the message sender (msg.sender). The core functionality is implemented at lines 8–10: the recipients’ balances (balance[to] and balance[msg.sender]) are increased at lines 8 and 9, and the sender’s balance (balance[from]) is decreased by the total amount of tokens sent (value + fee) at line 10.

 1 function transferProxy (address from, address to, uint value, uint fee) {
 2   if (balance[from] < fee + value) revert();
 3
 4   if (balance[to] + value < balance[to] ||
 5       balance[msg.sender] + fee < balance[msg.sender])
 6     revert();
 7
 8   balance[to] += value;
 9   balance[msg.sender] += fee;
10   balance[from] -= value + fee;
11 }
Fig. 1: A vulnerable function from SmartMesh (CVE-2018-10376).

Note that the developer is aware of the risks of integer over/underflows and has made an effort to avoid them. The conditional statement at line 2 checks whether the sender’s balance (balance[from]) is greater than or equal to the amount to be sent (fee+value), aiming to prevent integer underflow at line 10. The guard statements at lines 4 and 5 check that the recipients’ balances do not overflow after the transaction, intending to prevent integer overflows at lines 8 and 9, respectively.

However, the contract still has a loophole at line 2. The expression fee+value inside the conditional statement may overflow, which enables the token sender to send more tokens than (s)he has. Suppose all accounts initially have no balance, i.e., balance[from]=0, balance[to]=0, and balance[msg.sender]=0, and the function is invoked with the arguments value=0x8ff...ff and fee=0x700...01, where the 256-bit unsigned integers value and fee are written as 64-digit hexadecimal numbers (e.g., value consists of one 8 followed by 63 fs). Suppose further that the two address arguments are the same but differ from the sender’s address (i.e., from = to and from ≠ msg.sender). These crafted inputs make the sanity checks at lines 2–6 powerless: the three conditions at lines 2, 4, and 5 are all false, because fee+value wraps around to 0 (the exact sum 0x8ff...ff + 0x700...01 equals $2^{256}$), and adding value or fee to a zero balance does not overflow. Therefore, lines 8–10 are executed unexpectedly, creating a huge amount of tokens from nothing (i.e., balance[from] = balance[to] = 0x8ff...ff and balance[msg.sender] = 0x700...01).
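To make the wrap-around concrete, the following Python sketch replays Figure 1 with explicit modulo-2^256 arithmetic. It is an illustrative model, not the contract itself: the mapping is a dict whose missing keys read as 0, revert() is modeled as returning False, and the account names "alice" (used for both from and to) and "attacker" (msg.sender) are hypothetical.

```python
# Model of uint256 arithmetic: results wrap around modulo 2**256.
MASK = (1 << 256) - 1


def add(a: int, b: int) -> int:
    return (a + b) & MASK


def sub(a: int, b: int) -> int:
    return (a - b) & MASK


def transfer_proxy(balance: dict, sender: str, frm: str, to: str,
                   value: int, fee: int) -> bool:
    """Replays Fig. 1; returns False where the contract would revert()."""
    def bal(addr: str) -> int:
        return balance.get(addr, 0)               # unset mapping entries read as 0

    if bal(frm) < add(fee, value):                # line 2: fee + value can wrap
        return False
    if add(bal(to), value) < bal(to) or add(bal(sender), fee) < bal(sender):
        return False                              # lines 4-6
    balance[to] = add(bal(to), value)             # line 8
    balance[sender] = add(bal(sender), fee)       # line 9
    balance[frm] = sub(bal(frm), add(value, fee))  # line 10
    return True


value = int("8" + "f" * 63, 16)                   # 0x8ff...ff
fee = int("7" + "0" * 62 + "1", 16)               # 0x700...01
balance = {}                                      # every account starts with 0 tokens
ok = transfer_proxy(balance, "attacker", "alice", "alice", value, fee)
print(ok)                                         # True: all guards at lines 2-6 pass
print(add(fee, value))                            # 0
print(balance["alice"] == value, balance["attacker"] == fee)  # True True
```

Because from and to coincide, line 10 subtracts the wrapped total 0 from the balance that line 8 has just increased, so the tokens created at lines 8 and 9 are never debited from anyone.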

This accident could have been prevented by VeriSmart, as it pinpoints the vulnerability at line 2. Indeed, VeriSmart is an exhaustive verifier that aims to detect all arithmetic issues in smart contracts. By contrast, non-exhaustive bug-finders are likely to miss critical vulnerabilities. For example, among the existing bug-finders [7, 9, 8, 10], only Osiris [7] finds the vulnerability; Mythril [8] and Oyente [9] fail to detect the well-known proxyOverflow vulnerability.

1 function multipleTransfer(address[] to, uint value) {
2   require(value * to.length > 0);
3   require(balances[msg.sender] >= value * to.length);
4   balances[msg.sender] -= value * to.length;
5   for (uint i = 0; i < to.length; ++i) {
6     balances[to[i]] += value;
7   }
8 }
Fig. 2: A vulnerable function from Neo Genesis Token (CVE-2018-14006).

Example 2

Figure 2 shows the multipleTransfer function adapted from the Neo Genesis Token contract (CVE-2018-14006). The function has a vulnerability similar to the one in the first example. Line 3 in Figure 2 prevents underflow of the token sender’s balance but does not guard against overflow of the amount to be sent (value * to.length), which is analogous to the situation at line 2 of Figure 1. In a similar way, an attacker can therefore send huge amounts of tokens to arbitrary users while spending only a few tokens [16].
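A similar Python sketch for Figure 2 shows how the multiplication wraps. The crafted value 2^255 + 1 and the account names are our own illustrative choices, not data from the CVE report: with two recipients, value * to.length equals 2^256 + 2, which wraps to 2, so a sender holding only 2 tokens passes both require checks.

```python
MASK = (1 << 256) - 1  # uint256 wrap-around


def multiple_transfer(balances: dict, sender: str, to: list, value: int) -> bool:
    """Replays Fig. 2; returns False where a require() would fail."""
    total = (value * len(to)) & MASK                     # value * to.length may wrap
    if not total > 0:                                    # line 2
        return False
    if not balances.get(sender, 0) >= total:             # line 3
        return False
    balances[sender] = balances.get(sender, 0) - total   # line 4 (guarded by line 3)
    for r in to:                                         # lines 5-7
        balances[r] = (balances.get(r, 0) + value) & MASK
    return True


value = 2**255 + 1                  # hypothetical crafted input
balances = {"attacker": 2}
ok = multiple_transfer(balances, "attacker", ["bob", "carol"], value)
```

After the call, the attacker has spent 2 tokens while each recipient has been credited with 2^255 + 1 tokens.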

Despite the similarity between the vulnerabilities in Examples 1 and 2, bug-finders give no guarantee of consistently finding them. For example, Osiris, which succeeded in detecting the vulnerability in Example 1, fails to report the similar bug in Example 2. The other bug-finders are ineffective too: Mythril does not report any issues, and Oyente vaguely reports that the entire function body is vulnerable without pinpointing specific operations. On the other hand, VeriSmart reliably reports that the expression value * to.length at lines 2–4 may overflow.

One of the main reasons for the unstable results of bug-finders is that they rely heavily on a range of heuristics to avoid false positives (e.g., see [7]). Although heuristics are good at reducing false positives, the resulting analyzer is often very brittle: even small changes in programs may cause it to miss fatal vulnerabilities, as shown in Examples 1 and 2, which is particularly undesirable for safety-critical software like smart contracts.

Example 3

Figure 3 shows a simplified version of a contract called BTX. The program has two global state variables: balance stores the balance of each account address (line 2), and totalSupply is the total amount of supplied tokens (line 3). The constructor initializes totalSupply to 10000 tokens (line 6) and gives the same amount of tokens to the creator of the contract (line 7). The transfer function sends value tokens from the transaction message sender’s account to the recipient’s account (lines 12–13), provided that this does not underflow the message sender’s balance (line 11). The transferFrom function is similar to transfer except for the order in which the addition and the subtraction are performed.

The contract has four arithmetic operations at lines 12, 13, 18, and 19, all of which are free of integer over/underflows. However, it is nontrivial to see why they are all safe. In particular, the safety of the two addition operations at lines 13 and 18 is tricky, because there are no direct safety-checking statements in each function. To see why they do not overflow, we need to discover the following two transaction invariants that always hold no matter how the transactions (transfer and transferFrom) are interleaved:

  • the sum of all account balances is 10000, i.e.,

    $$\sum_{i} \mathit{balance}[i] = 10000 \qquad (1)$$
  • and computing $\sum_{i} \mathit{balance}[i]$ does not cause overflow.

By combining these two conditions with the preconditions expressed in the require statements at lines 11 and 17, we can conclude that, at lines 13 and 18, the maximum value of both balance[to] and value is 10000. Hence balance[to]+value is at most 20000, far below $2^{256} - 1$, so the expression does not overflow in 256-bit unsigned integer arithmetic.
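The following Python model of Figure 3 is a sketch of our own, not part of VeriSmart. It executes random interleavings of transfer and transferFrom and checks invariant (1) after every step, while each addition and subtraction asserts that it stays within the uint256 range. Random testing like this only samples interleavings and proves nothing, unlike VeriSmart, but it shows the invariant at work. The account names and the amount distribution are arbitrary assumptions.

```python
import random

UINT_MAX = 2**256 - 1


class BTX:
    """Executable model of Fig. 3 that fails loudly on any uint256 over/underflow."""

    def __init__(self, creator: str) -> None:
        self.total_supply = 10000                   # line 6
        self.balance = {creator: 10000}             # line 7

    def _get(self, addr: str) -> int:
        return self.balance.get(addr, 0)            # unset entries read as 0

    @staticmethod
    def _add(x: int, y: int) -> int:
        assert x + y <= UINT_MAX, "overflow"
        return x + y

    @staticmethod
    def _sub(x: int, y: int) -> int:
        assert x >= y, "underflow"
        return x - y

    def transfer(self, sender: str, to: str, value: int) -> None:
        if self._get(sender) < value:               # line 11: require fails
            return
        self.balance[sender] = self._sub(self._get(sender), value)  # line 12
        self.balance[to] = self._add(self._get(to), value)          # line 13

    def transfer_from(self, frm: str, to: str, value: int) -> None:
        if self._get(frm) < value:                  # line 17: require fails
            return
        self.balance[to] = self._add(self._get(to), value)          # line 18
        self.balance[frm] = self._sub(self._get(frm), value)        # line 19


def random_run(seed: int, steps: int = 5000) -> bool:
    """Applies random transactions and checks invariant (1) after each one."""
    rng = random.Random(seed)
    contract = BTX("a0")
    accounts = ["a0", "a1", "a2", "a3"]
    for _ in range(steps):
        x, y = rng.choice(accounts), rng.choice(accounts)
        # mostly plausible amounts, occasionally an arbitrary uint256 value
        v = rng.randrange(10001) if rng.random() < 0.9 else rng.randrange(UINT_MAX + 1)
        if rng.random() < 0.5:
            contract.transfer(x, y, v)
        else:
            contract.transfer_from(x, y, v)
        assert sum(contract.balance.values()) == contract.total_supply  # invariant (1)
    return True


print(random_run(seed=0))   # True
```

The overflow assertions never fire because, as argued above, every addition at lines 13 and 18 is bounded by 20000.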

Because reasoning about safety in this case is tricky, human auditors may wrongly conclude that the contract is unsafe. This is in fact what happened in a recent CVE report (CVE-2018-13326, https://nvd.nist.gov/vuln/detail/CVE-2018-13326), which incorrectly states that the two addition operations at lines 13 and 18 are vulnerable and may overflow. Unfortunately, existing safety analyzers do not help here. In particular, the verifiers Zeus [11] and SMTChecker [12] are not precise enough to keep track of implicit invariants such as (1) and therefore cannot prove safety at lines 13 and 18. The bug-finders Osiris and Oyente also produce false alarms. Mythril does not report any issues, but this does not mean that it proved the absence of vulnerabilities.

By contrast, VeriSmart proves that the contract is safe without any false alarms. Notably, it does so by automatically inferring the hidden invariants described above. To our knowledge, VeriSmart is the first tool that discovers global invariants of smart contracts and leverages them during the verification process in a fully automated way.

 1 contract BTX {
 2   mapping (address => uint) public balance;
 3   uint public totalSupply;
 4
 5   constructor () {
 6     totalSupply = 10000;
 7     balance[msg.sender] = 10000;
 8   }
 9
10   function transfer (address to, uint value) {
11     require (balance[msg.sender] >= value);
12     balance[msg.sender] -= value;
13     balance[to] += value; // Safe
14   }
15
16   function transferFrom (address from, address to, uint value) {
17     require (balance[from] >= value);
18     balance[to] += value; // Safe
19     balance[from] -= value;
20   }
21 }
Fig. 3: Example contract simplified from CVE-2018-13326.

III VeriSmart Algorithm

This section describes the verification algorithm of VeriSmart. We present the algorithm formally in a general setting, so that it can also be used for analyzing safety properties beyond our application to arithmetic safety.

Language

For brevity, we focus on a core subset of Solidity [17]; VeriSmart nonetheless supports the full Solidity language, and the extension is discussed in Section IV. Consider the following subset of Solidity:

$$
\begin{array}{rcl}
c & ::= & G^{*}\ F^{*} \\
F & ::= & f(x)\ \{\, s \,\} \\
s & ::= & a \mid s_1;\ s_2 \mid \mathtt{if}\ (b)\ \{s_1\}\ \{s_2\} \mid \mathtt{while}_{\ell}\ (b)\ \{s\} \\
a & ::= & x := e \mid x[e_1] := e_2 \mid \mathtt{assume}(b) \mid \mathtt{assert}(b)
\end{array}
$$

We assume a single contract $c$ is given, which consists of a sequence of global state variable declarations ($G^{*}$) and a sequence of function definitions ($F^{*}$), where $\mathit{GVar}$ and $\mathit{FName}$ denote the sets of global variables and functions in the contract, respectively. We assume a constructor function $f_0 \in \mathit{FName}$ exists in $c$. Each function is defined by a function name ($f \in \mathit{FName}$), an argument ($x$), and a body statement ($s$). A statement is an atomic statement ($a$), a sequence of statements, a conditional statement, or a while loop. An atomic statement is an assignment to a variable ($x := e$), an assignment to an array element ($x[e_1] := e_2$), an $\mathtt{assume}(b)$ statement, or an $\mathtt{assert}(b)$ statement. We model Solidity mapping variables as arrays. The statement $\mathtt{assume}(b)$ differs from $\mathtt{assert}(b)$: the former models Solidity's require statements and stops execution if the condition evaluates to false, whereas the latter does not affect program semantics. $e$ and $b$ stand for conventional arithmetic and boolean expressions, respectively, where arithmetic expressions produce 256-bit unsigned integers. Loops are annotated with labels ($\ell$), and the entry and exit of each function $f$ are annotated with the special labels $\ell_f^{e}$ and $\ell_f^{x}$, respectively. Let $\mathit{Label}$ be the set of all labels in the program. We assume every function has public (or external) visibility, so all functions in the contract can be called from outside.

Goal

Our goal is to develop an algorithm that proves or disproves every assertion (which we also call a query) in the contract. We assume that the safety properties to verify are expressed as assert statements in the program. In our application to arithmetic safety, assertions can be generated automatically; for example, for each addition a+b and multiplication a*b, we generate assert(a+b>=a) and assert(a==0||(a!=0 && (a*b)/a==b)), respectively.
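The query generation above can be sketched in Python. gen_query produces the assertion text for a given operator and operand names (the function names here are hypothetical), and the two *_holds functions evaluate the same conditions under wrapping uint256 semantics with truncating division, which is our assumption about how the generated assertions are interpreted.

```python
MASK = (1 << 256) - 1  # uint256 values wrap modulo 2**256


def gen_query(op: str, a: str, b: str) -> str:
    """Assertion text generated for a + b or a * b, as described above."""
    if op == "+":
        return f"assert({a}+{b}>={a})"
    if op == "*":
        return f"assert({a}==0||({a}!=0&&({a}*{b})/{a}=={b}))"
    raise ValueError(f"no query generated for {op!r} in this sketch")


def add_query_holds(a: int, b: int) -> bool:
    return ((a + b) & MASK) >= a                              # a+b>=a in uint256


def mul_query_holds(a: int, b: int) -> bool:
    return a == 0 or (a != 0 and ((a * b) & MASK) // a == b)  # '/' truncates
```

Each condition is true exactly when the mathematical result fits in 256 bits. A wrapped sum a + b - 2^256 is always smaller than a, and a wrapped product is smaller than a * b, so dividing it by a cannot give back b.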

Notation

We use the lambda notation for functions. For example, $\lambda x.\, e$ is the function that takes $x$ and returns $e$. We write FOL for the set of first-order formulas in the combined theory of fixed-size bitvectors, arrays with extensionality, and equality with uninterpreted functions. When $e$ is an expression or a formula, we write $e[e'/x]$ for the new expression in which $x$ is replaced by $e'$. We write $\mathit{FV}(e)$ for the set of free variables in $e$.

III-A Algorithm Overview

VeriSmart departs significantly from existing analyzers for smart contracts [7, 8, 9, 10, 11, 12, 18, 19, 20, 21] in that VeriSmart applies a CEGIS-style verification algorithm that iteratively searches for hidden invariants that are required for verifying safety properties.

Invariants of Smart Contracts

We consider two kinds of invariants for smart contracts: transaction and loop invariants. We say a formula is a transaction invariant if it is valid at the end of the constructor and its validity is preserved by the execution of every public function that can be invoked by a transaction. Loop invariants are more standard: a formula is an invariant of a loop if it is valid at the entry of the loop and is preserved by the loop body. A transaction invariant is global and thus a single formula, whereas loop invariants are local and must be given separately for each loop in the program. Thus, our algorithm aims to discover a pair $(\iota, \Lambda)$, where $\iota \in \mathrm{FOL}$ is a transaction invariant and $\Lambda : \mathit{Label} \to \mathrm{FOL}$ is a mapping from loop labels to formulas. We write $\Lambda_1 \sqcap \Lambda_2$ for the pointwise conjunction of two mappings $\Lambda_1$ and $\Lambda_2$, i.e., $\Lambda_1 \sqcap \Lambda_2 = \lambda \ell.\, \Lambda_1(\ell) \wedge \Lambda_2(\ell)$.

1 contract RunningExample {
2   uint public n;
3   constructor () { n = 1; }
4   function f () public {
5     assert (n + 1 >= n);
6     n = n + 1;
7     if (n >= 100) { n = 1; }
8   }
9 }
Fig. 4: Example contract.
Example 1

Consider the contract in Figure 4. The program has one global variable n, which is initialized to 1 in the constructor. The function f can be invoked from outside the contract; it increases the value of n by 1 every time it is called, but resets it to 1 whenever n reaches 100. Note that $n \leq 100$ is a transaction invariant: 1) it holds at the end of the constructor, and 2) supposing that $n \leq 100$ holds before entering f, we can prove that it also holds when exiting the function. Our algorithm automatically discovers this invariant and succeeds in proving that the assertion at line 5 is safe: upon entering f, $n \leq 100$ holds, and $n \leq 100 \rightarrow n + 1 \geq n$ is valid in the theory of 256-bit unsigned bitvector arithmetic.
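Taking n <= 100 as the transaction invariant for Figure 4, the argument above can be checked by brute force even with 256-bit arithmetic, because the invariant bounds n to just 101 states. The Python sketch below does exactly that. It is only an illustration: VeriSmart discharges such conditions with an SMT solver, not by enumeration.

```python
MASK = (1 << 256) - 1  # uint256 wrap-around


def inv(n: int) -> bool:
    """Candidate transaction invariant for Fig. 4."""
    return n <= 100


def f(n: int):
    """Body of f(): returns (assertion at line 5 holds, value of n on exit)."""
    ok = ((n + 1) & MASK) >= n      # line 5
    n = (n + 1) & MASK              # line 6
    if n >= 100:                    # line 7
        n = 1
    return ok, n


assert inv(1)                       # holds after the constructor sets n = 1
for n in range(101):                # every uint256 value that satisfies inv
    ok, post = f(n)
    assert ok and inv(post)         # preserved by f, and implies line 5
assert not f(MASK)[0]               # without inv, n = 2**256 - 1 breaks line 5
```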

Fig. 5: Algorithm overview.

Algorithm Structure

Figure 5 describes the overall structure of our algorithm. The input is a smart contract written in Solidity, and the output is a verification result that indicates whether each query (i.e., assertion) in the program is proven safe or not. The algorithm consists of two components, a validator and a generator, where the validator has a solver as a subcomponent.

The algorithm aims to find contract-specific invariants that are inductive and strong enough to prove all provable queries in the given contract. The role of the generator is to produce candidate invariants that help the validator to prove as many queries as possible. Given a candidate invariant, the validator checks whether the invariant is useful for proving the queries. If it fails to prove the queries, it provides the set of unproven queries as feedback to the generator. The generator uses this feedback to refine the current invariant and generate new ones. This way, the validator and generator form an iterative loop that continuously refines the analysis results until the program is proven to be safe or the given time budget is exhausted. Upon termination, all unproven queries are reported to users as potential safety violations.

Algorithm 1 shows our verification algorithm. It uses a workset ($W$) to maintain candidate invariants. The workset initially contains the trivial invariant (line 1): the transaction invariant is $\mathit{true}$, and the loop invariant mapping maps every label ($\ell \in \mathit{Label}$) to $\mathit{true}$. The repeat-until loop at lines 2–11 corresponds to the feedback loop in Figure 5. At lines 3 and 4, the algorithm chooses and removes a candidate invariant $(\iota, \Lambda)$ from the workset; we choose the candidate that is smallest in size. At line 5, we run the validator to check whether the current candidate is inductive and strong enough to prove the queries. The validator returns a pair of a boolean value $b$, indicating whether the current candidate invariant is inductive, and the set $U$ of unproven queries. If $U$ is empty (line 6), the algorithm terminates and the contract is proven safe. Otherwise (line 8), we generate a new set of candidate invariants and add them to the workset. Finally, when the current candidate fails to prove some queries but is known to be at least inductive (line 9), we strengthen the remaining candidate invariants with it (line 10), because stronger invariants can potentially prove more queries; this lets us find useful invariants more efficiently. The algorithm iterates until it times out or the workset becomes empty. We assume that the algorithm implicitly keeps track of previously generated invariants to avoid redundant trials.

Technical Contributions

Although the overall algorithm follows the general framework of CEGIS [13, 14, 15], we provide an effective, domain-specific instantiation of the framework in the context of smart contract analysis. Now we describe the details of this instantiation: validator (III-B), generator (III-C), and solver (III-D).

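Input: A smart contract to verify
Output: Verification success or potential safety violations
 1: $W \gets \{(\mathit{true}, \lambda \ell.\, \mathit{true})\}$
 2: repeat
 3:     Choose a candidate invariant $(\iota, \Lambda)$ from $W$
 4:     $W \gets W \setminus \{(\iota, \Lambda)\}$
 5:     $(b, U) \gets \mathit{Validator}(P_{(\iota, \Lambda)})$
 6:     if $U = \emptyset$ then verification succeeds
 7:     else
 8:         $W \gets W \cup \mathit{Generator}(U, \iota, \Lambda)$
 9:         if $b$ then
10:             $W \gets \{(\iota' \wedge \iota,\ \Lambda' \sqcap \Lambda) \mid (\iota', \Lambda') \in W\}$
11: until $W = \emptyset$ or timeout
12: return potential safety violations $U$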
Algorithm 1 Our Verification Algorithm
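Algorithm 1 can be sketched as a worklist loop in Python. To keep the sketch runnable without an SMT solver, we assume a toy setting of our own: the contract of Figure 4 with 8-bit instead of 256-bit integers; a validator that checks inductiveness and the query at line 5 by enumerating every state; and a generator that strengthens a candidate with one atom n <= c or n >= c for the contract constants c in {1, 100}. The feedback is simplified accordingly: this generator ignores the unproven set.

```python
from typing import FrozenSet, Optional, Set, Tuple

# Toy setting (assumption): Fig. 4 with a BITS-bit word so states can be enumerated.
BITS = 8
MOD = 1 << BITS
CONSTS = (1, 100)                  # constants occurring in the contract

Atom = Tuple[str, int]             # ("le", c) means n <= c, ("ge", c) means n >= c
Inv = FrozenSet[Atom]              # conjunction of atoms; frozenset() means "true"


def holds(inv: Inv, n: int) -> bool:
    return all(n <= c if op == "le" else n >= c for op, c in inv)


def run_f(n: int) -> Tuple[bool, int]:
    """Executes f(): returns (assertion at line 5 holds, new value of n)."""
    ok = (n + 1) % MOD >= n
    n = (n + 1) % MOD
    return ok, (1 if n >= 100 else n)


def validator(inv: Inv) -> Tuple[bool, Set[str]]:
    """Returns (inductive?, unproven); inductiveness is checked before safety."""
    pre = [n for n in range(MOD) if holds(inv, n)]
    inductive = holds(inv, 1) and all(holds(inv, run_f(n)[1]) for n in pre)
    if not inductive:
        return False, {"inductiveness"}
    return True, (set() if all(run_f(n)[0] for n in pre) else {"line 5"})


def generator(inv: Inv) -> Set[Inv]:
    """Strengthens the candidate with one more atom (a one-step refinement)."""
    atoms = {(op, c) for op in ("le", "ge") for c in CONSTS}
    return {inv | {a} for a in atoms - inv}


def verify(max_iters: int = 1000) -> Tuple[Optional[Inv], Set[str]]:
    workset: Set[Inv] = {frozenset()}            # line 1: start from "true"
    seen: Set[Inv] = set()                       # previously tried candidates
    unproven: Set[str] = {"line 5"}
    for _ in range(max_iters):                   # stands in for the time budget
        if not workset:
            break
        inv = min(workset, key=lambda i: (len(i), sorted(i)))  # smallest first
        workset.discard(inv)
        seen.add(inv)
        inductive, unproven = validator(inv)
        if not unproven:
            return inv, set()                    # every query proven
        workset |= {c for c in generator(inv) if c not in seen}
        if inductive:                            # strengthen remaining candidates
            workset = {w | inv for w in workset}
    return None, unproven


print(verify())   # (frozenset({('le', 100)}), set())
```

Running verify() tries candidates smallest first, rejects n >= 1, n >= 100, and n <= 1 as non-inductive, and returns n <= 100, the invariant discussed for Figure 4.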

III-B Validator

The goal of the validator is to check whether the current candidate invariant is inductive and strong enough to prove the safety of the queries. The input to the validator is an annotated program $P_{(\iota, \Lambda)}$, i.e., a smart contract annotated with transaction ($\iota$) and loop ($\Lambda$) invariants. The validator proceeds in three steps.

Basic Path Construction

Given an annotated program $P_{(\iota, \Lambda)}$, we first break the program down into a finite set of basic paths [22]. A basic path is a sequence of atomic statements that begins at the entry of a function or a loop and ends at the exit of a function or the entry of a loop, without passing through other loop entries. We represent a basic path by five components, $(\ell_s, \phi_s, a_1 \cdots a_n, \ell_t, \phi_t)$, where $\ell_s$ is the label of the starting point (i.e., a function or loop entry) of the path, $\phi_s$ is the invariant annotated at $\ell_s$, $a_1, \ldots, a_n$ are atomic statements, $\ell_t$ is the label of the end point (i.e., a function exit or loop entry) of the path, and $\phi_t$ is the invariant annotated at $\ell_t$. The basic path satisfies the following properties:

  1. If $\ell_s$ is a function entry, $\phi_s = \iota$ (i.e., the transaction invariant), with one exception: $\phi_s = \mathit{true}$ if $\ell_s$ is the entry of the constructor. If $\ell_t$ is a function exit, $\phi_t = \iota$.

  2. Otherwise, i.e., when $\ell_s$ and $\ell_t$ are labels of loops, $\phi_s = \Lambda(\ell_s)$ and $\phi_t = \Lambda(\ell_t)$ (i.e., the loop invariants are used).

Note that our construction of basic paths is exhaustive: we consider all paths of the program by summarizing the effects of transactions and loops with their invariants. The basic paths can be computed by traversing the control flow of the program.

Example 2

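Consider the contract in Figure 4 annotated with the transaction invariant $\iota = (n \leq 100)$. We do not consider loop invariants, as the contract does not have any loops. The annotated program is converted into three basic paths:

$$
\begin{aligned}
t_1 &= (\ell^{e}_{f_0},\ \mathit{true},\ n := 1,\ \ell^{x}_{f_0},\ \iota) \\
t_2 &= (\ell^{e}_{f},\ \iota,\ \mathtt{assert}(n + 1 \geq n);\ n := n + 1;\ \mathtt{assume}(n \geq 100);\ n := 1,\ \ell^{x}_{f},\ \iota) \\
t_3 &= (\ell^{e}_{f},\ \iota,\ \mathtt{assert}(n + 1 \geq n);\ n := n + 1;\ \mathtt{assume}(n < 100),\ \ell^{x}_{f},\ \iota)
\end{aligned}
$$

$t_1$ represents the basic path of the constructor $f_0$ (whose entry and exit labels are $\ell^{e}_{f_0}$ and $\ell^{x}_{f_0}$, respectively). $t_2$ and $t_3$ represent the basic paths of the function f that follow the true and false branches of the conditional statement at line 7, respectively. Note that conditional statements and loops do not appear, as they are broken into basic paths whose original conditions are given as assume statements.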

Generation of Verification Conditions

Let $T$ be the set of basic paths constructed from the annotated program. We next generate verification conditions (VCs) for each basic path.

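To derive the VCs, we must be able to express the effects of the program statements $a_1 \cdots a_n$. To do so, we define a strongest-postcondition predicate transformer $\mathit{sp}$ in the standard way for each atomic statement:

$$
\begin{aligned}
\mathit{sp}((\phi, \psi),\ x := e) &= (\exists x'.\ \phi[x'/x] \wedge x = e[x'/x],\ \psi) \\
\mathit{sp}((\phi, \psi),\ x[e_1] := e_2) &= (\exists x'.\ \phi[x'/x] \wedge x = x'\langle e_1[x'/x] \triangleleft e_2[x'/x] \rangle,\ \psi) \\
\mathit{sp}((\phi, \psi),\ \mathtt{assume}(b)) &= (\phi \wedge b,\ \psi) \\
\mathit{sp}((\phi, \psi),\ \mathtt{assert}(b)) &= (\phi,\ \psi \wedge (\phi \rightarrow b))
\end{aligned}
$$

where unprimed variables (e.g., $x$) and primed variables (e.g., $x'$) represent the current and previous program states, respectively. In each rule, $\phi$ is a precondition, and $\mathit{sp}$ transforms it into a postcondition while accumulating the safety conditions of assertions in $\psi$. We write $x'\langle e_1 \triangleleft e_2 \rangle$ for the modified array that stores the value of $e_2$ at position $e_1$. With $\mathit{sp}$, we define the procedure GenVC that generates the VC of a basic path:

$$
\mathit{GenVC}(\ell_s, \phi_s, a_1 \cdots a_n, \ell_t, \phi_t) = (\phi \rightarrow \phi_t,\ \psi)
$$

where $(\phi, \psi) = \mathit{sp}(\cdots \mathit{sp}((\phi_s, \mathit{true}), a_1) \cdots, a_n)$. The generated VC consists of two parts: $\phi \rightarrow \phi_t$ is a formula for checking that the annotated invariants are inductive, and $\psi$ is a formula for checking the safety properties in assertions.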
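For intuition, the sp rules and GenVC can be executed directly when a formula is represented by the finite set of states it denotes. The Python sketch below assumes the Figure 4 contract with 8-bit arithmetic and the transaction invariant n <= 100; both are simplifications of our own, since VeriSmart works symbolically on FOL formulas. In this representation an assignment maps phi to its image, assume(b) filters it, and assert(b) records whether every state in phi satisfies b. The sketch then replays the three basic paths of Example 2.

```python
from typing import Callable, FrozenSet, List, Tuple

# Toy setting (assumption): the single global n of Fig. 4 in BITS-bit arithmetic,
# with each formula represented by the set of states (values of n) it denotes.
BITS = 8
MOD = 1 << BITS
ALL: FrozenSet[int] = frozenset(range(MOD))   # the formula "true"

Stmt = Tuple[str, Callable]   # ("assign", n -> e) | ("assume", pred) | ("assert", pred)


def sp(phi: FrozenSet[int], psi: List[bool], stmt: Stmt) -> Tuple[FrozenSet[int], List[bool]]:
    kind, fn = stmt
    if kind == "assign":                      # exists n'. phi[n'/n] /\ n = e[n'/n]
        return frozenset(fn(s) for s in phi), psi
    if kind == "assume":                      # phi /\ b
        return frozenset(s for s in phi if fn(s)), psi
    if kind == "assert":                      # state unchanged; is phi -> b valid?
        return phi, psi + [all(fn(s) for s in phi)]
    raise ValueError(kind)


def gen_vc(pre: FrozenSet[int], stmts: List[Stmt], post: FrozenSet[int]) -> Tuple[bool, List[bool]]:
    """Returns (validity of phi -> phi_t, validity of each safety clause)."""
    phi, psi = pre, []
    for stmt in stmts:
        phi, psi = sp(phi, psi, stmt)
    return phi <= post, psi


iota = frozenset(n for n in ALL if n <= 100)                  # n <= 100
prefix: List[Stmt] = [("assert", lambda n: (n + 1) % MOD >= n),   # line 5
                      ("assign", lambda n: (n + 1) % MOD)]        # line 6
t1 = (ALL, [("assign", lambda n: 1)], iota)                        # constructor
t2 = (iota, prefix + [("assume", lambda n: n >= 100), ("assign", lambda n: 1)], iota)
t3 = (iota, prefix + [("assume", lambda n: n < 100)], iota)

for t in (t1, t2, t3):
    print(gen_vc(*t))   # (True, []), then (True, [True]) twice
```

All three inductiveness checks and both safety clauses hold. With the trivial invariant true as precondition, the assertion at line 5 fails for n = 255, the 8-bit counterpart of 2^256 - 1.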

Example 3

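Consider the basic path $t_3$ in Example 2. The corresponding VC is the pair

$$\big((\exists n'.\ n' \leq 100 \wedge n = n' + 1) \wedge n < 100 \rightarrow n \leq 100,\ \ \mathit{true} \wedge (n \leq 100 \rightarrow n + 1 \geq n)\big),$$

both of which are valid in the bitvector theory.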

Collecting Unproven Paths

Finally, the validator returns a pair of a boolean value $b$ and the subset $U$ of basic paths whose VCs are invalid:

$$
(b, U) =
\begin{cases}
(\mathit{false},\ \{t \in T \mid \not\models \mathit{GenVC}(t).1\}) & \text{if } \exists t \in T.\ \not\models \mathit{GenVC}(t).1 \\
(\mathit{true},\ \{t \in T \mid \exists q \in \mathit{GenVC}(t).2.\ \not\models q\}) & \text{otherwise}
\end{cases}
$$

$\mathit{GenVC}(t).1$ and $\mathit{GenVC}(t).2$ denote the first (i.e., the VC on inductiveness) and the second (i.e., the VC on safety) component of $\mathit{GenVC}(t)$, respectively. We also write $q \in \mathit{GenVC}(t).2$ for a clause $q$ of $\mathit{GenVC}(t).2$, where $q$ corresponds to the safety condition of a single query. In the above procedure, we first check whether some VCs regarding inductiveness are invalid. If so (if-case), we set $b$ to false, and $U$ becomes the set of basic paths where inductiveness checking failed. Note that in this case we accelerate the verification procedure by excluding from $U$ the paths where only safety checking may fail. That is, we first focus on refining invariants until they are inductive and then strengthen them further to prove safety, rather than trying to achieve both at the same time. When the current candidate invariant is inductive (else-case), we set $b$ to true and collect the basic paths where some queries are not proven safe. To check the validity of the VCs, we use a domain-specific solver, which is explained in Section III-D.
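The case split above fits in a few lines of Python. The input format, a dict from hypothetical path names to the validity of each path's inductiveness VC and of its safety clauses, is an assumption of this sketch; in VeriSmart these validity results come from the solver.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical input: for every basic path, whether its inductiveness VC is
# valid and, for each query on the path, whether its safety clause is valid.
VCResults = Dict[str, Tuple[bool, List[bool]]]


def collect_unproven(results: VCResults) -> Tuple[bool, Set[str]]:
    not_inductive = {t for t, (ind, _) in results.items() if not ind}
    if not_inductive:
        # if-case: report only the inductiveness failures for now
        return False, not_inductive
    # else-case: the candidate is inductive; report paths with an unproven query
    return True, {t for t, (_, qs) in results.items() if not all(qs)}


print(collect_unproven({"t1": (True, []), "t2": (False, [True]), "t3": (True, [False])}))
# (False, {'t2'})
print(collect_unproven({"t1": (True, []), "t2": (True, [True]), "t3": (True, [False])}))
# (True, {'t3'})
```

In the first call, t3's failing query is deliberately left out of the feedback until the invariant becomes inductive.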

III-C Generator

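The generator takes the set $U$ as feedback and produces new candidate invariants by refining the current one, $(\iota, \Lambda)$. $\mathit{Generator}(U, \iota, \Lambda)$ returns the following set:

$$\{(\iota, \Lambda') \mid \Lambda' \in \mathit{Loop}(U, \Lambda)\} \cup \{(\iota', \Lambda) \mid \iota' \in \mathit{Tran}(U, \iota)\}$$

where Loop and Tran generate new loop and transaction invariants, respectively, based on the current ones. We define $\mathit{Loop}(U, \Lambda)$ so as to return the following set of refined loop invariants:

$$\{\Lambda[\ell \mapsto \psi] \mid (\ell_s, \phi_s, \vec{a}, \ell_t, \phi_t) \in U,\ \ell \in \{\ell_s, \ell_t\},\ \psi \in \mathit{RefineL}(\Lambda(\ell), \vec{a})\}$$

where we assume $\ell_s$ and $\ell_t$ are loop labels, and $\vec{a}$ is the sequence of atomic statements in the basic path. The definition of $\mathit{Tran}(U, \iota)$ is:

$$\{\psi \mid (\ell_s, \phi_s, \vec{a}, \ell_t, \phi_t) \in U,\ \psi \in \mathit{RefineT}(\iota, \vec{a})\}$$

where we assume $\ell_s$ is the label of a function entry or $\ell_t$ is the label of a function exit. In the definitions above, the procedures RefineL and RefineT are responsible for actually refining loop and transaction invariants, and they ultimately determine the effectiveness of the generator and of the overall verification algorithm.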

Domain-Specific Refinement

We define RefineL and RefineT in terms of a refinement relation. A refinement relation $\rightsquigarrow_{V,K}$ is a binary relation on logical formulas, parameterized by a variable set $V$ and a constant set $K$, which describes how a candidate invariant is refined in one step: $\psi$ can be refined to any of $\{\psi' \mid \psi \rightsquigarrow_{V,K} \psi'\}$. In our approach, choosing the right refinement relation is the key to cost-effective verification, since it defines the search space of candidate invariants. For example, simply choosing a very general or a very specific refinement relation would not be practical, because the search space would be too large or too limited, respectively. Instead, we carefully design a refinement relation tailored to real-world smart contracts to make our algorithm cost-effective.

Fortunately, we observed that smart contracts in practice share common properties, and we took the following points into account when designing the refinement relation. First, smart contracts often use loops in simple and restricted forms, e.g., for(i = 0; i < x; i++), and therefore it is sufficient to consider simple numerical invariants. In particular, we focus on a few simple linear forms over variables and integer constants and do not consider non-linear or compound invariants. Second, because smart contracts use the mapping datatype extensively (e.g., balance in token contracts), it is particularly important to capture their common properties (e.g., the sum of balance is equal to totalSupply). Currently, we support the function symbol $\mathit{sum}$ for variables of mapping type: for example, $\mathit{sum}(\mathit{balance})$ means the sum of all balances. Third, we consider only invariants that are quantifier-free conjunctive formulas; that is, we do not allow disjunctions or quantifiers in candidate invariants.

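Based on these observations, we define the refinement relation:

$$\psi \rightsquigarrow_{V,K} \psi \wedge p \qquad \text{for each } p \in A_{V,K}$$

where $A_{V,K}$ is the set of atomic predicates of the forms considered above, built from variables $x, y \in V$ and constants $n \in K$, together with equalities $\mathit{sum}(m) = x$ for mapping variables $m$. That is, the current invariant is strengthened with a linear, quantifier-free atomic predicate ($p$). Note that we use the symbol $\mathit{sum}$ only in equality predicates, as we found that invariants of other forms involving $\mathit{sum}$ are rarely needed in practice. Finally, we define RefineT and RefineL using $\rightsquigarrow$ as follows: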

where $\mathit{Var}(\vec{a})$ and $\mathit{Const}(\vec{a})$ are the variables and constants appearing in the atomic statements $\vec{a}$, respectively. globals and cnstr denote the set of global variables and the set of constants in the constructor function, respectively. We instantiate the sets $V$ and $K$ differently because transaction invariants often involve global state variables and constants of the entire contract, while loop invariants involve local and global variables and constants that appear in the enclosing function. In both cases, we reduce the search space by restricting local variables and constants to those of the current basic path ($\vec{a}$).
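A single refinement step can be sketched as follows. The atomic forms generated here (x >= n, x <= n, x >= y, and sum(m) == x) are an illustrative assumption, not necessarily the exact forms VeriSmart uses, and the function names are hypothetical. The point of the sketch is how restricting V and K bounds the number of candidates.

```python
from itertools import permutations
from typing import FrozenSet, Iterable, Set

Inv = FrozenSet[str]   # a conjunction of atomic predicates, kept as strings


def atoms(variables: Iterable[str], constants: Iterable[int],
          mappings: Iterable[str]) -> Set[str]:
    """Hypothetical A_{V,K}; the atomic forms below are illustrative only."""
    vs, ks = sorted(set(variables)), sorted(set(constants))
    out = {f"{x} >= {n}" for x in vs for n in ks}
    out |= {f"{x} <= {n}" for x in vs for n in ks}
    out |= {f"{x} >= {y}" for x, y in permutations(vs, 2)}
    out |= {f"sum({m}) == {x}" for m in mappings for x in vs}  # sum only in equalities
    return out


def refine(inv: Inv, variables: Iterable[str], constants: Iterable[int],
           mappings: Iterable[str]) -> Set[Inv]:
    """One refinement step: inv may become (inv AND p) for any new atom p."""
    return {inv | {p} for p in atoms(variables, constants, mappings) - inv}


# Hypothetical instantiation for a path of Fig. 3: V = {totalSupply},
# K = {10000}, and the mapping balance.
cands = refine(frozenset(), ["totalSupply"], [10000], ["balance"])
print(len(cands))   # 3
```

With this small V and K, there are only three one-step refinements of true. One of them, sum(balance) == totalSupply, relates the balances to the total supply in the way invariant (1) does.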

III-D Solver

The last component is the solver, which the validator uses to discharge verification conditions. The solver ultimately relies on an off-the-shelf SMT solver (we use Z3 [23]), but it first performs domain-specific preprocessing and optimization steps, which we found important for making our approach practical on real-world contracts. For a basic path , we assume its verification condition (either the inductiveness condition, i.e., , or the safety condition of a query, i.e., ) is given.

Preprocessing

Since may contain symbols (i.e., ) that conventional SMT solvers cannot understand, we must preprocess so that all such uninterpretable symbols are replaced by equisatisfiable formulas in conventional theories. For example, let contain as follows:

where we elide portions of that are irrelevant to the mapping variable (i.e., is only accessed with and in the given basic path ). Our idea to translate into a formula without is to instantiate the symbol with respect to the context where is evaluated. In this example, we can translate the formula into the following:

where asserts that the sum of distinct elements of equals . Because is used in the given basic path with two index variables and , we consider two cases: and . When , we replace by , where is a fresh variable denoting the sum of for all , where is the domain of the mapping. The other case () is handled similarly. is the additional assertion that guarantees the validity of : , where is a fresh propositional variable, meaning that the summations in do not overflow. The general method for our preprocessing is given in Appendix -A.
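The case split on the accessed indices can be checked on concrete data. The sketch below is a minimal Python illustration, assuming the path reads a mapping `m` at two indices `i` and `j` as in the example above; `split_sum` and `no_overflow` are hypothetical helpers. A dict stands in for the mapping, with absent keys reading as 0 as in Solidity, and `rest` plays the role of the fresh variable for the remaining elements. The actual preprocessing produces SMT formulas rather than evaluating sums.

```python
UINT256_MAX = 2**256 - 1

def split_sum(m, i, j):
    """Decompose sum(m) according to whether the two accessed indices coincide.

    Case i == j: sum(m) = m[i] + rest, rest summing all other keys.
    Case i != j: sum(m) = m[i] + m[j] + rest.
    Returns the case label and the list of summands.
    """
    if i == j:
        rest = sum(v for k, v in m.items() if k != i)
        return "i == j", [m.get(i, 0), rest]
    rest = sum(v for k, v in m.items() if k not in (i, j))
    return "i != j", [m.get(i, 0), m.get(j, 0), rest]

def no_overflow(terms):
    """Side condition: the summation itself must stay within uint256."""
    return sum(terms) <= UINT256_MAX

balances = {"alice": 5, "bob": 7, "carol": 1}
case, terms = split_sum(balances, "alice", "bob")   # the i != j case
```

In both cases the summands add up to the original total, which is the equisatisfiability the preprocessing relies on; the overflow side condition corresponds to the extra assertion mentioned in the text.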

Note that the verification condition after preprocessing can be checked by a conventional SMT solver. However, we found that the resulting formulas are often too complex for modern SMT solvers to handle efficiently, so we apply the following optimization techniques.

Efficient Invalidity Checking

Most importantly, we quickly decide invalidity of formulas without invoking SMT solvers. We observed that even state-of-the-art SMT solvers can be extremely inefficient when our verification conditions are invalid. For example, consider the following formula:

It is easy to see that the formula is invalid in the theory of 256-bit arithmetic (e.g., it does not hold when and ). Unfortunately, however, the latest versions of Z3 [23] (4.8.4) and CVC4 [24] (1.7) each take more than 3 minutes to conclude that the formula is invalid.

To mitigate this problem, we designed a simple decision procedure based on the free variables of formulas: given a VC of the form , we conclude that it is invalid if . The intuition is that, as a necessary condition for being stronger than , must include more variables than . In the above example, we conclude that the formula is invalid because . In practice, we found that this simple technique significantly improves the scalability of the verification algorithm, as it avoids expensive calls to SMT solvers.

Let us explain why our technique is correct. We first review the notion of interpretation in first-order logic [22]. An interpretation is a pair of a domain () and an assignment (). The domain is a nonempty set of values (or objects). The assignment maps variables, constants, functions, and predicate symbols to elements, functions, and predicates over . Let denote an -variant of such that accords with on everything except for . That is, and if , but and may be different. Then, we have the following result (see Appendix -B for proof).

Proposition 1

Let and be first-order formulas. Then, is invalid if the following three conditions hold:

  1. ,

  2. is satisfiable: , and

  3. has a nontrivial variable: there exists such that for any interpretation , if then for some .

Our technique is based on this result but checks only the first condition (i), which can be done syntactically and efficiently. We do not check conditions (ii) and (iii), as they require invoking SMT solvers in general. Therefore, our technique may classify valid VCs as invalid (i.e., produce false positives), but it never classifies an invalid VC as valid (i.e., it produces no false negatives). Because it causes no false negatives, the technique can be used in sound verifiers.

Although approximate, our technique rarely produces false positives in practice. For example, consider the valid formula . Our technique may incorrectly conclude that the formula is invalid, since , but we do not check condition (iii), which the formula violates. However, such trivial formulas are unlikely to arise when verifying real-world smart contracts: this verification condition would be generated from the trivial expression , which rarely appears in programs. Even when such formulas do appear, the triviality is easy to remove. For example, can easily be simplified into , which our technique does not determine as invalid since . In fact, our technique caused no false positives in our experiments in Section V.
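A minimal Python sketch of the syntactic check follows. It assumes condition (i) amounts to the conclusion Q mentioning a free variable that the premise P does not (consistent with condition (iii), which needs such a variable); the tuple encoding of terms and the names `free_vars` and `quickly_invalid` are hypothetical. The second example is a hypothetical trivially valid formula that the check wrongly flags, and the third shows that simplifying its conclusion to `True` removes the false positive.

```python
def free_vars(term):
    """Variables of a term written as nested tuples (op, arg1, arg2, ...).

    Strings are variables, ints and bools are constants, and the first
    element of each tuple is the operator, never a variable.
    """
    if isinstance(term, str):
        return {term}
    if isinstance(term, tuple):
        out = set()
        for arg in term[1:]:
            out |= free_vars(arg)
        return out
    return set()

def quickly_invalid(p, q):
    """Check only condition (i) for the VC p => q, without any SMT call.

    True  -> report the VC as invalid (may be a false positive, since
             conditions (ii) and (iii) are not checked).
    False -> undecided; hand the VC to the SMT solver.
    """
    return not free_vars(q) <= free_vars(p)

# x < 100 => x + y < 100 is invalid (e.g., x = 0, y = 100): flagged.
flagged = quickly_invalid(("<", "x", 100), ("<", ("+", "x", "y"), 100))
# x > 0 => (y >= 0 or y < 0) is valid but still flagged: a false positive.
spurious = quickly_invalid((">", "x", 0), ("or", (">=", "y", 0), ("<", "y", 0)))
# After simplifying the conclusion to True, the check no longer fires.
cleared = quickly_invalid((">", "x", 0), True)
```

The check is one subset test over variable sets, which is why it is so much cheaper than a solver call on 256-bit bit-vector formulas.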

Efficient Validity Checking

We also quickly identify some valid formulas by using a number of domain-specific templates. We do so because our verification conditions tend to involve arrays and non-linear expressions extensively, and modern SMT solvers are particularly inefficient at handling them. For example, a simple yet important validity template is as follows:

where denotes an arbitrary formula, a 256-bit unsigned integer variable, and and integer constants. This template asserts that, regardless of the precondition , holds if . Using the template, we can conclude that a formula is valid (i.e., the subtraction is safe from underflow) without calling an external SMT solver. These templates are applied before the preprocessing step, and several of them are designed to decide the validity of formulas containing domain-specific symbols at a high level, without preprocessing. We provide more examples in Appendix -C.
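As an illustration only, the sketch below assumes a template of the shape P ⇒ n1 ≥ x % n2, i.e., the safety condition of the subtraction n1 - x % n2. It holds for every uint256 x whenever n2 > 0 and n1 ≥ n2 - 1, because x % n2 never exceeds n2 - 1. This shape is our assumption, not necessarily one of VeriSmart's templates; `template_valid` is a hypothetical name, and returning `None` means "fall back to preprocessing and the SMT solver".

```python
def template_valid(cond):
    """Return True if cond matches the assumed template, else None (use the solver).

    cond is a nested tuple such as (">=", 10, ("%", "x", 10)), meaning 10 >= x % 10,
    where strings are uint256 variables and ints are constants. The precondition P
    is deliberately ignored: the template holds regardless of P.
    """
    if (isinstance(cond, tuple) and len(cond) == 3 and cond[0] == ">="
            and isinstance(cond[1], int)
            and isinstance(cond[2], tuple) and len(cond[2]) == 3
            and cond[2][0] == "%"
            and isinstance(cond[2][1], str)
            and isinstance(cond[2][2], int)):
        n1, n2 = cond[1], cond[2][2]
        # For n2 > 0, x % n2 <= n2 - 1 for every x, so n1 >= n2 - 1 suffices.
        if n2 > 0 and n1 >= n2 - 1:
            return True
    return None
```

A template hit costs a few comparisons, while a miss costs nothing beyond the normal solver path, so such shortcuts only ever save time.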

IV Implementation

In this section, we explain the implementation details of VeriSmart, which consists of about 7,000 lines of OCaml code. Although Section III describes our algorithm for a small subset of Solidity, our implementation supports the full language (except for inline assembly). Most Solidity features (e.g., function modifiers) can be desugared into our core language in a straightforward way. We discuss the nontrivial issues below.

Function Calls

Basically, we handle function calls by inlining them at their call sites up to a predefined inlining depth (currently, at most 2). The exceptions are relatively large functions (with more than 20 statements), which might cause scalability issues, and inter-contract function calls (i.e., calls to functions of other contracts via contract objects). To keep the verification exhaustive, we handle these remaining function calls conservatively as follows.

First, we conservatively reflect the side effects of function calls on the caller side. To do so, we first run a side-effect analysis [25] to find the variables whose values may be changed by the called functions. Next, we weaken the formulas at call sites by replacing each atomic predicate that involves those variables with true. For example, consider a call statement x := foo() and assume that foo may change the value of variable a in its body. Suppose further that the precondition of the call site is . Then, we obtain the following postcondition of the call site: , where and are replaced by . For inter-contract function calls, it is enough to invalidate only the return variables, as inter-contract calls in Solidity cannot directly modify the states of other contracts. For example, consider the precondition above and an inter-contract call x := o.foo(). We produce the postcondition , where only is replaced by .
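The weakening step can be sketched as a filter over the atoms of a conjunctive precondition, since dropping an atom is the same as replacing it with true. In this Python sketch (hypothetical names, with each atom paired with its set of free variables), an internal call clobbers both the callee's possibly modified variables and the assigned return variable, while an inter-contract call clobbers only the return variable. The atoms `a > 0`, `b > 0`, and `x = 1` are made up for the example.

```python
def weaken(precondition, modified, returns=()):
    """Replace by true (i.e., drop) each atom that mentions a clobbered variable.

    precondition: list of (atom_text, free_vars) pairs, read as a conjunction.
    modified: variables the callee may write (from a side-effect analysis).
    returns: variables assigned the call's return value.
    """
    clobbered = set(modified) | set(returns)
    return [(atom, fv) for atom, fv in precondition if not (fv & clobbered)]

pre = [("a > 0", {"a"}), ("b > 0", {"b"}), ("x = 1", {"x"})]
# x := foo(), where foo may write a
after_internal = weaken(pre, modified={"a"}, returns={"x"})
# x := o.foo(): the other contract cannot write our state directly, so only x is clobbered
after_inter = weaken(pre, modified=set(), returns={"x"})
```

The same filter applies to inline assembly blocks, with `modified` taken as the variables used in the assembly code.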

Second, we separately analyze the bodies of functions that were not inlined. This step is needed to detect potential bugs in the functions skipped in the step described in the preceding paragraph. To keep the verification exhaustive, we analyze these functions by over-approximating their input states. Specifically, when a function in the main contract has public or external visibility, we run the algorithm in Section III, which annotates its entry and exit with the transaction invariant. On the other hand, when a function in the main contract has internal or private visibility (i.e., it cannot be called from outside and can only be reached via function call statements), or when the function is defined in another contract, we generate the VCs after annotating its entry and exit with true; that is, the incoming state at the entry is over-approximated as true, and the inductiveness condition at the exit holds trivially.
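The call-handling policy described above reduces to two small decisions. The Python sketch below is hypothetical: the constants mirror the numbers in the text, and it assumes `depth` counts nesting from 1 for a call written directly in the function being analyzed.

```python
MAX_INLINE_DEPTH = 2        # "currently, at most 2" in the text
MAX_INLINE_STATEMENTS = 20  # functions with more than 20 statements are not inlined

def should_inline(depth, num_statements, inter_contract):
    """Decide whether a call at the given nesting depth is inlined."""
    return (depth <= MAX_INLINE_DEPTH
            and num_statements <= MAX_INLINE_STATEMENTS
            and not inter_contract)

def separate_analysis_annotation(visibility, in_main_contract, tx_invariant):
    """Entry/exit annotation used when a non-inlined function is analyzed on its own."""
    if in_main_contract and visibility in ("public", "external"):
        return tx_invariant  # any transaction may call it: use the transaction invariant
    return "true"            # internal/private or another contract: over-approximate
```

Calls that `should_inline` rejects are handled at the call site by weakening and, separately, analyzed with the annotation chosen by `separate_analysis_annotation`.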

In summary, VeriSmart performs exhaustive safety verification without missing any possible behaviors. In theory, we may lose precision due to the conservative function-call analysis. However, as our experimental results in Section V demonstrate, our approach is precise enough in practice.

Inheritance

In Section III, we assumed that a single contract is given. To support contract inheritance, we copy the functions and global variables of parent contracts into the main contract, using the inheritance graph provided by the Solidity compiler. During this conversion, we account for function overriding and variable hiding: we do not copy a parent function whose signature is already present, nor a parent variable whose name is already declared.
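The inheritance flattening can be sketched as a first-wins merge over the contracts in inheritance order. This Python sketch assumes the contracts are given from most derived (the main contract) to most base, so a member already copied (an overriding function or a hiding variable) shadows the parent's; the dict layout and the name `flatten` are hypothetical.

```python
def flatten(linearization):
    """Merge members of a contract and its parents into one contract.

    linearization: contracts from most derived to most base, each a dict with
    'functions' (signature -> body) and 'variables' (name -> type).
    """
    functions, variables = {}, {}
    for contract in linearization:
        for sig, body in contract["functions"].items():
            functions.setdefault(sig, body)   # keep the most derived definition
        for name, ty in contract["variables"].items():
            variables.setdefault(name, ty)    # a hiding variable wins over the parent's
    return functions, variables

main = {"functions": {"transfer(address,uint256)": "child body"},
        "variables": {"owner": "address"}}
parent = {"functions": {"transfer(address,uint256)": "parent body",
                        "approve(address,uint256)": "parent body"},
          "variables": {"owner": "address", "totalSupply": "uint256"}}
funcs, gvars = flatten([main, parent])
```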

Structures

We encode structures in Solidity with arrays. To do so, we introduce a special mapping variable for each member of a structure type, which maps structures to the member's values. For example, given a precondition , the strongest postcondition of the command x.y := z is , where is a map (or an array) from structures to the corresponding values of member y, and is an uninterpreted symbol for the structure variable x. This encoding also lets us handle aliasing among structures. For example, if two structures p and q are aliased and both have y as a member, then we access the same member y through either structure, i.e., .
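The effect of this encoding on aliasing can be seen in a concrete-state analogue: one map per member name, indexed by a structure's identity. In this Python sketch (hypothetical class and names; VeriSmart itself works with SMT arrays and uninterpreted symbols rather than concrete values), two aliased variables hold the same identity and therefore read the same member value. Unset members read as 0, mirroring Solidity's zero-initialization.

```python
from collections import defaultdict

class MemberMaps:
    """One map per member name, from structure identities to member values."""

    def __init__(self):
        self.maps = defaultdict(dict)   # member name -> {struct id -> value}

    def assign(self, struct_id, member, value):
        # x.y := z updates the map for member y at x's identity
        self.maps[member][struct_id] = value

    def read(self, struct_id, member):
        # unset members read as 0, like zero-initialized storage
        return self.maps[member].get(struct_id, 0)

env = {"p": "s1", "q": "s1", "r": "s2"}   # p and q alias the same structure
heap = MemberMaps()
heap.assign(env["p"], "y", 42)            # p.y := 42
```

After the assignment through `p`, reading `y` through `q` yields 42, while the unrelated structure `r` still reads 0.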

Inline Assembly

One potential source of false negatives in source-code analyzers (e.g., Zeus [11]) is inline assembly. VeriSmart shares this limitation and may miss bugs hidden in embedded assembly code. However, VeriSmart conservatively analyzes the remaining parts of the source code by accounting for the side effects of assembly blocks in the same way as it handles function call statements, i.e., we replace each atomic predicate with true if it involves variables used in assembly code (using the information provided by the Solidity compiler). This limitation does not significantly impair the practicality of VeriSmart, as inline assembly is uncommon in practice. For example, in our benchmarks in Section V, only four contracts (#4, #16, #52 in Table II and #24 in Table IV) contain assembly blocks, and none of these blocks include arithmetic operations.

V Evaluation

We evaluate the effectiveness of VeriSmart by comparing it with existing tools. Research questions are as follows:

  1. How precisely can VeriSmart detect arithmetic bugs compared to the existing bug-finders, i.e., Osiris [7], Oyente [9], Mythril [8], MantiCore [10]?

  2. How does VeriSmart compare to the existing verifiers, i.e., Zeus [11] and SMTChecker [12]?

In addition, we conduct a case study to show that VeriSmart can easily be extended to support other types of vulnerabilities (Section V-C). We used the latest versions of the existing tools (as of May 1st, 2019). All experiments were conducted on a machine with an Intel Core i7-9700K CPU and 64GB of RAM.

V-A Comparison with Bug-finders

We evaluate the bug-finding capability of VeriSmart by comparing it with four bug-finding analyzers for Ethereum smart contracts: Osiris [7], Oyente [26], Mythril [8], and MantiCore [10]. They are well-known open-source tools that support the detection of integer overflows (Osiris, Oyente, Mythril, MantiCore) and division-by-zeros (Mythril). In particular, Osiris is arguably the state-of-the-art tool tailored to finding integer overflow bugs [7].

Setup

We used 60 smart contracts that have vulnerabilities with assigned CVE IDs. We chose these contracts to enable an in-depth manual study of the analysis results against known vulnerabilities confirmed by CVE reports. The 60 benchmark contracts were selected randomly from the 487 CVE reports related to arithmetic overflows (Table I), excluding duplicated contracts with minor syntactic differences (e.g., differences in contract names or logging events). During evaluation, we found four incorrect CVE reports (#13, #20, #31, #32 in Table II), which are discussed in more detail at the end of this section.

To run Osiris, Oyente, Mythril, and MantiCore, we used the public docker images provided with these tools. Following prior work [7], we set the timeout to 30 minutes per contract. For a fair comparison, we activated only the analysis modules for arithmetic bug detection when such an option is available (Mythril, MantiCore), and left the other options at their defaults. For VeriSmart, we set the timeout to 1 minute for the last entrance of the loop in Algorithm 1 and to 10 seconds for each Z3 request, because these values worked well in our experience; setting either timeout lower may decrease precision (Section V-D). In each tool's analysis report, we counted only alarms related to arithmetic bugs (integer over/underflows and division-by-zeros) for the main contract, whose name is available on the Etherscan website [27].

Results

No. CVE ID Name LOC #Q VeriSmart Osiris [7] Oyente [9, 26] Mythril [8] MantiCore [10]
#Alarm #FP CVE #Alarm #FP CVE #Alarm #FP CVE #Alarm #FP CVE #Alarm #FP CVE
#1 2018-10299 BEC 299 6 2 0 0 0 1 0 2 0 0 0
#2 2018-10376 SMT 294 22 13 0 1 0 2 0 1 0 timeout ( 3 days)
#3 2018-10468 UET 146 27 14 0 9 0 8 0 5 0 0 0
#4 2018-10706 SCA 404 48 33 0 9 0 4 0 2 0 internal error
#5 2018-11239 HXG 102 11 7 0 6 0 2 0 3 0 2 0
#6 2018-11411 DimonCoin 126 15 7 0 5 0 5 0 5 0 3 0
#7 2018-11429 ATL 165 9 4 0 3 0 2 0 0 0 0 0
#8 2018-11446 GRX 434 39 24 2 8 2 12 4 4 2 internal error
#9 2018-11561 EETHER 146 10 5 0 4 0 2 0 2 0 0 0
#10 2018-11687 BTCR 99 20 4 0 2 0 2 0 3 2 0 0
#11 2018-12070 SEC 269 40 8 0 6 0 4 0 3 1 0 0
#12 2018-12230 RMC 161 9 5 0 3 0 5 0 0 0 0 0
#13 2018-13113 ETT 142 9 2 0 N/A 4 2 N/A 2 2 N/A 0 0 N/A 0 0 N/A
#14 2018-13126 MoxyOnePresale 301 5 3 0 0 0 0 0 0 0 0 0
#15 2018-13127 DSPX 238 6 4 0 3 0 3 0 1 0 0 0
#16 2018-13128 ETY 193 10 4 0 3 0 3 0 0 0 0 0
#17 2018-13129 SPX 276 9 6 0 5 0 3 0 1 0 internal error
#18 2018-13131 SpadePreSale 312 4 3 0 0 0 0 0 0 0 internal error
#19 2018-13132 SpadeIco 403 9 6 0 0 0 0 0 0 0 internal error
#20 2018-13144 PDX 103 5 2 0 2 1 2 1 internal error 0 0
#21 2018-13189 UNLB 335 4 3 0 2 0 3 0 1 0 0 0
#22 2018-13202 MyBO 183 17 11 0 5 0 3 0 1 0 internal error
#23 2018-13208 MoneyTree 171 17 10 0 4 0 2 0 2 0 0 0
#24 2018-13220 MAVCash 171 15 10 0 4 0 2 0 1 0 0 0
#25 2018-13221 XT 186 15 10 0 4 0 2 0 2 0 0 0
#26 2018-13225 MyYLCToken 181 17 11 0 5 0 6 0 0 0 0 0
#27 2018-13227 MCN 172 17 10 0 4 0 2 0 2 0 0 0
#28 2018-13228 CNX 171 17 10 0 4 0 2 0 2 0 0 0
#29 2018-13230 DSN 171 17 10 0 4 0 2 0 2 0 0 0
#30 2018-13325 GROW 176 12 2 0 4 2 1 1 0 0 0 0
#31 2018-13326 BTX 135 9 2 0 N/A 4 2 N/A 2 2 N/A 0 0 N/A 0 0 N/A
#32 2018-13327 CCLAG 92 5 2 0 2 1 2 1 0 0 0 0
#33 2018-13493 DaddyToken 344 40 22 0 8 0 2 0 3 0 internal error
#34 2018-13533 ALUXToken 191 23 13 0 8 0 2 0 1 0 1 0
#35 2018-13625 Krown 271 22 9 0 1 0 3 0 0 0 internal error
#36 2018-13670 GFCB 103 14 11 0 6 1 3 1 1 0 0 0
#37 2018-13695 CTest7 301 17 8 0 0 0 0 0 0 0 0 0
#38 2018-13698 Play2LivePromo 131 8 7 0 7 0 7 0 5 0 5 0
#39 2018-13703 CERB_Coin 262 17 8 0 5 0 2 0 2 1 0 0
#40 2018-13722 HYIPToken 410 8 3 0 2 0 2 0 0 0 internal error
#41 2018-13777 RRToken 166 8 3 0 2 0 2 0 0 0 0 0
#42 2018-13778 CGCToken 224 13 6 0 4 0 4 0 1 0 1 0
#43 2018-13779 YLCToken 180 17 11 0 5 0 6 0 0 0 0 0
#44 2018-13782 ENTR 171 17 10 0 4 0 2 0 2 0 0 0
#45 2018-13783 JiucaiToken 271 19 11 0 6 0 4 0 0 0 internal error
#46 2018-13836 XRC 119 22 7 0 5 0 3 0 3 1 timeout ( 3 days)
#47 2018-14001 SKT 152 19 10 0 4 0 3 0 3 0 0 0
#48 2018-14002 MP3 83 12 4 0 2 0 2 0 2 1 timeout ( 3 days)
#49 2018-14003 WMC 200 15 6 0 3 0 2 0 3 0 1 0
#50 2018-14004 GLB 299 40 8 0 5 0 1 0 0 0 0 0
#51 2018-14005 Xmc 255 29 11 0 8 0 1 0 3 0 0 0
#52 2018-14006 NGT 249 27 13 0 1 0 5 0 0 0 timeout ( 3 days)
#53 2018-14063 TRCT 178 9 1 0 1 0 1 0 4 2 0 0
#54 2018-14084 MKCB 273 17 10 0 5 0 4 0 2 0 1 0
#55 2018-14086 SCO 107 16 14 0 7 2 5 2 0 0 0 0
#56 2018-14087 EUC 174 15 7 0 4 0 4 0 0 0 0 0
#57 2018-14089 Virgo_ZodiacToken 208 30 20 0 12 0 5 0 14 0 0 0
#58 2018-14576 SunContract 194 12 4 0 1 0 0 0 0 0 0 0
#59 2018-17050 AI 141 8 3 0 1 0 1 0 0 0 0 0
#60 2018-18665 NXX 79 7 5 0 4 0 4 0 0 0 0 0
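Total (LOC 12493, #Q 976): VeriSmart 492 alarms, 2 FP, CVE 58 pinpointed / 0 partial / 0 missed; Osiris 240 alarms, 13 FP, CVE 41 / 0 / 17; Oyente 171 alarms, 14 FP, CVE 20 / 15 / 23; Mythril 94 alarms, 10 FP, CVE 10 / 1 / 46; MantiCore 14 alarms, 0 FP, CVE 2 / 0 / 42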
TABLE II: Evaluation of existing tools on CVE reports. LOC: lines of code. #Q: the total number of queries for each contract after removing unreachable functions. #Alarm: the total number of alarms produced by each tool. #FP: the number of false alarms. CVE: whether each tool detects the vulnerabilities in the CVE report, classified as pinpointed (the tool pinpoints all vulnerable locations in the CVE report), partial (the tool detects only some of the vulnerabilities, or obscurely reports that an entire function body is vulnerable without pinpointing specific locations), or missed (the tool fails to detect any of them). N/A: all vulnerabilities reported in the CVE are actually safe (#13, #31). For the partly incorrect CVE reports (#20, #32), the CVE column is judged with respect to their genuine vulnerabilities.

Table II shows the evaluation results on the CVE dataset. For each benchmark contract and tool, the table shows the number of alarms (#Alarm) and the number of false positives (#FP) reported by the tool; for these two numbers, we did not count cases where the tools (Oyente and Mythril) ambiguously report that the entire body of a function or the entire contract is vulnerable. The CVE columns indicate whether the tool detected the vulnerabilities in the CVE reports: a tool either pinpoints all vulnerable locations in a CVE report, detects only some of the vulnerable points (or obscurely reports that the body of an entire function containing CVE vulnerabilities is vulnerable without pinpointing specific locations), or detects none of them. N/A means that all vulnerabilities in the CVE report are actually safe; see Table III.

The results show that VeriSmart far outperforms the existing bug-finders in both precision and recall. In total, VeriSmart reported 492 arithmetic over/underflow and division-by-zero alarms. We carefully inspected these alarms and confirmed that 490 of the 492 were true positives (i.e., safety can be violated for some feasible inputs), giving a false positive rate (#FP/#Alarm) of 0.41% (2/492). We also inspected the 484 (=976-492) unreported queries and confirmed that all of them are true negatives (i.e., no feasible inputs can violate safety), giving a recall of 100%. As expected, VeriSmart detected all CVE vulnerabilities. In contrast, the existing bug-finders missed many vulnerabilities. For example, Osiris detected the vulnerabilities in 41 CVE reports but missed those in 17. Oyente pinpointed the exact vulnerable locations in 20 CVE reports, partly detected the vulnerabilities in 4 CVE reports, vaguely raised alarms on 11 functions containing vulnerable locations, and missed the vulnerabilities in 23 CVE reports. Mythril detected the vulnerabilities in 10 CVE reports, obscurely warned that 1 function is vulnerable, and missed 46 known issues. MantiCore succeeded on only 2 CVE reports and failed on 42. The false positive rates of Osiris, Oyente, and Mythril were 5.42% (13/240), 8.19% (14/171), and 10.64% (10/94), respectively.
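The headline metrics follow directly from the totals in Table II. This short Python check recomputes them: the false positive rate is #FP/#Alarm, and VeriSmart's recall is true positives over true positives plus missed bugs (0 here, since all 484 unreported queries were confirmed to be true negatives). The helper name `percent` is ours.

```python
def percent(numerator, denominator):
    """Percentage rounded to two decimals, as reported in the text."""
    return round(100 * numerator / denominator, 2)

# (#Alarm, #FP) totals from Table II
totals = {
    "VeriSmart": (492, 2),
    "Osiris": (240, 13),
    "Oyente": (171, 14),
    "Mythril": (94, 10),
}
fp_rate = {tool: percent(fp, alarms) for tool, (alarms, fp) in totals.items()}

# VeriSmart recall: 490 true positives and no missed bugs
true_positives, false_negatives = 492 - 2, 0
recall = percent(true_positives, true_positives + false_negatives)
```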

Efficiency

VeriSmart was also competitive in terms of efficiency. To obtain the results in Table II on the 60 benchmark programs, VeriSmart, Osiris, Oyente, Mythril, and MantiCore took 1.1 hours (3,807 seconds), 4.2 hours (14,942 seconds), 14 minutes, 13.8 hours (49,680 seconds), and 31.4 hours (112,920 seconds), respectively, excluding timeouts (although we set the timeout to 30 minutes, MantiCore sometimes did not terminate within 3 days) and internal errors (e.g., unsupported operations encountered, abnormal termination) of Mythril and MantiCore.

False Alarms of Bug-finders

To understand why VeriSmart achieves higher precision than the bug-finders, we inspected all 37 (=13+14+10) false positives they reported. Of these, 18 arose because the bug-finders do not infer transaction invariants, and VeriSmart avoids all of them. The remaining 19 false positives were due to imprecise handling of conditional statements. For example, consider the following code snippet (from #55):

function transfer(address _to, uint _value) {
  if (msg.sender.balance < min)
    sell((min - msg.sender.balance) / sellPrice);
}

where the safety of min - msg.sender.balance is ensured by the preceding guard. Both Osiris and Oyente incorrectly reported that the subtraction is unsafe and that an integer underflow would occur. This may be because, for engineering reasons, Osiris and Oyente do not keep track of complex path conditions (e.g., conditions involving structures, as in this case). In contrast, VeriSmart analyzes every conditional statement precisely and does not produce such false alarms.

False Alarms of VeriSmart

VeriSmart produced two false alarms in benchmark #8 because it currently cannot capture quantified transaction invariants. Consider the unlockReward function in Figure 6. The subtraction at line 5 appears able to underflow: value may be changed at line 4, after which the relation totalLocked[addr] > value may no longer seem to hold. However, the subtraction is safe because the following transaction invariant holds over the entire contract:

$$\forall a.\ \mathit{totalLocked}[a] = \sum_{u} \mathit{locked}[a][u] \qquad (2)$$

together with the additional condition that computing the summation does not overflow. With this transaction invariant, value never exceeds totalLocked[addr] at line 5, because value ≤ locked[addr][msg.sender] ≤ totalLocked[addr]. Because VeriSmart considers only quantifier-free invariants (Section III-C), it falsely reported that an underflow would occur at line 5. Osiris and Oyente also produced a false alarm at the same location.

1function unlockReward(address addr, uint value) {
2  require(totalLocked[addr] > value);
3  require(locked[addr][msg.sender] >= value);
4  if(value == 0) value = locked[addr][msg.sender];
5  totalLocked[addr] -= value;  // false positive
6  locked[addr][msg.sender] -= value;
7}
Fig. 6: A function simplified from benchmark #8. Osiris, Oyente, and VeriSmart warn that the subtraction at line 5 can cause arithmetic underflow, which is a false positive (i.e., the subtraction is safe).
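Why the invariant makes line 5 safe can be checked empirically. The Python sketch below is a randomized simulation, not a proof: it models `locked[addr]` for a single address with three hypothetical senders, establishes the invariant totalLocked[addr] = sum of locked[addr][u] before each call, executes the body of `unlockReward`, and asserts that the subtraction never goes negative and that the invariant is re-established. The function name `simulate_unlock_reward` is ours.

```python
import random

def simulate_unlock_reward(trials=10_000, seed=0):
    rng = random.Random(seed)
    senders = ("alice", "bob", "carol")   # hypothetical lockers for one addr
    for _ in range(trials):
        # A state satisfying the invariant for this addr
        locked = {u: rng.randrange(100) for u in senders}
        total_locked = sum(locked.values())
        sender, value = rng.choice(senders), rng.randrange(100)
        # Lines 2-3: the require(...) checks; a failing require reverts the call
        if not (total_locked > value and locked[sender] >= value):
            continue
        if value == 0:                    # line 4
            value = locked[sender]
        assert total_locked >= value, "line 5 would underflow"
        total_locked -= value             # line 5
        locked[sender] -= value           # line 6
        assert total_locked == sum(locked.values())  # invariant preserved
    return True
```

If `total_locked` were drawn independently of `locked`, both `require` checks could pass with `value == 0` and `locked[sender] > total_locked`, and line 4 would then set `value` above `total_locked`; this is exactly the underflow a tool reports when it cannot see the invariant.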

False Negatives of Bug-finders

We inspected the CVE vulnerabilities commonly missed by the four bug-finders and found that they often fail to detect bugs when vulnerabilities can arise via inter-contract function calls. For example, consider the following code adapted from #18:

function mint(address holder, uint value) {
  require(total + value <= TOKEN_LIMIT); // CVE bug
  balances[holder] += value;             // CVE bug
  total += value;                        // CVE bug
}

The main contract contains a function call token.mint(...,...), where token is a contract object. All three addition operations can overflow for some inputs. For example, suppose total=1, value=0xfff…ff, and TOKEN_LIMIT=10000. Then, total+value overflows in unsigned 256-bit arithmetic and wraps around to 0, so the check in the require statement is bypassed. Next, if balances[holder]=0, the holder obtains more tokens than the predetermined limit TOKEN_LIMIT. VeriSmart detected these bugs because it conservatively analyzes inter-contract calls (Section IV).
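The overflow scenario can be replayed with explicit modular arithmetic. This Python sketch assumes the contract predates Solidity 0.8 (which made arithmetic checked by default) and does not use SafeMath, so additions wrap modulo 2**256; the variable names mirror the snippet, and `add_uint256` is a hypothetical helper.

```python
UINT256_MOD = 2**256

def add_uint256(a, b):
    """Unchecked uint256 addition: wraps modulo 2**256 (pre-0.8 Solidity semantics)."""
    return (a + b) % UINT256_MOD

TOKEN_LIMIT = 10000
total = 1
value = UINT256_MOD - 1            # the largest uint256, all bits set
balances = {"holder": 0}

# require(total + value <= TOKEN_LIMIT): the sum wraps to 0, so the check passes
assert add_uint256(total, value) <= TOKEN_LIMIT
balances["holder"] = add_uint256(balances["holder"], value)  # far above TOKEN_LIMIT
total = add_uint256(total, value)                            # wraps to 0
```

After the call, the holder owns the maximum uint256 amount of tokens while `total` reads 0, so both the limit and the supply bookkeeping are broken.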

Incorrect CVE Reports Found by VeriSmart

Interestingly, VeriSmart unexpectedly identified six incorrectly reported CVE vulnerabilities, spread over four CVE reports. In Table III, the column #Incorrect Queries denotes the number of queries incorrectly reported as vulnerable for each CVE ID. We discovered them because VeriSmart raised no alarms for those queries; we then manually confirmed that the CVE reports are indeed incorrect. We have submitted a request to the CVE assignment team to revise these reports.

Thanks to its ability to compute transaction invariants automatically, VeriSmart proved the safety of all the incorrectly reported vulnerabilities (i.e., zero false positives). In other words, without transaction invariants, VeriSmart could not have discovered the incorrect CVE reports. The transaction invariants generated to prove safety were similar to those in Example 3 of Section II. In contrast, existing bug-finders cannot be used for this purpose, i.e., proving safety; for example, Osiris and Oyente produced false positives for all 6 safe queries (i.e., the 6 incorrectly reported queries).

CVE ID Name #Incorrect Queries #FP (Osiris) #FP (Oyente) #FP (VeriSmart)
2018-13113 ETT 2 2 2 0
2018-13144 PDX 1 1 1 0
2018-13326 BTX 2 2 2 0
2018-13327 CCLAG 1 1 1 0
TABLE III: List of incorrect CVE reports found by VeriSmart. #Incorrect Queries: the number of queries incorrectly reported as vulnerable. #FP: the number of alarms raised by each tool for the incorrectly reported queries.

V-B Comparison with Verifiers

We now compare VeriSmart with SMTChecker [12] and Zeus [11], two recently-developed verifiers for smart contracts. In particular, SMTChecker is the “official” verifier for Ethereum smart contracts developed by the Ethereum Foundation, which is available in the Solidity compiler. Like VeriSmart, the primary goal of SMTChecker is to detect arithmetic over/underflows and division-by-zeros [12].

Setup

First of all, we must admit that the comparison with Zeus and SMTChecker in this subsection is rather limited, because Zeus is not publicly available and SMTChecker is currently an experimental tool that does not support the full Solidity language. Since we cannot run Zeus on our dataset, our only option was to use the public evaluation data [28] provided by the Zeus authors. However, the public data is not detailed enough to interpret accurately, as the Zeus authors classify each benchmark contract simply as ‘safe’ or ‘unsafe’ without specific alarm information such as line numbers. The only objective information we could obtain from the data [28] was that Zeus produces some (nonzero) number of false arithmetic-overflow alarms on 40 contracts, so we decided to use those contracts in our evaluation. Starting with those 40 contracts, we removed duplicates with trivial syntactic differences, resulting in 25 unique contracts (Table IV). Thus, the objective of our evaluation is to run VeriSmart and SMTChecker on these 25 contracts and see how many of them each tool can analyze without false alarms. We ran SMTChecker with the default settings.

Results

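No. | LOC | #Q | VeriSmart: #Alarm #FP Verified | SMTChecker [12]: #Alarm #FP Verified | Zeus [11]: Verified
#1 | 42 | 3 | 0 0 ✓ | 3 3 ✗ | ✗
#2 | 78 | 2 | 1 0 ✓ | 2 1 ✗ | ✗
#3 | 75 | 7 | 2 0 ✓ | 7 5 ✗ | ✗
#4 | 70 | 7 | 0 0 ✓ | 7 7 ✗ | ✗
#5 | 103 | 8 | 0 0 ✓ | 6 6 ✗ | ✗
#6 | 141 | 5 | 2 0 ✓ | internal error | ✗
#7 | 74 | 6 | 1 0 ✓ | 6 5 ✗ | ✗
#8 | 84 | 6 | 0 0 ✓ | 4 4 ✗ | ✗
#9 | 82 | 6 | 0 0 ✓ | 6 6 ✗ | ✗
#10 | 99 | 2 | 1 0 ✓ | internal error | ✗
#11 | 171 | 15 | 9 0 ✓ | internal error | ✗
#12 | 139 | 7 | 0 0 ✓ | internal error | ✗
#13 | 139 | 7 | 0 0 ✓ | internal error | ✗
#14 | 139 | 7 | 0 0 ✓ | internal error | ✗
#15 | 139 | 7 | 0 0 ✓ | internal error | ✗
#16 | 141 | 16 | 10 0 ✓ | internal error | ✗
#17 | 153 | 5 | 0 0 ✓ | internal error | ✗
#18 | 139 | 7 | 0 0 ✓ | internal error | ✗
#19 | 113 | 4 | 0 0 ✓ | 4 4 ✗ | ✗
#20 | 40 | 3 | 0 0 ✓ | 3 3 ✗ | ✗
#21 | 59 | 3 | 0 0 ✓ | internal error | ✗
#22 | 28 | 3 | 1 0 ✓ | 1 0 ✓ | ✗
#23 | 19 | 3 | 0 0 ✓ | 3 3 ✗ | ✗
#24 | 457 | 30 | 13 6 ✗ | internal error | ✗
#25 | 17 | 3 | 0 0 ✓ | 3 3 ✗ | ✗
Total | 2741 | 172 | 40 6 (✓ 24, ✗ 1) | 55 50 (✓ 1, ✗ 12) | (✓ 0, ✗ 25)
TABLE IV: Evaluation on the Zeus dataset. Verified: whether a tool detects all bugs without false positives (✓: success, ✗: failure).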

Table IV shows the evaluation results on the Zeus dataset. For each contract, the table shows the number of alarms (#Alarm) and the number of false positives (#FP) produced by VeriSmart and SMTChecker. The column Verified indicates whether each tool detected all bugs without false positives (success or failure).

The results show that VeriSmart successfully addresses the limitations of Zeus and SMTChecker. The 25 contracts contain 172 arithmetic operations, of which VeriSmart flagged 40 as potential bugs. We manually checked that 34 of these 40 alarms are true positives. In benchmark #24, VeriSmart produced 6 false positives due to unsupported invariants (quantified and compound invariants, Section III-C) and imprecise function call analysis. We manually checked that the remaining 132 (=172-40) queries proven safe by VeriSmart are indeed true negatives. By contrast, according to the publicly available data [28], Zeus produces at least one false positive for each contract in Table IV (i.e., at least 25 false alarms in total). SMTChecker could analyze only 13 contracts, as it raised internal errors on the other 12 owing to its immature support for Solidity syntax [29]. Among the 61 operations in these 13 contracts, SMTChecker detected all 5 bugs thanks to its exhaustive verification approach. However, it reported 55 alarms in total, of which 50 are false positives. In terms of efficiency, SMTChecker took about 1 second per contract and VeriSmart about 20 seconds per contract.

Importance of Transaction Invariants

The key enabler of this high precision was VeriSmart's ability to leverage transaction invariants. We also ran VeriSmart without inferring transaction invariants (i.e., using true as the transaction invariant); in that configuration, VeriSmart fails to verify 17 of the 25 contracts.

V-C Case Study: Application to Other Types of Vulnerabilities

VeriSmart can be used to analyze other safety properties as well. To show this, we applied VeriSmart to finding access-control bugs, where security-sensitive variables can be manipulated by anyone for malicious purposes. For example, consider the following code snippet adapted from the EtherCartel contract for a crypto idle game (CVE 2018-11329):

function DrugDealer() public { ceoAddr = msg.sender; }
function buyDrugs () public payable {
  ceoAddr.transfer(msg.value); // send Ether to ceoAddr
  drugs[msg.sender] += ...; // buy drugs by paying Ether
}

Observe that anyone who calls the function DrugDealer can take over the address-typed variable ceoAddr, the beneficiary of Ether. If an attacker becomes the beneficiary by calling DrugDealer, the attacker illegitimately receives the Ether that benign users pay whenever they buy digital assets (i.e., drugs) by calling buyDrugs, where transfer is a built-in function that sends Ether to ceoAddr. This vulnerability was exploited within about 1 hour of deployment [30].

To detect this bug, we used VeriSmart as follows. First, we specified safety properties by automatically generating the assertion assert(msg.sender==addr) right before each assignment of the form addr=...;, where addr is a global address-typed variable which is often security-sensitive (excluding assignments in constructors, which typically set the contract owners). Next, we ran VeriSmart without any modification of its verification algorithm. With this simple extension, VeriSmart worked effectively; it not only detected all known CVE vulnerabilities (2018-10666, 2018-10705, 2018-11329) but also proved the absence of this bug scenario for 55 contracts out of 60 from Table II. VeriSmart could not prove safety of the remaining 5 contracts due to the imprecise specification described above.
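The specification step is a simple source-to-source instrumentation. The Python sketch below uses a hypothetical statement representation (kind, target, text) and hypothetical names; it inserts `assert(msg.sender == addr);` immediately before every assignment to a global address variable outside constructors, following the rule described above.

```python
from dataclasses import dataclass, field

@dataclass
class Function:
    name: str
    is_constructor: bool
    body: list = field(default_factory=list)  # statements as (kind, target, text)

def instrument(functions, global_addresses):
    """Insert assert(msg.sender == addr); before each assignment to a global
    address variable, skipping constructors (which typically set owners)."""
    for fn in functions:
        if fn.is_constructor:
            continue
        new_body = []
        for kind, target, text in fn.body:
            if kind == "assign" and target in global_addresses:
                new_body.append(("assert", None, f"assert(msg.sender == {target});"))
            new_body.append((kind, target, text))
        fn.body = new_body
    return functions

contract = [
    Function("constructor", True, [("assign", "ceoAddr", "ceoAddr = msg.sender;")]),
    Function("DrugDealer", False, [("assign", "ceoAddr", "ceoAddr = msg.sender;")]),
]
instrument(contract, global_addresses={"ceoAddr"})
```

On the DrugDealer example, the inserted assertion fails for any caller other than the current ceoAddr, which is the violation the verifier then reports.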

V-D Threats to Validity

We summarize the limitations of our evaluation and the consequent threats to validity. First, the benchmark contracts we used (the 60-contract CVE dataset and the 25-contract Zeus dataset) might not be representative, although we made efforts to avoid bias in the datasets (e.g., removal of duplicates). Second, the performance of VeriSmart may vary with the performance of the off-the-shelf SMT solver (i.e., Z3) used internally, or with the timeout options used in the experiments. For example, if we set the Z3 timeout to 5 seconds, VeriSmart produces 1 false positive for #9 in Table IV. Third, we did not study the exploitability of bugs in this paper and did not compare VeriSmart and other tools in this regard; the results may therefore differ if those tools are evaluated with exploitability in mind. Lastly, although we did our best, we found that manually classifying static analysis alarms as true or false positives is extremely challenging, and the classification can even be subjective in a few cases.

VI Related Work

In this section, we place our work in the literature and clarify our contributions with respect to existing work. Section VI-A compares our work with existing smart contract analyses. Section VI-B discusses verification techniques for other domains.

VI-A Analyzing Smart Contracts

Compared to existing techniques for analyzing smart contracts [9, 26, 8, 18, 7, 31, 32, 33, 34, 12, 11, 19, 20, 35, 36, 37, 38, 39, 40], VeriSmart is unique in that it achieves full automation, high precision, and high recall at the same time. Below, we classify existing approaches into fully automated and semi-automated approaches.

Fully Automated Approaches

VeriSmart belongs to the class of fully automated tools based on static or dynamic program analysis techniques, which require no manual effort and can be used by end-users who lack expertise in formal verification. In return, these approaches focus on relatively simple safety properties (e.g., overflows).

One popular approach is bug finding based on symbolic execution or fuzz testing. For example, Oyente [9, 26], Mythril [8], Osiris [7], Manticore [10], and Maian [18] discover bugs by symbolically executing EVM bytecode. Oyente was the first such tool for Ethereum smart contracts and detects various bug patterns, including arithmetic bugs. Mythril is another well-known open-source tool that detects a variety of bugs via symbolic execution. Osiris [7] is specifically designed to detect arithmetic bugs. Maian [18] focuses on finding violations of trace properties. Gasper [31] uses symbolic execution to identify gas-costly programming patterns. ReGuard [34] and ContractFuzzer [41] use fuzz testing to detect common security vulnerabilities. Although symbolic execution and fuzz testing are effective for finding bugs, they explore only a subset of program behaviors and thus inevitably miss critical vulnerabilities, which is particularly undesirable for safety-critical software like smart contracts.
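To illustrate why a search bounded in depth can miss a bug, the following minimal sketch assumes a toy contract with a single 8-bit counter and two hypothetical transactions, `deposit` and `withdraw`, and enumerates every transaction sequence up to a fixed length. It is not a model of how any of the tools above works internally; it only shows that an overflow reachable after three transactions is invisible to a search limited to two.

```python
from itertools import product
from typing import Callable, Dict, Optional, Tuple

UINT8_MAX = 2**8 - 1  # assumed 8-bit width to keep the example small (EVM words are 256-bit)


def deposit(total: int) -> int:
    return total + 100  # unchecked addition, as in Solidity before 0.8.0


def withdraw(total: int) -> int:
    return total - 50 if total >= 50 else total  # insufficient funds: state unchanged


TXS: Dict[str, Callable[[int], int]] = {"deposit": deposit, "withdraw": withdraw}


def bounded_check(depth: int) -> Optional[Tuple[str, ...]]:
    """Enumerate every transaction sequence of length 1..depth starting from total = 0
    and return the first (shortest) one whose result exceeds UINT8_MAX, or None."""
    for n in range(1, depth + 1):
        for seq in product(TXS, repeat=n):
            total = 0
            for tx in seq:
                total = TXS[tx](total)
                if total > UINT8_MAX:
                    return seq
    return None


print(bounded_check(2))  # None
print(bounded_check(3))  # ('deposit', 'deposit', 'deposit')
```

With a bound of two, every explored sequence looks safe; the overflow appears only at depth three. An exhaustive verifier must instead account for transaction sequences of arbitrary length.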

Other approaches are verifiers that perform exhaustive analyses based on static analysis or automatic program verification techniques. Zeus [11] is a sound static analyzer that can detect arithmetic bugs or prove their absence; it leverages abstract interpretation and software model checking [42]. SMTChecker [12] is the “official” verifier for Solidity developed by the Ethereum Foundation. Its primary goal is to verify the absence of arithmetic bugs such as integer over/underflows and division-by-zero [12] by performing SMT-based bounded verification. Unlike VeriSmart,