SolidityCheck: Quickly Detecting Smart Contract Problems Through Regular Expressions

11/21/2019 ∙ by Pengcheng Zhang, et al.

As a blockchain platform that has developed vigorously in recent years, Ethereum differs from Bitcoin in that it introduces smart contracts into the blockchain. Solidity is one of the most mature and widely used smart contract programming languages; it is used to write smart contracts and deploy them on the blockchain. However, once data is written to the blockchain, it cannot be modified. Because Ethereum smart contracts are stored on the blockchain, code problems in a deployed contract, such as re-entrancy vulnerabilities or integer overflows, can no longer be repaired. Currently, there is still no efficient and effective approach for detecting these problems in Solidity. In this paper, we first classify all the possible problems in Solidity and then propose a smart contract problem detection approach for Solidity, namely SolidityCheck. The approach uses regular expressions to define the characteristics of problematic statements, and uses regular expression matching and program instrumentation to prevent or detect problems. Finally, extensive experiments show that SolidityCheck is superior to existing approaches.

1 Introduction

Ethereum is the largest blockchain that supports smart contracts, with a market capitalization of 18 billion [1]. Smart contracts [2] are autonomous programs running on a blockchain platform. They are usually developed in high-level languages and then compiled into bytecode. Once the bytecode of a smart contract is deployed to the blockchain, its functions can be invoked by others, but the bytecode cannot be changed. Unfortunately, many smart contracts inevitably contain bugs, and these bugs cannot be patched because blockchain data is immutable [3, 4]. Consequently, it is particularly important to have automated tools that help developers thoroughly check their smart contracts before deploying the bytecode to the blockchain.

| Type of problem | Description | Severity |
|---|---|---|
| Security problem | Vulnerabilities in contract code cause developers to suffer losses | High |
| Performance problem | Smart contract execution costs too much or performs poorly due to the use of certain statements | Medium |
| Hidden threats of coding problems | Statements that may cause security problems in specific situations or reduce code readability | Low |

TABLE I: Classification definition of smart contract problems

A number of recent studies report possible issues in smart contracts [5, 6, 7, 8, 9, 10]. Based on existing work, we classify them into three categories, as listed in Table I.

Security problems. Vulnerabilities in smart contract code cause developers to suffer losses. For example, The DAO, the largest crowdfunding project on Ethereum, was found to have a re-entrancy vulnerability in its code, resulting in the loss of $12 million worth of ether in 2016 [11].

Performance problems. Problems of this kind increase the gas consumed when running contracts. Chen et al. [12] analyzed deployed smart contracts and found that more than 80% of them have performance-related problems, even when the code has been optimized by the recommended compiler.

Hidden threats of coding problems. Statements that may cause potential security problems in specific situations, or that reduce code readability, are defined as hidden threats of coding problems. In general, these problems do not necessarily cause serious harm. However, paying attention to them and fixing them makes smart contracts safer and easier to maintain.

Several tools have been proposed to check smart contracts for the aforementioned problems [13, 2, 14, 15, 4, 7, 16, 17, 18]. However, most of them can only handle the bytecode of smart contracts. Although processing bytecode directly enables these tools to analyze all deployed smart contracts, they cannot leverage the useful information in source code (e.g., the names of functions and events). Consequently, a tool that can quickly and accurately locate issues in the source code of smart contracts would be more useful for developers, who have the source code at hand. Moreover, our experiments in Section 5.5.2 and other studies [2, 16, 4, 7] show that bytecode-based tools are less efficient than tools that handle source code. Although a recent work [3] designed a tool (named SmartCheck) for finding problems in the source code of smart contracts, existing work has the following limitations:

  1. Existing vulnerability detection criteria are imprecise and do not accurately characterize some problems. For example, an external function call followed by an internal function call is identified as a re-entrancy vulnerability. This criterion does not accurately capture the characteristics of re-entrancy and produces a large number of false positives and false negatives.

  2. SmartCheck cannot detect some important security problems, such as integer overflow. Missing this problem may have serious consequences; for example, an integer overflow led to heavy losses in the BEC project.

  3. The detection efficiency of existing work is low. SmartCheck performs lexical and syntactic analysis on Solidity source code and generates an XML parse tree for it. It then uses XPath queries over the parse tree to retrieve problematic statements [3]. The lexical and syntactic analysis steps reduce SmartCheck's efficiency.

To address these limitations, we propose SolidityCheck, a novel approach that uses regular expressions to quickly and accurately locate 20 kinds of problems in the source code of smart contracts. In particular, it can prevent two especially dangerous security problems: re-entrancy and integer overflow. We also conduct extensive experiments to evaluate its usability, efficiency, and effectiveness. In summary, we make the following novel contributions:

  • We propose a new classification criterion that identifies 20 kinds of code problems with adverse effects on smart contracts, including several that were previously undetected, and that covers the vast majority of currently known smart contract problems.

  • We detect problematic statements through regular expressions. Most kinds of problematic statements can be accurately detected by regular expressions, and the detection results are then reported to the user for easy modification. Furthermore, two important problems that are difficult to detect and have a significant impact on the security of smart contracts are located by regular expressions and then prevented by program instrumentation.

  • We designed a set of experiments to validate SolidityCheck. The experimental results show that our tool outperforms existing static code analysis tools in recall, precision, and other metrics.

The rest of this paper is organized as follows. Section 2 introduces the basic concepts used in this paper. Section 3 proposes a novel classification criterion for smart contract problems and discusses the characteristics of each problem. We detail the design and implementation of SolidityCheck in Section 4. Section 5 reports our extensive evaluation of SolidityCheck. We discuss the limitations of SolidityCheck in Section 6. After discussing related work in Section 7, we conclude the paper and outline future work in Section 8.

2 Preliminaries

2.1 Smart contract

Smart contracts are computer programs that automatically execute contract terms [19]. They run automatically when their execution conditions are satisfied and produce results according to the behaviors defined in the contracts. Using a smart contract to sign an agreement can effectively avoid disputes. Because of its decentralization and network-wide consensus, a blockchain is well suited as the operating environment for smart contracts. Ethereum smart contract accounts share the same address space as external accounts, and a smart contract can be invoked by sending a transaction to the contract address. To prevent waste of Ethereum's computing power, Ethereum charges gas for each executed smart contract statement, and gas is paid for in ether.

2.2 Solidity

Solidity is the most mainstream, mature, and widely used programming language for Ethereum smart contracts [20]. Unlike lower-level languages, Solidity is a Turing-complete high-level programming language capable of expressing arbitrarily complex logic. Smart contracts written in Solidity are compiled into Ethereum Virtual Machine (EVM) bytecode, which runs on every Ethereum node. Solidity was developed specifically for writing Ethereum smart contracts and provides built-ins for Ethereum-specific operations. For example, the transfer and send functions transfer ether, and the require and assert keywords check conditions. Solidity evolves quickly, and the same keyword may have different semantics in different language versions. To cope with this, a Solidity smart contract needs to specify the compiler versions it accepts.

3 Classification of Existing Problems in Smart Contracts

Based on existing studies [3, 4, 5, 6, 7, 8, 9, 10], we give a classification criterion for problematic statements in smart contracts (see Fig. 1 for details), which summarizes 20 kinds of common problems that need to be detected. We also describe the consequences of each problem and the detection approach we propose for it. The regular expressions we designed for these detection approaches are shown in Appendix A.

Fig. 1: A classification criterion for smart contract problems

3.1 Security Problems

Balance equality [3]. An adversary can forcibly send ether to the attacked contract by mining or via selfdestruct, so that the condition in Listing 1 can be made permanently false.

    if (this.balance == 1997 ether){
        //do something
    }
Listing 1: balance equality
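To make the idea of single-line regular matching concrete, here is a minimal Python sketch that flags an equality or inequality comparison involving the contract balance. The pattern and the function name find_balance_equality are hypothetical illustrations; the expressions SolidityCheck actually uses are listed in Appendix A.

```python
import re

# Hypothetical pattern: the contract balance compared with == or != on either side.
BALANCE = r"(?:this|address\(this\))\.balance\b"
BALANCE_EQUALITY = re.compile(rf"{BALANCE}\s*[=!]=|[=!]=\s*{BALANCE}")


def find_balance_equality(lines):
    """Return the 1-based numbers of lines that test the balance with == or !=."""
    return [no for no, line in enumerate(lines, start=1) if BALANCE_EQUALITY.search(line)]


code = [
    "if (this.balance == 1997 ether){",
    "if (this.balance >= 1997 ether){",
    "require(1 ether != address(this).balance);",
]
print(find_balance_equality(code))  # [1, 3]
```

The ordered comparison on the second line is not flagged, because an adversary who adds ether cannot make a `>=` test permanently unsatisfiable in the same way.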

Mishandled exception [2]. In Ethereum, contracts can call other contracts in several ways (e.g., via send, delegatecall, or call [6]). If an exception occurs in the callee contract, the call terminates, the callee's state is rolled back, and false is returned. Therefore, the return value of an external call should be checked so that the exception is handled properly [5, 2]. Listing 2 shows a possible loss scenario in which the contract reduces addr's token holdings even when addr fails to receive the transfer for some reason.

    addr.call.value(1 wei)(); //transfer 1 wei to addr; the returned bool is ignored
    balance[addr] -= 1;       //reduce addr's tokens even if the transfer failed
Listing 2: mishandled exceptions
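Once the code is formatted one statement per line (Section 4.2), an ignored return value can be approximated by a statement that begins with a low-level call, since nothing on that line can consume the returned bool. The Python sketch below is a hypothetical simplification, not the rule from Appendix A; in particular, it does not flag a result that is stored in a variable and never checked later.

```python
import re

# A statement that *begins* with addr.call / addr.send / addr.delegatecall has
# nowhere to use the returned boolean.
UNCHECKED_CALL = re.compile(r"^\s*[\w\[\]\.]+\.(?:call|send|delegatecall)\b[^;]*;\s*$")


def find_unchecked_calls(lines):
    return [no for no, line in enumerate(lines, start=1) if UNCHECKED_CALL.match(line)]


code = [
    "addr.call.value(1 wei)();",            # Listing 2: result ignored
    "balance[addr] -= 1;",
    "require(addr.send(1 wei));",           # result checked
    "bool ok = addr.call.value(1 wei)();",  # result stored (not flagged by this sketch)
]
print(find_unchecked_calls(code))  # [1]
```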

DoS by external contract [3]. External contracts may be maliciously controlled or destroyed, which may disable some or all functions of a contract that depends on them. As shown in Listing 3, when the external contract it depends on self-destructs, the function getService fails. In particular, when contracts depend on external libraries, the security of those libraries should be carefully reviewed.

  function getService(address _provider, address _customer) public{
    Provider provider = Provider(_provider);
    //if _customer is a user of the service,
    //the _customer can get service of the _provider
    if(provider.isCustomer(_customer)){
        //providing service
    }
  }
Listing 3: DoS by external contract

Re-entrancy vulnerability [16]. A re-entrancy vulnerability led to the dissolution of The DAO and the split of the Ethereum community. Source-code-based analysis cannot accurately determine whether a statement or a piece of code introduces a re-entrancy vulnerability. Listing 4 shows a contract with a re-entrancy vulnerability (hereinafter referred to as the attacked contract). An attacker can write a dedicated attack contract. Through the attack contract, the attacker first deposits ether into the attacked contract and then withdraws the deposit by calling the withdrawBalance function of the attacked contract. The fallback function of the attack contract calls withdrawBalance again before the attacked contract has deducted the attacker's balance, so the attacker can withdraw the same deposit many times.

pragma solidity ^0.4.15;
contract Reentrance{
    mapping (address => uint) userBalance;
    function withdrawBalance(){
        //send userBalance[msg.sender] ethers to msg.sender
        //if msg.sender is a contract, it responds with the fallback function.
        if (!(msg.sender.call.value(userBalance[msg.sender])())){
            throw;  //revert if the transfer fails
        }
        userBalance[msg.sender] = 0;    //deducted only after the external call returns
    }
}
Listing 4: a smart contract with a re-entrancy vulnerability
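The attack sequence can be reproduced in a toy Python model. This is only an analogy under simplifying assumptions: the hypothetical class Bank stands in for the contract in Listing 4, Attacker.receive plays the role of the attack contract's fallback function, and balances are plain integers. The fixed=True variant applies the deduct-before-transfer ordering that Listing 16 uses.

```python
class Bank:
    """Toy model of the attacked contract (not full Solidity semantics)."""

    def __init__(self, fixed):
        self.fixed = fixed      # True: deduct before the external call
        self.balances = {}      # depositor -> credited amount
        self.ether = 0          # ether actually held by the contract

    def deposit(self, who, amount):
        self.balances[who] = self.balances.get(who, 0) + amount
        self.ether += amount

    def withdraw_balance(self, who):
        amount = self.balances.get(who, 0)
        if amount == 0:
            return
        if self.fixed:
            self.balances[who] = 0      # deduct first
        self.ether -= amount
        who.receive(self, amount)       # like call.value(): control passes to the receiver
        if not self.fixed:
            self.balances[who] = 0      # deducted only after the call returns


class Attacker:
    def __init__(self):
        self.stolen = 0

    def receive(self, bank, amount):
        """Models the attack contract's fallback function."""
        self.stolen += amount
        credited = bank.balances.get(self, 0)
        if 0 < credited <= bank.ether:  # re-enter while the stale credit can still be paid
            bank.withdraw_balance(self)


def attack(fixed):
    bank = Bank(fixed)
    bank.deposit("alice", 10)           # another user's funds
    attacker = Attacker()
    bank.deposit(attacker, 1)
    bank.withdraw_balance(attacker)
    return attacker.stolen, bank.ether


print(attack(fixed=False))  # (11, 0): the whole contract balance is drained
print(attack(fixed=True))   # (1, 10): only the attacker's own deposit
```

Each nested call sees the attacker's credit still equal to 1 because the reset happens only after the outermost call unwinds, which is exactly the ordering flaw in Listing 4.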

Using tx.origin for authentication [3]. Solidity provides both tx.origin and msg.sender, and they differ: tx.origin refers to the initiator of the transaction, while msg.sender refers to the sender of the current message. tx.origin always refers to the external account controlled by a user. As shown in Fig. 2, when a user calls one contract, which in turn calls a second contract, then from the second contract's point of view the first contract is the msg.sender of the call and the user is its tx.origin.

Fig. 2: Differences between tx.origin and msg.sender

If a contract uses tx.origin for authentication, a malicious user who gains the victim's trust can easily bypass the authentication and steal ether from the attacked contract. The contract in Listing 5 authenticates with tx.origin. If the attacker induces the victim (the owner of the contract in Listing 5), by whatever means, to transfer ether to the attack contract shown in Listing 6, the attack contract can steal all deposits held by the contract in Listing 5.

contract Attacked{
    address public owner;
    constructor (address _owner){
        owner = _owner;
    }
    function withdrawAll(address _recipient) public{
        require(tx.origin == owner);
        _recipient.transfer(this.balance);
    }
}
Listing 5: victim contract

import "Attack.sol";
contract Attacker{
    Attacked attacked;
    address attacker;
    constructor(Attacked _attacked, address _attacker){
        attacked = _attacked;
        attacker = _attacker;
    }
    function () public payable{
        attacked.withdrawAll(attacker);
    }
}
Listing 6: attack contract
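A detector for this pattern only needs to examine statements that mention tx.origin. The Python sketch below is an assumed approximation, not SolidityCheck's rule: it flags comparisons of tx.origin with any operand other than msg.sender, on the hypothetical reasoning that `tx.origin == msg.sender` merely checks that the caller is an external account rather than authenticating an owner.

```python
import re

# tx.origin compared with some operand, on either side of == or !=.
TX_ORIGIN_CMP = re.compile(
    r"\btx\.origin\s*[=!]=\s*([\w.]+)|([\w.]+)\s*[=!]=\s*tx\.origin\b"
)


def find_tx_origin_auth(lines):
    hits = []
    for no, line in enumerate(lines, start=1):
        if "tx.origin" not in line:  # cheap keyword filter before the regex
            continue
        for match in TX_ORIGIN_CMP.finditer(line):
            other = match.group(1) or match.group(2)
            if other != "msg.sender":  # assumed exception: an EOA check, not authentication
                hits.append(no)
                break
    return hits


code = [
    "require(tx.origin == owner);",       # Listing 5
    "require(tx.origin == msg.sender);",
    "if(owner != tx.origin){",
    "address who = tx.origin;",
]
print(find_tx_origin_auth(code))  # [1, 3]
```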

Missing constructor. If the developer does not intend to write a constructor, the harm of this problem is very limited (e.g., an incomplete contract structure). But if the developer intends to write a constructor and misspells the function name, any user can call that function (e.g., the contract in Listing 7), which can cause serious security risks [8, 10]. We recommend using the constructor keyword to declare constructors, which effectively avoids losses caused by misspelled constructor names. We check whether each contract body contains a constructor.

 pragma solidity 0.5.0;
 contract Foo{
    address public owner;   //owner is the owner of the contract
    //Anyone can now be the owner of the contract because the
    //function name is misspelled.
    function foo() public{
        owner = msg.sender;
    }
 }
Listing 7: harm of misspelling constructor name
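A check of this kind can be sketched in Python over formatted code, using brace counting to find each contract body. The sketch is hypothetical: besides reporting contracts with no constructor, it flags a function whose name differs from the contract name only in letter case, as a likely misspelled constructor such as foo in Listing 7. It ignores interfaces, libraries, and inheritance.

```python
import re

CONTRACT_HEADER = re.compile(r"^\s*contract\s+(\w+)")
FUNCTION_HEADER = re.compile(r"^\s*function\s+(\w+)\s*\(")
CONSTRUCTOR = re.compile(r"^\s*constructor\s*\(")


def check_constructors(lines):
    """Map each contract name to an issue string; contracts without issues are omitted."""
    issues = {}
    current, depth = None, 0
    has_ctor, near_miss = False, None
    for line in lines:
        if current is None:
            match = CONTRACT_HEADER.match(line)
            if not match:
                continue
            current, depth, has_ctor, near_miss = match.group(1), 0, False, None
        if depth == 1:  # statement directly inside the contract body
            func = FUNCTION_HEADER.match(line)
            if CONSTRUCTOR.match(line) or (func and func.group(1) == current):
                has_ctor = True  # new-style or old-style (pre-0.4.22) constructor
            elif func and func.group(1).lower() == current.lower():
                near_miss = func.group(1)
        depth += line.count("{") - line.count("}")
        if depth == 0:  # closing brace of the contract body
            if not has_ctor:
                issues[current] = (f"function {near_miss} looks like a misspelled constructor"
                                   if near_miss else "no constructor")
            current = None
    return issues


listing7 = ["pragma solidity 0.5.0;", "contract Foo{", "address public owner;",
            "function foo() public{", "owner = msg.sender;", "}", "}"]
print(check_constructors(listing7))
# {'Foo': 'function foo looks like a misspelled constructor'}
```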

Locked money. If a contract needs to receive ether, at least one function in the contract must be declared payable. At the same time, the contract should contain at least one statement that can transfer ether out. Otherwise, all ether in the contract account will be locked and can never be transferred.
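Both conditions can be checked with two patterns over the formatted contract. The Python sketch below is a hypothetical approximation: its list of ether-sending statements is deliberately rough (an ERC20 token.transfer(...) call would also count as sending), so it illustrates the rule rather than implementing it faithfully.

```python
import re

# A function or constructor with `payable` among the modifiers after its parameter
# list; [^{;()]* keeps `address payable` parameters or return types from counting.
PAYABLE_FUNCTION = re.compile(r"^\s*(?:function|constructor)\b.*\)[^{;()]*\bpayable\b")
# Statements that can move ether out (rough and assumed; token.transfer(...) also matches).
ETHER_OUT = re.compile(r"\.(?:transfer|send)\s*\(|\.call\.value\s*\(|\b(?:selfdestruct|suicide)\s*\(")


def has_locked_money(lines):
    """True if the contract can receive ether but has no statement that sends it out."""
    can_receive = any(PAYABLE_FUNCTION.search(line) for line in lines)
    can_send = any(ETHER_OUT.search(line) for line in lines)
    return can_receive and not can_send


locked = ["contract Jar{", "function deposit() public payable{", "}", "}"]
print(has_locked_money(locked))  # True
```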

Integer overflow. Integer overflow is a widespread problem in computer science. Because source-code-based analysis cannot accurately determine which statements may overflow, we adopt program instrumentation to prevent integer overflow by inserting checking code.
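The kind of check that instrumentation can insert is easy to state in terms of 256-bit modular arithmetic. The Python sketch below models unchecked uint256 operations (the wrap-around behavior of Solidity before 0.8.0) together with SafeMath-style guards for addition and multiplication. The function names are hypothetical, and this is not the code SolidityCheck inserts.

```python
UINT256_MAX = 2**256 - 1


def wrapping_add(a, b):
    return (a + b) & UINT256_MAX  # unchecked uint256 addition wraps modulo 2**256


def wrapping_mul(a, b):
    return (a * b) & UINT256_MAX


def checked_add(a, b):
    c = wrapping_add(a, b)
    if c < a:  # the sum wrapped around, so the true result exceeds UINT256_MAX
        raise OverflowError("uint256 addition overflow")
    return c


def checked_mul(a, b):
    c = wrapping_mul(a, b)
    if a != 0 and c // a != b:  # division recovers b only if nothing was lost
        raise OverflowError("uint256 multiplication overflow")
    return c


print(wrapping_add(UINT256_MAX, 1))  # 0
print(wrapping_mul(2**255, 2))       # 0
print(checked_mul(3, 4))             # 12
```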

Unsafe type inference [3]. Solidity provides the keyword var, which automatically assigns a type to a variable. The inferred type is the smallest type that can hold the initial value. As shown in Listing 8, i is inferred to be uint8, the smallest type that can store the initial value 0. Using var can therefore introduce security risks. The loop in Listing 8 is infinite: uint8 can represent at most 255 and wraps around to zero when incremented past it, so i <= 256 always holds. In Ethereum, a call containing an infinite loop eventually runs out of gas and can never complete successfully.

            for (var i = 0; i <= 256; i++){
                //do something
            }
Listing 8: infinite loop caused by unsafe type inference
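The effect of this inference rule can be simulated in Python. In this illustrative sketch (not compiler code), the hypothetical helper smallest_uint_bits mimics how var chose the smallest unsigned integer type for a non-negative literal in Solidity versions that still supported var (it was removed in 0.5.0), and run_loop executes the loop of Listing 8 with 8-bit wrap-around, stopping after an arbitrary step limit.

```python
def smallest_uint_bits(value):
    """Bits of the smallest uintN (N = 8, 16, ..., 256) able to hold a non-negative literal."""
    for bits in range(8, 257, 8):
        if value < 2**bits:
            return bits
    raise ValueError("literal does not fit in uint256")


def run_loop(limit, max_steps=1000):
    """Simulate `for (var i = 0; i <= limit; i++)` with i inferred from the literal 0.

    Returns the number of iterations, or None if the loop is still running after max_steps.
    """
    bits = smallest_uint_bits(0)      # 8, so i behaves like a uint8
    i, steps = 0, 0
    while i <= limit:
        steps += 1
        if steps > max_steps:
            return None               # treated as non-terminating
        i = (i + 1) % 2**bits         # 255 + 1 wraps to 0
    return steps


print(run_loop(10))   # 11
print(run_loop(256))  # None: i never exceeds 255, so i <= 256 always holds
```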

3.2 Performance Problems

byte[]. The byte[] type can serve as a byte array, but it wastes a great deal of space, which may lead to high gas consumption [21]. The recommended approach is to use the bytes type.

Costly loop. Before calling a contract, the caller can specify the amount of gas the call carries. If the gas is sufficient, the remaining gas is returned after the call completes. If it is insufficient, the call fails and the consumed gas is not returned. Loops that execute too many statements can make a call excessively costly, and transactions that are too costly may never be included in a block, which means they will never succeed.

3.3 Hidden threats of coding problems

Timestamp dependence [22]. Miners have some control over block timestamps, which can give them an unfair advantage (e.g., the code in Listing 9). Contract execution results should therefore not depend on environment variables; where such dependence is unavoidable, use environment variables that are costly for miners to manipulate (e.g., block.difficulty). In particular, now and block.timestamp should not be used as inputs to cryptographic functions; otherwise, the generated random numbers can be controlled by miners.

         if (now % 2 == 0)
           winner = addr1;
         else
           winner = addr2;
Listing 9: guessing contract affected by miners

Token API violation [3]. Ethereum allows tokens to be issued. By May 7, 2019, Ethereum had more than 100,000 token contracts. ERC20, ERC721, and ERC165 [23, 24, 25] are currently popular standards that specify the most basic state variables, events, functions, and function return types of token contracts. Throwing an exception in functions that are supposed to return a Boolean value is not recommended, because the exception prevents the caller from receiving a return value, which can break the caller's logic. When such a function fails, it should signal the failure by returning false.

Using fixed point number type. Solidity allows variables of fixed-point types to be declared, but such variables can neither be assigned to nor assigned from [21], so there is no reason to use fixed-point types at all.

Private modifier [3]. Solidity provides the private keyword to make a state variable or function private. Unlike in other programming languages, however, private does not hide state variables and functions from the outside world. Miners can view all of a contract's code and the values of its state variables, so the password in Listing 10 is available to them.

pragma solidity ^0.4.18;
contract Vault{
    bytes32 private password;
    function Vault(bytes32 _password) public payable{
        password = _password;
    }
    //Miners can take all the money of the contract
    function unlock(address _owner, bytes32 _password) public{
        if(password == _password){
            _owner.transfer(this.balance);
        }
    }
}
Listing 10: visible password

Redundant refusal of payment [3]. Starting from Solidity 0.4.0, contracts without a fallback function reject payments by default. This makes the function in Listing 11 redundant.

    function() external payable{
        revert();
    }
Listing 11: redundant refusal of payment

Compiler version problem [3]. Solidity provides the ^ operator to specify that a contract accepts compilation by the specified compiler version and later versions. However, the future development of Solidity is unpredictable, and some statements may change semantics in future versions. Therefore, the ^ symbol should be avoided. Listing 12 shows several ways of declaring the compiler version; the recommended approach is to use the second or the third.

pragma solidity ^0.5.0; //bad: 0.5.0 and above
pragma solidity 0.5.0;  //good: only 0.5.0
pragma solidity >=0.5.0 <0.6.0; //best: 0.5.0 up to, but not including, 0.6.0
Listing 12: three ways to declare compiler versions
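Telling the three forms apart needs only one regular expression on the pragma line. The classifier below is a hypothetical Python sketch, with labels invented for this illustration, that separates a caret declaration from an exact version and an explicit range.

```python
import re

PRAGMA = re.compile(r"^\s*pragma\s+solidity\s+([^;]+);")
EXACT_VERSION = re.compile(r"=?\s*\d+\.\d+\.\d+")


def classify_pragma(line):
    """Classify a version pragma like the three forms in Listing 12 (None if not a pragma)."""
    match = PRAGMA.match(line)
    if not match:
        return None
    spec = match.group(1).strip()
    if "^" in spec:
        return "caret"   # the form the text recommends avoiding
    if EXACT_VERSION.fullmatch(spec):
        return "exact"
    return "range"


lines = ("pragma solidity ^0.5.0;", "pragma solidity 0.5.0;", "pragma solidity >=0.5.0 <0.6.0;")
print([classify_pragma(line) for line in lines])  # ['caret', 'exact', 'range']
```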

Style guide violation [3]. The official Solidity documentation standardizes how functions, events, and arrays are declared and defined [21]. We consider the second function name in Listing 13 inappropriate, because the two function names give no hint of how their uses differ. Function names should begin with a lowercase letter, event names should begin with an uppercase letter, and array declarations should have no space between the type and the left bracket.

    //the naming of these two functions is confusing
    function transfer() public{ /*do something*/}
    function _transfer() public{ /*do something*/}
Listing 13: confusing function naming

Integer division. Ethereum's support for floating-point and decimal types is incomplete. All integer division results are rounded down, so using integer division to calculate amounts of ether may cause economic losses; integer division should be avoided where possible.
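A short worked example shows where the value goes. The helper below is hypothetical and only mirrors the rounding of unsigned integer division, with amounts in wei.

```python
def split_evenly(total_wei, recipients):
    """Divide an amount among recipients the way Solidity's uint division would."""
    share = total_wei // recipients        # unsigned integer division rounds down
    leftover = total_wei - share * recipients
    return share, leftover


print(split_evenly(100, 3))  # (33, 1): 1 wei is never distributed
```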

Implicit visibility level [3]. Although Solidity provides default visibility for each type of variable and function, explicitly specifying the visibility of each state variable and function improves the readability of the code.

4 SolidityCheck

4.1 Overview of SolidityCheck

Fig. 3: Overview of SolidityCheck

It is non-trivial to develop a tool that uses regular expressions to locate the problems described in Section 3, for the following three reasons.

First, the format of source code may not suit regular expressions, which usually handle statements written on a single line, and different programming habits lead to different source code formats. The input source code therefore needs to be formatted to facilitate regular expression retrieval and matching, which requires an appropriate formatting method.

Second, applying regular matching to every line of code imposes a heavy performance burden and makes retrieval very slow. This is because regular expression engines are usually based on an NFA (non-deterministic finite automaton) and implemented with a backtracking matching algorithm. Backtracking may visit the same state multiple times (if it reaches that state along different paths), so in the worst case matching can be very slow and consume a lot of CPU time. To avoid this, we need to reduce the number of statements that undergo regular matching without missing any problematic statement. We use keyword filtering to avoid unnecessary regular matching, because statements with different kinds of problems always contain different characteristic substrings.

Third, regular expressions are only suitable for detecting problems confined to a single line; on their own they cannot detect problems that span multiple lines of code, such as costly loops. Some programming techniques are needed to extend regular expression detection across lines. For example, we use bracket matching to find the start and end positions of a loop statement.
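Bracket matching of this kind reduces to counting curly brackets over formatted lines. The following Python sketch is an illustration under stated assumptions (one statement per line, no brackets inside string literals); the function name loop_end is hypothetical.

```python
def loop_end(lines, start):
    """Return the index of the line that closes the block opened on lines[start].

    Assumes formatted code (one statement per line) and no braces inside string
    literals. If lines[start] opens no block, start itself is returned.
    """
    depth = 0
    for i in range(start, len(lines)):
        depth += lines[i].count("{") - lines[i].count("}")
        if depth == 0:
            return i
    raise ValueError("unbalanced braces")


code = [
    "function f() public{",
    "for(uint i = 0; i < n; i++){",
    "total += i;",
    "if(total > 10){",
    "break;",
    "}",
    "}",
    "}",
]
end = loop_end(code, 1)
print(end, code[2:end])  # 6 ['total += i;', 'if(total > 10){', 'break;', '}']
```

With the start and end indices known, the lines of a loop body can be counted or scanned line by line with ordinary single-line patterns.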

The main process of SolidityCheck is divided into four steps, shown in Fig. 3. The first step, code formatting, makes it easy for regular expressions to match the features of each statement and improves detection efficiency. The second step is keyword filtering: SolidityCheck extracts the statements that may contain problems according to problem-specific keywords. In the third step, detection and prevention, the filtered code is processed according to the functions selected by the user, and problems are either detected or prevented. The final step outputs either a problem detection report or a preventive contract. The following four subsections describe these steps in detail.

4.2 Formatting Codes

The format of smart contract source code is closely tied to the developers' programming habits, so source code comes in a variety of formats. The format in Listing 14 is Solidity's officially recommended style for a function header and its parameter declarations. However, such code is very unfriendly to regular expression matching, because a single complete statement is written across several lines. Therefore, before retrieving problematic statements, SolidityCheck preprocesses the source code so that each statement expressing complete semantics is written on one line. The formatting criteria are as follows:

  1. All comments and blank lines in the original contract are filtered out, and extra spaces within statements are eliminated. SolidityCheck first stores the source code line by line in a string array and then checks each item in sequence. If a line contains a "//" substring, all comment characters in that line (including "//") are replaced by spaces; if a line contains a "/*" substring, all characters up to the next "*/" substring are replaced by spaces, and the "/*" and "*/" substrings are replaced as well. After that, all comments in the source code have been filtered out; SolidityCheck then discards all blank lines and transfers the processed source code to another string array.

  2. Each formatted line of code ends with a semicolon (;), a left curly bracket ({), or a right curly bracket (}). In Solidity, contract headers, function headers, and function modifier headers end with a left curly bracket ({); contract bodies, function bodies, and function modifier bodies end with a right curly bracket (}); and every other statement ends with a semicolon (;). SolidityCheck reads the source code into a single string and replaces all line breaks with spaces. It then scans the string sequentially and adds a line break after every left curly bracket ({), right curly bracket (}), or semicolon (;). After this processing, every statement except for-statements is written on one line.

  3. After the second step, a for-statement spans three lines, which is not conducive to regular expression matching, so the semicolons (;) in for-statements are handled specially. SolidityCheck retrieves the for-statements and replaces the first two line breaks in each for-statement with spaces, so that the for-statement is written on one line.

Listing 14 shows code before preprocessing, and Listing 15 shows the same code after preprocessing. The number of lines to be checked is clearly reduced, and the code format becomes easy for regular expressions to match.

    function deposit(
        address to,
        uint256 amount
        ){
        //Receiving address: to,Number: amount
        userBalance[to] += amount;
    }
Listing 14: code before contract preprocessing
    function deposit(address to,uint256 amount){
        userBalance[to] += amount;
    }
Listing 15: code after contract preprocessing
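The three formatting criteria can be approximated in a few lines of Python. This is a hedged sketch rather than SolidityCheck's implementation: it works on the whole source string instead of a line array, it assumes no comment markers, brackets, or semicolons inside string literals, and the extra step that removes spaces around parentheses and commas is an assumption chosen so that Listing 14 becomes Listing 15.

```python
import re


def format_source(source):
    """Rough sketch of the three formatting criteria."""
    # Criterion 1: drop block and line comments, then collapse whitespace.
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.S)
    source = re.sub(r"//[^\n]*", " ", source)
    flat = re.sub(r"\s+", " ", source)
    flat = re.sub(r"\s*([(),])\s*", r"\1", flat)  # assumption: no spaces around ( ) ,
    # Criterion 2: one line per statement, breaking after '{', '}' and ';'.
    flat = re.sub(r"([{};])", r"\1\n", flat)
    lines = [line.strip() for line in flat.split("\n") if line.strip()]
    # Criterion 3: re-join the three pieces of each for-statement header.
    out, i = [], 0
    while i < len(lines):
        if re.match(r"for\s*\(", lines[i]):
            out.append(" ".join(lines[i:i + 3]))
            i += 3
        else:
            out.append(lines[i])
            i += 1
    return out


listing14 = """    function deposit(
        address to,
        uint256 amount
        ){
        //Receiving address: to,Number: amount
        userBalance[to] += amount;
    }"""
print(format_source(listing14))
# ['function deposit(address to,uint256 amount){', 'userBalance[to] += amount;', '}']
```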

4.3 Keyword filtering

SolidityCheck does not apply regular matching to every line of code, because doing so would greatly reduce detection efficiency; instead, it uses keyword filtering. The regular expressions we use to describe the characteristics of each kind of problematic statement contain specific substrings, and only a statement that contains such a substring can have the corresponding problem. Keyword filtering therefore effectively reduces the number of statements that need regular matching without causing additional missed detections. We illustrate keyword filtering with an example. The criteria for costly loop problems are as follows:

  • The condition of a for-statement or while-statement contains function calls or identifiers.

  • A for-statement or while-statement whose maximum number of executed statements exceeds 23 (the reason is explained in Appendix B).

From these criteria, we can conclude that only for-statements and while-statements can be costly loops, and that statements containing neither "for" nor "while" cannot be. Therefore, only statements containing the keyword "for" or "while" need to be matched against the features of costly loop problems.
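The filter-then-match order can be sketched in Python for the first criterion only. The sketch makes a hypothetical simplification: "function calls or identifiers" is approximated as a function call or a member access such as users.length in the loop condition. The 23-statement criterion would additionally need the bracket matching described earlier and is not shown.

```python
import re

FOR_CONDITION = re.compile(r"^for\s*\([^;]*;([^;]*);")
WHILE_CONDITION = re.compile(r"^while\s*\((.*)\)\s*\{?$")
CALL_OR_MEMBER = re.compile(r"\w\s*\(|\.\w+")


def suspicious_loop_conditions(lines):
    hits = []
    for no, line in enumerate(lines, start=1):
        # Keyword filter: a cheap substring test before any regular expression runs.
        if "for" not in line and "while" not in line:
            continue
        match = FOR_CONDITION.match(line) or WHILE_CONDITION.match(line)
        if match and CALL_OR_MEMBER.search(match.group(1)):
            hits.append(no)
    return hits


code = [
    "for(uint i = 0; i < users.length; i++){",
    "for(uint i = 0; i < 10; i++){",
    "while(gasleft() > 5000){",
    "uint before = 1;",   # passes the keyword filter ("before"), rejected by the regex
    "total += i;",        # never reaches the regex
]
print(suspicious_loop_conditions(code))  # [1, 3]
```

The substring test can let through extra lines such as the fourth one, but it never discards a real loop, so it saves regex work without introducing missed detections.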

4.4 Detection and Prevention

This step has two sub-steps: detection and prevention. In the detection sub-step, we detect the 18 kinds of problems other than re-entrancy vulnerabilities and integer overflow. In the prevention sub-step, we use program instrumentation to prevent re-entrancy vulnerabilities and integer overflow. The reason is that these two problems cannot be detected accurately through regular expression matching, while reporting every suspicious statement would make the detection results much less useful as guidance. SolidityCheck therefore combines regular expressions and program instrumentation to prevent them.

4.4.1 Detection

During the detection step, SolidityCheck matches problematic statements using regular expressions. It distributes the formatted code to 18 problem detection classes, each of which detects exactly one type of problem. In each class, the formatted code is stored in a string array and traversed line by line. Each statement is first screened by keyword filtering, and the remaining statements are then matched against the regular expressions and detection rules we defined for that problem. When a problem is found, its line number is recorded; finally, the results of the 18 detection classes are summarized into a text file, the detection report.
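The one-class-per-problem organization can be sketched as follows. The Detector structure, the three sample detectors, and the report format are hypothetical Python illustrations of the described flow (keyword filter, then regex, then a summarized report), not SolidityCheck's classes or its Appendix A expressions.

```python
import re
from dataclasses import dataclass


@dataclass
class Detector:
    """One detection class per problem kind (hypothetical structure)."""
    name: str
    keywords: tuple
    pattern: re.Pattern

    def detect(self, lines):
        # Keyword filtering first; the regular expression runs only on surviving lines.
        return [no for no, line in enumerate(lines, start=1)
                if any(k in line for k in self.keywords) and self.pattern.search(line)]


DETECTORS = [
    Detector("byte[] usage", ("byte",), re.compile(r"\bbyte\s*\[\s*\]")),
    Detector("timestamp dependence", ("now", "timestamp"), re.compile(r"\bnow\b|\bblock\.timestamp\b")),
    Detector("unsafe type inference", ("var",), re.compile(r"\bvar\s+\w+\s*=")),
]


def detection_report(lines):
    """Summarize the line numbers found by every detector into report text."""
    entries = []
    for detector in DETECTORS:
        hits = detector.detect(lines)
        if hits:
            entries.append(f"{detector.name}: line(s) {', '.join(map(str, hits))}")
    return "\n".join(entries)


code = ["byte[] data;", "bytes32 hash;", "if(now % 2 == 0){", "for(var i = 0; i <= 256; i++){"]
print(detection_report(code))
# byte[] usage: line(s) 1
# timestamp dependence: line(s) 3
# unsafe type inference: line(s) 4
```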

4.4.2 Prevention

Prevention covers two problems: re-entrancy vulnerabilities and integer overflow.
A. Re-entrancy vulnerability prevention.
During the re-entrancy vulnerability prevention step, SolidityCheck uses regular expressions to locate code and then inserts code at the matched locations. Re-entrancy vulnerabilities are particularly special and dangerous. According to the harm they can cause, we divide re-entrancy vulnerabilities into the following two categories:

  1. Re-entrancy vulnerability with no ether transfer. This kind of re-entrancy vulnerability is triggered by a call that invokes the fallback function of the receiving contract, carrying more than 2300 gas but sending no ether. Such vulnerabilities allow the attack contract to repeatedly re-enter the attacked contract and perform operations, which may cause a state variable to be changed multiple times.

  2. Re-entrancy vulnerability with ether transfer. This kind of re-entrancy vulnerability is the most dangerous, usually resulting in the loss of the contract's entire balance. It has the following four characteristics:

    1. Using call to transfer ethers.

    2. Unrestricted gas.

    3. Deducing the balance after the transfer is completed

    4. Call does not specify which function of the receiver will be called. Consequently, the contract uses the fallback function to respond to the transfer.

Flagging a suspicious line or piece of code is not the same as accurately detecting a re-entrancy vulnerability. Current smart contract detection tools cannot cover every re-entrancy vulnerability in every contract, yet missing any single re-entrancy vulnerability can be devastating to the contract. At present, the main way to detect re-entrancy vulnerabilities is to report every statement that may introduce one, but this may lead to many false positives and make the results less instructive.

Our approach targets re-entrancy vulnerability with ether transfer. First, we define the dangerous statement, which is the source of this re-entrancy vulnerability. A statement that has all three of the following characteristics is called a dangerous statement:

  1. Using call to transfer ether.

  2. Unrestricted gas.

  3. Call does not specify which function of the receiver will be called.

Dangerous statements are the source of re-entrancy vulnerabilities, but a contract that uses a dangerous statement does not necessarily contain a re-entrancy vulnerability. Deducting the balance before the transfer effectively avoids re-entrancy attacks; the code in Listing 16 prevents re-entrancy attacks in this way.

function withdrawBalance_fixed(){
    // to protect against re-entrancy, the state variable
    // has to be changed before the call
    uint amount = userBalance[msg.sender];

    userBalance[msg.sender] = 0;    // first, deduct the balance

    if(!(msg.sender.call.value(amount)())){    // then, transfer
        throw;
    }
}
Listing 16: code that effectively prevents re-entrancy

We use regular expressions to define dangerous statements: statements that match Formula 4.1 or Formula 4.2.

(4.1)
(4.2)
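Formulas 4.1 and 4.2 are not reproduced here. As a rough illustration only, the following Python sketch approximates the three characteristics of a dangerous statement for Solidity 0.4-style syntax: ether is sent with call.value(...), no gas limit is attached, and the call arguments are empty so that the receiver's fallback function runs. The pattern CALL_VALUE_EMPTY and the helper is_dangerous are hypothetical, and newer syntax such as call{value: x}("") is not covered.

```python
import re

# Hypothetical approximation of a "dangerous statement" (Solidity 0.4 syntax):
#   1. ether is sent with call.value(...)
#   2. no gas limit is attached (.gas(...) is absent)
#   3. the call arguments are empty, so the receiver's fallback function runs
CALL_VALUE_EMPTY = re.compile(r'\.call\.value\(.*\)\(\s*(?:"")?\s*\)')

def is_dangerous(stmt: str) -> bool:
    return bool(CALL_VALUE_EMPTY.search(stmt)) and ".gas(" not in stmt

examples = [
    "if(!(msg.sender.call.value(amount)())){",          # as in Listing 16: dangerous
    "msg.sender.call.value(1)();",                      # as in Listing 17: dangerous
    "msg.sender.call.value(1).gas(2300)();",            # gas limited: not dangerous
    'target.call.value(1)(bytes4(keccak256("f()")));',  # names a function: not dangerous
    "msg.sender.transfer(amount);",                     # transfer, not call
]
for s in examples:
    print(is_dangerous(s), s)
```

Note that matching a dangerous statement only marks a candidate; as explained above, whether it is actually exploitable depends on whether the balance is deducted before the call.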

With these regular expressions, we can match all dangerous statements. To prevent re-entrancy vulnerabilities, we insert specially constructed statements at certain locations in the contract. The purpose of the inserted statements is to terminate the operation before the transfer if the contract transfers ether before deducting the balance. The inserted statements do not interfere with the normal operation of contracts that deduct the balance before the transfer.

Second, to describe the locations of the inserted statements conveniently, we introduce the concept of the function call chain. In Listing 17, function C contains a re-entrancy vulnerability, but it is an internal function and cannot be invoked by an external contract. This does not protect the contract from re-entrancy attacks, because an attacker can invoke function C by calling function A and thereby launch a re-entrancy attack.

    contract example_1{
        mapping(address => uint256) userBalance;
        function A() public{
            B();
        }
        function B() internal{
            C();
        }
        function C() internal{
            msg.sender.call.value(1)();
            userBalance[msg.sender] -= 1;
        }
    }
Listing 17: call chain schematic code

Calls between functions constitute a call chain. As long as any function in the chain has a re-entrancy vulnerability, an attacker can launch a re-entrancy attack by calling a function that precedes it in the chain.

In our approach, function A is called the chain-head function and function C is called the chain-tail function. Every chain-tail function contains a dangerous statement. Chain-tail functions are also called direct call functions, and all other functions in the chain are called indirect call functions.
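To make this terminology concrete, the following Python sketch enumerates call chains over a call graph that is assumed to have been extracted from the source already; the dictionary format and the function name call_chains are hypothetical. Chains start at externally callable functions and end at functions containing a dangerous statement. Applied to Listing 17, it yields the single chain A, B, C, with C as the direct call function and A and B as indirect call functions.

```python
from typing import Dict, List, Set

def call_chains(calls: Dict[str, List[str]], heads: Set[str],
                tails: Set[str]) -> List[List[str]]:
    """Return every call path from an externally callable function (a possible
    chain head) to a function that contains a dangerous statement (a chain tail)."""
    chains: List[List[str]] = []

    def walk(fn: str, path: List[str]) -> None:
        if fn in path:                # skip recursion and call cycles
            return
        path = path + [fn]
        if fn in tails:
            chains.append(path)
        for callee in calls.get(fn, []):
            walk(callee, path)

    for head in sorted(heads):
        walk(head, [])
    return chains

# Listing 17: A is public, B and C are internal, C holds the dangerous statement.
calls = {"A": ["B"], "B": ["C"], "C": []}
chains = call_chains(calls, heads={"A"}, tails={"C"})
direct = {c[-1] for c in chains}                 # chain-tail (direct call) functions
indirect = {f for c in chains for f in c[:-1]}   # all other functions in a chain
print(chains, sorted(direct), sorted(indirect))
```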

By defining the call chain, we can accurately describe where the prevention statements, which we call vaccines, are inserted.

Next, we define the ledger: the variable that records the correspondence between addresses and the number of tokens each address holds. Many variables in a contract could act as the ledger, and no available tool can determine which one actually does. Therefore, based on our own experience, we take the ledger to be the first mapping(address => uint256) variable declared in the contract.
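A minimal Python sketch of this heuristic, assuming the contract source is available as a single string; find_ledger and the regular expression are illustrative, not SolidityCheck's implementation. Accepting uint as well as uint256 is our assumption: the two are aliases in Solidity, and the ledger in Listing 21 is declared as mapping (address => uint). Nested mappings, such as allowance tables, are skipped.

```python
import re
from typing import Optional

# First "mapping(address => uint256) <name>;" declaration, optionally with visibility.
LEDGER_DECL = re.compile(
    r"\bmapping\s*\(\s*address\s*=>\s*(?:uint256|uint)\s*\)"
    r"\s*(?:(?:public|internal|private)\s+)*(\w+)\s*;"
)

def find_ledger(source: str) -> Optional[str]:
    """Return the name of the first candidate ledger variable, or None."""
    m = LEDGER_DECL.search(source)    # search() returns the earliest match
    return m.group(1) if m else None

print(find_ledger("contract Reentrance {\n    mapping (address => uint) userBalance;\n}"))
```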

Table II describes the structure of the 4 vaccines and where each of them is inserted. The original code is shifted down after insertion.

Code type | Composition structure | Insertion position
A | if(Bexe == 0) { Bexe = ledger[etherReceiver]; } | First line of direct call functions and first line of chain-head functions
B | Aexe = ledger[etherReceiver]; require(Aexe < Bexe); | The line before a dangerous statement in direct call functions
C | Aexe = 0; Bexe = 0; | The line after a dangerous statement in direct call functions
D | uint256 Aexe = 0; uint256 Bexe = 0; | First line of the contract
TABLE II: Insertion code composition and insertion location

Table III explains the purpose of inserting four types of code.

Code type | Insertion purpose
A | Records the number of tokens held by the ether-receiving address at the beginning of the transfer
B | Obtains the number of tokens held by the ether-receiving address just before the transfer is initiated; if this number has not decreased, execution is aborted
C | Resets Aexe and Bexe
D | Declares Aexe and Bexe
TABLE III: Insertion purpose of the 4 kinds of code

With the four vaccines inserted, if a contract uses a dangerous statement and deducts the balance only after the transfer, the vaccines abort and roll back the operation before the transfer takes place.
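The effect of vaccines A to D can be modelled in a few lines of Python. This is only a simulation of the control flow in Tables II and III under simplifying assumptions: the class and method names are hypothetical, the external call itself is omitted, and a Python exception stands in for require/throw. Unlike a real EVM revert, raising the exception does not roll back state, so each scenario uses a fresh object. The two withdraw methods mirror the orderings of withdrawBalance and withdrawBalance_fixed in Listing 21.

```python
class Revert(Exception):
    """Stands in for a failed require()/throw, which aborts the transaction."""

class VaccinatedLedger:
    """Hypothetical model of a contract after vaccines A to D are inserted."""

    def __init__(self, balances):
        self.ledger = dict(balances)   # the ledger mapping
        self.Aexe = 0                  # vaccine D: declared at contract level
        self.Bexe = 0

    def vaccine_a(self, receiver):     # first line of the function
        if self.Bexe == 0:
            self.Bexe = self.ledger[receiver]

    def vaccine_b(self, receiver):     # line before the dangerous statement
        self.Aexe = self.ledger[receiver]
        if not self.Aexe < self.Bexe:
            raise Revert("balance not deducted before transfer")

    def vaccine_c(self):               # line after the dangerous statement
        self.Aexe = 0
        self.Bexe = 0

    def withdraw_transfer_first(self, who):   # ordering of withdrawBalance
        self.vaccine_a(who)
        self.vaccine_b(who)            # raises: the balance has not decreased yet
        sent = self.ledger[who]        # the call.value(...)() transfer would go here
        self.vaccine_c()
        self.ledger[who] = 0           # deduction after the transfer
        return sent

    def withdraw_deduct_first(self, who):     # ordering of withdrawBalance_fixed
        self.vaccine_a(who)
        amount = self.ledger[who]
        self.ledger[who] = 0           # deduction before the transfer
        self.vaccine_b(who)            # passes: 0 < original balance
        # the call.value(amount)() transfer would go here
        self.vaccine_c()
        return amount

safe = VaccinatedLedger({"alice": 10})
paid = safe.withdraw_deduct_first("alice")

unsafe = VaccinatedLedger({"alice": 10})
try:
    unsafe.withdraw_transfer_first("alice")
    aborted = False
except Revert:
    aborted = True
print(paid, aborted)   # 10 True
```

Because vaccine B compares the ledger entry just before the transfer with the value recorded by vaccine A, the transfer-first ordering is stopped before any ether leaves the contract, so the attacker never gets a chance to re-enter.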

Because the re-entrancy vulnerability prevention function does not apply to all contracts (e.g., contracts that do not declare a ledger), we provide a separate switch for this function, so that developers can enable it according to their circumstances.

In addition, SolidityCheck inserts a function named deposit_test into the contract. Calling this function on a private chain and sending enough ether detects re-entrancy vulnerabilities (by observing the result of the function's execution).

B. Integer overflow problem prevention.
During the integer overflow problem prevention step, SolidityCheck matches statements and inserts code using regular expressions.

The BEC project was officially launched on February 23, 2018, and its market value peaked at more than $28 billion. However, two months after its launch, attackers found an integer overflow problem in the BEC contract and exploited it to issue an effectively unlimited number of BEC tokens, which eventually triggered a wave of selling. As a result, the market value of BEC tokens fell to almost zero.

Listing 18 shows the BEC source code [26], in which line 3 contains an integer overflow problem.

1 function batchTransfer(address[] _receivers, uint256 _value) public whenNotPaused returns (bool) {
2    uint cnt = _receivers.length;
3    uint256 amount = uint256(cnt) * _value;
4    require(cnt > 0 && cnt <= 20);
5    require(_value > 0 && balances[msg.sender] >= amount);
6
7    balances[msg.sender] = balances[msg.sender].sub(amount);
8    for (uint i = 0; i < cnt; i++) {
9        balances[_receivers[i]] = balances[_receivers[i]].add(_value);
10        Transfer(msg.sender, _receivers[i], _value);
11    }
12    return true;
13}
Listing 18: BEC source code

Detecting integer overflow depends on logical analysis and semantic understanding of the code, so it is difficult to determine whether a given integer operation statement is at risk of integer overflow.

At present, the common method to prevent integer overflow in Ethereum is to use the SafeMath library (https://ethereumdev.io/safemath-protect-overflows/) for integer operations. In Listing 18, lines 7 to 9 use SafeMath library functions for addition and subtraction. The SafeMath library has several implementations; part of the code of the most popular one is shown in Listing 19 [27].

function add(uint256 a, uint256 b) internal pure returns (uint256) {
    uint256 c = a + b;
    require(c >= a, "SafeMath: addition overflow");

    return c;
}
Listing 19: part of the SafeMath code

Our integer overflow prevention draws on the idea of the SafeMath library. For each integer operation statement, different verification code is inserted before and after the statement to check whether the result of the operation is correct, so that execution terminates as soon as an overflow occurs. This is equivalent to implementing the checks of the SafeMath library inline. When instrumenting the contract, SolidityCheck does not add the verification code of the SafeMath library itself, which avoids unnecessary gas consumption.
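The kind of check involved can be illustrated by emulating uint256 arithmetic in Python. This sketch assumes pre-0.8 Solidity semantics, where unchecked arithmetic silently wraps modulo 2^256. The helper names are hypothetical, and the checks follow SafeMath (Listing 19) and the check SolidityCheck inserts into the BEC contract (Listing 22), not the exact contents of Tables IV and V. With the widely reported attack values for BEC (two receivers and _value = 2^255), the product wraps to 0, which passes the balance check in Listing 18, while the division-based check detects the overflow.

```python
UINT256_MOD = 2 ** 256

def wrap(x: int) -> int:
    """Unchecked uint256 arithmetic (Solidity < 0.8) wraps modulo 2**256."""
    return x % UINT256_MOD

class Overflow(Exception):
    """Stands in for the revert triggered by a failed require()."""

def checked_add(a: int, b: int) -> int:
    c = wrap(a + b)
    if not c >= a:                      # SafeMath add: require(c >= a)
        raise Overflow("addition overflow")
    return c

def checked_sub(a: int, b: int) -> int:
    if not b <= a:                      # SafeMath sub: require(b <= a), checked before
        raise Overflow("subtraction overflow")
    return wrap(a - b)

def checked_mul(a: int, b: int) -> int:
    c = wrap(a * b)
    if not (a == 0 or c // a == b):     # Listing 22: require(a == 0 || c / a == b)
        raise Overflow("multiplication overflow")
    return c

cnt, value = 2, 2 ** 255                # two receivers, _value = 2**255
amount = wrap(cnt * value)              # unchecked, as in line 3 of Listing 18
try:
    checked_mul(cnt, value)
    blocked = False
except Overflow:
    blocked = True
print(amount, blocked)                  # 0 True
```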

First, SolidityCheck captures statements that match Formula 4.3 or 4.4:

(4.3)
(4.4)

The statement structure defined by Formula 4.3 is as follows:

(4.5)

The statement structure defined by Formula 4.4 is as follows:

(4.6)

For statements with different operations, the composition of the preventive codes inserted after the statement is shown in Table IV.

Statement type Preventive code (after) composition architecture
TABLE IV: The Corresponding Relation between Integer Operational Code and Preventive Code(after)

Table V shows the two statement types and the code inserted before each statement. In the inserted variable names, the count part is an integer that starts at 1 and grows by 1 each time code is inserted for a particular statement, which prevents duplicate variable declarations. When code is inserted both before and after an integer operation, the variable names in the two inserted snippets correspond.

As with re-entrancy vulnerability prevention, it is not appropriate to insert code into every tested contract, because doing so increases gas consumption. Consequently, we also provide a separate switch for integer overflow prevention, and users can choose whether to enable it according to their situation.

Statement type Preventive code (before) composition architecture
TABLE V: The Corresponding Relation between Integer Operational Code and Preventive Code(before)
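As an illustration of the instrumentation step, the following Python sketch handles a single case: an assignment whose right-hand side is one multiplication of simple operands. It appends a division-based check after the statement, which reproduces line 3 of Listing 22 for the BEC statement. The regular expression and instrument_mul are hypothetical; the paper's Formulas 4.3 to 4.6 and Tables IV and V cover more operators and also insert code before statements, with counter-suffixed variable names as described above, which this sketch omits.

```python
import re
from typing import List

# "<uintN?> var = a * b;" where a and b are simple operands (names, casts, indexing).
MUL_ASSIGN = re.compile(
    r"^(\s*)(?:uint\d*\s+)?(\w+)\s*=\s*([\w.\[\]()]+)\s*\*\s*([\w.\[\]()]+)\s*;\s*$"
)

def instrument_mul(lines: List[str]) -> List[str]:
    """Insert an overflow check after each matching multiplication statement."""
    out: List[str] = []
    for line in lines:
        out.append(line)
        m = MUL_ASSIGN.match(line)
        if m:
            indent, var, a, b = m.groups()
            out.append(f"{indent}require({a}==0 || {var}/{a}=={b});")
    return out

bec = [
    "    uint cnt = _receivers.length;",
    "    uint256 amount = uint256(cnt) * _value;",
]
print("\n".join(instrument_mul(bec)))
```

Restricting the operands to names, casts and indexing keeps the generated check correct without extra parentheses; statements such as x = a + b * c, a * b * c or x *= y are left untouched rather than instrumented incorrectly.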

4.5 Detection Report and Preventive Contract

SolidityCheck outputs two different files depending on the functionality selected by the user.

  1. Detection report. The report on the 18 kinds of problems other than re-entrancy vulnerability and integer overflow.

  2. Preventive contract. A contract that prevents problems after code has been inserted into it.

A preventive contract is the original contract with prevention code inserted. SolidityCheck prints the line numbers of the inserted code to help users locate it.

Listing 20 shows the specific format of the detection report.

<Detect Report>
    <Reporting information>
        <Smart contract file path/>
        <Number of lines of original contract code/>
        <Detection time/>
        <Total number of problematic statements/>
    </Reporting information>
    <Details of the problem>
        <Problem number/>
        <Problem name/>
        <Problem code line number/>
        <Problem description/>
        <Suggested modifications/>
    </Details of the problem>
    <!-- Details of the next 17 problems are omitted -->
</Detect Report>
Listing 20: detection report format

Listing 21 shows a preventive contract (the original contract is Reentrancy.sol from not-so-smart-contracts [28]). As shown in Listing 21, SolidityCheck inserts no code into function withdrawBalance_fixed_2; the inserted code prevents the re-entrancy vulnerability in function withdrawBalance without affecting the execution of function withdrawBalance_fixed; and the inserted function deposit_test detects the re-entrancy vulnerability.

pragma solidity ^0.4.15;
contract Reentrance {
    uint256 public Aexe=0;
    uint256 public Bexe=0;
    mapping (address => uint) userBalance;
    function getBalance(address u) constant returns(uint){
        return userBalance[u];
    }
    function addToBalance() payable{
        userBalance[msg.sender] += msg.value;
    }
    function withdrawBalance(){
        if(Bexe==0){
            Bexe=userBalance[msg.sender];
        }
        Aexe=userBalance[msg.sender];
        require(Aexe<Bexe);
        if(!(msg.sender.call.value(userBalance[msg.sender])())){
            Aexe=0;
            Bexe=0;
            throw;
        }
        userBalance[msg.sender] = 0;
    }
    function withdrawBalance_fixed(){
        if(Bexe==0){
            Bexe=userBalance[msg.sender];
        }
        uint amount = userBalance[msg.sender];
        userBalance[msg.sender] = 0;
        Aexe=userBalance[msg.sender];
        require(Aexe<Bexe);
        if(!(msg.sender.call.value(amount)())){
            Aexe=0;
            Bexe=0;
            throw;
        }
    }
    function withdrawBalance_fixed_2(){
        msg.sender.transfer(userBalance[msg.sender]);
        userBalance[msg.sender] = 0;
    }
    function deposit_test() public payable{
        userBalance[msg.sender]+=msg.value;
        withdrawBalance();
        withdrawBalance_fixed();
    }
}
Listing 21: A contract to prevent re-entrancy vulnerability

5 Experimental Design

5.1 Research Questions

In this section, we conduct a series of experiments to validate SolidityCheck on a large data set that we collected. The experiments explore the following four research questions:

  • RQ1: Can SolidityCheck detect any smart contract written in Solidity language and correctly output detection reports?

  • RQ2: Is SolidityCheck more efficient than similar tools?

  • RQ3: Is the detection quality of SolidityCheck better than that of similar tools?

  • RQ4: Can SolidityCheck prevent important vulnerabilities such as re-entrancy vulnerabilities and integer overflow problems?

We designed RQ1 to validate the usability of SolidityCheck. RQ2 is used to investigate whether the detection efficiency of SolidityCheck is higher than that of similar tools. RQ3 is used to verify the detection quality of SolidityCheck. We use recall and precision to judge the detection quality. RQ4 is used to validate the effectiveness of our problem prevention ability.

5.2 Experimental Data Set

Using web crawler technology, we collected 1363 smart contracts written in Solidity from etherscan.io [29], totaling 1,239,927 lines of code. To understand the size of the contracts in the data set, we counted the number of code lines of each contract; the results are shown in Table VI.

Size (lines of code) Number of contracts Proportion
0-500 699 51.3%
500-1000 334 24.5%
1000+ 329 24.2%
TABLE VI: Distribution of contract code lines

5.3 Contrast Tools

Based on the theory described in Section 4, we implemented a tool named SolidityCheck for the Solidity language, which is now open source (https://github.com/xf97/SolidityCheck). Appendix C provides the implementation details of the tool.

To measure SolidityCheck's capability, we selected several state-of-the-art tools for comparison. Because the software and security engineering research community has long relied on free and open source software [10], we chose comparison tools from open source projects on GitHub [30]. Based on their popularity, we selected the following eight tools:

  1. Remix [31]. The Solidity integrated development environment officially recommended by Ethereum. We use the Solidity static analysis function of Remix.

  2. Mythx [32]. A security analysis tool for EVM bytecode that supports smart contracts built for Ethereum. MythX combines Mythril with other closed-source tools, such as the static analysis tool Maru and the greybox fuzzer Harvey, so it detects a wide range of vulnerabilities. Many client implementations are available for Mythx; we use Pythx (a Python library for the MythX smart contract security analysis platform).

  3. Oyente [33]. A commonly used static code analysis tool for Ethereum smart contracts and one of the earliest tools of its kind [2].

  4. Solhint [34]. An open source project for linting Solidity code that provides both security and style guide validations.

  5. Securify [35]. A security scanner for Ethereum smart contracts, sponsored by the Ethereum Foundation [16].

  6. SmartCheck [36]. A static code analysis tool for Ethereum smart contracts [3].

  7. ContractFuzzer [37]. An Ethereum smart contract fuzzer for security vulnerability detection [22].

  8. Osiris [38]. A tool to detect integer bugs in Ethereum smart contracts [7].

5.4 Experimental Environment

Our experimental environment is a computer running Ubuntu 16.04 with 32GB of memory, an Intel Xeon Silver 4108 CPU, and an NVIDIA Quadro P4000 GPU. In the quality experiment, we installed different versions of solc through Docker and obtained the bytecode file of each contract.

5.5 Experimental Results

5.5.1 Usability

To answer RQ1, we used the problem detection function of SolidityCheck to analyze 1363 smart contracts and verified whether SolidityCheck correctly outputs a detection report for each of them. The experimental results show that it does. Furthermore, we use the following indicator to measure the proportion of contracts in the experimental data set that contain each problem, where ContractsNumber is the number of contracts containing a given problem and NumberOfTestContracts is the number of contracts tested.

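$$\mathrm{Proportion} = \frac{\mathrm{ContractsNumber}}{\mathrm{NumberOfTestContracts}} \times 100\% \tag{5.1}$$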

According to our experiments, most contracts have problems: 61% of them have 10 or more reported problems. Because the frequency of different problems varies greatly across the data set, we list some maximal and minimal values: compiler version problem, 99.8%; implicit visibility level, 82.1%; using fixed point number type, 0.1%; byte[ ], 0%; balance equality, 0%; unsafe type inference, 0%. The proportions of the remaining 12 problems in the data set are shown in Fig. 4.

Fig. 4: The proportion of 12 problems in the experimental data set

Since most contracts do not declare their compiler version in the form recommended by current security specifications, most contracts are reported to contain the compiler version problem. Coding problems that hide latent threats are also frequent. Compared with security problems, the visibility of state variables and functions is regarded as trivial and receives little attention, so implicit visibility is common in smart contracts. The last few problems involve statements that are rarely used or that most developers know to be wrong, so they occur very rarely. This matches our expectations before the experiment and, to some extent, reflects the credibility of SolidityCheck.

More importantly, SolidityCheck found two contracts with re-entrancy vulnerabilities, 18 contracts with miner-controlled random numbers, and 10 contracts with locked money. Our manual verification confirmed that all of these problems are real, and none of them had been reported before.

5.5.2 Efficiency

To answer RQ2, we designed a series of experiments to record the time consumed by SolidityCheck and compare it with other tools. For bytecode-based tools such as Oyente, detection efficiency is measured as the average detection time per contract. For source-based tools such as SmartCheck, it is measured as the number of lines of code analyzed per second. Accordingly, we designed two groups of experiments. In the first group, we used all the tools to analyze 50 randomly selected contracts and measured the average time each tool spent per contract. In the second group, we used SolidityCheck and SmartCheck to analyze the whole experimental data set and measured the number of lines of code each tool processed per second.

Experiment 1. Some tools throw exceptions when analyzing certain contracts, and these exceptions interrupt the analysis. Consequently, we compute the average time as the total time divided by the number of contracts successfully analyzed. Table VII shows the average time each tool needs to analyze a contract. (For Remix, we count the run time of both compilation and Solidity static analysis; for Mythx, we use Pythx and count only runTime.)

Tool ContractFuzzer Mythx Osiris Oyente Securify Remix SmartCheck Solhint SolidityCheck
time(s) 135.5 86.7 80.90 56.27 52.14 7.63 1.83 0.57 0.23
TABLE VII: Average detecting time for a contract

Experiment 2. We define the following indicator to measure the detection efficiency of different tools. The rps value is a quotient of the total number of lines of codes for all tested contracts and the total time consumed to test all contracts.

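$$\mathrm{rps} = \frac{\text{total lines of code of all tested contracts}}{\text{total time consumed to test all contracts (s)}} \tag{5.2}$$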

The experimental result shows that SmartCheck takes 2491.33 seconds to test the whole experimental data set, while SolidityCheck takes 318.02 seconds. The rps values of the two tools are shown in Fig. 5. The detection efficiency of SolidityCheck is 683.4% higher than that of SmartCheck.

Fig. 5: Detection efficiency (rps) of SmartCheck and SolidityCheck
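The arithmetic behind this comparison can be checked directly. The sketch below assumes, as Experiment 2 states, that both tools analyzed the whole data set of 1,239,927 lines; the function name rps simply mirrors the indicator defined above. Because both tools process the same number of lines, the rps ratio equals the inverse ratio of the total times, 2491.33 / 318.02, which is about 7.83.

```python
TOTAL_LINES = 1_239_927          # lines of code in the whole data set (Section 5.2)

def rps(total_lines: int, total_seconds: float) -> float:
    """Lines of code analyzed per second."""
    return total_lines / total_seconds

smartcheck = rps(TOTAL_LINES, 2491.33)
soliditycheck = rps(TOTAL_LINES, 318.02)
improvement = (soliditycheck / smartcheck - 1) * 100     # percent higher rps
print(f"SmartCheck {smartcheck:.1f} rps, SolidityCheck {soliditycheck:.1f} rps, "
      f"+{improvement:.1f}%")
# prints: SmartCheck 497.7 rps, SolidityCheck 3898.9 rps, +683.4%
```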

5.5.3 Quality

To answer RQ3, we design a series of experiments to measure recall and precision of SolidityCheck. We randomly selected 10 contracts from the entire experimental data set as test cases to accurately measure the quality of each tool. We determine the number of actual problems in each contract by manual review. To ensure the accuracy of manual review, we invite trained researchers to assist in identifying problems in the contract.

First, several experimental indicators are defined, as shown in Table VIII. These indicators refer to:

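  • TP (True Positive) means the problem actually exists and the tool reports it.

  • FN (False Negative) means the problem does not exist but the tool reports it.

  • FP (False Positive) means the problem actually exists but the tool does not report it.

Note that these labels reverse the conventional usage, in which a false positive is a spurious report and a false negative is a missed problem; the recall and precision values in Table IX are computed with the labels as defined here.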

For Ethereum smart contracts, missing a problem is much more serious than reporting a non-existent one. Because most smart contracts contain only hundreds of lines of code, it is not difficult to manually review each reported problem, but missing any single problem could be fatal to a smart contract.

Tool result | Problem actually exists | Problem does not exist
Detected | TP | FN
Not detected | FP | TN
TABLE VIII: Definitions of accurate detection, missed detection and misjudgement

Formulas 5.3 and 5.4 give the definition of recall and precision of a single contract.

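$$\mathrm{Recall} = \frac{TP}{TP + FP} \times 100\% \tag{5.3}$$
$$\mathrm{Precision} = \frac{TP}{TP + FN} \times 100\% \tag{5.4}$$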
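A small Python helper makes the convention explicit. It follows this paper's labels, where FP counts problems that exist but were not reported and FN counts reports of problems that do not exist, so recall is TP/(TP+FP) and precision is TP/(TP+FN); None plays the role of N/A when a denominator is zero. The function name is illustrative, and the counts are taken from Table IX.

```python
from typing import Optional, Tuple

def recall_precision(tp: int, fp: int, fn: int) -> Tuple[Optional[float], Optional[float]]:
    """Recall and precision (%) using this paper's labels:
    fp = problems that exist but were not reported,
    fn = reports of problems that do not exist."""
    recall = 100 * tp / (tp + fp) if tp + fp else None       # recall
    precision = 100 * tp / (tp + fn) if tp + fn else None    # precision
    return recall, precision

print(recall_precision(37, 0, 1))     # SolidityCheck, test case 1
print(recall_precision(118, 114, 9))  # SmartCheck, overall
print(recall_precision(0, 37, 0))     # ContractFuzzer, test case 1: precision is N/A
```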

The recall and precision of each tool are presented in Table IX. In Table IX, "error" means that an error occurred while the tool analyzed the contract, so that it could not output an analysis result, and N/A means that the value cannot be computed, either because no result is available or because the denominator of the formula is zero.

Test case ContractFuzzer Mythx Osiris Oyente Securify Remix SmartCheck Solhint SolidityCheck
1 TP/FP/FN 0/37/0 1/36/1 error error 0/37/4 19/18/2 21/16/0 0/37/0 37/0/1
recall(%) 0 2.7 N/A N/A 0 51.4 56.8 0 100
precision(%) N/A 50 N/A N/A 0 90.47 100 N/A 97.4
2 TP/FP/FN 0/4/0 1/3/2 error error 0/4/1 0/4/2 4/0/0 0/4/0 4/0/4
recall(%) 0 25 N/A N/A 0 0 100 0 100
precision(%) N/A 33.3 N/A N/A 0 0 100 N/A 50
3 TP/FP/FN 0/5/0 0/5/10 error error 0/5/1 0/5/0 1/4/1 0/5/0 3/2/0
recall(%) 0 0 N/A N/A 0 0 20 0 60
precision(%) N/A N/A N/A N/A 0 N/A 100 N/A 97.4
4 TP/FP/FN 0/79/0 0/79/1 error error 0/79/2 1/78/1 42/37/4 0/79/0 79/0/8
recall(%) 0 0 N/A N/A 0 1.3 53.2 0 100
precision(%) N/A 0 N/A N/A 0 50 91.3 N/A 90.8
5 TP/FP/FN 0/28/0 1/27/0 0/28/0 0/28/3 0/28/0 0/28/0 19/9/0 0/28/0 27/1/1
recall(%) 0 3.6 0 0 0 0 67.9 0 96.4
precision(%) N/A 100 N/A 0 N/A 0 100 N/A 96.4
6 TP/FP/FN 0/4/0 1/3/1 error error error 2/2/2 2/2/2 0/4/0 4/0/8
recall(%) 0 25 N/A N/A N/A 50 50 0 100
precision(%) N/A 50 N/A N/A N/A 50 50 N/A 33.3
7 TP/FP/FN 0/8/0 1/7/2 error error 0/8/2 1/7/1 4/4/0 0/8/0 7/1/2
recall(%) 0 12.5 N/A N/A 0 12.5 50 0 87.5
precision(%) N/A 33.3 N/A N/A 0 50 100 N/A 77.8
8 TP/FP/FN 1/19/0 3/17/0 error error 0/20/1 0/20/2 6/14/0 0/20/0 17/3/1
recall(%) 5 15 N/A N/A 0 0 30 0 85
precision(%) 100 100 N/A N/A 0 0 100 N/A 94.4
9 TP/FP/FN 0/45/0 0/45/0 error error 0/45/1 0/45/0 18/27/2 0/45/0 45/0/0
recall(%) 0 0 N/A N/A 0 0 40 0 100
precision(%) N/A N/A N/A N/A 0 N/A 90 N/A 100
10 TP/FP/FN 0/2/0 1/1/0 error error 0/2/1 0/2/0 1/1/0 0/2/0 2/0/2
recall(%) 0 50 N/A N/A 0 0 50 0 100
precision(%) N/A 100 N/A N/A 0 N/A 100 N/A 50
Overall TP/FP/FN 1/231/0 9/223/7 0/232/0 0/232/3 0/232/13 23/209/10 118/114/9 0/232/0 225/7/27
recall(%) 0.43 3.9 0 0 0 9.9 50.9 0 97.0
precision(%) 100 56.25 N/A 0 0 69.7 92.9 N/A 86.3
TABLE IX: Comparative results of recall and precision for different tools

The results show that SolidityCheck performs best in recall, followed by SmartCheck. Both SmartCheck and SolidityCheck are essentially static analysis approaches based on source code. The difference in recall arises because SolidityCheck's problem detection criteria are more accurate and reasonable, so SmartCheck misses some statements that SolidityCheck considers problematic. In terms of precision, SmartCheck is the best (ContractFuzzer's 100% overall precision rests on a single report), followed by SolidityCheck. SolidityCheck's lower precision stems from its checking strategy, which is biased towards recall: it also reports statements that only vaguely exhibit the characteristics of a problem, so it produces relatively more misjudgements. Overall, we believe that the higher recall brings more benefit than the lower precision costs, and the precision gap between SolidityCheck and SmartCheck is small (6.6 percentage points). Consequently, we consider the overall detection quality of SolidityCheck to be better.

5.5.4 Important Vulnerabilities

To answer RQ4, we designed a set of experiments on important vulnerabilities, namely re-entrancy vulnerability and integer overflow. Because SolidityCheck uses program instrumentation to prevent these two problems while the other tools only scan code, we use the following unified criteria for the “effectiveness” of each tool:

  • For SolidityCheck, inserted codes effectively prevent problems from occurring.

  • For other tools, the correct location of the problem was reported.

The first experiment targets re-entrancy vulnerability. We used Reentrancy.sol from not-so-smart-contracts [28] and Reentrance.sol from Ethernaut [39] as test cases; both are representative vulnerable smart contracts. Y indicates an effective response from the tool, N indicates no effective response, and N/A indicates that the tool failed to analyze the contract. The experimental results are shown in Table X.

Contract Solhint SmartCheck Mythx Oyente Osiris Securify ContractFuzzer Remix SolidityCheck
Reentrancy.sol N N N Y Y Y Y Y Y
Reentrance.sol N N N N/A N/A N/A Y Y Y
TABLE X: Experiment results of re-entrancy vulnerability

It is noteworthy that Securify reports every statement that uses the call instruction to send ether as potentially introducing a re-entrancy problem. Remix reports every transfer statement that does not follow the CEI (Checks-Effects-Interactions) pattern as potentially introducing a re-entrancy vulnerability, even if the statement uses the transfer instruction to send ether.

In the second experiment, we used integer_overflow_1.sol (abbreviated as Integer.sol) from the not-so-smart-contracts project [28], Token.sol from Ethernaut [39], and the BEC contract source code [26] (abbreviated as BEC.sol) as test cases. The BEC contract has suffered the largest loss from an integer overflow problem so far. The experimental results are shown in Table XI, where Y indicates an effective response from the tool, N indicates no effective response, and N/A indicates that the tool failed to analyze the contract.

Contract Solhint ContractFuzzer Securify Remix SmartCheck Mythx Oyente SolidityCheck Osiris
BEC.sol N N N N N N N Y Y
Integer.sol N N N N N Y Y Y Y
Token.sol N N N N N Y Y Y Y
TABLE XI: Experimental results of integer overflow problem

The experimental results show the following. For re-entrancy vulnerabilities, ContractFuzzer, Remix, and SolidityCheck respond effectively to both contracts, while Securify responds effectively to Reentrancy.sol but fails to analyze Reentrance.sol. For integer overflow problems, SolidityCheck and Osiris handle all three contracts, while Mythx and Oyente miss BEC.sol. Listing 22 shows the code SolidityCheck inserts into the BEC contract: the check inserted in line 3 prevents the integer overflow in line 2, which is exactly the statement exploited in the attack on the BEC contract.

1    uint cnt = _receivers.length;
2    uint256 amount = uint256(cnt) * _value;
3    require(uint256(cnt)==0 || amount/uint256(cnt)==_value);
Listing 22: part of the BEC contract after SolidityCheck inserts code

5.5.5 Answers to Research Questions

We designed a set of experiments to obtain the answers for the four research questions raised in Section 5.1. Now, based on our experimental results, we give corresponding answers to these questions.

  • Answer1: SolidityCheck can generate detection results for any smart contract developed in Solidity language without restricting the programming style and lines of codes of the contract under test.

  • Answer2: SolidityCheck spends the least time per contract among all tools, and its detection efficiency (rps) is significantly higher than that of SmartCheck.

  • Answer3: Under the problematic-statement criteria we defined, SolidityCheck achieves significantly higher recall than the other tools, while its precision lags only slightly behind that of SmartCheck. Because recall matters far more than precision for smart contracts, we believe that the overall detection quality of SolidityCheck is better than that of similar tools.

  • Answer4: In our experiments, SolidityCheck is the only tool that can resist both re-entrancy vulnerabilities and integer overflow problems.

6 Discussion

Source code analysis aims for high detection efficiency and high problem recall. Our goal is to give smart contract developers a better user experience and more complete security. However, SolidityCheck in its current state still has shortcomings. We summarize these problems and the corresponding solutions as follows:

  1. SolidityCheck cannot feed detection information back into the editor, so after obtaining the results developers must return to their editor to modify the problematic statements, which degrades the user experience. In the future, we plan to deliver SolidityCheck's detection results promptly and in a more convenient way.

  2. The rapid development of the smart contract field has brought a stream of security incidents and a large body of related literature. Many smart contract problems that we were not aware of at the beginning of this work have since been discovered. With our limited resources, we could not keep updating our set of target problems while developing SolidityCheck, so SolidityCheck cannot detect some smart contract problems. We plan to continue updating SolidityCheck so that it can detect more types of problems.

  3. Source-code-based analysis can achieve good problem recall, which is very important for smart contracts. However, precision should not be ignored: low precision may make SolidityCheck's results less instructive. In future work, we will refine our regular expressions and adopt additional techniques to continuously improve SolidityCheck's precision.

  4. Our re-entrancy vulnerability prevention approach addresses the most dangerous type of re-entrancy vulnerability (re-entrancy vulnerability with ether transfer), but the other type (re-entrancy vulnerability with no ether transfer) cannot be ignored. We plan to study new approaches to detect or prevent this type of re-entrancy vulnerability in the future.

7 Related work

7.1 Safety status of smart contracts in Ethereum

Some researchers have investigated the security status of Ethereum smart contracts, and their studies reveal how worrying that status is. Destefanis et al. [40] investigated the Parity wallet freezing incident in Ethereum, identified smart contract programming patterns that should be avoided, and argued for the necessity of blockchain-oriented software engineering. Atzei et al. [20] analyzed the academic literature, blogs, and forums on Ethereum smart contract security and, drawing on their own programming experience, described the vulnerabilities of Ethereum and its mainstream smart contract programming language (Solidity), and proposed a taxonomy of common programming pitfalls that may lead to vulnerabilities. Nikolić et al. [4] specified vulnerabilities in smart contracts as trace properties and sought greedy (locking funds), prodigal (leaking ether to arbitrary users), and suicidal (killable by anyone) smart contracts through inter-procedural symbolic analysis and concrete validation. They implemented MAIAN and, with this tool, succeeded in finding the vulnerability in the Parity wallet. Wang et al. [41] proposed a research framework for smart contracts based on a six-layer architecture and described the problems of smart contracts in terms of contract vulnerabilities, blockchain limitations, privacy, and law. Through interviews with smart contract developers, Zou et al. [42] revealed that developers still face many challenges when developing contracts, such as rudimentary development tools, limited programming languages and Ethereum virtual machines, and difficulties in dealing with performance issues. However, in our view, these studies do not propose an effective classification criterion for smart contract problems, and a general classification criterion for the Solidity language is still lacking.

7.2 Ethereum virtual machine bytecodes

Analyzing the security of smart contracts based on EVM (Ethereum Virtual Machine) bytecode is currently the mainstream approach. Grishchenko et al. [13] proposed the first complete small-step semantics of EVM bytecode, obtained executable code from their formalization, and successfully validated it against the official EVM test suite. Luu et al. [2] analyzed the security properties of smart contracts through symbolic execution and developed Oyente to detect security vulnerabilities in smart contracts. Albert et al. [14] brought state-of-the-art verification engines for C programs to smart contract security: they verified contracts with a C program verification engine and output a report with the verification results. Their work bridges the gap between today's advanced C program verification technology and the security of Ethereum smart contracts. Tann et al. [15] used machine learning to detect security vulnerabilities in smart contracts. Their approach uses an LSTM (Long Short-Term Memory) network to learn the behavior patterns of smart contract bytecode and thereby achieves better detection accuracy than the MAIAN tool developed by Nikolić et al. [4]. Torres et al. [7] implemented Osiris, an integer bug detection tool for Ethereum smart contracts, using symbolic execution and taint analysis. The tool can detect arithmetic errors, truncation errors, and signedness errors. As far as we know, it is one of the few Ethereum smart contract analysis tools that can handle integer errors. Tsankov et al. [16] analyzed EVM bytecode to derive precise semantic facts about contracts and used them to judge contract security; based on these semantics, they implemented Securify, a tool that scans Ethereum smart contracts for security issues. Chen et al. [17] focused on the execution cost of Ethereum smart contracts. Through investigation and analysis, they found that the bytecode generated by the recommended compiler contains expensive patterns even after optimization. To address this, they developed Gasper, a symbolic-execution-based tool for detecting expensive patterns in bytecode. Bragagnolo et al. [18] developed SmartInspect, a smart contract debugging tool, by analyzing the distribution of bytecode in Ethereum virtual machine memory.

Analysis methods based on EVM bytecode can accurately detect security vulnerabilities in contracts. However, their detection efficiency is relatively low, and they do not adapt quickly to Ethereum updates. Furthermore, these methods cannot accurately locate potential issues in the source code of smart contracts.

7.3 Smart contract static code analysis

Static code analysis based on contract source code offers high problem coverage and detection efficiency, and can also detect problems that affect code readability. It is a useful complement to methods based on EVM bytecode. To the best of our knowledge, only Tikhomirov et al. [3] have studied static analysis of Ethereum smart contract source code. They classify and summarize existing Ethereum smart contract problems and use lexical analysis, syntax analysis, and other techniques to perform static analysis of smart contracts. However, their detection criteria are not very accurate, and their tool cannot detect important security problems such as integer overflows and re-entrancy vulnerabilities, which significantly affect the security of smart contracts.
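Integer overflow is easy to overlook because nothing fails at run time. The minimal Python sketch below models it under the assumption of Solidity versions before 0.8.0, where `uint256` arithmetic silently wraps modulo 2^256 (checked arithmetic became the default only in 0.8.0). The checked variant follows the idea behind the multiplication check in SafeMath [27]: if dividing the product by one operand does not give back the other, the operation overflowed. The helper names and the `Revert` exception, which stands in for an EVM revert, are hypothetical. The inputs mimic the publicly reported exploit of the `batchTransfer` function in the BEC token [26], where two receivers and a value of 2^255 made the total amount wrap to zero.

```python
MODULUS = 2**256  # uint256 values live in [0, 2**256)


class Revert(Exception):
    """Hypothetical stand-in for an EVM revert."""


def unchecked_mul(a: int, b: int) -> int:
    # Pre-0.8.0 Solidity semantics: uint256 multiplication wraps modulo 2**256.
    return (a * b) % MODULUS


def checked_mul(a: int, b: int) -> int:
    # SafeMath-style check: if the wrapped product divided by a does not
    # give back b, the multiplication overflowed, so revert instead.
    if a == 0:
        return 0
    c = unchecked_mul(a, b)
    if c // a != b:
        raise Revert("multiplication overflow")
    return c


if __name__ == "__main__":
    # BEC-style batchTransfer: amount = receivers * value, with value = 2**255.
    receivers, value = 2, 2**255
    print(unchecked_mul(receivers, value))  # prints 0, so a balance check on amount passes
    try:
        checked_mul(receivers, value)
    except Revert as err:
        print("reverted:", err)  # prints "reverted: multiplication overflow"
```

Because the wrapped amount is zero, any check of the form `balance >= amount` succeeds even though each receiver is credited 2^255 tokens; the checked version stops the transfer instead.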

8 Conclusion and future work

With the vigorous development of blockchain technology, Ethereum smart contracts have attracted increasing attention. In this paper, we propose a novel approach, namely SolidityCheck, to detect Ethereum smart contract problems based on regular expressions and program instrumentation. A series of experiments show that the tool implementing SolidityCheck outperforms existing tools in terms of detection quality and efficiency, and that SolidityCheck can handle important smart contract issues such as re-entrancy vulnerabilities and integer overflow problems.
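The core idea of matching problematic statements with regular expressions can be sketched in a few lines of Python. The two rules below (flagging `tx.origin` and the low-level `.call.value(...)` form used for ether transfers in older Solidity) and the `scan` function are hypothetical examples chosen for illustration, not SolidityCheck's actual rule set. The sketch also ignores statements that span several lines, `//` inside string literals, and block comments, all of which a real tool would have to handle.

```python
import re
from typing import List, Tuple

# Hypothetical rules: rule name -> pattern characterizing a problematic statement.
RULES = {
    "tx-origin": re.compile(r"\btx\.origin\b"),
    "call-value": re.compile(r"\.call\.value\s*\("),
}


def scan(source: str) -> List[Tuple[int, str]]:
    """Return (line number, rule name) for every source line that matches a rule."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        code = line.split("//", 1)[0]  # crude: drop text after '//' to skip line comments
        for name, pattern in RULES.items():
            if pattern.search(code):
                findings.append((lineno, name))
    return findings


if __name__ == "__main__":
    contract = """pragma solidity ^0.4.24;
contract Wallet {
    mapping(address => uint) balances;
    function withdraw(uint amount) public {
        require(tx.origin == msg.sender);
        msg.sender.call.value(amount)();
        balances[msg.sender] -= amount;
    }
}"""
    for lineno, rule in scan(contract):
        print(lineno, rule)  # prints "5 tx-origin", then "6 call-value"
```

Each finding carries a source line number, so a report can point developers directly at the offending statement, which bytecode-level tools cannot easily do.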

For future work, we have the following plans:

  • Establishing a comprehensive, reasonable, accurate, and up-to-date code problem criterion for Ethereum smart contracts.

  • Based on this criterion, extending SolidityCheck to detect the problems it currently cannot detect.

  • Continuously optimizing SolidityCheck to further improve detection quality, performance, and stability.

9 Acknowledgements

The work is supported by the National Natural Science Foundation of China under Grant No. 61572171, the Natural Science Foundation of Jiangsu Province under Grant No. BK20191297, and the Fundamental Research Funds for the Central Universities under Grant No. 2019B15414.

References

  • [1] G. Wood et al., “Ethereum: A secure decentralised generalised transaction ledger,” Ethereum project yellow paper, vol. 151, no. 2014, pp. 1–32, 2014.
  • [2] L. Luu, D.-H. Chu, H. Olickel, P. Saxena, and A. Hobor, “Making smart contracts smarter,” in Proceedings of the 2016 ACM SIGSAC conference on computer and communications security.   ACM, 2016, pp. 254–269.
  • [3] S. Tikhomirov, E. Voskresenskaya, I. Ivanitskiy, R. Takhaviev, E. Marchenko, and Y. Alexandrov, “SmartCheck: Static analysis of Ethereum smart contracts,” in 2018 IEEE/ACM 1st International Workshop on Emerging Trends in Software Engineering for Blockchain (WETSEB).   IEEE, 2018, pp. 9–16.
  • [4] I. Nikolić, A. Kolluri, I. Sergey, P. Saxena, and A. Hobor, “Finding the greedy, prodigal, and suicidal contracts at scale,” in Proceedings of the 34th Annual Computer Security Applications Conference.   ACM, 2018, pp. 653–663.
  • [5] S. Kalra, S. Goel, M. Dhawan, and S. Sharma, “ZEUS: Analyzing safety of smart contracts,” in NDSS, 2018.
  • [6] J. Krupp and C. Rossow, “teEther: Gnawing at Ethereum to automatically exploit smart contracts,” in 27th USENIX Security Symposium (USENIX Security 18), 2018, pp. 1317–1333.
  • [7] C. F. Torres, J. Schütte et al., “Osiris: Hunting for integer bugs in Ethereum smart contracts,” in Proceedings of the 34th Annual Computer Security Applications Conference.   ACM, 2018, pp. 664–676.
  • [8] R. Fontein, “Comparison of static analysis tooling for smart contracts on the EVM,” in 28th Twente Student Conference on IT, 2018.
  • [9] I. Grishchenko, M. Maffei, and C. Schneidewind, “Foundations and tools for the static analysis of Ethereum smart contracts,” in International Conference on Computer Aided Verification.   Springer, 2018, pp. 51–78.
  • [10] R. M. Parizi, A. Dehghantanha, K.-K. R. Choo, and A. Singh, “Empirical vulnerability analysis of automated smart contracts security testing on blockchains,” in Proceedings of the 28th Annual International Conference on Computer Science and Software Engineering.   IBM Corp., 2018, pp. 103–113.
  • [11] M. I. Mehar, C. L. Shier, A. Giambattista, E. Gong, G. Fletcher, R. Sanayhie, H. M. Kim, and M. Laskowski, “Understanding a revolutionary and flawed grand experiment in blockchain: The DAO attack,” Journal of Cases on Information Technology (JCIT), vol. 21, no. 1, pp. 19–32, 2019.
  • [12] T. Chen, Y. Zhu, Z. Li, J. Chen, X. Li, X. Luo, X. Lin, and X. Zhang, “Understanding Ethereum via graph analysis,” in IEEE INFOCOM 2018 - IEEE Conference on Computer Communications.   IEEE, 2018, pp. 1484–1492.
  • [13] I. Grishchenko, M. Maffei, and C. Schneidewind, “A semantic framework for the security analysis of Ethereum smart contracts,” in International Conference on Principles of Security and Trust.   Springer, 2018, pp. 243–269.
  • [14] E. Albert, J. Correas, P. Gordillo, G. Román-Díez, and A. Rubio, “SAFEVM: A safety verifier for Ethereum smart contracts,” arXiv preprint arXiv:1906.04984, 2019.
  • [15] A. Tann, X. J. Han, S. S. Gupta, and Y.-S. Ong, “Towards safer smart contracts: A sequence learning approach to detecting vulnerabilities,” arXiv preprint arXiv:1811.06632, 2018.
  • [16] P. Tsankov, A. Dan, D. Drachsler-Cohen, A. Gervais, F. Buenzli, and M. Vechev, “Securify: Practical security analysis of smart contracts,” in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security.   ACM, 2018, pp. 67–82.
  • [17] T. Chen, X. Li, X. Luo, and X. Zhang, “Under-optimized smart contracts devour your money,” in 2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER).   IEEE, 2017, pp. 442–446.
  • [18] S. Bragagnolo, H. Rocha, M. Denker, and S. Ducasse, “SmartInspect: Solidity smart contract inspector,” in 2018 International Workshop on Blockchain Oriented Software Engineering (IWBOSE).   IEEE, 2018, pp. 9–18.
  • [19] P. He, G. Yu, Y. Zhang, and Y. Bao, “Survey on blockchain technology and its application prospect,” Computer Science, vol. 44, no. 4, pp. 1–7, 2017.
  • [20] N. Atzei, M. Bartoletti, and T. Cimoli, “A survey of attacks on Ethereum smart contracts (SoK),” in International Conference on Principles of Security and Trust.   Springer, 2017, pp. 164–186.
  • [21] Ethereum, “Solidity official documents,” https://solidity.readthedocs.io/en/v0.5.10/, accessed June 27, 2019.
  • [22] B. Jiang, Y. Liu, and W. Chan, “ContractFuzzer: Fuzzing smart contracts for vulnerability detection,” in Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering.   ACM, 2018, pp. 259–269.
  • [23] F. Vogelsteller and V. Buterin, “EIP-20: ERC-20 token standard,” https://eips.ethereum.org/EIPS/eip-20/, accessed Oct 4, 2019.
  • [24] W. Entriken, D. Shirley, and J. Evans, “EIP-721: ERC-721 token standard,” https://github.com/ethereum/EIPs/blob/master/EIPS/eip-721.md, accessed Oct 9, 2019.
  • [25] C. Reitwießner, N. Johnson, and F. Vogelsteller, “EIP-165: ERC-165 standard interface detection,” https://github.com/ethereum/EIPs/blob/master/EIPS/eip-165.md, accessed Oct 9, 2019.
  • [26] BeautyChain, “BEC token contract source code,” https://etherscan.io/address/0xc5d105e63711398af9bbff092d4b6769c82f793d#contracts, accessed June 27, 2019.
  • [27] OpenZeppelin, “SafeMath,” https://github.com/OpenZeppelin/openzeppelin-solidity/blob/master/contracts/math/SafeMath.sol, accessed May 20, 2019.
  • [28] Trail of Bits, “Vulnerable smart contracts,” https://github.com/crytic/not-so-smart-contracts, accessed June 27, 2019.
  • [29] Etherscan, “Market value of Ethereum,” https://etherscan.io/, accessed May 20, 2019.
  • [30] GitHub Inc., “GitHub: Build software better, together,” https://github.com/, accessed Oct 10, 2019.
  • [31] Ethereum, “Remix: Ethereum IDE,” https://github.com/ethereum/remix-ide, accessed June 27, 2019.
  • [32] ConsenSys, “Security analysis tool for EVM bytecode. Supports smart contracts built for Ethereum, Quorum, VeChain, RootStock, Tron and other EVM-compatible blockchains,” https://mythx.io, accessed Aug 30, 2019.
  • [33] melonproject, “An analysis tool for smart contracts,” https://github.com/melonproject/oyente, accessed Aug 30, 2019.
  • [34] Protofire, “This is an open source project for linting Solidity code,” https://github.com/protofire/solhint, accessed Oct 17, 2019.
  • [35] ChainSecurity and ICE Center, “Securify: Security scanner for Ethereum smart contracts,” https://securify.chainsecurity.com/, accessed June 27, 2019.
  • [36] SmartDec, “SmartCheck, a static analysis tool that detects vulnerabilities and bugs in Solidity programs (Ethereum-based smart contracts),” https://tool.smartdec.net/, accessed May 20, 2019.
  • [37] gongbell, Anqi7721, and FrankLinliu, “The Ethereum smart contract fuzzer for security vulnerability detection,” https://github.com/gongbell/ContractFuzzer, accessed Oct 10, 2019.
  • [38] christoftorres, “A tool to detect integer bugs in Ethereum smart contracts,” https://github.com/christoftorres/Osiris, accessed Oct 10, 2019.
  • [39] OpenZeppelin, “Representative, problematic smart contracts,” https://ethernaut.openzeppelin.com, accessed Oct 14, 2019.
  • [40] G. Destefanis, M. Marchesi, M. Ortu, R. Tonelli, A. Bracciali, and R. Hierons, “Smart contracts vulnerabilities: a call for blockchain software engineering?” in 2018 International Workshop on Blockchain Oriented Software Engineering (IWBOSE).   IEEE, 2018, pp. 19–25.
  • [41] S. Wang, L. Ouyang, Y. Yuan, X. Ni, X. Han, and F.-Y. Wang, “Blockchain-enabled smart contracts: Architecture, applications, and future trends,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2019.
  • [42] W. Zou, D. Lo, P. S. Kochhar, X. D. Le, X. Xia, Y. Feng, Z. Chen, and B. Xu, “Smart contract development: Challenges and opportunities,” IEEE Transactions on Software Engineering, pp. 1–1, 2019.