Domain Specific Code Smells in Smart Contracts

05/04/2019 ∙ by Jiachi Chen, et al. ∙ Singapore Management University, Monash University

Smart contracts are programs running on a blockchain. They cannot be patched once deployed, so it is critical to ensure they are bug-free and well-designed before deployment. Code smells are symptoms in source code that possibly indicate deeper problems. Detecting code smells is a way to avoid potential bugs and improve the design of existing code. However, traditional code smell patterns are designed for centralized OO programs, e.g., in Java or C++, while smart contracts are decentralized and have numerous distinctive features, such as the gas system. To fill this gap, we collected smart-contract-related posts from Stack Exchange, as well as real-world smart contracts. We manually analyzed these posts and defined 20 kinds of code smells for smart contracts, which we categorized into security, architecture, and usability problems. To validate whether practitioners consider these contract smells harmful, we created an online survey and received 96 responses from 24 different countries. The feedback showed that these code smells are harmful and that removing them would improve the quality and robustness of smart contracts. We manually identified our defined code smells in 587 contract accounts and publicly released our dataset. Finally, we summarized 5 impacts caused by contract code smells. These help developers better understand the symptoms of the smells and their removal priority.

I Introduction

The considerable success of decentralized cryptocurrencies has attracted great attention from both industry and academia. Bitcoin [35] and Ethereum [27, 44] are the two most popular cryptocurrencies, whose global market cap reached $162 billion by April 2018 [3]. Blockchain is the underlying technology of cryptocurrencies; it runs a consensus protocol to maintain a shared ledger that secures the data on the blockchain. Both Bitcoin and Ethereum allow users to encode rules or scripts for processing transactions. However, scripts on Bitcoin are not Turing-complete, which restricts the scenarios in which they can be used. Unlike Bitcoin, Ethereum provides a more advanced technology named Smart Contracts.

Smart contracts are Turing-complete programs that run on the blockchain, in which a consensus protocol ensures their correct execution [27]. With the assistance of smart contracts, developers can apply blockchain techniques to different fields like gaming and finance. When developers deploy smart contracts to Ethereum, the source code of the contracts is compiled into bytecode and resides on the blockchain. Once a smart contract is created, it is identified by a 160-bit hexadecimal address, and anyone can invoke the smart contract by sending transactions to the corresponding contract address. Ethereum uses the Ethereum Virtual Machine (EVM) to execute smart contracts, and transactions are stored on its blockchain.

However, a blockchain ensures that all data on it is immutable, i.e., cannot be modified, which means that smart contracts cannot be patched when bugs are detected or feature additions are desired. In contrast to classical distributed applications, smart contracts on Ethereum operate on a permission-less network. Arbitrary developers, even attackers, can call the methods to execute the contracts. The famous DAO attack [4] caused the DAO (Decentralized Autonomous Organization) to lose 3.6 million Ethers ($150/Ether in Feb 2019), which led to a controversial hard fork [10, Etc] of Ethereum.

It is thus critical to ensure that smart contracts are bug-free and well-designed before deploying them to the blockchain. In software engineering, code smells are the symptoms in the source code that possibly indicate deeper problems [40]. Code smells are related to not only security issues but also design flaws which might slow down development or increase the risk of bugs or failures in the future. Detecting and refactoring out code smells helps increase software robustness and enhance development efficiency [42, 31].

There are many code smell detection tools [11, 12, 16, 15] which have been widely applied in industry. However, these tools are designed for traditional centralized software and are based on the 22 code smells defined by Martin et al. [28]. These traditional code smells are not sufficient for decentralized smart contracts because of their new features, such as the gas system and decentralization. For example, the number one code smell in the stink parade is duplicated code [28]. To remove code duplicated in two unrelated classes, the standard method is to extract the duplicated code into one class and then use the new component from the other classes. This method removes the code smell in traditional software and makes programs easier to read. However, for smart contracts, calling functions of other smart contracts is "gas-consuming" and can lead to an economic loss.

In this paper, we conduct an empirical study on defining code smells for smart contracts on Ethereum, which is the most popular decentralized platform that runs smart contracts. Some previous works [33, 36, 30] focus on improving the quality of smart contracts from the security aspect, but this is the first paper that aims to provide a systematic study of code smells from three aspects (i.e., security, architecture, and usability). Our results are derived from 17,128 Ethereum Stack Exchange (https://ethereum.stackexchange.com/) posts and have been validated by an online survey. To help developers better understand the symptoms and distribution of smart contract code smells, we manually labeled a dataset and released it publicly to support further study. In this paper, we address the following research questions:

RQ1: What are the code smells in smart contracts?

We defined 20 code smells from Stack Exchange posts and real-world smart contracts. These 20 kinds of code smells cover security, architecture, and usability aspects. Removing the defined code smells from contracts is likely to improve the quality and robustness of the programs.

RQ2: How do practitioners perceive the code smells we identify?

To validate the acceptance of our newly defined smart contract code smells, we conducted an online survey and received 96 responses and 62 comments from developers in 24 countries. The options in the survey range from 'Very important' to 'Very unimportant', and we give each option a score from 5 to 1, respectively. The average score over the code smells is 4.28. The feedback and comments show that developers believe removing the defined code smells can improve the quality and robustness of smart contracts.

RQ3: What are the distributions and impacts of the code smells in real-world smart contracts?

We manually labeled 2,522 smart contracts on 587 contract accounts. We found that more than 99% of smart contracts contain at least one of our defined code smells. Besides, we summarized 5 impacts that can help researchers and developers better understand the symptoms of these code smells.

The main contributions of this paper are:

  • We define 20 code smells for smart contracts considering three aspects: security, architecture, and usability. We list symptoms and give a code example of each code smell, which can help developers better understand the defined code smells.

  • We manually identify whether the defined 20 code smells exist in real-life smart contracts. Our dataset (available at https://github.com/CodeSmell2019/CodeSmell) contains a collection of 2,522 smart contracts on 587 contract accounts, which can assist future studies on smart contract analysis and testing. Also, we analyze the impacts of the defined code smells and summarize 5 common impacts. These impacts can help developers decide the priority of code smell removal.

The remainder of this paper is organized as follows. In Section 2, we provide background knowledge of smart contracts. In Sections 3-5, we present the answers to the three research questions, respectively. We discuss the implications and threats to validity in Section 6. Finally, we elaborate the related work in Section 7, and conclude the whole study and mention future work in Section 8.

II Background

Smart Contract - A Decentralized Program. A smart contract is "a computerized transaction protocol that executes the terms of contract" [39]. The bytecode and transactions of smart contracts are all stored on the blockchain and visible to all users. Since Ethereum is an append-only distributed ledger, once smart contracts are deployed to the blockchain they cannot be modified, even when bugs are detected. Once a smart contract is created, it is identified by a unique 160-bit hexadecimal string referred to as its contract address. The execution of a smart contract depends on its code. For example, if a contract does not contain functions that can transfer Ethers, even the creator cannot withdraw the Ethers. Once smart contracts are deployed, they exist as long as the whole network exists unless they execute the selfdestruct function [22]. When selfdestruct is executed, the contract disappears and its balance is transferred to a specified address. In this paper, we describe smart contracts developed using Solidity, the most popular smart contract programming language on Ethereum.

The Gas System. In Ethereum, miners run smart contracts on their machines. As compensation for miners who contribute their computing resources, the creators and users of smart contracts pay a certain amount of Ethers to the miners. The Ethers paid to miners are computed as: gas cost * gas price. Gas cost depends on the computational resources the transaction takes, and gas price is offered by the transaction creator. The minimum unit of gas price is Wei (1 Ether = 10^18 Wei). Miners have the right to choose which transactions are executed and broadcast to the other nodes on the blockchain [44]. Therefore, if the gas price is too low, a transaction may not be executed. Besides, to limit the gas cost, when a user sends a transaction to invoke a contract, there is a limit (Gas Limit) that determines the maximum gas cost. If the gas cost exceeds the Gas Limit, the execution is terminated with an exception often referred to as an out-of-gas error.
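The fee formula above can be made concrete with a short calculation. The sketch below is illustrative Python, not contract code; the function name and the sample numbers (21,000 gas at a gas price of 2 * 10^9 Wei, with a Gas Limit of 30,000) are assumptions chosen for the example and do not come from the study.

```python
WEI_PER_ETHER = 10**18  # 1 Ether = 10^18 Wei


class OutOfGas(Exception):
    """Raised when a transaction needs more gas than its Gas Limit allows."""


def transaction_fee_wei(gas_cost: int, gas_price_wei: int, gas_limit: int) -> int:
    """Return the fee paid to the miner (gas cost * gas price), in Wei.

    Simplification: this toy model only signals the out-of-gas condition;
    it does not model what the sender pays for a failed execution.
    """
    if gas_cost > gas_limit:
        raise OutOfGas(f"needs {gas_cost} gas, limit is {gas_limit}")
    return gas_cost * gas_price_wei


# Hypothetical transaction: 21,000 gas at a gas price of 2 * 10^9 Wei.
fee = transaction_fee_wei(gas_cost=21_000, gas_price_wei=2 * 10**9, gas_limit=30_000)
print(fee)                  # 42000000000000 (Wei)
print(fee / WEI_PER_ETHER)  # the same fee expressed in Ether
```

Keeping all amounts as integers in Wei mirrors how balances are represented on-chain and avoids rounding until the final conversion for display.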

Data Location. In smart contracts, data can be stored in storage, memory, or calldata [22]. storage is a persistent memory area. For each storage variable, the EVM assigns a storage slot ID to identify it. Reading and writing storage variables is the most expensive of the three locations. The second area is named memory; the data of memory variables is released when their life cycle ends, and reading and writing memory is cheaper than storage. calldata is only valid for parameters of external contract functions. Reading data from calldata is much cheaper than reading from memory or storage.

Fallback Function. The fallback function [22] is the only unnamed function in Ethereum. This function cannot have arguments or return values. It is executed when a call does not match any function of the contract. For example, if a user calls function "A" but the callee contract does not contain this function, the fallback function is executed to handle the call. Also, if a fallback function is marked payable (a function must be marked payable in order to receive Ethers), it is executed automatically when the contract receives Ethers. It is worth noting that the transfer and send functions, which Solidity provides to send Ethers to another smart contract, limit the gas of the fallback function in the callee contract to 2,300 gas [22]. This gas is not enough to write to storage, call functions, or send Ethers.

III RQ1: Code Smells in Smart Contracts

III-A Motivation

Smart contracts cannot be patched after they are deployed to the blockchain. Removing code smells from programs is a good way to ensure their robustness. As we mentioned in Section 1, the new features of smart contracts make traditional code smells insufficient for them. To fill this gap, we define code smells from Stack Exchange posts. We give definitions and examples of code smells specialized for Ethereum smart contracts.

III-B Approach

Stack Exchange Posts: To define code smells for smart contracts, we collected issues that developers encountered. Programmers often collaborate and share experience on Q&A sites like Ethereum Stack Exchange [9], the most popular and widely used question and answer site for users of Ethereum. By analyzing posts on Ethereum Stack Exchange, we can summarize code smells on Ethereum. In this paper, we crawled 17,128 Stack Exchange posts and analyzed them further. However, it is time-consuming to find important information in thousands of Q&A posts. Therefore, we used keywords to filter the posts. To ensure the completeness of our keyword list, two authors of this paper read the Solidity documentation carefully and recorded the keywords they considered important (these keywords are also published). After that, they merged their keyword lists and used the keywords to filter Stack Exchange posts. When reading the posts, we added new keywords to enrich our list and filter new posts. We finally used 66 keywords to filter 4,141 posts.

Category | Description
Gas Limitation | Bugs caused by gas limitation.
Permission Check | Bugs caused by permission check failure.
Inappropriate Logic | There are inappropriate logics inside a contract, which can be utilized by attackers.
Specialized DApp Features | Developers do not realize there are some differences between traditional apps and DApps (for example, Ethereum does not support concurrency), so they make mistakes.
Version Gaps | Errors due to the update of Ethereum or Solidity.
Inappropriate Standard | Ethereum provides several standards, but many contracts do not follow them.
TABLE I: Classification scheme.

Open Card Sorting: We followed the card sorting [38] approach to analyze and categorize the filtered posts. We created one card for each post. The card contains the post's title, body, and comments. Two authors of this paper who have rich experience in smart contract development worked together to determine the labels of the posts. The detailed steps are:

Iteration 1: We randomly chose 20% of the cards, and two authors discussed their root causes. If the root cause of a post was unclear, we omitted it from our card sort. All the themes were generated during the sorting.

Iteration 2: Two authors independently categorized the remaining 80% of the cards into the initial classification scheme. We found that Inappropriate Standard was common in the remaining cards, and we ended up with 6 themes; the detailed information is shown in Table I. We used Cohen's Kappa [25] to measure the agreement between the two authors. Their overall Kappa value is 0.82, indicating strong agreement.
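For readers unfamiliar with the agreement measure, the following Python sketch shows how Cohen's Kappa is computed for two raters. The label lists are hypothetical and only reuse category names from Table I; they are not the study's data and do not reproduce the reported value of 0.82.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of each rater's label share.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1:
        return 1.0  # both raters used one identical label for every item
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical labels for ten cards (illustration only).
G, P, L = "Gas Limitation", "Permission Check", "Inappropriate Logic"
rater_1 = [G, G, P, L, G, P, L, L, G, P]
rater_2 = [G, G, P, L, P, P, L, G, G, P]
print(round(cohens_kappa(rater_1, rater_2), 2))
```

Unlike raw percentage agreement, the statistic discounts the agreement two raters would reach by chance given how often each of them uses each label.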

Definition: After categorizing the filtered posts, we summarized 6 high-level root causes from the Stack Exchange posts. Then, two authors of this paper read the posts in each category again, aiming to find more detailed behaviors to serve as the definitions of code smells. Finally, we summarized 16 code smells. Two examples follow:

Example 1: Nested Call

Post Example: "Too much gas consumption … This method can be executed with around 500k gas … ".

Defining Process: The post belongs to 'Gas Limitation'. It describes a smart contract with very high gas consumption. We carefully analyzed the code example given in the post and found that it executes the CALL instruction inside a for loop whose length is not limited. Since the number of iterations is unbounded, it causes an out-of-gas error. Finally, we define Nested Call as 'Executing CALL instruction inside an unlimited-length loop.'

Example 2: High Gas Consumption Function Type

Post Example: "What are the best practices of using external vs. public keyword? … public functions are expensive … "

Defining Process: The post belongs to 'Specialized DApp Features' because the keyword 'external' is a new feature in Solidity compared to traditional programming languages like Java or C++. The post asks about the difference between external and public functions. The answer explains that the biggest difference is that 'public' requires more gas than 'external'. So, we defined the code smell 'High Gas Consumption Function Type' as 'Using inappropriate function type which can increase gas consumption'.

Code Smell | Definition
Unchecked External Calls | Do not check the return value of external call functions.
DoS Under External Influence | Throwing exceptions inside a loop which can be influenced by external users.
Strict Balance Equality | Using strict balance equality to determine the execute logic.
Unmatched Type Assignment | Assigning unmatched type to a value, which can lead to integer overflow.
Transaction State Dependency | Using tx.origin to check the permission.
Re-entrancy | The re-entrancy bugs.
Hard Code Address | Using hard code address inside smart contracts.
Block Info Dependency | Using block information related APIs to determine the execute logic.
Nested Call | Executing CALL instruction inside an unlimited-length loop.
Deprecated APIs | Using discarded or unrecommended APIs or instructions.
Unspecified Compiler Version | Do not fix the smart contract to a specific version.
Misleading Data Location | Do not clarify the reference types of local variables of struct, array or mapping.
Unused Statement | Creating values which are never used.
Unmatched ERC-20 Standard | Do not follow the ERC-20 standard for ICO contracts.
Missing Return Statement | A function denotes the type of return values but does not return anything.
Missing Interrupter | Missing backdoor mechanism in order to handle emergencies.
Missing Reminder | Missing events to notify caller whether some functions are successfully executed.
Greedy Contract | A contract can receive Ethers but cannot withdraw Ethers.
High Gas Consumption Function Type | Using inappropriate function type which can increase gas consumption.
High Gas Consumption Data Type | Using inappropriate data type which can increase gas consumption.
TABLE II: Definitions of the 20 code smells.

Besides, to assist future studies on smart contract analysis and testing, we identified whether the defined code smells exist in 2,522 real-world smart contracts from 587 contract addresses. We first crawled all 72,723 open-sourced smart contracts from 17,013 contract accounts on Ethereum. We randomly chose the smart contracts on 600 contract accounts. Then, we filtered out the smart contracts on 13 contract accounts because they do not contain any functions. Finally, we obtained 2,522 smart contracts with 231,098 lines of code from the 587 contract accounts. The total balance of these accounts is more than 4 million Ethers. During the labeling process, we found another 4 code smells which are common in real-world smart contracts. In total, we defined 20 code smells.

III-C Results

We define and give examples of each smell in three categories, i.e., Security Smells, Architecture Smells, and Usability Smells. We first give a brief definition of each code smell in Table II. Then, we give detailed definitions and code examples in the following paragraphs:

III-C1 Security Smell

In this subsection, we identify 7 code smells that can lead to security issues. These may be exploited by attackers to gain financial benefits or attack vulnerable contracts.

(1) Unchecked External Calls: To transfer Ethers or call functions of other smart contracts, Solidity provides a series of external call functions for raw addresses, i.e., address.send(), address.call(), address.delegatecall() [22]. Unfortunately, these methods may fail due to network errors or out-of-gas errors, e.g., the 2,300 gas limitation of the fallback function introduced in Section II. When errors happen, these methods return the boolean value false but never throw an exception. If callers do not check the return values of external calls, they cannot know whether the call succeeded and whether the subsequent code logic is still correct.

Example: An example of this code smell is given in Listing 1. In function getWinner (L21), the contract does not check the return value of send (L24), but the array participators is emptied by resetting participatorID to 0 (L25). In this case, if the send method fails, the winner loses 8 Ethers.
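The effect of ignoring the return value can be traced with a toy model. The Python sketch below is an illustration under stated assumptions, not Solidity and not the paper's tooling: Account, send, and the two pay_winner functions are hypothetical names, and a recipient with accepts_payment=False stands in for a callee whose fallback function fails.

```python
class Account:
    """Toy account; accepts_payment=False models a recipient whose fallback fails."""

    def __init__(self, balance=0, accepts_payment=True):
        self.balance = balance
        self.accepts_payment = accepts_payment


def send(sender, recipient, amount):
    """Mimic the behavior described above: report failure with False, never raise."""
    if not recipient.accepts_payment or sender.balance < amount:
        return False
    sender.balance -= amount
    recipient.balance += amount
    return True


def pay_winner_unchecked(contract, winner, state):
    send(contract, winner, 8)      # return value ignored (the smell)
    state["participator_id"] = 0   # round is reset even if the payment failed


def pay_winner_checked(contract, winner, state):
    if not send(contract, winner, 8):  # check the result before changing state
        raise RuntimeError("payment failed; round is not reset")
    state["participator_id"] = 0


contract = Account(balance=10)
winner = Account(accepts_payment=False)
state = {"participator_id": 10}
pay_winner_unchecked(contract, winner, state)
print(winner.balance, state["participator_id"])  # 0 0: prize not paid, round reset anyway
```

In the checked variant the failure surfaces as an error and the round state is left untouched, so the payment can be retried.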

(2) DoS Under External Influence: When an exception is detected, the smart contract will rollback the transaction. However, throwing exceptions inside a loop is dangerous.

Example: In line 31 of Listing 1, the contract uses transfer to send Ethers. However, in Solidity, transfer and send limit the gas of the fallback function in callee contracts to 2,300 gas [22]. This gas is not enough to write to storage, call functions, or send Ethers. If one of members[i] is an attacker contract, the transfer function (L31) can trigger an out-of-gas exception due to the 2,300 gas limitation, and the contract state is then rolled back. Since the code cannot be modified, the contract cannot remove the attacker's address from the members list, which means that as long as the attacker keeps attacking, no one can receive the bonus.
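The all-or-nothing behavior described above can be sketched in a few lines of Python. This is a simplified model with hypothetical names (Revert, give_bonus, the rejecting set); a member listed in rejecting plays the role of the attacker contract whose fallback function always fails.

```python
class Revert(Exception):
    """Models a failed transfer that aborts the whole transaction."""


def give_bonus(balances, members, rejecting):
    """Push a bonus of 1 unit to every member; any failure rolls everything back."""
    snapshot = dict(balances)
    try:
        for member in members:
            if member in rejecting:
                raise Revert(member)
            balances[member] = balances.get(member, 0) + 1
    except Revert:
        balances.clear()
        balances.update(snapshot)  # state rollback: earlier payments are undone
        return False
    return True


balances = {}
print(give_bonus(balances, ["alice", "mallory", "bob"], rejecting={"mallory"}))  # False
print(balances)  # {}: alice's bonus was rolled back too
```

Because the member list cannot be edited after deployment, every later call fails the same way; letting each member withdraw their own bonus in a separate transaction is a commonly suggested way to avoid this coupling.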

(3) Strict Balance Equality: Attackers can forcibly send Ethers to any contract by using the selfdestruct(victim_address) API [22]. This does not trigger the fallback function, meaning the victim contract cannot reject the Ethers. Therefore, logic that depends on the contract balance being exactly equal to some value fails because of the unexpected Ethers sent by attackers.

Example: Attackers can send 1 Wei (1 Ether = 10^18 Wei) to contract Gamble in Listing 1 by using the selfdestruct method. This method does not trigger the fallback function (L11). Thus, the Ethers are not rejected by ReceiveEth (L13). If this attack happens, getWinner() (L21) is never executed, because getWinner can only be executed when the balance of the contract is strictly equal to 10 Ethers (L19).
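A small Python sketch makes the arithmetic of this attack explicit. It assumes ten 1-Ether deposits, as in the Gamble description; winner_chosen and the two conditions are hypothetical helpers, and the tolerant condition is shown only as one possible alternative, not as the paper's recommended fix.

```python
ETHER = 10**18  # Wei per Ether


def winner_chosen(forced_wei, condition):
    """Simulate ten 1-Ether deposits on top of a balance forced in by an attacker."""
    balance = forced_wei  # e.g. sent via selfdestruct, bypassing the fallback function
    for _ in range(10):
        balance += ETHER
        if condition(balance):
            return True
    return False


def strict(balance):
    return balance == 10 * ETHER  # the smell: exact equality on the balance


def tolerant(balance):
    return balance >= 10 * ETHER  # tolerates unexpected extra Wei


print(winner_chosen(0, strict))    # True: no interference
print(winner_chosen(1, strict))    # False: 1 forced Wei blocks the game for good
print(winner_chosen(1, tolerant))  # True
```

With 1 Wei forced in, the balance passes from 9 * 10^18 + 1 to 10 * 10^18 + 1 and never equals 10 * 10^18 exactly, which is why the strict comparison can never succeed again.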

1 contract Gamble{
2 address owner;
3 address[] members;
4 address[] participators;
5 uint participatorID = 0;
6 modifier onlyOwner{ /*Transaction State Dependency*/
7   require(tx.origin==owner);  _; }
8 function constructor(){ //constructor function
9   owner = //this is the address of tx.origin
10        0xdCadd1D3AD; /*Hard Code Address*/}
11 function() payable{ //Executed when receiving Ethers
12   ReceiveEth();}
13 function ReceiveEth() payable{
14   if(msg.value!=1 ether){
15    revert();}//msg.value is the number of received ETHs
16   members.push(msg.sender);
17   participators[participatorID] = msg.sender;
18   participatorID++;
19   if(this.balance==10 ether){/*Strict Balance Equality*/
20        getWinner();}}
21 function getWinner(){ //choose a member to be the winner
22    /*Block Info Dependency*/
23   uint winnerID = uint(block.blockhash(block.number)) % participators.length;
24   participators[winnerID].send(8 ether);
25   participatorID = 0;}
26 function giveBonus() returns(bool){ //send 0.1 ETH to all members as bonus
27   /*Unmatched Type Assignment, Nested Call*/
28   for(var i = 0;i < members.length; i++){
29     if(this.balance > 0.1 ether)
30         /*DoS Under External Influence*/
31        members[i].transfer(0.1 ether); }
32    /*Missing Return Statement*/ }
33 function suicide(address addr) onlyOwner{ //Remove the contract from blockchain
34    selfdestruct(addr);}
35 function withDraw(uint amount) onlyOwner{ //withdraw certain Ethers to owner account
36    address receiver = 0x05f4d27;
37    receiver.call.value(amount);}}
Listing 1: A "Gamble" smart contract. Each gambler sends 1 Ether to this contract. When the contract receives 10 Ethers, it chooses one gambler as the winner and sends 8 Ethers to him. However, this contract contains several code smells.

(4) Unmatched Type Assignment: Solidity supports different types of integers (e.g., uint8, uint256). The default integer type is uint256, which supports a range from 0 to 2^256 - 1. uint8 takes less memory, but only supports numbers from 0 to 2^8 - 1. Solidity does not throw an exception when a value exceeds its maximum value. Incrementing a counter is a common operation in programming, and performing an increment without checking the maximum value may lead to overflow.

Example: The variable i in line 28 of Listing 1 is inferred to be uint8, because its initial value 0 is in the range of uint8 (0-255). If members.length is larger than 255, the value of i after 255 is 0. Thus, the loop does not stop until it runs out of gas or the balance of the account is less than 0.1 Ether.
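The wraparound can be reproduced with modular arithmetic. The Python sketch below models an unchecked 8-bit counter as described above; increment_uint8, loop_reaches, and the max_steps cutoff are hypothetical and stand in for the loop in giveBonus and for running out of gas.

```python
UINT8_MAX = 2**8 - 1  # 255


def increment_uint8(value):
    """Increment with unchecked 8-bit wraparound: 255 + 1 becomes 0."""
    return (value + 1) % 2**8


def loop_reaches(length, max_steps=1000):
    """Return True if a uint8 counter starting at 0 ever stops being below `length`."""
    i = 0
    for _ in range(max_steps):
        if not i < length:
            return True
        i = increment_uint8(i)
    return False  # the counter kept wrapping; on-chain this would end out of gas


print(increment_uint8(UINT8_MAX))  # 0
print(loop_reaches(255))           # True: i reaches 255 and the loop exits
print(loop_reaches(256))           # False: i can never reach 256
```

The listing in the paper targets the 0.4.x compiler series; compilers from Solidity 0.8.0 onward revert on arithmetic overflow by default, which turns the endless loop into a failed transaction but does not make the function usable.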

(5) Transaction State Dependency: Contracts need to check whether the caller has permission in some functions, like suicide (L33 in Listing 1). The failure of permission checks can cause serious consequences. For example, if someone passes the permission check of the suicide function, he/she can destroy the contract and steal all the Ethers. tx.origin returns the address that originally started the transaction, but it is not reliable for permission checks, since the address it returns depends on the transaction state rather than on the immediate caller.

Example: We can find this smell in line 7 of Listing 1. The contract uses tx.origin to check whether the caller has permission to execute function suicide. However, if a transaction started by the owner reaches function attack in Listing 4, which then calls the suicide function (L33 in Listing 1), the permission check is bypassed. The suicide function is meant to check whether the sender has permission to execute it, but tx.origin is the account that started the transaction, here the owner (0xdCad…d1D3AD, L10 in Listing 1), even though the immediate caller is the attacker contract. Therefore, the attacker contract can execute the suicide function and withdraw all of the Ethers in the contract.
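The difference between the two addresses can be shown with a toy model. In the Python sketch below, tx_origin and msg_sender are passed explicitly as plain strings to mimic the values the EVM would supply; the class, its methods, and the addresses "0xOwner" and "0xAttackerContract" are hypothetical.

```python
class GambleModel:
    """Toy model of the permission check in Listing 1 (not Solidity)."""

    def __init__(self, owner):
        self.owner = owner
        self.destroyed = False

    def suicide_with_tx_origin(self, tx_origin, msg_sender):
        # Smell: authorizes whoever started the transaction.
        if tx_origin != self.owner:
            raise PermissionError("not owner")
        self.destroyed = True

    def suicide_with_msg_sender(self, tx_origin, msg_sender):
        # Alternative check: authorizes only the immediate caller.
        if msg_sender != self.owner:
            raise PermissionError("not owner")
        self.destroyed = True


# The owner starts a transaction that ends up in the attacker contract,
# which then calls into the Gamble contract.
call = {"tx_origin": "0xOwner", "msg_sender": "0xAttackerContract"}

smelly = GambleModel(owner="0xOwner")
smelly.suicide_with_tx_origin(**call)
print(smelly.destroyed)  # True: the check passed although the caller is the attacker

guarded = GambleModel(owner="0xOwner")
try:
    guarded.suicide_with_msg_sender(**call)
except PermissionError:
    print(guarded.destroyed)  # False: the intermediate contract is rejected
```

Checking the immediate caller means an intermediate contract cannot borrow the owner's identity just because the owner started the transaction.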

(6) Block Info Dependency: Ethereum provides a set of APIs (e.g., block.blockhash, block.timestamp) to help smart contracts obtain block-related information, like timestamps or block hashes. Many contracts use these pieces of block information to drive their logic. However, miners can influence block information; for example, miners can vary the block timestamp by roughly 900 seconds [2]. In other words, operations that depend on block information can be controlled by miners to some extent.

Example: In line 23 of Listing 1, the contract uses blockhash to decide which member is the winner. However, the gamble is not fair, because miners can manipulate this operation.

(7) Re-entrancy: Concurrency is an important feature of traditional software. Solidity does not support it, but the functions of a smart contract can still be interrupted while running: an external invocation made with the call method hands control to the callee, which can call back into the contract before the first invocation has finished. If the invoked contract (the victim) does not correctly manage its global state around such a call, it can be attacked; this is called a re-entrancy attack.

Example: Listing 2 shows an example of re-entrancy. The attacker contract invokes Victim contract’s withDraw() function. However, Victim contract sends Ethers to attacker contract before resetting the balance. Line 6 will invoke the fallback function (L9) of attacker contract and lead to repeated invocation.

1 contract Victim {
2    mapping(address => uint) public userBalannce;
3    function withDraw(){
4        uint amount = userBalannce[msg.sender];
5            if(amount > 0){
6                msg.sender.call.value(amount)();
7                userBalannce[msg.sender] = 0;}} …}
8 contract Attacker{
9    function() payable{
10        Victim(msg.sender).withDraw();}
11    function reentrancy(address addr){
12        Victim(addr).withDraw();} …}
Listing 2: Attacker contract can attack Victim contract by utilizing Re-entrancy
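The call sequence in Listing 2 can be traced with a short simulation. The Python sketch below is a simplified model, not Solidity: VictimModel, AttackerModel, on_receive, and the deposit amounts are hypothetical, and the safe flag switches to resetting the balance before the external call, a commonly used ordering that is shown here as one possible repair.

```python
class VictimModel:
    """Toy model of the Victim contract in Listing 2."""

    def __init__(self, safe):
        self.safe = safe
        self.user_balance = {}
        self.ether = 0

    def deposit(self, user, amount):
        self.user_balance[user] = self.user_balance.get(user, 0) + amount
        self.ether += amount

    def withdraw(self, caller):
        amount = self.user_balance.get(caller, 0)
        if amount == 0 or self.ether < amount:
            return
        if self.safe:
            self.user_balance[caller] = 0  # reset before handing over control
        self.ether -= amount
        caller.on_receive(self, amount)    # external call; the callee may re-enter
        if not self.safe:
            self.user_balance[caller] = 0  # too late: re-entry already happened


class AttackerModel:
    """Plays the role of the Attacker's payable fallback function."""

    def __init__(self):
        self.received = 0

    def on_receive(self, victim, amount):
        self.received += amount
        victim.withdraw(self)  # re-enter while the first withdrawal is still running


def drain(safe):
    victim = VictimModel(safe)
    victim.deposit("honest user", 9)
    attacker = AttackerModel()
    victim.deposit(attacker, 1)
    victim.withdraw(attacker)
    return attacker.received, victim.ether


print(drain(safe=False))  # (10, 0): the attacker takes everyone's deposits
print(drain(safe=True))   # (1, 9): only the attacker's own deposit is paid out
```

In the vulnerable ordering the recorded balance is still 1 on every re-entry, so the withdrawal repeats until the contract holds no more Ether.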

1 pragma solidity ^0.4.25;/*Unspecified Compiler Version*/
2 contract SmellExample{
3    uint variable;
4    uint[] investList;
5    function() payable{}
6    function reAssignArray(){
7        /*Misleading Data Location*/
8        uint[] tmp;
9        tmp.push(0);
10        investList = tmp;}
11    function changeVariable(uint value1, uint value2){
12        /*Unused Statement*/
13        uint newValue = value1;
14        variable = value2;}
15    /*High Gas Consumption Function Type*/
16    function highGas(uint[20] a) public returns (uint){
17        return a[10]*2;}
18    function lowGas(uint[20] a) external returns (uint){
19        return a[10]*2;}}
Listing 3: SmellExample

1 contract attacker{
2    
3    function attack(address addr, address myAddr){
4        Gamble gamble = Gamble(addr);
5        gamble.suicide(myAddr);}}
Listing 4: An attacker contract by utilizing Transaction State Dependency smell.

III-C2 Architecture Smell

We define 6 code smells related to architecture. These may not be utilized by attackers but are a bad design for contracts and reduce their readability and maintainability, or lead to unpredicted vulnerabilities in the future.

(1) Hard Coded Address: Since we cannot modify smart contracts after deploying them, hard coded addresses can lead to vulnerabilities.

Example: There are two main kinds of errors this code smell can lead to. The first is Illegal Address. Ethereum uses a mixed-case address checksum to verify whether an address is legal or not. The rule is defined in EIP-55 [8]. There is an erroneous address in line 10 of Listing 1. The owner address is illegal: its last character should be 'F', but by mistake it is 'D'. Because of the illegal address, no one can withdraw the balance of this contract. The second is Suicide Address. The selfdestruct function (L34) removes the code from the blockchain and turns the contract into a suicide contract, which is potentially dangerous. If someone sends Ether to a suicide contract, the Ether is lost forever. receiver (L36) is a smart contract that contains a selfdestruct function. Its address is hardcoded in line 36 of Listing 1 and cannot be modified. If the receiver executes its selfdestruct function, it becomes a suicide contract, and all the Ethers sent to receiver are lost forever.

(2) Nested Call: The CALL instruction is very expensive (9,000 gas is paid for a non-zero value transfer as part of the CALL operation [44]). If a loop body contains a CALL operation but does not limit the number of times the loop is executed, the total gas cost has a high probability of exceeding the gas limit, because the number of iterations may be high and its upper bound is hard to know.

Example: In Listing 1, the function giveBonus (line 26) uses transfer (L31), which generates a CALL to send Ethers. Since the size of members.length (L28) is not limited, giveBonus may cause an out-of-gas error. Once this error happens, the function cannot be called successfully anymore, because there is no way to reduce members.length.
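A rough bound shows how quickly such a loop becomes uncallable. The Python sketch below uses only the 9,000 gas figure quoted above as a per-iteration lower bound; the helper names and the 8,000,000 gas limit are assumptions made for the example, and real iterations cost more than this minimum.

```python
CALL_VALUE_TRANSFER_GAS = 9_000  # per the text: paid for a non-zero value transfer


def min_loop_gas(member_count):
    """Lower bound on the gas needed to pay every member once."""
    return member_count * CALL_VALUE_TRANSFER_GAS


def max_members(gas_limit):
    """Upper bound on how many members a single transaction could pay."""
    return gas_limit // CALL_VALUE_TRANSFER_GAS


HYPOTHETICAL_GAS_LIMIT = 8_000_000  # assumed value, for illustration only

print(max_members(HYPOTHETICAL_GAS_LIMIT))                 # 888
print(min_loop_gas(889) > HYPOTHETICAL_GAS_LIMIT)          # True: 889 members cannot be paid
```

Since every deposit in Listing 1 appends to members, the list only grows, and once it passes the bound the function fails on every call.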

(3) Deprecated APIs: Some instructions are modified or discarded after hard forks. As Solidity is a young and evolving programming language, some APIs/instructions will be discarded or updated in the future. If developers use these APIs, they will need to refactor the code, which wastes resources.

Example: The CALLCODE operation will be discarded in the future [22]. In recent versions, throw, suicide, and sha3 are replaced by revert, selfdestruct, and keccak256, respectively.

(4) Unspecified Compiler Version: Different versions of Solidity may contain different APIs/instructions (see the Deprecated APIs code smell described above). In Solidity programming, developers should indicate the compiler version.

Example: In the first line of Listing 3, pragma solidity ^0.4.25 means that this contract supports compiler version 0.4.25 and above (but not v0.5.0 or later), while pragma solidity 0.4.25 means that the contract only supports compiler version 0.4.25. Since it is hard to foresee the language constructs of future versions, it is recommended to indicate a specific compiler version to avoid unnecessary bugs.

(5) Misleading Data Location: In traditional programming languages like Java or C, variables created inside a function are local variables. Their data is stored in memory, and the memory is released after the function exits. In Solidity, the data of structs, mappings, and arrays is stored in storage even if they are created inside a function. However, since storage in Solidity is not dynamically allocated, storage variables created inside a function will point to storage slot 0 by default [22] (each storage variable has its own storage slot that identifies its position). This can cause unpredictable bugs.

Example: Function reAssignArray (L6) in Listing 3 creates a local variable tmp. The default data location of tmp is storage, but the EVM cannot allocate storage dynamically. There is no space for tmp, so it instead points to storage slot 0 (the variable declared in L3 of Listing 3). As a result, each time reAssignArray is called, the state variable named variable is incremented by 1, which can cause bugs in the contract.

(6) Unused Statement: If function parameters or local variables do not affect any contract statements nor return a value, it is better to remove these to improve code readability.

Example: The function parameter value1 and the local variable newValue in function changeVariable (L11 of Listing 3) are useless because they never affect contract statements or return values. Although the compiler removes these useless statements when compiling source code to binary code, they reduce contract readability.

III-C3 Usability Smell

We define 7 code smells related to usability. Removing usability code smells can reduce cost and avoid unnecessary errors when others call contracts.

(1) Unmatched ERC-20 Standard: The ERC-20 Token Standard [5] is a technical standard on Ethereum for implementing cryptocurrency tokens. It defines a standard list of rules for Ethereum tokens to follow within the larger Ethereum ecosystem, allowing developers to accurately predict the interaction between tokens. These rules include how tokens are transferred between addresses and how data within each token is accessed. Function names, parameter types, and return values should strictly follow the ERC-20 standard. ERC-20 defines 9 functions and 2 events to ensure that ERC-20 tokens can easily be exchanged with other ERC-20 tokens. However, we find that many smart contracts omit return values or some of the functions.

Example: transfer and transferFrom are two functions defined by ERC-20. They are used to transfer tokens from one account to another. ERC-20 requires both functions to return a boolean value, but many smart contracts omit this return value, leading to errors when transferring tokens.

(2) Missing Return Statement: Some functions declare return values but do not return anything. For these, the EVM will add a default return value when the code is compiled to bytecode. Since callers may not know the source code of the callee contract, they may rely on the return value to control code execution, which can lead to unpredictable bugs.

Example: Function giveBonus (L26) in Listing 1 declares the return type bool, but the function does not return true or false. The EVM then assigns false as the default return value. If developers call this function, the return value will always be false, and some functions in the caller contracts may never be executed.

(3) Missing Interrupter: When attackers detect bugs, they can attack the contracts and steal their Ether. The DAO lost $50 million worth of Ether due to a bug in the code that allowed an attacker to repeatedly draw off Ether [4]. An interrupter is a mechanism to stop the contract when bugs are detected. Contracts cannot be modified after they are deployed to the blockchain. However, if a contract contains an interrupter, the owner of the victim contract can reduce the losses. The easiest interrupter is a selfdestruct function [22]: the Ether held by the contract can be withdrawn and the contract destroyed.

Example: When bugs are found in Listing 1, the Ether held by the contract can be stolen by attackers. Fortunately, the contract contains an interrupter in the suicide function (L33), so the owner of the contract can call suicide. The remaining Ether will then be sent to the given address. After the bugs are fixed, the contract can be redeployed.

(4) Missing Reminder: Other programs can call smart contracts through the contracts’ Application Binary Interface (ABI). The ABI is the standard way to interact with contracts in the Ethereum ecosystem, both from outside the blockchain and for contract-to-contract interaction. However, callers do not know the source code of the contracts unless the contracts are open source, so they usually lack detailed information about the functions. Throwing an event to notify the caller whether the function executed successfully can reduce unnecessary errors and gas waste.

Example: A typical scenario for this code smell is a missing reminder when receiving Ether. In Listing 1, users may not be clear about the game rules and may send an amount that is not equal to 1 Ether (lines 14, 15). The smart contract checks whether the received amount equals 1 Ether; if it does not, the Ether is returned. An invocation can fail for several reasons, so the user may mistakenly believe the failure was caused by the network and resend the Ether, which wastes gas. Adding reminders (throwing events) to notify the caller whether functions executed successfully can avoid such unnecessary failures.

(5) Greedy Contract: A contract can withdraw Ether by sending it to another address or by using the selfdestruct function. Without these withdraw-related functions, the Ether in a contract can never be withdrawn and will be locked forever. We define a contract as a greedy contract if it can receive Ether (it contains a payable fallback function) but there is no way to withdraw it.

Example: In Listing 3, the contract has a payable fallback function in line 5, which means this contract can receive Ether. However, the contract cannot send Ether to other contracts or addresses. Therefore, the Ether in this contract will be locked forever.

(6) High Gas Consumption Function Type: For public functions, Solidity immediately copies function arguments (arrays) to memory, while external functions can read directly from calldata [44]. Memory allocation is expensive, whereas reading from calldata is cheap. To lower gas consumption, if no internal function calls this function and its parameters contain an array, it is recommended to use external instead of public.

Example: In Listing 3, function highGas (L16) and function lowGas (L18) have the same capabilities. The only difference is that highGas is declared public, so it can be called by both external and internal functions, whereas lowGas is declared external, so it can only be called externally. Calling highGas costs 496 gas, while calling lowGas costs only 261 gas.
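The two measurements above translate into a per-call saving that adds up over many calls. The short Python sketch below works through the arithmetic; only the two gas figures come from the text, and the function names and the call count are illustrative assumptions.

```python
HIGH_GAS = 496  # gas reported in the text for calling highGas (public)
LOW_GAS = 261   # gas reported in the text for calling lowGas (external)


def gas_saved(num_calls):
    """Total gas saved by calling the external variant num_calls times."""
    return (HIGH_GAS - LOW_GAS) * num_calls


def relative_saving_percent():
    """Saving per call as a percentage of the public variant's cost."""
    return 100 * (HIGH_GAS - LOW_GAS) / HIGH_GAS


print(gas_saved(1))                        # 235
print(round(relative_saving_percent(), 1)) # 47.4
print(gas_saved(10_000))                   # 2350000
```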

(7) High Gas Consumption Data Type: bytes is a dynamically-sized byte array in Solidity. byte[] is similar to bytes, but bytes costs less gas than byte[] because it is packed tightly in calldata. The EVM operates on 32 bytes at a time, and byte[] always occupies multiples of 32 bytes, which wastes a great deal of space; bytes does not. Therefore, bytes takes less storage and costs less gas. To lower gas consumption, it is recommended to use bytes instead of byte[].

Example: Replacing byte[] by bytes can save a small amount of gas for each function call. However, as the contract is called more times, a large amount of gas can potentially be saved.
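The space difference can be illustrated with a simplified size model in Python. It assumes, as described above, that each byte[] element occupies a full 32-byte word while bytes is packed and only padded up to the next whole word. Length and offset words are ignored, and the function names are hypothetical.

```python
WORD = 32  # the EVM operates on 32-byte words


def packed_bytes_size(num_bytes):
    """Space for data held as `bytes`: packed, then padded up to a whole word."""
    return -(-num_bytes // WORD) * WORD  # ceiling division, then back to bytes


def byte_array_size(num_bytes):
    """Space for the same data held as `byte[]`: one full word per element."""
    return num_bytes * WORD


print(packed_bytes_size(40))  # 64
print(byte_array_size(40))    # 1280
```

In this model, 40 bytes of payload need two words as bytes but forty words as byte[], which is where the gas difference comes from.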

IV RQ2: Practitioners’ Perspective

IV-A Motivation

In Section III, we defined 20 code smells. To validate whether practitioners consider these code smells harmful, we created an online survey to collect opinions from real-world smart contract developers.

Code Smell                            Distribution  Score  No. Smells  Impacts
Unchecked External Calls              (chart)       4.64   25          IP3
DoS Under External Influence          (chart)       4.49   6           IP1
Strict Balance Equality               (chart)       4.42   5           IP1
Unmatched Type Assignment             (chart)       4.41   22          IP2
Transaction State Dependency          (chart)       4.52   5           IP1
Reentrancy                            (chart)       4.70   12          IP1
Hard Code Address                     (chart)       4.05   84          IP3
Block Info Dependency                 (chart)       4.10   42          IP3
Nested Call                           (chart)       4.42   13          IP2
Deprecated APIs                       (chart)       4.12   247         IP5
Unspecified Compiler Version          (chart)       3.92   533         IP5
Misleading Data Location              (chart)       4.46   1           IP2
Unused Statement                      (chart)       4.08   10          IP5
Unmatched ERC-20 Standard             (chart)       4.27   45          IP4
Missing Return Statement              (chart)       4.25   263         IP4
Missing Interrupter                   (chart)       4.0    525         IP4
Missing Reminder                      (chart)       4.06   27          IP4
Greedy Contract                       (chart)       4.21   8           IP2
High Gas Consumption Function Type    (chart)       4.25   423         IP5
High Gas Consumption Data Type        (chart)       4.18   0           IP5
TABLE III: Survey results, distributions, and impacts of the 20 code smells.

IV-B Approach

IV-B1 Validation Survey

We followed the guidelines of Kitchenham et al. [32] for personal opinion surveys and used an anonymous survey [41] to increase response rates. Respondents could choose to leave an email address to take part in a raffle to win two $50 Amazon gift cards. We first conducted a small-scale survey to polish our questions. Its participants gave feedback on (1) whether the descriptions of the code smells were clear and easy to understand, and (2) whether the length of each question was suitable. Finally, we modified our survey based on the feedback we collected.

IV-B2 Survey Design

To help respondents better understand the aim of our survey, we explained what a code smell is at the beginning of the survey and gave detailed definitions and examples of the 20 code smells in the related questions. We first collected the following demographic information about the respondents:

Demographics:

  • Professional smart contract developer? : Yes / No

  • Involved in open source software development? : Yes / No

  • Main role in developing smart contract.

  • Experience in years

  • Current country of residence

  • Highest educational qualification

  • Rate level of importance of 6 factors to determine proficiency

Examples of Code Smells: Next, we gave detailed definitions and examples of the 20 code smells. We asked respondents to rate the importance of these code smells, i.e., whether removing them can improve the security, reliability, or usability of a project. Since some of the defined code smells are not easy to understand, we added an option “I don’t understand” to ensure the results are reliable. Each question therefore had six options (i.e., Very important, Important, Neutral, Unimportant, Very unimportant, and I don’t understand). In addition, each question had a textbox in which respondents could give their opinions.

Other Questions: We provided a textbox so respondents could tell us if they had any other comments, questions, or concerns.

IV-B3 Recruitment of Respondents

To get a sufficient number of respondents from different backgrounds, we first sent our survey to our partners who work or study in world-famous companies or academic institutions. We also emailed the survey to 989 practitioners who contribute to open source smart-contract-related projects on GitHub. All respondents could enter their email address to take part in a raffle to win two $50 Amazon gift cards.

IV-C Results

We received 96 responses from 24 different countries, along with 62 comments on our defined code smells. 74 (80.43%) of the respondents were involved in open source software development. The top two countries of residence were China (40.63%) and the USA (7.29%). The average experience in developing smart contracts was 1.47 years. Since Ethereum was released in late 2015, we believe an average of 1.47 years shows that the respondents have good experience in developing smart contracts. Among the respondents, 60 (63.04%), 12 (13.04%), 13 (11.95%), and 4 (4.35%) described their job roles as development, testing, management, and security audit, respectively. The other 7 respondents said they have multiple roles.

Table III shows the results of our survey. The first column lists each code smell, and the second column illustrates the distribution of respondents’ choices, from “Very unimportant” (left-most red bar) to “Very important” (right-most green bar). To show the result clearly, we assign each option a score and compute the weighted average score, which is shown in the third column. Specifically, we give “Very important” a score of 5 and “Very unimportant” a score of 1.
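The weighted average score can be sketched in Python as follows. The text fixes only the two end points of the scale (5 and 1); the intermediate scores 4, 3, and 2, the exclusion of “I don't understand” answers from the average, and the response counts in the example are assumptions made for illustration.

```python
# Scores per option; only 5 and 1 are stated in the text, the rest are assumed.
SCORES = {
    "Very important": 5,
    "Important": 4,
    "Neutral": 3,
    "Unimportant": 2,
    "Very unimportant": 1,
}


def weighted_average(counts):
    """Average score over rated answers; options without a score are skipped."""
    rated = {option: n for option, n in counts.items() if option in SCORES}
    total = sum(rated.values())
    if total == 0:
        raise ValueError("no rated answers")
    return sum(SCORES[option] * n for option, n in rated.items()) / total


# Made-up response counts for one code smell.
example = {"Very important": 6, "Important": 3, "Neutral": 1, "I don't understand": 2}
print(weighted_average(example))  # 4.5
```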

We received very positive feedback from developers: almost all code smells scored higher than 4, and the average score is 4.28. The score of “Unspecified Compiler Version” is 3.92, which is still positive. To understand the reasons, we reviewed the comments about this smell. Many developers who voted “Unimportant” mentioned that the differences among minor versions within the same major version (e.g., 0.4.19 and 0.4.20) are small. However, they acknowledged that the differences among major versions (e.g., 0.4.0 and 0.5.0) are significant. In addition, some developers commented that removing this code smell is very important when they want to reuse code in the future.

Some of our smart contract code smells received negative feedback (“Unimportant” and “Very unimportant”). For example, “Unspecified Compiler Version”, “Missing Interrupter”, and “Missing Reminder” received 7, 7, and 6 negative responses, respectively. For “Missing Interrupter”, some developers mentioned that adding interrupters to smart contracts protects the interests of the contract owners, but that such a back-door mechanism may cause users to distrust the contracts. This concern makes sense, but we believe it can be addressed if contract owners add an insurance mechanism to the contracts. For example, they can define rules to detect abnormal states so that the back-door mechanism can only be executed when an abnormal state is detected. For “Missing Reminder”, we did not receive comments from respondents who chose negative options, so we emailed the developers who had left their email addresses and received three replies. All three mentioned that the smart contracts they develop are used only inside their companies. They write detailed documentation for each function, and when other developers in their companies have problems, they resolve them face to face. Therefore, this code smell is not important to them. However, we believe that if smart contracts are deployed on Ethereum and other developers can call their functions, removing this code smell can reduce potential problems.

Some positive comments we received included:

  • You provide a very good summary of some very important security checkpoints.

  • Those controls and warnings should be integrated into the Solidity compiler, and displayed in common development tools like Remix and Truffle.

  • It is nice to have such a summary of these vulnerabilities among smart contracts, I think it would be very helpful for the blockchain practitioners as well as the researchers.

  • These suggestions above are very useful to avoid various kinds of flaws.

  • Generally speaking, all of these code smells can lead to serious problems. I learned a lot from this survey.

V RQ3: Distribution and Impact of Code Smells

V-A Motivation

To help developers and researchers better understand the impacts of the defined code smells, we summarized 5 impacts and manually labeled 2,522 smart contracts from 587 contract addresses to show their distribution in real-world smart contracts. Our labeling results provide ground truth for future studies on smart contract code smell detection. Because it is not easy to remove all code smells under tight project schedules or financial constraints, the impacts and distributions of the different code smells can help developers decide which smells to fix first.

V-B Approach

We obtained 2,522 smart contracts from 587 real-world Ethereum contract addresses. Two authors independently read these smart contracts and determined whether they contained our defined code smells. Their overall Kappa value was 0.71, which indicates substantial agreement between them. After completing the labeling process, they discussed their disagreements and agreed on a final result. During labeling, they analyzed the impacts of each code smell and summarized 5 common impacts. Since the smart contracts in an account cannot be separated (all public and external functions can be called), we labeled the smart contracts at the account level.
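For readers unfamiliar with the agreement measure, the sketch below computes a Kappa value for two labelers in Python. It assumes the statistic is Cohen's kappa, which is the usual choice for two raters but is not stated explicitly above; the labels in the example are made up.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Share of items on which both raters gave the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected if each rater labeled at random with their own label rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# 1 = smell present, 0 = smell absent, for ten hypothetical contracts.
rater_a = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
rater_b = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]
print(round(cohen_kappa(rater_a, rater_b), 2))  # 0.6
```

In this toy example the raters agree on 8 of 10 contracts, but half of that agreement would be expected by chance, which is why the Kappa value (0.6) is lower than the raw agreement (0.8).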

V-C Results

The defined code smells in this paper can lead to the following five impacts, IP1 highest to IP5 lowest:

Impact 1 (IP1): The smart contracts containing the related code smells can lead to unwanted behaviors. These bugs can be triggered by attackers and they can make profits by controlling sensitive functions of the contracts like transferring Ethers.

Impact 2 (IP2): The smart contracts containing the related code smells can lead to unwanted behaviors. These can be triggered by attackers and can result in a contract being unable to work normally. Unlike IP1, attackers cannot control functions to make profits.

Impact 3 (IP3): The smart contracts containing the related code smells can lead to unwanted behaviors. Unlike IP2, these bugs will not cause fatal problems such as crashing or becoming permanently unusable, but the contracts may lose some Ether in some situations.

Impact 4 (IP4): The smart contracts containing the related code smells can work normally, but may be hard to use. When outside programs call contracts with the smell, it may lead to errors.

Impact 5 (IP5): The smart contracts containing the related code smells can work normally, but the code smells can lead to gas waste. Besides, if other smart contracts want to reuse the code, it may lead to some mistakes due to version gaps.

Table III lists the detailed distribution of each code smell (the fourth column) in our dataset and its related impact (the last column). The distributions of Impacts 1 to 5 are 4.77%, 6.98%, 23.51%, 93.86%, and 99.14%, respectively. Note that one smart contract can have multiple impacts simultaneously. Almost all smart contracts contain at least one code smell of Impact 4 or Impact 5. Deprecated APIs, Unspecified Compiler Version, Missing Interrupter, and High Gas Consumption Function Type are the most common code smells in our dataset. These code smells do not affect the functionality of the contracts, but they may have unpredictable impacts in the future. The distribution suggests that developers focus on functionality but do not consider code reuse or how to handle unpredictable behaviors caused by attackers.
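One plausible way to derive such impact distributions from account-level labels is sketched below in Python. It assumes that an impact's distribution is the share of accounts containing at least one smell of that impact; this reading is an assumption, although it is consistent with IP1, whose four smell counts in Table III sum to 28, and 28/587 ≈ 4.77%. The mapping is copied from Table III, while the four example accounts are made up.

```python
# Smell-to-impact mapping copied from the last column of Table III.
IMPACT_OF_SMELL = {
    "Unchecked External Calls": "IP3",
    "DoS Under External Influence": "IP1",
    "Strict Balance Equality": "IP1",
    "Unmatched Type Assignment": "IP2",
    "Transaction State Dependency": "IP1",
    "Reentrancy": "IP1",
    "Hard Code Address": "IP3",
    "Block Info Dependency": "IP3",
    "Nested Call": "IP2",
    "Deprecated APIs": "IP5",
    "Unspecified Compiler Version": "IP5",
    "Misleading Data Location": "IP2",
    "Unused Statement": "IP5",
    "Unmatched ERC-20 Standard": "IP4",
    "Missing Return Statement": "IP4",
    "Missing Interrupter": "IP4",
    "Missing Reminder": "IP4",
    "Greedy Contract": "IP2",
    "High Gas Consumption Function Type": "IP5",
    "High Gas Consumption Data Type": "IP5",
}


def impact_distribution(accounts):
    """Percentage of accounts that contain at least one smell of each impact.

    accounts: list of sets, each holding the smell names labeled for one account.
    """
    if not accounts:
        raise ValueError("need at least one labeled account")
    hits = {"IP1": 0, "IP2": 0, "IP3": 0, "IP4": 0, "IP5": 0}
    for smells in accounts:
        # An account counts once per impact, however many of its smells share it.
        for impact in {IMPACT_OF_SMELL[name] for name in smells}:
            hits[impact] += 1
    return {impact: 100 * count / len(accounts) for impact, count in hits.items()}


# Four made-up accounts, for illustration only.
toy_accounts = [
    {"Reentrancy", "Missing Interrupter"},
    {"Unspecified Compiler Version", "Deprecated APIs"},
    {"Hard Code Address", "Unspecified Compiler Version"},
    {"Missing Interrupter", "High Gas Consumption Function Type"},
]
print(impact_distribution(toy_accounts))
# {'IP1': 25.0, 'IP2': 0.0, 'IP3': 25.0, 'IP4': 50.0, 'IP5': 75.0}
```

Because an account can appear under several impacts, the percentages need not sum to 100, which matches the note above that one smart contract can have multiple impacts simultaneously.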

This finding is similar to that of Chen et al. [24]. They found that 96% of smart contracts are involved in no more than 5 transactions and are not used afterwards, indicating that many developers do not consider the future reuse of these contracts. We also found that ERC-20 related smart contracts are the most popular (36.11%) on Ethereum. However, 21.22% of them do not follow the ERC-20 standard. We did not find any smart contracts that contain High Gas Consumption Data Type. However, since the size of our dataset is limited and this code smell has related posts on Stack Exchange, it might appear if more contracts were investigated. In summary, our findings show that the defined code smells are ubiquitous in real-world smart contracts.

VI Discussion

VI-A Implications

For Researchers: Research Guidance. In this paper, we defined 20 code smells. Several previous studies analyzed some of them. We investigated whether existing tools can detect some of the code smells identified by our work, and we show the results in Table IV. We first collected the titles of papers published at CCS, S&P, USENIX Security, NDSS, ACSAC, ASE, FSE, ICSE, TSE, TIFS, and TOSEM from 2016 to 2018, since Ethereum went live on July 30, 2015 [13]. Then, we used the keywords “smart contract”, “Ethereum”, “blockchain”, and “Contracts” to search for papers related to smart contract technology. After that, we read the abstract of each paper to verify its relevance. Finally, we found a total of 4 related papers (i.e., Oyente [33, 18], Zeus [30], Maian [36], and ContractFuzzer [29]). We describe these four tools in Section VII. We find that 7 code smells can be detected by these existing tools, and most of them are security-related smells. These tools focus on security but do not consider the other two aspects, which practitioners consider equally important. Therefore, researchers can pay more attention to developing tools that detect the other 13 code smells.
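The keyword step of this literature search can be sketched in Python. The keywords are those listed above; treating the match as a case-insensitive substring test is an assumption, and the paper titles in the example are invented. Abstracts still have to be read manually afterwards, as described.

```python
# Keywords from the text, lower-cased for case-insensitive matching (an assumption).
KEYWORDS = ("smart contract", "ethereum", "blockchain", "contracts")


def is_candidate(title):
    """True if the title contains at least one keyword."""
    lowered = title.lower()
    return any(keyword in lowered for keyword in KEYWORDS)


def filter_titles(titles):
    """Keep the titles that should go on to manual abstract screening."""
    return [title for title in titles if is_candidate(title)]


# Invented titles, for illustration only.
titles = [
    "Finding Bugs in Ethereum Programs",
    "A Study of Compiler Optimizations",
    "Blockchain Consensus at Scale",
]
print(filter_titles(titles))
# ['Finding Bugs in Ethereum Programs', 'Blockchain Consensus at Scale']
```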

Behavior vs. Perception [26]. Whether practitioners believe a code smell is important can affect how they prioritize testing effort. The survey results and code smell distribution shown in Table III help us investigate whether practitioners’ perception is consistent with their behavior. The two most frequent code smells are “Unspecified Compiler Version” and “Missing Interrupter” (see the column No. Smells in Table III). Their survey scores are also the lowest (3.92 and 4.0 in the column Score in Table III), indicating that practitioners do not perceive them to be as important as other smells and thus pay less attention to them in practice, which causes them to appear more often than other code smells. The prevalence of these two code smells is consistent with practitioners’ perception. However, there are also inconsistent examples. According to the definitions of the 5 impacts introduced in Section V-C, IP1 causes the most serious problems. “Unchecked External Calls” has the second highest survey score (4.64), which shows that developers consider this smell very important. However, its impact is only IP3, which reveals an inconsistency between the practitioners’ perception (high survey score) and their behavior (low impact on the project). Future code smell detection tools should provide rationales that explicitly describe the connection between code smells and their impacts. This could help developers better prioritize testing efforts and understand the detection results.

Code Smells in Other Smart Contract Platforms. We propose a method that summarizes code smells from online posts. Our study focused on defining code smells for Ethereum smart contracts, but the same method can be applied to other popular blockchain platforms, e.g., EOS [6] and Hyperledger [7]. These blockchain platforms also support smart contracts and have their own unique features. There are thousands of posts on Stack Exchange related to these platforms. Researchers can analyze the related posts to find the specific features and code smells of these smart contract platforms. Our work defines 20 code smells and provides a dataset that identifies these code smells in 587 contract accounts, which points to a new direction for future research. For example, researchers can develop automatic code smell detection tools, and our dataset can be used as ground truth to validate the performance of these tools.

Code Smells                     Tools
Unchecked External Calls        Oyente, Zeus, ContractFuzzer
Reentrancy                      Oyente, Zeus, ContractFuzzer
Block Info Dependency           Oyente, Zeus, ContractFuzzer
Transaction State Dependency    Oyente, Zeus
DoS Under External Influence    Zeus
Unmatched Type Assignment       Zeus
Greedy Contract                 Maian
TABLE IV: Tools that detect some code smells identified by our study.

For Practitioners: We are the first to conduct an empirical study that analyzes a large number of Stack Exchange posts to understand and define code smells for smart contracts, and that uses an online survey to validate the acceptance of the defined code smells among real-world developers. Our results show that most of the smart contracts in our dataset contain at least one of the defined code smells. This may indicate that developers do not consider future use or how to handle unpredictable attacks. However, since smart contracts cannot be patched after deployment, considering future use and unpredictable attacks is very important. We also summarized 5 impacts of the defined code smells to help practitioners better understand the consequences. The defined code smells can serve as coding guidance for practitioners when they develop smart contracts. By removing the defined code smells, they can develop robust and well-designed smart contracts.

In addition, developing code smell detection tools is a promising commercial direction. Our online survey received many comments from managers of smart-contract-related companies. Some comments are listed in Section IV-C. They showed much interest in developing related tools and highlighted that such detection tools should be integrated into the Solidity compiler and development tools.

For Educators: Educators should emphasize the importance of removing code smells before deploying smart contracts to the blockchain. A survey [23] shows that more than 20% of the top 50 universities offered blockchain courses as of Oct. 2018. However, most courses focus on teaching the basic grammar of Solidity or blockchain-related knowledge and ignore other concerns (security, architecture, usability). The distribution of the defined code smells also indicates that many developers do not realize the importance of smart contract reuse and of handling unpredictable attacks. Educators can improve this situation by helping students better understand the impacts of the code smells. Thus, we highly recommend that educators pay more attention to teaching code-smell-related problems in smart contract development.

VI-B Threats to Validity

Internal Validity. We used keywords to filter Stack Exchange posts. The size of our keyword set determines how much manual effort is required. It is not easy to cover all keywords, which means we may not cover all code smells. Due to time and human resource limitations, we defined 20 code smells in this study, but researchers can define more code smells by using our method. To reduce this threat, we manually labeled 2,522 smart contracts from 587 contract addresses to validate the existence of these code smells. To make the labeling process more stable, we followed the card sorting process, and two authors labeled the smart contracts independently. However, it is still possible that some errors exist in our dataset because of misunderstandings of the smart contracts.

The impact of smart contract code smells depends on our understanding of each code smell. However, different researchers and developers may have different understandings. To minimize this threat, we read the related posts and real-world examples and held discussions with several smart contract developers to help improve correctness. We also considered feedback and comments from our survey.

It is possible that many developers had a poor understanding of the defined code smells when taking our survey. To address this, we added an option “I don’t understand” and removed these responses when analyzing the data. Three Chinese authors of this paper translated and reviewed the survey to make sure the translation is correct.

External Validity. Solidity is a fast-growing programming language. In 2018, 9 versions were released [17], which means many features may be added or removed in the future. Ethereum can also be updated through hard forks [10]. The latest hard fork, named Constantinople, will happen in the first half of 2019 [14]. Constantinople will add five new Ethereum Improvement Proposals (EIPs) to make proof-of-work more energy efficient. Some new opcodes will be added (e.g., CREATE2) and some opcodes will be modified (e.g., SSTORE). This means new code smells may be created or existing code smells may be modified. Thousands of new smart contracts may quickly be deployed to the blockchain. The distribution of the code smells in real-world smart contracts may change with new developments in smart contract technology. Many new posts are uploaded to Stack Exchange, and these posts can expose new code smells. Our method can also be applied to this situation, but it needs further effort.

VII Related Work

VII-A Code Smells Detection on Centralized Software

Code smell detection has a long history. Webster [43] wrote the first book about code smells in traditional desktop software development. The book introduced the pitfalls of object-oriented development to help developers avoid potential problems. Object-Oriented Design Heuristics [37] defined 61 design patterns to help developers make the right design decisions. The design patterns emphasized the relationships between classes and objects, and these patterns help developers write high-quality software. The term “code smell” was first introduced in a book by Beck [28]. His book defined 22 code smells; the term and these 22 code smells have been widely used, especially in agile development. Mantyla [34] clarified the effects of these 22 code smells and proposed classifications for them. Removing code smells can improve code robustness and development efficiency. However, decentralized apps (e.g., smart contracts) are very different from traditional software, as we introduced in Section I.

VII-B Bug Detection Tools of Smart Contracts

Oyente [33, 18] is the first bug detection tool for smart contracts. It uses symbolic execution to detect four security issues, i.e., mishandled exception, transaction-ordering dependence, timestamp dependence, and reentrancy attack. First, Oyente builds a skeletal control flow graph for the input contracts. Then, it faithfully simulates EVM code and executes the instructions to produce a set of symbolic traces. After that, Oyente defines different patterns to check whether the tested contracts contain the security problems. Oyente analyzed 19,366 existing Ethereum contracts and found that 8,519 of them contain the defined security problems.

Kalra et al. [30] found many false positives and false negatives in Oyente’s results. They developed a tool called Zeus, an upgraded version of Oyente. Their tool takes Solidity source code as input and translates it to LLVM bitcode. Zeus detects 7 security issues: 4 of them are the same as in Oyente, and the other 3 are unchecked send, failed send, and integer overflow/underflow. To evaluate their tool, Kalra et al. crawled 1,524 distinct smart contracts from the Etherscan [21], Etherchain [20], and EtherCamp [19] explorers. The results indicate that about 94.6% of the contracts contain at least one security problem.

Jiang et al. [29] focus on 7 security vulnerabilities, i.e., Gasless Send, Exception Disorder, Reentrancy, Timestamp Dependency, Block Number Dependency, Dangerous DelegateCall, and Freezing Ether. They also developed a tool named ContractFuzzer to detect these issues. Their tool consists of an offline EVM instrumentation tool and an online fuzzing tool. Based on the smart contract ABI, ContractFuzzer can automatically generate fuzzing inputs to test for the defined security issues. They tested 6,991 smart contracts and found that 459 of them have vulnerabilities.

Nikolic et al. [36] focus on security issues that can leave a contract unable to release Ether, able to transfer Ether to arbitrary addresses, or able to be killed by anybody. Their tool, MAIAN, takes either bytecode or source code as input. MAIAN contains two major parts: symbolic analysis and concrete validation. Like Oyente, MAIAN simulates an Ethereum Virtual Machine, uses symbolic execution, and defines several execution rules to detect these security issues. They analyzed 970,898 smart contracts and found that a total of 34,200 (2,365 distinct) contracts contain at least one of these three security issues.

We defined 20 code smells from three different aspects. The four papers above address specific security problems, while we aim for broader problem coverage. We do not focus only on security problems; we also define patterns that help developers improve the usability and architecture of their smart contracts.

VIII Conclusion and Future Work

We conducted an empirical study to understand and characterize smart contract code smells. We first selected 4,141 warning-related Stack Exchange posts from 17,128 posts. Then we manually analyzed these posts and defined 20 smart contract code smells from three aspects: security, architecture and usability. To validate our defined code smells, we created an online survey. The feedback from our survey indicates that our code smells are important and that addressing them can help developers improve the quality of their smart contracts. We analyzed the impacts of each code smell and labeled 2,522 real-world smart contracts from 587 contract addresses.

Two groups can benefit from this study. Smart contract developers can use it to develop more robust and better-designed smart contracts, and the 5 impacts can help them decide the priority of removal. For software engineering researchers, our dataset can provide ground truth for developing smart contract code smell detection tools. We plan to develop automated tools to detect the defined code smells. We also plan to extend our code smell list and dataset as more posts are published on Stack Exchange and more features are added to Solidity.

References

  • [1] Satoshi Nakamoto 2008 Bitcoin: A peer-to-peer electronic cash system.
  • [2] Apr., 2018 Ethereum Foundation. Block validation algorithm. https://github.com/ethereum/wiki/wiki/Block-Protocol-2.0#block-validation-algorithm/
  • [3] Apr., 2018. marketcap. https://www.ccn.com/marketcap/
  • [4] Apr., 2018 Understanding The DAO Attack. https://www.coindesk.com/understanding-dao-hack-journalists/
  • [5] April., 2018 ERC20 https://github.com/ethereum/EIPs/blob/master/EIPS/eip-20.md
  • [6] Feb., 2019 EOS. https://eos.io/
  • [7] Feb., 2019 Hyperledger. https://www.hyperledger.org/
  • [8] Jan., 2016 EIP-55. https://github.com/ethereum/EIPs/blob/master/EIPS/eip-55.md
  • [9] Jan., 2018 StackExchange. https://ethereum.stackexchange.com/
  • [10] Jan., 2019 Blockchain Hard Fork. https://en.wikipedia.org/wiki/Fork_(blockchain)
  • [11] Jan., 2019 Checkstyle. http://checkstyle.sourceforge.net/
  • [12] Jan., 2019 DECOR. http://checkstyle.sourceforge.net/
  • [13] Jan., 2019. Ethereum Introduction. https://en.wikipedia.org/wiki/Ethereum/
  • [14] Jan., 2019. Ethereum.org. https://www.ethereum.org/
  • [15] Jan., 2019 inFusion. http://www.intooitus.com/inFusion.html/
  • [16] Jan., 2019 iPlasma. http://loose.upt.ro/iplasma/
  • [17] Jan., 2019 Releases of Solidity. https://github.com/ethereum/solidity/releases
  • [18] Mar., 2018 An Analysis Tool for Smart Contracts. https://github.com/melonproject/oyente
  • [19] Mar., 2018. EtherCamp. https://live.ether.camp/
  • [20] Mar., 2018. EtherChain. https://www.etherchain.org/contracts/
  • [21] Mar., 2018. EtherScan. https://etherscan.io/
  • [22] Mar., 2018 Solidity Document. http://solidity.readthedocs.io
  • [23] Oct., 2018 College Cryptocurrency Blockchain Courses. https://www.accounting-degree.org/college-cryptocurrency-blockchain-courses/
  • [24] Ting Chen, Yuxiao Zhu, Zihao Li, Jiachi Chen, Xiaoqi Li, Xiapu Luo, Xiaodong Lin, and Xiaosong Zhang. 2018. Understanding ethereum via graph analysis. In IEEE INFOCOM 2018-IEEE Conference on Computer Communications. IEEE, 1484–1492.
  • [25] Jacob Cohen. 1960. A coefficient of agreement for nominal scales. Educational and psychological measurement 20, 1 (1960), 37–46.
  • [26] Premkumar Devanbu, Thomas Zimmermann, and Christian Bird. 2016. Belief & evidence in empirical software engineering. In 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE). IEEE, 108–119.
  • [27] Ethereum Foundation. 2014. Ethereum’s white paper. //github.com/ethereum/wiki/wiki/White-Pape (2014).
  • [28] Martin Fowler and Kent Beck. 1999. Refactoring: improving the design of existing code. Addison-Wesley Professional.
  • [29] Bo Jiang, Ye Liu, and WK Chan. 2018. Contractfuzzer: Fuzzing smart contracts for vulnerability detection. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. ACM, 259–269.
  • [30] Sukrit Kalra, Seep Goel, Mohan Dhawan, and Subodh Sharma. 2018. ZEUS: Analyzing Safety of Smart Contracts. In 25th Annual Network and Distributed System Security Symposium (NDSS’18).
  • [31] Foutse Khomh, Massimiliano Di Penta, and Yann-Gael Gueheneuc. 2009. An exploratory study of the impact of code smells on software change-proneness. In Reverse Engineering, 2009. WCRE’09. 16th Working Conference on. IEEE, 75–84.
  • [32] Barbara A Kitchenham and Shari L Pfleeger. 2008. Personal opinion surveys. In Guide to advanced empirical software engineering. Springer, 63–92.
  • [33] Loi Luu, Duc-Hiep Chu, Hrishi Olickel, Prateek Saxena, and Aquinas Hobor. 2016. Making smart contracts smarter. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM, 254–269.
  • [34] Mika Mantyla. 2003. Bad smells in software-a taxonomy and an empirical study. Helsinki University of Technology (2003).
  • [35] Satoshi Nakamoto. 2008. Bitcoin: A peer-to-peer electronic cash system. (2008).
  • [36] Ivica Nikolic, Aashish Kolluri, Ilya Sergey, Prateek Saxena, and Aquinas Hobor. 2018. Finding the greedy, prodigal, and suicidal contracts at scale. In Proceedings of the 34th Annual Computer Security Applications Conference. ACM, 653–663.
  • [37] Arthur J Riel. 1996. Object-oriented design heuristics. Addison-Wesley Publishing Company.
  • [38] Donna Spencer. 2009. Card sorting: Designing usable categories. Rosenfeld Media.
  • [39] Don Tapscott and Alex Tapscott. 2016. Blockchain revolution: how the technology behind bitcoin is changing money, business, and the world. Penguin.
  • [40] Michele Tufano, Fabio Palomba, Gabriele Bavota, Rocco Oliveto, Massimiliano Di Penta, Andrea De Lucia, and Denys Poshyvanyk. 2015. When and why your code starts to smell bad. In Proc. ICSE. IEEE Press, 403–414.
  • [41] Pradeep K Tyagi. 1989. The effects of appeals, anonymity, and feedback on mail survey response patterns from salespeople. Journal of the Academy of Marketing Science 17, 3 (1989), 235–241.
  • [42] Eva Van Emden and Leon Moonen. 2002. Java quality assurance by detecting code smells. In Reverse Engineering, 2002. Proceedings. Ninth Working Conference on. IEEE, 97–106.
  • [43] Bruce F Webster. 1995. Pitfalls of object oriented development. M&T Books.
  • [44] Gavin Wood. 2014. Ethereum: A secure decentralised generalised transaction ledger. Ethereum Project Yellow Paper (2014).