Traditional financial systems rely on a centralized environment in which a trusted third party manages and validates transactions between parties [RN68, RN100]. Having an intermediary or regulator process a valuable transaction on a secured platform is considered essential [RN152]. Although a centralized environment is regarded as reliable and trustworthy, it has several drawbacks: the processing time for a transaction may vary from one hour to a few days, and the transaction cost charged by the third-party service provider, such as a bank or a non-financial institution, is an unnecessary expense for the user [RN153]. To mitigate these issues, advances in peer-to-peer networking and decentralized data management have been put forward. In recent years, blockchain has become the prominent mechanism; it uses distributed ledger technology (DLT) to implement a digitalized, decentralized public ledger that records all cryptocurrency transactions [RN68, RN33, RN69, RN126, RN177]. A blockchain is a public electronic ledger equivalent to a distributed database. It can be openly shared among disparate users to create an immutable record of their transactions [RN126, RN98, RN81, RN82, RN96, RN36]. Since all committed records and transactions in the public ledger are immutable, the data are transparent and securely stored in the blockchain network. A blockchain network can also deploy and execute programming scripts that process tasks autonomously. These programs are called smart contracts, and they define the customized functions and rules invoked during transactions [RN110, RN145, RN39].
Smart-contract-based blockchain technology is being embedded into a wide variety of industry applications, such as finance [RN126, RN152, RN53, RN164], supply chain management [RN154, RN155, RN156], health care [RN157, RN158, RN159, RN160], energy [RN77, RN161, RN162, RN163], IoT [RN125, RN165, RN166, RN167], and government services [RN126, RN168, RN169]. The financial technology industry has drastically increased its use of blockchain technology and smart contract execution, which help reduce infrastructure costs, increase transparency, reduce financial fraud, and shorten execution and settlement times [RN33, RN39, RN31, RN170]. Some governments in developing nations are assessing blockchain as a potential replacement for the national currency [RN96, RN97, RN171]. Because of the transparency and traceability of blockchain technology, a government can use a permissioned blockchain platform to regulate and analyze how money flows in the national financial system [RN82, RN131, RN70]. In the retail and manufacturing industries, blockchain technology helps deliver better supply chain management and secure payment with digital currencies [RN172, RN173]. Blockchain allows patients to access their healthcare records securely without a third-party verifier [RN157, RN174]. By digitizing the maritime network, the shipping industry can use a blockchain-based ledger to track millions of shipping containers at sea [RN175, RN176, RN178].
Only specific blockchain platforms support smart contracts: Ethereum [RN89] was the first to support them; other blockchain platforms, such as EOS [RN92], Lisk [RN91], Bitcoin [RN90], RootStock [RN93], and Hyperledger Fabric [RN94], can also deploy and execute smart contracts. A scripting-style language called Solidity is used to develop smart contracts on the Ethereum platform. This paper focuses on smart contracts on the Ethereum network. Smart contracts make it possible to develop decentralized applications and perform credible transactions without third parties. Following pre-defined rules, smart contracts provide trustworthy services as an intermediary during the execution of transactions. All smart contracts are stored in a distributed consensus environment. That is, once they are deployed to the network, nobody can modify them, so the functions in deployed smart contracts are immutable. In Ethereum, a smart contract is treated as an account: it can hold cryptocurrency and transfer it to externally owned user accounts and other smart contracts [RN61, RN64]. Since deployed smart contracts often hold a significant amount of coins [RN98] and perform critical functions [RN145], they should be tested and analysed before deployment [RN76, RN108]. However, there are several challenges in smart contract development using the Solidity language:
Users and developers lack knowledge about the usage and implementation of smart contracts since the technology is still at an early stage [RN179].
There are few defined best practices for programming and testing methods [RN180].
If any errors are detected after the deployment of a smart contract, they cannot be patched and redeployed in the manner of traditional software updates [RN169, RN142, RN182]. Instead, an erroneous smart contract is usually terminated by its owner before an updated contract is deployed.
Considering these challenges and issues in Ethereum smart contract programs, we formulate the following key research questions.
What are the major attacks that have occurred on Ethereum smart contract applications and caused significant losses of crypto assets?
How do the vulnerabilities in smart contracts affect the systems, and how are they exploited by attackers?
What security analysis methods are available to validate and verify smart contract programs?
There are several security analysis tools and formal verification methods for identifying vulnerabilities in Ethereum smart contracts [RN49, RN86, RN28, RN121, RN116, RN118, RN119, RN129, RN80, RN47, RN115, RN130, RN140, RN220]. They use different technical methods to analyze smart contract bytecode or source code. Existing surveys were often conducted in general terms, comparing a limited number of software tools and their coverage of important vulnerabilities [RN50, RN73, RN74, RN111, RN120, RN117, RN214]. Only very few surveys investigate the challenges and security problems of the whole blockchain system [RN124, RN74]. The three most important surveys are listed below:
Atzei et al. [RN50] surveyed the past security attacks and possible challenges on Ethereum smart contracts.
An empirical analysis of smart contracts was conducted by Bartoletti et al. [RN73].
Li et al. [RN74] reviewed the security of blockchain systems. The security issues of smart contracts on the Ethereum network were analyzed from a risk perspective.
Different from the existing surveys [RN50, RN73, RN74, RN111, RN120, RN124, RN206, RN213, RN243, RN72], this paper specifically analyses the vulnerability detection methods for Ethereum smart contracts in the context of the identified security attacks [RN65, RN66, RN50]. The taxonomy of dependencies in smart contract vulnerabilities is illustrated in Figure 1. We identify the need for a comprehensive study of security analysis methods [RN49, RN140] for vulnerable smart contracts on the Ethereum platform. Moreover, this paper differs from the existing security surveys because we investigate the specific security problems of smart contracts. The major contributions of this survey are as follows:
We identify the security problems and vulnerabilities in Ethereum smart contracts which have caused severe attacks [RN65, RN66, RN50], and significant loss of cryptocurrency [RN183].
We categorize the existing security analysis methods in terms of static analysis [RN49, RN28, RN121, RN116, RN119, RN47, RN120], dynamic analysis [RN86, RN80], and formal verification methods [RN45, RN118, RN113, RN148, RN115, RN128, RN129, RN130, RN140, RN149, RN117, RN221, RN240].
We compare the analysis methods with the vulnerability findings [RN142, RN49, RN50, RN111, RN124, RN242], and coverage using their applications, such as vulnerability scanning tools [RN49, RN28, RN121, RN116, RN119, RN47, RN86] and verification models [RN45, RN118, RN113, RN148, RN117].
This survey selects and presents papers published in high-quality journals and presented at top conferences. The search keywords were “blockchain”, “Ethereum”, “smart contract”, “security analysis”, and “vulnerability”. Around 125 research papers from high-quality journals, transactions, and conferences are included in our survey.
The rest of this survey is organized as follows: Section II introduces the basic theory of the Ethereum network and smart contracts. Section III covers the major attacks that have occurred on Ethereum smart contract applications in recent years. Section IV lists the important vulnerabilities in Ethereum smart contracts with respect to the related attacks. Section V presents different types of security analysis methods for smart contracts. Section VI compares the analysis methods using their applications and provides a summary of vulnerability identification and possible solutions. Section VII covers the research challenges, future research directions, and conclusions of this survey.
II Background Information
This section briefly provides the theoretical background on the Ethereum platform, Ethereum accounts, and the execution of Ethereum smart contracts.
II-A Ethereum Platform
Ethereum [RN98] is an open software platform based on blockchain technology. Developers can implement, compile, test, deploy, and execute decentralized applications on the Ethereum network. The Ethereum Virtual Machine (EVM) [RN61] is an abstract machine designed to serve as the run-time environment for Ethereum smart contracts. The EVM runs as an independent process on a server or a computer. An Ethereum network is a distributed and decentralized network of permission-less, untrusted peers [RN184, RN185, RN186].
The Ethereum network has two types of accounts: externally owned user accounts, which are controlled by private keys, and smart contract accounts, which are controlled by their compiled code [RN187]. User accounts contain no code and can send messages to other accounts by creating and signing a transaction with their private keys [RN61]. The recipient account can identify the sender by using the sender’s public key. Like autonomous agents, contract accounts always execute a specific sequence of code according to pre-defined rules when they are invoked by a transaction [RN108].
Each Ethereum account is identified by a unique 20-byte address [RN61] and consists of that address, the current balance in Ether, the data storage, and a nonce [RN187]. A nonce is a counter ensuring that each transaction can be executed only once. Ether is the primary cryptocurrency denomination in Ethereum; it is used to process transactions and pay transaction fees.
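The role of the nonce can be illustrated with a small Python model. This is only a sketch under simplifying assumptions, not the logic of an Ethereum client: the `Account` class, its field names, and `apply_transaction` are hypothetical, and signatures and gas are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """Hypothetical, simplified view of an Ethereum account state."""
    address: str                      # 20-byte address, hex-encoded here
    balance: int = 0                  # balance in Wei
    nonce: int = 0                    # number of transactions already processed
    storage: dict = field(default_factory=dict)


def apply_transaction(sender: Account, receiver: Account, value: int, tx_nonce: int) -> bool:
    """Apply a transfer only if the nonce matches and the funds suffice."""
    if tx_nonce != sender.nonce:      # replayed or out-of-order transaction
        return False
    if value > sender.balance:
        return False
    sender.balance -= value
    receiver.balance += value
    sender.nonce += 1                 # the same transaction cannot be applied again
    return True


alice = Account(address="0x" + "11" * 20, balance=100)
bob = Account(address="0x" + "22" * 20)
first = apply_transaction(alice, bob, 40, tx_nonce=0)    # accepted
replay = apply_transaction(alice, bob, 40, tx_nonce=0)   # rejected: nonce already used
```

In this model the second, identical transaction is rejected because the sender's nonce has already advanced.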
II-B Smart Contracts
Smart contracts in Ethereum are computer programs written in a programming language called ‘Solidity’ [RN179, RN87, RN188]. The compiled bytecode is deployed to the EVM. Any rules and functionalities can be written in a compatible programming language and encoded as a smart contract that is invoked whenever an action is required by users or other smart contracts. Smart contracts can implement various kinds of financial applications, such as cryptocurrency management (ABCC, AlterDice [RN195]), crypto wallets (e.g., MyEtherWallet [RN192], MetaMask [RN193], and MyCrypto [RN194]), and autonomous governance applications [RN182, RN190]. Users call a smart contract by sending a transaction to the contract address. If the transaction is agreed upon across the network, all peers have to execute the contract code against the current state of the blockchain with the relevant input parameters [RN182].
The Ethereum network [RN98, RN89], one of the leading blockchain platforms, supports the execution of smart contracts enforced by the consensus protocol. Etherscan [RN62] is an analytics platform for Ethereum that is used to explore blocks, accounts, transactions, and statistical data. More than 1,000,000 smart contracts have been deployed on the Ethereum platform. We consider a real-world example that shows how a smart contract acts as a trusted intermediary between users. As shown in Figure 2, two users (a seller and a buyer) do business through an application with a smart contract. The whole transaction completes in five steps: In step 1, the buyer sends the required amount of Ether to the smart contract’s address so that the smart contract holds the balance in escrow. In step 2, the smart contract notifies the seller by triggering an event indicating the receipt of the buyer’s request. In step 3, the seller checks and verifies the buyer’s request; if the request is valid and there is enough Ether to purchase the required item, the seller ships the item and informs the smart contract with a shipment message. In step 4, after the buyer receives the item, the smart contract is updated with the delivery status. In step 5, the smart contract releases the Ether to the seller’s account.
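The five steps can be read as a small state machine. The following Python sketch models that flow under stated assumptions: the `Escrow` class, its method names, and the event name `PurchaseRequested` are hypothetical and are not taken from Figure 2 or from any deployed contract.

```python
from enum import Enum, auto


class State(Enum):
    CREATED = auto()
    FUNDED = auto()
    SHIPPED = auto()
    DELIVERED = auto()
    RELEASED = auto()


class Escrow:
    """Hypothetical Python model of the buyer/seller escrow flow."""

    def __init__(self, price: int):
        self.price = price
        self.held = 0            # Ether held by the contract
        self.seller_balance = 0
        self.state = State.CREATED
        self.events = []         # stands in for emitted contract events

    def deposit(self, amount: int) -> None:          # step 1
        if self.state is not State.CREATED or amount < self.price:
            raise ValueError("deposit rejected")
        self.held = amount
        self.state = State.FUNDED
        self.events.append("PurchaseRequested")      # step 2: notify the seller

    def confirm_shipment(self) -> None:              # step 3
        if self.state is not State.FUNDED:
            raise ValueError("nothing to ship")
        self.state = State.SHIPPED

    def confirm_delivery(self) -> None:              # step 4
        if self.state is not State.SHIPPED:
            raise ValueError("item not shipped")
        self.state = State.DELIVERED

    def release(self) -> None:                       # step 5
        if self.state is not State.DELIVERED:
            raise ValueError("delivery not confirmed")
        self.seller_balance += self.held
        self.held = 0
        self.state = State.RELEASED


escrow = Escrow(price=5)
escrow.deposit(5)
escrow.confirm_shipment()
escrow.confirm_delivery()
escrow.release()
```

Each method refuses to run out of order, which is the property that lets the contract stand in for a trusted intermediary in this simplified model.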
II-C Software Security Vulnerabilities
Howard et al. [RN228] classified 19 software security issues and described how they affect software programs in different ways. These are major software vulnerability categories that have been followed and cited by many researchers [RN229]. We refer to these 19 vulnerability categories and map them to Ethereum security vulnerabilities in our analysis.
III Unique Security Attacks against Smart Contracts
This section covers the important security attacks on, and the relevant vulnerabilities of, smart contract implementations on the Ethereum platform. Smart contracts can hold and manage a large amount of virtual currency, which could be worth thousands of dollars [RN49, RN169]. Therefore, adversaries keep attempting to manipulate the execution of smart contracts in their favor. By nature, smart contracts run on distributed, permission-less networks and thus inherit many vulnerabilities [RN49, RN50]. Attacks that exploit malfunctioning smart contract execution have led to massive losses of virtual currency [RN65, RN66]. In a traditional system, buggy applications running in a centralized environment can be redeveloped or patched [RN196]. In contrast, in a decentralized blockchain network, deployed smart contracts cannot be modified or upgraded in a live network unless extreme measures are taken [RN169, RN109]. The immutable nature of smart contracts has both pros and cons for security. Because of this immutability, hackers are unable to modify the contracts for their benefit. However, smart contract applications cannot be modified after deployment even by their developers, who can only kill or terminate the contract and then create and deploy a new one. Therefore, before deployment, smart contracts should be thoroughly tested with a wide range of test cases for security and safety reasons.
III-A The DAO Attack
In June 2016, the DAO hack occurred when the attacker managed to steal more than 3.6 million Ethers [RN65]. The DAO attack was caused by a re-entrancy problem [RN49, RN50] in the smart contract. The re-entrancy problem allowed the attacker to execute recursive calls that repeatedly requested and received funds from the vulnerable DAO.sol contract. That is, the attacker kept withdrawing Ethers by calling the DAO smart contract before it updated the attacker’s balance. The withdraw function of the target contract (DAO.sol) was called recursively until the contract balance reached zero. More specifically, the attacker embedded a call to the withdraw function in the fallback function of the smart contract DAOAttacker.sol. The fallback function is a default function in Ethereum smart contracts and can be declared without an explicit function name. Because the fallback function is automatically called whenever the attacker’s contract receives funds, it in turn calls the embedded withdraw function. This setup allowed the attacker to call the withdraw function recursively before the attacker’s balance was updated, i.e., the funds were sent before the balance was adjusted.
Table I: Major attacks on Ethereum smart contracts with the related vulnerabilities, available solutions, and software security issues
|Major Attacks||Ethereum Vulnerabilities||Available Solutions||Software Security Issues|
|The DAO attack (2016) [RN65]||Re-entrancy on a single function||Use send() instead of call.value()||Failing to store and protect data|
|The DAO attack (2016) [RN65]||Re-entrancy on cross functions||Use send() or transfer() to send funds||Race conditions|
|The DAO attack (2016) [RN65]||Re-entrancy on external contract functions||Do internal state changes first and then call the external function; use a mutex when external calls are unavoidable||Improper file access|
|Parity Multi-Sig Wallet attack (2017) [RN66]||Public functions are callable by anyone||Use the internal modifier for functions instead of public||Information leakage|
|Parity Multi-Sig Wallet attack (2017) [RN66]||(No access modifier assigned properly)||Explicitly define library functions for the external invocations||Improper file access|
|Over/Under flow attack (2018) [RN50]||Integer underflow and overflow||Check if the integer stays in its byte range before any send operations||Integer range error, Buffer overflow|
A sample target contract named DAO.sol and an attacker contract named DAOAttacker.sol are used to explain the technical details. First, the attacker sends some Ethers to the DAO contract by invoking the credit function in lines 7–9 of DAO.sol. The DAO contract updates the attacker’s balance according to the amount of Ether in line 8 of DAO.sol. Then, the attacker sends a request to withdraw the funds. In response to this withdraw request, the funds are sent back to the attacker’s contract (lines 17–21 of DAO.sol). After the funds are received, the fallback function is called to trigger a further withdrawal, as in line 15 of DAOAttacker.sol. Since the target contract has not yet updated the attacker’s balance, the withdrawal request is executed successfully. This repeating process ends up stealing all available funds from the target contract. Finally, the attacker transfers the stolen funds from the DAOAttacker.sol contract to a pre-defined personal account address (line 20 of DAOAttacker.sol).
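The re-entrancy loop can be reproduced in a few lines of Python. This is a hedged model of the mechanism only, not a transcription of DAO.sol or DAOAttacker.sol: the class names `VulnerableBank`, `Attacker`, and `HonestUser`, and the `receive` method that plays the role of the fallback function, are hypothetical.

```python
class VulnerableBank:
    """Hypothetical model of a contract that pays out before updating state."""

    def __init__(self):
        self.balances = {}
        self.total = 0                       # Ether held by the contract

    def credit(self, account, amount: int) -> None:
        self.balances[account] = self.balances.get(account, 0) + amount
        self.total += amount

    def withdraw(self, account) -> None:
        amount = self.balances.get(account, 0)
        if amount == 0 or self.total < amount:
            return
        self.total -= amount                 # funds leave the contract ...
        account.receive(self, amount)        # ... and the receiver's code runs
        self.balances[account] = 0           # balance is updated too late


class Attacker:
    def __init__(self):
        self.stolen = 0

    def receive(self, bank, amount: int) -> None:
        # Plays the role of the fallback function: re-enter withdraw().
        self.stolen += amount
        bank.withdraw(self)


class HonestUser:
    def receive(self, bank, amount: int) -> None:
        pass


bank = VulnerableBank()
bank.credit(HonestUser(), 9)                 # deposits of other users
attacker = Attacker()
bank.credit(attacker, 1)                     # attacker deposits 1 unit
bank.withdraw(attacker)                      # one request drains the contract
```

In this model the attacker deposits 1 unit and, because the recorded balance is cleared only after the payout, recovers the entire pool held by the contract.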
As listed in Table I, the DAO attack, caused by the re-entrancy vulnerability in Ethereum, is related to several software security issues, including improper file access, race conditions, and failing to store and protect data. The Solidity programming practice of using the call method allowed the attacker to invoke the withdraw method from the fallback function. Since the balance is updated after invoking the call method, the data is not properly stored or protected at the correct time. The attacker exploited the intermediate state of the balance for his own benefit. The problem is entirely caused by a smart contract programming error, not by the Ethereum network. Any network hosting this type of erroneous smart contract would facilitate the re-entrancy hack.
As for the immediate response to this attack, there was much debate about how to refund the victims and terminate the hacked DAO contract. The hard fork mechanism overwrites the history of transactions by reversing them to the starting state. However, the hard fork did not prevent some Ethereum users from continuing on the old main branch. The branch created by the hard fork runs as the original Ethereum, and the old branch keeps working as Ethereum Classic [RN230, RN231]. The DAO attack prompted the Ethereum developers to enforce proper coding regulations and practices in smart contract development, because the blockchain’s immutability and the deterministic nature of smart contracts make it hard to respond to sudden attacks.
III-B Parity Multi-Sig Wallet Attack
Parity multi-sig wallets are smart contract programs used by wallet users to manage digital assets [RN66]. Important data, such as daily withdrawal limits, ownership information, and withdrawal voting, are configured and stored in these wallets [RN197]. A user who owns a multi-sig wallet needs multiple signatures (that is, private keys) to withdraw funds from the wallet. This signature requirement strengthens the security of the wallet, especially for wallets involved in transactions of significant value [RN198]. Some of the frequently used functions and logic of the Parity multi-sig wallet are implemented in a public library [RN207]. This shared wallet library is available to every Parity multi-sig wallet and supports the essential methods, such as withdrawing funds, setting the withdrawal limit, and depositing funds. Multi-sig wallets are able to call these external public functions from their contracts [RN198]. The centralized setup of this library made it a target of attacks. The Parity multi-sig wallet attack occurred when the attacker managed to initialize the public library as a multi-sig wallet and subsequently gained the ownership right and the killing right [RN197]. Since all wallets depended on this public library, their deployed contracts offered no protection against the attacker. Around 151 wallets were frozen with their balances reaching 15,153,037 Ethers in total [RN66]. This attack is the second largest attack on the Ethereum network in terms of the amount of stolen Ethers [RN142].
The code snippets of the two contracts are shown as WalletLibrary.sol and MultisigWallet.sol. The attacker’s first transaction was sent to the wallet contract to claim ownership of the multi-sig wallet. The second transaction was sent to withdraw all the funds from the wallet. In the contract WalletLibrary.sol, the initWallet function initializes a wallet with the day limit, the array of owners (signers), and the number of signatures required to confirm a transaction. This function acts as a constructor written in the external wallet library, and it can be invoked by anyone through delegate calls [RN142]. After the attacker claims ownership of the multi-sig wallet, all the funds available in the wallet can be stolen [RN66]. The function delegatecall is called by a wallet instance as in Line 8 of WalletContract.sol. The main problem behind this attack is that all the public functions in the WalletLibrary.sol contract, such as initDayLimit and initMultiowned, can be called by anyone without authorization. No access modifier was used to restrict invocations from anonymous callers. The modifiers internal or private can be used to restrict functions so that they can be called only within a contract or from derived contracts [RN199].
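The effect of an unprotected initializer can be sketched with a Python model. The class and method names below (`WalletLibraryModel`, `init_wallet`, `kill`) are hypothetical and do not reproduce WalletLibrary.sol. The guarded variant uses an initialize-once check, which is an assumption of this sketch and a different technique from the access-modifier remedy described above.

```python
class WalletLibraryModel:
    """Hypothetical model of a shared library with an unprotected initializer."""

    def __init__(self):
        self.owners = []
        self.initialized = False

    def init_wallet(self, caller: str, owners: list) -> None:
        # Vulnerable: anyone may call this, and it may be called repeatedly.
        self.owners = list(owners)
        self.initialized = True

    def kill(self, caller: str) -> bool:
        # Only an owner may destroy the library.
        return caller in self.owners


class GuardedWalletLibraryModel(WalletLibraryModel):
    def init_wallet(self, caller: str, owners: list) -> None:
        # Guard (assumption of this sketch): initialization is allowed only once.
        if self.initialized:
            raise PermissionError("wallet already initialized")
        super().init_wallet(caller, owners)


lib = WalletLibraryModel()
lib.init_wallet("deployer", ["alice", "bob"])
lib.init_wallet("mallory", ["mallory"])      # attacker becomes the sole owner
attacker_can_kill = lib.kill("mallory")

guarded = GuardedWalletLibraryModel()
guarded.init_wallet("deployer", ["alice", "bob"])
try:
    guarded.init_wallet("mallory", ["mallory"])
    takeover_blocked = False
except PermissionError:
    takeover_blocked = True
```

In the unguarded model a second call to the initializer replaces the owner list, after which the caller passes the ownership check for destructive operations.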
The Parity multi-sig wallet attack is related to several software security issues, including improper file access and information leakage, as shown in Table I. The call to an external library caused the problem since the library functions did not have proper access control. Thus, the attack mainly targeted the weak library and the unrestricted invocations of the external wallet library functions. The non-updatable nature of the blockchain enables attackers to target problematic libraries as well as smart contracts when attacking smart contract applications. The initialization logic was implemented in the library constructor. Although this kind of abstraction is good for reusability, it allows hackers to invoke the library functions through delegatecall and gain full control of the library.
The majority of Parity users did not agree to perform another hard fork to refund the locked Ethers from the affected wallets [RN232]. The hard fork applied after the DAO attack split the Ethereum network into two networks, and the hackers’ stolen funds are still valid in the Ethereum Classic version [RN232]. A white-hat recovery team promised to provide a new Parity wallet for each affected wallet, restored with the same settings as before the attack. They could recover the remaining funds in the frozen wallets and remove the vulnerability from the wallet contracts. Since then, Solidity developers have been advised to adopt the private modifier by default to restrict access to all contract functions [RN233]. This restriction prevents anonymous users from making malicious calls to wallet library functions.
III-C Integer Overflow/Underflow Attack
The Proof-of-Weak-Hands (POWH) Coin is a Ponzi scheme developed by a group of people using smart contracts. It was attacked in 2018 due to an integer overflow/underflow problem. The attacker drained around 2,000 Ethers because of insecure integer operations [RN200]. An unsigned integer in Solidity is defined as uint256 [RN87]. Each uint256 is limited to 256 bits in size, which corresponds to integers between 0 and $2^{256}-1$. If a variable is assigned a value above this range, it wraps around to the bottom of the range (for example, adding 1 to the maximum value yields 0); if it is assigned a value below the range, it wraps around to the top of the range [RN87]. For example, when 1 is subtracted from 0, the result is $2^{256}-1$. The attacker exploited this vulnerability to steal Ethers through such an integer underflow attack [RN50].
If an attacker has a target account holding 0 Ether, an example attack proceeds as follows: First, the attacker sends 1 Wei to a target contract. (Wei is the smallest denomination of Ether in Ethereum; 1 Ether is worth $10^{18}$ Wei [RN201].) The target contract deposits the funds to the sender’s account. Next, the attacker requests to withdraw 1 Wei, and the sender’s balance is updated to 0 Wei by subtracting 1 Wei. When the target contract sends the funds to the attacker’s contract, the attacker’s fallback function is triggered so that a subsequent withdrawal is requested again. When the contract now updates the balance by subtracting 1 from 0, the result would be -1, which cannot be represented as an unsigned integer. Because of the integer underflow, the attacker’s balance wraps around to $2^{256}-1$ Wei. Using a repeating mechanism similar to the re-entrancy problem in the DAO contract, the attacker is able to steal all funds from the victim’s account.
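Unchecked uint256 arithmetic can be modeled as arithmetic modulo $2^{256}$. The short Python sketch below illustrates the wrap-around behaviour; the helper names `unchecked_sub` and `unchecked_add` are hypothetical and exist only for this illustration.

```python
UINT256_MOD = 2 ** 256
UINT256_MAX = UINT256_MOD - 1


def unchecked_sub(a: int, b: int) -> int:
    """Subtract modulo 2**256, as unchecked uint256 arithmetic does."""
    return (a - b) % UINT256_MOD


def unchecked_add(a: int, b: int) -> int:
    """Add modulo 2**256, as unchecked uint256 arithmetic does."""
    return (a + b) % UINT256_MOD


balance = 0
balance = unchecked_sub(balance, 1)          # underflow wraps to the maximum value
wrapped_up = unchecked_add(UINT256_MAX, 1)   # overflow wraps to 0
```

After the underflow, `balance` holds the largest representable uint256 value, which is why a single mistaken subtraction can make an empty account appear extremely rich.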
Furthermore, the Solidity compiler does not flag code with integer overflow/underflow problems (since compiler version 0.8.0, arithmetic operations revert on overflow or underflow at run time by default). The integer overflow/underflow problem can be mitigated by using the arithmetic functions in the Solidity math library named SafeMath.sol [RN234]. It supports safe mathematical operations, such as addition, subtraction, and multiplication, while preventing integer overflow/underflow issues.
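The idea behind checked arithmetic can be sketched in Python as follows. This is an illustration of the general technique, not the code of SafeMath.sol: the function names `safe_add`, `safe_sub`, and `safe_mul` and the use of `OverflowError` are choices made for this sketch.

```python
UINT256_MAX = 2 ** 256 - 1


def safe_add(a: int, b: int) -> int:
    """Add two uint256 values, rejecting results above the maximum."""
    c = a + b
    if c > UINT256_MAX:
        raise OverflowError("uint256 addition overflow")
    return c


def safe_sub(a: int, b: int) -> int:
    """Subtract b from a, rejecting results below zero."""
    if b > a:
        raise OverflowError("uint256 subtraction underflow")
    return a - b


def safe_mul(a: int, b: int) -> int:
    """Multiply two uint256 values, rejecting results above the maximum."""
    c = a * b
    if c > UINT256_MAX:
        raise OverflowError("uint256 multiplication overflow")
    return c
```

With these helpers, an attempt to subtract 1 from a zero balance raises an error instead of silently wrapping around.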
Solidity language is less flexible since it has limitations on the value/integer types and length [RN69]. Several memory error detection techniques have been proposed for C and C++: The StackGuard automatic buffer overflow detection [RN235], PointGuard protection [RN236], baggy bounds checking [RN237], and the light weight bounds checking [RN238] are popular choices for bounds checking C and C++ programs. Since these bounds checking problems exist widely in Solidity language, prevention mechanisms should be developed to perform proper bounds checking as in C and C++. An overflow detector named EasyFlow [RN239] can identify the manifested overflows, well-protected overflows and potential overflows in vulnerable smart contracts.
III-D Lessons Learned
According to our analysis of the major attacks on Ethereum smart contracts, the Parity multi-sig wallet attack had a severe impact on Ethereum and prompted calls for another hard fork (Section III-B), even though the attack was technically simple. The vulnerability affected both the wallet contract and the external library contract. It is challenging to detect deployed libraries that leak information or set an inappropriate level of control because they lack proper access modifiers. Such library contracts can be made to self-destruct by malicious users with escalated privileges. These attacks are simple and straightforward, and it is obviously abnormal for smart contracts holding a significant amount of funds to be locked or frozen after a function call. Erroneous or vulnerable contracts are deployed to the Ethereum network without proper security checks, quality assurance tests, or adherence to the best coding practices in Solidity.
The combination of vulnerabilities in the Ethereum blockchain and the Solidity programming language makes security checks more challenging in smart contract development [RN179]. Compared to established languages such as Java, C, and C++, Solidity is not yet a mature language. Since integer types are fixed in size (up to 256 bits), overflow/underflow bugs in Solidity lead to erroneous smart contracts. Furthermore, the mapping data type in Solidity does not throw an exception when a key has no associated value; instead, it simply returns the default value. This behaviour can allow attackers to execute malicious code by passing parameters of their choosing into smart contract functions that use the mapping data type. Although Solidity functions can be called recursively, the language lacks tail-call support [RN136]. Thus, the depth of recursive calls can be determined solely by the input variables of a smart contract.
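The default-value behaviour of mappings can be imitated in Python with `collections.defaultdict`. The sketch below is an analogy under stated assumptions: the account names and the `is_registered` function are hypothetical, and the flawed check is constructed purely for illustration.

```python
from collections import defaultdict

# A Solidity mapping(address => uint) behaves like a dictionary whose missing
# keys read as zero; defaultdict(int) models that behaviour here.
balances = defaultdict(int)
balances["alice"] = 10

unknown_balance = balances["mallory"]        # no exception, reads as 0


def is_registered(account: str) -> bool:
    # Flawed check: a zero balance and a missing entry are indistinguishable.
    return balances[account] >= 0


# A separate membership mapping makes the distinction explicit.
registered = defaultdict(bool)
registered["alice"] = True
```

Because a lookup never fails, any logic that treats "a value was returned" as proof that the key exists will accept arbitrary keys; tracking membership separately avoids that ambiguity in this model.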
In addition to the well-known attacks, there are more vulnerabilities in smart contracts. Many of them are proven to be problematic. They make less impact than the attacks, but they present a landscape of the security issues of smart contracts which is investigated in Section IV.
IV Key Vulnerabilities in Smart Contracts
In this section, we discuss the key vulnerabilities that can cause serious problems in smart contract applications. The re-entrancy problem, the transaction ordering dependency problem, the timestamp dependency problem, and exception handling issues produce vulnerable patterns in smart contract execution as well as in smart contract code. Developers should be aware of these issues and follow quality assurance test cases carefully before they deploy their contracts to the live Ethereum network or any other blockchain platform. We further investigated 16 Ethereum vulnerabilities, as shown in Table II, which describes the Ethereum vulnerabilities and their related attacks. It also maps the identified key Ethereum vulnerabilities to the relevant software security issues categorized in [RN228].
Since smart contracts execute asynchronously, the transaction ordering problem is a common attack vector. This problem can be addressed with a locking mechanism that keeps an order or counter for each transaction so that transactions execute in a first-in-first-out manner. The timestamp dependence problem is a prominent issue that arises when the block timestamp is used in critical operations. It is recommended to avoid assigning the block timestamp to a variable in smart contract code. Instead of the timestamp value, the block number can be used for a constant variable. The exception handling problem is one of the major problems in Solidity programming. Developers can handle it by following best practices and using try-catch exception mechanisms. The latest versions of the Solidity compiler are also aware of this issue and give a warning or error message when compiling code without proper exception handling.
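The transaction ordering problem can be made concrete with a small Python replay. This is a toy model with assumed values: the function `apply_in_order`, the transaction kinds `set_price` and `buy`, and the prices are hypothetical and chosen only to show that the outcome depends on the order in which two pending transactions are included.

```python
def apply_in_order(transactions, price=10, buyer_funds=12):
    """Replay transactions in the given order; return (final price, items bought)."""
    bought = 0
    for kind, value in transactions:
        if kind == "set_price":
            price = value
        elif kind == "buy" and buyer_funds >= price:
            buyer_funds -= price
            bought += 1
    return price, bought


buy = ("buy", None)
raise_price = ("set_price", 15)

buy_first = apply_in_order([buy, raise_price])     # the buy executes at price 10
price_first = apply_in_order([raise_price, buy])   # the buy fails at price 15
```

The same two transactions lead to a successful purchase in one order and a failed purchase in the other, which is the kind of order sensitivity that a counter or first-in-first-out lock is meant to remove.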
IV-A Re-entrancy Problem
As illustrated in Section III-A, the DAO attack occurred due to the re-entrancy problem in smart contracts. A Solidity smart contract has an unnamed function, called the fallback function, that has neither arguments nor return values. The call function is used to invoke a method of an external contract or of the same contract in order to transfer Ethers. This function does not throw an exception when an error occurs; instead, it returns false on failure and true otherwise. Unless a gas value is set manually, the call method executes without a gas limit. If a contract invokes the call method to send an amount to the sender’s account, it calls the sender’s fallback function. Since there is no gas limit on the call method invocation, any code inside the fallback function is executed until the remaining gas is used up. This vulnerability is called re-entrancy in Ethereum smart contracts, and it was the key attack vector of the DAO attack. A dynamic analysis tool called ReGuard [RN241] detects re-entrancy problems in smart contracts, including previously unknown ones.
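Two commonly recommended defences, updating internal state before the external call and guarding the function with a mutex, can be sketched in Python. The model below is an illustration under simplifying assumptions and not Solidity code: the names `GuardedBank`, `ReentrantReceiver`, and `PassiveReceiver` are hypothetical, and `receive` stands in for the fallback function.

```python
class GuardedBank:
    """Hypothetical model: state is updated before the external call."""

    def __init__(self):
        self.balances = {}
        self.total = 0
        self.locked = False                  # simple mutex

    def credit(self, account, amount: int) -> None:
        self.balances[account] = self.balances.get(account, 0) + amount
        self.total += amount

    def withdraw(self, account) -> None:
        if self.locked:                      # a re-entrant call is refused
            return
        amount = self.balances.get(account, 0)
        if amount == 0 or self.total < amount:
            return
        self.locked = True
        try:
            self.balances[account] = 0       # effect first
            self.total -= amount
            account.receive(self, amount)    # interaction last
        finally:
            self.locked = False


class ReentrantReceiver:
    def __init__(self):
        self.received = 0

    def receive(self, bank, amount: int) -> None:
        self.received += amount
        bank.withdraw(self)                  # attempted re-entry


class PassiveReceiver:
    def receive(self, bank, amount: int) -> None:
        pass


bank = GuardedBank()
bank.credit(PassiveReceiver(), 9)            # deposits of other users
attacker = ReentrantReceiver()
bank.credit(attacker, 1)
bank.withdraw(attacker)
```

Either defence alone stops the loop in this model: the mutex rejects the nested call, and even without it the nested call would find a zero balance because the state was updated first.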
| Ethereum Vulnerabilities | Vulnerability Mechanism | Related Attacks | Software Security Issues |
|---|---|---|---|
| Re-entrancy problem | Recursively calling a function from a fallback function | The DAO attack | Failing to store and protect data |
| Transaction ordering | Inconsistent order of transactions with respect to the time of invocation | - | Race conditions |
| Block timestamp dependency | Constant variables are assigned the block timestamp value | - | Failing to use cryptographically strong random numbers |
| Exception handling | Failing to check the return values after a function call | The DAO attack, Integer Over/Under flow attack, King of Ether Throne attack | Failure to handle errors |
| Call stack depth limitation | Exceeding the limit on the number of calls to a contract method | - | Buffer overflows |
| Integer overflow/underflow | Subtracting a positive integer from zero results in a very large value | Integer Over/Under flow attack | Integer range errors |
| Unchecked and failed send | Sending Ethers without checking the conditions | The DAO attack | Failing to store and protect data, Failure to handle errors |
| Destroyable / suicidal contract | Contract is susceptible to being destroyed by unauthorized users | Parity Multisig Wallet attack | Improper file access |
| Unsecured balance | The Ether balance of a contract is exposed to theft by an anonymous caller because of the public modifier | The DAO attack, Parity Multisig Wallet attack | Failing to store and protect data |
| Misuse of ORIGIN | Contract authenticates using the return value of ORIGIN rather than CALLER | - | Failing to store and protect data |
| No restricted write | Writes to a storage variable are not restricted by an access modifier | Parity Multisig Wallet attack | Failing to store and protect data |
| No restricted transfer | Ether transfers can be invoked by users who are independent of the sender | The DAO attack, Parity Multisig Wallet attack | Failing to store and protect data |
| Non-validated arguments | Arguments of a contract function are not validated before use | Integer Over/Under flow attack | Failure to handle errors |
| Greedy contract | Locking the contract funds or Ether balance indefinitely | Parity Multisig Wallet attack | Improper file access, Failing to store and protect data |
| Prodigal contract | Leaking funds or Ether balance to arbitrary users | The DAO attack | Information leakage |
| Gas overspent | Contract code execution consumes more gas than necessary | - | Poor usability |
IV-B Transaction Ordering Dependency
A block includes a set of transactions, and the blockchain state is updated several times during each epoch [RN49, RN74]. The state of a smart contract is jointly determined by the value of its fields and the current balance [RN202]. In most cases, when a user initiates a transaction to invoke a smart contract in the network, there is no guarantee that the transaction will run in the same state that the contract was in when the transaction was initiated. A user cannot predict the actual state of the smart contract at the time the user’s transaction calls it [RN49, RN50, RN74].
If a new block on a blockchain includes two transactions that invoke the same contract, the users have no certain knowledge of which state the contract is in when their individual invocation is executed. As shown in Figure 3, if user1 and user2 each send a transaction to the same smart contract at the same time, neither user knows which transaction will run first. The order of these transactions is determined only by the miner of the block. Even if user1 sends its transaction before user2 does, user1’s transaction is not guaranteed to run first. If user1’s transaction is executed first, it moves the contract from its current state to one new state; if user2’s transaction is executed first, it moves the contract from its current state to a different new state. Therefore, the final state of a contract depends on the order of transaction execution, which is determined by the block mining order.
This problem is critical in real-world situations where buyers and sellers use smart contracts for decentralized stock market operations, as implemented in the StockMarket.sol contract shown below. Sellers often update the price of the items they sell, and buyers send orders to purchase those items expecting the price they observed when they sent the transaction. In the worst case, buyers may have to pay significantly more than the expected price for the requested item.
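The StockMarket.sol listing is not reproduced here. As a separate illustration, the following Python sketch models the buyer and seller scenario described above. All names (`Market`, `paid_price`, the `version` counter) and the prices 100 and 150 are hypothetical; the guarded variant is one possible reading of the counter-based ordering guard mentioned earlier in this section, not a method taken from the cited works.

```python
class Market:
    """Toy single-item market whose price is part of the contract state."""

    def __init__(self, price):
        self.price = price
        self.version = 0            # incremented on every price change

    def update_price(self, new_price):
        # Seller's transaction.
        self.price = new_price
        self.version += 1

    def buy(self):
        # Buyer's transaction: pays whatever the price is at execution time.
        return self.price

    def buy_guarded(self, seen_version):
        # Buyer's transaction with a guard: reject if the state has changed
        # since the buyer observed it.
        if seen_version != self.version:
            return None
        return self.price


def paid_price(order, guarded=False):
    """Execute the two transactions in the given order; return what the buyer paid."""
    market = Market(price=100)
    seen = market.version           # the buyer observed price 100 at version 0
    paid = None
    for tx in order:
        if tx == "update":
            market.update_price(150)
        elif tx == "buy":
            paid = market.buy_guarded(seen) if guarded else market.buy()
    return paid
```

With the order `["buy", "update"]` the buyer pays 100, while `["update", "buy"]` makes the same purchase cost 150. With `guarded=True`, the second ordering is rejected (returns `None`) instead of charging the unexpected price.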
IV-C Timestamp Dependency
A smart contract may use the block timestamp as an initial condition to execute some critical operations. Usually the timestamp is set to the system time of the miner’s local computer or server [RN49, RN50]. When a block is mined, the miner has to generate the timestamp for the block. The timestamp of a block can vary by approximately 900 seconds compared with other blocks’ timestamps [RN49, RN201]. When a miner receives a new block, after the validity conditions are confirmed, the miner checks whether the timestamp of the received block is greater than the timestamp of the previous block and whether it is within 900 seconds of the timestamp on the miner’s local machine [RN49]. Because of this flexibility in setting the timestamp of a block, an adversary or malicious miner can choose different block timestamps to manipulate the outcome of timestamp-dependent smart contracts. If a contract derives its current time, starting time, and ending time from the timestamp of the block, the miner can shift the timestamp by a few seconds to change the output in the miner’s favor [RN49, RN50].
The following code snippet of the TheRun.sol contract uses the block’s timestamp value to generate a random number, which is subsequently used in a critical operation. In line 2, a private variable is assigned the timestamp of the block as a random number. In the random function, this variable is used to calculate the values of three parameters, and the function returns the calculated number whenever it is externally called.
The following code implements the condition where the random function is called in line 4. The return value of the random function is calculated from the block’s timestamp and assigned to a variable. That variable is then checked against a condition; if the check succeeds, the contract runs the send function as a critical call. A malicious miner can take advantage of this by modifying the local system’s timestamp to trigger the call.
Similarly, some smart contracts use the block hash value in crucial components. This is not recommended either, because malicious miners can still manipulate the timestamp in order to modify the execution output.
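The TheRun.sol listing is not reproduced here. The following Python sketch only illustrates the general idea of a timestamp-seeded "random" value that a miner can steer. The function `pseudo_random`, its constants, and the winning condition are hypothetical stand-ins, not the contract's actual formula; the 900-second window is the tolerance stated in the text.

```python
def pseudo_random(timestamp, modulus):
    """Hypothetical stand-in for a contract's timestamp-seeded random function."""
    return (timestamp * 7919 + 13) % modulus


def wins(timestamp):
    """Critical condition: the payout is sent only when the 'random' value is 0."""
    return pseudo_random(timestamp, 10) == 0


def favourable_timestamps(honest_time, drift=900):
    """Timestamps within the tolerated drift that would trigger the payout."""
    return [t for t in range(honest_time, honest_time + drift + 1) if wins(t)]
```

In this model an honest timestamp such as `1_600_000_000` does not trigger the payout, but the miner can choose among the values returned by `favourable_timestamps(1_600_000_000)` and still produce a block that other nodes accept.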
IV-D Mishandled Exception Issues
In Ethereum, a smart contract often needs to call another contract to fulfill the required functionality [RN108]. These calls are made either by sending instructions or by calling a contract’s method directly with reference to the contract’s name [RN49, RN50]. Exceptions may be raised in the callee contract, in which case the callee terminates and reverts its state while returning a false value to the caller contract [RN49, RN50]. Exceptions can have many causes, such as insufficient gas to execute the operation, an exceeded call-stack limit, or an unexpected system error in the callee node [RN49, RN203]. The exception thrown in the callee contract should be propagated to the caller, and the return value should be explicitly checked in the caller contract to verify whether the call has been executed successfully [RN39, RN49, RN50, RN118]. In several kinds of smart contract calls, the exception propagation policies are inconsistent [RN49], which poses threats in real-world transactions.
A malicious user can invoke a caller contract and purposefully cause its send function to fail. The call-stack depth limit is the maximum number of nested calls that can be active at once [RN108, RN49, RN50]. The Ethereum Virtual Machine sets the call-stack depth limit to 1,024 frames [RN98]. If the 1,024-frame limit is exceeded, the EVM throws an error. The call-stack depth increases by one each time a function is called. An attacker can use this feature to intentionally interrupt execution by having a contract call itself 1,023 times [RN108, RN98, RN49].
An example of a contract that is vulnerable to the call-stack depth problem is a Ponzi scheme implementation [RN62]. The SimplePonzi.sol contract is shown in the following code snippet. This contract pays interest to investors according to the amount and the order of their investments. An attacker can exploit the call-stack limit to receive interest earlier. The attacker can intentionally make other investors’ payments fail by increasing the call-stack depth to 1,023. After executing these calls, the attacker makes his/her own payment and receives the interest earlier than the other investors, since their payments were terminated or unsuccessful.
According to the Ethereum documentation [RN201], using the send function is dangerous and causes many problems. For instance, a transfer fails if the call-stack depth exceeds 1,024 frames, which a malicious caller can force deliberately, and it also fails if the recipient runs out of gas. Therefore, in order to safeguard Ether transfers, the return value of any function call should always be checked [RN108], whether the function belongs to the contract itself or to another contract [RN201, RN49, RN50, RN108]. To prevent the unchecked-send bug [RN50, RN28], the error should be handled manually at the call site; otherwise, an attacker may be able to execute unwanted or malicious code in the contract and drain its balance.
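The interaction between the call-stack depth limit and an unchecked send can be sketched in Python as follows. This is a simplified model under stated assumptions: `Ledger`, `send`, `pay_unchecked`, and `pay_checked` are hypothetical names, the `depth` argument stands for the number of frames already in use when the paying function runs, and gas is ignored.

```python
CALL_DEPTH_LIMIT = 1024   # frames, as stated in the text


class Ledger:
    """Toy payout contract state: what is owed and what was actually received."""

    def __init__(self):
        self.owed = {"alice": 5}
        self.received = {"alice": 0}


def send(ledger, to, amount, depth):
    """Model of send: returns False, without raising, if one more frame would exceed the limit."""
    if depth + 1 > CALL_DEPTH_LIMIT:
        return False
    ledger.received[to] += amount
    return True


def pay_unchecked(ledger, to, depth):
    send(ledger, to, ledger.owed[to], depth)   # return value ignored
    ledger.owed[to] = 0                        # debt cleared even if send failed


def pay_checked(ledger, to, depth):
    if not send(ledger, to, ledger.owed[to], depth):
        return False                           # keep the debt on failure
    ledger.owed[to] = 0
    return True
```

If an attacker arranges for the paying function to run at depth 1,024, `pay_unchecked` clears the debt although nothing was received, whereas `pay_checked` leaves the debt in place so that the payment can be retried.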
IV-E Sequential Execution of Smart Contracts
Blockchain networks such as Ethereum support the sequential execution of transactions on smart contracts with a consensus mechanism [RN179, RN40, RN29, RN204, RN39]. In a sequential execution, the requests for smart contract invocations are ordered by the consensus method. The smart contracts are then executed in the same order on all the nodes. This method has many performance limitations and drawbacks in blockchain-based applications [RN70]. In particular, the most severe problem is that the effective throughput of a blockchain application is reduced by the sequential operations. The throughput is inversely proportional to the latency of execution [RN82], which causes a performance bottleneck. Hence, a malicious user can try to introduce a smart contract that takes a very long time to execute. This action subverts the performance of the network by delaying the subsequent transactions.
Sequential execution therefore limits the number of smart contracts that can be executed per second and lowers the transaction execution rate. Vukolić et al. [RN70] proposed executing independent smart contracts in parallel to significantly improve the throughput of transactions. Furthermore, blockchain-based applications could not be scaled with the growing number of smart contracts in the future [RN205].
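The effect of a single slow contract on sequential throughput can be made concrete with a short Python calculation. This is a minimal sketch: the latencies are made-up numbers in seconds, and `parallel_makespan` uses a simple least-loaded-worker assignment that assumes the contracts are fully independent; it is not the scheme proposed in [RN70].

```python
import heapq


def sequential_finish_times(latencies):
    """Completion time of each contract when they run one after another."""
    clock, finish = 0, []
    for latency in latencies:
        clock += latency
        finish.append(clock)
    return finish


def throughput(latencies):
    """Contracts executed per second under sequential execution."""
    return len(latencies) / sum(latencies)


def parallel_makespan(latencies, workers):
    """Total time when each contract goes to the currently least-loaded worker."""
    loads = [0] * workers
    heapq.heapify(loads)
    for latency in latencies:
        heapq.heappush(loads, heapq.heappop(loads) + latency)
    return max(loads)
```

For four contracts of 1 second each, the sequential throughput is 1 contract per second. Replacing the first with a 10-second contract gives `sequential_finish_times([10, 1, 1, 1]) == [10, 11, 12, 13]`, so every later contract is delayed and the throughput drops to 4/13 contracts per second. With two workers, the same batch finishes after 10 seconds in this model.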
IV-F Other Ethereum Vulnerabilities
Call stack depth limitation: The call stack depth limit is 1,024 frames in the EVM implementation. When a contract invokes a call or send function to call another contract, the call stack depth increases by one. This allows an attacker to exploit a contract by calling itself 1,023 times before invoking a send function, which then exceeds the call stack depth limit [RN49]. The attack on the KingOfEtherThrone (KoET) smart contract exploited this limit: the call stack depth limit was purposefully exceeded by calling the attacker’s contract 1,023 times before invoking a call function to claim the throne.
Integer overflow/underflow: The integer type uint256 in Solidity has a fixed size of 256 bits. If the value of an integer variable reaches its maximum value of $2^{256}-1$, it is automatically reset to zero when 1 is added to the variable. Hackers are keen to target such variables in smart contracts and exploit them by increasing or decreasing the value of an integer until it reaches the maximum or minimum value [RN244].
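The wrap-around behavior can be reproduced with Python's arbitrary-precision integers by reducing results modulo $2^{256}$. The helper names below are hypothetical; `checked_add` shows one way a contract-level check could reject the overflow instead of wrapping.

```python
UINT256_MAX = 2**256 - 1


def add_uint256(a, b):
    """Unchecked 256-bit addition: the result wraps around modulo 2**256."""
    return (a + b) % 2**256


def sub_uint256(a, b):
    """Unchecked 256-bit subtraction: going below zero wraps to a huge value."""
    return (a - b) % 2**256


def checked_add(a, b):
    """Addition that refuses to wrap, as an explicit overflow check would."""
    result = a + b
    if result > UINT256_MAX:
        raise OverflowError("uint256 overflow")
    return result
```

Here `add_uint256(UINT256_MAX, 1)` evaluates to 0 and `sub_uint256(0, 1)` evaluates to `UINT256_MAX`, which matches the table entry stating that subtracting a positive integer from zero yields a very large value.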
Unchecked and failed send: The send instruction, used to send money to another contract or user, may fail to deliver the value to the recipient for reasons such as exceeding the gas limit or an insufficient Ether balance. It does not, however, throw any exception or error message to the contract. If no exception handling is implemented when invoking the send method, the balance is updated as if the value had been sent.
Destroyable contract: A destroyable contract [RN34] is a smart contract that can be terminated or killed through a suicide instruction invoked by any external user account or another smart contract. The self-destruct function of a smart contract is usually executed by its owner whenever an attack or emergency incident is detected. The self-destruct function should therefore check who is executing it and allow the kill method to be invoked by the legitimate owners only.
Unsecured balance: If the balance of a smart contract is exposed so that a hacker or anonymous caller can drain it, the contract is vulnerable to an unsecured balance. This can be caused by an improper access control mechanism for the balance variable and constructor functions, or by updating the balance only after invoking the call instruction to send money to another contract or arbitrary user [RN34, RN121].
Use of ORIGIN: In the Ethereum Virtual Machine, the keyword ORIGIN returns the address of the account that initiated the transaction, whereas the keyword CALLER returns the address of the account or contract that directly invoked the current call [RN121]. If a contract uses ORIGIN to authenticate the account or contract that invokes the current message call, it is prone to error, because an intermediate contract can relay a call on behalf of the originating account.
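The difference between the two checks can be sketched in Python. The `Wallet` class, its method names, and the address strings are hypothetical; `origin` and `caller` are passed explicitly here, whereas in the EVM they are supplied by the execution environment.

```python
class Wallet:
    """Toy wallet with two alternative authorization checks."""

    def __init__(self, owner):
        self.owner = owner

    def authorized_by_origin(self, origin, caller):
        # Vulnerable: only looks at who started the transaction.
        return origin == self.owner

    def authorized_by_caller(self, origin, caller):
        # Looks at who directly made this call.
        return caller == self.owner


def relayed_call(wallet, check):
    """The owner starts a transaction to an attacker's contract, which then calls the wallet."""
    origin = wallet.owner
    caller = "attacker_contract"
    return check(origin, caller)
```

For `w = Wallet("owner")`, `relayed_call(w, w.authorized_by_origin)` returns `True`, so the relayed call passes the check, while `relayed_call(w, w.authorized_by_caller)` returns `False`.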
No Restricted write: If a write operation to storage is possible without any restricting condition, attackers can exploit the contract [RN28]. The Parity Multisig wallet was hacked because writes to a storage variable were not restricted. As a result, the attacker could set the ownership of the wallet library without any condition or proper authorization check [RN66].
No Restricted transfer: The call method of Ethereum transfers Ethers between accounts or smart contracts. Despite its convenience, allowing call to be invoked by arbitrary users is not good practice. A contract that places no restriction on which users can send Ethers through the call function is vulnerable to unrestricted transfer. In the DAO attack, the contract sent Ethers to the withdrawer using the call method. This was one of the causes that allowed the fallback function of the attacker’s contract to be invoked and the money to be drained repeatedly through re-entrancy.
Non-validated arguments: Most Solidity functions in Ethereum smart contracts take a few arguments, which are the parameters passed during the invocation of a method or a transaction. The method uses them in operations and computations as its logic requires. These arguments should be checked and validated before they are used, since unchecked arguments may cause malicious actions during the execution of the method.
Greedy contract: Smart contracts that remain active and keep locking their Ether balance indefinitely, because they cannot access the external library contracts needed to transfer or send funds, are defined as greedy contracts in [RN34]. If the library contracts are terminated or destructed by an arbitrary user, either intentionally or accidentally, the contracts that call the external library functions become greedy contracts [RN34]. The attackers turned the Parity Multisig wallet contracts into greedy contracts by claiming ownership of the wallet library contracts and subsequently destructing them, which froze the money in the wallet contracts [RN66].
Prodigal contract: Ethereum smart contract functions are used to refund the owners after an attack. They transfer Ethers to the addresses that previously sent funds or that provided a solution to a specific problem. These transfers are saved as transactions, and the contracts are aware of the recipients. In some cases, however, contracts transfer money to arbitrary recipients that have never interacted with them and about which they hold no data. Contracts that send funds to such anonymous users are called prodigal contracts [RN34], since their sending function can be invoked by any user to send funds to a list of addresses of the sender’s choice.
Gas costly patterns: The Solidity code of Ethereum smart contracts is sometimes implemented with expensive patterns that cost more gas than necessary during the execution of each instruction. Seven gas costly patterns in contract code were identified in [RN47] and are detected by a tool called GASPER. Smart contract developers should be aware of their coding practices and optimize the code before they deploy contracts to the live Ethereum network. Doing so saves contract users from spending more gas than needed on the execution of contract methods.
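As a rough illustration of what a gas costly pattern can look like, the following Python sketch counts gas for a loop that re-reads a storage variable on every iteration and for an equivalent loop that reads it once. The pattern is only one plausible example; the text does not list the seven patterns of [RN47], and the two cost constants are made-up values, not the actual EVM gas schedule.

```python
STORAGE_READ_COST = 200   # hypothetical cost of reading a storage variable
LOCAL_OP_COST = 3         # hypothetical cost of a cheap arithmetic operation


def gas_storage_read_in_loop(iterations):
    """Gas used when the storage variable is read on every iteration."""
    gas = 0
    for _ in range(iterations):
        gas += STORAGE_READ_COST   # repeated storage read
        gas += LOCAL_OP_COST       # the arithmetic performed in the loop body
    return gas


def gas_cached_read(iterations):
    """Gas used when the storage variable is read once into a local variable."""
    gas = STORAGE_READ_COST        # single storage read before the loop
    for _ in range(iterations):
        gas += LOCAL_OP_COST
    return gas
```

Under these assumed costs, 100 iterations consume 20,300 units of gas in the first version and 500 in the second, although both compute the same result.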
V Security Analysis Methods on Ethereum Smart Contracts
Smart contracts in Ethereum act as autonomous intermediaries during the execution of transactions. Although they facilitate blockchain-based applications, they carry many security risks and vulnerabilities. One of the critical challenges is that smart contracts are immutable and cannot be upgraded or patched once deployed to the blockchain network. If the users’ requirements change or an error is found after deployment, the contracts cannot be modified like traditional software applications. Furthermore, smart contracts are difficult to test at run-time, because they interact with other smart contracts and invoke many external off-chain services repeatedly and continuously. Attackers are very keen to exploit bugs in smart contracts, since these contracts hold crypto assets of significant value and stealing funds from them promises a large payoff.
| Types of analysis | Methodologies | Input type |
|---|---|---|
| Static analysis | Symbolic execution | bytecode |
| Static analysis | Control flow graph construction | bytecode |
| Static analysis | Rule-based analysis | Solidity code |
| Dynamic analysis | Execution trace at run-time | bytecode |
| Dynamic analysis | Transaction graph construction | bytecode |
| Dynamic analysis | Validation of true/false positives | bytecode |
| Formal verification | Using theorem provers | bytecode |
| Formal verification | Translation of formal language | Solidity code |
| Formal verification | Construction of program logics | bytecode |
We categorize the security analysis methods for smart contracts into three types: static analysis, dynamic analysis, and formal verification. Table III lists the security analysis methods for detecting smart contract vulnerabilities using different methodologies and input types. There are several symbolic execution tools for finding code vulnerabilities in smart contracts, such as OYENTE [RN49], MAIAN [RN86], ZEUS [RN28], GASPER [RN47], Securify [RN116], Mythril [RN208], and SmartCheck [RN209]. Formal verification methods perform high-level analysis of Ethereum bytecode using theorem provers, such as Isabelle/HOL [RN118], KEVM [RN140], and Coq [RN113, RN119]. This section briefly introduces these analysis methods and compares them with examples. The systematic mapping between the identified Ethereum vulnerabilities, detection tools, and attacks is presented in Figure 4.
V-A Static Analysis
Static analysis is a way of analyzing a computer program or compiled code in a non-run-time environment. It inspects the program code without executing the program and generally examines all possible code behaviors, vulnerable patterns, and flaws that could appear at run-time. This subsection presents a few primary static analysis tools that analyze smart contract security problems and vulnerabilities.
Luu et al. [RN49] investigated the security of existing smart contracts on the Ethereum network and identified several security problems that allow attackers to manipulate smart contract execution. OYENTE is a static analysis tool that uses symbolic execution to detect these security vulnerabilities, which include transaction ordering dependence, timestamp dependence, mishandled exceptions, and re-entrancy [RN49].
The architecture of the tool OYENTE is illustrated in Figure 5. The bytecode of a smart contract and the current global state of Ethereum are taken as inputs. The samples of the smart contracts bytecode are publicly available on the Ethereum network and downloadable via the service named Etherscan [RN62]. The initial values of the smart contract variables are extracted from the global state of Ethereum, which improves the accuracy of the analysis. Upon the detection of any problem, OYENTE pinpoints the specific line of the smart contract source code which contains any security vulnerability.
OYENTE has four modules [RN49], namely CFGBuilder, Explorer, CoreAnalysis, and Validator. CFGBuilder builds a control flow graph for the smart contract bytecode. In the control flow graph, each node represents a basic execution block, and the edges represent the execution jumps between the blocks. The Explorer executes the smart contract code symbolically. The output of the Explorer is fed as input to the CoreAnalysis component, which implements the detection logic for the targeted vulnerabilities. In the end, the Validator module filters out the false positives from the results, and the final results are visualized to the users.
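The role of a control flow graph builder can be sketched in Python on a toy instruction list. This is not OYENTE's CFGBuilder: the function `build_cfg` and the instruction format are hypothetical, and jump targets are given as explicit operands here, whereas EVM jumps take their targets from the stack.

```python
def build_cfg(code):
    """Split code into basic blocks and return (blocks, edges).

    code: list of (opcode, operand) pairs; jump operands are instruction indices.
    blocks: maps a block's first index to its (first, last) instruction indices.
    edges: set of (from_block, to_block) pairs, identified by first index.
    """
    terminators = {"JUMP", "JUMPI", "STOP"}
    # 1. Leaders: the first instruction, every jump target, and every
    #    instruction that follows a terminator.
    leaders = {0}
    for i, (op, arg) in enumerate(code):
        if op in ("JUMP", "JUMPI"):
            leaders.add(arg)
        if op in terminators and i + 1 < len(code):
            leaders.add(i + 1)
    starts = sorted(leaders)
    # 2. Each block runs from its leader to just before the next leader.
    blocks = {}
    for k, start in enumerate(starts):
        end = (starts[k + 1] if k + 1 < len(starts) else len(code)) - 1
        blocks[start] = (start, end)
    # 3. Edges are determined by the last instruction of each block.
    edges = set()
    for start, (_, end) in blocks.items():
        op, arg = code[end]
        if op in ("JUMP", "JUMPI"):
            edges.add((start, arg))            # jump taken
        if op not in ("JUMP", "STOP") and end + 1 < len(code):
            edges.add((start, end + 1))        # fall through
    return blocks, edges


EXAMPLE = [
    ("PUSH", 1),     # 0
    ("JUMPI", 4),    # 1: conditional jump to instruction 4
    ("PUSH", 2),     # 2
    ("JUMP", 5),     # 3: unconditional jump to instruction 5
    ("PUSH", 3),     # 4
    ("STOP", None),  # 5
]
```

For `EXAMPLE`, the builder produces four basic blocks starting at instructions 0, 2, 4, and 5, with edges from block 0 to blocks 2 and 4, and from blocks 2 and 4 to block 5.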
ZEUS [RN28] can verify the correctness of smart contracts and validate their fairness. Combining an abstract interpreter with a symbolic model checker, ZEUS verifies whether smart contracts follow safe programming practices. According to [RN28], ZEUS outperformed OYENTE [RN49] with a lower false positive rate and a shorter analysis time. ZEUS detects six security vulnerabilities in smart contracts: the re-entrancy bug, unchecked send, failed send, integer overflow/underflow, block/transaction state dependence, and transaction order dependence [RN49, RN50, RN28].
ZEUS consists of three components: a policy builder, a source code translator, and a verifier. ZEUS takes two inputs, namely the smart contract source code in Solidity and a security policy written in a specific language, to verify the vulnerabilities. In the first step, a static analysis is performed to check the smart contract code, while the policy builder inserts the policy predicates as assert statements at the appropriate places in the source code. The source code translator converts the source code embedded with the policy assertions to LLVM bytecode. Finally, the verifier determines the assertion violations to identify the vulnerable smart contracts.
Formalizing Solidity Semantics
An abstract language is defined to capture the relevant constructs of a Solidity smart contract program [RN28]. Figure 6 shows the model of the abstract language that is used to formalize the Solidity semantics. A smart contract program consists of a sequence of smart contract declarations. Each smart contract is abstractly implemented with one or more method definitions and logic [RN105, RN127]. The declarations and initializations of methods are stored in the private storage of a contract, which is denoted by the keyword global. A dedicated identifier variable is used to uniquely identify a smart contract. A transaction is the invocation of a publicly accessible contract method. All methods are defined with a single input variable of a generic type, which can represent collections and structs. There are three types of invocations in Solidity [RN179, RN180, RN182, RN242]: internal invocations, external invocations, and call functions. The goto instruction is used to model the internal and external invocations, and the post instruction is used to model the call invocation. A separate type is defined to represent the body of a contract method, but the post statement can be called with the parameters of smart contracts.
Formalizing the Policy Language
The policy language is formalized for assertion in their abstract language [RN28]. The assertions are used to define the state reachability properties of the smart contract. The policy tuple specification is <Sub, Obj, Op, Cond, Res> which includes the subjects, objects, operations, conditions, and resources [RN210]. The policy tuples are used in ZEUS for two reasons: The first reason is to assert the predicate or condition; and the second reason is to extract the correct control location to insert the assert statements into the Solidity source code [RN28].
| Tools | Detecting Vulnerabilities | Identified Attacks |
|---|---|---|
| OYENTE | Re-entrancy, Exception handling, Transaction ordering, Block timestamp dependency, Call stack depth limitation | Integer Over/Under flow attack, The DAO attack |
| ZEUS | Re-entrancy, Transaction ordering, Block timestamp dependency, Integer over/under flow, Unchecked and failed send, Destroyable/Suicidal contract, Unsecured balance | The DAO attack, Integer Over/Under flow attack |
| Vandal | Re-entrancy, Unchecked and failed send, Destroyable/Suicidal contract, Unsecured balance, Use of ORIGIN | The DAO attack, Parity Multisig Wallet attack |
| Ethir | Re-entrancy, Exception handling, Transaction ordering, Block timestamp dependency | The DAO attack |
| Securify | Exception handling, Transaction ordering, Call stack depth limitation, Unchecked and failed send, No Restricted write, No Restricted transfer, Non-validated arguments | Parity Multisig Wallet attack |
| MAIAN | Call stack depth limitation, Destroyable/Suicidal contract, Unsecured balance, Greedy contracts, Prodigal contracts | Parity Multisig Wallet attack |
| GASPER | Gas costly code patterns exist | - |
To detect smart contracts with inefficient gas consumption, Chen et al. [RN47] developed a static analysis tool named GASPER. GASPER focuses on identifying gas costly patterns in existing smart contracts. Seven Solidity code patterns were identified in [RN47], and GASPER uses them for detection. According to [RN47], more than 90 percent of the smart contracts deployed until November 2016 suffered from some form of poorly defined gas cost pattern, and most of these smart contracts consumed a significant amount of gas unnecessarily.
GASPER takes smart contract bytecode as input to identify gas costly patterns. It runs symbolic execution on the bytecode to find all reachable code blocks in a candidate smart contract. During the pre-processing step, the disasm command in the Ethereum facilities is used to disassemble the contract bytecode. GASPER uses the disassembled result to construct the control flow graph (CFG) of the smart contract. GASPER then starts a symbolic execution from the root node of the control flow graph and traverses the CFG. Whenever a conditional jump is found during the CFG traversal, GASPER checks its feasibility. Specifically, GASPER queries the Z3 solver [RN75] to determine whether the jump condition can be true or false.
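The traversal with feasibility checks can be sketched in Python as follows. This is not GASPER's implementation: the function `reachable_blocks` and the graph format are hypothetical, path conditions are plain Python predicates over a single input `x`, and a brute-force search over a small input domain replaces the Z3 solver. The sketch also assumes an acyclic graph, since it does not bound loops.

```python
def reachable_blocks(cfg, root, domain):
    """Return the blocks reachable along feasible paths.

    cfg: maps a block name to a list of (condition, successor) pairs, where
         condition is a predicate on the input value x.
    domain: iterable of candidate input values used for the feasibility check.
    """
    reached = set()
    work = [(root, [])]   # (block, constraints collected along the path)
    while work:
        block, path = work.pop()
        reached.add(block)
        for condition, successor in cfg.get(block, []):
            new_path = path + [condition]
            # Feasible if some input satisfies every constraint on the path.
            if any(all(c(x) for c in new_path) for x in domain):
                work.append((successor, new_path))
    return reached


EXAMPLE_CFG = {
    "entry": [(lambda x: x > 10, "A"), (lambda x: x <= 10, "B")],
    "A": [(lambda x: x < 5, "dead"), (lambda x: x >= 5, "exit")],
    "B": [(lambda x: True, "exit")],
}
```

In `EXAMPLE_CFG`, the block named `dead` requires `x > 10` and `x < 5` at the same time, so `reachable_blocks(EXAMPLE_CFG, "entry", range(256))` does not include it. Code of this kind is unreachable and therefore a candidate for removal.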
Vandal [RN121] is a security analysis framework for identifying vulnerabilities in Ethereum smart contracts. An analysis pipeline converts the EVM bytecode to semantic logic relations. Vandal uses the Souffle [RN211] language to express the logic specifications for security analysis. Vandal’s pipeline has five major components [RN121]: the scraper extracts the bytecode of smart contracts in bulk; the disassembler converts the smart contract bytecode into its disassembled form; the decompiler translates the stack-based bytecode to a register transfer language; on the basis of the register transfer language, the extractor produces logic relations reflecting the program semantics of the smart contract; finally, the security analysis reports any possible vulnerabilities of the examined smart contracts. Vandal can identify most of the security vulnerabilities, such as unchecked send, re-entrancy, unsecured balance, destroyable contract, and the use of ORIGIN problem [RN121, RN50].
Ethir [RN119] analyzes Ethereum smart contract bytecode based on the rule-based representations of the control flow graphs (CFG) produced by the OYENTE tool [RN49]. Ethir produces sound and automated reasoning about the high-level properties of Ethereum smart contracts. Ethir requires OYENTE to generate the CFG of the EVM code. The first element of Ethir is a modified version of OYENTE that includes all possible jump addresses, since the original OYENTE only stores the last value of the jump address [RN49, RN119]. This modification allows Ethir to reconstruct the whole CFG [RN119]. The second element translates the EVM bytecode into rule-based representations by using guarded rules to examine the conditional and unconditional jump instructions.
Securify [RN116] is a fully automated and scalable security analyzer for Ethereum smart contracts. Securify checks smart contract behaviors with respect to a given property, and the result is either safe or unsafe. To find violation patterns in a smart contract, Securify works in two steps: the dependency graph of each smart contract is symbolically analyzed to extract semantic information; subsequently, the critical code structure is checked against sufficient conditions to prove whether a property holds.
Securify checks the important domain-specific properties that are derived from known attacks, the Solidity recommendations, and best practices. The security properties, defined from the patterns of known attacks, are given formal definitions in [RN116]. The properties are Ether Liquidity (if a contract has less Ether, it has less Ether liquidity), No writes after the call (there are no writes to a storage variable after any call instruction), Restricted write (writes to storage are restricted by a modifier), Restricted transfer (Ether transfers cannot be invoked by users who are independent of the senders), Handled exception, Transaction ordering dependency, and Validated arguments (method parameters should be validated before use) [RN116]. Securify was evaluated with two datasets, the EVM dataset and the Solidity dataset. The experimental results in [RN116] showed that Securify found most of the vulnerabilities and security properties accurately compared with OYENTE [RN49] and Mythril [RN208].
V-B Dynamic Analysis
Dynamic analysis is a method that checks an application while it is executing, that is, at run-time. It acts like an attacker who searches for vulnerabilities in a piece of code by feeding malicious code or arbitrary input to the relevant functions of a program. Some vulnerabilities that appear as false negatives in static analysis can be identified successfully via dynamic analysis. Dynamic analysis can also validate the findings of a static code analyzer.
Nikolić et al. [RN34] characterized the smart contract issues as trace vulnerabilities using the detection techniques across a long sequence of invocations of a contract during its run-time. The problematic smart contracts are labeled in three categories — greedy contracts, prodigal contracts, and suicidal contracts [RN34]. The greedy contracts lock the fund indefinitely while they are alive, and the lock cannot be released in any other conditions. When a smart contract accepts Ether with lack of instructions or unreachable commands, it can become a greedy contract locking the available fund. By default, an Ethereum smart contract returns its funds to the fund owners, when the contract is under attack [RN61, RN98, RN50]. A prodigal contract releases the funds to arbitrary addresses other than to the legitimate owners. Because Ethereum disallows the Ethers held by a smart contract to be released to an arbitrary or unknown address, no actual Ethers will be deposited. An Ethereum smart contract enables a security fallback option of being killed by its owner or by an authorized address [RN34, RN206]. A suicidal contract is vulnerable, because an arbitrary account can kill the contract or force it to execute the suicide instruction [RN34].
Smart contracts are executed repeatedly during their lifetime [RN73, RN98, RN127, RN145]. A transaction invokes a smart contract and runs a function [RN98]. An execution trace is a sequence of runs of a contract recorded on the blockchain. MAIAN [RN34] considers the execution traces of smart contracts together with the vulnerability categories. Each invocation of the contract can exercise an execution path for a given input context. Hence, there may be a chain of effects across a trace of invocations [RN34, RN98]. Considering only one invocation and looking for a bug in that particular invocation is inefficient. The dynamic analysis tool MAIAN uses systematic techniques to find violations of specific properties defined over the traces of smart contract executions [RN34].
Figure 7 shows the architecture of MAIAN. It has two major components: symbolic analysis and concrete validation. The contract bytecode and the analysis specifications are taken as input by the symbolic analysis component. The analysis specifications contain the vulnerability category and the depth of the search space, which together define the search operation [RN34]. A custom EVM was implemented to facilitate symbolic execution of smart contract bytecode. This EVM symbolically runs all possible execution traces for each candidate smart contract. MAIAN continues until it reaches a problematic trace that exhibits a set of predetermined vulnerability properties. Every execution trace takes a set of symbolic variables as its input. If a contract is detected as vulnerable, the symbolic analysis component returns concrete values for the specific symbolic variables. The concrete validation component then validates the results of the symbolic analysis component by checking the contract exploitation on a private fork of the Ethereum network [RN34]. It confirms the correctness of the bugs found in the candidate smart contract. During the analysis, MAIAN does not affect the state of the contract on the main Ethereum blockchain.
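The idea of searching invocation traces up to a bounded depth can be illustrated with a small sketch. It is not MAIAN's implementation: instead of symbolic execution over EVM bytecode, it concretely enumerates call sequences on a hypothetical toy contract written in Python. The class `ToyWallet`, its deliberately unprotected `init_owner` function, and the predicates `is_prodigal` and `is_suicidal` are all assumptions made for illustration.

```python
from itertools import product

OWNER, ATTACKER = "owner", "attacker"


class ToyWallet:
    """Hypothetical contract with an unprotected initializer (not from [RN34])."""

    def __init__(self):
        self.owner = OWNER
        self.balance = 10
        self.alive = True
        self.paid = {}  # address -> Ether received from the contract

    def init_owner(self, sender):
        # Bug: any sender may take ownership.
        self.owner = sender

    def withdraw(self, sender):
        if sender == self.owner and self.balance > 0:
            self.paid[sender] = self.paid.get(sender, 0) + self.balance
            self.balance = 0

    def kill(self, sender):
        if sender == self.owner:
            self.alive = False


FUNCTIONS = ("init_owner", "withdraw", "kill")


def find_trace(violates, max_depth=3):
    """Return a shortest attacker-only call trace whose final state violates
    the given property, or None if no trace up to max_depth does."""
    for depth in range(1, max_depth + 1):
        for trace in product(FUNCTIONS, repeat=depth):
            contract = ToyWallet()  # fresh state for every candidate trace
            for name in trace:
                if contract.alive:
                    getattr(contract, name)(ATTACKER)
            if violates(contract):
                return trace
    return None


def is_prodigal(contract):
    """Ether leaked to an arbitrary (non-owner) account."""
    return contract.paid.get(ATTACKER, 0) > 0


def is_suicidal(contract):
    """Contract killed by an arbitrary account."""
    return not contract.alive
```

In this toy model no single invocation violates either property; the violations only appear on traces of length two, which is why the search has to consider sequences of invocations rather than one call at a time.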
V-B2 Graph Construction
Chen et al. [RN80] conducted a systematic study of Ethereum by leveraging graph analysis. The major activities on Ethereum were characterized, namely money transfer, contract creation, and smart contract invocation. All internal and external data on Ethereum were collected by modifying the Ethereum client to instrument its EVM opcode handlers. New observations and insights were discovered by constructing three types of graphs [RN80], MFG (Money Flow Graph), CCG (Contract Creation Graph), and CIG (Contract Invocation Graph), from the dynamically collected data. Two new approaches based on cross-graph analysis were proposed to address two security issues in Ethereum. The first application finds all accounts controlled by the attacker for a given malicious contract, for use in digital forensics systems [RN80]; the second application detects abnormal contract creation that consumes many resources by creating many contracts [RN80].
Figure 8 shows the methodology of the graph analysis approach in [RN80]. The approach consists of three major phases: data collection, graph construction, and graph analysis. During data collection, all internal and external transaction data are collected from the Ethereum network. When a contract invokes a method of another smart contract, the call is an internal transaction. Since these data are not publicly available in the blockchain, a new approach was introduced to collect internal transactions: the Ethereum client was modified to add instrumentation code to the interpretation handler of every EVM opcode. During graph construction, the transaction data are first filtered in four steps to exclude irrelevant transactions. The relevant transaction data are then used to build three types of graphs: the Money Flow Graph (MFG), the Contract Creation Graph (CCG), and the Contract Invocation Graph (CIG). In an MFG, the edges denote the amount of Ether transferred from one node (account) to another. The sender and the receiver can each be an externally owned account or a smart contract. A CCG captures the creation of smart contracts. A CIG is constructed from the transactions in which an account or another smart contract calls a smart contract method. Finally, the statistics of the three types of graphs are computed in the graph analysis phase. The graph analysis is conducted on the MFG, CCG, and CIG by calculating metrics such as degree distribution [RN217], clustering [RN216], degree correlation [RN215], node importance [RN217], Pearson correlation coefficient [RN218], and strongly/weakly connected components [RN80].
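The graph construction phase can be sketched as follows. This is an illustrative assumption rather than the pipeline of [RN80]: the record layout `(kind, sender, receiver, value)`, the account names, and the choice of edge weights (total Ether for the MFG, counts for the CCG and CIG) are hypothetical, and the four-step filtering is omitted.

```python
from collections import defaultdict

# Hypothetical transaction records: (kind, sender, receiver, value_in_ether).
# kind is "transfer", "create", or "call"; real traces carry many more fields.
transactions = [
    ("transfer", "alice", "bob", 5),
    ("transfer", "alice", "bob", 2),
    ("create", "alice", "walletContract", 0),
    ("call", "bob", "walletContract", 1),
    ("call", "walletContract", "tokenContract", 0),  # internal transaction
]


def build_graphs(records):
    """Build MFG, CCG, and CIG as nested dicts: graph[src][dst] = edge weight."""
    mfg = defaultdict(lambda: defaultdict(int))  # weight: total Ether moved
    ccg = defaultdict(lambda: defaultdict(int))  # weight: contracts created
    cig = defaultdict(lambda: defaultdict(int))  # weight: number of invocations
    for kind, src, dst, value in records:
        if value > 0:
            mfg[src][dst] += value
        if kind == "create":
            ccg[src][dst] += 1
        elif kind == "call":
            cig[src][dst] += 1
    return mfg, ccg, cig


def out_degree(graph, node):
    """Number of distinct successors of a node (0 if the node has no out-edges)."""
    return len(graph[node]) if node in graph else 0


mfg, ccg, cig = build_graphs(transactions)
```

Metrics such as degree distribution can then be computed per graph, for example by collecting `out_degree(mfg, node)` over all nodes.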
The statistics and metrics provide the following observations and insights [RN80].
Most users prefer transferring money on Ethereum to using smart contracts.
The smart contracts are not widely used. Many smart contracts are like toy contracts, and lots of them are duplicated.
Not all users frequently use the Ethereum network.
A small number of developers created lots of smart contracts.
Financial applications, such as exchange markets, dominate the Ethereum platform.
|Formal Verification Methods||Proved Properties||Methodologies used|
|F* Framework [RN45]||run-time safety||Solidity translator to F*|
|F* Framework [RN45]||functional correctness||EVM bytecode translator to F*|
|Formalization using Isabelle/HOL [RN118]||contract correctness||Separation logic and verification conditions|
|Formalization using Isabelle/HOL [RN118]||contract termination||Program logic based on execution cost of gas|
|FEther using Coq [RN113]||functional correctness||Symbolic execution and higher order logic theorem proofs|
|FEther using Coq [RN113]||Improvement of theorem proving methods of contracts||Verification using Coq|
V-C Formal Verification Method
Formal verification methods use theorem provers or formal mathematical methods to prove specific properties of program code, such as functional correctness, run-time safety, soundness, and reliability. A few formal verification analyses have been conducted to validate and prove vulnerabilities in smart contracts. They used existing theorem provers such as Coq, Isabelle/HOL, and Lem, as well as SMT solvers.
V-C1 F* Framework
Bhargavan et al. [RN45] developed a framework to analyze and verify both the run-time safety and the functional correctness of Solidity smart contracts. The Solidity source code and the EVM bytecode are translated into a programming language called F*. A language-based approach is developed for verifying smart contracts under the assumptions that the Solidity compiler is not untrustworthy [RN45] and that it is difficult to modify the EVM directly due to its intricate semantics and limited openness [RN87].
Figure 9 shows the overall architecture of the F* verification framework. Two tools are implemented: the first tool, Solidity*, translates a Solidity program to shallow-embedded F* programs; the second tool is a decompiler named EVM* that converts EVM bytecode to equivalent F* programs. The source-level functional correctness specifications were verified with the Solidity* tool for a given piece of Solidity contract source code. The EVM* tool was used to decompile the EVM bytecode of a smart contract and analyze low-level properties, such as the gas consumption of each method invocation and the execution time [RN45]. By using both tools, the functional equivalence between the Solidity source code and the EVM bytecode and the correctness of the output are verified [RN45].
V-C2 Formalization using Isabelle/HOL
Amani et al. [RN118] built a sound program logic for the bytecode of Ethereum smart contracts. The proof assistant Isabelle/HOL is used to reason about the correctness properties of EVM bytecode based on separation logic [RN118]. All the elements of the program model are carried by a state. These elements of a state are separated using separating conjunctions, as in separation logic [RN219]. Formal verification can be used to achieve a high level of confidence in the correct behavior of smart contracts. The bytecode sequences were structured into blocks of straight-line code, and a program logic was created for reasoning about the behavior of smart contract code patterns.
The approach also reasons about termination based on the gas cost of execution in Ethereum. The verification was conducted using a sound program logic at the bytecode level. Smart contract bytecode is divided into two sections: pre-loader code and run-time code. The pre-loader code is used to deploy the contract on the Ethereum network. The core functionality of the contract is written in the run-time code, which is the part used for verifying smart contracts. Even for a small smart contract, reasoning about bytecode involves excessively long and repetitive proofs [RN118]. Therefore, it is more efficient to generate the verification conditions by applying the rules of the logic through Isabelle tactics.
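The intuition behind gas-based termination can be shown with a toy interpreter. This sketch is an illustration under stated assumptions, not the Isabelle/HOL formalization of [RN118]: the three-instruction language and the gas costs in `GAS_COST` are hypothetical, and real EVM costs differ. Because every non-halting instruction consumes a positive amount of a finite gas budget, even a program with an unconditional backward jump must stop.

```python
class OutOfGas(Exception):
    """Raised when the remaining gas cannot pay for the next instruction."""


# Hypothetical per-instruction costs; real EVM gas costs differ.
GAS_COST = {"PUSH": 3, "JUMP": 8, "STOP": 0}


def run(program, gas):
    """Execute a toy program of (opcode, argument) pairs.

    Returns (steps, remaining_gas) when STOP is reached, or raises OutOfGas.
    Every non-halting instruction costs at least 3 gas here, so the number of
    executed steps is bounded by gas // 3.
    """
    pc, steps = 0, 0
    while True:
        opcode, arg = program[pc]
        if opcode == "STOP":
            return steps, gas
        cost = GAS_COST[opcode]
        if cost > gas:
            raise OutOfGas(f"halted after {steps} steps")
        gas -= cost
        steps += 1
        pc = arg if opcode == "JUMP" else pc + 1


# Jumps back to instruction 0 forever; only the gas budget stops it.
infinite_loop = [("PUSH", 1), ("JUMP", 0)]
```

Calling `run(infinite_loop, 100)` raises `OutOfGas` after a finite number of steps, which is the termination argument in miniature.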
V-C3 FEther interpreter using Coq
FEther is an extensible hybrid verification proof engine developed by Yang et al. [RN113] to improve theorem proving methods for the security of smart contracts. FEther uses Lolisa to guarantee the consistency between smart contracts and their formal models. Lolisa [RN147] is a formal syntax and semantics for a subset of the Solidity programming language. FEther combines symbolic execution with higher-order logic theorem proving. A set of automatic strategies in FEther helps execute and verify smart contracts in Coq, so its verification process is automated. Segments of verified code are reusable to help verify the specified properties [RN113]. Coq is used to interpret and verify functional correctness in FEther.
|Tool||Source Location||Package Dependencies|
|OYENTE||https://github.com/melonproject/oyente||solc, web3, Z3, Go Ethereum, requests, EVM|
|MAIAN||https://github.com/MAIAN-tool/MAIAN||solc, web3, Z3, Go Ethereum, Python, EVM|
|Securify||https://github.com/eth-sri/securify||Soufflé, Java 8, solc, EVM|
|Vandal||https://github.com/usyd-blockchain/vandal||Soufflé, Python, solc, JSON RPC API, EVM|
|Ethir||https://github.com/costa-group/EthIR||solc, web3, Z3, Go Ethereum, Python, EVM|
|Graph Analysis||https://github.com/brokendragon/Ethereum_Graph_Analysis||solc, Go Ethereum, Python, EVM|
|Isabelle/HOL Proofs||https://github.com/pirapira/eth-isabelle||Isabelle2007, Lem Ocaml, Opam packages|
|KEVM framework||https://github.com/kframework/evm-semantics/||Pandoc, Java 8 JDK, Opam packages|
V-D Comparison between the three analysis Methods
Here we compare the three analysis methods: static analysis, dynamic analysis, and formal verification. Both static and dynamic methods use a few similar methodologies, such as symbolic execution, transaction/flow graph construction, and validation [RN49, RN50, RN28, RN47, RN128, RN116]. However, static analysis cannot detect vulnerabilities that occur at execution time. In dynamic analysis, the traceability feature is important for identifying erroneous contracts that cause faults at run-time [RN34]. MAIAN traces the real execution of smart contracts and finds the vulnerable patterns [RN34]. This helps ensure the reliability of a smart contract that passes the test cases throughout its executions or invocations [RN34]. Dynamic analysis tools find a few types of vulnerabilities, such as destroyable contracts, unsecured balances, and locked or leaked contract funds [RN34]. Static analysis tools are able to identify the key vulnerable patterns in smart contracts listed in Tables II and IV. Formal verification methods prove whether specific properties of smart contracts hold. They verify run-time safety, functional correctness, and sound program logics in smart contracts [RN45, RN118, RN113]. Compared with static and dynamic analysis methods, formal verification methods check vulnerable patterns using different methodologies, such as separation logic, theorem provers, and translation of EVM bytecode to formal languages [RN45, RN118, RN113].
The static analysis tool OYENTE can detect four major vulnerabilities in smart contracts. The ZEUS tool is able to identify seven vulnerabilities, of which the unchecked send and failed send problems are subsets of the exception handling problem [RN28]. Seven gas-costly patterns are defined and identified by the GASPER analysis tool [RN47]. The tool Ethir reuses the control flow graph construction from the OYENTE tool. Ethir is able to find the same four key vulnerabilities that OYENTE detects, and it includes all possible jump addresses to validate all instructions [RN119]. Vandal detects five key vulnerabilities using static analysis mechanisms. Securify defines seven vulnerable properties of smart contracts and detects them more accurately [RN116] than OYENTE [RN49]. This study categorizes MAIAN [RN34] as a dynamic analysis tool, which defines three types of erroneous contracts and detects them by tracing every invocation path.
All the formal verification methods we discussed [RN45, RN118, RN113] prove some functional correctness property of smart contracts. They use different methodologies and theorem provers in their verification process, as summarized in Table V. They do not detect specific Ethereum vulnerabilities as the analysis tools do. Instead, they define correctness and safety properties of smart contracts and prove them using theorem proving methods. The F* framework [RN45] can verify run-time safety and functional correctness of smart contract execution.
Comparing the performance of OYENTE and Securify, it is observed that OYENTE [RN49] failed to report the transaction ordering dependency and exception handling problems in a few vulnerable contracts [RN116]. Furthermore, OYENTE generates more false warnings than Securify when it checks for the re-entrancy problem in problematic smart contracts [RN116].
Only a few of the tools analyzed here have published their source code or executable applications as open source. Table VI shows the available source links and the required dependencies for each tool.
VI Research Challenges and Future Directions
The DAO attack occurred due to two important vulnerabilities: a re-entrancy problem, and the contract state being updated after sending funds. The re-entrancy problem can be mitigated by using the address.transfer() or address.send() functions instead of invoking address.call.value() directly [RN65]. The call function allows the caller to make multiple external invocations before the contract state is changed [RN49, RN50]. Developers should also be aware that the contract state or balance should be updated before sending funds to a user, not after. The tools OYENTE, ZEUS, Vandal, and Ethir can be used to detect the re-entrancy vulnerability. Securify checks the restricted transfer property, which helps detect the state updating problem and suggests a solution at the relevant line of code [RN116].
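The effect of updating state after, rather than before, sending funds can be reproduced in a small simulation. This is a sketch in Python, not Solidity and not the DAO contract itself: the classes `Bank`, `Attacker`, and `Honest`, the `safe` flag, and the deposit amounts are hypothetical. The external call is modeled as a Python method call on the recipient, which is free to call back into the bank.

```python
class Bank:
    """Toy contract; `safe` selects update-before-send ordering."""

    def __init__(self, safe):
        self.safe = safe
        self.balances = {}
        self.vault = 0  # total Ether held by the contract

    def deposit(self, account, amount):
        self.balances[account.name] = self.balances.get(account.name, 0) + amount
        self.vault += amount

    def withdraw(self, account):
        amount = self.balances.get(account.name, 0)
        if amount == 0 or self.vault < amount:
            return
        if self.safe:
            self.balances[account.name] = 0  # state updated before sending
        self.vault -= amount
        account.receive(self, amount)  # external call; recipient may re-enter
        if not self.safe:
            self.balances[account.name] = 0  # state updated too late


class Attacker:
    name = "attacker"

    def __init__(self):
        self.stolen = 0

    def receive(self, bank, amount):
        self.stolen += amount
        bank.withdraw(self)  # re-enter before the balance is cleared


class Honest:
    name = "victim"

    def receive(self, bank, amount):
        pass


def attack(safe):
    """Return (Ether obtained by the attacker, Ether left in the bank)."""
    bank, thief = Bank(safe), Attacker()
    bank.deposit(Honest(), 9)
    bank.deposit(thief, 1)
    bank.withdraw(thief)
    return thief.stolen, bank.vault
```

With `attack(False)` the attacker, who deposited 1 unit, drains all 10 units; with `attack(True)` the re-entrant call sees a zero balance and the attacker receives only the 1 unit deposited.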
The Parity multisig wallet attack happened because of the lack of a proper access modifier on the external library functions [RN66]. The solution to this problem is to apply a private modifier to the functions in the external library and to use a locking mechanism to avoid sending funds or changing state without the owner’s permission [RN197]. MAIAN finds greedy contracts, which are frozen and lock their funds indefinitely. This approach helps find contracts that call external functions without restricted access. Attacks like the Parity multisig wallet problem are only partially addressed, because it is impossible to avoid all invocations of public external functions [RN66].
The integer underflow/overflow attack occurred due to the unchecked send and exception handling problems. ZEUS, Vandal, and Securify [RN28, RN116, RN121] are able to detect the unchecked and failed send problems. Further, the latest version of the Solidity compiler [RN87] gives warnings about integer underflow problems when smart contracts are compiled. Thus, this problem is well addressed, and many future attacks can be avoided if a proper version of the Solidity compiler is used [RN116].
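For readers unfamiliar with integer underflow and overflow, the wrap-around behavior of 256-bit unsigned arithmetic can be modeled in a few lines. This is a sketch in Python rather than Solidity; the helper names `unchecked_add`, `unchecked_sub`, and `checked_add` are assumptions made for illustration, and the checked variant mirrors the general idea of guarding arithmetic rather than any specific library API.

```python
UINT256_MAX = 2**256 - 1


def unchecked_add(a, b):
    """Add modulo 2**256, as unchecked 256-bit unsigned arithmetic does."""
    return (a + b) & UINT256_MAX


def unchecked_sub(a, b):
    """Subtract modulo 2**256; a result below zero wraps to a huge value."""
    return (a - b) & UINT256_MAX


def checked_add(a, b):
    """Reject results that do not fit in 256 bits instead of wrapping."""
    result = a + b
    if result > UINT256_MAX:
        raise OverflowError("uint256 addition overflow")
    return result
```

For example, `unchecked_sub(0, 1)` yields the maximum 256-bit value, which is how a balance of zero can turn into an enormous balance when a subtraction is not guarded.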
Considering the variety of key vulnerabilities in Ethereum smart contracts, many vulnerable contracts have already been deployed on the Ethereum blockchain. Because smart contracts are immutable, the functionality of a deployed smart contract cannot be modified without a hard fork. Even though we have analysis tools and verification methods to detect buggy contracts [RN49, RN86, RN28, RN121, RN116, RN118, RN119, RN129, RN80, RN47, RN115, RN130, RN140, RN220], it is very challenging to eliminate all vulnerable smart contracts. However, it is recommended to use the Ethereum compiler, analysis tools, or formal verification methods to test contracts and detect errors before deploying them to the live network.
The usability of the tools differs significantly. OYENTE, Securify, MAIAN, and Vandal are fully automated analysis tools. Automated tools can be set up easily before analyzing a large set of smart contracts. Securify is a scanning tool available online [RN222], so smart contract code can be scanned for possible vulnerabilities. OYENTE provides a Docker image [RN223] for deploying the application quickly, because a Docker image includes all the required dependencies [RN78]. However, only a few formal verification methods have published their source code on GitHub [RN118, RN140]. They are partially automated to verify and prove the correctness properties of smart contracts. The initial setup for formal verification methods takes more time than for the symbolic execution tools [RN49, RN86, RN28, RN121, RN116].
The Solidity compiler solc [RN87] has improved considerably at detecting basic errors and vulnerable patterns in smart contracts during the development phase. Most of the analysis tools depend on solc to compile Solidity smart contract code to bytecode, as shown in Table VI. As future work, the detection tools could be integrated with the Solidity compiler as an external plugin to help developers identify vulnerable contracts at compile time [RN224, RN225]. Krupp and Rossow [RN226] developed an automated tool, teEther, that uses a generic definition of problematic smart contracts to create an exploit for a contract bytecode.
Furthermore, static analysis tools detect only their specific vulnerabilities, as listed in Table IV. Seventeen vulnerabilities have appeared in the published literature [RN142, RN49, RN50, RN111, RN124]. The logic-related problems [RN108] in smart contracts cannot be detected by OYENTE [RN49]. OYENTE is narrowed down to detecting the security bugs that stem from semantic misunderstandings by smart contract developers [RN49]. The verification process in ZEUS was conducted for Solidity-based smart contracts using an abstract interpretation approach [RN28]. Kalra et al. [RN28] demonstrated that ZEUS can be extended with a few changes to analyze smart contracts on other blockchain platforms. The Vandal framework [RN121] also partly uses abstract interpretation, but it analyzes the EVM bytecode directly, using its own decompiler for the translation work.
GASPER [RN47] can detect seven gas-costly patterns in smart contracts. There will be more gas-expensive patterns in complex contract programs. Chen et al. stated that they will broaden their research to find more under-optimized patterns and detect them with their tool [RN47]. The Ethir [RN119] framework utilizes the control flow graph methodology developed in OYENTE to analyze Ethereum bytecode. However, Ethir does not improve the recovery capability of the control flow graph algorithm [RN28]. Securify uses Datalog solvers [RN211] to analyze smart contract code efficiently. Flix [RN227] enhances the scalability of the analysis process using Datalog. Securify [RN116] can utilize these advancements in Datalog solvers as a future development.
The formal verification methods use different theorem provers such as Isabelle/HOL, F*, KEVM, Lem, and Coq [RN118, RN140, RN113, RN45]. Since they use complicated mechanisms, it is not trivial for ordinary users to analyze smart contracts with formal verification methods. That is, users must be trained in how the proof method works and how to read the outputs. Furthermore, the formal verification approach uses a general method to construct code patterns and theorems to prove the security properties of smart contracts using theorem provers [RN118, RN140, RN113, RN45]. Since these provers are semi-automated, the formal verification methods require a significant amount of manual effort to construct the proofs and analyses of smart contracts [RN118, RN121]. Hence, these methods scale poorly to the thousands of smart contracts currently deployed on the Ethereum network [RN62, RN121]. However, the formal verification approach provides accurate and prompt results when validating the security, safety, and soundness properties of smart contracts [RN118, RN130, RN45, RN148, RN149, RN221].
Smart contracts in Ethereum are becoming more widely applicable as digitalized agents in distributed applications. The security of smart contracts should be ensured to avoid unnecessary losses and malicious attacks. Several analysis mechanisms have been implemented to test and assure the correctness of smart contracts and the absence of vulnerable patterns. However, developers and users of smart contracts should be aware of the accuracy and performance of these analysis methods. Our survey identified the existing vulnerabilities in smart contracts on Ethereum and categorized the security analysis methods into three types: static analysis, dynamic analysis, and formal verification. We then compared the three methods in terms of their performance, vulnerability coverage, and accuracy. The static and dynamic analysis methods are implemented as automated tools, which are handy for analyzing vulnerable contracts. However, they detect only their specifically defined vulnerable patterns. Formal verification methods use theorem provers to validate the correctness properties of smart contracts using their interpreted proofs.
We appreciate the authors who gave permission to reproduce the images from their original papers. We thank Loi Luu, Antoine Delignat-Lavaud, Ivica Nikolić, and Yuxiao Zhu for their coordination.