Smart Contract Repair

12/12/2019 ∙ by Xiao Liang Yu, et al. ∙ National University of Singapore, Singapore Management University

Smart contracts are automated or self-enforcing contracts that can be used to exchange money, property, or anything of value without having to place trust in third parties. Many commercial transactions presently make use of smart contracts because they enable parties to engage in secure peer-to-peer transactions independent of external parties. They do so by transferring trust to computer programs (smart contracts), raising the question of whether these programs can be fully trusted. The code can be complex, and poorly written or vulnerable smart contracts may behave in many unexpected ways or be exploited maliciously. Furthermore, smart contracts on the blockchain are typically open to (malicious) agents, which can interact with them in various ways. Experience shows that many commonly used smart contracts are vulnerable to serious attacks which may enable attackers to steal valuable assets of the involved parties. There is therefore a need to apply analysis techniques to detect and repair bugs in smart contracts before they are deployed. In this work, we present the first automated smart contract repair approach that is gas-optimized and vulnerability-agnostic. Our repair method is search-based and considers the gas usage of the candidate patches by leveraging our novel notion of a gas dominance relationship. Our approach can be used to optimise the overall security and reliability of smart contracts against malicious attackers.


1. Introduction

Smart contracts are automated or self-enforcing programs which currently underpin many online commercial transactions. A smart contract is a series of instructions or operations written in a special programming language which gets executed when certain conditions are met. Typically, smart contracts run on top of blockchain systems, which are distributed databases whose storage is represented as a sequence of blocks. Smart contracts are one of the most successful applications of blockchain technology. Their key attractive property is their ability to eliminate the use of trusted third parties in multiparty interactions, enabling parties to engage in secure peer-to-peer transactions without having to place trust in external parties (i.e., outside parties which help to fulfill the contractual obligations). They do so by transferring trust to computer programs, raising the question of whether these programs can be trusted and whether the contract can be used for critical commercial transactions.

While smart contracts are commonly used for commercial transactions and designed to be security-oriented systems, many attacks have been carried out against poorly written or vulnerable smart contracts. The code (set of operations) executed by smart contracts can be complex and may behave in many unexpected ways. It usually involves subtle interactions among a number of components in an unpredictable or malicious environment. There is therefore a need for smart contracts, which may be used for critical commercial transactions, to be rigorously hardened against potential malicious attacks before trust is warranted.

The successful application of automated repair techniques to traditional programs (Le Goues et al., 2012a, b; Martinez and Monperrus, 2015; Nguyen et al., 2013; Xuan et al., 2017; Mechtaev et al., 2015; Long and Rinard, 2015; Mechtaev et al., 2016) raises the question of whether these techniques can also be applied to fix bugs in smart contracts. In essence, automated repair techniques for programs try to automatically identify patches for a given bug, which can then be applied. Several different approaches have been developed to automatically repair bugs in traditional programs, which can be classified mainly into two categories: heuristic repair approaches (Le Goues et al., 2012a, b; Martinez and Monperrus, 2015) and constraint-based repair approaches (Nguyen et al., 2013; Xuan et al., 2017; Mechtaev et al., 2015; Long and Rinard, 2015; Mechtaev et al., 2016). The inputs to these approaches are a buggy program and a correctness criterion (often given as a test suite). The automated repair approach then returns a version of the buggy program that passes all tests in the given test suite, including the previously failing ones.

In practice, the implications of unfixed bugs in smart contracts can be more serious than in typical non-security-sensitive programs for several reasons. First, smart contracts are of a different nature from traditional programs: they are open for inspection and run on a decentralised network, so the whole program state of a smart contract is transparent to everyone. Second, the generated patch for a vulnerable smart contract should not only fix the detected vulnerabilities but also needs to meet some gas constraints related to the computational resources required on the nodes in the blockchain that runs the contract. Third, the quality of the generated patch for a vulnerable smart contract is a major design issue, as smart contracts are typically used for commercial transactions. In fact, malicious agents may take advantage of unfixed bugs in smart contracts to steal valuable assets of the parties involved.

Due to the open nature of a smart contract and the fact that the contract code is known in advance (publicly available), the generated patch for a vulnerable smart contract needs to meet certain criteria. First, the generated patch should ensure that the contract still runs immediately and autonomously. Second, the generated patch should fix detected bugs with a small set of changes and without breaking other behavior or introducing new behavior. Third, the generated patch should meet the gas usage limit imposed by the blockchain system on which the contract is running.

In this work, we develop an automatic smart contract repair algorithm using genetic programming. Given a vulnerable smart contract and a test suite, we conduct a parallel, biased search for a set of edits to that contract that fixes a given vulnerability without breaking any test that previously passed. The parallelization strategy consists of splitting the search space into mutually exclusive (disjoint) sub-spaces, where the patches of each sub-space are generated and validated concurrently and independently. We also introduce the notion of gas dominance level for smart contracts, which enables us to compare the quality of patches based on their runtime gas. The gas consumption of a smart contract is typically measured by the amount of computational resources needed to execute the operations of the contract. Since smart contracts are gas-critical, the gas dominance level gives a basis for comparing the quality of generated patches.

To evaluate the effectiveness of our genetic repair algorithm, we constructed a dataset of vulnerable smart contracts taken from Etherscan, which indexes the main Ethereum network, where actual smart contract transactions take place on a distributed ledger. Hence, our constructed dataset consists of real-world smart contracts. During our evaluation, we considered 20 vulnerable contracts selected randomly from the constructed dataset while taking into consideration the class of detected vulnerabilities and the complexity of the vulnerable contracts. The contracts were selected so that most of the common classes of vulnerabilities typically introduced by smart contract developers are covered when evaluating the genetic algorithm. To understand and draw valid conclusions about the factors affecting the correctness and quality of the patches generated by the algorithm, we evaluated it under many different settings and configurations. Examples of such settings include: (i) enabling/disabling the gas calculation of generated patches, (ii) increasing/decreasing the time budget allocated to the algorithm, and (iii) increasing/decreasing the size of the test suite used when generating patches. One of the main outcomes of the analysis is that considering gas not only allows the genetic algorithm to produce low-cost patches for vulnerable contracts, but can also help to reduce the size of the search space. Furthermore, considering gas when repairing vulnerable contracts can help to detect and discard infeasible patches early. A patch might be plausible but not feasible to deploy on a real blockchain; this happens when the generated patch consumes a significantly large amount of gas, leading to expensive transactions. Our genetic algorithm was able to fully repair 10 of the 20 selected vulnerable contracts, achieving a 50% success rate. Most of the selected vulnerable contracts have multiple bugs, and we therefore count a vulnerable contract as repaired only if all detected bugs are repaired. We summarize our main contributions as follows.

  1. We present the first automated smart contract repair approach that is gas-optimized and vulnerability-agnostic. The approach is inspired by genetic programming and can be used to generate a patch for a given vulnerable smart contract.

  2. We describe a parallel genetic repair algorithm that splits the large search space of candidate patches into smaller mutually exclusive search spaces which can be processed independently. The parallel algorithm helps to process a large number of candidate patches in a short computational time, and therefore, in contrast to previous repair approaches, repairs can be generated faster. It also improves the scalability of genetic repair algorithms so that large real-world programs may be repaired more efficiently.

  3. We show how to integrate non-functional properties of developers’ interest related to the cost and performance properties of the generated patches with the automated patch generation process of smart contracts. While integrating such non-functional properties with the repairing approach may increase its computational complexity, it has the advantage of generating low-cost or optimised patches. This is crucial for smart contracts as excessive unnecessary gas consumption of contracts can lead to financial loss or out-of-gas exception when running the contract on a public blockchain network. It is therefore necessary to minimize the cost of running the contract and also the possibility of introducing new out-of-gas exceptions when repairing a vulnerable smart contract by considering such non-functional properties.

  4. We introduce a simple yet effective gas ranking approach with the novel notion of Gas Dominance Level, which can be used to rank generated patches of a given vulnerable smart contract during patch generation. In general, the gas consumption of a given smart contract can be a non-constant bound, described as a parametric gas formula that takes into consideration both static and dynamic parameters that affect the cost of the contract, including the instruction gas, memory gas, stack gas, and storage gas. This somewhat complicates the gas ranking approach.

  5. We describe an acceleration technique to reduce the computational complexity of the gas comparative approach of generated plausible patches for a given vulnerable smart contract. Since the generated plausible patches address the same set of vulnerabilities for the same vulnerable contract, their syntactical structure and semantic behaviour can be very similar. We therefore generate reduced gas formulas for plausible patches by considering only the set of different paths while ignoring joint or identical paths among patched versions of the vulnerable contract. This is mainly due to the observation that joint paths among edited contracts will have the same gas formulas and hence one can safely ignore such paths when comparing edited versions.

  6. Based on the techniques described above, we develop a fully automated repair tool for smart contracts (which we call SCRepair) which is integrated with a gas ranking approach to generate a gas-optimised secure contract. Our tool can both detect and repair security vulnerabilities in smart contracts automatically. It does so by integrating with the smart contract security analyzers Oyente (Luu et al., 2016) and Slither (Feist et al., 2019). We demonstrate that our approach is effective in fixing bugs for real-world smart contracts. Our approach can deal with bugs whose fixes involve multi-line changes. Our smart contract repair tool and dataset are available on GitHub at https://github.com/xiaoly8/SCRepair

2. Blockchains and Smart Contracts

A blockchain is a distributed database that maintains records and transactions in a decentralized fashion. Blockchain technology has been adopted in many application areas to increase security and reliability and to avoid the need for a trusted third party. The transactions on a blockchain are available to all the parties in the network in real time, which allows all the entities to interact with each other in a distributed manner. It uses state-of-the-art cryptography, and hence it enables parties to engage in secure peer-to-peer transactions. The decentralized nature of blockchains makes them suitable for many applications, including decentralized cloud storage with provenance, general health care management, IoT data sharing with assured integrity, and general commercial transactions.

Smart contracts are one of the most successful applications of blockchain technology. They currently underpin many online commercial transactions, which typically run on top of blockchain systems. A smart contract is a series of instructions written in a special programming language which gets executed when certain conditions are met. The key attractive property of smart contracts is their ability to eliminate the use of trusted third parties in multiparty interactions, and therefore to save significant time and cost. A smart contract allows decentralized automation by facilitating, verifying, and enforcing the conditions of an underlying agreement.

Ethereum is the most popular blockchain platform for creating smart contracts. Its execution environment is Turing-complete, which allows the creation of more practically useful smart contracts. Smart contracts are typically written in the programming language Solidity, which works on the basis of the IFTTT logic, aka the IF-THIS-THEN-THAT logic. Note that everything executed on Ethereum costs some gas, which gives the miners an incentive to perform the computations (Wood, 2019). For example, executing an ADD bytecode costs 3 gas. A byte of transaction data costs 4 or 68 gas, depending on the value of the byte (zero or non-zero). Hence, any slight mutation to the source code of a smart contract can tremendously change the gas usage of the contract, and thus the amount of money that the parties involved in a transaction need to pay when running the smart contract on a real blockchain network.
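To make the gas arithmetic concrete, the following minimal Python sketch uses only the figures quoted above (3 gas per ADD, 4 gas per zero byte, 68 gas per non-zero byte). The function names are hypothetical, and every other opcode and charge is deliberately ignored, so the numbers are illustrative rather than a faithful fee calculation.

```python
# Per-unit gas figures quoted in the text above.
GAS_ADD = 3             # one ADD instruction
GAS_ZERO_BYTE = 4       # one zero byte
GAS_NONZERO_BYTE = 68   # one non-zero byte


def byte_gas(data: bytes) -> int:
    """Total per-byte gas charge for a byte string."""
    return sum(GAS_ZERO_BYTE if b == 0 else GAS_NONZERO_BYTE for b in data)


def loop_add_gas(iterations: int, adds_per_iteration: int) -> int:
    """Gas spent on ADD instructions alone in a loop (all other opcodes ignored)."""
    return iterations * adds_per_iteration * GAS_ADD
```

Under these assumptions, a mutation that adds a single ADD to the body of a loop executed 100 times raises the ADD cost from `loop_add_gas(100, 1)` to `loop_add_gas(100, 2)`, which is why small source edits inside loops can have a large effect on gas.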

To develop a better understanding of blockchains and smart contracts, let us consider an example. Suppose that Bob would like to sell a property (a house) to Alice, that Alice is willing to pay 100K for it, and that Bob is happy with Alice’s offer. After some discussion, they agree to proceed with their business transaction and wish to perform it in an automated way by taking advantage of blockchains and smart contracts. From the given description of the problem, one can see that there are three main conditions that any possible smart contract solution needs to satisfy: (1) Bob has legal ownership of the property that he is selling, (2) Alice can get the ownership of Bob’s property only if she has transferred 100K to Bob, and (3) Bob can get 100K from Alice only if he has transferred the ownership of his property to Alice. The transaction is said to be successful if, upon completion, the ownership of Bob’s property is transferred to Alice while Bob receives 100K in return.

Contract CommercialTransaction
{
  transferedA, transferedB : Bool
  initialise { transferedA := false; transferedB := false }
  transferA { if sender = A and value = 100K then transferedA := true }
  transferB { if sender = B and asset = houseOwnershipB then transferedB := true }
  finalise { if transferedA and transferedB
                then transferedA := false; transferedB := false;
                send (100K, B); send (houseOwnershipB, A) }
  AbortA { if transferedA then transferedA := false; send (100K, A) }
  AbortB { if transferedB then transferedB := false; send (houseOwnershipB, B) }
}
Figure 1. An atomic smart contract that allows two parties to be involved in a commercial transaction to sell some property.

Suppose that Alice and Bob perform their transaction using the smart contract given in Fig. 1. Rather than using a specific smart contract programming language, we write pseudo-code to make it readable by readers not familiar with such languages. For simplicity, we also omit the use of cryptographic techniques when presenting the smart contract example. The code consists of a number of functions needed in order to perform the commercial transaction in an atomic way. The function transferA is used by Alice to transfer 100K to the smart contract, and hence when this function is executed the money comes under the control of the smart contract. Similarly, the function transferB is used by Bob to transfer the ownership of his property to the smart contract. After executing the functions transferA and transferB, the smart contract is therefore supposed to hold both the money of Alice and the property of Bob. The function finalise is used to finalise the transaction by transferring the money of Alice to Bob and the property of Bob to Alice. The smart contract also provides two more functions, namely AbortA and AbortB, which are available to Alice and Bob respectively. The goal of these functions is to protect the parties from the situation where one party transfers its asset while the other does not.
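The behaviour of the contract in Fig. 1 can be sketched as a small Python simulation. This is only an off-chain model under stated assumptions: the class, the `outbox` list that records each send, and the reading that both sends in finalise happen only inside the `then` branch are our own interpretation of the pseudo-code, and no cryptography or real asset transfer is involved.

```python
from dataclasses import dataclass, field

PRICE = 100_000            # the agreed 100K
DEED = "houseOwnershipB"   # token standing for ownership of Bob's property


@dataclass
class CommercialTransaction:
    transfered_a: bool = False
    transfered_b: bool = False
    outbox: list = field(default_factory=list)  # hypothetical log of (asset, recipient)

    def transfer_a(self, sender, value):
        # Alice deposits the money with the contract.
        if sender == "A" and value == PRICE:
            self.transfered_a = True

    def transfer_b(self, sender, asset):
        # Bob deposits the ownership token with the contract.
        if sender == "B" and asset == DEED:
            self.transfered_b = True

    def finalise(self):
        # The swap happens only once both deposits are in place.
        if self.transfered_a and self.transfered_b:
            self.transfered_a = self.transfered_b = False
            self.outbox += [(PRICE, "B"), (DEED, "A")]

    def abort_a(self):
        # Alice reclaims her money if the swap has not happened.
        if self.transfered_a:
            self.transfered_a = False
            self.outbox.append((PRICE, "A"))

    def abort_b(self):
        # Bob reclaims his property if the swap has not happened.
        if self.transfered_b:
            self.transfered_b = False
            self.outbox.append((DEED, "B"))
```

If only Alice deposits, `finalise()` does nothing and `abort_a()` returns her money; if both deposit, `finalise()` performs the swap and later abort calls have no effect.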

Recently, there has been a growing interest in the verification and validation of smart contracts (Grossman et al., 2017; Kalra et al., 2018; Jiang et al., 2018; Amani et al., 2018; van der Meyden, 2019), as vulnerabilities in this class of programs can have serious adverse consequences. A number of vulnerability detection tools have therefore been developed for smart contracts, including Oyente (Luu et al., 2016), Slither (Feist et al., 2019), and ContractFuzzer (Jiang et al., 2018). In general, smart contract vulnerabilities can be grouped into three categories (Atzei et al., 2017): (i) vulnerabilities at the blockchain level, (ii) vulnerabilities at the Ethereum Virtual Machine level, and (iii) vulnerabilities at the source code level. In this work, we are interested in the vulnerabilities that can be repaired at the level of source code. Based on our review of recent research on smart contracts (Atzei et al., 2017; Jiang et al., 2018; Luu et al., 2016; Tsankov et al., 2018; Delmolino et al., 2016; Bhargavan et al., 2016; Dika, 2017; Tikhomirov et al., 2018; Grishchenko et al., 2018), we summarise in Table 1 some selected popular vulnerabilities that can be detected using the tools Oyente (Luu et al., 2016) and ContractFuzzer (Jiang et al., 2018).

Class of vulnerability References
Exception disorders / Mishandled exceptions / Gasless send (Atzei et al., 2017; Dika, 2017; Luu et al., 2016; Bhargavan et al., 2016; Tikhomirov et al., 2018; Jiang et al., 2018; Tsankov et al., 2018)
Reentrancy (Atzei et al., 2017; Dika, 2017; Luu et al., 2016; Bhargavan et al., 2016; Tikhomirov et al., 2018; Jiang et al., 2018; Tsankov et al., 2018)
Integer overflow / Integer underflow / Unchecked math (Dika, 2017; Tsankov et al., 2018; Tikhomirov et al., 2018; Luu et al., 2016)
Transaction order dependence / Unpredictable state (Atzei et al., 2017; Dika, 2017; Luu et al., 2016; Tsankov et al., 2018)
Table 1. Vulnerabilities detected in real smart contracts that can be fixed by updating Solidity source code

Table 1 shows a summary of widely studied known vulnerabilities. As mentioned earlier, the implications of unfixed vulnerabilities in smart contracts can be very serious, as malicious attackers may take advantage of these vulnerabilities to steal assets of the parties involved by creating transactions using these contracts. We give a detailed description of these classes of vulnerabilities in Section 7.

3. The Smart Contract Repair Problem

Recent advances in program repair techniques have raised the question of whether these techniques can also be applied to repair vulnerabilities in smart contracts. In this section, we discuss the automated smart contract repair problem together with the set of challenges that might be encountered when developing solutions to this problem. We also discuss the key differences between the smart contract repair problem and the traditional program repair problem.

Problem 1.

(Automated smart contract repair problem). Consider a vulnerable smart contract $C$ with a set of detected vulnerabilities $V$, a test suite $T$, and a maximum gas usage bound $G_{\max}$. The automated smart contract repair problem is the problem of developing an algorithm that takes $(C, V, T, G_{\max})$ as inputs and produces as output a new contract $C'$ that is similar to $C$ but has all vulnerabilities in $V$ fixed, passes all tests in $T$, and whose feasible execution paths each have gas usage less than or equal to $G_{\max}$.

The smart contract repair problem is very similar to the traditional program repair problem. However, it introduces some extra computational complexity, as the cost of the repair needs to be taken into consideration when generating patches for vulnerable smart contracts. It is also highly desirable to keep patches readable and simple, so that the (syntactical) structure of the vulnerable contract is maximally preserved.

Since detailed formal specifications of intended program behavior are typically unavailable, program repair uses weak correctness criteria, such as the vulnerabilities reported by a vulnerability detector and a test suite. Therefore, the validity of patches is relative to the chosen vulnerability detector and the available test cases.

As mentioned earlier, the patches generated for smart contracts need to meet more criteria than those generated for traditional programs. This is mainly due to the fact that smart contracts typically run on top of blockchain systems, which impose certain constraints on the total computational resources used by the contract. That is, the execution of the smart contract needs to comply with the gas usage constraints imposed by the blockchain system. Note that if the running smart contract exceeds the allowed upper bound on gas usage, the execution of the contract will be interrupted and an “out-of-gas” exception will be thrown.

Definition 1.

(Validity criteria of generated patches). Given a vulnerable smart contract $C$ with a set of detected vulnerabilities $V$ and a test suite $T$ that consists of two sets: the failing tests $T_f$ and the passing tests $T_p$. Suppose that the contract is running on top of a blockchain system and that the maximum allowed gas usage available to the contract is bounded by $G_{\max}$. We say that the new patched smart contract $C'$ is a valid plausibly fixed contract if it satisfies the following requirements.

  1. The contract $C'$ is not vulnerable to the vulnerabilities in $V$.

  2. The contract $C'$ passes all tests in $T_f$.

  3. The contract $C'$ does not break any test in $T_p$.

  4. There is no feasible execution path in $C'$ whose total gas consumption exceeds the bound $G_{\max}$.

Typically, the bound imposed on the gas usage of the contract is determined by the parties involved in the transaction, the structure and semantics of the smart contract, and the available resources on the blockchain on which the transaction will be executed. Such a bound (if known) can be incorporated in the patch generation process for vulnerable contracts in order to avoid introducing new out-of-gas exceptions. Note that requirement 4 of Definition 1 can be checked by enumerating all feasible paths in the patched contract and then verifying that there is no feasible path that exceeds the bound. In addition to the above correctness requirements, we are also interested in some quality properties when generating patches, as described below:
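The four validity requirements can be read as a single predicate over a candidate patch. The sketch below is a hedged Python illustration, not the tool's implementation: the vulnerability detector `detect`, the path enumerator `path_gas`, and the representation of a contract as a source string are all hypothetical stand-ins supplied by the caller.

```python
from typing import Callable, Iterable

Contract = str                      # stand-in for contract source code
Test = Callable[[Contract], bool]   # returns True when the contract passes


def is_valid_patch(
    patched: Contract,
    detect: Callable[[Contract], set],              # hypothetical: contract -> vulnerability ids
    target_vulns: set,
    failing_tests: Iterable[Test],
    passing_tests: Iterable[Test],
    path_gas: Callable[[Contract], Iterable[int]],  # hypothetical: gas of each feasible path
    gas_bound: int,
) -> bool:
    no_vulns = not (detect(patched) & target_vulns)                # requirement 1
    fixes_failing = all(t(patched) for t in failing_tests)         # requirement 2
    keeps_passing = all(t(patched) for t in passing_tests)         # requirement 3
    within_bound = all(g <= gas_bound for g in path_gas(patched))  # requirement 4
    return no_vulns and fixes_failing and keeps_passing and within_bound
```

Requirement 4 is written here as a plain enumeration over per-path gas values, mirroring the check described above; how feasible paths are actually enumerated is outside the scope of this sketch.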

  1. The simplicity of the patch. The simplicity of the edited contract can be measured in terms of the number of edits that have been made to the original contract.

  2. The cost of the patch. The cost of the contract can be measured in different ways. We choose here the average gas usage as a metric to measure the cost of the contract.

To evaluate the quality requirements of a generated patch, we introduce two functions: a difference function and an average-cost function. The difference function returns a numerical value that specifies how much the edited contract differs from the original vulnerable contract. It simply counts the differences between the two contracts: replaced expressions, inserted statements, and moved statements are all counted. The average-cost function computes the average gas usage of a given smart contract. Recall that every single operation that takes part in the blockchain (Ethereum) network consumes some amount of gas. Gas is used to calculate the fees that need to be paid to the miner in order to execute an operation. Of course, the cost can vary from one transaction to another depending on the details of the transaction and the structure and complexity of the smart contract. However, for a given smart contract and a specific transaction, one can compute the average cost or the maximum expected cost of the transaction in gas units, provided that the cost of each operation of the contract on the running blockchain system is known in advance. We defer the discussion of the computational details of the gas usage of a given smart contract to Section 5.
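As a rough illustration of the average-cost idea, the following Python sketch sums known per-operation costs over recorded execution traces and averages the totals. The opcode table is an assumption made for the example: only the ADD figure comes from the text, the other values are illustrative, and the helper names are hypothetical.

```python
from statistics import mean

# Illustrative per-operation costs; only ADD = 3 is taken from the text.
OPCODE_GAS = {"ADD": 3, "MUL": 5, "SLOAD": 200}


def trace_gas(trace):
    """Gas used by one execution trace, given as a list of opcode names."""
    return sum(OPCODE_GAS[op] for op in trace)


def average_gas(traces):
    """Average gas over a collection of execution traces."""
    return mean(trace_gas(t) for t in traces)
```

For instance, two traces costing 3 and 13 gas respectively average to 8 gas. A real computation would also need the memory and storage charges discussed later in the paper.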

4. The Smart Contract Repairing Framework

In this section, we present a multi-objective genetic repair algorithm with mainly four objectives: two objectives related to the correctness of the smart contract and two related to the quality of the generated patch. We develop an efficient genetic search approach to generate a patch for a vulnerable smart contract. The developed genetic algorithm employs mutation operators to generate fix candidates for the vulnerable contract and then uses some fitness functions to evaluate the suitability of the candidate patch. The overall goal of our approach is to generate correct, high-quality, and gas-optimised fixes for the vulnerable smart contract.

4.1. Mutation Analysis of Smart Contracts

The mutation analysis of a vulnerable smart contract is the process in which a set of contract variants, called mutants, are generated by seeding a large number of small syntactic changes into the vulnerable contract using some mutation operators. The main motivation behind developing a genetic repair approach based on mutation analysis is the hypothesis that most software bugs introduced by programmers are due to small syntactic errors.

4.2. Mutation Operators and Patch representation

We employ three mutation operators. The move operator moves a given statement in the analysed smart contract to some other location in the contract. The insert operator inserts a randomly synthesized statement before or after a given buggy statement. The replace operator replaces a potentially buggy expression with another randomly synthesized expression. Our set of mutation operators contains both statement-level and expression-level operators, so that mutation can be conducted efficiently at different granularities.

Patch Representation

A patch candidate is represented in terms of the mutation operations that need to be performed on the abstract syntax tree of the original vulnerable contract being repaired.
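A minimal Python sketch of this representation follows. It is a simplification under stated assumptions: the contract is modelled as a flat list of statement strings rather than a real abstract syntax tree, and the `Move`, `Insert`, and `Replace` records and `apply_patch` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Move:
    src: int   # index of the statement to move
    dst: int   # index it should occupy afterwards


@dataclass(frozen=True)
class Insert:
    index: int      # new statement is placed before this index
    statement: str


@dataclass(frozen=True)
class Replace:
    index: int      # statement containing the expression
    old_expr: str
    new_expr: str


Edit = Union[Move, Insert, Replace]


def apply_patch(statements: List[str], patch: List[Edit]) -> List[str]:
    """Apply the edits in order and return a new statement list."""
    result = list(statements)  # the original contract is left untouched
    for edit in patch:
        if isinstance(edit, Move):
            result.insert(edit.dst, result.pop(edit.src))
        elif isinstance(edit, Insert):
            result.insert(edit.index, edit.statement)
        else:
            result[edit.index] = result[edit.index].replace(
                edit.old_expr, edit.new_expr, 1
            )
    return result
```

Storing a patch as a list of edits rather than as a full mutated contract keeps candidates small and makes it easy to count how many edits a patch contains.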

4.3. Generating Mutated Smart Contracts

A large number of mutants may be introduced when repairing a vulnerable smart contract, depending on the size of the contract, leading to the validation of an extremely large set of mutants. Note that the validation of the generated mutants can be extremely costly and time-consuming, as also shown by other automated program repair work (Le Goues et al., 2012b). Each mutant may need to be tested against the original test suite. It is therefore necessary to apply a parallelization methodology in order to speed up the validation of candidate mutants for a given vulnerable contract.
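The idea of validating mutants in parallel can be sketched in a few lines of Python. This is an assumption-laden illustration: a thread pool stands in for whatever parallel mechanism is actually used, mutants are plain strings, and tests are callables that return True on success.

```python
from concurrent.futures import ThreadPoolExecutor


def validate(mutant, tests):
    """A mutant is plausible when it passes every test."""
    return all(test(mutant) for test in tests)


def plausible_mutants(mutants, tests, workers=4):
    """Validate all mutants concurrently and keep those that pass."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so verdicts line up with mutants.
        verdicts = list(pool.map(lambda m: validate(m, tests), mutants))
    return [m for m, ok in zip(mutants, verdicts) if ok]
```

Because each mutant is validated independently of the others, the work divides cleanly across workers; in practice each validation would compile and run the contract, which is where the cost lies.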

Note that all mutation operators used in our repair framework can affect the cost of the vulnerable smart contract, as our experiments later show. Their effect on the cost of the contract can be considerable, especially when the vulnerable contract contains loops that can be repeated a large number of times. If a plausible patch of the contract is obtained by replacing or inserting a statement within the body of such a loop, then the cost of the contract may change dramatically. It is therefore crucial to search for a gas-optimised patch when repairing vulnerable smart contracts in order to minimize the possibility of introducing new out-of-gas exceptions to the smart contract being repaired.

In general, generating a gas-optimised repair for a given vulnerable smart contract can be a computationally complex task. Note that the repair should not only fix the vulnerability in the contract but should also not considerably increase the gas usage of the original vulnerable contract. To achieve this goal, one might choose to mutate the vulnerable smart contract by favoring the mutation operators move and replace over the mutation operator insert when searching for low-cost patches. Indeed, intuitively, adding new instructions to the program would be likely to increase the computational demand. Unfortunately, such favoring does not necessarily lead to the least costly plausible patch for the vulnerable contract, as one might expect. Subtle interactions between the operators can turn a low-cost contract into a high-cost contract and vice versa. For example, the insert mutation operator, which is supposed to increase the cost of the contract by adding a new statement, may sometimes lead to a mutant with lower gas usage than the original vulnerable contract. Similarly, the move mutation operator, which is not supposed to increase the cost of the contract, can also lead to a mutant whose gas usage is higher than that of the original contract. The cost of the generated mutant depends not only on the cost of the applied mutation operations but also on the way the operators change the behaviour of the contract. We therefore cannot favor one operator over another when searching for low-cost repairs without performing some analysis of the overall structure of the vulnerable contract.

Let us consider some trivial examples of loop contracts to demonstrate how the insert operator can turn a high-cost contract into a low-cost contract while the move operator may turn a low-cost contract into a high-cost contract. The program in Fig. 2 is a buggy non-terminating program. Suppose that we generate a mutant for this program by inserting a new statement of the form a := false; after the initialisation statement (line 1). In this case, the loop in the generated mutant will be skipped and the average gas usage of the mutated version will be much smaller than that of the original version. The program in Fig. 3 is a buggy program which suffers from a buffer overflow error. Let us generate a random mutant of the program by applying the move operator so that the statement at line 4 (the loop counter update statement) is moved outside the loop. Obviously, this will turn the loop into an infinite loop, and hence the contract will run out of gas after a certain number of iterations. Note that since mutation makes random changes to the buggy smart contract, it may impact the performance and cost of the contract in many arbitrary ways. This is critical especially when the buggy smart contract contains loops.

 1: Bool a := true;
 2: while (a)
 3: {
 4:   // Some computation
 5: }
  
Figure 2. A non-terminating buggy program.
 1: int x := 0, b[100];
 2: while (x <= 100)
 3: {
 4:   x := x+2;
 5:   // Some computation
 6: }
   
Figure 3. A program with buffer overflow error.
Observation 1.

There is insufficient information to predict the gas usage of a mutated contract by inspecting only the mutation operations applied. For example, successive applications of the mutation operators that do not introduce new statements (move, replace) do not necessarily lead to a low-cost mutant w.r.t. the original smart contract. Similarly, successive applications of the insert operator, which introduces new statements, do not necessarily lead to a high-cost mutant w.r.t. the original smart contract. The cost of the generated mutants depends mainly on how the applied mutation operators change the behaviour of the smart contract.
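
The two loop examples can be replayed with a toy gas meter. The following is a minimal sketch, assuming a hypothetical flat cost of one gas unit per executed statement or guard evaluation and a hypothetical gas limit of 1,000; neither value comes from the EVM gas schedule, and the function names are illustrative only.

```python
# Hypothetical flat cost model: each executed statement or guard costs 1 gas.
GAS_LIMIT = 1_000


def run_fig2(insert_a_false: bool) -> tuple[int, bool]:
    """Simulate the Fig. 2 loop. Returns (gas used, ran out of gas)."""
    gas = 1                      # line 1: a := true
    a = True
    if insert_a_false:           # mutant: `a := false` inserted after line 1
        a = False
        gas += 1
    while True:
        gas += 1                 # evaluate the loop guard
        if not a:
            return gas, False
        if gas >= GAS_LIMIT:
            return GAS_LIMIT, True
        gas += 1                 # loop body


def run_fig3(move_update_out: bool) -> tuple[int, bool]:
    """Simulate the Fig. 3 loop. Returns (gas used, ran out of gas)."""
    gas = 1                      # line 1: initialisation
    x = 0
    while True:
        gas += 1                 # evaluate the loop guard
        if gas >= GAS_LIMIT:
            return GAS_LIMIT, True
        if not x <= 100:
            break
        if not move_update_out:  # original: counter update inside the loop
            x += 2
            gas += 1
        gas += 1                 # "some computation"
    return gas, False


print(run_fig2(False), run_fig2(True))   # insert lowers the gas usage
print(run_fig3(False), run_fig3(True))   # move exhausts the gas budget
```

Under this toy model the inserted statement makes the mutant far cheaper than the original, while the moved statement makes the mutant hit the gas limit, which is the effect described in Observation 1.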

As mentioned earlier, one of the biggest challenges that needs to be addressed when using a genetic search approach for repairing smart contracts is how to speed up the generation and validation of possible mutated versions. We describe here a parallel search-based algorithm by which a fixing patch of the vulnerable contract can be generated efficiently. We assume that we have three versions of the mutate function: one that mutates the contract using only the move operator, one that mutates the contract using only the replace operator, and one that mutates the contract using only the insert operator. Since genetic repair approaches mainly use an exhaustive search algorithm to generate a fixing patch for buggy programs, it is highly desirable to split the search space into smaller spaces. To do so, we use the mutate functions described above to split the search space into seven smaller spaces as described below.

  • : this search space consists of the set of candidate patches that result from mutating the contract using only the function .

  • : this search space consists of the set of candidate patches that result from mutating the contract using the two functions and .

  • : this search space consists of the set of candidate patches that result from mutating the contract using only the function .

  • : this search space consists of the set of candidate patches that result from mutating the contract using the two functions and .

  • : this search space consists of the candidate patches that result from mutating the contract using the functions , , .

  • : this search space consists of the candidate patches that result from mutating the contract using and .

  • : this search space consists of the candidate patches that result from mutating using .
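
The count of seven matches the number of non-empty subsets of the three operators. The sketch below assumes that each search space corresponds to one such subset and that the nesting orders within a space are the permutations of its operators; the names `search_spaces` and `application_orders` are hypothetical and not taken from the paper.

```python
from itertools import combinations, permutations

OPERATORS = ("move", "replace", "insert")


def search_spaces(operators=OPERATORS):
    """One space per non-empty subset of operators (assumed reading)."""
    return [
        frozenset(subset)
        for size in range(1, len(operators) + 1)
        for subset in combinations(operators, size)
    ]


def application_orders(space):
    """All orders in which the operators of one space can be nested."""
    return list(permutations(sorted(space)))


spaces = search_spaces()
print(len(spaces))                                          # number of spaces
print(len(application_orders(frozenset(OPERATORS))))        # orders for the 3-operator space
```

The two-operator spaces each have two nesting orders and the three-operator space has six, in line with the permutation counts used in the discussion of the validity functions.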

Note that for the parallel algorithm to be effective, we need to ensure that the search spaces are mutually exclusive so that no redundant mutants are generated and validated across the various spaces. Recall that each mutant will be checked against a set of test cases in addition to the gas usage requirement. Such a validation process can be computationally expensive, especially when the search space of candidate patches is extremely large.

Note that the search spaces , and are mutually exclusive sets of mutants (by definition) as each search space is generated using a single unique mutation operator. However, nesting of the mutate functions may lead to duplicate mutants among various spaces and we therefore need to apply some checks to ensure that the spaces are mutually-exclusive. Let us start with the spaces and as an example of these spaces, we consider the space to illustrate how we ensure the mutual exclusive property. As one can see, the search space consists of all possible random mutations of the contract using the two mutation functions and . There are two distinct permutations of the functions: and . To ensure the mutual-exclusive property when generating the search space we use a validity function which is used to decide whether the generated mutant of the operation can be added to the search space , where and are distinct operators from the mutation domain . To simplify the presentation of the functions we introduce some notations. Let . Then the function can be formalised as follows.

That is, the mutant contract can be added to the space if every application of the mutate function yields a unique mutant, including the intermediate mutant (the one generated by the move mutate operator). For example, if then such a mutant will not be added to the space as it is already in space (the mutants that result from the application of the move operator). Note that when , the mutant differs from the mutant by at least one edit resulting from the application of some mutate function. However, the least trivial search space to consider is the search space , as it involves six distinct permutations, which equals the factorial of the number of operators.

Note that the mutants in are generated using the nesting operation , where , and are distinct operators taken from the mutation domain . Let us assume and . Then the validity function for this search space can be formalised as follows.

Note that for a mutant to be added to the space it has to satisfy a somewhat complex condition. This is necessary in order to avoid overlaps with the other search spaces. The validity functions given above are sufficient to ensure that the various spaces are mutually exclusive (see Theorem 2).

Definition 0.

(Properties of splitting strategy). Let be the search space of possible mutants of a vulnerable smart contract generated using the operators move, replace, and insert. The splitting strategy of into spaces satisfies the following properties

  • disjointness: for any two distinct sets and such that () we have .

  • completeness: .

The set of plausible patches generated from each of the above search spaces (-) will then be ranked according to their average gas usage. The one with the lowest average gas usage will be selected. If more than one plausible patch has the same average gas usage, then the one with the smallest number of edits will be selected. If more than one plausible patch satisfies both requirements, then one of them will be selected randomly.
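
The selection rule can be sketched as a lexicographic comparison with a random tie-break. This is a minimal illustration, assuming each plausible patch already carries a numeric stand-in for its average gas usage; the `Patch` record and `select_patch` helper are hypothetical names, and the actual comparison of gas usage relies on the gas dominance relation introduced in Section 5.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    name: str
    avg_gas: float   # assumed numeric stand-in for average gas usage
    edits: int       # number of mutation operations applied


def select_patch(patches, rng=random):
    """Lowest average gas first, then fewest edits, then a random pick."""
    best_key = min((p.avg_gas, p.edits) for p in patches)
    ties = [p for p in patches if (p.avg_gas, p.edits) == best_key]
    return rng.choice(ties)


candidates = [
    Patch("p1", avg_gas=52_000, edits=3),
    Patch("p2", avg_gas=48_000, edits=4),
    Patch("p3", avg_gas=48_000, edits=2),
]
print(select_patch(candidates).name)
```

Here `p2` and `p3` tie on gas, so the edit count decides in favour of `p3`; randomness only matters when both criteria tie.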

Theorem 2.

Spaces ) are mutually exclusive spaces.

Proof.

(Sketch). To prove the theorem we need to consider many cases, as we have seven spaces. However, since the arguments for all cases are very similar, for brevity we consider here only space . For this case, we need to show that . Hence, there are six possible sub-cases to consider. Recall that the mutants in are generated using the nesting operation , where , and are distinct operators taken from the mutation domain . The theorem can be proven by contradiction.

  • Let . This implies that there exists a mutant that belongs to both and . Note that since belongs to then it is generated using a single mutate function of the form , where . It is easy to see then that the mutant cannot exist in the space as the addition of such a mutant to contradicts the definition of the validity function of the space .

  • Let . This implies that there exists a common mutant that belongs to both and . Note that since belongs to then is generated from the nesting operation , where and are distinct operators taken from the domain . Hence, the mutant is generated using only two operators while ignoring the effect of one of the three operators. Therefore, the mutant cannot exist in the space as this contradicts the definition of the validity function of the space and the fact that mutants in are generated using the nesting operation .

4.4. Parallel Repair Algorithm

We now describe a parallel genetic repair framework for vulnerable smart contracts. The repairing framework consists mainly of eight processes running in parallel (): the first seven processes () are responsible for generating compilable candidate patches of the given vulnerable smart contract corresponding to the search spaces and the last process (process ) is responsible for creating concurrent validation processes and selecting the most preferable patches generated as the base version to be further mutated. Such parallel repair framework would help to generate plausible repairs for vulnerable smart contracts in a much faster way than the repair framework that generates and validates candidate patches in a traditional sequential order.

1:while  is running and space is not exhausted do
2:     
3:     while Space is not exhausted do
4:         
5:         if  is compilable then
6:              Sends
7:              Break
8:         end if
9:     end while
10:end while
11:Terminate
Algorithm 1 The workflow of long-running process . Each of the processes explores part of the search space, .
1:Inputs : Vulnerable Contract , Vulnerabilities , Tests
2:Inputs : Initial Population size IP, Generation size GR, Maximum Population size
3:Inputs : Maximum Gas Usage Bound
4:Output : Set of Plausible Patches
5:Patches :=
6:for  do Each iteration executes in parallel
7:     
8:     
9:     Patches := Patches
10:end for
11:while (at least one of has not terminated timeout not reached) do
12:     
13:     if plausible !=  then
14:         return plausible
15:     end if
16:     Patches :=
17:     for  do Each iteration executes in parallel
18:         
19:         
20:         Patches := Patches
21:     end for
22:end while
23:return
Algorithm 2 Repair Algorithm (process which combines results from processes )

Each process is a long-running process. In each iteration, it waits for to send a patch generation request. Once a request is received with a base version of the vulnerable smart contract, the process searches for a compilable patch mutated from the received base version. The processes mainly use the set of mutation functions: and which mutate the vulnerable contract using some mutation operators. Every mutant is then checked for syntactic correctness using the compiler. This technique has been shown to be very effective at rejecting invalid patches early. After the first compilable patch is generated, the process sends it back to and waits for the next request. Since the implementations of processes are very similar, we present here the pseudo-code of only one of them for brevity (we choose process that corresponds to the search space ). For readability, let us assume that we get a fresh (new) mutant every time the function is used. The pseudo-code of is given in Algorithm 1. As one can see, Algorithm 1 can consider all possible combinations of random mutations of the function on the contract until the corresponding patch space is exhausted.

We now discuss the implementation of the main process (Algorithm 2). This process takes as inputs the original vulnerable smart contract , the set of targeted vulnerabilities , and the set of test cases , and returns the patches that meet the quality requirements (plausible patches that pass given tests and do not exhibit given vulnerabilities ). At the beginning, we bootstrap the population by generating an initial set of mutants. The size of this set is controlled by the parameter IP (Initial Population Size). Whenever new mutants should be generated, sends requests to the processes (the Requests operation in Algorithm 2). As soon as one of the processes has generated a new compilable mutant, all other mutant generation processes stop attempting to generate new mutants and the request is fulfilled. The is used to calculate the fitness value of the patches. The objective functions are defined in Table 2. Since all the objective functions are independent of one another, the function also issues new concurrent processes to speed up the patch fitness evaluation. The control flow then enters the main loop. In each iteration, the algorithm first checks whether a plausible patch already exists in the maintained set of patches; this is accomplished by invoking the function Filter_Plausible_Patches. If one exists, the algorithm immediately returns the plausible patch. Otherwise, the maintained set of patches is trimmed to the size by the NSGA2 population selection algorithm (Deb et al., 2000) and another set of patches is generated in a similar fashion. The base version used to generate the new set of patches is chosen to be the best patch among all the patches in the maintained set Patches. The relative quality of patches is evaluated based on their fitness values. In each iteration of the main loop, the number of new patches to be generated is determined by the parameter GR (Generation Rate).

We employ a timer in (not shown in pseudo-code for simplicity) which is used to enforce termination of the process in case the time spent in the search process exceeds the bound . The bound should be chosen while taking into consideration the number of test cases, the size of the buggy program, and the estimated number of mutants in the search space assigned to the process. Note that processes work independently and terminate whenever a plausible patch is found or the timer fires.

As mentioned earlier, the search space can be extremely large even for programs whose source code is small. Recall that the search space grows exponentially with the number of lines of code considered, and hence the efficiency and performance of the genetic repair algorithm need to be improved when examining candidate patches in the generated search space. While the parallel repair algorithm splits the large search space into smaller sub-spaces, which considerably improves the patch generation process, the search sub-spaces can still be too large to be exhaustively explored within a reasonable time budget. The goal of the employed fitness functions is to guide the search towards a plausible repair. We therefore integrate four fitness functions (objectives) with the patch generation process. The objectives are classified into primary objectives and secondary objectives. Primary objectives are related to the functional or correctness properties of the patch, while secondary objectives are related to the non-functional properties of the patch. The two main functional correctness objectives are the number of targeted vulnerabilities and the number of failing test cases. The secondary, non-functional properties are the number of mutation operators applied to produce the generated patch and the gas usage or cost of the patch. The designated fitness functions measure how many of the desired functional and non-functional requirements a generated mutant meets. The mutation distance of the generated mutant from the original vulnerable contract is measured by counting the number of times the mutation operators are applied to produce the mutant. This can be used to measure the simplicity of the generated mutant. The average gas usage is compared using the methodology described in Section 5. The two secondary objectives are considered only when the generated patch is valid (fixes all targeted vulnerabilities and passes all test cases). Note that we give higher preference to a patch that fixes all detected vulnerabilities and passes all test cases with lower average gas usage and a smaller number of syntactic changes w.r.t. the original vulnerable contract. We summarise these objectives (fitness functions) in Table 2.

Description of objective           | Purpose of objective   | Class of objective | Level of importance
-----------------------------------|------------------------|--------------------|--------------------
Number of targeted vulnerabilities | Patch correctness      | Functional         | Primary
Number of failing test cases       | Patch correctness      | Functional         | Primary
Gas consumption                    | Patch gas optimization | Non-functional     | Secondary
Mutation operation distance        | Patch simplicity       | Non-functional     | Secondary

Table 2. Objectives (fitness functions) used when generating patches for vulnerable smart contracts
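
The four objectives can be pictured as a small fitness record. The sketch below is an illustration only: the `Fitness` type, the `filter_plausible` helper, and the lexicographic `sort_key` are hypothetical, and the lexicographic ordering is a simplification standing in for the NSGA2-based selection used by the repair algorithm. It assumes that the secondary objectives are simply ignored for patches that are not yet valid.

```python
from typing import NamedTuple


class Fitness(NamedTuple):
    vulnerabilities: int   # primary: remaining targeted vulnerabilities
    failing_tests: int     # primary: failing test cases
    gas_level: int         # secondary: gas ranking, lower is better
    edit_distance: int     # secondary: number of mutation operations applied


def is_plausible(f: Fitness) -> bool:
    """A patch is plausible when both primary objectives are zero."""
    return f.vulnerabilities == 0 and f.failing_tests == 0


def filter_plausible(patches: dict[str, Fitness]) -> list[str]:
    return [name for name, f in patches.items() if is_plausible(f)]


def sort_key(f: Fitness) -> tuple:
    # Assumption: secondary objectives only count once the patch is valid.
    secondary = (f.gas_level, f.edit_distance) if is_plausible(f) else ()
    return (f.vulnerabilities, f.failing_tests) + secondary


patches = {
    "p1": Fitness(0, 0, gas_level=2, edit_distance=3),
    "p2": Fitness(1, 0, gas_level=1, edit_distance=1),
    "p3": Fitness(0, 0, gas_level=1, edit_distance=5),
}
best = min(patches, key=lambda name: sort_key(patches[name]))
print(filter_plausible(patches), best)
```

In this example `p2` has the best secondary scores but still exhibits a vulnerability, so it is never preferred over the two valid patches.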

5. Choosing Patch With Lower Average Gas Consumption

One of the key challenges we encounter in this work is how to efficiently compare the average gas usage of the original contract and the repaired contract, and how to compare the average gas usage of different generated patches of a given vulnerable contract. In general, the gas cost of a smart contract depends on a number of parameters, including memory cost, stack cost, and storage cost, in addition to instruction costs. Hence, the gas consumption of a given path in a smart contract can be a non-constant bound. It should therefore be described as a parametric formula that takes into consideration the parameters that affect the gas consumption of the path. We refer to this parametric formula as the gas formula. A patch for a smart contract is considered more optimised than other patches fixing the same bugs if it has a lower running cost over the lifespan of the smart contract. To compare the average gas usage of two smart contracts, we propose the notion of gas dominance. The goal of the gas dominance notion is to rank edited contracts (generated repairs of vulnerable contracts) based on their corresponding gas formulas, as an estimate of their relative average gas usage. This estimate is required because we cannot predict in advance the true average gas usage over their lifespan. Such a ranking can then be used to select a low-cost repair for a vulnerable smart contract from the set of proposed repairs generated by the parallel repair algorithm.

5.1. The Gas Dominance Relationship

When formalising the gas usage of smart contracts, we choose the specification of the gas cost function in the current Ethereum Virtual Machine specification (version EIP-150) (Wood, 2019) at the time of writing of this paper. From a high-level perspective, the gas usage of a single invocation to the smart contract depends on the user input to the smart contract, the blockchain environment, and the code of the smart contract. The gas usage of an execution (a transaction) to a smart contract is the sum of the gas usage of each executed instruction along the execution path. Formally, the gas cost function of an instruction can be defined as

(1)

where is the blockchain world state before the instruction is executed and is the machine state before is executed, the operation code is a property of the execution environment indexed by a program counter , and is the gas formula associated to the operation code of and is the gas usage formula associated to the expansion of machine memory when executing the instruction . For more technical details about the definition of the gas cost function, we refer the reader to (Wood, 2019).

The total gas usage of an invocation (in the form of a single transaction) with the execution information specified in can be defined as a gas function corresponding to the visited contract path triggered by the inputs:

(2)

where the sequence of instructions in the execution path determined by , and , and , and . For a smart contract with execution paths, we construct gas usage functions, e.g. . We can then express the total gas usage of a smart contract over its lifespan as follows:

(3)

where is the number of transactions to smart contract (denoted by ) over its lifespan (the history of transactions of ).

Given two repaired versions and for a vulnerable smart contract addressing the same vulnerabilities, we favor the version with lower lifespan gas usage. However, since the future blockchain world state and the user inputs to can be of any possible combination and are generally unknown in advance, the concrete lifespan gas usage of patched versions cannot be used to compare the average gas usage of patches effectively. We therefore propose to use what we call gas dominance as a method to compare the relative gas efficiency of two patches by comparing their expected gas usage functions. For a given smart contract with execution paths, we can express the expected gas usage of as follows:

(4)

where is the probability of being visited by an arbitrary execution of , and is the gas usage function corresponding to program path . For the cases where the contract paths invoke external functions, we need to include the gas usage introduced by the external function invocations in the expected gas usage of the contract.
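
As a reading aid, the structure of Equations (2) to (4) can be written out as below. The notation ($G_{\pi}$, $G_{\mathrm{life}}$, $g_i$, $p_i$, $T$, $t_j$, $I_k$) is assumed for illustration and may differ from the authors' symbols; only the structure is taken from the text: a sum over executed instructions, a sum over transactions, and a probability-weighted sum over paths. The closing line is a worked example with made-up numbers.

```latex
% Gas of one execution path \pi with executed instructions I_1, ..., I_m (cf. Eq. 2)
G_{\pi} = \sum_{k=1}^{m} C(\sigma_k, \mu_k, I_k)

% Lifespan gas of contract S over its T transactions t_1, ..., t_T (cf. Eq. 3)
G_{\mathrm{life}}(S) = \sum_{j=1}^{T} G_{\pi(t_j)}

% Expected gas over the n execution paths (cf. Eq. 4);
% assumes every execution follows exactly one path
E[S] = \sum_{i=1}^{n} p_i \, g_i, \qquad \sum_{i=1}^{n} p_i = 1

% Worked example (hypothetical values): n = 2, p = (0.9, 0.1), g = (30000, 80000)
E[S] = 0.9 \cdot 30000 + 0.1 \cdot 80000 = 35000
```

The example shows why a rarely taken but expensive path can still matter: it contributes 8,000 of the 35,000 expected gas units despite a probability of only 0.1.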

Definition 0.

(Gas Dominance Relation). Given two smart contracts and , we say gas dominates (denoted by ) if and only if for all inputs and for at least one input to the smart contracts.

The gas dominance relation has the following properties:

Property 1 (Irreflexive).

For all smart contracts , they do not gas dominate themselves. That is, must not gas dominate .

Property 2 (Asymmetric).

For two arbitrary smart contracts and , if gas dominates , then must not gas dominate .

Property 3 (Transitive).

For three arbitrary smart contracts , and , if gas dominates and gas dominates , then must gas dominate .
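
The definition and its properties can be checked on a finite sample of inputs. The sketch below assumes that the dominating contract is the one that never uses more gas and uses strictly less on at least one input (the Pareto-style reading suggested by the approximation rules in Section 5.2); the gas functions and the helper name `gas_dominates` are hypothetical.

```python
from typing import Callable, Iterable

GasFn = Callable[[int], int]


def gas_dominates(g1: GasFn, g2: GasFn, inputs: Iterable[int]) -> bool:
    """g1 dominates g2 on the sample: never more gas, and less at least once."""
    xs = list(inputs)
    never_worse = all(g1(x) <= g2(x) for x in xs)
    sometimes_better = any(g1(x) < g2(x) for x in xs)
    return never_worse and sometimes_better


def cheap(n: int) -> int:       # hypothetical gas formula of patch 1
    return 100 + 5 * n


def costly(n: int) -> int:      # hypothetical gas formula of patch 2
    return 100 + 8 * n


sample = range(50)
print(gas_dominates(cheap, costly, sample))   # patch 1 vs patch 2
print(gas_dominates(costly, cheap, sample))   # asymmetry
print(gas_dominates(cheap, cheap, sample))    # irreflexivity
```

A check over a finite sample can only refute dominance, not establish it over all inputs, which is one motivation for the formula-based approximation in the next subsection.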

5.2. Lightweight Approximation for Determining Gas Dominance Relationship

In general, determining the gas dominance relationship between two smart contracts can be computationally complex and practically infeasible because the space of possible inputs is generally too large. We therefore develop a lightweight approximation approach based on the notion of function dominance. We say that one gas formula dominates another formula if the magnitude of the ratio of the first formula to the second increases without bound as the inputs increase without bound. There are different ways to compare the gas consumption of two smart contracts and we describe here two approaches.

Given two contracts and , we first construct the expected gas usage formulas for and , namely and . We then transform the equations and into polynomial expressions. Due to the fact that there might be terms containing non-polynomial functions, we use a substitution mapping to transform the gas formula into a polynomial expression. The substitution mapping is constructed as follows.

  1. Monomial terms are left unchanged.

  2. For all other terms, the coefficient remains unchanged while the remaining part of the term is mapped to a unique fresh variable.

All common non-monomial terms in and are mapped to the same fresh variable; that is, the bindings of the fresh variables are shared between the two substitution mappings, e.g. if the formula is substitution mapped to , the formula should be substitution mapped to . A polynomial can be expressed as a sum of monomials, where each monomial is called a term. The degree of the polynomial is the greatest degree of its terms. We denote the resulting polynomial equation for by and the resulting polynomial equation for by . We then rearrange and simplify the resulting polynomial equations and as sums of monomials. Let and be the sets of monomials in and . We can determine the gas dominance relationship between and as follows (apply in order).

  1. If , then and are not gas dominating each other.

  2. Let and be the vectors of coefficients of and respectively, with the elements of and aligned according to the same corresponding monomials.

    1. If (all elements in are less than or equal to the corresponding elements in and ), then .

    2. If , then and are not gas dominating each other.

    3. If (all elements in are greater than or equal to the corresponding elements in and ), then .
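
The ordered rules above can be sketched over polynomials stored as dictionaries from monomial labels to coefficients. This is a minimal illustration under two assumptions: that the condition in rule 1 is that the two monomial sets differ, and that a contract with coefficient-wise smaller or equal (and not identical) coefficients is the dominating one. The function name `compare_gas_polynomials` and the labels such as `"v0"` (a fresh variable from the substitution mapping) are hypothetical.

```python
from typing import Dict, Optional

Poly = Dict[str, float]   # monomial label -> coefficient


def compare_gas_polynomials(p1: Poly, p2: Poly) -> Optional[str]:
    """Return "first" or "second" for the dominating polynomial, else None."""
    # Rule 1 (assumed reading): different monomial sets are not comparable.
    if set(p1) != set(p2):
        return None
    # Rule 2: align the coefficient vectors on the same monomial order.
    monomials = sorted(p1)
    a = [p1[m] for m in monomials]
    b = [p2[m] for m in monomials]
    if a == b:
        return None                       # identical: neither dominates
    if all(x <= y for x, y in zip(a, b)):
        return "first"                    # first never has a larger coefficient
    if all(x >= y for x, y in zip(a, b)):
        return "second"
    return None                           # mixed: no dominance either way


g1 = {"1": 100, "n": 5, "v0": 2}
g2 = {"1": 100, "n": 8, "v0": 2}
print(compare_gas_polynomials(g1, g2))
```

Comparing coefficients is only meaningful when all variables are non-negative, which holds for quantities such as input sizes or iteration counts; this is an assumption of the sketch rather than a statement from the text.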

5.3. Integrating Gas Dominance Relationship into Genetic Patch Search Process

The gas dominance relationship defined above compares the relative average gas consumption of two versions of the vulnerable contract. To enable comparison among multiple patched versions of the original vulnerable contract, we define the notion of gas dominance level as follows.

Definition 0.

(Gas Dominance Level). Given a set of smart contracts, non-dominated sorting (Deb et al., 2000) is performed based on the gas dominance relationship. The gas dominance level of an arbitrary smart contract in the set is defined as its ranking in the non-dominated sorting result.

The multi-objective genetic algorithm can now use the gas dominance level as one of the objectives, which serves to implicitly capture the effect of patches on the gas consumption (without having to compute the gas consumption).
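
Gas dominance levels can be obtained by repeatedly peeling off the contracts that no remaining contract dominates. The sketch below is a simple quadratic version of non-dominated sorting, not the optimised procedure of Deb et al.; numbering levels from 1 and representing each contract by a tuple of aligned coefficients are assumptions made for illustration, as are all names.

```python
def dominance_levels(items, dominates):
    """Assign level 1 to non-dominated items, level 2 to the next front, etc."""
    remaining = list(items)
    levels = {}
    level = 1
    while remaining:
        front = [a for a in remaining
                 if not any(dominates(b, a) for b in remaining)]
        if not front:
            raise ValueError("relation is not a strict partial order")
        for a in front:
            levels[a] = level
        remaining = [a for a in remaining if a not in front]
        level += 1
    return levels


# Hypothetical aligned coefficient vectors of four candidate patches.
COEFFS = {"A": (1, 1), "B": (2, 1), "C": (1, 2), "D": (2, 2)}


def dominates(x, y):
    a, b = COEFFS[x], COEFFS[y]
    return a != b and all(p <= q for p, q in zip(a, b))


LEVELS = dominance_levels(COEFFS, dominates)
print(LEVELS)
```

Patches `B` and `C` are incomparable and therefore share a level, which is why a level rather than a total order is used as the objective.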

6. Accelerating Gas Comparison by Generating Reduced Gas formulas

As described in the previous section, to compare the gas usage of two contracts we first need to synthesize gas formulas for the set of feasible paths in each contract. The number of gas formulas generated for each resulting patch can dramatically affect the computational complexity of the comparison. Suppose that the parallel genetic algorithm generates three plausible patches for a vulnerable contract , namely and . Among these, we would like to choose the one with the lowest average gas usage. To compare the gas usage of the contracts and efficiently, we only need to synthesize gas formulas for the paths that differ among the three contracts. That is, it is sufficient to compare reduced or minimized versions of these contracts by skipping joint or common paths. This is mainly due to the observation that joint paths among contracts have the same gas formulas. This reduces the computational complexity of the comparison by avoiding unnecessary computation and analysis.

Remark 1.

Syntactically identical paths among contracts share the same gas formula and therefore can be safely skipped during comparison.

Definition 0.

(Classifying paths in contracts). Let be a vulnerable smart contract and be a repaired version of obtained by the parallel repair algorithm. A reachable path in can be classified into one of the following categories

  • is a repaired path of some paths in , or

  • is a new path w.r.t. the set of feasible paths in , or

  • is a joint or common path between and .

Note that a patch introduced into a given vulnerable smart contract may trigger a new set of paths that were infeasible in the original vulnerable smart contract. Thus, a repaired version of a contract may have a new set of behaviours w.r.t. the original contract. This may happen, for example, when the patch updates an expression in a conditional statement in the original vulnerable contract. The advantages of distinguishing between the above three classes of paths are two-fold. First, it helps to reduce the number of paths that need to be considered when comparing the contracts, and hence the number of gas formulas that need to be synthesized. Second, it helps to reduce the complexity of the final gas formulas of the contracts being compared. Since we use a genetic algorithm based on three mutation operators (move, insert, and replace), we can easily classify paths in the contracts being compared into three categories: repaired paths, joint paths, or new paths. Typically, we can identify the locations of buggy statements in the contract and we can augment the repair algorithm to label the locations of statements that have been influenced by the deployed patch. This facilitates the classification of paths in the generated repaired contract w.r.t. the original contract.

We now turn to describe an acceleration technique that can be applied before conducting the actual comparison between two similar contracts and . Let us denote the set of feasible paths in the two contracts by and . The goal of the acceleration technique is to generate reduced versions of the contracts and as follows:

  1. Compute the sets of paths that are unique in each contract as follows

  2. Synthesize a gas formula for each path in the sets and using Equation (2) and then compute the final gas formula by summing the resulting gas formulas using Equation (4).

  3. Compare the resulting gas formulas using the comparative approach described in Section 5.
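
The reduction steps can be sketched with set differences over syntactic paths. The example below assumes that a path is represented as a tuple of instruction names, that common paths are those that are syntactically identical, and that per-instruction costs are the hypothetical values in `COST`; for brevity it sums path costs without the path probabilities of Equation (4). All names are illustrative.

```python
# Hypothetical per-instruction costs, for illustration only.
COST = {"PUSH": 3, "ADD": 3, "SLOAD": 200, "CALL": 700, "SSTORE": 20000}


def path_gas(path):
    """Gas of one path as the sum of its instruction costs."""
    return sum(COST[instr] for instr in path)


def reduce_paths(paths1, paths2):
    """Step 1: keep only the paths that are unique to each contract."""
    common = paths1 & paths2
    return paths1 - common, paths2 - common


def total_gas(paths):
    """Step 2 (simplified): unweighted sum over the given paths."""
    return sum(path_gas(p) for p in paths)


P1 = {("PUSH", "SLOAD", "ADD"), ("PUSH", "CALL", "SSTORE")}
P2 = {("PUSH", "SLOAD", "ADD"), ("PUSH", "SSTORE")}

U1, U2 = reduce_paths(P1, P2)
print(total_gas(P1) - total_gas(P2))   # difference on full path sets
print(total_gas(U1) - total_gas(U2))   # same difference on reduced sets
```

Because the shared path contributes the same amount to both totals, dropping it leaves the difference unchanged while halving the number of formulas to synthesize in this example.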

Comparing the gas usage of two contracts using their reduced versions (i.e., versions obtained by skipping joint paths or repaired paths whose gas formulas are equivalent) preserves soundness, as described in the following theorem.

Theorem 2.

(Soundness of reduction). Let be a vulnerable smart contract and be a repaired version of . Let also and be gas formulas for and respectively and and be gas formulas for reduced versions of and obtained as described in Section 6. Then when dominates then dominates , and vice versa.

Theorem 3.

(Effectiveness of reduction). The accelerated comparative approach of smart contracts has lower computational complexity than the non-accelerated comparative approach. The amount of reduction on the computational complexity that can be obtained depends on the number of joint and repaired paths in the contracts being compared that can be skipped safely during the comparison (i.e., without adversely affecting the outcome of comparison).

The number of generated gas sub-formulas (for paths) and the complexity of the final gas formula (for the contract) can be significantly reduced if the acceleration approach is employed. This is crucial, as synthesizing gas formulas for paths can be an expensive step, especially for paths with cyclic behaviour. Note that comparing reduced versions of contracts using simplified or reduced gas formulas that consider only the differing paths in the two contracts does not affect the soundness of the analysis. This is mainly due to the observation that only the set of differing paths can make the gas consumption of one contract dominate that of the other.

7. Evaluation and research questions

7.1. Prototype implementation

To evaluate our repair approach for vulnerable smart contracts, we have implemented a tool called SCRepair. The tool has interfacing components for the smart contract security analyzers Oyente (Luu et al., 2016) and Slither (Feist et al., 2019) in order to analyse and detect security vulnerabilities (if any) in the subject smart contracts. Oyente is a symbolic execution tool that works directly with Ethereum virtual machine code. It is able to detect some of the commonly occurring security flaws of Ethereum, including reentrancy, whose exploitation caused a loss of 60 million US dollars in June 2016 (the DAO hack). However, since our repair approach aims not only to fix the vulnerability but also to optimize the gas usage of the patched smart contract, we extended Oyente so that it can generate the information needed for determining the approximated gas dominance relationship with our acceleration method. To obtain a more accurate gas dominance determination, we have also extended the original gas usage modelling in Oyente to be closer to the actual EVM gas model. The other supported vulnerability detector, Slither, is a static-analysis-based detector which is able to reliably detect various vulnerabilities within a short time due to the lightweight nature of static program analysis. The fault localization information provided by both vulnerability detection tools is used as the fix localization of the repair process.

In Fig. 4 we give the schematic diagram of our smart contract repairing tool in which we describe the basic components of the tool. The tool consists mainly of five units: the vulnerability detection unit, the test cases execution unit, the gas ranking unit, the patch generation unit, and the main controller unit. These components interact with each other in order to generate plausible patches for a given vulnerable smart contract. It is interesting to also note that the components of the tool operate in parallel. Asynchronous programming has also been employed to increase the efficiency.

Figure 4. The architecture of the SCRepair tool

7.2. Etherscan Vulnerable Dataset (EV-DS)

To evaluate our repair approach, we have constructed a dataset of vulnerable smart contracts taken mainly from Etherscan as a proxy to access real-world deployed smart contract source code. Etherscan is a well-known block explorer, search, API and analytics platform for the Ethereum Mainnet, which is the main network wherein actual transactions of smart contracts take place on a distributed ledger. A large amount of information related to smart contracts can be extracted from Etherscan, e.g., deployment address, verified source code, byte-code and application binary interface (ABI) of deployed contracts. Using Etherscan we collected around 34,400 smart contract source code files. These source files were then analysed using the tool Oyente. We obtained 2,752 vulnerable smart contracts with different types of vulnerabilities. Four types of vulnerabilities have been detected in this dataset: transaction-ordering dependence (TOD), reentrancy (RE), exception disorder (ED), and integer overflow (IO).

tod happens when the user of a smart contract assumes a particular state of the contract, which may no longer exist when their transaction is processed, potentially leading to malicious behavior. Reentrancy (re) is probably the most widely known vulnerability, as it led to the DAO hack. re happens when a contract calls another contract, so that the calling contract has to wait for the call to finish. This intermediate state can be exploited by the callee. Contracts may also suffer from the so-called exception disorder (ed) vulnerability, where the contract does not check explicitly whether its send operations have completed successfully. Integer overflow (io) is a common problem across all systems; it can be used to modify the program state in an unwanted manner by deliberately providing large numbers as inputs, leading to wrong results in arithmetic operations.
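To make the integer overflow class concrete, the following Python sketch emulates unchecked 256-bit unsigned arithmetic, which is how Solidity arithmetic behaved before version 0.8. Python integers do not overflow, so the wrap-around is made explicit with a modulus. The helper names (`unchecked_mul`, `total_cost`, `can_afford`) are hypothetical and not taken from any subject contract.

```python
UINT256_MOD = 2 ** 256  # uint256 values live in [0, 2**256 - 1]


def unchecked_mul(a: int, b: int) -> int:
    """Emulate uint256 multiplication without overflow checks (wraps around)."""
    return (a * b) % UINT256_MOD


def total_cost(recipient_count: int, value: int) -> int:
    """Hypothetical batch transfer cost: value sent to each recipient times their number."""
    return unchecked_mul(recipient_count, value)


def can_afford(balance: int, recipient_count: int, value: int) -> bool:
    """Vulnerable balance check: compares against the wrapped-around total."""
    return balance >= total_cost(recipient_count, value)


# A deliberately large input makes the total wrap around to zero,
# so an account with an empty balance passes the check.
huge_value = 2 ** 255
print(total_cost(2, huge_value), can_afford(0, 2, huge_value))
```

With ordinary inputs the check behaves as intended (for example, `can_afford(5, 2, 10)` is false); only the crafted large value defeats it.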

While the selection of smart contracts shown in Tables 3 and 4 has been made randomly from the dataset EV-DS, we have considered some key criteria when selecting these smart contracts. The two main criteria we considered are: (1) the size and complexity of the vulnerable smart contract measured mainly in terms of the number of lines in the contract, and (2) the popularity and the number of available transactions of each vulnerable smart contract.

7.3. Test Case Generation for EV-DS Dataset

Since test cases for smart contracts are in general not available on blockchains and the authors of the deployed smart contracts are not contactable (Luu et al., 2016), we use a novel method to generate regression test cases from the available transactions sent to the subject smart contracts on the blockchain. For every transaction to the subject smart contract, we capture the inputs and the changes to the blockchain state during its execution, which are then considered as the inputs and expected behavior of the generated regression test case. A generated regression test case for a transaction contains the following elements:

  1. Blockchain state before executing the transaction.

  2. The function being invoked and the corresponding argument values.

  3. Blockchain state after executing the transaction.

  4. The return values of invoked functions.

However, as the whole blockchain state can be huge (on the order of terabytes), it is impractical to simply store the relevant versions of the blockchain state. To address this issue, we only capture the relevant states of the Ethereum accounts in the blockchain before and after the execution of the transaction. Each generated regression test case is then run against the original vulnerable smart contract to check its validity. During the test case generation process, we set a timeout of 5 minutes for the execution of each regression test case; test cases that require longer are terminated and discarded. Table 3 shows the number of regression test cases generated for each subject contract. The valid generated regression test cases are then used in the automated repair experiments.
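The four elements of a regression test case can be pictured as a small record plus a replay check. The Python sketch below is only an illustration under simplifying assumptions: account states are plain dictionaries, and `toy_deposit_contract` is a hypothetical stand-in for executing a contract. It does not reflect how SCRepair stores or replays transactions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple

# Hypothetical representation: account address -> {field name -> value}.
AccountStates = Dict[str, Dict[str, int]]


@dataclass(frozen=True)
class RegressionTestCase:
    pre_state: AccountStates        # relevant account states before the transaction
    function: str                   # name of the invoked function
    args: Tuple[Any, ...]           # argument values of the call
    post_state: AccountStates       # relevant account states after the transaction
    return_values: Tuple[Any, ...]  # values returned by the invoked function


# An executor replays a call on a pre-state and returns (post_state, return_values).
Executor = Callable[[AccountStates, str, Tuple[Any, ...]],
                    Tuple[AccountStates, Tuple[Any, ...]]]


def passes(test: RegressionTestCase, execute: Executor) -> bool:
    """Replay the recorded call and compare with the recorded behavior."""
    post_state, returned = execute(test.pre_state, test.function, test.args)
    return post_state == test.post_state and returned == test.return_values


def toy_deposit_contract(state, function, args):
    """Toy contract: 'deposit' moves `amount` from the sender to the contract account."""
    sender, amount = args
    new_state = {addr: dict(fields) for addr, fields in state.items()}
    if function == "deposit" and new_state[sender]["balance"] >= amount:
        new_state[sender]["balance"] -= amount
        new_state["contract"]["balance"] += amount
        return new_state, (True,)
    return new_state, (False,)


recorded = RegressionTestCase(
    pre_state={"alice": {"balance": 10}, "contract": {"balance": 0}},
    function="deposit",
    args=("alice", 4),
    post_state={"alice": {"balance": 6}, "contract": {"balance": 4}},
    return_values=(True,),
)
print(passes(recorded, toy_deposit_contract))
```

A candidate patch would be plugged in as a different executor; a test case is kept as valid only if the original contract itself passes it.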

Name of smart contract | No. of transactions | No. of regression test cases | Supported by our prototype
Autonio ICO | 34 | 31 | Yes
Airdrop | 147 | 7 | Yes
Banana Coin | 360 | 24 | Yes
XGold Coin | 308 | 304 | Yes
Flight Delay Issuance | 80 | 1 | No
Hodbo Crowdsale | 36 | 18 | Yes
Lescoin Presale | 115 | 107 | Yes
Classy Coin | 574 | 495 | Yes
Yobcoin Crowdsale | 515 | 435 | Yes
Classy Coin Airdrop | 137 | 4 | Yes
OKO Token ICO | 179 | 173 | Yes
ApplauseCash Crowdsale | 43 | 42 | Yes
HDL Presale | 94 | 93 | Yes
Privatix Presale | 78 | 11 | Yes
MXToken Crowdsale | 56 | 37 | Yes
EthereumFox | 493 | 491 | No
dgame | 302 | 108 | Yes
Easy Mine ICO | 1339 | 491 | Yes
Siring Clock Auction | 1641 | 2 | Yes
Government | 502 | 366 | No
Table 3. Regression test case generation statistics for EV-DS dataset

7.4. Factors Affecting our Repair Algorithm

Before discussing the research questions that we developed to evaluate the presented genetic repair algorithm, we first summarise the key factors that affect the correctness and efficiency of our genetic repair algorithm. There are various factors and parameters that contribute to the efficiency of the algorithm which are within the control of the user of the genetic repair algorithm, which we summarise as follows:

  1. Quality of test suite and vulnerabilities. The quality of the provided test suite for a given vulnerable smart contract has a major impact on the genetic repair algorithm. Recall that a mutant is considered a repair when all available test cases pass and the vulnerability detector does not report any vulnerability. In our experiments, we constructed the test suite with a script that converts past blockchain transactions into positive test cases, as described in subsection 7.3. The vulnerabilities detected by a smart contract checker such as Oyente or Slither constitute the negative behavior that the generated patches should avoid.

  2. Timeout allocated to the algorithm. A feasible exploration of the search space (candidate patches) depends heavily on the amount of resources allocated to the genetic algorithm. In general, the size of the generated search space of a given vulnerable contract depends on multiple factors including: (i) the size and complexity of the contract being repaired, (ii) the number of buggy statements in the contract, and (iii) the mutation operators used by the algorithm. However, the number of mutants that can be examined during the search is limited to the time budget allocated to the algorithm. The bigger the time budget, the higher the probability to produce a plausible patch.

  3. The consideration of gas consumption of patches. Considering the gas when searching for plausible patches of a vulnerable smart contract can be of great benefit. First, it can help to generate a low-cost repair for a given vulnerable smart contract by comparing the gas consumption of generated patches and selecting the one with the lowest average cost. Second, it can be used to optimise the efficiency of the genetic search algorithm in various ways. For example, it can be used to detect and discard infeasible patches early. Note that a patch can be plausible (it passes the test cases) but infeasible to deploy on a real blockchain. This happens when the generated patch consumes a significantly large amount of gas and thus leads to expensive transactions. To reduce the computational cost of the algorithm, one might maintain during the search the lowest known average gas usage of a plausible patch as a bound. When a new plausible patch with a lower average cost is found, the bound is updated accordingly. The bound can be updated on-the-fly during the search and used to discard infeasible patches without necessarily executing the entire test suite.

  4. The number of genetic mutation operators used by the algorithm. Note that the size of the search space that needs to be examined when searching for a plausible patch for a vulnerable contract can be extremely large. Recall that the search space of a given vulnerable smart contract is generated by mutating (buggy) statements in the contract. Hence, the size of the generated search space grows exponentially w.r.t. the number of considered lines in the contract and the number of mutation operators. The smaller the number of the mutation operators, the smaller the size of the search space and the faster the algorithm. However, reducing the number of the mutation operators may reduce significantly the capability of the algorithm in producing plausible patches.

  5. The state space search order. As the search space grows, the organisation of mutants or candidate patches into sub-spaces becomes more critical to the efficiency of the algorithm. In general, there is no specific search strategy that one can follow when examining the candidate patches of a given vulnerable smart contract. The search can be purely sequential and random or it can be parallelized based on the semantics of the mutation operators. However, as described above, the search can be optimised by taking into consideration some interesting factors including the semantics of the bug, the semantics of the mutation operators, and the gas consumption of generated patches.
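The gas bound mentioned in factor 3 can be sketched as follows. This is a minimal illustration, assuming that every test case consumes a non-negative amount of gas and that gas usage is obtained lazily, one test case at a time. The function `average_gas_if_better` and the runner `run_tests` are hypothetical names, not part of SCRepair.

```python
from typing import Iterable, Iterator, List, Optional


def average_gas_if_better(
    gas_per_test: Iterable[int], num_tests: int, best_avg: Optional[float]
) -> Optional[float]:
    """Return the candidate's average gas if it beats `best_avg`, else None.

    `gas_per_test` is consumed lazily, so the remaining test cases are never
    executed once the candidate can no longer beat the bound.
    """
    # Total gas the candidate may spend while still averaging below the bound.
    budget = None if best_avg is None else best_avg * num_tests
    total = 0
    for gas in gas_per_test:
        total += gas
        if budget is not None and total >= budget:
            return None  # gas is non-negative, so the average cannot fall below the bound
    return total / num_tests


executed: List[int] = []  # records which test executions actually happened


def run_tests(gas_costs: List[int]) -> Iterator[int]:
    """Hypothetical test runner: yields the gas used by each test case in turn."""
    for gas in gas_costs:
        executed.append(gas)
        yield gas


best = average_gas_if_better(run_tests([100, 100, 100]), 3, None)    # first plausible patch
pruned = average_gas_if_better(run_tests([250, 100, 100]), 3, best)  # stops after two tests
improved = average_gas_if_better(run_tests([90, 90, 90]), 3, best)   # becomes the new bound
print(best, pruned, improved, len(executed))
```

The second candidate is discarded after two of its three test cases, because its accumulated gas already exceeds what any patch averaging below the current bound could spend.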

As one can see from the factors described above, the correctness and efficiency of the genetic algorithm can be evaluated under many different settings and assumptions by varying the way these factors are configured. For example, one might wonder how the algorithm performs when enabling/disabling the gas calculation of generated patches, or when increasing/decreasing the size of the test suite or the time budget allocated to the genetic algorithm. In this work, we choose to evaluate the correctness and efficiency of the genetic algorithm by considering five key research questions. The goal of the research questions is to evaluate the presented parallel genetic repair algorithm and to understand and draw conclusions about the factors affecting the correctness and quality of generated patches.

8. Research Questions and Experimental Results

Before discussing the experimental results for the set of vulnerable smart contracts considered in this work, we would like to mention some ethical issues regarding the work. We decided to publish the dataset in the interest of open science. The blockchain system is decentralized, so even if we wanted to contact the owners of the contracts, we could not find them. This is unlike vulnerabilities such as Spectre, where Microsoft and Intel could be contacted and informed first. We ran our tool on a single AWS EC2 c5.24xlarge instance, which has 192GB of RAM and an AWS-customized 2nd generation Intel Xeon Scalable Processor with 96 CPU execution threads allocated. It is worth mentioning that our algorithm can be run on a compute cluster of multiple computing nodes, and our implementation supports that. However, we ran our experiments on a single node in this work for simplicity and to limit the financial cost. Among the 20 vulnerable smart contract subjects, our prototype implementation was able to handle 17. The remaining 3 either use syntax constructs that are currently unsupported or are written in a version of Solidity that is too old to be supported. Therefore, we carried out our experiments on the 17 supported subjects. To increase the variety of vulnerabilities being considered while avoiding the expensive cost of symbolic execution, we have employed Slither as the vulnerability detector for the first fifteen subjects and Oyente for the remaining subjects. To keep the study focused, we limit the scope of targeted vulnerabilities in our experiments to the following: ed, re, io, tod. In our experiments, the maximum gas usage bound is not specified, since a reasonable value depends on the concrete usage of the subject smart contracts as intended by the original developers.

RQ1: How effective is the genetic repair algorithm at fixing detected bugs?

Setup

To demonstrate the effectiveness of the presented genetic repair algorithm in fixing vulnerable smart contracts, we run the genetic algorithm on the selected set of smart contracts. We evaluate the effectiveness of the algorithm by measuring the number of detected vulnerabilities that can be repaired correctly and the time it takes to generate correct patches for these vulnerable contracts. Recall that a repair is generated by the algorithm when all test cases pass and no targeted vulnerability is reported. Hence, the generated patch is only plausible and might not be correct. We therefore check the correctness of the generated patches by manually inspecting their semantics. We consider a plausible fix for a vulnerability correct if it indeed repairs the detected vulnerability, the original business logic does not appear to be modified, and the fix does not introduce new features to the code.

Results

For each of the considered vulnerable contracts, we have run our algorithm five times. We report the average value of the runtime and the sum of plausibly successfully repaired vulnerabilities among five runs as the final results. Table 4 shows the summary of the results and the average runtime of the algorithm. The algorithm was able to plausibly repair 26 occurrences of vulnerabilities among the 48 detected vulnerabilities. The average runtime of the algorithm over the considered 17 subjects was 25 minutes. We noticed that the main bottleneck of the implementation is due to the test case execution time which often consumes the most computational resources and blocks the synchronization barrier of each iteration of the main loop of the algorithm. When inspecting the generated patches, we found that our algorithm was able to fix correctly 21 vulnerabilities out of the detected 48 vulnerabilities. However, a careful inspection of the results reported in Table 4 leads to the following interesting observations.

Observation RQ1.1.

As shown in Table 4, there are four different classes of vulnerabilities that have been considered when evaluating the algorithm, namely, ed, re, io, and tod. We observed that most of the vulnerabilities of the classes ed and re have been plausibly repaired by the algorithm: 21 out of the 28 detected ed and 4 out of the 6 detected re vulnerabilities (18 and 3 of them correctly, respectively). On the other hand, the algorithm was unable to generate correct patches for any of the vulnerabilities of the classes io and tod.

Observation RQ1.2.

Table 4 gives the number of occurrences of each vulnerability class in the considered vulnerable contracts: ed occurs 28 times, io 12 times, re 6 times, and tod twice. ed is thus the most frequently occurring class of bugs in the selected vulnerable contracts, accounting for 28 out of the 48 detected bugs.

Observation RQ1.3.

We observed that 7 out of the considered 17 vulnerable contracts have been repaired in less than 10 minutes, where most of these contracts contain multiple bugs. This demonstrates clearly the efficiency of the presented parallel genetic repair algorithm in fixing vulnerabilities in a considerably short amount of time.

Name of contract | # lines | Vulnerabilities Discovered | Vulnerabilities Repaired (Correct/Plausible) | Average Run-time (mins)
Autonio ICO | 330 | ed(1) | ed(0/1) | 3
Airdrop | 62 | ed(4) | ed(3/4) | 8
Banana Coin | 117 | ed(1), re(1) | ed(1/1), re(1/1) | 16
XGold Coin | 272 | ed(2) | ed(2/2) | 12
Hodbo Crowdsale | 268 | ed(2) | ed(2/2) | 22
Lescoin Presale | 351 | ed(2) | ed(1/1) | 2
Classy Coin | 217 | ed(1), re(1) | None | 29
Yobcoin Crowdsale | 481 | ed(2), re(1) | ed(1/1), re(1/1) | 60
Classy Coin Airdrop | 49 | ed(2) | ed(1/2) | 1
OKO Token ICO | 232 | ed(4), re(2) | ed(1/1), re(1/2) | 60
ApplauseCash Crowdsale | 407 | ed(2), re(1) | ed(1/1) | 60
HDL Presale | 239 | ed(3) | ed(3/3) | 55
Privatix Presale | 179 | ed(1) | ed(1/1) | 2
MXToken Crowdsale | 186 | ed(1) | ed(1/1) | 30
dgame | 42 | io(3), tod(1) | None | 2
Easy Mine ICO | 351 | io(6), tod(1) | None | 60
Siring Clock Auction | 978 | io(3) | io(0/1) | 4
Total | 4761 | ed(28), io(12), re(6), tod(2), sum: 48 | ed(18/21), re(3/4), io(0/1), sum: 21/26 |
Table 4. A summary of the experimental results addressing RQ1

Answer to RQ1: The presented parallel genetic repair algorithm is generally effective in terms of generating plausible patches for vulnerable smart contracts. Among the 48 detected vulnerabilities, the algorithm was able to plausibly fix 26, and 21 of these plausible fixes have been verified to correctly fix the vulnerabilities. Hence, over the selected set of vulnerable contracts, the algorithm produced plausible fixes for about 54% (26/48) and correct fixes for about 44% (21/48) of the detected vulnerabilities.
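The totals reported for RQ1 can be recomputed from the per-contract rows of Table 4. The short Python tally below is only a consistency check: the tuples are transcribed from the table, with "None" entries recorded as zero repairs, and the variable names are chosen here for illustration.

```python
from collections import defaultdict

# (contract, class, discovered, correct, plausible), transcribed from Table 4.
ROWS = [
    ("Autonio ICO", "ed", 1, 0, 1), ("Airdrop", "ed", 4, 3, 4),
    ("Banana Coin", "ed", 1, 1, 1), ("Banana Coin", "re", 1, 1, 1),
    ("XGold Coin", "ed", 2, 2, 2), ("Hodbo Crowdsale", "ed", 2, 2, 2),
    ("Lescoin Presale", "ed", 2, 1, 1),
    ("Classy Coin", "ed", 1, 0, 0), ("Classy Coin", "re", 1, 0, 0),
    ("Yobcoin Crowdsale", "ed", 2, 1, 1), ("Yobcoin Crowdsale", "re", 1, 1, 1),
    ("Classy Coin Airdrop", "ed", 2, 1, 2),
    ("OKO Token ICO", "ed", 4, 1, 1), ("OKO Token ICO", "re", 2, 1, 2),
    ("ApplauseCash Crowdsale", "ed", 2, 1, 1), ("ApplauseCash Crowdsale", "re", 1, 0, 0),
    ("HDL Presale", "ed", 3, 3, 3), ("Privatix Presale", "ed", 1, 1, 1),
    ("MXToken Crowdsale", "ed", 1, 1, 1),
    ("dgame", "io", 3, 0, 0), ("dgame", "tod", 1, 0, 0),
    ("Easy Mine ICO", "io", 6, 0, 0), ("Easy Mine ICO", "tod", 1, 0, 0),
    ("Siring Clock Auction", "io", 3, 0, 1),
]

# Per-class totals as [discovered, correct, plausible].
totals = defaultdict(lambda: [0, 0, 0])
for _contract, cls, discovered, correct, plausible in ROWS:
    for i, value in enumerate((discovered, correct, plausible)):
        totals[cls][i] += value

all_discovered, all_correct, all_plausible = (
    sum(per_class[i] for per_class in totals.values()) for i in range(3)
)
print(dict(totals))
print(f"plausible: {all_plausible}/{all_discovered} = {all_plausible / all_discovered:.1%}")
print(f"correct:   {all_correct}/{all_discovered} = {all_correct / all_discovered:.1%}")
```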

RQ2: Does fixing the vulnerability affect the gas usage?

Setup

When fixing the detected vulnerabilities, expressions in the vulnerable smart contracts will be modified. However, it is unclear whether plausibly fixing the vulnerabilities would change the average gas consumption of the smart contract. This RQ investigates that question. We therefore compare the average gas consumption of the original vulnerable smart contract with that of the plausibly patched versions generated in the five repeated runs conducted in RQ1. Gas dominance levels of the original contract and the patched versions are used as a proxy. We consider a patched version to have a different average gas consumption from the original version when the two are of different gas dominance levels. To calculate the gas dominance level, the gas formulas of the original version and the patched versions are generated first.

Results

Table 5 shows the difference in average gas consumption between the plausible patches and the original version. Subjects for which plausible patches could not be generated within the time limit are omitted from consideration in this RQ. To sum up, 6 out of 8 (75%) of our selected subjects have plausible patches with gas formulas that differ from that of the original vulnerable version, while half (50%) of our selected subjects have plausible patches whose gas dominance levels differ from that of the original vulnerable version. This suggests that fixing vulnerabilities in smart contracts can change the average gas consumption of the original contract. For the subjects with plausible patches that change the average gas consumption, each independent patch generation process has a high probability (93.65% in our experiments) of generating plausible patches with gas dominance levels different from the original version.

Answer to RQ2: In general, when fixing vulnerabilities in a vulnerable smart contract, gas should be one of the factors that come into play.

Name of contract | # plausible patches | # patches with diff. gas formula from original | # patches with diff. gas dominance level from original | Ratio of plausible patches yielding different average gas consumption
Autonio ICO | 7 | 7 | 6 | 85.7%
Airdrop | 5 | 0 | 0 | 0%
Banana Coin | 4 | 4 | 0 | 0%
XGold Coin | 7 | 7 | 0 | 0%
Hodbo Crowdsale | 3 | 3 | 3 | 100%
Classy Coin Airdrop | 5 | 5 | 5 | 100%
HDL Presale | 1 | 0 | 0 | 0%
Privatix Presale | 9 | 8 | 8 | 88.89%
Table 5. A summary of the experimental results addressing RQ2
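The percentages quoted for RQ2 follow directly from Table 5. As a worked check, the Python snippet below recomputes them from the table rows, reading the 93.65% figure as the mean of the per-subject ratios over the subjects that have at least one patch with a different gas dominance level. That reading is an assumption, although it reproduces the reported number.

```python
# (contract, plausible patches, patches with a different gas formula,
#  patches with a different gas dominance level), transcribed from Table 5.
TABLE5 = [
    ("Autonio ICO", 7, 7, 6),
    ("Airdrop", 5, 0, 0),
    ("Banana Coin", 4, 4, 0),
    ("XGold Coin", 7, 7, 0),
    ("Hodbo Crowdsale", 3, 3, 3),
    ("Classy Coin Airdrop", 5, 5, 5),
    ("HDL Presale", 1, 0, 0),
    ("Privatix Presale", 9, 8, 8),
]

with_diff_formula = [row for row in TABLE5 if row[2] > 0]
with_diff_level = [row for row in TABLE5 if row[3] > 0]

# Per-subject share of plausible patches whose dominance level differs from the original.
ratios = [row[3] / row[1] for row in with_diff_level]
mean_ratio = sum(ratios) / len(ratios)

print(f"different gas formula: {len(with_diff_formula)}/{len(TABLE5)}")
print(f"different dominance level: {len(with_diff_level)}/{len(TABLE5)}")
print(f"mean ratio over those subjects: {100 * mean_ratio:.2f}%")
```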

RQ3: Can plausible patches generated across independent runs vary in average gas consumption?

Setup

Further, we would like to investigate whether the vulnerabilities can be plausibly fixed by more than one patch, with the patches differing in average gas consumption. In other words, we intend to understand whether the same bugs can be fixed with patches of different average gas consumption. If the answer is positive, this justifies pursuing a more gas-efficient plausible patch during the search process. We conduct our analysis on the patches generated in RQ1 across five repeated runs. We use the gas dominance levels of patches as a proxy to compare the difference in average gas consumption between patches. We consider a patched version to have a different average gas consumption from another when the two are of different gas dominance levels. Note that two patched versions have different gas dominance levels when the gas formula of one of the two versions dominates the other. To calculate the gas dominance levels of the generated patches, the gas formulas of the original version and the patched versions need to be generated first.

Results

Table 6 shows the difference in average gas consumption between the generated plausible patches of the selected vulnerable contracts. Subjects for which plausible patches could not be generated within the time limit are omitted from consideration in this RQ. For 5 out of 8 subjects (62.5%), we obtained a set of plausible patches with more than one unique gas formula, indicating the diversity of gas consumption among plausible patches addressing the same set of vulnerabilities. We noticed that a plausible patch has overall a 57.68% chance of having a unique gas formula, which also yields around three gas dominance levels among the plausible patches for one contract on average.

Observation RQ3.1.

For 62.5% of the considered subjects, there exist plausible patches having different average gas consumption.

Name of contract | # plausible patches | # unique gas formula among patches | # gas dominance levels among patches
Autonio ICO | 7 | 7 | 7
Airdrop | 5 | 1 | 1
Banana Coin | 4 | 2 | 1
XGold Coin | 7 | 5 | 1
Hodbo Crowdsale | 3 | 2 | 2
Classy Coin Airdrop | 5 | 1 | 1
HDL Presale | 1 | 1 | 1
Privatix Presale | 9 | 3 | 3
Table 6. A summary of the experimental results addressing RQ3
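The 57.68% figure can be reproduced from Table 6 if it is read as the mean, over the eight subjects, of the number of unique gas formulas divided by the number of plausible patches. This interpretation is an assumption made here; the snippet below is only a worked check of that arithmetic.

```python
# (contract, plausible patches, unique gas formulas among patches), from Table 6.
TABLE6 = [
    ("Autonio ICO", 7, 7),
    ("Airdrop", 5, 1),
    ("Banana Coin", 4, 2),
    ("XGold Coin", 7, 5),
    ("Hodbo Crowdsale", 3, 2),
    ("Classy Coin Airdrop", 5, 1),
    ("HDL Presale", 1, 1),
    ("Privatix Presale", 9, 3),
]

# Subjects whose plausible patches exhibit more than one distinct gas formula.
diverse_subjects = [name for name, _patches, unique in TABLE6 if unique > 1]

# Mean per-subject share of plausible patches with a unique gas formula.
unique_share = sum(unique / patches for _name, patches, unique in TABLE6) / len(TABLE6)

print(f"{len(diverse_subjects)}/{len(TABLE6)} subjects, mean share {100 * unique_share:.2f}%")
```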

Answer to RQ3: Different plausible patches can yield various average gas consumption for fixing the same vulnerabilities. We should therefore attempt to guide the search towards more gas-efficient plausible patches besides considering their correctness.

RQ4: How effective is the gas ranking approach at producing low-cost patches?

Setup

During the patch generation process, we have integrated our proposed gas comparative approach to compare the relative gas usage of generated patches. The relative gas dominance relationship is then used in the genetic patch generation process as guidance towards a potentially gas-optimised patch. To systematically evaluate the effectiveness of the gas usage objective in producing low-cost patches, we run our repair algorithm on the selected vulnerable smart contracts under two different settings: in the first setting the gas ranking objective is active (as done in RQ1), and in the second setting the gas ranking objective is deactivated. The first setting reuses the patches generated in RQ1, while the second setting consists of additional runs with a repetition factor of five and a timeout of one hour. We then run all patches generated in both settings on our generated test cases and collect the average runtime gas usage of each setting. For a consistent and fair comparison, we only consider patches fixing all vulnerabilities. Unlike RQ2 and RQ3, this RQ attempts to expose the change in average gas consumption for the previous usages of the contracts, in order to infer practical gas cost changes.

Results

Table 7 shows the summary of the average gas usage of patches generated with and without the gas objective activated. Subjects for which plausible patches could not be generated within the time limit (1 hour) are omitted from consideration in this RQ. Overall, for 6 out of the 8 subjects (75%) for which both settings can generate plausible patches, the gas objective is effective in reducing the average cost of the patches, by up to 9.31%. Two subjects (Autonio ICO and Classy Coin Airdrop) show no difference in average gas usage between the patches generated in the two settings. For one subject (MXToken Crowdsale), no plausible patch was generated in the five repeated runs when the gas objective was deactivated. In addition, careful profiling of the algorithm showed that gas ranking has frequently been the determining factor of patch rankings during the repair process of the selected subjects, even though the gas objective is employed as a secondary objective.

Name of contract | Average gas usage (Gas objective is enabled) | Average gas usage (Gas objective is disabled)
Autonio ICO | 87092.2 | 87092.2 (0%)
Airdrop | 73633.4 | 74316.1 (0.92%)
Banana Coin | 72535.1 | 72542.3 (0.01%)
XGold Coin | 46154.6 | 49296.3 (6.37%)
Hodbo Crowdsale | 38848.3 | 38848.6 (0%)
Classy Coin Airdrop | 72810.5 | 72810.5 (0%)
HDL Presale | 48536.6 | 48536.525 (0%)
Privatix Presale | 40323.7 | 44464.46 (9.31%)
MXToken Crowdsale | 43247.4 | No patch generated
Table 7. A summary of the experimental results addressing RQ4
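The percentages in parentheses in Table 7 are consistent with the relative reduction of the enabled setting with respect to the disabled one, that is, (disabled - enabled) / disabled. The snippet below recomputes them under that assumed formula; the dictionary is transcribed from the table and MXToken Crowdsale is left out because it has no patch in the disabled setting.

```python
# contract -> (average gas with objective enabled, average gas with objective disabled),
# transcribed from Table 7.
TABLE7 = {
    "Autonio ICO": (87092.2, 87092.2),
    "Airdrop": (73633.4, 74316.1),
    "Banana Coin": (72535.1, 72542.3),
    "XGold Coin": (46154.6, 49296.3),
    "Hodbo Crowdsale": (38848.3, 38848.6),
    "Classy Coin Airdrop": (72810.5, 72810.5),
    "HDL Presale": (48536.6, 48536.525),
    "Privatix Presale": (40323.7, 44464.46),
}


def reduction_percent(contract: str) -> float:
    """Relative gas saving of the enabled setting, in percent of the disabled setting."""
    enabled, disabled = TABLE7[contract]
    return 100 * (disabled - enabled) / disabled


for name in TABLE7:
    print(f"{name}: {reduction_percent(name):.2f}%")
```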

Answer to RQ4: When enabling the gas objective during the patch generation process, we observed that the average gas consumption of the generated patches of four vulnerable contracts has been reduced compared to the setting in which the gas objective was disabled. We also observed that the average gas of two subjects has been considerably reduced when enabling the gas objective: the average gas of the patched version of the XGold Coin contract has been reduced by 6.37% and that of the patched version of the Privatix Presale contract by 9.31%. This is a considerable reduction, as gas costs real money.

RQ5: How does the time budget impact our effectiveness at fixing bugs?

Setup

Allocating or estimating a feasible time budget for a genetic repair algorithm is an interesting open problem. It is crucial, as it affects the capability of the algorithm to generate plausible patches for a given vulnerable contract. There are some key factors that should be taken into consideration in order to allocate a feasible time budget to our repair algorithm, including: (i) the size of the test suite, (ii) the complexity of the contract (i.e., larger contracts may take longer to analyze than smaller contracts), and (iii) the estimated size of the search space, which in turn depends on the number of mutation operators used by the algorithm and the size of the original vulnerable contract. To address this research question, we choose to evaluate the algorithm under two different time budgets: a timeout of 30 minutes and a timeout of one hour. The goal is then to measure the number of vulnerabilities that have been repaired under the two settings.

Results

Table 8 shows the results of running the algorithm over the selected vulnerable smart contracts using two different values of the timeout parameter (30 minutes and 1 hour). As shown in the table, when setting the timeout parameter to 30 minutes the algorithm was able to generate plausible patches for 17 vulnerabilities out of the 48 detected ones, achieving a success rate of about 35.4% (17/48). On the other hand, when setting the timeout parameter to 1 hour the algorithm was able to generate plausible patches for 26 vulnerabilities, achieving a success rate of about 54.2% (26/48). While the improvement in the repair rate looks somewhat small, it is crucial, as it shows that some vulnerabilities can only be repaired when increasing the timeout to 1 hour. This clearly demonstrates the impact of the timeout parameter on the effectiveness of the algorithm. Since every detected vulnerability in a given vulnerable smart contract needs to be repaired and the search space can be extremely large, the time budget allocated to the algorithm can play a key role in its successful termination. The intuition is that increasing the time budget of the algorithm increases the size of the explored search space, which in turn increases the probability of generating plausible patches. This also shows that the genetic algorithm may sometimes fail to produce plausible patches for detected vulnerabilities because the allocated time budget is too small.

Name of contract | Vulnerabilities discovered (# Occurrence) | Vulnerabilities plausibly fixed (30mins/1hr timeout)
Autonio ICO | ed(1) | Same
Airdrop | ed(4) | Same
Banana Coin | ed(1), re(1) | Same
XGold Coin | ed(2) | Same
Hodbo Crowdsale | ed(2) | Same
Lescoin Presale | ed(2) | ed(0/1)
Classy Coin | ed(1), re(1) | Same
Yobcoin Crowdsale | ed(2), re(1) | ed(0/1), re(0/1)
Classy Coin Airdrop | ed(2) | Same
OKO Token ICO | ed(4), re(2) | ed(0/1), re(0/1)
ApplauseCash Crowdsale | ed(2), re(1) | ed(0/1), re(0/0)
HDL Presale | ed(3) | ed(1/3)
Privatix Presale | ed(1) | Same
MXToken Crowdsale | ed(1) | ed(0/1)
dgame | io(3), tod(1) | Same
Easy Mine ICO | io(6), tod(1) | Same
Siring Clock Auction | io(3) | Same
Table 8. Experimental results when varying the timeout from 30 minutes to 1 hour

Answer to RQ5: The effectiveness of the genetic algorithm in repairing detected vulnerabilities depends heavily on the time budget allocated to the algorithm. When increasing the timeout parameter of the algorithm from 30 minutes to 1 hour, we observed that the vulnerability repair rate of the algorithm increased from about 35.4% (17/48) to about 54.2% (26/48), with the genetic algorithm repairing 9 extra vulnerabilities. This demonstrates clearly the importance of allocating a substantial time budget (at least one hour) to the algorithm when repairing vulnerable smart contracts.

9. Threats to Validity

9.1. Internal validity

Threats to internal validity are related to the representativeness of the conclusions and summaries drawn from our experimental results. In our experimental study, we have conducted our experiments on a sampled dataset to evaluate our approach. The size of the dataset is however limited since this is the first automated smart contract repair work, and therefore there is no consolidated dataset available such as Defects4J (Just et al., 2014) for Java. We are aware that our approach is randomized. We admit that the presented results are potentially skewed even though we have conducted our experiments with a replication factor of five for each setup.

9.2. External validity

Threats to external validity are related to the ability to generalize our findings. We have only evaluated our work on four known vulnerabilities. While our approach is vulnerability-agnostic, its performance on fixing other vulnerabilities remains unknown. On the other hand, we have conducted our experiments on real-world subjects in an attempt to investigate the performance of our approach. This does not guarantee that similar performance will be exhibited for arbitrary vulnerable smart contracts. We leave larger-scale experimentation as future work.

10. Related Work

10.1. Automated Program Repair

Automated program repair has been the subject of considerable recent attention in the software engineering research community. Commonly, these techniques attempt to automate the process of fixing the bugs exposed by failing test cases, and are therefore collectively called test-based repair techniques. A patch that makes the buggy program pass all test cases is called a test-adequate patch. Several test-based program repair approaches have been developed. These approaches can mainly be classified into search-based and semantics-based approaches.

Search-based approaches developed by Le Goues et al. (2012b) and Martinez and Monperrus (2015) show promising results towards the automation of bug fixing. The key idea of their approaches is to use failing test cases to identify bugs and then apply mutations to the source code until the program passes all failing test cases. Genetic programming with a single objective function has been extensively employed; GenProg (Le Goues et al., 2012b) is one representative work of this class of approaches. Genetic programming (Le Goues et al., 2012b) as well as random search (Qi et al., 2014) have been used as search techniques for finding a plausible patch, i.e., a patch passing the given test cases.
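The generate-and-validate idea behind search-based repair can be illustrated with a deliberately tiny Python example. It is a generic sketch, not the algorithm of GenProg or of any cited tool: the "program" is a two-argument maximum whose only mutable part is a comparison operator, the mutation operator replaces that comparison, and candidates are tried in random order until one passes all tests. All names are hypothetical.

```python
import operator
import random
from typing import Callable, List, Optional, Tuple

OPERATORS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def make_max(op_name: str) -> Callable[[int, int], int]:
    """Build `max(a, b)` from a comparison; only '>' and '>=' give a correct program."""
    compare = OPERATORS[op_name]
    return lambda a, b: a if compare(a, b) else b


# Test suite: (inputs, expected output).
TESTS: List[Tuple[Tuple[int, int], int]] = [((3, 5), 5), ((7, 2), 7), ((4, 4), 4)]


def passes_all_tests(program: Callable[[int, int], int]) -> bool:
    return all(program(*inputs) == expected for inputs, expected in TESTS)


def random_search_repair(buggy_op: str, budget: int, seed: int = 0) -> Optional[str]:
    """Try mutated comparisons in random order; return the first plausible patch."""
    rng = random.Random(seed)
    candidates = [name for name in OPERATORS if name != buggy_op]
    for candidate in rng.sample(candidates, len(candidates))[:budget]:
        if passes_all_tests(make_max(candidate)):
            return candidate  # plausible: passes the given tests
    return None  # budget exhausted without a plausible patch


patch = random_search_repair("<", budget=3)
print(patch, passes_all_tests(make_max(patch)))
```

The returned operator is plausible by construction; whether it is also correct is a separate question that the test suite alone cannot settle.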

Semantics-based techniques like SemFix (Nguyen et al., 2013), Nopol (Xuan et al., 2017), DirectFix (Mechtaev et al., 2015), SPR (Long and Rinard, 2015), Angelix (Mechtaev et al., 2016) and JFIX (Le et al., 2017) split patch generation into two steps. First, they infer a synthesized specification for the buggy program statements, which is often accomplished via symbolic analysis of executions of test cases. Second, they synthesize a patch for these statements based on the inferred specification. These works view program repair as a specification inference problem, as opposed to searching among candidate patches. These approaches can be combined with search: we explore patches by considering insert/delete/replace of statements, while the semantic analysis can help synthesize expressions to be inserted in the statement replaces (Yi et al., 2017).

Apart from automated program repair approaches driven by test cases, some other studies e.g. Caramel(Nistor et al., 2015) attempts to automatically address performance bugs that can be fixed by inserting an early terminating statement onto loops. Its generated patch can potentially reduce the run-time of the programs.

Our smart contract repair problem (defined in Problem 1) is similar to the test-based program repair problem in that we also leverage test cases to examine the functional correctness of patches. However, since vulnerabilities in smart contracts have caused serious financial losses, the patches generated by our approach need to be not only test-adequate but also secure. We refer the interested reader to (Monperrus, 2018; Goues et al., 2019) for a more comprehensive discussion of work on automated program repair.

10.2. Formal Analysis and Mutation Testing of Smart Contracts

Analysis of smart contracts for possible security vulnerabilities is a popular topic that has received a lot of attention recently, with numerous tools developed based on symbolic execution and SMT solving (Grossman et al., 2017; Kalra et al., 2018; Jiang et al., 2018; Amani et al., 2018; van der Meyden, 2019). The work in (Amani et al., 2018) attempted to translate smart contract source code to Isabelle/HOL in order to conduct some formal verification on smart contracts. They use the symbolic security analyser Oyente (Luu et al., 2016) to detect vulnerabilities in smart contracts. The tool ContractFuzzer (Jiang et al., 2018) uses both fuzzing and static analysis techniques to perform a formal analysis of smart contracts in order to detect security vulnerabilities. Recently, van der Meyden (2019) conducted a formal analysis of an abstract model of smart contract code (atomic swap smart contracts) using the epistemic model checking tool MCK (Gammie and van der Meyden, 2004). He showed how to automatically verify that a concrete implementation of atomic swap satisfies its specification using epistemic-temporal logic model checking.

There is also a considerable amount of work on the mutation testing of smart contracts (Wu et al., 2019; Honig et al., 2019; Fu et al., 2019). Mutation testing (Papadakis et al., 2019; Offutt and Untch, 2001) is a technique for evaluating the quality of a set of test cases (i.e., a test suite). It works by introducing faults into a system via source code mutation and then analysing the ability of the test suite to detect these faults. The work in (Wu et al., 2019) implemented some mutation operators and tested them on four DApps (decentralised applications on blockchains). However, their approach does not take into consideration access control faults or the gas usage of the mutated contracts. The work in (Honig et al., 2019) developed a mutation testing framework for smart contracts that considers access control faults, but it does not consider the gas limit. The work in (Fu et al., 2019) introduced a smart contract mutation approach, but for testing implementations of the Ethereum Virtual Machine (EVM) and not smart contracts. There are two available GitHub repositories with related tools for mutation testing of smart contracts: (1) Eth-mutants (https://github.com/federicobond/eth-mutants), which implements just one mutation operator, and (2) UniversalMutator, a generic mutation tool (Groce et al., 2018) with a set of operators for Solidity.
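As a minimal sketch of how mutation testing scores a test suite, the example below mutates a toy withdrawal guard and counts how many mutants a suite detects. The guard, the three mutants in `MUTANTS` and both test suites are hypothetical and are not taken from any of the cited tools; a mutant counts as killed when at least one test input makes it behave differently from the original.

```python
def original(balance, amount):
    """Toy withdrawal guard: the amount must be positive and covered by the balance."""
    return 0 < amount <= balance

# Hypothetical mutants, each produced by one small syntactic change.
MUTANTS = {
    "<= replaced by <": lambda balance, amount: 0 < amount < balance,
    "< replaced by <=": lambda balance, amount: 0 <= amount <= balance,
    "lower bound removed": lambda balance, amount: amount <= balance,
}

def mutation_score(test_inputs):
    """Return the fraction of killed mutants and the names of the killed mutants."""
    killed = [
        name
        for name, mutant in MUTANTS.items()
        if any(mutant(*args) != original(*args) for args in test_inputs)
    ]
    return len(killed) / len(MUTANTS), killed

# A suite without boundary values lets every mutant survive.
weak_suite = [(10, 5), (10, 20)]
# Adding the boundary inputs kills all three mutants.
strong_suite = weak_suite + [(10, 10), (10, 0)]

weak_score, _ = mutation_score(weak_suite)
strong_score, strong_killed = mutation_score(strong_suite)
```

The surviving mutants of the weak suite point directly at the missing boundary tests, which is the kind of feedback mutation testing is meant to give.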

10.3. Gas Usage Calculation of Smart Contracts

The work in (Marescotti et al., 2018) presented techniques for calculating the worst-case gas usage of smart contracts. Their approach is based on symbolically enumerating all execution paths and unwinding loops up to some limit. They infer the maximal number of loop iterations and generate accurate gas bounds. Knowing the worst-case gas usage bound of a smart contract can be extremely useful, as it tells users the maximum amount of gas they may need to pay before they send their transactions to the blockchain network. The work in (Signer, 2018) provides a graphical user interface that presents gas usage information (e.g., best- and worst-case gas usage, and the gas usage of different parts of the code), which helps developers optimize the gas usage of their smart contracts.
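The idea of bounding gas usage by enumerating paths with bounded loop unwinding can be sketched on a toy program representation. This is an explicit enumeration rather than the symbolic technique of the cited work, and the node kinds (`op`, `seq`, `branch`, `loop`), the gas costs and the unwinding limit are all assumptions chosen for illustration; they are not real EVM costs.

```python
def path_costs(node, unwind_limit):
    """Return the set of total gas costs over all paths through a toy program tree."""
    kind = node[0]
    if kind == "op":  # ("op", gas_cost)
        return {node[1]}
    if kind == "seq":  # ("seq", [child, ...])
        totals = {0}
        for child in node[1]:
            child_costs = path_costs(child, unwind_limit)
            totals = {t + c for t in totals for c in child_costs}
        return totals
    if kind == "branch":  # ("branch", then_node, else_node)
        return path_costs(node[1], unwind_limit) | path_costs(node[2], unwind_limit)
    if kind == "loop":  # ("loop", body_node), unwound 0..unwind_limit times
        body_costs = path_costs(node[1], unwind_limit)
        totals = {0}  # zero iterations
        current = {0}
        for _ in range(unwind_limit):
            current = {t + c for t in current for c in body_costs}
            totals |= current
        return totals
    raise ValueError(f"unknown node kind: {kind}")

# Hypothetical contract function: fixed overhead, a cheap or expensive branch,
# then a loop whose body costs 10 gas per iteration.
PROGRAM = (
    "seq",
    [("op", 21), ("branch", ("op", 5), ("op", 100)), ("loop", ("op", 10))],
)

costs = path_costs(PROGRAM, unwind_limit=3)
best_case, worst_case = min(costs), max(costs)
```

The worst-case value is only a sound bound if the unwinding limit is at least the true maximal number of loop iterations, which is why inferring that maximum matters for accuracy.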

11. Conclusion

In this paper, we have presented the first work on automatically repairing smart contracts. Our repair method is gas-optimized and vulnerability-agnostic. The repair algorithm is search-based, and it breaks the huge search space of candidate patches down into smaller, mutually exclusive spaces that can be processed independently. The repair technique considers the gas usage of vulnerable contracts when generating patches for detected vulnerabilities. Our experiments demonstrated that our method can handle real-world contracts and generate repairs in a short computation time (less than 1 hour) while taking into consideration the gas cost of the generated repairs.

Since the owners of smart contracts are unknown, we could not reach out to them prior to publication. Nevertheless, we hope that this work will spur greater interest in improving the reliability of smart contracts via software testing and analysis. Our smart contract repair tool and dataset are available on GitHub at the following website.

https://github.com/xiaoly8/SCRepair

Acknowledgments

This work was partially supported by the National Satellite of Excellence in Trustworthy Software Systems, funded by National Research Foundation (NRF) Singapore under National Cybersecurity R&D (NCR) programme.

References

  • S. Amani, M. Bégel, M. Bortin, and M. Staples (2018) Towards verifying ethereum smart contract bytecode in isabelle/hol. In Proceedings of the 7th ACM SIGPLAN International Conference on Certified Programs and Proofs, CPP 2018, pp. 66–77. Cited by: §10.2, §2.
  • N. Atzei, M. Bartoletti, and T. Cimoli (2017) A survey of attacks on ethereum smart contracts sok. In Proceedings of the 6th International Conference on Principles of Security and Trust - Volume 10204, pp. 164–186. External Links: ISBN 978-3-662-54454-9 Cited by: Table 1, §2.
  • K. Bhargavan, A. Delignat-Lavaud, C. Fournet, A. Gollamudi, G. Gonthier, N. Kobeissi, N. Kulatova, A. Rastogi, T. Sibut-Pinote, N. Swamy, and S. Zanella-Béguelin (2016) Formal verification of smart contracts: short paper. In Proceedings of the 2016 ACM Workshop on Programming Languages and Analysis for Security, PLAS ’16, pp. 91–96. Cited by: Table 1, §2.
  • K. Deb, S. Agrawal, A. Pratap, and T. Meyarivan (2000) A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: nsga-ii. In International conference on parallel problem solving from nature, pp. 849–858. Cited by: §4.4, Definition 2.
  • K. Delmolino, M. Arnett, A. Kosba, A. Miller, and E. Shi (2016) Step by step towards creating a safe smart contract: lessons and insights from a cryptocurrency lab. In Financial Cryptography and Data Security - International Workshops, FC 2016, BITCOIN, VOTING, and WAHC, Revised Selected Papers, pp. 79–94. Cited by: §2.
  • A. Dika (2017) Ethereum smart contracts: security vulnerabilities and security tools. Master’s Thesis, Norwegian University of Science and Technology, Department of Computer Science. Cited by: Table 1, §2.
  • J. Feist, G. Grieco, and A. Groce (2019) Slither: a static analysis framework for smart contracts. In 2019 IEEE/ACM 2nd International Workshop on Emerging Trends in Software Engineering for Blockchain (WETSEB), pp. 8–15. Cited by: item 6, §2, §7.1.
  • Y. Fu, M. Ren, F. Ma, H. Shi, X. Yang, Y. Jiang, H. Li, and X. Shi (2019) EVMFuzzer: detect evm vulnerabilities via fuzz testing. In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2019, New York, NY, USA, pp. 1110–1114. External Links: ISBN 978-1-4503-5572-8 Cited by: §10.2.
  • P. Gammie and R. van der Meyden (2004) MCK: model checking the logic of knowledge. In Computer Aided Verification, 16th International Conference, CAV. Cited by: §10.2.
  • C. L. Goues, M. Pradel, and A. Roychoudhury (2019) Automated program repair. Communications of The ACM 62 (12). Cited by: §10.1.
  • I. Grishchenko, M. Maffei, and C. Schneidewind (2018) A semantic framework for the security analysis of ethereum smart contracts. In Principles of Security and Trust - 7th International Conference, POST, pp. 243–269. Cited by: §2.
  • A. Groce, J. Holmes, D. Marinov, A. Shi, and L. Zhang (2018) An extensible, regular-expression-based tool for multi-language mutant generation. In Proceedings of the 40th International Conference on Software Engineering (ICSE), pp. 25–28. Cited by: §10.2.
  • S. Grossman, I. Abraham, G. Golan-Gueta, Y. Michalevsky, N. Rinetzky, M. Sagiv, and Y. Zohar (2017) Online detection of effectively callback free objects with applications to smart contracts. Proc. ACM Program. Lang. 2 (POPL), pp. 48:1–48:28. Cited by: §10.2, §2.
  • J. J. Honig, M. H. Everts, and M. Huisman (2019) Practical mutation testing for smart contracts. In Data Privacy Management, Cryptocurrencies and Blockchain Technology - ESORICS International Workshop, pp. 289–303. Cited by: §10.2.
  • B. Jiang, Y. Liu, and W. K. Chan (2018) ContractFuzzer: fuzzing smart contracts for vulnerability detection. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, ASE 2018, pp. 259–269. Cited by: §10.2, Table 1, §2.
  • R. Just, D. Jalali, and M. D. Ernst (2014) Defects4J: a database of existing faults to enable controlled testing studies for java programs. In Proceedings of the 2014 International Symposium on Software Testing and Analysis, pp. 437–440. Cited by: §9.1.
  • S. Kalra, S. Goel, M. Dhawan, and S. Sharma (2018) ZEUS: analyzing safety of smart contracts. In 25th Annual Network and Distributed System Security Symposium, NDSS. Cited by: §10.2, §2.
  • C. Le Goues, M. Dewey-Vogt, S. Forrest, and W. Weimer (2012a) A systematic study of automated program repair: fixing 55 out of 105 bugs for $8 each. In Proceedings of the 34th International Conference on Software Engineering (ICSE). Cited by: §1.
  • C. Le Goues, T. Nguyen, S. Forrest, and W. Weimer (2012b) GenProg: a generic method for automatic software repair. IEEE Transactions on Software Engineering 38 (1), pp. 54–72. Cited by: §1, §10.1, §4.3.
  • X. D. Le, D. Chu, D. Lo, C. Le Goues, and W. Visser (2017) JFIX: semantics-based repair of java programs via symbolic pathfinder. In Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 376–379. Cited by: §10.1.
  • F. Long and M. Rinard (2015) Staged program repair with condition synthesis. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2015, pp. 166–178. Cited by: §1, §10.1.
  • L. Luu, D. Chu, H. Olickel, P. Saxena, and A. Hobor (2016) Making smart contracts smarter. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 254–269. Cited by: item 6, §10.2, Table 1, §2, §7.1, §7.3.
  • M. Marescotti, M. Blicha, A. E. J. Hyvärinen, S. Asadi, and N. Sharygina (2018) Computing exact worst-case gas consumption for smart contracts. In Leveraging Applications of Formal Methods, Verification and Validation, pp. 450–465. Cited by: §10.3.
  • M. Martinez and M. Monperrus (2015) Mining software repair models for reasoning on the search space of automated program fixing. Empirical Softw. Engg. 20 (1), pp. 176–205. Cited by: §1, §10.1.
  • S. Mechtaev, J. Yi, and A. Roychoudhury (2015) DirectFix: looking for simple program repairs. In Proceedings of the 37th International Conference on Software Engineering, ICSE ’15, pp. 448–458. Cited by: §1, §10.1.
  • S. Mechtaev, J. Yi, and A. Roychoudhury (2016) Angelix: scalable multiline program patch synthesis via symbolic analysis. In Proceedings of the 38th International Conference on Software Engineering, ICSE ’16, pp. 691–701. Cited by: §1, §10.1.
  • M. Monperrus (2018) Automatic software repair: a bibliography. ACM Computing Survey 51 (1), pp. 17:1–17:24. Cited by: §10.1.
  • H. D. T. Nguyen, D. Qi, A. Roychoudhury, and S. Chandra (2013) SemFix: program repair via semantic analysis. In Proceedings of the 2013 International Conference on Software Engineering, ICSE ’13, pp. 772–781. Cited by: §1, §10.1.
  • A. Nistor, P. Chang, C. Radoi, and S. Lu (2015) CARAMEL: detecting and fixing performance problems that have non-intrusive fixes. In 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, Vol. 1, pp. 902–912. External Links: ISSN 1558-1225 Cited by: §10.1.
  • A. J. Offutt and R. H. Untch (2001) Mutation testing for the new century. W. E. Wong (Ed.), pp. 34–44. Cited by: §10.2.
  • M. Papadakis, M. Kintis, J. Zhang, Y. Jia, Y. L. Traon, and M. Harman (2019) Mutation testing advances: an analysis and survey. Advances in Computers 112, pp. 275–378. Cited by: §10.2.
  • Y. Qi, X. Mao, Y. Lei, Z. Dai, and C. Wang (2014) The strength of random search on automated program repair. In ACM/IEEE International Conference on Software Engineering. Cited by: §10.1.
  • C. Signer (2018) Gas cost analysis for ethereum smart contracts. Master’s Thesis, ETH Zurich, Department of Computer Science. Cited by: §10.3.
  • S. Tikhomirov, E. Voskresenskaya, I. Ivanitskiy, R. Takhaviev, E. Marchenko, and Y. Alexandrov (2018) SmartCheck: static analysis of ethereum smart contracts. In Proceedings of the 1st International Workshop on Emerging Trends in Software Engineering for Blockchain, WETSEB ’18, pp. 9–16. Cited by: Table 1, §2.
  • P. Tsankov, A. Dan, D. Drachsler-Cohen, A. Gervais, F. Bünzli, and M. Vechev (2018) Securify: practical security analysis of smart contracts. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security. Cited by: Table 1, §2.
  • R. van der Meyden (2019) On the specification and verification of atomic swap smart contracts. In IEEE International Conference on Blockchain and Cryptocurrency, pp. 176–179. Cited by: §10.2, §2.
  • G. Wood (2019) Ethereum: a secure decentralised generalised transaction ledger. Ethereum project yellow paper 151, pp. 1–32. Cited by: §2, §5.1, §5.1.
  • H. Wu, X. Wang, J. Xu, W. Zou, L. Zhang, and Z. Chen (2019) Mutation testing for ethereum smart contract. arXiv:1908.03707. Cited by: §10.2.
  • J. Xuan, M. Martinez, F. DeMarco, M. Clement, S. L. Marcote, T. Durieux, D. Le Berre, and M. Monperrus (2017) Nopol: automatic repair of conditional statement bugs in java programs. IEEE Trans. Softw. Eng., pp. 34–55. Cited by: §1, §10.1.
  • J. Yi, U. Z. Ahmed, A. Karkare, S. H. Tan, and A. Roychoudhury (2017) A feasibility study of using automated program repair for introductory programming assignments. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2017, pp. 740–751. Cited by: §10.1.