1 Introduction

A blockchain is a distributed ledger that manages assets between users. A smart contract encodes the rules for transferring these assets. The transfers happen within transactions that are stored persistently on the blockchain. Smart contracts can therefore implement a wide range of use cases, including financial and governance applications. For instance, a contract can act as an autonomous agreement between multiple parties that transfers assets to designated accounts when particular conditions are met.
The novel semantics and programming model of smart contracts make it challenging to ensure their correct behavior. This makes them susceptible to bugs or vulnerabilities that may be exploited by other accounts in the Ethereum network. In fact, there have been a number of attacks on the Ethereum main network that caused the loss of millions of ETH. The most famous attack on Ethereum so far has been the one on the Decentralized Autonomous Organisation (DAO), which exploited a reentrancy vulnerability. As a result of this attack, 3.5 million ETH (about 50 M USD at the time) was stolen.
Reentrancy involves repeated calls to the same function (or a set of functions) before the first invocation has finished. Such nested invocations can cause a smart contract to behave in unexpected ways, which an attacker can exploit, usually to transfer funds away from the victim contract. Reentrancy is known as one of the most dangerous vulnerabilities in Ethereum smart contracts.
Existing tools to detect reentrancy vulnerabilities use complex code analysis and handcrafted rules to carefully analyze the control flow and asset transfers in smart contracts. Such attacks are not explicitly observable at the transaction level, though. Our work attempts a completely new direction:
We monitor transactions at runtime at the level of the Ethereum blockchain. A sample transaction trace from which we gather data for our machine learning model is presented in Figure 1. This monitoring does not require complex inspection of the smart contracts themselves, and it makes it possible to deploy our technique directly at the Ethereum blockchain client, without any modification of the smart contracts or the client involved.
We use machine learning on the monitored transaction metadata. This avoids the need to design (possibly flawed) rules and also paves the way towards recognizing new types of vulnerabilities in the future.
Dynamit is designed to analyse transactions in smart contracts and report malicious ones. Our technique, when used with a random-forest model, showed high accuracy (96 %) on 105 transactions. We averaged our experimental results over ten runs, each using ten-fold cross-validation for the training and test phases of all machine learning models.
The rest of this paper presents our work in detail and is organized as follows: Section 2 explains smart contracts and reentrancy, and covers related work. Section 3 describes our approach. Our experiments and their results are described in Sections 4 and 5, respectively. Section 6 covers threats to validity; Section 7 concludes and outlines future work.
2 Background

2.1 Smart Contracts

Smart contracts embody a novel programming model that includes a global shared state (managed in a decentralized way on a blockchain). The global state that stores everyone's assets is manipulated by the smart contracts, which are small programs expressed in a specific format, such as Ethereum bytecode. This bytecode is usually compiled from a high-level language, e.g., Solidity. The code is executed by a virtual machine, instruction by instruction. Each instruction also incurs a cost, measured in gas, which the invoker (user) of a smart contract has to pay. The VM manages the effects of the instructions, and their cost, on everyone's assets on the blockchain. Potential vulnerabilities can arise at different levels in this architecture; reentrancy is generally regarded as one of the most severe ones [3, 2].
2.2 Reentrancy Vulnerability
Contracts in Ethereum can send Ether to each other. Whenever a contract receives a message that contains Ether but carries no data and does not specify a function, a default unnamed function, the fallback function, is invoked. When funds are transferred from a contract A to a contract B, control is handed over to B. While B has control, it can call back into any public function of contract A, even the same function that issued the call to B. This situation is called reentrancy.
A simple example of reentrancy is illustrated in Figures 2 and 3. Contract Vulnerable donates to a target contract (as specified by the to parameter of donate). The intention is that a donation occurs only once, but this is not checked in donate. An exploit is implemented by contract Attacker. Its startAttack function issues a call to donate in Vulnerable. After this call, Vulnerable transfers plain Ether to Attacker. At this point, control is passed to the fallback function of Attacker, which tries to call the donate function again. The donations continue until Vulnerable runs out of gas or Ether, and because only the last invocation is reverted upon failure, the attacker effectively drains the victim of all its funds.
To prevent reentrancy, one can use function modifiers in Solidity to perform checks before handing control to the fallback function of another contract. In the case of our Vulnerable contract, a simple check before sending the Ether prevents the attacker from exploiting reentrancy. The safe version, NotVulnerable, checks and updates its state before sending Ether to the interacting contract (see Figure 4).
2.3 Related Work
Program analysis techniques to detect potential vulnerabilities can be divided into static analysis, which analyzes the structure of code without running it, and dynamic analysis, which analyzes the runtime behavior of an executing program. The advantage of static analysis is that it does not require a test case to reveal a flaw; conversely, it has the disadvantage that the analysis may be overly strict and reveal spurious problems that are not actually exploitable flaws during program execution. Dynamic analysis, on the other hand, always produces actual executions (and thus a witness of a real problem), but may be unsuccessful at finding the right inputs to make this happen. Combinations of these techniques also exist, typically in the form of static analysis to identify parts of the program that might need closer inspection at runtime.
Static analysis tools for smart contracts include Securify, SmartCheck, and Slither. These tools check code against problematic patterns that may constitute violations of coding guidelines or even potential vulnerabilities. Symbolic execution is a static analysis technique that uses path conditions (conditions about the feasibility of certain execution paths) to reason about inputs that may reach a potentially unsafe state in a program. Oyente was the first tool to apply symbolic execution to smart contracts. Other tools, such as TeEther, MAIAN, and Zeus, followed and took different approaches to finding harmful inputs or detecting problems in contracts.

The above-mentioned static analysis tools use rules designed by experts to detect problems. Recent work has applied machine learning to static analysis and extended this idea to the domain of smart contracts.
Dynamic analysis for smart contracts focuses on finding inputs that reach a program state exhibiting a problematic execution pattern (such as reentrancy), implemented in tools like Echidna, ContractFuzzer, and ReGuard. The accuracy of these tools depends on the quality of their hand-coded pattern recognizers. Recent work uses an oracle that tracks the balance of each smart contract instance to detect fundamental misuse, thus eliminating the need for specific program patterns to detect a vulnerability.
Our work leverages machine learning to detect problematic execution patterns, focusing on reentrancy in smart contracts. It is, to our knowledge, the first work to analyze dynamic execution patterns in smart contracts through machine learning. In the area of malware detection, a metadata-focused approach has also been used successfully, looking at the frequency and size of packets sent over encrypted connections to classify the behavior of an application.
3 Approach

The Dynamit framework detects reentrancy vulnerabilities in deployed smart contracts without needing their source code. Dynamit considers only the dynamic behavior of the smart contract; that behavior is extracted from metadata describing the transactions between the contracts. This monitoring is based on the existing application programming interface (API) of the unmodified Ethereum blockchain client.
Dynamit consists of two parts (see Figure 5):
The Monitor, which observes transactions in the blockchain.
The Detector, which classifies behavior as benign or malicious.
The detector can be configured with various classifiers, which are first trained on a training set before our tool is put to use to detect malicious transactions in production.
3.1 Monitor

The monitor obtains the data as follows:
Subscription to the events emitted by the Ethereum client. These events are emitted when a transaction related to an account is issued. In our work, we use pendingTransactions to get any new transactions related to the accounts under observation.
Probing the blockchain at specific intervals until the desired information is retrieved. This is suitable for getting information about an already mined transaction or getting the state of a contract after an event.
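Both mechanisms can be sketched against a web3.py-style client interface. The sketch below is illustrative, not Dynamit's actual implementation: the function names, retry counts, and polling intervals are assumptions, and `w3.eth.filter("pending")` / `w3.eth.get_transaction_receipt` mirror web3.py's API for a node we assume is reachable.

```python
import time

def poll_pending(w3, max_polls, poll_interval=1.0):
    """Mechanism 1: watch a pending-transaction filter and yield new tx hashes."""
    pending = w3.eth.filter("pending")  # corresponds to pendingTransactions
    for _ in range(max_polls):
        for tx_hash in pending.get_new_entries():
            yield tx_hash
        time.sleep(poll_interval)

def probe_receipt(w3, tx_hash, retries=30, poll_interval=1.0):
    """Mechanism 2: probe the chain at intervals until the transaction is mined."""
    for _ in range(retries):
        try:
            receipt = w3.eth.get_transaction_receipt(tx_hash)
        except Exception:  # recent web3.py raises TransactionNotFound until mined
            receipt = None
        if receipt is not None:
            return receipt
        time.sleep(poll_interval)
    return None  # transaction not mined within the probing window
```

In production, the subscription loop would run indefinitely; the bounded `max_polls` loop here only keeps the sketch easy to exercise in isolation.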
3.2 Detector

The detector is the part of the system that distinguishes harmful transactions from benign ones. It consists of a component that processes and cleans the data received by the monitor, and a machine learning model that is trained as the monitor feeds in the data.
3.2.1 Extracted Features
The extracted features and the mechanism used to monitor them are presented in Table 1.
Feature | Monitoring mechanism
Gas usage of transaction | Event subscription
Contract 1 balance difference | Probing
Contract 2 balance difference | Probing
Average call stack depth | Probing (transaction trace)
The contract balance difference feature is the difference between a contract's balance before and after the transaction has taken place. This feature may easily be replaced by any other asset that is being transferred by the contracts, to match the specific use case.
The average call stack depth is the only feature retrieved directly from the transaction trace. Calling a regular function in a contract will not dramatically change this value; recursive external calls, however, will change it drastically. This is typically the case for the reentrancy vulnerability, where a particular function in the victim is called recursively until the attacker contract stops. Intuitively, this feature should correlate positively with a transaction being harmful. However, an attacker can easily avoid detection by limiting the number of recursions. Therefore, in our experiments we randomly decrease the average call stack depth of harmful transactions, both to make them harder for the detector to distinguish and to decrease the bias of the model.
As mentioned earlier, as contract code executes on the blockchain, it consumes gas. Gas usage depends on the specific operations that a contract carries out within a transaction. Since a successful attack on a vulnerable contract may exhibit a specific execution pattern, we use this gas usage as a summary representation of the execution.
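As a sketch, the features above can be assembled from the transaction receipt's gas usage, the two contract balances probed before and after the transaction, and a Geth-style trace whose steps each record a call depth. The function names and the feature-dictionary keys are illustrative, not Dynamit's actual code; the `depth` field follows the structLogs format of Geth's transaction tracer.

```python
def average_call_depth(struct_logs):
    """Average EVM call depth over a transaction trace.

    Recursive external calls, as in a reentrancy attack, push the
    per-step depth up, raising this average.
    """
    if not struct_logs:
        return 0.0
    return sum(step["depth"] for step in struct_logs) / len(struct_logs)

def extract_features(gas_used, balances_before, balances_after, struct_logs):
    """Assemble the Table 1 feature vector for one monitored transaction."""
    return {
        "gas_used": gas_used,
        "contract1_balance_diff": balances_after[0] - balances_before[0],
        "contract2_balance_diff": balances_after[1] - balances_before[1],
        "avg_call_stack_depth": average_call_depth(struct_logs),
    }
```

A drained victim would show a large negative balance difference mirrored by the attacker's positive one, alongside an elevated average call depth.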
3.2.2 Machine Learning Models

To find the best model, we trained and tested the following models in our detector:
Random Forest (RF): a classifier using bagging with 100 decision trees.
Naive Bayes (NB).
K-Nearest Neighbours (K-NN): a classifier with 5 neighbors.
Support Vector Machine (SVM): both linear and polynomial kernels were used; the model with the linear kernel outperformed the polynomial one.
Logistic Regression (LR).
All of our models were built using the Scikit-learn library. The random-forest model is composed of 100 trees of type DecisionTreeClassifier in Scikit-learn.
3.3 Usage of Dynamit
Let us assume the developers of an application deploy it as a smart contract on top of Ethereum. These developers can use Dynamit to safeguard their smart contract. They install Dynamit on their own machine and configure it to connect to the Ethereum network and monitor their deployed contract. As transactions are issued to the monitored smart contract, Dynamit collects and processes their metadata. The previously trained machine learning model then classifies transactions as benign or harmful; the latter can be used as feedback to the developer or as part of a security information and event management (SIEM) system that may report users to an administrator or block a vulnerable contract from being used further.
Service contracts | User contracts
13 robust contracts | 11 benign contracts
12 vulnerable contracts | 9 malicious contracts
4 Experimental Setup

We chose 25 open-source contracts for our experiments that implement a certain functionality; we denote them as service contracts here. These contracts were originally used in prior work; their source code is available on Etherscan (https://etherscan.io/). We wrote 20 user contracts that access and utilize that functionality (see Table 2).
A service contract may be robust (not exploitable) or contain a vulnerability; likewise, a user contract may be benign or malicious. Only a combination of a vulnerable service contract with a malicious user may actually reveal the vulnerability in the service contract.
For the experiment, we monitored a total of 105 transactions generated from these contracts, with 53 benign and 52 harmful transactions. All of these transactions have been labelled manually before starting the experiment, so they could be used for both training and testing a supervised model. We feed labelled transaction data to our classifier (offline) for the training phase; in production, online (unlabelled) data can be used.
From the 105 transactions, 25 were taken from the 25 open-source service contracts, which we complemented with 20 variants of user contracts. The remaining 80 transactions were generated using two pairs of contract templates (four contracts) that produce both harmful and benign transactions randomly. Contract Vulnerable2 is one such variant of a service contract; it donates a random amount to the user (see Figure 6). To generate these random transactions, both the service and the user contracts (see Table 2) fuzz their behavior to reflect the diversity of real-world scenarios. Another reason for this fuzzing is that a contract may perform complex internal computation with a certain call stack depth or gas usage, which can make an attack harder to detect. We include such behavior in our data to obtain a less biased classifier in the detector; these transactions are thus generated in a way that prevents overfitting in the model. For example, we fuzz the gas usage by injecting a random loop with 50 % probability into the vulnerable contract template (see lines 12–18 in Figure 6). Each use of the counter expends extra gas. Likewise, we randomize the amount donated to the user and the number of times an attacker actually exploits reentrancy, to make the attacks harder to recognize.
Since each interaction between a service and its user is either benign or harmful, the following outcomes can occur:
The user contract successfully exploits the reentrancy vulnerability: harmful transaction.
The user contract tries to exploit a reentrancy vulnerability (which may or may not exist in the service contract) but is unsuccessful. This will lead to one of the following situations:
The transaction, and accordingly its effects on the target contract's state, is reverted by the Ethereum runtime environment. Such failed (reverted) transactions are not made visible through Ethereum's monitoring API and are therefore not taken into account by our analysis.
The transaction is not reverted, and takes the intended original effect: benign transaction.
The user contract does not try to exploit reentrancy at all: benign transaction.
5 Results

As mentioned earlier, after data is collected by the monitor, it is fed into the detector for classification. We trained and tested the models in the detector using the above-mentioned data. For all of our models, we used stratified 10-fold cross-validated training and test sets to get consistent and reliable results. For each number in the plots, the whole experiment (including the cross-validation) was performed 10 times, and the average performance was taken. The number of neighbors in the K-NN model and the number of trees in the RF model were chosen based on empirical observations to maximize the performance of the models.
We train and test five different types of classifiers and compare them based on the average false positive rate (FPR) and false negative rate (FNR), as well as accuracy, F1 score, and recall (see Figures 7 and 8). The FPR varies between 1.48 % (logistic regression) and 5.74 % (Naive Bayes), while the FNR is lowest for the random forest (RF) model at 12.37 %.
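The evaluation procedure described above (stratified 10-fold cross-validation, repeated and averaged, reporting FPR and FNR) can be sketched with Scikit-learn; the helper name, seeding scheme, and label convention (0 = benign, 1 = harmful) are illustrative assumptions:

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict

def average_fpr_fnr(model, X, y, runs=10, folds=10, seed=0):
    """Average FPR and FNR over repeated stratified k-fold cross-validation."""
    fprs, fnrs = [], []
    for run in range(runs):
        cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=seed + run)
        pred = cross_val_predict(model, X, y, cv=cv)
        tn, fp, fn, tp = confusion_matrix(y, pred, labels=[0, 1]).ravel()
        fprs.append(fp / (fp + tn))  # benign transactions flagged as harmful
        fnrs.append(fn / (fn + tp))  # harmful transactions missed
    return float(np.mean(fprs)), float(np.mean(fnrs))
```

Repeating the cross-validation with a different shuffle per run and averaging smooths out fold-assignment noise on a data set as small as 105 transactions.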
The RF classifier achieves the highest accuracy (93 %). Most of the inaccuracy of the models can be attributed to the FNR; in other words, the detector labels a considerable number of harmful transactions as benign (even with RF). Conversely, the low FPR makes Dynamit useful as a monitoring tool in scenarios where the cost of false positives is rather high, such as in testing or when suspending problematic contracts in production for manual review.
As mentioned earlier, the contract sets we used for random transaction generation try to disguise their behavior. We took this measure to build a realistic model and decrease bias. As a result, the correlation between the average call stack depth and the label of the transaction is very low (see Figure 9). Hence, we also built the same models without the average call stack depth feature. The results of this version of the models are shown in Figures 10 and 11. The overall behavior of all models is consistent with the results in Figures 7 and 8, but there are a few interesting changes. While RF is still the most accurate model, and even more accurate than before, the relative reduction in its FPR is higher than that in its FNR. The highest average accuracy in this experiment belongs to RF (96 %).
6 Threats to Validity
We use a total of 49 smart contracts (25 service, 20 user, and 4 random transaction generation contracts) in our experiments. In an effort to collect more realistic data, the harmful transactions issued by our own contracts are randomized to disguise their malicious nature. As mentioned in the results section, this has rendered our otherwise important average call stack depth feature useless, by making the average call stack depth of a harmful contract look like that of a benign one, and vice versa. From a different point of view, this also shows the possibility of tricking a dynamic detector that only checks certain variables (such as the balance) to detect a vulnerability. Our results suggest that combining our machine learning-based detector with oracle-supported dynamic vulnerability detection may decrease the number of false negatives.
Another consideration is the amount of randomness in our randomly generated transactions. If there is not enough randomness, the machine learning model in the detector will exhibit high bias, rendering it useless for catching more complex attacks. Our random transaction generator uses the block's timestamp and difficulty to generate random numbers. Since we used a private deployment of the Ethereum blockchain, these variables were controllable by increasing the mining frequency and issuing transaction generation commands every 30 seconds. Using this method, we verified that the random transaction generation system has enough randomness.
7 Conclusion and Future Work
In this work, we present Dynamit, a dynamic vulnerability detection framework for Ethereum smart contracts. Dynamit detects vulnerable smart contracts by classifying harmful transactions in a blockchain using machine learning on transactional metadata. We achieve 96 % accuracy on a data set of 105 transactions.
To further develop Dynamit, we will investigate automatic test-case generation tools such as Vultron. Such tools can generate labeled transactions and create benign and malicious user contracts to reproduce them. Another direction for future work is to find more features to make the detection more accurate; an example would be to observe bookkeeping variables inside the contracts, and the way they change, as additional indicators of a smart contract being exploited. Finally, we will consider analyzing sequences of multiple transactions and applying other types of machine learning to the data, to increase the capabilities of our detector and to analyze other types of vulnerabilities as well.
References

- (2020) Using features of encrypted network traffic to detect malware. In 25th Nordic Conference on Secure IT Systems, LNCS. Cited by: §2.3.
- (2021) Eth2Vec: learning contract-wide code representations for vulnerability detection on Ethereum smart contracts. arXiv preprint arXiv:2101.02377. Cited by: §2.1, §2.3.
- (2017) A survey of attacks on Ethereum smart contracts (SoK). In Principles of Security and Trust, M. Maffei and M. Ryan (Eds.), Berlin, Heidelberg, pp. 164–186. Cited by: §1, §2.1, §2.2.
- (2021) Cryptocurrency Prices, Charts And Market Capitalizations. CoinMarketCap. Cited by: §1.
- (2017) Introducing Ethereum and Solidity. Vol. 1, Springer. Cited by: §2.1.
- (2021) DASP - TOP 10. Cited by: §1.
- (2021) Home. ethereum.org. Cited by: §1, §2.1, §2.2.
- (2019) Slither: a static analysis framework for smart contracts. In 2019 IEEE/ACM 2nd International Workshop on Emerging Trends in Software Engineering for Blockchain (WETSEB), pp. 8–15. Cited by: §2.3.
- (2020) Echidna: effective, usable, and fast fuzzing for smart contracts. In Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis, pp. 557–560. Cited by: §2.3.
- (2018) ContractFuzzer: fuzzing smart contracts for vulnerability detection. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, pp. 259–269. Cited by: §2.3.
- (2018) ZEUS: analyzing safety of smart contracts. In The Network and Distributed System Security Symposium. Cited by: §2.3.
- (2018) TeEther: gnawing at Ethereum to automatically exploit smart contracts. In 27th USENIX Security Symposium (USENIX Security), pp. 1317–1333. Cited by: §2.3.
- (2018) VulDeePecker: a deep learning-based system for vulnerability detection. In Proceedings 2018 Network and Distributed System Security Symposium, San Diego, CA. Cited by: §4.
- (2018) ReGuard: finding reentrancy bugs in smart contracts. In ACM/IEEE International Conference on Software Engineering, pp. 65–68. Cited by: §2.3.
- (2016) Making smart contracts smarter. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS '16), pp. 254–269. Cited by: §1, §2.3.
- (2018) Finding the greedy, prodigal, and suicidal contracts at scale. In Proceedings of the 34th Annual Computer Security Applications Conference, pp. 653–663. Cited by: §2.3.
- (2011) Scikit-learn: machine learning in Python. Journal of Machine Learning Research 12, pp. 2825–2830. Cited by: §3.2.2.
- (2015) Recognizing functions in binaries with neural networks. In 24th USENIX Security Symposium (USENIX Security 15), pp. 611–626. Cited by: §2.3.
- (2018) SmartCheck: static analysis of Ethereum smart contracts. In Proceedings of the 1st International Workshop on Emerging Trends in Software Engineering for Blockchain, pp. 9–16. Cited by: §2.3.
- (2018) Securify: practical security analysis of smart contracts. In ACM Conference on Computer and Communications Security, pp. 67–82. Cited by: §2.3.
- (2019) VULTRON: catching vulnerable smart contracts once and for all. In 2019 IEEE/ACM 41st International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER), pp. 1–4. Cited by: §7.
- (2020) Oracle-supported dynamic exploit generation for smart contracts. IEEE Transactions on Dependable and Secure Computing, pp. 1–1. Cited by: §2.3, §6.