On transaction parallelizability in Ethereum

by Nadi Sarrar, et al.

Ethereum clients execute transactions in a sequential order prescribed by the consensus protocol. This is a safe and conservative approach to blockchain transaction processing which forgoes running transactions in parallel even when doing so would be beneficial and safe, e.g., when there is no intersection in the sets of accounts that the transactions read or modify. In this work we study the degree of transaction parallelizability and present results from three different simulations using real Ethereum transaction data. Our simulations demonstrate that notable gains are achievable with parallelization, and suggest that the potential for parallelizability improves as transaction rates increase.



1 Methodology

We use a modified version of go-ethereum v1.8.19 (https://github.com/nadisarrar/go-ethereum/) which maintains, for each transaction, a record of all accounts relevant during its execution: the From account, the To account (except for contract-creation transactions), all accounts created directly or indirectly, and all accounts that the transaction otherwise interacts with through value transfers, method invocations, or any access to state such as balance and codeHash.
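The per-transaction record described above can be sketched as a small data structure. This is an illustrative shape only; the field names are ours, not go-ethereum's.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TxRecord:
    """Illustrative per-transaction access record (field names are ours)."""
    tx_hash: str
    sender: str                 # the From account
    recipient: Optional[str]    # None for contract-creation transactions
    gas_used: int
    touched: set = field(default_factory=set)  # all accounts read or written

    def conflicts_with(self, other: "TxRecord") -> bool:
        # Two transactions conflict iff their touched-account sets intersect.
        return bool(self.touched & other.touched)
```

Conflict detection over these sets is what all three simulators below rely on.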

We feed these transaction records into three different simulators: one based on transaction graph properties, which we call largest cluster, and two transaction schedulers, simple and HEFT. We rely on gasUsed as an estimator of transaction processing times. This is not always accurate, but it produces repeatable results that are independent of specific client implementations, hardware capabilities, network performance, etc.

1.1 Largest Cluster

Transactions are represented as vertices in a graph, with an edge between a pair if and only if they have at least one account in common in their access lists. The largest cluster metric is based on the largest (in total gasUsed) disjoint set of transactions in that graph. We calculate the total processing time as the time it takes to sequentially process that largest disjoint subset. (There may be transactions belonging to the same transitively connected group that can safely be processed in parallel; the largest cluster simulation therefore leaves room for optimization.) We assume that infinitely many threads are available, so that we can allocate a dedicated thread to each remaining subset and thereby ensure that the smaller sets finish no later than the largest one. This metric serves as a point of comparison for the other metrics, which are limited in the number of available threads.
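The largest cluster computation above amounts to finding connected components of the conflict graph and summing gasUsed per component. A minimal sketch using union-find (our implementation, not the paper's simulator code):

```python
from collections import defaultdict

def largest_cluster_time(txs):
    """txs: list of (tx_id, gas_used, set_of_touched_accounts).
    Returns the total gasUsed of the largest transitively connected
    cluster, i.e. the sequential cost assuming unlimited threads."""
    parent = {tx_id: tx_id for tx_id, _, _ in txs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Transactions sharing any account end up in the same cluster.
    first_seen = {}
    for tx_id, _, accounts in txs:
        for acct in accounts:
            if acct in first_seen:
                union(tx_id, first_seen[acct])
            else:
                first_seen[acct] = tx_id

    gas_per_cluster = defaultdict(int)
    for tx_id, gas, _ in txs:
        gas_per_cluster[find(tx_id)] += gas
    return max(gas_per_cluster.values())
```

With unlimited threads, every other cluster runs concurrently on its own thread, so this maximum is the simulated block processing time.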

1.2 Simple Scheduler

The simple scheduler processes transactions in batches whose size equals the number of threads. A batch may contain only non-conflicting transactions, so batches may hold fewer transactions than there are threads available, and a batch completes only when its longest-running transaction completes. For these reasons some threads may be left underutilized. Since the dataset contains access lists for each transaction, the scheduler can determine a schedule that is guaranteed not to conflict.
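The batching logic can be sketched as follows. We assume one plausible reading of the description: a batch is filled from the consensus-ordered stream and closes at the first transaction that conflicts with one already in it (or when all threads are occupied), which trivially preserves the consensus order.

```python
def simple_schedule(txs, n_threads):
    """txs: consensus-ordered list of (gas_used, set_of_touched_accounts).
    Returns total simulated time: each batch holds up to n_threads
    mutually non-conflicting transactions and costs its longest one."""
    total, i, n = 0, 0, len(txs)
    while i < n:
        batch_gas = []
        batch_accounts = set()
        while i < n and len(batch_gas) < n_threads:
            gas, accounts = txs[i]
            if accounts & batch_accounts:
                break  # conflict: close the batch, keep consensus order
            batch_gas.append(gas)
            batch_accounts |= accounts
            i += 1
        total += max(batch_gas)  # batch ends with its longest transaction
    return total
```

A conflicting transaction simply starts the next batch, which is one source of the thread underutilization noted above.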

1.3 HEFT Scheduler

HEFT [5] is a heuristic task scheduling algorithm that aims to minimize overall completion time. The implementation of HEFT used in this study is available at [6]. We configure HEFT to use the consensus ordering of transactions together with their access lists as precedence constraints, and the gasUsed metric as an approximation of processing time. Unfortunately, the Python implementation of HEFT used in our experiments failed for some large blocks. As a workaround, we compute schedules for at most 32 transactions at a time, which means that larger blocks require multiple runs of HEFT. We impose the same 32-transaction limit on the simple and largest cluster experiments as well, to ensure a fair comparison.

Figure 1: Parallelization gain as achieved by largest cluster, simple, and HEFT. (a) Comparison of largest cluster and simple. (b) Comparison with a limit of 32 transactions per schedule.

2 Results

In Figure 1(a) we compare the performance of the largest cluster approach to that of the simple scheduler. There are three main takeaways. First, the direction of the LOESS regressions (solid lines) suggests that the potential for parallelizability improves over time. Second, we observe declining performance in 2018, which appears to coincide with a decreasing transaction rate; we speculate that the more transactions there are, the better they can be parallelized, partly because a block containing few transactions leaves little room for parallelization, and partly because an increase in the number of popular accounts (smart contracts) and disjoint user groups generates distinct clusters of transactions. Third, while the simple scheduler performs worse than largest cluster, it still shows promising gains overall.

Figure 1(b) compares the HEFT scheduler to the other two simulations. These experiments use a cap of 32 transactions per schedule because of a limitation in the HEFT implementation (Sec. 1.3), which decreases the performance of largest cluster as well as simple. The overall trends, however, remain largely the same. HEFT outperforms simple, reaching the same levels as largest cluster.

In summary, while average thread utilization remains modest compared to the number of available threads, all three simulations demonstrate that notable gains are achievable with parallelization.

3 Future work

This study assumes that we have future knowledge of the accounts that a transaction touches. While we may eventually be able to use access lists [2] in an upcoming version of Ethereum for that purpose, further research on scheduling algorithms which relax this assumption will be needed.

In the case of HEFT, we further assume that gasUsed serves as an approximation of a transaction’s processing time. This is not feasible in practice because gasUsed only becomes available after a transaction has run, as part of the transaction receipt. Future research may investigate the (un)predictability of a transaction’s processing time.

This work makes no claims as to where resource bottlenecks in Ethereum currently lie. We hope our work complements studies investigating constrained resources such as networking and disk I/O, as well as work on alternative concurrent state database data structures.

Finally, we hope this research direction will prove useful in the current phases of Ethereum Serenity development, where accounts are assigned to shards (groups), and cross-shard communication (messages between groups) incurs some overhead. Intuitively, greater parallelizability may suggest less need for cross-shard communication, depending on our ability to correctly identify the clusters in real time as accounts are created and assigned to shards, and on the rate at which those assignments change.