The scalability of public blockchains is extremely important for their success. All communities and industries engaged in the development of public blockchains strive to create solutions that support a large number of nodes and/or large workloads (in terms of transactions per second). While distributed systems or peer-to-peer systems can scale very well with respect to these parameters, blockchains have so far represented a very hard challenge. The difficulty comes from the interplay of contrasting requirements in blockchain design. Vitalik Buterin, founder of the Ethereum project, summarized his understanding of this problem by introducing the so-called blockchain scalability trilemma. This trilemma states that, regarding scalability, security, and decentralization, any improvement in one of these aspects negatively impacts at least one of the other two.
Most of the scientific literature focuses on the scalability of the consensus algorithm without taking into account the impact of the other parts of the system. Consider an ideal, fully decentralized blockchain in which all nodes are equal (CPU, bandwidth, storage, roles, etc.). Suppose we double the number of its users. Intuitively, we may assume that the load and the number of nodes of the network are also doubled. Hence, the ratio between load and processing resources is constant and scalability should, in principle, be possible. Now consider current blockchains. The usual approach is to broadcast all pending transactions and accepted blocks to all nodes. In this case, each node becomes a scalability bottleneck due to its bounded processing power and bandwidth. The obvious question is whether it is possible to design a blockchain in which all activities (consensus, storage, and communication) are shared (and not just replicated) across all nodes while keeping security and decentralization intact.
Consider, for example, common committee-based blockchains (e.g., Algorand  or EOS). A single committee (and each single member of that committee) processes all the transactions to be included in the next block. Then the block is broadcast and all nodes have to process all accepted transactions (at least, they have to store them). Several scientific works try to provide better tradeoffs by means of some form of sharding. In these works, the blockchain network (comprising nodes, blocks, consensus, and transaction history) is partitioned into several smaller networks (shards), with some form of coupling among them. In these solutions, the hardest problems are inter-shard transactions, which impair scalability if they are many, and the limited size of shards, which impairs security.
The main contribution of this paper is a novel blockchain architecture that does not rely on sharding and achieves scalability without giving up on decentralization and security. In our solution, transactions are processed in a pipeline. Each stage of the pipeline is performed by several committees, in parallel. We assume a solution, like that described in , in which storage is shared across nodes. In our approach, each block contains only the hash of the current state and of the transactions. Hence, we only broadcast a constant amount of data for each block. Pending transactions are not broadcast either.
The rest of the paper is structured as follows. In Section II, we quickly review some related literature. In Section III, we provide basic terminology, definitions, and formally state our problem. In Section IV, we focus on scalability problems of current solutions. In Section V, we first present the main ideas of our solution and then we detail the tasks performed by committees. In Section VI, we formally prove correctness. In Section VII, we formally prove scalability. In Section VIII, we discuss the effectiveness of our approach and several other aspects. In Section IX, we draw the conclusions and discuss some related open problems.
II State of the Art
Concerning scalability, the most remarkable results rely on some form of sharding, but all of them suffer, in different ways, from the strong partitioning of the whole blockchain. RapidChain  requires strong synchronous communication among shards, which is hard to achieve. AHL  is an approach that turns out to be expensive when a transaction involves multiple shards. Omniledger  requires every participant in the consensus protocol (a subset of the nodes in each shard) to broadcast a message to the entire network to verify transactions. It also requires users to participate actively in the verification of cross-shard transactions. Elastico  and Monoxide  require the execution of expensive proof-of-work to decide which shard should process a transaction.
Many other proposals target the scalability of public blockchains, like the EOS network that is analyzed in  or Algorand . However, they restrict their focus to the scalability of the consensus algorithm.
A fundamental role in blockchain scalability is played by Authenticated Data Structures (ADSes) and related techniques, since they enable architectures in which nodes only have explicit knowledge of selected data while preserving security on the whole blockchain state. The works  and  propose to use authenticated data structures to scale with respect to the size of the blockchain state. The work in  shows a methodology for scaling the usage of authenticated data structures that makes use of a pipeline. A way of having a persistent ADS is described in . Uses of ADSes for efficient verification of data integrity in critical environments are described in [10, 14, 9].
III Basic Definitions and Problem Statement
In the following, we introduce some definitions and assumptions that are used in the rest of the paper.
We call candidate transactions those that are generated by users but are not (yet) processed by the blockchain. A confirmed transaction is a transaction that was successfully processed by the blockchain. The history of the blockchain is a totally ordered sequence of confirmed transactions. For simplicity, in the rest of the paper, we mostly focus on a simple model of blockchain that realizes a pure cryptocurrency. In other words, each address (or account) is associated with a wallet with a non-negative balance. A transaction just moves currency from one wallet to another, changing balances accordingly. While the results of this paper may be applicable to a more powerful model, we do not consider more complex cases. The state of the blockchain is the set of balances of all addresses at a certain instant. Since confirmed transactions are totally ordered, the state of the blockchain is defined between two consecutive transactions. The sequence of confirmed transactions is partitioned into blocks that are sequentially numbered. Transactions are confirmed when a new block is created (or mined). Mining a new block requires (1) selecting a set of candidate transactions, (2) ordering them, and (3) verifying that certain consensus rules are fulfilled when each transaction is applied to the previous state. In practice, consensus rules may be complex but, in our simplified model, they boil down to keeping balances non-negative.
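The simplified cryptocurrency model can be illustrated with a minimal sketch. The names `Transaction` and `apply_block` are hypothetical and do not come from the paper, and rejecting non-positive amounts is an added assumption. The function applies an already ordered list of candidate transactions to a state and confirms only those that keep the source balance non-negative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    source: str
    destination: str
    amount: int


def apply_block(state, ordered_transactions):
    """Apply candidate transactions in order and return (new_state, confirmed).

    `state` maps each address to its balance. A transaction is confirmed only
    if the source balance stays non-negative after it is applied.
    """
    new_state = dict(state)  # the previous state is left untouched
    confirmed = []
    for tx in ordered_transactions:
        if tx.amount <= 0:
            continue  # assumption of this sketch: only positive amounts are valid
        if new_state.get(tx.source, 0) < tx.amount:
            continue  # would violate the non-negative balance rule
        new_state[tx.source] = new_state.get(tx.source, 0) - tx.amount
        new_state[tx.destination] = new_state.get(tx.destination, 0) + tx.amount
        confirmed.append(tx)
    return new_state, confirmed
```

In this sketch the order of the candidate transactions matters: a transaction that is rejected at one position in the sequence might be accepted at a later one, after its source has received funds.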
The nodes of a blockchain network act as peers in the sense that they perform the same actions. They broadcast candidate transactions. Each node keeps the set of candidate transactions it knows (also called pending transactions). Nodes communicate to perform a consensus algorithm (or consensus protocol) to reach consensus on the transactions to be included in the next block. This also implies a consensus on the state after the block (i.e., after the last transaction of the block).
The stream of candidate transactions is the workload (or simply load) of the blockchain and its magnitude is a frequency measured in transactions per second. In the following, we assume that accounts whose balance is changed by candidate transactions are homogeneously distributed on the address space. The time a candidate transaction takes to be confirmed is called (confirmation) latency. The maximum throughput of the blockchain is the frequency of candidate transactions that it is possible to confirm with bounded latency (i.e., avoiding indefinite growth of the set of pending transactions). When the workload is less than the maximum throughput, we say that the blockchain is well-provisioned.
Since we are interested in investigating the scalability of blockchains, we need tools to compare situations in which the same blockchain varies in the number of nodes and in the workload. We call proportional increment the situation in which the load and the number of nodes increase proportionally. In this paper, we consider all nodes to always have the same constant amount of resources in terms of CPU, storage, and network bandwidth. For simplicity, we also consider all communications between nodes to be instantaneous.
Intuitively, each new node that joins the blockchain network provides both additional load and additional resources. A scalable blockchain architecture is able to take advantage of these new resources to process the additional load.
More formally, we say that a blockchain architecture scales when, starting from a well-provisioned blockchain, increasing load and nodes under the proportionality condition keeps the blockchain well-provisioned.
IV Problems of Current Approaches
In this section, we list some relevant aspects that make the architecture of current common public blockchains not scalable (according to the scalability definition given in Section III).
In this paper we focus on the following problems.
1. New candidate transactions are always broadcast to all (validating/mining) nodes.
2. There exists a set of (validating/mining) nodes (possibly comprising all nodes), each processing all candidate transactions that have to be included in the next block.
3. Each new block is broadcast to all nodes.
Each of these aspects implies that, to keep the blockchain well-provisioned, individual nodes have to increase their computing power and bandwidth even under the proportionality condition.
We purposely avoid mentioning scalability problems related to the computational complexity of the consensus protocol, since these three aspects are independent of it and are relevant even for blockchains that adopt light consensus protocols. We also avoid mentioning the problem of storing the whole blockchain state in each node, which is already addressed in other works [1, 12].
In the literature, most of the proposals that address the above scalability problems introduce some form of sharding, which is a way to partition the blockchain network and the transactions, in effect creating a sort of federation of multiple blockchain networks. The sharding technique suffers from a number of problems. The hardest ones derive from the fact that the state is partitioned across shards. Hence, if shards are many, most transactions turn out to change the state of more than one shard. These are called inter-shard transactions. Clearly, each transaction should be atomic, which is not so simple to achieve in a sharded environment. Typically, this introduces inefficiencies related to inter-shard communication (usually performed using some form of broadcast) and the need for techniques similar to a two-phase commit to ensure that all transaction executions are atomic. Another strong criticism is about security, since smaller shards are supposed to allow for better scalability but are deemed to be less secure than larger ones.
The contribution of this paper is the description of an architecture that aims at addressing the three above-mentioned scalability problems. We do this without relying on sharding, in the sense that, in our case, there is a single blockchain. However, we introduce a way of dynamically sharing the load among nodes. Our solution intends to apply a parallel version of the Algorand consensus approach . We also leverage the idea of distributing the storage of the state, as described in .
V A Scalable Blockchain Architecture
In this section, we describe an architecture that achieves scalability, as defined in Section III. We first informally describe ideas that make scalability possible in our architecture, then we list in detail all the components of the architecture and their behavior.
V-A Main Ideas
In our approach, there are a number of committees that work at the same time. They collectively perform the computation needed to validate and confirm transactions and to compute the new block. For simplicity, we assume that all committees are equally sized. Their size is fixed and does not change when the number of nodes in the network changes. Committee members may be fixed or may change periodically to increase security. A discussion about the impact of periodically changing the committees is provided in Section VIII.
Differently from the common approach, in our architecture, a block conceptually aggregates transactions and changes to the state, but this content is never explicitly represented in the block. In fact, the size of the transactions and state changes related to a block is proportional to the workload. Forcing a node to receive all of them would impair scalability. Instead, for each new block, we only broadcast constant-size data. We call block this constant-size data. Our block can be considered equivalent in content to the block header of traditional approaches. For our theoretical analysis, it is only relevant to know that the block contains a hash of the blockchain state after the application of all transactions of the block. In principle, it may also contain a hash of the transactions of the block. However, since this is not important in our analysis, we ignore it. The state hash is computed on the basis of a Merkle tree, hence we call it state root-hash. In the following, we explain how the computation of the state root-hash is shared across several committees.
In our description, we focus on how the state of the blockchain is stored. While storing transaction history is also possible, this is not important for our analysis. For scalability reasons, it is not possible for all nodes to store all the state. This is not because of the size of the needed storage (which we are not considering in our analysis), but because processing updates to the whole state would require an amount of resources proportional to the workload. Bernardini et al.  and Vault  propose approaches that do not require all nodes to store the whole blockchain state and in which a node may independently participate in storage and/or confirmation activities. In the following, we refer to a node that stores a part of the state as a storage node. The cited works consider a (complete and binary) Merkle tree over the whole address space, in which each leaf is an address (including unused addresses). We denote this Merkle tree by . Storage nodes store only a part of the state and the corresponding part of , that covers all the paths from stored addresses to the root, pruning the rest of (see details in ). The root-hash of is the state root-hash.
As in  and , a node that intends to create a transaction has the responsibility to provide cryptographic proofs of the balances of the accounts that are going to be changed by the transaction (i.e., the involved accounts). The node requests these cryptographic proofs from storage nodes. Since each storage node stores a pruned Merkle tree, it is able to provide such proofs. However, as will be clear in the following, the balances provided by storage nodes are related to a state that is delayed by a few blocks. The proof obtained from a storage node is related to a state of the blockchain after a certain block , meaning that it is valid with respect to the state root-hash in . We also simply write that is related to . Since nodes store only a truncated list of blocks (see below), proofs that are too old cannot be validated and we say that they are expired.
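To make the notion of a balance proof concrete, the following is a minimal sketch of Merkle proof generation and verification. It assumes a small, dense, complete binary tree with SHA-256 and ad hoc `leaf:`/`node:` prefixes, whereas the cited works use a pruned tree over the whole address space. All function names are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(balance: int) -> bytes:
    # The encoding of a leaf is an assumption of this sketch.
    return h(b"leaf:" + balance.to_bytes(8, "big"))


def node_hash(left: bytes, right: bytes) -> bytes:
    return h(b"node:" + left + right)


def build_levels(balances):
    """Return all tree levels, leaves first; len(balances) must be a power of two."""
    level = [leaf_hash(b) for b in balances]
    levels = [level]
    while len(level) > 1:
        level = [node_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def make_proof(levels, address):
    """Collect the sibling hashes on the path from the leaf `address` to the root."""
    proof = []
    index = address
    for level in levels[:-1]:
        proof.append(level[index ^ 1])  # sibling of the current node
        index //= 2
    return proof


def verify_proof(root_hash, address, balance, proof):
    """Recompute the root from a claimed balance and compare with the block's root-hash."""
    current = leaf_hash(balance)
    index = address
    for sibling in proof:
        if index % 2 == 0:
            current = node_hash(current, sibling)
        else:
            current = node_hash(sibling, current)
        index //= 2
    return current == root_hash
```

A verifier only needs the state root-hash published in a block and a number of hashes logarithmic in the size of the address space, which is what allows nodes to check balances without storing the state.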
Consider the computation needed to validate and confirm transactions and then to compute the new state root-hash to be included in a new block. The effort needed for this task is clearly proportional to the workload. To scale, it is important to distribute this computation over several committees. For this reason, we introduce a pipeline in which the computation is performed in several stages. Each stage is performed by one or more committees. In each stage the computation can be further distributed across several committees with a parallel approach.
We suppose that time is divided into equal-length rounds. Rounds are sequentially numbered. In each round, each committee performs its task for a certain pipeline stage. The result of the computation is communicated by the committee to the committees that need it for the next pipeline stage in the next round. A discussion on the problems of this communication is provided in Section VIII. For simplicity, we assume that all communications are instantaneous.
We denote by the block produced as output of the last stage in round . We denote the block that contains the transactions that entered the pipeline at round by . If the pipeline has stages, the transactions that enter the pipeline at round , and that are accepted, will be part of the block produced as output by the last stage that runs at round . Hence, we have that . The first round in which can be used by any node is round .
Each produced block is propagated to all nodes. Nodes use the state root-hash of the block to validate balance proofs related to that block. Each storage node replies only with proofs related to already produced blocks. For this reason, the proofs that are associated with candidate transactions are old by at least rounds when they are processed by the first stage where they are checked for validity.
Truncated block history
As in , we assume each node does not keep all the blocks, but only the last blocks received. That is, at round , each node stores blocks and previous blocks are forgotten. A proof related to is expired at round if (i.e., nodes have forgotten needed to validate ). Since the block is available in round , a storage node replies with proofs related to only during that round. For the nodes participating in committees of the first pipeline stage to be able to verify balance proofs, it should be .
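The truncated block history can be pictured as a bounded window of state root-hashes, as in the sketch below. The class `BlockWindow` and its `capacity` parameter are hypothetical names; the exact expiry condition of the paper is not reproduced here, and the sketch simply treats a proof as expired when its block is older than the oldest block still retained.

```python
from collections import OrderedDict


class BlockWindow:
    """Keeps only the most recent `capacity` blocks (number -> state root-hash)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._roots = OrderedDict()

    def add_block(self, number, state_root_hash):
        """Store a newly received block and forget the oldest ones if needed."""
        self._roots[number] = state_root_hash
        while len(self._roots) > self.capacity:
            self._roots.popitem(last=False)  # drop the oldest block

    def root_for(self, number):
        """Return the root-hash of a retained block, or None if it is not stored."""
        return self._roots.get(number)

    def is_expired(self, proof_block_number):
        """A proof is expired if it refers to a block that has been forgotten."""
        if not self._roots:
            return False
        oldest_retained = next(iter(self._roots))
        return proof_block_number < oldest_retained
```

For example, with a capacity of three blocks and blocks 1 to 5 received, a proof related to block 2 can no longer be validated, while a proof related to block 3 still can.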
V-B The architecture
In Figure 1, we show the proposed architecture and the flow of information within it. We describe it from left to right.
Any node can create a candidate transaction. As described above, a new candidate transaction should come with the balances of the involved accounts and with the corresponding proofs of integrity, related to a previous round. These can be obtained from a storage node. Candidate transactions are not broadcast into the network (nothing is broadcast in our approach but the constant-size blocks), but are sent to a limited number of nodes as described below.
The validation of the set of transactions that have to be included in a block is performed by Confirmation Committees (CC). We denote each distinct CC by with , where is the number of CCes. When relevant, we write intending to denote the -th confirmation committee that runs in the -th round. The node that creates a new transaction sends it to , with , where is the account whose balance is charged by . We assume that is received by before the start of round and hence can process it during round . We say that is responsible for that transaction. The set of candidate transactions for which is responsible is denoted . We denote by the set of candidate transactions processed by any confirmation committee in round . The result provided by is a sequence of transactions denoted , with .
A fundamental aspect of the algorithm performed by is to obtain, for each transaction, the status of the source balance at round to compute the new balances at round with the transaction accepted for . Since proofs attached to transactions are related to rounds from to , they always have from transactions balances that are verified but old. In fact they can be outdated by transactions accepted in the last rounds for which the corresponding block is not yet available. Hence, each should also be aware of state changes induced by transactions accepted by , which are , respectively. These transactions are considered to update all balances involved in to their status at round . We call time-updating this process.
In our model, performs the following algorithm (by a suitable consensus protocol).
Algorithm 1 (Confirmation).
It checks that each transaction in fulfills syntactic rules and proofs are not expired. It discards non-compliant transactions.
It selects an arbitrary order for .
Let be the concatenation of . For each account that appears as source in transactions of , consider the last balance from and from the balances provided by .
It executes and checks that the resulting balance of each transaction fulfills the non-negative balance rule. Transactions that do not fulfill this rule are discarded. The resulting is derived from where discarded transactions are omitted.
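The four steps above can be sketched as follows for a single committee. This is only an illustration under stated assumptions: `Candidate`, `time_update`, `confirm`, `covers_round`, and `oldest_valid_round` are hypothetical names, the consensus protocol among committee members is not modeled, and each proof is assumed to reflect all transactions accepted up to and including `covers_round`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    source: int
    destination: int
    amount: int
    proven_balance: int  # source balance certified by the attached proof
    covers_round: int    # assumption: last round already reflected by the proof


def time_update(candidate, accepted_by_round, current_round):
    """Bring the proven source balance up to date with recently accepted transactions."""
    balance = candidate.proven_balance
    for r in range(candidate.covers_round + 1, current_round):
        for tx in accepted_by_round.get(r, []):
            if tx.source == candidate.source:
                balance -= tx.amount
            if tx.destination == candidate.source:
                balance += tx.amount
    return balance


def confirm(candidates, accepted_by_round, current_round, oldest_valid_round):
    # Step 1: discard malformed transactions and transactions with expired proofs.
    valid = [c for c in candidates
             if c.amount > 0 and c.covers_round >= oldest_valid_round]
    # Step 2: any order is allowed; sorting just makes the sketch deterministic.
    ordered = sorted(valid, key=lambda c: (c.source, c.destination, c.amount))
    # Step 3: for each source, start from the most recent proof and time-update it.
    freshest = {}
    for c in ordered:
        best = freshest.get(c.source)
        if best is None or c.covers_round > best.covers_round:
            freshest[c.source] = c
    balances = {s: time_update(c, accepted_by_round, current_round)
                for s, c in freshest.items()}
    # Step 4: execute in order and keep only transactions with non-negative results.
    accepted = []
    for c in ordered:
        if balances[c.source] >= c.amount:
            balances[c.source] -= c.amount
            if c.destination in balances:
                balances[c.destination] += c.amount
            accepted.append(c)
    return accepted
```

Credits are applied only to destinations whose balance the committee already tracks; ignoring the other credits is conservative, since it can only lead to rejecting a transaction, never to accepting one that overdraws an account.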
Transactions in should be considered confirmed (or accepted) in the sense they have passed all checks to be inserted in . To allow the confirmation committees of subsequent rounds to perform time-updating, is made available to and also to other committees, as explained in the following.
The sequence of accepted transactions for that round is denoted , where is an arbitrary sequence that respects the order of each .
Even if is yet to be computed, storage nodes can receive from state changes that will be part of . Transactions in are selectively sent to the storage nodes that need it, to update the part of the state they manage.
The actual computation of requires computing the state root-hash at round , which means computing all the hashes of the conceptual Merkle tree related to the whole state space. This is performed by committees, called Root-hash Pipeline Committees (RPCes). Each RPC is associated with a part of as shown in Figure 2, called underlying tree of the RPC. Each underlying tree is rooted at a node whose hash is named sub-root-hash. RPCes themselves form a tree denoted by . Upon state changes, each RPC is responsible for computing all hashes of its underlying tree. There are two kinds of RPCes: leaf RPCes, which are the leaves of , and inner RPCes, which are all other RPCes. Each leaf RPC is responsible for the interval of contiguous addresses that are leaves of its underlying tree. Since most addresses are unused, leaf RPCes consider a pruned version of the underlying tree containing only the paths from its root to used addresses. The underlying trees of inner RPCes are complete binary trees. RPCes are partitioned into levels numbered from to . Level contains all leaf RPCes. Level contains only the root of . Each level constitutes one stage of the pipeline. Hence the total number of stages of the pipeline is . Each RPC at level computes a sub-root-hash that is fed as input to its parent in . The root of outputs and broadcasts the new block with the corresponding state root-hash. Theorem 2 of Section VII states that it is possible to dimension the underlying tree of leaf and inner RPCes to ensure scalability.
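The following sketch illustrates why the root-hash computation can be split across levels of committees: the root computed stage by stage from sub-root-hashes equals the root of the whole Merkle tree. It assumes a dense tree whose sizes are powers of two (no pruning), and the names `pipelined_root`, `leaves_per_rpc`, and `fan_in` are hypothetical.

```python
import hashlib


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(left + right).digest()


def merkle_root(hashes):
    """Root of a complete binary Merkle tree; len(hashes) must be a power of two."""
    level = list(hashes)
    while len(level) > 1:
        level = [node_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def pipelined_root(leaf_hashes, leaves_per_rpc, fan_in):
    """Compute the root in stages; returns (root_hash, number_of_rpc_stages).

    `leaves_per_rpc` and `fan_in` must be powers of two, with fan_in >= 2.
    """
    # Leaf RPC stage: each leaf RPC hashes its own interval of addresses.
    level = [merkle_root(leaf_hashes[i:i + leaves_per_rpc])
             for i in range(0, len(leaf_hashes), leaves_per_rpc)]
    stages = 1
    # Inner RPC stages: each inner RPC combines the sub-root-hashes of its children.
    while len(level) > 1:
        level = [merkle_root(level[i:i + fan_in])
                 for i in range(0, len(level), fan_in)]
        stages += 1
    return level[0], stages
```

In this sketch each list comprehension corresponds to one pipeline stage, and every element of the list could be computed by a different committee in parallel. The stage count returned here covers only the RPC levels, not the confirmation stage that precedes them.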
We write to denote a generic leaf RPC that runs in the -th round. As mentioned above, the leaves of represent the second stage of our pipeline. This means that they receive the output of the first stage. Let be the current round, they receive from the CCes. In particular, each receives all and only the transactions in that modify the balance of an address for which it is responsible. Hence, a single accepted transaction is sent to at most two leaf RPCes. In more detail, if is a transaction in , sends to only if the source or the destination of is an address for which is responsible. We denote by the set of transactions received by from CCes at round .
A fundamental aspect of the task performed by (i.e., calculating the sub-root-hash of its underlying tree) is to obtain the status of the underlying tree at round to compute the sub-root-hash at round with the transactions in . Since is a subset of , the proofs attached to its transactions are related to rounds from to and cannot be used alone to compute all hashes of the underlying tree at round . In fact, they can be outdated by transactions accepted in the last rounds for which the corresponding block is not yet available. Hence, each should also be aware of state changes induced by transactions received by (all responsible for the same address interval), that is , respectively. The proofs of these transactions are considered, according to their order, to calculate all hashes of the pruned underlying tree related to the state at round . We call time-shifting this process. To allow the leaf RPCes of subsequent rounds responsible for the same address interval to perform time-shifting, is made available to (all responsible for the same address interval). Since is a subset of , in our protocol the CCes that execute at round make available to .
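The selective delivery of accepted transactions to leaf RPCes can be sketched as follows. The sketch assumes that each leaf RPC is responsible for a contiguous interval of `interval_size` integer addresses; this parameter and the function names are hypothetical.

```python
def responsible_leaf(address, interval_size):
    """Index of the leaf RPC responsible for `address` (contiguous intervals)."""
    return address // interval_size


def route_to_leaf_rpcs(accepted, interval_size):
    """Group accepted transactions by the leaf RPCes that must receive them.

    `accepted` is an ordered list of (source, destination, amount) tuples.
    The relative order of transactions is preserved in each inbox.
    """
    inbox = {}
    for tx in accepted:
        source, destination, _amount = tx
        # One target if both addresses fall in the same interval, two otherwise.
        targets = {responsible_leaf(source, interval_size),
                   responsible_leaf(destination, interval_size)}
        for leaf in sorted(targets):
            inbox.setdefault(leaf, []).append(tx)
    return inbox
```

Each transaction reaches one or two leaf RPCes, regardless of how many leaf RPCes exist, so the number of messages per transaction does not grow with the size of the network.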
In this section, we formally prove the correctness of the architecture introduced in Section V.
The following lemma states the correctness of Algorithm 1 when run on only one committee.
Lemma 1 (Correctness of the confirmation algorithm).
Algorithm 1 never returns a sequence that entails a violation of the non-negative balance rule.
Theorem 1 (Correctness).
Given a set of transactions processed, at round , by confirmation committees producing accepted transactions sequences , the following statements are true.
In any sequence such that respects the order of each , the non-negative balance rule is respected.
The state root-hash that is the output of the last stage of the pipeline at round is the root-hash of the new state after the application of .
Storage nodes know the proofs of the addresses they store.
Concerning Statement 1, observe that satisfy the non-negative balance rule (Lemma 1) and their order is preserved by hypothesis in . Since for each the addresses charged in are not charged in any with , the statement follows.
Concerning Statement 2, note that each leaf RPC considers all the transactions that involve addresses for which is responsible that are present in sequences , respecting their order. RPC can correctly compute its sub-root-hash to pass to its parent RPC. In fact, if an internal node of its underlying tree is involved in a transaction, receives the proofs attached to the transaction. If an internal node of its underlying tree is not involved in a transaction, either it is pruned or it is the root of a pruned tree. In the first case, does not need it. In the latter case, receives its hash in one of the proofs available to it. Since internal RPCes always receive, from their children, the hash values for all the leaves of their underlying tree and their computation is trivial, the statement follows.
Concerning Statement 3, note that RPCes compute the root-hash on the basis of a pruned version of , where leaves of are all used addresses . Each storage node stores a pruned version of , where leaves of are all addresses that intends to store. Since , also . Hence, all sub-root-hash of pruned subtrees in are known to one of the RPCes, which can communicate it to . ∎
In this section, we formally prove the scalability of our approach. For real systems, the workload is usually characterized by probability properties. For simplicity, in our statements and proofs, we reason as if the workload were deterministic.
We start by introducing some notation. We denote by the frequency of transactions of our workload. We denote by the duration of a round. We denote by the number of addresses whose balance changes in each round. We denote by the maximum number of balance changes that a leaf RPC can process in one round. As explained in Section V-B, the underlying tree of an inner RPC is a complete binary tree. Suppose that an RPC can compute hashes in one round. The maximum number of nodes in the underlying tree of an inner RPC is , where is the largest possible integer such that . We can now state the following lemma about the number of RPCes.
The total number of RPCes in the architecture described in Section V is , where the first term is related to leaf RPCes and the second term is related to inner RPCes.
The number of leaf RPCes is , which is the first term of the formula. Now, consider the Merkle tree , defined in Section V, and remove all the underlying trees of leaf RPCes but their roots, call this tree. Note that is complete and binary with leaves. A complete binary tree with leaves has nodes, of which are non-leaf nodes. Hence, the number of inner nodes in is . Therefore the number of inner RPCes is , which is the second term of the formula, hence the statement holds. ∎
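The counting argument of the proof can be sketched numerically. Since the symbols of the lemma are not reproduced here, the sketch uses its own hypothetical names and encodes one reading of the proof: the number of leaf RPCes is the number of balance changes per round divided by the per-RPC capacity (rounded up), a full binary tree with l leaves has l - 1 inner nodes, and each inner RPC covers a complete binary tree of at most 2^d - 1 nodes.

```python
def inner_rpc_capacity(hashes_per_round):
    """Largest 2**d - 1 that does not exceed `hashes_per_round` (which must be >= 1)."""
    d = (hashes_per_round + 1).bit_length() - 1  # largest d with 2**d <= h + 1
    return 2 ** d - 1


def ceil_div(a, b):
    return -(-a // b)


def rpc_count(changes_per_round, leaf_capacity, hashes_per_round):
    """Return (leaf_rpcs, inner_rpcs) under the assumptions stated above."""
    leaf_rpcs = ceil_div(changes_per_round, leaf_capacity)
    # A full binary tree with `leaf_rpcs` leaves has `leaf_rpcs - 1` inner nodes.
    inner_nodes = max(leaf_rpcs - 1, 0)
    inner_rpcs = ceil_div(inner_nodes, inner_rpc_capacity(hashes_per_round))
    return leaf_rpcs, inner_rpcs
```

For instance, with 1000 balance changes per round, a leaf capacity of 10, and 7 hashes per round, the sketch yields 100 leaf RPCes and 15 inner RPCes; doubling the number of balance changes yields 200 and 29, that is, roughly twice as many committees.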
The following theorem states the scalability of the approach described in Section V, when nodes and workload are proportionally incremented.
Theorem 2 (Scalability).
Consider a system , with nodes, realized as described in Section V, and well-provisioned for a workload at frequency , under the assumption that the addresses involved in the transactions of its workload are homogeneously distributed across the address space.
It is possible to provide a system , with nodes and , that is well-provisioned for a workload at frequency , under the same assumptions for the workload.
In and , committees have the same processing capabilities that are enough to process the load of . We want to prove that it is possible to have a number of committees in to process its load. We first derive the needed number of CCes in compared to that of and then we do the same for RPCes. Then, we show that they are compatible with the increment of the nodes.
A workload at frequency , generates transactions per round. Let be the number of CCes in . Since is well-provisioned, each CC is able to process transactions per round. The load of is , hence, with CCes, we obtain in the same CC load as in . Note that, the fact that each have to re-process transactions accepted in a constant number of previous rounds does not impact this reasoning.
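The argument about confirmation committees is simple arithmetic, illustrated below with hypothetical names (`per_cc_load`, `scaled_cc_count`, `factor`): if both the workload frequency and the number of CCes grow by the same factor, the number of transactions each CC handles per round is unchanged.

```python
from fractions import Fraction


def per_cc_load(frequency, round_duration, cc_count):
    """Transactions that each CC must process per round, as an exact fraction."""
    return Fraction(frequency) * Fraction(round_duration) / cc_count


def scaled_cc_count(cc_count, factor):
    """Number of CCes after a proportional increment by `factor`."""
    return cc_count * factor
```

For example, 1000 transactions per second, 2-second rounds, and 8 CCes give 250 transactions per CC per round; tripling the frequency and the number of CCes gives the same value.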
In and , for each round, each leaf RPC can process balance changes and each inner RPC has an underlying tree of as defined before. Since is well-provisioned, by Lemma 2, the number of RPCes in is , where . Analogously, for to be well-provisioned, the number of its RPCes must be . In both and the first term (related to leaf RPCes) is largely dominating, since is supposed to be large. Hence, . Consider also that the effects of the ceiling function can be compensated by a small overprovisioning of .
Since the increment of the number of CCes and of the number of RPCes from to is by a factor of , and has nodes, the statement holds. ∎
In this section, we discuss the effectiveness of our approach and certain aspects that are not analyzed in the rest of the paper.
Effectiveness with respect to the blockchain trilemma
It is important to understand if the proposed approach is a better solution to the blockchain trilemma than the previous ones. This means understanding if scaling requires to limit security and/or decentralization. The scalability of our appraoch has been formally proved in Section VII. Concerning decentralization, note that in our system all nodes cooperate in the creation of a new block. Further, even if the committees do not have all the same role, we can assign nodes randomly to each of them (like, for example, in ), making their role homogeneous when considered over a period time. While this is an informal consideration, we believe that our approach does not affect decentralization when scaling. Considering security, note that many other research works and practical systems relay on the security of a consensus algorithm run by a restricted set of nodes forming a committee. The fact that in our case we have several committees, does not impact security, as far as consensus algorithm is highly robust. Further, many attacks require the attacker to control the majority of a committee. Random selection makes this more and more difficult as the number of nodes increases. In this sense, security increases when scaling to higher number of nodes. Clearly, security is about many other aspects, but the difficulty to subvert the consensus is usually considered in the context of discussions about the blockchain trilemma.
Committee member selection
Our approach is independent of the way members of each committee are selected. Members can be static or change regularly. Their selection can be done using a public shared source of randomness or using verifiable random functions, as in . However, there is a caveat regarding this in our approach. Since intermediate results of the pipeline are passed to the committees that need them in the next few rounds, if the members of committees change, the new members have to be decided and published before data is sent to them. Note that resorting to broadcast is not possible, since this would impair scalability.
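The caveat above can be made concrete with a small sketch. It assumes, purely for illustration, that a public seed is known to all nodes and that committee membership for a future round is derived deterministically from that seed and the round number, so that a sender can compute its destinations in advance without any broadcast. The function `committees_for_round` and its parameters are hypothetical, and Python's `random.Random` is not a cryptographically secure generator; a real design would rely on a proper randomness beacon or on verifiable random functions.

```python
import hashlib
import random


def committees_for_round(node_ids, num_committees, public_seed: bytes, round_no: int):
    """Deterministically partition node_ids into num_committees committees
    for round round_no, using only publicly known inputs.

    Illustrative only: random.Random is NOT cryptographically secure.
    """
    if num_committees <= 0:
        raise ValueError("num_committees must be positive")
    # Derive a per-round seed from the public seed and the round number.
    digest = hashlib.sha256(public_seed + round_no.to_bytes(8, "big")).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    # Start from a canonical order so that every node shuffles the same list.
    shuffled = sorted(node_ids)
    rng.shuffle(shuffled)
    # Deal nodes round-robin into the committees.
    return [shuffled[i::num_committees] for i in range(num_committees)]


if __name__ == "__main__":
    nodes = range(12)
    current_round = 7
    lookahead = 3  # hypothetical: results are needed three rounds later
    future = committees_for_round(nodes, 3, b"public-seed", current_round + lookahead)
    print(future)
```

Any node evaluating this function with the same inputs obtains the same partition, which is the property needed for destinations to be known before data is sent.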
In our approach, communications often have a multiplicity of nodes as destinations of a message. At the same time, we need to avoid broadcast in order to scale with respect to the network resources of the nodes, which we assumed to be constant. We note that the destinations of a message are always bounded in number; hence, scalability can theoretically be obtained by resorting to unicast communications, provided that the set of destinations is known when the message is to be sent. However, most of the communication needs in our approach seem to be suitable for multicast delivery. Still, standard multicast techniques may not completely suit our needs. In particular, the following aspects are worth noting. 1. The group of receivers of a multicast channel might change rapidly, depending on the round duration. 2. Multicast groups can be prepared in advance with respect to when they are needed. 3. Multicast groups may be needed for only one round and then discarded. This might simplify the development of a specific technique for this application. 4. The number of needed multicast channels might be huge, for example one for each used address. A complete list of the features required by each communication used in our approach is beyond the scope of this paper and is left for an extended version of the paper.
In Section V, we often relied on the possibility for a committee to communicate data to other committees that need them in the next rounds. Inter-committee communications should be part of the consensus protocol, in the sense that each receiver should be able to check a quorum to accept a communication as “coming from a committee”. This might imply a number of messages that is quadratic in the size of the committees. While this might be a problem in practice, it is not a problem from a theoretical point of view, since the size of committees is supposed to be constant.
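The quorum check on the receiver side can be sketched as follows. This is a minimal illustration under assumptions of ours: each member of the sending committee sends the payload to each member of the receiving committee, messages are already authenticated (signature verification is omitted), and the quorum threshold of more than two thirds of the sending committee is a hypothetical choice. All names are illustrative.

```python
from collections import Counter


def quorum_size(committee_size: int) -> int:
    """Hypothetical threshold: strictly more than two thirds of the committee."""
    return 2 * committee_size // 3 + 1


def accept_from_committee(received, sender_committee, quorum):
    """Return the payload if at least `quorum` distinct members of
    sender_committee sent the same payload, otherwise None.

    received maps sender id -> payload (payloads must be hashable, e.g. bytes).
    Messages are assumed to be authenticated already.
    """
    votes = Counter(
        payload
        for sender, payload in received.items()
        if sender in sender_committee  # ignore nodes outside the committee
    )
    if not votes:
        return None
    payload, count = votes.most_common(1)[0]
    return payload if count >= quorum else None


def messages_per_transfer(sender_size: int, receiver_size: int) -> int:
    """All-to-all delivery: one message per (sender, receiver) pair."""
    return sender_size * receiver_size


if __name__ == "__main__":
    committee = {"a", "b", "c", "d"}
    inbox = {"a": b"state", "b": b"state", "c": b"state", "d": b"other"}
    print(accept_from_committee(inbox, committee, quorum_size(len(committee))))
    print(messages_per_transfer(4, 4))
```

With committees of constant size, the product computed by `messages_per_transfer` is itself a constant, which is why the quadratic cost does not affect the asymptotic analysis.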
Synchronization and committee decision failures
In our description, we essentially assumed a sort of global synchronization. In practice, synchronization across a large number of nodes (although partitioned into committees) might be difficult to achieve. The problem of modifying our approach to relax synchronization requirements is left as future work. A related problem is the failure of a committee to reach an agreement, which is unlikely to result from a direct attack, but may occur under large network faults. Again, we leave the investigation of this aspect as future work.
IX Conclusions and Open Problems
We presented a novel blockchain design that shares the processing load for the next block among many committees executing in parallel, potentially covering all nodes, and avoids broadcast in all cases where it would impair scalability. We provided a formal proof of the scalability of our approach and of its correctness. We also discussed how scaling does not impair decentralization and security, providing a solution to the well-known blockchain scalability trilemma.
We think that our contribution may stimulate interesting future work. From a theoretical point of view, inter-committee communications and synchronization should be better studied to achieve a solution that is efficient and usable in practice. From a practical point of view, experimentation or simulation with realistic parameters would be desirable.
We are extremely grateful to Ciro Oliviero for his important contribution at the beginning of this research.
-  Matteo Bernardini, Diego Pennino, and Maurizio Pizzonia. Blockchains meet distributed hash tables: Decoupling validation from state storage. In Paolo Mori, Massimo Bartoletti, and Stefano Bistarelli, editors, Distributed Ledger Technology Workshop (DLT 2019), volume 2334, pages 43–55, 2019.
-  Vitalik Buterin and Virgil Griffith. Casper the friendly finality gadget. arXiv preprint arXiv:1710.09437, 2017.
-  Jing Chen, Sergey Gorbunov, Silvio Micali, and Georgios Vlachos. Algorand agreement: Super fast and partition resilient byzantine agreement. Cryptology ePrint Archive, Report 2018/377, 2018. https://eprint.iacr.org/2018/377.
-  Jing Chen and Silvio Micali. Algorand: A secure and efficient distributed ledger. Theoretical Computer Science, 2019.
-  Kyle Croman, Christian Decker, Ittay Eyal, Adem Efe Gencer, Ari Juels, Ahmed Kosba, Andrew Miller, Prateek Saxena, Elaine Shi, Emin Gün Sirer, Dawn Song, and Roger Wattenhofer. On scaling decentralized blockchains. In Jeremy Clark, Sarah Meiklejohn, Peter Y.A. Ryan, Dan Wallach, Michael Brenner, and Kurt Rohloff, editors, Financial Cryptography and Data Security, pages 106–125, Berlin, Heidelberg, 2016. Springer Berlin Heidelberg.
-  Hung Dang, Tien Tuan Anh Dinh, Dumitrel Loghin, Ee-Chien Chang, Qian Lin, and Beng Chin Ooi. Towards scaling blockchain systems via sharding. In Proceedings of the 2019 International Conference on Management of Data, pages 123–140. ACM, 2019.
-  Yossi Gilad, Rotem Hemo, Silvio Micali, Georgios Vlachos, and Nickolai Zeldovich. Algorand: Scaling byzantine agreements for cryptocurrencies. In Proceedings of the 26th Symposium on Operating Systems Principles, SOSP ’17, pages 51–68, New York, NY, USA, 2017. ACM.
-  Ian Grigg. EOS - An introduction. Whitepaper, iang.org/papers/EOS_An_Introduction.pdf, 2017.
-  Federico Griscioli and Maurizio Pizzonia. Securing promiscuous use of untrusted usb thumb drives in industrial control systems. In Proceedings of the 14th Annual Conference on Privacy Security and Trust (PST 2016), pages 477–484, 2016.
-  Federico Griscioli, Maurizio Pizzonia, and Marco Sacchetti. Usbcheckin: Preventing badusb attacks by forcing human-device interaction. In Proceedings of the 14th Annual Conference on Privacy Security and Trust (PST 2016), pages 493–496, 2016.
-  Eleftherios Kokoris-Kogias, Philipp Jovanovic, Linus Gasser, Nicolas Gailly, Ewa Syta, and Bryan Ford. Omniledger: A secure, scale-out, decentralized ledger via sharding. In 2018 IEEE Symposium on Security and Privacy (SP), pages 583–598. IEEE, 2018.
-  Derek Leung, Adam Suhl, Yossi Gilad, and Nickolai Zeldovich. Vault: Fast bootstrapping for the algorand cryptocurrency. In NDSS, 2019.
-  Loi Luu, Viswesh Narayanan, Kunal Baweja, Chaodong Zheng, Seth Gilbert, and Prateek Saxena. Scp: A computationally-scalable byzantine consensus protocol for blockchains. See https://www.weusecoins.com/assets/pdf/library/SCP, 20(20):2016, 2015.
-  E. Etchevès Miciolino, D. Di Noto, F. Griscioli, M. Pizzonia, J. Kippe, X. Clotet, G. Leòn, F.B. Kassim, D. Lund, and E. Costante. Preemptive: an integrated approach to intrusion detection and prevention in industrial control systems. International Journal of Critical Infrastructures (IJCIS), 13(2/3):206–236, 2017.
-  Diego Pennino, Maurizio Pizzonia, and Federico Griscioli. Pipeline-integrity: Scaling the use of authenticated data structures up to the cloud. Future Generation Computer Systems, 2019.
-  Diego Pennino, Maurizio Pizzonia, and Alessio Papi. Overlay indexes: Efficiently supporting aggregate range queries and authenticated data structures in off-the-shelf databases. Tech. Report arXiv:1910.11754, Cornell University, 2019.
-  Jiaping Wang and Hao Wang. Monoxide: Scale out blockchains with asynchronous consensus zones. In 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 19), pages 95–112, 2019.
-  Brent Xu, Dhruv Luthra, Zak Cole, and Nate Blakely. Eos: An architectural, performance, and economic analysis, 2018.
-  Mahdi Zamani, Mahnush Movahedi, and Mariana Raykova. Rapidchain: Scaling blockchain via full sharding. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pages 931–948. ACM, 2018.