Blockchains are one of the hottest topics in modern distributed transaction processing. However, from the perspective of database research, one could raise the question: what makes these systems so special compared to classical distributed databases, which have been around for a long time already?
The answer lies in Byzantine fault tolerance (BFT): while classical distributed database systems require a trusted set of participants, blockchain systems are able to deal with a certain number of maliciously behaving nodes. This feature opens up many new application fields, such as transactions between organizations that do not fully trust each other. Unfortunately, ensuring BFT over all nodes of the network also heavily complicates transaction processing. If any node of the network is considered to be potentially malicious, a complex consensus mechanism is required to ensure the integrity of the system. This consensus mechanism ensures that a transaction can only commit if a majority of the network agrees to it.
Of course, the high cost of the consensus mechanism has also been observed by blockchain engineers. Therefore, some systems trade BFT for performance by simply assuming certain parties of the network to be trustworthy. A great example of this is Hyperledger Fabric, a popular open-source blockchain system introduced by IBM. In terms of BFT, it differs from other major players such as Bitcoin or Ethereum in two additional assumptions: First, Fabric assumes that the ordering service, which globally orders all transactions that go through the system, is trustworthy. Second, it allows for the forming of so-called organizations. Within an organization, it is assumed that all peers trust each other. These two assumptions heavily simplify transaction processing. First of all, no complex consensus mechanism, such as PBFT, is necessary. Second, the trust within an organization allows for a distribution of work within it and enables parallelism, as not every peer in the organization has to execute every transaction.
With this relaxed view on BFT in mind, can we actually still consider Fabric a true blockchain system? A trustworthy ordering service, which globally arranges and schedules transactions, is a component that is present in classical distributed database systems as well. Further, the concept of an organization, in which all peers trust each other, is also present in distributed databases in its extreme form: all peers belong to a single organization. Besides that, other core requirements of transaction management, such as ensuring transaction isolation or managing the data in a store, are essentially present one-to-one in both blockchains and database systems.
Using Fabric as an example, it becomes obvious that conceptually the lines between blockchain systems and distributed database systems are rather blurry. We believe this blurriness should be seen as an opportunity for the database community: due to all these conceptual similarities, it becomes possible to transition well-understood database technology to the world of blockchains, significantly enhancing this new technology.
The question remains which similarities can be exploited to transition database technology to Fabric, and by how much we can improve on the state of the art. To tackle this problem, we perform the following steps in this work:
To have a basis for the discussion, we first inspect the transaction flow of Hyperledger Fabric in the latest version 1.2 from a conceptual perspective. Fabric will serve as our case study for the rest of the paper on how to “databasify” a blockchain system (Section 2).
Based on the analysis of the transaction flow in Fabric, we then inspect those of its components that show the highest resemblance to components of database systems. We identify weaknesses in Fabric's implementation of these components and describe how database technology can be utilized to counter them (Section 3).
We transition database technology to the transaction pipeline of Fabric. Precisely, we first improve on the ordering of transactions. By default, the system orders transactions arbitrarily after simulation, leading to unnecessary serialization conflicts. To counter this problem, we introduce an advanced transaction reordering mechanism, which aims at reducing the number of serialization conflicts between transactions within a block. This mechanism significantly increases the number of valid transactions that make it through the system and therefore the overall throughput (Section 4.1).
Next, we advance the abort of transactions. By default, Fabric checks whether a transaction is valid right before the commit. This late abort unnecessarily penalizes the system by processing transactions that have no chance to commit. To tackle this issue, we introduce the concept of early abort to various stages of the pipeline. We identify invalid transactions as early as possible and abort them, ensuring that the pipeline is not throttled by transactions that have no chance to commit eventually. A requirement for this concept is a fine-grained concurrency control mechanism, by which we extend Fabric as well (Section 4.2). These modifications significantly extend the vanilla Fabric, turning it into what we call Fabric++.
We perform an extensive experimental evaluation of the optimizations of Fabric++ under a custom blockchain benchmark simulating an asset transfer scenario. In total, we evaluate the transactional throughput under different configurations of Fabric and Fabric++ and show that we are able to significantly boost the performance over the vanilla version. Additionally, we vary the number of channels and clients and show that our optimizations also have a positive impact on the scaling capabilities of the system (Section 5).
2 Hyperledger Fabric
Before diving into the conceptual similarities and differences between Fabric and distributed database systems, we have to understand the workflow of Fabric. Let us describe in the following section how it behaves in the latest version 1.2.
2.1 High-level Workflow
At its core, Fabric follows a simulate-order-validate-commit workflow, as shown in Figure 1:
(1) In the simulation phase, a client submits a transaction proposal to a specified subset of the peers, called the endorsement peers or endorsers. The endorsers simulate the effects of the transaction proposal against a local copy of their current state. Interestingly, none of the writes become durable in the current state at this point. If the endorsers endorse the transaction proposal, an actual transaction is formed from the execution result, which is then sent to the ordering service (via the client).
(2) In the ordering phase, the ordering service establishes a global order among all received transactions and distributes the ordered transactions at the granularity of blocks to all peers of the network.
(3) In the validation phase, all peers individually validate the transactions within the received blocks in terms of endorsement policy and serializability.
(4) In the commit phase, the blocks are appended to the local ledger and the changes made by the valid transactions are applied to the current state.
Following these four phases assures that each honest peer stores the same transaction sequence.
Fabric is a permissioned blockchain system, meaning all peers of the network are known at any point in time. Peers are grouped into organizations, which typically host them. For example, two companies trading with each other could each host an organization of machines, forming a network of peers. Within an organization, all peers trust each other.
Each peer runs a local instance of Fabric. This instance includes a copy of the ledger, containing the ordered sequence of all transactions that went through all four phases. This includes both valid and invalid transactions. Apart from the ledger, each peer also contains the current state in the form of a state database. The current state can be seen as an optimization of the ledger: while the ledger simply contains the sequence of all processed transactions, the current state represents the state after the application of all valid transactions in the ledger to the initial state. Fabric implements the current state in the form of a versioned key-value store. For every key in the store, a pair of value and version-number is kept, where the version-number counts the number of changes that already happened to the value of this key. (Footnote 1: The version number is actually composed of the transaction-ID and the block-ID; see Section 4.2.1 for details.)
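To make the versioned key-value store concrete, the following minimal Python sketch keeps a (value, version) pair per key and increments the version on every write. It is an illustration only: the class name `StateDB`, the key `BalA`, and the plain integer counter are assumptions of this sketch, whereas Fabric's actual version number is composed of transaction-ID and block-ID.

```python
class StateDB:
    """Toy versioned key-value store (hypothetical, not Fabric's API)."""

    def __init__(self):
        self._data = {}  # key -> (value, version)

    def get(self, key):
        """Return the (value, version) pair stored for key."""
        return self._data[key]

    def put(self, key, value):
        """Store value and bump the version; a new key starts at version 1."""
        _, version = self._data.get(key, (None, 0))
        self._data[key] = (value, version + 1)


db = StateDB()
db.put("BalA", 100)
db.put("BalA", 70)
print(db.get("BalA"))  # (70, 2)
```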
Apart from the peers, which play an important role both in the simulation phase and the validation phase, there is a separate instance called the ordering service, which is the core component of the ordering phase and assumed to be trustworthy. Although it can be composed of multiple machines for fault tolerance, it is a central service responsible for establishing a global order among all transactions.
2.3 Running Example
With the basic components of the architecture in mind, let us now discuss how transactions flow through the system. To do so, we present a simple running example in Figure 2 (simulation phase), Figure 3 (ordering phase), and Figure 4 (validation and commit phase), where two organizations and want to transfer money between each other.
Each organization contributes two peers to the network. The balances of the organizations are stored by two variables and , where stores the value in its current version and stores in version . We can also see that the ledger already contains six transactions to , where the four transactions , , , and were valid ones and led to the current state. The transactions and were invalid transactions. They are still stored in the ledger, although they did not pass the validation phase.
2.4 Simulation Phase
Transaction processing starts with the simulation phase in Figure 2. In step , a client submits a transaction proposal (proposal for short) to the system. In our example, the proposal intends to transfer the amount of from to . The two involved operations -= and += are expressed in a smart contract, an arbitrary program that is bound to the proposal. (Footnote 2: Smart contracts are typically called chaincodes in Fabric. However, as they do not conceptually differ from smart contracts in blockchain systems such as Ethereum, we stick to this term throughout the paper.) In addition to the smart contract, an endorsement policy must be specified. It determines which and/or how many peers have to endorse the proposal. In our example of money transfer between two organizations, a reasonable endorsement policy is to request endorsement from one peer of each organization, like two lawyers preserving and defending the individual rights of their clients.
Therefore, in step , the proposal is sent to the two endorsement peers and according to the policy. These two peers now individually simulate the smart contract (-=, +=) that is bound to the proposal against their local current state. Note that, as the name suggests, the simulation of the smart contract against the current state does not change the current state in any way. Instead, each endorsement peer builds an auxiliary read set and a write set during the simulation to keep track of all accesses that happen. In our case of money transfer over the amount of , the smart contract first reads the two current balances and along with their current version-numbers. Second, the smart contract updates the two balances according to the transferred amount, resulting in the new balances and . Overall, this builds the following read and write set:
In this sense, the simulation of the smart contract is actually only a monitoring of the execution effects. The reason for performing only a simulation is that in this phase, we cannot be sure yet whether this transaction will be allowed to commit eventually – this check will be performed later in the validation phase.
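The simulation of a money transfer can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the function name `simulate_transfer`, the keys `BalA` and `BalB`, and all balances and version numbers are hypothetical, and the current state is modeled as a plain dictionary mapping each key to a (value, version) pair.

```python
def simulate_transfer(state, src, dst, amount):
    """Simulate 'src -= amount; dst += amount' without touching the state.

    state maps key -> (value, version). The read set records the versions
    that were read, the write set records the new values.
    """
    src_value, src_version = state[src]
    dst_value, dst_version = state[dst]
    read_set = {src: src_version, dst: dst_version}
    write_set = {src: src_value - amount, dst: dst_value + amount}
    return read_set, write_set


# Hypothetical current state of an endorsement peer.
state = {"BalA": (100, 3), "BalB": (50, 2)}
read_set, write_set = simulate_transfer(state, "BalA", "BalB", 30)
print(read_set)   # {'BalA': 3, 'BalB': 2}
print(write_set)  # {'BalA': 70, 'BalB': 80}
```

Note that `state` is left unchanged by the call, which mirrors the fact that none of the writes become durable during simulation.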
After the simulation of the smart contract on all endorsement peers, in step , the endorsement peers return their individually computed read and write sets to the client that sent the transaction proposal. Additionally, they return a signature of their simulation, which will be relevant in the validation phase in Section 2.6. If all read sets and write sets match, in step , the actual transaction (called in the following) is formed from the results of the endorsement. (Footnote 3: They might not match due to non-determinism in the smart contract or due to malicious behavior of the endorsement peer(s).) This transaction now contains the effects of the execution in the form of the read and write set as well as all signatures and can be passed on to the ordering service.
2.5 Ordering Phase
As mentioned, the central component of the ordering phase is the ordering service, which we visualize in Figure 3. It receives all transactions that made it through the simulation phase. Consequently, it receives in step our transaction , which we followed through the simulation phase in Section 2.4. In step , we assume that it also receives two other transactions and , which were endorsed in parallel to .
The ordering service now has the sole purpose of establishing a global order among the transactions. It treats the transactions in a black-box fashion and does not inspect the transaction semantics, such as the read and write set, in any way. By default, it essentially arranges the transactions in the order in which they arrive, resulting in what we call the arrival order for the rest of the paper. In step , the ordering service now outputs the ordered stream of transactions in the form of blocks, each containing a certain number of transactions. Outputting whole blocks instead of individual transactions reduces the pressure on the network, as less communication overhead is produced.
Finally, the generated block is distributed to all four peers of the network to start the validation phase. Note that there is no guarantee that all peers receive a block at the same time, as the distribution happens partially from ordering service to peers directly as shown in step and partially between the peers using a gossip protocol as shown in step . However, the service assures that all peers receive the blocks in the same order.
2.6 Validation and Commit Phase
When a block arrives at a peer, the validation phase starts, visualized in Figure 4 for peer . The three remaining peers execute the same validation process. Overall, the validation phase has two purposes.
2.6.1 Endorsement Policy Evaluation
The first purpose is to validate the transactions in the block with respect to the endorsement policy. For example, it is possible that a malicious transaction was generated by a malicious client and a malicious peer in conspiracy to take advantage of the money transfer. Let us assume that transaction is such a malicious transaction and that the malicious client, which proposed , works together with peer , which is also malicious. Instead of using the legitimate write set from B2, the client creates a proposal with the write set , which it received from its collaborator A2.
How is this transaction now detected in the validation phase? The key to this lies in the signatures and , which the endorsement peers generate at the end of the simulation phase. The signature is computed over the read and write set, the executed smart contract, and the used endorsement policy. The client receives these cryptographically secure signatures and must pack them into the transaction along with the read and write set. The peers that validate the transaction recompute the signatures of all endorsement peers that were responsible for transaction and compare the signatures with the received ones and . In our example, in step , the peers detect that the signature of the honest peer does not match the one they computed from the received write set and thus classify the transaction as invalid. The remaining transactions in the block, and , are evaluated in parallel. Their signatures match the ones computed from the read and write set and therefore, these transactions are valid with respect to the endorsement policy.
2.6.2 Serializability Conflict Check
The second purpose of the validation is to analyze the transactions with respect to serializability conflicts that can arise from the order of transactions. For every transaction, it must be checked whether the version-numbers of all keys in the read set match the version-numbers in the current state. Only if this is the case does a transaction operate on an up-to-date state. Considering our example, let us perform the serializability conflict check for the received block. is already marked as invalid as it did not pass the endorsement policy evaluation, so it is not checked again. passed the endorsement policy evaluation and is now tested for serialization conflicts in step . Its read set is . The version numbers of and in the read set match the ones of the current state and therefore, is marked as valid. As a consequence, in step , the write set of , namely is written to the current state. This changes the current state to and . Note that the version-numbers of the modified variables are incremented.
The next transaction to be checked is in step . Let us assume it also performs a money transfer and has the following read and write set:
This transaction will not pass the conflict check, as it read in version and in version , while the current state already contains in version and in version . Therefore, it operated on outdated data and is marked as invalid. As a consequence, its write set is not applied to the current state and simply discarded.
Finally, after validating all transactions of the block, in step the entire block is appended to the ledger along with the information about which transactions are valid or invalid.
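The serializability conflict check and the subsequent commit can be summarized by the following Python sketch. It is a simplification under stated assumptions: the function name `validate_and_commit`, the transaction identifiers `Ta`, `Tb`, `Tc`, and all values are hypothetical, the outcome of the endorsement policy evaluation is passed in as a boolean flag, and versions are plain counters.

```python
def validate_and_commit(state, block):
    """Validate a block against the current state and apply valid writes.

    state: key -> (value, version), modified in place.
    block: list of (tx_id, read_set, write_set, endorsed), where read_set
           maps key -> version read and write_set maps key -> new value.
    Returns a list of (tx_id, is_valid) flags in block order.
    """
    flags = []
    for tx_id, read_set, write_set, endorsed in block:
        # A transaction is valid only if it passed the endorsement check
        # and every version it read still matches the current state.
        valid = endorsed and all(
            state[key][1] == version for key, version in read_set.items()
        )
        if valid:
            for key, value in write_set.items():
                _, old_version = state.get(key, (None, 0))
                state[key] = (value, old_version + 1)
        flags.append((tx_id, valid))
    return flags


state = {"BalA": (100, 3), "BalB": (50, 2)}
reads = {"BalA": 3, "BalB": 2}
block = [
    ("Ta", reads, {"BalA": 70, "BalB": 80}, False),  # failed endorsement check
    ("Tb", reads, {"BalA": 70, "BalB": 80}, True),   # reads are up to date
    ("Tc", reads, {"BalA": 50, "BalB": 100}, True),  # reads are stale after Tb
]
flags = validate_and_commit(state, block)
print(flags)  # [('Ta', False), ('Tb', True), ('Tc', False)]
print(state)  # {'BalA': (70, 4), 'BalB': (80, 3)}
```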
3 Blurred Lines: Fabric vs Distributed Database Systems
As we now have an understanding of the workflow of Fabric, we are able to discuss its architecture in relation to distributed database systems. In particular, we are interested in aspects of Fabric that are (a) conceptually shared with distributed database systems, but (b) have potential for the application of database technology.
3.1 The Importance of Transaction Order
The first component we look at is the ordering mechanism. Such a component is also present in any distributed database system with transaction semantics and therefore a great candidate for transitioning database technology to Fabric.
As described in Section 2.5, Fabric relies on a single trustworthy ordering service for ordering transactions. Since Fabric simulates the smart contracts bound to proposals before performing the ordering, the order actually has an influence on the number of serialization conflicts between transactions. Again, this is a property shared with any parallel database system that separates transaction execution from transaction commit.
In ordering transactions, various strategies are possible: The simplest option is to order them arbitrarily, for instance in the order in which they arrive. While this arrival order is fast to establish, it can lead to serialization conflicts that are potentially unnecessary. These conflicts increase the number of invalid transactions, which must be resubmitted by the client. Unfortunately, the vanilla Fabric follows exactly this naive strategy. This is caused by the design decision that the ordering service is not supposed to inspect the transaction semantics, such as the read and write set, in any way. Instead, it simply leaves the transactions in the order in which they arrive. This strategy can be problematic, as the example in Table 1 shows. In this example, four transactions are scheduled in the order in which they arrive, namely , where updates the key from version to . Since the transactions , , and each read in version during their simulations, they have no chance to commit, as they operated on an outdated version of the value of . They will be identified as invalid in the validation phase and the corresponding transaction proposals must be resubmitted by the client, resulting in a new round of simulation, ordering, and validation.
Table 1: Transaction | Read Set | Write Set | Is Valid?
Interestingly, for the four transactions from the previous example, there exists an order that is conflict free. In the schedule , as shown in Table 2, all four transactions are valid, as their read and write sets do not conflict with each other in this order.
Table 2: Transaction | Read Set | Write Set | Is Valid?
This example shows that the vanilla orderer of Fabric misses a chance of removing unnecessary serialization conflicts. While this problem is new to the blockchain domain, as blockchains typically offer only a serial execution of transactions, it is actually well known within the database community. There exist reordering mechanisms which aim at minimizing the number of serialization conflicts via a reordering of transactions [21, 13, 20]. However, in a database system, buffering a large number of incoming transactions before processing is typically avoided, as low latency is mandatory. Thus, reordering is not always an option in such a setup. Fortunately, as blockchain systems buffer the incoming transactions anyway to group them into blocks, this gives us the opportunity to apply sophisticated transaction reordering mechanisms without introducing significant overhead.
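The influence of the transaction order on the number of valid transactions can be reproduced with a small Python sketch. The four transactions below, their read and write sets, and the helper `count_valid` are hypothetical and chosen only to mimic the situation described above: all transactions are simulated against the same initial state, one transaction writes a key that the other three read, and versions are plain counters.

```python
def count_valid(schedule, initial_versions):
    """Validate transactions in the given order and count the valid ones.

    schedule: list of (read_keys, write_keys); every transaction is assumed
    to have been simulated against initial_versions.
    """
    current = dict(initial_versions)
    valid = 0
    for read_keys, write_keys in schedule:
        # Valid only if no key in the read set was modified in the meantime.
        if all(current[k] == initial_versions[k] for k in read_keys):
            valid += 1
            for k in write_keys:
                current[k] += 1
    return valid


versions = {"k1": 1, "k2": 1, "k3": 1, "k4": 1}
t1 = (set(), {"k1"})          # writes k1
t2 = ({"k1", "k2"}, {"k2"})   # reads k1
t3 = ({"k1", "k3"}, {"k3"})   # reads k1
t4 = ({"k1", "k3"}, {"k4"})   # reads k1 and k3

print(count_valid([t1, t2, t3, t4], versions))  # 1 (arrival order)
print(count_valid([t4, t2, t3, t1], versions))  # 4 (conflict-free order)
```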
We will add such a transaction reordering mechanism to Fabric in Section 4.1, which significantly increases the number of valid transactions that make it through the system.
3.2 On the Lifetime of Transactions
The second aspect we look at from a database perspective concerns the lifetime of transactions within the pipeline. In Fabric, every transaction that goes through the system is either classified as valid or as invalid with respect to the validation criteria. In the vanilla version, this classification happens in the validation phase right before the commit phase. A severe downside of this form of late abort is that a transaction that already violated the validation criteria in an earlier phase is still processed and distributed across all peers. This penalizes the whole system with unnecessary work, throttling the performance of valid transactions. Besides, this concept also delays the abort notification to the client.
We have to distinguish in which phase a violation happens. First, a violation can occur already in the simulation phase, in the form of so-called cross-block conflicts, meaning that a transaction from a later block, which is currently in the simulation phase, conflicts with a valid transaction from an earlier block. Second, a violation can also occur in the ordering phase, in the form of within-block conflicts between conflicting transactions in a single block.
3.2.1 Violation in the simulation phase (cross-block conflicts)
To understand the problem in the simulation phase, let us look at the following situation and how the vanilla version of Fabric handles it. Let us assume there are four transactions , , , and that are currently in the ordering phase and that end up in a block of size four, which is shipped to all peers for validation. Before the validation of that block starts within a peer , the smart contract of a transaction proposal starts its simulation in . To do so, it acquires a read lock on the entire current state. (Footnote 4: The read lock can be shared by multiple simulation phases, as they do not modify the current state.) While the simulation is running, the block has to wait for the validation, as it has to acquire an exclusive write lock on the current state. The problem in this situation is: if , , , or write the value of a key that is read by , then simulates on stale data. Therefore, at the moment of the read, the transaction becomes virtually invalid. Still, in the vanilla version of Fabric, this stale read is not detected before the transaction reaches its validation phase. Thus, it would continue its simulation and go through the ordering phase, just to be invalidated in the very end.
3.2.2 Violation in the ordering phase (within-block conflicts)
Apart from conflicts across blocks, there can be conflicts between transactions within a block. These conflicts appear after putting the transactions into a particular order in the ordering phase. For instance, the example from Table 1 in Section 3.1 showed a schedule, where the three transactions , , and individually conflict with the previously scheduled transaction of the same block. Unfortunately, these conflicts are not detected within the orderer of the vanilla version of Fabric. The block containing , , and would be distributed across all peers of the network for validation, although of transactions within the block are virtually invalid. As before, this originates from the design decision that the ordering service does not inspect transaction semantics.
The mentioned situations show that Fabric misses several chances to abort transactions right at the time of violation. In contrast, database systems are typically very eager in aborting transactions, as it decreases network traffic and saves computing resources. This concept of “cleaning” the pipeline as early as possible is called early abort in the context of databases, which apply this concept in various flavors. For instance, besides the early abort of transactions that violate certain criteria, database systems eliminate records from the query result set as early as possible by pushing down selection and projection operations in the query plan.
To overcome the mentioned problems, we will apply the concept of early abort at several stages of the transaction processing pipeline of Fabric. In this way, we ensure that the available resources are spent on meaningful work as far as possible. We will detail this in Section 4.2.
We have outlined the problems of Fabric and how they relate to key problems known in the context of database systems. Let us now see precisely how we counter them. First, in Section 4.1, we introduce a transaction reordering mechanism that aims at minimizing the number of unnecessary within-block conflicts. Second, in Section 4.2, we introduce early transaction abort to several stages of the Fabric pipeline. This also involves the introduction of a fine-grained concurrency control mechanism.
4.1 Transaction Reordering
When reordering a set of transactions , multiple challenges must be faced. First, we have to identify which transactions of actually conflict with each other with respect to the actions they perform. Precisely, we have a conflict between two transactions and (denoted as ), if writes to a key that is read by . In this case, must be ordered after (denoted as ) to make the schedule serializable, as otherwise, the read of would be outdated. Unfortunately, the problem is typically more complex as cycles of conflicts can occur, such that simple reordering cannot resolve the problem. For example, if we have the cycle of conflicts , there is no order of these three transactions that is serializable. Therefore, before reordering transactions, our mechanism must actually first remove certain transactions of to form a subset , from which a serializable schedule can be generated. Of course, a goal must be to remove as few transactions as possible. Finally, after computing , we can derive a concrete serializable schedule from the transactions in .
At a high level, we have to carry out the steps shown in the pseudo-code of Algorithm 5 to create a serializable schedule for a set of transactions .
To understand the principle and to discuss some of the implementation details, let us go through a concrete example. Let us assume we have a set of six transactions to to consider for reordering. These six transactions have read and write sets as shown in Table 3. In total, they access ten unique keys to .
Step (1): Based on this information, we now have to generate the conflict graph of the transactions as done by the function buildConflictGraph() in line 5 of Algorithm 5. To do so in an efficient way, we interpret the rows of Table 3 as bit-vectors with one bit per unique key. Let us refer to them as for the reading accesses and for the writing accesses of a transaction . For each transaction , we now perform a bitwise &-operation between and for all . If the result of an &-operation is non-zero, we have identified a read-write conflict and create an edge in the conflict graph between the corresponding transactions. For example, for we have the reading accesses The bitwise &-operation with leads to , which is non-zero. This means writes a key that is reading and thus, we put a corresponding edge in the conflict graph. As a result, we obtain the conflict graph of our six transactions as shown in Figure 6.
Note that this algorithm has quadratic complexity in the number of transactions. Still, we apply it, as the number of transactions to consider is very small in practice due to the limitation by the block size, and therefore the overhead is negligible.
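The bit-vector based construction of the conflict graph in Step (1) can be sketched as follows. This is a minimal illustration, not the actual implementation of buildConflictGraph(): Python integers serve as bit-vectors, the transactions and keys are hypothetical, and the orientation of the edges (from the writing to the reading transaction) is an assumption of this sketch.

```python
def to_bits(keys, key_index):
    """Encode a set of keys as an integer bit-vector."""
    bits = 0
    for key in keys:
        bits |= 1 << key_index[key]
    return bits


def build_conflict_graph(txs):
    """txs: dict name -> (read_keys, write_keys), both given as sets.

    Returns a set of edges (writer, reader), one for every pair in which
    the writer modifies at least one key that the reader reads.
    """
    all_keys = sorted({k for reads, writes in txs.values() for k in reads | writes})
    key_index = {key: i for i, key in enumerate(all_keys)}
    read_vec = {t: to_bits(reads, key_index) for t, (reads, _) in txs.items()}
    write_vec = {t: to_bits(writes, key_index) for t, (_, writes) in txs.items()}

    edges = set()
    for reader in txs:          # quadratic in the number of transactions
        for writer in txs:
            if reader != writer and read_vec[reader] & write_vec[writer]:
                edges.add((writer, reader))
    return edges


# Hypothetical transactions: Ta and Tb conflict in both directions.
txs = {
    "Ta": ({"k0"}, {"k1"}),
    "Tb": ({"k1"}, {"k0"}),
    "Tc": ({"k2"}, set()),
}
print(sorted(build_conflict_graph(txs)))  # [('Ta', 'Tb'), ('Tb', 'Ta')]
```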
Step (2): To identify the cycles, we apply Tarjan’s algorithm in the function divideIntoSubgraphs() in line 5 to identify all strongly connected subgraphs. In general, this can be done in time linear in the number of nodes and the number of edges, and results in the three subgraphs shown in Figure 7.
Using Johnson’s algorithm, we then identify all cycles within the strongly connected subgraphs. Again, this step can be done in linear time in , where is the number of cycles. Therefore, if there are no cycles in the subgraphs, the overhead of this step is very small.
We identify that the first subgraph (colored in green) consisting of , , and contains the two cycles and . The second subgraph (colored in red) consisting of and contains the cycle . The third subgraph (colored in yellow) contains only one node and is thus cycle-free.
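The division into strongly connected subgraphs in Step (2) can be illustrated with a compact recursive version of Tarjan’s algorithm. The sketch below covers only this part and does not enumerate the individual cycles (the step attributed to Johnson’s algorithm above); the function name and the small example graph are hypothetical.

```python
def strongly_connected_components(nodes, edges):
    """Tarjan's algorithm; returns a list of components (sorted node lists)."""
    successors = {n: [] for n in nodes}
    for a, b in edges:
        successors[a].append(b)

    index, low = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in successors[v]:
            if w not in index:          # tree edge: recurse first
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:         # back edge into the current component
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of a component
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            components.append(sorted(component))

    for n in nodes:
        if n not in index:
            visit(n)
    return components


nodes = ["Ta", "Tb", "Tc"]
edges = [("Ta", "Tb"), ("Tb", "Ta"), ("Tb", "Tc")]
print(sorted(strongly_connected_components(nodes, edges)))
# [['Ta', 'Tb'], ['Tc']]
```

Only components with more than one node can contain cycles in a graph without self-loops, so single-node components such as `['Tc']` can be skipped in the subsequent cycle detection.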
Step (3): From this information, we can build a table denoting for every transaction in which cycle it appears as shown in the lines 5 to 5 of Algorithm 5. Table 4 visualizes the result for our example. If a transaction is part of a cycle , the corresponding cell is set to , otherwise . The last row of the table sums up for every transaction in how many cycles it is contained in total.
Step (4): We now iteratively remove transactions that participate in cycles, starting from the ones that appear in most cycles. The lines 5 to 5 of Algorithm 5 show the corresponding pseudo-code. As we can see, and both appear in two cycles, so we take care of them first. If we can choose between two transactions, such as and , we pick the one with the smaller subscript. This ensures that our algorithm is deterministic. We remove , which clears all cycles in which appears, namely and . The sum is updated accordingly, as we can see in Table 5.
The transactions and remain with a participation in cycle each. We remove which clears and thereby the last cycle. This results in the state of Table 6.
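The greedy removal of Steps (3) and (4) can be sketched in Python as follows. The sketch assumes that the cycles have already been identified and are given as sets of transaction subscripts; the function name `break_cycles` and the example cycles are hypothetical.

```python
def break_cycles(cycles):
    """Greedily pick transactions to abort until no cycle remains.

    cycles: list of sets of transaction subscripts (integers).
    Returns the removed subscripts in the order of removal.
    """
    remaining = [set(c) for c in cycles]
    removed = []
    while remaining:
        # Count for every transaction in how many remaining cycles it appears.
        counts = {}
        for cycle in remaining:
            for t in cycle:
                counts[t] = counts.get(t, 0) + 1
        # Most cycles first; ties are broken by the smaller subscript.
        victim = min(counts, key=lambda t: (-counts[t], t))
        removed.append(victim)
        # Removing the victim clears every cycle it participates in.
        remaining = [cycle for cycle in remaining if victim not in cycle]
    return removed


# Hypothetical example: transactions 0 and 3 each appear in two cycles.
cycles = [{0, 3}, {0, 1, 3}, {2, 4}]
print(break_cycles(cycles))  # [0, 2]
```

Like the mechanism described in the text, this greedy choice is deterministic but does not guarantee a minimal number of removed transactions.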
Step (5): Generating the final schedule is essentially a repetitive execution of two parts until all nodes are scheduled: (a) the locating of the source node in the current subgraph (lines 5 to 5) and (b) the scheduling of all nodes that are reachable from that source (lines 5 to 5).
We start part (a) at the node of representing the transaction with the smallest subscript, namely . From this starting node, we have to find a source node, as sources have to be scheduled last. has two parents, namely and , so it is not a source. We follow the edge to , which has not been visited yet but is also not a source, as it has as a parent as well. We follow the edge to , which has not been visited yet and which is a source. Therefore, we can schedule safely at the last position in our schedule, which we refer to as position . Now, part (b) starts as all nodes that are reachable from must be scheduled before it. has two children, namely and . We follow the edge to , which has not been scheduled yet. However, as has an incoming edge from , we also cannot directly schedule it. First, we visit and identify that it has a parent in the form of , the source at which we started. With this information, we know that must be scheduled at position and must be scheduled at position . This ends part (b), as all reachable nodes have been scheduled. Next, we restart at the only remaining node . As is not only a source but also a sink, we can schedule it instantly at position . This results in the final schedule , which is returned to the orderer.
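Once all cycles are removed, any order in which every node reachable from a source precedes that source is serializable. The following Python sketch derives such an order with a standard topological sort from the standard library module `graphlib` (Python 3.9 or later). It is a swapped-in technique, not the source-locating traversal described above, so the resulting schedule may differ from the one in the text while respecting the same constraints. The edge orientation (writer, reader) and the example graph are assumptions of this sketch.

```python
from graphlib import TopologicalSorter


def serializable_schedule(nodes, edges):
    """Return an order in which every reader precedes its conflicting writer.

    nodes: iterable of transaction names (cycle-free conflict graph).
    edges: iterable of (writer, reader) pairs.
    """
    # TopologicalSorter expects, for every node, the nodes that must come
    # before it; here the readers must come before the writer.
    must_precede = {n: set() for n in nodes}
    for writer, reader in edges:
        must_precede[writer].add(reader)
    return list(TopologicalSorter(must_precede).static_order())


# Hypothetical cycle-free conflict graph.
nodes = ["Ta", "Tb", "Tc", "Td"]
edges = [("Tc", "Ta"), ("Tc", "Tb"), ("Tb", "Ta")]
schedule = serializable_schedule(nodes, edges)
print(schedule)
```

If the graph still contained a cycle, `static_order()` would raise `graphlib.CycleError`, which makes a missed cycle visible instead of silently producing an invalid schedule.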
Please note that our reordering mechanism is not guaranteed to abort a minimal number of transactions, as this would be an NP-hard problem. However, it offers a very lightweight way to generate a serializable schedule with a small number of aborts.
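The constraint that step (5) enforces is that every node reachable from a source is scheduled before that source, so that sources end up last. The sketch below illustrates only this constraint and deliberately swaps in a standard topological sort from Python's `graphlib` module (Python 3.9 or later) instead of the traversal described above; the order among unconstrained nodes may therefore differ from the one our algorithm produces. The function name `build_schedule` and the example graph are hypothetical.

```python
from graphlib import TopologicalSorter


def build_schedule(nodes, edges):
    """Order the nodes of an acyclic conflict graph.

    nodes: iterable of transaction ids.
    edges: iterable of (parent, child) pairs.
    Returns a list in which every child appears before its parent.
    """
    children = {n: set() for n in nodes}
    for parent, child in edges:
        children[parent].add(child)
    # TopologicalSorter emits a node only after all nodes registered as its
    # dependencies. Registering the children as dependencies of the parent
    # therefore schedules children first and sources last.
    return list(TopologicalSorter(children).static_order())


# Hypothetical graph: 3 is a source, 4 is isolated (source and sink at once).
example_edges = [(3, 1), (3, 2), (2, 1)]
example_schedule = build_schedule([1, 2, 3, 4], example_edges)
```

Because all cycles were removed in step (4), the input is acyclic and a valid order always exists.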
4.1.2 Batch Cutting
In the context of transaction reordering, we have to discuss and extend a mechanism within the ordering service that we omitted for simplicity in the description of Fabric in Section 2, namely batch cutting. When the ordering service receives the transactions in the form of a constant stream, it decides based on multiple criteria when to "cut" a batch of transactions to finalize it and to form the block. In the vanilla version, a batch is cut as soon as one of the following three conditions holds: (a) The batch contains a certain number of transactions. (b) The batch has reached a certain size in terms of bytes. (c) A certain amount of time has passed since the first transaction of this batch was received.
In Fabric++, we extend these criteria by one additional condition. We also cut the batch if (d) the transactions within the batch access a certain number of unique keys. This condition ensures that the runtime of our reordering mechanism, in particular the time of step (1), remains bounded.
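The four cutting conditions can be summarized in a small sketch. It is a simplified model and not the code of the ordering service: the class `BatchCutter` and its parameter names are hypothetical, and the time condition (c) is only checked when a transaction arrives, whereas a real implementation would also need a timer.

```python
class BatchCutter:
    """Collects transactions and cuts a batch when a condition (a)-(d) holds."""

    def __init__(self, max_tx, max_bytes, max_seconds, max_keys):
        self.max_tx = max_tx            # (a) number of transactions
        self.max_bytes = max_bytes      # (b) batch size in bytes
        self.max_seconds = max_seconds  # (c) time since first transaction
        self.max_keys = max_keys        # (d) unique keys accessed (Fabric++)
        self._reset()

    def _reset(self):
        self.txs = []
        self.size = 0
        self.keys = set()
        self.first_arrival = None

    def add(self, tx_id, size_bytes, keys, now):
        """Add a transaction; return the cut batch or None if no cut happens."""
        if self.first_arrival is None:
            self.first_arrival = now
        self.txs.append(tx_id)
        self.size += size_bytes
        self.keys.update(keys)
        if (len(self.txs) >= self.max_tx
                or self.size >= self.max_bytes
                or now - self.first_arrival >= self.max_seconds
                or len(self.keys) >= self.max_keys):
            batch = self.txs
            self._reset()
            return batch
        return None


# Hypothetical limits: the fourth unique key triggers condition (d).
cutter = BatchCutter(max_tx=10, max_bytes=10**6, max_seconds=1.0, max_keys=4)
r1 = cutter.add("t1", 100, {"a", "b"}, now=0.0)
r2 = cutter.add("t2", 100, {"b", "c"}, now=0.1)
r3 = cutter.add("t3", 100, {"d"}, now=0.2)
```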
To analyze the effectiveness of our reordering mechanism, we first evaluate it in a stand-alone micro-benchmark in isolation from Fabric. For a given sequence of input transactions, we compute the number of valid transactions for this particular sequence (called "arrival order" in the following plots) as well as for the sequence that is generated by our reordering mechanism (called "reordered" in the following plots). Additionally, we measure the time to compute the reordered schedule. In Figure 9, we test a workload pattern with a varying number of conflicts. For the interested reader, we provide a second micro-benchmark in Appendix B on the effect of varying the length of the cycles (Figure 15), which shows how well our reordering mechanism performs in comparison to the naive arrival order.
4.1.4 Micro-Benchmark 1: Interleave reads and writes to vary the number of conflicts
The first input sequence we test consists of two equal-sized subsequences, where one subsequence contains only transactions that perform writes (colored in red) and the other subsequence contains only transactions that perform reads (colored in blue). Each transaction performs only one operation (either read or write). No two writes and no two reads access the same key. For the example of transactions, we start with the following sequence :
To generate , we move the last transaction of to the front, leading to the following sequences , , and .
The more writing transactions happen before the corresponding reading transactions, the more conflicts happen. We want to find out whether our reordering mechanism can solve this problem.
Figure 9 shows the results for transactions. As we can see, our reordering mechanism is able to reorder the transactions for every input sequence such that all transactions are valid. In contrast, the arrival order suffers from many invalid reading transactions if the corresponding writing transactions happen before them. We can also see that our reordering mechanism is computationally cheap: it takes only around 1 to 2 ms to rearrange the transactions on a MacBook Pro with an Intel Core i7 running at 3.1 GHz.
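The effect measured in this micro-benchmark can be illustrated with a small sketch. It assumes the validation rule within a block, namely that a read is invalid if an earlier transaction of the same block wrote the same key, and it treats single-operation writes as always valid. The helper `reads_first` is a shortcut that works for this particular workload because it contains no cycles; it is not our graph-based reordering mechanism. All names and the example sequence are hypothetical.

```python
def count_valid(sequence):
    """Count valid transactions in a block of single-operation transactions.

    sequence: list of (op, key) pairs with op in {"R", "W"}.
    """
    written = set()
    valid = 0
    for op, key in sequence:
        if op == "W":
            written.add(key)   # the write itself is valid
            valid += 1
        elif key not in written:
            valid += 1         # read is valid: no earlier write to this key
    return valid


def reads_first(sequence):
    """Place all reads before all writes, keeping the relative order."""
    return sorted(sequence, key=lambda tx: tx[0] != "R")


# Hypothetical worst case: every write precedes the read of the same key.
arrival = [("W", 0), ("W", 1), ("R", 0), ("R", 1)]
valid_arrival = count_valid(arrival)
valid_reordered = count_valid(reads_first(arrival))
```

In this hypothetical worst case, only the two writes are valid in arrival order, while all four transactions are valid once the reads are placed first.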
4.2 Early Transaction Abort using Advanced Concurrency Control
The reordering mechanism described previously not only tries to minimize the number of unnecessary aborts, it also enables a form of early abort. Transactions that are removed from because of their participation in conflict cycles can already be aborted in the ordering phase instead of later in the validation phase. This ensures that fewer transactions are distributed across the network.
In the following, we want to push this concept of aborting transactions as early as possible in the pipeline to its limits. In addition to the early abort of transactions that occur in conflict cycles, we can integrate two more applications of early abort, as we will describe in Section 4.2.1 and Section 4.2.2. The first one already happens in the simulation phase. Let us see in the following how this works.
4.2.1 Early Abort in the Simulation Phase
To realize early abort in the simulation phase, we first have to extend Fabric by a more fine-grained concurrency control mechanism that allows for the parallel execution of the simulation and validation phases within a peer. With such a mechanism at hand, we have the chance of identifying stale reads already during the simulation.
To understand the concept, let us consider the example from Section 3.2.1 again. With a fine-grained concurrency control mechanism, the block containing , , , and would not have to wait for validation while the smart contract bound to the proposal is simulating. Instead, the four transactions would apply their updates in an atomic fashion while is simulating. As a consequence of this design, for every read that performs, we can check whether the read value is still up-to-date. As soon as we detect a stale read, we can abort the simulation of the transaction proposal. Additionally, we directly notify the corresponding client about the abort, such that it can resubmit the proposal without delay.
Let us discuss in the following, how exactly our fine-grained concurrency control mechanism works and how we realize it in Fabric++. In the context of modern database systems, advanced concurrency control mechanisms are well established [14, 17, 18, 19, 12, 15]. Instead of locking the entire store, these techniques typically perform a fine-grained locking on the record level or even at the level of individual cells/values. As there is conceptually no difference between the store of a database system and the store used within the Fabric peers, similar techniques can be applied here.
As discussed in Section 2, Fabric implements its current state in the form of a key-value store, which maps each individual key to a pair of value and version-number. The version-number is composed of the ID of the transaction that performed the update and the ID of the block that contains the transaction. In the original version of Fabric, the sole purpose of the version-numbers is to identify stale reads. In the validation phase, for every transaction we check whether the version-number of each read value still matches the one in the current state.
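The version check of the validation phase can be sketched as follows. The representation is an assumption made for illustration: the current state is modeled as a dictionary that maps a key to a pair of value and version, and a version is modeled as a tuple `(block_id, tx_id)`. The function name `passes_validation` is hypothetical.

```python
def passes_validation(read_set, current_state):
    """Check whether all versions read during simulation are still current.

    read_set: dict mapping key -> version seen during simulation.
    current_state: dict mapping key -> (value, version).
    A version is modeled as a (block_id, tx_id) tuple.
    """
    for key, seen_version in read_set.items():
        _value, current_version = current_state.get(key, (None, None))
        if current_version != seen_version:
            return False  # stale read: the key was updated in the meantime
    return True


# Hypothetical state: key "b" was last updated by transaction 2 of block 5.
state = {"a": (70, (4, 1)), "b": (100, (5, 2))}
ok = passes_validation({"a": (4, 1)}, state)
stale = passes_validation({"a": (4, 1), "b": (4, 3)}, state)
```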
We can go one step further and exploit the available version-numbers to implement a lock-free concurrency control mechanism protecting the current state. To do so, in Fabric++, we first remove the read-write lock that was unnecessarily serializing the simulation and validation phases. The version-number that is maintained with each value is sufficient to ensure the same transaction isolation semantics as the vanilla version. As no lock is acquired anymore, we need a mechanism to ensure that updates performed by the validation phase are not seen by simulation phases running in parallel. To achieve this behavior, during simulation, we have to inspect the version-number of every read value and test whether it is still up-to-date.
Figure 10 visualizes this concept using a concrete example. At the start of the simulation phase, we first identify the block-ID of the last block that made it into the ledger. Let us refer to this block-ID as the last-block-ID. In our example, last-block-ID = 4. During the simulation of a smart contract bound to a transaction proposal , no read must encounter a version-number containing a block-ID higher than the last-block-ID. If a read does see a higher block-ID, this means that during the simulation phase, a validated transaction in the validation phase modified a value in the read set of , and thus the read set is outdated.
In our example, the read of in the simulation phase happens before the update of to in the validation phase. This is reflected by the version-number of , namely block-ID = 4. Therefore, this read is up-to-date and the simulation continues. In contrast to that, the read of happens after the update of to in the validation phase. This is reflected by the version-number of , namely block-ID = 5. As 5 is higher than the last-block-ID = 4, we can directly classify as invalid, as the transaction will not have a chance to pass the validation phase later on. Please note that the overall correctness of our lock-free mechanism is ensured by the atomic updates of the version-numbers.
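The check performed during simulation can be sketched as follows. This is a simplified, single-threaded model under stated assumptions: the current state is a dictionary that maps a key to a pair of value and `(block_id, tx_id)` version, and the names `StaleRead`, `simulate_reads`, and the keys `k1` and `k2` are hypothetical. The atomicity of version updates, on which the correctness of the lock-free mechanism rests, is not modeled here.

```python
class StaleRead(Exception):
    """Raised when a simulation reads a value updated after it started."""


def simulate_reads(keys, current_state, last_block_id):
    """Read the given keys and abort as soon as a stale read is detected.

    current_state: dict mapping key -> (value, (block_id, tx_id)).
    last_block_id: ID of the last committed block when the simulation began.
    Returns the read set as a dict mapping key -> version.
    """
    read_set = {}
    for key in keys:
        _value, (block_id, tx_id) = current_state[key]
        if block_id > last_block_id:
            # A parallel validation phase updated this key after the
            # simulation started, so the proposal cannot pass validation.
            raise StaleRead(key)
        read_set[key] = (block_id, tx_id)
    return read_set


# Mirrors the example: last-block-ID = 4, one key already carries block-ID 5.
state = {"k1": (70, (4, 1)), "k2": (100, (5, 2))}
first_read = simulate_reads(["k1"], state, last_block_id=4)
try:
    simulate_reads(["k1", "k2"], state, last_block_id=4)
    aborted_on = None
except StaleRead as exc:
    aborted_on = exc.args[0]
```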
4.2.2 Early Abort in the Ordering Phase
In addition to the early abort in the simulation phase, as explained in Section 4.2.1, we can apply a similar concept to the ordering phase. As Fabric performs commits at the granularity of whole blocks, two transactions within the same block that read the same key must read the same version of that key. For example, let us consider two transactions and , where is ordered before within the same block (). If read version of a key and read version of in their respective simulations, then is invalid. Such a version mismatch can happen if, between the simulations of and , a change to the value of was committed by a valid transaction from a previous block. Therefore, as soon as we detect a version mismatch between transactions within the same block, we can early abort the latter transaction. Again, this strategy ensures that only those transactions end up in a block that have a realistic chance of committing.
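The version-mismatch check in the ordering phase can be sketched as follows. The sketch follows the rule stated above that the later transaction of a mismatching pair is aborted. It additionally assumes that an aborted transaction does not register its versions for later comparisons, which is a choice made for this illustration. The function name and the example block are hypothetical.

```python
def abort_version_mismatches(block):
    """Split a block into kept and early-aborted transactions.

    block: ordered list of (tx_id, read_set), where read_set maps
           key -> version read during simulation.
    Returns (kept, aborted) as lists of transaction ids.
    """
    first_seen = {}  # key -> version read by the first kept transaction
    kept, aborted = [], []
    for tx_id, read_set in block:
        mismatch = any(
            key in first_seen and first_seen[key] != version
            for key, version in read_set.items()
        )
        if mismatch:
            aborted.append(tx_id)  # the later transaction is aborted
            continue
        for key, version in read_set.items():
            first_seen.setdefault(key, version)
        kept.append(tx_id)
    return kept, aborted


# Hypothetical block: T2 read a different version of key "k" than T1.
block = [("T1", {"k": 1}), ("T2", {"k": 2}), ("T3", {"k": 1, "j": 7})]
kept, aborted = abort_version_mismatches(block)
```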
5 Experimental Evaluation
In the previous section, we have extended and modified core components of Fabric in several ways, turning it into Fabric++. It is now time to evaluate the modifications in terms of effectiveness. Primarily, we are interested in the throughput of valid/successful and invalid/failed transactions that make it through the system. Secondarily, we are interested in the influence of certain system configurations and the workload characteristics on the system.
Before starting with the actual experiments, let us discuss the setup. Our cluster consists of six identical servers that are located within the same rack and connected via Gigabit Ethernet. Four machines serve as peers, one machine runs the ordering service, and one machine serves as the client, which fires transaction proposals. Each server consists of two quad-core Intel Xeon CPU E5-2407 (SandyBridge architecture) running at GHz with KB of L1 cache, KB of L2 cache, and MB of a shared L3 cache. GB of DDR3 RAM are attached to each of the two NUMA regions. The operating system used is a 64-bit Arch Linux with kernel version . Fabric is set up to use LevelDB as the current state database.
5.2 Benchmark Framework and Workload
In the database community, there exist numerous established benchmarks that can be used to test and to compare systems, such as TPC-C, TPC-H, or YCSB. Unfortunately, since blockchains are still a relatively young field, there exist only very few benchmarks with standardized workloads.
At first, we looked at the Caliper benchmarking suite, which seemed like a natural candidate, as it is part of the Hyperledger project just like Fabric. It is compatible with Fabric, but comes with a few limitations: First, the framework provides only sample smart contracts and not a real benchmarking workload. Second, for certain metrics such as transactions per second or latency, it remains unspecified how they are actually measured. Third, it supports only a single channel. Apart from these limitations, other researchers have experienced incorrect behavior of Caliper in the form of events that were not properly registered. As a consequence, they released a fork of Caliper named Gauge that claims to resolve these problems. Unfortunately, Gauge is not compatible with version of Fabric right now. Next, we looked at Blockbench, which originates from a survey paper on blockchain systems. While Blockbench actually provides some benchmarking workloads such as YCSB, it again lacks support for Fabric and would need significant changes to make it compatible.
As a consequence of this journey, we decided to build our own benchmarking framework and to introduce a highly customizable workload. This allows us to fire transaction proposals at a specified rate from multiple clients in multiple channels. Our benchmark setup looks as follows: Initially, we create a certain number of accounts (10,000 accounts throughout this section, 20,000 accounts in Appendix C), each represented by a randomly generated account balance. Our workload is formed of a single smart contract that reads and writes an adjustable number of account balances, simulating a typical asset transfer scenario between accounts. Among the accounts, there exist a certain number of hot accounts that are involved in transactions more frequently than the remaining ones. By varying the number of read and write accesses per transaction, the probability of picking hot accounts, and the number of hot accounts, we are able to generate a wide range of different workloads.
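The way such a workload selects accounts can be sketched as follows. This is an illustration under assumptions and not the code of our benchmarking framework: it assumes that the first accounts form the hot set, that reads and writes are drawn independently and with replacement, and all function and parameter names are hypothetical.

```python
import random


def pick_account(rng, num_accounts, num_hot, hot_probability):
    """Pick an account id; ids below num_hot are assumed to be hot."""
    if rng.random() < hot_probability:
        return rng.randrange(num_hot)
    return rng.randrange(num_hot, num_accounts)


def make_transaction(rng, num_accounts=10_000, hot_fraction=0.01,
                     accesses=4, hot_read_prob=0.4, hot_write_prob=0.1):
    """Return the accounts read and written by one generated transaction."""
    num_hot = max(1, int(num_accounts * hot_fraction))
    reads = [pick_account(rng, num_accounts, num_hot, hot_read_prob)
             for _ in range(accesses)]
    writes = [pick_account(rng, num_accounts, num_hot, hot_write_prob)
              for _ in range(accesses)]
    return reads, writes


rng = random.Random(0)  # fixed seed for reproducibility of the sketch
reads, writes = make_transaction(rng)
hot_only, _ = make_transaction(rng, hot_read_prob=1.0)
cold_only, _ = make_transaction(rng, hot_read_prob=0.0)
```

Raising the hot-account probabilities or shrinking the hot set concentrates accesses on fewer accounts and thereby increases the number of conflicts.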
In a single run, we fire a constant stream of transaction proposals that are bound to our smart contract for a certain amount of time at a certain firing rate. In the following, we test numerous different system and workload configurations to identify their impact on the system. In the individual experiments, we will detail the chosen configuration.
5.3 Transactional Throughput
We start our experimental evaluation by testing Fabric and Fabric++ under probably the most important criterion for a transaction processing system, namely the throughput of transactions. We differentiate between successful and failed transactions: a good system should try to maximize the number of successful transactions while keeping the number of failed transactions as small as possible.
| Parameter | Value |
|---|---|
| Fired transaction proposals per second per client | 512 |
| Duration in which transaction proposals are fired | 90 sec |
| Number of channels | 1 |
| Number of clients per channel | 4 |
| Maximum time to form a block | 1 sec |
| Maximum number of keys accessed per block | 16384 |
| Maximum size per block | 2 MB |
| Maximum number of transactions per block (BS) | 256, 512, 1024 |
| Number of account balances | 10000 |
| Number of read & written balances per transaction (RW) | 4, 8 |
| Probability for picking a hot account for reading (HR) | 10%, 20%, 40% |
| Probability for picking a hot account for writing (HW) | 5%, 10% |
| Number of hot account balances (HSS) | 1%, 2%, 4% |
To measure this property, we fire a constant stream of transaction proposals for 90 seconds into a single channel using four clients. Each client fires at a rate of 512 proposals per second. This firing rate is sufficient to fully saturate the system in our setup. Table 7 shows the detailed configuration. To identify their impact on the throughput, we vary five important parameters: the maximum number of transactions per block (BS), the number of read balances and written balances per transaction (RW), the probability for picking a hot account for reading (HR) and for writing (HW), respectively, as well as the number of hot account balances (HSS). In total, we evaluate different configurations in this experiment.
Figure 11 shows the results. First and foremost, we vary the maximum number of transactions per block (BS), as it has a large impact on the transaction processing in general and the ordering in particular. The results for Fabric and Fabric++ for BS=256 are presented in the first and second row, for BS=512 in the third and fourth row, and for BS=1024 in the fifth and sixth row, respectively. Along the columns, we vary the remaining four parameters RW, HR, HW, and HSS in a total of 36 configurations. For a single run, we show the transactional throughput (blue) that was achieved for each second of the 90 second run. This throughput is additionally split into successful transactions (green) and failed transactions (red).
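The size of the parameter grid follows directly from the value lists of Table 7. The short sketch below enumerates it under the assumption that every combination of the listed values is evaluated; the variable names are hypothetical.

```python
from itertools import product

# Parameter values as listed in Table 7.
BS = [256, 512, 1024]      # maximum number of transactions per block
RW = [4, 8]                # read and written balances per transaction
HR = [0.10, 0.20, 0.40]    # probability of a hot account for reading
HW = [0.05, 0.10]          # probability of a hot account for writing
HSS = [0.01, 0.02, 0.04]   # fraction of hot account balances

# Combinations of the four parameters varied along the columns.
per_block_size = list(product(RW, HR, HW, HSS))
# Full grid over all five parameters, assuming all combinations are run.
full_grid = list(product(BS, RW, HR, HW, HSS))

print(len(per_block_size), len(full_grid))
```

The first count matches the 36 configurations per block size shown along the columns of Figure 11.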
To interpret the results, let us look at Figure 11 as a whole. We can see that Fabric++ significantly increases the throughput of successful transactions over Fabric for essentially all tested configurations. For vanilla Fabric, we can observe that under configurations accessing many accounts (RW=8), the number of failed transactions per second is actually significantly higher than the throughput of successful transactions. This problem is greatly reduced by Fabric++, where the successful throughput is at least on par with the failed throughput, or even dominates it. The largest improvement of Fabric++ over Fabric in terms of successful transactions that we observe is around a factor of 3x for the configuration BS=1024, RW=8, HR=40%, HW=10%, HSS=1%, which we also show in a zoomed-in version in Figure 14 of the appendix.
We also observe a significant decrease in the throughput of successful transactions as the hotness of the transactions increases. For large block-sizes (BS ), each block () roughly updates every key in the hotset and a large fraction of the coldset. This forces most of the transactions in block () to abort because of read-write conflicts. As a result, we observe a pattern of committed blocks that alternate between mostly successful and mostly failed transactions. In Fabric, most of the transactions are aborted due to these inter-block conflicts. In addition, due to the large block size, Fabric creates a large number of within-block conflicts, which causes a large fraction of the total number of processed transactions to abort. In Fabric++, we observe a similar alternating behavior in terms of cross-block conflicts. However, since Fabric++ reorders the transactions within the block to remove the within-block conflicts, the number of successful transactions remains on par with the number of failed transactions. We observe that the strength of Fabric++ lies in contended workloads where the hotness has temporal behavior. If, due to this temporal behavior, hot reads and updates end up in the same block, Fabric++ can potentially optimize the order of transactions to extract a large set of transactions that have a chance to commit. In contrast to Fabric++, Fabric behaves similarly for temporal and non-temporal hotness in the workloads, forcing a large fraction of transactions to abort even though they could commit.
Apart from the overall comparison of Fabric and Fabric++, we can analyze the influence of the parameters on the system. A larger block size generally results in a higher throughput. In the case of Fabric++, a larger block size also increases the reordering possibilities of our mechanism. Besides, we can see that a higher number of accesses per transaction results in more failed transactions.
5.4 Optimization Breakdown
In Section 5.3, we measured the throughput of Fabric++ with both optimizations activated. Let us now see, for a sample configuration, how much the individual optimizations of reordering and early abort contribute to the improvement. Figure 12 shows the improvement breakdown for the configuration BS=1024, RW=8, HR=40%, HW=10%, HSS=1% in comparison to standard Fabric. While Fabric achieves only a throughput of around successful transactions per second, activating one of our two optimization techniques alone improves this to around transactions per second. In comparison, activating both techniques at the same time results in the highest throughput of successful transactions with around transactions per second. This shows nicely how both techniques work together: transactions that are already aborted early in the simulation phase do not end up in a block in the ordering phase. As a consequence, only transactions that have a realistic chance of being successful are considered in the reordering process.
5.5 Scaling Channels and Clients
So far, in all experiments we used four clients to fire transactions into a single channel. Let us now vary the number of channels as well as the number of clients per channel to see the effect on the throughput. Again, we use the configuration BS=1024, RW=8, HR=40%, HW=10%, HSS=1% and evaluate the average throughput of successful transactions for Fabric and Fabric++.
First, we vary the number of channels in Figure 13(a) from to . Per channel, we use clients to fire transaction proposals. We can see that when going from channel to channels, the throughput of both Fabric and Fabric++ significantly increases. Evidently, the additional mechanisms of Fabric++ do not harm the scaling with the number of channels. Only when using channels, the throughput decreases again for both Fabric and Fabric++. This is simply because individual channels start competing for resources. This also increases the number of failed transactions: Scaling from to channels increases the number of failed transactions from TPS to TPS for Fabric and from TPS to TPS for Fabric++. Due to the competition for resources, individual simulation phases take longer, which increases the chance of working on stale data.
After varying the number of channels, let us now vary the number of clients per channel in Figure 13(b). We test , , , and clients, where all clients fire their transaction proposals into a single channel. Here, the picture is slightly different from the behavior when scaling channels. The throughput of Fabric increases very gently with the number of clients, and we see an improvement from around to successful transactions per second when going from to clients. For Fabric++, we see the highest throughput with around successful transactions per second already for clients. For clients, the throughput drops by around factor to the throughput of Fabric, clearly showing that the firing clients also compete for resources. This is also visible in an increase in failed transactions when going from to clients per channel, which increase from TPS to TPS for Fabric and from TPS to TPS for Fabric++.
In this work, we identified strong similarities between the transaction pipeline of contemporary blockchain systems, using the example of Hyperledger Fabric, and distributed database systems in general. We analyzed these similarities in detail and exploited them to transfer mature techniques from the context of database systems to Fabric, namely transaction reordering to remove serialization conflicts as well as early abort of transactions that have no chance to commit. In an extensive experimental evaluation, where we tested different configurations of workload and system, we showed that this improved version, Fabric++, outperforms the vanilla Fabric in terms of throughput of successful transactions by up to factor x, while keeping the scaling capabilities intact.
We would like to thank Immanuel Haffner for helping us in setting up the Fabric cluster, running benchmarks, as well as profiling the internals of the system.
- [1] https://github.com/brianfrankcooper/ycsb.
- [2] https://github.com/hyperledger/caliper.
- [3] https://github.com/persistentsystems/gauge.
- [4] https://github.com/persistentsystems/gauge/blob/master/docs/caliper-changes.md.
- [5] TPC-C On-Line Transaction Processing Benchmark http://www.tpc.org/tpcc/.
- [6] TPC-H Decision Support Benchmark http://www.tpc.org/tpch/.
- [7] E. Androulaki, A. Barger, V. Bortnikov, et al. Hyperledger fabric: a distributed operating system for permissioned blockchains. In EuroSys 2018, Porto, Portugal, April 23-26, pages 30:1–30:15, 2018.
- [8] M. Castro and B. Liskov. Practical byzantine fault tolerance. In Third USENIX Symposium on Operating Systems Design and Implementation (OSDI), New Orleans, Louisiana, USA, February 22-25, pages 173–186, 1999.
- [9] T. T. A. Dinh, R. Liu, M. Zhang, G. Chen, B. C. Ooi, and J. Wang. Untangling blockchain: A data processing view of blockchain systems. IEEE Trans. Knowl. Data Eng., 30(7):1366–1385, 2018.
- [10] Z. He and B. Hong. Impact of early abort mechanisms on lock-based software transactional memory. In 16th International Conference on High Performance Computing, HiPC 2009, December 16-19, 2009, Kochi, India, Proceedings, pages 225–234, 2009.
- [11] D. B. Johnson. Finding all the elementary circuits of a directed graph. SIAM J. Comput., 4(1):77–84, 1975.
- [12] P. Larson, S. Blanas, C. Diaconu, C. Freedman, J. M. Patel, and M. Zwilling. High-performance concurrency control mechanisms for main-memory databases. PVLDB, 5(4):298–309, 2011.
- [13] G. Luo, J. F. Naughton, C. J. Ellmann, and M. Watzke. Transaction reordering. Data Knowl. Eng., 69(1):29–49, 2010.
- [14] T. Neumann, T. Mühlbauer, and A. Kemper. Fast serializable multi-version concurrency control for main-memory database systems. In SIGMOD 2015, Melbourne, Victoria, Australia, May 31 - June 4, pages 677–689, 2015.
- [15] A. Sharma, F. M. Schuhknecht, and J. Dittrich. Accelerating analytical processing in MVCC using fine-granular high-frequency virtual snapshotting. In SIGMOD 2018, Houston, TX, USA, June 10-15, 2018, pages 245–258, 2018.
- [16] R. E. Tarjan. Depth-first search and linear graph algorithms. SIAM J. Comput., 1(2):146–160, 1972.
- [17] T. Wang, R. Johnson, A. Fekete, and I. Pandis. Efficiently making (almost) any concurrency control mechanism serializable. The VLDB Journal, 26(4):537–562, Aug 2017.
- [18] T. Wang and H. Kimura. Mostly-optimistic concurrency control for highly contended dynamic workloads on a thousand cores. PVLDB, 10(2):49–60, 2016.
- [19] X. Yu, G. Bezerra, A. Pavlo, S. Devadas, and M. Stonebraker. Staring into the abyss: An evaluation of concurrency control with one thousand cores. PVLDB, 8(3):209–220, 2014.
- [20] B. Zhang, B. Ravindran, and R. Palmieri. Reducing aborts in distributed transactional systems through dependency detection. In Proceedings of the 2015 International Conference on Distributed Computing and Networking, ICDCN 2015, Goa, India, January 4-7, 2015, pages 13:1–13:10, 2015.
- [21] N. Zhou, X. Zhou, X. Zhang, et al. Reordering transaction execution to boost high-frequency trading applications. Data Science and Engineering, 2(4):301–315, 2017.
Appendix A Throughput Timeline
Figure 14 presents the detailed zoom-in of the run for configuration BS=1024, RW=8, HR=40%, HW=10%, HSS=1% for Fabric (Figure 14(a)) and Fabric++ (Figure 14(b)). We can see that the throughput remains very stable over the run of seconds. In the beginning, there is a small ramp-up phase visible, which is actually very interesting. For Fabric, the throughput of successful transactions directly starts very low with only transactions per second. In contrast to that, for Fabric++, the initial throughput of successful transactions almost reaches transactions per second with the number of failed transactions per second being . This shows that for the first block, our reordering mechanism manages to completely resolve all intra-block conflicts. After that, inter-block conflicts can arise which increase the number of failed transactions in any case.
Appendix B Ordering Service Micro-Benchmark 2: Vary the length of cycles
In the following experiment, we want to analyze the impact of cycles on the arrival order and on our reordering mechanism. To do so, we again form a sequence of transactions, that contains cycles of size transactions of the form