FairLedger: A Fair Blockchain Protocol for Financial Institutions

06/10/2019 ∙ by Kfir Lev-Ari, et al. ∙ 0

Financial institutions are currently looking into technologies for permissioned blockchains. A major effort in this direction is Hyperledger, an open source project hosted by the Linux Foundation and backed by a consortium of over a hundred companies. A key component in permissioned blockchain protocols is a byzantine fault tolerant (BFT) consensus engine that orders transactions. However, currently available BFT solutions in Hyperledger (as well as in the literature at large) are inadequate for financial settings; they are not designed to ensure fairness or to tolerate selfish behavior that arises when financial institutions strive to maximize their own profit. We present FairLedger, a permissioned blockchain BFT protocol, which is fair, designed to deal with rational behavior, and, no less important, easy to understand and implement. The secret sauce of our protocol is a new communication abstraction, called detectable all-to-all (DA2A), which allows us to detect participants (byzantine or rational) that deviate from the protocol, and punish them. We implement FairLedger in the Hyperledger open source project, using the Iroha framework, one of the biggest projects therein. To evaluate FairLedger's performance, we also implement it in the PBFT framework and compare the two protocols. Our results show that in failure-free scenarios FairLedger achieves better throughput than both Iroha's implementation and PBFT in wide-area settings.

1 Introduction

As of today, support for financial transactions between institutions is limited, slow, and costly. For example, an overseas money transfer between two banks might take several days and entail fees of tens of dollars. The source of this cost (in terms of both time and money) is the need for a reliable clearing house; sometimes this even requires physical phone calls at the end of the day to make sure all the balances coincide. At the same time, emerging decentralized cryptocurrencies like Bitcoin [42] complete transactions in less than an hour, at a cost of microcents. It is therefore not surprising that financial institutions are looking into newer technologies to bring them up to speed and facilitate trading in today’s global economy.

Perhaps the most prominent technology considered in this context is that of a blockchain, which implements a secure peer-to-peer ledger of financial transactions on top of a consensus engine. A major effort in this direction is Hyperledger [27], an open source project hosted by the Linux Foundation and backed by a consortium of more than a hundred companies. In contrast to cryptocurrency protocols deployed over the Internet, which are fully anonymous and allow any party to join or leave at any time, blockchain protocols for financial institutions, also called permissioned blockchains, are much more conservative: Every participant is known and certified, so that it has to be responsible for its actions in the real world. In addition, such systems are intended to be deployed over a secure and reliable wide-area network (WAN). Therefore, proposed solutions for permissioned blockchains [40, 45, 27] abandon the slow and energy-consuming proof-of-work paradigm of Bitcoin, and tend to go back to more traditional distributed consensus protocols. Because of the high stakes, malicious deviations from the protocol (due to bugs or attacks), as rare as they might be, should never compromise the service. Such deviations are modeled as byzantine faults [35], and to deal with them, proposed solutions use byzantine fault tolerant (BFT) consensus protocols.

Yet, dealing with byzantine failures is only a small part of what is required in permissioned blockchains. In fact, a break-in that causes a bank’s software to behave maliciously is so unusual that it is a top news story, and is investigated by official authorities such as the FBI. On the other hand, financial institutions always try to maximize their own profit, and would never use a system that discriminates against them. Moreover, they can be expected to selfishly deviate from the protocol whenever they can benefit from doing so. In particular, financial entities typically receive a fee for every transaction they append to the ledger, and thus can be expected to attempt to game the system in a way that maximizes the rate of their transactions in the ledger. Such rational behavior, if not carefully considered, can not only discriminate against some of the entities, but may also compromise safety.

As a result, in the FinTech context, one faces a number of important challenges that were not emphasized in previous BFT work: (1) fairness in terms of the opportunities each participant gets to append transactions to the ledger; (2) expected rational behavior by all parties; and (3) optimized failure-free performance, given that financial institutions are usually very secure. In addition, it is important to stress (4) protocol simplicity, because complex protocols are inherently bug-prone and easier to attack. In this work we develop FairLedger, a new BFT permissioned blockchain protocol for the Hyperledger framework, which addresses all of these challenges. Our protocol is fair, designed for rational participants, optimized for the failure-free case, simple to understand, and easy to implement. Specifically, we show that following the protocol is an equilibrium, and when rational participants do follow the protocol, they all get perfectly fair shares of the ledger.

Given that byzantine failures are expected to be rare, our philosophy is to optimize for the “normal mode” when they do not occur (as also emphasized in some previous work, e.g., Zyzzyva [32]). For this mode, we design a simple protocol that provides high performance when all parties are rational but not byzantine. Under byzantine failures, the normal mode protocol remains safe and fair, but does not necessarily guarantee progress. Upon detecting that a rogue participant is attempting to prevent progress, we switch to the “alert mode”. At this point, it is expected that real-world authorities (such as the FBI or Interpol) will step in and investigate the break-in. But such an investigation may take days to complete, and in the meantime, the service remains operational using the alert mode protocol, even if at degraded performance.

An important lesson learned from the deployment of Paxos-like protocols in real systems such as ZooKeeper [31] and etcd [18] is that systems will only be used if they are easy to understand, implement, and maintain. Specifically, Paxos-like protocols use a quorum to agree on every transaction appended to the ledger. Whereas the general Paxos [33] allows using a different quorum for every transaction, practical systems do away with this freedom. Instead, they follow the Vertical Paxos [34, 4] approach of using a fixed quorum for a sequence of transactions, and reconfiguring it upon failures [31, 18]. We follow this approach in FairLedger. Specifically, we designate a committee of participants who are interested in issuing transactions (e.g., banks) and have them run a sequencing protocol to order all their transactions. A complementary master service monitors the committee’s progress and initiates reconfiguration when needed. Including all interested parties in quorums is also instrumental in achieving fairness: this way, all the committee members benefit from sequencing batches that include transactions by all of them. We use rate adjustments, batching, and asynchronous broadcast to achieve high throughput even if some committee members are slow.

In the absence of failures, the committee runs an efficient normal mode sequencing protocol. In this mode, byzantine participants cannot violate safety but may prevent progress, causing the master to switch the system to the alert mode. We assume a loosely synchronous model, where a master can use a coarse time bound (e.g., one minute) to detect lack of progress. This bound is only used for failure recovery, and does not otherwise affect performance. The key feature of our alert mode sequencing protocol is that if participants deviate from the protocol in a way that jeopardizes progress, they are accurately detected. If they are slow, their transaction rate is lowered, and if they are not cooperative, they are removed from the committee altogether. Unlike in other Hyperledger protocols [45], FairLedger never indicts correct participants. Identifying faulty components without accusing correct ones is essential in allowing the system to heal itself following attacks.

The sequencing protocol uses all-to-all communication among committee members. Since the quorum includes all participants and all messages are signed, the protocol ensures safety despite byzantine failures of almost any minority. Specifically, for $f$ failures, our protocol is correct when the number of participants $n \geq 2f + 3$.

Nevertheless, it is enough for one participant to withhold a single message in order to prevent progress. Such a deviation from the protocol is hard to detect even with reliable communication since one participant can claim that it sent a message to another, while the recipient claims the message was not sent. To deal with such deviations we define a new communication abstraction, which we call detectable all-to-all (DA2A). Besides the standard broadcast and deliver API, DA2A exposes a detect method that returns an accurate set of participants that deviated from the broadcast or deliver protocols.

We implement FairLedger’s sequencing protocol in Iroha [45], which is part of the Hyperledger [27] open-source project, and compare its performance to Iroha’s up-to-date version. Specifically, since Iroha’s implementation is modular, we are able to replace its BFT consensus protocol (which is based on [22]) with our sequencing protocol without changing other components (e.g., communication, cryptographic, and database libraries). We compare both versions in Emulab [49], configured to simulate a wide-area network, and our results show that FairLedger outperforms Iroha’s BFT protocol in the vast majority of the tested scenarios (both in normal mode and in alert mode).

In addition, since the Iroha system consists of many components (e.g., gRPC [29] communication) that may include overheads and bottlenecks, we also implement FairLedger’s sequencing protocol in the PBFT [16] framework, which provides a clean environment to check performance. Again, we use Emulab [49], configured to simulate a wide-area network, to compare FairLedger to the steady state protocol of PBFT [15]. Our results show that FairLedger’s latency is better than PBFT’s in both the normal and alert modes. FairLedger’s throughput exceeds PBFT’s in normal mode and is inferior to it in the alert mode, but PBFT’s advantage diminishes as the system scale grows.

In summary, this paper makes the following contributions:

  1. We define a fair distributed ledger abstraction for rational participants.

  2. We define a detectable all-to-all (DA2A) abstraction that identifies participants deviating from the communication protocol.

  3. We design FairLedger, the first BFT blockchain protocol that ensures strong fairness when all participants are rational: FairLedger is safe under byzantine failures of almost any minority, and detects and punishes deviating (byzantine and rational) participants. It is also simple to understand and implement.

  4. We implement FairLedger’s sequencing protocol in the Hyperledger framework and substitute it for Iroha, a ledger solution included therein. Our results show that FairLedger outperforms Iroha’s original BFT implementation in the vast majority of cases.

  5. We implement and test FairLedger’s sequencing protocol in the PBFT framework. Our results show that FairLedger outperforms PBFT in the normal mode, and achieves slightly lower results in the alert mode.

The rest of the paper is organized as follows: Section 2 defines rational participants and the fair ledger service, while Section 3 details our system model. In Section 4 we present our architecture, and in Section 5 we give the FairLedger protocol. In Section 6, we describe the implementation in the Hyperledger and PBFT frameworks, and in Section 7 we evaluate our protocol in both of them. Finally, Section 8 discusses related work, and Section 9 concludes the paper.

2 A Fair Ledger Abstraction for Rational Players

We consider a set of players, each of which represents a real-world financial entity (e.g., a bank), jointly attempting to agree on a shared ledger of financial transactions. Every player has an unbounded stream of transactions that it wants to append to the common ledger. The stream, for example, can come from the entity’s clients. We assume that the financial entity receives a fee (or some other benefit) for every transaction it appends to the shared ledger, and so it is motivated to append as many transactions as possible. A principal goal for our service is fairness, that is, providing entities with equal opportunity for appending transactions. Section 2.1 discusses how we model the players, and Section 2.2 defines the fair shared ledger abstraction.

2.1 Byzantine and rational behavior

Traditional distributed systems are usually managed by one organization, and thus whenever an entity deviates from the protocol, it can be explained as a software or hardware bug or by this entity being hacked. Therefore, protocols for such environments are designed to remain correct even if some entities deviate from the protocol in an arbitrary manner. Such protocols are called byzantine fault tolerant, and obviously only a small subset of the entities are allowed to be byzantine. But, since in this work we seek a protocol that coordinates among many organizations, and especially because financial assets are involved, we have to take into account that every entity may behave rationally, and deviate from the protocol if doing so increases its benefit.

To reason about such rational behavior we follow [41], and assume that each entity can be either byzantine or rational. A rational entity has a known utility function that it tries to maximize and deviates from the protocol only if this increases its utility, whereas a byzantine entity can deviate arbitrarily from the protocol, i.e., its utility function is unknown.

We assume that the system involves two types of entities – players and auditors. Players (e.g., banks) propose transactions they would like to append to the ledger, while auditors oversee the system. The same physical entity may be both a player and an auditor, but additional entities (e.g., government central banks) may act as auditors as well. There are initially $n$ players, and any number of auditors. The number of byzantine players is bounded by a known parameter $f$, and we require $n \geq 2f + 3$. In addition, at most a minority of the auditors can be byzantine. We assume that byzantine entities can collude, but rational ones do not.

In order to prove that a protocol is correct in our model, we need to show that (1) the problem specification is satisfied in case all the rational entities follow the protocol and there are at most $f$ byzantine ones, and (2) following the protocol is an equilibrium for rational entities even in the presence of byzantine ones. These two conditions imply the protocol’s correctness assuming that players do not deviate unless they benefit from doing so. A similar assumption was made in previous works on BAR (byzantine, altruistic, rational) fault tolerance [5, 36].

2.2 Distributed fair ledger

A log is a sequence of transactions from some domain $T$. Note that $T$ is defined by the high-level application that uses our ledger abstraction. A ledger is an abstract object that maintains a log (initially empty) and supports two operations, append and read, with the following sequential specification: An append($t$) operation with $t \in T$ changes the state of the log by appending $t$ to the end of the log. A read($l$) operation returns the last $l$ transactions in the log.
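
The sequential specification can be illustrated with a minimal Python sketch. The class name `Ledger`, the parameter name `count`, and the behavior of `read` on a short or empty log (returning whatever is available) are assumptions made for illustration; they are not part of the FairLedger specification.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")  # transaction domain, chosen by the application


class Ledger(Generic[T]):
    """Sequential (single-threaded) model of the abstract ledger."""

    def __init__(self) -> None:
        self._log: List[T] = []  # the log is initially empty

    def append(self, tx: T) -> None:
        # append changes the state by adding tx to the end of the log
        self._log.append(tx)

    def read(self, count: int) -> List[T]:
        # read returns the last `count` transactions
        # (assumption: fewer are returned if the log is shorter)
        if count <= 0:
            return []
        return self._log[-count:]


ledger: Ledger[str] = Ledger()
for tx in ["a->b:5", "b->c:2", "c->a:1"]:
    ledger.append(tx)
recent = ledger.read(2)
```

The distributed protocol described later emulates this object for concurrent players; the sketch only captures the sequential behavior that the emulation must preserve.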

The utility function of a rational player is the ratio of transactions that it appends to the ledger, i.e., the number of transactions it appends to the ledger out of the total number of transactions in the ledger. If two ledgers have the same ratio, then the one with more transactions is preferred. That is, players care about the overall system progress, but they care more about getting a fair share of it.
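
One way to read this preference is as a lexicographic comparison: first by share, then by ledger length. The sketch below encodes that reading. Representing a ledger as a list of the player ids that appended each entry, and giving an empty ledger a share of zero, are assumptions made for illustration.

```python
from fractions import Fraction
from typing import List, Tuple


def utility(log_owners: List[str], player: str) -> Tuple[Fraction, int]:
    """Return (share, total); Python compares tuples lexicographically."""
    total = len(log_owners)
    mine = sum(1 for owner in log_owners if owner == player)
    # Assumption: an empty ledger gives a share of zero.
    share = Fraction(mine, total) if total else Fraction(0)
    return (share, total)


def prefers(log_a: List[str], log_b: List[str], player: str) -> bool:
    """True if `player` strictly prefers log_a over log_b."""
    return utility(log_a, player) > utility(log_b, player)


# Same share (1/2), but the first ledger is longer, so it is preferred.
longer_wins = prefers(["p", "q", "p", "q"], ["p", "q"], "p")
# A larger share beats a longer ledger with a smaller share.
share_wins = prefers(["p", "q"], ["p", "q", "q", "q"], "p")
```

Using exact fractions avoids floating-point ties being broken incorrectly when two ledgers have the same ratio.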

The utility function of an auditor is the following: an auditor that is also a player has the player’s utility function. Otherwise, its utility function is the number of players on the committee in case progress is being made, and 0 in case the system stalls. In other words, the auditors’ aim is to ensure the system’s overall health, which means not removing a player unless it causes the system to stall.

We enforce strict fairness. Intuitively, this means that every player gets an equal number of opportunities to append a transaction to the log. Thus, if player follows the protocol, then at any point when the log contains transactions appended by , the log does not contain more than transactions appended by any other player. In Section 3 we formalize and extend this definition to a case in which different players are allocated different shares of the log, and these shares (as well as the set of players) may change over time.

A distributed ledger protocol emulates an abstract ledger with atomic operations to a set of players that access it concurrently. The shared ledger state at any time (1) reflects all completed operations by players that follow the protocol, and (2) may or may not reflect pending (not completed) operations as well as operations performed by players (byzantine or rational) that deviate from the protocol. Intuitively, this means that the protocol tolerates players that do not cooperate by restricting the possible outcomes of their behavior to executing correct operations or leaving the log’s state unchanged.

3 System Model

We now state our assumptions on the deployment environment of our protocol.

Certificates.

We assume that players have been certified by some trusted certification authority (CA) known to all players. In addition, we assume a PKI [44]: each player has a unique pair of public and private cryptographic keys, where the public keys are known to all players, and no coalition of players has enough computational power to unravel other players’ private keys.

Reliable communication.

We assume reliable communication channels (implemented, e.g., using TCP or using retransmissions over UDP) between pairs of players. Such channels are not strictly required among all pairs, but there must be at least players that communicate reliably with all others.

Timing assumptions.

As in previous works on permissioned blockchains [45, 22, 27], we assume that there is a known upper bound on message latency. Nevertheless, our sequencing protocol is safe and fair even if the bound does not hold. We exploit this bound to detect failures when the protocol stalls because a byzantine (or rational) player deviates from the protocol by withholding messages. Thus, the bound can be set very conservatively (e.g., in the order of minutes) so as to avoid false detection.

Rational and byzantine behavior.

We assume that rational entities do not collude, but byzantine players are controlled by a strong adversary, and thus can arbitrarily deviate from the protocol (e.g., crash, withhold messages, or send incorrect protocol messages) and collude. Because we assume synchrony and a PKI, we can overcome byzantine failures of almost any minority [20].

Quality of service.

Above, we gave a simplistic definition of fairness assuming all players are allowed to append transactions at the same rate. However, this does not necessarily have to be the case. For example, slow players sometimes cannot sustain the throughput required by others, and thus by insisting on strict fairness we decrease the total system throughput. In addition, it is possible that some players deserve more throughput, e.g., because they pay more for the service. Therefore, we generalize our fairness definition to allow general quality of service (QoS) allocations. Because QoS allocations may change over time, we define QoS-fairness for segments of the ledger. Denote by the segment of the ledger from the i to the j entry (inclusive).

Definition 1 (QoS).

Given a tuple s.t.  and , we say that the segment of a (sequential) ledger is R-fair if for every player that follows the protocol, the number of transactions in that were appended by is at least .

Note that the ledger fairness definition from the previous section coincides with the R-fair ledger definition when for every , .

4 System Architecture

Our goal in this paper is to design a ledger protocol that financial institutions will be able to use. Such a protocol, besides being fair, secure against malicious attacks, and resilient to selfish behavior, must be simple to understand, implement, and maintain. Therefore, although we appreciate, from both theoretical and practical perspectives, complex protocols with many corner cases and clever optimizations, we try here to keep the design as simple as possible. The simple design not only reduces vulnerabilities, it also makes it much easier to reason about selfish behavior.

Committee and master.

We adopt the Vertical Paxos [34, 4] paradigm, which unlike the original Paxos protocol does not allow different quorums to be used for different transactions. Instead, there is a single (known to all) quorum, called committee, which partakes in agreeing on every transaction. Initially, the committee consists of all players. By requiring all committee members to endorse transactions, we create an incentive for all of them to sequence batches of transactions from all of them. To handle cases when committee members stop responding (e.g., due to a crash or an attack), a complementary master service performs reconfiguration: detecting such members and removing (or replacing) them. Thus, we logically implement two components: (1) a committee of players that runs the sequencing protocol to append transactions to the ledger, and (2) a master, which is responsible for progress and determines the QoS allocations; see Figure 1. The master is implemented by auditors using a minority-resilient synchronous BFT protocol [20]. Its impact on overall system performance is small, and so we do not focus on distributed implementation of the master in this paper. For our purposes, the master is a single trusted authority.

The committee’s sequencing protocol is implemented on top of a communication primitive we call detectable all-to-all (DA2A), as explained shortly. The sequencing protocol is safe and fair even if almost any minority of players are byzantine, but may stall even if only one player deviates from the protocol.

The role of the master is to monitor the protocol and detect players that prevent progress. When lack of progress is observed, the master runs a recovery protocol to reconfigure the system: it removes byzantine players, possibly adds new players, and punishes slow or selfish players by reducing their ratio in the ledger (i.e., changes the QoS allocation).

Figure 1: System architecture. The sequencing protocol is run by a committee using a detectable all-to-all (DA2A) service and is monitored by a master.

QoS adjustment.

In addition to forming the committee, the master also determines the QoS that should be enforced by it. Every time the master reconfigures the system, it provides a new vector $R$ that represents the ratio each committee member should get in the ledger. The portion of the log decided by the new committee satisfies the QoS-fairness requirement with respect to $R$.

The initial value of the QoS allocations can be chosen based on real-world contracts among the financial institutions, or by their available throughput or payment. Subsequently, the master’s authority to modify the QoS enforced by the protocol empowers it both to ensure that rational players follow the protocol and to adjust the bandwidth allocations to player capabilities: Whenever the master detects a player that sends messages at a low rate or deviates from the protocol, it immediately reduces the ratio of transactions that player gets. A rational player, whose utility function is the ratio of transactions it appends to the log, will prefer to collaborate in fear of such punishment.

Detectable byzantine broadcast.

The master’s ability to use the punishment mechanism as well as to evict byzantine players relies on its ability to detect deviations from the protocol. We divide the possible deviations into two categories: active and passive. An active deviation occurs when a player tries to break consistency or fairness by sending messages that do not coincide with the protocol. By signing all messages with private keys, we achieve non-repudiation, i.e., messages can be linked to their senders and provide evidence of misbehavior, which the master can use to detect deviation from the protocol.

Passive deviation, which stalls the protocol by withholding messages, is much harder to detect. Even a single player can stop our sequencing protocol’s progress by simply not sending messages. Accurately detecting passive deviation is impossible in asynchronous systems, and is not an easy task even in synchronous systems with reliable communication. For example, if the protocol hangs waiting for $p_i$ to take an action following a message it expects from $p_j$, we cannot, in general, know if $p_j$ is the culprit (because it never sent a message to $p_i$) or $p_i$ is at fault.

To address this problem we present a new broadcast abstraction, which we call detectable all-to-all (DA2A), and use it in our sequencing protocol.

Definition 2 (DA2A).

Consider $n$ players and a master. The API of DA2A supports broadcast(m) and deliver(m) operations for the players, and a detect() operation for the master. Every player invokes broadcast(m) for some message $m$ that all the other players should deliver. The detect() operation performed by the master returns a set $S$ of players that deviate from the protocol together with corresponding proofs; for every two players $p_i, p_j$ s.t. $p_i$ does not deliver a message from $p_j$, $S$ contains $p_j$ (with a proof of $p_j$’s deviation) in case $p_j$ did not perform broadcast(m) properly, and otherwise, it contains $p_i$ (with a proof of $p_i$’s deviation).

Note that in case $S$ is empty, all the players follow the protocol, meaning that all the players broadcast a message and deliver messages broadcast by all other players. Clearly, implementing DA2A, and in particular its detect() method, requires an upper bound on message latency and a correct majority. We present an implementation under these assumptions in the next section.

5 FairLedger Protocol

We start by presenting our detectable all-to-all building block in Section 5.1. Then, we describe how we use it for our sequencing protocol in Section 5.2, and for the recovery protocol in Section 5.3. Finally, in Section 5.4, we give correctness arguments.

5.1 Detectable all-to-all (DA2A)

Communication patterns.

We start by discussing two ways to implement all-to-all communication over reliable links. The simplest way to do so is direct all-to-all, in which broadcast(m) simply sends message $m$ to all other players (see Figure 2(a)). This implementation has the optimal cost of 1 hop and $O(n^2)$ messages, but cannot reveal any information about passive deviations: In case $p_i$ does not deliver any message from $p_j$, the master has no way of knowing whether $p_j$ did not send a message to $p_i$, or $p_i$ is lying about not receiving the message.

Another way of implementing all-to-all communication is by using a subset of the players as relays. We call this approach relayed all-to-all. In this approach, a broadcast(m) operation sends $m$ to all the players, and when a relay receives a message for the first time, it forwards it to all players (see Figure 2(b)). For $r$ relays this requires 1-2 hops, depending on whether any of the players are byzantine, and $O(r n^2)$ messages.

Figure 2: All-to-all communication patterns: (a) direct all-to-all; (b) relayed all-to-all.

Note that when using relays, it is possible to have players send their messages only to the relays. This induces lower overhead but takes longer in case all players cooperate. This approach may be used when direct all-to-all communication is not feasible. For example, in case the system is deployed on top of private physical links, such links might not necessarily exist among all pairs of players. Similarly, note that the relayed communication does not necessarily have worse latency than direct all-to-all, since the latter depends on the slowest link, while in relayed communication we can pick the relays with the fastest links.

Detectability.

Obviously, we cannot implement detectable all-to-all (DA2A) broadcast with direct all-to-all communication. On the other hand, if we assume that all non-byzantine players follow the protocol, we can perfectly detect a guilty party by using relays. A detect operation by the master waits for the message-latency bound to elapse, to make sure that all messages that were sent have arrived, and then for every two players $p_i$ and $p_j$ s.t. $p_i$ does not deliver a message from $p_j$, it asks the relays whether they received a message from $p_j$. The relays’ replies are signed and used as proof of a deviation. In case $f+1$ relays say yes, then at least one correct relay received a message from $p_j$ and sent it to $p_i$, meaning that $p_i$ received it (recall that we assume reliable communication) and deviated from the protocol by not delivering it. Otherwise, $p_j$ did not send a message to all relays, meaning that $p_j$ deviated from the broadcast protocol. Thus, either way, the master can detect the faulty player. It is important to notice that the detection is accurate: it has no false positives, and finds the player responsible for every message omission.
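
The decision rule in this paragraph can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function signature, the data shapes, and the reading of the threshold as "at least f + 1 confirming relays" are assumptions. Signatures and proofs are omitted, and the function is assumed to run only after the latency bound has elapsed.

```python
from typing import Dict, List, Set, Tuple


def detect(
    missing: List[Tuple[str, str]],
    relay_reports: Dict[str, Set[str]],
    f: int,
) -> Set[str]:
    """Blame one player for every undelivered message.

    missing: (receiver, sender) pairs where receiver did not deliver
             sender's message.
    relay_reports: relay id -> senders whose message that relay says it got.
    f: assumed bound on the number of byzantine players.
    """
    culprits: Set[str] = set()
    for receiver, sender in missing:
        confirmations = sum(
            1 for seen in relay_reports.values() if sender in seen
        )
        if confirmations >= f + 1:
            # At least one correct relay forwarded the message,
            # so the receiver must have received it.
            culprits.add(receiver)
        else:
            # The sender did not reach all relays.
            culprits.add(sender)
    return culprits


reports = {"r1": {"b"}, "r2": {"b"}, "r3": set()}
blamed_receiver = detect([("a", "b")], reports, f=1)
blamed_sender = detect([("a", "c")], reports, f=1)
```

In the first call two relays confirm receipt from `b`, so receiver `a` is blamed; in the second no relay heard from `c`, so sender `c` is blamed.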

In order to prove that following the protocol is a Nash equilibrium for rational players, we need the detect method to tolerate one more possible deviation by a non-byzantine player; that is, we need to accurately detect passive deviations that stall progress even if $f+1$ players deviate from the protocol. Note that when a progress problem is caused by a player $p_i$ failing to deliver a message broadcast by player $p_j$, we know that at least one of them deviates from the protocol. Thus, at most $f$ of the remaining players may deviate. Therefore, it is enough to pick $2f+1$ players different from $p_i$ and $p_j$ to be the relays in order to identify the culprit in case the problem is not solved. Note that this is always possible since we assume $n \geq 2f + 3$.

Practical deployment.

Since we assume byzantine failures are very rare, a practical strategy is to employ direct all-to-all communication (in case it is feasible) for as long as there is progress. In case direct communication among players is unavailable or slow, we can use any number of relays (e.g., one relay). We call this the normal mode. In case of a progress problem, we switch to the degraded alert mode, with relays. If the progress problem is not resolved, it is guaranteed that the master detects the misbehaving players, and replaces them. At this point, we can switch the system back to the normal mode. Note, however, that byzantine players can avoid detection by behaving properly in the alert mode. They can thus force the system to stay in this mode and continue to send more messages (by using relays), but do not compromise progress. In the meantime, an official external authority (e.g., FBI or Interpol) can investigate the security breach to find the misbehaving component.
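
The mode switching described above can be summarized as a two-state machine. The sketch below is one possible reading; the names `Mode`, `next_mode`, `progress`, and `culprits_replaced` are hypothetical and do not appear in the paper.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"  # direct all-to-all (or a few relays)
    ALERT = "alert"    # degraded mode with relays


def next_mode(mode: Mode, progress: bool, culprits_replaced: bool) -> Mode:
    """Return the mode for the next monitoring period."""
    if mode is Mode.NORMAL:
        # Stay in normal mode for as long as there is progress.
        return Mode.NORMAL if progress else Mode.ALERT
    # In alert mode, return to normal only after the master has detected
    # and replaced the misbehaving players. If byzantine players behave
    # properly here, nobody is detected and the system stays in alert mode.
    return Mode.NORMAL if culprits_replaced else Mode.ALERT


after_stall = next_mode(Mode.NORMAL, progress=False, culprits_replaced=False)
after_hiding = next_mode(Mode.ALERT, progress=True, culprits_replaced=False)
after_recovery = next_mode(Mode.ALERT, progress=False, culprits_replaced=True)
```

The second transition captures the case noted in the text: byzantine players that cooperate in alert mode keep the system there, at higher message cost but without blocking progress.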

5.2 Sequencing protocol

The sequencing protocol works in epochs, where in each epoch every participating player gets an opportunity to append one transaction or one fixed-size batch of transactions to the log. The key mechanism to ensure fairness is to commit all the epoch’s transactions to the log atomically (all or nothing). Since we assume that players have infinite streams of transactions they wish to clear, they always have enough transactions to append. And if not, they can always append an empty (dummy) transaction.

An append operation locally buffers the transaction for inclusion in an ensuing epoch, and waits for it to be sequenced. Each epoch consists of three DA2A communication rounds (see Figure 3) among players participating in the current epoch, proceeding as follows:

  1. Broadcast a transaction or batch to all; upon receiving transactions from all (including self), order received transactions by some deterministic rule and sign the hash of the sequence.

  2. Broadcast the signed hash to all; receive the signed hashes from all and verify that all players signed the same hash.

  3. Broadcast a (signed) commit message to all; return upon receiving the same commit message f+1 times.
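The three rounds above can be sketched in Python as a single-process simulation. This is an illustration under stated assumptions rather than the paper's implementation: the deterministic rule (sorting by sender identifier), the use of SHA-256, the `views` data layout, and the f+1 commit threshold are choices made here, and signatures and networking are omitted.

```python
import hashlib


def order_transactions(received):
    """Deterministic ordering rule (assumed here): sort by sender identifier."""
    return [received[sender] for sender in sorted(received)]


def sequence_hash(ordered):
    """Hash an ordered sequence of transactions, length-prefixing each one."""
    digest = hashlib.sha256()
    for tx in ordered:
        digest.update(len(tx).to_bytes(4, "big"))
        digest.update(tx)
    return digest.hexdigest()


def run_epoch(views, f):
    """Simulate one epoch; views maps each player to the transactions it received."""
    # Round 1: every player orders what it received and hashes the sequence.
    hashes = {p: sequence_hash(order_transactions(v)) for p, v in views.items()}
    # Round 2: all players must have signed the same hash.
    if len(set(hashes.values())) != 1:
        raise ValueError("hash mismatch: complain to the master")
    agreed = next(iter(hashes.values()))
    # Round 3: commit once the same commit message is seen f+1 times.
    commits = sum(1 for h in hashes.values() if h == agreed)
    if commits < f + 1:
        raise ValueError("not enough commit messages")
    return order_transactions(next(iter(views.values()))), agreed
```

If any player's view differs from the others (for example, because a sender signed two different transactions), the round 2 check fails and the epoch is not committed, which mirrors the all-or-nothing behavior described above.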

The sequencing protocol is described in Algorithm 1. For clarity, we do not include signature manipulation although all the messages are signed and verified; we also present a version where the QoS allocation is equal for all players.

1: Local state:  
2:      a set of players
3:     , initially,
4:      queue of new transactions for append calls
5:      sequence of triples , where is a      sequence of transactions, is a set of signed hashes, and      is a set of signed commit messages, initially empty.
6:
7: Sequencing protocol
8:
9:while true do
10: round 1
11:     broadcast() to committee
12:     collect responses in
13:     wait until
14:     order by some predefined function
15:     
16: round 2
17:     broadcast() to committee
18:     collect responses in
19:     wait until
20:     if not all the hashes in are the same then
21:          complain to the master active deviation detected
22:          wait for a message from the master      
23: round 3
24:     broadcast() to committee
25:     collect responses in
26:     wait until
27:      move to next epoch
28:     
29:
30: Reconfiguration
31:upon receiving  from the master do
32:     send to the master
33:     wait for message from the master
34:     
35:     
36:     if  then
37:          empty
38:     else
39:          
40:          
41:                
42:     continue
Algorithm 1 FairLedger committee member pseudocode.

The purpose of the first round is to broadcast all the transactions of the epoch. The second round ensures safety; at the end of this round each player validates that all other players signed the same hash of transactions, meaning that only this hash can be committed in the current epoch. The last round ensures recoverability during reconfiguration as we explain in Section 5.3 below. Note that we achieve fairness by waiting for all players; an epoch is committed only if all the players sign the same hash, and since each player signs a hash that contains its own transaction, we get that either all the players’ transactions appear in the epoch, or the epoch is not committed.

Read operations.

Since all players make progress together, they all have up-to-date local copies of the ledger. A read operation simply returns the last committed transactions in the local ledger, where for every returned sequence of transactions pertaining to some epoch k, it attaches a proof that the sequence was committed. We need the attached proof in order to make sure byzantine players do not lie about committed transactions. The proof is either (1) a message from the master that includes the sequence (more details below), or (2) f+1 round 3 messages of epoch k, each of which contains a hash of the sequence.

Figure 3: Sequencing protocol.

Supporting quality of service.

To support non-uniform quality of service we include different batch sizes from different players in each epoch. For example, with three players and a QoS vector that allots the first player twice the share of each of the other two, the first player appends a batch of two transactions in every epoch, whereas the other two players each append one transaction.
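One way to turn a QoS allocation into per-epoch batch sizes is sketched below. The representation of the allocation as a dictionary of fractional shares and the function name `batch_sizes` are assumptions made for this example; the paper does not prescribe a data format.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def batch_sizes(shares):
    """Return the smallest integer batch sizes that keep the given ratio.

    shares maps each player to its share of the ledger (hypothetical format),
    e.g. Fraction(1, 2) for a player entitled to half of the transactions.
    """
    exact = {p: Fraction(s) for p, s in shares.items()}
    # Scale all shares by the least common multiple of their denominators.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (share.denominator for share in exact.values()), 1)
    counts = {p: int(share * scale) for p, share in exact.items()}
    # Divide out any common factor so that batches are as small as possible.
    common = reduce(gcd, counts.values())
    return {p: count // common for p, count in counts.items()}
```

With shares of one half, one quarter, and one quarter, this yields batches of two, one, and one transactions per epoch, which matches the example above.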

Asynchronous broadcast.

The first round of our sequencing protocol exchanges transactions (data), the second round exchanges hashes of the transactions (meta-data), and the last round exchanges commit messages (meta-data). Hence, the first round consumes most of the bandwidth. In order to increase throughput, we decouple data from meta-data and asynchronously broadcast transactions (i.e., execute the first round) of every epoch as soon as possible. However, in order to be able to validate transactions, we perform rounds 2 and 3 sequentially.

In other words, we divide our communication into a data path and a meta-data path, where the data path is out-of-order and the meta-data path orders the data. This is a common approach, used, for example, in atomic broadcast algorithms that use reliable broadcast to exchange messages and a consensus engine to order them [19, 12].

5.3 Recovery

Since we use a PKI, proving active deviations is easy, and every time a player sees evidence for active deviation, it sends it to the master. One example appears in Algorithm 1 line 21: a player gets two different hashes (corresponding to different sequences of transactions) in the second round, in which case, to ensure correctness, it cannot move on to round three. Instead, it complains to the master and waits for reconfiguration. When the master receives both hashes it checks which of the players signed two different transactions in the first round and issues a reconfiguration to remove that player. Other active deviations, e.g., incorrect message formats, are handled in a similar manner; for simplicity we omit this from the code, and focus only on processing of correctly-formatted messages.

The master’s protocol is described in Algorithm 2. To detect passive deviations that prevent progress, we use the detect operation exposed by the DA2A abstraction. The sequencing protocol is simply an infinite sequence of DA2A instances. Therefore, the master sequentially invokes detect (Algorithm 2, line 6) for all epochs, until it returns a non-empty set indicating that the sequencing protocol is stalled, in which case the master invokes reconfigure (Algorithm 2, lines 10-19).

First, it stops the current configuration and learns its closing state by sending a reconfiguration message to the current committee. To prove to the players on the committee that a reconfiguration is indeed necessary, the master attaches to the message a proof that the reconfiguration is warranted. This can be evidence of an active deviation, or a proof of a passive deviation returned from the detect method of the DA2A. It can also be a real-world contract (signed by a CA) adding a new player or increasing a player’s ratio. When a player receives such a message (Algorithm 1, lines 31-42), it validates the proof for the reconfiguration, sends its local state (ledger) to the master, and waits for a reply from the master. When a player receives the reply with a new configuration, it validates that every player addition or removal is justified by a proof, and ignores requests that do not have a valid proof.

State transfer.

Note that while a byzantine player cannot make the master believe that an uncommitted epoch was committed (a committed epoch must be signed by all the epoch’s players), it can omit a committed epoch when asked (by the master) about its local state. Such behavior, if not addressed, could potentially lead to a safety violation: suppose that some byzantine player does not broadcast its last message in the third round of epoch k, but delivers messages from all other players. In this case, the byzantine player has proof that epoch k is committed, and may return these transactions in response to a read. However, no other player has proof that epoch k is committed, and the byzantine player withholds epoch k’s commit from the master. In this case, the new configuration will commit different transactions in epoch k, which will lead to a safety violation when a read operation is performed.

The third round of the epoch is used to overcome this potential problem. If the master observes that some player receives all messages in the second round of epoch k (Algorithm 2, line 15), it concludes that some byzantine player may have committed this epoch. Therefore, in this case, the master includes epoch k in the closing state. Since the private keys of byzantine players are unavailable to the master, it signs the epoch with its own private key, and sends it to all players in the new configuration (committee) as the opening state. A player that sees an epoch with the master’s signature refers to it as if it is signed by all players. (Recall that the master is a trusted entity, emulated by a BFT protocol.)
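The master's decision on whether the last epoch belongs in the closing state can be sketched as follows. This is a simplified illustration: the `replies` layout and the function name are hypothetical, and verification of the signatures on the reported round 2 messages is omitted.

```python
def epoch_to_close(replies, committee):
    """Return the hash of the last epoch if it must be included in the closing state.

    replies maps each responding player to the round 2 messages it reports
    for the epoch, as a dictionary from sender to signed hash (hypothetical
    format). The epoch is included if some player holds round 2 messages
    from the whole committee that all carry the same hash.
    """
    members = set(committee)
    for round2 in replies.values():
        has_all = set(round2) == members
        same_hash = len(set(round2.values())) == 1
        if has_all and same_hash:
            return next(iter(round2.values()))
    return None  # nobody could have committed; the epoch is retried
```

A single complete report is enough, because a player holding all round 2 messages may already have sent its round 3 message, so some player may have committed the epoch.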

1: Local state:  
2:      a set of players
3:for all epoch , in order, do
4:     for all instance of DA2A in , in order, do
5:          wait until
6:          
7:          if  then
8:               reconfigure                
9:
10:procedure reconfigure()
11:     ;
12:     send to committee
13:     wait time
14:     
15:     if  that contains same hashes then
16:          send to committee
17:     else
18:          send to committee
19:          go to in line 4 need to check this epoch again      
20:
21:upon receiving  from player  do
22:     
23:     
Algorithm 2 Code for the master.

5.4 Protocol analysis

To prove that our protocol is correct we need to show that (1) it is safe and fair in case all the rational players follow the protocol and there are at most f byzantine players (Section 5.4.1), and (2) following the protocol is an equilibrium for all rational players (Section 5.4.2). We refer to a player that follows the protocol as a follower.

5.4.1 Safety and fairness

Safety.

First we show that if non-byzantine players follow the protocol, then there are always at least followers on the committee. Initially, the committee consists of players, of which at least members follow the protocol. Now since the detect operation of the DA2A abstraction never returns members that follow the protocol in case there are at least such members, we get that the master never removes a player that follows the protocol, and thus there are always at least followers on the committee.

Now we show that if one player commits a sequence of transactions in epoch k, no other committee member commits a different sequence of transactions in epoch k. Note that in order to commit a sequence of transactions, players must have proof that the sequence is allowed to be committed. One option for such proof is to have a message from the master that includes the sequence, and another option is to have f+1 round 3 messages from committee members, each of which contains a hash of the sequence.

First note that every two members that commit epoch k after receiving a message from the master commit the same sequence of transactions, because the master does not send different messages in the same epoch. Second, since all followers send the same hash of transactions to all committee members in round 2, all followers that send round 3 messages include the same hash therein. And since players that commit with the second option must have in the proof at least one message from a follower, all players that commit with the second option commit the same sequence. It remains to show that members that commit with the first option and members that commit with the second one commit the same sequence of transactions. Consider a committee member that commits a sequence of transactions with the second option. Since it receives f+1 messages in round 3, it received a round 3 message from at least one follower. Moreover, this follower sent the round 3 message before it received the reconfiguration message from the master. In addition, since the follower sent a round 3 message, it received round 2 messages that contain the hash of the sequence from all committee members before it received the reconfiguration message from the master. Therefore, the follower includes all the round 2 messages that it received in its reply to the master, and thus the master includes the sequence in the closing state and sends it to the new committee. Hence, all members that commit with the first option commit this sequence as well.

Fairness.

We need to show that every committed epoch contains transactions of all committee members. First, note that the hash of transactions each player sends in round 2 contains its own transaction. Second, a player commits a sequence of transactions only if some player receives the hash of that sequence from all committee members in the second round. Therefore, a player commits a sequence of transactions only if it contains transactions of all committee members.

5.4.2 Rationality

We now show that following the protocol is an equilibrium for all rational players. We first discuss committee players, and then the auditors that emulate the master.

Committee players.

First, players cannot increase their ratio in the ledger by simply submitting more transactions than their allocated QoS, because the QoS allocation is known to all, and the excessive messages will be ignored. In addition, since a round 2 message is required from all committee members in order for an epoch to be committed, and since no committee member will sign a hash on a sequence that excludes its transaction, we get that a player on the committee cannot be excluded from a committed epoch. Therefore, players cannot increase their ratio in the ledger by (any) active deviation from the protocol. Moreover, since the master may punish them for an active deviation by reducing their ratio (or removing them from the committee), following the protocol is a better strategy for them than any active deviation.

As for passive deviations, a possible strategy for a rational player is to try to “frame” another player and get it removed by the master, in which case the rational player’s ratio in the ledger will grow. It can try to do this by not sending messages to the other player or by lying about not delivering the other player’s messages. Now recall that our DA2A abstraction never wrongly accuses players of passive deviation as long as there are at most f+1 deviating players. Since we assume that players do not collude, even if the rational player deviates, there are at most f+1 such players (the rational player plus f byzantine players). Therefore, it is impossible for the rational player to increase its ratio by passive deviation. Moreover, since we assume that for a fixed ratio players prefer long ledgers, sending protocol messages as fast as possible is an equilibrium.

Finally, we argue that a rational player will ignore reconfiguration change requests that do not have proofs. This is because all players need to move to the same new configuration in order to commit transactions, and so accepting an invalid removal will stall the protocol.

Master auditors.

Consider first auditors who are not players. In case there is progress, the utility function of the auditors is the number of players on the committee. Otherwise, it is zero. Therefore, auditors have no incentive to remove players as long as there is progress, and it is in their best interest to detect and remove deviating players when they stall the sequencing protocol.

Second, consider auditors who also act as players. Again, the only possible strategy for an auditor that is also a player to increase its utility function is to try to remove another player from the committee. However, since players will not remove a player without a valid proof from the DA2A, the auditor cannot cause the other player’s removal even if byzantine auditors also try to remove that player.

6 FairLedger implementations

We implement FairLedger based on Iroha’s framework, written in C++. We intend to contribute our code to Hyperledger. Therefore, we only replace Iroha’s consensus algorithm (called Sumeragi [46]) with our sequencing protocol, while keeping other components almost untouched (e.g., cryptographic components, communication layer, and client API). This implementation is described in Section 6.1.

In order to evaluate the FairLedger protocol itself, independently of the Hyperledger framework, we implement another version of FairLedger’s sequencing protocol based on PBFT’s code structure, written in C++ as well. This implementation is described in Section 6.2.

6.1 Hyperledger implementation

The Hyperledger framework consists of two types of entities, participants (committee members in our case) that run the protocol, and clients that generate transactions and send them to participants for sequencing.

The FairLedger protocol at each participant is orchestrated by a single thread, referred to as logic thread. The logic thread receives transactions from clients as well as messages from other participants into a wait-free incoming event queue. The connections between clients and participants are implemented as GRPC sessions [29] (internally using TCP) sending Protobuf messages [30]. The logic thread maintains a map of epoch numbers to epoch states. An epoch state consists of verified events of that epoch, one event slot per participant.

Upon receiving a new message, the logic thread verifies it and decides based on the epoch state whether it needs to broadcast a message to other participants. Whenever broadcast is required, the logic thread creates and signs the new message, determines the set of its destinations (based on the epoch state), and creates send-message tasks, one per destination. These tasks are handed over to a work-stealing thread pool, in which each thread communicates with its destination over a GRPC connection (see Figure 4).
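The structure described above (a map from epoch numbers to per-participant event slots, plus one send task per destination) can be sketched in Python. This is only an analogy to the C++ implementation: the class and function names are hypothetical, the `send` callable stands in for a GRPC call, and Python's `ThreadPoolExecutor` is a plain thread pool rather than a work-stealing one.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class LogicState:
    """Epoch number -> {participant: verified event}, one slot per participant."""

    def __init__(self, participants):
        self.participants = list(participants)
        self.epochs = defaultdict(dict)

    def record(self, epoch, sender, event):
        """Store a verified event; return True once every slot of the epoch is filled."""
        self.epochs[epoch][sender] = event
        return len(self.epochs[epoch]) == len(self.participants)


def broadcast(pool, send, destinations, message):
    """Create one send task per destination and wait for all of them to finish."""
    futures = [pool.submit(send, dest, message) for dest in destinations]
    for future in futures:
        future.result()  # re-raises any exception from the send task
```

In this sketch the logic thread would call `record` for each verified incoming message and call `broadcast` when `record` returns True, that is, when the round is complete for that epoch.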

Figure 4: FairLedger implementation in Hyperledger.

Iroha is built in a modular fashion, which allows us to swap Sumeragi with FairLedger in a straightforward way. Our evaluation (in Section 7.2) shows that additional Iroha components beyond the consensus engine adversely affect performance. Yet, these components are essential for Hyperledger. For example, Iroha supports multiple operating systems (including Android and iOS) and can be activated from JavaScript code (via a web interface). Such features are essential for client-facing systems like Iroha, and using standard libraries such as GRPC enables simple and clean development, which is less prone to bugs.

6.2 Standalone implementation

To eliminate the effect of the overhead induced by the Hyperledger framework, we further evaluate the FairLedger protocol by itself, independently of the additional components. To this end, we employ the popular PBFT code [16] as our baseline. PBFT uses UDP channels and is almost entirely self-contained: it depends on only one external library, for cryptographic functions.

In this implementation of FairLedger, the logic thread directly communicates with clients and participants over UDP sockets. As in our Hyperledger implementation, the logic thread uses a map of epoch numbers to epoch states, and follows the same logic for generating new messages.

Using UDP requires us to handle packet loss. We use a dedicated timer thread that wakes up periodically (after a delay determined according to the line latency), verifies the progress of the minimal unfinished epoch, and requests missing messages from the minimal epoch if needed.
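The check performed by the timer thread on each wake-up can be sketched as a pure function. The data layout (a dictionary from epoch number to the messages received so far, keyed by sender) and the function name are assumptions for illustration; the actual implementation is in C++ and is not reproduced here.

```python
def missing_in_minimal_epoch(epoch_states, participants):
    """Find the lowest unfinished epoch and the senders whose messages are missing.

    epoch_states maps an epoch number to the messages received so far,
    keyed by sender (hypothetical format). Returns None if every known
    epoch already has a message from every participant.
    """
    unfinished = [epoch for epoch, slots in epoch_states.items()
                  if len(slots) < len(participants)]
    if not unfinished:
        return None
    epoch = min(unfinished)
    missing = [p for p in participants if p not in epoch_states[epoch]]
    return epoch, missing
```

A timer thread would call this function after each delay and send a retransmission request to every sender in the returned list.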

7 Evaluation

We now evaluate our FairLedger protocol using the two prototypes. The Hyperledger prototype is comparable to Iroha, and the standalone prototype is comparable to PBFT. Section 7.1 describes the environment in which we conduct our experiments and our test cases. Section 7.2 evaluates our Hyperledger prototype, Section 7.3 compares FairLedger to PBFT, and Section 7.4 evaluates performance under different QoS allocations.

7.1 Experiment setup

Configuration.

We conduct our experiments on Emulab [49]. We allocate 32 servers: 16 Emulab D710 machines for protocol participants, and 16 Emulab PC3000 machines for request-generating threads (clients). Each D710 is a standard machine with a 2.4 GHz 64-bit Quad Core Xeon E5530 Nehalem processor, and 12 GB 1066 MHz DDR2 RAM. Each PC3000 is a single 3GHz processor machine with 2GB of RAM.

Given that our system is intended for deployment over WAN among financial institutions, we configure the network latency among participants to 20ms. In Emulab, the communication takes place over a shared 1Gb LAN, denoted S-LAN. Each client is connected to a single (local) participant with a zero latency 1Gb LAN. In case clients need to communicate directly with remote participants (as they do in Iroha’s design), they do so over S-LAN, i.e., with a latency penalty. We benchmark the system at its throughput saturation point.

In our Hyperledger prototype evaluation, we use version v0.75. Since in normal mode we assume no byzantine behavior, we configure Iroha with no faulty participants, so it signs each transaction once. The request-generating threads create transactions formatted according to Iroha’s specification (given in Protobuf), which consist of a few hundred bytes of data.

In our standalone prototype evaluation, we create packets of a similar size, namely 512B of data, as this is the transaction size in our expected use case.

Test scenarios.

We compare Iroha and PBFT to FairLedger’s two operation modes – the failure-free normal mode and the alert mode activated in case of attacks.

We evaluate FairLedger’s normal mode both using direct all-to-all and using a single relay. We evaluate the alert mode both under attack of a single byzantine participant, and without an attack. In the alert mode we assume that f=1, and hence employ 3 relays. In the attack scenario the byzantine participant remains undetectable by the master. Specifically, one of the relays withholds messages that it needs to send to one of the other relays.

In Section 7.4 we evaluate FairLedger with a slow participant that requires a lower QoS allocation.

7.2 Hyperledger

In order to deal with f failures, FairLedger needs 2f+3 participants, and Iroha needs 3f+1. However, Iroha only uses 2f+1 signatures, and later broadcasts the committed requests to all 3f+1 participants. We chose to use only 2f+1 participants in favor of Iroha, as it reduces Iroha’s broadcast cost. We scale our evaluation from 3 to 9 participants. Iroha’s clients perform asynchronous operations, and so the operation latency is always zero. Hence, we focus this comparison on throughput.

Figure 5 compares the two modes of FairLedger with Iroha. Results show that FairLedger’s unrelayed normal mode has roughly the same throughput with 3 participants as Iroha, and much higher throughput (up to 3.5x) than Iroha with more participants. In both algorithms, due to the usage of GRPC, the bottleneck is the broadcast. FairLedger commits more transactions per broadcast, since each epoch consists of one message from every participant, whereas Iroha pays the cost of broadcast for every client request. Therefore, Iroha suffers more as the broadcast cost increases (as we have more participants to send messages to).

FairLedger’s relayed modes incur a 22% reduction in throughput with 3 participants, and even more as the number of participants increases, because the relays worsen the bottleneck by issuing additional broadcast operations. Since in this implementation relaying is very costly, using relays in the normal mode is undesirable, and the performance reduction in alert mode is significant (up to 66%). Note that a single relay hampers FairLedger as much as three relays do. This is because of the protocol structure, where all nodes progress at the rate of the slowest one.

Figure 5: Throughput of FairLedger and Iroha over simulated WAN.

Byzantine behavior slightly improves performance since withholding messages reduces the load on the relays. However, this effect is negligible.

7.3 Standalone prototype

We evaluate our FairLedger prototype that is based on PBFT’s code structure. We configure PBFT parameters in a way that maximizes PBFT’s throughput, enabling batching and enough outstanding client requests to saturate the system. We indeed achieve similar results to those reported in recent work running PBFT over WAN [40]. In order to deal with f failures, PBFT requires 3f+1 participants, so we run the evaluation with 4 to 16 participants. Figure 6 shows the throughput and latency achieved by the protocols.

First, we observe that the absolute throughput is 5x higher than with Iroha. This is thanks to PBFT’s optimized bare-metal approach, which sacrifices modularity and maintainability for raw performance. We further see that FairLedger’s normal mode has higher throughput than PBFT. This is because PBFT’s clients are directed to a single participant (referred to as primary or leader), while FairLedger’s clients address their nearest participant, distributing the load evenly among them.

FairLedger’s alert mode with three relays reduces throughput by 30%-40% compared to the normal mode. Note that with 4 participants, PBFT achieves about 25% higher throughput than FairLedger’s alert mode, but as the number of participants increases, the gap closes, reaching 9% lower throughput than PBFT’s with 16 participants.

The alert mode with one relay shows better performance than PBFT, and even better performance than FairLedger’s normal mode in some cases. This happens because the single relay helps FairLedger overcome packet loss, while the cost of broadcasting a message over UDP is low. The packet loss is configured to be zero, i.e., there is no simulated loss, but our evaluation shows that it slightly increases with the number of participants due to load on the S-LAN.

We measure latency below the saturation point. The results for all configuration sizes are similar, and so we depict in Figure 6 only the results with 10 nodes. Error bars depict the standard deviation. The average latency of FairLedger clients in the normal mode is 64ms, which is close to the network latency of 3 rounds of 20ms. Indeed when communicating over WAN, the performance penalty of signing and verifying signatures is negligible. PBFT’s average latency is about 106ms, and consists of 3 PBFT rounds and 2 client-primary communication steps.

The average latency of FairLedger’s alert mode with a byzantine relay is 86ms, since it consists of 4 rounds of communication. The reason is that one participant is always one round behind the rest due to missing the byzantine participant’s message. Since in the third round a participant requires messages from only f+1 participants (and not all of them), there is no need to wait for the lagging participant’s round 3 message, and the epoch ends after 4 rounds. The latency of the alert mode without byzantine participants is 64ms, similar to the normal mode.
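The measured averages can be compared with a back-of-the-envelope model in which every communication step costs one network latency of 20ms. The sketch below is only that simple model, assuming processing time is negligible; the constant and function names are chosen here for illustration.

```python
ONE_WAY_LATENCY_MS = 20  # configured network latency between participants


def expected_latency_ms(communication_steps, one_way_ms=ONE_WAY_LATENCY_MS):
    """Lower-bound estimate: each step costs one network latency, nothing else."""
    return communication_steps * one_way_ms


fairledger_normal = expected_latency_ms(3)            # three DA2A rounds
pbft = expected_latency_ms(3 + 2)                     # 3 rounds + 2 client-primary steps
fairledger_alert_byzantine = expected_latency_ms(4)   # one extra round for the laggard
```

The model gives 60ms, 100ms, and 80ms respectively, a few milliseconds below the measured 64ms, 106ms, and 86ms, which is consistent with the observation that signing and verification add little over WAN.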

Figure 6: Throughput and latency of FairLedger and PBFT over simulated WAN.

7.4 QoS evaluation

We next show how QoS adjustment may help mitigate the throughput reduction due to a slow participant. We do so by testing the system at a relatively low rate, focusing solely on the effect of QoS adaptation.

We experiment with one slow participant that produces messages at half the rate of other participants. Figure 7 compares three scenarios: In the first, QoS is not enabled, and the slow participant dictates the rate of committed epochs. In the second the slow client’s QoS is adjusted to 50%. In the third scenario all participants are fast.

Since the rate of the slow participant is half the rate of the fast participants, when QoS is adjusted, fast participants commit two transactions in every epoch, while the slow participant commits one. Results show that when enabling QoS, the actual throughput that each fast participant receives is identical to the case in which all participants are fast.
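The effect of QoS adjustment can be reproduced with a small model in which the epoch rate is set by the participant that takes longest to fill its batch. This is an idealized sketch, assuming that production rate is the only bottleneck; the rates used in the example are arbitrary and are not measurements from the paper.

```python
def per_player_throughput(rates, batch):
    """Committed transactions per second for each participant.

    rates: transactions per second each participant can produce.
    batch: transactions each participant appends per epoch.
    An epoch commits only when every participant has filled its batch,
    so the epoch rate is limited by the slowest participant.
    """
    epoch_rate = min(rates[p] / batch[p] for p in rates)
    return {p: batch[p] * epoch_rate for p in rates}


rates = {"fast1": 100, "fast2": 100, "slow": 50}
no_qos = per_player_throughput(rates, {"fast1": 1, "fast2": 1, "slow": 1})
with_qos = per_player_throughput(rates, {"fast1": 2, "fast2": 2, "slow": 1})
```

In this model, without QoS every participant is held to the slow participant's rate, whereas with batches of two for the fast participants each fast participant again commits at its full rate.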

Figure 7: QoS of FairLedger.

8 Related Work

Fairness and rationality.

Our work is indebted to recent works that combine game theory and distributed systems [3, 2, 5, 36, 41, 23, 24, 47] to implement different cooperative services. In particular, we adopt a BAR-like model [5, 36, 41]. As in previous works on BAR fault tolerance [5, 36], we assume non-colluding rational players, whereas colluding players are deemed byzantine. As in [41], we do not assume altruistic players; all non-byzantine players are rational in our model.

Practical byzantine fault tolerant consensus protocols [28, 14, 1, 50, 15, 32, 8, 39, 37, 6, 17, 48, 7, 40, 38, 22] have been studied for more than two decades, but to the best of our knowledge, none deals with rational players. Moreover, we are only familiar with two previous works that consider some notion of fairness: Prime [7] and Honeybadger [40].

One of the important insights in Prime [7] is that the freedom of the leader to propose transactions must be restricted and verified by other participants. To this end, Prime extends PBFT [15] with three additional all-to-all communication rounds at the beginning, in which participants distribute among themselves the transactions they wish to append to the ledger. The leader proposes in round 4 a batch of transactions that includes all sets of transactions it gets in round 3 from participants. Since each transaction proposed by some participant is passed to the leader by a threshold number of participants, the issuing participant may expect its transaction to be proposed. In case a participant sends a request and the leader does not propose it within some time bound, the participant votes to replace the leader. As a result, Prime guarantees that during synchronous periods every transaction is committed within a bounded time.

Similarly to FairLedger, Prime uses batching to commit transactions of different participants atomically together, and uses a PKI to ensure fairness and provide proofs that the batches are valid. However, its fairness guarantee is weaker than ours. Since the first three rounds are asynchronous (i.e., participants do not wait to hear from all, but rather echo messages as soon as they receive them), there is no bound on the ratio of transactions issued by different participants that are committed during that time. More importantly, Prime assumes that all non-byzantine participants follow the protocol, and we do not see a simple way to adjust it to overcome rational behavior. For example, there is no incentive for participants to echo transactions issued by other participants in the first three rounds; on the contrary, the less they echo, the fewer transactions from other participants will be proposed by the leader.

Honeybadger [40] is a recent protocol for permissioned blockchains, which is built on top of an optimization of the atomic broadcast algorithm by Cachin et al. [12]. It works under fully asynchronous assumptions and provides probabilistic guarantees. Honeybadger assumes a model with a fixed number of servers and infinitely many clients. In brief, clients submit transactions to all the servers, and servers agree on their order in epochs. In each epoch, participants pick a batch of transactions (previously submitted to them by clients) and use an efficient variation of Bracha’s reliable broadcast [10] to disseminate the batches. Then, participants use a randomized binary consensus algorithm by Ben-Or et al. [9] for every batch to agree whether or not to include it in the epoch.

Similarly to FairLedger, they use epochs to batch transactions proposed by different players, and commit them atomically together. Their (probabilistic) fairness guarantee is stronger than the one in Prime: they bound the number of epochs (and accordingly the number of transactions) that can be committed before any transaction that is successfully submitted to enough servers. However, if we adapt their protocol to our model where we do not consider clients and require fairness among players, we observe that their guarantee is weaker than ours: since communication is asynchronous, it may take arbitrarily long for a transaction issued by some player to reach (be submitted to) enough other players, and in the meantime, other players may commit an unbounded number of transactions. In addition, their protocol uses building blocks (e.g., Bracha’s broadcast [10] and the randomized consensus of Ben-Or et al. [9]) that are not designed to deal with rational behavior. Moreover, rational players that wish to increase their ratio in the ledger will not include transactions issued by other players in their batches.

Finally, it is worth noting that both Prime and Honeybadger are much more complex than FairLedger. Prime’s description in [7] is spread over more than 6 pages, and the reader is referred to their full paper for more details. Honeybadger combines several building blocks (e.g., the atomic broadcast by Cachin et al. [12]), each of which is complex by itself.

BFT protocols and assumptions.

The vast majority of practical BFT protocols [22, 38, 50, 32, 8, 39, 37, 6], starting with PBFT [15], assume a model with symmetric servers (participants) that communicate via reliable eventually synchronous channels. Therefore, they can tolerate byzantine failures in fewer than a third of the participants [25], and cannot accurately detect participants’ passive deviations (withholding a message or lying about not receiving it); intuitively, it is impossible to distinguish whether a player maliciously withholds its message or the message is just slow. Since passively deviating participants cannot be accurately detected, they cannot be punished or removed, and thus byzantine participants can degrade performance forever [17], and rational behavior cannot be disincentivized.

We, in contrast, assume synchronous communication, which together with the use of a PKI allows FairLedger to be simple, tolerate almost any minority of byzantine failures, guarantee fairness, detect passive as well as active deviations, and penalize deviating players. FairLedger uses the synchrony bound only to detect and remove byzantine players that prevent progress, so the bound can be very long (even minutes) without hurting normal-case performance. To reduce the cost of using a PKI, FairLedger signs only the hashes of the messages. Moreover, in WAN settings the relative cost of the PKI is smaller because channel delays are longer.
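
The idea of signing only message hashes can be illustrated with a minimal sketch. The function names and the key below are hypothetical, and HMAC-SHA256 from the Python standard library is used purely as a stand-in for a PKI signature scheme (a real deployment would use an asymmetric scheme such as RSA [44]); the point is only that the signing step always operates on a fixed-size digest, regardless of the message size.

```python
import hashlib
import hmac


def message_digest(message: bytes) -> bytes:
    """Hash the full message once; the result is always 32 bytes."""
    return hashlib.sha256(message).digest()


def sign_message(key: bytes, message: bytes) -> bytes:
    """Sign the digest instead of the message itself.

    ASSUMPTION: HMAC is a symmetric stand-in for an asymmetric
    PKI signature, chosen only to keep the sketch self-contained.
    """
    return hmac.new(key, message_digest(message), hashlib.sha256).digest()


def verify_message(key: bytes, message: bytes, signature: bytes) -> bool:
    """Recompute the signature over the digest and compare in constant time."""
    expected = sign_message(key, message)
    return hmac.compare_digest(expected, signature)


if __name__ == "__main__":
    key = b"hypothetical-demo-key"
    batch = b"tx" * 500_000  # a large batch of transactions
    sig = sign_message(key, batch)
    print(len(message_digest(batch)), verify_message(key, batch, sig))
```

Under this scheme the expensive signing operation has a constant-size input, so its cost does not grow with the size of the transaction batch; only the (cheap) hashing does.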

As illustrated by works on Prime [7] and Aardvark [17], most BFT protocols are vulnerable to performance degradation caused by byzantine participants. To remedy this, Aardvark focuses on improving the worst-case scenario. We, on the other hand, follow the approach taken in Zyzzyva [32], and optimize the failure-free scenario. We take this approach because byzantine failures are rare in financial settings, and one can expect break-ins to be investigated and remedied.

We implement FairLedger inside Iroha [45], which is part of the Hyperledger [27] project. Specifically, we substitute the ledger protocol in Iroha, which was originally based on the BFT protocol in BChain [22], with FairLedger. In brief, the original protocol consists of a chain of participants, where the participants at the head of the chain order transactions. To deal with a passively deviating participant that withholds messages in the chain, it transfers both the sender and the receiver (although only one of them deviates from the protocol) to the back of the chain, where they do not take part in ordering transactions. Similarly to FairLedger, it assumes synchrony with coarse time bounds and uses it to detect passive deviations. However, in contrast to FairLedger, it does not accurately detect byzantine players, and it punishes correct players as well. Moreover, since the head of the chain decides on the transaction order, Iroha does not guarantee fairness.
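
The chain re-ordering described above can be sketched as follows. This is only an illustration of the rule as summarized in this section, not the BChain or Iroha implementation; the function name and the participant labels are hypothetical.

```python
from typing import List


def demote_suspects(chain: List[str], sender: str, receiver: str) -> List[str]:
    """Move both endpoints of a failed link to the back of the chain.

    The chain cannot tell which of the two withheld the message,
    so both are demoted, even though only one of them deviated.
    """
    remaining = [p for p in chain if p not in (sender, receiver)]
    return remaining + [sender, receiver]


if __name__ == "__main__":
    chain = ["A", "B", "C", "D", "E"]
    # Suppose C reports that it never received B's message.
    print(demote_suspects(chain, "B", "C"))
```

In this example the new order is A, D, E, B, C: if B withheld its message, the correct participant C is demoted alongside it, which is exactly the inaccuracy that FairLedger aims to avoid.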

Broadcast primitives.

In order to detect passive deviations, we define DA2A, a new detectable all-to-all communication abstraction. Even though many practical byzantine broadcasts [11, 43, 12, 21, 13, 26, 19] were proposed in the past, DA2A is the first to extend its API with a method that accurately returns all misbehaving players.

9 Discussion

Blockchains are widely regarded as the trading technology of the future; industry leaders in finance, banking, manufacturing, technology, and more are dedicating significant efforts towards advancing this technology. The heart of a blockchain is a distributed shared ledger protocol. In this paper, we developed FairLedger, a novel shared ledger protocol for the blockchain setting. Our protocol features the first byzantine fault-tolerant consensus engine to ensure fairness when all players are rational. It is also simple to understand and implement. We integrated our protocol into Hyperledger, a leading industry blockchain-for-business framework, and showed that it achieves superior performance to existing protocols therein. We further compared FairLedger to PBFT in a WAN setting, achieving better results in failure-free scenarios.

References

  • [1] Abd-El-Malek, M., Ganger, G. R., Goodson, G. R., Reiter, M. K., and Wylie, J. J. Fault-scalable byzantine fault-tolerant services. In ACM SIGOPS Operating Systems Review (2005), vol. 39, ACM, pp. 59–74.
  • [2] Abraham, I., Alvisi, L., and Halpern, J. Y. Distributed computing meets game theory: combining insights from two fields. ACM SIGACT News 42, 2 (2011), 69–76.
  • [3] Abraham, I., Dolev, D., and Halpern, J. Y. Distributed protocols for leader election: A game-theoretic perspective. In International Symposium on Distributed Computing (2013), Springer, pp. 61–75.
  • [4] Abraham, I., and Malkhi, D. Bvp: Byzantine vertical paxos, 2016.
  • [5] Aiyer, A. S., Alvisi, L., Clement, A., Dahlin, M., Martin, J.-P., and Porth, C. Bar fault tolerance for cooperative services. In ACM SIGOPS operating systems review (2005), vol. 39, ACM, pp. 45–58.
  • [6] Amir, Y., Coan, B., Kirsch, J., and Lane, J. Customizable fault tolerance for wide-area replication. In Reliable Distributed Systems, 2007. SRDS 2007. 26th IEEE International Symposium on (2007), IEEE, pp. 65–82.
  • [7] Amir, Y., Coan, B., Kirsch, J., and Lane, J. Prime: Byzantine replication under attack. IEEE Transactions on Dependable and Secure Computing 8, 4 (2011), 564–577.
  • [8] Amir, Y., Danilov, C., Kirsch, J., Lane, J., Dolev, D., Nita-Rotaru, C., Olsen, J., and Zage, D. Scaling byzantine fault-tolerant replication to wide area networks. In Dependable Systems and Networks, 2006. DSN 2006. International Conference on (2006), IEEE, pp. 105–114.
  • [9] Ben-Or, M., Kelmer, B., and Rabin, T. Asynchronous secure computations with optimal resilience. In Proceedings of the thirteenth annual ACM symposium on Principles of distributed computing (1994), ACM, pp. 183–192.
  • [10] Bracha, G. Asynchronous byzantine agreement protocols. Information and Computation 75, 2 (1987), 130–143.
  • [11] Bracha, G., and Toueg, S. Asynchronous consensus and broadcast protocols. Journal of the ACM (JACM) 32, 4 (1985), 824–840.
  • [12] Cachin, C., Kursawe, K., Petzold, F., and Shoup, V. Secure and efficient asynchronous broadcast protocols. In Annual International Cryptology Conference (2001), Springer, pp. 524–541.
  • [13] Cachin, C., and Tessaro, S. Asynchronous verifiable information dispersal. In Reliable Distributed Systems, 2005. SRDS 2005. 24th IEEE Symposium on (2005), IEEE, pp. 191–201.
  • [14] Canetti, R., and Rabin, T. Fast asynchronous byzantine agreement with optimal resilience. In Proceedings of the twenty-fifth annual ACM symposium on Theory of computing (1993), ACM, pp. 42–51.
  • [15] Castro, M., Liskov, B., et al. Practical byzantine fault tolerance. In OSDI (1999), vol. 99, pp. 173–186.
  • [16] Castro, M., Liskov, B., et al. BFT - Practical Byzantine Fault Tolerance (software). http://www.pmg.csail.mit.edu/bft/#sw, 2017. [Online; accessed 16-Apr-2017].
  • [17] Clement, A., Wong, E. L., Alvisi, L., Dahlin, M., and Marchetti, M. Making byzantine fault tolerant systems tolerate byzantine faults. In NSDI (2009), vol. 9, pp. 153–168.
  • [18] CoreOS. etcd – a highly-available key value store for shared configuration and service discovery. https://coreos.com/etcd/, 2017. [Online; accessed 16-Apr-2017].
  • [19] Cristian, F., Aghili, H., Strong, R., and Dolev, D. Atomic broadcast: From simple message diffusion to Byzantine agreement. Citeseer, 1986.
  • [20] Dolev, D., and Strong, H. R. Authenticated algorithms for byzantine agreement. SIAM Journal on Computing 12, 4 (1983), 656–666.
  • [21] Drabkin, V., Friedman, R., and Kama, A. Practical byzantine group communication. In Distributed Computing Systems, 2006. ICDCS 2006. 26th IEEE International Conference on (2006), IEEE, pp. 36–36.
  • [22] Duan, S., Meling, H., Peisert, S., and Zhang, H. BChain: Byzantine Replication with High Throughput and Embedded Reconfiguration. Springer International Publishing, Cham, 2014, pp. 91–106.
  • [23] Feigenbaum, J., Papadimitriou, C., and Shenker, S. Sharing the cost of multicast transmissions (preliminary version). In Proceedings of the thirty-second annual ACM symposium on Theory of computing (2000), ACM, pp. 218–227.
  • [24] Feldman, M., Papadimitriou, C., Chuang, J., and Stoica, I. Free-riding and whitewashing in peer-to-peer systems. IEEE Journal on Selected Areas in Communications 24, 5 (2006), 1010–1019.
  • [25] Fischer, M. J., Lynch, N. A., and Merritt, M. Easy impossibility proofs for distributed consensus problems. Distributed Computing 1, 1 (1986), 26–39.
  • [26] Fitzi, M., and Hirt, M. Optimally efficient multi-valued byzantine agreement. In Proceedings of the twenty-fifth annual ACM symposium on Principles of distributed computing (2006), ACM, pp. 163–168.
  • [27] Foundation, T. L. Hyperledger – blockchain technologies for business. https://www.hyperledger.org/. [Online; accessed 16-Apr-2017].
  • [28] Golan Gueta, G., Abraham, I., Grossman, S., Malkhi, D., Pinkas, B., Reiter, M. K., Seredinschi, D.-A., Tamir, O., and Tomescu, A. Sbft: a scalable decentralized trust infrastructure for blockchains. arXiv preprint arXiv:1804.01626 (2018).
  • [29] Google. GRPC – A high performance, open-source universal RPC framework. http://www.grpc.io/, 2017. [Online; accessed 16-Apr-2017].
  • [30] Google. Protocol Buffers - Google’s data interchange format. https://github.com/google/protobuf, 2017. [Online; accessed 16-Apr-2017].
  • [31] Hunt, P., Konar, M., Junqueira, F. P., and Reed, B. Zookeeper: Wait-free coordination for internet-scale systems. In USENIX annual technical conference (2010), vol. 8, p. 9.
  • [32] Kotla, R., Alvisi, L., Dahlin, M., Clement, A., and Wong, E. Zyzzyva: speculative byzantine fault tolerance. In ACM SIGOPS Operating Systems Review (2007), vol. 41, ACM, pp. 45–58.
  • [33] Lamport, L. Paxos made simple. ACM SIGACT News, 2001.
  • [34] Lamport, L., Malkhi, D., and Zhou, L. Vertical paxos and primary-backup replication. In Proceedings of the 28th ACM symposium on Principles of distributed computing (2009), ACM, pp. 312–313.
  • [35] Lamport, L., Shostak, R., and Pease, M. The byzantine generals problem. ACM Transactions on Programming Languages and Systems (TOPLAS) 4, 3 (1982), 382–401.
  • [36] Li, H. C., Clement, A., Wong, E. L., Napper, J., Roy, I., Alvisi, L., and Dahlin, M. Bar gossip. In Proceedings of the 7th symposium on Operating systems design and implementation (2006), USENIX Association, pp. 191–204.
  • [37] Li, J., and Mazières, D. Beyond one-third faulty replicas in byzantine fault tolerant systems. In NSDI (2007).
  • [38] Liu, S., Cachin, C., Quéma, V., and Vukolic, M. Xft: practical fault tolerance beyond crashes. CoRR, abs/1502.05831 (2015).
  • [39] Martin, J.-P., and Alvisi, L. Fast byzantine consensus. IEEE Transactions on Dependable and Secure Computing 3, 3 (2006), 202–215.
  • [40] Miller, A., Xia, Y., Croman, K., Shi, E., and Song, D. The honey badger of bft protocols. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (2016), ACM, pp. 31–42.
  • [41] Moscibroda, T., Schmid, S., and Wattenhofer, R. When selfish meets evil: Byzantine players in a virus inoculation game. In Proceedings of the Twenty-fifth Annual ACM Symposium on Principles of Distributed Computing (New York, NY, USA, 2006), PODC ’06, ACM, pp. 35–44.
  • [42] Nakamoto, S. Bitcoin: A peer-to-peer electronic cash system, 2008.
  • [43] Reiter, M. The rampart toolkit for building high-integrity services. Theory and Practice in Distributed Systems (1995), 99–110.
  • [44] Rivest, R. L., Shamir, A., and Adleman, L. A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM 21, 2 (1978), 120–126.
  • [45] Soramitsu. Iroha - A simple, decentralized ledger. http://iroha.tech/en/. [Online; accessed 16-Apr-2017].
  • [46] Soramitsu. Sumeragi - a Byzantine Fault Tolerant consensus algorithm. https://github.com/hyperledger/iroha/blob/master/docs/iroha_whitepaper.md, 2017. [Online; accessed 16-Apr-2017].
  • [47] Srinivasan, V., Nuggehalli, P., Chiasserini, C.-F., and Rao, R. R. Cooperation in wireless ad hoc networks. In INFOCOM 2003. Twenty-Second Annual Joint Conference of the IEEE Computer and Communications. IEEE Societies (2003), vol. 2, IEEE, pp. 808–817.
  • [48] Veronese, G. S., Correia, M., Bessani, A. N., and Lung, L. C. Spin one’s wheels? byzantine fault tolerance with a spinning primary. In Reliable Distributed Systems, 2009. SRDS’09. 28th IEEE International Symposium on (2009), IEEE, pp. 135–144.
  • [49] White, B., Lepreau, J., Stoller, L., Ricci, R., Guruprasad, S., Newbold, M., Hibler, M., Barb, C., and Joglekar, A. An integrated experimental environment for distributed systems and networks. In OSDI ’02 (Boston, MA, Dec. 2002), USENIX Association, pp. 255–270.
  • [50] Yin, J., Martin, J.-P., Venkataramani, A., Alvisi, L., and Dahlin, M. Separating agreement from execution for byzantine fault tolerant services. ACM SIGOPS Operating Systems Review 37, 5 (2003), 253–267.