Flexible Byzantine Fault Tolerance

04/22/2019
by   Dahlia Malkhi, et al.
VMware

Existing Byzantine fault tolerant (BFT) protocols work in a homogeneous model where a service administrator picks a set of assumptions (timing model and the fraction of Byzantine faults) and imposes these assumptions on all clients using the service. This paper introduces Flexible BFT, a family of BFT protocols that support clients with heterogeneous assumptions. In a Flexible BFT protocol, replicas execute a set of instructions while each client decides whether a transaction is committed based on its own assumption. At a technical level, Flexible BFT makes two key contributions. First, it introduces a synchronous BFT protocol in which only the commit step needs to know the network delay bound, so replicas execute the protocol without any synchrony assumption. Second, it introduces a notion called Flexible Byzantine Quorums by deconstructing the roles of different quorums in existing consensus protocols. This paper also introduces a new type of fault called alive-but-corrupt faults: adversaries that attack safety but maintain liveness. Flexible BFT can tolerate a combination of Byzantine and alive-but-corrupt faults that exceeds one-third with partial synchrony or one-half with synchrony, while still respecting the Byzantine fault tolerance bounds in the respective models.


1 Introduction

Byzantine fault tolerant (BFT) protocols are used to build replicated services [31, 21, 32]. Recently, they have received revived interest as the algorithmic foundation of what is known as decentralized ledgers, or blockchains.

In the classic approach to BFT protocol design, a protocol designer or a service administrator first picks a set of assumptions (e.g., the fraction of Byzantine faults and certain timing assumptions) and then devises a protocol (or chooses an existing one) tailored for that particular setting. The assumptions made by the protocol designer are imposed upon all parties involved: every replica maintaining the service as well as every client (also known as the “learner” role) using the service. Such a protocol collapses if deployed under settings that differ from the one it is designed for. In particular, optimal-resilience partially synchronous solutions [12, 10] break (lose safety and liveness) if the fraction of Byzantine faults exceeds 1/3. Similarly, optimal-resilience synchronous solutions [1, 15] do not obtain safety or liveness if the fraction of Byzantine faults exceeds 1/2 or if the synchrony bound is violated.

In this work, we introduce a new approach for BFT protocol design called Flexible BFT. Our approach offers advantages in the two aspects above. First, the Flexible BFT approach enables protocols that tolerate more than 1/3 (resp. 1/2) corruption faults in the partial-synchrony (resp. synchrony) model, provided that the number of Byzantine faults does not exceed the respective resilience bounds. Second, the Flexible BFT approach allows a certain degree of separation between the fault model and the protocol design. As a result, Flexible BFT allows diverse clients with different fault assumptions and timing assumptions (synchrony or not) to participate in the same protocol. We elaborate on these two aspects below.

Stronger resilience.

We introduce a mixed fault model with a new type of fault called alive-but-corrupt (a-b-c for short) faults. Alive-but-corrupt replicas actively try to disrupt the system from maintaining a safe consensus decision and they might arbitrarily deviate from the protocol for this purpose. However, if they cannot break safety, they will not try to prevent the system from reaching a (safe) decision. The rationale for this new type of fault is that violating safety may provide the attacker gains (e.g., a double spend attack) but preventing liveness usually does not. In fact, a-b-c replicas may gain rewards from keeping the replicated service live, e.g., by collecting service fees. We show a family of protocols that tolerate a combination of Byzantine and a-b-c faults that exceeds 1/3 in the partially synchronous model and exceeds 1/2 in the synchronous model. Our results do not violate existing resilience bounds because the fraction of Byzantine faults is always smaller than the respective bounds.

Diversity.

The Flexible BFT approach further provides certain separation between the fault model and the protocol. The design approach builds a protocol whose transcript can be interpreted by external clients with diverse beliefs, who draw different consensus commit decisions based on their beliefs. Flexible BFT guarantees safety and liveness so far as the clients’ beliefs are correct; thus two clients with correct assumptions agree with each other. Clients specify (i) the fault threshold they need to tolerate, and (ii) the message delay bound, if any, they believe in. For example, one instance of Flexible BFT can support a client that requires tolerance against Byzantine faults plus a-b-c faults, while simultaneously supporting another client who requires tolerance against Byzantine faults plus a-b-c faults, and a third client who believes in synchrony and requires Byzantine plus a-b-c tolerance.

This novel separation of fault model from protocol design can be useful in practice in several ways. First, different clients may naturally hold different assumptions about the system. Some clients may be more cautious and require a higher resilience than others; some clients may believe in synchrony while others do not. Moreover, even the same client may assume a larger fraction of faults when dealing with a $1M transaction compared to a $5 one. The rationale is that more replicas may be willing to collude to double spend a high-value transaction. In this case, the client can wait for more votes before committing the $1M transaction. Last but not least, a client may update its assumptions based on certain events it observes. For example, if a client receives votes for conflicting values, which may indicate an attempt at attacking safety, it can start requiring more votes than usual; if a client who believes in synchrony notices abnormally long message delays, which may indicate an attack on network infrastructure, it can update its synchrony bound to be more conservative or switch to a partial-synchrony assumption.

The notion of “commit” needs to be clarified in our new model. Clients in Flexible BFT have different assumptions and hence different commit rules. It is then possible and common that a value is committed by one client but not another. Flexible BFT guarantees that any two clients whose assumptions are correct (but possibly different) commit to the same value. If a client’s assumption is incorrect, however, it may commit inconsistent values which may later be reverted. While this new notion of commit may sound radical at first, it is the implicit behavior of existing BFT protocols. If the assumption made by the service administrator is violated in a classic BFT protocol (e.g., there are more Byzantine faults than provisioned), clients may commit to different values and they have no recourse. In this sense, Flexible BFT is a robust generalization of classic BFT protocols. In Flexible BFT, if a client performs conflicting commits, it should update its assumption to be more cautious and re-interpret what values are committed under its new assumption. In fact, this “recovery” behavior is somewhat akin to Bitcoin. A client in Bitcoin decides how many confirmations are needed (i.e., how “deeply buried”) to commit a block. If the client commits but subsequently an alternative longer fork appears, its commit is reverted. Going forward, the client may increase the number of confirmations it requires.

Key techniques.

Flexible BFT centers around two new techniques. The first one is a novel synchronous BFT protocol with replicas executing at network speed; that is, the protocol run by the replicas does not assume synchrony. This allows clients in the same protocol to assume different message delay bounds and commit at their own pace. The protocol thus separates timing assumptions of replicas from timing assumptions of clients. Note that this is only possible via Flexible BFT’s separation of protocol from the fault model: the action of committing is only carried out by clients, not by replicas. The other technique involves a breakdown of the different roles that quorums play in different steps of partially synchronous BFT protocols. Once again, made possible by the separation in Flexible BFT, we will use one quorum size for replicas to run a protocol, and let clients choose their own quorum sizes for committing in the protocol.

Contributions.

To summarize, our work has the following contributions.

  1. Alive-but-corrupt faults. We introduce a new type of fault, called alive-but-corrupt faults, which attack safety but not liveness.

  2. Synchronous BFT with network speed replicas. We present a synchronous protocol in which only the commit step requires synchrony. Since replicas no longer perform commits in our approach, the protocol simultaneously supports clients assuming different synchrony bounds.

  3. Flexible Byzantine Quorums. We deconstruct existing BFT protocols to understand the role played by different quorums and introduce the notion of Flexible Byzantine Quorums. A protocol based on Flexible Byzantine Quorums simultaneously supports clients assuming different fault models.

  4. One BFT Consensus Solution for the Populace. Putting the above together, we present a new approach for BFT design, Flexible BFT. Our approach has stronger resilience and diversity: Flexible BFT tolerates a fraction of combined (Byzantine plus a-b-c) faults beyond existing resilience bounds, and clients with diverse fault and timing beliefs are supported in the same protocol.

Organization.

The rest of the paper is organized as follows. Section 2 defines the Flexible BFT model where replicas and clients are separated. We will describe in more detail our key techniques for synchrony and partial-synchrony in Sections 3 and 4, respectively. Section 5 puts these techniques together and presents the final protocol. Section 6 discusses the result obtained by the Flexible BFT design and Section 7 describes related work.

2 Modeling Flexible BFT

The goal of Flexible BFT is to build a replicated service that takes requests from clients and provides clients an interface of a single non-faulty server, i.e., it provides clients with the same totally ordered sequence of values. Internally, the replicated service uses multiple servers, also called replicas, to tolerate some number of faulty servers. The total number of replicas is denoted by $n$. In this paper, whenever we speak about a set of replicas or messages, we denote the set size as its fraction over $n$. For example, we refer to a set of $m$ replicas as “$q$ replicas” where $q = m/n$.

Borrowing notation from Lamport [20], such a replicated service has three logical actors: proposers capable of sending new values, acceptors who add these values to a totally ordered sequence (called a blockchain), and learners who decide on a sequence of values based on the transcript of the protocol and execute them on a state machine. Existing replication protocols provide the following two properties:

  • Safety. Any two learners learn the same sequence of values.

  • Liveness. A value proposed by a proposer will eventually be executed by every learner.

In existing replication protocols, the learners are assumed to be uniform, i.e., they interpret a transcript using the same rules and hence decide on the same sequence of values. In Flexible BFT, we consider diverse learners with different assumptions. Based on their own assumptions, they may interpret the transcript of the protocol differently. We show that so far as the assumptions of two different learners are both correct, they will eventually learn the same sequence of values. A replication protocol in the Flexible BFT approach satisfies the following properties:

  • Safety for diverse learners. Any two learners with correct but potentially different assumptions learn the same sequence of values.

  • Liveness for diverse learners. A value proposed by a proposer will eventually be executed by every learner with a correct assumption.

In a replicated service, clients act as proposers and learners, whereas the replicas (replicated servers) are acceptors. Thus, safety and liveness guarantees are defined with respect to clients.

Fault model.

We assume two types of faults within the replicas: Byzantine and alive-but-corrupt (a-b-c for short). Byzantine replicas behave arbitrarily. On the other hand, the goal of a-b-c replicas is to attack safety but to preserve liveness. These replicas will take any actions that help them break safety of the protocol. However, if they cannot succeed in breaking safety, they will help provide liveness. Consequently, in this new fault model, the safety proof should treat a-b-c replicas similarly to Byzantine. Then, once safety is proved, the liveness proof can treat a-b-c replicas similarly to honest. We assume that the adversary is static, i.e., the adversary determines which replicas are Byzantine and a-b-c before the start of the protocol.

Other assumptions.

We assume hash functions, digital signatures and a public-key infrastructure (PKI). We use $\langle x \rangle_R$ to denote a message $x$ signed by a replica $R$. We assume pair-wise communication channels between replicas. We assume that all replicas have clocks that advance at the same rate.

3 Synchronous BFT with Network Speed Replicas - Overview

Protocol executed by the replicas.
  1. Propose. The leader $L$ of view $v$ proposes a value $b$.

  2. Vote. On receiving the first value $b$ in a view $v$, a replica broadcasts $b$ and votes for $b$ if it is safe to do so, as determined by a locking mechanism described later. The replica records the following.

    • If the replica collects $q_r$ votes on $b$, denoted as $\mathcal{C}^{q_r}_v(b)$ and called a certificate of $b$ from view $v$, then it “locks” on $b$ and records the lock time as $\text{t-lock}_v$.

    • If the replica observes an equivocating value signed by $L$ at any time after entering view $v$, it records the time of equivocation as $\text{t-equiv}_v$. It blames the leader by broadcasting $\langle \text{blame}, v \rangle$ and the equivocating values.

    • If the replica does not receive a proposal for sufficient time in view $v$, it times out and broadcasts $\langle \text{blame}, v \rangle$.

    • If the replica collects a set of $q_r$ $\langle \text{blame}, v \rangle$ messages, it records the time as $\text{t-viewchange}_v$, broadcasts them and enters view $v+1$.

If a replica locks on a value $b$ in a view, then it votes only for $b$ in subsequent views unless it “unlocks” from $b$ by learning that $q_r$ replicas are not locked on $b$ in that view or higher views (they may be locked on other values or they may not be locked at all).

Commit rules for clients.

A value $b$ is said to be committed by a client assuming $\Delta$-synchrony iff $q_r$ replicas each report that there exists a view $v$ such that,

  1. $b$ is certified, i.e., $\mathcal{C}^{q_r}_v(b)$ exists.

  2. the replica observed an undisturbed-$2\Delta$ period after certification, i.e., no equivocating value or view change was observed at a time before $2\Delta$ after it was certified, or more formally,

$$\min(\text{current-time},\ \text{t-equiv}_v,\ \text{t-viewchange}_v) - \text{t-lock}_v \ge 2\Delta$$

Figure 1: Synchronous BFT with network speed replicas.
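The client-side commit check of Figure 1 can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `ViewReport` record, its field names, and the integer `quorum` argument are hypothetical, and the undisturbed period is taken to be twice the client's own delay bound `delta`.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ViewReport:
    """What one replica reports about a certified value in some view (hypothetical shape)."""
    t_lock: float                         # time the replica locked on the value
    t_equiv: Optional[float] = None       # time an equivocation was observed, if any
    t_viewchange: Optional[float] = None  # time a view change was observed, if any


def undisturbed(report: ViewReport, now: float, delta: float) -> bool:
    """True if no equivocation or view change was seen within 2*delta of locking."""
    t_equiv = math.inf if report.t_equiv is None else report.t_equiv
    t_vc = math.inf if report.t_viewchange is None else report.t_viewchange
    return min(now, t_equiv, t_vc) - report.t_lock >= 2 * delta


def client_commits(reports: List[ViewReport], now: float, delta: float, quorum: int) -> bool:
    """Commit iff at least `quorum` replicas report an undisturbed period.

    `delta` is the client's own belief about the maximum network delay;
    replicas never see it, they only report timestamps.
    """
    good = sum(1 for r in reports if undisturbed(r, now, delta))
    return good >= quorum
```

Two clients reading the same reports can reach different decisions simply by passing different `delta` values, which is the separation this section describes.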

Early synchronous protocols [11, 17, 29] have relied on synchrony in two ways. First, the replicas assume a maximum network delay $\Delta$ for communication between them. Second, they require a lock step execution, i.e., all replicas are in the same round at the same time. Hanke et al. showed a synchronous protocol without lock step execution [15]. Their protocol still contains a synchronous step in which all replicas perform a blocking wait of time $2\Delta$ before proceeding to subsequent steps. Sync HotStuff [4] improves on it further to remove replicas’ blocking waits during good periods (when the leader is honest), but blocking waits are still required by replicas during bad situations (view changes).

In this section, we show a synchronous protocol where the replicas do not ever have blocking waits and execute at the network speed. In other words, replicas run a partially synchronous protocol and do not rely on synchrony at any point. Clients, on the other hand, rely on synchrony bounds to commit. This separation is what allows our protocol to support clients with different assumptions on the value of $\Delta$. To the best of our knowledge, this is the first synchronous protocol to achieve such a separation. In addition, the protocol tolerates a combined Byzantine plus a-b-c fault ratio greater than a half (Byzantine fault tolerance is still less than half).

For simplicity, in this overview, we show a protocol for single shot consensus. In our final protocol in Section 5, we will consider a pipelined version of the protocol for consensus on a sequence of values. We do not consider termination for the single-shot consensus protocol in this overview because our final replication protocol is supposed to run forever.

The protocol is shown in Figure 1. It runs in a sequence of views. Each view has a designated leader who may be selected in a round robin order. The leader drives consensus in that view. In each view, the protocol runs in two steps: propose and vote. In the propose step, the leader proposes a value $b$. In the vote step, replicas vote for the value if it is safe to do so. The vote also acts as a re-proposal of the value. If a replica observes a set of $q_r$ votes on $b$, called a certificate $\mathcal{C}^{q_r}(b)$, it “locks” on $b$. For now, we assume $q_r = 1/2$. (To be precise, $q_r$ is slightly larger than 1/2, e.g., $f+1$ out of $2f+1$.) We will revisit the choice of $q_r$ in Section 6. In subsequent views, a replica will not vote for a value other than $b$ unless it learns that $q_r$ replicas are not locked on $b$. In addition, the replicas switch views (i.e., change leader) if they either observe an equivocation or if they do not receive a proposal from the leader within some timeout. A client commits $b$ if $q_r$ replicas state that there exists a view in which $b$ is certified and no equivocating value or view change was observed at a time before $2\Delta$ after it was certified. Here, $\Delta$ is the maximum network delay the client believes in.

The protocol ensures safety if there are fewer than $q_r$ faulty replicas. The key argument for safety is the following: If an honest replica $h$ satisfies the commit condition for some value $b$ in a view, then (a) no other value can be certified and (b) all honest replicas are locked on $b$ at the end of that view. To elaborate, satisfying the commit condition implies that some honest replica $h$ has observed an undisturbed-$2\Delta$ period after it locked on $b$, i.e., it did not observe an equivocation or a view change. Suppose the condition is satisfied at time $t$. This implies that other replicas did not observe an equivocation or a view change before $t - \Delta$. The two properties above hold if the quorum honesty conditions described below hold. For liveness, if Byzantine leaders equivocate or do not propose a safe value, they will be blamed by both honest and a-b-c replicas and a view change will ensue. Eventually there will be an honest or a-b-c leader to drive consensus if quorum availability holds.


Quorum honesty (a) within a view.

Since the undisturbed period starts after $b$ is certified, $h$ must have voted (and re-proposed) $b$ at a time earlier than $t - 2\Delta$. Every honest replica must have received $b$ before $t - \Delta$. Since they had not voted for an equivocating value by then, they must have voted for $b$. Since the number of faults is less than $q_r$, every certificate needs to contain an honest replica’s vote. Thus, no certificate for any other value can be formed in this view.

Quorum honesty (b) across views.

$h$ sends $\mathcal{C}^{q_r}_v(b)$ at time $t - 2\Delta$. All honest replicas receive $\mathcal{C}^{q_r}_v(b)$ by time $t - \Delta$ and become locked on $b$. For an honest replica to unlock from $b$ in subsequent views, $q_r$ replicas need to claim that they are not locked on $b$. At least one of them is honest and would need to falsely claim it is not locked, which cannot happen.

Quorum availability.

Byzantine replicas do not exceed $1 - q_r$ so that $q_r$ replicas respond to the leader.

Tolerating a-b-c faults.

If we have only honest and Byzantine replicas (and no a-b-c replicas), quorum honesty requires the fraction of Byzantine replicas $B < q_r$. Quorum availability requires $B \le 1 - q_r$. If we optimize for maximizing $B$, we obtain $q_r = 1/2$. Now, suppose $P$ represents the fraction of a-b-c replicas. Quorum honesty requires $B + P < q_r$, and quorum availability requires $B \le 1 - q_r$. Thus, the protocol supports varying values of $B$ and $P$ at different values of $q_r$ such that safety and liveness are both preserved.
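The two conditions in this paragraph can be checked mechanically. The sketch below is illustrative only: the function names are hypothetical, `byz` and `abc` stand for the fractions of Byzantine and a-b-c replicas, and `q_r` is the replica quorum fraction, with the conditions assumed to be "Byzantine plus a-b-c is below `q_r`" for quorum honesty and "Byzantine is at most `1 - q_r`" for quorum availability.

```python
from fractions import Fraction


def quorum_honesty(byz: Fraction, abc: Fraction, q_r: Fraction) -> bool:
    """Safety side: every q_r-sized quorum must contain at least one honest replica."""
    return byz + abc < q_r


def quorum_availability(byz: Fraction, q_r: Fraction) -> bool:
    """Liveness side: the replicas that are not Byzantine can still form a q_r quorum."""
    return byz <= 1 - q_r


# Illustrative numbers (not from the paper): q_r = 2/3 admits 1/4 Byzantine
# plus 1/3 a-b-c replicas, i.e. 7/12 total faults, which is more than one half.
q_r = Fraction(2, 3)
print(quorum_honesty(Fraction(1, 4), Fraction(1, 3), q_r))  # honesty holds
print(quorum_availability(Fraction(1, 4), q_r))             # availability holds
print(quorum_availability(Fraction(2, 5), q_r))             # too many Byzantine replicas
```

Exact fractions are used instead of floats so that boundary cases such as `byz == 1 - q_r` are evaluated without rounding error.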

Separating client synchrony assumption from the replica protocol.

The most interesting aspect of this protocol is the separation of the client commit rule from the protocol design. In particular, although this is a synchronous protocol, the replica protocol does not rely on any synchrony bound. This allows clients to choose their own message delay bounds. Any client that uses a correct message delay bound enjoys safety.

4 Flexible Byzantine Quorums for Partial Synchrony - Overview

In this section, we explain the high-level insights of Flexible Byzantine Quorums in Flexible BFT. Again, for ease of exposition, we focus on a single-shot consensus and do not consider termination. We start by reviewing the Byzantine Quorum Systems [25] that underlie existing partially synchronous protocols that tolerate 1/3 Byzantine faults (Section 4.1). We will illustrate that multiple uses of 2/3-quorums actually serve different purposes in these protocols. We then generalize these protocols to use Flexible Byzantine Quorums (Section 4.2), the key idea that enables more than 1/3 fault tolerance and allows diverse clients with varying assumptions to co-exist.

4.1 Background: Quorums in PBFT

Existing protocols for solving consensus in the partially synchronous setting with optimal 1/3-resilience revolve around voting by Byzantine quorums of replicas. Two properties of Byzantine quorums are utilized for achieving safety and liveness. First, any two quorums intersect at one honest replica (quorum intersection). Second, there exists a quorum that contains no Byzantine faulty replicas (quorum availability). Concretely, when less than 1/3 of the replicas are Byzantine, quorums are set to size $q_r = 2/3$. (To be precise, $q_r$ is slightly larger than 2/3, i.e., $2f+1$ out of $3f+1$ where $f$ is the number of faults, but we will use $q_r = 2/3$ for ease of exposition.) This guarantees an intersection of size at least $2q_r - 1 = 1/3$, hence at least one honest replica in the intersection. As for availability, there exist $q_r = 2/3$ honest replicas to form a quorum.
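The intersection and availability arithmetic can be verified with exact replica counts rather than fractions. The following is a small sketch with hypothetical helper names, assuming the standard sizing of `3f + 1` replicas and quorums of `2f + 1`.

```python
def classic_sizes(f: int):
    """Return (n, quorum) for the classic optimal-resilience setting with f faults."""
    return 3 * f + 1, 2 * f + 1


def min_intersection(n: int, quorum: int) -> int:
    """Smallest possible overlap between two quorums drawn from n replicas."""
    return max(0, 2 * quorum - n)


def intersection_has_honest(f: int) -> bool:
    """Two quorums overlap in more than f replicas, so one of them is honest."""
    n, quorum = classic_sizes(f)
    return min_intersection(n, quorum) > f


def honest_quorum_exists(f: int) -> bool:
    """The n - f honest replicas alone are enough to form a quorum."""
    n, quorum = classic_sizes(f)
    return n - f >= quorum
```

With `f = 1` this gives four replicas, quorums of three, and a minimum overlap of two, one more than the number of faults.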

To dissect the use of quorums in BFT protocols, consider their use in PBFT [10] for providing safety and liveness. PBFT operates in a view-by-view manner. Each view has a unique leader and consists of the following steps:

  • Propose. A leader $L$ proposes a value $b$.

  • Vote 1. On receiving the first value $b$ for a view $v$, a replica votes for $b$ if it is safe, as determined by a locking mechanism described below. A set of $q_r$ votes forms a certificate $\mathcal{C}^{q_r}(b)$.

  • Vote 2. On collecting $\mathcal{C}^{q_r}(b)$, a replica “locks” on $b$ and votes for $\mathcal{C}^{q_r}(b)$.

  • Commit. On collecting $q_r$ votes for $\mathcal{C}^{q_r}(b)$, a client learns that proposal $b$ becomes a committed decision.

If a replica locks on a value $b$ in a view, then it votes only for $b$ in subsequent views unless it “unlocks” from $b$. A replica “unlocks” from $b$ if it learns that $q_r$ replicas are not locked on $b$ in that view or higher (they may be locked on other values or they may not be locked at all).
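The lock and unlock rule can be illustrated with a deliberately simplified replica model. This sketch is an assumption-laden toy, not PBFT itself: the `Replica` class and its method names are hypothetical, views and signatures are omitted, and "learning that a quorum is not locked" is reduced to an integer count of such claims.

```python
from typing import Optional


class Replica:
    """Toy model of the voting guard: vote freely until locked, then only for the lock."""

    def __init__(self, quorum: int) -> None:
        self.quorum = quorum               # number of replicas forming a quorum
        self.locked: Optional[str] = None  # value this replica is locked on, if any

    def lock(self, value: str) -> None:
        """Called after collecting a certificate (a quorum of first-round votes)."""
        self.locked = value

    def may_vote(self, value: str, not_locked_claims: int = 0) -> bool:
        """Decide whether voting for `value` is safe.

        `not_locked_claims` counts replicas that claim they are not locked on
        this replica's locked value; a quorum of such claims unlocks it.
        """
        if self.locked is None or self.locked == value:
            return True
        if not_locked_claims >= self.quorum:
            self.locked = None  # unlock, then the vote becomes safe
            return True
        return False
```

A locked replica refuses a conflicting value until a full quorum of "not locked" claims arrives, which is why a committed value (locked by a quorum) cannot be displaced without an honest replica lying.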

The properties of Byzantine quorums are harnessed in PBFT for safety and liveness as follows:


Quorum intersection within a view.

Safety within a view is ensured by the first round of votes. A replica votes only once per view. For two distinct values to both obtain certificates, one honest replica needs to vote for both, which cannot happen.

Quorum intersection across views.

Safety across views is ensured by the locking mechanism. If $b$ becomes a committed decision in a view, then a quorum of replicas lock on $b$ in that view. For an honest replica among them to unlock from $b$, a quorum of replicas need to claim they are not locked on $b$. At least one replica in the intersection is honest and would need to falsely claim it is not locked, which cannot happen.

Quorum availability within a view.

Liveness within each view is guaranteed by having an honest quorum respond to a non-faulty leader.

4.2 Flexible Byzantine Quorums

Our Flexible BFT approach separates the quorums used in BFT protocols for the replicas (acceptors) from the quorums used for learning when a decision becomes committed. More specifically, we denote the quorum used for forming certificates (locking) by $q_{lck}$ and the quorum used for unlocking by $q_{ulck}$. We denote the quorum employed by clients for learning certificate uniqueness by $q_{unq}$, and the quorum used for learning commit safety by $q_{cmt}$. In other words, clients mandate $q_{unq}$ first-round votes and $q_{cmt}$ second-round votes in order to commit a decision. Below, we outline a modified PBFT-like protocol that uses these different quorum sizes instead of a single quorum size $q_r$. We then introduce a new definition, Flexible Byzantine Quorums, that captures the requirements needed for these quorums to provide safety and liveness.

  • Propose. A leader $L$ proposes a value $b$.

  • Vote 1. On receiving the first value $b$ for a view $v$, a replica votes for $b$ if it is safe, as determined by a locking mechanism described below. A set of $q_{lck}$ votes forms a certificate $\mathcal{C}^{q_{lck}}(b)$.

  • Vote 2. On collecting $\mathcal{C}^{q_{lck}}(b)$, a replica “locks” on $b$ and votes for $\mathcal{C}^{q_{lck}}(b)$.

  • Commit. On collecting $q_{unq}$ votes for $b$ and $q_{cmt}$ votes for $\mathcal{C}^{q_{lck}}(b)$, a client learns that proposal $b$ becomes a committed decision.

If a replica locks on a value $b$ in a view, then it votes only for $b$ in subsequent views unless it “unlocks” from $b$ by learning that $q_{ulck}$ replicas are not locked on $b$.


Flexible quorum intersection (a) within a view.

Contrary to PBFT, in Flexible BFT, a pair of $q_{lck}$ certificates need not necessarily intersect in an honest replica. Indeed, locking on a value does not preclude conflicting locks. It only mandates that every $q_{lck}$ quorum intersects with every $q_{unq}$ quorum in at least one honest replica. For safety, it is essential that the fraction of faulty replicas is less than $q_{lck} + q_{unq} - 1$.

Flexible quorum intersection (b) across views.

If a client commits a value $b$ in a view, $q_{cmt}$ replicas lock on $b$ in that view. For an honest replica among them to unlock from $b$, $q_{ulck}$ replicas need to claim they are not locked on $b$. This property mandates that every $q_{ulck}$ quorum intersects with every $q_{cmt}$ quorum in at least one honest replica. Thus, for safety, it is essential that the fraction of faulty replicas is less than $q_{ulck} + q_{cmt} - 1$.

Flexible quorum availability within a view.

For liveness, Byzantine replicas cannot exceed $1 - \max(q_{unq}, q_{cmt}, q_{lck}, q_{ulck})$ so that the aforementioned quorums can be formed at different stages of the protocol.

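Given the above analysis, Flexible BFT ensures safety if the fraction of faulty replicas is less than $\min(q_{unq} + q_{lck} - 1,\ q_{cmt} + q_{ulck} - 1)$, and provides liveness if the fraction of Byzantine replicas is at most $1 - \max(q_{unq}, q_{cmt}, q_{lck}, q_{ulck})$. It is optimal to use balanced quorum sizes where $q_{lck} = q_{ulck}$ and $q_{unq} = q_{cmt}$. To see this, first note that we should make sure $q_{unq} + q_{lck} = q_{cmt} + q_{ulck}$; otherwise, suppose the right-hand side is smaller, then setting $(q_{cmt}, q_{ulck})$ to equal $(q_{unq}, q_{lck})$ improves safety tolerance without affecting liveness tolerance. Next, observe that if we have $q_{unq} + q_{lck} = q_{cmt} + q_{ulck}$ but $q_{lck} > q_{ulck}$ (and hence $q_{unq} < q_{cmt}$), then once again setting $(q_{cmt}, q_{ulck})$ to equal $(q_{unq}, q_{lck})$ improves safety tolerance without affecting liveness tolerance.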
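The balancing argument can be made concrete with a short calculation. In this sketch the function names are hypothetical and the four arguments are the locking, unlocking, uniqueness, and commit quorum fractions, here called `q_lck`, `q_ulck`, `q_unq`, and `q_cmt`; the bounds are assumed to be the two pairwise intersection conditions for safety and one minus the largest quorum for liveness. The numbers are illustrative, not taken from the paper.

```python
from fractions import Fraction as F


def safety_bound(q_lck, q_ulck, q_unq, q_cmt):
    """Total faults must stay strictly below this fraction for safety."""
    return min(q_unq + q_lck - 1, q_cmt + q_ulck - 1)


def liveness_bound(q_lck, q_ulck, q_unq, q_cmt):
    """Byzantine faults must not exceed this fraction for liveness."""
    return 1 - max(q_lck, q_ulck, q_unq, q_cmt)


# Unbalanced choice: the unlocking quorum is smaller than the locking quorum.
unbalanced = dict(q_lck=F(7, 10), q_ulck=F(6, 10), q_unq=F(8, 10), q_cmt=F(8, 10))
# Balanced choice: raise the unlocking quorum to match the locking quorum.
balanced = dict(unbalanced, q_ulck=F(7, 10))

print(safety_bound(**unbalanced), liveness_bound(**unbalanced))
print(safety_bound(**balanced), liveness_bound(**balanced))
```

Balancing raises the safety bound from 2/5 to 1/2 while the liveness bound stays at 1/5, since the largest quorum is unchanged.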

Thus, in this paper, we set $q_{lck} = q_{ulck} = q_r$ and $q_{unq} = q_{cmt} = q_c$. Since replicas use $q_r$ votes to lock, these votes can always be used by the clients to commit $q_{cmt}$ quorums. Thus, $q_c \ge q_r$. The Flexible Byzantine Quorum requirements collapse into the following two conditions.

Flexible quorum intersection.

The fraction of faulty replicas is $< q_c + q_r - 1$.

Flexible quorum availability.

The fraction of Byzantine replicas is $\le 1 - q_c$.

Tolerating a-b-c faults.

If all faults in the system are Byzantine faults, then the best parameter choice is $q_c = q_r \ge 2/3$ for $< 1/3$ fault tolerance, and Flexible Byzantine Quorums degenerate to basic Byzantine quorums. However, in our model, a-b-c replicas are only interested in attacking safety but not liveness. This allows us to tolerate $q_c + q_r - 1$ total faults (Byzantine plus a-b-c), which can be more than 1/3. For example, if we set $q_r = 0.7$ and $q_c = 0.8$, then such a protocol can tolerate 0.2 Byzantine faults plus 0.3 a-b-c faults. We discuss the choice for $q_r$ and $q_c$ and their rationale in Section 6.

Separating client commit rules from the replica protocol.

A key property of the Flexible Byzantine Quorum approach is that it decouples the BFT protocol from client commit rules. The decoupling allows clients assuming different fault models to utilize the same protocol. In the above protocol, the propose and two voting steps are executed by the replicas and they are only parameterized by $q_r$. The commit step can be carried out by different clients using different commit thresholds $q_c$. Thus, a fixed $q_r$ determines a possible set of clients with varying commit rules (in terms of Byzantine and a-b-c adversaries). Recall that a Byzantine adversary can behave arbitrarily and thus may not provide liveness, whereas an a-b-c adversary only intends to attack safety but not liveness. Thus, a client who believes that a large fraction of the adversary may attempt to break safety, not progress, can choose a larger $q_c$. By doing so, it seeks stronger safety against dishonest replicas, while trading liveness. Conversely, a client that assumes that a large fraction of the adversary attacks liveness must choose a smaller $q_c$.
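To illustrate how one replica parameter can serve several clients, the sketch below checks whether a client's chosen commit threshold is consistent with its own beliefs. Everything here is an assumption made for illustration: the function names, the symbols `q_r` (replica quorum) and `q_c` (client commit quorum), the condition "total faults below `q_c + q_r - 1`" for safety, the condition "Byzantine faults at most `1 - q_c`" for liveness, and the example numbers.

```python
from fractions import Fraction as F


def client_safe(q_r, q_c, byz, abc):
    """Safety holds for this client if total faults stay below q_c + q_r - 1."""
    return byz + abc < q_c + q_r - 1


def client_live(q_c, byz):
    """Liveness holds for this client if Byzantine faults do not exceed 1 - q_c."""
    return byz <= 1 - q_c


q_r = F(7, 10)  # fixed once for the replica protocol

# Client A believes in 1/5 Byzantine plus 1/4 a-b-c replicas and picks q_c = 4/5.
print(client_safe(q_r, F(4, 5), F(1, 5), F(1, 4)), client_live(F(4, 5), F(1, 5)))

# Client B fears more safety attackers (1/10 Byzantine plus 2/5 a-b-c),
# so it waits for more votes: q_c = 9/10.
print(client_safe(q_r, F(9, 10), F(1, 10), F(2, 5)), client_live(F(9, 10), F(1, 10)))

# With Client A's smaller threshold, Client B's belief would not be covered.
print(client_safe(q_r, F(4, 5), F(1, 10), F(2, 5)))
```

The larger threshold buys Client B tolerance to more safety attackers at the cost of tolerating fewer Byzantine (liveness-attacking) replicas, which is the trade-off described above.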

5 Flexible BFT Protocol

In this section, we combine the ideas presented in Sections 3 and 4 to obtain a final protocol that supports both types of clients. A client can either assume partial synchrony, with freedom to choose $q_c$ as described in the previous section, or assume synchrony with its own choice of $\Delta$, as described in Section 3. Replicas execute a protocol at the network speed with a parameter $q_r$. We first give the protocol executed by the replicas and then discuss how clients commit depending on their assumptions. Moreover, inspired by Casper [8] and HotStuff [34], we show a protocol where the rounds of voting can be pipelined.

5.1 Notation

Before describing the protocol, we will first define some data structures and terminologies that will aid presentation.

Block format.

The pipelined protocol forms a chain of values. We use the term block to refer to each value in the chain. We refer to a block’s position in the chain as its height. A block $B_k$ at height $k$ has the following format

$$B_k := (b_k, h_{k-1})$$

where $b_k$ denotes a proposed value at height $k$ and $h_{k-1} := H(B_{k-1})$ is a hash digest of the predecessor block. The first block $B_1 = (b_1, \bot)$ has no predecessor. Every subsequent block $B_k$ must specify a predecessor block $B_{k-1}$ by including a hash of it. We say a block is valid if (i) its predecessor is valid or $\bot$, and (ii) its proposed value meets application-level validity conditions and is consistent with its chain of ancestors (e.g., does not double spend a transaction in one of its ancestor blocks).

Block extension and equivocation.

We say $B_l$ extends $B_k$ if $B_k$ is an ancestor of $B_l$ ($l > k$). We say two blocks $B_l$ and $B'_{l'}$ equivocate one another if they are not equal and do not extend one another.
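The block format and the extension and equivocation relations can be sketched as follows. This is a minimal illustration rather than the paper's data structure: the `Block` class, the SHA-256 encoding of its fields, and the `store` dictionary mapping digests to blocks are all hypothetical choices, and `None` stands in for the missing predecessor of the first block.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Block:
    value: bytes                  # proposed value at this height
    parent_hash: Optional[bytes]  # digest of the predecessor; None for the first block
    height: int

    def digest(self) -> bytes:
        """Hash of the block contents (the encoding is an arbitrary choice for this sketch)."""
        h = hashlib.sha256()
        h.update(self.parent_hash or b"")
        h.update(self.height.to_bytes(8, "big"))
        h.update(self.value)
        return h.digest()


def extends(store: Dict[bytes, Block], descendant: Block, ancestor: Block) -> bool:
    """True if `ancestor` is reached by following parent hashes from `descendant`."""
    cur = descendant
    while cur.parent_hash is not None:
        cur = store[cur.parent_hash]
        if cur == ancestor:
            return True
    return False


def equivocate(store: Dict[bytes, Block], a: Block, b: Block) -> bool:
    """Two blocks equivocate if they differ and neither extends the other."""
    return a != b and not extends(store, a, b) and not extends(store, b, a)


first = Block(b"genesis", None, 1)
left = Block(b"pay alice", first.digest(), 2)
right = Block(b"pay bob", first.digest(), 2)
child = Block(b"pay carol", left.digest(), 3)
store = {blk.digest(): blk for blk in (first, left, right, child)}
```

Here `left` and `right` share a predecessor and therefore equivocate, while `child` extends both `left` and `first`.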

Certificates and certified blocks.

In the protocol, replicas vote for blocks by signing them. We use $C_v^{q_r}(B_k)$ to denote a set of signatures on $h_k = H(B_k)$ by $q_r$ replicas in view $v$. $q_r$ is a parameter fixed for the protocol instance. We call $C_v^{q_r}(B_k)$ a certificate for $B_k$ from view $v$. Certified blocks are ranked first by the views in which they are certified and then by their heights. In other words, a block $B_k$ certified in view $v$ is ranked higher than a block $B_{k'}$ certified in view $v'$ if either (i) $v > v'$ or (ii) $v = v'$ and $k > k'$.
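The ranking rule is a lexicographic comparison on (view, height). A minimal sketch, in which the `Certificate` record is a hypothetical stand-in that omits the signatures themselves:

```python
from typing import Iterable, NamedTuple, Tuple


class Certificate(NamedTuple):
    view: int      # view in which the block was certified
    height: int    # height of the certified block
    block_id: str  # identifier of the certified block


def rank(cert: Certificate) -> Tuple[int, int]:
    """Views are compared first, heights break ties."""
    return (cert.view, cert.height)


def ranks_higher(a: Certificate, b: Certificate) -> bool:
    return rank(a) > rank(b)


def highest_certified(certs: Iterable[Certificate]) -> Certificate:
    """The certificate a replica would lock on, or a new leader would extend."""
    return max(certs, key=rank)
```

Python compares tuples lexicographically, so `rank(a) > rank(b)` matches conditions (i) and (ii) directly.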

Locked blocks.

At any time, a replica locks the highest certified block to its knowledge. During the protocol execution, each replica keeps track of all signatures for all blocks and keeps updating its locked block. Looking ahead, the notion of locked block will be used to guard the safety of a client commit.

5.2 Replica Protocol

The replica protocol progresses in a view-by-view fashion. Each view has a designated leader who is responsible for driving consensus on a sequence of blocks. Leaders can be chosen statically, e.g., round robin, or randomly using more sophisticated techniques [9, 28]. In our description, we assume a round robin selection of leaders, i.e., ($v$ mod $n$) is the leader of view $v$.

At a high level, the protocol does the following: The leader proposes a block to all replicas. The replicas vote on it if safe to do so. The block becomes certified once $q_r$ replicas vote on it. The leader will then propose another block extending the previous one, chaining blocks one after another at increasing heights. Unlike regular consensus protocols where replicas determine when a block is committed, in Flexible BFT, replicas only certify blocks while committing is offloaded to the clients. If at any time replicas detect malicious leader behavior or lack of progress in a view, they blame the leader and engage in a view change protocol to replace the leader and move to the next view. The new leader collects a status from different replicas and continues to propose blocks based on this status. We explain the steady state and view change protocols in more detail below.

Let $v$ be the current view number and replica $L$ be the leader in this view. Perform the following steps in an iteration.

  1. Propose. Executed by the leader of view $v$

    The leader $L$ broadcasts $\langle \text{propose}, B_k, v, C_{v'}^{q_r}(B_{k-1}), S \rangle_L$. Here, $B_k := (b_k, h_{k-1})$ is the newly proposed block and it should extend the highest certified block known to $L$. In the steady state, an honest leader $L$ would extend the previous block it proposed, in which case $v' = v$ and $S = \bot$. Immediately after a view change, $L$ determines the highest certified block from the status $S$ received during the view change.

  2. Vote. Executed by all replicas

    When a replica $R$ receives a valid proposal $\langle \text{propose}, B_k, v, C_{v'}^{q_r}(B_{k-1}), S \rangle_L$ from the leader $L$, $R$ broadcasts the proposal and a vote $\langle \text{vote}, B_k, v \rangle_R$ if (i) the proposal is the first one in view $v$, and it extends the highest certified block in $S$, or (ii) the proposal extends the last proposed block in the view.

    In addition, replica $R$ records the following based on the messages it receives.

    • $R$ keeps track of the number of votes received for this block in this view as $q_{B_k,v}$.

    • If block $B_{k-1}$ has been proposed in view $v$, $R$ marks $B_{k-1}$ as a locked block and records the locked time as $\text{t-lock}_{k-1,v}$.

    • If a block equivocating $B_{k-1}$ is proposed by $L$ in view $v$ (possibly received through a vote), $R$ records the time $\text{t-equiv}_{k-1,v}$ at which the equivocating block is received.

    The replica then enters the next iteration. If the replica observes no progress or equivocating blocks in the same view $v$, it stops voting in view $v$ and sends a $\langle \text{blame}, v \rangle_R$ message to all replicas.

Figure 2: Flexible BFT steady state protocol.

Steady state protocol.

The steady state protocol is described in Figure 2. In the steady state, there is a unique leader who, in an iteration, proposes a block, waits for votes from $q_r$ replicas and moves to the next iteration. In the steady state, an honest leader always extends the previous block it proposed. Immediately after a view change, since the previous leaders could have been Byzantine and may have proposed equivocating blocks, the new leader needs to determine a safe block to propose. It does so by collecting a status of locked blocks from $q_r$ replicas denoted by $S$ (described in the view change protocol).

For a replica $R$ in the steady state, on receiving a proposal for block $B_k$, a replica votes for it if it extends the previous proposed block in the view or if it extends the highest certified block in $S$. Replica $R$ can potentially receive blocks out of order and thus receive $B_k$ before its ancestor blocks. In this case, replica $R$ waits until it receives the ancestor blocks, verifies the validity of those blocks and $B_k$ before voting for $B_k$. In addition, replica $R$ records the following to aid a client commit:

  • Number of votes. It records the number of votes received for $B_k$ in view $v$ as $q_{B_k,v}$. Observe that votes are broadcast by all replicas and the number of votes for a block can be greater than $q_r$. $q_{B_k,v}$ will be updated each time the replica hears about a new vote in view $v$.

  • Lock time. If $B_{k-1}$ was proposed in the same view $v$, it locks $B_{k-1}$ and records the locked time as $\text{t-lock}_{k-1,v}$.

  • Equivocation time. If the replica ever observes an equivocating block at height $k$ in view $v$ through a proposal or vote, it stores the time of equivocation as $\text{t-equiv}_{k,v}$.

Looking ahead, the locked time and equivocation time will be used by clients with synchrony assumptions to commit, and the number of votes will be used by clients with partial-synchrony assumptions to commit.
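The three records a replica keeps can be sketched as a small bookkeeping class. Everything here is illustrative: the class and method names are hypothetical, timestamps are passed in explicitly rather than read from a clock, and `on_proposal` assumes the predecessor was proposed in the same view.

```python
import math
from collections import defaultdict
from typing import Dict, Set, Tuple


class ReplicaLog:
    """Per-replica bookkeeping that a client could later query (hypothetical names)."""

    def __init__(self) -> None:
        self.voters: Dict[Tuple[str, int], Set[str]] = defaultdict(set)
        self.t_lock: Dict[Tuple[int, int], float] = {}   # (height, view) -> lock time
        self.t_equiv: Dict[Tuple[int, int], float] = {}  # (height, view) -> equivocation time

    def on_vote(self, block_id: str, view: int, voter: str) -> None:
        # Duplicate votes from the same replica are counted once.
        self.voters[(block_id, view)].add(voter)

    def vote_count(self, block_id: str, view: int) -> int:
        return len(self.voters.get((block_id, view), ()))

    def on_proposal(self, height: int, view: int, now: float) -> None:
        # A proposal at `height` locks the block at `height - 1`; keep the first lock time.
        if height > 1:
            self.t_lock.setdefault((height - 1, view), now)

    def on_equivocation(self, height: int, view: int, now: float) -> None:
        self.t_equiv.setdefault((height, view), now)

    def equivocation_time(self, height: int, view: int) -> float:
        # Infinity means "no equivocation observed so far".
        return self.t_equiv.get((height, view), math.inf)
```

Storing voter identities rather than a bare counter is a design choice of the sketch: it keeps the count from being inflated by repeated votes.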

Leader monitoring.

If a replica detects a lack of progress in view $v$ or observes malicious leader behavior such as more than one height-$k$ block in the same view, it blames the leader of view $v$ by broadcasting a $\langle \text{blame}, v \rangle$ message. It quits view $v$ and stops voting and broadcasting blocks in view $v$. To determine lack of progress, the replicas may simply guess a time bound for message arrival or use increasing timeouts for each view [10].

View change.

The view change protocol is described in Figure 3. If a replica gathers $q_r$ $\langle \text{blame}, v \rangle$ messages from distinct replicas, it forwards them to all other replicas and enters a new view $v+1$ (Step 1). It records the time at which it received the blame certificate as $\text{t-viewchange}_v$. Upon entering a new view, a replica reports to the leader of the new view $L'$ its locked block and transitions to the steady state (Step 2). $q_r$ status messages form the status $S$. The first block $L'$ proposes in the new view should extend the highest certified block among these $q_r$ status messages.

Let $L$ and $L'$ be the leaders of views $v$ and $v+1$, respectively.

  1. New-view. Upon gathering $q_r$ $\langle \text{blame}, v \rangle$ messages, broadcast them and enter view $v+1$. Record the time as $\text{t-viewchange}_v$.

  2. Status. Suppose $B_j$ is the block locked by the replica. Send a status of its locked block to the leader $L'$ using $\langle \text{status}, v, B_j, C_{v'}^{q_r}(B_j) \rangle$ and transition to the steady state. Here, $v'$ is the view in which $B_j$ was certified.

Figure 3: Flexible BFT view change protocol.
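The new-view step amounts to counting blame messages from distinct senders against the replica quorum. A minimal sketch, assuming the quorum is given as a fraction `q_r` of `n` replicas and that timestamps are supplied by the caller (the class and its fields are hypothetical):

```python
from fractions import Fraction
from typing import Dict, Set


class ViewChangeTracker:
    def __init__(self, n: int, q_r: Fraction) -> None:
        self.n = n
        self.q_r = q_r
        self.blamers: Dict[int, Set[str]] = {}     # view -> distinct blaming replicas
        self.t_viewchange: Dict[int, float] = {}   # view -> time the blame quorum was seen

    def on_blame(self, view: int, sender: str, now: float) -> bool:
        """Record a blame; return True exactly when the quorum for `view` is first reached."""
        senders = self.blamers.setdefault(view, set())
        senders.add(sender)
        if view not in self.t_viewchange and len(senders) >= self.q_r * self.n:
            self.t_viewchange[view] = now
            return True
        return False
```

Using `Fraction` keeps the threshold comparison exact; with `n = 4` and `q_r = 2/3`, three distinct blames are needed.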

5.3 Client Commit Rules

  (CR1) Partially-synchronous commit. A block $B_k$ is committed under the partially synchronous rule with parameter $q_c$ iff there exist $l \ge k$ and $v$ such that

    1. $C_v^{q_r}(B_l)$ and $C_v^{q_r}(B_{l+1})$ exist where $B_{l+1}$ extends $B_l$ and $B_k$ (if $l = k$, $B_l = B_k$).

    2. $q_{B_l,v} \ge q_c$ and $q_{B_{l+1},v} \ge q_c$.

  (CR2) Synchronous commit. A block $B_k$ is committed assuming $\Delta$-synchrony iff the following holds for $q_r$ replicas. There exist $l \ge k$ and $v$ (possibly different across replicas) such that,

    1. $C_v^{q_r}(B_l)$ exists where $B_l$ extends $B_k$ (if $l = k$, $B_l = B_k$).

    2. An undisturbed-$2\Delta$ period is observed after $B_{l+1}$ is obtained, i.e., no equivocating block or view change of view $v$ was observed before $2\Delta$ time after $B_{l+1}$ was obtained, i.e.,

       $$\min(\text{current-time},\ \text{t-equiv}_{l,v},\ \text{t-viewchange}_v) - \text{t-lock}_{l,v} \ge 2\Delta.$$

Figure 4: Flexible BFT commit rules

As mentioned in the introduction, Flexible BFT supports clients with different assumptions. Clients in Flexible BFT learn the state of the protocol from the replicas and, based on their own assumptions, determine whether a block has been committed. Broadly, we support two types of clients: those who believe in synchrony and those who believe in partial synchrony.

5.3.1 Clients with Partial-Synchrony Assumptions (CR1)

A client with partial-synchrony assumptions deduces whether a block has been committed based on the number of votes received by a block. A block $B_k$ (together with its ancestors) is committed with parameter $q_c$ iff $B_k$ and its immediate successor both receive $\ge q_c$ votes in the same view.

Safety of CR1.

A CR1 commit based on $q_c$ votes is safe against $< q_c + q_r - 1$ faulty replicas (Byzantine plus a-b-c). Observe that if $B_k$ gets $q_c$ votes in view $v$, due to flexible quorum intersection, a conflicting block cannot be certified in view $v$, unless $\ge q_c + q_r - 1$ replicas are faulty. Moreover, $B_{k+1}$ extending $B_k$ has also received $q_c$ votes in view $v$. Thus, $q_c$ replicas lock block $B_k$ in view $v$. In subsequent views, honest replicas that have locked $B_k$ will only vote for a block that equals or extends $B_k$ unless they unlock. However, due to flexible quorum intersection, they will not unlock unless $\ge q_c + q_r - 1$ replicas are faulty. Proof of Lemma 1 formalizes this argument.
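A client-side check in the spirit of CR1 can be sketched as follows. The sketch assumes the client already holds one branch of the chain as a list of block ids and a table of per-view vote counts; the function name, argument layout, and the use of a fraction `q_c` of `n` replicas as the threshold are assumptions made for illustration.

```python
from fractions import Fraction
from typing import Dict, List, Tuple


def cr1_committed(
    chain: List[str],                    # chain[i] is the block id at height i + 1
    k: int,                              # height of the block being checked
    votes: Dict[Tuple[str, int], int],   # (block id, view) -> number of votes seen
    n: int,
    q_c: Fraction,
) -> bool:
    """True if some block at height l >= k and its successor both have
    at least q_c * n votes in the same view."""
    threshold = q_c * n
    views = {view for (_, view) in votes}
    for l in range(k, len(chain)):       # heights l and l + 1 must both exist
        for v in views:
            if (votes.get((chain[l - 1], v), 0) >= threshold
                    and votes.get((chain[l], v), 0) >= threshold):
                return True
    return False
```

Two clients reading the same vote table can reach different conclusions: with `n = 10`, a client using `q_c = 4/5` accepts a block whose successor has 8 votes, while a client using `q_c = 9/10` keeps waiting.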

5.3.2 Client with Synchrony Assumptions (CR2)

Intuitively, a CR2 commit involves $q_r$ replicas collectively stating that no “bad event” happens within “sufficient time” in a view. Here, a bad event refers to either leader equivocation or view change (the latter indicates sufficient replicas believe the leader is faulty) and the “sufficient time” is $2\Delta$, where $\Delta$ is a synchrony bound chosen by the client. More formally, a replica states that a synchronous commit for block $B_k$ for a given parameter $\Delta$ (set by a client) is satisfied iff the following holds. There exists $B_{l+1}$ that extends $B_l$ and $B_k$, and the replica observes an undisturbed-$2\Delta$ period after obtaining $B_{l+1}$ during which (i) no equivocating block is observed, and (ii) no blame certificate/view change certificate for view $v$ was obtained, i.e.,

$$\min(\text{current-time},\ \text{t-equiv}_{l,v},\ \text{t-viewchange}_v) - \text{t-lock}_{l,v} \ge 2\Delta$$

where $\text{t-equiv}_{l,v}$ denotes the time equivocation for $B_l$ in view $v$ was observed ($\infty$ if no equivocation), $\text{t-viewchange}_v$ denotes the time at which view change happened from view $v$ to $v+1$ ($\infty$ if no view change has happened yet), and $\text{t-lock}_{l,v}$ denotes the time at which $B_l$ was locked (or $B_{l+1}$ was proposed) in view $v$. Note that the client does not require the $q_r$ fraction of replicas to report the same height $l$ or view $v$.
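The per-replica condition and the client's aggregation can be sketched in a few lines. The sketch assumes the undisturbed period is twice the client's synchrony bound `delta`, that unobserved events are represented by infinity, and that the client needs a `q_r` fraction of `n` replicas to report success; the function names are hypothetical.

```python
import math
from fractions import Fraction
from typing import Iterable


def undisturbed(
    t_lock: float,
    delta: float,
    now: float,
    t_equiv: float = math.inf,
    t_viewchange: float = math.inf,
) -> bool:
    """True if no equivocation or view change was seen within 2 * delta of the lock time."""
    return min(now, t_equiv, t_viewchange) - t_lock >= 2 * delta


def cr2_committed(replica_reports: Iterable[bool], n: int, q_r: Fraction) -> bool:
    """Client-side rule: enough replicas report an undisturbed period."""
    return sum(1 for ok in replica_reports if ok) >= q_r * n
```

Because `delta` is an argument of the check rather than of the replica protocol, two clients can evaluate the same replica timestamps with different synchrony bounds.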

Safety of CR2.

A client believing in synchrony assumes that all messages between replicas arrive within $\Delta$ time after they were sent. If the client’s chosen $\Delta$ is a correct upper bound on message delay, then a CR2 commit is safe against fewer than $q_r$ faulty replicas (Byzantine plus a-b-c), as we explain below. If less than $q_r$ replicas are faulty, at least one honest replica reported an undisturbed-$2\Delta$ period. Let us call this honest replica $h$ and analyze the situation from $h$’s perspective to explain why an undisturbed $2\Delta$ period ensures safety. Observe that replicas in Flexible BFT forward the proposal when voting. If $\Delta$-synchrony holds, every other honest replica learns about the proposal $B_{l+1}$ at most $\Delta$ time after $h$ learns about it. If any honest replica voted for a conflicting block or quit view $v$, $h$ would have known within $2\Delta$ time.

5.4 Safety and Liveness

We introduce the notion of direct and indirect commit to aid the proofs. We say a block is committed directly under CR1 if the block and its immediate successor both get $q_c$ votes in the same view. We say a block is committed directly under CR2 if some honest replica reports an undisturbed-$2\Delta$ period after its successor block was obtained. We say a block is committed indirectly if neither condition applies to it but it is committed as a result of a block extending it being committed directly. We remark that the direct commit notion, especially for CR2, is merely a proof technique. A client cannot tell whether a replica is honest, and thus has no way of knowing whether a block is directly committed under CR2.

Lemma 1.

If a client directly commits a block $B_l$ in view $v$ using a correct commit rule, then a certified block that ranks no lower than $C_v^{q_r}(B_l)$ must equal or extend $B_l$.

Proof.

To elaborate on the lemma, a certified block $C_{v'}^{q_r}(B'_{l'})$ ranks no lower than $C_v^{q_r}(B_l)$ if either (i) $v' = v$ and $l' \ge l$, or (ii) $v' > v$. We need to show that if $B_l$ is directly committed, then any certified block that ranks no lower either equals or extends $B_l$. We consider the two commit rules separately. For both commit rules, we will use induction on $v'$ to prove the lemma.


For CR1 with parameter $q_c$ to be correct, flexible quorum intersection needs to hold, i.e., the fraction of faulty replicas must be less than $q_c + q_r - 1$. $B_l$ being directly committed under CR1 with parameter $q_c$ implies that there are $q_c$ votes in view $v$ for $B_l$ and $B_{l+1}$ where $B_{l+1}$ extends $B_l$.

For the base case, a block $B'_{l'}$ with $l' \ge l$ that does not extend $B_l$ cannot get certified in view $v$, because that would require $q_c + q_r - 1$ replicas to vote for two equivocating blocks in view $v$.

Next, we show the inductive step. Note that $q_c$ replicas voted for $B_{l+1}$ in view $v$, which contains $C_v^{q_r}(B_l)$. Thus, they lock $B_l$ or a block extending $B_l$ by the end of view $v$. Due to the inductive hypothesis, any certified block that ranks equally or higher from view $v$ up to view $v'$ either equals or extends $B_l$. Thus, by the end of view $v'$, those $q_c$ replicas still lock $B_l$ or a block extending $B_l$. Since the total fraction of faults is less than $q_c + q_r - 1$, the status $S$ shown by the leader of view $v'+1$ must include a certificate for $B_l$ or a block extending it; moreover, any certificate that ranks equal to or higher than $C_v^{q_r}(B_l)$ is for a block that equals or extends $B_l$. Thus, only a block that equals or extends $B_l$ can gather votes from those $q_c$ replicas in view $v'+1$ and only a block that equals or extends $B_l$ can get certified in view $v'+1$.


For CR2 with synchrony bound $\Delta$ to be correct, $\Delta$ must be an upper bound on worst case message delay and the fraction of faulty replicas must be less than $q_r$. $B_l$ being directly committed under CR2 with $\Delta$-synchrony implies that at least one honest replica voted for $B_{l+1}$ extending $B_l$ in view $v$, and did not hear an equivocating block or view change within $2\Delta$ time after that. Call this replica $h$. Suppose $h$ voted for $B_{l+1}$ extending $B_l$ in view $v$ at time $t$, and did not hear an equivocating block or view change by time $t + 2\Delta$.

We first show the base case: a block $B'_{l'}$ with $l' \ge l$ certified in view $v$ must equal or extend $B_l$. Observe that if $B'_{l'}$ with $l' \ge l$ does not equal or extend $B_l$, then it equivocates $B_l$. No honest replica voted for $B'_{l'}$ before time $t + \Delta$, because otherwise $h$ would have received the vote for $B'_{l'}$ by time $t + 2\Delta$. No honest replica would vote for $B'_{l'}$ after time $t + \Delta$ either, because by then they would have received (from $h$) and voted for $B_l$. Thus, $B'_{l'}$ cannot get certified in view $v$.

We then show the inductive step. Because $h$ did not hear view change by time $t + 2\Delta$, all honest replicas are still in view $v$ by time $t + \Delta$, which means they all receive $B_{l+1}$ from $h$ by the end of view $v$. Thus, they lock $B_l$ or a block extending $B_l$ by the end of view $v$. Due to the inductive hypothesis, any certified block that ranks equally or higher from view $v$ up to view $v'$ either equals or extends $B_l$. Thus, by the end of view $v'$, all honest replicas still lock $B_l$ or a block extending $B_l$. Since the total fraction of faults is less than $q_r$, the status $S$ shown by the leader of view $v'+1$ must include a certificate for $B_l$ or a block extending it; moreover, any certificate that ranks equal to or higher than $C_v^{q_r}(B_l)$ is for a block that equals or extends $B_l$. Thus, only a block that equals or extends $B_l$ can gather honest votes in view $v'+1$ and only a block that equals or extends $B_l$ can get certified in view $v'+1$. ∎

Theorem 2 (Safety).

Two clients with correct commit rules commit the same block $B_k$ for each height $k$.

Proof.

Suppose for contradiction that two distinct blocks $B_k$ and $B'_k$ are committed at height $k$. Suppose $B_k$ is committed as a result of $B_l$ being directly committed in view $v$ and $B'_k$ is committed as a result of $B'_{l'}$ being directly committed in view $v'$. This implies $B_l$ is or extends $B_k$; similarly, $B'_{l'}$ is or extends $B'_k$. Without loss of generality, assume $v \le v'$. If $v = v'$, further assume $l \le l'$ without loss of generality. By Lemma 1, the certified block $B'_{l'}$ must equal or extend $B_l$. Thus, $B'_k = B_k$. ∎

Theorem 3 (Liveness).

If all clients have correct commit rules, they all keep committing new blocks.

Proof.

By the definition of a-b-c faults, if they cannot violate safety, they will preserve liveness. Theorem 2 shows that if all clients have correct commit rules, then safety is guaranteed even if a-b-c replicas behave arbitrarily. Thus, once we proved safety, we can treat a-b-c replicas as honest when proving liveness.

Observe that a correct commit rule tolerates at most $1 - q_r$ Byzantine faults. If a Byzantine leader prevents liveness, there will be $q_r$ blame messages against it, and a view change will ensue to replace the leader. Eventually, a non-Byzantine (honest or a-b-c) replica becomes the leader and drives consensus in new heights. If replicas use increasing timeouts, eventually, all non-Byzantine replicas stay in the same view for sufficiently long. When both conditions occur, if a client’s commit rule is correct (either CR1 or CR2), due to quorum availability, it will receive enough votes in the same view to commit. ∎

5.5 Efficiency

Latency.

Clients with a synchrony assumption incur a latency of $2\Delta$ plus a few network speed rounds. In terms of the maximum network delay $\Delta$, this matches the state-of-the-art synchronous protocols [4]. The distinction though is that $\Delta$ now depends on the client assumption and hence different clients may commit with different latencies. Clients with partial-synchrony assumptions incur a latency of two rounds of voting; this matches PBFT [10].

Communication.

Every vote and new-view message is broadcast to all replicas, incurring $O(n^2)$ communication messages. This is the same complexity as that of PBFT [10] and Sync HotStuff [4].

6 Discussion

As we have seen, three parameters $q_r$, $q_c$, and $\Delta$ determine the protocol. $q_r$ is the only parameter for the replicas and is picked by the service administrator. The choice of $q_r$ determines a set of client assumptions that can be supported. $q_c$ and $\Delta$ are chosen by clients to commit blocks. In this section, we first discuss the client assumptions supported by a given $q_r$ and then discuss the trade-offs between different choices of $q_r$.

6.1 Client Assumptions Supported by $q_r$

Figure 5: Clients supported for $q_r = 2/3$.

Figure 5 represents the clients supported at $q_r = 2/3$. The x-axis represents Byzantine faults and the y-axis represents total faults (Byzantine plus a-b-c). Each point on this graph represents a client fault assumption as a pair: (Byzantine faults, total faults). The shaded gray area indicates an “invalid area” since we cannot have fewer total faults than Byzantine faults. A missing dimension in this figure is the choice of $\Delta$. Thus, the synchrony guarantee shown in this figure is for clients that choose a correct synchrony bound.

Clients with partial-synchrony assumptions can get fault tolerance on (or below) the starred orange line. The right most point on the line is $(1/3, 1/3)$, i.e., we tolerate less than a third of Byzantine replicas and no additional a-b-c replicas. This is the setting of existing partially synchronous consensus protocols [12, 10, 34]. Flexible BFT generalizes these protocols by giving clients the option of moving up-left along the line, i.e., tolerating fewer Byzantine and more total faults. By choosing $q_c > q_r$, a client tolerates $< q_c + q_r - 1$ total faults for safety and $\le 1 - q_c$ Byzantine faults for liveness. In other words, as a client moves left, for every additional vote it requires, it tolerates one fewer Byzantine fault and gains overall one higher total number of faults (i.e., two more a-b-c faults). The left most point on this line tolerates no Byzantine replicas and the highest fraction of a-b-c replicas.

Moreover, for clients who believe in synchrony, if their assumption is correct, they enjoy 1/3 Byzantine tolerance and 2/3 total tolerance represented by the green diamond. This is because synchronous commit rules are not parameterized by the number of votes received.
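The points in Figure 5 can be computed from the quorum fractions. The sketch below assumes the bounds implied by quorum intersection and quorum availability: a partially synchronous client with threshold `q_c` (at least `q_r`) is safe with total faults strictly below `q_c + q_r - 1` and live with Byzantine faults at most `1 - q_c`, while a synchronous client is safe with total faults strictly below `q_r` and live with Byzantine faults at most `1 - q_r`. The function names are hypothetical.

```python
from fractions import Fraction
from typing import Tuple


def partial_sync_tolerance(q_r: Fraction, q_c: Fraction) -> Tuple[Fraction, Fraction]:
    """Return (strict upper bound on total faults, upper bound on Byzantine faults)."""
    if not (q_r <= q_c <= 1):
        raise ValueError("expected q_r <= q_c <= 1")
    return q_c + q_r - 1, 1 - q_c


def sync_tolerance(q_r: Fraction) -> Tuple[Fraction, Fraction]:
    """Same pair for a client that commits with the synchronous rule."""
    return q_r, 1 - q_r
```

For `q_r = 2/3`, the threshold `q_c = 2/3` gives the pair (1/3, 1/3), `q_c = 5/6` gives (1/2, 1/6), and `q_c = 1` gives (2/3, 0); the synchronous rule gives (2/3, 1/3), matching the green diamond described above.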

How do clients pick their commit rules?

In Figure 5, the shaded starred orange portion of the plot represents fault tolerance provided by the partially synchronous commit rule (CR1). Specifically, setting $q_c$ to the total fault fraction yields the necessary commit rule. On the other hand, if a client’s required fault tolerance lies in the circled green portion of the plot, then the synchronous commit rule (CR2) with an appropriate $\Delta$ picked by the client yields the necessary commit rule. Finally, if a client’s target fault tolerance corresponds to the white region of the plot, then it is not achievable with this $q_r$.

Clients with incorrect assumptions and recovery.

If a client has an incorrect assumption with respect to the fault threshold or synchrony parameter $\Delta$, then it can lose safety or liveness. If a client believing in synchrony picks too small a $\Delta$ and commits a value $b$, it is possible that a conflicting value $b'$ may also be certified. Replicas may choose to extend the branch containing $b'$, effectively reverting $b$ and causing a safety violation. Whenever a client detects such a safety violation, it may need to revert some of its commits and increase $\Delta$ to recover.

For a client with partial-synchrony assumption, if it loses safety, it can update its fault model to move left along the orange starred line, i.e., tolerate higher total faults but fewer Byzantine. On the other hand, if it observes no progress as its threshold is not met, then it moves towards the right. However, if the true fault model is in the circled green region in Figure 5, then the client cannot find a partially synchronous commit rule that is both safe and live and eventually has to switch to using a synchronous commit rule.

Recall that the goal of a-b-c replicas is to attack safety. Thus, clients with incorrect assumptions may be exploited by a-b-c replicas for their own gain (e.g., by double-spending). When a client updates to a correct assumption and recovers from unsafe commits, their subsequent commits would be safe and final. This is remotely analogous to Bitcoin – if a client commits to a transaction when it is a few blocks deep and a powerful adversary succeeds in creating an alternative longer fork, the commit is reverted.

6.2 Comparing Different $q_r$ Choices

Figure 6: Clients supported by Flexible BFT at different $q_r$’s. The legend represents the different $q_r$ values.

We now look at the service administrator’s choice in picking $q_r$. In general, the service administrator’s goal is to tolerate a large number of Byzantine and a-b-c faults, i.e., move towards top and/or right of the figure. Figure 6 shows the trade-offs in terms of clients supported by different $q_r$ values in Flexible BFT.

First, it can be observed that for clients with partial-synchrony assumptions, $q_r \ge 2/3$ dominates $q_r < 2/3$. Observe that the fraction of Byzantine replicas $B$ is bounded by $B < q_c + q_r - 1$ and $B \le 1 - q_c$, so $B < q_r/2$. Thus, as $q_r$ decreases, Byzantine fault tolerance decreases. Moreover, since the total fault tolerance is $q_c + q_r - 1$, a lower $q_r$ also tolerates a smaller fraction of total faults for a fixed $q_c$.
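The bound on the Byzantine fraction in this paragraph can be checked by adding the safety and liveness constraints. The derivation below is a sketch under assumed notation: $B$ is the Byzantine fraction, and $q_r$ and $q_c$ are the replica and client quorum fractions.

```latex
\begin{align*}
B &< q_c + q_r - 1 && \text{(safety: Byzantine faults count toward total faults)}\\
B &\le 1 - q_c     && \text{(liveness: $q_c$ votes must still be collected)}\\
\Rightarrow\quad 2B &< q_r \quad\Longrightarrow\quad B < \tfrac{q_r}{2}.
\end{align*}
```

Since $q_c$ cancels in the sum, the bound depends only on $q_r$, which is why lowering $q_r$ lowers the best Byzantine tolerance any partially synchronous client can obtain.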

For $q_r \ge 2/3$ or for clients believing in synchrony, no value of $q_r$ is Pareto optimal. For clients with partial-synchrony assumptions, as $q_r$ increases, the total fault tolerance for safety increases. But since $q_c \ge q_r$, we have $B \le 1 - q_r$, and hence the Byzantine tolerance for liveness decreases. For clients believing in synchrony, the total fault tolerance for safety is $< q_r$ and the Byzantine fault tolerance for liveness is $\le 1 - q_r$. In both cases, the choice of $q_r$ represents a safety-liveness trade-off.

6.3 Separating Alive-but-corrupt Resilience from Diversity

So far, we presented the Flexible BFT techniques and protocols to simultaneously provide diverse client support and stronger a-b-c fault tolerance. Indeed, we believe both properties are desirable and they strengthen each other. But we remark that these two properties can be provided separately.

It is relatively straightforward to provide stronger fault tolerance in the a-b-c model in a classic uniform setting. For example, under partial-synchrony, one can simply use a larger quorum in PBFT (without the $q_r$/$q_c$ replica/client quorum separation). But we note that a higher total (a-b-c plus Byzantine) tolerance comes at the price of a lower Byzantine tolerance. In a uniform setting, this means all clients have to sacrifice some Byzantine tolerance. In the diverse setting, Flexible BFT gives clients the freedom to choose the fault assumption they believe in, and a client can choose the classic Byzantine fault model.

On the flip side, if one hopes to support diverse clients in the classic Byzantine fault model (no a-b-c faults), the “dimension of diversity” reduces. One example is the network speed replica protocol in Section 3, which supports clients that believe in different synchrony bounds. That protocol can be further extended to support clients with a (uniform) partial-synchrony assumption. Clients with a partial-synchrony assumption are uniform since we have not identified any type of “diversity” outside a-b-c faults for them.

7 Related Work

Figure 7: Comparing Flexible BFT to existing consensus protocols. The legend represents different $q_r$ values. Legend entries for existing protocols: partially synchronous protocols [10, 33, 26, 19, 34, 7]; synchronous protocols [30, 15, 4, 1]; Thunderella and Sync HotStuff (optimistic) [30, 4]; Zyzzyva and SBFT (optimistic) [19].

Most BFT protocols are designed with a uniform assumption about the system. The literature on BFT consensus is vast and is largely beyond scope for review here; we refer the reader to the standard textbooks in distributed computing [23, 6].

Resilience.

Figure 7 compares resilience in Flexible BFT with some existing consensus protocols. The x axis represents a Byzantine resilience threshold, the y axis the total resilience against corruption under the a-b-c fault mode. The three different colors (red, green, blue) represent three possible instantiations of Flexible BFT at different $q_r$’s.

Each point in the figure represents an abstract “client” belief. For the partial synchrony model, client beliefs form lines, and for synchronous settings, client beliefs are individual circles. The locus of points on a given color represents all client assumptions supported for a corresponding $q_r$, representing the diversity of clients supported. The figure depicts state-of-the-art resilience combinations by existing consensus solutions via uncolored shapes. Partially synchronous protocols [10, 34, 7] that tolerate one-third Byzantine faults can all be represented by the ‘+’ symbol at $(1/3, 1/3)$. Similarly, synchronous protocols [15, 3, 1] that tolerate one-half Byzantine faults are represented by a separate symbol at $(1/2, 1/2)$. It is worth noting that some of these works employ two commit rules that differ in number of votes or synchrony [26, 19, 30, 4]. For instance, Thunderella and Sync HotStuff optimistically commit in an asynchronous fashion based on quorums of size $\ge 3/4$, as represented by a hollow triangle. Similarly, FaB [26], Zyzzyva [19] and SBFT [14] optimistically commit when they receive all votes but wait for two rounds of votes otherwise. These are represented by two points in the figure. Despite the two commit rules, these protocols do not have client diversity; all parties involved (replicas and clients) make the same assumptions and reach the same commit decisions.

Diverse client beliefs.

A simple notion of client diversity exists in Bitcoin’s probabilistic commit rule. One client may consider a transaction committed after six confirmations while another may require only one confirmation. Generally, the notion of client diversity has been discussed informally at public blockchain forums.

Another example of diversity is considered in the XFT protocol [22]. The protocol supports two types of clients: clients that assume crash faults under partial synchrony, or clients that assume Byzantine faults but believe in synchrony. Yet another notion of diversity is considered by the federated Byzantine consensus model and the Stellar protocol [27]. The Stellar protocol allows nodes to pick their own quorums. Our Flexible BFT approach instead considers diverse clients in terms of a-b-c adversaries and synchrony. The model and techniques in [27] and our paper are largely orthogonal and complementary.

Flexible Paxos.

Flexible Paxos by Howard et al. [16] observes that Paxos may use non-intersecting quorums within a view but an intersection is required across views. Our Flexible Quorum Intersection (b) can be viewed as its counterpart in the Byzantine and a-b-c setting. In addition, Flexible BFT applies the flexible quorum idea to support diverse clients with different fault models and timing assumptions.

Mixed fault model.

Fault models that mix Byzantine and crash faults have been considered in various works, e.g., FaB [26] and SBFT [14]. The a-b-c faults are in a sense the opposite of crash faults, mixing Byzantine with “anti-crashes”. Our a-b-c adversary bears similarity to a rational adversary in the BAR model [5], with several important differences. BAR assumes no collusion exists among rational replicas themselves and between rational and Byzantine replicas, whereas a-b-c replicas have no such constraint. BAR solutions are designed to expose cheating behavior and thus deter rational replicas from cheating. The Flexible BFT approach does not rely on deterrence for good behavior, and breaks beyond the $1/3$ ($1/2$) corruption tolerance threshold in asynchronous (synchronous) systems. Last, BAR solutions address only the partial synchrony settings. At the same time, BAR provides a game theoretic proof of rationality. More generally, game theoretical modeling and analysis with collusion have been applied to other problems such as secret sharing and multiparty computation [2, 24, 13, 18]. Analyzing incentives for the a-b-c model remains an open challenge.

8 Conclusion and Future Work

We present Flexible BFT, a protocol that supports diverse clients with different assumptions to use the same ledger. Flexible BFT allows the clients to tolerate combined (Byzantine plus alive-but-corrupt) faults exceeding 1/2 and 1/3 for synchrony and partial synchrony respectively. At a technical level, under synchrony, we show a synchronous protocol where the replicas execute a network speed protocol and only the commit rule uses the synchrony assumption. For partial synchrony, we introduce the notion of Flexible Byzantine Quorums by deconstructing existing BFT protocols to understand the role played by the different quorums. We combine the two to form Flexible BFT which obtains the best of both worlds.

Our liveness proof in Section 5.4 employs a strong assumption that all clients have correct commit rules. This is because our alive-but-corrupt fault model did not specify what these replicas would do if they can violate safety for some clients. In particular, they may stop helping liveness. However, we believe this will not be a concern once we move to a more realistic rational model. In that case, the best strategy for alive-but-corrupt replicas is to attack the safety of clients with unsafe commit rules while preserving liveness for clients with correct commit rules. Such an analysis in the rational fault model remains interesting future work. Our protocol also assumes that all replicas have clocks that advance at the same rate. It is interesting to explore whether our protocol can be modified to work with clock drifts.

Acknowledgement

We thank Ittai Abraham and Ben Maurer for many useful discussions on Flexible BFT. We thank Marcos Aguilera for many insightful comments on an earlier draft of this work.

References

  • [1] Ittai Abraham, Srinivas Devadas, Danny Dolev, Kartik Nayak, and Ling Ren. Synchronous byzantine agreement with expected $O(1)$ rounds, expected $O(n^2)$ communication, and optimal resilience. In Financial Cryptography and Data Security (FC), 2019.
  • [2] Ittai Abraham, Danny Dolev, Rica Gonen, and Joe Halpern. Distributed computing meets game theory: robust mechanisms for rational secret sharing and multiparty computation. In Proceedings of the twenty-fifth annual ACM symposium on Principles of distributed computing, pages 53–62. ACM, 2006.
  • [3] Ittai Abraham, Dahlia Malkhi, Kartik Nayak, and Ling Ren. Dfinity consensus, explored. Cryptology ePrint Archive, Report 2018/1153, 2018.
  • [4] Ittai Abraham, Dahlia Malkhi, Kartik Nayak, Ling Ren, and Maofan Yin. Sync hotstuff: Simple and practical state machine replication. Cryptology ePrint Archive, Report 2019/270, 2019. https://eprint.iacr.org/2019/270.
  • [5] Amitanand S Aiyer, Lorenzo Alvisi, Allen Clement, Mike Dahlin, Jean-Philippe Martin, and Carl Porth. Bar fault tolerance for cooperative services. In ACM SIGOPS operating systems review, volume 39, pages 45–58. ACM, 2005.
  • [6] Hagit Attiya and Jennifer Welch. Distributed computing: fundamentals, simulations, and advanced topics, volume 19. John Wiley & Sons, 2004.
  • [7] Ethan Buchman. Tendermint: Byzantine fault tolerance in the age of blockchains. PhD thesis, 2016.
  • [8] Vitalik Buterin and Virgil Griffith. Casper the friendly finality gadget. CoRR, abs/1710.09437, 2017.
  • [9] Christian Cachin, Klaus Kursawe, and Victor Shoup. Random oracles in Constantinople: Practical asynchronous byzantine agreement using cryptography. Journal of Cryptology, 18(3):219–246, 2005.
  • [10] Miguel Castro and Barbara Liskov. Practical byzantine fault tolerance. In OSDI, volume 99, pages 173–186, 1999.
  • [11] Danny Dolev and H. Raymond Strong. Authenticated algorithms for byzantine agreement. SIAM Journal on Computing, 12(4):656–666, 1983.
  • [12] Cynthia Dwork, Nancy Lynch, and Larry Stockmeyer. Consensus in the presence of partial synchrony. Journal of the ACM, 35(2):288–323, 1988.
  • [13] S Dov Gordon and Jonathan Katz. Rational secret sharing, revisited. In International Conference on Security and Cryptography for Networks, pages 229–241. Springer, 2006.
  • [14] Guy Golan Gueta, Ittai Abraham, Shelly Grossman, Dahlia Malkhi, Benny Pinkas, Michael K Reiter, Dragos-Adrian Seredinschi, Orr Tamir, and Alin Tomescu. Sbft: a scalable decentralized trust infrastructure for blockchains. In DSN, 2019.
  • [15] Timo Hanke, Mahnush Movahedi, and Dominic Williams. Dfinity technology overview series, consensus system. arXiv preprint arXiv:1805.04548, 2018.
  • [16] Heidi Howard, Dahlia Malkhi, and Alexander Spiegelman. Flexible paxos: Quorum intersection revisited. In OPODIS, volume 70 of LIPIcs, pages 25:1–25:14. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2016.
  • [17] Jonathan Katz and Chiu-Yuen Koo. On expected constant-round protocols for byzantine agreement. Journal of Computer and System Sciences, 75(2):91–112, 2009.
  • [18] Gillat Kol and Moni Naor. Cryptography and game theory: Designing protocols for exchanging information. In Theory of Cryptography Conference, pages 320–339. Springer, 2008.
  • [19] Ramakrishna Kotla, Lorenzo Alvisi, Mike Dahlin, Allen Clement, and Edmund Wong. Zyzzyva: speculative byzantine fault tolerance. In ACM SIGOPS Operating Systems Review, volume 41, pages 45–58. ACM, 2007.
  • [20] Leslie Lamport. Fast paxos. Distributed Computing, 19(2):79–103, 2006.
  • [21] Leslie Lamport, Robert Shostak, and Marshall Pease. The byzantine generals problem. ACM Transactions on Programming Languages and Systems, 4(3):382–401, 1982.
  • [22] Shengyun Liu, Christian Cachin, Vivien Quéma, and Marko Vukolic. XFT: practical fault tolerance beyond crashes. In 12th USENIX Symposium on Operating Systems Design and Implementation, pages 485–500. USENIX Association, 2016.
  • [23] Nancy A Lynch. Distributed algorithms. Elsevier, 1996.
  • [24] Anna Lysyanskaya and Nikos Triandopoulos. Rationality and adversarial behavior in multi-party computation. In Annual International Cryptology Conference, pages 180–197. Springer, 2006.
  • [25] Dahlia Malkhi and Michael Reiter. Byzantine quorum systems. In

    Proceedings of the Twenty-ninth Annual ACM Symposium on Theory of Computing

    , STOC ’97, pages 569–578, New York, NY, USA, 1997. ACM.
  • [26] J-P Martin and Lorenzo Alvisi. Fast byzantine consensus. IEEE Transactions on Dependable and Secure Computing, 3(3):202–215, 2006.
  • [27] David Mazieres. The stellar consensus protocol: A federated model for internet-level consensus, 2015.
  • [28] Silvio Micali. Algorand: The efficient and democratic ledger. arXiv:1607.01341, 2016.
  • [29] Silvio Micali and Vinod Vaikuntanathan. Optimal and player-replaceable consensus with an honest majority. 2017.
  • [30] Rafael Pass and Elaine Shi. Thunderella: Blockchains with optimistic instant confirmation. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pages 3–33. Springer, 2018.
  • [31] M. Pease, R. Shostak, and L. Lamport. Reaching agreement in the presence of faults. J. ACM, 27(2):228–234, April 1980.
  • [32] Fred B Schneider. Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Computing Surveys (CSUR), 22(4):299–319, 1990.
  • [33] Jian Yin, Jean-Philippe Martin, Arun Venkataramani, Lorenzo Alvisi, and Mike Dahlin. Separating agreement from execution for byzantine fault tolerant services. ACM SIGOPS Operating Systems Review, 37(5):253–267, 2003.
  • [34] Maofan Yin, Dahlia Malkhi, Michael K Reiter, Guy Golan Gueta, and Ittai Abraham. HotStuff: BFT Consensus in the Lens of Blockchain. arXiv preprint arXiv:1803.05069, 2018.