Hot-Stuff the Linear, Optimal-Resilience, One-Message BFT Devil

03/13/2018, by Ittai Abraham et al.

We describe a protocol called `Hot-Stuff the Linear, Optimal-Resilience, One-Message BFT Devil' (in short, Hot-Stuff) for n = 3f+1 replicas, of which 2f+1 are honest, to agree on a replicated, ever-changing state. The protocol is always safe against a threshold f of Byzantine failures, even when the system is asynchronous. Progress is guaranteed under periods of synchrony. The per-round communication cost in Hot-Stuff is linear, hence O(n^2) overall cost to a decision during periods of synchrony, an improvement of O(n^2) over previous asynchronous BFT protocols. Hot-Stuff uses one type of message exchange, and is succinctly described in under twenty lines of pseudo-code.


1 Introduction

Blockchains have renewed interest in the long-standing problem of Byzantine fault tolerant (BFT) state replication, while at the same time demanding novel foundations for it. The settings in which blockchains are typically deployed (e.g., crypto-currencies) demand greater dynamism at scale, allowing block proposers to rotate among replicas frequently. Therefore, in addition to the typical safety requirement that any two correct replicas decide on the same sequence of blocks, we pose the following two goals:

Linearity

After GST (Global Stabilization Time; a common notion in partially synchronous settings, referring to periods of stable communication; see the Model section below for a definition), any correct proposer, once designated, needs only linear communication to commit a block to the chain. This should include the case where a proposer is replaced.

Optimistic Responsiveness

After GST, any correct proposer, once designated, needs to wait just for the first n − f responses in order to guarantee that it can create a proposal that will make progress. This should include the case where a proposer is replaced.

All previous BFT solutions achieve one goal or the other, but not both. Responsive but not Linear protocols include PBFT [12], Zyzzyva [30], BFT-SMaRt [6], Hybrid Consensus [37], Thunderella [38] (which also coined the term Optimistic Responsiveness), and SBFT [22]. Linear but not Responsive protocols include Tendermint [8] and Casper [10].

This paper presents a new BFT replication protocol called HotStuff that achieves both Linearity and Optimistic Responsiveness; to our knowledge, it is the first protocol to do so. The price we pay for achieving this combination is a small amount of additional latency in block commitment compared with several of the protocols mentioned above, owing to an extra message exchange on the critical path of block commitment.

HotStuff bears many features that have become common for BFT replication: It ensures safety in an asynchronous network provided that fewer than one-third of the replicas exhibit arbitrary (Byzantine) failures. It ensures liveness in partially synchronous systems, i.e., that blocks will be delivered once the network stabilizes and so messages are delivered within a known delay. And, it is structured using a proposer replica that drives the protocol forward by making proposals.

Here is where the similarities between HotStuff and prior solutions end. The prevailing approach in BFT solutions is to optimize for the case of a “stable” proposer (aka a leader) that changes infrequently (e.g., PBFT [12], Zyzzyva [30], BFT-SMaRt [6], SBFT [22]). In all of these protocols, a leader broadcasts a “proof of safety” along with its first proposal. The proof carries information collected from all (responsive) parties, each response consisting of at least a single authenticated value per pending decision, incurring at least quadratic communication complexity per leader replacement. This approach has Responsiveness but not Linearity. When Byzantine consensus protocols were originally conceived, a typical target system size was n = 4 or n = 7, tolerating one or two faults. But scaling BFT consensus to modern blockchain scenarios (say, with hundreds of replicas) makes a recurring communication cost that is quadratic (or higher) in n per leader replacement prohibitive. No matter how good the engineering and how we tweak and batch the system, these theoretical measures are a roadblock for scalability.

In contrast, HotStuff is optimized for the case of a proposer that changes routinely, even per block, possibly involving proposers who themselves are not replicas (i.e., are auxiliaries), as a means to implement Chain Quality [20]. During periods of synchrony, a correct proposer in HotStuff, once designated, requires only linear communication, the minimum required to disseminate a new decision to n replicas. Thus, HotStuff provides a far more scalable basis for blockchains than the above protocols. Indeed, many blockchain protocols employ routine, frequent leader replacements. Bitcoin-NG [18] and Byzcoin [29] rotate leaders/participants by following the progress of successful miner entities in a blockchain; Solida [4] is a chain-less protocol that rotates the proposer on each block based on Proof-of-Work directly; in Casper [10], proposers are auxiliary to the set of BFT “validator” replicas; ALGORAND [21] and Dfinity [23] rotate proposers on each block by cryptographic sampling.

Recently, Tendermint [8, 9] and Casper [10] introduced BFT solutions based on a different approach, which has a linear proposer-replacement protocol but not Optimistic Responsiveness. In Tendermint/Casper, a replica has a “Preferred Block” and votes only on a branch extending it. A new proposer needs to discover this “Preferred Block” by waiting to hear from all correct replicas, hence liveness hinges on the proposer waiting an a-priori maximal network delay before making a new proposal. The actual network delay is typically much lower than any a-priori known upper bound, hence responsiveness suffers.

HotStuff instead delays the commit decision by an extra voting round. Instead of a typical 2-step commit, a 3-step commit means that a proposer can discover the Preferred Block by waiting to hear only from n − f replicas. Hence, in HotStuff, any correct proposer designated after GST incurs only linear communication complexity, and needs to wait only for the first n − f responses, to make progress. To our knowledge, HotStuff is the first protocol to achieve both of these properties simultaneously, and so provides a far more scalable basis for building BFT blockchains for large n.

HotStuff has the additional benefit of being remarkably simple, owing in part to its economy of mechanism: There are only two message types (proposals and votes) and simple rules to determine how a replica treats each. Safety is specified via voting and commit rules over graphs of blocks. The mechanisms needed to achieve liveness are encapsulated within a Pacemaker, cleanly separated from the mechanisms needed for safety. At the same time, it is expressive in that it allows the representation of several known protocols (DLS [17], PBFT [12], Tendermint [8], and Casper [10]) as minor variations. In part this flexibility derives from its operation over a graph of blocks, in a way that forms a bridge between classical BFT foundations and modern blockchains.

We describe a prototype implementation and a preliminary evaluation of HotStuff. Deployed over a network with over a hundred replicas, HotStuff achieves throughput and latency comparable to, and sometimes exceeding, those of mature systems such as BFT-SMaRt, whose code complexity far exceeds that of HotStuff. We further demonstrate that the communication footprint of HotStuff remains constant in the face of frequent proposer replacements, whereas that of BFT-SMaRt grows quadratically with the number of replicas.

Contributions

To recap, HotStuff is a BFT replication solution that is safe and live in partially synchronous networks and that embodies the first linear-cost proposer-replacement protocol. It is cast in a novel framework that illuminates the BFT world in the lens of blockchains. More concretely, we present the following contributions.

  1. HotStuff, a new protocol whose proposer incurs linear communication complexity (Linearity), and does not need to wait the maximal network delay (Optimistic Responsiveness). To our knowledge, HotStuff is the first BFT replication protocol for partially synchronous systems exhibiting both these properties.

  2. A framework for BFT replication over graphs of blocks. Safety is specified via voting and commit graph rules. Liveness is specified separately via a Pacemaker that extends the graph with new blocks.

  3. A casting of several known protocols (DLS [17], PBFT [12], Tendermint [8], Casper [10]) and one new (ours, HotStuff), in this framework.

2 Related work

Reaching consensus in the face of Byzantine failures was formulated as the Byzantine Generals Problem by Lamport et al. [33], who also coined the term “Byzantine failures”. The first synchronous solution was given by Pease et al. [39], and later improved by Dolev and Strong [16]. The improved protocol has quadratic communication complexity, which was shown optimal by Dolev and Reischuk [15]. A leader-based synchronous protocol that uses randomness was given by Katz and Koo [28], showing an expected constant-round solution with f < n/2 resilience.

Meanwhile, in the asynchronous setting, Fischer et al. [19] showed that the problem is unsolvable deterministically in the face of even a single failure. Furthermore, a resilience bound of n ≥ 3f + 1 for any asynchronous solution was proven by Ben-Or [5]. Two approaches were devised to circumvent this impossibility. One relies on randomness, initially shown by Ben-Or [5], using independently random coin flips by processes until they happen to converge to consensus. Later works used cryptographic methods to share an unpredictable coin and drive complexities down to a constant expected number of rounds and O(n^2) communication [11].

The second approach relies on partial synchrony, first shown by Dwork, Lynch, and Stockmeyer (DLS) [17]. This protocol preserves safety during asynchronous periods, and after the system becomes synchronous, DLS guarantees termination. Even once synchrony is maintained, however, DLS incurs substantial communication and a large number of rounds per decision.

State machine replication (SMR) [31, 42] relies on consensus at its core to order client requests so that correct replicas execute them in the same order. The recurring need for consensus in SMR led Lamport to devise Paxos [32], a protocol that operates an efficient pipeline in which a stable leader drives decisions with linear communication and one round-trip. A similar emphasis led Castro and Liskov [12] to develop an efficient leader-based Byzantine SMR protocol named PBFT, whose stable leader requires O(n^2) communication and two round-trips per decision, and whose leader-replacement protocol incurs at least O(n^2) communication. PBFT has been deployed in several systems, including BFT-SMaRt [6]. Kotla et al. introduced an optimistic linear path into PBFT in a protocol named Zyzzyva [30], which was utilized in several systems, e.g., Upright [14] and Byzcoin [29]. The optimistic path has linear complexity, while the leader-replacement protocol remains as costly as PBFT’s. Abraham et al. [1] later exposed a safety violation in Zyzzyva, and presented fixes [2, 22]. On the other hand, to also reduce the complexity of the protocol itself, Song et al. proposed Bosco [44], a simple one-step protocol with low latency on the optimistic path, at the price of a larger replication factor. SBFT [22] further reduces these communication complexities by harnessing two methods: a collector-based communication paradigm by Reiter [41], and signature combining via threshold cryptography on protocol votes by Cachin et al. [11].

A leader-based Byzantine SMR protocol that employs randomization was presented by Ramasamy and Cachin [40], and a leaderless variant named HoneyBadgerBFT was developed by Miller et al. [34]. At their core, these randomized Byzantine solutions employ randomized asynchronous Byzantine consensus, whose best known communication complexity is O(n^2) (see above), amortizing the cost via batching.

Bitcoin’s core is a protocol known as Nakamoto Consensus [35], a synchronous protocol with only a probabilistic consensus guarantee and no finality (see the analyses in [20, 36, 3]). It operates in a permissionless model where participants are unknown, and resilience is kept via Proof-of-Work. As described above, recent blockchain solutions hybridize Proof-of-Work solutions with classical BFT solutions in various ways [18, 29, 4, 10, 21, 23, 38]. The need to address rotating leaders in these hybrid solutions and others provides the motivation behind HotStuff. We adopt the notion of Optimistic Responsiveness from one of these, Thunderella [38].

3 System Model

HotStuff provides an algorithmic framework for a family of protocols that solve multi-decree, quorum-based BFT replication, providing the same guarantees as, e.g., PBFT [12]. There is abundant literature that provides precise definitions for this model, including  [12, 13]. Here, we briefly sketch the main ingredients.

A system consists of n = 3f + 1 replicas. Correct replicas remain live and follow their protocol. A threshold of f of the replicas may be Byzantine; their behavior is arbitrary but computationally bounded. In particular, they cannot forge the cryptographic signature of a correct replica.

Communication is partially synchronous, meaning that there is a time GST (Global Stabilization Time) after which all messages among correct replicas arrive within a known bound Δ. In practice, the system may swing between periods of synchrony and asynchrony, and termination will be guaranteed as soon as the system is synchronous for a sufficiently long period. However, to simplify the discussion, it suffices to guarantee termination after GST (no such guarantee can be provided before GST [19]).

The key performance measure of interest to us will be the “communication footprint” of the protocol after GST, measured as the number of signature or MAC verifications performed upon message processing. Beyond measuring the number of transmitted bits, this metric underscores the cryptographic computation load associated with transferring authenticated values. The minimal communication footprint for an update to be committed and executed is O(n), as the update needs to be disseminated to all n replicas.

4 QC Chains

This section introduces the basic ingredients and concepts used in HotStuff.

Figure 1: Parent pointers (solid arrows) and QC references (dashed arrows) in HotStuff.

Blocks.

The logical unit of replication in HotStuff is a block, depicted as a rectangle in Figure 1 above. A block proposal can either be committed or rejected according to a consensus decision, and the goal of BFT replication is to form agreement on a growing sequence (chain) of blocks. Blocks are opaque to the replication protocol. In practice, they can contain multiple state-machine “commands”. For ease of exposition, in our description we ignore batching and treat each block as a single command.

Chaining.

A block contains a reference to (a cryptographic digest of) a parent block; parent links are depicted as solid arrows in Figure 1. Thus blocks are linked into chains of growing heights, forming a tree rooted at an initial “genesis” block. We denote a block b’s height by height(b). Note that a height is a unifying counter, replacing commonly used dual counters such as a sequence number and a view number.

Two blocks B and W (aka, Bee and Wasp) are on the same branch if one block is reachable by following parent links from the other. The path relation is the reflexive transitive closure of the parent relation; in particular, a block is on the path to itself. Two blocks conflict iff neither is on the path of the other. Conflicting proposals may occur when a proposer for a height equivocates, or when multiple proposers contend for the decision at the same height. For example, in the blockchain depicted in Figure 1, blocks on different branches conflict with one another.

When a block becomes committed, the branch of parents from the block all the way back to the genesis block becomes committed as well.
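To make the chaining structure concrete, here is a minimal sketch in Python (not the paper’s implementation); the names Block, extends, and conflicts are illustrative, and a direct object reference stands in for the cryptographic parent digest.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass(frozen=True)
    class Block:
        cmd: str                   # command (or batch) carried by the block
        parent: Optional["Block"]  # parent link; None only for the genesis block
        height: int                # unifying counter replacing sequence/view numbers

    def extends(child: Block, ancestor: Block) -> bool:
        """True iff ancestor is on the path from child back to genesis (reflexive)."""
        b: Optional[Block] = child
        while b is not None:
            if b is ancestor:
                return True
            b = b.parent
        return False

    def conflicts(a: Block, b: Block) -> bool:
        """Two blocks conflict iff neither is on the branch of the other."""
        return not extends(a, b) and not extends(b, a)

    # Example: two children of genesis at the same height are conflicting.
    genesis = Block(cmd="", parent=None, height=0)
    b1 = Block(cmd="x := 1", parent=genesis, height=1)
    b2 = Block(cmd="x := 2", parent=genesis, height=1)
    assert extends(b1, genesis) and conflicts(b1, b2)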

Votes and Quorum Certificates (QC).

Replicas can post signed votes for blocks. A correct replica votes at most once per height, at increasing heights. Whenever a replica handles a block, we assume that it has already delivered the block and all of its ancestors, and that all of the block’s content and signatures are valid.

When n − f votes for a block are collected, a Quorum Certificate (QC) can be created for it, proving that n − f replicas voted for the block.

The mechanism for collecting votes is intentionally left unspecified, and varies between different implementations. In particular, borrowing a known technique from the Byzantine protocols literature [41, 11], signed votes from n − f distinct replicas on the same block may be combined using threshold cryptography (e.g., [43, 7]) into a single signature representing the set.

QC Chains.

QCs are embedded within blocks, creating a certification reference from one block to another; QC references are depicted as dashed arrows in Figure 1. For convenience, we refer to the block certified by the QC embedded in a block b as b’s QC reference. A QC reference need not be the direct parent of the embedding block, but it must refer to one of its ancestors.

HotStuff commit decisions are based purely on analyzing QC chains in a graph of blocks. The following chain-structures will be used repeatedly in the protocol (and in casting other protocols as variants):

1-chain

A 1-chain consists of a block b together with a QC for b (the QC may be carried inside another block). A 1-chain implies that at least f + 1 correct replicas voted for b at b’s height.

2-chain

A 2-chain extends a 1-chain: a block b′ carries the QC for b, and there is a QC for b′ (possibly carried in a further block). A 2-chain implies that at least f + 1 correct replicas voted for b′, and hence received the 1-chain for b that is embedded in b′.

3-chain

A 3-chain extends a 2-chain: a block b″ carries the QC for b′ (which in turn carries the QC for b), and there is a QC for b″. A 3-chain implies that at least f + 1 correct replicas voted for b″, and hence received the 2-chain formed by b, b′, and b″.

5 HotStuff

In a nutshell, the HotStuff blockchain protocol has a three-step quorum-certificate (QC) Commit Rule, depicted in Figure 2 below.

Figure 2: A commit sequence on HotStuff blockchain.

The HotStuff Commit Rule.

A block b is considered a committed decision if there is a 3-chain headed by b, i.e., blocks b′ and b″ such that b′ carries a QC for b, b″ carries a QC for b′, and a QC for b″ exists, where the first three blocks form a direct ancestry: b is the direct parent of b′, and b′ is the direct parent of b″.

The HotStuff Preferred Block Rule.

A replica has a Preferred Block determined by the highest 1-chain it received. Let b be the head of the highest 1-chain a replica received, i.e., the highest block for which the replica knows a QC. Then the Preferred Block for the replica is the block referred to by the QC embedded in b. A replica votes for a new block only if it extends a branch from its Preferred Block, i.e., only if the Preferred Block is an ancestor of the new block.
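Under the reading of the rule given above, a replica’s Preferred Block and its voting check can be sketched as follows; the names (QC, highest_qc, preferred_block, safe_to_vote) are assumptions for illustration, not the paper’s API.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass(frozen=True)
    class Block:
        parent: Optional["Block"]
        height: int
        justify: Optional["QC"]    # QC embedded in this block, certifying an ancestor

    @dataclass(frozen=True)
    class QC:
        block: Block               # the certified block

    def extends(child: Block, ancestor: Block) -> bool:
        b: Optional[Block] = child
        while b is not None:
            if b is ancestor:
                return True
            b = b.parent
        return False

    def preferred_block(highest_qc: QC) -> Block:
        # The head of the highest 1-chain is the block certified by the highest QC;
        # the Preferred Block is the block referred to by that block's embedded QC.
        head = highest_qc.block
        return head.justify.block if head.justify is not None else head

    def safe_to_vote(proposal: Block, highest_qc: QC, vheight: int) -> bool:
        # Vote at most once per height, at increasing heights, and only for blocks
        # extending a branch from the Preferred Block.
        return proposal.height > vheight and extends(proposal, preferred_block(highest_qc))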

HotStuff Protocol.

The HotStuff protocol is event-driven, responding to the arrival of two types of messages, proposals and votes. Both types of messages carry the sender’s highest QC at the time of sending. Upon receiving either a proposal or a vote message (votes are collected only if the replica itself is the next proposer), a replica first updates the highest QC it knows of. It then proceeds to the event-handling logic as follows:

Proposal Handling (onReceiveProposal): Upon receiving a proposal message carrying a block b, a replica checks that b extends a branch from its Preferred Block, and that it has not already voted at this height or higher. Once it votes for b, it updates its vheight to the height of b.

A replica also checks whether the Commit Rule defined above holds on the branch extended by the new proposal. That is, it checks whether the new tail of the branch forms a Commit 3-chain with some yet-uncommitted head b*. In case of a new commit decision, the replica commits all the blocks on the branch from the last committed block up to b*.
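The commit check can be sketched as follows, reusing the illustrative Block/QC structures from the previous sketch; check_commit is an assumed helper name, not the paper’s pseudocode.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass(frozen=True)
    class Block:
        parent: Optional["Block"]
        height: int
        justify: Optional["QC"]    # QC embedded in this block

    @dataclass(frozen=True)
    class QC:
        block: Block

    def check_commit(newest_qc: QC) -> Optional[Block]:
        """Given the newest QC known (certifying the tail of a branch), return the
        head of a Commit 3-chain if one exists: three QC links whose first three
        blocks are linked by direct parent edges."""
        b2 = newest_qc.block                 # certified by the newest QC
        if b2.justify is None:
            return None
        b1 = b2.justify.block                # certified by the QC embedded in b2
        if b1.justify is None:
            return None
        b0 = b1.justify.block                # certified by the QC embedded in b1
        # Direct-ancestry requirement of the Commit Rule.
        if b2.parent is b1 and b1.parent is b0:
            return b0                        # b0, and its uncommitted ancestors, become committed
        return None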

Vote handling (onReceiveVote): Upon receiving a vote message for a block, the next proposer collects the vote; once it has gathered n − f votes for some block, it forms a QC for it.
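A sketch of vote collection, with n = 3f + 1 and a quorum of n − f votes per QC; the VotePool class and its methods are illustrative. In a real implementation the votes would be verified signatures, possibly combined via threshold cryptography as noted in Section 4.

    from collections import defaultdict
    from typing import Dict, Optional, Set

    class VotePool:
        """Collects votes per block until a quorum (n - f) is reached."""

        def __init__(self, n: int, f: int):
            assert n == 3 * f + 1
            self.quorum = n - f                                  # 2f + 1 votes form a QC
            self.votes: Dict[str, Set[str]] = defaultdict(set)   # block digest -> voter ids

        def add_vote(self, block_digest: str, voter_id: str) -> Optional[Set[str]]:
            """Record a vote; return the voter set (a stand-in for a QC) once a
            quorum is reached, and None otherwise."""
            self.votes[block_digest].add(voter_id)               # duplicates are absorbed by the set
            if len(self.votes[block_digest]) >= self.quorum:
                return set(self.votes[block_digest])
            return None

    # With f = 1 (n = 4), the third distinct vote completes a QC.
    pool = VotePool(n=4, f=1)
    assert pool.add_vote("blk", "r1") is None
    assert pool.add_vote("blk", "r2") is None
    assert pool.add_vote("blk", "r3") is not None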

HotStuff Pacemaker.

A Pacemaker is a mechanism that guarantees progress after GST. The Pacemaker achieves this through two ingredients.

The first ingredient is “synchronization”: bringing all correct replicas, and a unique proposer, to a common height for a sufficiently long period. The usual synchronization mechanism in the literature [17, 12, 9] is for replicas to increase the number of Δ intervals they are willing to spend at larger heights, until progress is made. A good way to deterministically elect a leader is to use a rotating-proposer scheme, in which all correct replicas keep a predefined proposer schedule and rotate to the next proposer when the current one is demoted.

Second, a Pacemaker needs to provide the proposer with a way to choose a proposal that does not conflict with the Preferred Branch of any correct replica, so that indeed all correct replicas will vote for it.

In HotStuff, the second task is rather easy. A proposer only needs to collect messages from a set of n − f replicas to discover the highest QC it needs in order to make progress (see Section 7). It then chooses to extend a branch from the Preferred Block determined by it.

Once the Pacemaker decides it is a good time to propose, it invokes onPropose, extending a branch from its Preferred Block. A proposal includes the highest QC known to the proposer (obtained by collecting votes and status messages), and a command to execute.
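A sketch of the proposer’s choice after collecting n − f status messages, each carrying its sender’s highest QC; StatusMsg, high_qc, and make_proposal are assumed names for illustration.

    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass(frozen=True)
    class Block:
        cmd: str
        parent: Optional["Block"]
        height: int
        justify: Optional["QC"]    # QC embedded in the block

    @dataclass(frozen=True)
    class QC:
        block: "Block"

    @dataclass(frozen=True)
    class StatusMsg:
        high_qc: QC                # the sender's highest QC at sending time

    def make_proposal(statuses: List[StatusMsg], cmd: str) -> Block:
        """Extend the block certified by the highest QC reported in the n - f
        responses (which lies on a branch from the Preferred Block), embedding that QC."""
        assert statuses, "expects the n - f collected responses"
        high_qc = max((s.high_qc for s in statuses), key=lambda qc: qc.block.height)
        parent = high_qc.block
        return Block(cmd=cmd, parent=parent, height=parent.height + 1, justify=high_qc)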

It is worth noting that even if a replica invokes onPropose arbitrarily, or selects a parent and a QC capriciously, and regardless of any scheduling delays, safety is always guaranteed. Therefore, safety is entirely decoupled from liveness.

5.1 HotStuff Full Specification

The full HotStuff pseudo-code is provided in Figure 3. We first list the data structures maintained by replicas and their initial values. We then proceed to describe the protocol logic in an event-driven style, responding to two message types and to the Pacemaker initiating proposals.

HotStuff Data Structures.

Each replica keeps track of the following main state variables:

  • the genesis block, known to all correct replicas; to bootstrap, it contains a hard-coded QC for itself.

  • a mapping from each block to the votes collected for it.

  • a mapping from each block to its known QC.

  • vheight, the height of the last block voted for.

  • the last executed block.

  • the tail of the highest 1-chain.
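For concreteness, the replica state above can be sketched as follows; the field names (genesis, votes, qcs, vheight, b_exec, b_leaf) are illustrative stand-ins for the paper’s variables, whose original identifiers are not reproduced here.

    from dataclasses import dataclass, field
    from typing import Dict, Optional, Set

    @dataclass(frozen=True)
    class Block:
        digest: str
        parent: Optional["Block"]
        height: int

    @dataclass(frozen=True)
    class QC:
        block_digest: str
        voters: frozenset            # identities of the n - f voters (or a combined signature)

    @dataclass
    class ReplicaState:
        genesis: Block                                            # hard-coded, self-certified genesis block
        votes: Dict[str, Set[str]] = field(default_factory=dict)  # block digest -> collected voter ids
        qcs: Dict[str, QC] = field(default_factory=dict)          # block digest -> known QC
        vheight: int = 0                                          # height of the last block voted for
        b_exec: Optional[Block] = None                            # last executed block
        b_leaf: Optional[Block] = None                            # tail of the highest 1-chain

    # Bootstrap: the genesis block is known to all replicas and certifies itself.
    genesis = Block(digest="genesis", parent=None, height=0)
    state = ReplicaState(genesis=genesis, b_exec=genesis, b_leaf=genesis)
    state.qcs["genesis"] = QC(block_digest="genesis", voters=frozenset())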

Pseudo-code for a replica (descriptive fills for lost expressions are shown in ⟨angle brackets⟩):

 1: // begin: rules specific to 3-step HotStuff in framework
 2: function getPref()  return ⟨the block referred to by the QC embedded in the head of the highest 1-chain⟩
 3: function checkCommit(b)
 4:     // check for a Commit 3-chain
 5:     b″ := ⟨the block referred to by the QC embedded in b⟩
 6:     b′ := ⟨the block referred to by the QC embedded in b″⟩
 7:     b* := ⟨the block referred to by the QC embedded in b′⟩
 8:     if (⟨b* is the direct parent of b′, and b′ is the direct parent of b″⟩) then
 9:         onCommit(b*); return true
10:     else return false
11: // end
12: // begin: generic HotStuff framework logic
13: procedure finishQC(b)
14:     ⟨form a QC from the votes collected for b and record it as the highest QC⟩
15: procedure onCommit(b)
16:     if ⟨b is higher than the last executed block⟩ then
17:         onCommit(⟨b's parent⟩)
18:         ⟨execute the command carried by b; record b as the last executed block⟩
19: procedure update(b)
20:     if ⟨the QC carried by b is higher than the highest QC known so far⟩ then
21:         ⟨adopt it as the new highest QC, updating the highest 1-chain and hence the Preferred Block⟩
22:     if checkCommit(b) then ⟨a new commit decision was reached and handled by onCommit⟩
23: procedure onReceiveProposal(⟨proposal carrying a block b⟩)
24:     update(b)
25:     if ⟨b is higher than vheight and b extends a branch from getPref()⟩ then
26:         ⟨set vheight to the height of b⟩
27:         ⟨sign a vote for b⟩
28:         send(nextProposer(), ⟨the vote, together with the sender's highest QC⟩)
29: procedure onReceiveVote(⟨vote for a block b, carrying the sender's highest QC⟩)
30:     update(b)
31:     if ⟨a vote for b from the same sender was already received⟩ then return
32:     // collect votes for b
33:     ⟨add the vote to the votes collected for b⟩
34:     if ⟨n − f votes for b have been collected⟩ then finishQC(b)
35: procedure onPropose(⟨command cmd⟩)
36:     ⟨create a new block extending the highest 1-chain, embedding the highest QC and carrying cmd⟩
37:     // send to all replicas, including itself
38:     ⟨broadcast the new block as a proposal, together with the sender's highest QC⟩
39: // end

Figure 3: The full specification for HotStuff.

Figure 4: b and w both getting committed (impossible).

6 HotStuff Proof of Safety

Lemma 1.

Let b and w be two conflicting blocks such that height(b) = height(w). Then they cannot both have valid quorum certificates.

Proof.

Suppose they could. Then both b and w receive n − f votes each, among which there are at least f + 1 honest replicas voting for each block. Since there are only 2f + 1 honest replicas in total, there must be an honest replica that votes for both, which is impossible because b and w are of the same height and a correct replica votes at most once per height. ∎
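The quorum-intersection count behind this argument, with n = 3f + 1 and quorums of size n − f = 2f + 1:

    (n − f) + (n − f) − n = n − 2f = f + 1,

so the two vote sets share at least f + 1 replicas, and since at most f replicas are Byzantine, at least one honest replica is in both.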

Lemma 2.

Let b and w be two conflicting blocks. Then they cannot both become committed, each by an honest replica.

Proof.

We prove this important lemma by contradiction. Let b and w be two conflicting blocks at different heights. Assume that during an execution, b becomes committed at some honest replica via the QC 3-chain formed by b, b′, b″ (linked by direct parent edges) and a QC for b″; likewise, w becomes committed at some honest replica via the QC 3-chain formed by w, w′, w″ and a QC for w″. Each of these six blocks has a QC, every block in the first group conflicts with every block in the second group, and each group occupies three consecutive heights; hence, by Lemma 1, the two height ranges cannot overlap, and w.l.o.g. we assume height(b″) < height(w), as shown in Figure 4.

There exists at least one honest replica, say r, that has voted for both b″ and w (both blocks have QCs, whose vote sets must intersect in at least one honest replica). Due to the monotonicity of vheight, r must first vote for b″ and then for w. When r votes for b″, its Preferred Block is b. According to the voting rule at line 25, r could only vote for w if r’s preference for b had been changed.

This can happen only if there exists a QC for a block higher than b′, causing the preference to change from b to another block.

We assume the first change of the Preferred Block away from b is caused by the lowest such block. Formally, we define the following predicate for any block B: B conflicts with b, B has a valid QC, and height(B) > height(b″).

We can now set the first switching point B_s to be the block of minimum height satisfying this predicate.

For example, w itself could potentially be such a B_s, if no other lower eligible block exists.

Since B_s has its QC contained in some block, there are n − f votes for B_s. Then, there is an honest replica, say r′, that votes for b″ and then for B_s (r′ could be r, but not necessarily). But at the time of voting for B_s, r′ has already seen b″ and the 2-chain it carries, and its Preferred Block is still b: by the minimality of B_s, no lower certified block conflicting with b exists that could have switched its preference. Since B_s conflicts with b, r′ will not vote for B_s according to the voting rule, leading to a contradiction. ∎

Theorem 3.

Let cmd1 and cmd2 be any two commands such that cmd1 is executed before cmd2 by some honest replica. Then any honest replica that executes cmd2 must execute cmd1 before cmd2.

Proof.

Denote by b1 the block that carries cmd1, and by b2 the block that carries cmd2. From Lemma 1, it is clear that the committed blocks are at distinct heights. Without loss of generality, assume height(b1) < height(b2). The commits of b1 and b2 are triggered by commit 3-chains headed by b1 and b2, respectively. According to Lemma 2, b2 must not conflict with b1, so b1 is an ancestor of b2. Then, when any honest replica executes b2, it must first execute b1, by the logic in onCommit. ∎

6.1 Remarks

To shed light on the tradeoffs taken in the HotStuff design, we explain why certain constraints are necessary for safety.

Why monotonic vheight?

Suppose we change the voting rule so that a replica need not vote at monotonically increasing heights, as long as it does not vote more than once for each height. The weakened constraint breaks safety. For example, a replica can first vote for a block high on one branch, and then, before learning about the rest of that branch, deliver a conflicting branch whose blocks sit at lower heights it has not voted at; assuming that branch is preferred, it votes there as well. When it eventually delivers the rest of the first branch, it will flip back to it, because it is eligible for being preferred and is higher. The replica’s votes can thus complete commit 3-chains on both branches, causing two conflicting blocks to become committed.

Why direct parent?

The direct-parent constraint is used, with the help of Lemma 1, to ensure the height separation used in the proof of Lemma 2 (w.l.o.g., the entire 3-chain of b lies below w). Suppose we do not enforce the rule for commit, so the commit constraint is weakened to require only ancestry between the first three blocks of the 3-chain instead of direct parenthood. Then the height ranges of the two 3-chains can interleave: a certified block on w’s branch may lie at a height between b and b″. Chances are, a replica can first vote for that block, and only then discover the chain for b and switch to the branch led by b, but it is too late, since w’s branch could already have gathered enough votes to become committed.

7 HotStuff Proof of Liveness

For a deterministic replication system, liveness is the property that the system will make progress after GST. In a rotating proposer replication system, liveness is proven by showing that any correct proposer after GST is guaranteed to make some progress. To capture this, we make a distinction between two different properties:

Plausible Liveness: After GST, any correct proposer, once designated, needs to wait for the maximum network delay in order to guarantee that it can create a proposal that will make progress.

Optimistic Responsiveness: After GST, any correct proposer, once designated, needs to wait just for the first n − f responses in order to guarantee that it can create a proposal that will make progress.

Clearly Optimistic Responsiveness is a stronger and more desirable property than Plausible Liveness. Optimistic Responsiveness allows the system to move at the speed of the network instead of always waiting the full network delay each round.

Before we prove HotStuff’s responsiveness, let us quickly review the Plausible Liveness statement and proof for Casper. Indeed, Casper proves that no matter what the system configuration is, there exists some block such that if a proposer suggests this block, then this block can gather a chain of QCs. The proof is quite simple: just look at the block b with a QC that has the greatest height, and the greatest height h at which any correct replica (aka validator, in Casper) made a vote. Then a block extending b at a height greater than h can gather a chain of QCs (without violating any Casper commandment). The reason this argument provides only Plausible Liveness is that the new correct proposer may need to wait to hear from all correct validators in order to learn about the block with a QC that has the greatest height. In particular, it could be that only one correct validator has this block as its Preferred Block, and the adversary can cause this validator to be as slow as the maximum network delay.

The crux of our improved Optimistic Responsiveness is the following property of our 3-chain protocol:

Lemma 4.

If any correct replica has block B as its Preferred Block, then there exist at least f + 1 correct replicas that have a QC on B, contained in a block they have received.

Proof.

If a correct replica has block B as its Preferred Block, then it means it has a 2-chain headed by B: a block b carrying a QC for B, together with a QC for b. Since the QC for b contains n − f votes, this means that at least f + 1 correct replicas have voted for b. Each correct replica that voted for b must have seen the 1-chain headed by B, i.e., the QC on B carried in b. ∎

We now show that a correct proposer needs to collect just n − f responses in order to choose a block that is guaranteed to make progress after GST.

Theorem 5.

A correct proposer that waits for just n − f responses has enough information to propose a block that is guaranteed to be able to make progress after GST.

Proof.

Let B be the Preferred Block in the highest 1-chain held by any correct replica. From the Lemma above, it follows that any set of n − f responses must intersect the (at least f + 1) correct replicas that have the highest 1-chain, i.e., a QC on B. Hence the correct proposer will follow the branch led by B and extend it with a new block, and by the Preferred Block Rule all correct replicas will either set their Preferred Block to B or already have B (or an ancestor of it) as their Preferred Block; this allows all correct replicas to vote for the new block. ∎
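The intersection count behind this step, with n = 3f + 1:

    (n − f) + (f + 1) = n + 1 > n,

so any n − f responders must include at least one of the f + 1 correct replicas that, by Lemma 4, hold a QC on B.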

8 Model Checking HotStuff

A feature of HotStuff is its economy of mechanism, involving only a single message type for proposing, a single message type for voting, and simple Commit and Preferred Branch Rules. At the same time, the reasoning for its safety is subtle and deceptively simple—not unlike that for other designs that have subsequently been identified as problematic (e.g., [1]). This combination of features prompted us to explore model checking to confirm the safety of HotStuff. To do so, we implemented a model of the protocol in Promela, a modeling language for concurrent systems, and model-checked the protocol using Spin [26].

Promela provides facilities that make it well-suited for our task. In particular, Promela provides support for message channels that implement point-to-point, FIFO, asynchronous communication between processes that run concurrently. In our model, each replica is its own process, and the leader is a separate process from the replicas. Even though, in practice, the leader is one of the replicas, it is cleaner in our model to split these roles into separate Promela processes, since the data structures and logic associated with each are mostly disjoint. (The few exceptions are represented as global state in the Promela model.) Within this structure, we implemented (correct) replicas to behave per the algorithm described above.

We were concerned only with validating safety, which we expressed using the following assertion, confirmed every time any replica delivered a block: each other replica has delivered a set of blocks so far that is either a subset or a superset of those delivered so far by this replica. Because this assertion refers only to unordered sets, it could succeed in a single execution of the protocol in which replicas delivered blocks in different orders (e.g., if replica i delivers block 1 and then block 2, and then replica j delivers block 2 and then block 1). However, the presence of such a protocol execution would imply that there is another protocol execution in which this assertion would fail (e.g., where replica i delivers block 1 and then replica j delivers block 2). We thus adopted this weaker assertion because it could be tested using fast bit operations on a compact indicator of blocks delivered; specifically, delivered[i] & ~delivered[j] == 0 indicates that delivered[i] is a subset of delivered[j], where delivered[i] is a bit vector whose k-th bit is 1 if replica i delivered block k, and 0 otherwise (and similarly for delivered[j]).
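For illustration, the same subset test over integer bit vectors, as a minimal Python sketch (the Promela model performs the analogous bit operations):

    def is_subset(delivered_i: int, delivered_j: int) -> bool:
        """True iff every block delivered by replica i (bit set in delivered_i)
        was also delivered by replica j."""
        return delivered_i & ~delivered_j == 0

    def consistent(delivered_i: int, delivered_j: int) -> bool:
        """The safety assertion: the delivered sets are subset-ordered one way or the other."""
        return is_subset(delivered_i, delivered_j) or is_subset(delivered_j, delivered_i)

    # Example: replica i delivered blocks {1, 2}; replica j delivered {1}.
    d_i = (1 << 1) | (1 << 2)
    d_j = (1 << 1)
    assert consistent(d_i, d_j)                              # {1} is a subset of {1, 2}
    assert not consistent(d_i | (1 << 3), d_j | (1 << 4))    # {1,2,3} vs {1,4}: neither contains the other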

Model checking protocols designed to tolerate Byzantine faults is a challenging task; modeling Byzantine behavior allows, by definition, a wide range of behaviors by a faulty node, and model checking should explore all of them, in principle. Our model captures the behaviors that we believe pose the most risk to safety, allowing a Byzantine-faulty leader to propose blocks with arbitrary parent blocks, to propose multiple blocks at the same height (equivocation), and to propose each block to an arbitrary subset of the replicas. In addition, a faulty replica will return a vote for any block it receives. These degrees of freedom, together with the concurrent interleavings that must be accounted for in any system involving concurrent processes, already render model checking challenging even for small numbers of replicas and blocks.

While admittedly an imperfect measure of model complexity, our model consists of 157 lines containing at least one semicolon (;) or arrow (->), the statement separators (vs. terminators) in Promela. We have leveraged the “swarm” capabilities [27] of Spin to model check a four-replica (one faulty) plus one (faulty) leader model in which the leader proposes up to seven blocks. The model-checking run leveraged eight Intel Xeon Gold 6144 cores (3.50GHz) and 200 gigabytes of memory over the course of two weeks. Our analysis used bitstate hashing, a lossy state-compression technique that risks loss of coverage but reduces memory consumption [24, 25]. As such, the completion of this analysis with no errors does not itself guarantee correctness. However, the failure of Spin to find any safety violation already provides a strong degree of confidence in our design. We are continuing our analysis with Spin in the hopes of reporting a complete checking of our model for limited system sizes in the next version of this paper.

(a) 1-chain (DLS, 1988) (b) 2-chain (PBFT, 1999) (c) 2-chain w/ delay (Tendermint, 2016) (d) 2-chain w/ delay (Casper, 2017) (e) 3-chain (HotStuff, 2018)
Figure 5: Commit Rules

9 Protocols in the Lens of HotStuff

To further illustrate the generic applicability of the HotStuff framework, in this section, we articulate the Commit Rule and Preferred Block Rule of several BFT protocols, and compare them with HotStuff.

Figure 5 provides a bird's-eye view of the Commit Rules of the five protocols we consider, including HotStuff.

In a nutshell, the Commit Rule in DLS [17] is one-step, allowing a block to be committed only by its own proposer. The Commit Rules in PBFT [12], Tendermint [8] and Casper [10] are almost identical, and consist of a two-step quorum certificate (QC). More specifically, in PBFT, Tendermint, and Casper, a block b is considered a committed decision if there is a 2-chain headed by b: a block b′ carrying a QC for b, and a QC for b′ carried in a further block b″, where b is a direct parent of b′. PBFT and Tendermint additionally require that b′ be a direct parent of b″. In Casper, this requirement is relaxed, and the QC inside b″ does not need to refer to b″’s direct parent. HotStuff is distinguished from all of these methods by requiring a 3-chain, in which the first three blocks are linked via direct parent/child links.

9.1 Dls

The simplest Commit Rule is a 1-chain in which the block carrying the QC is a direct child of the certified block. Modeled after Dwork, Lynch, and Stockmeyer (DLS) [17], the first known Byzantine consensus solution for partial synchrony, this rule is depicted in Figure 5(a). The Preferred Block, which is called locked in DLS, is simply the highest block a replica voted for. Unfortunately, this Preferred Block rule may easily lead to a deadlock if, at some height, a proposer equivocates and two correct replicas vote for conflicting proposals at that height. Seeing the conflicting Preferred Blocks, other replicas have no way to know whether either one received enough votes, and we are stuck.

DLS resolves this deadlock by letting only one designated proposer per height reach a commit decision by the 1-chain Commit Rule. Only the proposer itself is harmed if it has equivocated. Replicas give up their Preferred Block (unlock) if they obtain evidence of conflicting proposals. The unlocking step occurring at the end of each height in DLS turns out to be fairly complex and expensive. Together with the fact that only the proposer for a height can decide, even in the best scenario, where no fault occurs and the network is timely, DLS requires n proposer rotations, and correspondingly many message transmissions, per single decision. While it broke new ground in demonstrating a protocol that is safe in the face of asynchrony, DLS was not designed as a practical solution.

9.2 Pbft

The Commit Rule in PBFT [12, 13] consists of a 2-chain in which both links are between direct parents; see Figure 5(b). With a two-step Commit Rule, a replica uses the head of the highest 1-chain it voted for as its Preferred Block. Conflicting Preferred Blocks at the same height are simply not possible, as each Preferred Block has a QC.

If fewer than f + 1 correct replicas know about the highest 1-chain, a proposer might not learn about it even if it collects information from n − f replicas. This can lead to a situation where two correct proposers have Preferred Blocks on conflicting branches, as depicted in Figure 6(a). To get “unstuck”, PBFT allows a replica to move from its Preferred Block to the proposer’s branch when it is safe to do so (Figure 6(b)). To this end, a PBFT proposal carries a proof that no higher 1-chain has been voted on by n − f replicas, hence none can become part of a committed decision. This proof is quite involved, as explained below.

(a) conflicting Preferred Blocks (b) Preferred Block abandoned
Figure 6: Preferred Blocks in PBFT

The “Vanilla” version of PBFT, which has been open-sourced [12] and adopted in several follow-up works [6, 30], has O(n^3) communication complexity per proposer replacement. A new proposer’s proof contains a set of messages collected from n − f replicas, each reporting the highest 1-chain it voted for. Each 1-chain contains a QC of size O(n), and the proof is broadcast to all replicas, hence the total cost is O(n^3). Harnessing signature-combining methods from [41, 11], SBFT [22] reduces this cost to O(n^2) by turning each QC into a single value.
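Under this accounting (a sketch that ignores constants and message headers), the costs arise as follows:

    Vanilla PBFT: (n − f) reported 1-chains, each with an O(n)-sized QC, give an O(n^2) proof, broadcast to n replicas: O(n^3) total.
    SBFT: (n − f) reported 1-chains, each with an O(1) combined QC, give an O(n) proof, broadcast to n replicas: O(n^2) total.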

In the PBFT variant in [13], which we will name “Strawberry”, a proposer’s proof contains the highest 1-chain the proposer collected from the quorum only once. It also includes one signed value from each member of the quorum, proving that the member did not vote for a higher 1-chain. Broadcasting this proof incurs O(n^2) communication complexity. Note that whereas the signatures on the QC may be combined into a single value, the proof as a whole cannot be reduced to constant size, because messages from different members of the quorum may have different values.

In both the Vanilla and the Strawberry variants, a correct member moves to the proposer’s block/branch even if the highest 1-chain it voted for conflicts with it and is higher! The power of explicitly including a quorum of messages is that a correct proposer can force its proposal to be accepted during periods of synchrony. The cost is at least quadratic communication per proposer replacement.

9.3 Tendermint and Casper

Having explained PBFT in detail, describing Tendermint [8, 9] and Casper [10] follows easily. Tendermint has a 2-chain Commit Rule identical to PBFT’s, and Casper has a 2-chain rule in which the second direct-parent requirement is relaxed. That is, in Casper, the Commit Rule consists of a 2-chain headed by b: a block b′ carrying a QC for b, and a QC for b′ carried in some block b″, where b is a direct parent of b′, i.e., height(b′) = height(b) + 1, but b′ is not necessarily a direct parent of b″. Figure 5(c,d) depicts the Commit Rules for Tendermint and Casper, respectively.

Importantly, the Preferred Block Rule in Tendermint and Casper embodies a leap in performance over PBFT. The rule always favors the highest known 1-chain: a replica simply never moves to the proposer’s branch unless that branch carries the highest 1-chain known to the replica. Because honest replicas may therefore decline to vote for a proposer’s block, to guarantee progress a new proposer and all replicas must obtain the highest 1-chain by waiting the maximum network delay. Otherwise, if the protocol progressed based only on the first n − f responses without waiting, there could be an infinite non-deciding execution. For example, suppose it takes the maximum delay for the highest 1-chain, headed by some block b, to become known to the proposer, and the proposer broadcasts a new block before that. Then it is possible that fewer than n − f nodes will vote for the new block, because more than f correct nodes prefer the branch led by b, and thus the proposer cannot obtain a QC for its new block. This process can repeat indefinitely in such a bad scenario.
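The vote count in this bad scenario, with n = 3f + 1 and a quorum of n − f = 2f + 1 votes:

    at most (2f + 1) − (f + 1) = f correct votes, plus at most f Byzantine votes, totals 2f < 2f + 1,

so no QC can form for the prematurely broadcast block.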

As for the communication footprint: casting Tendermint and Casper into the HotStuff framework, a proposer could collect and disseminate the highest 1-chain using linear communication only, the same as HotStuff. However, crucially, due to the extra QC step, HotStuff does not require the proposer to wait the maximum network delay.

10 Evaluation

We have implemented the HotStuff protocol as a library in roughly 4K lines of C++ code. Most noticeably, the core consensus logic specified in the pseudocode consumes only around 200 lines. Thanks to the notion of a Pacemaker, we decouple safety and liveness handling entirely, simplifying each. In this section, we first examine baseline throughput and latency by comparing to a state-of-the-art system, BFT-SMaRt [6]. We then focus on the message cost of view changes, the scenario in which HotStuff's advantage is most pronounced.

10.1 Setup

We conducted our experiments on Amazon EC2 using c5.4xlarge instances. Each instance had 16 vCPUs supported by Intel Xeon Platinum 8000 processors. All cores sustained a Turbo CPU clock speed up to 3.4GHz. We ran each replica on a single VM instance, and so BFT-SMaRt, which makes heavy use of threads, was allowed to utilize 16 cores per replica, as in their original evaluation [6]. The network bandwidth on each machine was 3500Mbps, and we did not limit bandwidth in any runs of either system.

Our prototype implementation of HotStuff uses secp256k1 for all digital signatures in both votes and quorum certificates. BFT-SMaRt uses hmac-sha1 for MACs (Message Authentication Codes) in the messages during normal operation and uses digital signatures in addition to MACs during a view change.

All results for HotStuff reflect end-to-end measurement from the clients. For BFT-SMaRt, we used the micro-benchmark programs ThroughputLatencyServer and ThroughputLatencyClient from the BFT-SMaRt website (https://github.com/bft-smart/library). The client program measures end-to-end latency but not throughput, while the server program measures both throughput and latency. We used the throughput results from the servers and the latency results from the clients.

10.2 Base Performance

We first measured throughput and latency in a setting commonly seen in the evaluation of other BFT replication systems. We ran 4 replicas in a configuration that tolerates a single failure, i.e., f = 1, while varying the operation request rate until the system saturated. This benchmark used empty (zero-sized) operation requests and responses and triggered no view changes; we expand to other settings below.

Figure 7: Throughput vs. latency with different choices of batch size, 4 replicas, 0/0 payload. Figure 8: Throughput vs. latency with different choices of payload size with 4 replicas, batch size 400.

Figure 7 depicts three batch sizes for both systems, 100, 400, and 800, though because these systems have different batching schemes, these numbers mean slightly different things for each system. BFT-SMaRt drives a separate consensus decision for each operation, and batches the messages from multiple consensus protocols. Therefore, it has a typical L-shaped latency/throughput performance curve. HotStuff batches multiple operations in each block, and in this way mitigates the cost of digital signatures per decision. However, beyond a certain batch size, the latency incurred by batching becomes higher than the cost of the crypto. Despite these differences, HotStuff achieves performance comparable to BFT-SMaRt for all three batch sizes.

For batch sizes of 100 and 400, the lowest-latency HotStuff point provides latency and throughput that are comparable to the latency and throughput simultaneously achievable by BFT-SMaRt at its highest throughput. Recall that HotStuff makes use of the first phase of the next decision to piggyback the final phase of the previous one, and so requires a moderate rate of submitted operations to achieve its lowest-latency point; in other words, reducing the rate of submitted operations further only increases latency. In practice, this operation rate could be ensured by generating dummy blocks to finalize a decision when there are insufficient outstanding operations to do so naturally. However, for these runs we did not generate dummy blocks, but instead ensured that operations were submitted at a rate sufficient to demonstrate low latency, which also happened to roughly maximize throughput.

Figure 8 depicts three client request/reply payload sizes (in bytes): 0/0, 128/128, and 1024/1024, designated by “p0”, “p128”, and “p1024”. At a payload size of 1024 bytes, the reduced communication complexity of HotStuff, due to its linear protocol and use of signatures, already (slightly) outperforms BFT-SMaRt.

Below we evaluate both systems in more challenging situations, where the performance advantages of HotStuff will become more pronounced.

10.3 Scalability

To evaluate the scalability of HotStuff in various dimensions, we performed three experiments. For the baseline, we used zero-size request/response payloads while varying the number of replicas. The second evaluation repeated the baseline experiment with 1024-byte request/response payloads. The third test repeated the baseline (with empty payloads) while introducing network delays between replicas that were uniformly distributed in 5ms ± 0.5ms or in 10ms ± 1.0ms, implemented using NetEm (see https://www.linux.org/docs/man8/tc-netem.html).

The first setting is depicted as the “p0” curves in Figure 9 (throughput) and Figure 10 (latency). BFT-SMaRt shows (slightly) better scaling in both measures than HotStuff. This is mainly due to its faster MAC computations. In the future, we plan to reduce the cryptographic computation overhead in HotStuff by (i) using a faster signature scheme and (ii) employing threshold signatures.

The second setting with payload size 1024 bytes is denoted by “p1024” in Figure 9 (throughput) and Figure 10 (latency). Due to its quadratic bandwidth cost, BFT-SMaRt performed worse than HotStuff in both.

The third setting is shown in Figure 11 (throughput) and Figure 12 (latency) as “5ms” or “10ms”. Again, due to the larger use of communication in BFT-SMaRt, HotStuff consistently outperformed BFT-SMaRt in both cases.

Figure 9: Throughput vs. number of nodes with payload size 0/0 and 1024/1024. Figure 10: Latency vs. number of nodes with payload size 0/0 and 1024/1024.
Figure 11: Throughput vs. number of nodes with inter-replica latency 5ms and 10ms. Figure 12: Latency vs. number of nodes with inter-replica latency 5ms ± 0.5ms or 10ms ± 1.0ms.

10.4 View Change

To evaluate the communication complexity of proposer replacement, we counted the number of signature or MAC verifications performed within BFT-SMaRt’s view-change protocol. Our evaluation strategy was as follows. We injected a view change into BFT-SMaRt every one thousand blocks. We instrumented the BFT-SMaRt source code to count the number of verifications upon receiving and processing messages within the view-change protocol. Beyond communication complexity, this measurement underscores the cryptographic computation load associated with transferring these authenticated values.

Figure 13 shows the number of extra authenticators (MACs or signatures) processed for each view change. The figure depicts separate counts for MACs and for signatures, and shows that BFT-SMaRt uses super-linear numbers of both. HotStuff does not require extra authenticators for view changes and so is omitted from the graph.

Evaluating the real-time performance of proposer replacement is tricky. First, BFT-SMaRt got stuck when triggering frequent view changes; our authenticator-counting benchmark had to average over as many successful view changes as possible before the system got stuck, repeating the experiment many times. Second, the actual elapsed time for proposer replacement depends highly on timeout parameters, on the Pacemaker, and on the proposer-election mechanism. It is therefore impossible to provide a meaningful comparison.

Figure 13: Number of extra authenticators used for each BFT-SMaRt view change.

11 Conclusion

In this paper, we introduced HotStuff, a practical, quorum-based, 3-step BFT consensus protocol that does not distinguish between normal operation and view change at its core, using an append-only, tree-like data structure. It has linear communication cost with Optimistic Responsiveness, and it provides a generic framework that bridges the classical BFT family with blockchains. Despite its simplicity, the HotStuff prototype implementation achieves performance comparable to a state-of-the-art BFT replication system.

References

  • [1] Ittai Abraham, Guy Gueta, Dahlia Malkhi, Lorenzo Alvisi, Ramakrishna Kotla, and Jean-Philippe Martin. Revisiting fast practical Byzantine fault tolerance. CoRR, abs/1712.01367, 2017.
  • [2] Ittai Abraham, Guy Gueta, Dahlia Malkhi, and Jean-Philippe Martin. Revisiting fast practical Byzantine fault tolerance: Thelma, velma, and zelma. CoRR, abs/1801.10022, 2018.
  • [3] Ittai Abraham and Dahlia Malkhi. The blockchain consensus layer and BFT. Distributed Computing column of the Bulletin of the EATCS, http://bulletin.eatcs.org/index.php/beatcs/article/view/506, fall, 2017.
  • [4] Ittai Abraham, Dahlia Malkhi, Kartik Nayak, Ling Ren, and Sasha Spiegelman. Solida: A cryptocurrency based on reconfigurable Byzantine consensus. In OPODIS, December, 2017.
  • [5] Michael Ben-Or. Another advantage of free choice (extended abstract): Completely asynchronous agreement protocols. In Proceedings of the Second Annual ACM Symposium on Principles of Distributed Computing, PODC ’83, pages 27–30, New York, NY, USA, 1983. ACM.
  • [6] Alysson Bessani, João Sousa, and Eduardo E. P. Alchieri. State machine replication for the masses with BFT-SMaRt. In Proceedings of the 44th IEEE/IFIP International Conference on Dependable Systems and Networks, DSN ’14, pages 355–362, Washington, DC, USA, 2014. IEEE Computer Society.
  • [7] Dan Boneh, Ben Lynn, and Hovav Shacham. Short signatures from the Weil pairing. J. Cryptol., 17(4):297–319, September 2004.
  • [8] Ethan Buchman. Tendermint: Byzantine fault tolerance in the age of blockchains. https://atrium.lib.uoguelph.ca/xmlui/handle/10214/9769, Thesis, 2016, University of Guelph.
  • [9] Ethan Buchman, Jae Kwon, and Zarko Milosevic. The latest gossip on BFT consensus. ArXiv, https://arxiv.org/abs/1807.04938, 2018.
  • [10] Vitalik Buterin and Virgil Griffith. Casper the friendly finality gadget. CoRR, abs/1710.09437, 2017.
  • [11] Christian Cachin, Klaus Kursawe, and Victor Shoup. Random oracles in Constantinople: Practical asynchronous Byzantine agreement using cryptography. J. Cryptol., 18(3):219–246, July 2005.
  • [12] Miguel Castro and Barbara Liskov. Practical Byzantine fault tolerance. In Proceedings of the 3rd Symposium on Operating Systems Design and Implementation, OSDI ’99, pages 173–186, Berkeley, CA, USA, 1999. USENIX Association.
  • [13] Miguel Castro and Barbara Liskov. Practical Byzantine fault tolerance and proactive recovery. ACM Trans. Comput. Syst., 20(4):398–461, November 2002.
  • [14] Allen Clement, Manos Kapritsos, Sangmin Lee, Yang Wang, Lorenzo Alvisi, Mike Dahlin, and Taylor Riche. Upright cluster services. In Proceedings of the 22nd ACM Symposium on Operating Systems Principles, SOSP ’09, pages 277–290, New York, NY, USA, 2009. ACM.
  • [15] Danny Dolev and Rüdiger Reischuk. Bounds on information exchange for Byzantine agreement. J. ACM, 32(1):191–204, January 1985.
  • [16] Danny Dolev and H. Raymond Strong. Polynomial algorithms for multiple processor agreement. In Proceedings of the Fourteenth Annual ACM Symposium on Theory of Computing, STOC ’82, pages 401–407, New York, NY, USA, 1982. ACM.
  • [17] Cynthia Dwork, Nancy Lynch, and Larry Stockmeyer. Consensus in the presence of partial synchrony. J. ACM, 35(2):288–323, April 1988.
  • [18] Ittay Eyal, Adem Efe Gencer, Emin Gün Sirer, and Robbert Van Renesse. Bitcoin-ng: A scalable blockchain protocol. In Proceedings of the 13th Usenix Conference on Networked Systems Design and Implementation, NSDI’16, pages 45–59, Berkeley, CA, USA, 2016. USENIX Association.
  • [19] Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. Impossibility of distributed consensus with one faulty process. J. ACM, 32(2):374–382, April 1985.
  • [20] Juan Garay, Aggelos Kiayias, and Nikos Leonardos. The bitcoin backbone protocol: Analysis and applications. In EUROCRYPT, pages 281–310, 2015.
  • [21] Yossi Gilad, Rotem Hemo, Silvio Micali, Georgios Vlachos, and Nickolai Zeldovich. Algorand: Scaling Byzantine agreements for cryptocurrencies. In Proceedings of the 26th Symposium on Operating Systems Principles, SOSP ’17, pages 51–68, New York, NY, USA, 2017. ACM.
  • [22] Guy Golan-Gueta, Ittai Abraham, Shelly Grossman, Dahlia Malkhi, Benny Pinkas, Michael K. Reiter, Dragos-Adrian Seredinschi, Orr Tamir, and Alin Tomescu. SBFT: a scalable decentralized trust infrastructure for blockchains. CoRR, abs/1804.01626, 2018.
  • [23] T. Hanke, M. Movahedi, and D. Williams. DFINITY technology overview series – consensus system. https://dfinity.org/pdf-viewer/library/dfinity-consensus.pdf, January 2018.
  • [24] G. J. Holzmann. An improved reachability analysis technique. Software, Practice and Experience, 18(2):137–161, 1988.
  • [25] G. J. Holzmann. An analysis of bitstate hashing. Formal Methods in System Design, 13(3):289–307, November 1998.
  • [26] G. J. Holzmann. The Spin Model Checker: Primer and Reference Manual. Addison-Wesley, 2004.
  • [27] G. J. Holzmann, R. Joshi, and A. Groce. Swarm verification techniques. IEEE Transactions on Software Engineering, 37(6), November-December 2011.
  • [28] J. Katz and C. Koo. On expected constant-round protocols for Byzantine agreement. Journal of Computer and System Sciences, 75(2):91–112, February 2009.
  • [29] Eleftherios Kokoris-Kogias, Philipp Jovanovic, Nicolas Gailly, Ismail Khoffi, Linus Gasser, and Bryan Ford. Enhancing bitcoin security and performance with strong consistency via collective signing. CoRR, abs/1602.06997, 2016.
  • [30] Ramakrishna Kotla, Lorenzo Alvisi, Mike Dahlin, Allen Clement, and Edmund Wong. Zyzzyva: Speculative Byzantine fault tolerance. ACM Trans. Comput. Syst., 27(4):7:1–7:39, January 2010.
  • [31] Leslie Lamport. Time, clocks, and the ordering of events in a distributed system. Commun. ACM, 21(7):558–565, July 1978.
  • [32] Leslie Lamport. The part-time parliament. ACM Trans. Comput. Syst., 16:133–169, May 1998.
  • [33] Leslie Lamport, Robert Shostak, and Marshall Pease. The Byzantine generals problem. ACM Trans. Program. Lang. Syst., 4(3):382–401, July 1982.
  • [34] Andrew Miller, Yu Xia, Kyle Croman, Elaine Shi, and Dawn Song. The honey badger of BFT protocols. In Proceedings of the 2016 ACM Conference on Computer and Communications Security, CCS ’16, pages 31–42, New York, NY, USA, 2016. ACM.
  • [35] Satoshi Nakamoto. Bitcoin: A peer-to-peer electronic cash system. https://bitcoin.org/bitcoin.pdf, December 2008.
  • [36] Rafael Pass, Lior Seeman, and Abhi Shelat. Analysis of the blockchain protocol in asynchronous networks. In EUROCRYPT, pages 643–673, 04 2017.
  • [37] Rafael Pass and Elaine Shi. Hybrid consensus: Efficient consensus in the permissionless model. IACR Cryptology ePrint Archive, 2016:917, 2016.
  • [38] Rafael Pass and Elaine Shi. Thunderella: Blockchains with optimistic instant confirmation. In EUROCRYPT, pages 3–33, 2018.
  • [39] M. Pease, R. Shostak, and L. Lamport. Reaching agreement in the presence of faults. J. ACM, 27(2):228–234, April 1980.
  • [40] HariGovind V. Ramasamy and Christian Cachin. Parsimonious asynchronous Byzantine-fault-tolerant atomic broadcast. In Proceedings of the 9th International Conference on Principles of Distributed Systems, OPODIS’05, pages 88–102, Berlin, Heidelberg, 2006. Springer-Verlag.
  • [41] Michael K. Reiter. The Rampart toolkit for building high-integrity services. In Selected Papers from the International Workshop on Theory and Practice in Distributed Systems, pages 99–110, London, UK, UK, 1995. Springer-Verlag.
  • [42] F. B. Schneider. Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Comput. Surv., 22(4):299–319, 1990.
  • [43] Victor Shoup. Practical threshold signatures. In Proceedings of the 19th International Conference on Theory and Application of Cryptographic Techniques, EUROCRYPT’00, pages 207–220, Berlin, Heidelberg, 2000. Springer-Verlag.
  • [44] Yee Jiun Song and Robbert van Renesse. Bosco: One-step Byzantine asynchronous consensus. In Distributed Computing, 22nd International Symposium, DISC 2008, Arcachon, France, September 22-24, 2008. Proceedings, pages 438–450, 2008.