Optimal Good-case Latency for Byzantine Broadcast and State Machine Replication

03/29/2020 ∙ by Ittai Abraham, et al. ∙ Duke University, VMware

This paper investigates Byzantine broadcast (BB) protocols with optimal good-case latency under synchrony and weaker variants of synchrony. One of the most important applications of BB is to implement Byzantine fault-tolerant (BFT) state machine replication (SMR), more recently also known as blockchains. The traditional latency metric of BB focuses on the number of lock-step rounds needed in the worst case or expected case. We observe that the traditional latency metric fails to capture what is important in practice for two reasons. First, practical synchronous BFT SMR protocols do not run in lock-step rounds. Second, practical SMR protocols make progress only when an honest leader is in charge. Thus, motivated by recent progress in synchronous BFT SMR, we study the good-case latency of BB, i.e., the precise latency to commit measured in time (as opposed to rounds) when the sender is honest. We propose the first synchronous BB protocol with optimal good-case latency. This closes the gap between the upper and lower bounds on good-case latency left open in the previous work by Abraham et al. To make the synchronous model more practical, we extend our protocol to handle two weaker network models, named mobile link failures and mobile sluggish faults. By providing a new lower bound in the mobile link failure model, we show that our protocols in these weaker models also achieve optimal good-case latency. Finally, to demonstrate the applicability to the target application, we turn all our BB protocols into BFT SMR protocols with minimal modifications, and guarantee the same good-case latency and tolerance to weaker synchrony variants.

1 Introduction

Byzantine broadcast (BB) is a fundamental problem in distributed computing. It lets a designated sender broadcast a value to all non-faulty replicas, who should output the same value even if up to $f$ Byzantine replicas behave arbitrarily. One of the most important practical applications of BB is to implement Byzantine fault-tolerant (BFT) state machine replication (SMR) [schneider1990implementing], which provides clients with the illusion of a single non-faulty server by ensuring that all honest replicas agree on the same sequence of client inputs.

BB fundamentally requires synchrony and is impossible in partial synchrony or asynchrony. It is well known that synchronous protocols have a higher fault tolerance (up to one-half Byzantine faults) [schneider1990implementing] compared to asynchronous or partially synchronous protocols (up to one-third) [castro1999practical] for BFT SMR. (Footnote 1: Even though BB can tolerate any number of Byzantine faults [lamport1982byzantine, dolev1983authenticated], BFT SMR can only tolerate one-half [schneider1990implementing] even with synchrony and authentication. Since this paper studies BB with SMR as the application in mind, we focus on BB with minority faults.) However, BB and synchronous SMR have been considered impractical for a long time for several reasons.

Firstly, to take advantage of the synchrony assumption, most theoretical works on Byzantine broadcast or Byzantine agreement (BA) assume lock-step execution [lamport1982byzantine, dolev1983authenticated, dolev1990early, katz2006expected], where replicas start and end each round at the same time. As a result, the latency of BB and BA protocols is typically measured by the round complexity. Such a lock-step assumption simplifies protocol design, but it is considered impractical because it is hard to enforce and leads to poor performance. A synchronous protocol requires a known upper bound $\Delta$ for the maximum message delay. To be safe under worst-case network conditions, $\Delta$ has to be picked conservatively, i.e., much larger than the actual (unknown) message delay bound $\delta$. If a protocol runs in lock steps, it must allocate $\Delta$ time for every step. Then, most of the time is wasted on waiting for the next round to start rather than doing any useful work. Only recently have synchronous BFT SMR protocols deviated from the lock-step approach [hanke2018dfinity, synchotstuff]. In these protocols, most of the steps are non-blocking, namely, replicas move to the next step as soon as enough messages are received from the previous step.

Secondly, classic BB protocols [dolev1983authenticated] tend to optimize their worst-case latency. Because the worst-case number of rounds required is $f+1$ for tolerating $f$ faults [fischer1982lower], and $f$ is typically assumed to be linear in $n$, any BB protocol will inevitably have a poor worst-case latency as $n$ increases. In contrast, BFT SMR protocols typically care about the good case, in which a stable honest leader stays in charge and drives consensus on many decisions. While a line of work studies expected-case latency [feldman1988optimal, katz2006expected, abraham2019synchronous], this metric is still very different from the good-case latency in SMR, because it essentially analyzes the expected number of times the protocol changes its leader.

Another relatively minor disconnect lies in the “life cycle” of the protocol. Theoretical BB considers consensus on a single value and requires all parties to halt or terminate after agreeing on this single value. In contrast, practical SMR protocols are intended to run forever; replicas commit or decide on an ever-growing sequence of values.

Motivated by the above considerations, we argue that a new model for BB is needed to better capture practice. First, lock-step execution should not be assumed, and latency should be measured in time as opposed to rounds. Furthermore, for a more accurate characterization of latency, we adopt the separation between the conservative bound $\Delta$ and the actual (unknown) bound $\delta$ as suggested in [pass2017hybrid]. Finally, instead of the traditional worst-case latency to terminate, we use the good-case latency to commit as the main metric, defined as follows.

Definition 1 (good-case latency).

The good-case latency of a Byzantine broadcast protocol is the time for all honest replicas to commit when the sender is honest.

The main goal of our paper is to develop BB protocols with optimal good-case latency and apply them to BFT SMR protocols. We remark that several differences still exist between BFT SMR and our practical BB formulation. For example, BB has a validity requirement while BFT SMR leaves the validity check to the application layer; BB agrees on a single value while BFT SMR keeps committing new values; BB can achieve a higher resilience (any number of faults) than BFT SMR (fewer than one-half) [schneider1990implementing]. However, these remaining differences are no longer fundamental, since we show that our BB protocols can be easily modified into SMR protocols while maintaining the same good-case latency and support for weaker models. Thus, we believe the new BB formulation strikes a nice balance between clean theoretical abstractions and faithful practical implications. This allows us to focus on BB for most of the paper to simplify the presentation. This paper presents the following main results.

Byzantine broadcast protocol with optimal good-case latency under synchrony. A key contribution of this paper is the first Byzantine broadcast protocol with optimal good-case latency under a synchronous model that separates the assumed network delay $\Delta$ and the actual network delay $\delta$ for a more precise latency characterization. By proposing a protocol called -BB with good-case latency $\Delta + 2\delta$, we refute the conjecture made in [synchotstuff] on the $2\Delta$ lower bound, and close the gap between the upper bound and lower bound on good-case latency for Byzantine broadcast.

Byzantine broadcast protocols with optimal good-case latency under weaker models. To make the synchronous model more practical, we consider two types of additional faults suggested in the literature so that the synchrony bound need not hold for all messages and parties at all times. The first type of faults, named mobile link failures [schmid2009impossibility, biely2011synchronous], assumes that a certain fraction of send and receive links can be down at each replica. The second type of faults, named mobile sluggish faults [chan2018pili, guo2019synchronous, synchotstuff], models the slow connection of some honest replicas whose message sending and receiving does not respect the assumed delay bound $\Delta$. We develop Byzantine broadcast protocols with optimal good-case latency that can tolerate each of these fault types. The optimal good-case latency remains $\Delta + 2\delta$ with mobile sluggish faults and becomes $2\Delta + 4\delta$ under mobile link failures, by a new lower bound we prove.

Byzantine fault-tolerant state machine replication protocols. Based on the Byzantine broadcast protocol -BB, we build a state machine replication protocol named -SMR that has good-case latency , optimal resilience , and does not require lock-step execution. The -SMR protocol can be extended to tolerate mobile sluggish faults and mobile link failures, using similar techniques from the BB protocols for weaker models.

2 Preliminary

We consider $n$ replicas in a reliable, authenticated all-to-all network, where up to $f$ replicas can be malicious and behave in a Byzantine fashion.

Assumed message delay bound $\Delta$ and actual message delay bound $\delta$. In this paper, we consider a synchronous system, where the message delay is bounded. Let $\delta$ denote the actual upper bound of the message delay in the network, so any message will be delivered within time $\delta$ after being sent. The parameter $\delta$ is unknown to the protocol designer and is used only in the model definition, not in any protocol. The synchronous model assumes a known upper bound $\Delta$ for the message delay, i.e., $\delta \le \Delta$. In practice, the parameter $\Delta$ is usually conservative for safety reasons, and thus $\delta \ll \Delta$. To put the new model into context, we remark on its natural connection to partial synchrony: in partial synchrony, there is also an unknown message delay bound $\delta$, but the protocol designer does not know any upper bound $\Delta$ on $\delta$.
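To make the gap between the two bounds concrete, the following minimal sketch compares a lock-step schedule, which pads every step to the assumed bound, with a non-blocking schedule, which advances as soon as messages arrive. The function names and the numeric values are illustrative assumptions, not taken from the paper.

```python
def lock_step_latency(num_steps: int, assumed_bound: float) -> float:
    """Latency when every step is padded to the conservative bound Delta."""
    return num_steps * assumed_bound


def non_blocking_latency(actual_delays: list[float]) -> float:
    """Latency when each step ends as soon as its messages have arrived."""
    return sum(actual_delays)


# Hypothetical numbers: Delta is set conservatively, delta is the real delay.
DELTA = 1.0
SMALL_DELTA = 0.05

padded = lock_step_latency(3, DELTA)                 # three full Delta rounds
responsive = non_blocking_latency([SMALL_DELTA] * 3)  # three actual delays
print(f"lock-step: {padded:.2f}, non-blocking: {responsive:.2f}")
```

With these assumed values, almost all of the lock-step latency is idle waiting, which is the inefficiency the non-lock-step model is meant to expose.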

Non-lock-step and clock skew. Although we assume synchrony, unlike most synchronous protocols that require lock-step execution (replicas start and end each round at the same time), our protocols do not proceed in a lock-step fashion for BB under an honest sender, and BFT SMR under an honest leader. We assume that for BB, all replicas have clock skew at most , i.e., they start the BB instance at most apart from each other. The bounded clock skew can be guaranteed via any clock synchronization protocol, such as [dolev1995dynamic, abraham2019synchronous]. The above clock synchronization protocols can also handle bounded clock drifts. For simplicity, we assume there’s no clock drift.

Signature and Public-key Infrastructure. We assume standard digital signatures and public-key infrastructure (PKI), and the Byzantine replicas cannot break these cryptographic assumptions. We use to denote a signed message by replica . It is well known that under synchrony with signatures and public-key infrastructure, Byzantine fault-tolerant state machine replication can be solved with [schneider1990implementing], and Byzantine broadcast can be solved with [lamport1982byzantine, dolev1983authenticated]. Since our Byzantine broadcast protocol in Section 3 is used as the building block for the BFT SMR protocol, it only tolerates minority faults. Therefore, without loss of generality, in this paper we assume that unless mentioned explicitly.

For most of the main paper, we present results for the problem of Byzantine broadcast, defined as follows.

Definition 2 (Byzantine Broadcast).

A Byzantine broadcast protocol provides the following three guarantees.

  • Agreement. If two honest replicas commit values $v$ and $v'$ respectively, then $v = v'$.

  • Termination. All honest replicas eventually commit.

  • Validity. If the designated sender is honest, then all honest replicas commit on the same value it proposes.

To simplify the protocol, our Byzantine broadcast protocol in Section 3 uses a Byzantine agreement protocol as a primitive, defined as follows.

Definition 3 (Byzantine Agreement).

A Byzantine agreement protocol provides the following three guarantees.

  • Agreement. If two honest replicas commit values $v$ and $v'$ respectively, then $v = v'$.

  • Termination. All honest replicas eventually commit.

  • Validity. If all honest replicas have the same input value, then all honest replicas commit on the value.

In Section 6 and the Appendix, we extend the results for Byzantine broadcast and develop efficient protocols for Byzantine fault-tolerant state machine replication, formally defined as follows.

Definition 4 (Byzantine Fault-tolerant State Machine Replication [schneider1990implementing]).

A Byzantine fault-tolerant state machine replication protocol commits client requests as a linearizable log to provide a consistent view of the log akin to a single non-faulty server, providing the following two guarantees.

  • Safety. Honest replicas do not commit different values at the same log position.

  • Liveness. Each client request is eventually committed by all honest replicas.

3 Byzantine Broadcast with Optimal good-case latency under Synchrony

We first present a simple Byzantine broadcast protocol with optimal good-case latency under synchrony tolerating minority faults. Recall that the synchrony assumption states that a message sent at time $t$ by any replica arrives at another replica by time $t + \Delta$, where $\Delta$ is a known maximum network delay. We denote the actual network message delay by $\delta$. Our protocol incurs a latency of $\Delta + 2\delta$ when the designated sender is honest. This is optimal when $\delta \ll \Delta$ due to the known lower bound of $\Delta$ on the good-case latency for synchronous BB [synchotstuff]. Despite its simplicity, our protocol closes the gap for synchronous Byzantine broadcast protocols and refutes a conjectured lower bound in Abraham et al. [synchotstuff].

3.1 Intuition

Figure 1: Graphical representation of the intuition for the conjectured lower bound.

We start by presenting the rationale of the conjectured lower bound [synchotstuff] and how our protocol disproves it. The conjecture argues that two $\Delta$ periods are needed for the following reasons. First, now that we have departed from lock-step execution, two honest replicas may be “out-of-sync” by $\Delta$ time. For example, a Byzantine leader quickly delivers its first message to one replica but takes $\Delta$ time to deliver its first message to another replica. Intuitively, this causes a lagging replica to reach a point in the protocol $\Delta$ time after a leading replica. Second, any message sent by an honest replica can take up to $\Delta$ time to arrive at another replica by the synchrony assumption. In order to make sure that the sender did not equivocate (i.e., send different values to different replicas), a leading replica needs to wait for $2\Delta$ time before it can hear from a lagging replica what the lagging replica received from the sender (cf. Figure 1). Lastly, replicas seem to have no way to tell whether they are leading or lagging, so all replicas wait for $2\Delta$ time to ensure the absence of sender equivocation.

Note that the above intuition of the conjecture has an implicit assumption: if the sender equivocates, we want all replicas, leading or lagging, to detect the sender’s equivocation. This is where the intuition of the conjecture errs and what our protocol relies on to get below $2\Delta$: we make equivocation detection asymmetric. To elaborate, if we reduce the waiting period to $\Delta$, a lagging replica can still learn from a leading replica what the leading replica received from the sender, but not vice versa. In other words, a lagging replica can still detect sender equivocation (if there is any), but the most leading replica may not. Hence, the most leading replica may commit a value even though the sender equivocated. Note, however, that all other honest replicas have detected the equivocation and hence have not committed. As long as we carefully craft the rest of the protocol to make all other honest replicas eventually commit the same value as the leading replica, the protocol is safe. Also note that this is the Byzantine-sender scenario, so the rest of the protocol does not have to meet the good-case latency deadline.

3.2 -BB Protocol under Synchrony

Initially, every replica starts the protocol at the same time $t = 0$, and sets its locked value to the default value $\bot$.

  1. Propose. The designated sender with input value $v$ sends a signed proposal for $v$ to all other replicas.

  2. Ack. Upon receiving the first signed proposal from the sender or through some other replica, forward the proposal to all other replicas, and set vote-timer to $\Delta$ and start counting down.

  3. Vote. When vote-timer reaches $0$, if the replica has not received an equivocating value signed by the sender, it broadcasts a signed vote for $v$.

  4. Commit. If the replica collects $f+1$ distinct signed votes for $v$ at time $t$, (i) if $t \le 3\Delta$, it broadcasts these votes, sets its locked value to $v$ and commits $v$, (ii) if $t > 3\Delta$, it only sets its locked value to $v$.

  5. Byzantine Agreement. At time $4\Delta$, invoke an instance of Byzantine agreement with the locked value as the input. If not committed, commit on the output of the Byzantine agreement.

Figure 2: -BB Protocol under the Synchronous Model

We now describe the -BB protocol presented in Figure 2. For simplicity, we first assume that all replicas start the protocol at the same time $t = 0$; our results can be easily extended to the case where there is clock skew (see the discussion later in this section). Initially, all replicas set their locked value to some default value $\bot$. The sender first proposes a value $v$ in the form of a signed message (Step 1). When any replica receives its first valid proposal signed by the sender, it forwards the proposal as an acknowledgment (Step 2). Due to the non-lock-step execution of our protocol, a replica may first receive the signed proposal as forwarded by some other replica rather than directly from the sender. After the replica forwards the proposal, it locally starts a timer called vote-timer to wait for a period of $\Delta$. If the timer expires and the replica has not received any equivocating proposal (signed by the sender) for a value $v' \ne v$, it broadcasts a signed vote message for $v$ (Step 3). Once the replica gathers $f+1$ distinct vote messages for value $v$ within time $3\Delta$, it broadcasts these vote messages, sets its locked value to $v$ and commits $v$. If the $f+1$ distinct vote messages for $v$ are received later than $3\Delta$, the replica only sets its locked value to $v$ without committing (Step 4). As will be proved in Theorem 1, if the sender is honest, then every honest replica will receive $f+1$ vote messages within $3\Delta$ time. Even if the sender is not honest, if some replica commits a value by time $3\Delta$, then since it forwards the $f+1$ vote messages to all replicas, all replicas will lock on the committed value within $4\Delta$ time. Finally, at time $4\Delta$, all replicas initiate an instance of Byzantine agreement with their locked value as the input, and any replica that has not committed yet will commit on the output of the agreement (Step 5).
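The honest-sender path of the protocol can be explored with a small simulation. This is only a sketch under stated assumptions: all replicas are honest, the vote-timer is $\Delta$, the vote quorum is $f+1$ with $n = 2f+1$, each directed link has one fixed random delay of at most $\delta$ reused for proposals and votes, and the function name `simulate_good_case` is hypothetical.

```python
import random


def simulate_good_case(n: int, Delta: float, delta: float, seed: int = 0) -> list[float]:
    """Return the Step 4 commit time of each replica when replica 0 is an honest sender."""
    rng = random.Random(seed)
    quorum = (n - 1) // 2 + 1  # f + 1 votes, assuming n = 2f + 1

    # link[j][i]: assumed delay of a message from j to i, bounded by delta.
    link = [[0.0 if i == j else rng.uniform(0.0, delta) for i in range(n)]
            for j in range(n)]

    # Steps 1-2: a replica first sees the proposal either directly from the
    # sender or via another replica's forward, whichever arrives earlier.
    first = [link[0][i] for i in range(n)]
    for _ in range(n):  # relax until forwarding can no longer help
        for j in range(n):
            for i in range(n):
                first[i] = min(first[i], first[j] + link[j][i])

    # Step 3: each replica votes when its vote-timer (Delta) expires.
    vote_sent = [t + Delta for t in first]

    # Step 4: a replica commits once it holds f + 1 distinct votes.
    commit = []
    for i in range(n):
        arrivals = sorted(vote_sent[j] + link[j][i] for j in range(n))
        commit.append(arrivals[quorum - 1])
    return commit


times = simulate_good_case(n=5, Delta=1.0, delta=0.1)
print([round(t, 3) for t in times])
```

In every run of this model the commit times stay within $\Delta + 2\delta$, well inside the $3\Delta$ deadline of Step 4, so the Byzantine agreement of Step 5 never affects the good case.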

Before presenting the correctness proof, we make some observations about the protocol.

Why does it suffice to wait for $\Delta$ time before sending a vote? Consider any two honest replicas $i$ and $j$ who receive equivocating values $v$ at time $t_i$ and $v'$ at time $t_j$ respectively. Without loss of generality, suppose $t_i \le t_j$. Observe that $i$’s forwarded proposal will arrive at $j$ by $t_i + \Delta \le t_j + \Delta$, which is within the waiting period of $j$. Thus, $j$ will not vote and will not commit. This argument holds for any pair of honest replicas, and hence, in case of equivocation, only the value first received by any honest replica may potentially be committed by a set of honest replicas at Step 4. These committed replicas will forward the $f+1$ distinct votes to all replicas no later than time $3\Delta$. All other honest replicas will receive these votes by time $4\Delta$ and will lock on the committed value. Thus, in the Byzantine agreement protocol, all honest replicas will start with the same value, and the output of BA will be the committed value, ensuring agreement among all honest replicas.
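The asymmetry in this argument can be checked with a few lines of arithmetic. The sketch below assumes a vote-timer of $\Delta$ and a worst-case delivery time of $\Delta$ for the conflicting forwarded proposal; the helper name `may_vote` is hypothetical.

```python
def may_vote(own_time: float, other_time: float, Delta: float) -> bool:
    """Can a replica still vote despite a conflicting proposal seen elsewhere?

    own_time:   when this replica forwarded its first proposal (timer start).
    other_time: when another honest replica forwarded a conflicting proposal.
    The conflicting forward arrives by other_time + Delta in the worst case;
    the replica votes only if that can be after its timer expires.
    """
    worst_case_arrival = other_time + Delta
    timer_expiry = own_time + Delta
    return worst_case_arrival > timer_expiry


DELTA = 1.0
t_lead, t_lag = 0.0, 0.3  # hypothetical first-receipt times

print(may_vote(t_lead, t_lag, DELTA))  # leading replica may miss the conflict
print(may_vote(t_lag, t_lead, DELTA))  # lagging replica always detects it
```

Only the replica that saw its value strictly first can possibly vote, which is why at most one value can gather honest votes at Step 4.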

Remark on the Byzantine agreement primitive. We can plug in any Byzantine agreement protocol that tolerates faults and satisfies the standard Byzantine agreement definition (Definition 3). Note that the latency of the Byzantine agreement protocol does not affect the good-case latency of the -BB protocol, since if the leader is honest, then all honest replicas can commit the value at Step (4) of the protocol before invoking the Byzantine agreement. For the same reason, it is fine to plug in a lock-step BA protocol, since the poor latency of lock-step BA does not affect the good-case latency. Protocol -BB proceeds in a non-lock-step fashion under an honest leader. With foresight, we also note that when we apply this protocol to implement SMR, the BA will be replaced by (roughly speaking) subsequent iterations of Step (1)–(4) where each iteration uses a new leader.

When a clock skew exists. Now we show how to extend the results to the case where there exists a clock skew, so that replicas may start the protocol at times at most apart from each other. Since any replica may start the protocol at most earlier than the sender, it will receive vote messages within if the sender is honest, and therefore the forwarded vote messages reach all honest replicas within at their local time. Therefore, the parameter in the conditions at Step 4 is replaced with , and the parameter at Step 5 is replaced with . Due to clock skew, replicas may invoke the Byzantine agreement at Step 5 at times at most away from each other. Therefore, the BA primitive also needs to tolerate up to clock skew. For instance, any lock-step BA can do so by setting each round duration to be and using a clock synchronization algorithm [dolev1995dynamic, abraham2019synchronous] to enforce the lock-step synchrony. Since the value of is unknown, we can assume as the worst-case clock skew.

3.3 Correctness of the Protocol -BB

Lemma 1.

If an honest replica commits value $v$ at Step 4, then (i) no honest replica commits any other value at Step 4, and (ii) all honest replicas set their locked value to $v$ before invoking the Byzantine agreement at Step 5.

Proof.

Suppose an honest replica $i$ commits $v$ at Step 4. Then $i$ forwards the proposal at some time $t_i$, and sends the vote message at time $t_i + \Delta$.

First, we show that for any value $v' \ne v$, there do not exist $f+1$ distinct signed vote messages for $v'$. For the sake of contradiction, suppose there exist $f+1$ vote messages for $v'$. Then at least one vote for $v'$ comes from an honest replica $j$. Let $t_j$ denote the time when $j$ forwards the proposal. If $t_j \le t_i$, then the forwarded proposal from $j$ containing the value $v'$ will reach $i$ no later than $t_j + \Delta \le t_i + \Delta$, which will prevent $i$ from sending the vote message, a contradiction. If $t_j > t_i$, then similarly the forwarded proposal from $i$ containing the value $v$ will reach $j$ no later than $t_i + \Delta < t_j + \Delta$, which will prevent $j$ from sending the vote message, a contradiction. The same argument applies to any honest replica that may have voted for $v'$. Hence, there do not exist $f+1$ distinct signed vote messages for $v'$. Since there are no $f+1$ vote messages for any value $v' \ne v$, no honest replica commits any other value at Step 4.

For part (ii), since the honest replica $i$ commits $v$ at some time $t \le 3\Delta$ and forwards the $f+1$ votes for $v$ to all replicas, all honest replicas receive $f+1$ votes for $v$ no later than time $4\Delta$. Since there are no $f+1$ votes for any other value $v' \ne v$, all honest replicas set their locked value to $v$ before invoking the Byzantine agreement in Step 5. Due to the assumption that all replicas start the protocol at the same time, the Byzantine agreement is also invoked simultaneously by all honest replicas. ∎

Theorem 1.

Protocol -BB satisfies agreement, termination, validity, and has good-case latency $\Delta + 2\delta$.

Proof.

Agreement. If all honest replicas commit at Step 5, then due to the agreement condition of the Byzantine agreement, all honest replicas commit on the same value. If some honest replicas commit at Step 4, let $i$ denote the first honest replica that commits, and let $v$ denote the committed value. By Lemma 1, no other value is committed by any honest replica at Step 4, and all honest replicas set their locked value to $v$ before invoking the Byzantine agreement at Step 5. Therefore, at Step 5, the inputs of all honest replicas are the same. Then by the validity condition of the Byzantine agreement protocol, the output of the agreement is also $v$. Any honest replica that does not commit at Step 4 will commit on value $v$ at Step 5.

Termination. At time $4\Delta$, all honest replicas invoke an instance of Byzantine agreement. Due to the termination condition of Byzantine agreement, all honest replicas eventually commit and terminate.

Validity. If the designated sender is honest, all honest replicas are able to commit the proposal at Step 4. After the sender proposes the value, the proposal is received by all honest replicas after at most $\Delta$ time. Then, all honest replicas wait for $\Delta$ time before sending the vote messages. Finally, the vote messages reach all honest replicas after at most $\Delta$ time, so all honest replicas commit on the proposal within $3\Delta$ time at Step 4.

Good-case latency. When the sender is honest, the proposal takes at most $\delta$ time to arrive at all replicas. Then, all honest replicas wait for $\Delta$ time before sending the vote message. Finally, these vote messages reach all honest replicas after at most $\delta$ time, and all honest replicas commit on the sender’s proposal. Therefore, any proposed value will be committed in $\Delta + 2\delta$ time. ∎
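The latency accounting in the proof can be written as a single worked equation. The subscripted names below are labels introduced here for illustration; the bound $3\Delta$ is the Step 4 commit deadline, which the good-case latency never exceeds because $\delta \le \Delta$.

```latex
\[
  T_{\text{good}}
    \;=\; \underbrace{\delta}_{\text{proposal delivery}}
    \;+\; \underbrace{\Delta}_{\text{vote-timer}}
    \;+\; \underbrace{\delta}_{\text{vote delivery}}
    \;=\; \Delta + 2\delta
    \;\le\; 3\Delta .
\]
```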

3.4 Optimality of the Protocol -BB

Abraham et al. [synchotstuff] propose a lower bound of $\Delta$ on the good-case latency for Byzantine broadcast (Definition 2) as follows.

Theorem 2 (Lower bound on good-case latency [synchotstuff]).

There exists no Byzantine broadcast protocol that simultaneously satisfies the following:

  • agreement, termination and validity as in Definition 2;

  • tolerates Byzantine faults;

  • all honest replicas commit in less than $\Delta$ time if the designated sender is honest. (Footnote 2: The original theorem statement in [synchotstuff] claims that the termination time cannot be less than $\Delta$, but the proof holds for the commit time as well.)

The proof of this lower bound follows from a minor adaptation of the partial synchrony bound of Dwork et al. [dwork1988consensus]. Essentially, if a protocol commits faster than $\Delta$, then it did not take advantage of synchrony, and is thus subject to the same bound as partially synchronous protocols.

By Theorem 1, the good-case latency of our protocol -BB is $\Delta + 2\delta$, which is optimal according to the above lower bound when $\delta \ll \Delta$.

4 Byzantine Broadcast with Optimal good-case latency under Mobile Link Failures

In this section, we consider the case where communication channels between some pairs of honest replicas are controlled by the adversary so that the messages in these channels may be delayed or lost. Previous literature models the above network aberrations as mobile link failures [schmid2009impossibility, biely2011synchronous] that allow a constantly changing subset of faulty links at any honest replica. Our protocol -BB designed for synchrony fails to guarantee safety under even static link failures, since each honest replica expects to receive forwarded proposals from other honest replicas to check sender equivocation.

In this section, we make the following contributions. We first generalize the mobile link failure model to non-lock-step synchrony, where synchronous protocols do not execute in a lock-step fashion (Section 4.1). Then we propose a generic transformation that turns a BFT protocol under synchrony into a BFT protocol that also tolerates mobile link failures (Section 4.2). Applying this transformation to the protocol -BB under synchrony, we obtain a Byzantine broadcast protocol -BB-MLF with a good-case latency of $2\Delta + 4\delta$ under the mobile link failure model (Section 4.3). Next, we prove that $2\Delta$ is a lower bound for the good-case latency of any Byzantine broadcast protocol under the mobile link failure model (Section 4.4), showing the optimality of our protocol -BB-MLF when $\delta \ll \Delta$.

4.1 The Mobile Link Failure Model without Lock-step Execution

We model the communication channel between any two replicas as two directed links, and use to denote the directed link from replica to replica . Link is called a send link of replica , and a receive link for replica . Each replica has send links and receive links, including a link that connects itself.

Mobile link failures for lock-step synchrony [schmid2002formally, bielyoptimal, weiss2001consensus, schmid2009impossibility, biely2011synchronous]. Under lock-step synchrony, for each lock step (or round), a link is faulty if replica does not receive the message sent by replica in this round. The link failure under lock-step synchrony is mobile in the sense that the set of faulty links can change at each round. Since our protocol does not require lock-step execution, we generalize the mobile link failure model for non-lock-step synchrony as follows.

Definition of mobile link failures for non-lock-step synchrony. Each link is either faulty or non-faulty at any time point, and two directions of a link between any two replicas may fail independently, i.e., link may be non-faulty while link is faulty. If a link is non-faulty at time , then the message sent by replica at time will be received by replica at time . Otherwise, the message may be lost or delayed. Link failures are mobile in the sense that an adversary can control the set of faulty links, subject to the following constraints.

  1. The adversary can corrupt up to send links and receive links at any replica, as long as .

  2. If a link turns from faulty to non-faulty at time , it continues to count towards the threshold of faulty links until .

We explain the necessity of the above two constraints. Constraint (1) is necessary for solving BB or BA under even static link failures [schmid2009impossibility], and hence is also necessary under our definition of mobile link failures. Constraint (2) states that a recovered link remains on the adversary’s corruption budget for an additional time. Some constraint in this vein is necessary; otherwise, the adversary can drop all messages with a budget of a single link failure. Since we assume no lock-step execution, it is normal that no two honest nodes ever send messages at precisely the same time. Then, whenever a message is sent via a link at time , the adversary can corrupt the link at time and immediately uncorrupt it. Also note that constraint (2) is consistent with the mobile link failure model under lock-step synchrony. In the lock-step model, the set of faulty links remains unchanged during a round, which has duration at least . Therefore, the mobile link failure model for lock-step synchrony [schmid2009impossibility, biely2011synchronous] is a special case of our generalized definition above.

4.2 A Generic Transformation to Tolerate Mobile Link Failures

In this section, we show that given a Byzantine fault-tolerant protocol under the synchronous model, there is a generic approach to transform it into a protocol that handles mobile link failures as defined in the previous section. In fact, it has been observed that under lock-step synchrony, a two-round simulation under the mobile link failure model can implement a one-round multicast over non-faulty links under the synchronous model (Corollary 1 of [schmid2009impossibility]). Here, we present a similar transformation for any non-lock-step synchronous protocol. The transformation applies to our Byzantine broadcast protocol -BB (Section 4.3), and to our BFT SMR protocol -SMR later in Appendix A.

When any replica receives a message described in the protocol, it forwards the message only once to all other replicas except the sender. The Byzantine agreement primitive in Step 5 is applied with the generic transformation from Section 4.2 to tolerate mobile link failures.

Initially, every replica starts the protocol at the same time $t = 0$, and sets its locked value to the default value $\bot$.

  1. Propose. The designated sender with input value $v$ sends a signed proposal for $v$ to all other replicas.

  2. Ack. Upon receiving the first signed proposal from the sender or through some other replica, forward the proposal to all other replicas, and set vote-timer to $2\Delta$ and start counting down.

  3. Vote. When vote-timer reaches $0$, if the replica has not received an equivocating value signed by the sender, it broadcasts a signed vote for $v$.

  4. Commit. If the replica collects $f+1$ distinct signed votes for $v$ at time $t$, (i) if $t \le 6\Delta$, it broadcasts these votes, sets its locked value to $v$ and commits $v$, (ii) if $t > 6\Delta$, it only sets its locked value to $v$.

  5. Byzantine Agreement. At time $8\Delta$, invoke an instance of Byzantine agreement with the locked value as the input. If not committed, commit on the output of the Byzantine agreement.

Figure 3: -BB-MLF Protocol under the Mobile Link Failure Model

A transformation to tolerate mobile link failures under non-lock-step synchrony. (i) when any replica receives a message described in the protocol, it forwards the message only once to all other replicas except the sender of the message, (ii) every timing parameter in the protocol is doubled, i.e., if the original protocol waits for time at some step, then the transformed protocol waits for at that step.
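The two rules of the transformation can be sketched as a thin wrapper around a replica. This is a minimal illustration and not the authors' implementation: the class `RelayingReplica`, its `outbox` list, and the helper `transformed_wait` are hypothetical names, messages are assumed to be hashable values, and signatures and real networking are left out.

```python
from dataclasses import dataclass, field


@dataclass
class RelayingReplica:
    """Hypothetical wrapper that applies rules (i) and (ii) to one replica."""
    replica_id: int
    peers: list                                  # ids of all replicas, including this one
    seen: set = field(default_factory=set)       # messages already forwarded
    outbox: list = field(default_factory=list)   # (destination, message) pairs

    def on_receive(self, sender_id, message):
        # Rule (i): forward each distinct message only once, to every
        # replica except ourselves and the replica it came from.
        if message in self.seen:
            return False
        self.seen.add(message)
        for peer in self.peers:
            if peer not in (self.replica_id, sender_id):
                self.outbox.append((peer, message))
        return True


def transformed_wait(original_wait):
    # Rule (ii): every timing parameter of the original protocol is doubled.
    return 2 * original_wait
```

A replica that receives the same message a second time, for example over a relayed path, does not forward it again, which keeps the relaying bounded to one extra multicast per message per replica.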

The following lemma shows that after the generic transformation above, any message sent by an honest replica will be received by all honest replicas after , implying a -delay simulation of the synchronous model. Therefore, a protocol for the mobile link failure model can be obtained from a protocol designed for the synchronous model.

Lemma 2.

If an honest replica sends a message at time , then all honest replicas receive the message by time .

Proof.

Suppose an honest replica sends a message at time and another honest replica does not receive the message by time . Let be the set of honest replicas that are connected to non-faulty send links of at time . Since the number of faulty send links at is at most , . According to the model of mobile link failures in Section 4.1, replicas in receive the message by time . Each replica in tries to relay the message to some time between and . In order for not to receive the message by , all these attempts have to fail. But in this case, by constraint (2), all these receive links of will be counted as faulty at time . This violates constraint (1), which states that less than receive links can be counted as faulty at any replica at any time. ∎

4.3 -BB-MLF Protocol with Mobile Link Failures

As mentioned in Section 4.2, after applying the generic transformation to the protocol -BB, we obtain a Byzantine broadcast protocol -BB-MLF under the mobile link failure model, as presented in Figure 3. We can also claim that the good-case latency of the protocol -BB-MLF is optimal by a new lower bound of in Section 4.4.

Correctness. By Lemma 2, after the transformation, a synchronous model with message delay is simulated. The rest of the claims follow Lemma 1 and Theorem 1 of the -BB protocol in Section 3, by changing the time for message delay from to and timing parameter of the protocol from to . The transformed protocol has good-case latency doubled, i.e., from to .

4.4 Optimality of the Protocol -BB-MLF

In this section, we show that the good-case latency of our protocol is optimal, by showing a lower bound of on the good-case latency for any Byzantine broadcast protocol that tolerates mobile link failures. In fact, we can prove that even with static link failures, any Byzantine broadcast protocol cannot commit in less than time.

Theorem 3.

There exists no Byzantine broadcast protocol that simultaneously satisfies the following:

  • agreement, termination and validity as in Definition 2;

  • tolerates Byzantine faults;

  • tolerates at most static send link failures and static receive link failures at each replica where ;

  • all honest replicas commit in less than time if the designated sender is honest.

Proof.

Suppose for the sake of contradiction that such a protocol exists. Let there be replicas with Byzantine replicas. Divide the replicas into three groups , where and . All replicas are connected to each other, except that replicas in are disconnected from replicas in due to link failures. The construction satisfies the link failure requirement since . Consider the following three scenarios.

In scenario 1, the sender is honest, and sends to all replicas. All replicas in are Byzantine. There is one Byzantine replica in , and the other replicas in are honest. The Byzantine replicas in remain silent. The Byzantine replica follows the protocol except the following. (1) The replica pretends that no message is received from the leader. (2) The replica pretends that no message is received from the remaining honest replicas in , and sends no message to the honest replicas in . (3) For any messages received from the replicas in , replica pretends that the messages are received after a network delay (suppose Byzantine replicas know the actual network delay ). Replica also intentionally delays any messages it sends, to pretend that the network delay is . According to the validity condition, the honest replicas in and will commit .

Scenario 2 is the mirror case of scenario 1, specified as follows. The sender is honest, and sends to all replicas. All replicas in are Byzantine and one replica in is Byzantine. The Byzantine replicas in remain silent. The Byzantine replica follows the protocol except the following. (1) The replica pretends that no message is received from the leader. (2) The replica pretends that no message is received from the remaining honest replicas in , and sends no message to the honest replicas in . (3) For any messages received from the replicas in , replica pretends that the messages are received after a network delay (suppose Byzantine replicas know the actual network delay ). Replica also intentionally delays any messages it sends, to pretend that the network delay is . According to the validity condition, the honest replicas in and will commit .

In scenario 3, replicas in are Byzantine, and the remaining one replica in is honest. The sender is Byzantine, and sends to , to , both to Byzantine replicas in , and no value . The Byzantine replicas in behave to replicas in exactly the same as the honest replicas in in scenario 1, behave to replicas in exactly the same as the honest replicas in in scenario 2, and send no message to the honest replica in . Suppose that the network delay between honest replicas is in scenario 3; then we can claim that replicas in cannot distinguish scenarios 1 and 3.

  • First, to replicas in , the Byzantine replica in behaves in scenario 1 exactly the same as the honest replica in behaves in scenario 3. In scenario 3, the network delay between honest replicas in and is , only messages sent by at time will be received by replicas in before they commit. Since any message from reaches the honest replica in at time in scenario 3, those messages from do not influence the messages that sends to before the replicas in commit. In scenario 1, the replicas in are silent, which also do not influence the messages that sends to before the replicas in commit. Also, the Byzantine replica in in scenario 1 pretends that no message is received from the sender or from the honest replicas in , and it pretends that the communication with replicas in has the maximum network delay . Therefore, to replicas in , the Byzantine replica in in scenario 1 behaves exactly the same as the honest replica in in scenario 3.

  • Also by construction, to replicas in , the Byzantine replicas in in scenario 3 behave exactly the same as the honest replicas in in scenario 1.

Therefore, replicas in cannot distinguish scenarios 1 and 3, and they will commit in scenario 3. Similarly, replicas in cannot distinguish scenarios 2 and 3. Therefore the replicas in will commit in scenario 3. However, this violates agreement since honest replicas commit different values. Therefore, no such protocol exists. ∎

5 Byzantine Broadcast with Optimal Good-case Latency under Mobile Sluggish Faults

In this section, we consider the mobile sluggish fault model [chan2018pili, guo2019synchronous, synchotstuff] that models the temporary violation of the message delay bound at some honest replicas at a given time. Under this model, we show a Byzantine broadcast protocol with optimal good-case latency named -BB-MSF, based on protocol -BB under synchrony. The synchronous model assumes that each message sent by honest replicas can reach any honest replica within time. Such a requirement is crucial for the correctness of our protocol -BB in Section 3, where each honest replica expects to receive forwarded proposals from other honest replicas to check sender equivocation. In practice, unforeseen violations of this bound may happen to any honest replica during the execution of the protocol, especially for BFT SMR protocols that are designed to keep committing values. The techniques to handle mobile sluggish faults for -BB also apply to our BFT SMR protocols (Section 6 and Appendix A, B).

5.1 The Mobile Sluggish Fault Model

In the mobile sluggish fault model [chan2018pili, guo2019synchronous, synchotstuff], an honest replica is either sluggish or prompt at a given time. Moreover, the set of sluggish replicas can change over time. The messages sent or received by sluggish replicas may be delayed in the network, while the prompt replicas follow the message delay bound. Formally, if an honest replica is prompt at time , then any message sent by at time will be received by any honest replica that is prompt at time . Intuitively, in the mobile sluggish model, any message sent by a sluggish replica would satisfy the actual message delay bound when the replica becomes prompt.

Following the literature [synchotstuff], we use to denote the total number of faults, to denote the number of honest but sluggish replicas, and to denote the number of Byzantine replicas. The set of sluggish replicas is mobile in the sense that it can change at any instant of time. We assume the number of honest and prompt replicas is always at any time, which is necessary to solve Byzantine agreement or broadcast under the mobile sluggish fault model [guo2019synchronous, synchotstuff]. Therefore, the total number of replicas is , and the number of honest and prompt replicas at any time is at least . Without loss of generality, we assume that .

5.2 -BB-MSF Protocol with Mobile Sluggish Faults

In Figure 4, we propose a Byzantine broadcast protocol named -BB-MSF that can tolerate mobile sluggish faults with optimal good-case latency under an honest sender.

The leader-based approach. The -BB-MSF protocol follows a leader-based approach (or the Phase-King paradigm). While this may appear to be a major difference from -BB, we remark that it is in fact not one. The protocol could be rewritten to run Steps (1)-(4) once and then invoke a BA primitive that tolerates mobile sluggish faults. The sluggish-tolerant BA should have the following validity condition: if at least honest replicas input , then the agreement output must be . However, since no sluggish-tolerant BA currently exists, we have to implement our own, and the leader-based protocol starting from the second view is essentially our implementation of sluggish-tolerant BA. The leader of the first view is the designated sender. Leaders of subsequent views are selected in a round-robin fashion. Each view has a fixed duration of . For simplicity, we assume that all replicas enter the same view at the same time. The technique to handle clock skew in -BB applies here as well.

Certified value and certificate ranking. Because the -BB-MSF protocol adopts the leader-based approach, we also adopt the notion of certified values standard in the leader-based approach. A value is certified by distinct signed vote1 messages from the same view , which forms a certificate . Certificates are ranked by the view number, i.e., ranks higher than if . A highest certified value is the value that has a certificate of the highest rank. The protocol requires the leader of each view to propose the highest certified value it learned at the end of the previous view, except for the first view where the designated sender is the leader.
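The ranking rule can be illustrated with a short sketch. The `Certificate` record and the function `highest_certified` are hypothetical names introduced here; the sketch stores only the view and the value, omits the signed vote1 messages a real certificate would carry, and returns `None` when no value is certified.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Certificate:
    """Hypothetical stand-in for a certificate: a value certified in a view."""
    view: int
    value: str


def highest_certified(certs: Iterable[Certificate]) -> Optional[Certificate]:
    # Certificates are ranked by view number only: a higher view ranks higher.
    certs = list(certs)
    if not certs:
        return None  # no value is certified yet
    return max(certs, key=lambda c: c.view)


known = [Certificate(1, "a"), Certificate(3, "b"), Certificate(2, "c")]
best = highest_certified(known)  # the certificate from view 3
```

A new leader would apply the same selection to the certificates carried by the status messages it collects.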

The following protocol is for replica in view with leader . Initially, , is the designated sender of the Byzantine broadcast. Every replica starts the protocol at the same time , and no value is certified.

  1. Propose. The leader sends to all other replicas. If view , then is the input value of and . Otherwise, is the highest certified value from the status messages and is the corresponding certificate, or if no value is certified.

  2. Ack. Upon receiving the first signed proposal , if is the highest certified value known to the replica, forward the proposal to all other replicas and broadcast an ack in the form of .

  3. Timer. Upon receiving distinct signed ack messages , set vote-timer to and start counting down.

  4. Vote1. When vote-timer reaches , if the replica does not receive an equivocating value proposed by , it sends a vote1 to all other replicas in the form of .

  5. Vote2. Upon receiving distinct signed vote1 messages , form a certificate for the value . If the replica does not receive an equivocating value proposed by , broadcasts and a vote2 in the form of .

  6. Commit. If the replica collects distinct signed vote2 messages at time , it broadcasts these vote2 messages and commits .

  7. New-view. At time , pick the highest certified value with certificate , or if no value is certified. (Ignore any value certified in view that is received after this point.) Move to the next view with the new leader , and reset local clock . Send the leader a status message . The leader waits for time to collect the status messages.

Figure 4: -BB-MSF under the Mobile Sluggish Fault Model

Protocol description. Now we explain the key difference between the -BB-MSF protocol and the -BB protocol: the extra steps of ack and vote to tolerate mobile sluggish faults. Once the leader proposes (Step 1), replicas forward and ack the leader’s proposal if the value is the highest certified value known to the replica (Step 2). To start the vote-timer at step , the replica needs to receive distinct signed ack messages (Step 3). When the timer expires, the replica first broadcasts a vote1 message if no equivocating values are received during the time window (Step 4). Upon receiving a vote1 certificate of vote1 messages, the replica broadcasts another vote2 message and the vote1 certificate, if no equivocating values are received (Step 5). A value is committed once vote2 messages for the value are collected within time (Step 6). The parameter is sufficient for all honest replicas to commit on an honest leader’s proposal (see the termination proof of Theorem 5). At time , all replicas move to the next view and set their clock to (Step 7). To help the new leader, every replica sends a status message containing the highest certified value known to the replica and the corresponding certificate. The new leader waits for time to receive the status messages, before broadcasting the proposal.

Why require ack/vote1 messages? Here we give some intuitive explanations on why the protocol requires ack/vote1 messages to proceed. Under the synchronous model, when an honest replica sends an ack or vote message at time , it is guaranteed that all other honest replicas can receive the message at time . However, with mobile sluggish faults, the honest replica may be sluggish and the message cannot be received by other replicas. Thus, any protocol that relies on such a single message will fail, such as the protocol -BB that relies on proposal forwarding to detect equivocation. A natural idea is to rely on messages, so that at least one of the messages is from an honest and prompt replica. Therefore, we replace each Ack and Vote step in -BB with two steps in -BB-MSF (Steps 2-5), to ensure that an honest replica proceeds only after ack/vote1 messages are received.
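The counting argument behind this idea is a pigeonhole bound, which the sketch below checks by brute force for small systems. The threshold used here, `n - honest_prompt + 1`, is an assumption made for illustration rather than the exact quantity stated in the protocol, and the function names are hypothetical.

```python
from itertools import combinations


def quorum_size(n, honest_prompt):
    """Smallest set size that must overlap any group of `honest_prompt` replicas.

    Assumption for this sketch: n replicas in total, of which at least
    `honest_prompt` are honest and prompt at the relevant time.
    """
    return n - honest_prompt + 1


def always_hits_prompt(n, honest_prompt):
    """Brute-force check that every quorum intersects every possible prompt set."""
    q = quorum_size(n, honest_prompt)
    replicas = range(n)
    return all(
        set(quorum) & set(prompt)          # empty intersection is falsy
        for prompt in combinations(replicas, honest_prompt)
        for quorum in combinations(replicas, q)
    )
```

For example, with 5 replicas of which 3 are honest and prompt, any 3 messages from distinct replicas include at least one from an honest and prompt replica, whereas a single message gives no such guarantee.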

5.3 Correctness of the Protocol -BB-MSF

Lemma 3.

Two different values will not be certified in the same view.

Proof.

Suppose for the sake of contradiction that two values are certified in view . Then there is a set of replicas that sends vote1 for , and a set of replicas that sends vote1 for . Let be the earliest point in time that some honest replica in sends vote1 for . The definition of is valid since at least one replica in is honest. Also let be the earliest point in time that some honest replica in sends vote1 for .

Without loss of generality, assume . According to the protocol, at time , the honest replica receives ack messages for and thus at least replicas have sent ack for at . Since there are at least honest and prompt replicas at any time, at least one of the above replicas that sends ack for is honest and prompt at time . Similarly, at least one replica in is honest and prompt at time , and this replica should receive the ack message for which prevents it from sending vote1 for since equivocating values are received. This contradicts the fact that is certified by vote1 messages from replicas in , hence two different values will not be certified in the same view. ∎

Lemma 4.

If an honest replica commits a value in a view , then no other value will be certified in any view .

Proof.

Suppose that an honest replica commits at time in view . For view , by Lemma 3, no other value is certified in view . Now we prove that for any view , no other value can be certified.

We first prove that a set of at least honest replicas receive the vote1 certificate before entering the new view . Since the replica commits , receives valid vote2 messages from a set of replicas within by step . Since the number of honest and prompt replicas is at any time instant, there exists at least one honest and prompt replica at time . Since replica is honest and prompt at time and sends vote2 no later than time , by the definition of honest and prompt replicas, replica ’s vote2 message and vote1 certificate will reach the set of honest and prompt replicas at time , before these replicas enter the new view.

These honest replicas have as the highest certified value at the beginning of view . Therefore, if the new leader proposes any value in view , these honest replicas will not ack or vote, and will not get enough vote1 messages to be certified. Thus, these honest replicas will still have as the highest certified value at the beginning of view . By a simple induction, in any future views , only value will be certified. ∎

Recall that for Byzantine broadcast (Definition 2), termination requires all honest replicas to eventually commit a value. If even one honest replica stays sluggish and never commits, termination by definition cannot be guaranteed. Thus, we guarantee termination once all honest replicas stay prompt for a sufficiently long time. For the same reason, the good-case latency is also defined as the time for all honest replicas to commit if the sender is honest and all honest replicas stay prompt. Similarly, we show that the validity condition can hold if all honest replicas stay prompt. Actually, the validity condition can hold with only the sender being honest and prompt at time , after changing the duration of each view from to where is the number of sluggish replicas. However, such a design deviates from our purpose for BFT SMR protocols because a practical BFT protocol will keep each view short and simply replace a sluggish sender/leader; hence, we will not go into the details of this approach.

Theorem 4.

Protocol -BB-MSF satisfies agreement, termination once all honest replicas stay prompt for a sufficiently long time, validity if all honest replicas stay prompt, and has a good-case latency of if all honest replicas stay prompt.

Proof.

Agreement. Without loss of generality, suppose an honest replica commits a value first in some view . By Lemma 4, no other value can be certified at any honest replica in any view . Therefore, no honest replica will send vote2 messages for any other value, and there are not enough vote2 messages for any value other than to be committed. Therefore only value can be committed.

Termination. We show that once all honest replicas stay prompt for a sufficiently long time, all honest replicas eventually commit. Due to the round-robin leader election, eventually an honest leader is in charge.

Once the leader is honest, all honest replicas commit the leader’s proposal in the current view. The waiting period at Step 7 is sufficient for the new leader to collect all status messages from the honest replicas, and thus an honest leader is able to propose the highest certified value among all honest replicas, for which all honest replicas will ack and vote. Now we show that any honest replica is able to commit within time, since (1) the leader waits for to collect status messages, (2) the proposal takes at most to reach all honest replicas, (3) the ack messages take at most to reach all honest replicas, (4) any honest replica waits for before sending vote1, (5) the vote1 messages take at most to reach all honest replicas, and (6) the vote2 messages take at most to reach all honest replicas. The time period above in total is . Since an honest leader also does not equivocate, all honest replicas will commit the value within .

Validity. If the designated sender is honest, it sends its input value to all replicas. If all honest replicas are prompt, then by the above proof of termination, all honest parties will commit the value .

Good-case Latency. Recall that the sender is the first leader. When the sender is honest and all honest replicas are prompt, its proposal will take time to reach all replicas. Then, the ack messages take to reach all honest replicas. After receiving the ack messages, all honest replicas will wait for time before sending the vote1 message. The vote1 messages take to reach all honest replicas, and trigger them to send vote2 messages. Finally, the above vote2 messages reach all honest replicas after time, leading to commit at all honest replicas. Therefore, any proposed value will be committed in time. ∎
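The accounting in the good-case latency argument can be written out as a sum over the five steps it lists. The sketch below is only illustrative: `delta` (the time one message hop takes between honest, prompt replicas) and `timer_wait` (the vote-timer duration) are hypothetical parameters, and treating every hop as taking exactly `delta` is an assumption of this sketch rather than a statement of the protocol's concrete constants.

```python
def good_case_latency(delta, timer_wait):
    """Sum the per-step durations of the good case under an honest sender.

    Assumptions for illustration: each message hop takes `delta`, and the
    vote-timer runs for `timer_wait` before vote1 is sent.
    """
    steps = [
        ("proposal reaches all replicas", delta),
        ("ack messages reach all honest replicas", delta),
        ("vote-timer wait before sending vote1", timer_wait),
        ("vote1 messages reach all honest replicas", delta),
        ("vote2 messages reach all honest replicas", delta),
    ]
    total = sum(duration for _, duration in steps)
    return total, steps


total, steps = good_case_latency(delta=1, timer_wait=10)
```

Under these assumptions the total is four message hops plus one timer wait, so the timer dominates whenever the actual delay is much smaller than the timer duration.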

5.4 Optimality of the Protocol -BB-MSF

Since the mobile sluggish fault model is strictly weaker than the synchronous model, the lower bound on the good-case latency under synchrony in Section 3.4 also applies to any Byzantine broadcast protocol under the mobile sluggish fault model. Therefore, by Theorem 4, the good-case latency of our protocol -BB-MSF is optimal, assuming .

6 Byzantine Fault-tolerant State Machine Replication under Synchrony and Weaker Models

Based on the techniques from Sections 3, 4 and 5 for Byzantine broadcast, we can develop similar results for BFT SMR protocols.

  • Under synchrony, we propose a leader-based BFT SMR protocol named -SMR that has commit latency under an honest leader, optimal resilience and does not require lock-step execution. The protocol proceeds in views and consists of a steady state and a view-change. In the steady state, the leader of the current view is responsible for making progress, and the protocol adopts the techniques from protocol -BB in Section 3.2 to wait for time before voting. If the leader behaves maliciously or does not make progress, the replicas will blame the leader and start the view-change protocol to replace the leader.

  • Under the mobile link failure model introduced in Section 4.1, we can obtain a BFT SMR protocol with commit latency under an honest leader, directly by applying the generic transformation from Section 4.2 to protocol -SMR. Under the mobile sluggish fault model introduced in Section 5.1, we can increase the robustness of protocol -SMR to tolerate mobile sluggish faults with the techniques from Section 5.2, at a cost of increasing the commit latency to .

Due to space constraints, we present the results for BFT SMR under synchrony in Appendix A and the extension to the weaker model in Appendix B.

7 Related Work

Byzantine fault-tolerant protocols. Byzantine fault-tolerant protocols, first proposed by Lamport [lamport1982byzantine], have received a significant amount of attention for several decades. For Byzantine broadcast, the Dolev-Strong protocol [dolev1983authenticated] is a deterministic -round protocol with communication complexity. A sequence of efforts has been made on randomized protocols for reducing the round complexity and message complexity [ben1983another, rabin1983randomized, katz2006expected], and the most efficient solutions for both Byzantine agreement and broadcast are proposed by Abraham et al. [abraham2019synchronous] with expected constant rounds and expected quadratic communication complexity. The state-of-the-art BFT SMR protocol Sync HotStuff [synchotstuff] improves the latency to under both the synchronous model and the mobile sluggish fault model. As a comparison, our protocol improves the latency to , which is optimal for both cases.

Weaker network models. The mobile link failure model. For models that have no restriction on the number of failed links, Santoro and Widmayer [santoro1989time] show that consensus is unsolvable even with a single process suffering from such link failures. Thus, the authors in [schmid2002formally] introduce a mobile link failure model with constraints on the number of send and receive link failures. There has been a sequence of efforts on adapting existing consensus algorithms for the mobile link failure model [bielyoptimal, weiss2001consensus], and results on the lower bounds on the required number of processes and rounds [schmid2009impossibility]. The model and results above assume protocols of lock-step execution, and our results in Section 4 do not make such an assumption. The mobile sluggish fault model. Recently, Guo et al. consider a new model that allows the bound on the message delay to be violated for a set of honest replicas under synchrony, to better capture the reality when some honest replicas are partitioned or offline due to network misbehavior. This type of fault was later called sluggish, and is considered in both PiLi [chan2018pili] and Sync HotStuff [synchotstuff] to introduce more fault-tolerance to the BFT SMR protocols. In this paper, we introduce techniques to handle both mobile link failures and mobile sluggish faults for our BB and BFT SMR protocols, in Sections 4, 5 and 6.

8 Conclusion and Future Work

We propose investigating the non-lock-step models and the good-case latency metric for Byzantine broadcast, as they better capture what matters in practical BFT SMR protocols. We propose the first Byzantine broadcast protocol with optimal good-case latency. We extend the protocol to two weaker network models with mobile sluggish faults and mobile link failures, achieving optimal good-case latency in the respective models. Lastly, we apply the Byzantine broadcast results to Byzantine fault-tolerant state machine replication, with similar latency guarantees and tolerance of mobile sluggish faults and mobile link failures.

The transformation for mobile link failures, while optimal in terms of latency, brings a blowup to communication complexity because each message between two replicas is now relayed by a group of other replicas. While it may be difficult to come up with a generic transformation that preserves the communication complexity, it may be possible to directly design protocols in the mobile link failure model to avoid this communication blowup.

Currently, the mobile link failure model describes slow and lossy links while the mobile sluggish model describes slow nodes (replicas), and there is no clear way to unify them. An interesting future direction is to come up with an even weaker model that captures both slow/lossy links and slow/lossy nodes, and to design protocols in the unified model.

References

Appendix A State Machine Replication under Synchrony

In this section, inspired by the techniques from protocol -BB, we construct a Byzantine fault-tolerant state machine replication protocol -SMR that has commit latency , optimal resilience and does not require lock-step execution in the steady state. The protocol -SMR improves the latency of the state-of-the-art BFT SMR protocol Sync HotStuff [synchotstuff] by a factor of , from to .

Notations. Here we first summarize several terminologies that will be used in our protocol.

  • Block format, block extension and equivocation. In state machine replication, clients’ requests are usually batched into blocks, and the protocol outputs a chain of blocks where is the block at height . Each block has the following format where is a batch of new client requests and is the hash digest of the previous block at height . We say that a block extends another block , if is an ancestor of according to the hash chaining where . We define two blocks and equivocate one another if they are not equal and do not extend one another. The block chaining simplifies the protocol in the sense that once a block is committed, its ancestors can also be committed.

  • Certificate, certificate ranking. A quorum certificate is a set of signatures on a block by a quorum of replicas, which consists of replicas out of replicas for synchronous BFT SMR protocols. We use to denote a certificate for in view , consisting of distinct signed vote messages for block . Certified blocks are ranked first by the views that they are created and then by their heights, that is, blocks with higher views have higher ranks, and blocks with higher height have higher ranks if the view numbers are equal. We use to denote a blame certificate in view , consisting of distinct blame messages in view .
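The notions of block extension, equivocation and certificate ranking can be made concrete with a small sketch. Everything below is illustrative rather than the paper's encoding: the `Block` fields, the SHA-256 digest over a `repr` of the fields, the `store` dictionary mapping digests to blocks, and the convention that a block extends itself are all assumptions introduced here.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Block:
    """Hypothetical block: a height, a batch of requests, and the parent's digest."""
    height: int
    requests: tuple
    parent_hash: Optional[str]   # None for the first block in the chain

    def digest(self) -> str:
        payload = repr((self.height, self.requests, self.parent_hash)).encode()
        return hashlib.sha256(payload).hexdigest()


def extends(block, ancestor, store):
    """True if `ancestor` is `block` itself or is reachable via parent hashes."""
    current = block
    while current is not None:
        if current == ancestor:
            return True
        current = store.get(current.parent_hash)
    return False


def equivocate(a, b, store):
    # Two blocks equivocate if they differ and neither extends the other.
    return a != b and not extends(a, b, store) and not extends(b, a, store)


def rank(view, height):
    # Certified blocks are ranked by view first and by height second;
    # Python compares tuples lexicographically, which matches that order.
    return (view, height)
```

With a shared first block, two different height-1 blocks built on it equivocate, while a height-2 block and its height-1 parent do not.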

Since the clients’ requests are batched into blocks, the BFT SMR protocol achieves safety if honest replicas always commit the same block for each height , and liveness if all honest replicas keep committing new blocks. When it is clear in the protocol, a replica broadcasting a message is the same as sending the message to all other replicas. Note such broadcast is different from Byzantine broadcast unless specified explicitly.

A.1 -SMR under Synchrony

In this section, we present protocol -SMR, in Figure 5. The -SMR protocol takes the stable leader approach: it proceeds in views and consists of a steady state and a view-change that replaces a Byzantine leader. In the steady state, the leader of the current view is responsible for making progress, where view is an integer that increments after each view-change. The leader of a view can be simply the replica where is the view number and is the number of replicas. If the leader behaves maliciously or does not make progress, the replicas will blame the leader and start the view-change protocol to replace the leader.

Steady State Protocol for Replica

Let be the current view number and replica be the leader of the current view. The leader proposes a block every time, where is a parameter.

  1. Propose. The leader sends to all other replicas, where is a height- block, containing a batch of new client requests and a hash digest of block . For the first proposal in a new view after a view-change, is the highest certified block received from the status messages. Otherwise, is the last block proposed by .

  2. Ack. Upon receiving the first valid proposal for a height- block, once it extends the highest certified block, forward the proposal to all other replicas, set to and start counting down.

  3. Vote. When reaches , if no equivocating blocks signed by are received, send a vote to all other replicas in the form of .

  4. Commit. If the replica observes distinct vote messages for block , and does not receive equivocating blocks signed by or a blame certificate , it broadcasts the vote certificate containing votes, and commits with all its ancestors.

View-change Protocol for Replica

Let and be the leader of view and respectively.

  1. Blame. If less than proposals are forwarded in time in view , or less than valid blocks are committed in time in view , broadcast . If equivocating blocks signed by are received, broadcast and the two equivocating blocks.

  2. New-view. Upon gathering distinct messages, form a blame certificate , broadcast , and quit view (abort all vote-timer(s) and stop forwarding/voting in view ). Wait for time and enter view . Upon entering view , send the new leader a message where is the certificate for the highest certified block . (Ignore any block certified in view that is received after this point.) If the replica is the new leader , wait for another time upon entering view .

Figure 5: -SMR under the Synchronous Model

In the steady state of the protocol, the leader of the current view can propose blocks chained by hashes once every time, where is a predefined parameter. The steady state of -SMR is similar to Steps 1-4 of protocol -BB, and uses similar techniques such as proposal forwarding and the waiting time before voting. For the first proposal after a view-change, the proposed block must extend the highest certified block known to the leader (from the status messages received). Every other proposed block extends the block from the previous height (Step 1). Once any replica receives the first valid proposal for height- from the leader , it forwards the proposal as an acknowledgement if extends the highest certified block known to the replica (Step 2). A replica only acks proposals that extend its highest known certified block in order to ensure that any certified block always extends the committed block. Due to the non-lock-step nature of our protocol, it is possible that a replica first receives the proposal through the forwarded copy from some other replica. In this case, the replica also accepts the proposal if it is valid. Then, after the replica forwards the proposal, it locally starts a timer called vote-timer to wait for time (Step 2). When the timer expires and the replica has not received any equivocating block signed by , it broadcasts a vote message in the form of (Step 3). Once the replica gathers valid vote messages to form a certificate for the block without receiving a blame certificate (containing blame messages) or equivocating blocks, it forwards the vote certificate and commits the block (Step 4). Forwarding the vote certificate notifies other replicas to update their highest certified block before entering the next view.
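The steady-state steps described above can be sketched as a small event-driven state machine. This is an illustration under stated assumptions, not the authors' implementation: the class `SteadyStateReplica`, the parameters `quorum` and `vote_delay`, and the explicit `now` clock are all hypothetical, and signature checks, the "extends the highest certified block" check, and certificate forwarding are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class SteadyStateReplica:
    quorum: int          # hypothetical: distinct votes needed to certify a block
    vote_delay: float    # hypothetical: waiting time between ack and vote
    accepted: dict = field(default_factory=dict)   # height -> first block seen
    deadline: dict = field(default_factory=dict)   # height -> vote-timer expiry
    votes: dict = field(default_factory=dict)      # (height, block) -> voter ids
    voted: set = field(default_factory=set)        # heights already voted for
    committed: list = field(default_factory=list)
    equivocation: bool = False
    blame_certificate: bool = False

    def on_proposal(self, now, height, block):
        """Step 2 (Ack): forward the first proposal and start the vote-timer."""
        first = self.accepted.setdefault(height, block)
        if first != block:
            self.equivocation = True   # two different blocks at the same height
            return "blame"
        if height in self.deadline:
            return "ignore"            # duplicate of an already acked proposal
        self.deadline[height] = now + self.vote_delay
        return "forward"

    def on_tick(self, now):
        """Step 3 (Vote): vote for every height whose timer expired cleanly."""
        if self.equivocation:
            return []
        ready = [h for h, t in self.deadline.items()
                 if t <= now and h not in self.voted]
        self.voted.update(ready)
        return [(h, self.accepted[h]) for h in sorted(ready)]

    def on_vote(self, height, block, voter):
        """Step 4 (Commit): commit once a quorum of distinct votes is observed."""
        voters = self.votes.setdefault((height, block), set())
        voters.add(voter)
        if (len(voters) >= self.quorum and not self.equivocation
                and not self.blame_certificate
                and block not in self.committed):
            self.committed.append(block)
            return True
        return False


replica = SteadyStateReplica(quorum=2, vote_delay=1.0)
first_action = replica.on_proposal(now=0.0, height=1, block="B1")
early_votes = replica.on_tick(now=0.5)    # timer still running
votes_sent = replica.on_tick(now=1.0)     # timer expired, no equivocation seen
replica.on_vote(1, "B1", voter="r0")
did_commit = replica.on_vote(1, "B1", voter="r1")
```

The timer is local to each replica and starts when that replica forwards the proposal, which reflects the non-lock-step behavior the paragraph describes.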

When the leader is Byzantine, it can deviate from the protocol by either stalling or equivocating. To ensure liveness, the view-change protocol is triggered when a quorum of replicas discovers the malicious behavior of the current leader. If the current leader does not keep proposing valid blocks quickly enough, any replica that does not ack or commit blocks in time will blame the leader by sending a blame message. The time window of / is sufficient for the proposals from an honest to be acked/committed (see the proof of Theorem 6). Note that the above time windows only occur during the view-change, and do not affect the latency of the steady-state protocol when an honest leader is in charge. If the leader equivocates, any replica that receives the equivocating blocks will also blame the leader by broadcasting a blame message and the pair of equivocating blocks. Once the replica collects blame messages, it forms and broadcasts a blame certificate, and quits the current view . Quitting the view means the replica aborts the timer and stops forwarding or voting in view . After quitting view , the replica waits for a period before entering the next view . The waiting period allows the replica to receive vote certificates from other replicas. Then, when entering the new view, the replica sends the certificate of the highest known certified block to the new leader, and returns to the steady state. If a replica is the new leader, it needs to wait for another time period in order to receive the status messages sent by all honest replicas during the view-change.
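The blame and new-view logic described above can be sketched as follows. The names `ViewChangeState`, `blame_quorum`, and `wait_before_entry` are hypothetical stand-ins for the thresholds and waiting period whose exact values are not legible in this copy, and message signatures and status messages are omitted.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ViewChangeState:
    view: int
    blame_quorum: int          # hypothetical: blames needed for a certificate
    wait_before_entry: float   # hypothetical: wait between quitting and entering
    blamers: set = field(default_factory=set)
    quit_at: Optional[float] = None

    def on_blame(self, now, sender, blamed_view):
        """Record a blame; return a blame certificate once the quorum is met."""
        if blamed_view != self.view or self.quit_at is not None:
            return None
        self.blamers.add(sender)
        if len(self.blamers) < self.blame_quorum:
            return None
        self.quit_at = now   # quit the view: stop forwarding and voting
        return ("blame-cert", self.view, frozenset(self.blamers))

    def try_enter_next_view(self, now):
        """Return the next view number once the waiting period has elapsed."""
        if self.quit_at is None or now < self.quit_at + self.wait_before_entry:
            return None
        return self.view + 1


state = ViewChangeState(view=3, blame_quorum=2, wait_before_entry=1.0)
no_cert = state.on_blame(now=0.0, sender="r0", blamed_view=3)
cert = state.on_blame(now=0.5, sender="r1", blamed_view=3)
too_early = state.try_enter_next_view(now=1.0)
next_view = state.try_enter_next_view(now=1.5)
```

Separating "quit the view" from "enter the next view" mirrors the waiting period in the text, during which vote certificates from the old view can still arrive.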

a.2 Correctness of Protocol -SMR

We say a block is committed directly, if the replica commits at Step 4 by receiving but no equivocating blocks or . A block is committed indirectly, if is committed because it is the ancestor of a directly committed block.
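The distinction between direct and indirect commits can be illustrated with a short sketch that walks the hash chain from a directly committed block back to the most recent committed ancestor. The function `commit_with_ancestors` and the `parent_of` map are hypothetical names used only for illustration.

```python
def commit_with_ancestors(block_id, parent_of, committed):
    """Directly commit block_id and indirectly commit its uncommitted ancestors.

    parent_of maps a block id to its parent id (None for the first block).
    Returns the newly committed blocks ordered from lowest to highest height.
    """
    chain = []
    current = block_id
    while current is not None and current not in committed:
        chain.append(current)
        current = parent_of.get(current)
    chain.reverse()
    committed.update(chain)
    return chain


parent_of = {"B1": None, "B2": "B1", "B3": "B2"}
committed = {"B1"}
newly_committed = commit_with_ancestors("B3", parent_of, committed)
```

Here `B3` is committed directly, while `B2` is committed indirectly because it is an ancestor of `B3`.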

Lemma 5.

If an honest replica directly commits a block in view , then a certified block that ranks no lower than must equal or extend .

Proof.

Recall that a certified block with the certificate ranks no lower than if either (i) and , or (ii) . Suppose that an honest replica directly commits in view . Suppose that forwards the proposal at time , sends the vote message at time , and commits at time where .

First we prove that any block with certified in view must equal or extend . Suppose for the sake of contradiction that some equivocating block is certified in view . Then at least one vote for comes from an honest replica . Let denote the time when forwards the proposal. If , then the forwarded proposal from containing the block will reach no later than , which will prevent from sending the vote message, causing a contradiction. If , then similarly the forwarded proposal from containing the block will reach no later than , which will prevent from sending the vote message, also causing a contradiction. Therefore, no equivocating block is certified in view .

Now we show that any block certified in view must equal or extend .

We first show that all honest replicas receive before entering the new view . Since the honest replica directly commits block and broadcasts the certificate at time , all honest replicas receive the certificate no later than . Suppose for the sake of contradiction that some honest replica enters the next view before time . According to the protocol, the honest replica must have received blame messages before time due to the waiting window during view-change. The replica also broadcasts the blame certificate before time according to the protocol. The blame certificate will reach replica before time and prevent from committing, which is a contradiction. Therefore, all honest replicas enter the new view no earlier than , and receive the certificate before entering the new view .

Together with the claim that any block certified in view must equal or extend , the highest certified block at any honest replica equals or extends when entering view . In view , according to the protocol, all honest replicas will only ack and vote for blocks that extend the highest certified block. Thus, only blocks that extend can be certified in view , and the highest certified block at any honest replica still equals or extends . By simple induction, in any future view , the highest certified block at any honest replica must equal or extend ; therefore, only blocks that equal or extend can be certified. ∎
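The ranking and the "equal or extend" relation used in Lemma 5 can be made concrete with a short sketch. The inequalities in cases (i) and (ii) are not legible in this copy, so the code assumes the usual reading: a certificate ranks no lower if it is from a higher view, or from the same view with at least the same height. The names `Certificate`, `ranks_no_lower`, and `equals_or_extends` are hypothetical.

```python
from typing import NamedTuple


class Certificate(NamedTuple):
    view: int
    height: int
    block: str


def ranks_no_lower(a: Certificate, b: Certificate) -> bool:
    """Assumed order: compare by view first, then by height within a view."""
    return (a.view, a.height) >= (b.view, b.height)


def equals_or_extends(block, ancestor, parent_of):
    """True if block is ancestor itself or reaches it by following parents."""
    current = block
    while current is not None:
        if current == ancestor:
            return True
        current = parent_of.get(current)
    return False


parent_of = {"B1": None, "B2": "B1", "B3": "B2", "X2": "B1"}
committed_cert = Certificate(view=1, height=2, block="B2")
later_cert = Certificate(view=2, height=3, block="B3")
```

In these terms, the lemma states that whenever `ranks_no_lower(c, committed_cert)` holds for a certified block, `equals_or_extends(c.block, committed_cert.block, parent_of)` must hold as well; a certified fork such as `X2` at that rank would violate it.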

Theorem 5 (Safety).

Honest replicas always commit the same block for each height .

Proof.

Suppose two blocks and are committed at height at any two honest replicas. Suppose is committed due to being directly committed in view , and is committed due to being directly committed in view . Without loss of generality, suppose , and for , further assume that . Since is directly committed and is certified and ranks no lower than , by Lemma 5, must equal or extend . Thus, . ∎

Theorem 6 (Liveness).

All honest replicas keep committing new blocks.

Proof.

If the leader is honest, a view-change will not occur and all honest replicas keep committing new blocks. By waiting for time before entering the new view, an honest leader is able to receive the status messages from all honest replicas, because any honest replica may be at most later to receive the blame certificate to enter the new view, and the status message takes at most to reach the leader. Therefore, the honest leader is able to propose a block extending the highest certified block among all honest replicas, and all honest replicas will ack and vote for the proposal. Now we show that any honest replica is able to forward proposals in time and commit blocks within time. After entering a new view, a time period of is sufficient to forward the first proposal, and is sufficient for the first block to get committed, since (1) the leader may be at most later to enter the new view, (2) the leader waits for to collect the status messages after entering the new view, (3) the block takes at most to reach all honest replicas, which triggers the proposal forwarding, (4) any honest replica waits for before sending its vote, and (5) the vote messages take at most to reach all honest replicas. After the first block, there should be one block proposal from the leader in every time that gets forwarded or committed in a pipelined fashion. Thus, any honest replica should be able to forward proposals within time and commit blocks within time. Any honest leader has sufficient time and does not equivocate. Thus, any honest leader will not be blamed by any other honest replica, and a view-change will not occur.

On the other hand, any Byzantine leader will be replaced by a view-change if it sends equivocating blocks or does not propose blocks quickly enough. More specifically, if the leader sends equivocating blocks, then all honest replicas will learn of the equivocation and thus send blame messages to trigger a view-change. If the leader does not propose blocks quickly enough, so that all honest replicas send blame messages, then a view-change is triggered. If at least one honest replica keeps committing in time, the vote certificate broadcast by this replica when it commits can lead all honest replicas to keep committing new blocks, unless some honest replica gathers a blame certificate, which will lead all honest replicas to perform the view-change.

Theorem 7 (Good-case Latency).

If the leader is honest, every proposed block will be committed in time after being proposed.

Proof.

When the leader is honest, its proposal will take time to reach all replicas. Then, all honest replicas will wait for time before sending the vote message. Finally, the above vote messages reach all honest replicas after time, leading to commit at all honest replicas. Therefore, any proposed block will be committed in time if the leader is honest. ∎
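The proof adds up three delays: the proposal reaching the replicas, the waiting time before voting, and the votes reaching the replicas. The sketch below does that arithmetic with hypothetical parameter names and illustrative millisecond values; the symbolic bound itself is not legible in this copy and is not reconstructed here.

```python
def good_case_commit_time(proposal_delay, vote_wait, vote_delay):
    """Upper bound on commit time after proposal when the leader is honest.

    proposal_delay: time for the proposal to reach all honest replicas
    vote_wait:      waiting time between forwarding the proposal and voting
    vote_delay:     time for vote messages to reach all honest replicas
    """
    return proposal_delay + vote_wait + vote_delay


# Illustrative values only: 10 ms message delays and a 100 ms waiting time.
latency_ms = good_case_commit_time(proposal_delay=10, vote_wait=100, vote_delay=10)
```

With these example numbers the waiting time dominates, and the two message delays track the actual network delay rather than a conservative bound.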

Appendix B State Machine Replication under Weaker Models

In this section, we extend protocol -SMR to tolerate mobile link failures introduced in Section 4.1, and mobile sluggish faults introduced in Section 5.1.

b.1 -SMR-MLF with Mobile Link Failures

As mentioned in Section 6, we can directly obtain a BFT SMR protocol -SMR-MLF with commit latency that can tolerate mobile link failures by applying the generic transformation in Section 4.2. For brevity, we omit the details of the protocol.

b.2 -SMR-MSF with Mobile Sluggish Faults

In this section, we present a Byzantine fault-tolerant state machine replication protocol -SMR-MSF under the mobile sluggish model, shown in Figure 6. To tolerate mobile sluggish faults, we apply the techniques from Section 5.2. More specifically, each Ack/Vote/Blame step in the -SMR protocol is replaced by two steps in the new protocol, namely Ack and Timer, Vote1 and Vote2, and Blame1 and Blame2. The second step (Timer/Vote2/Blame2) can proceed after the replica receives ack/vote1/blame1 messages.

Since now each Ack/Vote/Blame step becomes two steps, we specify which vote or blame message the certificate consists of. Let denote the certificate for height- block in view , consisting of valid vote1 messages for block . Let denote the blame certificate in view , consisting of valid blame1 messages.

Steady State Protocol for Replica

Let be the current view number and replica be the leader of the current view. The leader proposes a block every time, where is a parameter.

  1. Propose. The leader sends to all other replicas, where is a height- block, containing a batch of new client requests and a hash digest of block . For the first proposal in a new view after a view-change, is the highest certified block known to . Otherwise, is the last block proposed by .

  2. Ack. Upon receiving the first valid proposal for a height- block, once it extends the highest certified block, forward the proposal to all other replicas, broadcast an ack in the form of .

  3. Timer. Upon receiving distinct ack messages , set to and start counting down.

  4. Vote1. When reaches , if no equivocating blocks signed by are received, send a vote1 to all other replicas in the form of .

  5. Vote2. Upon receiving distinct vote1 messages , form a vote1 certificate . If no equivocating blocks signed by or a blame certificate are received, broadcast the vote1 certificate and a vote2 in the form of .

  6. Commit. If the replica observes distinct vote2 messages for block , and does not receive equivocating blocks signed by or a blame certificate , it broadcasts these vote2 messages and commits with all its ancestors.

View-change Protocol for Replica

Let and be the leader of view and respectively.

  1. Blame1. If fewer than ack messages are triggered in time in view , or fewer than valid blocks are committed in time in view , broadcast . If equivocating blocks signed by are received, broadcast and the two equivocating blocks.

  2. Blame2. Upon receiving valid blame1 messages , broadcast a blame1 certificate of blame1 messages, and a blame2 in the form of .

  3. New-view. Upon gathering distinct messages, broadcast these messages and quit view (abort all vote-timer(s) and stop acking/voting in view ). Wait for time and enter view . Upon entering view , send the new leader a message where is the certificate for the highest certified block . (Ignore any block certified in view that is received after this point.) If the replica is the new leader , wait for another time upon entering view .

Figure 6: -SMR-MSF under the Mobile Sluggish Fault Model
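The two-step structure of Figure 6 can be sketched as a quorum tracker in which each stage fires exactly once when enough distinct senders have been seen. This is a simplified illustration: the class `StagedQuorum`, the single shared `threshold`, and the `NEXT_ACTION` table are hypothetical, the actual thresholds are not legible in this copy, and timers, signatures, and equivocation checks are omitted.

```python
from collections import defaultdict


class StagedQuorum:
    """Track distinct senders per (stage, subject) and fire once per stage."""

    def __init__(self, threshold):
        self.threshold = threshold        # hypothetical quorum size
        self.senders = defaultdict(set)   # (stage, subject) -> sender ids
        self.fired = set()                # stages that already triggered

    def add(self, stage, subject, sender):
        """Return True only when the stage first reaches the threshold."""
        key = (stage, subject)
        self.senders[key].add(sender)
        if key in self.fired or len(self.senders[key]) < self.threshold:
            return False
        self.fired.add(key)
        return True


# What a replica does when a quorum of each message type is reached.
NEXT_ACTION = {
    "ack": "start vote-timer",
    "vote1": "send vote2",
    "vote2": "commit",
    "blame1": "send blame2",
    "blame2": "quit view",
}

tracker = StagedQuorum(threshold=2)
events = [
    ("ack", "B", "r0"), ("ack", "B", "r0"), ("ack", "B", "r1"),
    ("vote1", "B", "r0"), ("vote1", "B", "r1"),
    ("vote2", "B", "r1"), ("vote2", "B", "r2"),
]
actions = [NEXT_ACTION[stage] for stage, subject, sender in events
           if tracker.add(stage, subject, sender)]
```

Counting distinct senders, rather than messages, is what keeps a duplicated ack from `r0` from triggering the timer early.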

Correctness of Protocol -SMR-MSF

We first prove a key claim for protocol -SMR-MSF, similar to Lemma 5 for the protocol -SMR under synchrony.

Lemma 6.

If an honest replica directly commits a block in view , then a certified block that ranks no lower than must equal or extend .

Proof.

Recall that a certified block with the certificate ranks no lower than if either (i) and , or (ii) . Suppose that an honest replica directly commits in view .

First we prove that any block with certified in view must equal or extend . Suppose for the sake of contradiction that some equivocating block is certified in view . Then there is a set of replicas that send for . Since the replica commits , it receives valid vote2 messages, which implies that at least one honest replica has sent vote2 after receiving vote1 messages. Let the set of replicas that send the vote1 messages above be , and let be the earliest time point at which some honest replica in sends vote1. The definition of is valid since at least one replica in is honest. Also let be the earliest time point at which some honest replica in sends .

  • Suppose that . According to the protocol, at time point , the honest replica receives messages for and thus replicas have sent at . Since there are honest and prompt replicas at any time point, at least one of the above replicas that sends is honest and prompt at time . Similarly, at least one of the replicas in is honest and prompt at time , and this replica should receive the message and not send vote1, since equivocating blocks are received. This is a contradiction; hence no equivocating block can be certified in this case.

  • Suppose that . Similar to the previous case, at time point , at least one honest and prompt replica sends ack. Again, at least one honest and prompt replica in should receive the ack message and not send . This contradicts the assumption that is certified, and hence no equivocating block can be certified in this case.

Now we show that any block certified in view must equal or extend .

We first show that honest replicas receive before entering the new view . Since the honest replica directly commits , receives distinct vote2 messages from a set of replicas. Let denote the earliest time point such that there exists a replica satisfying the following: (i) is honest and prompt at time and (ii) sends vote2 before or at time . The above definition of time is valid, since the latest time point when an honest replica in sends vote2 satisfies the above two conditions, which means the candidate set for is non-empty. Since replica is honest and prompt at time and sends vote2 before or at time , by the definition of honest and prompt, replica 's vote2 message and vote1 certificate will reach the set of all honest and prompt replicas at time . We will prove that is the set of honest replicas that receive before entering the new view . Suppose, for the sake of contradiction, that some honest replica enters the new view at time where before receiving . According to the protocol, the honest replica must have received blame2 messages at time point due to the waiting period before entering the next view. Among the replicas that send blame2 above, at least one replica is honest and prompt at time . According to the protocol, receives the blame1 certificate before time and forwards the blame1 certificate to all other replicas. Now consider the honest and prompt replica at time , since