With the emergence of the Blockchain use case, designing a scalable geo-replicated byzantine tolerant state machine replication (SMR) system that supports hundreds of nodes is becoming one of the challenging problems in distributed computing. The core of every byzantine SMR system is the byzantine agreement problem, which was first introduced four decades ago (Pease et al., 1980) and has been intensively studied since then (Cachin et al., 2000, 2001; García-Pérez and Gotsman, 2018; Crain et al., 2018; Singh et al., 2009; Cachin et al., 2016; King and Saia, 2016, 2011; Malkhi et al., 2019). The bottleneck in geo-replicated scalable SMR systems is the network communication, and thus a substantial effort in recent years was invested in search for a linear byzantine agreement (Gueta et al., 2019; Yin et al., 2019; Buchman et al., 2018; Naor et al., 2019) protocol.
To circumvent the FLP (Fischer et al., 1985) result that states that deterministic asynchronous agreement protocols are impossible, most SMR solutions (Castro and Liskov, 1999; Gueta et al., 2019; Yin et al., 2019; Bessani et al., 2014; Kotla et al., 2007) assume eventually synchronous communication models and provide safety during asynchronous periods but are able to guarantee progress only after the global stabilization time (GST).
Therefore, it is quite natural that state-of-the-art byzantine agreement protocols (Gueta et al., 2019; Yin et al., 2019; Buchman et al., 2018; Naor et al., 2019) focus on reducing communication cost after GST, while putting up with the potentially unbounded cost beforehand. For example, Zyzzyva (Kotla et al., 2007) and later SBFT (Gueta et al., 2019) use threshold signatures and collectors to reduce the quadratic cost induced by the all-to-all communication in each view of the PBFT (Castro and Liskov, 1999) protocol. HotStuff (Yin et al., 2019) leverages ideas in Tendermint (Buchman et al., 2018) to propose a linear view-change mechanism, and Naor et al. (Naor et al., 2019) propose an algorithm to synchronize parties between views with a linear cost after GST in failure-free runs. However, none of these algorithms bounds the number of views executed before GST, and thus none of them bounds its communication cost in the worst case scenario.
We claim in this paper that agreement algorithms designed for the eventually synchronous model are not the best fit for scalable SMR systems. That is, ensuring linear communication cost after GST is worth nothing to the overall SMR performance if it requires a possibly unbounded communication cost beforehand.
Vulnerability of the eventually synchronous model
In our first contribution we prove the following lower bound that captures the inherent vulnerability of algorithms designed in the eventually synchronous communication model:
Theorem 1.
There is no eventually synchronous deterministic byzantine agreement protocol with a bounded communication cost before GST even in failure-free runs.
Eventually synchronous communication models capture arbitrarily long asynchronous periods that are followed by “long enough” synchronous periods. So algorithms in this model are designed to provide progress with a small communication cost during synchronous periods, but might be forced to pay unbounded cost to tolerate the asynchronous periods. We propose a new approach that forgoes the eventually synchronous assumptions. Specifically, we optimistically consider all runs as synchronous and then treat all non-synchronous runs as asynchronous.
Our goal in this paper is to develop an optimistic protocol that adapts to network conditions and failures to guarantee termination with an optimal communication cost under all scenarios. To this end, our optimistic protocol first runs an efficient synchronous algorithm, which guarantees termination in synchronous runs with an optimal communication cost. Then, in case the run is not synchronous, our protocol uses an asynchronous algorithm for fallback. The idea behind this approach is to move to the asynchronous fallback only after paying a communication cost equivalent to that of the fallback algorithm, in which case there is no point in waiting for synchrony in the hope of a low cost.
Lower and upper bounds for adaptive synchronous byzantine agreement.
Dolev and Reischuk (Dolev and Reischuk, 1985) prove that there is no deterministic protocol that solves synchronous byzantine agreement with less than $\Omega(t^2)$ communication cost, where $t$ is the failure threshold. We generalize their result for runs with $f \le t$ failures and prove the following lower bound:
Theorem 2.
Any synchronous deterministic byzantine agreement protocol has a communication cost of $\Omega(ft + t)$.
Our first positive result is an asymptotically tight upper bound – we present a synchronous protocol with communication cost of that stops after rounds in the worst case and guarantees early decision after rounds. The question of early decision/stopping is a classical problem in distributed computing (Dolev et al., 1990; Keidar and Rajsbaum, 2003; Abraham and Dolev, 2015; Garay and Moses, 1993). Due to (Dolev et al., 1990) and (Keidar and Rajsbaum, 2003) no protocol can decide or stop in less than rounds and the protocols in (Garay and Moses, 1993; Abraham and Dolev, 2015) showed how to achieve optimal early decision and stopping with a polynomial communication cost in the full information model.
To the best of our knowledge, our protocol is the first to use cryptography to achieve asymptotically optimal communication cost and early decision. Our protocol does not provide early stopping, but we argue that optimal communication complexity and early decision are favorable for scalable SMR systems. That is, parties move to the next slot once they decide in the current one and the overall system throughput is determined by the communication cost of the agreement protocol in every slot and not by the time the last message is sent.
Optimal optimistic byzantine agreement.
Our main contribution is an optimistic byzantine agreement protocol with an asymptotically optimal communication cost under all network conditions and failure scenarios. Our protocol guarantees termination in all synchronous runs with a communication cost of $O(ft + t)$, and in all other runs it provides termination with probability 1 with a communication cost of $O(t^2)$ in expectation. The protocol combines a synchronous part with an asynchronous fallback, where for the synchronous part we use our optimal synchronous algorithm and for the fallback we use a variant of the optimal asynchronous byzantine agreement of VABA (Abraham et al., 2019).
The first challenge in combining both parts is to make sure that parties do not move to the more expensive asynchronous part (fallback) unless necessary for termination, while not paying more than in synchronous runs. The difficulty here is twofold: first, parties cannot always distinguish between synchronous and asynchronous runs. Second, they cannot distinguish between honest parties that complain that they did not decide (due to asynchrony) in the first part and byzantine parties that complain because they wish to increase the communication cost by moving to the asynchronous fallback. To deal with this challenge, we implement a Help&tryHalting procedure in which parties try to avoid the fallback part by helping complaining parties learn the decision value and move to the fallback only when the number of complaints indicates that the run is not synchronous. This way, each byzantine party in a synchronous run cannot increase the communication cost by more than , where is the total number of parties.
The second challenge in the optimistic protocol is to combine both parts in a way that guarantees safety. That is, since some parties may decide in the synchronous part and others in the asynchronous fallback, we need to make sure they decide on the same value. To this end, we use the leader-based view (LBV) abstraction, defined in (Spiegelman and Rinberg, 2019), as a building block for both parts. The LBV abstraction captures a single view in a view-by-view agreement protocol such that one of its important properties is that a sequential composition of them preserves safety. For optimal communication cost, we adopt techniques from (Yin et al., 2019) and (Abraham et al., 2019) to implement the LBV abstraction with an asymptotically linear cost.
Our synchronous protocol operates up to sequentially composed pre-defined linear LBV instances, each with a different leader. In order to achieve optimal (adaptive to the number of actual failures) cost, leaders invoke their LBVs only if they have not yet decided. In contrast to eventually synchronous protocols, the synchronous part is designed to provide termination only in synchronous runs. Therefore, parties do not need to be synchronized before views, but rather move from one LBV to the next in pre-defined times. As for the asynchronous fallback, we use the linear LBV building block to reconstruct the VABA (Abraham et al., 2019) protocol in a way that forms a sequential composition of LBVs, which in turn allows a straight-forward sequential composition with the synchronous part.
In summary, the paper makes the following contributions:
We prove that no deterministic byzantine agreement protocol can bound its communication cost before GST in eventually synchronous runs even in the failure-free case.
We propose a new approach to design agreement protocols for practical SMR systems that forgo eventual synchrony.
We prove that the communication cost of any deterministic byzantine agreement protocol is at least $\Omega(ft + t)$ even in synchronous runs.
We present the first deterministic synchronous byzantine agreement protocol with an asymptotically optimal communication complexity of $O(ft + t)$.
We present the first optimistic byzantine agreement protocol that guarantees termination in all synchronous runs with optimal communication cost and provides termination with probability 1 with a cost of $O(t^2)$, in expectation, in all non-synchronous runs.
As in previous practical solutions (Castro and Liskov, 1999; Gueta et al., 2019; Yin et al., 2019; Bessani et al., 2014; Kotla et al., 2007; Miller et al., 2016), we consider a byzantine message passing peer to peer model with a set of parties and a computationally bounded adversary that corrupts up to $t$ of them. Parties corrupted by the adversary are called byzantine and may arbitrarily deviate from the protocol. Other parties are honest.
Communication and runs.
We assume a global clock, visible to all parties, that perfectly measures time, and a parameter $\Delta$ that is known to all parties. The communication links are reliable but controlled by the adversary, i.e., all messages sent among honest parties are eventually delivered, but the adversary controls the delivery time. A run of a protocol is eventually synchronous if there is a global stabilization time (GST) after which all messages sent among honest parties are delivered within $\Delta$ time. A run is synchronous if GST occurs at time 0, and asynchronous if GST never occurs.
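As an illustration only, the three kinds of runs defined above can be told apart by the value of GST. The following Python sketch assumes a hypothetical encoding in which `None` (or infinity) stands for "GST never occurs"; the function name is ours and does not appear in the protocol's pseudocode.

```python
import math
from typing import Optional


def classify_run(gst: Optional[float]) -> str:
    """Return the most specific label for a run, given its GST.

    Assumption of this sketch: gst is None (or infinite) when GST
    never occurs, and a non-negative number otherwise.
    """
    if gst is None or math.isinf(gst):
        return "asynchronous"
    if gst < 0:
        raise ValueError("GST cannot be negative")
    if gst == 0:
        # Synchronous runs are the special case of GST at time 0.
        return "synchronous"
    return "eventually synchronous"
```

Note that the sketch returns the most specific label: a run with GST at time 0 also satisfies the definition of an eventually synchronous run.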
The Agreement problem.
The Agreement problem exposes an API to propose a value and to output a decision from some domain . We are interested in protocols that never compromise safety and thus require the following property to be satisfied in all runs:
Agreement: All honest parties that decide, decide on the same value.
Due to the FLP result (Fischer et al., 1985), no deterministic agreement protocol can provide safety and liveness properties in all asynchronous runs. Therefore, in this paper we consider protocols that guarantee (deterministic) termination in all synchronous and eventually synchronous runs, and provide probabilistic termination in asynchronous ones:
Termination: All honest parties eventually decide.
Probabilistic-Termination: All honest parties decide with probability 1.
As for validity, for the lower bounds, in order to strengthen them as much as possible, we consider the binary case, which is the weakest possible definition. For the upper bounds, we are interested in practical multi-valued protocols and thus consider the external validity property (Cachin et al., 2000, 2001), which is implicitly or explicitly considered in most practical byzantine agreement solutions we are aware of (Abraham et al., 2019; Castro and Liskov, 1999; Yin et al., 2019; Gueta et al., 2019; Kotla et al., 2007). Intuitively, with external validity, parties are allowed to decide on a value proposed by any party (honest or byzantine) as long as it is valid by some external predicate. To capture the above and rule out trivial solutions such as simply deciding on some pre-defined externally valid value, we give a formal definition below. In both validity properties, honest parties decide only on values from the domain of valid values.
Binary validity: The domain of valid values is $\{0, 1\}$, and if all honest parties propose the same value $v$, then no honest party decides on a value other than $v$.
External validity: The domain of valid values is unknown to honest parties. In the beginning of every run, each honest party gets a value together with a proof that the value is in the domain, which all other honest parties can verify.
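To make the Agreement and Binary validity properties concrete, the following Python sketch checks them on the outcome of a single run. It is an illustration under our own assumptions: the dictionaries map (hypothetical) honest party identifiers to proposed and decided values, and parties that did not decide are simply absent from `decisions`.

```python
from typing import Dict


def agreement_holds(decisions: Dict[int, int]) -> bool:
    """All honest parties that decide, decide on the same value."""
    return len(set(decisions.values())) <= 1


def binary_validity_holds(proposals: Dict[int, int],
                          decisions: Dict[int, int]) -> bool:
    """Check Binary validity for the honest parties of one run."""
    values = list(proposals.values()) + list(decisions.values())
    if any(v not in (0, 1) for v in values):
        return False  # the domain of valid values is {0, 1}
    proposed = set(proposals.values())
    if len(proposed) == 1:
        (v,) = proposed
        # Unanimous proposal v: no honest party may decide anything else.
        return all(d == v for d in decisions.values())
    return True
```

For example, if every honest party proposes 1 and some honest party decides 0, `binary_validity_holds` returns `False` even though `agreement_holds` may still return `True`.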
We define an optimistic Agreement protocol to be a protocol that guarantees Agreement and External validity in all runs, Termination in all synchronous and eventually synchronous runs, and Probabilistic-Termination in asynchronous runs.
We assume a computationally bounded adversary and a trusted dealer that equips parties with cryptographic schemes. For simplicity of presentation and in order to avoid the analysis of security parameters and negligible error probabilities, we assume that the following cryptographic tools are perfect:
Authenticated link. If an honest party $p_i$ delivers a message $m$ from an honest party $p_j$, then $p_j$ previously sent $m$ to $p_i$.
Threshold signatures scheme. We assume that each party has a private function , and we assume 3 public functions: share-validate, threshold-sign, and threshold-validate. Informally, given “enough” valid shares, the function threshold-sign returns a valid threshold signature. For our algorithm, we sometimes require “enough” to be and sometimes . A formal definition can be found in (Abraham et al., 2019).
We denote by $f$ the actual number of corrupted parties in a given run and we are interested in optimistic Agreement protocols that utilize $f$ and the network condition to reduce communication cost. Similarly to (Abraham et al., 2019), we say that a word can contain a constant number of signatures and values, and each message contains at least one word. To be able to reason about communication cost we require all honest parties to eventually halt by ceasing to send messages. The communication cost of a run $r$ is the number of words sent in messages among honest parties in $r$. For every $0 \le f \le t$, let $R^{s}_{f}$ and $R^{es}_{f}$ be the sets of all synchronous and eventually synchronous runs with $f$ corrupted parties, respectively. The synchronous and eventually synchronous communication cost with $f$ failures is the maximal communication cost of runs in $R^{s}_{f}$ and $R^{es}_{f}$, respectively. We say that the synchronous communication cost of a protocol $A$ is $G(f, t)$ if for every $0 \le f \le t$, its synchronous communication cost with $f$ failures is $G(f, t)$. The asynchronous communication cost of a protocol $A$ is the expected communication cost of an asynchronous run of $A$.
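The cost measure above can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical `Message` record and a known set of byzantine party identifiers; it counts only words in messages whose sender and receiver are both honest.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class Message:
    sender: int    # hypothetical party identifier
    receiver: int  # hypothetical party identifier
    words: int     # every message contains at least one word


def communication_cost(messages: Iterable[Message],
                       byzantine: Set[int]) -> int:
    """Number of words sent in messages among honest parties in one run."""
    total = 0
    for m in messages:
        if m.words < 1:
            raise ValueError("a message contains at least one word")
        if m.sender not in byzantine and m.receiver not in byzantine:
            total += m.words
    return total


def cost_with_failures(runs: Iterable[int]) -> int:
    """Maximal cost over the (already computed) costs of a set of runs."""
    return max(runs)
```

For instance, with `byzantine = {3}`, a run consisting of `Message(0, 1, 1)`, `Message(1, 2, 2)`, and `Message(3, 0, 5)` has a communication cost of 3 words, since the last message is sent by a byzantine party.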
3. Lower Bounds
In this section we present two lower bounds on the communication complexity of deterministic byzantine agreement protocols in synchronous and eventually synchronous runs.
3.1. Eventually synchronous runs.
The following theorem exemplifies the inherent vulnerability of the eventually synchronous approach.
Theorem 1.
There is no deterministic byzantine agreement algorithm with bounded eventually synchronous communication cost even in failure-free runs.
Assume by way of contradiction that there are such algorithms. Let $A$ be such an algorithm with the best eventually synchronous communication cost with 0 failures, and denote its communication cost by $N$. Clearly, $N \geq 1$. Let $R_N$ be the set of all failure-free eventually synchronous runs of $A$ that have communication cost of $N$. For every run $r \in R_N$ let $m_r$ be the last message that is delivered in $r$, let $t_r$ be the time at which it is delivered, and let $p_r$ be the party that sends $m_r$. Now for every $r \in R_N$ consider a run $r'$ that is identical to $r$ up to time $t_r$ except that $p_r$ is byzantine and acts exactly as in $r$ but does not send $m_r$. Denote by $R'$ the set of all such runs and consider two cases:
There is a run $r' \in R'$ in which a message by an honest party is sent after time $t_r$. Now consider a failure-free run $r''$ that is identical to run $r'$ except that the delivery of $m_r$ is delayed. The runs $r'$ and $r''$ are indistinguishable to all parties that are honest in $r'$ and thus some honest party sends a message after time $t_r$ in $r''$ as well. Therefore, the communication cost of $r''$ is at least $N + 1$. A contradiction to the communication cost of $A$.
Otherwise, we can construct an algorithm $A'$ with a better eventually synchronous communication cost with 0 failures than $A$ in the following way: $A'$ operates identically to $A$ in all runs not in $R_N$ and for every run $r \in R_N$, $A'$ operates as $A$ except that $p_r$ does not send $m_r$. A contradiction to the definition of $A$.
3.2. Synchronous runs.
We next prove a lower bound that applies even to synchronous byzantine agreement algorithms and is adaptive to the number of actual failures $f$. The proof is a generalization of the proof in (Dolev and Reischuk, 1985), which was originally proved for the byzantine broadcast problem and considered the worst case scenario ($f = t$).
The proof of the following claim is straightforward and, due to space limitations, is deferred to Appendix A.
Claim 1.
The synchronous communication cost with 0 failures of any byzantine agreement algorithm is at least .
The following Lemma shows that if honest parties send less than messages, then byzantine parties can prevent honest parties from getting any of them.
Lemma 1.
Assume that there is a byzantine agreement algorithm , which synchronous communication cost with failures is less than for some . Then, for every set of parties and for every set of values proposed by honest parties, there is a synchronous run s.t. some honest party does not get any messages in .
Let be a run in which all parties in are byzantine that ignore all messages they receive and act like honest parties that get no messages. By the assumption, there is a party that receives less than messages in . Denote the set of parties outside that send messages to in by and consider the following run :
Parties in are byzantine that act like in .
Parties in are byzantine. They do not send messages to , but other than that act as honest parties.
All other parties, including , are honest.
First, note that the number of byzantine parties in is . Also, since acts in as an honest party that do not receive messages, and all byzantine parties in act towards honest parties in () in exactly the same way as they do in , then honest parties in cannot distinguish between and . Thus, since they do not send messages to in they do not send in as well. Therefore, does not get any message in .
The next Lemma uses the previous one to show that parties that do not get messages cannot safely decide.
Lemma 2.
For any , there is no optimistic byzantine agreement algorithm which synchronous communication cost with failures is less than .
Assume by way of contradiction that there is such a protocol whose synchronous communication cost with failures is less than for some and pick a set of of parties. By Lemma 1, there is a run of s.t. some honest party does not get any messages regardless of the values honest parties propose. Now let s.t. . By Lemma 1 again, there is a run of s.t. some party does not get any messages regardless of the values honest parties propose. Since , we can repeat the above times and get that for every possible input (values proposed by honest parties) there is a set of parties s.t. for every party there is a run of in which is honest and does not get any messages. In particular, there exists such a set for the case in which all honest parties propose 0 and a set for the case in which all honest parties propose 1. Since , there is a party . Therefore, by the Termination and Binary validity properties, there is a run in which does not get any messages and decides 0 and a run in which does not get any messages and decides 1. However, since and are indistinguishable to we get a contradiction.
Theorem 2.
Any synchronous deterministic byzantine agreement protocol has a communication cost of $\Omega(ft + t)$.
4. Asymptotically optimal optimistic Byzantine Agreement
In this section we present our optimistic byzantine agreement protocol, whose synchronous communication cost is tight to the lower bound proven in Theorem 2. In a nutshell, our protocol safely combines a linear, adaptive-to-failures approach that relies on synchrony for termination with a quadratic asynchronous fallback. For ease of exposition, we construct our protocol in steps. First, in Section 4.1, we present the local state each party maintains and describe the linear leader-based view (linear LBV) building block, which is used by both parts of the protocol. Then, in Section 4.2, we describe a synchronous protocol, whose communication complexity is asymptotically optimal in the failure threshold $t$ and the number of actual failures $f$. Next, in Section 4.3, we show how the optimal quadratic asynchronous byzantine agreement protocol of (Abraham et al., 2019) can be reconstructed by using the linear LBV building blocks. Finally, in Section 4.4, we show how to safely combine both protocols (synchronous and asynchronous) to get an optimistic protocol that achieves an asymptotically optimal communication complexity under all network conditions and failure scenarios. A formal correctness proof of the protocol appears in Appendix B.
4.1. General structure
The protocol uses many instances of the linear LBV building block, each of which is parametrized with a sequence number and a leader. Each party in the protocol maintains a local state, which is used by all LBVs and is updated according to their returned values. Section 4.1.1 presents the local state and Section 4.1.2 describes the linear LBV implementation. Section 4.1.3 discusses the properties guaranteed by a sequential composition of several LBV instances.
4.1.1. Local state
The local state each party maintains is presented in Algorithm 1. For every possible sequence number $sq$, LEADERS[sq] stores the party that is chosen (a priori or in retrospect) to be the leader associated with $sq$. The COMMIT variable is a tuple that consists of a value $v$, a sequence number $sq$ s.t. $v$ was committed in the linear LBV that is parametrized with $sq$ and LEADERS[sq], and a threshold signature that is used as a proof of it. The VALUE variable contains a safe value to propose and the KEY variable is used as a proof that VALUE is indeed safe. KEY contains a sequence number $sk$ and a threshold signature that proves that no value other than VALUE could be committed in the linear LBV that is parametrized with $sk$ and LEADERS[sk]. The LOCK variable stores a sequence number, which is used to determine which keys are up-to-date and which are obsolete – a key is up-to-date if it contains a sequence number that is greater than or equal to LOCK.
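A minimal Python sketch of this local state may help fix the roles of the variables. It is not Algorithm 1 itself: the field types, the use of `None` for empty entries, and the initial LOCK value of 0 are assumptions of the sketch, and threshold signatures are treated as opaque objects.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


@dataclass
class Key:
    sk: int      # sequence number of the LBV in which the key was obtained
    proof: Any   # threshold signature (opaque in this sketch)


@dataclass
class LocalState:
    # LEADERS: sequence number -> party chosen as that LBV's leader
    leaders: Dict[int, int] = field(default_factory=dict)
    # COMMIT: (value, sequence number, threshold signature) or None
    commit: Optional[Tuple[Any, int, Any]] = None
    # VALUE: a safe value to propose
    value: Any = None
    # KEY: proof that VALUE is safe
    key: Optional[Key] = None
    # LOCK: sequence number; 0 is assumed to mean "not locked"
    lock: int = 0

    def key_is_up_to_date(self, key: Key) -> bool:
        """A key is up-to-date if its sequence number is >= LOCK."""
        return key.sk >= self.lock
```

For example, a party with `lock=3` treats a key obtained in the LBV with sequence number 3 as up-to-date and a key obtained in the LBV with sequence number 2 as obsolete.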
4.1.2. Linear leader-based view
A detailed pseudocode of the linear LBV building block is given in Algorithms 2 and 3, and an illustration appears in Figure 1. The linear LBV building block supports an API to start the view (startView) and to wedge the view (wedgeView). Upon a startView invocation, the invoking party starts processing messages associated with the linear LBV that is parametrized with the given sequence number and leader. When the leader invokes startView it initiates steps of leader-to-all and all-to-leader communication. In each of the first 3 steps, the leader sends its VALUE together with a threshold signature that proves the safety of the value for the current step and then waits to collect enough valid replies. A party that gets a message from the leader validates that the received value and proof are valid for the current step, then produces its signature share on a message that contains the value and the current step’s name, and sends the share back to the leader. When the leader gets enough valid shares, it combines them into a threshold signature and continues to the next step. After successfully generating the threshold signature at the end of the third step, the leader has a commit certificate, which it sends together with its VALUE to all parties.
In addition to validating and share signing messages, parties also store the values and proofs they receive. The keyProof and lockProof variables store a tuple consisting of the value and the threshold signature received from the leader in the second and third steps, respectively, and commitProof stores the received value and the commit certificate. Whenever a party receives a valid commit certificate from the leader it returns its keyProof, lockProof, and commitProof, which are then used by the high level agreement protocol to update the LOCK, KEY, VALUE, and COMMIT variables in the local state.
As for the validation of the leader’s messages, parties distinguish the first step message from the rest. In the second step, third step, and commit certificate messages, parties simply check that the attached proof is a valid threshold signature on a message that contains the leader’s value and the previous step name of the current linear LBV instance. The first step message, however, is what links sequentially composed LBV instances. To develop intuition, let us first present the properties guaranteed by a single linear LBV instance:
Commit causality: If a party gets a valid commit certificate, then at least honest parties previously got a valid lockProof.
Lock causality: If a party gets a valid lockProof, then at least honest parties previously got a valid keyProof.
Safety: All valid keyProof, lockProof, and commit certificates obtained in the LBV have the same value.
The validation in the first step makes sure that the leader’s value satisfies the safety properties of the high-level byzantine agreement protocol that sequentially composes and operates several linear LBVs. The leader’s message in the first step contains its VALUE and KEY, where KEY stores the last (non-empty) keyProof returned by a previous linear LBV together with the LBV’s sequence number. When a party gets the first step’s message it first validates, by checking the key’s sequence number $sk$, that the attached key was obtained in an LBV instance that does not precede the one the party is locked on (the sequence number that is stored in the party’s LOCK variable). Then, the party checks that the threshold signature in the key is a valid signature that (1) is on a message that contains the leader’s value; and (2) was generated at the end of the first step (a valid keyProof) of the LBV instance that is parametrized with $sk$ and LEADERS[sk]. Note that if the party is not locked, then a key is not required.
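The first-step validation described above can be sketched as follows. This is an illustration rather than the paper's pseudocode: `verify_key_proof` is a hypothetical callable standing in for threshold-signature validation of a keyProof, and we assume that a LOCK value of 0 encodes "not locked".

```python
from typing import Any, Callable, Dict, NamedTuple, Optional


class Key(NamedTuple):
    sk: int      # sequence number of the LBV in which the key was obtained
    proof: Any   # threshold signature, opaque in this sketch


def validate_first_step(
    lock: int,
    leaders: Dict[int, int],
    value: Any,
    key: Optional[Key],
    verify_key_proof: Callable[[Any, Any, int, int], bool],
) -> bool:
    """Decide whether to accept the leader's first-step message."""
    if key is None:
        # Assumption: lock == 0 means the party is not locked,
        # in which case a key is not required.
        return lock == 0
    if key.sk < lock:
        # The key was obtained in an LBV that precedes the locked one.
        return False
    if key.sk not in leaders:
        return False
    # The proof must be a valid keyProof on the leader's value from the
    # LBV parametrized with sk and LEADERS[sk].
    return verify_key_proof(key.proof, value, key.sk, leaders[key.sk])
```

In this sketch, an unlocked party that receives a key still verifies it; the source only states that a key is not required in that case, so this choice is ours.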
In order to be able to abandon an LBV instance with a byzantine leader that does not drive progress, parties use the wedgeView API, which returns the current values of keyProof, lockProof, and commitProof. Moreover, to ensure the LBVs’ causality guarantees are propagated to the parties’ KEY, LOCK, and COMMIT variables, which are updated with the returned values, parties stop participating by ignoring all messages once wedgeView is invoked.
Note that the number of messages sent among honest parties in an LBV instance is linear in the number of parties. In addition, since signatures are not accumulated – leaders use threshold signatures – each message contains a constant number of words, and thus the total communication cost of an LBV instance is a linear number of words.
4.1.3. Sequential composition of LBVs.
As mentioned above, our optimistic byzantine agreement protocol is built on top of the linear LBV building block. The synchronous and the asynchronous parts of the protocol use different approaches, but at the end they both sequentially compose LBVs - the synchronous part of the protocol determines the composition in advance, whereas the asynchronous part chooses what instances are part of the composition in retrospect.
In a nutshell, a sequential composition of LBVs operates as follows: parties start an LBV instance by invoking startView and at some later time (depending on the approach) invoke wedgeView and update their local states with the returned values. Then, they exchange messages to propagate information (e.g., up-to-date keys or commit certificates), update their local states again and start the next LBV. We claim that a high-level agreement protocol that sequentially composes several linear LBV instances and maintains the local state in Algorithm 1 has the following properties:
Agreement: all commit certificates in all LBV instances have the same value.
Conditional progress: for every LBV instance, if the leader is honest, all honest parties invoke startView, and all messages among honest parties are delivered before some honest party invokes wedgeView, then all honest parties get a commit certificate.
Intuitively, by the commit causality property of the linear LBV, if some party returns a valid commit certificate (commitProof) with a value $v$ in some LBV instance with sequence number $sq$, then at least honest parties return a valid lockProof and thus lock on $sq$. Therefore, since the leader of the next LBV needs the cooperation of parties in order to generate threshold signatures, its first step’s message must include a valid keyProof that was obtained in the LBV instance with sequence number $sq$. By the safety property of the linear LBV, this keyProof includes the value $v$ and thus $v$ is the only value the leader can propose. The agreement property follows by induction.
As for conditional progress, we have to make sure that honest leaders are able to drive progress. Thus, we must ensure that all honest leaders have the most up-to-date keys. By the lock causality property, if some party gets a valid lockProof in some LBV, then at least honest parties get a valid keyProof in this LBV and thus are able to unlock all honest parties in the next LBV. Therefore, leaders can get the up-to-date key by querying a quorum of parties.
From the above, a byzantine agreement protocol can satisfy Agreement by sequentially composing LBVs. The challenge, which we address in the rest of this section, is how to sequentially compose LBVs in a way that satisfies Termination with asymptotically optimal communication complexity under all network conditions and failure scenarios.
4.2. Adaptive to failures synchronous protocol
In this section we describe a synchronous byzantine agreement protocol with an asymptotically optimal adaptive communication cost. Namely, the communication complexity of the protocol is $O(ft + t)$, which is tight to the lower bound proven in Theorem 2. A detailed pseudocode is given in Algorithms 4 and 5, and an illustration appears in Figure 2.
The protocol sequentially composes pre-defined linear LBV instances, each with a different leader, and parties decide whenever they get a commit certificate in one of them. To exploit synchrony, parties in the protocol use the shared global clock to coordinate their actions – meaning that all the startView and wedgeView invocation times are predefined in a way that allows honest leaders to provide conditional progress, e.g., the first LBV starts at time 0 and is wedged at a pre-defined later time. In addition, to make sure honest leaders can drive progress, each leader (except the first) learns the up-to-date key, before invoking startView, by querying all parties and waiting for a quorum of parties to reply.
Composing one LBV instance per party may lead in the worst case to quadratic communication complexity – a linear cost for every LBV instance. Therefore, to achieve the optimal adaptive complexity, honest leaders in our protocol participate (learn the up-to-date key and invoke startView) only in case they have not yet decided. (Note that the communication cost of an LBV instance in which the leader does not invoke startView is 0 because other parties only reply to the leader’s messages.) For example, if the leader of the second LBV instance is honest and has committed a value in the first instance, then no message is sent among honest parties during the time allotted to the second instance.
Termination and communication complexity.
A naive approach to guarantee termination and avoid an infinite number of LBV instances in leader-based byzantine agreement protocols is to perform a costly communication phase after each LBV instance. One common approach is to reliably broadcast commit certificates before halting, while a complementary one is to halt unless receiving a quorum of complaints from parties that did not decide. In both cases, the communication cost is even in runs with at most one failure.
The key idea of our protocol is to exploit synchrony in order to allow honest parties to learn the decision value and at the same time help others in a small number of messages. Instead of complaining (together) after every unsuccessful LBV instance, each party has its own pre-defined time to “complain”, in which it learns the up-to-date key and value and helps others decide via the LBV instance in which it acts as the leader.
By the conditional progress property and the synchrony assumption, all honest parties get a commit certificate in LBV instances with honest leaders. Therefore, the termination property is guaranteed since every honest party has its own pre-defined LBV instance, which it invokes in case it has not yet decided. As for the protocol’s total communication cost, recall that the LBV’s communication cost is in the worst case and in case the leader is honest that decides not to participate since it has already decided. In addition, since all honest parties get a commit certificate in the first LBV instance with an honest leader, we get that the message cost of all later LBV instances with honest leaders is . Therefore, the total communication cost of the protocol is words – at most LBVs with byzantine leaders and LBV with an honest one that cost words each.
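The counting argument above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions that are not part of the protocol's pseudocode: the function name `synchronous_run_cost` and the parameter `per_lbv_cost` are hypothetical, every LBV whose leader participates is charged the same cost, every byzantine leader is charged the full cost without helping anyone decide (the worst case), and a single flag stands in for "all honest parties have a commit certificate".

```python
def synchronous_run_cost(leaders_honest, per_lbv_cost):
    """Toy word count for a synchronous run of sequentially composed LBVs.

    leaders_honest[i] is True if the leader of the i-th LBV is honest.
    per_lbv_cost is the assumed cost of one LBV whose leader participates
    (hypothetical parameter; the paper's LBV is linear in the number of parties).
    """
    decided = False  # stand-in for "all honest parties hold a commit certificate"
    total = 0
    for honest in leaders_honest:
        if honest and decided:
            # An honest leader that has already decided stays silent: cost 0.
            continue
        total += per_lbv_cost
        if honest:
            # Conditional progress under synchrony: every honest party
            # gets a commit certificate in an LBV driven by an honest leader.
            decided = True
    return total


# Example: 7 parties, the first two leaders are byzantine.
n = 7
schedule = [False, False, True, True, True, True, True]
print(synchronous_run_cost(schedule, per_lbv_cost=n))
```

In this model only the LBVs led by byzantine parties, plus the first LBV with an honest leader, contribute to the total, which mirrors the accounting in the paragraph above.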
4.3. Asynchronous fallback
In this section we follow the framework of (Spiegelman and Rinberg, 2019) and use the linear LBV building block to reconstruct a variant of the optimal asynchronous byzantine agreement protocol of VABA (Abraham et al., 2019). Note that achieving an optimal asynchronous protocol is not a contribution of this paper, but reconstructing the VABA protocol with our linear LBV building block allows us to safely combine it with our adaptive synchronous protocol to achieve an optimal optimistic one. In addition, we also improve the protocol of VABA in the following ways: First, parties in VABA (Abraham et al., 2019) never halt, meaning that even though they decide in expectation in a constant number of waves (rounds), they operate an unbounded number of them. We fix this by adding an auxiliary primitive, which we call help&tryHalting, in between two consecutive waves. Second, the VABA protocol guarantees probabilistic termination in all runs, whereas our version also guarantees standard termination in eventually synchronous runs. The full detailed pseudocode of our protocol appears in Algorithms 5, 6, 9, and 7.
At a high level, the idea in VABA (Abraham et al., 2019) that was later generalized in (Spiegelman and Rinberg, 2019) is the following: instead of having a pre-defined leader in every “round” of the protocol, as most eventually synchronous protocols and our synchronous protocol have, they let leaders operate simultaneously and then randomly choose one in retrospect. This mechanism is implemented inside a wave, and the agreement protocol operates in a wave-by-wave manner such that parties exchange their local states between every two consecutive waves. To ensure halting, in our version of the protocol, parties also invoke the help&tryHalting procedure after each wave. See the tryPessimistic procedure in Algorithm 6 for pseudocode (ignore gray lines at this point) and Figure 3 for an illustration.
To implement the wave mechanism (Algorithm 6) we use our linear LBV and two auxiliary primitives: Leader election and Barrier synchronization (Algorithm 7). At the beginning of every wave, parties invoke, via , different LBV instances, each with a different leader. Then, parties are blocked in the Barrier synchronization primitive until at least LBV instances complete. (An LBV completes when honest parties get a commit certificate.) Finally, parties use the Leader election primitive to elect a unique LBV instance, wedge it (via ), and ignore the rest. With a probability of parties choose a completed LBV, which guarantees that after the state exchange phase all honest parties get a commit certificate, decide, and halt in the help&tryHalting procedure. Otherwise, parties update their local state and continue to the next wave. An illustration appears in Figure 4.
Since every wave has a probability of to choose a completed LBV instance, the protocol guarantees probabilistic termination – in expectation, all honest parties decide after waves. In order to also satisfy standard termination in eventually synchronous runs, we “try synchrony” after each unsuccessful wave. See the gray lines in Algorithm 6. Between every two consecutive waves parties deterministically try to commit a value in a pre-defined LBV instance. The preceding help&tryHalting procedure guarantees that after GST all honest parties invoke in the pre-defined LBV instance with at most from each other and thus setting a timeout to is enough for an honest leader to drive progress. We describe the help&tryHalting procedure in the next section. Due to space limitations we omit the description of the Barrier and Leader-election primitives (Algorithm 7), which can be found in (Spiegelman and Rinberg, 2019).
The communication cost of the Barrier and Leader-election primitives, as well as that of LBV instances, is , which brings us to a total of cost for every wave. Since every wave has a probability of to choose a completed LBV, the protocol operates waves in expectation. Therefore, since the communication cost of state exchange and help&tryHalting is as well, we get that the total cost, in expectation, is words.
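The expected number of waves can be checked with a short simulation. This is a hedged sketch rather than the protocol itself: the names `waves_until_decision` and `mean_waves` are hypothetical, the number of completed LBVs per wave is passed in as the parameter `completed` instead of being fixed to the protocol's threshold, and the Leader election primitive is modeled as a uniform random choice among the `n` instances made after the Barrier.

```python
import random


def waves_until_decision(n, completed, rng):
    """Count waves until the elected LBV is one of the completed ones.

    n         -- number of LBV instances per wave (one per leader)
    completed -- assumed number of instances that complete before the
                 Barrier releases (illustrative parameter)
    """
    waves = 0
    while True:
        waves += 1
        # Retrospective, uniform election: which instances completed does
        # not matter, only how many, so indices 0..completed-1 stand in
        # for the completed ones.
        elected = rng.randrange(n)
        if elected < completed:
            return waves


def mean_waves(n, completed, trials, seed=0):
    """Average number of waves over `trials` independent runs."""
    rng = random.Random(seed)
    total = sum(waves_until_decision(n, completed, rng) for _ in range(trials))
    return total / trials


# Example: 4 instances per wave, 3 of which complete before the Barrier.
print(mean_waves(n=4, completed=3, trials=20000))
```

Under these assumptions the number of waves is geometrically distributed with success probability `completed / n`, so the empirical mean should be close to `n / completed`, a constant that does not depend on how the adversary schedules messages.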
4.4. Optimal optimistic protocol: combine the pieces
In this section we combine the pieces of our optimal optimistic byzantine agreement protocol. At a high level, parties first optimistically try the synchronous protocol (of section 4.2), then invoke help&tryHalting and continue to the asynchronous fallback (of section 4.3) in case a decision has not been reached. Pseudocode is given in Algorithm 8 and an illustration appears in Figure 5.
One of the biggest challenges in designing protocols with several paths is to make sure that safety is always preserved across them, meaning that parties must never decide differently even if they decide in different parts of the protocol. In our protocol, however, this is inherently not a concern. Since both parts use the linear LBV as a building block, we get safety for free – if we look at an execution of our protocol in retrospect and ignore all LBVs that were not elected in the asynchronous part, then the LBV instances in the synchronous part together with the elected ones in the asynchronous part form a sequential composition, which satisfies the Agreement property.
On the other hand, satisfying termination without sacrificing optimal adaptive complexity is a non-trivial challenge. Parties start the protocol by optimistically trying the synchronous part, but unfortunately, at the end of the synchronous part they cannot distinguish between the case in which the communication was indeed synchronous and all honest parties decided and the case in which some honest parties did not decide due to asynchrony. Moreover, honest parties cannot distinguish between honest parties that did not decide and thus wish to continue to the asynchronous fallback part and byzantine parties that want to move to the fallback part in order to increase the communication cost.
To this end, we implement the help&tryHalting procedure, which stops honest parties from moving to the fallback part in synchronous runs, with a communication cost of words. The idea is to help parties learn the decision value and move to the fallback part only when the number of help requests indicates that the run is asynchronous.
The pseudocode of help&tryHalting is given in Algorithm 9 and an illustration appears in Figure 6. Each honest party that has not yet decided sends a signed helpRequest to all other parties. When an honest party gets a helpRequest, the party replies with its COMMIT value, but if it gets helpRequest messages, the party combines them to a threshold signature and sends it in a complain message to all. When an honest party gets a complain message, it echoes it to all parties and continues to the fallback part.
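The message handling described above can be sketched as a small state machine. This is an illustrative sketch, not Algorithm 9: the class `Party`, its fields, and the message tuples are hypothetical, the number of helpRequest messages that triggers a complain is left as the parameter `threshold`, a set of sender ids stands in for the threshold signature, and a party that has not decided is assumed to send no reply.

```python
from dataclasses import dataclass, field


@dataclass
class Party:
    pid: int
    threshold: int                 # assumed number of helpRequests that triggers a complain
    commit: object = None          # decided value, or None if undecided
    requests: set = field(default_factory=set)
    complained: bool = False
    in_fallback: bool = False

    def start(self):
        """Messages broadcast when the procedure starts."""
        if self.commit is None:
            return [("helpRequest", self.pid)]
        return []

    def on_help_request(self, sender):
        """Reply with the commit value; complain once enough requests arrive."""
        out = []
        self.requests.add(sender)
        if self.commit is not None:
            out.append(("helpReply", sender, self.commit))
        if len(self.requests) >= self.threshold and not self.complained:
            self.complained = True
            # Stand-in for combining the signed requests into a threshold signature.
            out.append(("complain", frozenset(self.requests)))
        return out

    def on_complain(self, proof):
        """Echo the complain once and move to the fallback part."""
        if self.in_fallback:
            return []
        self.in_fallback = True
        return [("complain", proof)]


# Example: a decided party with threshold 2 receives two help requests.
p = Party(pid=0, threshold=2, commit="v")
print(p.on_help_request(1))
print(p.on_help_request(2))
```

The design point the sketch tries to capture is that a few help requests are answered individually at low cost, while only a large number of them, which cannot be produced by byzantine parties alone if the threshold is chosen appropriately, pushes honest parties into the fallback part.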