ACE: Abstract Consensus Encapsulation for Liveness Boosting of State Machine Replication

11/24/2019 ∙ by Alexander Spiegelman, et al. ∙ 0

With the emergence of cross-organization attack-prone byzantine fault-tolerant (BFT) systems, so-called Blockchains, providing asynchronous state machine replication (SMR) solutions is no longer a theoretical concern. This paper introduces ACE: a general framework for the software design of fault-tolerant SMR systems. We first propose a new leader-based-view (LBV) abstraction that encapsulates the core properties provided by each view in a partially synchronous consensus algorithm, designed according to the leader-based view-by-view paradigm (e.g., PBFT and Paxos). Then, we compose several LBV instances in a non-trivial way in order to boost asynchronous liveness of existing SMR solutions. ACE is model agnostic - it abstracts away any model assumptions that consensus protocols may have, e.g., the ratio and types of faulty parties. For example, when the LBV abstraction is instantiated with a partially synchronous consensus algorithm designed to tolerate crash failures, e.g., Paxos or Raft, ACE yields an asynchronous SMR for n = 2f+1 parties. However, if the LBV abstraction is instantiated with a byzantine protocol like PBFT or HotStuff, then ACE yields an asynchronous byzantine SMR for n = 3f+1 parties. To demonstrate the power of ACE, we implement it in C++, instantiate the LBV abstraction with a view implementation of HotStuff – a state of the art partially synchronous byzantine agreement protocol – and compare it with the base HotStuff implementation under different adversarial scenarios. Our evaluation shows that while ACE is outperformed by HotStuff in the optimistic, synchronous, failure-free case, ACE has absolute superiority during network asynchrony and attacks.


1. Introduction

In practice, building reliable systems via state machine replication (SMR) requires resilience against all network conditions, including malicious attacks. The best way to model such settings is by assuming asynchronous communication links. However, due to the FLP result (fischer1982impossibility), deterministic asynchronous SMR solutions are impossible.

Two principal approaches are used to circumvent this result. The first is to assume partial synchrony (dwork1988consensus), in which protocols are designed to guarantee safety under worst-case network conditions, but are able to satisfy progress only during “long enough” periods of network synchrony. The vast majority of the protocols in this model follow the leader-based view-by-view paradigm due to their speed during synchronous attack-free network periods, and their relative simplicity. In fact, most deployed systems, several of which have become the de facto standards for building reliable systems (e.g., Paxos (lamport2001paxos), PBFT (pbft) and others (zookeeper; ongaro2014search; etcd; hotstuff; bessani2014state)), adopt this approach. The drawback of the partial synchrony model is that it fails to capture mobile network attacks (santoro1989time), leaving the leader-based view-by-view algorithms vulnerable. For example, an attacker can prevent progress by adaptively blocking the communication of one party (the leader of the current view) at a time.

The second approach to circumventing the FLP impossibility is to employ randomization (ben1983another; rabin1983randomized). The most commonly used strategy is to always satisfy safety properties, while relaxing liveness to guarantee eventual progress with probability approaching 1 under all network conditions. Potentially, protocols designed for the asynchronous communication model can operate at network speed, but unfortunately, they are rarely deployed in practice due to their complexity and overhead, and are mostly the focus of theoretical academic work.

Main contribution.

In this paper we combine the best of both approaches. We present ACE, a simple generic framework for asynchronous boosting, which converts consensus (also called agreement) algorithms designed according to the leader-based view-by-view paradigm in the partial synchrony model into randomized fully asynchronous SMR solutions. ACE is model agnostic: it adds no model assumptions of its own, and thus can be applied to any leader-based protocol in the byzantine or crash failure model. As a result, with ACE, a system designer benefits twofold: on the one hand, from the experience gained in decades of leader-based view-by-view algorithm design and engineered systems, and on the other hand, from a robust asynchronous solution.

View-by-view paradigm.

ACE is general and applicable to a family of consensus leader-based view-by-view protocols designed for the partially synchronous communication model (lamport2001paxos; pbft; hotstuff). Such protocols divide executions into a sequence of views, each with a designated leader. Every view is then further divided into two phases. First, the leader-based phase in which the designated leader tries to drive progress by getting all parties to commit its value. Then, when parties suspect that the leader is faulty, whether it is really faulty or due to asynchrony or network attacks, they start the view-change phase in which they exchange information to safely wedge the current view and move to the next one.

Technical contribution.

ACE’s first contribution is providing a formal characterization of the leader-based view-by-view protocols by defining a leader-based view (LBV) abstraction, which encapsulates the core properties of a single view and provides an API that allows de-coupling of the leader-based phase from the view-change phase. In the view-by-view paradigm, view-change phases are triggered with timers: parties start a timer at the beginning of the leader-based phase in each view and if the timer expires before the leader drives progress, parties move to the view-change phase. Indeed, if we instantiate the LBV abstraction with an implementation of a view of some view-by-view protocol and operate a sequence of these LBVs, each time invoking the leader-based phase, then timeout, and then invoke the view-change phase, then we end up with a variant of the view-by-view protocol that the LBV is instantiated with. See illustration in Figure 1.

Figure 1. Using a sequence of LBV instances to reconstruct a partially synchronous leader-based view-by-view protocol.
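The reconstruction illustrated in Figure 1 can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: `ToyLBV` is a hypothetical single-process stand-in for one view (not a real consensus protocol), the method names `engage` and `wedge_and_exchange` are borrowed from the names of the LBV properties, and a timer expiry is modeled by `engage` returning `None`.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

# Toy closing state: a value the next leader must re-propose, or None.
State = Optional[str]


@dataclass
class ToyLBV:
    """Hypothetical stand-in for a single view; not a real consensus protocol."""
    leader_value: str
    leader_responsive: bool
    _decided: Optional[str] = None
    _state: State = None

    def engage(self, state: State) -> Optional[str]:
        # Leader-based phase. Returning None models "the timer expired first".
        self._state = state
        if not self.leader_responsive:
            return None
        # A safe leader re-proposes the value carried by the closing state, if any.
        self._decided = state if state is not None else self.leader_value
        return self._decided

    def wedge_and_exchange(self) -> Tuple[State, Optional[str]]:
        # View-change phase: returns (closing state, decision or None).
        closing = self._decided if self._decided is not None else self._state
        return closing, self._decided


def run_view_by_view(views: Iterable[ToyLBV],
                     initial_state: State = None) -> Tuple[int, Optional[str]]:
    """Operate LBV instances one after another: engage, then wedge and exchange."""
    state = initial_state
    count = 0
    for count, lbv in enumerate(views, start=1):
        value = lbv.engage(state)                      # leader-based phase
        closing, exchanged = lbv.wedge_and_exchange()  # entered on decision or timeout
        decision = value if value is not None else exchanged
        if decision is not None:
            return count, decision
        state = closing  # the next view starts from this closing state
    return count, None
```

With two unresponsive leaders followed by a responsive one, `run_view_by_view` returns the third view number and that leader's value, which mirrors how a timer-driven protocol only makes progress once a view with a reachable leader comes up.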

ACE’s second technical contribution is a novel wave mechanism to control the trigger of view-change phases. Rather than using timers, the wave mechanism uses the API provided by the LBV abstraction to generically rearrange views in view-by-view protocols. The key idea is to compose several LBV abstractions in a way that allows progress at network speed during periods of asynchrony. This mechanism exploits a key property shared by all view-by-view protocols: If the leader of a view is correct and timers never expire, then eventually a decision will be made in this view.

ACE’s single-shot agreement protocol proceeds in a wave-by-wave manner, where each wave operates as follows. Instead of running one LBV instance (as view-by-view protocols do), a wave runs $n$ LBV instances (the leader-based phase) simultaneously, each with a distinct designated leader. Then, the wave performs a barrier synchronization in which parties wait until a quorum of the instances have reached a decision. Recall that by the key property, all instances with correct leaders eventually decide, so eventually the barrier is reached.

After the barrier is reached, one LBV instance is selected unpredictably and uniformly at random. The chosen instance “wins”, and all other instances are ignored. Then, parties use the LBV’s API to invoke the view-change phase in the chosen instance (only). The view-change phase here has two purposes. First, it boosts termination. If the chosen instance has reached a decision, meaning that a significantly large quorum of parties decided, then all correct parties learn this decision during the view-change phase. Second, as in every view-by-view protocol, the view-change phase ensures safety by forcing the leaders of the next wave to propose safe values.

The next wave enacts new LBV instances, each with a different leader that proposes a value according to the state returned from the view-change phase of the chosen instance of the previous wave. Note that since parties wait for a large quorum of LBV instances to reach a decision in each wave before randomly choosing one, the chosen LBV has a constant probability of having a decision, hence, together with the termination boosting provided by the view-change phase, we get progress, in expectation, in a constant number of waves.
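The constant-expectation argument above can be made concrete with a short calculation. The sketch below assumes, for illustration only, that a wave runs `n` instances, that at least `decided_at_barrier` of them have reached a decision when the barrier is passed, that the election is uniform and independent of which instances decided, and that waves succeed independently; the function names are hypothetical.

```python
from fractions import Fraction


def wave_success_probability(n: int, decided_at_barrier: int) -> Fraction:
    """Lower bound on the chance that the elected instance holds a decision."""
    if not 0 < decided_at_barrier <= n:
        raise ValueError("need 0 < decided_at_barrier <= n")
    # Uniform election over n instances, decided_at_barrier of which decided.
    return Fraction(decided_at_barrier, n)


def expected_waves(n: int, decided_at_barrier: int) -> Fraction:
    """Upper bound on the expected number of waves (geometric distribution)."""
    return 1 / wave_success_probability(n, decided_at_barrier)
```

For example, with `n = 4` and a barrier that waits for 3 decided instances, each wave succeeds with probability at least 3/4, so at most 4/3 waves are needed in expectation regardless of how large the system is, as long as the quorum stays a constant fraction of `n`.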

As to SMR, ACE implements a variant in which parties do not proceed to the next slot before they learn the decision value of the current one, but once they move to the next one they stop participating in the current slot and garbage collect all the associated resources. Deferring next slots until the current decision is known is essential for systems in which the validity of a value for a certain slot depends on all previous decision values (e.g., Blockchains). ACE’s SMR solution uses an instance of our single-shot protocol for every slot together with a forwarding mechanism to help slow parties catch-up.

Applications.

ACE can take any view-by-view consensus protocol designed for the partially synchronous model and transform it into an asynchronous SMR solution. In order to instantiate ACE with a specific algorithm, e.g., PBFT (pbft) or Paxos (lamport2001paxos), one must take the core of the algorithm logic (a single view) and wrap it with the LBV’s API, which provides a method to start the leader-based phase and a method to proceed to the view-change phase. We define the properties each LBV implementation has to satisfy, and argue that all existing leader-based view-by-view algorithms implicitly satisfy these properties. Therefore, instantiating ACE does not require implementing new logic beyond the engineering effort of providing its API. Furthermore, ACE’s modularity provides a clean separation of concerns between Safety (provided by the LBV properties) and asynchronous Liveness (provided by the framework). With ACE, designing a new fully asynchronous SMR system based on an existing partially synchronous consensus protocol, or updating a deployed system with a novel agreement protocol, requires only an LBV implementation.

An important feature of ACE is fairness. Due to the inherent unpredictable randomness in the election mechanism, ACE provides an equal chance for all correct parties to drive decisions during synchrony, and optimally bounds the probability of deciding a value proposed by a correct party during asynchrony. Another important feature of ACE is that it is model agnostic, namely it does not add any model assumptions on top of the assumptions made by the instantiated protocols. As a result, when instantiated with a BFT protocol such as PBFT (pbft) we get an asynchronous byzantine state machine replication, and when instantiated with a crash-failure solution like Paxos (lamport2001paxos) or Raft (ongaro2014search) we get the first asynchronous SMR system tolerating any minority of failures.

In order to make ACE model agnostic, we encapsulate the barrier synchronization and the random election mechanisms into abstractions as well, denoted barrier and leader-election respectively, and define their properties. Thus, when instantiating our framework, one also needs to provide implementations of these abstractions for the assumed model. In Section 6 we give an implementation example for the byzantine model.

Complexity and performance.

To demonstrate ACE, we choose to focus on the byzantine model as this is the model considered by Blockchain systems and we believe that, due to their high stakes, Blockchain systems will benefit the most from a generic asynchronous SMR solution that can tolerate network attacks. We implement ACE’s algorithms in C++ and instantiate the LBV abstraction with a variant of HotStuff (hotstuff) – a state of the art BFT solution, which is currently being implemented in several commercial Blockchain systems (libra). We compare the ACE instantiation to the base (raw) HotStuff implementation in different scenarios. Our evaluation shows that while base HotStuff outperforms ACE (instantiated with HotStuff) in the optimistic, synchronous, failure-free case, ACE has absolute superiority during asynchronous periods and network attacks. For example, we show that byzantine parties can hinder progress in base HotStuff by targeting leaders with a DDoS attack, whereas ACE manages to commit values at network speed.

From a theoretical perspective, ACE generalizes the idea in VABA (vaba), the first asymptotically optimal asynchronous byzantine agreement protocol, and adds only a factor of $O(n)$ in communication complexity over the leader-based phase in each view of the protocol it is instantiated with. Therefore, since no protocol can solve byzantine agreement with less than quadratic communication (dolev1983authenticated), ACE can leverage protocols with linear communication in the leader-based phases, like HotStuff (hotstuff), to reproduce the asymptotically optimal quadratic asynchronous byzantine agreement solution of VABA (vaba).

Roadmap.

The rest of the paper is organized as follows: Section 2 describes the model and formalizes the agreement and SMR problems. Section 3 gives an overview of the view-by-view paradigm, capturing its core properties and vulnerabilities. Section 4 defines ACE’s abstractions, and its algorithms are given in Section 5. Section 6 instantiates ACE and evaluates its performance. Finally, Section 7 discusses related work and Section 8 concludes.

2. Model and Problem Definitions

Section 2.1 formally defines the model, and Section 2.2 defines the Agreement and SMR problems.

2.1. System Model

Communication.

We consider a peer-to-peer system with $n$ parties, $f$ of which may fail. We say that a party is faulty if it fails at any time during an execution of a protocol. Otherwise, we say it is correct. In a peer-to-peer system every pair of parties is connected with a communication link. A message sent on a link between two correct parties is guaranteed to be delivered, whereas a message to or from a faulty party might be lost. A link between two correct parties is asynchronous if the delivery of a message may take an arbitrarily long time, whereas a link between two correct parties is synchronous if there is a bound on message delivery time. In asynchronous network periods all links among correct parties are asynchronous, whereas during synchronous network periods all such links are synchronous.

A standard communication model assumed by algorithms that follow the view-by-view paradigm is the partially synchronous model (dwork1988consensus), sometimes referred to as eventual synchrony in the literature. In this model, there is an unknown point in every execution, called global stabilization time (GST), which divides the execution into two network periods: before GST the network is asynchronous and after GST the network is synchronous. The partially synchronous model was defined to capture spontaneous network disconnections in wide-area networks, in which case it is reasonable to assume that asynchronous periods are short and synchronous periods are long enough for the protocols to make progress.

However, the partially synchronous model fails to capture malicious attacks that intentionally try to sabotage progress, and thus is not suitable for many current use cases (e.g., Blockchains). For example, one possible attack is the weakly adaptive asynchrony attack, in which an attacker adaptively blocks one party at a time from sending or receiving messages (e.g., via DDoS). This results in a mobile asynchrony that moves from party to party, violating the GST assumption made by the partially synchronous model, and thus prevents progress in all leader-based view-by-view algorithms.
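A toy simulation helps show why blocking a single party at a time is enough to stall a leader-based protocol. The sketch below rests on assumptions that are not part of the model above: leaders rotate round-robin, a view decides exactly when its leader is not blocked, and the names `views_until_decision` and `blocked_in_view` are hypothetical.

```python
from typing import Callable, Optional


def views_until_decision(n: int, max_views: int,
                         blocked_in_view: Callable[[int], int]) -> Optional[int]:
    """Return how many views pass until one decides, or None if none does.

    blocked_in_view(view) gives the index of the single party the attacker
    blocks during that view.
    """
    for view in range(max_views):
        leader = view % n  # assumed round-robin leader rotation
        if blocked_in_view(view) != leader:
            return view + 1  # an unblocked leader drives a decision
    return None


def static_attacker(view: int) -> int:
    return 0  # always blocks party 0


def make_adaptive_attacker(n: int) -> Callable[[int], int]:
    # Follows the rotation and always blocks the current leader.
    return lambda view: view % n
```

Under these assumptions a static attacker delays progress by only one view, whereas the adaptive attacker blocks every view while disturbing only one party at any given time.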

ACE, in contrast, assumes the fully asynchronous communication model, and thus progresses at network speed under all network conditions and attacks, as long as messages among correct parties are eventually delivered.

Failures, cryptography, et cetera.

As mentioned in the Introduction and explained in more detail below, ACE abstracts away specific model assumptions and implementation details into three primitives: leader-based view (LBV), leader-election, and barrier. In Section 4, we define the properties of these primitives and require that any leader-based view-by-view protocol that is instantiated into our framework satisfies them. To satisfy these properties, each protocol may have different model assumptions: for example, the relation between $n$ and $f$, the type of failures that may occur (e.g., crash and byzantine), and cryptographic assumptions. ACE inherits the specific assumptions made by each of the protocols it is instantiated with, and adds nothing to them. In other words, whatever assumptions are made by the instantiated protocol in order to satisfy the abstractions’ properties are exactly the assumptions under which ACE operates.

2.2. Problem Definition

We now formally define the problems ACE implements. We start with a fair validated single-shot agreement definition, and then define the SMR problem which is a generalization of the single shot agreement into a multi-shot problem.

Fair validated single-shot agreement.

The fair validated agreement (vaba; Cachin2000RandomOI; CachinSecure) is a single-shot problem in which correct parties propose externally valid values and agree on one unique such value. The formal properties are given below:

  • Agreement: All correct parties that decide, decide on the same value.

  • Termination: If all correct parties propose valid values, then all correct parties decide with probability 1.

  • Validity: If a correct party decides on a value $v$, then $v$ is externally valid.

Note that the agreement and termination properties are not enough by themselves to guarantee real progress of any multi-shot agreement system (e.g., Blockchain) that is built on top of the single-shot problem. Without external validity, parties are allowed to agree on some pre-defined value (i.e., $\bot$) (mostefaoui2017signature), which is basically an agreement not to agree. Moreover, as long as a value satisfies the system’s external validity condition (e.g., no contradicting transactions in a blockchain system), parties may decide on this value even if it was proposed by a byzantine party. However, since high stakes are involved and byzantine parties may try to increase the ratio of decision values proposed by them, we require an additional fairness property that is a generalization of the quality property defined in (vaba):

  • Fairness: The probability for a correct party to decide on a value proposed by a correct party is at least . Moreover, during synchronous periods, all correct parties have an equal probability of for their values to be chosen.

Intuitively, note that by simply following the protocol byzantine parties can have a probability of (recall that of the parties are byzantine) for their value to be chosen in every protocols even during synchronous periods. And since during asynchronous periods the adversary can, in addition, block of the correct parties, we get that byzantine parties can increase their probability to . Meaning that the fairness property we require is optimal.

Fair state machine replication.

State machine replication (SMR), sometimes referred to as atomic broadcast (CachinOPODIS), is a generalization of the single-shot agreement problem into a multi-shot agreement system (lamport2001paxos). Informally, an SMR system agrees on a (possibly infinite) sequence of valid values. Formally, every correct party gets as input a (possibly infinite) sequence of externally valid values, and outputs a sequence of tuples of the form $\langle s, v \rangle$, where $s \in \{1, \ldots, k\}$ for an arbitrarily large $k$ and $v$ is an (externally valid) value. Note that each output event contains exactly one such tuple, and in the rest of the paper we sometimes refer to the sequence number $s$ as a slot number. An implementation of a fair SMR must satisfy the following properties:

  • Integrity: For every $s$, a correct party outputs at most one tuple $\langle s, v \rangle$.

  • Validity: For every $s$, if a correct party outputs a tuple $\langle s, v \rangle$, then $v$ is externally valid.

  • Termination: For every $s$, all correct parties eventually output a tuple $\langle s, v \rangle$ for some $v$ with probability 1.

  • Agreement: For every $s$, if two correct parties output $\langle s, v \rangle$ and $\langle s, v' \rangle$, then $v = v'$.

  • Fairness: For every , the probability for a correct party to output s.t. was proposed by a correct party is at least .

In order to capture the requirements made by systems like Blockchains, in which the validity of a value proposed in slot $s$ depends on all the decision values from slots 1 to $s-1$, we add an additional property to our SMR definition:

  • FIFO: For every $s$, if a correct party outputs a tuple $\langle s, v \rangle$ for some $v$, then for every $s' < s$ it previously outputted $\langle s', v' \rangle$ for some $v'$.
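The Integrity, Agreement and FIFO properties can be checked mechanically on recorded outputs. The following sketch is a hypothetical test helper, not part of ACE: it assumes each party's outputs are logged as a list of (slot, value) pairs in output order, with slots numbered from 1, and it reports every violation it finds.

```python
from typing import Dict, Hashable, List, Tuple

Event = Tuple[int, Hashable]  # (slot number, decided value)


def check_smr_outputs(logs: Dict[str, List[Event]]) -> List[Tuple[str, str, int]]:
    """Return a list of (property, party, slot) violations; empty if none."""
    violations = []
    agreed = {}  # slot -> first value seen for that slot across all parties
    for party, events in logs.items():
        seen = set()
        for slot, value in events:
            # Integrity: at most one tuple per slot per party.
            if slot in seen:
                violations.append(("Integrity", party, slot))
            # FIFO: every earlier slot must already have been output.
            if any(earlier not in seen for earlier in range(1, slot)):
                violations.append(("FIFO", party, slot))
            seen.add(slot)
            # Agreement: all parties output the same value for a slot.
            if slot in agreed and agreed[slot] != value:
                violations.append(("Agreement", party, slot))
            agreed.setdefault(slot, value)
    return violations
```

Validity and Termination are deliberately left out: the former needs the application's external validity predicate, and the latter is a liveness property that cannot be decided from a finite log.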

Moreover, from a theoretical point of view we usually only care about the total resources consumed by correct parties up to the point when they all decide, ignoring the resources used from this point on. From a practical point of view, however, systems wish to garbage collect all resources allocated for a specific slot immediately after a decision for this slot has been made. Since it is not always straightforward to map resources to the slots in which they are consumed, we capture the above by requiring the following property:

  • Strong halting: For any arbitrary large , eventually all resources are de-allocated.

Note that the standard way to achieve halting is by the so-called state transfer, in which parties reliably broadcast the decision value of each slot to all parties before de-allocating all the associated resources and moving to the next slot. This approach requires quadratic communication and is the approach we take in this paper. Dolev and Strong (dolev1983authenticated) have shown that even in a synchronous setting this quadratic communication is unavoidable, so at least asymptotically, this does not introduce a new overhead.

3. The View-by-View Paradigm

Many (if not all) practical agreement and consensus algorithms follow a leader-based view-by-view paradigm designed for partially synchronous models, including the seminal work of Dwork et al. (dwork1988consensus) that pioneered the approach, and underlying classical algorithms like Paxos (lamport2001paxos), Viewstamped-Replication (oki1988viewstamped), PBFT (pbft), and others (zyzzyva; ongaro2014search).

Protocols designed according to the view-by-view paradigm advance in views. Every view has a designated leader that proposes a value and tries to convince other parties to decide on it. In order to prevent faulty leaders from halting progress forever, parties use timers to measure leader progress; if no progress is made they demote the leader, abandoning the current view and proceeding to the next one.

The main problem with this approach is that a faulty leader that does not send any messages is indistinguishable from a correct leader with asynchronous links. Therefore, protocols implementing this approach are not able to guarantee progress during asynchronous periods or weakly adaptive asynchronous attacks since parties advance views before correct leaders are able to drive decisions.

3.1. Core properties

Safety.

Perhaps the most important property of algorithms designed according to the view-by-view paradigm is their ability to satisfy safety during arbitrarily long asynchronous periods. This is achieved via a careful view-change mechanism that governs the transition between views. View-change consists of parties wedging the current view by abandoning the current leader, and exchanging information about what might have been committed in the view (the closing state of the view). In the new view, parties participate in the new leader’s phase only if it proposes a value that is safe in accordance with the closing state.

Liveness.

Algorithms that rely on leaders to drive progress cannot guarantee progress during asynchronous periods since they cannot distinguish between faulty leaders and correct ones with asynchronous links. During asynchronous periods, messages from the current leader may be delivered only after parties timeout and move to the next view regardless of how conservative the timeouts are set.

However, all these algorithms share an important property that our framework utilizes: for every view, if the leader of the view is correct and no correct party times out and abandons this view, then all correct parties decide in this view.

3.2. Practical Vulnerabilities

Deploying view-by-view algorithms requires tuning the leader timeouts. On the one hand, aggressive timeouts set close to the common network delay might cause correct leaders to be demoted due to spurious delays, and destabilize the system. On the other hand, conservative timeouts imply delayed actions in case of faulty leaders. They further open the system to possible attacks by byzantine leaders that slow system progress to the maximum possible without triggering a timeout.
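The tension between the two timeout regimes can be illustrated with a small calculation. The sketch below is a simplified model under stated assumptions: `correct_leader_delays` is a hypothetical sample of how long correct leaders took to drive a decision, a leader is demoted exactly when its delay exceeds the timeout, and a faulty or deliberately slow leader can stall a view for at most the timeout itself.

```python
from typing import Dict, Sequence


def timeout_tradeoff(correct_leader_delays: Sequence[float],
                     timeout: float) -> Dict[str, float]:
    """Summarize the cost of a given timeout on a sample of observed delays."""
    if not correct_leader_delays:
        raise ValueError("need at least one observed delay")
    # Correct leaders that would have been demoted by this timeout.
    demoted = sum(1 for delay in correct_leader_delays if delay > timeout)
    return {
        "spurious_demotion_rate": demoted / len(correct_leader_delays),
        # A faulty or slow leader can waste up to one full timeout per view.
        "max_stall_per_faulty_view": timeout,
    }
```

On the sample `[0.1, 0.2, 0.5, 1.5]`, a timeout of 0.3 demotes half of the correct leaders, while a timeout of 2.0 demotes none but lets each faulty leader stall the system for up to 2.0 time units.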

Another attack on the progress of leader-based protocols is the weakly adaptive asynchrony attack, in which an attacker blocks communication with the leader of each view until the view expires, e.g., via a distributed denial-of-service attack. Last, a carefully executed adaptive asynchrony attack can cause a fairness bias. Some leaders (possibly byzantine) may be allowed to progress and commit their values, whereas an attacker blocks communication with other designated (possibly all correct) leaders. In Section 6, we demonstrate the above attacks, and show that ACE is resilient against them.

4. Framework abstractions

ACE provides “asynchronous boosting” for partially synchronous protocols designed according to the leader-based view-by-view paradigm. In a nutshell, ACE takes such a protocol, encapsulates a single view of the protocol into a leader-based view (LBV) abstraction that provides an API to avoid timeouts, composes LBVs into a wave of instances running in parallel, interjects auxiliary actions in between successive waves, and chooses one LBV instance retrospectively at random. A detailed description is given in the next section. Section 4.1 defines the leader-based view (LBV) abstraction and Section 4.2 introduces auxiliary abstractions utilized by ACE.

4.1. Encapsulating view-based agreement protocols

As explained above, each view in a leader-based view-by-view algorithm consists of two phases: First, all parties wait for the leader to perform the leader-based phase to drive a decision on some value $v$, and then, if the leader fails to do it fast enough, parties switch to the view-change phase in which they wedge the current leader and exchange information in order to get the closing state of the view. To decide when to switch between the phases, existing algorithms use timeouts, which prevent them from guaranteeing progress during asynchronous periods. Therefore, in order to boost asynchronous liveness, ACE replaces the timeout mechanism with a different strategy to switch between the phases. To this end, the LBV abstraction exposes an API with two methods, engage and wedge&exchange, where engage starts the first phase of the view (leader-based), and wedge&exchange switches to the second (view-change). By exposing an API with these two methods, we remove the responsibility of deciding when to switch between the phases from the view (e.g., no more timeouts inside a view) and give it to the framework, while still preserving all safety guarantees provided by each view in a leader-based view-by-view protocol.

Every instance of the LBV abstraction is parametrized with the leader’s name and with an identifier, which contains information used by the high-level agreement algorithm built (by the framework) on top of a composition of LBV instances. The wedge&exchange method gets no parameters and returns a tuple $\langle s, v \rangle$, where $v$ is either a value or $\bot$, and $s$ is the closing state of the instance, which consists of all the necessary information required by the specific implementation of the abstraction (e.g., a safe value for a leader to propose and a validation function all parties use to check if the proposed value is safe). The engage method gets the “closing state” that was returned from wedge&exchange in the preceding LBV instance (or the initial state in case this is the first one), and outputs a value $v$. Intuitively, the returned value from both methods is the “decision” that was made in the LBV instance, but as we explain below, the high-level agreement algorithm might choose to ignore this value.

The safety of view-by-view algorithms strongly relies on the fact that correct parties start a new view with the closing state of the previous one. Otherwise, they cannot guarantee that correct parties that decide in different views decide on the same value. Therefore, when we encapsulate a single view in our LBV abstraction and define its properties, we consider only executions in which the LBV instances are composed one after another. Formally, we say that the LBV abstractions are properly composed by a party $p$ in an execution if $p$ invokes the engage of the first instance with some fixed initial state (which depends on the instantiated protocol), and for every instance $k > 1$, $p$ invokes its engage with the state output of wedge&exchange of instance $k-1$. In addition, we say that the LBV abstractions are properly composed in an execution if they are properly composed by all correct parties. Figure 2 illustrates LBV’s API and its properly composed execution.

Figure 2. A properly composed execution: The engage method of instance $k+1$ gets the state output of the wedge&exchange method of instance $k$.
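Proper composition, as illustrated in Figure 2, is a condition on the order and arguments of API calls, so it can be checked on a recorded trace. The sketch below is a hypothetical checker for a single party: it assumes the trace is a list of `(operation, instance, state)` triples, where the operation names `"engage"` and `"wedge&exchange"` follow the LBV property names, instances are numbered from 1, and `state` is the state passed to engage or returned by wedge&exchange.

```python
from typing import Any, List, Tuple

TraceEvent = Tuple[str, int, Any]  # (operation, instance number, state)

_MISSING = object()  # marks "the previous instance has not been wedged yet"


def is_properly_composed(trace: List[TraceEvent], initial_state: Any) -> bool:
    """Check that each engage received the closing state of the previous instance."""
    closing = {}  # instance number -> state returned by its wedge&exchange
    for operation, instance, state in trace:
        if operation == "wedge&exchange":
            closing[instance] = state
        elif operation == "engage":
            if instance == 1:
                expected = initial_state
            else:
                expected = closing.get(instance - 1, _MISSING)
            if state != expected:
                return False
        else:
            raise ValueError(f"unknown operation: {operation!r}")
    return True
```

A trace in which instance 2 is engaged with the initial state rather than with the closing state of instance 1, or before instance 1 was wedged at all, is rejected.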

The formal definition of the LBV abstraction is as follows:

Definition 0 ().

A protocol implements an LBV abstraction if the following properties are satisfied in every properly composed execution that consists of a sequence of LBV instances:

Liveness:
  • Engage-Termination: For every instance with a correct leader, if all correct parties invoke engage and no correct party invokes wedge&exchange, then engage invocations by all correct parties eventually return.

  • Wedge&Exchange-Termination: For every instance, if all correct parties invoke wedge&exchange then all wedge&exchange invocations by correct parties eventually return.

Safety:
  • Validity: For every instance, if an engage or wedge&exchange invocation by a correct party returns a value $v$, then $v$ is externally valid.

  • Completeness: For every instance, if invocations by correct parties return, then no invocation by a correct party returns a value .

  • Agreement: If an engage or wedge&exchange invoked in some instance by a correct party returns a value $v$ and some other engage or wedge&exchange invoked in some instance by a correct party returns a value $v'$, then $v = v'$.

Note that during the view-change phase in most leader-based protocols, parties send the closing state only to the leader of the next view. However, in ACE, since we run concurrent LBV instances, each with a different leader, we need all parties to learn the closing state after wedge&exchange returns. Moreover, as mentioned above and captured by the Completeness property, we also use wedge&exchange to boost decisions in order to guarantee that if the retrospectively chosen LBV instance successfully completed the first (leader-based) phase, then all correct parties decide at the end of its second phase. Therefore, when encapsulating the view-change mechanism of a leader-based protocol into the wedge&exchange method, a small change has to be made in order to satisfy the above properties. Instead of sending the closing state only to the next leader, parties need to exchange information by sending the closing state to all parties and waiting to receive such messages. No change is needed to the first phase of the encapsulated leader-based protocol since all the required properties for engage are implicitly satisfied.

4.2. Auxiliary abstractions

We now define two additional abstractions required by ACE. Similarly to the LBV abstraction, each instance is parametrized with an identification , and the implementation details and model assumptions are abstracted away.

Barrier.

The barrier abstraction is used to synchronize between parties, ensuring that correct parties wait for each other before progressing. The abstraction exposes two API methods: and . Below we define the properties each barrier implementation must satisfy:

Definition 0 ().

An implementation of the Barrier abstraction must satisfy the following properties:

  • B-Coordination: No invocation by a correct party returns before correct parties invoke .

  • B-Termination: If all correct parties invoke then all invocations by correct parties eventually return.

  • B-Agreement: If some invocation by a correct party returns, then all invocations by correct parties eventually return.

Leader-election.

Our Leader-election abstraction is similar to the one defined in (vaba), which exposes one operation, , to elect a unique leader. The formal properties are given below.

Definition 0 ().

An implementation of the Leader-election abstraction must satisfy the following properties:

  • L-Termination: If correct parties invoke , then all invocations by correct parties return.

  • L-Agreement: All invocations of by correct parties that return, return the same party.

  • L-Validity: If an invocation of by a correct party returns, it returns a party with probability for every .

  • L-Unpredictability: The probability of the adversary to predict the returned value of an invocation by a correct party before any correct party invokes is at most .

In Section 6 we give example implementations of these abstractions in the byzantine failure model.

5. Framework algorithms

In this section we present ACE’s asynchronous boosting algorithms, which are built on top of the abstractions defined above. We first present in Section 5.1 an algorithm for an asynchronous single-shot agreement, and then, in Section 5.2 we show how to turn it into an asynchronous SMR. For completeness, in Section 5.3, we show how to use the LBV abstraction to reconstruct a variant of the base partially synchronous view-by-view algorithm the LBV is instantiated with.

5.1. Asynchronous fair single-shot agreement

The pseudocode for the asynchronous single-shot agreement protocol appears in Algorithm 1 and a formal correctness proof is given in Section 5.1.1. An invocation of the protocol (SS-propose) gets an initial state and identification , where the initial state contains all the initial specific information (including the proposed value) required by the leader-based view-by-view protocol instantiated in the LBV abstraction.

1:upon  SS-propose(id,S) do
2:       ;
3:      while true do
4:            
5:            
6:            if  and did not decide before then
7:                 decide             
8:            
9:                   
10:
11:procedure wave()
12:      for all  do
13:            invoke non-blocking       
14:      
15:      
16:      return
17:
18:upon   returns  do
19:      send “ID, engage-done” to party
20:
21:upon  receiving “ID,engage-done” messages  do
22:      invoke
Algorithm 1 Asynchronous single-shot agreement.

The protocol proceeds in a wave-by-wave manner. The state is updated at the end of every wave and a decision is made the first time a wave returns a non-empty value. In every wave, each party first invokes the engage operation in n LBV instances, each with a different leader. Each engage invocation gets the state obtained at the end of the previous wave or the initial state if this is the first wave.

Then, parties invoke and wait for it to return. Recall that by the B-Coordination property, returns only after correct parties invoke . When an invocation in an LBV instance with leader returns, a correct party sends an “engage-done” message to party , and whenever a party gets such messages it invokes . Denote an LBV instance as successfully completed when correct parties completed the first phase, i.e., their returned, and note, therefore, that a correct party invokes only after the LBV instance in which it acts as the leader was successfully completed. Thus, a invocation by a correct party returns only after LBV instances successfully completed.

Figure 3. Asynchronous single-shot algorithm. The chosen LBVs, which are marked in green, are properly composed.

Next, when the returns, parties elect a unique leader via the leader-election abstraction, and further consider only its LBV instance. Note that since parties wait until LBV instances have successfully completed before electing the leader, with a constant probability of the parties elect a successfully completed instance, and even an adaptive adversary has no power to prevent it. (Footnote 3: This can be improved to in the byzantine case with if parties attach completeness proofs to messages.)

Finally, all parties invoke wedge&exchange in the elected LBV instance to wedge it and find out what happened in its first phase, using the returned state for the next wave and possibly receiving a decision value. By the Completeness property of LBV, if a successfully completed LBV instance is elected, then all wedge&exchange invocations by correct parties return a non-empty value, and thus all correct parties decide in this wave. Therefore, all correct parties decide after a small number of waves in expectation. Note that the sequence of chosen LBV instances forms a properly composed execution, and thus, since parties return only values returned from chosen LBVs, our algorithm inherits its safety guarantees from the leader-based protocol the LBV is instantiated with. An illustration of the algorithm appears in Figure 3.
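The control flow of a single party can be pictured with a deliberately simplified, single-process sketch. Everything in it is an assumption made for illustration: ToyLBV, wave and single_shot are hypothetical names, the barrier is omitted, leader election is replaced by a local pseudo-random draw, and whether an instance completed is supplied by the caller instead of being produced by a real leader-based protocol. It only shows how waves repeat until the elected instance yields a non-empty value.

```python
import random
from dataclasses import dataclass
from typing import Optional, Set, Tuple

BOTTOM = None  # stands in for the empty value returned when nothing was decided


@dataclass
class ToyLBV:
    """Hypothetical stand-in for one LBV instance in one wave."""
    leader: int
    completed: bool  # assumption: supplied by the caller, not computed by a protocol

    def wedge_and_exchange(self, state: int) -> Tuple[int, Optional[str]]:
        # Always returns an updated state; returns a value only if the
        # leader-based phase of this instance completed.
        value = f"value-of-{self.leader}" if self.completed else BOTTOM
        return state + 1, value


def wave(n: int, state: int, completed: Set[int],
         rng: random.Random) -> Tuple[int, Optional[str]]:
    # One LBV instance per party, each with a different leader.
    instances = [ToyLBV(leader=j, completed=(j in completed)) for j in range(n)]
    # The barrier is omitted here; the election is a local random draw.
    elected = rng.randrange(n)
    # Only the elected instance is wedged and inspected.
    return instances[elected].wedge_and_exchange(state)


def single_shot(n: int, completed: Set[int], seed: int) -> Tuple[str, int]:
    """Run waves until one returns a non-empty value; return (value, waves)."""
    if not any(j in completed for j in range(n)):
        raise ValueError("no instance can ever decide in this toy model")
    rng = random.Random(seed)
    state, waves = 0, 0
    while True:
        waves += 1
        state, value = wave(n, state, completed, rng)
        if value is not BOTTOM:
            return value, waves
```

For example, single_shot(4, {2}, seed=7) keeps running waves until party 2 is elected, while single_shot(4, {0, 1, 2, 3}, seed=1) decides in the first wave because every instance completed.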

5.1.1. Correctness Proof

We prove that Algorithm 1 satisfies Agreement, Termination, Validity, and Fairness properties in the asynchronous communication model. We start by proving the Agreement and Validity properties.

Observation 1 ().

The chosen LBV instances in Algorithm 1 form a properly composed execution.

Lemma 0 ().

Algorithm 1 satisfies Validity and Agreement.

Proof.

By the code, correct parties only decide on values returned from a method invoked on one of the chosen instances. Therefore, by Observation 1, the Validity and Agreement properties follow from the Safety properties required by the LBV abstraction.

We now prove that Algorithm 1 satisfies Termination. We start by showing that no honest party is stuck forever in a wave.

Claim 1 ().

Consider a wave that all correct parties start, then at least one invocation by a correct party eventually returns.

Proof.

Assume by way of contradiction that no () invocation by a correct party ever returns in wave . Thus, no correct party ever invokes wedge&exchange in wave . Therefore, since all correct parties invoke engage in all LBV instances in wave , we get by the Engage-Termination property that all engage invocations by correct parties in instances with correct leaders eventually return, and thus all correct parties eventually send engage-done messages to all correct leaders. Hence, all correct leaders eventually get engage-done messages, and thus eventually invoke . The contradiction follows from the B-Termination property.

Claim 2 ().

For every wave , if all correct parties start wave , then all correct parties eventually complete wave .

Proof.

By Claim 1, some invocation by a correct party eventually returns in wave . Thus, by the B-Agreement property, all invocations by correct parties eventually return in wave , and thus all correct parties eventually invoke in wave . By the L-Termination property, all invocations by correct parties eventually return in wave . Therefore, the Claim follows from the Wedge&Exchange-Termination property.

The next corollary follows by inductively applying Claim 2.

Corollary 0 ().

For every , all parties eventually complete wave in Algorithm 1.

For the rest of the proof we say that an LBV instance is completed if at least invocations by correct parties previously returned. We next bound the probability to choose a leader of a completed LBV instance.

Claim 3 ().

For every wave , at least LBV instances with correct leaders are completed before some correct party invokes .

Proof.

Consider some correct party that invokes at wave . By the code, its invocation was previously returned, and thus by the B-Coordination property, at least correct parties previously invoked . Therefore, at least correct parties received engage-done messages, at least of which are from correct parties. Since correct parties send engage-done messages to party only after their invocation in the LBV instance in which acts as leader returns, at least LBV instances with correct leaders are completed before some correct party invokes .

Claim 4 ().

Consider a wave that all parties start, the probability for all correct parties to decide at wave is at least .

Proof.

By Corollary 2, all correct parties invoke in wave and, by the L-Agreement property, all invocations return the same leader. By Claim 3, at least LBV instances are completed before some correct party invokes at wave . Therefore, by the L-Validity and L-Unpredictability properties, we get that the probability to choose a leader of a completed LBV instance is at least . By the Completeness property of the LBV abstraction, if a completed LBV instance is chosen, then no wedge&exchange invocation by a correct party returns ⊥. Therefore, by the code, all correct parties decide at the end of wave with probability .

Lemma 0 ().

Algorithm 1 satisfies Termination.

Proof.

By Corollary 2, correct parties are never stuck indefinitely in any wave, and thus all correct parties eventually start all waves. Therefore, by Claim 4, all correct parties eventually decide with probability 1. Moreover, the expected number of waves after which all correct parties decide is .
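The step from a per-wave success probability to termination with probability 1 is a standard geometric argument. As a sketch, let W be the number of waves until all correct parties decide, and let p be a lower bound on the probability of deciding in a wave, conditioned on everything that happened before it (both symbols are introduced here for illustration only):

```latex
\Pr[W > k] \le (1-p)^{k} \quad \text{for all } k \ge 0,
\qquad
\mathbb{E}[W] \;=\; \sum_{k \ge 0} \Pr[W > k]
\;\le\; \sum_{k \ge 0} (1-p)^{k} \;=\; \frac{1}{p}.
```

Since (1-p)^k tends to 0 as k grows, the probability of never deciding is 0. As a numeric illustration, p = 2/3 would bound the expectation by 1.5 waves.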

Lemma 0 ().

Algorithm 1 satisfies Fairness.

Proof.

By the Completeness property of the LBV abstraction, if a completed instance is chosen, then all correct parties decide at the end of the wave. By Claim 3, for every wave , at least LBV instances with correct leaders are completed before a unique instance is chosen in the wave. Therefore, even if all LBV instances with faulty parties have been completed as well, the probability to decide on a value proposed by a correct party is at least . Moreover, during synchronous periods, all instances with correct parties complete with equal probability, and thus by L-Validity, all correct parties have an equal probability for their values to be chosen.

5.2. Asynchronous fair state machine replication

The pseudocode for the asynchronous fair state machine replication appears in Algorithm 2. The parameter S, passed to the SMR-propose invocation, is a vector consisting of an initial state for each slot. We use an instance of the asynchronous single-shot agreement to agree on the value of each slot, and thus the SMR algorithm inherits the Integrity, Fairness, Validity and Agreement properties from the single-shot one. In order to satisfy FIFO, parties do not advance to the next slot until they learn the decision of the current one, and in order to satisfy Halting they de-allocate all resources associated with the current slot and abandon the slot’s single-shot algorithm once they move. Note that abandoning a single-shot algorithm might violate its Termination because it relies on the participation of all correct parties. Therefore, to satisfy the Termination of the SMR, we use a forwarding mechanism to reliably broadcast the decision value before abandoning the current slot and moving to the next one.

Since ACE is model agnostic, parties do not explicitly use threshold signatures and decision proofs in the forwarding mechanism. Instead, in every slot, correct parties wait until they either decide on a value (via the slot’s single-shot agreement) or receive “decide” messages from f+1 parties claiming they have decided on a value v. In the second case, the receiving party knows that at least one correct party decided v, even in the byzantine failure model. Then, to make sure that all correct parties eventually finish waiting even though some might have moved to the next slot already, parties use the barrier abstraction, which provides the required guarantee with its B-Coordination property. Note that although the above forwarding mechanism works for both the byzantine and crash failure models, in the latter case the forwarding mechanism can be simplified: a party only needs to echo its decision value before it moves to the next slot. So in this case, the gray lines in the pseudocode can be dropped.

1:upon  SMR-propose(S) do
2:      
3:      for every  do
4:            
5:            ;       
6:      while true do
7:            
8:            
9:            wait until
10:            send “slot, ” to all parties
11:            
12:            
13:            free all resources associated with
14:            output       
15:
16:upon  decide  do
17:      ;
18:
19:upon receiving “slot, ” from party  do
20:      
21:      if  s.t.  then
22:            ;       
23:
Algorithm 2 Asynchronous fair state machine replication.
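The waiting condition of the forwarding mechanism can be sketched as a small bookkeeping structure. This is an illustration under stated assumptions, not the authors' code: DecisionForwarder and its method names are hypothetical, the f + 1 threshold is inferred from the remark that at least one claimant must be correct, and networking and the barrier are left out entirely.

```python
from collections import defaultdict
from typing import Dict, Hashable, Optional, Set, Tuple


class DecisionForwarder:
    """Tracks, per slot, which parties claim to have decided which value."""

    def __init__(self, f: int) -> None:
        self.f = f  # assumed upper bound on the number of faulty parties
        self.claims: Dict[Tuple[int, Hashable], Set[int]] = defaultdict(set)
        self.decided: Dict[int, Hashable] = {}

    def on_local_decide(self, slot: int, value: Hashable) -> None:
        # Decision obtained from the slot's own single-shot agreement.
        self.decided.setdefault(slot, value)

    def on_claim(self, slot: int, value: Hashable, sender: int) -> Optional[Hashable]:
        # Senders are kept in a set, so repeated claims from one party count once.
        self.claims[(slot, value)].add(sender)
        # With f + 1 distinct claimants, at least one of them is correct.
        if len(self.claims[(slot, value)]) >= self.f + 1:
            self.decided.setdefault(slot, value)
        return self.decided.get(slot)
```

With f = 1, a single party repeating its claim never suffices; the value is adopted only once a second, distinct party claims the same value for the same slot.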

5.3. Partially Synchronous view-by-view Agreement

For completeness, we show how to use the LBV abstraction to reconstruct a variant of the leader-based view-by-view partially synchronous base Agreement algorithm that the LBV abstraction is instantiated with. To this end, we assume the base algorithm provides two methods: getLeader and getTimeout. These methods should implement the logic used by the base algorithm to map designated leaders to views and set their timeouts, respectively. In particular, getLeader(v) gets a view and returns a party , and getTimeout(v, S) gets a view together with a state and returns a timeout.

The pseudocode appears in Algorithm 3. The protocol proceeds in views. Timeouts are used in order to demote a leader who was unable to drive progress. At the beginning of every view, parties first get the leader and the timeout of the current view, and then invoke engage in the leader’s LBV instance and start a timer to monitor the leader’s progress. If the engage invocation returns a value before the timer expires, then that value is decided. In any case, whether the invocation returns or the timer expires, wedge&exchange is invoked in order to update the state and safely proceed to the next view.

Note that the algorithm forms a properly composed execution of the LBV instances, and thus Validity and Agreement are trivially satisfied by the Safety properties required by the LBV abstraction. As with any protocol in the partially synchronous model, the termination of the algorithm requires a long enough synchronous period in which all correct parties execute the same view.

1:upon  ES-propose(id,S) do
2:      ;
3:      while true do
4:            
5:            
6:            invoke
7:            invoke a timer to expire in
8:            wait for timer to expire or to return
9:            if  returned  then
10:                 decide             
11:            
12:                   
13:
Algorithm 3 Reconstruction of base partially synchronous single-shot agreement: protocol for party .

6. ACE Instantiation

There are many possible ways to instantiate the ACE framework. We choose to evaluate ACE in the byzantine failure model with parties and a computationally bounded adversary due to the attention it gets in the Blockchain use-case. For the LBV abstraction, we implement a variant of HotStuff (hotstuff). For the leader-election we implement the protocol in (vaba; Cachin2000RandomOI), and for the Barrier we give an implementation that operates in the same model. All protocols use a BLS threshold signature scheme (boneh2001short) that requires a setup, which can be done with the help of a trusted dealer or by using a protocol for asynchronous distributed key generation (kokorisbootstrapping).

Our evaluation compares the performance of ACE’s SMR instantiated with HotStuff, which we refer to as ACE HotStuff, with the base HotStuff SMR implementation. To compare apples to apples, the base HotStuff and ACE HotStuff share as much code as possible. We present raw performance comparisons during synchronous periods, and demonstrate the performance during asynchrony and under adversarial attacks.

We proceed to describe our implementation of the ACE building blocks in Section 6.1, and in Section 6.2, we describe the environmental setup and performance measurements.

6.1. Implementation

We implemented all algorithms in C++, and made use of a BLS threshold signature (boneh2001short) implementation provided in (concordOpenSource). Communication is done over TCP to provide reliable links. We next describe our LBV, barrier, and leader election implementations. The communication complexity of a single LBV is linear and that of the barrier and leader-election is quadratic, leading to an expected total quadratic communication, for each slot.

Lbv.

The instantiation of the LBV abstraction is the four-step view-by-view algorithm of HotStuff (hotstuff). In the leader-based phase of each view in HotStuff, a leader drives a decision in four steps of communication such that in every step the leader sends a signed message (with the proposed value) to all parties, which in turn verify, sign, and send it back to the leader. To ensure linear communication, the leader utilizes threshold signatures (boneh2001short) to provide concise proofs of a quorum of signatures, which are used for verification. If parties time out before the leader completes all four steps, they start the view-change phase, in which they send the closing state, which consists of the messages they received in this view, to the leader of the next one.

To encapsulate the HotStuff protocol in the LBV abstraction we did the following: when a party invokes engage, it begins participating in the leader’s four-step protocol as described above, and whenever it gets the last step’s message, it returns the value therein. To verify the safety of the first-step message, it uses the information in the state parameter passed to engage. Note that if wedge&exchange is invoked before engage returns, then engage never returns. In this case, parties behave as if their timeouts expired in the original HotStuff, with the following change: instead of sending the closing state only to the next leader, parties send it to all parties, since all parties act as leaders in the next wave. When a party gets such closing states, it updates the state parameter accordingly and outputs it. In addition, if at least one of the received closing states contains a valid fourth-step message, then the party also outputs the value therein. Otherwise, it outputs ⊥. Note that we do not describe the specific logic of the four-step protocol and the structure of the state; the interested reader is referred to (hotstuff) for more details.

Since the implementation keeps the safety logic of HotStuff, the Validity and Agreement of LBV are satisfied. Since engage implements the HotStuff leader’s phase, if the leader is correct and no correct party invokes wedge&exchange, then eventually all parties get the leader’s fourth-step message and return, thereby satisfying Engage-Termination. Wedge&Exchange-Termination is satisfied since wedge&exchange returns after receiving closing states. As for Completeness, if an engage invocation returns a value, then the invoking party got the fourth-step message. Therefore, if invocations by correct parties return, then all correct parties get at least one closing state with the fourth-step message during wedge&exchange, and return a value .

Barrier.

When a party invokes , it broadcasts a message with its signature share for this barrier identification. When a party invokes , it waits for the first of the following two events. If it receives valid shares, it combines the shares into a threshold signature, sends it to all other parties, and returns. If it receives a correct threshold signature, it forwards it to all other parties and returns.

Since a invocation does not return before receiving shares, at least correct parties previously invoked , thus satisfying B-Coordination. B-Termination is satisfied since if all correct parties send shares to all other parties, then every correct party receives shares, and can generate a valid threshold signature. Due to the forwarding of threshold signatures, if one correct party gets a threshold signature and returns, then all correct parties eventually get it and return, satisfying B-Agreement.
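The two release conditions of the barrier (enough shares, or a forwarded combined signature) can be sketched without any cryptography. In this illustration ToyBarrier and its methods are hypothetical names, the threshold is a constructor parameter because its value is not fixed here, a share is represented simply by its sender's identifier, and a set of sender identifiers stands in for the combined threshold signature; no signature is actually produced or verified.

```python
from typing import FrozenSet, Optional, Set


class ToyBarrier:
    """Hypothetical share-counting barrier; signatures are simulated by sender ids."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold  # assumed number of shares needed to combine
        self.shares: Set[int] = set()
        self.certificate: Optional[FrozenSet[int]] = None

    def on_share(self, sender: int) -> Optional[FrozenSet[int]]:
        # First release condition: enough distinct shares were collected.
        if self.certificate is None:
            self.shares.add(sender)
            if len(self.shares) >= self.threshold:
                self.certificate = frozenset(self.shares)
        # A real party would now forward the certificate to all other parties.
        return self.certificate

    def on_certificate(self, cert: FrozenSet[int]) -> Optional[FrozenSet[int]]:
        # Second release condition: a valid combined certificate was forwarded.
        if self.certificate is None and len(cert) >= self.threshold:
            self.certificate = frozenset(cert)
        return self.certificate

    def released(self) -> bool:
        return self.certificate is not None
```

A party that never collects enough shares itself is still released once some other party forwards the combined certificate, which is the behavior behind the B-Agreement argument.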

Leader-election.

The leader election is similar to that of (vaba) and (Cachin2000RandomOI). When a party invokes , it signs the instance’s identification and broadcasts its share. When a correct party collects valid shares, it combines them into a threshold signature, hashes the signature to get a pseudorandom value, and returns the value modulo n, the number of parties, to get a random leader.

As the threshold signature is generated from valid shares, all correct parties eventually generate it provided that at least correct parties invoke , satisfying L-Termination. Due to the nature of threshold cryptography, the generated threshold signature is the same for all correct parties (boneh2001short), and since the hash and modulo functions are deterministic, all parties agree on the elected leader, satisfying L-Agreement. For a formal analysis of L-Validity and L-Unpredictability, please refer to (Cachin2000RandomOI; cachin2001secure; vaba).
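The last step of the election, mapping the combined signature to a party index, is deterministic and easy to sketch. The function name leader_from_signature is hypothetical, SHA-256 is an assumed choice (the hash function is not named here), and the byte string in the usage note is a placeholder for a real combined threshold signature.

```python
import hashlib


def leader_from_signature(threshold_signature: bytes, n: int) -> int:
    """Map a combined signature to a party index in [0, n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Hash the signature to obtain a pseudorandom value (SHA-256 is an assumption).
    digest = hashlib.sha256(threshold_signature).digest()
    # Interpret the digest as an integer and reduce it modulo the number of parties.
    return int.from_bytes(digest, "big") % n
```

Because every correct party holds the same combined signature, calling leader_from_signature(b"placeholder-threshold-signature", 32) yields the same index at each of them. The bias introduced by reducing a 256-bit value modulo a small n is negligible.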

6.2. Evaluation

We compare the SMR of ACE HotStuff with that of base HotStuff. In Section 6.2.1 we present the tests’ setup. Then, in Section 6.2.2, we measure ACE’s overhead during failure-free synchronous periods, and in Section 6.2.3 we demonstrate ACE’s superiority during asynchrony and attacks.

6.2.1. Setup

We conducted our experiments using instances on AWS EC2 machines in the same data center. We used between 1 and 16 virtual machines, each with 4 replicas. The duration of every test was seconds, and every test was repeated 10 times. The size of the proposed values is bytes. The latency is measured from the moment a new slot begins until a decision is made. The throughput is measured in one of two ways. In tests where we altered the number of replicas, the throughput is the total number of bytes committed, divided by the length of the test. In tests where we show the throughput as a function of time, we aggregate the number of committed bytes in second intervals. We did not throttle the bandwidth in any run; rather, we altered the transmission delays between the machines using NetEm (netem).
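The two throughput measurements can be sketched as follows. The commit-log format (a list of timestamp and size pairs) and the function names are assumptions made for illustration; the actual measurement harness is not described at this level of detail.

```python
import math
from typing import List, Tuple

Commit = Tuple[float, int]  # (seconds since test start, committed bytes)


def average_throughput(commits: List[Commit], duration_s: float) -> float:
    """Total committed bytes divided by the length of the test (bytes per second)."""
    return sum(size for _, size in commits) / duration_s


def throughput_series(commits: List[Commit], duration_s: float,
                      interval_s: float) -> List[float]:
    """Committed bytes aggregated per fixed interval, reported in bytes per second."""
    buckets = [0] * math.ceil(duration_s / interval_s)
    for t, size in commits:
        index = int(t // interval_s)
        if 0 <= index < len(buckets):  # ignore commits outside the test window
            buckets[index] += size
    return [total / interval_s for total in buckets]
```

The first function corresponds to the tests that vary the number of replicas, and the second to the plots of throughput as a function of time; an interval with no commits shows up as a zero in the series.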

6.2.2. ACE’s overhead

The first set of tests compare ACE HotStuff performance with that of base HotStuff under optimistic, synchronous, faultless conditions. Figure 4 depicts the latency and throughput. The delay on the links was measured to be under . The latency increases with the growth in the number of replicas since each replica must handle an equal growth in the number of messages. Furthermore, as ACE HotStuff has a larger overhead than base HotStuff, the latency grows faster.

Figure 5 shows the latency and throughput with different delays added to the links. The latency of ACE HotStuff is twice that of base HotStuff. This is unsurprising: ACE executes 1.5 waves per slot in expectation, leading to 1.5x the latency, and the additional barrier and leader-election abstractions bring the overall cost to a 2x reduction in performance.

These tests show that the cost of using ACE is about a 2x reduction in performance in the optimistic case. In the next tests we argue this cost is sometimes worth paying, as the liveness of partially synchronous algorithms can be easily affected.

Figure 4. Optimistic case with no network delay: (a) latency, (b) throughput.

Figure 5. Optimistic case under different network delays: (a) latency, (b) throughput.

6.2.3. ACE’s superiority

From here on we choose a configuration of 32 replicas and set the transmission delay to be ms unless specified otherwise. The second set of tests compare ACE HotStuff and base HotStuff in adverse conditions concerning message delays. These tests manipulate two factors, the transmission delays (controlled via NetEm (netem)), and the view timeout strategy.

Periods of asynchrony.

The first test sets base HotStuff view timers to a fixed constant of ms, the time needed for a commit assuming a ms transmission delay. The test measures the performance drop during a short period in which transmission delays are increased, simulating asynchrony. For the first third of the test the network delay is ms, for the next third the delay is ms, and finally the delay returns to ms.

Figure 6 compares the throughput of ACE HotStuff and base HotStuff. While the network delay is ms, base HotStuff outperforms ACE HotStuff. However, once the network delay begins to fluctuate, the throughput of base HotStuff drops to zero since no leader has enough time to drive progress. ACE HotStuff only sees a drop in throughput proportional to the delay, meaning that it continues to progress at network speed.

Figure 6. Throughput with a fluctuating transmission delay.

Weak adaptive asynchrony attack.

Note that since the views in base HotStuff are leader-based, byzantine parties (or any other adversarial entity) can achieve the same “asynchronous” effect presented above by only slowing down the leaders. In the next test we demonstrate the above using a distributed denial of service (DDoS) attack, in which leaders are flooded with superfluous requests in an attempt to overload them and delay their progress in the leader-based phase.

Figure 7 compares the throughput of ACE HotStuff and base HotStuff, where the attack starts at the halfway mark of the test. The byzantine parties coordinate their attack by adaptively choosing a single correct party and flooding it with superfluous requests. In base HotStuff, byzantine parties target correct leaders (byzantine leaders are making progress). In ACE HotStuff, there is no designated leader, therefore the byzantine parties choose an arbitrary correct party to attack. Our logs show that in base HotStuff progress is mainly made in views where byzantine parties are leaders. If they did not drive progress, the throughput would drop to near zero.

Figure 7. Throughput under DDoS attack.

Long conservative timeouts.

The previous two scenarios operated base HotStuff with a fixed aggressive view timer, which was based on the expected network delay. This caused premature timer expiration during periods of increased delays (due to asynchrony or attacks). One might think that a possible solution is to set very long timeouts that never expire, thus letting the base HotStuff protocol progress at network speed. However, the downside of conservative timers is that byzantine parties can perform a silent attack on the protocol’s progress by not driving views when they are leaders, forcing all parties to wait for the long timeouts to expire.

The next test evaluates base HotStuff with a conservative view timer of second, fixed to be much higher than the time expected to commit a view, under the silent attack starting at the halfway mark. Figure 8 presents the results. Before the attack, base HotStuff indeed progresses at network speed, but during the attack, the throughput drops significantly since a few consecutive byzantine leaders might stall progress for seconds. In ACE HotStuff we see a much smaller drop, but more fluctuation. This is due to the fact that byzantine leaders do not drive progress in their LBV instances, and thus the expected number of waves until a decision is now higher.

Figure 8. Throughput with conservative timeouts under byzantine silence attack.

Adjusting timeouts.

As the scenarios above demonstrate, neither being too aggressive nor being too conservative works well for base HotStuff during asynchrony or attacks. Therefore, in practice, when HotStuff is deployed it typically adjusts timers during execution according to progress or the lack of it. The most common method (used also by PBFT (pbft) and SBFT (sbft)) is to increase timeouts whenever timers expire too early, and decrease them whenever progress is made, in order to try to learn the network delay and adapt to its dynamic changes. To test this method, we implement an adaptive version, starting with a delay of . If a timeout is reached in a view before a decision is made we set the next view’s timeout to . Otherwise, the next view’s timeout is set to .
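The adjustment rule can be sketched as a multiplicative update. The factors used below (doubling after a timeout, halving after a success), the function names, and the fixed commit time are all assumptions chosen for illustration; they are not the values used in the evaluation.

```python
from typing import List


def next_timeout(timeout: float, timed_out: bool,
                 up: float = 2.0, down: float = 0.5) -> float:
    """Grow the timeout after an expiry, shrink it after a successful view.

    The factors `up` and `down` are hypothetical illustration values.
    """
    return timeout * up if timed_out else timeout * down


def simulate(initial_timeout: float, commit_time: float, views: int,
             up: float = 2.0, down: float = 0.5) -> List[bool]:
    """Return, per view, whether the view committed before its timer expired."""
    timeout, outcomes = initial_timeout, []
    for _ in range(views):
        timed_out = commit_time > timeout  # the leader needs commit_time to finish
        outcomes.append(not timed_out)
        timeout = next_timeout(timeout, timed_out, up, down)
    return outcomes
```

With these illustrative factors, simulate(150, 100, 6) alternates between a committed view and a timed-out view: every success shrinks the timer below the commit time, and every expiry restores it. This is one way to picture how shrinking the timer after each successful view can leave it too short in every second view.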

We evaluate this method against the following attack that combines insights from the previous ones. The results are shown in Figure 9. In the second half of the experiment, byzantine parties perform a DDoS attack on correct leaders, causing the view timers to increase, and then perform the silence attack (in views in which they act as leaders) to stall progress as much as possible. As expected, base HotStuff throughput drops to almost zero, whereas ACE HotStuff continues driving decisions. As in the previous test, ACE HotStuff suffers from fluctuation due to the probability of choosing a byzantine leader that did not make progress in its LBV instance. Another interesting phenomenon is the 2x performance drop of base HotStuff before the attack begins compared to previous tests. This is due to the timeout adjustment mechanism, which reduces the timers after every successful view, resulting in a too short timeout in every second view.

Figure 9. Throughput with adjusting timeouts under a combination of DDoS and silence attacks.

While the timer adjustment algorithm can be further enhanced, it is an arms race against the adversary – for each method, there is an adversarial response. In addition, although this evaluation is focused on HotStuff, the only ingredient of the algorithm that is under attack is the timeout, hence the evaluation exemplifies the weakness of all leader-based view by view algorithms. Therefore, our evaluation suggests that the overhead of ACE in the optimistic case is worth paying when high availability is desired under all circumstances.

7. Related work

The agreement problem was first introduced by Pease et al. (pease1980reaching) almost 40 years ago, and has received an enormous amount of attention since then (canetti1993fast; abd2005fault; yin2003separating; pbft; zyzzyva; amir2006scaling; martin2006fast; li2007beyond; amir2007customizable; clement2009making; amir2011prime; miller2016honey; liu2016xft; duan2014bchain; garay1998fully). One of the most important results is the FLP (fischer1982impossibility) impossibility, proving that deterministic solutions in the asynchronous communication models are impossible. Below we describe work that was done to circumvent the FLP impossibility, present two related frameworks that were previously proposed for the agreement problem, compare our SMR definition to other systems in the literature, and discuss alternative fairness definitions.

Agreement in the partial synchrony model.

A practical approach to circumvent the FLP impossibility is to consider the partial synchrony communication model (oki1988viewstamped; ongaro2014search; sbft; hotstuff; zyzzyva), which was first proposed by Dwork et al. (dwork1988consensus) and later used by seminal works like Paxos (lamport2001paxos) and PBFT (pbft). As explained in detail in Section 3, protocols designed for this model never violate safety, but provide progress only during long enough synchronous periods. Despite their limitations, they are widely adopted in the industry due to their relative simplicity compared to the alternatives and their performance benefits during synchronous periods. For example, Cassandra (cassandra), Zookeeper (zookeeper), and Google’s Spanner (spanner) implement a variant of Paxos (lamport2001paxos), and VMware’s Concord (concord), Facebook’s Libra (libra) and IBM’s Hyperledger (hyperledger) implement SBFT (sbft), HotStuff (hotstuff) and PBFT (pbft), respectively.

Agreement in the asynchrony model.

As first shown by Ben-Or (ben1983another) and Rabin (lehmann1981advantages), the FLP impossibility result does not hold once randomization is allowed: the randomized version of the Agreement problem, which guarantees termination with probability 1, can be solved in the asynchronous model provided that parties can flip random coins. The algorithms in (ben1983another; lehmann1981advantages) are very inefficient in terms of time and message complexity, and there has been a huge effort to improve them over the years. Some considered the theoretical full information model, in which the adversary is computationally unbounded, and showed more efficient algorithms that relax the failure resilience threshold (kapron2010fast; king2013byzantine). These are beautiful theoretical results but too complex to implement and maintain.

A more practical model for randomized asynchronous agreement is the random oracle model, in which the adversary is computationally bounded and cryptographic assumptions (such as Decisional Diffie–Hellman (diffie1976new)) hold. In the context of distributed computing, this model was first proposed by Cachin et al. (Cachin2000RandomOI; CachinSecure). In (CachinSecure) they proposed an almost optimal algorithm for the agreement problem. A variant of this algorithm was later implemented in Honeybadger (miller2016honey) and Beat (duan2018beat), which are the first academic asynchronous SMR systems. The protocol in (CachinSecure) is optimal in terms of resilience to failures and round complexity, but has an inefficient communication cost. Improving the communication cost was an open problem for almost 20 years, until it was recently resolved in VABA (vaba). ACE borrows heavily from VABA (vaba). In fact, ACE can be seen as a generalization of the approach introduced in VABA of letting parties progress in parallel and then retrospectively choosing one.
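The idea of letting parties progress in parallel and choosing one in retrospect can be illustrated with a small C++ sketch. This is a toy, single-process model and not the VABA or ACE algorithm: the names `Outcome`, `chooseInRetrospect`, `ToyCoin`, and `runWaves` are hypothetical, the outcomes of each wave are given as input rather than produced by a protocol, and the coin is a locally seeded pseudo-random generator standing in for a shared unpredictable coin.

```cpp
#include <cstddef>
#include <optional>
#include <random>
#include <string>
#include <vector>

// Outcome of one party's instance in a wave: a value if the instance
// completed before the wave ended, std::nullopt otherwise.
using Outcome = std::optional<std::string>;

// Retrospective choice: the index is drawn only after the instances ran,
// so the winner is not known while the instances are still running.
inline Outcome chooseInRetrospect(const std::vector<Outcome>& outcomes,
                                  std::size_t elected) {
    return outcomes.at(elected);  // throws std::out_of_range on a bad index
}

// Hypothetical stand-in for a shared coin (assumption: a real system
// would use a distributed coin that no party can predict or bias).
class ToyCoin {
public:
    explicit ToyCoin(unsigned seed) : rng_(seed) {}
    // Returns a uniformly random index in [0, n); n must be positive.
    std::size_t flip(std::size_t n) {
        return std::uniform_int_distribution<std::size_t>(0, n - 1)(rng_);
    }
private:
    std::mt19937 rng_;
};

// Process waves in order. In each non-empty wave, elect one instance after
// the fact; if it completed, its value is the decision, otherwise move on.
inline Outcome runWaves(const std::vector<std::vector<Outcome>>& waves,
                        ToyCoin& coin) {
    for (const auto& outcomes : waves) {
        if (outcomes.empty()) continue;  // nothing to elect in this wave
        Outcome chosen = chooseInRetrospect(outcomes, coin.flip(outcomes.size()));
        if (chosen) return chosen;
    }
    return std::nullopt;  // no elected instance had completed
}
```

In this toy model, a wave in which most instances completed yields a decision with high probability, because the coin is flipped only after the instances have run. What happens between waves in the real protocols (carrying state from one wave to the next) is deliberately left out.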

Frameworks for agreement.

There are two previously proposed agreement frameworks (guerraoui2010next; lamport2009vertical) that we are aware of.

The Next 700 BFT (guerraoui2010next) framework proposes an approach to compose different byzantine SMRs. The authors observed that no byzantine SMR can outperform all others under all circumstances, and introduced a general way for a system designer to switch between implementations whenever the setting changes. They defined Abstract, an abortable SMR abstraction that captures the progress and safety requirements of a partially synchronous SMR, and provided guidance on how multiple Abstract instances should be composed. Our work is very different from theirs. While they defined an abstraction in order to compose different SMR view-by-view implementations to achieve better performance in the partially synchronous model, our LBV abstraction provides an API to decouple the leader-based phase from the view-change phase in each view, which in turn allows us to compose LBV instances in a novel way that avoids leader demotions via timeouts and boosts liveness in asynchronous networks.

Vertical Paxos (lamport2009vertical) is a class of consensus algorithms that separates the mechanism for reaching agreement from the one that deals with failures. The idea is to use a fast and small quorum of parties to drive agreement, and have an auxiliary reconfiguration master to reconfigure this quorum whenever progress stalls. The protocol for agreement relies on the participation of all parties in the dedicated quorum, and thus stalls whenever some party fails. The master is emulated by a bigger quorum, which uses an agreement protocol to agree on reconfiguration, and thus can tolerate failures.

State machine replication.

Paxos (lamport2001paxos) (crash-failure model) and PBFT (pbft) (byzantine model) were the first to show how to build an SMR from a single-shot agreement problem. In both cases, similarly to ACE, parties use a single-shot agreement instance to agree on the value of every slot, but contrary to our algorithm, they do not satisfy the FIFO property since they do not make sure all parties learn the decision value before moving to the next slot. For some applications, e.g., Blockchains, this can be crucial since the validity of a value sometimes depends on decision values of previous slots (bitcoin).

Moreover, for practical reasons, systems implementing the Paxos and PBFT algorithms use periodic checkpoints (concord; libra; bessani2014state), in which parties exchange all the decision values made since the last checkpoint in order to free resources associated with these slots. The cost of these checkpoints is quadratic in the number of decision values, and since better than quadratic communication per decision is impossible (dolev1983authenticated), we chose to avoid these checkpoints. Instead, we formally define the FIFO and Strong halting properties and perform a quadratic forwarding mechanism after each slot. It is important to note that this is a design choice; ACE’s abstractions can be used in a similar way to build SMR with different guarantees.

Fairness.

Although the Agreement and SMR problems have been studied for many years, the question of fairness therein was only recently asked, and we are aware of only a few solutions that provide some notion of it (amir2011prime; miller2016honey; lev2019fairledger; vaba). Prime (amir2011prime) extends PBFT (pbft) to guarantee that values are committed within a bounded number of slots after they are first proposed, and FairLedger (lev2019fairledger) uses batching to ensure that every correct party commits a value in every batch. However, in contrast to ACE, both protocols are able to guarantee fairness only during synchronous periods. Honeybadger (miller2016honey) is an asynchronous protocol that, similarly to FairLedger, batches values proposed by different parties and commits them together atomically. It probabilistically bounds the number of epochs (and accordingly the number of slots) until a value is committed, after it has been submitted to a sufficiently large set of parties. The VABA (vaba) protocol does not use batching, and provides a per-slot guarantee that bounds the probability of choosing a value proposed by a correct party during asynchronous periods. ACE provides similar fairness guarantees during asynchrony, but also guarantees an equal chance for each correct party during synchrony.

8. Discussion

In this paper we introduced ACE: a general model-agnostic framework for boosting asynchronous liveness of any leader-based SMR system designed for the partially synchronous model. The main ingredient is the novel LBV abstraction that encapsulates the properties of a single view in leader-based view-by-view algorithms, while providing an API to control the scheduling of the two phases, leader-based and view-change, in each view. Exploiting this separation, ACE provides a novel algorithm that composes LBV instances in a way that avoids timers and provides a randomized asynchronous SMR solution.

ACE is model agnostic, meaning that it does not add any assumptions on top of what is assumed by the instantiated LBV implementation, and thus provides generic liveness boosting for both byzantine and crash-failure SMRs. In order to instantiate ACE with a specific SMR algorithm, all a system designer needs to do is alter the code of a single view to support LBV’s API; this should not be too complicated, as the view logic must already implicitly satisfy the properties required by the API.
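As an illustration of what wrapping a single view behind an API with two separately schedulable phases might look like, here is a minimal C++ sketch. It is not the LBV interface defined in the paper: the names `ViewChangeState`, `LeaderBasedView`, `runLeaderPhase`, `wedge`, and `ToyView` are hypothetical, and the toy view decides locally without any communication or fault tolerance.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical state that a closed view hands over to the next one.
struct ViewChangeState {
    std::optional<std::string> lockedValue;  // value the view may have decided
};

// Hypothetical two-phase interface, one instance per view. The caller,
// not a timer inside the view, decides when each phase runs.
class LeaderBasedView {
public:
    virtual ~LeaderBasedView() = default;
    // Leader-based phase: try to reach a decision in this view,
    // starting from the state exported by an earlier view.
    virtual std::optional<std::string> runLeaderPhase(const ViewChangeState& from) = 0;
    // View-change phase: stop the view and export its state.
    virtual ViewChangeState wedge() = 0;
};

// Toy single-process view: adopts a locked value if one is handed over,
// otherwise decides its own proposal. After wedge() it makes no progress.
class ToyView : public LeaderBasedView {
public:
    explicit ToyView(std::string proposal) : proposal_(std::move(proposal)) {}

    std::optional<std::string> runLeaderPhase(const ViewChangeState& from) override {
        if (wedged_) return std::nullopt;  // a wedged view never decides
        decided_ = from.lockedValue ? *from.lockedValue : proposal_;
        return decided_;
    }

    ViewChangeState wedge() override {
        wedged_ = true;
        return ViewChangeState{decided_};
    }

private:
    std::string proposal_;
    std::optional<std::string> decided_;
    bool wedged_ = false;
};
```

The point of the sketch is the separation: because `wedge` is an explicit call, a framework composing several views can close them on its own schedule and pass the exported state into later views, so that a value decided earlier is carried forward.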

In addition to boosting liveness, ACE is designed in a way that inherently provides fairness due to its randomized election of leaders in retrospect. Moreover, ACE provides a clear separation between safety, which relies on the LBV implementation, and liveness, which is given by the framework. As a result, a system designer that chooses to instantiate ACE gets a modular SMR implementation that is easier to prove correct and maintain – if a better agreement protocol is published, all the designer needs to do in order to integrate it in the system is to alter the LBV implementation accordingly.

To demonstrate the power of ACE, we implemented it, instantiated it with the state-of-the-art HotStuff (hotstuff) protocol, and compared its performance to the base HotStuff implementation. Our results show that while ACE suffers a 2x performance degradation in the optimistic, synchronous, failure-free case, it enjoys absolute superiority during asynchronous periods and network attacks.

Acknowledgements.
We thank Ittai Abraham for helpful initial discussions and Dahlia Malkhi for reviewing drafts and suggesting valuable improvements. We are also grateful to Guy Golan-Gueta, Guy Goren, Idit Keidar, Eleftherios Kokoris-Kogias, and David Tennenhouse for their useful feedback.

References