Making Speculative BFT Resilient with Trusted Monotonic Counters

Consensus mechanisms used by popular distributed ledgers are highly scalable but notoriously inefficient. Byzantine fault tolerance (BFT) protocols are efficient but far less scalable. Speculative BFT protocols such as Zyzzyva and Zyzzyva5 are efficient and scalable but require a trade-off: Zyzzyva requires only 3f + 1 replicas to tolerate f faults, but even a single slow replica will make Zyzzyva fall back to more expensive non-speculative operation. Zyzzyva5 does not require a non-speculative fallback, but requires 5f + 1 replicas in order to tolerate f faults. BFT variants using hardware-assisted trusted components can tolerate a greater proportion of faults, but require that every replica have this hardware. We present SACZyzzyva, addressing these concerns: resilience to slow replicas and requiring only 3f + 1 replicas, with only one replica needing an active monotonic counter at any given time. We experimentally evaluate our protocols, demonstrating low latency and high scalability. We prove that SACZyzzyva is optimally robust and that trusted components cannot increase fault tolerance unless they are present in greater than two-thirds of replicas.


I Introduction

Distributed ledger technology [23, 6, 9] and cryptocurrencies [39, 10] have become the great motivators for distributed consensus protocols today. These applications demand scalability and performance over high-latency networks such as the Internet. Current approaches range from proof-of-work [39, 10] to Byzantine fault tolerance (BFT) [30, 7, 36, 11, 40, 22, 34].

Both approaches have significant drawbacks. Proof of work derives its Sybil-resistance from the magnitude of its power consumption [9]. Furthermore, its scalability comes at the cost of eschewing transaction finality [46, 47]. Conversely, BFT protocols [30] are computationally efficient, but scale poorly. As traditionally formulated, these require two phases [29] and a quadratic number of messages [17]. However, a wide variety of improvements can be obtained over classical results [11] by varying cryptographic [43], failure-mode [40, 32], timing [15, 38], and safety [28] assumptions.

Zyzzyva [28, 5] is the simplest and most compelling of the BFT protocols. It takes a speculative approach that optimizes for the common case where no replicas are faulty. MinZyzzyva [40] improves on Zyzzyva by assuming that each replica contains a trusted monotonic counter, whose integrity is guaranteed by hardware. In particular, it reduces the total number of replicas needed to tolerate f faults from 3f + 1 to 2f + 1.

Protocol     Replicas   Resilience   Monotonic counters
                                     Total      Active
Zyzzyva      3f + 1     0            -          -
Zyzzyva5     5f + 1     f            -          -
MinZyzzyva   2f + 1     0            2f + 1     2f + 1
SACZyzzyva   3f + 1     f            f + 1      1
TABLE I: Comparison of speculative BFT protocols tolerating f faults. Resilience refers to the maximum number of replicas that can be non-responsive without falling back to non-speculative operation.

This assumption, that every replica is equipped with a trusted component, is often unrealistic. In the real world, only some devices will have the necessary hardware, especially when new hardware is being rolled out. Even if all replicas eventually have the necessary hardware support, over time some hardware platforms will become obsolete, either because they have become outdated in comparison to newly-released hardware, or because trust in them has been revoked in response to some vulnerability. Protocols that require trusted components in every participant are thus fragile.

Speculative BFT protocols have extremely simple and efficient speculative execution paths when there are no faults or delays. In the event of a fault, Zyzzyva and MinZyzzyva have the client execute a non-speculative fallback, sacrificing performance. This results in a major drawback: if even a single replica fails to respond to the client, the protocols immediately fall back to non-speculative execution, unlike non-speculative protocols, which concern themselves mainly with faulty primaries [14]. Realistic communication networks like the Internet are only partially synchronous. In such networks, a single slow (not necessarily faulty) replica can trigger the non-speculative execution for each protocol run, thereby undermining the efficiency promise of the speculative approach. Speculative variants like Zyzzyva5 [28] minimize the need for non-speculative fallback, but have lower fault-tolerance, requiring 5f + 1 replicas to tolerate f faults.

In this paper, we present Single Active Counter Zyzzyva (SACZyzzyva), which overcomes these drawbacks. It requires only a single replica, the primary, to have an active monotonic counter, and eliminates the need for a non-speculative fallback (as in Zyzzyva5), thus allowing SACZyzzyva to tolerate a subset of replicas being slow, while requiring only 3f + 1 replicas (as in Zyzzyva). We compare SACZyzzyva to other speculative BFT protocols in Table I. The same principles that we use in SACZyzzyva can be applied in other settings: other BFT protocols can be adapted to use our single active counter approach, resulting in lower latency while avoiding the need to equip all replicas with hardware-supported monotonic counters.

The cost of supporting this heterogeneity, that is, of not requiring that all replicas have trusted components, is the need for more replicas to tolerate the same number of faults f: 3f + 1 in SACZyzzyva, compared to 2f + 1 in MinZyzzyva. We show that SACZyzzyva is optimally fault tolerant. Specifically, among n replicas it is not possible to tolerate more than the ⌊(n − 1)/3⌋ failures that SACZyzzyva tolerates unless more than two thirds of parties have access to a trusted component (as in MinZyzzyva).

In summary, our main contributions are as follows:

  • We propose SACZyzzyva (Section IV), a Zyzzyva variant that tolerates f faults and uses a trusted monotonic counter to eliminate the need for a non-speculative fallback, making it more robust to slow replicas.

  • We implement and evaluate SACZyzzyva over both low- and high-latency networks (Section V), showing that SACZyzzyva transaction latencies increase at a rate of less than per additional replica.

  • We show that the use of trusted components in a consensus protocol cannot increase fault-tolerance unless more than two thirds of parties have a trusted component (Section VI).

II Preliminaries

II-A Zyzzyva

Zyzzyva [28] is an efficient Byzantine-fault-tolerant state-replication protocol which uses speculation to reduce the replication overhead, at the cost of requiring rollback in some instances. The replicas receive requests ordered by the primary, and immediately reply to clients without running an expensive consensus protocol. Based on the received replies, clients are able to detect inconsistencies and can help the replicas achieve a consistent state. In fault-free executions with network delays that do not trigger protocol timeouts, no further action is required by clients, thereby making the protocol simple and fast.

The protocol works as follows: The client sends a request to the primary, which in turn proposes an order and forwards it to the other replicas. The replicas speculate that the primary’s proposal is consistent and reply to the client. If the client receives 3f + 1 matching replies, one from every replica, then speculative execution is successful and the request is guaranteed to persist. However, if after some timeout the client has received between 2f + 1 and 3f matching replies, the client executes a non-speculative fallback: it broadcasts the responses that it has received to all replicas as a commit certificate, and waits for 2f + 1 acknowledgements. The replicas acknowledge the commit certificate if it is consistent with their local history of ordered requests. This non-speculative fallback allows for operation in the presence of faults, but comes with significant latency costs.

Finally, if these acknowledgements are not received, the client broadcasts the request to all replicas, which communicate with the primary to assign a sequence number and execute it.

Zyzzyva is efficient and scalable, but this efficiency comes at a price, in the form of fragility. If even a single replica is faulty, or network conditions cause a single message to be delayed beyond the timeouts, speculative execution fails and the client must execute its non-speculative fallback, requiring at least two additional rounds of communication, in addition to the time spent waiting for the timeout. This negates Zyzzyva’s main contribution—its high performance—especially over the internet where Zyzzyva’s small communication footprint would otherwise be most useful.

A variant, Zyzzyva5 [28, Section 4.1], was introduced along with Zyzzyva. It avoids this non-speculative fallback at the cost of fault-tolerance, by increasing the number of replicas from 3f + 1 to 5f + 1 and allowing requests to complete after 4f + 1 responses. With these thresholds, all requests complete speculatively, but Zyzzyva5 tolerates only ⌊(n − 1)/5⌋ faults among n replicas, in comparison to Zyzzyva’s ⌊(n − 1)/3⌋.
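The trade-off between the two variants can be made concrete with a little arithmetic. The following is a minimal sketch, assuming the thresholds given in the abstract and in this section (3f + 1 replicas with replies from all of them for Zyzzyva; 5f + 1 replicas with 4f + 1 replies for Zyzzyva5). The `Sizing` type and `sizing` function are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sizing:
    replicas: int      # total replicas n needed to tolerate f faults
    fast_quorum: int   # matching replies needed to complete speculatively
    resilience: int    # silent replicas tolerated without leaving the fast path


def sizing(protocol: str, f: int) -> Sizing:
    """Replica count and speculative reply threshold for a fault budget f."""
    if protocol == "Zyzzyva":
        n, quorum = 3 * f + 1, 3 * f + 1   # needs a reply from every replica
    elif protocol == "Zyzzyva5":
        n, quorum = 5 * f + 1, 4 * f + 1
    else:
        raise ValueError(f"unknown protocol: {protocol}")
    return Sizing(replicas=n, fast_quorum=quorum, resilience=n - quorum)


for name in ("Zyzzyva", "Zyzzyva5"):
    print(name, sizing(name, f=2))
```

The `resilience` field is simply the gap between the number of replicas and the number of replies required, which is why Zyzzyva has none to spare.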

II-B Hybridization and trusted components

Another way to improve on classical BFT results is to use hybridization [45], in which replicas contain several components of different failure modes. Under this model, failed replicas cannot behave completely arbitrarily; instead, they are limited by their non-Byzantine components.

A common approach is to design the replicas around a trusted component, whose output can be authenticated by other parties and is subject only to crash-failures. This can be achieved with the aid of the hardware-assisted trusted execution environments (TEEs) that exist in many modern CPUs.

TEEs protect the execution of a security-critical part of an application from potentially-compromised applications, system administrators and the operating system itself. By the process of remote attestation, they can securely communicate the existence of such trusted components to an external verifier, allowing other parties to rely on the security guarantees provided by the hardware. Examples of such hardware mechanisms are Intel SGX [24] and ARM TrustZone [2].

TEEs are highly general; in concrete protocols, we generally do not consider their full functionality, but instead use them to implement more limited trusted functionality that can be effectively reasoned about. An especially popular such functionality is the trusted monotonic counter.

A trusted monotonic counter uses these hardware security features to realize a verifiably monotonically increasing counter.

Let ⟨m⟩_σ indicate that a message m has been signed by some entity σ. A trusted monotonic counter component TC is assumed to have a well-known public key and to provide the following interface:

  • Create: Create a new trusted monotonic counter instance, with its own initial state and public key.

  • Increment: Update the counter state of a given instance, returning a signed tuple linking a message to this particular increment operation and trusted monotonic counter instance.

Trusted monotonic counters are used in BFT protocols such as MinBFT [40] to prevent message equivocation. A trusted monotonic counter value can be attached to a message in order to detect whether the sender communicated the same data to all recipients. If the sender equivocates, different messages will have different counter values, which is detectable as a ‘hole’ in the set of counter values [31, 37, 42, 40]. Persistent hardware-backed versions of such counters are available within TPMs [20] and the Intel SGX [24] platform; alternatively, a TEE can be used to implement memory-backed monotonic counters that offer high performance at the cost of ephemerality or replication [37].
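To illustrate how a counter exposes equivocation, here is a minimal sketch of the idea. It is not the interface of any real TEE: the class `TrustedCounter`, its methods, and the helper `find_holes` are hypothetical, and an HMAC with a key private to the object stands in for the hardware-protected signing key (a real deployment would use a public-key signature that any party can verify).

```python
import hashlib
import hmac
import os
from typing import List, Tuple


class TrustedCounter:
    """Stand-in for a hardware-backed counter; the key never leaves this object."""

    def __init__(self) -> None:
        self._key = os.urandom(32)  # hypothetical per-instance secret
        self._value = 0             # assumed initial counter state

    def _tag(self, value: int, digest: bytes) -> bytes:
        return hmac.new(self._key, value.to_bytes(8, "big") + digest,
                        hashlib.sha256).digest()

    def increment(self, message: bytes) -> Tuple[int, bytes, bytes]:
        """Bind `message` to the next counter value and authenticate the pair."""
        self._value += 1
        digest = hashlib.sha256(message).digest()
        return self._value, digest, self._tag(self._value, digest)

    def verify(self, value: int, digest: bytes, tag: bytes) -> bool:
        """Check a certificate; a real verifier would use the public key instead."""
        return hmac.compare_digest(self._tag(value, digest), tag)


def find_holes(seen_values: List[int]) -> List[int]:
    """Counter values a recipient never saw, up to the largest value it did see."""
    present = set(seen_values)
    if not present:
        return []
    return [v for v in range(1, max(present) + 1) if v not in present]


counter = TrustedCounter()
shared = counter.increment(b"request A")       # value 1, sent to everyone
only_x = counter.increment(b"request B")       # value 2, sent to recipient X only
only_y = counter.increment(b"request B'")      # value 3, sent to recipient Y only
print(find_holes([shared[0], only_x[0]]))      # X's view
print(find_holes([shared[0], only_y[0]]))      # Y's view shows a gap at value 2
```

Because the counter never reuses a value, the sender cannot give two recipients different messages under the same number; the best it can do is skip a value for one of them, and that gap is visible.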

III Model and Problem Statement

III-A Network Model

In this paper we consider the weak-synchrony model [11, 38]. Messages and computation can be arbitrarily delayed, but the delay of a message sent at time t cannot indefinitely grow faster than the timeout period, which may vary adaptively.

This model permits polynomially-increasing delays when exponential backoff is used to increase message timeouts [38, §3.1]. However, it does not allow an adversary to continually delay messages so that they arrive after the exponentially-increasing timeouts; the system therefore achieves eventual synchrony [28]. This model enables us to analyze liveness during a period of synchrony that will eventually occur. It also avoids the well-known FLP impossibility result [19], which showed that deterministic consensus cannot be guaranteed in a fully asynchronous system if even one process may fail [33].
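The interaction between delays and backoff can be sketched numerically. The example below assumes a timeout that doubles on every attempt, starting from a hypothetical `base_timeout`; the delay functions are illustrative and are not taken from the paper.

```python
def first_timely_attempt(delay, base_timeout=1.0, max_attempts=64):
    """Return the first attempt k whose timeout base_timeout * 2**k covers delay(k)."""
    for k in range(max_attempts):
        if delay(k) <= base_timeout * 2 ** k:
            return k
    return None  # the delay outpaced the timeout on every attempt tried


# A polynomially growing delay is eventually covered by the doubling timeout.
print(first_timely_attempt(lambda k: 50.0 * (k + 1) ** 3))

# A delay that always stays ahead of the timeout is what the model rules out.
print(first_timely_attempt(lambda k: 2.0 * 2 ** k))
```

The first call finds an attempt at which the message arrives in time, however large the polynomial's constants; the second never does, which corresponds to the adversarial behavior excluded by the weak-synchrony assumption.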

III-B System Model

We consider a distributed system of n replicas, of which up to f may be faulty. We suppose that some, but not all, replicas are equipped with a trusted component and, in particular, with a trusted monotonic counter. The result is that replicas without a trusted component can, if faulty, behave completely arbitrarily, whereas the other replicas, if they fail, are assumed to be limited in their behavior by the trusted component.

III-C Problem Statement

Our goal is to build an efficient state-machine replication protocol that allows the replicas to complete a request

  • in a number of messages that is linear in the number of replicas n, and

  • without significant performance reductions in the event of up to f faults.

We borrow from Zyzzyva [28] the properties that our BFT protocol must satisfy to be correct. The first one is safety: suppose a request completes with a response indicating a history (a sequence of ordered and completed requests); then the history of any other completed request is a prefix of that history, or vice-versa. Hence, from the perspective of the client, the state machine history never diverges, even if that of individual replicas might. The second one is liveness: any request issued by a correct client eventually completes.
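The safety property can be phrased as a simple check over the histories reported with completed requests. This is a minimal sketch for intuition only; `is_prefix` and `histories_consistent` are hypothetical helper names, and histories are modeled as plain lists of request identifiers.

```python
def is_prefix(shorter, longer):
    """True if `shorter` is an initial segment of `longer`."""
    return len(shorter) <= len(longer) and list(longer[:len(shorter)]) == list(shorter)


def histories_consistent(histories):
    """True if every pair of completed histories is prefix-related.

    Sorting by length and checking neighbours is enough, because the
    prefix relation is transitive.
    """
    ordered = sorted(histories, key=len)
    return all(is_prefix(ordered[i], ordered[i + 1])
               for i in range(len(ordered) - 1))


print(histories_consistent([["r1"], ["r1", "r2"], ["r1", "r2", "r3"]]))
print(histories_consistent([["r1", "r2"], ["r1", "r3"]]))  # diverging histories
```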

IV SACZyzzyva

In the original Zyzzyva with 3f + 1 replicas, a request is included in a new view only when it appears in f + 1 out of 2f + 1 view-change messages. Since up to f of these view-change messages may be from faulty replicas, this means that every correct replica must execute the request in order to guarantee that a speculatively-executed request will be included in the history of future views.

The MinZyzzyva [40] protocol uses a trusted monotonic counter in each replica to order requests and prevent equivocation. In doing so, it reduces to 2f + 1 the number of replicas needed to tolerate f faults, but does not change the protocol in a fundamental way. However, the MinZyzzyva view-change protocol differs from that of Zyzzyva, with the initial state of a view being determined as in MinBFT [40, p. 8]: a request is included in the history of a new view when it appears in any view-change message. This means that MinZyzzyva needs only one copy of a completed request to appear in any set of f + 1 view-change messages in order to guarantee that speculatively-executed requests are not lost. By modifying Zyzzyva to order requests within a view using a trusted monotonic counter in the primary, we can use the same inclusion criteria during view-changes as in the MinBFT protocols, allowing requests to safely complete after only 2f + 1 responses and eliminating the need for a non-speculative fallback. We dub this protocol Single Active Counter Zyzzyva (SACZyzzyva).

The basic principle of SACZyzzyva is to use a trusted monotonic counter in the primary to bind a sequence of consecutive counter values to incoming requests, ordering requests while avoiding the need for communication between replicas, whether directly or via the client. It does this by signing a tuple consisting of the cryptographic hash of the request and a fresh (i.e. not previously used) counter value. This is then sent to all replicas in an order-request message. Because the primary is the only replica that actively maintains a counter, we call this construct the “Single Active Counter” (SAC). We therefore require only that f + 1 replicas have a trusted component, enough that there will always be at least one correct replica that can function as primary.

Figure 1 shows the communication pattern of SACZyzzyva. As in the original Zyzzyva, the primary gathers the requests from clients and sends them to all replicas in an order-request message.

Fig. 1: The communication patterns of Zyzzyva and SACZyzzyva with one faulty replica. Without faults or network delays, Zyzzyva and SACZyzzyva have identical communication patterns, but if any replicas are faulty, as illustrated, Zyzzyva requires two extra rounds of communication, shown in gray.

The main difference is that the order-request message is bound to a monotonic counter value to prevent equivocation by the primary. All replicas execute the requests and reply to the client directly if the trusted monotonic counter value is sequential to those that the primary has previously sent. If the client receives 2f + 1 replies with consistent values and histories, it considers the request complete. Otherwise, it repeatedly sends the requests directly to the replicas, so that they can detect misbehavior by the primary and so elect a new one.

The protocol is described below in greater detail. The basic steps are shown in boxes below; further explanation and specifics appear beneath each step. We assume that there is some well-known mapping from view-numbers to primary replicas. One such mapping is p = v mod (f + 1), where v is the view-number, p is the index of the primary, and the replicas are numbered from zero such that the first f + 1 replicas possess a trusted monotonic counter. For simplicity of the protocol description, when a replica broadcasts a message to all replicas, this includes itself.
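A view-to-primary mapping of this kind can be sketched as follows, assuming (as one possible choice, not necessarily the authors') that the primary rotates round-robin over the f + 1 counter-equipped replicas, numbered from zero. Both function names are hypothetical.

```python
def primary_for_view(view: int, f: int) -> int:
    """Index of the primary for `view`, rotating over the f + 1 replicas with counters."""
    return view % (f + 1)


def views_until_correct_primary(start_view: int, f: int, faulty: set) -> int:
    """Number of view-changes needed from `start_view` to reach a correct primary."""
    for step in range(f + 1):
        if primary_for_view(start_view + step, f) not in faulty:
            return step
    raise ValueError("more than f of the first f + 1 replicas are faulty")


# With f = 2, replicas 0..2 hold counters; if 0 and 1 are faulty, view 2 has a correct primary.
print(views_until_correct_primary(start_view=0, f=2, faulty={0, 1}))
```

Since at most f replicas are faulty and f + 1 can act as primary, at most f consecutive view-changes are needed before a correct primary is reached.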

IV-A Agreement protocol

Requests are initiated similarly to the original Zyzzyva. However, unlike with Zyzzyva, only 2f + 1 replies are needed before an operation is accepted as complete. After receiving a request from the client, the primary binds a counter value to the request and then sends it to the replicas for execution; the replicas then reply directly to the client.

C-1. The client sends a request to the primary.

Explanation: Since all requests must pass through the primary in order to be executed, the client can initially send the request to the primary only.

Details: The client sends a request message to the primary, containing the requested operation, the address of the client to which the replicas must reply, and a monotonically increasing identifier used to identify whether a request has already been executed.

R-1. Upon receiving a valid request message from a client, the primary binds the request to a counter value and then broadcasts it to all replicas as an order-request message.

Explanation: Note that the primary only needs to act on valid requests; a request might be invalid if it is not syntactically correct, but there may be other cases, such as if the state-machine being replicated includes client authentication or replay protection functionality.

Details: After receiving a request message, the primary verifies that it is valid and then binds a request number to it, using its trusted monotonic counter to obtain an ‘ordering certificate’ (a signed tuple of the request’s hash and a fresh counter value), which it includes in the order-request message that is broadcast to all replicas.

R-2. Upon receiving an order-request message for the current view, each replica executes the request contained in the message and responds directly to the client.

Explanation: SACZyzzyva replicas execute requests immediately. However, since the primary uses its trusted monotonic counter to number the requests within each view, the agreement protocol ends here and no ‘commit certificate’ subprotocol is needed as in Zyzzyva [28, Section 3.2, Steps 4.b.*].

Details: Replicas execute requests immediately if they have executed all previous requests, and store the largest identifier of each executed request from each client. If any previous requests have not yet been executed, the replica demands a copy of the corresponding order-request messages from the primary using a fill-hole message. If the replica does not receive a response within a timeout period, the primary is deemed to be faulty and so the replica requests a view-change.
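The in-order execution and hole detection of step R-2 can be sketched as below. This is a simplified illustration under stated assumptions: certificate verification, per-client identifiers, timeouts, and networking are omitted, counter values are assumed to start at 1 in each view, and the `Replica` class and its method are hypothetical names.

```python
class Replica:
    """Executes order-requests strictly in counter order, buffering early arrivals."""

    def __init__(self):
        self.next_counter = 1  # assumed first counter value in the view
        self.pending = {}      # counter value -> request waiting on an earlier one
        self.history = []      # executed requests, in order

    def on_order_request(self, counter, request):
        """Handle one order-request; return counter values to ask for via fill-hole."""
        if counter < self.next_counter:
            return []  # already executed, nothing to do
        self.pending[counter] = request
        # Execute every request that is now contiguous with the history.
        while self.next_counter in self.pending:
            self.history.append(self.pending.pop(self.next_counter))
            self.next_counter += 1
        # Anything still missing below `counter` is a hole.
        return [c for c in range(self.next_counter, counter)
                if c not in self.pending]


replica = Replica()
print(replica.on_order_request(1, "op-a"))  # executes immediately
print(replica.on_order_request(3, "op-c"))  # buffered; value 2 is missing
print(replica.on_order_request(2, "op-b"))  # fills the hole, executes 2 and 3
print(replica.history)
```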

C-2a. The client waits for 2f + 1 consistent replies from distinct replicas; it then accepts the response contained in these replies.

Explanation: During periods of synchrony, when the primary is correct and at least 2f + 1 replicas are correct overall, the client will receive sufficient reply messages to accept a response. This is in contrast to Zyzzyva, in which the client can accept at this point only after receiving responses from all replicas, thus necessitating additional steps as a fallback.

Details: The client receives reply messages from 2f + 1 distinct replicas for some valid order-request message in the current view, all carrying the same response. The client then accepts that response as the response to the request contained in the order-request message.
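The client-side acceptance rule of step C-2a can be sketched as a small tally. This is a minimal illustration that assumes replies have already been authenticated and that each reply carries a digest of the replica's history; `ReplyCollector` and its fields are hypothetical names.

```python
from collections import defaultdict


class ReplyCollector:
    """Accepts a response once 2f + 1 distinct replicas agree on it."""

    def __init__(self, f: int):
        self.quorum = 2 * f + 1
        # (history digest, response) -> set of replica ids that sent it
        self.votes = defaultdict(set)

    def add_reply(self, replica_id, history_digest, response):
        """Record one reply; return the response if it is now accepted, else None."""
        supporters = self.votes[(history_digest, response)]
        supporters.add(replica_id)  # a set, so repeated replies count once
        if len(supporters) >= self.quorum:
            return response
        return None


collector = ReplyCollector(f=1)  # 3f + 1 = 4 replicas, quorum of 3
print(collector.add_reply(0, "h1", "ok"))
print(collector.add_reply(1, "h1", "ok"))
print(collector.add_reply(2, "h2", "ok"))  # inconsistent history, tallied separately
print(collector.add_reply(3, "h1", "ok"))  # third matching reply: accepted
```

Note that the slow or inconsistent replica does not prevent completion, which is the behavior that distinguishes this step from Zyzzyva's requirement of replies from every replica.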

C-2b. After each timeout interval in which the client has not received 2f + 1 consistent responses from distinct replicas, the client broadcasts the request to all replicas.

Explanation: If the client does not receive a timely quorum of responses, then it is possible that the replicas did not all receive the request from the primary. In this case, the client sends the request to the replicas directly, so that they can determine whether the primary is willing to order the request, and initiate a view-change if not.

Details: The client broadcasts to all replicas the request message previously sent to the primary in step C-1.

R-3. Upon receiving a request message whose identifier is greater than the last cached identifier for that client, a replica will send it to the primary, and then wait for a timeout period to receive an order-request message that will allow it to execute step R-2; otherwise, it requests a view-change and broadcasts the request to all replicas.

Explanation: Routing requests through the primary makes it into a single-point-of-failure. In order to prevent the primary from dropping requests—and thus violating liveness—the client rebroadcasts its request to the replicas so that they can submit the request on the client’s behalf, giving the replicas the opportunity to observe the primary’s misbehavior first-hand and then trigger a view-change. As a side-effect, this also allows request processing to continue when the client does not know the current primary.

Details: In addition to the above, a replica receiving a request from another replica responds with the corresponding order-request if it has it.

IV-B View-change protocol

In the Zyzzyva protocol, a request is included in the history of a new view if and only if there are f + 1 view-change messages available containing the request. As there might be only f + 1 view-change messages from correct replicas, to be certain that a request will be included in any new view, the client therefore needs to ensure that every correct replica has responded. In the SACZyzzyva view-change protocol, the canonical ordering provided by the trusted monotonic counter allows us to safely include requests whose ordering exists in even a single view-change message. The client therefore needs only 2f + 1 replies in order to be certain that a request will persist across the next view-change.

VC-1. When a replica requests a view-change, it broadcasts a req-view-change message to all other replicas and increases its timeout in some implementation-defined way.

Explanation: This part of the view-change protocol remains unchanged from Zyzzyva.

Details: The replica that has witnessed misbehavior of the primary of the current view broadcasts a req-view-change message for that view to all replicas.

VC-2. Upon receiving f + 1 req-view-change messages for the current view, a replica stops processing requests in the current view and broadcasts a view-change message to all replicas. If the view-change does not complete within a timeout period, the replica requests a new view-change.

Explanation: Since there is no prima-facie evidence of misbehavior by the primary, before committing to a view-change each replica waits until misbehavior has been reported by at least f + 1 replicas, so that it can prove to others with its view-change message that at least one report is genuine.

Details: More specifically, the replica advances its current view-number to the new view and broadcasts a view-change message containing the new view-number, the most recent view- or checkpoint-certificate, the set of requests that it has executed in the old view, and a set of f + 1 req-view-change messages for the old view.

VC-3. Upon receiving 2f + 1 view-change messages for a new view, the primary for that view instantiates a trusted monotonic counter instance and broadcasts a new-view message to the replicas.

Explanation: With 2f + 1 view-change messages, any request that has been accepted by a correct client in the last view must be present in at least one of them. This means that the primary can now safely propose a new view. Rather than including the view’s initial state, the new-view message includes the view-change messages themselves, so that the other replicas can verify that all completed requests are included in the history of the new view.
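The counting argument behind this step is a quorum intersection: with 3f + 1 replicas, the 2f + 1 replicas that replied to a completed request and the 2f + 1 replicas whose view-change messages are used must share at least f + 1 members, of which at most f are faulty. The brute-force check below is a sketch for small f only; `smallest_overlap` is a hypothetical helper.

```python
from itertools import combinations


def smallest_overlap(f: int) -> int:
    """Minimum intersection of any two (2f + 1)-subsets of 3f + 1 replicas."""
    n, quorum = 3 * f + 1, 2 * f + 1
    subsets = [set(c) for c in combinations(range(n), quorum)]
    return min(len(a & b) for a in subsets for b in subsets)


for f in (1, 2, 3):
    overlap = smallest_overlap(f)
    # Even if f of the shared replicas are faulty, overlap - f correct ones remain.
    print(f"f={f}: overlap >= {overlap}, correct replicas in overlap >= {overlap - f}")
```

The exhaustive search agrees with the closed form 2(2f + 1) − (3f + 1) = f + 1, so at least one correct replica that executed the request contributes a view-change message.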

Details: If the view-number in these messages is less than this replica’s current view-number, then this step is ignored. Otherwise, the new primary uses its trusted component to create a new trusted monotonic counter instance with a corresponding public key, then broadcasts to all replicas a new-view message containing that public key and the valid view-change messages that have been received.

VC-4. Upon receiving the first valid new-view message for the new view, each replica broadcasts a view-confirm message containing a hash of the new-view message it has received.

Explanation: Though a valid new-view message is guaranteed to contain every completed request, a faulty primary can provide a different set of view-change messages to each replica, causing them to disagree on whether uncompleted requests are included. This step ensures that all completed requests will build on the same new-view message.

Details: For the new-view message to be valid, it must contain view-change messages from 2f + 1 distinct replicas, and a public key that has been verified to belong to a trusted monotonic counter instance. If the view-number in the new-view message is less than that of this replica’s current view, then this message can be ignored. Otherwise, after receiving a new-view message for the new view for the first time, each replica broadcasts a view-confirm message to all replicas.

VC-5. Upon receiving 2f + 1 consistent view-confirm messages from distinct replicas confirming the new-view message from step VC-4, each replica begins to process requests in the new view.

Explanation: After receiving 2f + 1 consistent view-confirm messages, a correct replica can be certain that no other correct replica will process requests in this view with a different starting state.

Details: Consistency in this case means that all messages have identical view-numbers and new-view hashes. The starting state for this view is taken to be that of the highest-numbered view with a certificate in any of the view-change messages in the confirmed new-view message, extended with the longest consecutive sequence of requests in any of the same view-change messages containing this view. Putting a replica into this state may require rolling back some previously-executed requests, making it necessary to maintain enough information to roll back to the last checkpoint, or in extreme cases to carry out state transfer as in [11]. These view-confirm messages are stored as a view-change certificate.

IV-C Checkpointing Protocol

Since a view-change might require a replica to roll back some already-executed requests in the latest view, Zyzzyva includes a checkpoint protocol [28, Section 3.1] taken from that of PBFT [11, Section 4.3]; we do not reproduce all of the details, but sketch it here.

CP-1 (sketch). After every fixed number of requests, each replica broadcasts to all replicas a checkpoint message containing the current view-certificate, the most recently-executed request number, and a hash of the current state.

Explanation: Since a correct replica will include in its view-change messages every request that it has executed, a checkpoint message is a commitment to include all of these requests in future view-change messages.

CP-2 (sketch). After receiving 2f + 1 consistent checkpoint messages for the current view, a replica considers the checkpoint to be stable, and discards all order-request messages from before the checkpoint.

Explanation: Once 2f + 1 replicas have committed to including a request in their future view-change messages, it is guaranteed that at least one correct replica from among them will have its view-change message appear in any future successful view-change.

In this sketch we do not include e.g. low- and high-water marks; full details can be found in [11, Section 4.3].

We include arguments for the safety and liveness of SACZyzzyva in Appendix A.

V Performance evaluation

To assess the performance impact of our protocols, we created an experimental setup that runs proof-of-concept implementations of Zyzzyva5 and SACZyzzyva in a fault-free scenario. Note that SACZyzzyva cannot be meaningfully compared with regular Zyzzyva here, as they differ only in the presence of faults; we might induce a fault ourselves, but in this case the performance of Zyzzyva is mainly determined by the client timeout—that is, the time that the client waits before broadcasting a commit certificate when it does not receive responses from all replicas. We therefore use Zyzzyva5 as a baseline for our experiments.

The trusted monotonic counter is implemented in an Intel SGX enclave. All protocols are implemented using the same BFT platform, and so share networking and cryptographic code.

We made our measurements using Amazon EC2 [1] running a single replica per instance, and a separate instance used by the client. Because EC2 does not support SGX, the software was compiled in simulation mode [3]. Separately, on a standalone SGX-enabled machine, we confirmed that measurements in SGX simulation mode are similar to measurements using SGX.

We report medians rather than mean and standard deviation, as the measured latencies are non-normal.
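The choice of the median can be illustrated with a skewed sample. The numbers below are made up purely for illustration and are not measurements from our experiments; the point is only that a single slow transaction distorts the mean far more than the median.

```python
import statistics

# Illustrative latencies in milliseconds (NOT measured data): one outlier among typical values.
latencies_ms = [1.9, 2.0, 2.1, 2.0, 2.2, 45.0]

median_ms = statistics.median(latencies_ms)
mean_ms = statistics.mean(latencies_ms)

print(f"median = {median_ms:.2f} ms, mean = {mean_ms:.2f} ms")
```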

V-A Performance within a single datacenter

In order to test performance on low-latency networks, we carry out measurements on a set of replicas placed within a single EC2 region, Frankfurt. The test setup consists of a cluster of 50 m4.large and m5.large EC2 instances [1].

For each protocol, we measure the time it takes for a transaction to complete, for increasing numbers of replicas, taking the median over 50 transaction attempts.

(a) Latency vs tolerated faults within the Frankfurt AWS region.
(b) Latency vs tolerated faults across the internet.
Fig. 2: Latency vs tolerated faults. Each latency is the median of 50 measurements. The number of tolerated faults is varied by modifying the number of replicas: f faults are tolerated by 5f + 1 replicas for Zyzzyva5, and 3f + 1 replicas for SACZyzzyva.

These results are shown in Figure 2(a). SACZyzzyva requires fewer replicas than Zyzzyva5 for a given level of fault-tolerance, and therefore completes requests in less time. While the number of replicas has a significant effect on latency (on average, a marginal increase of 35 µs/replica for SACZyzzyva and 37 µs/replica for Zyzzyva5), the latency is still relatively small in an absolute sense. We will see in Section V-B that latency is dominated by network delays even with a larger number of replicas.

V-B Performance across the internet

To assess the performance over high-latency networks such as the internet, we measured the performance of SACZyzzyva and Zyzzyva5 with the replicas divided between three EC2 regions (Ohio, Frankfurt, and Sydney) in order to approximate the performance of the protocols when organically deployed across the internet.

In each test region we provision EC2 instances of type m4.large and m5.large—50 in Frankfurt and Ohio, and 42 in Sydney, the maximum number available to us.

As in Section V-A, we measure the response latency at the client as a function of the number of tolerable faults. The results are shown in Figure 2(b). Here latencies are dominated by speed-of-light delays, and increase linearly at rates of 25 µs/replica (SACZyzzyva) and 8 µs/replica (Zyzzyva5) respectively.

In this particular geographic configuration, SACZyzzyva can significantly reduce its latency by reducing the number of replies needed: Zyzzyva5 needs responses from four fifths of replicas for requests to complete, but SACZyzzyva requires only two thirds of replicas to respond. This means that SACZyzzyva does not need to wait for responses to arrive across the slow trans-Pacific link as Zyzzyva5 does. Another surprising effect is that the rate of latency increase per replica is less than when the protocol is run on a low-latency network. We hypothesize that this is because the large network latencies mean that only the processing time of responses from the most distant replicas affects the overall latency.
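The role of the reply threshold can be sketched as an order statistic: a request completes when the k-th fastest reply arrives. The sketch below assumes thresholds of $2f+1$ out of $3f+1$ replies and $4f+1$ out of $5f+1$ replies, matching the two-thirds and four-fifths fractions above; the round-trip times and function names are hypothetical and chosen only for illustration.

```python
def completion_latency(reply_latencies_ms, replies_needed):
    """Latency until the client holds `replies_needed` replies.

    The client can proceed once the k-th fastest reply arrives, so the
    result is the k-th smallest value in the list.
    """
    if not 1 <= replies_needed <= len(reply_latencies_ms):
        raise ValueError("replies_needed must be between 1 and the number of replicas")
    return sorted(reply_latencies_ms)[replies_needed - 1]


def replies_needed_sac(f):
    # Assumption: 2f + 1 matching replies out of 3f + 1 replicas (two thirds).
    return 2 * f + 1


def replies_needed_zyzzyva5(f):
    # Assumption: 4f + 1 matching replies out of 5f + 1 replicas (four fifths).
    return 4 * f + 1


# Hypothetical round-trip times in ms for f = 1, with replicas spread
# over a near, a medium and a far region (illustrative values only).
f = 1
sac_rtts = [10, 12, 95, 280]            # 3f + 1 = 4 replicas
z5_rtts = [10, 12, 95, 97, 280, 285]    # 5f + 1 = 6 replicas

sac_latency = completion_latency(sac_rtts, replies_needed_sac(f))
z5_latency = completion_latency(z5_rtts, replies_needed_zyzzyva5(f))
print(sac_latency, z5_latency)
```

In this toy configuration the smaller threshold is met without any reply from the far region, whereas the larger threshold is not.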

VI Optimality in the hybrid fault model

Existing consensus protocols, to tolerate $f$ faults, require either $2f+1$ parties with a trusted component or $3f+1$ of any kind, as shown in Figure 3; SACZyzzyva still requires $3f+1$ replicas despite the use of trusted monotonic counters, and it is reasonable to ask whether these trusted components might allow us to obtain a similar protocol that requires some smaller number of nodes.

Fig. 3: The level of fault tolerance achievable according to the total number of nodes and the number of nodes that cannot fail fully-Byzantine. Existing algorithms fall on the boundary of this space, for which the optimum fault tolerance is shown also in the interior.

We show here that this is not the case; specifically, that it is impossible to achieve both safety and liveness without either $3f+1$ nodes in total, or $2f+1$ nodes with trusted components. This theoretical limit is shown graphically in Figure 3.

VI-A Failure model

We elaborate on the system model in Section III-B by introducing some new terminology.

Partially-Byzantine failures. A party with a trusted component can be split into two parts, as shown in Figure 4:

  1. An untrusted part, which either behaves correctly or suffers a Byzantine failure.

  2. A trusted part, which communicates via the untrusted part and either behaves correctly or suffers a crash failure.

The result is that failures of a trusted-component-equipped party are partially-Byzantine: though their untrusted component can behave arbitrarily, the trusted component will follow its programming, and thus other parties can remain assured of at least some aspects of the behavior of the party.

Fig. 4: Hybrid model of trusted-component-equipped parties to the consensus protocol. Some parties will contain a trusted component that is immune from Byzantine failure: an attacker can make it crash or interfere with its communications, but cannot access its internal state.

Fully-Byzantine failures. We refer to the failures of a party without a trusted component as fully-Byzantine: there are no restrictions on the behavior of such a party in the event of a failure.

Crash failures. In this failure mode, nodes simply crash. We refer to crash and partially-Byzantine failures together as non-fully-Byzantine failures.

More formally, we consider a set $P$ of $n$ parties executing a protocol $\Pi$, and let some subset $B \subseteq P$ be fully-Byzantine in the event of a failure, and its complement $T = P \setminus B$ be ‘non-fully-Byzantine’.

We allow up to $f$ parties to fail according to their respective failure modes: those failed parties that happen to be in $B$ act under the full control of an adversary, whereas those failed parties that are in $T$ give control of only their untrusted parts to the adversary.

VI-B Impossibility result

We describe here our main result; the proof appears in Appendix B.

Theorem 1.

Let $\Pi$ be a consensus protocol amongst $n$ parties in the partial synchrony model, $n_B$ of which, when they fail, fail fully-Byzantine, and $n_T$ of which, when they fail, either crash or fail partially-Byzantine. Then, to tolerate $f$ failures, at least one of the following must be true:

$n \geq 3f + 1$ (1)
$n_T \geq 2f + 1$ (2)

Therefore, it is impossible to outdo the usual requirement of $3f+1$ replicas without parties having access to some component that cannot fail Byzantine.
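The bound can be explored numerically. The following is a minimal sketch that assumes the two conditions of Theorem 1 read $n \geq 3f+1$ and $n_T \geq 2f+1$, where $n_T$ counts the parties that cannot fail fully-Byzantine; the function names are hypothetical. Because the theorem gives necessary conditions only, the result is an upper bound on the achievable fault tolerance rather than a guarantee.

```python
def may_tolerate(n, n_trusted, f):
    """Return True if neither bound rules out tolerating f failures.

    Assumed reading of the theorem's two conditions:
      (1) n >= 3f + 1, or
      (2) n_trusted >= 2f + 1,
    where n_trusted counts parties that cannot fail fully-Byzantine.
    """
    if not 0 <= n_trusted <= n:
        raise ValueError("n_trusted must lie between 0 and n")
    return n >= 3 * f + 1 or n_trusted >= 2 * f + 1


def max_tolerable_faults(n, n_trusted):
    """Largest f that the bound permits for the given population."""
    f = 0
    while may_tolerate(n, n_trusted, f + 1):
        f += 1
    return f


print(max_tolerable_faults(10, 0))   # no trusted components
print(max_tolerable_faults(10, 6))   # trusted components in 60% of parties
print(max_tolerable_faults(10, 9))   # trusted components in 90% of parties
```

Under these assumptions, trusted components in six of ten parties permit no more faults than having none at all, while nine of ten permit one more.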

VII Related Work

As SACZyzzyva is motivated by recent blockchain-based distributed systems [39, 10], in this section we review research that aims at scalability and efficiency for distributed consensus protocols involving large populations.

Consensus protocols in blockchain scenarios. Fabric [6] and Sawtooth [23] are two recent examples of distributed ledgers that support the execution of smart contracts. Both use a consensus module to coordinate multiple parties. In particular, Fabric can use a fault-tolerant protocol such as BFT-SMaRt [8, 41], while Sawtooth is mostly known for its Proof-of-Elapsed-Time protocol, which is vastly more scalable than BFT protocols but provides only eventual consistency. Protocols with $O(n)$ message complexity such as SACZyzzyva and CoSi allow for high scalability, as in Sawtooth, but without sacrificing finality. Among other BFT protocols, there are also asynchronous protocols such as Honey Badger [38] and BEAT [18], which do not make any synchrony assumptions. However, this requires relatively expensive primitives such as reliable broadcast and threshold cryptography, and so such protocols are less efficient.

Byzcoin [27] is a hybrid Nakamoto/BFT consensus protocol that uses the Bitcoin consensus protocol to select a group of verifiers that is small enough in size to run a traditional BFT algorithm. SACZyzzyva would serve well in this role, as a replacement for the multisignature-based protocol used by [27].

Protocols that reduce replica count. Several research works recognize the importance of tackling the equivocation problem (malicious replicas sending out different conflicting messages to different recipients) in BFT protocols, since this allows the reduction of the replica count to $2f+1$. MinBFT [40, 15] proposes the use of a trusted monotonic counter to tag the messages, making equivocation detectable. Similarly, [4] shows how to implement a weak sequenced broadcast primitive using a TPM. SACZyzzyva’s use of trusted monotonic counters is closely related to MinBFT’s approach. A2M [12] provides an abstraction for attested append-only memory. This is used to implement a hardware-based secure log for outgoing messages, while incoming messages are accepted only after the verification of a log attestation. CheapBFT [25] and ReBFT [16] provide a way to further reduce the number of active replicas by making $f$ of them passive, and activating them only when it is required to handle faults and make progress. SACZyzzyva bridges the world where all replicas have a trusted component and the world where only some of them have it, ultimately showing a protocol for the heterogeneous setting.

Protocols with low communication complexity. Several protocols have been proposed to reduce the message count. Zyzzyva [28] and variants [21] avoid all-to-all broadcasts by using speculative execution. Chain replication [44] has a low message complexity since replicas are organized on a chain-like communication topology and only use broadcasts in the case of faults. Byzcoin [27] similarly uses a tree-like communication topology, and uses collective signing to aggregate messages from multiple nodes. FastBFT [32] improves on that approach by means of a lightweight TEE-based message aggregation technique. SACZyzzyva belongs to the former category, using speculative execution to reduce the number of messages, but without needing to make the trade-off between fault-tolerance and robust performance as with Zyzzyva.

Lower bounds. BFT protocols suffer from several fundamental limitations. First, it has been shown [29] that asynchronous protocols require two phases to terminate. Speculative protocols like Zyzzyva or SACZyzzyva are able to terminate in one phase since they make an additional assumption (namely, that rollback is possible). Second, BFT protocols typically require a quadratic number of messages to terminate [17]. The workaround for many protocols is to use cryptographic constructions which can err with positive probability [26]. Finally, in [13] it has been shown that achieving non-equivocation is actually insufficient for reducing the number of replicas, and that transferable authentication of messages (e.g., using digital signatures) is additionally necessary. In SACZyzzyva the trusted monotonic counter ensures non-equivocation, with an attestation that is publicly verifiable (and so transferable) with the aid of digital certificates from the hardware manufacturer.

VIII Discussion and Conclusions

By incorporating a trusted monotonic counter into Zyzzyva’s ordering process, we can eliminate its non-speculative fallback without sacrificing fault-tolerance as previous solutions have. This removes one of the main disadvantages of the Zyzzyva family of protocols, namely that without sacrificing fault-tolerance they are unable to perform speculative execution in the presence of even a single fault.

SACZyzzyva achieves the resilience of Zyzzyva5 while reducing the replica count from $5f+1$ to $3f+1$. MinZyzzyva uses trusted monotonic counters in every replica, and so in principle we might expect that MinZyzzyva’s non-speculative fallback can be similarly eliminated. This is not entirely straightforward, as we need to ensure that even a faulty replica will disclose the requests that it has seen. We will address this topic in an extended version of this paper.

Our approach does not only apply to Zyzzyva-like protocols. For example, PBFT uses an all-to-all broadcast to provide a canonical ordering of requests; when a trusted monotonic counter is available, this step can be eliminated, as in MinBFT [40, Figure 1], but without requiring a trusted monotonic counter in every replica.

We have also shown that more than two-thirds of replicas must have a trusted component in order to tolerate more than $\lfloor (n-1)/3 \rfloor$ faults, where $n$ is the number of replicas. This means that our protocols achieve optimal fault-tolerance, but shows that there is an important part of the design space that remains unexplored.

IX Acknowledgements

This work is supported in part by the Academy of Finland (grant 309195) and by Intel (ICRI-CARS).

References

  • [1] Amazon EC2. https://aws.amazon.com/ec2/.
  • [2] ARM security technology: Building a secure system using TrustZone technology. White paper, ARM, 2009.
  • [3] Intel Software Guard Extensions SDK for Linux OS: Developer reference. Technical report, 2016.
  • [4] Ittai Abraham, Marcos K Aguilera, and Dahlia Malkhi. Fast asynchronous consensus with optimal resilience. In International Symposium on Distributed Computing, pages 4–19. Springer, 2010.
  • [5] Ittai Abraham, Guy Gueta, Dahlia Malkhi, and Jean-Philippe Martin. Revisiting fast practical Byzantine fault tolerance: Thelma, Velma, and Zelma. arXiv preprint arXiv:1801.10022, 2018.
  • [6] Elli Androulaki, Artem Barger, Vita Bortnikov, Christian Cachin, Konstantinos Christidis, Angelo De Caro, David Enyeart, Christopher Ferris, Gennady Laventman, Yacov Manevich, et al. Hyperledger fabric: a distributed operating system for permissioned blockchains. In Proceedings of the Thirteenth EuroSys Conference, page 30. ACM, 2018.
  • [7] Michael Barborak, Anton Dahbura, and Miroslaw Malek. The consensus problem in fault-tolerant computing. ACM Computing Surveys, 25(2):171–220, 1993.
  • [8] Alysson Bessani, João Sousa, and Eduardo EP Alchieri. State machine replication for the masses with BFT-SMART. In Dependable Systems and Networks (DSN), 2014 44th Annual IEEE/IFIP International Conference on, pages 355–362. IEEE, 2014.
  • [9] Joseph Bonneau, Andrew Miller, Jeremy Clark, Arvind Narayanan, Joshua A Kroll, and Edward W Felten. Sok: Research perspectives and challenges for Bitcoin and cryptocurrencies. In Security and Privacy (SP), 2015 IEEE Symposium on, pages 104–121. IEEE, 2015.
  • [10] Vitalik Buterin. A next-generation smart contract and decentralized application platform, 2014. https://github.com/ethereum/wiki/wiki/White-Paper.
  • [11] Miguel Castro and Barbara Liskov. Practical Byzantine fault tolerance. In Proceedings of the Third Symposium on Operating Systems Design and Implementation, OSDI ’99, pages 173–186, Berkeley, CA, USA, 1999. USENIX Association.
  • [12] Byung-Gon Chun, Petros Maniatis, Scott Shenker, and John Kubiatowicz. Attested append-only memory: Making adversaries stick to their word. In Proceedings of Twenty-first ACM SIGOPS Symposium on Operating Systems Principles, SOSP ’07, pages 189–204, 2007.
  • [13] Allen Clement, Flavio Junqueira, Aniket Kate, and Rodrigo Rodrigues. On the (limited) power of non-equivocation. In Proceedings of the 2012 ACM symposium on Principles of distributed computing, pages 301–308. ACM, 2012.
  • [14] Allen Clement, Edmund Wong, Lorenzo Alvisi, Mike Dahlin, and Mirco Marchetti. Making Byzantine fault tolerant systems tolerate Byzantine faults. In Proceedings of the 6th USENIX Symposium on Networked Systems Design and Implementation, pages 153–168, 2009.
  • [15] Miguel Correia, Giuliana S Veronese, and Lau Cheuk Lung. Asynchronous Byzantine consensus with 2f+1 processes. In Proceedings of the 2010 ACM symposium on applied computing, pages 475–480. ACM, 2010.
  • [16] Tobias Distler, Christian Cachin, and Rüdiger Kapitza. Resource-efficient Byzantine fault tolerance. IEEE Transactions on Computers, 65(9):2807–2819, 2016.
  • [17] Danny Dolev and Rüdiger Reischuk. Bounds on information exchange for Byzantine agreement. J. ACM, 32(1):191–204, January 1985.
  • [18] Sisi Duan, Michael K. Reiter, and Haibin Zhang. BEAT: Asynchronous BFT made practical. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, CCS ’18, pages 2028–2041, New York, NY, USA, 2018. ACM.
  • [19] Michael J Fischer, Nancy A Lynch, and Michael S Paterson. Impossibility of distributed consensus with one faulty process. J. ACM, 32(2):374–382, 1985.
  • [20] Trusted Computing Group. Trusted Platform Module specification, 2016. https://trustedcomputinggroup.org/resource/tpm-library-specification/.
  • [21] Rachid Guerraoui, Nikola Knežević, Vivien Quéma, and Marko Vukolić. The next 700 BFT protocols. In Proceedings of the 5th European conference on Computer systems, pages 363–376. ACM, 2010.
  • [22] Guy Golan Gueta, Ittai Abraham, Shelly Grossman, Dahlia Malkhi, Benny Pinkas, Michael K Reiter, Dragos-Adrian Seredinschi, Orr Tamir, and Alin Tomescu. SBFT: a scalable decentralized trust infrastructure for blockchains. arXiv preprint arXiv:1804.01626, 2018.
  • [23] Hyperledger. Sawtooth. www.hyperledger.org/projects/sawtooth.
  • [24] Intel. Software Guard Extensions (Intel SGX) Programming Reference, 2013. https://software.intel.com/sites/default/files/managed/48/88/329298-002.pdf.
  • [25] Rüdiger Kapitza, Johannes Behl, Christian Cachin, Tobias Distler, Simon Kuhnle, Seyed Vahid Mohammadi, Wolfgang Schröder-Preikschat, and Klaus Stengel. CheapBFT: Resource-efficient Byzantine fault tolerance. In Proceedings of the 7th ACM European Conference on Computer Systems, EuroSys ’12, pages 295–308, New York, NY, USA, 2012. ACM.
  • [26] Valerie King and Jared Saia. Breaking the bit barrier: scalable Byzantine agreement with an adaptive adversary. Journal of the ACM (JACM), 58(4):18, 2011.
  • [27] Eleftherios Kokoris Kogias, Philipp Jovanovic, Nicolas Gailly, Ismail Khoffi, Linus Gasser, and Bryan Ford. Enhancing Bitcoin security and performance with strong consistency via collective signing. In 25th USENIX Security Symposium (USENIX Security 16), pages 279–296, Austin, TX, 2016. USENIX Association.
  • [28] Ramakrishna Kotla, Lorenzo Alvisi, Mike Dahlin, Allen Clement, and Edmund Wong. Zyzzyva: Speculative Byzantine fault tolerance. In Proceedings of Twenty-first ACM SIGOPS Symposium on Operating Systems Principles, SOSP ’07, pages 45–58, New York, NY, USA, 2007. ACM.
  • [29] Leslie Lamport. Lower bounds for asynchronous consensus. Distributed Computing, 19(2):104–125, 2006.
  • [30] Leslie Lamport, Robert Shostak, and Marshall Pease. The Byzantine generals problem. ACM Transactions on Programming Languages and Systems (TOPLAS), 4(3):382–401, 1982.
  • [31] Dave Levin, John R Douceur, Jacob R Lorch, and Thomas Moscibroda. TrInc: Small trusted hardware for large distributed systems. In Proceedings of NSDI, volume 9, pages 1–14, 2009. Boston, MA, USA.
  • [32] Jian Liu, Wenting Li, Ghassan O Karame, and N. Asokan. Scalable Byzantine consensus via hardware-assisted secret sharing. IEEE Transactions on Computers, 2018.
  • [33] Nancy A Lynch. Distributed Algorithms. Morgan Kaufmann, 1996.
  • [34] Dahlia Malkhi. Blockchain in the lens of BFT. In USENIX Annual Technical Conference, 2018. Boston, MA, USA.
  • [35] Dahlia Malkhi and Michael Reiter. Byzantine quorum systems. Distributed Computing, 11(4):203–213, 1998.
  • [36] J-P Martin and L Alvisi. Fast Byzantine consensus. In Proceedings of the International Conference on Dependable Systems and Networks (DSN), pages 402–411. IEEE, 2005.
  • [37] Sinisa Matetic, Mansoor Ahmed, Kari Kostiainen, Aritra Dhar, David Sommer, Arthur Gervais, Ari Juels, and Srdjan Capkun. ROTE: Rollback protection for trusted execution. IACR Cryptology ePrint Archive, 2017:48, 2017.
  • [38] Andrew Miller, Yu Xia, Kyle Croman, Elaine Shi, and Dawn Song. The honey badger of BFT protocols. In Proceedings of the 23rd ACM Conference on Computer and Communications Security (CCS ’16), 2016.
  • [39] Satoshi Nakamoto. Bitcoin: A peer-to-peer electronic cash system, 2009. http://www.bitcoin.org/bitcoin.pdf.
  • [40] Giuliana Santos Veronese, Miguel Correia, Alysson Neves Bessani, Lau Cheuk Lung, and Paulo Verissimo. Efficient Byzantine fault-tolerance. IEEE Transactions on Computers, 62:16–30, 2013.
  • [41] Joao Sousa, Alysson Bessani, and Marko Vukolić. A Byzantine fault-tolerant ordering service for the Hyperledger Fabric blockchain platform. arXiv:1709.06921, 2017.
  • [42] Raoul Strackx and Frank Piessens. Ariadne: A minimal approach to state continuity. In USENIX Security, volume 16, 2016. Austin, TX, USA.
  • [43] Ewa Syta, Iulia Tamas, Dylan Visher, David Isaac Wolinsky, Philipp Jovanovic, Linus Gasser, Nicolas Gailly, Ismail Khoffi, and Bryan Ford. Keeping authorities “honest or bust” with decentralized witness cosigning. In 37th IEEE Symposium on Security and Privacy, 2016.
  • [44] Robbert Van Renesse and Fred B Schneider. Chain replication for supporting high throughput and availability. In OSDI, volume 4, pages 91–104, 2004.
  • [45] Paulo Veríssimo. Travelling through wormholes: a new look at distributed systems models. ACM SIGACT News, 37(1):66–81, 2006.
  • [46] Marko Vukolić. The quest for scalable blockchain fabric: Proof-of-work vs. BFT replication. In International Workshop on Open Problems in Network Security, pages 112–125. Springer, 2015.
  • [47] Marko Vukolic. Eventually returning to strong consistency. IEEE Data Eng. Bull., 39(1):39–44, 2016.

Appendix A Correctness

The safety and liveness properties of SACZyzzyva are defined from the point of view of the client: the states of the replicas may diverge, so long as the histories returned with completed requests do not diverge.

We recall that SACZyzzyva uses $3f+1$ replicas in order to tolerate $f$ faults, with $f+1$ replicas having a trusted monotonic counter. We are therefore guaranteed that any set of $f+1$ replicas will always contain at least one correct replica, and that any two sets of $2f+1$ replicas will always contain at least one correct replica in their intersection.
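The intersection property can be checked by brute force for small systems. The sketch below assumes $3f+1$ replicas and sets of size $2f+1$; it enumerates every pair of such sets and reports the smallest overlap, which comes out as $f+1$ and therefore cannot consist of faulty replicas alone. The function name is hypothetical.

```python
from itertools import combinations


def min_quorum_overlap(n, quorum_size):
    """Smallest intersection between any two quorums, found by brute force."""
    replicas = range(n)
    quorums = [set(q) for q in combinations(replicas, quorum_size)]
    return min(len(a & b) for a in quorums for b in quorums)


for f in (1, 2):
    n = 3 * f + 1
    overlap = min_quorum_overlap(n, 2 * f + 1)
    # With at most f faulty replicas, an overlap of f + 1 must include
    # at least one correct replica.
    print(f"f={f}: n={n}, smallest overlap={overlap}")
```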

We suppose as well that there exists some well-known mapping from view-numbers to trusted monotonic counter-equipped replicas such that at least one correct replica will be chosen infinitely many times. One suitable mapping is $p = v \bmod (f+1)$, where the replicas are numbered such that the first $f+1$ replicas possess a trusted monotonic counter.
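One such mapping is a simple round-robin. The sketch below is an assumption rather than the mapping used in the implementation: it numbers replicas from 0, takes replicas 0 to $f$ to be the counter-equipped ones, and selects the primary for view $v$ as $v \bmod (f+1)$. Since at most $f$ of these $f+1$ replicas can be faulty, a correct one is selected once in every $f+1$ consecutive views.

```python
def primary_for_view(view, f):
    """Map a view number to a replica index in round-robin order.

    Assumes replicas are numbered from 0 and that replicas 0..f are the
    f + 1 replicas equipped with a trusted monotonic counter.
    """
    if view < 0 or f < 0:
        raise ValueError("view and f must be non-negative")
    return view % (f + 1)


f = 2
schedule = [primary_for_view(v, f) for v in range(7)]
print(schedule)
```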

A-1 Safety

We show that the histories of completed requests can never diverge. A history is a sequence of requests that are executed in turn. We use the notation $h_1 \sqsubseteq h_2$ to indicate that $h_1$ is a non-strict prefix of $h_2$.
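The prefix relation, and the safety condition built on it, can be stated concretely. In this minimal sketch a history is modelled as a Python list of request identifiers; the function names are hypothetical.

```python
def is_prefix(h1, h2):
    """True if history h1 is a (non-strict) prefix of history h2."""
    return len(h1) <= len(h2) and list(h2[:len(h1)]) == list(h1)


def consistent(h1, h2):
    """Two histories are consistent if one is a prefix of the other."""
    return is_prefix(h1, h2) or is_prefix(h2, h1)


print(consistent(["a", "b"], ["a", "b", "c"]))
print(consistent(["a", "x"], ["a", "b", "c"]))
```

Histories that differ only by truncation of a common suffix pass the check, while histories that disagree on any executed request do not.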

Lemma 1 (Consistency of the initial view state).

Let $h_1$ and $h_2$ be the request histories and $K_1$ and $K_2$ the trusted-monotonic-counter public keys held by any two correct replicas that execute any two requests in view $v$; these requests may or may not be distinct. Then, $K_1 = K_2$, and $h_1$ and $h_2$ are identical with respect to requests prior to view $v$.

Proof.

If $v$ corresponds to the first view, then the history of prior views is empty in both cases, and the public keys form part of the initial state, and so the lemma is trivially true.

Otherwise, each correct replica executing a request in view $v$ must have received at least $2f+1$ view-confirm messages from distinct replicas for view $v$ containing the same hashed new-view message (Step VC-5). The $2f+1$ such messages received by each replica must have in common at least one correct sender. Each correct replica produces only a single view-confirm message (Step VC-4), so the consistent set of view-confirm messages must confirm the same new-view, and thus both replicas accept the same public keys and history as the initial state of the view. ∎

Lemma 2 (Histories of completed requests do not diverge within a single view).

Let requests $r_1$ and $r_2$ complete in view $v$, and let $h_1$ and $h_2$ be the request histories of any two correct replicas immediately after executing $r_1$ and $r_2$ respectively. Then, one history is a prefix of the other; that is, either $h_1 \sqsubseteq h_2$ or $h_2 \sqsubseteq h_1$.

Proof.

A correct replica responds only after having received messages from the primary with sequential ordering certificates for every in its history of this view (Step R-2). As can be obtained only by TC.Increment and includes , for any there is at most one request for each such that any replica has received , and therefore the histories of all correct replicas within view are identical except for partial truncation of a common suffix. Hence, the history of any correct replica is either prefixed by or a prefix of any other. ∎
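The uniqueness argument rests on the counter never issuing the same value twice. The following toy sketch illustrates that property only; it is not the SGX enclave used in the implementation. An HMAC with a shared key stands in for the publicly verifiable attestation, and the class and method names are hypothetical.

```python
import hashlib
import hmac
import os


class TrustedCounter:
    """Toy stand-in for a trusted monotonic counter.

    A real deployment would keep the key inside trusted hardware and use
    a publicly verifiable signature; an HMAC with a shared key is used
    here only to keep the sketch within the standard library.
    """

    def __init__(self, key):
        self._key = key
        self._value = 0

    def increment(self, message):
        """Bind `message` to the next counter value and authenticate the pair."""
        self._value += 1
        tag = hmac.new(self._key, self._encode(self._value, message), hashlib.sha256).digest()
        return self._value, tag

    @staticmethod
    def _encode(value, message):
        return value.to_bytes(8, "big") + hashlib.sha256(message).digest()

    @classmethod
    def verify(cls, key, value, message, tag):
        expected = hmac.new(key, cls._encode(value, message), hashlib.sha256).digest()
        return hmac.compare_digest(expected, tag)


key = os.urandom(32)
counter = TrustedCounter(key)
c1, t1 = counter.increment(b"request A")
c2, t2 = counter.increment(b"request B")

# Each call consumes a fresh counter value, so "request B" cannot be
# certified under the value already used for "request A".
print(c1, c2)
print(TrustedCounter.verify(key, c1, b"request A", t1))
print(TrustedCounter.verify(key, c1, b"request B", t2))
```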

Lemma 3 (Completed requests are never omitted from history by a view-change).

Let $h_v$ be the history of all completed requests up to and including view $v$. Then, for all views $v' > v$, a correct replica executing a request in view $v'$ includes $h_v$ in its history.

Proof.

Let have primary . By Lemma 1, all correct replicas executing requests in view will have identical histories for views prior to .

We proceed by strong induction to show that this history is prefixed by .

Base case. Let . For a correct replica to respond to a request in view , it must receive a new-view message containing view-change messages from distinct replicas. At least one of these view-change messages must be from a replica that is correct and has executed the last—and hence all prior—completed requests in . Therefore will be a prefix of the history that this replica computes for view , and so will be in the history of any correct replica that begins executing requests in view .

Inductive case. Let the supposition hold for all such that .

From step VC-5, the history of view as confirmed by any correct replica is prefixed by the history of the most recent view for which a view-change certificate—or a checkpoint-certificate, which contains the corresponding view-change certificate—is available in one of the view-change messages being confirmed, along with all subsequent requests in view for which an order-request message is available in one of the same view-change messages.

We will always have that , as at least one of the view-change messages must be from a correct node that executed , and therefore has a view-change certificate for view .

If , then the result is trivial: any set of view-change messages in a valid new-view will include one from a correct node that executed the final request in view , and therefore a view-change certificate for view and the order-request messages for and its predecessors are included.

If , then by supposition and its history are a prefix of the history of , whose history is itself a prefix of the history of , which is what we wanted. ∎

The history of any correct replica that executes a request in view is prefixed by the computed history of view , and is therefore prefixed by . ∎

Theorem 2 (Safety).

Let requests $r_1$ and $r_2$ complete with histories $h_1$ and $h_2$ at any two replicas that have just executed requests $r_1$ and $r_2$ respectively. Then, one history is a prefix of the other; that is, either $h_1 \sqsubseteq h_2$ or $h_2 \sqsubseteq h_1$.

Proof.

Suppose and complete in views and respectively. If , then the theorem follows trivially from Lemmas 2—for the part of the history in —and 1—for the history of earlier views.

Otherwise, suppose without loss of generality that . Then, by Lemma 3, the history of completed requests up to view is a prefix of . Since completes in view , we therefore have that . ∎

A-2 Liveness

We show that a request by a correct client eventually completes. We say a view is stable if the primary is correct and enough time has passed that network delays are less than the timeout period of the protocol. The proof follows similarly to that of [32].

Lemma 4.

During a stable view, a request by a correct client will complete.

Proof.

Since the primary is correct, a valid order-request message will be sent to all replicas. Since the network is in a period of synchrony, the request will eventually complete, the client receiving at least $2f+1$ replies. ∎

Lemma 5.

For an unstable view , either all requests will complete, or the view will eventually change to a stable one.

Proof.

Suppose a client makes a request during an unstable view. Then, two things may happen: the primary provides a consistent ordering to replicas that respond to the client before the client times out, in which case the request completes, or it does not.

Suppose the client times out. Then all correct replicas will eventually receive the request directly from the client (step C-2B), and those that have not already replied to the client will forward it to the primary (step R-3), setting a timeout. If no correct replicas receive the corresponding order-request, then all of them will request a view change, leading to all correct replicas initiating a view-change. Otherwise, if at least one correct replica receives the corresponding order-request, then it will receive the requests forwarded by the other replicas in step R-3, and respond with the order-request. Thus all correct replicas will eventually receive the order-request and respond to the client if they have not already begun a view-change.

Therefore, either the request completes or all correct replicas eventually begin a view-change.

If any correct replica commits to a view change, then there are three possible outcomes:

  1. All correct replicas change to a stable view.

  2. All correct replicas change to an unstable view: the client resends its request, which either completes or results in a further view-change (as above).

  3. At least one correct replica does not change view: if any correct replica commits to a view-change, eventually so will all others. If at least correct replicas do not receive confirmation of the new view before timing out, then a further view-change will occur. Otherwise, when the client resends its request, it will either complete or result in a further view-change.

This cycle can repeat itself until the protocol reaches a period of synchrony; at this point, view-changes will continue to occur until either the faulty replicas allow the client’s requests to complete, or a correct replica becomes primary. ∎

Theorem 3 (Liveness).

All valid requests by a correct client will eventually complete.

Proof.

We proceed by exhaustion. Suppose the view is stable. Then, by Lemma 4, a request will eventually complete.

Now suppose the view is not stable. Then, by Lemma 5, the view will eventually become stable. If the request completes before this occurs, then we are done. Otherwise, because the client retries its request continuously, the request will eventually arrive during a stable view, at which point by Lemma 4 it will complete. ∎

Appendix B Proof of optimality of SACZyzzyva

B-A Quorum properties

We will proceed by a quorum-intersection argument, deriving some properties of the quora of a consensus protocol, and then finding the conditions under which they conflict. However, we must re-examine this approach with the knowledge that some nodes may only be partially-Byzantine. For the avoidance of doubt, when we refer to an execution of a protocol by a set of parties, this means that the correct parties execute the protocol correctly, while other parties can behave arbitrarily within the constraints of their failure model.

Definition 1 (Quorum).

A set $Q$ of parties is a quorum for a consensus protocol $\Pi$ if, for any proposition $x$ by a proposer $p \in Q$, there exists some execution of $\Pi$ by $Q$ in which no correct party receives any messages from parties outside $Q$, and some correct party outputs $x$ after time at most $\tau$.

Note that this definition does not require that the status of a quorum be determined by a simple threshold on the number of parties. In the case of PBFT, any set of $2f+1$ parties is a quorum, but some protocol might conceivably give greater weight to nodes with trusted components, or nodes that are known to have a lower probability of failure.

A subtle point here is that for a set to be a quorum, it is required only that there exists an execution of $\Pi$ that leads to an output in time at most $\tau$; for example, we might obtain some bound $\tau$ by simply observing the consensus protocol in normal operation without introducing any adversarial delays. This differs from the case of Byzantine quorum systems [35], where the set of quora is a design parameter of the protocol.

The result is that, where the network model allows us to delay messages by time $\tau$, we can delay messages between other nodes and some quorum, and there will be some valid protocol execution that results in the correct parties producing an output. We use this to show that the quora of any consensus protocol with safety must have at least one non-fully-Byzantine node in their intersection, mirroring the D-Consistency property of a dissemination quorum system in [35, Definition 5.1].

Lemma 6 (Quorum intersections cannot be fully-Byzantine).

Let $Q_1$ and $Q_2$ be quora of a consensus protocol $\Pi$ in the weak-synchrony model. Then, $Q_1 \cap Q_2$ contains at least one non-fully-Byzantine node.

Proof.

By the safety of $\Pi$, if any two correct parties to $\Pi$ output values $x_1$ and $x_2$ respectively, then $x_1 = x_2$. We show that if $Q_1 \cap Q_2$ contains no non-fully-Byzantine nodes, then it is possible to force two correct parties to output distinct values.

Let us define the sets , , and , and consider three possible runs:

Run 1. Messages between and are delayed for time . Let some propose the value . By the definition of a quorum, there is at least one protocol execution where a correct party in outputs .

Run 2. Messages between and are delayed for time . Let some propose the value . By the definition of a quorum, there is at least one protocol execution where a correct party in outputs .

Run 3. Now suppose messages between and are delayed for time . Let some propose the value , and some propose the value , . Suppose that contains no non-fully-Byzantine nodes; then, we can have them behave arbitrarily. In this case, we have the nodes in behave as in Run 1 with respect to the nodes in , and as in Run 2 with respect to the nodes in . As the correct replicas in and cannot distinguish Run 3 from Runs 1 and 2 respectively, then there is a protocol execution in which at least one correct node in each quorum will output the distinct values and respectively, thereby violating the assumption that the protocol is safe.

Hence, $Q_1 \cap Q_2$ must contain at least one non-fully-Byzantine node. ∎

The previous lemma gave a necessary—but not sufficient—condition for safety in terms of quora. Now, we do the same for liveness, mirroring the D-Availability property from [35, Definition 5.1].

Lemma 7 (Sufficiently large sets must contain a quorum).

Let $S$ be a subset of the $n$ parties to a consensus protocol $\Pi$ tolerating $f$ crash failures. Then, if $|S| \geq n - f$, $S$ is a quorum for $\Pi$.

Proof.

By the liveness of $\Pi$, if a message $x$ is correctly proposed and the parties outside $S$ all crash, then all correct parties will eventually output some value. By the safety of $\Pi$, the value that they output is $x$. Therefore, $S$ is a quorum. ∎

With these, we may prove Theorem 1.

Proof.

We show that if neither condition holds, then if the protocol has liveness, it is not safe for at least one allocation of failures.

Consider arbitrary $n$ and $n_T$. We proceed by contradiction. Suppose $\Pi$ has liveness, so that Lemma 7 holds. Then, we seek some allocation of failures such that two quora $Q_1$ and $Q_2$ have only fully-Byzantine failures in their intersection.

Let

Because the numbering of the replicas is arbitrary, let us suppose that parties are subject to fully-Byzantine failure, as shown in Figure 5.

Fig. 5: Constructed quora used in the proof of Theorem 1, and the failure mode of each party. If the entire intersection can fail fully-Byzantine, then the protocol is unsafe.

Both and have cardinality , so by Lemma 7, they are both quora.

Now, . Thus, we can make the entire intersection faulty. For to be safe—and thus Lemma 6 to hold—this intersection must always contain at least one party that does not fail fully-Byzantine. But, this is not the case: , hence

and

Since , this implies .

We therefore have two quora that are not guaranteed to have a non-fully-Byzantine node in their intersection; this contradicts Lemma 6, and thus $\Pi$ cannot have both liveness and safety if both $n \leq 3f$ and $n_T \leq 2f$. ∎
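The construction can be made concrete for small parameters. The sketch below is an illustration under stated assumptions, not a transcription of the proof: parties are numbered 1 to $n$, the two quora are the first and the last $n-f$ parties, and `n_trusted` counts the parties that cannot fail fully-Byzantine. All names are hypothetical.

```python
def constructed_quora(n, f):
    """Two quora of size n - f that overlap as little as possible.

    Parties are numbered 1..n; Q1 takes the first n - f parties and
    Q2 the last n - f. This numbering is an illustrative assumption.
    """
    q1 = set(range(1, n - f + 1))
    q2 = set(range(f + 1, n + 1))
    return q1, q2


def intersection_can_be_fully_byzantine(n, n_trusted, f):
    """True if an adversary with f failures can corrupt the whole overlap
    using only parties that fail fully-Byzantine."""
    q1, q2 = constructed_quora(n, f)
    overlap = len(q1 & q2)
    n_byzantine_capable = n - n_trusted
    # Place the fully-Byzantine-capable parties inside the overlap first.
    return overlap <= f and overlap <= n_byzantine_capable


print(intersection_can_be_fully_byzantine(n=6, n_trusted=4, f=2))
print(intersection_can_be_fully_byzantine(n=7, n_trusted=0, f=2))
print(intersection_can_be_fully_byzantine(n=6, n_trusted=5, f=2))
```

Only the first configuration, which has neither $3f+1$ parties nor $2f+1$ parties with a trusted component, lets the adversary take over the entire intersection.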