As blockchain is a popular abstraction to handle valuable assets, it has become one of the cornerstones of promising solutions for building critical applications without requiring trust. Unfortunately, after a decade of research in the space, blockchain still appears to be in its infancy, unable to offer the guarantees needed by the industry to automate critical applications in production. The crux of the problem is the difficulty of having remote computers agree on a unique block at a given index of the chain when some of them are malicious. The first blockchains allow disagreements on the block at an index of the chain but try to recover from them before assets get stolen through double spending: with disagreement, an asset owner could be fooled when she observes that she received the asset, since the existence of a conflicting block within a different branch of the chain may indicate that the asset belongs to a different user who can re-spend it. This is probably why most blockchains now build upon some form of Byzantine consensus solution [23, 12, 13].
Solving the Byzantine consensus problem, defined 39 years ago, is needed to guarantee that machines agree on a common block at each index of the chain. Consensus was recently shown to be necessary in the general scenario where conflicting transactions might be requested from distributed servers. Various solutions to the consensus problem were proposed in the last four decades [16, 35, 47, 36, 6, 22, 38]. Most of these algorithms were proved correct "by hand", often listing a series of lemmas and theorems in prose leading the reader to the conclusion that the algorithm solves agreement, validity and termination in all possible distributed executions. In the worst case, these algorithms are simply described with text in blog posts [30, 38]. In the best case, a mathematical specification is offered, for example in TLA+, but without machine-checked proofs. Unfortunately, such a formal specification that is not machine-checked remains error prone.
As far as we know, no Byzantine fault tolerant consensus algorithm used in blockchains has ever been certified or verified automatically with the help of a program that produces an output to ascertain the specification correctness. We do not claim that certifying blockchain consensus guarantees correct executions, as there are other tools necessary to execute it that may be incorrect. We believe instead that certifying blockchain consensus greatly reduces errors by forcing the distributed algorithm designer to write an automaton sufficiently disambiguated to be systematically evaluated with tools designed by verification experts. While some consensus algorithms have been automatically proved correct [37, 8], these are mainly classic algorithms from the literature. They do not necessarily offer the practical properties suitable for blockchains, as they were not intended to be implemented in blockchains.
In this paper, we first survey important problems that recently affected blockchain consensus. In particular, we propose two new counter-examples explaining why the Casper FFG algorithm, which should be integrated in phase 0 of Ethereum 2.0, and the HoneyBadger algorithm, which is being integrated into one of the most popular blockchain software systems, called Parity, may not terminate. We also list four additional counter-examples from the literature to illustrate the magnitude of the problem for blockchains. While there exist alternative solutions to some of these problems that could be implemented, this does not prevent other problems from existing. Moreover, proving "by hand" that the fixes solve the bugs may be found unconvincing, knowing that these bugs went unnoticed when the algorithms were proven correct, also "by hand", in the first place.
We then build upon the modern tools at our disposal to develop certified proofs of blockchain consensus components that do not assume synchrony, under the assumption that t < n/3 processes are Byzantine (or faulty) among the n processes. In particular, we explain how the Byzantine model checker ByMC can be used by distributed computing scientists to certify the proofs of blockchain consensus components without a deep expertise in formal verification. The idea is to convert the distributed algorithm into a threshold automaton that represents a state as a group of all the states in which a correct (or non-faulty) process resides until this process receives sufficiently many messages to transition. We offer the threshold automaton specification of a Byzantine fault tolerant broadcast primitive that is key to several blockchains [40, 22, 20]. Finally, we also offer the threshold automaton specification of a variant of the Byzantine consensus of the Red Belly Blockchain that we prove safe and live under the round-rigidity assumption that helps modeling a fair scheduler, hence allowing other distributed computing scientists to reproduce the certified proofs with this publicly available model checker.
Various specification languages (e.g., [51, 39]) were proposed for distributed algorithms before threshold automata, but they did not allow the simplification needed to model check algorithms as complex as the Byzantine consensus algorithms needed in blockchains. As an example, in Input/Output Automata, the number of specified states accessible by an asynchronous algorithm before the threshold is reached could be proportional to the number of permutations of message receptions. Executing the automated verification of an invariant could require a computation proportional to the number of these permutations. More dramatically, the Byzantine fault model typically allows some processes to send arbitrarily formed and arbitrarily many messages, making the number of states to explore potentially infinite. As a result, it is only with the recent definition of threshold automata, which reduces this state space, that we were able to certify our blockchain consensus components.
The remainder of the paper is organized as follows. Section 2 presents new and existing problems affecting known blockchain Byzantine consensus algorithms. In Section 3, we explain how we certified a Byzantine fault tolerant broadcast abstraction common to multiple blockchains. Section 4 presents the related work and Section 5 discusses our certifications and concludes the paper. In Appendix A, we list the pseudocode, specification and certification experiments of a variant of the Byzantine consensus algorithm used in the Red Belly Blockchain.
2 The Problem of Proving Blockchain Consensus Algorithms by Hand
In this section, we illustrate the risk of trying to prove blockchain consensus algorithms by hand by describing a list of safety and liveness limitations affecting the Byzantine fault tolerant algorithms implemented in actual blockchain systems. These limitations, depicted in Table 1, are not necessarily errors in the proofs but stem from ambiguous descriptions in prose rather than formal statements, and from the lack of machine-checked proofs. As far as we know, until now no Byzantine fault tolerant consensus algorithm used in a blockchain had been certified correct.
| Algorithm | Property violated | Counter-example | Blockchain |
|---|---|---|---|
| Randomized consensus | liveness | [new] | HoneyBadger |
| Casper | liveness | [new] | Ethereum v2.0 |
| Ripple consensus | safety | | xRapid |
| Tendermint consensus | safety | | Tendermint |
2.1 The HoneyBadger and its randomized binary consensus
HoneyBadger builds upon the combination of three algorithms from the literature to solve Byzantine consensus with high probability in an asynchronous model. This protocol is being integrated into one of the most popular blockchain software systems, called Ethereum Parity (https://forum.poa.network/t/posdao-white-paper/2208).
First, it uses a classic reduction from the problem of multi-value Byzantine consensus to the problem of binary Byzantine consensus working in the asynchronous model. Second, it reuses a randomized Byzantine binary consensus algorithm that aims at terminating in expected constant time by using a common coin that returns the same unpredictable value at every process. Third, it uses a common coin implemented with a threshold signature scheme that requires the participation of correct processes to return a value.
Randomized binary consensus. In each asynchronous round of this randomized consensus, the processes "binary value broadcast", or "BV-broadcast" for short, their input binary value. The binary value broadcast (detailed later in Section 3.1) simply consists of broadcasting (including to oneself) a value, then rebroadcasting (or echoing) any value received from t + 1 distinct processes and finally bv-delivering any value received from 2t + 1 distinct processes. These delivered values are then broadcast to the other processes and all correct processes record, into the set vals, the values received from n − t distinct processes that are among the ones previously bv-delivered. For any correct process p, if vals happens to contain only the value c returned by the common coin, then p decides this value; if vals contains only the other binary value 1 − c, then p sets its estimate to this value; and if vals contains two values, then p sets its estimate to c. Then p moves to the next round until it decides.
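To make the case analysis concrete, here is a minimal Python sketch of the end-of-round rule (our own illustrative code, not the authors'; message handling and the coin implementation are abstracted away):

```python
# End-of-round rule of the randomized binary consensus (illustrative sketch).
# vals: binary values received from n - t distinct processes, all of which
# were previously bv-delivered; coin: the common coin output for this round.
def end_of_round(vals, coin):
    if vals == {coin}:
        return ("decide", coin)      # only the coin value was seen: decide it
    if len(vals) == 1:
        (v,) = vals                  # only the other value 1 - coin was seen
        return ("estimate", v)       # adopt it as the next round's estimate
    return ("estimate", coin)        # both values seen: adopt the coin value
```

The counter-example below exploits the fact that the adversary can drive different correct processes into different branches of this case analysis.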
The problem is that in practice, as the communication is asynchronous, the common coin cannot return at the exact same time at all processes. In particular, if some correct processes are still at the beginning of their round while the adversary already observes the outcome of the common coin for this round, then the adversary can prevent progress among the correct processes by controlling messages between correct processes and by sending specific values to them. Even if a correct process invokes the common coin before the Byzantine process does, the Byzantine process can prevent correct processes from progressing.
Counter-example. To illustrate the issue, we consider a simple counter-example with n = 4 processes among which t = 1 is Byzantine. Let p1, p2 and p3 be correct processes with input values 1, 1 and 0, respectively, and let p4 be a Byzantine process. The goal is for process p4 to force some correct processes to deliver {0, 1} and another correct process to deliver only the value 1 − c, where c is the value returned by the common coin in the current round. As the Byzantine process p4 has control over the network, it prevents p3 from receiving anything before guaranteeing that p1 and p2 deliver {0, 1}. It is easy to see that p4 can force p1 and p2 to bv-deliver 1, so let us see how p4 forces p1 and p2 to deliver 0. Process p4 sends 0 to p1 so that p1 receives value 0 from both p3 and p4, and thus echoes 0. Then p4 sends 0 to p2. Process p2 then receives value 0 from p3, p4 and p1, hence echoes 0 and delivers 0. Similarly, p1 receives value 0 from p3, p4 and p2, hence delivers 0. To conclude, p1 and p2 deliver {0, 1}. Processes p1, p2 and p3 invoke the coin and there are two cases to consider depending on the value c returned by the coin.
Case c = 0: Process p3 now receives 1 from p1 and p2, hence echoes 1; it then counts value 1 from p1, p2 and itself, so it delivers {1}.
Case c = 1: This is the most interesting case, as p4 should prevent some correct process, say p3, from delivering 1 even though 1 is the most represented input value among correct processes. Process p4 sends 0 to p1 and p2 so that both p1 and p2 receive value 0 from p3 and p4, and thus both echo 0. Due to these echoes and its own broadcast, p3 receives enough 0s and delivers only {0}.
At least two correct processes obtain {0, 1} and another correct process can obtain only {1 − c}. It follows that the correct processes with {0, 1} adopt c as their new estimate while the correct process with {1 − c} takes 1 − c as its new estimate, and no progress can be made within this round. Finally, if the adversary (controlling p4 in this example) keeps this strategy, then it will produce an infinite execution without termination.
Alternative and counter-measure. The problem would be fixed if we could ensure that the common coin always returns at the correct processes before returning at a Byzantine process; however, we cannot distinguish a correct process from a Byzantine process that acted correctly. We are thankful to the authors of the randomized algorithm for confirming our counter-example; they also wrote a remark indicating that both a fair scheduler and a perfect common coin were actually needed for their consensus to converge with high probability, however, no counter-example motivating the need for a fair scheduler was proposed. The intuition behind the fair scheduler is that it requires messages to be received in any order with the same probability, and thus limits the power of the adversary over the network. A newer algorithm does not suffer from the same problem and offers the same asymptotic complexity in messages and time, but requires more communication steps; it could be used as an alternative randomized consensus in HoneyBadger to cope with this issue.
2.2 The Ethereum blockchain and its upcoming Casper consensus
Casper [52, 13] is an alternative to the existing longest-branch technique to agree on a common block within Ethereum. It is well known that Ethereum can experience disagreement when different processes receive distinct blocks for the same index. These disagreements are typically resolved by waiting until the longest branch is unanimously identified. Casper aims at solving this issue by offering consensus.
The Casper FFG consensus algorithm.
The FFG variant of Casper is intended to be integrated into Ethereum v2.0 during phase 0. It is claimed to ensure finality, a property that may seem, at first glance, to result from the termination of consensus. The model of Casper assumes authentication, synchrony and that strictly less than 1/3 of the stake is owned by Byzantine processes.
Casper builds a "blockchain tree" consisting of a partially ordered set of blocks. The genesis block as well as blocks at indices that are multiples of 100 are called checkpoints. Validator processes vote for links between checkpoints of a common branch, and a checkpoint is justified if it is the initial, so-called genesis, block or if there is a link from a justified checkpoint pointing to it voted for by a supermajority of validators.
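The justification rule can be sketched as follows (a simplified sketch with our own naming, assuming a supermajority means more than 2/3 of the total stake and that the link structure forms a tree):

```python
# Illustrative check of Casper checkpoint justification (names are ours).
# links: iterable of (source, target, stake_voted) triples, one per link.
# Assumes the link structure is acyclic (a tree rooted at the genesis).
def justified(checkpoint, links, total_stake, genesis):
    if checkpoint == genesis:
        return True                              # the genesis is justified
    for source, target, stake_voted in links:
        # a supermajority link from an already-justified checkpoint
        if (target == checkpoint
                and 3 * stake_voted > 2 * total_stake
                and justified(source, links, total_stake, genesis)):
            return True
    return False
```

For instance, with links [("g", "c1", 70), ("c1", "c2", 50)] and a total stake of 100, c1 is justified but c2 is not, since a vote of 50 is not a supermajority.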
Note first that Casper executes speculatively and that there is not a single consensus instance per level of the Casper blockchain tree. Each time an agreement attempt at some level of the tree fails due to the lack of votes for the same checkpoint, the height of the tree grows.
Unfortunately, it has been observed that nothing guarantees the termination of Casper FFG, and we present below an example of infinite execution.
Counter-example. To illustrate why the consensus does not terminate in this model, let h be the level of the highest justified checkpoint.
(1) Validators try to agree on a block at level h + 1 by trying to gather votes for the same block at level h + 1 (or, more precisely, for the same link from level h to h + 1). This may fail if, for example, the validators split their votes among three distinct blocks at this level h + 1, so that no block gathers a supermajority.
(2) Upon failure to reach consensus at level h + 1, the correct validators, who have voted for some link from height h to h + 1 and are incentivised to abstain from voting on another link from h to h + 1, can now try to agree on a block at level h + 2, but again no termination is guaranteed.
The same steps (1) and (2) may repeat infinitely often. Note that plausible liveness [13, Theorem 2] is still fulfilled in that a supermajority link 'can' always be produced as long as one has infinite memory, but no such supermajority link is ever produced in this infinite execution.
Alternative and counter-measure. Another version of Casper, called CBC, has also been proposed. It is claimed to be "correct by construction", hence the name CBC. It could potentially be used as a replacement for FFG Casper in Ethereum v2.0, even in phase 0, for applications that require consensus and thus termination.
2.3 Known problems in blockchain Byzantine consensus algorithms
To show that our two counter-examples presented above are not isolated cases in the context of blockchains, we also list below four counter-examples from the literature that were reported by colleagues and affect the Ripple consensus algorithm, Tendermint, Zyzzyva and IBFT. This adds to the severity of the problem of proving algorithms by hand before using them in critical applications like blockchains.
The XRP ledger and the quorums of the Ripple consensus. The Ripple consensus is a consensus algorithm originally intended to be used in the blockchain system developed by the company Ripple. The algorithm is presented at a high level as an algorithm that uses unique node lists as a set of quorums, or mutually intersecting sets, that each individual process must contact to guarantee that its request will be stored by the system or that it can retrieve consistent information about asset ownership. The original but deprecated white paper assumed that quorums overlap by about 20%.
Later, some researchers published an article indicating that the algorithm was inconsistent and listing the environmental conditions under which consensus would not be solved and its safety would be violated. They offered a fix to remedy this inconsistency through the use of different assumptions, requiring that quorums overlap by strictly more than 40%. Finally, the Ripple consensus algorithm was replaced by the XRP ledger consensus protocol, called ABC-Censorship-Resilience under synchrony, in part to fix this problem.
The Tendermint blockchain and its locking variant of PBFT. Tendermint has phases similar to PBFT and works with asynchronous rounds. In each round, processes propose values in turn (phase 1), the proposed value is prevoted (phase 2), precommitted when prevoted by sufficiently many (at least 2n/3 + 1 among the n) processes (phase 3) and decided when precommitted by sufficiently many processes. To progress despite failures, processes stay in a phase only for up to a timeout period. A difference with PBFT is that a correct process produces a proof-of-lock of a value v at round r if it precommits v at round r. A correct process can only prevote a value if it did not previously precommit a conflicting value.
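The locking discipline can be sketched as follows (our own simplified model, not Tendermint's actual code; pol_round stands for the round of a proof-of-lock accompanying a proposal):

```python
# Illustrative sketch of Tendermint's locking rule (simplified, ours).
def may_prevote(lock, proposal, pol_round):
    """lock: (locked_value, locked_round) or None; proposal: proposed value;
    pol_round: round of the proof-of-lock shipped with the proposal, or None."""
    if lock is None:
        return True                    # nothing locked: free to prevote
    locked_value, locked_round = lock
    if proposal == locked_value:
        return True                    # prevoting the locked value is safe
    # a proof-of-lock from a strictly higher round allows unlocking
    return pol_round is not None and pol_round > locked_round
```

The counter-example below turns exactly this unlock rule against safety: a proof-of-lock from a higher round forces correct processes to abandon a value that another correct process has already decided.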
As we restate here, there exists a counter-example that illustrates the safety issue with four processes p1, p2, p3 and p4, among which p4 is Byzantine, that propose in the round of their index number. In the first round, the correct processes prevote a value v; p1 and p2 lock v in this round and precommit it; p1 decides v while p2 and p3 do not decide, before p1 becomes slow. In the second round, process p4 informs p3 that it prevotes v so that p3 prevotes, precommits and locks v in round 2. In the third round, p3 proposes v locked in round 2, forcing p2 to unlock, and in the fourth round, p4 forces p3 to unlock in a similar way. Finally, p1 does not propose anything and another correct process proposes another value v' ≠ v that gets decided by the remaining processes. It follows that correct processes decide differently, which violates agreement. Since this discovery, Tendermint kept evolving and the authors of the counter-example acknowledged that some of the issues they reported were fixed; the authors also informed us that they notified the developers but do not know whether this particular safety issue has been fixed.
Zyzzyva and the SBFT concurrent fast and regular paths. Zyzzyva is a Byzantine consensus algorithm that requires view-change and combines a fast path, where a client can learn the outcome of the consensus in only 3 message delays, with a regular path, where the client needs to collect a commit-certificate with 2t + 1 responses, where t is the maximum number of Byzantine faults tolerated. The same optimization is currently implemented in the SBFT permissioned blockchain to speed up termination when all participants are correct and the communication is synchronous.
There exist counter-examples that illustrate how the safety property of Zyzzyva can be violated. The idea of one counter-example consists of creating a commit-certificate for a value v, then experiencing a first view-change (due to delayed messages) and deciding another value v' ≠ v for a given index, before finally experiencing a second view-change that leads to undoing the former decision and instead deciding v at the same index. SBFT is likely to be immune to this issue as the counter-example was identified by some of the authors of SBFT. A simple way to cope with this issue is to prevent the two paths from running concurrently, as in the simpler variant of Zyzzyva called Azyzzyva.
The Quorum blockchain and its IBFT consensus. IBFT is a Byzantine fault tolerant consensus algorithm at the heart of the Quorum blockchain designed by JPMorgan. It is similar to PBFT except that it offers a simplified version of the PBFT view-change by getting rid of new-view messages. It aims at solving consensus under partial synchrony. The protocol assumes that fewer than n/3 of the n processes, usually referred to by IBFT as "validators", are Byzantine.
As reported in the literature, IBFT does not terminate in a partially synchronous network even when failures are crashes. More precisely, IBFT cannot guarantee that if at least one honest validator is eventually able to produce a valid finalized block, then the transaction it contains will eventually be added to the local transaction ledger of every other correct process. IBFT v2.x fixes this problem but requires a transaction to be submitted to all correct validators for this transaction to be eventually included in the distributed permissioned transaction ledger. The proof was made by hand and we are not aware of any automated proof of this protocol as of today.
3 Methodology for Certifying Blockchain Components
In this section, we explain how we certified the binary value broadcast blockchain component using the Byzantine model checker without being experts in verification. (Although we are not experts in verification, we are thankful to verification experts Igor Konnov and Josef Widder for discussions on the syntax of threshold automata and for confirming that our consensus agreement property was certified by ByMC when our initial runs were taking longer than expected.) Then we explain how this helped us certify the correctness of a variant of the binary consensus of DBFT used in the Red Belly Blockchain.
3.1 Preliminaries on ByMC and BV-broadcast
Byzantine model checker.
Fault tolerant distributed algorithms, like the Byzantine fault tolerant broadcast primitive presented below, are often based on parameters, like the number n of processes, the maximum number t of Byzantine faults or the actual number f of Byzantine faults.
Threshold-guarded algorithms [33, 32] use these parameters to define threshold-based guard conditions that enable transitions to different states.
Once a correct process receives a number of messages that reaches the threshold, it progresses by taking some transition to a new state.
To circumvent the undecidability of model checking on infinite systems, Konnov, Schmid, Veith and Widder introduce two parametric interval abstractions that model (i) each process with a finite-state machine independent of the parameters and (ii) the whole system with abstract counters that quantify the number of processes in each state, in order to obtain a finite-state system. Finally, they group a potentially infinite number of runs into an execution schema in order to allow bounded model checking, based on an SMT solver, over all the possible execution schemas.
ByMC verifies threshold automata with this model checking technique and has been used to prove various distributed algorithms correct, like atomic commit or reliable broadcast. Given a set of safety and liveness properties, it outputs traces certifying that the properties are satisfied in all the reachable states of the threshold automaton.
Until 2018, correctness properties were only verified on one round, but more recently the threshold automata framework was extended to randomized algorithms, making it possible to verify algorithms such as Ben-Or's randomized consensus under round-rigid adversaries.
Binary value broadcast. The binary value broadcast, also denoted BV-broadcast, is a Byzantine fault tolerant communication abstraction used in blockchains [40, 23] that works in an asynchronous network with reliable channels where the maximum number t of Byzantine failures satisfies t < n/3. The BV-broadcast guarantees that no value broadcast exclusively by Byzantine processes can be delivered by correct processes. This helps limit the power of the adversary in order to make sure that a Byzantine consensus algorithm converges towards a value. In particular, by requiring that all correct processes BV-broadcast their proposals, one can guarantee that all correct processes will eventually observe their proposals, regardless of the values proposed by Byzantine processes. The binary value broadcast finds applications in blockchains: first, it is implemented in HoneyBadger to detect that correct processes have proposed diverging values in order to toss a common coin, which returns the same result across distributed correct processes, to make them converge to a common decision; second, the Red Belly Blockchain and the accountable blockchain that derives from it implement the BV-broadcast to detect whether the protocol can converge towards the parity of the round number by simply checking that it corresponds to one of the values that were "bv-delivered".
The BV-broadcast abstraction satisfies the four following properties:
BV-Obligation. If at least t + 1 correct processes BV-broadcast the same value v, then v is eventually added to the set bin_values_i of each correct process p_i.
BV-Justification. If p_i is correct and v ∈ bin_values_i, then v has been BV-broadcast by some correct process. (Identification follows from receiving more than t messages carrying the same value.)
BV-Uniformity. If a value v is added to the set bin_values_i of a correct process p_i, then eventually v ∈ bin_values_j at every correct process p_j.
BV-Termination. Eventually the set bin_values_i of each correct process p_i is not empty.
3.2 Automated verification of a blockchain Byzantine broadcast
In this section, we describe how we used a threshold automaton to specify the binary value broadcast algorithm, and ByMC in order to verify the protocol automatically. We recall the BV-broadcast algorithm as depicted in Algorithm 1. The algorithm consists of having all (at least n − t) correct processes broadcast a binary value. Once a correct process receives a value from t + 1 distinct processes, it broadcasts it if it has not done so already. Once a correct process receives a value from 2t + 1 distinct processes, it delivers it. Here the delivery is modeled by adding the value to the set bin_values, which will simplify the description of our variant of the DBFT binary consensus in Appendix A.
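The two thresholds can be sketched as follows (an illustrative Python model of a single correct process written by us; in the real algorithm, reception and broadcast are asynchronous message events):

```python
# Illustrative model of BV-broadcast at one correct process (n > 3t assumed).
class BVBroadcast:
    def __init__(self, n, t, my_value):
        self.n, self.t = n, t
        self.senders = {0: set(), 1: set()}   # distinct senders per value
        self.echoed = {my_value}              # own value is broadcast at start
        self.bin_values = set()               # bv-delivered values

    def on_receive(self, v, sender):
        self.senders[v].add(sender)
        # echo once v was received from t + 1 distinct processes
        if len(self.senders[v]) >= self.t + 1 and v not in self.echoed:
            self.echoed.add(v)
        # bv-deliver once v was received from 2t + 1 distinct processes
        if len(self.senders[v]) >= 2 * self.t + 1:
            self.bin_values.add(v)
        return self.bin_values
```

With n = 4 and t = 1, a process with input 1 echoes 0 after two distinct processes sent it 0, and bv-delivers 0 only after a third one did.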
Specifying the distributed algorithm in a threshold automaton. Let us describe how we specify Algorithm 1 as the threshold automaton depicted in Figure 1. Each state of the automaton, or node in the corresponding graph, represents a local state of a process. A process can move from one state to another thanks to an edge, called a rule. A rule has the form guard ↦ action, where the guard is a condition and the action updates the shared variables. When the guard evaluates to true (e.g., more than a threshold of messages of a certain type have been sent), the action is executed (e.g., the shared variable is incremented).
In Algorithm 1, we can see that only two types of messages are exchanged: a process can only send either 0 or 1. Each time a value is sent by a correct process, it is actually broadcast to all processes. Thus, we only need two shared variables, b0 and b1, corresponding to the values 0 and 1 in the automaton (cf. Figure 1). Incrementing b0 is equivalent to broadcasting 0. Initially, each correct process immediately broadcasts its value. This is why the guard for the first rule is true: a process in locV0 can immediately move to locB0 and send 0 during the transition.
We then enter the repeat loop of the pseudocode. The two if statements are easily understandable as threshold guards. If messages with value 0 are received from t + 1 distinct processes, then the process should broadcast 0 (i.e., increment b0) if it has not already done so. Interestingly, the corresponding guard is b0 + f ≥ t + 1. Indeed, the shared variable b0 only counts the messages sent by correct processes. However, the f faulty processes might send messages with arbitrary values. We want to consider all the possible executions, so the earliest moment a correct process can move from locB1 to locB01 is when the f faulty processes and t + 1 − f correct processes have sent 0. The other edge leaving locB0 corresponds to the second if statement, which is satisfied when messages with value 0 have been received from 2t + 1 distinct processes. In state locC0, the value 0 has been delivered. A process might stay in this state forever, so we add a self-loop with the guard condition set to true.
After the state locC0, a process is still able to broadcast 1 and eventually deliver 1 after that. After the state locB01, a process is able to deliver 0 and then deliver 1, or to deliver 1 first and then deliver 0, depending on the order in which the guards are satisfied.
Apart from the self-loops, we remark that the automaton is a directed acyclic graph. On every path of the graph, we can verify that a shared variable is incremented only once. This is because in the pseudocode, a value can be broadcast only if it has not been broadcast before.
Finally, the states of the automaton correspond to the following (unique) situations for a correct process:
locV0. Initial state with value 0; nothing has been broadcast nor delivered
locV1. Initial state with value 1; nothing has been broadcast nor delivered
locB0. Only 0 has been broadcast, nothing has been delivered
locB1. Only 1 has been broadcast, nothing has been delivered
locB01. Both 0 and 1 have been broadcast, nothing has been delivered
locC0. Only 0 has been broadcast, only 0 has been delivered
locCB0. Both 0 and 1 have been broadcast, only 0 has been delivered
locC1. Only 1 has been broadcast, only 1 has been delivered
locCB1. Both 0 and 1 have been broadcast, only 1 has been delivered
locC01. Both 0 and 1 have been broadcast, both 0 and 1 have been delivered
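To give an idea of the resulting automaton, a few of its rules can be encoded as guarded transitions over the shared variables b0 and b1 and the parameters t and f (our own Python rendering for illustration, not ByMC's .ta syntax):

```python
# Illustrative encoding of a few threshold-guarded rules of Figure 1
# (our notation, not the .ta syntax): a rule fires when its guard over the
# shared variables (b0, b1) and parameters (t, f) holds.
rules = [
    # (from,    to,       guard,                                   action)
    ("locV0", "locB0",  lambda b0, b1, t, f: True,                "b0 += 1"),
    ("locV1", "locB1",  lambda b0, b1, t, f: True,                "b1 += 1"),
    # echo 0: t + 1 zeros seen, f of which may come from faulty processes
    ("locB1", "locB01", lambda b0, b1, t, f: b0 + f >= t + 1,     "b0 += 1"),
    # deliver 0: 2t + 1 zeros seen
    ("locB0", "locC0",  lambda b0, b1, t, f: b0 + f >= 2 * t + 1, ""),
]

def enabled(rule, b0, b1, t, f):
    _, _, guard, _ = rule
    return guard(b0, b1, t, f)
```

For instance, with t = 1 and f = 1, a single correct sender of 0 (b0 = 1) already enables the echo rule, but delivery needs two correct senders.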
Once the pseudocode is converted into a threshold automaton depicted in Figure 1, one can simply write the corresponding specification in the threshold automata language to obtain the specification listed below (Figure 2) for completeness.
Defining the correctness properties and fairness assumptions. The above automaton is only the first half of the verification work. The second half consists in specifying the correctness properties that we would like to verify on the algorithm. We use temporal logic on the algorithm variables (the number of processes in each location, the number of messages sent, and the parameters) to formalize the properties. In the case of the BV-broadcast, the BV-Justification property is: "If p_i is correct and v ∈ bin_values_i, then v has been BV-broadcast by some correct process". Using the LTL operators ◇, → and ∨ with the semantics of 'eventually', 'implies' and 'or', respectively, we translate this property into two specifications, one per value:
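For concreteness, a plausible rendering of these two specifications in LTL is the following, writing $\kappa[\ell]$ for the counter of correct processes in location $\ell$ and $\kappa_0[\ell]$ for its initial value (our notation; the exact .ta syntax differs): if some correct process eventually reaches a location where 0 has been delivered, then some correct process started with input 0, and symmetrically for 1.

```latex
\Diamond\big(\kappa[locC0] + \kappa[locCB0] + \kappa[locC01] \neq 0\big)
  \;\rightarrow\; \big(\kappa_0[locV0] \neq 0\big)
\Diamond\big(\kappa[locC1] + \kappa[locCB1] + \kappa[locC01] \neq 0\big)
  \;\rightarrow\; \big(\kappa_0[locV1] \neq 0\big)
```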
Liveness properties are longer to specify, because we need to take into account some fairness constraints. Indeed, a threshold automaton describes processes evolving in an asynchronous setting without additional assumptions. An execution in which a process stays in a state forever is a valid execution, but it does not make any progress. If we want to verify liveness properties, we have to add some assumptions to the specification. For instance, we can require that processes eventually leave the states of the automaton as long as they have received enough messages to enable the condition guarding the outgoing rule. In other words, a liveness property will be specified as:
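Schematically, each liveness specification we checked has the shape "fairness implies progress" (again our notation rather than the exact .ta syntax): if every rule $\rho$ whose guard $\varphi_\rho$ holds is eventually taken, then every correct process eventually reaches a location where it has bv-delivered a value, which corresponds to BV-Termination.

```latex
\Big(\bigwedge_{\rho} \Box\big(\varphi_{\rho} \rightarrow \Diamond\,\mathit{taken}(\rho)\big)\Big)
  \;\rightarrow\;
  \Diamond\big(\kappa[locV0] + \kappa[locV1] + \kappa[locB0] + \kappa[locB1] + \kappa[locB01] = 0\big)
```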
Note that this assumption is natural and differs from the round-rigidity assumption, which requires the adversary to eventually take any applicable transition of an infinite execution.
Finally, we wrote a threshold automaton specification, whose .ta file is presented in Figure 2, in only 116 lines.
Experimental results. On a simple laptop with an Intel Core i5-7200U CPU running at 2.50GHz, verifying all the correctness properties for BV-broadcast takes less than 40 seconds. For simple properties on well-specified algorithms, such as the ones of the benchmarks included with ByMC, the verification time can be less than one second. This result encouraged us to certify a complete Byzantine consensus algorithm that builds upon the binary-value broadcast.
Debugging the manual conversion of the algorithm to the automaton. It is common that the specification does not hold on the first try, because of some mistakes in the threshold automaton model or in the translation of the correctness property into a formal specification. In such cases, ByMC provides a detailed output and a counter-example showing where the property has been violated. We reproduced such a counter-example in Figure 3 with an older preliminary version of our specification. This specification was wrong because a liveness property did not hold. ByMC gave parameters and provided an execution ending with a loop, such that the condition of the liveness property was never met. This trace helped us understand the problem in our specification and allowed us to fix it to obtain the correct specification we illustrated before in Figure 2. Building upon this successful result, we specified a more complex Byzantine consensus algorithm that uses the same broadcast abstraction, but we did not encounter any bug during this process and our first specification was proved correct by ByMC. Due to lack of space, we defer its pseudocode, threshold automaton specification and experimental results to Appendix A.
4 Related Work
The observation that some blockchain consensus proposals have issues is not new [28, 15]. It is now well known that the termination of existing blockchains like Ethereum requires an additional assumption like synchrony . Our Ethereum counter-example differs in that it considers the upcoming consensus algorithm of Ethereum v2.0. In , the conclusions differ from ours, as they generalize to other Byzantine consensus proposals, like Tangaroa, that are not necessarily in use in blockchain systems. Our focus is on the consensus algorithms used in blockchains that trade valuable assets, because these are critical applications.
Threshold automata have already proved helpful to automate the proofs of existing consensus algorithms . They have even been useful in illustrating why a specification of the King-Phase algorithm  was incorrect  (due to the strictness of a lower symbol), later fixed in . We did not list this as one of the inconsistency problems that affect blockchains because we are not aware of any blockchain implementation that builds upon the King-Phase algorithm. In , the authors use threshold-guarded automata to prove two broadcast primitives and the Bosco Byzantine consensus algorithm correct. However, Bosco offers a fast path but requires another consensus algorithm for its fallback path, so its correctness depends on the assumption that the underlying consensus algorithm is correct.
In general, it is hard to formally prove algorithms that work in a partially synchronous model. Part of the reason is that common partially synchronous solutions attempt to give processes sufficient time in distinct asynchronous rounds by incrementing a timeout until it is large enough to exceed the unknown message delay bound. PSync is a language that helps reason formally about partially synchronous algorithms . Here we instead used the ByMC model checker for asynchronous systems and required the round-rigidity assumption to show a probabilistic variant of the binary consensus of DBFT .
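The timeout mechanism alluded to above can be sketched as follows; this is a minimal illustration with names of our choosing, not the code of any particular algorithm:

```python
# Minimal sketch of the timeout strategy of common partially
# synchronous algorithms: the per-round timeout grows until it
# exceeds the unknown message delay bound (here `delta`, which the
# algorithm itself never learns).

def rounds_until_timely(delta: float, initial_timeout: float = 1.0) -> int:
    """Return the first round whose timeout exceeds the real delay bound."""
    timeout, round_no = initial_timeout, 1
    while timeout <= delta:  # messages may still miss the deadline
        timeout *= 2         # doubling here; incrementing also works
        round_no += 1
    return round_no          # from this round on, all messages arrive in time
```

Because `delta` is unknown, a proof must reason about an unbounded prefix of "badly timed" rounds before this point is reached, which is precisely what makes such algorithms hard to model check.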
A framework allows one to build certified proofs of distributed algorithms with the proof assistant Coq . The tools developed in this framework target the termination of self-stabilizing algorithms, and it is unclear how easily they can be applied to complex algorithms like Byzantine consensus. Another model for distributed algorithms has been encoded in the interactive proof assistant Isabelle/HOL and used to verify several consensus algorithms .
In , the authors present TLC, a model checker for debugging a finite-state model of a TLA+ specification. TLA+ is a specification language for concurrent and reactive systems that builds upon the temporal logic TLA. One limitation is that a TLA+ specification may comprise an infinite set of states, in which case the model checker can only give a partial proof. In order to run the TLC model checker on a TLA+ specification, it is necessary to fix the parameters, such as the number of processes or the bounds on integer values. In practice, the complexity of model checking explodes rapidly, making it difficult to check anything beyond toy examples with a handful of processes. TLC remains useful, in particular in industry, to prove that some specifications are wrong . TLA+ also comes with a proof system called TLAPS. TLAPS supports manually written hierarchically structured proofs, which are then checked by backend engines such as Isabelle, Zenon or SMT solvers . TLAPS is still being actively developed, but it is already possible, albeit technical and lengthy, to prove algorithms such as Paxos.
5 Discussion and Conclusion
In this paper, we argued for the certification of blockchain Byzantine fault tolerant algorithms as a way to reduce the numerous issues resulting from non-certified proofs for such critical applications as blockchains. In particular, we illustrated the problem with new counter-examples of algorithms at the core of widely deployed blockchain software.
We showed that it is now feasible, even for non-experts, to certify blockchain Byzantine components on modern machines thanks to recent advances in formal verification, and we illustrated it with relatively simple specifications of a broadcast abstraction common to multiple blockchains, as well as of a variant of the Byzantine consensus algorithm of the Red Belly Blockchain.
To certify the Byzantine consensus, we assumed a round-rigid adversary that schedules transitions in a fair way. This is not new: in , the model checking of Ben-Or's randomized algorithm already required a round-rigid adversary. Interestingly, we do not need this assumption to certify the binary-value broadcast abstraction, which works in an asynchronous model.
As future work, we would like to prove other Byzantine fault tolerant algorithmic components of blockchain systems.
We wish to thank Igor Konnov and Josef Widder for helping us understand the syntax and semantics of the threshold automata specification language and for confirming that ByMC verified the agreement property of our initial specification. We thank Tyler Crain, Achour Mostéfaoui and Michel Raynal for discussions of the HoneyBadger counter-example, and Yackolley Amoussou-Guenou, Maria Potop-Butucaru and Sara Tucci for discussions on the Tendermint counter-example. This research is supported under Australian Research Council Discovery Projects funding scheme (project number 180104030) entitled “Taipan: A Blockchain with Democratic Consensus and Validated Contracts” and Australian Research Council Future Fellowship funding scheme (project number 180100496) entitled “The Red Belly Blockchain: A Scalable Blockchain for Internet of Things”.
-  I. Abraham, G. G. Gueta, D. Malkhi, L. Alvisi, R. Kotla, and J.-P. Martin. Revisiting fast practical byzantine fault tolerance. Technical report, arXiv, Dec. 2017.
-  K. Altisen, P. Corbineau, and S. Devismes. A framework for certified self-stabilization. In FORTE, pages 36–51, 2016.
-  Y. Amoussou-Guenou, A. D. Pozzo, M. Potop-Butucaru, and S. T. Piergiovanni. Correctness and fairness of tendermint-core blockchains. Technical Report 1805.08429, arXiv, 2018.
-  Y. Amoussou-Guenou, A. D. Pozzo, M. Potop-Butucaru, and S. Tucci-Piergiovanni. Dissecting tendermint. In Proceedings of the 7th Edition of The International Conference on Networked Systems, 2019.
-  F. Armknecht, G. O. Karame, A. Mandal, F. Youssef, and E. Zenner. Ripple: Overview and outlook. In International Conference on Trust and Trustworthy Computing, pages 163–180. Springer, 2015.
-  P.-L. Aublin, R. Guerraoui, N. Knežević, V. Quéma, and M. Vukolić. The next 700 BFT protocols. ACM Trans. Comput. Syst., 32(4):12:1–12:45, Jan. 2015.
-  P. Berman and J. A. Garay. Asymptotically optimal distributed consensus (extended abstract). In ICALP, pages 80–94, 1989.
-  N. Bertrand, I. Konnov, M. Lazic, and J. Widder. Verification of randomized distributed algorithms under round-rigid adversaries. In CONCUR, 2019.
-  M. Biely, U. Schmid, and B. Weiss. Synchronous consensus under hybrid process and link failures. Theor. Comput. Sci., 412(40):5602–5630, Sept. 2011.
-  G. Bracha and S. Toueg. Asynchronous consensus and broadcast protocols. J. ACM, 32(4):824–840, Oct. 1985.
-  B. Brown. xRapid: Everything you need to know about ripple’s crypto service (now live), Jan 2019. https://blockexplorer.com/news/what-is-xrapid/.
-  E. Buchman, J. Kwon, and Z. Milosevic. The latest gossip on BFT consensus. Technical report, Tendermint, 2018.
-  V. Buterin and V. Griffith. Casper the friendly finality gadget. Technical Report 1710.09437v4, arXiv, Jan. 2019.
-  C. Cachin, K. Kursawe, and V. Shoup. Random oracles in Constantinople: Practical asynchronous byzantine agreement using cryptography (extended abstract). In PODC, pages 123–132, 2000.
-  C. Cachin and M. Vukolić. Blockchain consensus protocols in the wild. arXiv preprint arXiv:1707.01873, 2017.
-  M. Castro and B. Liskov. Practical byzantine fault tolerance and proactive recovery. ACM Trans. Comput. Syst., 20(4):398–461, Nov. 2002.
-  B. Charron-Bost, H. Debrat, and S. Merz. Formal verification of consensus algorithms tolerating malicious faults. In Stabilization, Safety, and Security of Distributed Systems - 13th International Symposium, SSS 2011, Grenoble, France, October 10-12, 2011. Proceedings, pages 120–134, 2011.
-  B. Chase and E. MacBrough. Analysis of the xrp ledger consensus protocol. Technical Report 1802.07242v1, arXiv, Feb. 2018.
-  JPMorgan Chase. Quorum whitepaper, Aug 2018. https://github.com/jpmorganchase/quorum/blob/master/docs/Quorum%20Whitepaper%20v0.2.pdf.
-  P. Civit, V. Gramoli, and S. Gilbert. Polygraph: Accountable byzantine agreement. Technical Report 2019/587, ePrint, 2019. https://eprint.iacr.org/2019/587.pdf.
-  D. Cousineau, D. Doligez, L. Lamport, S. Merz, D. Ricketts, and H. Vanzetto. TLA + proofs. In FM, pages 147–154, 2012.
-  T. Crain, V. Gramoli, M. Larrea, and M. Raynal. DBFT: Efficient leaderless Byzantine consensus and its applications to blockchains. In NCA. IEEE, 2018.
-  T. Crain, C. Natoli, and V. Gramoli. Evaluating the Red Belly Blockchain. Technical Report 1812.11747, arXiv, 2018.
-  C. Dragoi, T. A. Henzinger, and D. Zufferey. PSync: a partially synchronous language for fault-tolerant distributed algorithms. In POPL, pages 400–415, 2016.
-  C. Dwork, N. Lynch, and L. Stockmeyer. Consensus in the presence of partial synchrony. J. ACM, 35(2):288–323, Apr. 1988.
-  Ethereum. Ethereum 2.0 (serenity) phases. https://docs.ethhub.io/ethereum-roadmap/ethereum-2.0/eth-2.0-phases/ as of 23 August 2019.
-  G. Golan-Gueta, I. Abraham, S. Grossman, D. Malkhi, B. Pinkas, M. K. Reiter, D. Seredinschi, O. Tamir, and A. Tomescu. SBFT: a scalable decentralized trust infrastructure for blockchains. Technical Report 1804.01626, arXiv, 2018.
-  V. Gramoli. On the danger of private blockchains. In Workshop on Distributed Cryptocurrencies and Consensus Ledgers, 2016.
-  R. Guerraoui, P. Kuznetsov, M. Monti, M. Pavlovič, and D.-A. Seredinschi. The consensus number of a cryptocurrency. In PODC, pages 307–316, 2019.
-  I. Barinov, V. Baranov, and P. Khahulin. POA network white paper, Sept. 2018. https://github.com/poanetwork/wiki/wiki/POA-Network-Whitepaper.
-  A. John, I. Konnov, U. Schmid, H. Veith, and J. Widder. Parameterized model checking of fault-tolerant distributed algorithms by abstraction. In FMCAD, pages 201–209, 2013.
-  I. Konnov, M. Lazić, H. Veith, and J. Widder. A short counter example property for safety and liveness verification of fault-tolerant distributed algorithms. In POPL, pages 719–734, 2017.
-  I. Konnov, H. Veith, and J. Widder. SMT and POR beat counter abstraction: Parameterized model checking of threshold-based distributed algorithms. In CAV, volume 9206 of LNCS, pages 85–102, 2015.
-  I. Konnov and J. Widder. ByMC: Byzantine model checker. In ISoLA, pages 327–342, 2018.
-  R. Kotla, L. Alvisi, M. Dahlin, A. Clement, and E. Wong. Zyzzyva: Speculative byzantine fault tolerance. ACM Trans. Comput. Syst., 27(4):7:1–7:39, Jan. 2010.
-  J. Kwon. Tendermint : Consensus without mining - draft v.0.6, 2014.
-  M. Lazic, I. Konnov, J. Widder, and R. Bloem. Synthesis of distributed algorithms with parameterized threshold guards. In OPODIS, pages 32:1–32:20, 2017.
-  Y.-T. Lin. Istanbul byzantine fault tolerance - eip 650. https://github.com/ethereum/EIPs/issues/650 as of 21 August 2019.
-  N. Lynch. Input/output automata: Basic, timed, hybrid, probabilistic, dynamic,… In R. Amadio and D. Lugiez, editors, Proceedings of the Conference on Concurrency Theory (CONCUR), volume 2761 of Lecture Notes in Computer Science, 2003.
-  A. Miller, Y. Xia, K. Croman, E. Shi, and D. Song. The honey badger of BFT protocols. In CCS, 2016.
-  A. Mostéfaoui, H. Moumen, and M. Raynal. Signature-free asynchronous Byzantine consensus with t < n/3 and O(n²) messages. In PODC, pages 2–9, 2014.
-  A. Mostéfaoui, H. Moumen, and M. Raynal. Signature-free asynchronous binary Byzantine consensus with t < n/3, O(n²) messages, and O(1) expected time. J. ACM, 2015.
-  S. Nakamoto. Bitcoin: a peer-to-peer electronic cash system, 2008.
-  C. Newcombe. Why amazon chose TLA +. In ABZ, pages 25–39, 2014.
-  M. C. Pease, R. E. Shostak, and L. Lamport. Reaching agreement in the presence of faults. J. ACM, 27(2):228–234, 1980.
-  R. Saltini. Correctness analysis of IBFT. Technical Report 1901.07160v1, arXiv, Jan. 2019.
-  D. Schwartz, N. Youngs, and A. Britto. The ripple protocol consensus algorithm. Ripple Labs Inc. White Paper, 5, 2014.
-  I. Stoilkovska, I. Konnov, J. Widder, and F. Zuleger. Verifying safety of synchronous fault-tolerant algorithms by bounded model checking. In TACAS, pages 357–374, 2019.
-  P. Sutra. On the correctness of egalitarian paxos. Technical Report 1906.10917, arXiv, Jun. 2019.
-  S. Thomas and E. Schwartz. A protocol for interledger payments. Available at https://interledger.org/interledger.pdf, 2015.
-  Y. Yu, P. Manolios, and L. Lamport. Model checking TLA specifications. In CHARME, pages 54–66, 1999.
-  V. Zamfir, N. Rush, A. Asgaonkar, and G. Piliouras. Introducing the “minimal CBC Casper” family of consensus protocols. Technical report, 2018. https://github.com/cbc-casper/cbc-casper-paper/blob/master/cbc-casper-paper-draft.pdf as of 21 August 2019.
Appendix A Certifying a blockchain Byzantine consensus algorithm
The Democratic Byzantine Fault Tolerant consensus algorithm  is a Byzantine consensus algorithm that does not require a leader. It was implemented in Red Belly Blockchain  to offer high performance through multiple proposers, and was used in Polygraph  to detect the malicious participants responsible for disagreements. As depicted in Algorithm 2, its binary consensus proceeds in asynchronous rounds that correspond to the iterations of a loop in which correct processes refine their estimate value.
Initially, each correct process sets its estimate to its input value. Correct processes bv-broadcast these estimates and gather only values proposed by correct processes, called contestants. They broadcast their contestants and extract favorites upon delivery. If the set of favorites is a singleton containing the parity of the round, then the correct process decides this value. If the favorites contain both values, then the estimate becomes the parity of the round. Otherwise, the estimate is set to the only value in the favorites, and a new round starts. As opposed to the original, partially synchronous deterministic version , this variant offers termination in an asynchronous network under round-rigidity, which requires the adversary to eventually perform any applicable transition within an infinite execution. This assumption was previously used to show termination with high probability . Below we show the specification of the algorithm in threshold automata.
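The round logic described above can be summarized by the following sketch, where the function name and calling convention are ours, and the bv-broadcast and message exchanges are abstracted away:

```python
# Sketch of the per-round decision rule described above (names ours).
# `favorites` is the set of binary values extracted in this round,
# and `round_no` determines the parity bit favored in that round.

def next_step(favorites: set, round_no: int):
    """Return ('decide', v) or ('estimate', v) for the next round."""
    parity = round_no % 2
    if favorites == {parity}:
        return ("decide", parity)    # singleton matching the round's parity
    if favorites == {0, 1}:
        return ("estimate", parity)  # both values seen: adopt the parity
    (only_value,) = favorites        # singleton with the other value
    return ("estimate", only_value)
```

Alternating the favored parity between rounds is what lets correct processes with conflicting estimates eventually converge on a common value.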
Experimental results. The Byzantine consensus algorithm has far more states and variables than the BV-broadcast primitive and is too complex to be verified on a personal computer. We ran the parallelized version of ByMC with MPI on four 16-core AMD Opteron 6276 CPUs (64 cores in total) at 2300 MHz with 64 GB of memory. The verification times for the 5 properties are listed in Figure 4 and sum up to 1046 seconds, i.e., 17 minutes and 26 seconds.